LIBRARY
Michigan State University

This is to certify that the thesis entitled

AUGMENTING INFORMATION CHANNELS TO IMPROVE COCHLEAR IMPLANT PATIENTS' PERFORMANCE UNDER ADVERSE CONDITIONS

presented by

ESRAA MUSTAFA AL-SHAROA

has been accepted towards fulfillment of the requirements for the MASTER OF SCIENCE degree in ELECTRICAL ENGINEERING

Major Professor's Signature / Date

MSU is an affirmative-action, equal-opportunity employer

Augmenting Information Channels to Improve Cochlear Implant Patients' Performance Under Adverse Conditions

By

Esraa Mustafa Al-sharoa

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Electrical Engineering

2007

ABSTRACT

Augmenting Information Channels to Improve Cochlear Implant Patients' Performance Under Adverse Conditions

By Esraa Mustafa Al-sharoa

Cochlear implant (CI) patients perform reasonably well in acoustically pristine environments. However, performance degrades significantly under adverse conditions, particularly in speech-like noisy surroundings. This thesis addresses the problem of resolving the starts and ends of spoken words to improve speech intelligibility for CI patients under severe adverse conditions.

We propose a new approach based on a sparse representation of speech signals. The approach builds on the discrete wavelet packet decomposition, owing to its excellent ability to capture transient signals. The obtained sparse representation is parameterized using a Gaussian mixture model to yield a feature set for classification and clustering purposes. We compare two methods for classifying the start and end segments of spoken words in a noisy environment. The first exploits second-order statistics of the sparsely represented signals, while the second relies on an expectation-maximization approach to the Gaussian mixture model. We test the performance of both methods under various signal and noise conditions. Our preliminary results demonstrate that the sparse representation can capture salient features in spoken words that are indicative of the start and end of a word. The proposed approach can be useful for driving cochlear implant signal transduction mechanisms with new features that are more robust to adverse conditions than the classical filter bank approach commonly used in current technology.

To my beloved husband, son, and parents

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Karim Oweiss; without his guidance and valuable comments and input this thesis would not have been possible. In addition, I am very grateful to the committee members, Dr. Hayder Radha and Dr. Rong Jin, for their useful comments.

I would also like to thank my husband Mahmood Al-khassaweneh, my son Ahmad, and our coming daughter. With their help and support I was able to finish this thesis. Finally, I would like to thank my parents, Mustafa Al-sharoa and my mother Shama Al-kofahi, all my brothers and sisters, and my family-in-law. I thank them for their support, kindness, and love.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1
Introduction
1.1 General Concepts
1.1.1 Normal hearing
1.1.2 Deafness
1.1.3 Speech signal characteristics
1.2 Cochlear Implants
1.3 Characteristics and types of cochlear implant
1.3.1 Characteristics of cochlear implant
1.3.2 Types of cochlear implant
1.4 Previous Work
1.5 Contributions of this thesis

CHAPTER 2
The Power Method
2.1 Introduction
2.2 Implementation
2.2.1 Framing
2.2.2 Discrete wavelet packet decomposition
2.2.3 Singular value decomposition
2.3 Simulations and Results
2.4 Summary and Discussion

CHAPTER 3
The Parametrization Method
3.1 Introduction
3.2 Implementation
3.2.1 Words Partitioning
3.2.2 Framing
3.2.3 Discrete Wavelet Packet Decomposition
3.2.4 Gaussian Mixture Model
3.2.5 Separation of Parameters
3.2.6 Principal Component Analysis
3.2.7 Cluster Analysis
3.3 Simulation and Results
3.3.1 The effect of using the GMM model with the PCA
3.4 Summary and Discussion

CHAPTER 4
Conclusions and Future Work
4.1 Summary of the Thesis
4.2 Future Work

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1   A comparison between Pn and Un at subband (5,2) for Y1 in x1
Table 2.2   A comparison between Pn and Un at subband (4,1) for Y2 in x1
Table 2.3   A comparison between Pn and Un at subband (4,1) for Y3 in x1
Table 2.4   A comparison between Pn and Un at subband (4,1) for Y4 in x1
Table 2.5   A comparison between Pn and Un at subband (4,1) for Y5 in x1
Table 2.6   A comparison between Pn and Un at subband (5,1) for Y1 in x2
Table 2.7   A comparison between Pn and Un at subband (5,1) for Y2 in x2
Table 2.8   A comparison between Pn and Un at subband (5,1) for Y3 in x2
Table 2.9   A comparison between Pn and Un at subband (5,1) for Y4 in x2
Table 2.10  A comparison between Pn and Un at subband (5,1) for Y5 in x2
Table 2.11  A comparison between Pn and Un for Y1 in X
Table 2.12  A comparison between Pn and Un for Y2 in X
Table 2.13  A comparison between Pn and Un for Y3 in X
Table 2.14  A comparison between Pn and Un for Y4 in X
Table 2.15  A comparison between Pn and Un for Y5 in X
Table 3.1   Details of the frames in the estimated transient cluster by EM at subband (1,0)
Table 3.2   Details of the frames in the estimated steady-state cluster by EM at subband (4,13)
LIST OF FIGURES

Figure 1.1   Schematic of the cochlear implant [6]
Figure 2.1   Block diagram for the Power method
Figure 2.2   Wavelet packet filter-bank decomposition up to three levels
Figure 2.3   x1(t): exponentially increasing signal
Figure 2.4   A comparison between Pn and Un for x1
Figure 2.5   x2(t): exponentially decaying signal
Figure 2.6   A comparison between Pn and Un for x2
Figure 2.7   X(t): combination of two signals, one feeds in and the other feeds out
Figure 2.8   A comparison between Pn and Un for X
Figure 2.9   A comparison between Pn and Un for X
Figure 2.10  A comparison between Pn and Un for X
Figure 2.11  A comparison between Pn and Un for X
Figure 2.12  A comparison between Pn and Un for X
Figure 3.1   Block diagram for the Parametrization method
Figure 3.2   The input signal: "Sky that morning was clear and bright blue"
Figure 3.3   The histogram of the wavelet coefficients that represents frame 1 at subband (1,0) and its Gaussian mixture model
Figure 3.4   The histogram of the wavelet coefficients that represents frame 1 at subband (2,0) and its Gaussian mixture model
Figure 3.5   PCA for the start/end and middle of the words at subband (4,13) and SNR = 15 dB
Figure 3.6   PCA for the start/end and middle of the words at subband (3,1) and SNR = 15 dB
Figure 3.7   PCA for the start/end and middle of the words at subband (3,1) and SNR = 15 dB
Figure 3.8   PCA for the start/end and middle of the words at subband (4,0) and SNR = 15 dB
Figure 3.9   PCA for the transient and steady-state parts of the words at subband (1,0) and for the test frames
Figure 3.10  Clustering of different classes at subband (2,0)
Figure 3.11  Clustering of different classes at subband (3,0)
Figure 3.12  Clustering of different classes at subband (4,0)
Figure 3.13  Clustering of different classes at subband (4,5)
Figure 3.14  Clustering of different classes at subband (4,8)
Figure 3.15  Clustering of different classes at subband (4,10)
Figure 3.16  Clustering of different classes at subband (4,15)
Figure 3.17  Comparison between the expected clusters and the estimated clusters using EM
Figure 3.18  PCA for the transient and steady-state parts of the words at subband (4,13) and for the test frames
Figure 3.19  Comparison between the expected clusters and the estimated clusters using EM at (4,13) for parameter 2
Figure 3.20  Clustering of different classes using the mean and variance of the wavelet coefficients at subband (3,0)
Figure 3.21  Clustering of different classes using the parameters estimated by the GMM at subband (3,0)
Figure 3.22  Clustering of different classes using the parameters found with the GMM without PCA at subband (4,1)
Figure 3.23  Clustering of different classes using the parameters estimated by the GMM with PCA at subband (4,1)
Figure 3.24  Relationship between the separability factor and the sample size under different SNR
CHAPTER 1
INTRODUCTION

The cochlear implant has brought profound changes to the lives of deaf people. For the past few decades, scientists have been able to partially restore hearing to deaf people through electrical stimulation of the auditory nerve. The availability of this device and of different signal processing strategies has played a very important role in developing techniques for deriving electrical stimuli from the speech signal.

Cochlear implants offer deaf people an accessible way to communicate with others. Signal processing techniques, mainly aimed at extracting features from the speech signal to be transduced for driving the implanted electrodes, help patients communicate better. The ability of a patient to understand what a speaker says in the presence of competing speakers, and especially to recognize the beginnings and ends of words, requires robust signal processing algorithms, particularly in the presence of noise. Research on hearing aids and cochlear implants has made considerable progress in the last two decades and has attracted attention from many fields, including speech science, signal processing, bioengineering, otolaryngology, and physiology. The focus of this thesis is on the recognition of the beginning and end of spoken words in a noisy environment.

This chapter is organized as follows. In Section 1.1, concepts that help in understanding how cochlear implants and hearing aids work are discussed. The definition, characteristics, and types of cochlear implants are discussed in Sections 1.2 and 1.3. Section 1.4 gives an overview of what has been done in the field of cochlear implants and hearing aids in the past few years. Finally, Section 1.5 summarizes the contributions of this thesis.

1.1 General Concepts

In order to understand how the cochlear implant is used to aid patients with a damaged auditory system, the normal system should be studied first. In this section, a brief description of normal hearing, deafness, and speech signal characteristics is presented.

1.1.1 Normal hearing

The human ear is divided into three main parts: the outer ear, the middle ear, and the inner ear. The acoustic stimulus travels from the environment through the outer ear to the middle ear, where it is converted to mechanical vibrations directed towards the inner ear. The major part of the inner ear is the cochlea, a small shell-shaped cavity filled with fluid; it transfers the received mechanical vibrations to the fluid, causing displacement of the basilar membrane. The hair cells attached to the basilar membrane respond according to the resulting displacement and stimulate neurons in the auditory nerve, which transmit information about the signals to higher brain regions in the auditory cortex [1].

1.1.2 Deafness

Deafness occurs when the mechanical energy of sound vibrations is not appropriately transformed into neural information through the hair cells, so any full or partial damage to this part results in complete or partial loss of hearing. In the case of profound deafness, a large number of hair cells or auditory neurons are damaged. However, there is usually an unknown number of surviving cells, and these can be electrically stimulated in an attempt to restore partial hearing [2].

1.1.3 Speech signal characteristics

When designing a CI, preserving certain information in the speech signal is a very important factor in the ability of patients to understand spoken words.
The speech production process can be represented by the source-filter model [3]. The lungs can be considered the source, producing two types of excitation. The first is voiced, periodic sounds, produced by forcing air through an opening between the vocal folds; the frequency associated with this type of excitation is called the fundamental frequency F0. The second is unvoiced sounds, generated by constricting the pathway along the vocal tract and then forcing air through it. The filter part of the speech production model is the vocal tract, whose frequency response changes with the shape of the vocal tract; different shapes produce different sounds and frequencies. The broad spectral peaks in the spectrum are called the formant frequencies, and it has been found that the formant frequencies carry significant information about the speech signal [4, 5, 3].

1.2 Cochlear Implants

The principle of operation of a CI is shown in Figure 1.1. The microphone picks up the acoustic signal and sends it to the speech processor, which converts the received signal into an electrical one; the electrical signal is then transmitted over a transmission link to an electrode array. Current cochlear implants use the filter bank approach, as shown in Figure 1.1 for a four-channel implant: the sound picked up by the microphone is processed through a set of bandpass filters that divide the signal into four channels, and the envelope of each channel is extracted using an envelope detector. The relative amplitudes of the current pulses delivered to the electrodes reflect the spectral content of the input signal. The basic assumption underlying the operation of CIs is that there is a sufficient number of surviving auditory neurons that can be stimulated. The cochlear implant transmits different kinds of information about the received signal to the brain, such as sound pitch, which is a function of the place stimulated in the cochlea, and sound loudness, which is a function of the amplitude of the stimulus current.

[Figure 1.1. Schematic of the cochlear implant [6]: microphone, speech processor (bandpass filters, envelope detection, compression, pulse generation), transmitter, receiver, and electrode array contacts.]

1.3 Characteristics and types of cochlear implant

1.3.1 Characteristics of cochlear implant

Different cochlear implant devices have different characteristics [7, 8], which can be summarized as follows:

1. Electrode design. Multiple factors affect the electrode design. First is the location where the electrodes are placed; the most common choice is the scala tympani, a placement that preserves the place mechanism of the normal cochlea for coding frequencies, where this mechanism depends on the input frequency. For low frequencies, the electrodes near the apex are stimulated, which excites the auditory neurons tuned to low frequencies; for high frequencies, the electrodes near the base are stimulated, exciting the auditory neurons tuned to high frequencies. The second factor is the spacing between electrodes and their number. The third is the configuration, such as monopolar or bipolar. The last is the orientation with respect to the excitable tissue [8, 9, 10].

2. Transmission link. There are two types of links used to transmit the signal from the speech processor to the electrodes: the transcutaneous connection and the percutaneous connection [8, 11].
3. Type of stimulation, which can be analog or pulsatile. In the analog type, the acoustic signal is transmitted in electrical analog form to the electrodes, while in pulsatile stimulation the acoustic signal is transmitted to the electrodes as a train of narrow pulses [8, 11].

4. Signal processing. Different cochlear implant devices use different signal processing strategies, and this is one of the most challenging parts of cochlear implant design, since it transforms the speech signal into electrical stimuli [7, 11, 12].

1.3.2 Types of cochlear implant

There are two main types of cochlear implants: single-channel implants and multichannel implants, and many different signal processing techniques are used with each type. In single-channel implants, electrical stimulation is applied at a single location on the cochlea through one electrode, while in multichannel implants electrical stimulation is delivered to different locations on the cochlea using multiple electrodes or an electrode array [12]. Multichannel implants were introduced in the 1980s; the electrode array used in this type of implant allows different electrodes to be stimulated depending on the signal frequency. The number of electrodes in the array varies from device to device, and this matter is still under investigation.

Different signal processing strategies have been used in multichannel implants; the two main categories mentioned in the literature are waveform strategies and feature extraction strategies. Waveform strategies can be summarized as presenting the analog or pulsatile waveform derived by filtering the speech signal into different frequency bands. Feature extraction strategies, on the other hand, use feature extraction algorithms to represent the speech signal, for example by its formants. An example of a waveform strategy is the Compressed-Analog (CA) approach, which uses automatic gain control to compress the signal and then filters it into four frequency bands. The filtered waveforms are delivered simultaneously to four electrodes in analog form [11, 12]. An experiment on cochlear implant patients using this kind of signal processing is presented in [13]. The problem with this strategy is channel interaction, where the stimulus from one electrode may distort the stimulus of another electrode; this affects the neural responses and distorts the spectral information of the speech signal [14]. Another example is the Continuous Interleaved Sampling (CIS) approach, developed to resolve the problem of channel interaction by using nonsimultaneous, interleaved pulses, where only one electrode is stimulated at a time [12, 15]. Different factors affect the performance of the CIS approach, such as the pulse rate and duration, the stimulation order, and the compression function [8, 16, 17]. For feature extraction approaches, the main device using these signal processing strategies is the Nucleus Multi-Electrode Implant, introduced in the early 1980s; many developments have been made to this device since then [18]. Strategies used in this device include extracting the fundamental frequency (F0) and the second formant frequency (F2), called the F0/F2 strategy [18, 19]. The F0/F2 strategy was extended by adding extraction of the first formant, yielding the F0/F1/F2 strategy [20].
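To make the CA/CIS filter-bank processing described above concrete, the following is a minimal sketch of a four-channel CIS-style envelope extractor in Python with NumPy and SciPy. The band edges, filter orders, and smoothing cutoff here are illustrative assumptions for this sketch, not the parameters of the thesis or of any commercial processor, and the sampling rate is assumed high enough to cover the top band.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def cis_envelopes(x, fs,
                  bands=((300, 700), (700, 1400), (1400, 2800), (2800, 5500))):
    """Four-channel CIS-style front end: bandpass each channel, full-wave
    rectify, and low-pass filter to get the per-channel envelope that would
    modulate the amplitudes of the interleaved current pulses."""
    lp = butter(2, 400, btype="low", fs=fs, output="sos")  # envelope smoother
    envelopes = []
    for lo, hi in bands:
        bp = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        channel = sosfiltfilt(bp, x)                 # one analysis band
        envelopes.append(sosfiltfilt(lp, np.abs(channel)))
    return np.array(envelopes)                       # channels x samples
```

In an actual CIS processor, these envelopes would additionally be compressed to the patient's electrical dynamic range and sampled by nonsimultaneous pulse trains, one electrode at a time.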
Further developments were made in [22], and a comparison between the performance of the two methods has been presented in [21, 23].

1.4 Previous Work

During the last two decades, a large amount of research has been done on cochlear implants, hearing aids, and related topics. In [8], a review of cochlear implant types, devices, characteristics, and signal processing strategies is given. In [24], the wavelet transform is used to implement the Continuous Interleaved Sampling (CIS) speech strategy in cochlear implants, where it was found that using the WT provides fast calculations in the CIS processor. A speech signal processing scheme using the Bionic wavelet transform and a neural network simulation is proposed in [25]; this approach was found to reduce the number of required channels and to tolerate noise well, in addition to reducing the stimulation duration for words and improving the recognition of vowels and consonants. A comparison between cochlear implants and acoustic hearing was made in [26], measuring speech recognition in noise as a function of the number of spectral channels and the signal-to-noise ratio. Two different cochlear implant devices and three signal processing strategies were used in that experiment. It was found that the performance of normal-hearing people is better than that of cochlear implant patients, and when comparing the CI patients with each other it was found that most of them were unable to make full use of the spectral information provided by the number of electrodes placed in their implants. A study of the effects of channel interaction and electrical pulse rate on cochlear implants was carried out in [27]; several experiments were performed in that paper to study this issue, and the authors found that channel interaction limits the number of information channels transmitted to the brain.

Since electrodes are activated when there is strong noise between spoken words, many researchers have studied the problem of noise reduction to improve speech recognition, and different methods and techniques have been proposed for this goal. In [28], a noise reduction algorithm for a dual-microphone behind-the-ear hearing aid is presented, using singular value decomposition based optimal filtering in combination with a voice activity detector. The algorithm helped improve the ability to discriminate between noisy-signal periods and noise-only periods.

Most signal processing strategies in current cochlear implants focus on extraction of the amplitude modulation, which is sufficient for speech recognition in a noise-free environment [29]. However, in [30] a different speech processing strategy is proposed, which encodes both amplitude and frequency modulations to improve the performance of cochlear implants in a noisy environment. It was found that extracting and encoding frequency modulation is sufficient for speech recognition in a noisy environment. Another speech processing strategy, which encodes the envelope and the fundamental frequency of the speech signal and uses them to modulate the amplitude and frequency of the electrical pulses transmitted to the electrodes, was implemented in [31] for a tonal language. Moreover, new signal processing strategies were proposed in [32], where two methods of signal separation and noise suppression were presented to improve the performance of hearing aids and cochlear implants under adverse conditions.
The authors found that the method performs better than the common bandpass filtering techniques currently used in HAs and CIs, captures the rapid dynamics of the speech signal, and minimizes the effect of noise.

In [33], the authors describe possible improvements that can be made to cochlear implants to achieve better performance, especially in multi-talker environments. These include the combination of electrical and acoustic stimulation of the auditory system when the patient has significant residual hearing. The authors also discuss the factors behind the difficult cases among cochlear implant patients.

1.5 Contributions of this thesis

In this thesis, the problem of resolving the starts and ends of spoken words is addressed to improve speech intelligibility for CI patients under severe adverse conditions. Two methods are implemented to study the characteristics of the transient periods of words. The first is the Power method, which uses signal subspace decomposition through low-rank approximation to study the effect of the energy contained in the beginning and end of a word on the eigenvectors that span the space of the target signal, and how competing signals affect the pattern of the target signal's eigenvector. This method is explained in detail in Chapter 2. The second is the parametrization method, which parameterizes a sparse representation of the speech signal with a Gaussian mixture model to obtain a feature set that can be used for classification and clustering. A detailed study of this method is presented in Chapter 3. Finally, Chapter 4 concludes this thesis with a summary of contributions and future work.

CHAPTER 2
THE POWER METHOD

In this chapter, the power of the speech signal is studied to see whether it can help in the recognition of the beginnings and ends of words. This chapter is organized as follows. Section 2.1 gives an introduction to the power method. Section 2.2 describes the implementation of the proposed method. Section 2.3 provides simulation results to demonstrate the performance of the proposed method. Finally, Section 2.4 summarizes this chapter.

2.1 Introduction

Cochlear implant (CI) patients face a serious problem in recognizing the beginnings and ends of words in a noisy environment, where the desired signal undergoes considerable temporal modulation over the transitional time interval. Roughly speaking, a word has three parts: beginning, middle, and end. The difficulty with the beginning and the end of a word is that their signals have an aperiodic, transient-like form.

2.2 Implementation

Figure 2.1 shows the block diagram of the Power method. Simple sinusoidal signals multiplied by exponentials have been used to test this algorithm, considered as the simplest case of real speech signals; these signals have different features that can be used in our study, such as frequency and temporal variation. Increasing sinusoids are taken to represent the beginnings of words and decaying ones the ends of words. The method can be summarized as follows:

1. Frame the received signal into frames of length N and calculate the power of each frame's signal.

2. Pass the frames through a DWPT up to L levels, then find the "best" subband, defined as the subband where the principal signal in each frame lives.

3. Compute the singular value decomposition of the wavelet coefficients found in the previous step to find the eigenvectors that span the signal subspace.

4. Compare the power calculations from step 1 with the eigenvector of the best subband; here we check whether this eigenvector follows the same power pattern as the target signal over time.
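A minimal sketch of steps 1 through 4 in Python is shown below, assuming NumPy and the PyWavelets package; the frame length, wavelet, and decomposition depth are illustrative choices, and power_method_features is a hypothetical helper name, not code from the thesis.

```python
import numpy as np
import pywt

def power_method_features(x, fs, frame_ms=25, wavelet="db4", level=4):
    # Step 1: frame the signal and compute per-frame power (Parseval).
    n = int(fs * frame_ms / 1000)
    frames = x[: len(x) // n * n].reshape(-1, n)
    power = np.sum(frames ** 2, axis=1)
    pn = power / power.max()                  # normalized power pattern Pn

    # Step 2: DWPT of every frame; collect coefficients per subband.
    subbands = {}
    for frame in frames:
        wp = pywt.WaveletPacket(frame, wavelet, maxlevel=level)
        for node in wp.get_level(level, order="freq"):
            subbands.setdefault(node.path, []).append(node.data)
    # the "best" subband: highest total energy across all frames
    best = max(subbands, key=lambda p: np.sum(np.square(subbands[p])))
    Y = np.array(subbands[best])              # M frames x N_j coefficients

    # Step 3: SVD; the leading left singular vector spans the signal subspace.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    un = np.abs(U[:, 0]) / np.abs(U[:, 0]).max()   # normalized eigenvector Un

    # Step 4: Pn and Un can now be compared frame by frame.
    return pn, un, best
```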
[Figure 2.1. Block diagram for the Power method: Framing → DWPT → SVD → Eigenvector Analysis.]

2.2.1 Framing

In studying speech signals, it is necessary to transform the signal from its raw form into frames to guarantee local stationarity. Let $s = [s[0], s[1], \ldots, s[N-1]]$ denote the speech signal that will be transduced through the device over a time frame of length N. In a multi-speaker environment, the received signal is a mixture of speech signals and noise. Assuming P independent speakers, the noise-free speech signal model is

$$x_m = \sum_{p=1}^{P} a_{mp} s_p, \qquad (2.1)$$

where $a_{mp}$ denotes the weight of speaker p's signal $s_p$ in the m-th speech frame. The observations can be expressed in matrix form as

$$Y_M = X_M + Z_M. \qquad (2.2)$$

Framing the signal into M frames of length N gives, for time-invariant mixing,

$$Y_M = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} a_1 s_1(1) + \cdots + a_P s_P(1) \\ a_1 s_1(2) + \cdots + a_P s_P(2) \\ \vdots \\ a_1 s_1(M) + \cdots + a_P s_P(M) \end{bmatrix}, \qquad (2.3)$$

where the mixing of speakers across frames is fixed. The other type of mixing is time-variant mixing, where the mixing of speakers across frames depends on the frame index. The signal is then framed according to

$$Y_M = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} a_1(1) s_1(1) + \cdots + a_P(1) s_P(1) \\ a_1(2) s_1(2) + \cdots + a_P(2) s_P(2) \\ \vdots \\ a_1(M) s_1(M) + \cdots + a_P(M) s_P(M) \end{bmatrix}, \qquad (2.4)$$

where for $a_i(m)$, i is the speaker index and m is the frame index. We consider the more challenging case of time-variant mixing. For the beginning-of-word case the scaling factors are assumed to follow the pattern $a_M > a_{M-1} > \cdots > a_1$, while for the end-of-word case we assume $a_M < a_{M-1} < \cdots < a_1$. These assumptions are based on the definition of the start and end of a spoken word and on the characteristics of the phonemes typically encountered in these segments of a word. The next step is to calculate the power of every frame using Parseval's formula (2.5),

$$\text{Power} = \sum_{n} |x(n)|^2. \qquad (2.5)$$

2.2.2 Discrete wavelet packet decomposition

The m-th frame undergoes a Discrete Wavelet Packet Decomposition (DWPT) up to L levels (so that the number of subbands is $J = 2^{L+1} - 1$). The DWPT is an extension of the discrete wavelet transform; in other words, the DWT is a subtree of the DWPT [34]. Figure 2.2 shows the wavelet packet filter bank with successive filtering and down-sampling up to three levels. In the DWT, each level is calculated by decomposing only the approximation coefficients through high- and low-pass filters, while in the DWPT both the approximation and the detail coefficients are decomposed through high- and low-pass filters and then down-sampled, which increases the frequency and time resolution.

[Figure 2.2. Wavelet packet filter-bank decomposition up to three levels: at each level the input x[n] is split by the low-pass filter h[n] and the high-pass filter g[n], each followed by down-sampling by 2.]

After finding the DWPT of every frame, the coefficients are rearranged in matrix form, where each matrix contains the coefficients of all the frames at a certain subband, as described in equation (2.6):

$$\text{Coef}(i,j) = \begin{bmatrix} y_1^{(i,j)} \\ y_2^{(i,j)} \\ \vdots \\ y_M^{(i,j)} \end{bmatrix}, \qquad (2.6)$$

where $\text{Coef}(i,j)$ denotes the wavelet coefficients at subband (i,j) and $y_k^{(i,j)}$ denotes the coefficients of frame k at subband (i,j). Now we can write

$$y_m^j = x_m^j + z_m^j, \qquad (2.7)$$

where $y_m^j = (y_m^j(0), y_m^j(1), \ldots, y_m^j(N_j - 1))$ denotes the DWPT of $y_m$ in the j-th subband.
In doing so, one obtains an overcomplete representation of the observations in the form of a dictionary basis $A_J$ to choose from. The transformed observations can be expressed in matrix form as

$$Y^j = X^j + Z^j. \qquad (2.8)$$

Once the energy of every subband is calculated, one can determine those with the highest energy, presumably where the signal lives. It is important to note that the objective here is to detect the beginning and end of the words of the target speaker's speech signal in the observed mixture, where the desired signal does not necessarily have the highest energy.

2.2.3 Singular value decomposition

Singular value decomposition (SVD) is a fundamental technique in matrix analysis and computation [35]. Using the SVD, a matrix is decomposed into several component matrices that expose many useful and interesting properties of the original matrix. A further benefit of using the SVD of a matrix in computations is that it reduces numerical error. In this method, our interest in the SVD comes from its ability to split a vector space into lower-dimensional subspaces. This can be done through the factorization of a rectangular real matrix. Assuming Y is a $k \times l$ matrix, it can be expressed as

$$Y = U_Y D_Y V_Y^{*}, \qquad (2.9)$$

where $U_Y$ is a $k \times k$ unitary matrix containing the orthonormal eigenvectors of $Y Y^T$; $D_Y$ is a $k \times l$ matrix with non-negative numbers on the diagonal (the singular values, in descending order) and zeros everywhere else; and $V_Y^{*}$ is the conjugate transpose of $V_Y$, a unitary $l \times l$ matrix containing a set of orthonormal eigenvectors of $Y^T Y$.

In the proposed method, the SVD is used to find the eigenvector that spans the subspace of the target speaker. Computing the factorization of the matrix in (2.8) yields the representation

$$Y^j = U_Y^j D_Y^j V_Y^{j\,T}. \qquad (2.10)$$

After finding the SVD for every subband, we take the eigenvector associated with the subband that has the highest energy and compare the pattern of this eigenvector with the power pattern of the frames calculated in (2.5).

2.3 Simulations and Results

The Power method was applied to two sample signals, the first obtained from single sinusoids multiplied by rising exponentials. In the first example, the input signal was framed with a frame length of 25 ms, and the power of every frame was calculated and normalized for purposes of comparison. The next step is the DWPT, where every frame is transformed from the time domain to the time-frequency domain; in this step the power of every subband is calculated, all subbands are compared, and the one with the highest power, where the signal lives, is chosen. The third step is to find the SVD of the wavelet coefficient matrix found in the previous step, examine the principal eigenvector associated with the subband where the signal lives, and compare the normalized power (Pn) with the normalized eigenvector (Un). Equation (2.11) describes the transformed observations of the speech signal in the time-frequency domain, and equation (2.12) describes its SVD factorization:

$$Y^j = X^j + Z^j, \qquad (2.11)$$

$$Y^j = U_Y^j D_Y^j V_Y^{j\,T}. \qquad (2.12)$$

Figure 2.4 and Figure 2.6 show bar plots comparing Pn and Un for two different signals, x1 and x2; the first fades in and the second fades out. Figure 2.3 and Figure 2.5 show the time-domain representations of the two signals, where

$$x_1(t) = e^{4t} \cos(2\pi \cdot 400\, t), \qquad (2.13)$$

$$x_2(t) = 10\, e^{-2t} \cos(2\pi \cdot 120\, t + \pi/5). \qquad (2.14)$$
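The two test signals (2.13) and (2.14) can be generated and pushed through the pipeline directly. The sketch below reuses the hypothetical power_method_features helper sketched in Section 2.2 and matches the stated experimental setup (Fs = 22 kHz, 25 ms frames, 8 frames of 550 samples each); it is an illustration of the comparison, not the thesis's simulation code.

```python
import numpy as np

fs = 22000
t = np.arange(0, 0.2, 1 / fs)          # 8 frames of 25 ms, 550 samples each

x1 = np.exp(4 * t) * np.cos(2 * np.pi * 400 * t)                    # fades in
x2 = 10 * np.exp(-2 * t) * np.cos(2 * np.pi * 120 * t + np.pi / 5)  # fades out

for name, x in (("x1", x1), ("x2", x2)):
    pn, un, best = power_method_features(x, fs)
    print(name, best)
    print("Pn:", np.round(pn, 4))      # rises frame to frame for x1, decays for x2
    print("Un:", np.round(un, 4))      # should follow the same pattern
```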
Table 2.1 through Table 2.10 present the details behind each figure: the frame number, the subband where the principal signal in each frame lives, the normalized power values, and the normalized eigenvector.

[Figure 2.3. x1(t): exponentially increasing signal (M = 8 frames, Fs = 22 kHz, Ts = 25 ms, 550 samples per frame).]

[Figure 2.4. A comparison between Pn (red) and Un (blue) for x1, one bar plot per coefficient matrix Y1 through Y5.]

Table 2.1. A comparison between Pn and Un at subband (5,2) for Y1 in x1

Frame  Subband  Pn      Un
1      (5,2)    0.2460  0.4351
2      (5,2)    0.3007  0.5117
3      (5,2)    0.3675  0.5922
4      (5,2)    0.4491  0.6755
5      (5,2)    0.5487  0.7600
6      (5,2)    0.6703  0.8438
7      (5,2)    0.8187  0.9247
8      (5,2)    1.0000  1.0000

Table 2.2. A comparison between Pn and Un at subband (4,1) for Y2 in x1

Frame  Subband  Pn      Un
1      (4,1)    0.2468  0.4536
2      (4,1)    0.3014  0.5297
3      (4,1)    0.3680  0.6092
4      (4,1)    0.4494  0.6908
5      (4,1)    0.5488  0.7728
6      (4,1)    0.6703  0.8534
7      (4,1)    0.8187  0.9301
8      (4,1)    1.0000  1.0000

Table 2.3. A comparison between Pn and Un at subband (4,1) for Y3 in x1

Frame  Subband  Pn      Un
1      (4,1)    0.2455  0.4401
2      (4,1)    0.2999  0.5166
3      (4,1)    0.3665  0.5968
4      (4,1)    0.4479  0.6796
5      (4,1)    0.5474  0.7635
6      (4,1)    0.6692  0.8464
7      (4,1)    0.8180  0.9262
8      (4,1)    1.0000  1.0000

Table 2.4. A comparison between Pn and Un at subband (4,1) for Y4 in x1

Frame  Subband  Pn      Un
1      (4,1)    0.2454  0.4451
2      (4,1)    0.3000  0.5214
3      (4,1)    0.3668  0.6014
4      (4,1)    0.4483  0.6838
5      (4,1)    0.5480  0.7669
6      (4,1)    0.6697  0.8490
7      (4,1)    0.8184  0.9276
8      (4,1)    1.0000  1.0000

Table 2.5. A comparison between Pn and Un at subband (4,1) for Y5 in x1

Frame  Subband  Pn      Un
1      (4,1)    0.3013  0.5316
2      (4,1)    0.3681  0.6110
3      (4,1)    0.4496  0.6924
4      (4,1)    0.5491  0.7742
5      (4,1)    0.6706  0.8544
6      (4,1)    0.8189  0.9306
7      (4,1)    1.0000  1.0000

From these figures and tables we can say that the eigenvector of the best subband follows the expected power pattern.

[Figure 2.5. x2(t): exponentially decaying signal (M = 8 frames, Fs = 22 kHz, Ts = 25 ms, 550 samples per frame).]

[Figure 2.6. A comparison between Pn (red) and Un (blue) for x2, one bar plot per coefficient matrix Y1 through Y5.]

Table 2.6. A comparison between Pn and Un at subband (5,1) for Y1 in x2

Frame  Subband  Pn      Un
1      (5,1)    1.0000  1.0000
2      (5,1)    0.9045  0.9525
3      (5,1)    0.8182  0.9061
4      (5,1)    0.7401  0.8610
5      (5,1)    0.6694  0.8172
6      (5,1)    0.6055  0.7747
7      (5,1)    0.5477  0.7335
8      (5,1)    0.4955  0.6937

Table 2.7. A comparison between Pn and Un at subband (5,1) for Y2 in x2

Frame  Subband  Pn      Un
1      (5,1)    1.0000  1.0000
2      (5,1)    0.9046  0.9525
3      (5,1)    0.8183  0.9062
4      (5,1)    0.7403  0.8611
5      (5,1)    0.6697  0.8173
6      (5,1)    0.6058  0.7748
7      (5,1)    0.5481  0.7336
8      (5,1)    0.4959  0.6938

Table 2.8. A comparison between Pn and Un at subband (5,1) for Y3 in x2
Table 2.9. A comparison between Pn and Un at subband (5,1) for Y4 in x2

Table 2.10. A comparison between Pn and Un at subband (5,1) for Y5 in x2

In the second example, a combination of the two signals x1 and x2 is used, with the mixture given in (2.15):

$$X(t) = e^{4t} \cos(2\pi \cdot 400\, t) + 10\, e^{-2t} \cos(2\pi \cdot 120\, t + \pi/5). \qquad (2.15)$$

Figure 2.7 shows the signal X in the time domain, and Figure 2.8 through Figure 2.12 show a comparison between the normalized power and the normalized eigenvectors. Table 2.11 through Table 2.15 show the details of these comparisons. The objective here is to examine whether the eigenvector of each signal alone keeps the same pattern, unaffected by the mixing.

[Figure 2.7. X(t): combination of two signals, one feeding in and the other feeding out (M = 8 frames, Fs = 22 kHz, Ts = 25 ms, 550 samples per frame).]

[Figure 2.8. A comparison between Pn and Un for X: Y1 at subbands (5,1) and (5,2).]

[Figure 2.9. A comparison between Pn and Un for X: Y2 at subbands (4,1), (5,1), and (5,2).]

[Figure 2.10. A comparison between Pn and Un for X: Y3 at subbands (4,1), (5,1), and (5,2).]

[Figure 2.11. A comparison between Pn and Un for X: Y4 at subbands (4,1), (5,1), and (5,2).]

[Figure 2.12. A comparison between Pn and Un for X: Y5 at subbands (4,1), (5,1), and (5,2).]

When applying this algorithm, it was expected that, for a combination of different exponential sinusoids, the dominant signal would have the main effect on the eigenvector subspaces, since we are using an eigendecomposition that follows second-order statistics and therefore only tracks the highest power in a given frame, not the speaker-specific signals. This expectation was confirmed by applying the algorithm to multiple signals, where it was found that wavelets offer no exception compared to Fourier methods: the eigenvector that was assumed to span the subspace of the target signal follows the power pattern of the dominant signal.
Table 2.11. A comparison between Pn and Un for Y1 in X

Frame  Subband(X)  Pn      Un(5,1)  Un(5,2)
1      (5,1)       1.0000  1.0000   1.0000
2      (5,1)       0.9083  0.9551   0.9729
3      (5,1)       0.8261  0.9114   0.9472
4      (5,1)       0.7528  0.8689   0.9227
5      (5,1)       0.6877  0.8278   0.8989
6      (5,1)       0.6303  0.7878   0.8755
7      (5,1)       0.5802  0.7489   0.8520
8      (5,1)       0.5370  0.7112   0.8278

Table 2.12. A comparison between Pn and Un for Y2 in X

Frame  Pn      Un(4,1)  Un(5,1)  Un(5,2)
1      1.0000  1.0000   1.0000   0.6210
2      0.9409  0.9799   0.9638   0.6711
3      0.8954  0.9612   0.9283   0.7247
4      0.8640  0.9437   0.8933   0.7808
5      0.8475  0.9269   0.8586   0.8381
6      0.8472  0.9102   0.8237   0.8950
7      0.8648  0.8929   0.7883   0.9497
8      0.9025  0.8741   0.7520   1.0000

Table 2.13. A comparison between Pn and Un for Y3 in X

Frame  Pn      Un(4,1)  Un(5,1)  Un(5,2)
1      0.3848  0.5989   0.7072   0.4854
2      0.4198  0.6536   0.7419   0.5589
3      0.4675  0.7115   0.7805   0.6353
4      0.5301  0.7714   0.8223   0.7132
5      0.6106  0.8321   0.8664   0.7908
6      0.7126  0.8918   0.9117   0.8662
7      0.8406  0.9486   0.9568   0.9369
8      1.0000  1.0000   1.0000   1.0000

Table 2.14. A comparison between Pn and Un for Y4 in X

Frame  Subband(X)  Pn      Un(4,1)  Un(5,1)  Un(5,2)
1      (4,1)       0.2589  0.4597   0.4423   0.4619
2      (4,1)       0.3116  0.5341   0.5165   0.5377
3      (4,1)       0.3765  0.6120   0.5949   0.6166
4      (4,1)       0.4563  0.6923   0.6764   0.6973
5      (4,1)       0.5541  0.7733   0.7596   0.7783
6      (4,1)       0.6739  0.8532   0.8427   0.8574
7      (4,1)       0.8206  0.9297   0.9237   0.9323
8      (4,1)       1.0000  1.0000   1.0000   1.0000

Table 2.15. A comparison between Pn and Un for Y5 in X

Frame  Subband(X)  Pn      Un(4,1)  Un(5,1)  Un(5,2)
1      (4,1)       0.3028  0.5334   0.6157   0.4954
2      (4,1)       0.3693  0.6125   0.6901   0.5768
3      (4,1)       0.4506  0.6936   0.7636   0.6615
4      (4,1)       0.5499  0.7752   0.8340   0.7482
5      (4,1)       0.6711  0.8551   0.8988   0.8350
6      (4,1)       0.8192  0.9309   0.9553   0.9198
7      (4,1)       1.0000  1.0000   1.0000   1.0000

2.4 Summary and Discussion

In this chapter, we proposed the Power method to study how the power of the signal affects the behavior of the beginnings and ends of words in the wavelet domain. The Power method used the sparse representation of the signal in combination with the singular value decomposition, where the beginning and end of a word are considered the target signal in a noisy environment, even if they do not have the highest energy. The singular value decomposition of the sparse representation was used to study the eigenvectors that span the subspace of the desired signal when it appears in combination with other signals. The assumption was that the eigenvector spanning the subspace of the target signal might keep the same power pattern even in the presence of other signals, but it was found that the competing signals can be dominant, and this affects the behavior of the eigenspace of the whole signal: the eigenvector of the target signal follows the power pattern of the dominant signal.

CHAPTER 3
THE PARAMETRIZATION METHOD

In the previous chapter, we introduced the Power method. The results of this method were not satisfactory. Therefore, in this chapter, we statistically analyze the distribution of the wavelet coefficients using a parametric model to improve the classification task. This chapter is organized as follows. Section 3.1 gives an introduction to the proposed method. Section 3.2 describes the implementation of the parametrization method. Section 3.3 provides simulation results to demonstrate the performance of the proposed method.
Finally, Section 3.4 summarizes the major contributions of this chapter.

3.1 Introduction

In this method, a parametric model of the probability distribution of the wavelet coefficients is used, rather than the raw wavelet coefficients, to create a feature space that can be used for classification and clustering purposes.

3.2 Implementation

Figure 3.1 shows the block diagram of the parametrization method. As seen, there are seven steps in the proposed method.

[Figure 3.1. Block diagram for the Parametrization method: Words Partitioning → Framing → DWPT → GMM for every subband → Separation of the class parameters → PCA → Cluster Analysis.]

In the first step, the words are divided into three main classes: start, middle, and end. In the second step, the resulting signal is framed with a frame length of N. The third step transforms the signals to the wavelet domain using the discrete wavelet packet decomposition. The fourth step fits the probability distribution of the wavelet coefficients with a Gaussian mixture model (GMM), where the coefficients of every subband can be modeled as more than one Gaussian with different parameters (mean and variance) [36]. The next step classifies the parameters of the GMM according to the frame's class.

3.2.1 Words Partitioning

The speech signals used in this method are taken from [37], with babble noise added at different signal-to-noise ratios. In this step, the words are divided manually into three parts: start, middle, and end. This knowledge of the parts of the word helps in studying the characteristics of each part alone.

3.2.2 Framing

The input to the algorithm is a vector that contains the three parts of the words. The frame length used in this approach is 25 ms, with the same framing procedure presented in Section 2.2.1.

3.2.3 Discrete Wavelet Packet Decomposition

Here the frames undergo a discrete wavelet packet decomposition up to L levels, as explained in Section 2.2.2.

3.2.4 Gaussian Mixture Model

In this step, a Gaussian mixture model (GMM) is used to fit the probability distribution of the wavelet coefficients found in Section 3.2.3, where each set of coefficients is represented with two Gaussians, each with two parameters (mean and variance). The parameters are arranged into two main matrices, the mean matrix and the variance matrix; each matrix holds the parameters of all the frames at a certain subband, as described in equations (3.1) and (3.2), respectively:

$$\mu_{i,j} = \begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1M} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2M} \end{bmatrix}, \qquad (3.1)$$

where $\mu_{i,j}$ denotes the means of the two Gaussians characterizing the coefficients at subband (i,j), and

$$\Sigma^2_{i,j} = \begin{bmatrix} \sigma^2_{11} & \sigma^2_{12} & \cdots & \sigma^2_{1M} \\ \sigma^2_{21} & \sigma^2_{22} & \cdots & \sigma^2_{2M} \end{bmatrix}, \qquad (3.2)$$

where $\sigma^2_{i,j}$ denotes the variances of the two Gaussians characterizing the coefficients at subband (i,j).

3.2.5 Separation of Parameters

After estimating the parameters of the Gaussian mixture model that represents the distribution of the wavelet coefficients, this step separates the parameters according to the class they belong to. This is done by determining the parts of equations (3.1) and (3.2) that belong to the start, middle, and end frames.

3.2.6 Principal Component Analysis

In this step, the algorithm branches in two directions to analyze the problem: classification and clustering. In the classification problem, PCA is performed on the parameters of each class separately, and every class is projected onto its own principal components. In the clustering problem, PCA is applied to (3.1) and (3.2) as a whole, which means finding the principal components of the entire signal regardless of class and using them in the projection step. The expectation here is that if the features obtained are distinct among the classes, we will see separate clusters in the feature space [38].
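A minimal sketch of the per-subband GMM fit and the PCA projection is shown below, assuming scikit-learn; the two-component choice mirrors equations (3.1) and (3.2), while the function names and the component-ordering step are illustrative assumptions, not the thesis's code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

def gmm_parameters(subband_frames):
    """Fit a 2-component GMM to each frame's coefficients in one subband and
    stack the parameters as in (3.1) and (3.2): rows = component, cols = frame."""
    mu, var = [], []
    for coeffs in subband_frames:                # list of 1-D coefficient arrays
        gm = GaussianMixture(n_components=2).fit(coeffs.reshape(-1, 1))
        order = np.argsort(gm.means_.ravel())    # consistent component ordering
        mu.append(gm.means_.ravel()[order])
        var.append(gm.covariances_.ravel()[order])
    return np.array(mu).T, np.array(var).T       # each is 2 x M

def project(params):
    # Project the M frames (columns of a 2 x M parameter matrix) onto the
    # first two principal components; fit one PCA per class for classification,
    # or one PCA on all frames together for clustering.
    return PCA(n_components=2).fit_transform(params.T)   # M x 2
```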
3.2.7 Cluster Analysis

In cluster analysis, the observed data set can be organized into groups depending on the degree of association between the objects in each group. There are different ways to examine how the objects in a cluster are related to each other, such as the degree of separation between clusters or how compact every cluster is. One approach computes the distance between the objects forming the different clusters. One example of a distance measure is the Euclidean distance, the geometric distance in a multidimensional space, computed according to

$$Ed(v_1, v_2) = \sqrt{\sum_i (v_{1i} - v_{2i})^2}. \qquad (3.3)$$

Equation (3.3) can also be written as

$$Ed^2(v_1, v_2) = \sum_i (v_{1i} - v_{2i})^2, \qquad (3.4)$$

which is called the squared Euclidean distance. The separability factor $\rho$ is computed at each subband, defined as

$$\rho = \frac{\text{average distance across clusters}}{\text{average distance within clusters}}. \qquad (3.5)$$

The factor $\rho$ can be used to study cluster separation at every subband: a subband with a large value of $\rho$ implies a high degree of separation and more compact clusters.

3.3 Simulation and Results

In the words partitioning step, we considered two cases. The first divides the words into two classes, treating the transient periods of speech (the start and end of the words) as one class and the steady-state periods (the middle of the words) as the second class; this is referred to as the two-class problem. The second divides the word into three classes, treating the start and the end as separate classes, with the middle of the word as the third class; this is referred to as the three-class problem. Figure 3.2 shows an example of the input signal used in this algorithm in the time domain: "Sky that morning was clear and bright blue".

[Figure 3.2. The input signal: "Sky that morning was clear and bright blue".]

In the framing step, the signal is framed with a frame length of 25 ms; these frames then undergo a DWPT, and the wavelet coefficients are rearranged as described in Section 3.2.3. Different Daubechies and Symlet filters have been used. After computing the DWPT, the Gaussian mixture model is used to fit the probability distribution of the wavelet coefficients, as shown in the figures. Figure 3.3 shows the histogram of the wavelet coefficients representing frame 1 at subband (1,0), the Gaussian mixture used to fit the probability distribution of these coefficients, and the related parameters. Figure 3.4 shows the histogram of frame 1 at subband (2,0) with the related GMM.

In the next step, the parameters of the different classes are separated as explained in Section 3.2.5. For the classification problem, a principal component analysis is then performed for each class separately, and the resulting principal components are used to compute the new representation of the GMM parameters computed before. By projecting each class onto its own principal components, the cluster analysis takes place to study the separability between the different classes. The metric used to study the clusters is the between-cluster distance, through the factor $\rho$ of equation (3.5).
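A direct NumPy implementation of $\rho$ from equation (3.5) is sketched below; excluding the zero self-distances from the within-cluster average is an implementation detail the thesis does not specify.

```python
import numpy as np
from itertools import combinations

def mean_pairwise(a, b):
    # mean Euclidean distance between all point pairs drawn from a and b
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()

def separability(clusters):
    """rho of equation (3.5): average across-cluster distance divided by the
    average within-cluster distance. Larger rho = better-separated clusters."""
    across = np.mean([mean_pairwise(a, b) for a, b in combinations(clusters, 2)])
    within = []
    for c in clusters:
        d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
        n = len(c)
        within.append(d.sum() / (n * (n - 1)))   # exclude zero self-distances
    return across / np.mean(within)
```

For the two-class problem, the clusters argument would hold the projected start/end points and the projected middle points of one subband.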
Figure 3.5 through Figure 3.8 show the results of applying the method as a two-class problem for different speech signals.

[Figure 3.3. The histogram of the wavelet coefficients representing frame 1 at subband (1,0) and its Gaussian mixture model.]

[Figure 3.4. The histogram of the wavelet coefficients representing frame 1 at subband (2,0) and its Gaussian mixture model.]

[Figure 3.5. PCA for the start/end and middle of the words at subband (4,13) and SNR = 15 dB (ρ = 1.8307).]

[Figure 3.6. PCA for the start/end and middle of the words at subband (3,1) and SNR = 15 dB (ρ = 1.6908).]

[Figure 3.7. PCA for the start/end and middle of the words at subband (3,1) and SNR = 15 dB (ρ = 1.9008).]

[Figure 3.8. PCA for the start/end and middle of the words at subband (4,0) and SNR = 15 dB (ρ = 1.525).]

In the classification problem, we try to see whether the eigenvectors produced by a certain signal can be used as a basis for other signals. That is, given a frame to classify, the frame undergoes the same steps up to the PCA step, and then the already computed eigenvectors of the different classes are used to project this frame and obtain its new coordinates. The next step is to measure the distance between the projected point and the centroid of every cluster; the frame is assigned to the cluster for which the computed distance is minimum. The following ratio measure, given in (3.6), is used to find which subband performs better:

$$\text{Ratio} = \frac{\text{distance between the test frame and the centroid of the transient cluster}}{\text{distance between the test frame and the centroid of the steady-state cluster}}. \qquad (3.6)$$

If this Ratio is less than 1, the frame is classified as transient; if the Ratio is more than 1, the frame is classified as steady-state.
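The decision rule of equation (3.6) is a nearest-centroid test; a minimal sketch follows (the function name is illustrative):

```python
import numpy as np

def classify_frame(z, transient_points, steady_points):
    """z: projected feature point of the test frame; the point sets are the
    projected training clusters. Implements the Ratio test of equation (3.6)."""
    ratio = (np.linalg.norm(z - transient_points.mean(axis=0))
             / np.linalg.norm(z - steady_points.mean(axis=0)))
    return ("transient" if ratio < 1 else "steady-state"), ratio
```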
Figure 3.9 shows an example of the classification problem. In this example, a speech signal ending one of its words with the phoneme 'n' is used to classify three frames that come from another word ending in 'n'. The subband with the smallest Ratio value is shown in Figure 3.9, where the Ratio is less than 1; this means that the eigenvectors found for a certain signal can be used to classify unknown frames as belonging to transient or steady-state periods of the speech signal. In Figure 3.9, the left side shows the PCA of parameter 1 (the mean) and the right side the PCA of parameter 2 (the variance), where the blue bubbles are the parameters of the transient frames, the asterisks the parameters of the steady-state frames, the red bubbles the test frames projected on the eigenvectors of the transient frames, and the black bubbles the test frames projected on the eigenvectors of the steady-state frames.

[Figure 3.9. PCA for the transient and steady-state parts of the words at subband (1,0) and for the test frames.]

As mentioned before, in the clustering problem the principal components of the whole signal are computed. Figure 3.10 through Figure 3.16 show the clustering of the different classes across subbands.

[Figure 3.10. Clustering of different classes at subband (2,0).]

[Figure 3.11. Clustering of different classes at subband (3,0).]

[Figure 3.12. Clustering of different classes at subband (4,0).]

[Figure 3.13. Clustering of different classes at subband (4,5).]

The expectation maximization (EM) algorithm is a technique for separating clusters [39]. We used it here to separate the transient cluster from the steady-state cluster. For the example given in Figure 3.9, it was found that the subband whose estimated clusters are closest to the expected clusters is (1,0). Figure 3.17 shows the cluster of the transient periods of the speech signal estimated by EM inside the ellipse, with the expected clusters presented in different styles: the blue bubbles are the parameters of the transient frames and the black bubbles the parameters of the steady-state frames. Table 3.1 shows the number of transient frames correctly classified by EM, the number of unclassified transient frames, and the number of false-alarm frames.
[Figure 3.14. Clustering of different classes at subband (4,8).]

[Figure 3.15. Clustering of different classes at subband (4,10).]

[Figure 3.16. Clustering of different classes at subband (4,15).]

Table 3.1 shows the number of transient frames correctly classified by EM, together with the unclassified transient frames and the false-alarm frames. Figure 3.18 shows the clustering of the classes for another signal together with its test frames, and Figure 3.19 shows the clusters estimated by EM at that subband, where we were able to estimate a cluster that mainly captures the steady-state points. Table 3.2 shows the number of correctly classified steady-state frames, unclassified frames, and false-alarm frames.

Table 3.1. Details of the frames in the estimated transient cluster by EM at subband (1,0)

    Frames                  Transient (total # of frames = 52)
    Correctly classified    43
    Unclassified            9
    False alarm             23

Table 3.2. Details of the frames in the estimated steady-state cluster by EM at subband (4,13)

    Frames                  Steady-state (total # of frames = 30)
    Correctly classified    29
    Unclassified            1
    False alarm             16

[Figure 3.17. Comparison between the expected clusters and the estimated clusters using EM.]

[Figure 3.18. PCA for the transient and steady-state parts of the words at subband (4,13) and for the test frames.]

[Figure 3.19. Comparison between the expected clusters and the estimated clusters using EM at (4,13) for parameter 2.]

3.3.1 The effect of using the GMM model with the PCA

In this part of the chapter we investigate the effect of the GMM used to fit the probability distribution of the wavelet coefficients, and the effect of the principal component analysis. The first test examines how parameterizing the wavelet coefficients with the GMM affects the method. After computing the DWPT of the framed signal, the mean and the variance of the wavelet coefficients are computed directly and used to check the clustering of these features across subbands. Figure 3.20 shows the clustering of the mean and variance of the wavelet coefficients for the different classes, while Figure 3.21 shows the clustering of the parameters estimated by the GMM in combination with the PCA. The separability factor for each case is also shown. The second test examines the effect of applying the PCA to the GMM parameters. Figure 3.22 shows the clustering of the class parameters estimated by the GMM without the PCA step, whereas Figure 3.23 shows the clustering of the different classes using the GMM in combination with the PCA.
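Since these comparisons turn on the separability factor, the sketch below shows one plausible between-to-within distance ratio of the kind that equation (3.5) denotes. The exact definition of ρ is given earlier in the chapter, so this form is an assumption for illustration only.

```python
import numpy as np

def separability(cluster_a, cluster_b):
    """Illustrative separability measure: distance between the two cluster
    centroids divided by the average within-cluster spread. A stand-in for
    the factor rho of equation (3.5), not its exact definition."""
    mu_a, mu_b = cluster_a.mean(axis=0), cluster_b.mean(axis=0)
    between = np.linalg.norm(mu_a - mu_b)
    within = 0.5 * (np.linalg.norm(cluster_a - mu_a, axis=1).mean()
                    + np.linalg.norm(cluster_b - mu_b, axis=1).mean())
    return between / within   # larger values indicate better separation

# Example: evaluate the same measure on raw mean/variance features and on
# GMM-plus-PCA features to quantify the improvement seen between
# Figures 3.20 and 3.21 (inputs here are hypothetical).
```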
From these results we conclude that our algorithm, which parameterizes the wavelet coefficients using the GMM and then applies PCA to analyze the signals, gives the best separability between the different classes. Moreover, Figure 3.24 shows the relationship between the sample size and the separability factor ρ under different signal-to-noise ratios. No clear pattern between ρ and the sample size was found; the relationship is inconsistent.

[Figure 3.20. Clustering of the mean and variance of the wavelet coefficients for the different classes at subband (3,0) (Ro = 1.1098 for the mean, Ro = 1.1315 for the variance).]

[Figure 3.21. Clustering of the different classes' parameters estimated by the GMM at subband (3,0) (Ro = 2.2431 and Ro = 1.6422 for the two panels).]

[Figure 3.22. Clustering of the different classes' parameters found using the GMM without the PCA step at subband (4,1) (Ro = 1.0962).]

[Figure 3.23. Clustering of the different classes' parameters estimated by the GMM with PCA at subband (4,1) (Ro = 1.6466).]

[Figure 3.24. Relationship between the separability factor and the sample size under different SNRs (5, 10, and 15 dB).]

3.4 Summary and Discussion

A Gaussian mixture model has been used to characterize the probability distribution of the wavelet coefficients, yielding a feature set for classification and clustering. The classification problem relies on prior knowledge of the principal eigenvectors of the samples from each class, while the clustering problem relies only on a separability measure between the individual classes in the feature space. The expectation maximization algorithm has been used for cluster analysis, and the subbands that showed the best classification were also best suited for cluster analysis using EM. The proposed method gave preliminary results on how a sparse representation can capture the eminent features of speech signals that indicate the transient periods of spoken words in a noisy environment. This can help cochlear implant patients communicate better under adverse conditions.

CHAPTER 4

CONCLUSIONS AND FUTURE WORK

4.1 Summary of the Dissertation

In this thesis, two algorithms based on the sparse representation of speech signals have been introduced.
Both methods rely on a sparse representation, implemented using the discrete wavelet packet decomposition of the speech signals, owing to its excellent ability to capture transient signals. The Power method uses this representation in combination with second-order statistics, while the parametric method uses a Gaussian mixture model to characterize the distribution of the sparse representation of the speech signal.

In the Power method, it has been shown that the method is affected by the dominant signals in the surrounding environment, which may not necessarily be the ones of interest. The Gaussian mixture model characterizes the probability distribution of the wavelet coefficients and therefore exploits higher-order statistics. Using this parametrization, we were able to obtain a feature set that can be used in classification and clustering. Implementing this method showed that some subbands separate the different classes better than others. To use this result to improve cochlear implant patients' performance in a noisy environment, these subbands can be given higher weights than the rest of the subbands to activate the implanted electrodes during the transient periods of the spoken words.

4.2 Future Work

The proposed Parametrization method assumes that the wavelet coefficients can be parameterized using a Gaussian mixture of two Gaussians, which may not always be the case; more than two Gaussians can be used in the parametrization step. The technique used here to separate the different classes is expectation maximization. However, there are many other techniques that can be used in cluster analysis, and more experiments should be carried out to see which technique performs better. The proposed technique uses the sparse representation of the speech signal through wavelet packets; other representations of the speech signal could be used if they reveal new characteristics that help in detecting the start and the end of words. Finally, the algorithm needs to be implemented and tested with cochlear implant patients to compare its performance to the classical filter-bank techniques under adverse conditions.

BIBLIOGRAPHY

[1] W. A. Yost, Fundamentals of Hearing: An Introduction, Elsevier, 2007.

[2] R. Hinojosa and M. Marion, "Histopathology of Profound Sensorineural Deafness," Ann. New York Acad. of Sci., 1983.

[3] J. Deller, J. Hansen, and J. Proakis, Discrete-Time Processing of Speech Signals, IEEE Press, 2000.

[4] G. Borden, K. Harris, and L. Raphael, Speech Science Primer: Physiology, Acoustics, and Perception of Speech, Baltimore, MD: Williams and Wilkins, 1994.

[5] F. Cooper, P. Delattre, A. Liberman, J. Borst, and L. Gerstman, "Some Experiments on the Perception of Synthetic Speech Sounds," Acoust. Soc. Amer., vol. 24, pp. 597-606, November 1952.

[6] S. B. Waltzman and J. Thomas Roland, Cochlear Implants, Thieme Medical Publishers, 2006.

[7] B. Wilson, Signal Processing in Cochlear Implants: Audiological Foundations, Singular Publishing Group, 1993.

[8] P. C. Loizou, "Mimicking the Human Ear," IEEE Signal Processing Magazine, vol. 15, pp. 101-130, September 1998.

[9] I. Hochmair-Desoyer, E. Hochmair, and K. Burian, "Design and Fabrication of Multiwire Scala Tympani Electrodes," Annals of the New York Academy of Sciences, vol. 405, pp. 173-182, 1983.

[10] G. Clark, R. Shepherd, R. Black, and Y. Tong, "Design and Fabrication of the Banded Electrode Array," Annals of the New York Academy of Sciences, vol. 405, pp. 191-201, 1983.
[11] P. C. Loizou, "Introduction to Cochlear Implants," IEEE Engineering in Medicine and Biology, vol. 18, pp. 32-42, January/February 1999.

[12] P. C. Loizou, "Signal-Processing Techniques for Cochlear Implants," IEEE Engineering in Medicine and Biology, pp. 34-46, May/June 1999.

[13] M. Dorman, M. Hannley, K. Dankowski, L. Smith, and G. McCandless, "Word Recognition by 50 Patients Fitted with the Symbion Multichannel Cochlear Implant," Ear and Hearing, vol. 10, pp. 44-49, 1989.

[14] M. White, M. Merzenich, and J. Gardi, "Multichannel Cochlear Implants: Channel Interactions and Processor Design," Archives of Otolaryngology, vol. 110, pp. 493-501, 1984.

[15] B. Wilson, C. Finley, D. Lawson, R. Wolford, D. Eddington, and W. Rabinowitz, "Better Speech Recognition with Cochlear Implants," Nature, vol. 352, pp. 236-238, July 1991.

[16] B. Wilson, D. Lawson, and M. Zerbi, "Advances in Coding Strategies for Cochlear Implants," Advances in Otolaryngology-Head and Neck Surgery, vol. 9, pp. 105-129, 1995.

[17] M. Dorman and P. Loizou, "Changes in Speech Intelligibility as a Function of Time and Signal Processing Strategy for an Ineraid Patient Fitted with Continuous Interleaved Sampling (CIS) Processors," Ear and Hearing, vol. 18, pp. 147-155, 1997.

[18] G. Clark, "The University of Melbourne-Nucleus Multi-electrode Cochlear Implant," Advances in Oto-Rhino-Laryngology, vol. 38, pp. 1-189, 1987.

[19] P. Seligman, J. Patrick, Y. Tong, G. Clark, R. Dowell, and P. Crosby, "A Signal Processor for a Multiple-Electrode Hearing Prosthesis," Acta Otolaryngologica, pp. 135-139, 1984.

[20] P. Blamey, R. Dowell, and G. Clark, "Acoustic Parameters Measured by a Formant-Estimating Speech Processor for a Multiple Channel Cochlear Implant," Journal of the Acoustical Society of America, vol. 82, pp. 38-47, 1987.

[21] R. Dowell, P. Seligman, P. Blamey, and G. Clark, "Evaluation of a Two-Formant Speech Processing Strategy for a Multichannel Cochlear Prosthesis," Annals of Otology, Rhinology and Laryngology, vol. 96, pp. 132-134, 1987.

[22] J. Patrick and G. Clark, "The Nucleus 22-Channel Cochlear Implant System," Ear and Hearing, vol. 12, pp. 3-9, 1991.

[23] N. Tye-Murray, M. Lowder, and R. Tyler, "Comparison of the F0/F2 and F0/F1/F2 Processing Strategies for the Cochlear Corporation Cochlear Implant," Ear and Hearing, vol. 11, pp. 195-200, 1990.

[24] K. Nie, N. Lan, and S. Gao, "Implementation of CIS Speech Processing Strategy for Cochlear Implants by Using Wavelet Transform," Proceedings of ICSP, pp. 1395-1398, 1998.

[25] J. Yao and Y. Zhang, "The Application of Bionic Wavelet Transform to Speech Signal Processing in Cochlear Implants Using Neural Network Simulations," IEEE Transactions on Biomedical Engineering, vol. 49, pp. 1299-1309, November 2002.

[26] L. M. Friesen, R. V. Shannon, D. Baskent, and X. Wang, "Speech Recognition in Noise as a Function of the Number of Spectral Channels: Comparison of Acoustic Hearing and Cochlear Implants," Acoust. Soc. Amer., vol. 110, pp. 1150-1163, August 2001.

[27] J. C. Middlebrooks, "Effects of Cochlear Implant Pulse Rate and Inter-Channel Timing on Channel Interaction and Thresholds," Acoust. Soc. Amer., vol. 116, pp. 452-468, July 2004.

[28] J. Maj, L. Royackers, M. Moonen, and J. Wouters, "SVD-Based Optimal Filtering for Noise Reduction in Dual Microphone Hearing Aids: A Real Time Implementation and Perceptual Evaluation," IEEE Transactions on Biomedical Engineering, vol. 52, pp. 1563-1573, September 2005.
[29] D. K. Eddington, W. H. Dobelle, D. E. Brackmann, M. G. Mladejovsky, and J. L. Parkin, "Auditory Prosthesis Research with Multiple Channel Intracochlear Stimulation in Man," Ann. Otol. Rhinol. Laryngol., vol. 87, pp. 1-39, 1978.

[30] K. Nie, G. Stickney, and F. Zeng, "Encoding Frequency Modulation to Improve Cochlear Implant Performance in Noise," IEEE Transactions on Biomedical Engineering, vol. 52, pp. 64-73, January 2005.

[31] N. Lan, K. B. Nie, S. K. Gao, and F. G. Zeng, "A Novel Speech-Processing Strategy Incorporating Tonal Information for Cochlear Implants," IEEE Transactions on Biomedical Engineering, vol. 51, pp. 752-760, May 2004.

[32] Y. Suhail and K. Oweiss, "Augmenting Information Channels in Hearing Aids and Cochlear Implants Under Severe Conditions," ICASSP Conference, pp. 889-892, 2006.

[33] B. S. Wilson, D. T. Lawson, J. M. Muller, R. S. Tyler, and J. Kiefer, "Cochlear Implants: Some Likely Next Steps," Annual Review of Biomedical Engineering, vol. 5, pp. 207-249, 2003.

[34] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1999.

[35] K. Baker, "Singular Value Decomposition Tutorial," 2005.

[36] http://en.wikipedia.org/wiki/Gaussian_mixture_model

[37] www.utdallas.edu/~loizou/speech/noizeus

[38] J. Shlens, "A Tutorial on Principal Component Analysis," December 2005.

[39] J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," April 1998.