PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

WYPEMEE

 

T §413 :2

 

 

 

 

 

 

 

 

 

 

 

 

6/01 cJCIRC/DateDuepGS—p. 1 5

 

 

 

i
B\

Blind Source Recovery: Theoretical Formulations,
Implementations and Application to CDMA
Communication Systems

By

Khurram Waheed

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Electrical and Computer Engineering

2003

Blind

Blind Sou

approach 1

 

 

include co.
is to rccox
environmc
adeptcd t
Constraint
adaptatior

Tl
framcwm
general L
imCICSlin
[”3 for
haVe DEC
algorithm
undercm

BSR

alg

ABSTRACT

Blind Source Recovery: Theoretical Formulations, Implementations
and Application to CDMA Communication Systems
By

Khurram Waheed

Blind Source Recovery (BSR) is an autonomous (or unsupervised) stochastic adaptation
approach that denotes recovering signals from measurements in environments that may
include convolution, transients, and even possible nonlinearity. The primary goal of BSR
is to recover original source signals, as best as possible, even in the absence of precise
environment identiﬁability. A discrete-time optimization framework for BSR has been
adopted based on minimization of the Kullback-Lieblar Divergence, subject to the
constraints of a state-space representation, using the Riemannian contra-variant gradient
adaptation.

The modeling of the environment and its representation is vital in the proposed
framework. The adoption of the state space framework allows for the derivation of more
general update laws capable of catering to most known ﬁltering paradigms. For the
interesting case of multi-variable linear time-invariant dynamic BSR, parametric update
laws for discrete—time canonical feedforward and feedback state space conﬁgurations
have been derived. Higher Order Statistics have been explored to obtain adaptation
algorithms that are purely blind of actual source distributions. The BSR cases of
undercomplete and overcomplete mixtures have also been investigated. A new two-stage

BSR algebraic algorithm for sparse static overcomplete mixtures has been developed.

Ir
Further. 1'

modern l
dcmonstr.

Cl
research 1
scenarios.
and comp:

this thesis

 

 

 

0 Info-th
invcsri
phase
feedba

° Ancx'
of ada
sourci-

' Thcc

algori

ovcrc

OfAI.

Deve‘

CDM

framc

Further, the BSR framework has been successfully formulated for multi-user detection in

modern CDMA wireless communication networks. Promising new results clearly

demonstrate the effectiveness and practicality of the formulated approach.

Computer Simulations have been widely conducted during the course of this

research to evaluate the performance of all the developed algorithms for a variety of

scenarios. In most cases, the performance veriﬁcation has been done using actual speech

and communication data in the presence of noise. In summary the main contributions of

this thesis are

Info-theoretic update laws for a multi-variable dynamic state space network were
investigated. Natural gradient based BSR update laws for the cases of minimum
phase and non-minimum phase mixing environments as well as both feedforward and
feedback canonical state space conﬁgurations were developed and implemented.

An exploration and extension of parametric source distribution models and derivation
of adaptive score-functions (or non-linearities) for the BSR of sources with multiple
source distributions.

The development of the Algebraic ICA (AICA) Algorithm, which is a new ICA
algorithm for blind, sparse, static mixing matrix recovery. It enables the BSR from
overcomplete mixtures of speech and other sparse distributions using a combination
of AICA and interior point linear programming (IP-LP) techniques.

Development and simulation of new Blind Multi-user Detection algorithms for DS-
CDMA and WCDMA wireless communication networks based on the BSR

framework.

Dedicated to the most important people in my life

My parents, my wife and family

iv

lwould 1'
for his g
his value
stimulalio

Sir

Professor

 

 

 

Professor
Dtpanmm
my career
I l
encouragg
lndividua'
M
affection
completlr
eTlrlchcd
L
Skiers? lr

ACKNOWLEDGEMENTS

I would like to express my sincere thanks and appreciation to Professor Fathi M. Salem
for his guidance, support and inspiration throughout the course of this research. Without
his valuable, kind consideration, understanding, ﬁrm belief in my capabilities and
stimulation of my mind this dissertation would not be an object of my pride.

Sincere gratitude is extended to the members of my guidance committee:
Professor Habib Salehi from the Department of Statistics, Professor Hassan Khalil,
Professor John R. Deller Jr., Professor Ning Xi and Professor Tongtong Li from the
Department of Electrical and Computer Engineering for their valuable contributions to
my career and service on my committee.

I would not be who I am today without the unconditional love, support and
encouragement of my parents Abdul Waheed and Najma. They have shaped me into the
individual I am and inspired me to pursue my dreams in life.

My sincere thanks and gratitude to my wife Ayesha for her patience, love
affection, support, encouragement and sacriﬁce during every step that I made towards the
completion of this research. My love and adoration to my kids Rafay and Iqra, who have
enriched my life but have to spend most of their time without me.

Last, but not least, I would like to express my sincere thanks to my brother,
sisters, in-laws and their family members for their spiritual and moral support during the

course of this research.

LIST (
LIST (

Chapter
INTRO

1.1
1.2
13

1.4 1

1.4.1

 

TABLE OF CONTENTS

LIST OF TABLES x1v
LIST OF FIGURES xv

 

 

 

 

 

Chapter 1
INTRODUCTION ......................................................................................... 1
1.1 Background - - -- -- - 3
1.2 Importance of Modeling.... ..................... -- -- - -- - 4
1.3 The State Space Framework----- - - -- -- 4
1.4 Problem Deﬁnition: Adaptive Framework for Blind Source Recovery ...... 6
1.4.1 Blind Source Separation: An Example ........................... i ............................ 8
1.5 Organization of the dissertation -- -- - ....... 11

Chapter 2
REVIEW OF LITERATURE ..................................................................... 14
2.1 The Natural Gradient Framework--- .......... - - - -- 18
2.1.] Deﬁnition of the Natural Gradient ............................................................ 19
2.1.2 Theorem 2.1 .............................................................................................. 20
2.1.3 The Natural Gradient Learning ................................................................. 21
2.2 Blind Source Separation. ................................................................................ 22
2.2.1 Theorem 2.2 .............................................................................................. 23

vi

23

2.4

Chapter

BLIND
SPACE

3.1
3.1.1

3.1.2

3.2

3.2.1

3.3

 

 

2.3 Blind Source Deconvolution. - - - - - - -- - 24

 

 

 

 

2.3.1 Theorem 2.3 .............................................................................................. 24

2.4 Equivariant Performance of the Natural Gradient - - 25
Chapter 3

BLIND SOURCE RECOVERY: A FRAMEWORK IN THE STATE

SPACE .......................................................................................................... 26

3.1 Formulation of the Optimization Structure -- - 27

3.1.1 Algorithms for the Nonlinear Dynamic Case ........................................... 28

3.1.2 Algorithm for Nonlinear Dynamic Canonical Neural Structure ............... 31

3.2 Algorithms for the Linear Dynamic Case - 34

3.2.1 Extensions to the Natural Gradient ........................................................... 37

3.3 Blind Source Recovery Algorithms for Minimum Phase Environments... 38

3.3.] Theorem 3.1: Feedforward Minimum Phase Demixing Network ............ 38
3.3.1.1 Derivation of Update Laws for F eedforward BSR ............................... 39
3.3.1.2 Auxiliary Conditions for Convergence ................................................. 41

3.3.1.2.] Remark 3.1: ..................................................................................... 41
3.3.1.2.2 Remark 3.2: ..................................................................................... 42

3.3.2 Theorem 3.2: Feedback Minimum Phase Demixing ................................ 43
3.3.2.1 Derivation of Update Laws for Feedback BSR .................................... 43
3.3.2.2 Auxiliary Conditions for Convergence ................................................. 46

3.3.2.2.] Remark 3.3: ..................................................................................... 46

vii

3.4
[min
3.4.
3.

B

3.5

3.5.1

Chapter

BLIND
FUNCT]

 

3.4 Blind Source Recovery Algorithms for Non-minimum Phase

Environments ....... .......... -- - ...................... - 47

 

3.4.1 Theorem 3.3: Feedforward Non-minimum Phase Demixing ................... 47

3.4.1.1 Derivation of Update Laws for Feedforward Non-minimum Phase

 

BSR ............................................................................................................... 48
3.4.1.1.] Remark 3.4: ..................................................................................... 55

3.5 Simulation Results - - - - -- -- -- -- 56
3.5.1 BSR Simulations -- Minimum Phase Mixing Environments .................... 58
3.5.1.1 Simulation 1: Feedforward Recovery of Minimum Phase Mixing ....... 61
3.5.1.2 Simulation H: Feedback Recovery of Minimum Phase Mixing ........... 65

3.5.1.3 Simulation III: Feedforward Recovery of Non-minimum Phase

 

 

Mixing ............................................................................................................... 69
3.6 Remarks. ....... - .................... -- - - i - -- - , 74
Chapter 4
BLIND SOURCE RECOVERY USING ADAPTIVE SCORE
FUN CTION S.... ...... . .............. .. ...... 75
4.1 Generalized Gaussian Distribution Model - -- -- - 77
4.1.1 Generalized Kurtosis Measures ................................................................ 79
4.1.2 Relationship between Bi (oi ,ori ) and ed .................................................. 81

4.1.3 Derivation of Generalized Gaussian Adaptive Score Function (GGASF) 82

4.2 Hyperbolic Source Density Models - - - -- -- 84

 

4.2. 1 Background ............................................................................................... 84

viii

4.2.1.1 Pearson Distribution Model .................................................................. 84

 

 

 

4.2.1.2 Generalized Cauchy Distribution .......................................................... 86

4.2.2 Proposed Hyperbolic Distribution Models ..; ............................................ 87

4.2.2.1 Hyperbolic Sub-Gaussian Model .......................................................... 87

4.2.2.2 Hyperbolic Super-Gaussian Model ....................................................... 89

4.2.3 Derivation of Hyperbolic Adaptive Score Function (HASF) ................... 91

4.3 Adaptive Estimation of the Fisher (or normalized) Kurtosis ..... - - 94

4.4 Parameter Regulation for the Proposed Adaptive Score Functions .......... 95

4.4.1 Choice of parameters for Generalized Gaussian Adaptive Score

Function ................................................................................................................... 96

4.4.2 Choice of parameters for Hyperbolic Adaptive Score Function ............... 96

4.5 Simulation Results - ....... - -- 97

4.5.1 Environment Model .................................................................................. 98

4.5.2 Demixing Network Structure .................................................................... 98

4.6 Remarks...-. .......... - -- - - - -- ..... 103
Chapter 5

BLIND SOURCE RECOVERY OF UNDERCOMPLETE AND

OVERCOMPLETE MIXTURES ........ 105

5.1 BSR of Undercomplete Mixtures ................................................................. 107

5.1.1 Efﬁcient reduction of Mixture Dimensions ............................................ 108

5.1.1.1 Algebraic Principal Component Analysis ........................................... 108

5.1.1.2 Adaptive Principal Component Analysis ............................................ 109

ix

.....

 

5.4

Chapte

BLIND
CDMA

6.1
6.2

63

6.3

 

5.2 BSR of Overcomplete Mixtures---- - - -- - - 110

 

5.2.] Two Stage Approach to Overcomplete ICA ........................................... 110
5.3 Algebraic Independent Component Analysis--. ...... - - -- -- - -- - -- 112
5.3.1 The Algebraic ICA Algorithm ................................................................ 116
5.3.2 Algebraic Matrix-Distance Index (AMDI) ............................................. 120
5.3.3 Resolution of AICA Local Minima and Ambiguities ............................. 121
5.3.4 Inferring Sources from Overcomplete Mixtures ..................................... 123
5.3.5 Simulation Examples .............................................................................. 124
5.3.5.1 Simulation 1: 4 x 2 Laplacian Data ..................................................... 125
5.3.5.2 Simulation H: Overcomplete Speech Data (3 x 2 Case) ..................... 127
5.3.5.3 Simulation III: 5 x 3 Speech Data ....................................................... 130

5.3.5.4 Simulation IV: Independent Identically Distributed (iid) Laplacian

 

Data ..................................................................................... ' ........................ 134
5.4 Remarks ........ ...... 136
Chapter 6

BLIND SOURCE RECOVERY APPROACHES FOR WIRELESS lDS-
CDMA AND WCDMA DOWNLIN K SYSTEMS 138

 

 

 

6.1 User Detection in CDMA Systems- - - - - - - 139
6.2 CDMA Downlink Communication Channel - - - - -- 143
6.3 Downlink Signal Models for CDMA Systems - -- -- 146
6.3.1 DS-CDMA Signal Model ....................................................................... 147
6.3.2 WCDMA (UMTS F DD) Signal Model .................................................. 151

X

6.4 CDMA User Detection Structures- - - - - -- -- 154

6.4.1 Conventional Detection Schemes ........................................................... 154
6.4.1.1 Matched Filter (MF) Detector..................‘. .......................................... 155
6.4.1.2 RAKE Detector ................................................................................... 156
6.4.1.3 Linear Minimum Mean Squared Error (LMMSE) Detector. .............. 157
6.4.1.4 Channel Estimation Using The Common Pilot Channel (CPICH) ..... 158

6.4.1.4.] Channel Estimation Procedure ...................................................... 159

6.4.2 Proposed Blind Source Recovery Detection Schemes ............................ 160

6.4.2.1 Natural Gradient Blind Multi-User Detection (BMUD) Algorithms. 160

6.4.2.1.1 Feedforward BMU D Conﬁguration: ............................................. 163
6.4.2.1.2 Feedback BMUD Conﬁguration I: ............................................... 164
6.4.2.1.3 Feedback BMUD Conﬁguration II: ............................................. 165

6.4.2.2 RAKE-Blind Source Recovery (RAKE—BSR) and RAKE-Principal

 

 

Component Analysis (RAKE-PCA) Detectors ................................................... 167

6.5 Simulation Results - ----- - - -- 169

6.5.1 Simulation 1: DS-CDMA System ........................................................... 170

6.5.2 Simulation H: WCDMA (UMTS F DD) System ..................................... 173

6.6 Remarks....--- - -- -- - -- - - -- - - -- - 176
Chapter 7

CONCLUSIONS. ........ . ...................................................... ............... . 178

xi

APP]

Appen
letof

Appenr

Kuﬂbar
Source

Append

State Sp
Cl.

(12. 1

Appendi
Pﬂdbrnl
Appendil
DJJ.
1111
D141
01;;

D15.

 

APPENDICES

 

 

Appendix A
List of Abbreviations and Acronyms ...................................................... 184
Appendix B
Kullback Lieblar Divergence: A Performance Functional for Blind
Source Recovery ........................................................................................ 187
Appendix C
State Space Network: Specialized Filtering Structures ........................ 190
C.1. IIR Filtering - - -- - 190
C.2. FIR Filtering- ----- - - - - - -- -- -- - - - 193
Appendix D
Performance Benchmarks for Blind Source Recovery ......................... 195
Appendix D]. Conventional Performance Benchmarks ...................... 195
D.l.l. Signal to Noise Ratio (SNR) ................................................................... 195
D.l.2. Multichannel Intersymbol Interference (MISI) ...................................... 196
D. l .3. Maximum signal to interference plus noise ratio (SINRM) ................... 197
D] .4. Mean Squared Error (MSE) .................................................................... 198
D.l.5. Distortion Measure .................................................................................. 198
D.l.6. Separation Measure ................................................................................. 199

xii

 

Appen

D1

 

 

 

 

APpend
Multi-I‘
12.1.

BIBL

Appendix D.2. Quadratic Information Measure (QIM) ....................... 200

 

D.2.1. Practical Issues in Estimating Independence of Output Sequences ........ 200
D.2.2. Independence Benchmark using Quadratic Mutual Information ............ 202
D.2.2.1. Renyi’s Entropy .................................................................................. 202
D.2.2.2. Parzen Window Estimator .................................................................. 203
D.2.3. Quadratic Independence Measure (QIM) ............................................... 204
D.2.4. Simulations Results ................................................................................. 209
D.2.5. Observations ........................................................................................... 211
Appendix D.3. Algebraic Matrix Distance Index (AMDI) ................... 212
Appendix E
Multi-Input Multi-Output (MIMO) Canonical Form .......................... 214
E.1. Transformations required for a MIMO Transfer Function 215
BIBLIOGRAPHY ............................................................................ 219

xiii

LIST OF TABLES

Table 3.]. Quantitative Signal Recovery Comparison of the proposed algorithms .......... 73

xiv

   

Figure 1
Figure 1 .

Figure
Function

Figure 3.
Figure 3.:
Figure 3.3

Figure 3.~
311d Poles.

Figure 3.5
Environmi

Figure 3.6

 

Figure 3.7.
Demixing

Figure 3.8.
NCIWOrk T

Figure 3.1(

Figure 3']‘

Figure 3.1;

Figure 3.1 3
ms and P

LIST OF FIGURES

Figure 1.1. General framework for Blind Source Recovery ............................................... 6
Figure 1.2. Blind Source Separation (a) Original Sources (b) Observed Mixtures ............ 9
Figure 1.3. Blind Source Separation (a) Separated Signals (b) Global Transfer
Function ............................................................................................................................ 10
Figure 3.1. State Space Mixing Environment ................................................................... 34
Figure 3.2. State Space Demixing Network ...................................................................... 35
Figure 3.3. State Space Feedback Demixing Structure ..................................................... 43

Figure 3.4. Minimum Phase Mixing Environment (a) Environment Transmission Zeros
and Poles, (B) Impulse Response of the Environment Model .......................................... 59

Figure 3.5. Minimum Phase Mixing Environment, Impulse Response of (a) Theoretical
Environment Inverse, (b) Estimated Demixing Network ................................................. 60

Figure 3.6. Minimum Phase Mixing Environment, Final Global Transfer Function ....... 61

Figure 3.7. Minimum Phase Feedforward BSR —— All poles at the theoretical solution: (a)
Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index ....... 62

Figure 3.8. Minimum Phase Feedforward BSR -— All poles set to zero: (a) Demixing
Network Transmission Pole-Zero Map, (b) Convergence of MISI Index ........................ 63

Figure 3.9. Minimum Phase Feedforward BSR — All poles initialized arbitrarily: (a)
Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index ....... 64

Figure 3.10. Minimum Phase Feedback BSR — All poles at the theoretical solution: (a)
Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index ....... 66

Figure 3.11. Minimum Phase Feedback BSR - All poles set at zero: (a) Demixing
Network Transmission Pole-Zero Map, (b) Convergence of MISI Index ........................ 67

Figure 3.12. Minimum Phase Feedback BSR — All poles initialized arbitrarily: (a)
Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index ....... 68

Figure 3.13. Non-minimum Phase Mixing Environment (a) Environment Transmission
Zeros and Poles, (b) Impulse Response of the Environment Model ................................. 70

XV

Figure .
Theoreiz.

 

Figure :
Functior

Figure
Perfomin

Figure 4.

 

Figure 4..

Figure 4
parametei

Figure 4
function I

Figure 4.
range 0f1

Flgure 4.
range of

I:lgure 4
(b) hwc?

Flglue 4

“EM 4‘
“Verse.
ngure 4
kenVorl

Figure .
Batch K

FlgUre ,
ndex’(

Flgure
unCUc

Figure 3.14. Non-minimum Phase Mixing Environment, Impulse Response of (a)
Theoretical environment Inverse, (b) Estimated Demixing Network ............................... 71

Figure 3.15. Non-minimum Phase Mixing Environment, (a) Final Global Transfer
Function, (b) Demixing Network Transmission Zeros and Poles ..................................... 72

Figure 3.16. Non-minimum Phase Mixing Environment, Convergence of MISI

Performance Index. ........................................................................................................... 73
Figure 4.1. Some members of Generalized Gaussian Distribution Family ...................... 78
Figure 4.2. Fisher kurtosis Characteristics for the Generalized Gaussian Family ............ 81

Figure 4.3. Scaled Generalized Gaussian Score Function for different values of
parameter or ....................................................................................................................... 83

Figure 4.4. Fisher Kurtosis Characteristics for the symmetric Pearson Model as a
function of ,u .................................................................................................................... 85

Figure 4.5. (a) Proposed Symmetric Sub-gaussian Density Model, (b) Fisher Kurtosis
range of the proposed model; forO S k S 3 and 2.,- =1 . ..................................................... 88

Figure 4.6. (a) Proposed Symmetric Super-gaussian Density Model, (b) Fisher Kurtosis
range of the proposed model; for 0 S k .<_ 3; 11,- =1 . ............................... p .......................... 90

Figure 4.7. Score Functions based on the proposed: (a) hyperbolic sub-gaussian model,
(b) hyperbolic super-gaussian model; for 0 S k S 3; 1,- =1. ............................................ 92

Figure 4.8. Proposed Adaptive Hyperbolic Score Function: some possible combinations.
........................................................................................................................................... 94

Figure 4.9. (a) Mixing Environment Transfer Function, (b) Theoretical Environment
Inverse ............................................................................................................................... 99

Figure 4.10. Feedforward BSR using Adaptive Score Functions: (a) Estimated Demixing
Network Transfer Emotion, (b) Convergence of MISI Index ........................................ 100

Figure 4.11. Feedforward BSR using Adaptive Score Functions: (b) Convergence of
Batch Kurtosis, (c) Final Global Transfer Function ....................................................... 101

Figure 4.12. Feedback BSR using Adaptive Score Functions: (a) Convergence of MISI
Index, (b) Convergence of Batch Kurtosis ..................................................................... 102

Figure 4.13. Feedback BSR using Adaptive Score Functions: Final Global Transfer
Function .......................................................................................................................... 103

xvi

Figure 5
space.”.

Figure 5

initial w

Figure 5
Converg

Figure 5

Mixed s;

Figure 5.1
Figure 5.7

Figure 58‘

  
 

..........

Humor

 

Figure 6.2,
Humor
Humor
Figme 6. 5.
Figul‘e 6.6.

“Wear
051%”

 

FiEBJre 6. 8,
will] PErfeC.'
BEWC ......

 

Figure 5.1. A scatter plot example of the projection of two sources onto the 2 x 2 mixture
space ................................................................................................................................ 115

Figure 5.2. Convergence of the weights in the 2 x 2 mixture space; wi(0) denotes the
initial weight locations while wi(oo) denotes the ﬁnal weight location. ........................ 119

Figure 5.3. BMMR for Laplacian Data using AICA: (a) Interim Results, (b) Final
Convergence ................................................................................................................... 126

Figure 5.4. Overcomplete Speech Mixture (3 x 2 Case): (a) Original speech sources, (b)

Mixed speech .................................................................................................................. 128
Figure 5.5. Overcomplete Speech Mixture (3 x 2 Case): (a) Recovered speech, (b)
Convergence of Normalized AMDI over 100 experiments ............................................ 129
Figure 5.6. Overcomplete Speech Mixture (5 x 3 Case): 3 Overcomplete Mixtures ..... 131

Figure 5.7. Overcomplete Speech Mixture (5 x 3 Case): Original 5 Speech Utterances 132

Figure 5.8. Overcomplete Speech Mixture (5 x 3 Case): Recovered 5 Speech Utterances
......................................................................................................................................... 133

Figure 5. 9. Unambiguous AICA estimate using multiple runs: (a) using H1, (b) using

H 2 .................................................................................................................................. 135
Figure 6.1. Signal Generation Model for a QPSK DS-CDMA system .......................... 147
Figure 6.2. Signal generation based on the proposed 3GPP UMTS F DD standard ....... 152
Figure 6.3. Channel Estimation Using CPICH ............................................................... 160
Figure 6.4. Feedforward Demixing Structure ................................................................. 163
Figure 6.5. Feedback Demixing Structure I .................................................................... 164
Figure 6.6. Feedforward Demixing Structure II ............................................................. 165

Figure 6.7. DS-CDMA downlink performance for K=20, 50 users (a) using Gold Codes
G=63, (b) using OVSF Codes G=64. .............................................................................. 172

Figure 6.8. WCDMA Downlink System for K=20, 50 users (a) Performance Comparison

with Perfect Channel Estimation, (b) Performance Comparison with Imperfect Channel
Estimate ........................................................................................................................... 174

xvii

 

Figure l
with 1111

Figure l

(b10113.

 

Figure I
of Ql.\1 ;

Figure 6.9. WCDMA Downlink System for K=20, 50 users: Performance Comparison
with Imperfect Channel Estimate and Inter-Cellular Interference .................................. 175

Figure D. 1. Effect of length of observation sequence on (a) Cross Information Measure,
(b) Quadratic Independence Measure (QIM) .................................................................. 208

Figure D.2. (a) Convergence of MISI performance Index, (b) Corresponding Performance
of QIM performance Index ............................................................................................. 210

xviii

 

Blind 501
recover 11
SOUICC pn
CllVirol-malC
illfo-tthn

0f the Cnv

 

Optimzn an

811
in the “flea
Separation :
0f recQVCril
thir linear
deconl’Oly i r
a prion' 1'an

[33’ 54]. Th

Chapter 1 '

INTRODUCTION

Blind Source Recovery (BSR) is an unsupervised adaptation technique to stochastically
recover independent original source signals from corrupted measurements caused by
source propagation through a noisy, linear or nonlinear, dynamic mixing and ﬁltering
environment. The measurements from this environment are the only inputs to an adaptive
info-theoretic BSR network, which seeks to counteract, to the extent possible, the effects
of the environment [34, 64, 139, 140]. Eventually, the purpose of BSR is to develop an
optimal autonomous blind estimate of original signals [8, 9, 53, 110].

Blind Source Recovery (BSR) is a challenging signal processing formulation that
in the linear case includes the well-known adaptive ﬁltering sub-problems of blind source
separation and blind source deconvolution. Blind Source Separation (B88) is the process
of recovering a number of original stochastically independent sources/signals when only
their linear static mixtures are available. Blind Source Deconvolution (BSD) deals with
deconvolving the effects of a temporal or a spatial mixing linear ﬁlter on signals without
a priori information about the ﬁltering medium/network or the original sources/signals

[33, 54]. This dissertation presents a generalized framework based on the multi-variable

l

 

 

canonic
system.
from 1111
have als

l
indepent
signals’s
learning
World ap

T
analyses ;
aPproachi
77], We 1:

rePresenta

 

 

 

The fram
minimurn
Separate u
and chOn
[112, 1401
derived fro
mixfures o
llndercOmpI

SOlution c l

canonical state space representation of both the mixing environment and the demixing
system. Various ﬁltering paradigms have been consequently derived as special cases
from the proposed framework. Simulation results verifying the theoretical developments
have also been included in this work [109, 110, 126].

For this unsupervised adaptive ﬁltering task, only the property of signal
independence is assumed. No additional a priori knowledge of the original
signals/sources is assumed. BSR requires few assumptions and possesses the self-
learning capability, which render such networks attractive from the viewpoint of real
world applications where on-line training is often desired.

The challenges for the BSR reside in the development of sound mathematical
analyses and a framework capable of handling a variety of diverse problems. While other
approaches to BSR have used more conventional signal processing structures [33, 60,
77], we propose a comprehensive BSR framework based on multi-variable state-space
representations, optimization theory, the calculus of variations and higher order statistics.
The framework is utilized to develop natural gradient adaptation algorithms for both
minimum phase and non-minimum phase environments [110, 112, 126, 129, 135, 139].
Separate update laws for the feedforward and feedback recovering (i.e., both demixing
and deconvolving) network parameters have been derived and veriﬁed by simulations
[112, 140]. Further, algorithms have been extended using adaptive score functions
derived from comprehensive source distribution models for unmitigated recovery from
mixtures of multiple source distributions [127, 130, 136, 138]. The problems of
undercomplete and overcomplete BSR have also been extensively studied. A new

solution, called Algebraic Independent Component Analysis (AICA) for the case of

 

 

static. r
propose

commur

 

problen:

conventi

1.1 B:

Interest 1

   
   
 

during r;
SCparatioz
parallel a1

ability to

 

stimuli a

static, overcomplete mixtures of sparse sources has been proposed [132-134, 137]. The
proposed adaptive BSR framework is also applied to the modern CDMA wireless
communication networks (such as DS-CDMA, WCDMA etc.) to efﬁciently address the
problem of multi-user detection in downlink channel with favorable results compared to

conventional techniques [ 122-1 25].

1.1 Background

Interest in the ﬁeld of blind source separation and deconvolution has grown dramatically
during recent years, motivated to a large extent by its similarity to the mixed signal
separation capability of the human brain. The brain makes use of unknown nonlinear
parallel and complex dynamic signal processing with auto-leaming and self-organization
ability to perform such tasks. The peripheral nervous system integrates complex external
stimuli and endogenous information into packets, which are transformed, ﬁltered and
transmitted in a manner that is yet to be completely understood. This complex mixture of
information is received by the central nervous system (brain), split again into original
information, and relevant information relayed to various sections of cerebral cortex for
further processing and action [89].

BSR is valuable in numerous applications that include telecommunication
systems, sonar, acoustics and radar systems, mining and oil drilling systems, image
enhancement, feature extraction and biomedical signal processing. Consider, for
example, the audio and sonar applications where the original signals are sounds, and the
mixed signals are the output of several microphones or sensors placed at different
vantage points. A network would receive, via each sensor, a mixture of original sounds

that usually undergo multi-path delays. The network’s role in this scenario will be to

3

 

   

apphca:
cardiac

engine '1

1.2 Ir

 

Develop1
multi-pai
practical
highly St
HCIWOrk
Variation
propagat

eve“ nor

dynamically reproduce, as closely as possible, the original signals. These separated
signals can subsequently be channeled for ﬁrrther processing or transmission. Similar
application scenarios can be described in situations involving measurement of neural,
cardiac or other vital biological parameters, communication in noisy environments,

engine or plant diagnostics, and cellular mobile communications.

1.2 Importance of Modeling

Development of adequate models of the environment that include time delays or ﬁltering,
multi-path effects, time-varying parameters or sensor dynamics is important in desired
practical applications. The choice of inadequate models of the environment may result in
highly sensitive and un-robust processing by a network. Indeed, in order to render the
network operable in real world scenarios, robust operations must account for parameter
variations, dynamic inﬂuences and signal delays that often result in asynchronous signal
propagation. The environment needs to be modeled as an appropriate dynamic linear (or

even nonlinear) system [53, 86, 151].

1.3 The State Space Framework

The state space formulation provides a general framework capable of dealing with a
variety of situations. The multi-variable state space provides a compact and
computationally attractive representation of multi-input multi-output (MIMO) ﬁlters.
This state space representation allows for the derivation of generalized iterative update
laws for the BSR. The state notion abridges weighted past as well as ﬁltered versions of
input signals and can be easily extended to include non-linear networks [17, 103]. There
are several reasons for choosing this framework.

4

 

 

 

inter
case

inver
the C1
Parar
Parnc
ﬁnite
All c
Gamr
Th€12
nOn-“

gener

Transfer function models, although equivalent to the state space models when initial
conditions are zero, do not exploit any common features that may be present in the
real dynamic systems.

State space models give an efﬁcient internal description of a system. Further, this
choice allows various equivalent state space realizations for a system, more important
being the canonical observable and controllable forms.

The inverse for a state space representation is easily derived subject to the
invertibility of the instantaneous relational mixing matrix between input-output — in
case this matrix is not square; the condition reduces to the existence of the pseudo-
inverse of this matrix. This feature ensures recoverability of original sources provided
the environment model is invertible.

Parameterization methods are well known for speciﬁc classes of models. In
particular, the state space model enables much more general description than standard
fmite/inﬁnite impulse response (FIR/11R) convolutive ﬁltering.

All conventional (dynamic) ﬁltering models, like AR, MA, ARMA, ARMAX and
Gamma ﬁlters, can be considered as special cases of ﬂexible state space models.

The linear state space formulation of a problem is conveniently extendable to include
non-linear and time-varying models. The state space models are ideal candidates for

generalized non-linear model representations that include, e.g., neural networks.

 

 

1.4 1

 

In the 11‘

mm 10“

outputs

about ill;

I

 

unknow
to compi
without a

0fthe unl

 

 

1.4 Problem Deﬁnition: Adaptive Framework for Blind
Source Recovery

In the most general setting, the mixing/convolving environment may be represented by an

unknown dynamic process H with inputs being the n-d independent sources ,2 and the
outputs being the m-d measurements Ln_. In this extreme case, no structure is assumed
about the model of the environment.

The environment may be also modeled as a dynamic system with ﬁxed but
unknown parameters. The processing network H must be constructed with the capability
to compute the “inverse” (or the “closest to an inverse”) of the environment model
without assuming any knowledge of the mixing environment or the distribution structure

of the unknown sources.

 

 

 

1’10) 110)

 

 

 

 

Mixing/Convolving I £20) Demixing/Deconvolving ___.YZ (0
Network/Filter 5 NetworldFilter '
H . ' H ___.
mm“) YMU)

 

 

 

 

 

 

 

 

 

Figure 1.1. General framework for Blind Source Recovery

 

environ
enxiron
state sp
dynamic

equilibr‘.

Where
P ~ is a g
D~$ad
I ‘ repri
represents

Po
Stem frOm
impOSSlble

miXtUres [4

 

 

° For
For
dim

. For

In th

It is possible that an augmented network be constructed so that the inverse of the
environment is merely a subsystem of the network with learning. In this case, even if the
environment is unstable (e. g., due to existence of non-minimum phase zeros in a linear
state space model), the overall augmented network may represent a nonlinear adaptive
dynamic system, which may converge towards the desired parameters as a stable
equilibrium point or set. Thus achieving the global task of blind identiﬁcation [126, 156].

For the linear ﬁltering environments, this problem may be presented as

_X=H*m=H*H*§=P*D*§ (1.1)
where

P — is a generalized permutation matrix

D — is a diagonal matrix of ﬁlters with each diagonal ﬁlter having only one non-zero tap.
* - represents ordinary matrix multiplication for the static mixing case, while it
represents polynomial matrix multiplication for the multi-tap deconvolution problems.

4 Possible differences in the dimensions of the signal, mixture and output vector
stem from the blindness of the problem. However, in most cases it is very difﬁcult if not
impossible to determine the number of unknown sources directly from the observed
mixtures [4,6]. This results in three types of BSR problems, i.e.

o For m = n , the problem is referred to as simply BSR.
o For m > n , the problem is termed as undercomplete BSR, due to higher
dimensionality of the projected mixture space

0 For m < n , the problem is called overcomplete BSR, due to scarcity of dimension

in the mixtures space.

 

 

 

BSR a
conven
for all '
easily

measure

decomp<

 

for m <
algorithr
BSR. w}
1.4.1

In this

matrix .

Simulatl

For notational and mathematical convenience, in Chapter 3 we will derive all
BSR algorithms for the case where n = m = M, unless otherwise speciﬁed. This
convenient choice allows for unambiguous mathematical manipulations in the derivation
for all the algorithms. The under-complete mixture case of BSR, where m > n, can be
easily reduced to the fore mentioned case by appropriately discarding some
measurements using pre-processors, e.g., Principal Component Analysis (PCA), eigen
decomposition etc. [33, 52, 53, 60]. On the other hand, the over-complete mixture case
for m < n, in general, is not tractable and cannot be handled by the existing BSR
algorithms in any straightforward fashion. A solution for a special case of overcomplete

BSR, where the underlying source distributions are sparse, is discussed in chapter 5.

1.4.1 Blind Source Separation: An Example

In this example, we mix three signals using an arbitrarily chosen normalized mixing

matrix . The original sources are

0.1sin(400t)cos(30t)
s(t) = 0.08.sign(sin(500t + 9 cos(40t)))
noise; unifome distributed

The nonlinearity used in this case is based on Edgeworth expansion [8, 47]. The

simulation results are presented below

 

n(t)

 

 

 

lnput Signals

 

 

 

 

 

 

 

 

  
     

11 :11 11] 111.1 11 F
'l '1' A {El .1 “ll"! 11’-
151’111 11111 1111:. 111.112.] 1.1111,
21.111111111141111i‘lii‘l‘iluil'll‘lu‘m1”? 1111111111 11M
‘1' llg'tl‘p‘ 11;1:;1‘p'111n;1};i 1.1111111111111511
11‘1 '11" 15;.31' no
1‘11 111 it 1111 11
1 1 1 l -‘
0.5 0.6 0.7 0.8 0.9 1
l Sill-iii! I
lip 1,715“ illll l 1
ll M1. élll :21"
{‘ll] {”1 l“ 11$ 1'

 

 

 

 

 

 

 

 

liilliiilliull
O. 9

"'l
1.111.111.1111.
0.6

  

 

JLl0.2

 

 

 

 

(a)

 

 

 

Mixed Signals

I 1 I I I T I

 

—T T

111111111111th ill-111* 111111111111J11111111111111111111111111it11111111

 

 

 

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
t

 

wwwxmimmwvtWanWWtWWM/w

 

 

 

 

 

 

 

Figure 1.2. Blind Source Separation (a) Original Sources (b) Observed Mixtures

(b)

 

 

 

Blindly Separated Signals

 

 

 

 

1.11111

1 1
1'1'l1.1'11,11'l"l i'nF
1 ll l'l"'1 11"11‘11'171'

111111 11‘111
11 ill

   

1'11

111
11111 "1" 1111f
'1" 1111' '1'l'l"' '
Lﬂ'sbl' H“

l L
0 5 0.6 0.7

  

 

 

 

 

(a)

 

Global Transfer Function

 

 

 

 

(b)
Figure 1.3. Bhnd Source Separation (a) Separated Signals (b) Global Transfer Function

 

 

 

 

1.5 011

This diss
motivalii‘;
oven'ie“
Chapter I
approach

with a di

 

recovery I
general 11.
dmamical
for more g
are def“,
Classiﬁcm
feedfOnya
performan
4 dlSCuSSC
Chapter 3.
(hum-aha
models. rc
parameter
source dist
are 3150

Dr

and OVCTCC

1.5 Organization of the dissertation

This dissertation is divided into seven chapters and ﬁve indices. Some basic introduction,
motivating applications and the deﬁnition of the problem are presented in Chapter 1. An
overview of the current literature on the problems of B88 and BSD is presented in
Chapter 2. This is followed by a concise discussion on the natural gradient learning
approach and its application to blind source recovery networks. In chapter 3, we start
with a discussion on the multivariable optimization framework for the blind source
recovery problem. We initially present BSR update laws previously derived in a more
general nonlinear mixing setting, the framework is then specialized to the linear
dynamical state space representation. The natural gradient based update laws are derived
for more efﬁcient estimation of the recovery network parameters. Separate update laws
are derived for the minimum and non-minimum phase environments. Further
classiﬁcation is done on the basis of the BSR network structure, which can be either in a
feedforward or a feedback formulation. Simulation results and a comparison of the
performance for all the proposed BSR structures make up the rest of the chapter. Chapter
4 discusses the issue of score function adaptation for all the update laws proposed in
chapter 3. Two separate adaptive score functions are proposed for BSR based on the
Generalized Gaussian distribution model and the proposed Hyperbolic distribution
models, respectively. A discussion on adaptive estimation of the batch output kurtosis,
parameter regulation for the proposed score functions and their effectiveness for various
source distributions is included, The simulation results using the proposed score functions
are also presented. In chapter 5, we discuss the recovery issues related to undercomplete

and overcomplete mixtures, which may arise due to excess and deﬁciency of dimension

ll

 

 

 

 

in mix.
sparse

Analys
Simula
are incl
to the a

The foc

 

CDMA
Compre‘.
equivale
derived.
schemes
the pmp
also Sug
this digs
5
dissenat
this “or
entropy
in the 1
SimpljﬁC
(FIR) an

BSR are

in mixture space with respect to the original source space. For the case of overcomplete
sparse distributions, a new ICA algorithm called Algebraic Independent Component
Analysis (AICA) is presented and related implementation issues are discussed.
Simulation examples using both independent identical distribution (iid) data and speech
are included as a demonstration of the AICA recovery performance. Chapter 6 is devoted
to the application of BSR approaches to the CDMA wireless communication networks.
The focus is primarily placed on the forward link (or downlink) for both a generic DS-
CDMA implementation, and a UMTS FDD (or WCDMA) wireless network.
Comprehensive signal models are initially developed and then cast into convolutive
equivalents. Three adaptive BSR algorithms for a user’s signal detection are subsequently
derived. These proposed detection schemes are then compared to conventional detection
schemes for CDMA using extensive simulation results that outline the effectiveness of
the proposed approaches. Chapter 7 summarizes the developments made in this work and
also suggests various possible future extensions possible to the work being presented in
this dissertation.

Several appendices have been included to supplement the main text of this
dissertation. Appendix A enlists the various abbreviations and acronyms used throughout
this work. In appendix B, we present the Kullback-Lieblar divergence [72] or the relative
entropy measure, which is used as the objective performance functional to be minimized
in the BSR optimization framework. Appendix C presents speciﬁc implementation
simpliﬁcations in the state space BSR framework for the cases of ﬁnite impulse response
(FIR) and inﬁnite impulse response (IIR) ﬁlters. Various performance benchmarks for

BSR are presented in Appendix D. Two of these benchmarks namely the Quadratic

12

mmn1

develt1

 

techniq

fbnnul

 

 

Information Measure (QIM) and the Algebraic Matrix distance Index (AMDI) have been
developed during the course of this work. Lastly appendix E presents a transformation
technique to transform a generic MIMO ﬁltering structure to a canonical state space

formulation.

13

 

Blind S
informa
apphcat
lCiCCQm
is t0 rec
InixmrC
based 1
decOn.e

depend

COmDOI
senSOrS
blind S(
2113011115

indeper

Chapter 2 '

REVIEW OF LITERATURE

Blind Source Separation (BSS) by Independent Component Analysis (ICA) using the
information theoretic approaches has been a hot research topic because of its potential
applications in signal processing such as in speech recognition systems,
telecommunications, time series analysis and medical signal processing. The goal of ICA
is to recover independent sources given only sensor observations that are unknown linear
mixtures of the unobserved independent source signals. In contrast to the correlation-
based transformations such as Principal Component Analysis (PCA), ICA not only
decorrelates the signals (2nd order statistics) but also reduces higher-order statistical
dependencies, attempting to make the signals as independent as possible [53].

Two different research communities have considered the analysis of independent
components. On one hand, the study of separating mixed sources observed in an array of
sensors has been a classical and difﬁcult signal processing problem. The seminal work on
blind source separation was by Herault and Jutten [56] where they introduced an adaptive
algorithm in a simple feedback architecture that was able to separate several unknown

independent sources. Their approach was further developed by Jutten and Herault [63];

14

Karhunen and Joutsensalo [65]; Cichocki, Unbehauen and Rummert [36]; Comon [37]
elaborated the concept of independent component analysis and proposed cost functions
related to the approximate minimization of mutual information between the sensors.

In parallel to blind source separation studies, unsupervised learning rules based on
information theory were proposed by Linsker [83]. The goal was to maximize the mutual
information between the inputs and outputs of a neural network. This approach is related
to the principle of redundancy reduction suggested by Barlow [20] as a coding strategy in
neurons. Each neuron should encode features that are as statistically independent as
possible from other neurons over a natural ensemble of inputs; decorrelation as a strategy
for visual processing was explored by Atick [19]. Salam, et. al., [45, 106] ﬁrst proposed
the use of linear state space models for B88 and derived a novel adaptive algorithm based
on multivariable optimization theory. Nadal and Parga [88] showed that in the low-noise
case, the maximum of the mutual information between the input and output of a neural
network implied that the output distribution was factorial; that is, the multivariate
probability density function (pdt) can be factorized as a product of marginal pdfs. Roth
and Baram [102] and Bell and Sejnowski [21] independently derived stochastic gradient
learning rules for this maximization and applied them, respectively, to forecasting, time
series analysis, and the blind separation of sources. Bell and Sejnowski [21] put the blind
source separation problem into an information-theoretic framework and demonstrated the
separation and deconvolution of mixed sources. Their adaptive methods are more
plausible from a neural processing perspective than the cumulant-based cost functions

proposed by Comon [37]. A similar adaptive method for source separation was proposed

by Cardoso and Laheld [29].

15

 

 

 

 

vicwpl-
propos1
Girolar
Project

kurtosi:

linear n

 

separati
Similari‘
process1
deveIOp
algorith
the COn
the nan
C iChoc].

Other algorithms for performing ICA have been proposed from different
viewpoints. Maximum Likelihood Estimation (MLE) approaches to ICA were ﬁrst
proposed by Gaeta and Lacoume [43] and elaborated by Pearhnutter and Parra [94].
Girolami and Fyfe [51], motivated by information-theoretic indices for Exploratory
Projection Pursuit (EPP) used marginal negentropy as a projection index and showed that
kurtosis-seeking projection pursuit will extract one of the underlying sources from a
linear mixture. A multiple output EPP network was eventually developed to allow full
separation of all the underlying sources [50]. Bell and Sejnowski [21] have pointed out a
similarity between their infomax algorithm and the Bussgang algorithm in signal
processing. Torkkola [118] and Girolami [48] also made theoretical and implementation
developments using infomax algorithm. Lambert [73, 74] proposed extensions in BSS
algorithms using frequency domain polynomial matrix algebra, further he also elucidated
the connection between three different Bussgang cost functions. Amari [7, 8] proposed
the natural gradient based learning rules based on information geometry while Amari,
Cichocki and Yang [13] proposed implementations based on use of higher order statistics
and source pdf approximations using Gram Charlier and Edgeworth series expansions
[30]. Gharbi and Salam [45, 47] extended their state space algorithms dervived using
higher order statistics. Girolami and Fyfe [51] chose negentropy as a projection pursuit
index, developing a learning rule that is able to blindly separate mixed sub— and super-
Gaussian. Lee, Girolami, Bell and Sejnowski [76, 77, 79] have shown the
interconnection of all the above mentioned approaches, unifying them under the banner
of extended infomax algorithm. Amari, Douglas, Cichocki and Yang [14] proposed a

comprehensive algorithm using the natural gradient for the multichannel blind source

16

 

 

separa

 

repres.
algoru.

equah2

 

separation problems. Constraining both the mixing and demixing environment to be
represented by FIR ﬁlters. Several research groups have further worked to optimize these
algorithms for various domains of applications especially for mixtures of speech and
equalization of communication channels.

It has been proved that the algorithms for the Blind Source separation and Blind
Source deconvolution give superior performance when optimized using the Riemannian
contra-variant or the natural gradient [6, 9, 16], or equivalently the relative gradient [29].
The use of state space for BSS was ﬁrst proposed by Salam [106]. Gharbi and Salam [45,
47, 11]] extended their algorithms for continuous time static and dynamic environments
using multivariable optimization theory and the natural gradient. Zhang, Cichocki et. al.
[153, 155, 156] followed suit with several research papers, focused on derivation of state
space algorithms based on Lie Group structures. They have also advocated use of Kalman
ﬁlters for state estimation [157]. Salam and Erten. [109] proposed a general framework
for the state space blind source recovery. More recently, Waheed and Salam [110, 112,
126, 129] have proposed extended discrete time state space algorithms, for static and
dynamic environments, both in feedforward and feedback conﬁgurations for Blind
Source Recovery (BSR). They have demonstrated the application of algorithms for both
minimum and non-minimum phase systems, and also for a variety of unknown source
distributions using adaptive non-linearities.

Our current research proposes extended discrete-time natural (or the Riemannian
contravariant) gradient algorithms for the Blind Source recovery (BSR) or multichannel
blind deconvolution (MCBD) problems fused with the use the state-space formulation,

and the optimization theory. BSS in this approach can be viewed as a trivial static case.

17

Before providing details of the conducted research, the notion of the natural gradient and
its application to the problems of BSS and BSD has been concisely discussed.

The choice of the natural gradient, for the problems of Blind Source Recovery (BSR)
or Multichannel Blind Deconvolution (MCBD) has been made due its superior and
equivariant performance, which is preserved in the state-space formulation for this

research

2.1 The Natural Gradient Framework

For any general nonlinear optimization framework, the stochastic gradient method [5,
104, 119, 147] is a popular learning method. However, in many cases the parameter space
is not Euclidean but has a Riemannian metric structure. In these cases, the ordinary
gradient does not give the steepest direction of a target function; rather, the steepest
direction is given by the natural (or the contravariant) gradient. The Riemannian metric
structures were introduced by means of information geometry [6-9, 16]. The natural
gradient is discussed explicitly in the space of matrices for blind source separation (BSS),
and the space of linear dynamical systems for the blind source deconvolution (BSD)
problem.

A performance comparison between the ordinary stochastic gradient and the
natural gradient has been presented in [9]. The reference studies the asymptotic behavior
of online natural gradient learning. For online adaptation, the training examples can be
used only once as they appear during the learning/adaptation phase. Therefore, the
asymptotic performance of online learning/adaptation cannot be better than the optimal
batch procedure where all the examples can be reused again and again. This has been

proved that natural gradient online learning gives the Fisher-efﬁcient estimator in the

18

 

5611 St?

 

asymp
loss ft
than b-
byum

gndkn

2.1;1

Let S:

Euclide

incremc

 

1
Where (

Orthonor

\1
length Of
Theitx

general 0

ii I’CduCeS

 

sense of asymptotic statistics when the loss function is differentiable, so that it is
asymptotically equivalent to the optimal batch procedure, see [7, 91]. However, when the
loss function is non-differentiable, the accuracy of asymmotic online learning is worse
than batch learning by a factor of 2, for example see [25]. In this section, what is meant
by the natural gradient is brieﬂy outlined, and its relationship to the ordinary stochastic

gradient is established.

2.1.1 Deﬁnition of the Natural Gradient

Let S = {w e R" } be a parameter space on which a function L(w) is deﬁned. When S is a

Euclidean space with an orthonormal coordinate system w, the squared length of a small
incremental vector connecting w and w + dw is given by
2 " 2
10111 =Z(dw1-) (2.1)
i=1
where dwi are the components of dw. However, when the coordinate system is non-

orthonormal, the squared length is given by the quadratic form

1W12 = 28:; (W)dwidw j (2.2)

i,j
When S is a curved manifold, there are no orthonormal linear coordinates, and the
length of dw is always written as in equation (2.2). Such a space is a Riemannian space.

The nx n matrix G =(gij)is called the Riemannian metric tensor, and it depends in

general on w.

It reduces to

1, -=-
gg-(w)=6.-j=1 '. J. (2.3)

O,1¢j

l9

 

 

 

 

in the

that m

for a St

 

2.1.2

The SIC

Where

gradie

“rh Cr 1

in the Euclidean orthonormal case, so that G is the unit matrix I in this case.

The steepest descent direction of a function L(w) at w is deﬁned by the vector dw

that minimizes L(w+dw) where ldwl has a ﬁxed length, that is, under the constraint

ldle = .92 (2.4)

for a sufﬁciently small constant a , see [7, 9, 16].

2.1.2 Theorem 2.1
The steepest descent direction of L(w) in a Riemannian space is given by

—€7L(w) = —G" (w)VL(w) (2.5)
where G‘1 = (gij) is the inverse of the metric G = (gij) and VL is the conventional

gradient,

6 6
VL(w)—(5—w1-L(w),...., 6w

n

 

L(w))T

the superscript T denoting the transposition.

Where
€7L(w) = G“VL(w) (2.6)
is deﬁned to be the natural gradient of L in the Riemannian space.

Thus, -§L represents the steepest descent direction of L. Using the tensorial

notation, this is nothing but the contravariant form of -VL . Observe that when the space

is Euclidean and the coordinate system is orthonormal, this reduces to BL = VL

20

I
probl.

1

 

preset

2.1.3

Let us

variabl

 

Signals

adjusta

process

Where ,

SiOChaf

Where
deﬁnill

C(19) e.

Where

The natural gradient learning has been brieﬂy outlined and its use for the
problems of BSS and BSR (or MCBD) is shown below. For complete proofs of the

presented theorems, please see [9, 16].

2.1.3 The Natural Gradient Learning

Let us consider an information source that generates a sequence of independent random
variables sl,sz,...,s,,...,subject to the same probability distribution q(s). The random
signals 3, are processed by an adaptive processor (e.g., a neural network) that has a set of

adjustable parameters w. Let I(s,w) be a loss function when signal s is processed by the
processor whose parameter is w. Then the risk function or the average loss is

L(w) = E[I(s, w)] (2.7)
where E denotes the expectation with respect to 3.

Learning is a procedure to search for the optimal w*that minimizes L(w). The
stochastic gradient descent learning method can be formulated in general as

w,” = w, — n,C(w,)VI(s,, w,) (2.8)
where n, is a learning rate that may depend on t and C(w) is a suitably chosen positive

deﬁnite matrix [5]. In the natural gradient online learning method, it is proposed to put

C(w) equal to G_1(w) when the Riemannian structure is deﬁned as (2.6).
This suggests the natural gradient descent algorithm of the form
Wt+l = Wt -m\7L(w,). (2.9)

where n, is the learning rate that determines the step size.

21

Two illustrative examples for the natural gradient learning are presented that
include Blind Source Separation (BSS) and Blind Source Deconvolution (BSD). A brief
introduction to the setup of each problem is followed by the derivation of update laws
based on the natural gradient, this approach will serve to outline both problems and also
present the motivation in the use of natural gradient while searching for an adaptive

solution.

2.2 Blind Source Separation.

Let us consider n signal sources that produce n independent signals si(t) , i = l,..., n, at
discrete times t =1, 2, Assuming that si(t) are independent at different times and that

the expectations of s,- are 0. Let r(s) be the joint probability density function of s. Then it

is written in the product form
n .
"(S)=I_Iri(si) (2.10)
i=1
Consider the case where there exists no direct access to the source signals s(t) but
only their n instantaneous mixtures m(t) can be observed,

m(t) = Hs(t), or

mi(t)=ZHy-sj(t) (2-11)

i=1

where H = (H ij) is an nx n nonsingular mixing matrix that does not depend on t, and

m = (m1,..., mn)T is the observed mixtures.

22

 

  
 

denshi

Where

form 1

follow

2.2.1

The n.

there:

The

Dojn

Blind source separation is the problem of recovering the original signals s(t), t =
l, 2, ..., from the observed signals m(t), t = 1, 2, ..., [63]. If H is known, this is trivial,
because s(t) = H '1m(t)
The “blind” implies that neither the mixing matrix H nor the probability distribution
densities r,-(s,-) are known.

A typical algorithm to solve the problem is to transform m(t) into

yo) = W. m(t) (2.12)

where W, is an estimate of H '1, and is modiﬁed by the following update equation of the

form (2.9). The speciﬁc form of the update law for BSS can be derived using the

following theorem by Amari [9].

2.2.1 Theorem 2.2

The natural gradient in the matrix space is given by
61 = (V1)WTW (2.13)

therefore, the natural gradient update equation for BSS is

%=m(1-¢(y)yT)W (2.14)

where (p( y) is the column vector

My) =[¢1(y1).....¢m(ym)]T
( ) d1 f ( ) (2'15)
. . =—_O . 1.

$1 y] 6? g I y

The efﬁciency of this equation is studied from the statistical and information geometrical

point ofview [11, 15].

23

 

 

 

2.3 '

 

When

 

 

signal

introd:

Them

Where

T0 rec

of deg

2.3 Blind Source Deconvolution.

When the original signals s(t) are mixed not only instantaneously but also with past
signals as well, the problem is called blind source deconvolution or equalization. By

introducing the time delay operator 2’1 ,

z‘lso) = s(t —1) (2.16)

The mixing matrix ﬁlter H is denoted by

H(z) = i sz‘k (2.17)
k=0

where A). are mxm matrices. The observed mixtures are

m(t) = H(z)s(t) = ZHksa — k) (2.18)
It
To recover the original independent sources, using the ﬁnite impulse response model

d
W(z) = Z sz-k (2.19)
k=0

of degree d. The original signals are recovered by

y(t) = W.(2)m(t) (2.20)
where W, is adaptively modiﬁed by

111.1(2)=W2)—n.171{m..m._1...JV.} (2.21)

where the natural gradient in this case is given by the following theorem [9].

2.3.1 Theorem 2.3
The natural gradient of the manifold of systems is given by

in = Vl(z)WT(z_1)W(z), (2.22)

24

 

 

 

 

when

 

 

“here

Post r

when

where operator z‘l should be operated adequately.

2.4 Equivariant Performance of the Natural Gradient

An algorithm for source separation of instantaneous additive mixtures is deﬁned to have
equivariant performance if its behavior does not depend explicitly on the form of the
mixing matrix H [29]. This deﬁnition can be extended for the case of source
deconvolution [13] by stating that a deconvolution algorithm is equivariant if its behavior
only depends on the combined ﬁlter G(z,k)

Using (2.21), the update algorithm can be expressed in the form

61(2)

AW(Z,k) : —77t6l{mt,mt_1,...,m} = _"t aX(Z)

 

W(z,k) (2.23)
where dX(z) = dW(Z){W(Z)}_1

Post multiplying both sides of (2.23) by H(z), the relation is given by

61(2)

A603,") = _77! 6X(Z)

 

G(z, k) (2.24)

where G(z,k) = W(z,k) * H (z) ,
Note, that both (2.24) and the deﬁnition of gradient can be expressed in a form
that is independent of H(z), which proves the equivariant property of the natural gradient

algorithms.

25

Chapter 3 ‘

BLIND SOURCE RECOVERY: A

FRAMEWORK IN THE STATE SPACE

Blind Source Recovery (BSR) denotes recovery of original sources/signals from
environments that may include convolution, temporal variation, and even nonlinearity. It
also infers the recovery of sources even in the absence of precise environment
identiﬁability. This chapter describes a generalized BSR formulation, see Salem, et. al.
[109, 110], achieved by the application of stochastic optimization principles to the
Kullback-Lieblar divergence as a performance functional subject to the constraints of the
general (i.e., nonlinear and time-varying) state space representation. This technique was
used to derive update laws for nonlinear time-varying dynamical systems, which are
subsequently specialized to canonical neural and time-invariant linear systems. Some of
this work has also been implemented in dedicated hardware/software platforms [40-42,
46].

The state space feedforward and feedback demixing network structures have been

exploited to develop natural gradient based learning rules. These adaptation algorithms

26

are capable of handling most ﬁltering paradigms and which can be conveniently extended
to nonlinear models. In the special cases, distinct linear state-space algorithms are
presented for the minimum phase and non-minimum phase mixing environment models.
Conventional (FIR/IIR) ﬁltering models can be subsequently derived from this general
structure (see, Appendix C). Illustrative simulation examples are presented to

demonstrate the online adaptation capabilities of the developed BSR algorithms.

3.1 Formulation of the Optimization Structure

This section formally describes the state space BSR optimization framework, see [109,
112] for more details. In this section, the BSR stochastic gradient update laws are derived
via the proposed constrained multivariable optimization framework [109, 110] using
Kullback-Lieblar divergence as the performance ftmctional (see, Appendix B). This
performance ftmctional is a well known information-theoretic ‘g‘distance” measure,
appropriate for quantifying the mutual dependence of signals in a mixture. Using the
calculus of variations and the optimization theory, the update laws are initially derived
for a more general non-linear dynamic structure. Subsequently, the update algorithm is
specialized to the case of a linear dynamical state space model.
For the case of discrete-time, independent, wide sense stationary output vector
y(k) , the Kullback Lieblar divergence (or the relative entropy) is given by
n
L(y(k))=—H(y(k))+ZIHi(y,(k)) (3.1)
1:

where, H(y(k)): represents the entropy of the signal vector y(k), given by

27

r- j' py(y)1n|Py(y)1dy Continuous Case
= _ = er
H(y) E[lnlpy(y)l] 1 —Z py(y)ln|py(y)l Discrete Case
, er '

 

and H ,- ( y,(k)) : represents the marginal entropy of a component signal y,(k)

3.1.1 Algorithms for the Nonlinear Dynamic Case

Assume that the environment can be modeled as the following nonlinear discrete-time

dynamic forward model
X e (k +1) = fetXe(k).s(k).h1) (3.2)
m(k) =ge(Xe(k)aS(k)ah2) (3.3)
where

s(k) : n-d vector of original source signals

m(k): m-d vector of measurements

X e (k) : Me -d state vector for the environment

hl: constant parameter vector (or matrix) of dynamic state equation
h2: constant parameter vector (or matrix) of output equation

fe(.) and ge(.): differentiable nonlinear functions [86] that specify the structure of the
environment.

Further it is assumed that existence and uniqueness of solutions are satisﬁed for any
given initial conditions X e(t0) and sources s(k) thus a Lipschitz condition on fe(.) is

satisﬁed [71].

28

 

 

 

 

 

eQUau

andIT

The processing demixing network model may be represented by a dynamic
feedforward or feedback network. Focusing on a feedforward network model, assume the

network to be represented by
X (k+1) =f (X (k).M(k),W1) (3-4)
y(k) = g(X(k),m(k), W2) (35)
where

m(k): m-d vector of measurements

y(k): N—d vector of network output

X (k) : M-d state vector for the processing network
wl : parameters of the network state equation

w2: parameters of the network output equation

f (.) and g(.): differentiable nonlinear functions deﬁning the structure of the demixing
network

The assumption of existence and uniqueness of solutions of the nonlinear difference
equations is also assumed for the network model for any given initial conditions X (to)
and measurement vector m(k).

In order to derive the update law, the notation is abused for the sake of convenience

so that y(k) in (3.1) is represented as yk and L(y(k)) is generalized to Lk (yk) so that the

functional may also be a function of the time index k. Thus, the following constrained
optimization problem can be formulated [109, 112] to be

Minimize

29

 

 

 

subjc.

with t

 

 

 

 

 

compu

becom

Where

Conse

Where

k1
Jo(W1aW2) = Z L"(yk) (3.6)

k=ko
subject to
Xk+1=fk(XkamkaW1) (3.7)
yk =g"(Xr,mr.W2) (3.8)

with the initial condition X k0 , where [k0,k1) is the discrete-time frame used for the

computation of the cost functional. Thus, the augmented cost functional to be optimized

becomes

k1
J(W1,Wz)= Z Lk(yk)+41rT+1(fk(X/rrmkrW1)—Xk+1) (3-9)
k=ko

where 4k is the Langrange variable [81]. Deﬁne the Hamiltonian as

H" = thyk1+dlnfk<Xnmkmn ' (3.10)

Consequently, the necessary conditions for optimality are

 

aHk k
Xk+l= =f (Xk.mk.w1) (3.11)
541m
611" k T ad‘
[I = , = ,1 +___ 3.12

and the change in “weight” parameters become

A --n 951—27 (fk)T/1 (313)

W1 Ira k W1 k+1 -
aH" aL"

A =— —=— — 3.14

W2 77k 6 2 77k 6 2 ( )

where

30

lib

fl:

771.211

 

3.1.2

Most (

There

 

 

capacu
these r.
[109, ‘
neural
scenar

dCI‘iVa

k
a O O I O O C
k é 1— : re resents the arttal derivative w.r.t the arameter a, 1n the limit.
a do P P P

77k: represents a positive time-adaptive learning rate.

3.1.2 Algorithm for Nonlinear Dynamic Canonical Neural Structure

Most of the real life problems cannot be adequately modeled using linear representations.
There is a requirement for rich non-linear demixing network models [151] that have the
capacity to undo effects of an intrinsically non-linear environment. The non-linearities in
these real-life environments can take a variety of forms. The nonlinear dynamic structure
[109, 110] presented above can be specialized to a hierarchical dynamic feedforward
neural structure. This neural structure has the capacity to deal with several practical
scenarios. Since the canonical structure is more amenable to practical implementations, a
derivation for such a neural structure is presented below

We assume the demixing network to be represented by the structure

X1(k+1) = f1(X1(k).m(k),W1)

thk +1) = f2(X1(k).X2(k).m(k), W2)
5 (3.15)

X, (k +1) = f,,(X1 (k), . ..,X,, (k), m(k), w”)

y(k) = g(X1(k).- ...X..(k),m(k). wy)

where

m(k) - represents an m dimensional measurement vector

y(k) - represents an N dimensional output vector

X ,- (k) - represents m x m dimensional 11" network state vector

w,- - represents parameters of the ith network state equation, i = 1,2, ..., n.

31

 

 

11'}. — 1

ﬁt):

differt

 

and m

TCCOV'C

 

divergt

Symbol

Subject

and thc

wy - represents parameters of the network output equation

f,-(.) and g(.) -represent differentiable non-linear functions.

Further it is assumed that existence and uniqueness of the solutions of the
differential equations are satisﬁed for any given initial conditions X ,(to),i =1, 2,---,n
and measurement vector m(k) [71].

In order to derive the update laws for this non-linear case of blind source
recovery, we set up the following optimization problem using the Kullback-Liebler
divergence [72] as the performance functional. Please note the convenient change in

symbolic notation used for the derivation.

Minimize
k1 k
J(wl,...,wn,wy)= Z L (yk) (3.16)
k=ko
subject to

X1“ =f1"(Xr",mr.u1>

X§+1=f2"(X1‘.X§.mk.W2)
= (3.17)

X,’,‘+1 =f2k(X1k,...,X,’f,mk,wn)

yk =g(X1k,...,X,lf,mk,wy)

and the initial conditions X I" 0 , X g0 ,. . . , X 50

The augmented cost functional to be minimized becomes

32

 

 

DCiillt

 

and the

Ill/(+1

k1 k 15H
J(wl,...,w,,,wy)= Z L (yk)+ ,
k=k0 :

 

 

 

hl'f'i'l

—

Deﬁne the Hamiltonian as

F k+1q
’11

 

 

 

h -

f1k(X1kamk,"1)
k+l k k k
Hk =Lk(yk)+ 12: f2(X1.X:2,mk.MQ)

k+l k k k
1,, Lfn(X1,...,Xn,mk,wn)d

f1k(Xlk,mk’W1)’Xlk+l
f2k(X{‘,X§,mk,w2)-X§+1

k k k k+1
_fn(X1,...,Xn,mk,wn)—X,, -

 

Consequently the necessary conditions for optimality are

or!"

 

Xik+1 =

 

.1,

6X," — 6X? ' ex,"

and the change in weighting parameters become

 

 

 

(— k —7‘
5ft O k 1
+
arr" 61'" 1".
AWi——7]k-5;".———77k . I
' k k+l
5f" O 371
5W1
K- .1
Aw —_ git—...,] 2.11.:
y 77k awy k awy

.1 -611" _ aft-"0 ,...,” ,am) $1.34 .

=ﬁk(X1k’”°’Xikamk’wl)’i= 1,2,.”,n

,1=1,2,...,n
0X,"

 

(3.18)

 

(3.19)

(3.20)

(3.21)

(3.22)

(3.23)

In this proposed neural structure the update of the states takes the structural lower

triangular form, while the update of co-states and the weighting parameters assumes

upper triangular structures.

33

3.2

111 111C

form

 

3.2 Algorithms for the Linear Dynamic Case

In the linear dynamic case, the environment model is assumed to be in the state space

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

form
X e (k +1) = AeXe (k) + Bes(k) (3.24)
m(k) = CeXe(k) + Des(k) (3.25)
DO
39:) B. X.(k+1) 1 z" I X.(k) C. m k)
A0

 

 

 

 

 

 

Figure 3.1. State Space Mixing Environment

In this case the feedforward separating network will attain the state space form
X(k +1) = A X(k) + B m(k) (3.26)
y(k) = C X (k) + D m(k) (3.27)
The existence of an explicit solution in this case has been shown by [106, 107]. This
existence of solution ensures that the network has the capacity to compensate for the
environment and consequently recover the original signals, see Figures 3.1 and 3.2.
Specializing the general BSR algorithm to the linear network case, one forms the

(Kullback-Lieblar divergence) performance (“distance”) measure, see (3.6) as

34

 

 

 

subjcc

The at

Agair.

giW‘n

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

f

D

/
f
mgk) B X(k+1) 24' X(k) C y(k)
/ g /

A

/

 

 

 

Figure 3.2. State Space Demixing Network

k1
J.(A.B.C.D)= 2 Hot) (328)
k=ko

subject to (3.26) and (3.27) with the initial conditions X k0

The augmented cost functional to be optimized becomes

k1
J(A,B.C.D)= Z Horn/113nm X1. +3 m. —Xk+1) (3.29)
k=k0

Again, deﬁne the Hamiltonian as
Hk _ Lk T
— (yk)+ 21.104 X k + B ml.) (3.30)

For the linear time-invariant case the ordinary stochastic gradient update laws are

given by [109, 110].

 

k
Xk+l = 6H =AXk+Bmk (3.31)
k+l
k k
2k =91: Af2k+1+ckT§IL (3.32)
an ayk

35

 

 

provid

 

matric

guaran

Where
P113») :

136].

aH"

AA = — —— = — 2 XT 3.33
77k 6 A 77k k+l k ( )
aHk r .
AB = -77k — = -'7k/1k+1mk (3.34)
BB
611" en" T
AC=- —=— —=— X 3.35
77k 6C 77]: a C 77k¢(y) ( )
on 61" —T T
77k 6 D 77k 6 D Wk“ 1 (001)"? ) ( )

The above derived update laws form a comprehensive theoretical algorithm and
provides the update laws for the states X k , the co-states 3k and all the parametric
matrices in the state space. The invertibility of the state space as discussed in [45, 106] is
guaranteed if the matrix D is invertible. In the above derived laws

77k: positive learning rate of the algorithm, which may be adaptive

[D]—.T : represents the transpose of the inverse of the matrix D if it is a square matrix or
the transpose of its pseudo-inverse in case D is not a square matrix.

(0( y) : represents a vector of the usual nonlinearity or score function [67, 127, 136, 138],
which acts individually on each component of the output vector y, with each component,

say m(yi) , is given as

5P(yi)
¢(yj)=_alogp(yi)='— A)? (3.37)

5% PO’i)

where

p(y,-): is the (estimate) of the probability density function of each source [127, 130,

136].

36

 

 

 

result
upda:
and l
An in

and It

 

3.2.1

In thi

Recov

envirc
[1 10,
SYSlCr

Unstal

6x111

the r.

The update law in (3.35) and (3.36) is similar to the standard gradient descent
results [12, 13, 155], indicating its optimality for the Euclidean parametric structure. The
update law provided above although non-causal for the update of parametric matrices A
and B, can be easily implemented using frame-buffering and memory storage [52, 54].
An inherent delay or latency in the recovered signal are related to the length of the buffer

and the dimension of the state space.

3.2.1 Extensions to the Natural Gradient

In this section, the linear state space algorithms for the problem of Blind Source
Recovery (BSR) derived in the previous section are extended using the natural gradient
[6, 7, 9, 14, 16]. Specialized algorithms for the class of minimum phase mixing
environments are presented both in feedforward and feedback state space conﬁguration
[110, 112, 126]. For the non-minimum phase mixing environment, the requisite demixing
system becomes unstable due to the presence of poles outside the unit circle. These
unstable poles are required to cancel out the non-minimum phase transmission zeros of
the environment. In order to avoid instability due to the existence of these poles outside
the unit circle, the natural gradient algorithm may be derived with the constraint that the
demixing system is a double sided FIR ﬁlter, i.e., instead of trying to determine the HR
inverse of the environment, the inverse is approximated using an all zero non-causal ﬁlter
[11, 129]. The double-sided ﬁlters have been proven to adequately approximate IIR
ﬁlters at least in the magnitude terms with a certain associated delay [22].

The minimum phase algorithms have lesser computational requirements and
exhibit asymptotically better convergence characteristics, compared to the algorithm for

the non-minimum phase mixing environments, when applied to the same class of

37

minimum phase systems. The minimum phase algorithm also exhibits more robust
performance with respect to the choice of the order of demixing/deconvolving ﬁlter per
channel, and the multi-source distribution composition of mixtures that may include at
most one pure gaussian source [33]. The non-minimum phase algorithm gives good
performance for the non-gaussian mixtures and/or the non-gaussian contents of a
complex multi-source distribution mixture. The gaussian source although separated from
other sources may still be auto-convolved (Note that the sum of co-centric copies of a
gaussian distributions is also gaussian). Also notice that the presented algorithms in this
chapter focus primarily on adaptive determination of zeros. The poles are kept ﬁxed
either at zero or at randomly selected stable locations. Otherwise, they are assumed
known using other methods, e.g., adaptive state estimation techniques such as the

recursive least squares (RLS), kalman ﬁlter [155, 157] etc.

3.3 Blind Source Recovery Algorithms for Minimum Phase

Environments

3.3.1 Theorem 3.1: Feedforward Minimum Phase Demixing Network

Assume the (mixing) environment is modeled by a multiple-input, multiple-output
(MMO) minimum phase transfer function. Then, the update laws for the zeros of the

feedforward state space demixing network using the natural gradient are given by

AC(k) = n((IN — ¢(y(k»yT(k»C(k> — ca<y<k>>XT(k)) (3.38)

AD(k) = 77(1 N - ¢(y(k))yT(k))D(k) (339)
where (o( y) is a nonlinear score function given by (3.37)

38

3.3.1.1

For the {L

beas in

 

feedforxx
gradient
Riemmar
better cc

(3.35) an

 

 

been dro

Def

the stc

When

p361

3.3.1.1 Derivation of Update Laws for Feedforward BSR

For the feedforward demixing network, its linear state space representation is assumed to
be as in (3.26)-(3.27). A formal formulation has been developed for deriving the
feedforward update laws for the problem using the output equation (3.27) and the natural
gradient learning. This leads to modiﬁed update laws for the matrices C and D using the
Riemmanian manifold. Further, these new update laws based on the natural gradient have
better convergence performance compared to those based on the stochastic gradient
(3.35) and (3.36). Note that in the following derivation, instantaneous time index k has

been dropped for convenience

Deﬁning augmented vectors )7 and f , and the matrix W as

"— y ”— m W— D C 340
y_1X1 x_1X1 _ 0 ’(L—lim (' )

so the augmented output equation becomes
y=Wr can
The update law for this augmented parameter matrix W is similar in form to (3.36) or

the stochastic gradient law for the static mixing case. Thus the update is

AW = apt/“T — (0(9))?T 1 (3.42)

1177— DT 0
— CT 1
(L-l)m

Consequently for the general case where D may not be square, its inverse (assuming the

where

pseudo-inverse for D to exist) is given by

39

 

Factoring 01
mi
Post-multip
Ali
Using (3 .41
mi

Writing in '

Al

Th

exam 32m

K.

and (348)

W‘T 13(DTD)‘l 0
—CTD(DTD)’1 1(L_,),,,

Factoring out the augmented weight term W'T , (3.42) can be written as

AW = 7711 — n(mTWT1W-T (3.43)
Post-multiplying by the matrix WTW , the update law becomes

AW = ”[1 — n(mTWT 1 W (3.44)
Using (3.41), the above update rule can be written as

AW=n[I—¢(91}T]W (3.45)

Writing in terms of the original state space variables the update law (3.45) is given by

D C y T T D C
A ___ 1- X 3.46
[0 I(L—l)m1 ”1 ¢1X1[y 1110 [“4”] . ( )

Considering the update laws for matrices C and D only, the relation is

_ T T D C "
[AD AC1-77lD Cl-U[¢(y)y ¢(y)X 110 1(L—1)m1

Therefore the ﬁnal instantaneous update laws for the matrices C and D are
461k) = 221th —¢(y(k»yT(k»C(k)—<o(y<k»XT(k)) (3.47)
AD(k)=77(IN-¢(y(k))yT(k))D(k) (3.48)
The resulting update laws for the natural gradient update derived in [155] are in

exact agreement with the update laws derived above [110, 112]. The update law in (3.47)

and (3.48) is related to the earlier derived update law (3.35) and (3.36) by the relation.

40

 

 

where

W

denotes the

in (3.49) is 5

 

3.3.1.2 I

1‘

Examining 1

0 2 1
Also

NH

(0( J

- 11371) DTC 1 11) C 110 C 1
VI = V] = V] (3.49)

CTD I(L_l)m+CTC 0 I(L—l)m O I(L—l)m
where
an" 23L"
V1=135 52:] (3.50)

denotes the update according to the ordinary stochastic gradient, the conditioning matrix

in (3.49) is symmetric and positive deﬁnite.

3.3.1.2 Auxiliary Conditions for Convergence

Examining the update of the remaining terms in equation (3.46),
0=¢1X1yTD => «200% =0 (3.51)
Also
A1(L—l)m = 0 = 1(L—1)m - (P(X)yTC - ¢(X)XT
By a rearrangement of terms, the condition is
10(40er + ¢(X)XT = I(L—1)m
Using the relation (3.51), the condition reduces to

¢(X)XT = 1(1—1)». (3.52)

3.3.1.2.] Remark 3.1:

The resulting update laws for the natural gradient update derived in [155] are in exact
agreement with the update laws (3.47)-(3.48), also see [107, 110]. However, the

derivation approaches and methodologies are different.

41

 

 

   
 

explicitly 11
Update law

feedback 5

HOWCvcr, f.

(in a stocha

3.3. 1.2.2 Remark 3. 2:

The new auxiliary relations (3.51) and (3.52) deﬁne supplementary stochastic conditions
on the nonlinear “correlation” between the outputs and the states of the demixing
network. In this feedforward minimum phase structure case, these relations do not
explicitly appear in the update laws and thus can not be used directly to simplify the
update laws (3.47)-(3.48). (In contrast, similar auxiliary conditions derived for the
feedback structure do simplify the resulting update laws, see proof of Theorem 3.2.)
However, for effective convergence of the algorithm, these conditions need to be satisﬁed

(in a stochastic sense), i.e.,

E (BOYD/T] = O(L-1)me , and

 

E ¢(X)XT1= 1(L-1)m
The ﬁrst equality signiﬁes that the output and the states of the network are “uncorrelated”

as deﬁned. This condition supplements the condition E[¢(y)XT] —> 0( L-1)mx N , which

must be satisﬁed in (3.47) at convergence. Viewing the states as prior signal information,
the causal algorithm necessarily includes only causal correlation while the auxiliary
condition (3.51) signiﬁes the noncausal correlation. The second condition signiﬁes that
the states of the demixing network need to be a white process. It is noted that, in the

special case of FIR ﬁltering where the states are basically delayed versions of previous

measurements, i.e., X ,- (k) = mk_,- , the second condition (3.52) implies that whitening the

measurement signals prior to applying the algorithm would enhance the convergence of
the update law. These conditions are in addition to the stability conditions for the

algorithm [11, 27, 109].

42

3.3.2 T

 

 

Assume I},
S}"S(em 0r

demixing ,

 

3.3.2 Theorem 3.2: Feedback Minimum Phase Demixing

 

 

m(k) + add , y(k)

 

 

 

z(k)

 

[ABCD]

 

 

 

 

 

 

Figure 3.3. State Space Feedback Demixing Structure

Assume the MMO (mixing) environment modeled by a minimum phase linear oynamic
system or transfer function. Then the update laws for the zeros of the feedback state space

demixing network using the natural gradient are given by

AD = n[(I~ + D)((0()’)yT —I~)] (3.53)

AC=n[(1N +D)co(y)XT] (3.54)

where ¢( y) is the appropriate score function by (3.37)

3.3.2.1 Derivation of Update Laws for Feedback BSR

An alternate BSR structure was presented in [112], where the demixing network
comprises of a feedback transfer function, see Figure 3.3. This feedback path is
composed of tunable poles and zeros while the feedforward path is assumed to have ﬁxed
parameters. Using only the update laws for adaptively determining the zeros of this
feedback path can potentially alter the poles of the overall demixing structure.

The network equations for the feedback path are (see Figure 3.3)

43

 

 

:( A
Deﬁning

e( k
the output

y( A
For simpli

3'11
rearranging

(1.1

In matrix f

11,.

01'

z(k) = C X(k) + D y(k) (3.55)
Deﬁning

e(k) = m(k) — z(k) ' (3.56)
the output of the feedback structure is given by

y(k) = H n ”“600
For simplicity, assume H n = I , so

y(k) = 1 *(M(k) - 206)) = m(k) - 2(k) (357)
rearranging terms, the output equation becomes

(1N + D) y(k) + C X(k) = m(k) (3.58)

In matrix form, (3.58) can be written as

1’”JD 1.311812

--1
y - IN+D C m
1X11 0 ’(L—11m1 1X1 (359)

Deﬁning the augmented vectors 3? and T’ , and the matrix W as

~ m ~ y ~ IN+D C
X: ,Y= ,andW=
X X 0 I(L—l)m

(3.59) can be expressed in the compact form

01'

Y=W X=P71T where 71?:W (3.60)

Using the natural gradient, the update law for 7V is

44

z z-T ~ ~T sz
AW=n[W —(0(Y)X 1W W,or
2: ~ ~TzT 2: ,
AW=n IM+N—(p(Y)X W W (3.61)
Differentiating WW 2 I
z/~ z~/
W W+WW =0
and by rearranging terms
z/ z~ ~—1 ~—1~/~—1
W =—WWW =—W WW (3.62)

Using lSt order Euler approximation for the derivatives in (3.62), the variation in W can

be expressed as

AW = —W'1(AW)W" (3.63)
Also note that from (3.60)

~ ~ zT

YT = XTW (3.64)

Using (3.62) and (3.64), the update law in (3.61) becomes

—W_1(AW)W_1 = n[l —¢(T’)T’T]W , or

-1?an = 271! - «xi/WT]

since WW = I .
Arranging terms
~ ~ ~ ~T ~ ~ ~r
AW=—nW [1—¢(Y)Y ]=nW [p(Y)Y —1] (3.65)
Inserting the deﬁnition of W and 1" , (3.65) becomes

45

 

 

Therefo

rows blt

3.3.2.2

By equa‘

11221

Observe
Slippleme
Case, hm
netWork

feedback
signify ,1,

deconelatt

11N+D C 1_ [1N+D C 1x ¢(Y)yT-1N (0(Y)XT
0 1(L—11m 0 1(L—11m (0(X)yT ¢(X)XT - I(H)»,
Therefore the update laws for the matrices C and D are extracted by equating the ﬁrst

rows block wise to get
41111 + D) = AD = 22101, + moon" — IN > + CromyT] (3.66)

AC=ni<IN +D)¢(y)XT wept/oxT -1(L—1)m)] (3.67)

3.3.2.2 Auxiliary Conditions for Convergence

By equating the second row block wise, one gets
T
0(L-1)me = ¢(X)(L—1)mx1y lxN (3-63)
A1(L-1)m = 0 = ¢(X)XT — I(L—l)ma 0r

com/YT = 1(L-1)m I (3.69)

3.3.2.2.] Remark 3.3:

Observe that (3.68) and (3.69) are identical to (3.51) and (3.52). These relations form
supplementary conditions for the convergence of the proposed feedback algorithm. In this

case, however, the states of the feedback structure are a “function” of the previous
network outputs, i.e., X ,(k) =g( yk_,-). Therefore, the auxiliary conditions in the
feedback case deﬁne nonlinear correlation conditions on the demixing outputs that

signify that the expectation of the demixing network outputs will be (nonlinearly)

decorrelated leading to the components being stochastically independent.

46

 

Un
in the deri
conditions
but also ir

ﬁnal updat
AD
AC
The

earlier prop

(3-71) exhil

3.4 Blin

 

Pha

3.4.1 Tl,

Assume the

funCthn. T}

“SINg [he "a

AC].

Unlike the feedforward update laws, however, these conditions appear explicitly
in the derived update laws(3.66) and (3.67). Therefore, enforcing the derived auxiliary
conditions in the update laws (3.66) and (3.67) not only simpliﬁes them computationally,
but also improves the convergence dynamic properties of the proposed algorithm. The

ﬁnal update law for the feedback structure is given by
AD=n[<IN+D>(¢(y)yT—I~ )1 (3.70)

AC=nt<IN +D)¢<y)XT1 (3.71)

The above natural gradient update laws for feedback BSR are different from
earlier proposed feedback (or recurrent) update laws in [3 5, 75]. The algorithm (3.70) and

(3.71) exhibits better stability and faster convergence. See [123] for a comparison.

3.4 Blind Source Recovery Algorithms for Non-minimum

Phase Environments

3.4.1 Theorem 3.3: Feedforward Non-minimum Phase Demixing

Assume the MMO (mixing) environment modeled by a non-minimum phase transfer
function. Then, the update law for the zeros of the state space FIR demixing network

using the natural gradient is given by

AC,- = n(k)[c,- -—¢(y(k))u(k -i)T],i = 1, 2,- ~,L —1 (3.72)

AD=n<k>[D-¢<y(k»u(k)T] (3.73)
where the state space matrix C is defined as

C=[C1 C2 CL-l] (3.74)

47

oreqr

uher

gnu;

3A.
011

inpi

nec
tn:
iSs
ini
thc

th(

(3

(3

C,- - being the MMO FIR filter coefficients corresponding to delay z—i

u(k) — represents an information back—propagationfilter, given as
L-l T T
u(k) = 2 C,- y(k +i)+ D y(k)
i=1

or equivalently in the adjoint state space as
2(k — 1) = AT/t(k) + CTy(k)

u(k) = BTMk) + DTy(k)
where T represents the matrix Hermitian operator, (p( y) is the appropriate score function

given by (3.37) and A and B are given by (43), with A1 j = 0.

3.4.1.1 Derivation of Update Laws for Feedforward Non-minimum Phase BSR

On the outset, please note that the algorithm requires a frame of 3L-2 samples of the
input signal to initiate the computations of the update of the parameter matrices in (3.72)-
(3.73). Note also that the time index of the parameter update (3.72)-(3.73) is not
necessarily the same time index of the signals. In practice, it may be chosen so that the
time index of the parameters is a delayed version of the signal time index so that no
issues of non-causality would arise. Also in practice, see Simulation H1 in section 4.2, the

initialization of the update law (3.72)-(3.73) uses an identity (or a nonsingular matrix) for

the center tap matrix (speciﬁcally, j =1_(L —1)/2_|) and small random or zero values for

the remaining tap matrices.
Let the mixing environment to assume the same state space representation as in
(3.24)-(3.25), while the demixing network assumes the state space representation (3.26)-

(3.27), repeated below for convenience

48

networ
intend:

andB

freun

Where

1T1,-

i" ﬁlt

mam (

X(k+l) = A X(k)+B m(k)
y(k) = C X(k)+ D m(k)
As mentioned before, for the non-minimum phase mixing case, the demixing

network is constrained to be a double-sided FIR ﬁlter [22, 156] so as to approximate the

intended unstable IIR inverse. Using the MIMO controller canonical form, the matrices A

 

 

 

 

and B take the form
Pom 0m 0m 0m - — 1m -
A: 0m 1m 0m 0m ,B= 0m
__0m 0m 1m 0m_ _0m_
where

1",: Identity matrix of dimension m x m

0", : Zero matrix of dimension m x m

For compactness, we can represent (3.26)-(3.27) using the compact (mixed time-

frequency) transfer function notation of the MIMO FIR ﬁlter as
yr = [W (2)] ml. (3.75)
where, W(z) is the transfer function of the demixing MIMO FIR ﬁlter of the form,
W(z) = W0 + le“ + . . . + WHZ‘L+1 (3.76)
Further we can deﬁne a matrix W , comprised of the coefﬁcients of the ﬁlter

W(z). Note that each constituent sub-matrix W,- contains co-efﬁcients pertaining to the

i’h ﬁlter lag acting on the time instant k-i. This matrix W is related to the state space

matrices C and D as given below.

49

 

“111

as

For

trar

wh.
COG
tim

Cau

Usi

dOI

147211410 W1 WL—11=[D c]
where the state space matrices C and D in terms of the sub-matrices W,- are represented

as

D=W0

(3.77)
C=[W1 W2 WL—11

For the MIMO transfer function W(z) , the conjugate (or Hermitian) transpose of the

transfer function can be deﬁned in a fashion similar to the one proposed by Lambert [73],

i.e.,

_ T _ _ T
[W(z)] = [W0 + W12 1+...+ WL_lz “1]
= W0T + Wszl + . . . + W[_,.zL‘1 (3.78)
= 117th“)

where, T is the Hermitian transpose operator. Notice that again we can deﬁne a matrix of

coefﬁcients WT comprised of the transpose of the constituent sub—matrices acting in a
time reversed fashion, i.e., the 14" lag of this time reversed ﬁlter acts on the future (non-

causal) time instant k+i, i.e.,

WTé[WOT W,T WL_1T] (3.79)

Using the deﬁnition of W , we can express the network output equation (3.75) in the time
domain as
_ L—l
yk -_- ka a Dmk + Z cimk_,. (3.80)
i=1

where ’l’k represents an Lm x 1 matrix of observations at the k’h iteration, which includes

the k’h and previous L-l m-d measurement (or observation) vectors, i.e.,

50

 

 

v1. : repre
Ft
divergenc

maximiza

H

where q,

 

jE[0.L-_

47,105],u

A 11
where 61.1

It;
minimum
j‘UL~1
on“ be SC
(3.91) 0r (
“.111 inch,C

filler matrj

T
A T T T
’l7k =[mk mk—l mk—LH]
. k’h . .
yk . represents the output vector at the iteration

For the derivation of the update laws, we minimize the Kullback-Liebler
divergence with the constraints of an FIR representation as described above. Using the

maximization of entropy approach, the Hamiltonian can be deﬁned [9, 47, 105, 109] as

Hk = —log dethjl—Zlog(q,- (y,,W(z))) (3.81)

where q,- ( yi,W (2)) is the online estimated marginal distribution of yi(k) and
j e [0, L - 1] .

Using the ordinary stochastic gradient update laws (3.35) and (3.36), also see [9,
47, 105], the update law for the sub-matrices W,- is given by

k

6H _ T ,
AW: = -m. _aW. = 77k [1W1- 1) 5,.- -¢(yk)m£_,-1 (3.82)
l

where 51-,- equals 1 only when i= j , and j e[0,L—l].

It should be noted that W j is required here to be nonsingular. For the non-
minirnum phase case, j is appropriately chosen to be an intermediate tap, e.g.
j = L(L -1)/2_1 , in order to render an appropriate double-sided FIR ﬁlter. In any event, it

will be seen below that using the natural gradient, the resulting parameter update laws

(3.91) or (3.92) do not depend on the explicit nonsingularity of the j -th tap matrix and

will include all tap matrices. The Stochastic Gradient (SG) law for the complete MIMO

ﬁlter matrix is given by

51

 

modi'

Riem

When

While

__ L—l _ T L—l ,
[nu/(2)156 = Z AWrz" = 721.11%“) 6,.- — Z e(yr)m,f_,z" (3.83)
i=0 i=0

Using the deﬁnition of the natural gradient in the systems space [9], we can
modify (3.83) to have the update of the demixing ﬁlter parameters according to the

Riemannian contra-variant gradient as

AW(z) = [AW/(2)150 WT(z“)W(z)

T L—l _ __ _
=41. (W?) 6,-1— g.)e(yk)ml..-z“1WT(z“)W(z) (3.84)

T _ __ L-l , _ _
= 41. (W?) aerTe‘UWtz) — 2;.) etyomi..z"WT(z“)W(z>

 

where, the ﬁrst term on the right hand side can be simpliﬁed as

(W1711T 6jiWT(z-1)W(z) =1W1711T (o‘jiWT(z))W(z)

T (3.85)
=(Wj'1) WITH—”(2) = [Tn-«2) = "7(2)
while the second term can be simpliﬁed as
L" r —'-T —1 - If] - T - —'
Z ¢(J’k )mr_.-z 'W (2 >W<z> = «204) Z ([W(Z)]mk—i) W(z>z '
i=0 i=0
L-l r __ _
= (DOME J’k—iW(z)Z_'
:3 T (3.86)
= ¢(yk)Z([WT(2‘l)1yk_r1 z-i
i=0

L—l .
= ¢(y/r)z “Li Z”
'=0

where, we used equality (3.75) and u; is the transpose of the non-causal back-

propagation ﬁlter output, deﬁned as

52

 

Usit

tran

req.
1m;
Upc‘
am.

Pra

w,

I I;

L—l
“k % [WT(2"‘>]yk a DTyk + 2 CM..- (3.87)

i=1
Let yk represent an NL x 1 matrix of outputs at the k’h iteration, which includes

the k’h and the future L-l N-d output vectors, i.e.,

T
A T T T
Xk =[yk Yk+l .Vk+L—l:|
then the back-propagation ﬁlter can also be expressed in the compact form
_ “T
u(k) — W _}_’k (3.88)

Using (3.85) and (3.86) in (3.84), we get the following update laws for the MIMO FIR

transfer Function

L—l ,
417(2) = 221. W12) - Z (PUkW/i—izﬂ (3.89)
i=0

The update law (3.89) is non-causal as computation of u£_,-;i =0, 1, 2,...,L—l
requires up to L —1 future outputs to be available. Practically the update law can be
implemented (or made causal) by introducing a delay of L —1 iterations between the
update of the parameters and the computation of the output at the k’" iteration (this in fact

amount of using a frame-buffering in order to initialize the update law as is used in

practice). Thus, a causal version of (3.89) becomes

L—l ,
[AW(Z)]C = 77k [W(Z) - Z ¢()’k—L+1)ui—L-i+12_' (3-90)
i=0

Writing down the update laws in the time domain for the 1‘” component coefﬁcient matrix

W,- , we have

53

[Ni/11C = 7711””: - (0(yk-L+1)ui—L—i+11 (3°91)
Deﬁning the new time index k = k — L +1, we have

[An-lo=nklm-81y11711u’1i-ill we
where the computation of the network output y(k) and the back-propagation ﬁlter

output u (k) is computed as in the following expressions

L—l
y(k)§y(k—L+l)=ka_L+l = ZCim(k—L+l—i)+Dm(k—L+1) (3.93)

i=1

L-l
u(k)éu(k—L+l)=WTy_k_L+1 -_- ZCiTy(k-L+1+i)+DTy(k-L+l) (3.94)

i=1
Therefore the explicit causal update laws for the state space matrices C and D,

using the time index k are

AC,- = 77,, 1C,- —¢(y(k))u(k—i)T],i=1,2,-~,L—1 (3.95)

AD=nk [D-wtytk'11utif] (3.96)
In the state space regime, the information back-propagation ﬁlter assumes the form

20? -— 1) = AUG?) + C50?) (3.97)

u(k) = 37.10?) + DTy(k) (3.98)

The non-minimum natural gradient algorithms [14, 126, 129, 130, 154-156]
require both forward and backward in time propagation by the adaptive MIMO FIR ﬁlter.

One would require a buffer of 3L-2 input samples to initiate the adaptation according to

the update laws.

54

 

 

 

encon11
conven
anylos

beenig

where
minim
Consui
Proces
Phase

10 5110

3.4.1.,

11 sho
F IR 5.

In tha

Simila;

The non-minimum phase state space BSR update laws, being more general, also
encompass the domain of the minimum phase mixing network case. This can be veriﬁed
conveniently by using deﬁnition (3.87) in the update law for the matrix D. Note, without
any loss of generality, the delay introduced in the update law (3.96) to make it causal has

been ignored for clarity.

no = 77,. :D — (0(y(k))u(k)T]

p

L—l T
= 77k D - e(y(k))[DTy(k) + Z CrTyUc + 0]
i=1

1—

H (3.99)
= 77k D - ¢(y(k))[y(k)TD + 2 WC NYC/1'11

i=1

 

' L—l
= 77k (1 - (/?(y(k))y(k)T ) D - (001(k)) Z y(k + 0" Ci]

_ i=1
where the ﬁrst term on the right hand side of equation (3.99) comprises the instantaneous
minimum phase update of the matrix D (see equation (3.48)), while the second term
constitutes the “non-causal” update term corresponding to the back-propagating
processing required for the double sided FIR demixing ﬁlter in the case of non-minimum

phase environments. Similarly, the update law (3.95) can be expanded using (3.80)-(3.87)

to show the explicit relationship to the minimum phase feedforward update law (3.47).

3.4.1.1.] Remark 3.4:

It should be noted that Equation (3.84) holds for the doubly inﬁnite ﬁlter space. For an
FIR Space domain, it does not satisfy a Lie group in the sense of the formulation in [154].
In that event, the matrix product in (3.84) is deﬁned differently and consequently a

similar, but different, update law results.

55

 

 

 

3.5 '

MATl
all thr

follox‘

Wher

BVCt

y(k)

dep
aux
Pol.
knc
rep
We
eff
131

831

3.5 Simulation Results

MATLAB simulation results for the three linear BSR algorithms are being presented. In

all three cases the environment comprises of a 3 x 3 IIR mixing convolving ﬁlter with the
following model [110, 112, 155].
m—l n-l
Z A,m(k — i) = Z B,s(k — i) + v(k) (3.100)
j=0 i=0
where

h

A,-,B,- - are the co-efﬁcient matrices for the 1‘ tap of the autoregressive and moving

average sections of the ﬁlter respectively, and

v(k) - is the additive gaussian measurement noise.

m, n -— are the lengths of the auto-regressive and moving average lags in the ﬁlter.

The demixing network for each simulation can be initialized in a number of ways,
depending on the available environment information, which may be collected via
auxiliary means [155, 157]. This may include complete, partial, or no knowledge of the
poles of the mixing environment [126, 135, 139]. The primary advantage of incorporating
knowledge of poles in a BSR algorithm is to obtain a more compact demixing network
representation. All the three cases have been veriﬁed via simulation and the algorithms
were able to converge satisfactorily using online update algorithms. However, to make
efﬁcient use of space only typical results are being presented here, see [110, 112, 128-
130, 135, 139, 140]. Unless otherwise speciﬁed, the original sources had sub-gaussian,
gaussian and super-gaussian densities respectively.

The convergence performance of the BSR algorithms, for the linear and
convolutive mixing, is measured using the multi-channel intersymbol interference (MISI)

56

 

benchma

pennutati

depend (
Convergg
Simulatir
0n the b
(3.102) .

Outpms

Where
5413') :

7', repl

benchmark. MISI is a measure of the global transfer function diagonalization and

permutation as achieved by the demixing network and is deﬁned as

N ZZlerl“maxp./' 1012171

 

 

_ 2161”] 1"“pr 1619171
I p 7

 

 

. N
MISIk=Z ’ p . +2 . (3.101)
i=1 mxm 1sz7 j=1 maxp.ileij

G(z) = H (2) * 17(2) - Global Transfer Function,
H(z) = [A9, Be, Ce, De] —Transfer Function of Environment,
H (z) = [A, B, C, D]—Transfer Function of Network.

As earlier discussed, the optimal score function for the derived update laws
depend on the density function of the sources to be separated, which upon successful
convergence of the algorithm is similar to the density of the separated outputs. For
simulations, a hyperbolic score function [126, 127, 130, 136] has been used, which relies
on the batch kurtosis of the output of the demixing system. This score ftmction given by

(3.102) converges to the optimal non-linearity for the demixing system as the network’s

outputs approach stochastic independence.

y-atanhwy) K402) S. -r
(0(y)= y for |K4(y)|<r (3.102)
a tanhwy) 1640') 2 r

where

K4 (y): represents batch kurtosis of the output signals.

y: represents a positive density classiﬁcation threshold.

57

 

 

3.5.1 B

In this sec

(3.100) are

 

 

 

BO:

In t
3 matrix o
scaled to 2
map, the
environ,“
OVeran g

ShOum in

3.5.1 BSR Simulations — Minimum Phase Mixing Environments.

In this section, the BSR results for a minimum phase IIR mixing model of the form

(3.100) are presented, where m = n = 3 and

—

 

1 1 —1 0.5 0.8 —0.7 0.06 0.4 —0.5
A0: 1 —1 1 ,A,= 0.8 0.3 —0.2 ,A2= 0.16 —0.1 —0.4

11 -1 1 —0.1 -0.5 0.4 —0.3 —0.06 0.3

'1 0.6 0.8 0.5 0.5 0.6 .125 0.06 0.2
BO: 0.3 1 0.1 ,B1= -0.3 0.2 —0.3 ,Bz= —0.1 0 0.4

_0.6 —0.8 1 -0.2 -0.43 0.6 0.08 —0.13 0.3

In the transfer function domain, this MIMO transfer function will constitute a 3 x
3 matrix with each element being an order 18 [IR ﬁlter (provided each element has been
scaled to account for all the environment transmission poles). The transmission pole-zero
map, the impulse response of the environment transfer function, the theoretical
environment inverse, the corresponding typical demixing network estimate, and the
overall global transfer function using minimum phase BSR algorithms are respectively

shown in Figure 3.4, 3.5 and 3.6.

58

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Pole-Zero Map of the Mixing Network
0.8 , ”
.’/
0.6 5 o ‘
,’ x “ ‘\
04 [I] O \'.
O ‘
O ‘ .
0.2- f o 1
9 ,1 X
g, 0 M6300 .. o o . (9 {
E 1
— -0.2 ~ g a) .
.I‘ K’ O I 1
f“ 1
_0-4 7 \ V O r/ _
x ,l
-O 6 \ Q o ,/
,/
/
08 ' \\\ /',
_1 ,2 T. . .2 2 . .2. l. . _"UA“"_",,‘;:~__+._‘~' "' ' I: ..-, 1 .-., . , L, .4.,
-1 0.5 0 0.5 1
Real Axis
(a)
Mixing Environment (impulse)
1 0.6 0.5
0.5 1 NO" n 0 -—----——
cv- CF 02 cv-
o —————— 0.5
0 -~--__
'0‘5 0 10 20 '02 o 10 20 '1 0 10 20
0.5 0.5 0.4
1
o | -_-__ , o _____ 0.2
1:. 8‘. 8 0 '
T 05 1': -05 c
' ' -0.2
, __ . “ML - . -__, . L-____s_ _ _
1 0 10 20 1 o 10 20 '0" 0 10 20
0.6 , 1 1 ~
0'4 0.5 0 5
'— N
:9) 0.2 1&0: 0 c8 O
o -_-.___._ 1
'0'2 0 10 20 0'5 0 10 20 ‘0'5 5 10 20
Taps - n Taps - n Taps - n

 

 

 

(b)

Figure 3.4. Minimum Phase Mixing Environment (a) Environment Transmission Zeros

and Poles, (B) Impulse Response of the Environment Model

59

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Theoretical Environment Inverse
0.6 1 0.4
0.4 N 0.5 660.2
33'- 02 3'- 3" o l..-— ------
0 --." ——————
o _-___-_ I' 0.2 r
‘02 o 5 1o 15 0'5 o 5 1o 15 '0" o 5 1o 15
02 0.5 0.4
0.1 I o l._.__ ______ 9,02
" ___..- $3 ._-__-
BN 0 i -05 1 3 o
0.1 ' 0.2
02 o 5 1o 15 '1 o 5 {o 15 ‘0" 5 1o 15
0.2 0.6 1
o Il‘r'h‘--“'- a 0.4 805
g” .02 3 02 I. is o “_________
‘0-4 0 1-—-—------— _-
-o. A .
60 5 1o 15 ‘02 o 5 1o 15 05o 5 1o 15
Tms-n Tans-n Tans-n
(a)
Estirrated Dem'x'ng Network
1 1 1
0.5 0'5 0.5 I
1‘: a 0 ___-__—__.‘. a
g o .L_-_—-——<1; r 3 o .——- ——————
0.5 r
0'5 o 5 1‘0 15 '10 5 1o 15 0'“ A 5 1o 15
0.6 1.5 1
0‘4 81 1 «.05
so: 02 3 0.5 go:
I l o -—_-----——J
0 w——« o r-_-__ _____ J r
02 5 {o 15 0'5 o 5 {o 15 0‘5 ‘ 5 1o 15
0.5 0.6 1
o 'I.--—-—————« 0" 0.5
33 39" 02 l 38
-0.5 0 1'5._-__.__, 0 ...I..-...__ - _
-1 A - A . 5
o 5 1o 15 02 o 5 1o 15 05 o 5 1o 15
Taps-n Tms-n Tans-n

 

(b)

 

 

Figure 3.5. Minimum Phase Mixing Environment, Impulse Response of (a) Theoretical

Environment Inverse, (b) Estimated Demixing Network

60

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Final Globd Transfer Function
1 1 1
- 91 :2
E o E o-~---——-—-—-~« E or—ww -———- a
c) (D O
-1 -1 -1
0 1B 23 30 o 1A0 26 30 0 1o 20 30
1 1 ‘
,5 .5" ,3
5 G “=54 5 0 5 0* i
o o o
1 -1 1 1
0 1o 20 30 o 10 20 30 o 10 20 30
1 1 1
v- ’3 ’8
3f 0 1:1, 0 5 o
0 0 o
1 .1 .1
0 1o 20 30 0 1o 20 30 0 1o 20 30
Taps-n Taps-n Tms-n

 

 

 

Figure 3.6. Minimum Phase Mixing Environment, Final Global Transfer Function

3.5.1.1 Simulation 1: Feedforward Recovery of Minimum Phase Mixing

Using the minimum phase feedforward structure, three different simulation scenarios are
being presented, where the poles of the demixing network have been ﬁxed (a) at the
solution, (b) all set to zero, or (c) set randomly but stable. In all cases, the matrix C is
initialized to have small random elements, the matrix D is initialized to be the identity
matrix, while matrices A and B are correspondingly set to be in Canonical Form 1. The
number of states in the demixing state space network was correspondingly chosen to be
18, 60, and 27. For all three cases, the pole-zero map of the ﬁnal demixing network and
the online MISI convergence characteristics are being presented in Figures 3.7-3.9 after
30,000 online iterations. It is observed that in all the three cases the global transfer
function is diagonalized, although the demixing network have quite different pole-zero
maps, indicating multiple possible solutions to the same problem. A quantitative

comparative summary of recovered results has been summarized in Table 3.1.

61

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Pole-Zero Map of the Final Demixing Network
1 . ' '— — ._ ' I _- ' . ». .7
0 8 "‘
h ‘0 \‘\.
06 . I/ x x ‘ \‘,‘ |
1 ,1” Q- \\ 1
0.4 1 1 7» O U \1‘ .‘
’1 to) x O '\ 1
9 0.2 r J} O x ‘1 i
E? O I x x x O (:1 u x O 0 1 i
g 0 2 1‘ O x I' i
0 4 1 k 90 Q I i
I 8 / i
-0.6 ‘ \ \ x x /// '
. ‘ Q ' !
-05 ,1» 1
-1 . I “ — -:-;"'_:— ”I“ z i l , J
1 0 5 0 0 5 1
Real Axis
(a)
Multichannel Feedforward Blind Source Recovery
3 I I T I f
l
I“!
z 2 '- h J! -‘
,5 Ni, ’1
(Z) 1_ W k , A
WM'lJ.L~\"ﬁ“w"‘H/'WW{N-Mf AV: i/JWNM'WN
00 0.5 1 1.5 2 2.5 3
4
3 I ‘I I T I x 10
*1
N l- .4
2 2 Vii,
2 with
a 1 r- “AWN. a
VW‘WWWWWMM
0O 0.5 1 1.5 2 2.5 3
4 ﬂ , , , x 10‘
3 _.
M
‘1’ V1 l
'3 2 — MM‘VLu l d
W ’ u "‘- 5 I 1 l
- *- 1‘ y' y I, j H .' .. . v -l n ‘ “
1 v « ii *' i1 ”him-W 111 w HilalrwfmwvlwlwW11
00 0.5 1 1.5 2 2.5 3
number of iterations X 10‘

 

 

 

(b)

Figure 3.7. Minimum Phase F eedforward BSR — All poles at the theoretical solution: (a)

Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index

62

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Pole-Zero Map of the Final Demixing Network
1 . » , ,_, - A ~~=s .1, . - g
i O" “\x i
0'8' /, ' C b) “J 0 (S) O \
I ’1 O 8 \_ )0 (1E) ‘ \x» i
0.6 ‘- \‘ ,.i
. / O o x i
, z (13 O . .1
O 4 ' , " ‘.
GD \ 1
' 1" O O O
0,2 , I l,‘
3 ‘ f CD (3) :‘1
a, 0 «0770 ., 0 ~ » Of
‘° ; i l
E 1 , .
§ *1. O 0 ff '
-o.4l \ 0 Q3 ’
I; \ (I) 0 :
. \\ Q Q I. {
-0'6 h \\ O O O O // A
f \\\ C) O O O O I// ’
‘0‘8; \ 00 O "
j \ x. ,./”/ ‘
-1 = _. E, ; A - 7,__ .. ,4 _ . ‘.‘_:_-~_- ' _-.'__f:'.___ M U__ _ﬁ ,___.._.___;
-1 -0.5 0 0.5 1
Real Axis
(a)
Multichannel Feedforward Blind Source Separation
6 I I I I I
‘5 4 ” ‘
£2
5 2~ 1
...,-M_
o I l L J I
o 0.5 1 1.5 2 2 5 3
4
6 I I I I I x 10
a 4 R125 4
2
(Z 2 P WWW
o J 1 1 1 1
o 0.5 1 1.5 2 2 5 3
6 , . r x 10‘
:3 4 ~ 4
3:
«z» 2 ~ 1
0 l 1 l 1 '7 I
o 0.5 1 1.5 2 2.5 3
number of iterations x 10‘

 

 

 

(b)

Figure 3.8. Minimum Phase Feedforward BSR - All poles set to zero: (a) Demixing

Network Transmission Pole-Zero Map, (b) Convergence of MISI Index

63

 

Pole Zero Map of Final Demixing Network

1 Poles
0 Zeros

 

 

  
   

 

 

 

e—‘——-—~

Imag Axis

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Real Axis
(a)
Multichannel Feedforward Blind Source Recovery
1 E r T r f r
‘5
:9: 0.5 ~ -
5 W
0 1 1 WWWme
0 0 5 1 1.5 2 2 5 3
1 , f , . T x 10‘
l.
N
ca
5 0.5» -
E
5A JHMMNN
0O 0.5 1 1.5 2 2.5 3
1 I I I 1' T x 10‘
m \
U)
§ 05» -
Z
~WW~¢WWM
00 0.5 1 1.5 2 2.5 3
number of iterations x 104

Figure 3.9. Minimum Phase Feedforward BSR — All poles initialized arbitrarily:

(a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index

64

3.5.1.2 Simulation 11: Feedback Recovery of Minimum Phase Mixing

For the minimum phase feedback structure, similar three different simulation scenarios
are being presented, where the demixing network poles have been (a) fixed at the
solution, (b) all set to zero, or (c) set to be random but stable. The state space matrices
and the number of states for the feedback network were chosen similar to the feedforward
case. Note that again three different global diagonalizing demixing network pole-zero
maps are obtained. The qualitative simulation result summary is presented in Figure 3.10-

3.12. See Table 3.1 for a quantitative comparison summary.

65

 

 

Pole-Zero Map of the Final Demixing Network
I - r ' r .;V,_.—“" r 7“«?_'.,V ‘ rt ' r ' * '
."" ' \“\_\\‘
0.8- ,
/‘ \
0.6 , x O x x
I” ’1' I”) ‘\
0 4 7’ x X 1
O
0.2 , .2 x x
1’ 1 ‘
2 ~
0, o rooooowuw ,, 00
(U i )r
E i,
— -0.2 -- ‘1 x x {I
l O I;
'0.4 \\\ X X 1",
0,. o .
'06 3" \‘ O I K /
\\ ,/
.\ /
-O.8- \\ f ,
. \ ﬂ/
-1 a, ___. .._ .f___.. ._,7 I_.ﬁ. . .ﬂftv . .-f:;__.21__.. _ . __ . . _
-1 -0.5 0 0.5 1
Real Axis

 

 

 

(a)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Multichannel Feedback Blind Source Recovery
5 I I I I I I T
4 ~ 0 _
'5’ 3wwilii = ‘
.2 . V ' 1; with . -
‘Z’ 2 ‘ ”1M
1 _ 4
WNW 1
oo o 5 1 1 5 2 2 5 3 3 5 4
4 I T I I I I I x 10‘
3 NM: ‘
‘3 5M «LN
‘- 2 W l -
2
0o 0.5 1 1 5 2 2 5 3 3 5 4
4 T I T I I I x ‘
m 3 r -
g 2 ﬂmwllw . .
_. “wt 1k M
w 1)
- 1 b M“ 501%..» d
0 l I 1 l 1 MWAM“
o 0.5 1 1.5 2 2.5 3 3.5 4
number of iterations x 10‘

 

 

 

(b)

Figure 3.10. Minimum Phase Feedback BSR - All poles at the theoretical solution:

(a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index

66

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Pole-Zero Map of the Final Demixing Network
‘\..
O 8 xx xx K IX
0.6 1’ ‘3 x ‘ .
l.’ " i x ‘,
0 4 , x . " x 7.
0.2 1 A y I.»
m ' x ., j
g 0 ; 7 x x 7 { 5
‘° 1
E ‘ J
_ 0‘2 1 x x I
I x
-04 . '~\ )1 x x x /
\ x x x
0 6 x x x x //
\ x X X X X f/
-0 8 *5 xx x “x ’
.\ \ _ y/,
-1 ‘ __. ’
-1 I 1.5 “"0”" “— 3.57"” I 3* If
Real Axis
(a)
Multichannel Feedback Blind Source Recovery
4 I I I I I I I
3 M ..
I; \‘w‘
§ 2 " \‘Knr— d
(2 1 i— M -1
W
o 1 l I L l_ 1 1
0 o 5 1 1.5 2 2 5 3 3 5 4
d
4 I I I I I f I x 10
3 N -
N
In
.5 2i \N/‘M 5
‘Z, 1 >- %W\ "
M.
m
0 1 l 1 l l l l
0 05 1 15 2 25 3 35 4
d
3 I I I I T I I x 10
“*1.
3 2 F MAW-MA ‘
.5. “W.
(Z) 1 ’— NWMWA -‘
Marv-HM.—
0 1 1 l 1 I 1_ 1
o 0.5 1 1.5 2 2.5 3 3.5 4
number of iterations x 10‘

 

 

 

(b)
Figure 3.11. Minimum Phase Feedback BSR — All poles set at zero: (a) Demixing
Network Transmission Pole-Zero Map, (b) Convergence of MISI Index

67

 

 

  
    
 

 

 

 

 

 
 

 

 

 

 

 

 

 

Pole Zero Map of Final Demixing Network
I—\ l
: . Pobs
1, o Zeros
0' O
1 o
g, o~——— ------- 0__{,__-_3_60_§. —————————— ———-i
m .0
g .
4.
6
Real Axis
(a)
Multichannel Feedback Blind Source Recovery
3 . .

 

I I I I I
2 m d

 

lSl fors1

 

 

 

 

 

 

oo 05 1 15 2 25 3 35 4
4 T I ﬁi I I T X1o‘
N 3 d
(I)
§ 2* 1
E7)
_1__ 1
W
0 1 l l L L 1 l
o 05 1 15 2 25 3 35 4
3 ,4. T .11w

 

ISI for 33
N

 

 

 

 

 

 

WWM_NW
0O 0.5 1 1 .5 2 2.5 3 3.5 4
number of iterations x 10‘
0»

Figure 3.12. Minimum Phase Feedback BSR — All poles initialized arbitrarily:

(a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index

68

3.5.1.3 Simulation III: Feedforward Recovery of Non-minimum Phase Mixing

For the case of non-minimum phase, we again choose m = n = 3 for the HR mixing

environment model (3.100). In this case, the coefﬁcient matrices are given by

P

 

1 1 —1 0.5 0.8 -0.7 0.06 0.4 —o.5
A0=l —1 1 ,A1= 0.8 0.3 —o.2 ,A2= 0.16 -o.1 —o.4
_1 —1 1 —0.1 —0.5 0.4 -0.3 —0.06 0.3
'1 0.6 0.8 0.5 0.7 0.16 .425 0.3 0.7
BO: 0.5 1 0.7 ,Bl= 0.7 0.2 —o.3 ,Bz= —o.1 o —o.4
b0.6 0.8 1 —o.2 0.53 0.6 0.08 -o.13 0.3

The theoretical inverse of this environment model will be an unstable IIR ﬁlter with
a minimum of 20 states. This instability stems from having two poles of the intended
demixing network to be outside the unit circle. However, the demixing network is setup
to be comprised of a doubly-ﬁnite 3 x 3 FIR demixing ﬁlter with 31 taps per ﬁlter (i.e., a
total of 90 states) and supplied with mixtures of multiple source! distributions. The
polynomial matrix W =[D C] is initialized to be full rank, while A and B are in
Canonical form I. Instantaneous update results aﬁer 40,000 iterations are presented in
Figures 3.13-3.16. A close comparison of the theoretical environment inverse and the
adaptively estimated inverse will reveal that the theoretical results form a part of the

much larger (in taps), estimated inverse.

69

 

Pole Zero map of the Mixing Environment

r

 

 

x Poms
o Zeros

 

 

 

1
I
1
1
I.
1
1
III, I
1
1
l
1
|
I
l

Imag Axus
O
I
l
l
I
I
I
I
l
l
l

, L.
l
I
o
l
t
o
I
l
l

__,__;-"

 

 

l
I
I
l
l
l
I
l
l
l
l
I
l
I

 

0
Real Axis

 

 

 

(a)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Mixing Environment (impulse
1 0.6 0.5
0.4 I
_o.5 N n o l--------
g I g 0.2 g
o — I----—---- o _ ___ 41.5
. , -1 '
'05 o 5 1o 15 '02 o 5 1o 15 o 5 1o 15
0.5 0.5 1
'_ E N o ---———--—— ”0'5
a“ 0 ...,—__ a“ Cu
-0.5 0 PM---
'0‘5 o 5 1o 15 ’1 o 5 1o 15 ‘0'5 o 5 1o 15
0.5 0.5 1
-04 o I.-—— ------ 05 I
:302 :3 :8
__ 41.5 1 o .--—--—--- 1
o I —--———
02 o 5 1o 15 4 o 5 1o 15 '05 o 5 1o 15
Taps-n Taps-n Taps-n

 

 

 

(b)

Figure 3.13. Non-minimum Phase Mixing Environment (a) Environment Transmission

Zeros and Poles, (b) Impulse Response of the Environment Model

70

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Mixing Environment (impulse)
1 1 0.5
0.5 0.5 o I I-u---
:: s: «:3 I I
I a c
o - '- ———————— o .—---— ————— -o.5
-o.5 . -
o 5 1o 15 '05 o 5 1o 15 1 o 5 1o 15
0.5 0.5 1
‘_ I o ..--—-——--- 1 0.5 1
{N 0 '.-_—_—_ it“ £8
I -0.5 1 0 .Fw—-—— J
-o.5. e -1 - -o.5 A -
o 5 1o 15 o 5 1o 15 o 5 1o 15
1 0.5 1
“0.5 o l.---—----- 1 0.5
a” L- ag «:8
o I --—--- 1 43.5 1 0 .---~--——- 1
'0'5 o 5 1? 15 '1 o 5 1‘0 15 '0'5 (T 5 1o 15
Taps-n Taps-n Taps-n
(a)
Estimated DeniﬁngNetmork
1 0.4 0.4
0.5 02 .502
v- N v-
3“ o ,3 ° 3 °
-02 «02
-o.5 - - .4 1 : - -o.4 - A
o 20 4o 60 '0 o 20 4o 60 o 20 40 60
0.5 2 1
.— 0 N 1 1 0.5
N N M
g N
.05 g o 3 o
-1 -1 - 0.5 - -
o 20 4o 60 o 20 4o 60 o 20 4o 60
0.4 1 1
02 0.5
5 o , g 0 , 80.5
3 3 3 0.
.02 -0.5
0,4 -1 0.5
0 20 4o 60 o 20 4o 60 o 20 40 60
Taps-n Tws-n Taps-n
(b)

 

 

Figure 3.14. Non-minimum Phase Mixing Environment, Impulse Response of

(a) Theoretical environment Inverse, (b) Estimated Demixing Network.

71

 

Final Globd Transfer Function

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1 1 1
v- N 2
3 0 T ' ‘ E 0 * ‘W E 0 " r
o 0 o 1
-1 -1 1 -1
o 20 4o 60 80 o 20 47) 50 so 0 20 4o 60 80
1 1 1
,— to
§ 0 A a v: ‘ A..— § 0 : “1“ YA- It]? 0 -:,.-.:-.:-
o o 0
-1 .1 -1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6(2):“
l

 

 

 

 

 

 

-1 1 -1 -1 1

 

 

 

 

 

 

 

 

 

 

 

T335 - n Tans - n Taps -n
(a)
Pole-Zero Map ofthe Final Demixing Network
"F ' ’ ‘-’—‘ f '_ T Xi” 7; ' ‘m' ‘ ' ' “
//<.1506) C530 \\
0.8 ,/ 00 @Q2 \\
‘1'] d5) b \\
/' O O ‘\
0.6 * l.’ O O Cb \'\\
' o 0 <2;
O 4 (9 Q ".\
0 CU
02; f 8 O ‘1
9 1 i O ;
x 1 O O O ;
E. 0 03 - - - — - - ”e
E l. O 1
-O.2 ~ 1. 9) o f
0 4 K‘ O ((3)0
\ 1'
-0 6 \ o o 00 x
O O ,/
-0 8 \ CQ) 00 {5 C593,, '/
“ CO 0% Q: o .1/
_1 _ 7 - 1 , §"_\—.a_‘———- _._ ,
-1 -0 5 O 0 5 1
Real Axis
(b)

Figure 3.15. Non-minimum Phase Mixing Environment, (a) Final Global Transfer

Function, (b) Demixing Network Transmission Zeros and Poles.

72

 

 

 

 

Multichannel Feedback Blind Source Recovery

 

 

 

 

 

 

 

 

 

 

6 F f r r r
,A“ ‘
21H”
'5 4 l‘ “4“" —«
5 “1..
— 2 1. wag,
(L) M‘wNWwV‘M m’ " WfV‘MI-vv~~’ ”\W
o 1 1 L L 1 1 1 1 1
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
I
8 f ﬂ x 10
6 ﬂ” WK 5
a: ' ‘1
E 4 L— Rx '1
(7) “M.
0 1 1 1 1 1 1 1 L g
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
x10‘
6 [1033K r —r ﬂ T
4 - K 4
‘5: “x
.5 M
('72 2 p M
0 1 L 1 1 1 1 1 I 4;
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
number of iterations x 10‘

 

 

Figure 3.16. Non-minimum Phase Mixing Environment, Convergence of MISI

Performance Index.

A quantitative performance comparison of the blindly recovered communication signals

using the presented BSR algorithms is provided in Table 3.1.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Minimum Phase Feedforward Minimum Phase Feedback Non-
Demixing Demixing minimum
Poles Poles Random, Poles Poles Random, D5331:
known ﬁxed at stable known ﬁxed at stable 3
zero ﬁxed zero ﬁxed
poles poles
AM -22.55 -21.77 -26.02 -27.67 -l6.47 -23.42 -17.67
Signal
151 PSK -26.66 -l8.48 -23.52 -23.52 -l8.85 -24.85 -11.43
Signal
(1“ dB) White -2109 -21.85 -2157 ~26.03 -18.36 -23.96 -11.39
Noise

 

Table 3.1. Quantitative Signal Recovery Comparison of the BSR algorithms

73

 

3.6 Remarks

This chapter presents a generalized Blind Source Recovery (BSR) framework based on
the theory of optimization under constraints of a state space representation with the
Kullback-Lieblar divergence as the performance functional. The framework is then
specialized to the state space blind source recovery for both static and dynamic
environments, which may possess either a minimum phase or a non-minimum phase
structure. For the case of minimum phase environments, two possible state-space
demixing networks in feedforward and feedback conﬁguration are proposed. For non-
minimum phase (convolutive) mixing, to avoid any instability, the demixing network is
set as a non-causal all-zero feedforward state space network. The update laws for all the
cases have been derived using the natural (or the Riemannian contra-variant) gradient.
The algorithms have been extensively simulated for both synthetic and physical
data. The simulation results for all the cases have been provided using mixtures of
multiple source densities, including one gaussian source. The algorithms exhibit success
in recovery of sources in all cases. Furthermore, for the case of minimum phase
environments, various possible initial conditions have been explored. A comparative
table for a set of communication signals has also been provided as a guideline to the

achievable separation/recovery results using the proposed online adaptive algorithms.

74

Chapter 4

BLIND SOURCE RECOVERY USING

ADAPTIVE SCORE FUNCTIONS

The chapter discusses Blind Source Recovery (BSR) of minimum phase and non-
minimum phase mixtures of multiple source distributions using adaptive score functions.
For the linear static and the dynamic convolutive mixtures, the ability to adaptively
determine discerning nonlinear score functions allows for recovery of original sources
independent from (and even in the absence of) precise environment identifiability.

The performance of BSR algorithms strongly depends on the choice of an
appropriate score function which appears as an element wise acting non-linearity on the
output signals [1, 3, 14, 22, 24, 29, 30]. For a particular problem, the optimal score
function depends on the distribution of the original source signals which are unknown in
a blind scenario. In such cases, unless some assumptions about the distribution of the
sources are made, any BSR algorithm will potentially be unable to deliver the desired
performance. Therefore, adaptive estimation of appropriate score functions is very

attractive ﬁom a practical implementation viewpoint. The proposed score functions have

75

been applied to BSR for both the cases of minimum phase and non-minimum phase
mixing environment models [4, 10, 11, 16]. Several closed form entropy expansion based
deterministic expressions for adaptive score functions have been earlier proposed, but
they are typically suited for a limited range of distributions and converge to the true
estimating score function for static mixtures only [2, 5, 6, 21].

Most of the mixtures encountered in practical BSR problems are from sources
with a variety of non-gaussian distributions, e.g., speech and acoustics, communication
signals, intemet data, interstellar signals, bio-medical signals etc. On the other hand, most
noise phenomena or unidentiﬁed sources are assumed to possess gaussian distributions.
This results in practical situations, where one has to cope with multiple source
distribution mixtures including gaussian distributions. This chapter describes the use of
the proposed adaptive score functions for the linear convolutive class of BSR.

Two adaptive score functions are presented in this chapter. The ﬁrst score
function is derived from the generalized gaussian distribution model [7, 12, 26]. The
second score function is based on proposed new hyperbolic source models, which extend
current Pearson and Hyper-Cauchy distribution models [12]. The proposed score
ﬁmctions can be tuned online by certain control parameters. An adaptive algorithm to
estimate these control parameters for the proposed tuning of score functions using the
batch kurtosis of BSR output is also presented. The primary advantage of the proposed
online parameter estimation is that it renders the adaptive estimation of the demixing
network to be completely blind. No a priori information about the distribution structure of

the original sources is required. Simulation examples verifying the proposed BSR

76

formulation (inclusive of the proposed score function estimation technique) are included

as an illustration of the proposed framework.

4.1 Generalized Gaussian Distribution Model

The generalized gaussian distribution class was proposed by Miller et. al. [17] and used
in the detection theory as a model for non-gaussian noise. This class has a symmetric
unimodal density characteristic. Various classes of densities are generated by a
parameterization of the exponential decay of this density function. The Generalized

Gaussian Family has the following density relation

 

f(xi,a,-,a,-) = “iﬂi(°'i’ai) exp{-(ﬁ,(a,-,a,-)|x,.|)ai} (4.1)

211%.)

where

ai >0, is a positive parameter that controls the distribution’s exponential rate of decay

03-2 is the variance of the distribution

F(.) is the gamma function, given by
(I)
run = Ira-16"Tdr (4.2)
0

,Bi(0',-,a,-) is a scaling function of a,- and a,- , deﬁned as

1 151%
11%.)

This generalized gaussian family can encompass a wide variety of distributions,

 

161(0'13611)é 0i— (4.3)

e.g., for a,- = 1, the model reduces to the laplacian probability density function

77

meta.) =I‘L‘2"ﬂexp{—(ﬂ.(m,l)lxil)} (4.4)

For 01,- = 2, the end results is the gaussian probability density function

i i,1 2
f(xi1210'i) =%JCXP{—(ﬂi(0112)lxil) } (4-5)

 

 

Some members of Generalized Gaussian Family, 02:1
3 . . . , f
- - - Gamma. “1:05

___— Laplacian.a,=1
__ Gaussian.al=2 A
_ Uniform.al=oo

 

2.55

 

 

 

l---—_____

1.51- " .1

f(x, 0, 1)

, \

0.5— / ‘ ~
/’ // \\‘

— /,x’ \ \_,

/
vii-H.” ‘

o —— M'ﬁh' ' — ’1 1
-3 -2 -1 0 1 2 3

 

 

 

 

 

 

 

 

 

Figure 4.1. Some members of Generalized Gaussian Distribution Family

As a,- tends to inﬁnity, the distribution approaches the uniform density function

f(xi,w,ai):m;i<x. (i (4.6)

_ , _
7- ﬂi .31
While as a,- tends to zero, the distribution converges to an impulse probability function
As a function of a,- and for 03:1, some members of the family are presented in

Figure 4.1.
78

Based on the generalized gaussian density model in (4.1), the kth absolute moment

can be characterized as

Eiixiik} =ﬂi(0'i1ai)kr[ (4-7)

 

4.1.1 Generalized Kurtosis Measures

The generalized kurtosis is a statistical measure to describe the “tail weight” for a
distribution, it does so by estimating the deformation of the distribution from normality.
Kurtotic measures have been extensively used in signal processing for classiﬁcation of
data distributions, and as a distance measure from gaussianity. In order to avoid any
ambiguity, it is important to clarify that there are two types of kurtosis measures.

Pearson Kurtosis [19] is based on the mean statistics [12, 19] and in its

generalized form it is deﬁned as

KP(g,x,-)=E{|xilzg} 1 (4.8)

2
E {'11 Fl
Fisher Kurtosis [12, 13] is based on the k-statistics [13] and is correspondingly

deﬁned to be the ratio of cumulants, i.e.,

KF . : K21; (xi) 4.9
(Sax!) ﬂea”) ( )

Typically the above-mentioned kurtosis measures are used for g = 2 , referred to

as the standard kurtosis. For this standard case, the above two kurtosis measures are

related by

79

Kf(x,)=Kf(x,.)—3 (4.10)
where K it (x,) = K i(2,117,) represents the computed kurtosis for the standard case of 4' = 2

The primary advantage in using the Fisher Kurtosis is that a perfect gaussian
source will have zero ﬁsher kurtosis. A leptokurtic distribution (Fisher kurtosis > 0)
represents the super-gaussian family of distributions, while a platykurtic (Fisher kurtosis
< 0) distribution represents the family of sub-gaussian distributions [12, 13].

For the generalized gaussian family, the generalized Pearson kurtosis can be

simpliﬁed to

riml‘lii
KP(;,x,-)= a‘ a’ (4.11)

 

Therefore, the standard Fisher kurtosis can be written as a function of the shaping

parameter a,- only, i.e,

Kf(,i,___r(%,-)F(%.)_,
PM)

Note that for the parameter ai = 2 , the distribution reduces to the true gaussian

 

(4.12)

with zero kurtosis. The complete kurtosis characteristics of the generalized gaussian
family are given in Figure 4.2. As generalized gaussian family can only represent

unimodal distributions. The Fisher kurtosis approaches approximately —1.2 as a,-

approaches inﬁnity (i.e., the Fisher Kurtosis for a ﬂat or uniform distribution).

80

 

 

 

 

 

 

 

Fisher Kurtosis Characteristics of Generalized Gaussian Family
6 I I I T r Y7 T
i
5 i— i -1
‘1
4 ’ 1.
3. l
‘5
85 2 ~
ukv
1 r-
\
o \ -
1 IT‘““ We
2 L I l
o 1 2 3 4 5 6 7 8
81

 

Figure 4.2. Fisher kurtosis Characteristics for the Generalized Gaussian Family

4.1.2 Relationship between Bi (oi ,ai ) and ai

Imposing the statistical constraint of distribution integral on the generalized Gaussian

distribution family, i.e.,
(I)
If(xiaai’ai)dxi =1 (4.13)
_w

where, the relation between 6,-(0', ,a,) and a,- can be estimated to be
510711011 Ya" = aiE{|xilai} (4.14)

This relationship is helpful in the determining the ﬁnal form of the generalized

gaussian score function.

81

4.1.3 Derivation of Generalized Gaussian Adaptive Score Function

(GGASF)

Due to the very parameterized nature of the generalized gaussian family and its ability to
model a variety of unimodal source densities of interest to blind source recovery, a
parametric score function (nonlinearity) can be derived based on this family. This
comprehensive score function inherits a nice parametric structure from its parameterized

parent density. In an adaptive setting, the regulation parameter a,- can be adapted during

the adaptation for the source recovery algorithm.
The element-wise nonlinear score function required for the BSR problem has the

deﬁnition [1, 3, 22, 24, 29, 30]

¢<y>=-91’%f5¥l=[n(yi) 42622) 14.6.)? (4.15)

For the 1"" output with an exponential density function, the equation can be

simpliﬁed to
5P1()’%y
(0i(}’i) = ———-—' (4.16)
P101)
where

y,- : represents the i’h output of the network
p,- (y,-) : represents the statistical probability density of the i’h output
Using the generalized gaussian family as the candidate density in relation (4.16)

for the score function

82

(0101): '-

_6_ aiﬂi(0'iaai)

...,. 2M.)

“1131(011‘11)

 

211%.)

CXP{—(ﬂi(01aai)|xil)ai}

by simple calculus, the relation reduces to

(0101-) = «xi/31101.04)” 181

a;

—16

a—ylflyil

exp{_(ﬂi(ai1ai)|xi|)ai}

(4.17)

(4.18)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1
0.5 * 1
ID. ‘-
0 .
1| 0 11
d o
-0.5»
_1 . 1
-1 -0.5 0 0.5 1
1 -
0.5
N on
11 0+ 11
6 CS
-0.5
-1 . . 1
-1 -0.5 0 0.5 1
1 -
0.5 > 1
v to
11 0’ 11
:5 cs
-O.5>
-1 . . 1
-1 -0.5 0 0.5 1

Generalized Gaussian Score Function scaled to a Unit Square

 

1
0.5»
0.
-o.5+

 

_1 L. n
-1 -0.5

1

Y

1

0.5

 

 

0.5»
0+
-0.5’

 

-1

 

 

-1 -o‘.5

0:5

 

1
0.51
O»
41.51/

 

 

 

-1 -0.5

0.5

 

 

Figure 4.3. Scaled Generalized Gaussian Score Function for different values

of parameter a

83

Using (4.14) and the deﬁnition |yi| = yisign(y,-) , the score function is given by

la"

1 .
_ lyi SlgnO’i)

" 5W}

where an appropriate choice of ai makes the proposed score function (or nonlinearity)

 

(xi-(yr) (4.19)

suitable to a particular source distribution. This characterization of the source distribution
can be done adaptively using the Fisher kurtosis estimate for the batch-output of the BSR
network. Figure 4.3 shows some members of the generalized gaussian score function

family as a function of 51,-.

4.2 Hyperbolic Source Density Models

In this section, first new probability density models are proposed and their relationship to
existing well-known density models is established. Consequently, a score function for
each of the proposed density models is derived. A combination of these two score
functions results in the proposed Hyperbolic adaptive score function [30]. Interestingly
the proposed score function is quite similar to our earlier proposed adaptive score

function based on heuristic justiﬁcations, see Waheed, et al. [28].

4.2.1 Background

4.2.1.1 Pearson Distribution Model

A simple mixture of two gaussian densities gives the Pearson density model [18, 19], i.e.,
f(xiﬂi) = (1 - ai)fl(xi) + aif2(xi) (4-20)

where

84

— .— o 2
1 exp (x1 1“!)

2
27:0",-2 201'

 

fi(xi) =

represents a gaussian pdf.

For the symmetric case of the Pearson model, i.e.. ai=12, |,u1|=|,u2|=|,u| and

0'12 = 0% = 0'; the Pearson model possesses two distinct modes for ,u > 0. However, for

p = O the density reduces to the regular gaussian density. This model can effectively

model gaussian and sub-gaussian densities including bimodal (binary) densities. For the

symmetric Pearson density, the model can be expressed as

 

 

 

 

2 2
1 . x- -x-
f (xi) = exp ——’u’—2- exp -—'2 cosh _y, 2' (4.21)
0i V 2” 201' 20',‘ 0"
Fisher Kurtosis for Pearson Distribution, 62:1
0 a . , . , , 1
-0.4[ 1
\\ .
-0.8 r \ —
3 1 k .1
“2‘!
-1.2 - J
-1.4 ~ -
-1.6 ~ ‘ _
-1 .8 ~ _
-2 4 1 1 4 1 1 1 “—‘
o 1 2 3 4 5 6 7 8

 

 

 

 

l p

 

Figure 4.4. Fisher Kurtosis Characteristics for the symmetric Pearson Model as

a function of ,u

85

The Fisher kurtosis [12, 13] for the symmetric Pearson distribution model is given by

4
K:(#,02)=__‘_2£__

2 (4.22)
#2 +02)

It is evident that for ,u > O , the kurtosis of the distribution is strictly negative; see

Figure 4.4, making it a suitable model for sub-gaussian to gaussian distributions only.

4.2.1.2 Generalized Cauchy Distribution

The symmetric super-gaussian density models are best represented by the Generalized

Cauchy density. The generalized standard Cauchy density is given by [12, 18]

k-
f (xi)=——’—2—m—, (4.23)
(1 +x ) 1
where
mi: represents the order of the Cauchy model (i.e., determines the kurtosis of the

distribution)

ki : represents a scaling parameter to satisfy the constraints of a pdf, e.g., for m,- =1,

k,-=%[.

The Cauchy distribution is a symmetric distribution with heavy tails and a single
peak at the center of the distribution. Due to the heavier tail, the kurtosis is much larger

than that for a normal distribution. In short, the kurtosis for a generalized Cauchy model

(mi > 0) is strictly positive and has an inﬁnite range.

86

4.2.2 Proposed Hyperbolic Distribution Models

4.2.2.1 Hyperbolic Sub-Gaussian Model

An alternate model is proposed based on the simpliﬁed Pearson model in (4.21). Not only
can this model represent all the kurtosis range of the original Pearson model; but also due
to an increased degree of freedom in the hyperbolic component, it can model additional
range of bimodal densities, which the original Pearson model cannot. The proposed

model is given by

fsubo.)=N(y..a?)cosh(4<xi44))" (4.24)
where

1,,k : represent positive shaping parameters

N( #13012) : represents a gaussian pdf [32] with mean p,- and variance 03-2. p,- is the mean
for the symmetric distribution. Without loss of any generality, assume p,- = 0.

Just like the Pearson Model, this model is capable of representing density models
from the gaussian down to the sub-gaussian bimodal densities. The family of density

curves generated by this probability model for A,- = 1 is shown in Figure 4.5.a. For k = 0,

this model represents the gaussian density with a Fisher kurtosis equal to zero.

87

 

 

0.4

0.35

0.3

0.25

0.2

0.15

0.1

0.05

Sub-Gaussian Distribution Family: N(O,1)'cosh(x)k

   

 

XXXX
II II II II
(ADM-‘0

 

 

 

 

 

 

(a)

 

 

Normalized Kurtosis for the Sub-Gaussian Distribution Model

 

I I I T I

 

 

l \‘k

0.5 1 1.5
k

)—

 

2.5

 

Figure 4.5. (a)

(b)

Proposed Symmetric Sub-gaussian Density Model, (b) Fisher Kurtosis

range of the proposed model; forO S k S 3 and xi,- = l .

88

 

 

The Fisher kurtosis [12, 18, 32] of this proposed model for the range 0 S k S 3 and

A,- =1 is presented in Figure 4.5.b. Note that unlike the original Pearson model, this new

representation is capable of representing the bimodal distributions with a Fisher kurtosis
< -2. In fact, the model can model bimodal distributions with a Fisher kurtosis value of

down to -3.

4.2.2.2 Hyperbolic Super-Gaussian Model

Similar to the above approach, the product of a gaussian and a hyperbolic function can be

used to approximate the generalized Cauchy density model, i.e.,

" k
fsup(xi) = N(0,a.-2)sech(/1,-x,-) (4.25)
where
’11 ,k : represent positive shaping parameters

The proposed model can effectively model all possible generalized standard
Cauchy densities. In addition, for k = O, the proposed model reduces to the gaussian

density. Figure 4.6.a presents some members of this family for ,1,- = land 0 S k S 3.

As expected the normalized kurtosis for this family can range from zero to

inﬁnitely large positive values. For the range 0 s k S 3and ,1?- =1 , the Fisher kurtosis for

the family is shown in Figure 4.6.b.

89

 

Super-Gaussian Distribution Family: N(0,1)"sech(x)k

 

0.81“
I\ —— k:

H —-'k=1

0.7F [I ‘1
0.6 ~ ’/ \

l l
N
/

-——-k=

--- k=3

 

 

 

05*

OA~

0.2L

OJr

 

l-

 

 

 

 

(a)

 

Normalized Kurtosis for the Super-Gaussian Distribution Model
400 I I I I I

 

350r 1
300r ' ~
250r ~

200F ~

K4(k,x)

150r n

100* -

50v ~

 

 

 

0 05 1 15 2 25 3

 

 

 

(b)

Figure 4.6. (a) Proposed Symmetric Super-gaussian Density Model, (b) Fisher Kurtosis

range of the proposed model; for 0 S k S 3; xi, =1.

90

4.2.3 Derivation of Hyperbolic Adaptive Score Function (HASF)

The two proposed density models can represent a wide range of sources which are of
particular interest to the source separation models. This representation covers the
complete range from the binary distributed sources in communications to the very sparse
distributions obtained during stellar and biological measurements. The score functions
are initially derived for each individual proposed density model. Later both of these score
functions are merged to formulate the proposed adaptive hyperbolic score function.

For the proposed sub-gaussian distribution model (4.24), the BSR score function

can be computed as

apiU’y
«no» = ————a¥i = lg—A-ki m(liyi) (4.26)
pi(yi) 0"

The sub-gaussian family of score functions generated in the range 0 S k S 3 , 0,2 =1 and
31‘ =1 is shown in Figure 4.7.a.

While for the proposed super-gaussian distribution model (4.25), the BSR score

function is given by

5P10’V
mm = -——9yi = Jig—+4114- tanh(4.y.-) (4.27)
pi(yi) 0',’

The super-gaussian family of score functions generated in the range 0 S k S 3, 0,2 =1

and 21.,- = 1 is shown in Figure 4.7.b.

91

 

 

 

 

 

 

 

 

 

 

Sub-Gaussian Score Functions: x-kxtanh(x)
51-
—— k=0
4* — k=1
—— k=2 / /
3- k=3 ,
/ /
2r , / / x
11- ’,””N‘\\ , / / x”
,’ IA ‘ / 2”
0” rl’ / _,_.\ .. / I /’/’
’1’ z / ’ \\\‘_I / l,’
-1- ,”, // \\\ ”/’
1”, / / “d
.2: / ’ /
.3/ / ’
.4/’

-5 l 1
-5 o 5
x
(a)

 

Super-Gaussian Score Functions: x+kxtanh(x)

 

 

 

 

 

 

 

 

 

 

(b)

Figure 4.7. Score Functions based on the proposed: (a) hyperbolic sub-gaussian model,

(b) hyperbolic super-gaussian model; for O S k S 3; xii =1.

92

Comparing (4.26) and (4.27), it is evident that both score functions have a similar
structure but differ by the sign of the hyperbolic tangent term only. The appropriate sign
of this term depends on whether the BSR output distribution is sub-gaussian or super-
gaussian. This hyperbolic term, however, should disappear for the gaussian constituents
in a BSR problem. A good adaptive estimate of the Fisher (or normalized) kurtosis of the
BSR network output can provide us with this discrimination. In an adaptive sense, the
switch between these different possible score functions can be made based on the
estimate of output kurtosis. Both fore-mentioned score functions can be thus combined

into an adaptive hyperbolic score function given as:

(0101') = Uiyi + S i19"(K : (Yi ))ai tanh (ﬂiYi) (4-23)
where
u,- : is a scaling parameter for the linear term. 0, is inversely proportional to the variance

of the reference gaussian (or the resulting non-gaussian) distribution. Hence, for the

distributions with large variance, this term will approach zero.

ai,,6,-: are two scaling parameters of the hyperbolic term which control the magnitude
and the slope of the score function near the origin. or, is directly proportional to the

shaping parameter k in the proposed distribution models. For the gaussian model, k = 0

and therefore a,- = Oand the hyperbolic term should vanish. Some possible score

functions are presented in Figure 4.8.

93

 

Some possible Hyperbolic Adaptive Score Functions
. 5 -

 

 

~10

 

 

 

 

 

 

 

 

II... 1 17., 1%
Q 5 ---~/""/‘ Q . ///
T... 0. / W... o J/
./
-10 4 -5 .
-5 0 5 -5 0 5
N. 10

\
. 1
b

5 3.:
5 4:
K

\

 

 

 

 

I
('11 o
=1a=
I
1b 0

 

 

 

 

 

 

 

 

 

 

3" _10 ’ 1 S“ _4 1
-5 0 5 -5 0 5
-~ ll”.
Q. 0.5 . Q.
N 6
ll _. 0 > II ~ 0
U G
6 -O.5 i N:
g_ g...
-1 -5
-5 0 5 -5 0 5
y, yr

 

 

 

Figure 4.8. Proposed Adaptive Hyperbolic Score Function: some possible combinations.

4.3 Adaptive Estimation of the Fisher (or normalized)
Kurtosis

In order to perform an efﬁcient adaptive estimate of the Fisher (or the normalized)
kurtosis [12, 32] of the BSR output, a batch update algorithm is employed. Practically,
one efﬁcient simple formulation to adaptively estimate the output Fisher kurtosis is using

a forgetting factor as follows:

K.(n+1)= y.K.-(n>+(1—r,-)Kf(y,-(n» (4.29)
where

94

K ,(n) : represents the adaptively determined Fisher kurtosis for the 1"” output at the n’h
iteration

7,: represents the forgetting factor for adaptive kurtosis determination of the i’h output;

K I ( yi(n)) : represents the normalized or Fisher kurtosis estimate for the rim batch of the

i’h output, which is computed as

_ _. 4
Kf(y.-(n» = M- 3 (4.30)
0i

where

E: represents a batch of length N for the 1"” BSR network output

12,-,6} : represent the mean and the variance of the batch of the 1"” BSR network output,

respectively.

4.4 Parameter Regulation for the Proposed Adaptive Score
Functions

Using the estimated output Fischer kurtosis (4.29), discrimination can be done between
the three possible cases of gaussian, sub-gaussian or super-gaussian distributions. This is
done by appropriate selection of the score function shaping parameters for both the
proposed adaptive score ﬁmctions. This can be effectively done for the case of source
recovery from linear convolutive mixtures, where only a discriminating score function is
adequate to separate sources with different distributions [8, 15, 26, 28, 30, 31, 33]. On the

other hand, for nonlinear BSR, a good match between the true source distribution and the

95

chosen score function is required for estimation of the correct separating network [7, 9,

24].

4.4.1 Choice of parameters for Generalized Gaussian Adaptive Score

Function

In order to employ the proposed adaptive generalized gaussian score function for the

linear convolutive BSR, one needs to primarily focus on only three cases of

parameter a,- such that a classiﬁcation between super-gaussian, sub- gaussian and
gaussian distribution families can be made. Practically, a choice of a, for the

computation of the score function, see (4.19), is made using the adapted value of kurtosis;

see (4.29), by the following simple scheme.

0 if K,(.) _>_ V , 1 S 0:,- <1.5 ; where l]! > O is a constant threshold (for super-gaussian

source distributions)

0 elseif Ki(.)S—§ , 611-23; where 5 >0 a constant threshold (for sub-gaussian

source distributions)

0 else a1: 2 (for gaussian distribution)
Typical values chosen for or, in the super-gaussian case is l, and for the sub-

gaussian case is 3.

4.4.2 Choice of parameters for Hyperbolic Adaptive Score Function

In order to employ the proposed hyperbolic adaptive score function for the linear
convolutive BSR, the estimated normalized batch kurtosis of the BSR network output is

used to discriminate between the three possible cases of gaussian, sub-gaussian or super-

96

gaussian distributions. This is done by appropriate selection of the score function shaping

parameters 1),- , a,- and ,Bi. Outlined below is an algorithm to choose the salient feature of

the proposed score function
0 if Ki(.)2u/, 0S1),- S 1; 61,21 and ,8,- 21 where u/ >0 is a constant threshold
(for super-gaussian source distributions)
- elseif K,(.) S —§ , u,- 21 ; a,- 21 and ,6, 21 where 6 > 0 a constant threshold (for
sub-gaussian source distributions)

0 else a; = 0 (for gaussian distribution)
Note that for the case of sub-gaussian and super-gaussian distributions, er, and ﬂ,-

can be ﬁne tuned to match the dynamic range and also further improve the signal to noise

ratio (SNR) and intersymbol interference (ISI) of the recovered BSR sources.

4.5 Simulation Results

Simulation results using a 3 x 3 HR mixing environment are being presented. The
demixing system is formulated as a state-space network [20, 22, 29, 33]. For both
simulations involving feedforward and feedback networks respectively, the structure for
matrices A and B is kept ﬁxed [23], whereas elements of matrices C and D are adapted.
For an in depth discussion on the BSR algorithms see [22, 23, 25, 27-29]. In all presented
simulations, the source signals are chosen to possess different distributions, i.e., super-
gaussian, gaussian and sub-gaussian respectively; the score function is adapted online
along with the BSR algorithm.

The convergence performance of the algorithm is measured using the multi-

channel intersymbol interference (MISI) benchmark. MISI is a measure of the global

97

transfer function diagonalization and permutation as achieved by the demixing network,

see Appendix D for details.

4.5.1 Environment Model

This environment model is assumed to be a 3 x 3 HR ﬁlter

m—l n-l
Z A,m(k — i) = Z B,s(k — i) + v(k) (4.31)
j=0 i=0
where
'1 1 -1 0.5 0.8 —0.7 0.06 0.4 —0.5
Ao=l —1 1 ,A1= 0.8 0.3 —0.2 ,A2= 0.16 -0.1 -0.4
_1 —1 1 —o.1 —o.5 0.4 -o.3 —0.06 0.3
’1 0.6 0.8 0.5 0.5 0.6 .125 0.06 0.2
BO: 0.3 1 0.1 ,131 = -o.3 0.2 -0.3 .32: —o.1 o 0.4

 

_0.6 —0.8 1 —0.2 —0.43 0.6 0.08 —O.13 0.3

and v(k) - represents additive gaussian noise

4.5.2 Demixing Network Structure

The theoretical inverse of this IIR mixing environment will be a 3 x 3 HR matrix ﬁlter
with each individual element being an IIR (or ARMA) ﬁlter of order 18. For both the
feedforward and the feedback demixing cases, matrix C can be initialized with very small
random numbers or all zero elements, while the matrix D is chosen to be an identity
matrix. The number of taps per ﬁlter was chosen to be 21. Note for these equivariant
minimum phase setups, any over-determination in the number of taps does not affect the

performance of the algorithm.

98

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Mixing Environment (impulse)
1 0.6 5
0.5 0'4 0 Ir-__--__-___ .
F N P)
g o ' c“ 0.2 I €05
- o - —.-____._..-_ '
0'5 o {o 20 '0‘2 o 10 20 '1 o 10 20
0.5 0.5 0.4
I 0.2 .
o -- --------- o ----—-———--
:3 a a o I -...
c -05 c -05 c
' ' -0.2
’1 o 10 2o '1 0 1‘0 20 '0" o 10 20
0.6 1 1
0'4 0.5 0.5
cm 0,2 I :81 :8
o _.__ _________ o -——-—-——— o --------
.02 0 1O 20 '0'5 0 1O 20 '05 0 1O 20
Taps - n Taps - n Taps - n
(8)
Theoretical Environment Inverse
0.6 1 0.4
0.4 1 0.5 0.2
:: t.- ‘2 I ___----
0 III— ------- l 0 I ___—___— ‘ -o.2 .
. 1 - _ .4
.02 0 5 1O 15 .05 0 5 10 15 ‘0 O 5 10 15
0.2 0.5 0.4
0.1 0.2
- I - ———————— .. I.

3“ o '-r- —————— gwo s i it“ o -.“f__—__
0.1 1 ' oz 1
'0’2 6 s 140 15 '1 6 5 1o 15 '0" o ? 1F 15
0.2 I 0.6 1

- -~-_ _____ . l
- ° I - ,1 °4 8 0..
{L02 i 02 1 i o -
-O.4 o I_-__ ______ I I -
. 2 .
.06 0 5 10 15 ‘0 0 5 10 15 '05 0 5 10 15
Taps - n Taps - n Taps - n

 

 

 

(b)

Figure 4.9. (a) Mixing Environment Transfer Function, (b) Theoretical Environment

Inverse

99

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Estimated Demixing Network by BSR
2 —-r 4 2
.. 1 w 2 m 1 ll
5" L 3" I 2" 0 I""‘“‘“
0 II. ______ (M I —__ ______ .
I I 4
'1 o 5 10 15 '2 o 5 1o 15 '2 o 5 10 15
0.4 2 1
0.2 1 0.5 I
" LI__I_._. . N "’ .._ ___-
EN 0 I I 2W 0 - - ------ 304 O I l.— .-
-02 '- " -o.5
'0" o 5 10 15 '1 o 5 1o 15 '1 o 5 1o 15
0.5 0.6 1.5
.. o l.__________ 1 a 0.4 5:; 1
303.0 5 3 0.2 3 0.5
. o '1.-_ ___—___ o I_-_.. _ ---—
-1 1 A = -0.2—-—-—-——-——-~ -0.5
0 5 10 15 0 5 10 15 O 5 10 15
Taps - n Taps - n Taps - n
(a)
Multichannel Blind Source Recovery
8 If I I I I _1
‘5 6 ~ 1
.5. 4 — ~
‘9 2 _ _
0 1 1 T‘Ww‘mi "' 1 "A“ "FA
0 0.5 1 1.5 2 2.5 3
4
10 4 , , r 1 x 10
N
In
5 5~ ~
Z
0 1 l l l #
0 0.5 1 1.5 2 2.5 3
I ‘r f I I x—lo‘
8 -l
a 6 ~ -
2 4- -1
(2 2L W q
0 L l l I L w
0 0.5 1 1.5 2 2.5 3
number of iterations x 10‘

 

(b)

 

 

Figure 4.10. F eedforward BSR using Adaptive Score Functions: (a) Estimated Demixing

Network Transfer Function, (b) Convergence of MISI Index

100

 

Convergence of Output Signal Kurtosis

4 I ﬁ r I T

 

 

 

 

5:: : 1 11M 0,11%va WI” wﬁ‘i’“\'l~:ﬁ’blﬂ)i"a EMMWMMXi,”~,~ml‘l“imhiﬁlj\llhlv
a” 1.1%ng M“. M '1" ll 2
0o 01.5 1 11.5 2 2T5 3
2 . . f r . x 111‘
11 ‘

031%., N“ “Mme/“Wm? WW" NW1 ﬁw‘wkihﬁ'r’ WMW

 

 

l 1 l

0 0.5 1 1.5 2.5 3

 

 

2 ﬁ T T f I

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

_1 1 r 1 J l
0 0.5 1 1.5 2 2.5 3
numberotilontlom 4
x 10
(a)
Final Global Transfer Function
1 1 ‘ 1
:2 ‘2 $3
34: 0 ——~—-—— 3.: o -—-——-——-— E o - i
(D (D (D
-1 -1 -1 «
o 10 20 30 o 16 20 30 o 20 4o
1 1 1
3:1 53 a
0 0 (9
-1 -1 .1 1
0 1o 20 30 o 10 20 30 o 20 4o
1 1 1 1
F, 3 32
E 0 u v. 0 1 E 0 r 1
(D 0 (D
-1 -1 l .1 1
o 20 40 o 10 20 30 o 20 4o
Taps - n Taps - n Taps - n

 

 

 

(b)

Figure 4.11. Feedforward BSR using Adaptive Score Functions: (b) Convergence of

Batch Kurtosis, (c) Final Global Transfer Function

101

 

 

 

 

 

 

 

 

 

 

 

 

 

Multichannel Feedback Blind Source Recovery
6 T I I I ‘1'
‘5 4L \MHMMMAV “
5 xx-
— 2 _ """\ s
‘2 ”WWW
WWﬁ
0 l 1 1 1
0 0 5 1 1.5 2 2 5 3
6 T‘ I I I _x—lo
\
\\
g 4 _ \\ -
L. W
.2 VW\/./’\
(7) 2 l— M...“ ...
— WWW-'-
0 l _L 1 l l
0 0.5 1 1 5 2 2.5 3
6 I I I I I X1o‘
x‘\
2 4— ‘\\r ~
g M
a.) 2 F W -4
O r 1 41 1 l_
0 0.5 1 1.5 2 2.5 3
number of iterations x 10‘

 

 

 

(a)

 

Convergence of Output Signal Kurtosis for Feedback BSR
3 . r .

 

 

 

 

2» -
8‘ 1 i wwdﬁfyﬂwwwwhwa“"ﬂﬁﬁwnMvawﬁ\”é
T." OW‘wa/‘W ,
40 d5 1 {5 2 is 3

 

x4 for y2

 

 

 

 

 

 

 

 

 

 

 

o 05 1 15 2 25 3
2 r f I I T x 10‘
a 1 1
CC}

xv 0 PW NV “(WWW-l":

1 1 1 1 w 1
'0 05 1 15 2 15 3
number of iterations x 10‘

(b)

Figure 4.12. Feedback BSR using Adaptive Score Functions: (a) Convergence of MISI

Index, (b) Convergence of Batch Kurtosis

102

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Final Global Transfer Function after BSD
3 3 3
2 2 2
:I 1 .33 1 4 $3 1
3: a. E
0 o —— 0 o e — w o . A
-1 -1 -1
-2 -2 -2
0 20 40 0 20 40 o 20 40
3[ 3 3
2 < 2 1 2
_5. 1 «a 1 «_3 1
N
3* o - S’ 0 ” 59’ °
-1 l -1 -1
-2 -2 - -2
0 20 40 o 20 40 o 20 4o
3 3 3
2 2 2
" 1 “‘ 1 «'3 1
5 o 1 5 o 4 Q o
-1 -1 J -1
-2 -2 -2
o 10 20 30 o 20 40 o 20 4o
Taps - n Taps - n Taps - n

 

 

 

Figure 4.13. Feedback BSR using Adaptive Score Functions: Final Global

Transfer Function

4.6 Remarks

In this chapter, two separate adaptive score functions for blind source recovery
algorithms have been proposed. One of the score functions is based on the generalized
gaussian density model, while the other is derived using proposed hyperbolic extensions
to the Pearson and hyper-Cauchy models. Further we presented a method for online
adaptation of the parameters for these score functions. This online adaptation scheme has
been extensively tested for both minimum phase and non-minimum phase, instantaneous
as well as convolutive mixing. For linear convolutive mixing, the results using both the
proposed score functions are comparable. For simulations using bimodal distributions

only, the latter score function outperforms the former that is best suited for unimodal

103

distributions. For nonlinear BSR, however, the slightly different kurtosis characteristics
of the two proposed distribution models may be better exploited to estimate the correct
demixing network. A couple of simulation examples using two different demixing
architectures have also been presented. It is observed that the proposed adaptive
framework can recover sources from a complex convolutive mixture efﬁciently even

when all the sources have different probability density structure.

104

Chapter 5

BLIND SOURCE RECOVERY OF
UNDERCOMPLETE AND

OVERCOMPLETE MIXTURES

Blind Source Recovery is a framework for the estimation of unlmown original source
signals. Most of the existing BSR literature as well as most theoretical developments in
this dissertation have been done under the assumption that the number of observed
mixtures is equal to the number of original sources. In essence, this implies that the
dimension of original sources to be estimated is known a priori and thus sufﬁcient
dimension in mixtures is recorded. In such a quadratic dimensional scenario, the
recoverability of the sources is guaranteed provided the environment is invertible [13,
14]. In case the environment is linear in nature, this theoretical demixing network is

unique and BSR adaptation results approach a permuted and scaled version of this

105

solution. However, in practical settings, the number of mixtures may or may not exactly
equal the number of original sources. In case the dimension of observed mixture space
exceeds the dimension of the original sources, the BSR problem is termed as
Undercomplete BSR. Contrarily, if the dimension of the mixture space is smaller than
the signal space, the recovery problem is called Overcomplete BSR [1, 4, 8, 27].

This variation in dimension may arise due to a number of reasons. Firstly, in a
blind setting, the number of original stochastically independent sources may not be
exactly known, e.g., consider stellar measurements, cocktail party problem, ﬁnancial
markets etc. Secondly, the dimension of the mixture or observation space may be
determined by the structure of the problem [6, 18, 19], e.g., communication networks,
number of installed recording devices etc. Thirdly, the additive noise in an environment
may be dynamic and originate from a spurious number of sources causing the number of
actual sources to vary over time, e.g., number of users in a wireless communication
system, channel additive sources and multi-path effects in multiple access
communication systems or in fact any measurements in dynamic noisy environments [5,
8, 11]. Due to these practical issues and difﬁculties in BSR problems, it is imperative to
consider the cases of undercomplete and overcomplete BSR.

In this chapter, assuming the mixing to be linear in nature, a methodology is
presented to use the BSR algorithms presented earlier in this work for the case of
undercomplete mixtures. The case of overcomplete BSR is intractable in general.
However, only possible solutions exist for sparse sources and are discussed in subsequent

sections [1, 4, 10, 16, 24, 25, 27].

106

5.1 BSR of Undercomplete Mixtures

Undercomplete BSR is a more commonly encountered problem especially in the realm of
communication systems, radar and sonar measurements or any other noisy BSR
formulations, where adequate sensors are employed to collect the mixed signals and/or
corrupted sources. For the case of undercomplete mixtures, the original sources occupy
only a subspace of the mixture dimensions. In noisy undercomplete BSR, the update laws
presented in earlier chapters can be used as is on the mixture dimensions to recover an
equal number of sources. However, in such a recovered set of estimated signals, some of
the sources may be recovered with multiple permutations and deﬂationary approaches
may be needed to efﬁciently estimate the actual estimated sources [1, 5].

Altemately undercomplete BSR problem can be converted to the standard BSR
problem by efﬁciently discarding some of the mixture data. This can be achieved by
using signal pre-processors [4, 5, 7], such as PCA, projection pursuitietc. Efﬁcient data
reduction is useful in several ways; ﬁrstly, it reduces the dimension of the BSR problems
and therefore results in a reduction of computational requirements. Secondly, data
reduction techniques such as PCA based pre-whitening may also result in reduction of
additive noise in the mixture space simplifying the BSR problem at hand. However, the
performance of the data reduction stage is critical, as any data discarded at this stage is
not used by the BSR stage and thus in-efﬁcient data reduction may limit the performance
of the overall recovery setup [5, 8]. See chapter 6, for the performance of BSR algorithms

with a data-reducing pre-processor.

107

5.1.1 Efﬁcient reduction of Mixture Dimensions

In undercomplete mixtures, efﬁcient estimation of the desired signal subspace results in
both reduction of overall computational load as well as it may serve as a pre—processor to
reduce the additive measurement noise in the mixture space. Two best candidates for
such reduction of data dimensions are basis pursuit and pre-whitening using principal
component analysis (PCA). Under good implementation both techniques lead to similar
performance [4, 5, 7], therefore only PCA is discussed in this work. PCA may be

implemented either using algebraic or adaptive techniques.

5.1.1.1 Algebraic Principal Component Analysis

In algebraic PCA, an auto-correlation matrix of the received mixture data is computed or

estimated online as [6].

R(m) = ar(m)rH (m) + (1 —a)R(m -1) ‘ (5.1)

An efﬁcient data reducing pre-whitening matrix in this case is then constructed as
follows:

W ___ D;%VSH (5.2)
where 15$ and 173 are the estimates of the n dominant eigen-values and eigen-vectors of

the auto-correlation matrix R. These eigen values and vectors can also be computed

adaptively, see [8, 9].

108

5.1.1.2 Adaptive Principal Component Analysis

Instead of trying to estimate the auto-correlation of the mixture data or its eigen
decomposition, the pre-whitening matrix may be directly adapted from the mixture data

[6, 7] by the following simple update law

W(k +1) -_- W(k) + n(k)[1 —x(k)xH (10] W(k) (5.3)
or its averaged version

AW(k) = 77(10[1 — <x(k)xH (10)] W(k) (5.4)
where x(k) = W (k)m(k) and (.z) represents statistical averaging.

In case, m(k) is noisy such that m(k) = 151(k) + v(k) , and 151(k) is the
corresponding noiseless mixture estimate. Let f(k) = W (k)ri1(k) be the corresponding

noiseless whitened estimate. The additive noise v(k) within m(k) introduces a bias in

the estimated pre-whitening matrix W. Computing the covariance matrix for the

whitened data as

Rn = <x(10x” (10) = WR,;,,;,WH + WRWWH (5.5)

where R,;,,;, =<15‘1(k)1izH(k)> and RW =<v(k)vH (10).

Assuming the covariance matrix of the noise to be either known or estimated

online [5-8], an unbiased PCA estimate can be determined by

W(k +1) = W(k) + 77(10[1 — x(k)xH (10+ 03W(10W” (10]W(10 (5.6)

where, it is assumed that RW = 031 . For more details, see [5].

109

5.2 BSR of Overcomplete Mixtures

The overcomplete BSR poses an intractable problem withno straightforward theoretical
solution. In fact, even with the precise knowledge of the mixing structure for a simple
static case, source estimation is not possible using any simple linear algebraic
manipulation. A possible solution for the case of sparsely distributed (super-gaussian)
statically mixed sources is inspired from basis pursuit [4], where the overcomplete
mixtures is decomposed using a combination of adaptation for the mixing matrix and
source inference using an interior point linear programming minimization of an L1 norm
based performance functional. Therefore, the discussion on overcomplete BSR will be
restricted to the static case of ICA only.

Several possible solutions to the separation of independent components from their
overcomplete mixtures have been proposed for super-gaussian sources [1, 2, 4, 10, 27,
28], but either the algorithms are computationally very demanding or the achieved
separation quality is not very good. Recently published results have proposed efﬁcient
two stage approach to the recovery of sparse overcomplete mixtures [3, 16, 24]. This
approach results in remarkably reduced computational requirements as well relevantly

improved quality of received sources.

5.2.1 Two Stage Approach to Overcomplete ICA

In two stage overcomplete recovery approach, the overall computational burden of the
adaptation is reduced by decoupling the two sub-problems of mixing matrix estimation
and original source recovery. Data space projection methods [15, 22] are employed to

ﬁrst blindly estimate the underlying mixing matrix, this is called Blind Mixing Matrix

110

Recovery (BMMR). In the second Overcomplete Source Estimation (OSE) stage, source
data is estimated directly from the mixture data minimizing an L1 norm objective
distance functional for interior point linear programming [4, 10]. As the two steps are
carried out independently of each other, rather than recursively [1, 10, 27], the
computational burden is greatly reduced.

Another technique for OSE using the shortest path approximation is explained in
[3, 17]. This technique requires less computation than the aforementioned technique.
However, in our experience the quality of sources recovered using the IP-LP approach is
better than the latter approach, especially when the mixture space has a larger dimension.

In our work, the overcomplete ICA is solved in two stages, our porposed
Algebraic Independent Conponent Analysis (AICA) Algorithm is used to estimate the
rectangular mixing matrix, followed by optimal source inferencing using L1 norm based
interior point linear programming technique.

The proposed algebraic ICA algorithm is meant for the blind estimation of the
mixing matrix and provides promising results for all cases of ICA. Under the assumption
of sparse sources, AICA provides a blind estimate of the mixing matrix, termed as Blind
Mixing Matrix Recovery (BMMR) [16]. BMMR directly constitutes a solution for
quadratic ICA (and undercomplete ICA after discarding the appropriate undesired
subspace), while for the case of overcomplete ICA, it is followed by a stage for the
recovery of more sources than observations based on the already estimated mixing matrix
[4, 10, 16, 25].

In the following sections ﬁrst Algebraic ICA algorithm is presented followed by a

permutation invariant algebraic performance benchmark for the evaluation of ICA

111

algorithms. The source recovery step has been summarized for the convenience of the
reader. Further, a heuristic technique is discussed to enhance the algebraic resolution of
the AICA solution for cases where some of the columns of the mixing matrix are
algebraically “close” to each other. This is followed by simulation examples and a

discussion on the algorithm itself.

5.3 Algebraic Independent Component Analysis

Independent Component Analysis can be best visualized as a projection method where
the observed mixture data is projected onto a new basis, which minimizes the statistical
dependence among the various data components. Such a representation is very appealing
in a number of domains including statistics, signal processing, bio-medical, molecular,
communications, audio, speech, sonar, astro-physics, textual-topics etc. The best-known
application of ICA is termed as blind source separation (BSS). BSS provides the
framework for the estimation of the original source signals directly from their measured
static mixture with the only underlying assumption of mutual independence of the actual
source signals. Blind Source Separation (BSS) is a special case of Blind Source Recovery
(BSR), where the mixing is assumed to be static.

Algebraic Independent Component Analysis (AICA) is a new ICA algorithm that
exploits algebraic operations and vector-distance measures to estimate the unknown
mixing matrix in a scaled algebraic domain. AICA possesses stability and convergence
properties similar to earlier proposed geometic ICA (gee-ICA) algorithms, however, the
choice of the proposed algebraic measures in AICA has several advantages. Firstly, it
leads to considerable reduction in the computational complexity of the AICA algorithm

as compared to similar algorithms relying on geometric measures making AICA more

112

suitable for online implementations. Secondly, algebraic operations exhibit robustness
against the inherent permutation and scaling issues in ICA, which simpliﬁes the
performance evaluation of the ICA algorithms using algebraic measures. Thirdly, the
algebraic framework is directly extendable to any dimension of ICA problems exhibiting
only a linear increase in the computational cost as a function of the mixing matrix
dimension. The algorithm has been extensively tested for over-complete, under-complete
and quadratic ICA using unimodal super-gaussian distributions. For other less peaked
distributions, the algorithm can be applied either using a neighbourhood function,
averaging of data [7].

The Algebraic ICA (AICA) algorithm is based on the linear algebraic notion of

dot-product between two n-d data vectors as a distance measure. The dot product between
two vectors 5 and b in the same basis ‘1’ e R" is deﬁned as

_ - A _T" _ _ — I

ab — a b —|a||blcosl90b (5.7)
where Bab is the n-d angle between the vectors 5' and b .

In case, the two vectors have been projected onto a unit hyper-sphere ‘1’1 e R" , then

IE] = W =1 , and the dot product simply becomes a measure of the n-d. angle or difference

in directions of the two vectors.

If the two vectors are co-incident (i.e., lie along the same n-d direction), the
absolute value of the cosine of the n-d angle 00b is maximum (i.e., one), while if they are
orthogonal to each other, the cosine of the angle 1901, becomes zero. Therefore, after
projection onto a unit hyper-sphere, the notion of minimum distance is equivalent to the

notion of maximizing the absolute dot-product between the projected vectors. Using the

113

absolute value of the dot product produces a sign ambiguity, which is inherent in any ICA
algorithm. This results in considerable computational savings during the implementation
of the proposed algorithm, i.e.,

[7.17] = |a.17| where a = —a (5.8)
Another useful property for the dot-product is its insensitivity to the order of the product,
i.e., (73:17.5

In order to satisfy the stability and convergence requirements of AICA, the
original source distributions should be unimodal, super-gaussian and preferably
symmetric [3, 15, 23].

Consider the case where there are 11 original independent Lebesgue-continuous

sources sk e R";k =1,2...n

. . . . T .
s(t) =[sl(1) 32(1) sn(1)] ;1=1,2,...,N (5.9)
that are mixed by a mixing matrix H e Rmx" , where m may or may not be equal to n,
i.e.,

H=[hl 112 hn];hkeRm;k=l,2...n (5.10)

where each column maps the k’h source signal sk to the mixture space mk eR'",

i.e., m = HS .
Without any loss of generality, it is assumed that the columns of the mixing
matrix reside on the m-d unit hyper-sphere. In case the columns of H do not satisfy this

constraint, they can be projected onto this hyper-sphere by

1:71:93, 52 13,] (5.11)

114

where

~ h
h = k ;k=1,2... 5.12
k 47k“ n - ( )

This assumption is equivalent to having a scaling ambiguity in the mixing matrix
estimates by the algorithm, which is again inherent in any ICA formulation.

In the mixture basis m e Rm , the k’h source data is clustered along the m-d hyper-
direction formed by the intersection of bk and 411‘. The AICA algorithm therefore tries

to estimate these hyper-directions (and hence the mixing matrix) using algebraic

operations on the available mixture data.

 

Scatter Data with centroids along the directions oi the Mixing Matrix
I I I I r I

?h,/nh,u . ‘ r f

 

     

0.4.3..

0.2 r

 

 

 

-1.5 -1 —O.5 0 0.5 1 1.5

 

 

 

Figure 5.1. A scatter plot example of the projection of two sources onto the

2 x 2 mixture space.

115

For clarity of illustration, consider the case for m = n = 2 as shown in Figure 5.1. The
columns of the mixing matrix [1 have been normalized so that they are projected onto
the unit circle. In order to clearly indicate the direction of the plotted mixing matrix, lines

are drawn between h,/||h,.|| and -11,- /||h,-||. The mixture data has also been scaled to ﬁt

within the unit circle. The dashed lines indicate the basins of attraction for each of these

directions.

5.3.1 The Algebraic ICA Algorithm

Assuming that the n-d mixture data has been accumulated over some period of time and
there are N samples available that in essence can capture the distribution of the original
sources and are also adequate for the algorithm to converge. Furthermore, the dimension
mxn of the desired weight network is known. The algorithm can be implemented as

follows:

1. Pick an n number of m-d starting elements for the weight matrix
W0 = [W1 wz wn] on the unit hyper-sphere. These initial starting points can be
chosen in a number of ways, e.g., they can be either chosen randomly or evenly
distributed around the unit hyper-sphere.

2. Choose an m-d mixture sample vector m,- taken at the 14” sampling instant, such that
m,- ¢O,i=1,2...N

3. Project this sample mixture vector onto the m-d unit hyper-sphere by

Mi=ﬂ i=1,2...N (5.13)

2
"mill

116

. Find the dot-product of this projected mixture vector M ,- with the current i’h estimates
ofthe weight matrix Wi;i = 0,1...N—1
. Determine the closest weight direction (i.e., the winner column) by

w; =argmax|M,-T*wj|;j=1,2,...n, i=1,2,...N (5.14)

“’1
. Update only the winning weight direction by the following algorithm

, W(wj(k)*+n(0sgn(M, -wj(10")), if M} *w; >0
wj(k+l) = . * (5.15)
i/I(wj(k)*+77(t)sgn(—M,~—wj(k) )), if Mfr-w]. <0

where, n(t) - represents a small time adaptive learning rate. Often the learning rate is
chosen to be an exponentially decreasing function with a decay rate that depends on
the characteristics of the source distributions to be estimated.

w(.): is a nonlinear function that projects the updated weight vector again onto the

unit hyper-sphere, i.e., w(.) = —(l-

"01

. Determine the change in weight space

AWi=Wi-W,-_1;i=1,2,...N (5.16)
. If IAW1|< a where e>0 is a small positive valued stopping criterion; the algorithm

has converged to its ﬁnal state.
. Otherwise, update the iteration counter i: i+1, and repeat steps 2 to 9 till the

algorithm converges.

For the proposed algorithm, the update is according to the rule: one winner takes all.

The algorithm reaches a ﬁxed-point (or settles down to a solution), when

117

E(sign(M,--wj(k)*))=0. The AICA algorithm at this stage has converged to the

median of the distribution, which for a symmetric distribution also matches with the

mean of the distribution. At convergence, there is no change in the weights and

E(Wk+1)=E(Wk). In [3, 15, 16] similar conditions have been derived for gee-ICA,

termed as Geometric Convergence Condition (GCC), but these conditions are in general
satisﬁed by all self-organizing algorithms (SOM). Both AICA and geo-ICA algorithms,
resemble the Kohonen’s algorithm for self-organizing maps (SOM) with a trivial zero-
neighborhood [7]. In a 2-sources 2-mixtures case, the algorithm possesses only two ﬁxed

points [15],

W(oo) = (5-17)

where, X is a generalized permutation matrix.

The convergence of the algorithm to the true solution is therefore a function of
initialization and distribution of the sources to be estimated. In case of super-gaussian
sources, however, only the true solution is the stable ﬁxed point. Another observation for
lower dimension problems is that a weight vector may get trapped between two data
clusters and may require re-initialization especially if the two directions are very close to
each other. However, for larger dimension problems, the probability of this occurrence is
negligible.

The proposed algorithm framework has similarities to the Geometric ICA
algorithm [3, 12, 15, 17] but requires signiﬁcantly less computation due to the choice of a
different distance measure and a reduced number of parameters to update. The use of a

neighborhood function in the proposed algorithm is currently being investigated for

118

source distributions other than the super-gaussian. There is interest in the ability to handle
signals with multiple source distributions [20].

Figure 5.2 presents the results after convergence of [the proposed algorithm for the
2x2 case with initialization as presented in Figure 5.1. The initial estimates of the weight
vector were chosen randomly. It is evident from Figure 5.2 that the algorithm caused the

initialized weights to converge to the nearest desired direction.

 

Directions of Initial 81 Final Mixing Matrix Estimates

 

0.8-4
0.64

0.2L

 

 

 

 

 

 

 

Figure 5.2. Convergence of the weights in the 2 x 2 mixture space; wi(0) denotes the

initial weight locations while wi(oo) denotes the ﬁnal weight location.

119

5.3.2 Algebraic Matrix-Distance Index (AMDI)

Algebraic Matrix Distance Index (AMDI) is primarily a scaling, sign and permutation
invariant symmetric numerical index to estimate the “similarity” between the column-
bases of two “normalized matrices” with conformal dimensions. Unlike the Multichannel
Intersymbol Intereference (MISI) measure (also called Amari Index by some) used in
ICA to measure the diagonalization of the global transfer function, AMDI is a measure of
similarity between two column-normalized matrices with similar dimensions.

These attributes also make AMDI suitable for comparison of results for
algorithms such as AICA and geo-ICA especially for overcomplete ICA where
computation of MISI is not possible. In order to compute this metric, the columns of the
matrices are normalized by projecting them onto the conformal unit hyper-sphere by

dividing all the elements of each column by the norm of the corresponding column. The

 

 

AMDI is given by
n-ZmaxdW'*H|) n-Zmax(|W'*H|)
AM)I(W, H) = '0“ + 00’s (5.18)
n n
where

W, H — are two column-normalized matrices of similar dimensions.

The Algebraic Matrix Distance Index (AMDI) has the following properties:
0 0 S AA/HDI (W ,H ) S 2; AAﬂ)1(W,H) = 2 only if both matrices are orthogonal to
each other in the n-d space, also

. AMDI(W,H) = AA/ﬂ)1(H,W)

120

5.3.3 Resolution of AICA Local Minima and Ambiguities

Algebraic ICA is a member of a class of data-dependant self-organizing algorithms. The
performance of the algorithm is a function of the choice of the initial weights, learning
rate and most importantly characteristics of the data. While AICA weight columns
converge to the directions (or columns) of the actual mixing matrix in the limit, provided
the data is unimodal with a high value of kurtosis, and the mixing matrix directions are
sparse [3, 16, 22]. For data that does not meet these requirements either due to lower
kurtosis, or due to the actual mixing matrix directions being “close” to each other
(especially for the overcomplete cases), the algorithm does not give the desired estimate
performance. For such cases, a better estimate of the mixing matrix may be derived using
more than one trials of the algorithm. This technique is also useful for the cases where the
available data is inadequate for the AICA algorithm to estimate the mixing matrix. A

heuristic stepwise approach for such ambiguous cases can be followed as:

l. Initialize all the columns of the estimator matrix at the same location. For each
successive trial, choose a different initial location, e.g., one way to do so is to
initialize the weights at possible m-d binary combinations of 1’s and zeros. For m
mixtures there are possible 2’"-1 such initializations possible, i.e. leaving aside the
trivial combination of all zeros.

2. Execute the AICA algorithm for each initial condition. For a sparse mixing
matrix, the algorithm converges independent of the initialization provided adequate
data is available [17, 24]. In most cases, the algorithm will converge during one of the

trials.

121

3.

In case, the algorithm fails to converge for any choice of initialization. Note, in a
blind scenario this condition cannot be veriﬁed and multiple trials serve as a mean of
estimating the mixing matrix with a higher conﬁdence level. The AICA estimate for
all cases can be post-processed to determine the high conﬁdence candidate weight
columns, see steps 4 to 10.

Determine the list of “close” columns in all trials. This can be algebraically
realized by choosing the corresponding columns with dot product values above a
certain threshold or. This threshold a can be chosen as a function of data sparseness
and the desired resolution of the AICA algorithm.

Concatenate the column pairs with dot products better than a higher threshold ,6,

where ,6 2 a and a common column.

Repeat Step 5 till all possible concatenations are made (this may take 77 -l
iterations for 17 trials, if 1 combination is made each time).

Determine the independent 2 columns as the mixing matrix candidate columns.

The concatenated column cases are given a relatively higher rank. The candidate
columns are arranged as a function of their dot-products values. At this instant, one
option is to choose the M highest ranked columns as the ﬁnal mixing matrix candidate
columns.

In case the ﬁnal column candidates making the out are smaller than M,
deﬂationary approaches can be used after estimation of the output using the mixing
matrix estimate. The remaining columns can now be determined in a consecutive

stage.

122

10. Altemately when the candidate number of columns is larger than M, instead of
choosing the columns via ranking, all the columns may be used for the source
recovery stage, then the selection of appropriate outputs is made by measuring the
independence of recovered outputs, e.g., such an estimate can be made using
Quadratic Information Measure [21].

Using the above-mentioned technique, for a 3-d mixture space, the mixing matrix
with a resolution of better than a 15° (approx. 0.25 rad) hyper-cone in the weight space

has been successfully estimated. See simulation III for some more details.

5.3.4 Inferring Sources from Overcomplete Mixtures

Once the mixing matrix has been estimated by the AICA algorithm, the original sources
can be estimated. For quadratic ICA, the sources can be estimated by computing the

inverse of the estimated mixing matrix W, i.e.

5 = W‘lm (5.19)

For the case of overcomplete mixtures, the recovery of the sources is in general
not possible. However, for the case of sparse super—gaussian sources, once the mixing
matrix has been estimated, interior point linear programming (IP-LP) techniques [4] can
be used to determine the best possible estimate of M-d sources from m-d mixture where
M > m. The technique uses interior point LP with an L1 norm based performance
functional. Standard interior point L2 norm based LP methods can be used for this

technique according to the following procedure [4, 10]:

min ch min "5“1
3.1. 3 st. (5.20)
Ax = b,x _>_ 0 W5 = m

123

where

A = [W —W]: W is the estimated mixing matrix of dimension m x M , note by choice M

= n in this case.

x = [u v]T: represents the M—d intermediate recovered sub-signals with the constraints

uZO and v20

c=[l HT: constrains the recovered M—d sub-signals u and v equally, l is an M—d

vector of ones.
b = m: m are the m-d overcomplete mixtures.
The M-d outputs are then determined using the relation
5 = u — v (5.21)
Another technique using the shortest path approximation is explained in [3, 16].
This technique requires less computation than the aforementioned technique. However,
in our experience the quality of sources recovered using the LP approach is better than

the latter approach.

5.3.5 Simulation Examples

Four simulation examples are being presented: the ﬁrst example demonstrates the issue of
multiple ﬁxed points (or presence of local minima) in the AICA algorithm using
independent identically distributed (IID) data. Then there are two examples using
overcomplete mixtures of 3 and 5 sparse speech segments, respectively. In these sparse
speech samples, words are spoken with certain time intervals to increase the sparseness of
the actual speech signals. The recovery of these speech signals from their 2 and 3 static

mixtures, respectively, is presented. The AICA algorithm is ﬁrst used to estimate the

124

mixing matrix, while the speech sources are inferred using the L. norm interior point
linear programming.

The fourth example discusses BMMR for a 6 x 3 mixture using IID laplacian
sources. The motivation for this example are cases where some of the mixing directions
are “so close” that algebraic ICA algorithm cannot discriminate these algebraically
adjacent columns due to the relatively high overlap of data aligned to them (resolution).
For such cases, the proposed ambiguity resolution technique is applied, which allows for

the exploration of the complete mixture bases in an orderly and organized manner.

5.3.5.1 Simulation 1: 4 x 2 Laplacian Data

In this simulation, a synthetic random matrix mixes 3 independent identically distributed

laplacian sources. For this simulation the mixing matrix is

.._ 0.7231 —0.6121 —0.6587 0.8005
0.6908 0.7908 0.7524 -0.5994

All the weights for the AICA algorithm in this case are initialized along the real
axis. It is observed that the algorithm converges to the undesired ﬁxed point. This is
sensed by AICA observing that only two of the weights are updated, while the other two
are not, see Figure 5.3.a. The algorithm in this case re-initializes the discarded weights
very close to the weights which are currently being updated. This results in the
convergence of the algorithm to the true solution, see Figure 5.3.b. The determined ﬁnal

weights for this simulation are

_ 0.7265 0.8039 0.6448 0.6075
— 0.6872 -0.5948 0.7644 —0.7943

and the achieved AMDI performance is less than 0.0003.

125

 

Weight locations (interim)

K r .

\

 

     

 

 

 

 

 

 

(a)

 

Final Weight locations

I K r r r

 

 

 

)—

 

 

 

 

 

(b)
Figure 5.3. BMMR for Laplacian Data using AICA: (a) Interim Results,

(b) Final Convergence

126

5.3.5.2 Simulation II: Overcomplete Speech Data (3 x 2 Case)

A simulation example for an overcomplete mixture which comprises of three speech
signals mixed by a random mixing matrix into two mixtures is being presented. The
AICA algorithm is ﬁrst used to estimate the mixing matrix, while the speech sources are
inferred using the L1 norm interior point linear programming.

The AICA estimated mixing matrix was very close to the original mixing matrix
used with a normalized AMDI of 0.001. The recovered sources show a permutation
between the ﬁrst and third sources. Also notice that the recovered signals from the
overcomplete mixture in Figure 5.5 .a are not without artifacts but very closely resemble
the original sources in Figure 5.4.a.

Observe that the separated quality achieved using separate Blind Mixing Matrix
Recovery (BMMR) and source recovery steps far exceeds other algorithms [1, 4, 10],

provided a good mixing matrix estimate was found.

127

 

Original Speech

.— ° _rééﬂ‘w‘ﬁikﬂF—éﬁ‘— ‘

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

7
Sam ples x 10‘
(a)
1 Mixed Speech
7
“.4
7
x 10‘

 

 

 

(b)

Figure 5.4. Overcomplete Speech Mixture (3 x 2 Case): (3) Original speech sources,

(b) Mixed speech

128

 

 

 

Separated Speech by 1.1 based LP

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Sampbs ‘

 

 

(a)

 

I Convergence of Algebraic Matrix Distance hdex (AMDI)

 

0.8 1

AMDI

0.2 ~ \ 4
\

 

 

 

 

\\
0 1 1 \\~ -¥—. 1 _- 4 1 1 1 1
0 0 5 1 1.5 2 2.5 3 3 5 4 4 5 5
Iterations 4
x 11')
(b)

Figure 5.5. Overcomplete Speech Mixture (3 x 2 Case): (a) Recovered speech,

(b) Convergence of Normalized AMDI over 100 experiments

129

 

5.3.5.3 Simulation III: 5 x 3 Speech Data

For this simulation, the simulation example comprises of an overcomplete speech

mixture. Five utterance are mixed using a 5 x 3 mixing matrix, as given below.

0.4872 —0.9063 0.3285 0.6203 -0.8837
H = 0.2021 -0.0247 0.2714 -0.6752 0.2019
-0.8489 —0.4220 0.9047 0.3990 0.4223

The AICA algorithm is ﬁrst used to estimate the mixing matrix W, while the speech
sources are inferred using the L1 norm interior point linear programming.

The AICA algorithm converges to the estimate in approximately 25000 iterations
achieving an AMDI of 0.002. The estimated mixing matrix W exhibits a permutation,
where the columns appear in the order 4, 3, 5, l and 2. Also the second and fourth

columns of Whave an inverted sign.

0.6243 —0.3298 —0.8787 —0.4861 09015
W = —0.6724 —0.2773 0.1993 —0.2020 0.0228
0.3949 —0.9157 0.4240 0.8445 0.4160

In Figures 5.6-5.8, the 3 speech mixtures, the original 5 speech utterances and the
5 inferred speech sources are shown. Notice that the recovered signals from the
overcomplete mixture in Figure 5.8 are not without artifacts (introduced primarily by the
source inferring stage) but closely resemble the original speech utterances. A listening
test proves this. The speech simulation data can be downloaded at
http://www.egr.msu.edu/bsr/results.

Observe that the two stage overcomplete mixture recovery approaches [16, 24,
25] are relatively simpler to implement as compared to other complex approaches [1, 4,

10]. Also combined BMMR and source inferencing provide satisfactory results as long as

130

the underlying probability distribution is sparse. The performance of the AICA algorithm
depends on the quantity of preamble data available for estimation of the weight matrix.
Although the algorithm may be implemented for any dimensions, the amount of data
required may be immense. This can be dealt with by repeatedly training for weight
estimation on a batch of data, provided the data batch size is adequate to represent the

actual underlying source distributions.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5 x 3 Speech Mixtures
0.5 Eatk T i I 1
ET iii-€111 ;_,
e“ 0 rr'jv saw-1,1 1- 11..
-0.5 L
0 1
0.2
W’tl' [ .. : t“ 1 .111“
N .’;u= 4 if“; ' ‘1-
E O 9s'i Fir 15$“??? ".414?
- .2 1
0 O 1
0.5
F 1‘1 13
E” 0 4'
11$ .H
_0'5 4 l 1 1 L 1 1
O 1 2 3 4 5 6 7
Samples 11 104

 

Figure 5.6. Overcomplete Speech Mixture (5 x 3 Case): 3 Overcomplete Mixtures

131

 

 

Original Speech Data

 

  

 

 

 

“ "=1:‘*‘1.+ii“”‘“* MW1—an —--——~ g wee-WM- .
’1' 1? “ti ‘0"
1 2 4 5 6 7

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1 2 4 5 6 7
1% #3. a"? ' '1”: _q
1

1 2 4 5 6 7
Samples x 104

 

132

Figure 5.7. Overcomplete Speech Mixture (5 x 3 Case): Original 5 Speech Utterances

 

 

Recovered Speech using AICA and IP-LP
0.5 1 . e a . .

 

T

1

,.~— 0 WWW wwwmwﬁrk+w~4ﬂ§+-

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 5.8. Overcomplete Speech Mixture (5 x 3 Case): Recovered 5 Speech Utterances

133

5.3.5.4 Simulation IV: Independent Identically Distributed (iid) Laplacian Data

For this simulation, six iid laplacian sources are mixed by a randomly generated matrix of
dimension 6 x 3 into three mixtures. Also presented are some of the results where the
columns of the mixing matrix are close and the matrix is estimated using the ambiguity
resolution scheme is presented in section 1.3.4.

For the mixing matrix below the columns 5 and 6 are within 0.4 rad of each other.

AICA is able to estimate the mixing matrix successfully using the data from 3 trials.

0.0226 -0.4698 0.5311 —0.2640 0.9597 -0.9498
H1: -0.0620 -0.0327 -0.5890 -0.4616 0.0309 0.1029
0.9978 -0.8822 0.6091 -0.8469 0.1793 0.0021

Figure 5.9.a shows the results using the proposed technique. The recovered
weight are shown in red, while the blue dash-dot line shows the actual mixing matrix

locations. In another simulation, the mixing matrix is

0.0839 0.6345 0.6606 0.9997 0.3875 0.3164
H2 = 0.8850 0.6525 0.7506 0.0029 0.1546 -0.0014
-0.4579 0.2144 -0.0149 0.0248 0.8509 0.9486
where columns pairs 2 and 3; 5 and 6 are within 0.35 rad of each other. The AICA can

resolve these columns successﬁrlly using results from 7 trials. The ﬁnal result is shown in

Figure 5.9.b.

134

 

Final Iteration Weights and Environment

 

 

 

 

 

(a)

Final Iteration Weights and Environment

 

 

 

 

 

 

(b)
Figure 5.9.Unambiguous AICA estimate using multiple runs: (a) using H],

(b) using H 2

135

The performance of the AICA algorithm depends on the quantity of preamble data
available for estimation of the weight matrix. Although the algorithm may be
implemented for any dimensions, the amount of data required may be immense. In such
cases, the same data batch or a randomly drawn batch may be repeated during the weight
estimation process provided the amount of data in the batch is adequate to represent the
actual source distributions [7]. The convergence of weights certainly depends on the
initialization. Another observation for lower dimension problems is that there are more
chances for a weight vector to be trapped between two algebraically close data clusters.
However, for larger dimension ICA problems, the probability of this occurrence is

minimum.

5.4 Remarks

In this chapter, the adaptation issues for the undercomplete and the. overcomplete BSR
have been discussed. A new Algebraic ICA (AICA) algorithm based dot-product has
been presented. The algorithm estimates the static mixing matrix under the assumption
that the original sources are unimodal and possess positive ﬁsher kurtosis. It is
demonstrated that the proposed AICA algorithm for the case of overcomplete mixing is
used for BMMR. The original sources can then be estimated using interior point L1 norm
based LP technique. Also presented is AMDI, which is a matrix-distance measure to
compare two matrices, which may be both scaled and permuted with respect to each
other. Further, a heuristic technique to determine the BMMR solution when the mixing
matrix columns are close to each other is also discussed. The algorithm has been
successfully applied to the problems where the dimension of mixtures is half the number

of original independent signal sources. Currently algorithm extension using a

136

neighborhood function is being investigated for gaussian and sub-gaussian distributed
signals. The speech data for the examples can be accessed at the website

http://www.egr.msu.edu/bsr/res% [26].

137

Chapter 6

BLIND SOURCE RECOVERY
APPROACHES FOR WIRELESS
DS-CDMA AND WCDMA DOWNLINK

SYSTEMS

Code Division Multiple Access (CDMA) is based on spread-spectrum technology and is
a dominant air interface for the proposed modern 3G and 4G wireless networks [145].
The transmitted CDMA signals propagate through noisy multipath fading communication
channels before arriving at the receiver of the user equipment (UE). In contrast to
classical single-user detection (SUD) algorithms, which do not provide the requisite
performance for modern high data rate applications, multi-user detection (MUD)

approaches require a lot of a-priori information not available to the UE. In this chapter,

138

three promising adaptive natural-gradient based info-theoretic blind user detection
approaches capable of handling the wireless dynamic environments are proposed. The
ﬁrst approach, Blind Multi User Detection (BMUD), is the. process of simultaneously
estimating multiple symbol sequences associated with all users in the downlink of a Code
Division Multiple Access (CDMA) communication system using only the received
wireless data. This approach is applicable to CDMA systems with short codes but
becomes impractical for systems using long codes. Also proposed are two other adaptive
approaches, namely RAKE-Blind Source Recovery (RAKE-BSR) and RAKE-Principal
Component Analysis (RAKE-PCA) that fuse an info-theoretic stage into a generic RAKE
receiver. This results in robust detection algorithms with performance exceeding the
standard LMMSE detectors for both DS-CDMA and WCDMA systems under conditions

of congestion, imprecise channel estimation and unmodeled multiple access interference

(MAI).

6.1 User Detection in CDMA Systems

The conventional detection schemes for CDMA signals only exploit second order
statistics among user codes. Practically, however, the underlying user data symbol
sequences are in general mutually (near-) “independent”. This is a key assumption, which
propels the application of info-theoretic learning approaches such as information
maximization [21] and minimum mutual information [14] to the realm of CDMA. The
use of these algorithms is justiﬁed since a wide sense stationary slowly fading multipath
CDMA environment can be conveniently represented as a linear multi-channel
convolution model (see section 6.3.1) [39, 123]. The received CDMA signal can be

considered as a sum of several non-gaussian random variables generated by the linear

139

convolutive transformations of statistically (near-) independent component user variables
[58, 141]. This linear transformation accounts for the user spreading codes, the cell
scrambling codes (in case of a cellular architecture), multiple channel paths and slowly
fading channel effects [141]. Our goal is to estimate a linear transformation to counteract,
as “optimally” as possible, the effects of a ﬁrst transforrnation-- resulting in the recovery
of the original user signals under the constraint of knowing only the user’s signature code
(and the corresponding cell scrambling code for a two stage implementation) [58, 62,
122,123,141]

For a generic DS-CDMA implementation using only a single stage of user
spreading codes (e.g., using PN codes, gold codes or Walsh Hadamard codes etc.), info-
theoretic blind source recovery techniques [62, 123, 124] can be employed to recover the
user data sequences in the multi user environment. Blind Source Recovery (BSR) is the
process of estimating the original “independent” user-speciﬁc symbol sequences
independent of, and even in the absence of, precise system/channel identiﬁcation [109,
110]. In typical downlink signal processing, where many of the system parameters are
unknown, including the codes for co-existing users at any instant of time, one can use the
blind techniques for better estimate of the user-speciﬁc signals [62, 123, 124].
Altemately, Blind Multi-User Detection (BMUD) algorithms based on the Natural
Gradient Blind Source Recovery (BSR) techniques in both feedback and feedforward
structures can be used [109, 110, 135, 139]. The “quasi-orthogonality” of the spreading
codes and the inherent “independence” among the various transmitted user symbol
sequences form the basis of the proposed BMUD methods. The proposed structures and

algorithms demonstrate promising results as compared to the conventional techniques

140

comprising, e.g., Match Filter (MF), RAKE and block LMMSE [57, 98, 120, 121]. Our
proposed algorithms can be implemented either using the batch or the more
computationally efﬁcient instantaneous update methods [1.10, 112]. Although batch
implementations exhibit better performance [123], it is however accompanied by longer
latency and more involved implementation structures not suitable for a UE/MS. Focus is
placed on the instantaneous (or on-line) performance of the BSR algorithms only, which
still exceeds the performance of other approaches [122, 124].

This (on-line) detection technique can be easily extended to other CDMA
implementations, using relatively short scrambling codes, but becomes impractical in
WCDMA downlink using long scrambling codes. In spite of the fact that very low bit
error rates (BER) can be achieved with this technique and the detection process does not
even require the knowledge of user’s own signature code, the recovered signal stream is
at the symbol level with no user identiﬁcation [62, 123]. Further, inherent sign and
permutation ambiguities exist in BSR (scaling is not relevant as the recovered streams are
either BPSK or QPSK etc.); therefore user identiﬁcation is not possible unless some
preamble or pilot data is transmitted periodically [62, 123]. This periodic requirement
stems from the dynamic nature of the wireless communication scenario where users may
dynamically enter or exit the system. The environment structure also varies widely due to
the mobility of the MS/UE [4].

With these practical constraints in mind, new algorithms are proposed by an
infusion of info-theoretic learning algorithms such as static Blind Source Recovery
(BSR) (or Independent Component Analysis, ICA) and Principal Component Analysis

(PCA) in the existing structure of a RAKE receiver [122, 125]. The purpose of this

141

additional info-theoretic stage is to counter, as best as possible, the unmodeled multiple
access interference (MAI) and the additive (white gaussian) noise in the channel [122].
Further, use of a simple info-theoretic stage does not make the receiver structure too
complex (in fact, it is simpler than most other proposed adaptive LMMSE
implementations [44, 121]). RAKE-PCA uses up to second order statistics, as compared
to RAKE-BSR, which utilizes higher order statistics. This results in slightly simpler
update structure for the RAKE-PCA, but the performance of the RAKE—BSR is found to
be better than RAKE-PCA. Further, assuming the score-ﬁrnction for the ICA update law
to be chosen properly, the resulting equalization matrix in case of RAKE-BSR has
relatively smaller element values (energy) as compared to the corresponding matrix for
RAKE-PCA, which can be translated to the need of fewer memory bits. Lastly, both
RAKE-BSR and RAKE-PCA use all the available user information, so that there are no
issues of user identiﬁcation and sign ambiguities in this case [122, 125]. The main
advantage of both the RAKE-BSR and RAKE-PCA algorithms is the improved BER
performance for the UE/MS without the need of any additional information than what a
standard RAKE receiver already has. The proposed algorithms can be applied to both
generic DS-CDMA and modern 3G UMTS implementations.

The remaining chapter is organized as follows: the next section discusses the
CDMA forward communication channel (or downlink) and the limitations of the
conventional user detection approaches. In the third section, the signal models are derived
for a generic single stage spreading DS-CDMA and the two stage WCDMA (or 3GPP
UMTS FDD) downlink scenarios [143]. This is followed by a section on detection

algorithms for UE/MS. Initially the structure of the well-known linear detectors such as

142

MF, RAKE, LMMSE etc. is brieﬂy described. This is followed by a description of the
proposed info-theoretic structures for DS-CDMA and WCDMA wireless networks. The
ﬁﬁh section presents simulation results under various test conditions and congestion
levels for the systems. Simulation results using the BER performance as a performance
measure support the superiority of the proposed info-theoretic formulations for various
DS-CDMA and WCDMA situations. In the conclusion section, the ﬁndings of the work
are enlisted and summarizing remarks are made on the effectiveness of the proposed

algorithms.

6.2 CDMA Downlink Communication Channel

Code Division Multiple Access (CDMA) uses spread spectrum techniques in which all
the users share the same temporal and spectral resources [58, 98, 121]. In the downlink
signal processing, each user is identiﬁed by a unique signature code, which is chosen to
be “orthogonal” (or quasi-orthogonal) to the signature codes allotted to other users in the
system. The Direct-Sequence Code-Division Multiple Access (DS-CDMA) is a
promising data transmission technique capable of high data rates and better immunity to
channel impairments and noise [98]. In DS-CDMA, the signal energy is “spread” over a
wide frequency range, which reduces the effects of fading channels. CDMA systems
utilize the given spectral and temporal resources more effectively as compared to other
multiple access schemes such as FDMA and TDMA [145]. This results in a larger user
capacity per given resources [57, 98, 121], making CDMA systems a cost effective
choice for modern and proposed future wireless networks [1-4]. Other advantageous
features include soﬁ capacity limit, cell frequency reuse, soft handover of users etc. [57,

98, 121]. Wide bandwidth CDMA will be a dominant technology for third generation
143

(3G) and future wireless communication systems and form an integral part of the
CDMA2000, 3rd Generation Partnership Project (3GPP) Universal Mobile
Telecommunication System (UMTS) Frequency Domain Duplex (FDD, also called
WCDMA) and Time Domain Duplex (TDD) standards [1-4, 143]. Note that the UMTS
TDD standard uses multi-carrier CDMA (MC-CDMA) instead of DS-CDMA. While
indoor, personal, ad-hoc and local non-cellular implementations of CDMA use one stage
of signature codes for user identiﬁcation [27, 149, 152], the modern 2.5G and beyond
wireless voice-data—multimedia networks such as IS-95, CDMA2000, UMTS FDD (or
WCDMA) and beyond use both user-speciﬁc spreading codes and cell-speciﬁc long
scrambling codes to permit cost-effective cellular implementations [14].

Unlike the uplink communication channel, where all user codes are known and
the base station (BS) possesses superior signal processing capabilities and resources, the
downlink path is constrained by restricted computational resources and limited
knowledge of the user equipment (UE)/mobile station (MS) [3, 121]. In a CDMA system,
the symbol train to a user may be detected using either a single-user or a multi-user
detector. A single-user detector (SUD) such as Matched Filter (MF) detector, Zero
Forcing (ZF) detector [121], RAKE etc., does not model the multiple access interference
(MAI) due to the presence of other users and extraneous signals in its signal path, rather
it treats all interfering users and disturbances as additive noise. This severely limits its
detection performance and it fails to provide the performance levels vital to modern high
data rate applications [4]. In contrast, a multi-user detector (MUD) [58, 120, 121, 146]
includes all the users in the signal model. Signiﬁcant improvement can be obtained with a

multi-user receiver [121, 146]. However the optimal MUD [120] is computationally

144

intensive and requires several dynamic system parameters to be known precisely. Several
linear multi-user detection (MUD) techniques such as Best linear unbiased estimator
(BLUE) [146], linear minimum mean squared error (LMMSE) estimator [44, 59, 146]
have been proposed for the wireless downlink based on the linear convolutive channel
model. In practical situations most of these estimators require extensive knowledge of
channel parameters and massive computations. Even if approximated using block level
implementations, the system performance becomes sub-optimal, which further degrades
quickly if the a-priori channel estimates are far from true system parameters. Adaptive
approximations of these algorithms also require some a-priori information for correct
initialization and may not match the performance of the algebraic counterparts under all
conditions.

For the proposed 3GPP UMTS FDD [2, 3] and future tightly synchronous CDMA
standards, the user speciﬁc spreading is done by Orthogonal Variable Spreading Factor
(OVSF) codes, while the cell speciﬁc complex scrambling is done using a set of long
gold sequences (and/or Kasami sequences) [66]. Long codes have their own advantages
such as code-hopping which results in similar performance for all users in the system
(i.e., better QoS). In addition long codes also result in improved power control for users
with relatively smaller data rates in the system and better rejection of extra-cellular
interference (i.e., lower effective BER) [58]. Contrarily, the use of long scrambling codes
makes the implementation of the exact linear detection algorithms and BMUD
impractical due to the excessive computational requirements [44, 146]. Further, the actual
MUD model also does not capture effects such as inter-cellular interference of signals

and interference from other services sharing the same temporal and spectral resources.

145

Even the performance of block implementations of linear detectors such as block
LMMSE (B-LMMSE) is severely degraded due to imprecise channel estimation (and/or
inaccurate estimate of the phase reference for the principal path), if possible at all [59,
121,122]

The restrictive downlink performance of the conventional detectors in practical
communication systems and the higher performance requirements of the modern
multimedia and other dedicated user services, necessitate better detection schemes for the

downlink CDMA channel.

6.3 Downlink Signal Models for CDMA Systems

In this section, the signal model for a generic lDS-CDMA implementation using one layer
of spreading codes only is ﬁrst discussed. In the subsequent sub—section, this model is
extended for the modern wireless networks such as IS-95, CDMA2000, UMTS etc. These
modern CDMA implementations utilize another stage of codes for cell-speciﬁc

scrambling [1 -4].

146

6.3.1 DS-CDMA Signal Model

 

 

 

 

 

 

 
 

 

 
   
   
 

 

 

 

 

b.(t)
User 1 Data
Q —+ X] 320)
I [92(1) 2 Multipath ,.(t)
l ' AWGN
: | Channel
1 : skﬂ)
191(1)
User K Data 0
Q ——tl—;J_

 

 

 

 

 

Figure 6.1. Signal Generation Model for a QPSK DS-CDMA system

In a typical downlink synchronous DS-CDMA system employed for indoor ATM and
certain ad-hoc wireless networks [149, 152], each user’s data is spread using an
individual signature waveform (or spreading code), then the data for all users is combined
and transmitted by the Base Station (BS). Each User Equipment (UE) or Mobile station
(MS) synchronizes itself with the BS using the broadcast synchronization/pilot channels,
once synchronized the BS and UE/MS can communicate on the trafﬁc channel
(comprising of both data and control streams). Assuming the data transmission to be
QPSK, i.e., comprising of two composite data channels created by a serial-to-parallel
(S/P) stage, which are constellated in quadrature. At the UE/MS receiver, the received

signal is ﬁrst passed through a chip-matched ﬁlter and sampled at chip rate.

147

Considering a total of K active users in an L multipath environment and N

transmitted symbols during the observation frame T p, the received signal is given by [82,

122, 124]
N K L—l
r(t)= Z Z Zt/Ekn(t)bk(n)h1(t)5k(t -"T-Tr) + n(t) (6-1)
n=lk=11=0

where ekn is the energy of the nth symbol for the kth user, bk(n) e {ilzti} is the n'h
complex symbol for the k’h user, 111 and r, are the l‘h path’s gain co-efﬁcients and delay,

respectively. n(t) is the additive noise and sk(t) is the k’h user’s signature code (or

spreading sequence) generated by
G-l
ska) = Z ak(m)p(t—mTc); ak(m) 6 {—1,1}; os m s G—l (6.2)
m=0
ak(m) is a real spreading sequence (i.e., any of the standard CDMA PN codes, such as
the Gold, Walsh-Hadamard, Kasami etc. [66]) sequence for the k"I user containing G
chips per symbol, i.e., G = T b / Tc . p(t) is a chipping pulse of duration Tc. T b being the
symbol period.
Under the assumption of time-invariance, the model (6.1) can be more compactly
written in a vector-matrix format as
7 = 1155' + a (6.3)
where, H is a (NG +L—1)xNG multipath propagation co-efﬁcient matrix containing
the channel coefﬁcients. S is a NGxNG block diagonal matrix with the matrix of
spreading codes forming the diagonal elements, 3 is an NG -d vector containing the user
symbols, while '1? is the (NG + L —- 1)-d channel noise vector with covariance matrix Q.

The structure of the above deﬁned matrices and vectors is given by

148

” kg 0 0 i
h 0
H: L—l
0 ho
_ 0 0 hL—l-

 

 

S=diag[§ S S],§=[sl $2 sk]

__ T
b=[t(nT be)?" b<N>T] wept/able) ifs—23202) ﬁbre)?

The compact linear model (6.3) is useful in deriving the linear detectors such as
matched ﬁlter (MP), linear minimum mean squared error (LMMSE) etc. (see, section
6.4.1) for recovery of the transmitted symbol train for a desired user. However, the
primary disadvantage of this model is the prohibitive dimensions of the constituent
matrices, especially with longer frame durations and larger G, the matrices become
excessively large, making this model unsuitable for any real-time implementation at
UE/MS.

Altemately, the signal model can be represented as a linear convolutive model
[57, 62, 122-124], i.e., during symbol time, the received chip data is constituted of the
chips corresponding to the currently transmitted symbol and its delayed multipath
components as well as delayed chips from some previously transmitted symbols and the
channel added noise. In this formulation, G chips arriving at the UE/MS during the 11'“
symbol time are computed as the sum of the chips from L multipaths of the n“
transmitted symbol and the multipath components of the previous J-l symbols (n-l, ...,

n-J-l), where

J $1.410 (6.4)

149

rL being the maximum chip delay in L multipaths. The n’h received symbol can be

expressed as

K L—l
m(t) = z bk(”)\/5kn(t) Z h1(t)Sk(t-nT—Tr)+ "n(t)+
1:0
(6.5)

k=1

J-l K L—l

Z Ebro—111%.- n(t) Zhrmstlt —<n—j)T—m; nTStS(n+1)T
j=1k=l 1:0

Under the assumption L S G , the above model can be expressed just in terms of
the current and the preceding symbol. That is, the multipaths with delay greater than
symbol period either do not exist or are weak enough to be ignored. In this case, the

output samples of the chip-matched ﬁlter can be written as:

K L-I L-l
rn = k2 bkndgkn (t) 120 hlfkl + bk,n-l\/£k,n-1(t)lzo [713“ + "n (6-6)
=1 = =

where, ‘2'“ and g, are G-d early and late code vectors, i.e.,

T
3It]: 0 0 Sklll 5110-711 (6.7)
B_W_J
r]
T
§k1= Sk[G-T]+1] Sk[G] 0 0 (6.8)
G—rl

r, is the discretized delay satisfying the constraint 0 S r, S T b- Imposing time invariant

constraints, the multipath slowly fading environment model (6.6) can be represented in

the form

r" = Hob" + Hlbn—l + 11,, (6.9)

150

where b” and b,,_1 are the K—d vectors of current and previous symbol for all the K users.

H0 and H1 are G x K mixing matrices with the structure

Ho =[Ho,o H01 Hort], H1 =[H1,0 H1,1 Hm]
Such that
L—l
H 0,k = J}; z hlikr (6-10)
1:0
L—l
Hut = JEIZ biz/t1 (611)
1:0

and 50,31 represent the energy of the current and the previous symbol respectively.

6.3.2 WCDMA (UMTS FDD) Signal Model

Consider the forward link (or downlink) for a modern wireless 3G UMTS FDD (best
known as WCDMA) cellular mobile communications network [1-4]. For WCDMA, a
user’s signal stream is ﬁrst spread by a user-speciﬁc OVSF sequence (of length 4-512
chips) followed by a cell-speciﬁc scrambling using long gold codes of length 38400 chips
[3]. This arrangement allows for the signal streams to be distinguishable both at the cell
(BS) and the user (UE/MS) level (see Figure 6.2) [3, 4]. Due to the presence of long
scrambling codes used in the WCDMA downlink, the signal model for the mobile unit
differs from the one discussed in the previous section. The intra-cell multiple user signals
sharing the same multipath environment are considered to be the main cause of the

Multiple Access Interference (MAI) in the received signal for a desired user [57, 121].

151

 

 

 

 

 

 

 

 

 

 

      
   
 

 

 

 

 

 

 

 

 

 

 

 

DPDCH 1 S/P C
Ch] I 1
/DPCCH 1 : Z
l
Cscramb
, ‘ Multipath
S/P 1
DP DC” 2 Cch,2 <+ A WGN '( )
Channel
1
I
l
I
x j ~—
DPDCHK . 8,],
/DPCCHK

 

 

 

 

 

 

 

 

Figure 6.2. Signal generation based on the proposed 3GPP UMTS FDD standard

User speciﬁc data is delivered to the physical layer in blocks often termed as
Transport Blocks (TB), which include bits for CRC, FEC etc. The transport channel
multiplexer (or a parallel-to-serial converter P/S) converts all the data to a single BPSK
stream. After interleaving and addition of physical channel information for transmission
power control (TPC), pilot, transport format combination indicator (TFCI) etc., the data
stream comprises of two dedicated logical channels, one for data (DPDCH) and one for
corresponding control (DPCCH) information, in multiplexed form they form one user
speciﬁc dedicated physical data channel (DPCH) [2]. However, note that in case several
data streams are associated to a single user, they can share the same logical control
channel. Before transmission, the data goes through a serial-to-parallel (S/P) converter
for QPSK modulation. Each user speciﬁc stream is ﬁrst spread using a relatively short
(4-512 chips) OVSF channelization code Cc}, (or real user-speciﬁc spreading code, sk(t) ).

The parallel channels are mapped to the I and the Q branches. At this stage the data

152

streams for all the users in a cell are summed together. This combined QPSK stream is
subsequently scrambled by the long (38400 chips) scrambling code Cscmmb (or cell-level

complex gold code, e(t) ). The collective data for all users is then modulated and

synchronously transmitted via the transmission medium, which is assumed to be a wide
sense stationary slowly fading multipath frequency selective channel. All physical
channels transmitted from a BS are mutually orthogonal at the time of transmission as the
spreading codes are the OVSF (or Walsh Hadamard codes), which are conﬁned to the
symbol period only. Different users can have different data-rates, which is

accommodated by assigning different length OVSF spreading sequences sk (t) in a global

preﬁx-free code assignment tree [3]. The complex gold scrambling code e(t) has a

length of 38400 chips, which lasts for a 10 ms frame [3, 66]. The downlink system is
designed to be tightly synchronous at the frame level [2].

The received signal at the mobile receiver within a transmitted frame" of length T;

can be represented by
N K L-l
r(t) = 2 Z Z bk(n)./g,m(t)h1(t)c(t —r1)sk(t—nT—r,) + n(t) (6.12)
n=1k=l 1:0

where c(t) e {11 ii} is the complex cell-speciﬁc scrambling sequence. Other variables

are deﬁned in a fashion similar to the model (6.1).
The received signal at UE/MS is passed through a chip-matched ﬁlter and sampled at

chip rate. The received vector 7 in this case can be expressed as [122, 125]

7 = HCSb +ﬁ (6.13)
where H, S, b— and H are deﬁned in a fashion similar to (6.3). C is the NGxNG

complex diagonal scrambling matrix with CC” = 2x] NGx NG- N being the number of

153

symbols within a frame time T F. C is deﬁned as: C =diag[c1 c2 cNG];
cie{i:1ii}; lSiSNG

Note that in a physical cellular environment, the unmodeled channel noise has a
more intricate structure than simple additive white Gaussian noise. The additive noise
may include inter-cellular signals especially near cell boundaries or in congested
metropolitan areas, and cross-interference with other services sharing the ﬁ'equency
spectrum [57].

The size of the above model is excessively large due to the dimension of the long
scrambling code. Even the convolutive model, in this case, is not practical due to the
large dimension of the underlying code structure. A number of approaches will be
discussed in the subsequent section to recover the desired user’s symbol train at UE/MS

while keeping in mind the constraints of performance and ease of implementation.

6.4 CDMA User Detection Structures

In this section, the linear detection schemes such as the matched ﬁlter (MF), multipath
matched ﬁlter or RAKE receiver and the linear minimum mean squared error (LMMSE)
detector are described. Other linear detector formulations such as the least squares (LS)
or the zero forcing (ZF) detector, best linear unbiased estimator (BLUE) detector are not
discussed as their performance is worse as compared to the LMMSE detector. For more

details on these detectors, please see [57, 84, 121, 146].

6.4.1 Conventional Detection Schemes

In this section, we describe three conventional user detection schemes dominantly used in

CDMA communication systems. These schemes are the matched ﬁlter (MF), RAKE

154

detector and the linear minimum mean squared error (LMMSE) detector. Also presented

is a practical technique for CDMA channel estimation [57].

6.4.1.1 Matched Filter (MF) Detector

The standard matched ﬁlter is a single user detector which just utilizes the user’s own
signature sequence to make the best possible estimate of user’s transmitted sequence
from the raw chip data received at the UE/MS. The detection algorithm completely
ignores the presence of convolutions due to multipaths in the in the receiver environment
as well as the MAI due to additional users sharing the resources. For the DS-CDMA

system (6.3), MF detector for the i“ user becomes

A
.—

bi,” = s,” 71' (6.14)
where S, = diag[§',- S,- Si], and S,- = [0 s, 0]. s,- being the i’h user’s
signature code, F is the received chip train as deﬁned in (6.3).

In case of WCDMA system, the user’s symbol undergoes spreading as well as

scrambling, therefore the MF detector becomes

A

3,, m1 = SiHCHF (6.15)
where C=diag[c1 c2 cNG];c,-e{ilii}; lSiSNG is the scrambling code

matrix.

Practically, MF can be implemented as
; _ H~
balm-(n) — Si r(n) (6.16)

where

~( ?((n—1)G+l : nG) for DS-CDMA
r n =
CH (n)7((n -1)G+1:nG) for WCDMA

155

and CH(n) = {diag[c(,,_1)c;+1 CnG]}H

6.4.1.2 RAKE Detector

The RAKE detector in CDMA is created using multiple chip-delayed MF detector ﬁngers
in parallel. The RAKE receiver ﬁrst identiﬁes three or more (depending on the number of
ﬁngers) strongest multipath signals arriving at the receiving antenna using a maximal
ratio path search algorithm [57, 121]. These multipath signals are combined after
adjustment for their corresponding delays (and possibly phase and path attenuation if
channel estimation is available) to produce a relatively stronger received signal chip
stream. In this work, the RAKE receiver is assumed to be implemented with the
knowledge of both channel multipath delays (maximal ratio path search) as well as the
corresponding multipath attenuation co-efﬁcients (channel estimation). The RAKE

detector for the DS-CDMA system therefore can be mathematically expressed as

,RAKE 4,1111”? (6.17)

...Q‘I)

where H H is the hermitian of the estimated channel matrix, other variables are as deﬁned

earlier for MF detector. Similarly for the WCDMA system, RAKE detectors is given by

A

b},m=s{’c‘”1‘1”r (6.18)
Similar to (6.16), once the channel estimate is known, RAKE can also be

efﬁciently implemented as

'b’t,a.irr<n)=si’i(n) (6.19)

where

1:1(n)- HH(n)?((n—1)G+1:nG) for DS-CDMA
c” (1019” (n)?((n—1)G+1 : nG) for WCDMA ’
156

C H (n) is as deﬁned above , and H” (n) is the hermitian of the channel estimate MA

ﬁlter.

6.4.1.3 Linear Minimum Mean Squared Error (LMMSE) Detector.

The optimal LMMSE detector is considered to be the best linear detector for DS-CDMA
reception. While other linear detectors such as LS, ZF, BLUE etc. do not provide good
performance in the presence of excessive noise (especially colored noise). LMMSE
detectors do a trade-off by not performing perfect orthogonalization of received signal
stream at low SNRs by trying to minimize additive noise variance.

The LMMSE detector for single stage spreading DS-CDMA systems is given by
Sinner =51” H”(ozﬁﬁ” +21“? (6.20)
where R = r;[? 7H ] = (0'2HHH + Q) is the auto-correlation matrix for the signal chip

train received at the UE/MS, 0'2 being the average power of transmitted user signals.
For the WCDMA system, the symbol recovery needs to be descrambled before

dispreading, and the LMMSE detector becomes

5:,LMMSE = 51” C HI? ”(01231119 H + Q47 (621)

There are several difﬁculties in the implementation of the LMMSE receiver.
Practically, the dimension of H especially in case of WCDMA is impractical. Typically
LMMSE is implemented on mG-sized blocks of received data and is called block
LMMSE (B-LMMSE).

Observe that the terms on the left side of matrix inverse in (6.20) and (6.21) are

similar to (6.17) and (6.18), respectively. Therefore LMMSE detector can typically be

implemented by adding a pre-processing stage before RAKE receiver. The data

157

correlation matrix may be adaptively estimated resulting in adaptive LMMSE
implementations [44, 84, 96, 146].

For DS-CDMA systems, since the number of users K is typically smaller than the
spreading gain G, the user data occupies a sub-space only. In such cases, instead of
inverting the autocorrelation matrix R , the data-dimensions may be reduced without any

signiﬁcant performance degradation [142] using eigen decomposition, i.e.
Farmer =SFCHﬁH<I7SD§1VsH>r (6.22)
where [)3 and 173 are the estimates of the K dominant eigen-values and eigen-vectors of

the auto-correlation matrix R. These eigen values and vectors can also be computed

adaptively, see [52, 62].

6.4.1.4 Channel Estimation Using The Common Pilot Channel (CPICH)

The Common Pilot Channel (CPICH) is a ﬁxed rate (30 kbps, G=256) downlink physical
channel that carries a pre-deﬁned bit sequence [1-4]. Each cell has a Primary Common
Pilot Channel (P-CPICH) and possibly one or more Secondary Common Pilot Channels
(S-CPICH). What conﬁguration of these pilot channels is available and whether they can
provide phase reference and possibly channel estimation for the dedicated data channels
DPCH is communicated to the UE/MS by higher-level signaling. It is possible in a
UMTS implementation that neither the P—CPICH nor any S-CPICH is a phase reference
for any downlink DPCH [4].

If applicable, CPICH allows the UE/MS to equalize the channel in order to
achieve a phase reference with the SCH (Synchronization Channel). CPICH also allows
active estimation for power control. Primary CPICH always uses the same channelization

code.

158

6. 4.1.4.1 Channel Estimation Procedure

In order to outline a simple and practical channel estimation technique using CPICH [1,
4], we assume that the transmitted pilot comprises of a stream of a single symbol train of
1—1’ , and uses an all ones spreading code (i.e., the ﬁrst OVSF code)

1. For each independent multipath, multiply the incoming symbol train with the
corresponding delayed scrambling sequence.

2. Remove the modulation form CPICH by simply multiplying the CPICH data by
its conjugate, i.e., 1+i. The resulting channel estimate is noisy because of AWGN
and multiple-access interference.

3. Pass the noisy channel estimate through a smoothing ﬁlter to achieve better noise

immunity. This ﬁlter can be either a moving average window of length 2M+1, e.g.,

 

. 1 E;
hi = pik (6-23)
2M +1 k=_M ’

Since, a moving UE/MS represents a dynamic channel, the channel estimates may
be processed using a relatively smaller factor M followed by a single-zero

smoothing ﬁlter

II,- = (l — p)l-1;- + p11,; where 0 S p S1 (6.24)
4. Decimate or interpolate the ﬁltered channel estimate obtained in step 2 to match

the data rate of the CPICH to the data rate of the DPCCH/DPDCH. This simple
technique works well in most cases because the channel is assumed to be stable for

the symbol duration.

159

 

 

 

 

 

 

 

 

 

CPI CH Smfotlrthmg Data kRate hi
from each multipath 1 ter Mate lng Channel Estimate
(CPICH)” '

Figure 6.3. Channel Estimation Using CPICH

 

 

 

This simple technique works well in most cases. The channel is typically assumed to be

stable for symbol duration.

6.4.2 Proposed Blind Source Recovery Detection Schemes

In this section, three new MUD detection algorithms using BSR techniques are presented.

6.4.2.1 Natural Gradient Blind Multi-User Detection (BMUD) Algorithms.

Blind Multi-user Detection (BMU D) is the process to blindly estimate all the. user symbol
sequences directly ﬁ'om the received composite CDMA signal using the Blind Source
Recovery (BSR) techniques. BSR framework implies recovery of original signals from
environments that may include transient, convolution and even non-linearity [109, 110,
112, 126]. The linear BSR algorithms [110, 112, 129] have been developed for linear
convolutive mixing environments by the minimization of mutual information (Kullback
Lieblar Divergence) using the natural gradient subject to the structural constraints of a
recovery network. The natural gradient BMUD network can be either in the feedforward
or the feedback conﬁguration [112]. The proposed BMUD algorithm adaptively estimates
a set of matrices to counter the linear convolutive environment model (6.9).

The justiﬁcation for BMUD algorithms is based on the convenient convolutive
signal model representation of DS-CDMA systems, see (6.9), and the reasonable

160

assumption that the various transmitted user symbol sequences are mutually
“independent” as they are generated independently of each other. In this framework, both
the transmitted sequence and the mixing matrices in the model (6.9) are unknown to the
user. The only known entity to the user is the self-identiﬁcation code. Other available
prior information is the nature of transmitted data, which is assumed to be quaternary
sub-gaussian distribution, e. g., QPSK data distorted by the multipath channel and
additive noise. There exists enough information to apply the info-theoretic natural
gradient Blind Source Recovery (BSR) algorithms for BMUD in this case [55, 62, 101,
123,124]

Further assume that the DS-CDMA channel is not over-saturated and K S G. The
proposed BMUD algorithms do not require any pre—whitening of received data. However,
in most modern WCDMA, G is chosen to be very large and in general K < G. Therefore,
it is computationally advantageous to pre-process the data for dimension reduction to K
which is the actual number of principal independent symbol sequences (or users) in the
received data. The process of pre-whitening will also remove the second order
dependence among the received data samples and some of the additive noise [33, 39, 62,
101, 123-125]. The data pre-whitening can be achieved either online using adaptive
principal component analysis (PCA) algorithms or it may be done algebraically [33, 52,
60,98]

The pre-whitening matrix can be computed as

_. l
W = D, AV,” (6.25)
where D, and V, are as deﬁned in section 6.4.1.3.

The whitened version of (6.9) is given by

161

rnw = W(Hobn + Hlbn-l + n") E [701)" + ﬁlbn-l (6.26)
where
5;” represents the G-d whitened data received during the n’h symbol time and H0

,H1 represent the equivalent G xK convolutive mixing matrices for the current and the

previous symbol.
BMUD algorithms blindly adapt a set of matrices to estimate the independent user

symbols y" at the n’h instant. This is followed by a decision stage to interpret, as best as
possible, yn and estimate the corresponding user symbol estimates b” also at the n'h
instant.

13.. = My.) (627)
where 111(.) : represents the (nonlinear) decision stage.

The update laws for both feedforward and feedback structures [110, 112] have
been presented. In the next section, the performance of the proposed algorithms is

discussed and compared with conventional user detection algorithms.

162

6. 4. 2. I . I F eedforward BM UD Conﬁguration:

 

 

 

 

 

 

 

 

 

3

 

 

 

 

 

 

 

 

 

 

 

Figure 6.4. Feedforward Demixing Structure

For the feedforward conﬁguration, the BMUD stage output is computed as

K
y" = Worry + Z Wkrgtk ' (6.28)
k=l

The update laws for this feedforward structure can be easily derived, using the

techniques outlined in Chapter 3. For the feedforward parametric matrices W0 and Wk ,

the update laws are
We m(I—wonbf.’ )Wo (6.29)
and
H
4W1 06 (1 —¢(yn)yrII-I )Wk —¢(yn)(rn”:k) (6.30)

where o(.) is an element—wise acting score function [12, 27, 36, 37, 60, 114, 127, 130,

136], and I is a K-d identity matrix.

163

For the initialization of the algorithm, W0 is chosen to be either an identity or a
diagonally dominantly matrix, while all other matrices Wk are initialized to have either

random elements with a very small variance or as a matrix of all zeros [110, 112, 155].

Note that no matrix inversion is required for the feedforward algorithm.

6. 4. 2. 1.2 Feedback BM UD Conﬁguration I:

 

 

3

 

 

 

£__

 

 

3

 

 

 

 

 

 

 

 

 

 

 

Figure 6.5. Feedback Demixing Structure 1

In the feedback conﬁguration I, the output is estimated by

K
ya = W0“1 [Div — Z WkYn—k] (6-31)
k=l

The update laws for this structure using the natural gradient have been derived in

[112]. The update law for the matrix W0 is given by
H
AW0 °C —Wo (1 -¢(yn)yn ) (6.32)
While for the feedback matrices Wk , the update law is

164

H
4W1 DC We [u(ynm—k) (6.33)

The matrices in this case are also initialized in a fashion similar to the feedforward case.

However, note that at least one matrix inversion is required in this formulation.

6. 4. 2. 1.3 Feedback BM UD Conﬁguration II:

 

 

 

 

 

 

 

r- “‘4‘“
3

 

 

E

 

 

 

 

rX——>"6

 

 

 

 

 

 

Figure 6.6. Feedforward Demixing Structure 11

An alternate feedback conﬁguration, see Waheed et a1. (Waheed, 2003 #630},
implements the feedback structure without the need for any matrix inversion. For this
feedback conﬁguration 11, the BMUD stage output is computed as
K
ya = WOrriv 11 Z kan—k (6.34)
k=1
The update laws for this feedback structure have been derived in [110, 112, 155]

and are given by

165

4% at (I «out»? )Wo (635)

Mt at (I won»? )Wk + <00.)th ’ (6.36)

In case the channel can be estimated [57], the performance of the proposed
BMUD algorithms may be adjudged by the diagonalization of the absolute value of the
global transfer function. The global transfer function presents the combined effect of the
complex mixing and demixing transfer functions. For the two symbol convolutive models
for the case of L S G , the global transfer function for the natural gradient algorithms in

the z-domain are given by [123-125]:
G = GO + (112*1 + 622-2 (6.37)
where, for the feedforward conﬁguration

Go = Woﬁo = WOWHO’
G1 = W017] + ME, = WOWHI + WIWHO and , (6.38)
62 1" W1‘71 = WIWHI

while, for the feedback conﬁguration I

G0 = WO-lﬁo = WO—IIVI‘Io,
Gl =W0"1(H1—W1)=W0’1(IWI1 —W1) and (6.39)
02 =

and for the for the feedback conﬁguration 11

GO = WOHO = WOWHO,
G1 = Woﬁl + W1 = WOWHI + W1 and (6.40)
02 = o

The proposed BMUD algorithms as formulated result in recovery of the user

symbols directly. The algorithms can be conveniently applied to DS-CDMA systems

166

using only user-speciﬁc spreading sequences. They may also be extended to CDMA
systems using relatively short scrambling codes, though the dimension of matrices may
still become large for DSP implementations in a UE/MS. The WCDMA system uses long
codes in the downlink, making the application of these algorithms impractical because of

the requisite dimension of the demixing network matrices.

6.4.2.2 RAKE-Blind Source Recovery (RAKE-BSR) and RAKE-Principal

Component Analysis (RAKE-PCA) Detectors

RAKE-BSR and RAKE-PCA are two new proposed adaptive detectors [122, 125], which
utilize the same knowledge as a RAKE receiver. An info-theoretic adaptive weighting
matrix of dimension G x6 is introduced into the RAKE structure, which gives a big
performance boost to the RAKE receiver. The performance of RAKE-BSR/RAKE-PCA
exceeds the performance of LMMSE detectors under the conditions of high network
congestion, imprecise channel estimation, and unmodeled inter-cellular interference etc

[57, 122, 125]. The closed form structure of these proposed detectors is given by [122]

3 _ Si” WHH'r' for DS-CDMA Systems (6 41)
"RAKE—ICA / PCA — 51H CH WHHF for WCDMA Systems °
where W = diag[A A A], and A is the G x G matrix that is adaptively estimated

either using static BSR (ICA) or PCA algorithms.

It is proposed to adapt the matrix A using the natural gradient update laws [7, 9,
122]. However, there exist several other methods for ICA/PCA and any other suitable
method may be used for these adaptations [21, 60, 77]. This blind adaptation of the A
matrix has several advantages and improves the performance of the overall equalization

process in several ways. Firstly, it can counter artifacts in the estimated channel co-

167

efﬁcients H . Secondly, the channel estimation process (as in RAKE receivers) may be

limited by the structure (such as number of ﬁngers) and may estimate only a few of the

dominant channel parameters. W stage tends to counteract this anomaly, as best as
possible, and provides better performance than LMMSE in such cases [122]. Thirdly, this
adaptive stage minimizes the effect of the additive channel noise, which may have an
unmodeled intricate underlying structure. Lastly, the natural gradient ICA /PCA
algorithms inherently reduce near-far problems by removing any ill conditioning in the
signal space for all the users in the system. This results in all the mobile users in the
system to have a BER performance similar to the average BER performance of the

downlink channel [57, 122]. The matrix A is adaptively estimated using the update laws

A(k +1) = A(k) + ntAA(k) (6.42)
where
H .
AA(k) = {(1 — rp(y(k))y(k) )A(k) for static BSR (or ICA) (6.43)
(1 — y(k)y(k)H )A(k) for PCA

and (0(.) is a nonlinear score function [7, 11, 12, 28, 36, 126, 130, 138] which depends on

the underlying distribution structure of the signals involved. For QPSK signal, a suitable

score-function [136] is
$101) = 0m — “1' 031111031 Re{yi}) + tanh (.31 MM») (6-44)
Of these proposed RAKE-BSR/RAKE-PCA structures, RAKE-BSR exhibits
relatively faster and more stable convergence [110, 155]. However, in CDMA systems

the underlying code structure is chosen to be “orthogonal”, and RAKE-PCA may

converge to a slightly lower BER solution if the channel impairments are linear in nature.

Note that in (6.41), if the channel estimate H [57] is either not available or changes very

168

dynamically, the detector can be estimated without using the channel estimate and the

structure reduces to Matched Filter BSR/PCA, i.e.,

8371177 for DS-CDMA Systems

. _ = 6.45
“W BSR/PCA {sﬁCHWr for WCDMA Systems ( )

I)

Q"

The performance of this structure is better than MF alone, and approaches RAKE
performance as the underlying matrix A converges. However, in this chapter, this

structure will not be discussed any further.

6.5 Simulation Results

To verify the performance of the proposed BSR algorithms, a series of experiments for
both DS-CDMA and WCDMA systems were conducted. Some selected results are being
provided for the purpose of illustration. The proposed algorithms can be easily extended
to multirate CDMA transmission. However, for a clear comparison of the proposed
algorithms with the conventional approaches, the simulations have been restricted to the
case where all the users have the same data rate. Assuming a constant spreading gain, G =
64. The channel is assumed to be a wide sense stationary slowly fading with several
possible multiple paths to the UE/MS. The transmitted signal is also corrupted by
additive white gaussian noise (AWGN) during the transmission. The simulated SNR
range for all simulations is from —10 to 20 dB. The dominant multi-path delays and
attenuation co-efﬁcients for the signal received at the UE/MS are assumed ﬁxed (i.e., the
UE/MS is assumed to be static). Further all the multi-path attenuation co-efﬁcients are
assumed to be complex, i.e., each multipath applies both scaling and rotation to the

propagated signal. For all the included simulations, the received CDMA signal comprises

169

of ﬁve multipaths with delays of 0, 1, 2, 3, 4 chips. The corresponding channel

attenuation co-efﬁcients are chosen to be.

h=[0.25+0.18i 0.21+0.14i 0.18+0.11i 0.14+0.11i 0.11+0.07i],respectively.

The multipath channel estimate is either estimated online [57] or assumed to be

known unless otherwise speciﬁed. The score ﬁmction for both the BMUD and proposed
RAKE-BSR algorithms is chosen to be (12,-(n): yi —(tanh(Re{y,-})+ tanh(1m{y,-})).

The UE/MS and BS are assumed to be in perfect synchronism. A decaying time-adaptive
learning rate is chosen for all the adaptive algorithms. The B-LMMSE algorithm is
applied on a block size of G chips, the auto-correlation matrix (for the B-LMMSE
algorithm) is computed from the whole ensemble of the received data. The conjugate of
the channel ﬁlter H, is applied recursively. These steps were done to ensure that the
performance of the simulated B-LMMSE is not restricted due to implementation. In order
to efﬁciently utilize the space, for each simulation, presentation is restricted to just two
congestion scenarios of 20 users (approx. 30% congestion) and 50 users (approx 80%
congestion).

The ﬁnal symbol decision stage for all algorithms is given by
y(yn) = sign(Re{yn})+sign(Im{yn})i (6.46)

Any sign ambiguity in the recovered symbols is ﬁxed using the pilot bits in each user’s

dedicated data channel.

6.5.1 Simulation 1: DS-CDMA System

In this simulation, a DS-CDMA systems is simulated and modeled as in (6.9). All the 5

multi-path channel co-efﬁcients and delays are assumed known. The system has been

170

simulated using two types of spreading codes, namely gold codes with a spreading gain G
= 63, and OVSF (or Walsh Hadamard) codes with a spreading gain G = 64. The
simulation results are presented below in Fig. 6.6.

It is evident from the presented results that the performance of proposed BMUD
algorithm far exceeds any other user detection method. Under very good SNR conditions,
the BER performance of B-LMMSE using gold codes improves over the proposed
BMUD algorithm. This is because the channel estimate is assumed to be perfectly known
for B-LMMSE and secondly, the BMUD algorithm is applied in instantaneous (on-line)
mode with 5000 iterations only. The performance of the BMUD exceeds B-LMMSE if

the algorithm is allowed to adapt longer.

171

 

DS-CDMA Downlink using Gold Codes

0 - G = 63, H Known
10 I I

I

 

.4
4

A A

 

 

_ 1' '3 “L w;— ""~9—1‘—.— _ I“ 5..
10 I ‘ggggbl-\j1-V---~V---——V

-\ ‘-~~

: —e— MF 20 users k‘ ‘V\\g 1
r -- +— RAKE 20 users \\ \‘\«i
* —e— B-LMMSE 20 users 8. ‘
~-e’— B-MUD 20 users \\ <
—e— RAKE-PCA 20 users \19
—e— RAKE-BSR 20 users 4,
— A~ MF 50 users :

- RAKE 50 users
— o- B-LMMSE 50 users
- v— B-MUD 50 users
— o— RAKE-PCA 50 users
— a- RAKE-BSR 50 users

-10 -5 0 5 10 15 20

 

BER

10-

VWVTVr

 

 

 

 

 

 

 

 

 

(a)

 

o DS-CDMA Downlink using OVSF Codes - G = 64, H Known
10 . . . . . ,

 

 

 

 

10 MF 20 users
RAKE 20 users
B-LMMSE 20 users \
B-MUD 20 users
RAKE-PCA 20 users
RAKE-BSR 20 users
MF 50 users
RAKE 50 users
B-LMMSE 50 users
- v— B-MUD 50 users
- <>— RAKE-PCA 50 users
5 - a- RAKE-BSR 50 users

-10 -5 0 5 10 15 20

811111111

 

 

 

 

 

 

(b)
Figure 6.7. DS-CDMA downlink performance for K=20, 50 users (3) using Gold Codes
G=63, (b) using OVSF Codes G=64.

172

The performance of the proposed RAKE-PCA and RAKE-BSR algorithms is also
better than B-LMMSE at lower SNRs and under congestion of the DS-CDMA system.
However, for lower congestion of the system and high SNR,‘the B-LMMSE performance
supercedes these algorithms if the channel estimate is perfect. In case, the channel

estimate is imprecise [57], the proposed algorithms are always better than B-LMMSE.

6.5.2 Simulation II: WCDMA (UMTS FDD) System

For the WCDMA Downlink simulation, 5 multipaths are assumed to exist similar to the
previous simulation, all the user-speciﬁc codes are OVSF with a spreading gain G of 64.
The long scrambling code has the frame-length of 38400 chips (10ms). A performance
comparison of the proposed RAKE-BSR and RAKE-PCA algorithms with conventional
RAKE, B-LMMSE under various network conditions is presented in Figures 6.7 and 6.8.
In the ﬁrst comparison (Fig. 6.7.a), the channel estimation is done for all the multipaths
in the signal generation model. It is observed that the performance of the proposed
algorithms RAKE-PCA and RAKE-BSR is very similar to B-LMMSE for lower SNR
values. Under good SNR conditions, the performance of the B-LMMSE algorithm is
better than the proposed algorithms.

However, note that for the B-LMMSE the auto-correlation matrix is estimated
using 5000 symbols, while only 1000 instantaneous adaptations (for K = 50) are done for
the proposed algorithms. In case more adaptations are done, the performance of proposed
algorithms will approach the LMMSE limit, but even the performance attained with a

limited number of iterations exhibits their effectiveness.

173

 

 

o UMTS Downlink Simulation - G = 64, L = 5, Rake Fingers = 5

j I

10

    

BER
3

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

—+— RAKE 20 users
.9” B-LMMSE 20 users \ 3
3 ~A— RAKE-PCA 20 users
10' r —a— RAKE-BSR 20 users
E —+‘- RAKE 50 users 1
i —e— B-LMMSE 50 users \ it
’ —A— RAKE-PCA 50 users ‘5
—e— RAKE-BSR 50 users 0
.4 l L I l l
10-10 -5 o 5 1o 15 20
SNR
(a)
0 UMTS Downlink Simulation - G = 64, L = 5, Rake Fingers = 3
10 ﬂ 1 ' ' '
‘~..‘ ===‘¢‘-.- 1
L ,...,“ 1““9
10"?
t
(I -
LIJ 10 2 r
m .
_w RAKE 20 users
——0— B-LMMSE 20 users
_3 -—er— RAKE-PCA 20 users
10 .— —e— RAKE-BSR 20 users
3 —+— RAKE 50 users
[ -, e— B-LMMSE 50 users
- A- RAKE-PCA 50 users
’ —B— RAKE-BSR 50 users
‘4 1 1 1 1 1 J
10-10 -5 0 5 10 15 2'0
SNR
(b)

Figure 6.8. WCDMA Downlink System for K=20, 50 users (a) Performance Comparison
with Perfect Channel Estimation, (b) Performance Comparison with

Imperfect Channel Estimate

174

 

 

ﬂ I

O UMTS Downlink Simulation - G = 64, L = 5, Rake Fingers = 5
10 T Y i ‘

.J

 

BER

 

i —¢—— RAKE 20 users
. -e— B-LMMSE 20 users ‘
- ~e— RAKE-PCA 20 users J
. —e— RAKE-BSR 20 users
1— RAKE 50 users
» -o— B-LMMSE 50 users
—A— RAKE-PCA 50 users
—a- RAKE-BSR 50 users

 

 

 

 

 

.2

1 L l l l l

0-10 -5 0 5 10 15 20
SNR

 

 

 

 

Figure 6.9. WCDMA Downlink System for K=20, 50 users: Performance

Comparison with Imperfect Channel Estimate and Inter-Cellular Interference

In the second comparison (Figure 6.7.b), the channel estimate is restricted to the
three dominant paths only, which correspond to the choice of delays 0, l and 2 in this
case. In this case it is observed that the detection performance of the proposed RAKE-
BSR and RAKE-PCA algorithms exceeds B-LMMSE at all SNRs. Comparing both
RAKE-PCA and RAKE-BSR, it is observed that the performance of RAKE-PCA is
approximately 1% better than RAKE-BSR. But RAKE-BSR has other advantages (as

discussed in section 6.4.2.2) of stability and smaller energy of the computed filtering

matrix W. Another important observation in this case is that at very low SNRs, RAKE
gives the best performance, but as the SNR improves the proposed algorithms can

achieve very small BER. Therefore, a SNR based switching mechanism can be developed

175

to switch between the standard RAKE and the proposed algorithms, which just constitute
an additional adaptive stage in the structure of the RAKE receiver.

In the third scenario (Figure 6.8), both the channel estimate is assumed to imprecise
as in the previous comparison with only three dominant multipaths out of ﬁve estimated.
In addition, the received signal is corrupted by extra-cellular signals of the neighboring
cell BS. This is a realistic scenario in busy metropolitan areas, where there exist several
dominant multipaths and the cell size is also kept relatively small to maximize the
number of users per unit area. Inter-cellular interference is also critical when the UE/MS
is on the cell boundary and undergoes a soft (or soft-soﬁer) hand-over. The intercellular
interference is unmodeled MAI and severely limits the performance of the detection
algorithms. For the purpose of simulation this auxiliary BS interference is assumed to
have half the energy of the primary BS. It is observed that the proposed RAKE-
BSR/RAKE-PCA algorithms exhibit better immunity to this excessive unmodeled MAI

and retain their qualitative performance advantage over the conventional algorithms.

6.6 Remarks

Three adaptive info-theoretic algorithms for CDMA systems have been proposed. Firstly
a purely blind multi-user detection (BMUD) algorithm for the case of DS-CDMA
systems, or WCDMA systems using short codes has been proposed. This algorithm gives
better performance relative to all other user detection algorithms. This performance
advantage further dominates conventional techniques under practical constraints of
accuracy in on—line channel estimation and unmodeled environment MAI effects. Two
other info-theoretic extensions have also been proposed on top of the standard RAKE

detector namely RAKE-Blind Source Recovery (RAKE-BSR) and RAKE-Principal
176

Component Analysis (RAKE-PCA). These detection schemes add an adaptive info-
theoretic stage, based on higher-order statistics for RAKE-BSR and second-order
statistics for RAKE-PCA, to the standard RAKE detector. This makes the resultant
hybrid detector more robust to imperfections in channel estimates; unmodeled MAI and
other slowly varying channel effects etc. Of these proposed algorithms, RAKE-BSR is
more immune to synchronization errors between UE/MS and BS, possesses faster
convergence and is better capable to cater for various dynamic and time-varying channel
effects as compared to RAKE-PCA. However, for the presented WCDMA results RAKE-
PCA demonstrates a slight performance advantage. This is due to the fact that the
simulated channel imperfections are only linear, the channel synchronization is perfect

and the underlying code structure is orthogonal.

177

Chapter 7

CONCLUSIONS

This dissertation is dedicated to the stochastic blind adaptive nonlinear signal processing
problem of Blind Source Recovery (BSR). Blind Source Recovery represents
unsupervised estimation of original source signals directly from the measurement space,
with or without precise MIMO environment identiﬁcation. A discussion on the BSR
problem deﬁnition, applications, choice of the state space representation and practical
issues is included in Chapter 1. In chapter 2, a focused survey of the major theoretical
development in the ﬁeld of BSR as related to this work has been provided. The natural
gradient and its application to the static and dynamic BSR adaptation has also been
reviewed.

Chapter 3 is the crux of this work. Using state space as a compact network
representation, the BSR optimization framework has been described for both non-linear
and linear structures. The natural gradient learning has been incorporated to develop
equivariant linear state space update laws for the cases of minimum phase and non-
minimum phase environments. In the case of minimum phase environments, separate

adaptation algorithms have been developed for feedforward and feedback demixing

178

networks. It is demonstrated, via simulations, that the derived BSR algorithms exhibit
robust convergence irrespective of the adopted network structure and various parameter
initializations. Further, it is observed that based on the initialization of parameters the
overall network may converge to different but equivalent state space representations.
Since BSR denotes blind recovery, state space BSR algorithms have been extended using
parametric source distributions. A couple of formations are introduced in chapter 4 with
the capability to estimate the types of the underlying source distributions and eventually
control the shape of the nonlinear score function in all the BSR update laws. These
complementary adaptation algorithms add the capability to estimate the underlying
distribution structure of original sources to the BSR framework. This results in extensive
algorithms that can be applied in an autonomous fashion to mixtures of multiple source
distributions. This is an important development as typically the nature of corruption in the
measurement space is unknown. It is demonstrated via simulations that the proposed
adaptive score functions can indeed efﬁciently capture the stochastic nature of actual
sources and cause proﬁcient convergence of BSR algorithm for mixtures of several
different types of sources.

Although it is theoretically tractable to have the dimension of mixture space equal
to the number of original sources, but in a blind setting the actual number of sources is
unknown. This deﬁciency in the a priori knowledge or the structure of a recovery
problem may ensue an undercomplete or an overcompete BSR problem. The
transformation of an undercomplete BSR mixture into a complete BSR mixtures has been
discussed and related issues have been pointed out. For the overcomplete mixtures, there

exists no solution unless the mixture can be projected onto a sparse temporal or spectral

179

analysis space. Algebraic ICA algorithm has been proposed to perform static mixing
environment estimation for sources that exhibit sparseness in the time-domain. AICA
approach has been combined with an Ll-norm interior. point linear programming
approach to recover overcomplete mixtures of speech.
Chapter 6 discusses the application of BSR framework to the domain of CDMA
wireless communication networks. An in-depth signal model has been utilized to develop
a good understanding of the basic DS-CDMA implementations and the modem 30
WCDMA or UMTS FDD implementations. Three BSR based solutions are presented for
the critical forward link multi-user detection in these networks. Extensive comparison of
the proposed BSR user detection schemes with conventional downlink user detection
formats indicate their effectiveness and robustness under conditions of congestion,
channel estimation artifacts and unmodeled multiple access interference.
Quadratic Independence Measure (QIM) and Algebraic Matrix Distance Index
(AMDI) are two new BSR performance measures that have been an outcome of this work
and are presented in appendices D2 and D.3.
The major highlights of this work include
0 Using an info-theoretic, parametric, state space BSR optimization framework and
efﬁcient natural gradient learning, robust BSR update laws have been derived for
various state space arrangements.

0 Structural environment conﬁgurations such as the minimum phase and the non-
minimum phase have been treated comprehensively. Linear feedforward and

feedback structures for BSR have been proposed and implemented.

180

The convergence of the natural gradient BSR algorithms has been investigated, via
extensive simulations, for various demixing network structures and parameter
initializations.

Investigation into efﬁcient parametric modeling of the source distributions, resulting
in two new source density models proposed for sub- and super-gaussian sources,
respectively.

Adaptive estimation of underlying source distributions was achieved for linear
convolutive BSR. This results in completely blind algorithms, which do not even
require the knowledge of the type of underlying source distributions to be estimated.
The comprehensive algorithms are also capable to demix / decorrelate mixtures of
multiple source distributions.

A new data space, cost-effective, algebraic independent component analysis
algorithm, called Algebraic ICA (AICA) has been introduced.

In conjunction with source inferencing algorithms for overcomplete mixtures, AICA
has been applied to the realm of overcomplete blind source separation of speech with
encouraging results.

Three new BSR based multi-user detection algorithms have been proposed to the ﬁeld
of CDMA wireless communication networks. The proposed algorithms have been
simulated for both generic DS-CDMA and modem 36 WCDMA (and CDMA-2000)
communication standards. The BSR MUD algorithms demonstrate a favorable
performance advantage to conventional user detection schemes for CDMA wireless

communication networks.

181

Although, the dissertation concludes here, but there are several, yet to be tackled,
open issues in BSR. Some of the possible extensions of this work are

0 For the case of nonlinear environments, efﬁcient BSR algorithms need to be derived
and their performance needs to be characterized for various different types of
nonlinearities.

- BSR performance evaluation of the Algebraic ICA algorithm needs to be done on
more realistic sparse overcomplete data, e.g., electromedical data (i.e., EEG, MEG to
name a few), astrophysical data etc.

o BSR formulations have been applied to the CDMA downlink wireless communication
networks in this work. Currently the work is being extended to the CDMA uplink
channel. There are several other applications in communication systems, wireless
networks, acoustics, imaging, text, astronomy, ﬁnance etc., where application of BSR

techniques can be very useful.

182

APPENDICES

183

Appendix A _

List of Abbreviations and Acronyms

3G
3GPP
AICA
AMDI

ARMA
ARMAX

AWGN
BER
BLER
B-LMMSE
BLUE
BMMR
BMUD
BP
BPSK
BS
BSD
BSR
BSS
cdf
CDMA

3rd Generation (wireless networks)

3rd Generation Partnership Project
Algebraic Independent Component Analysis
Algebraic Matrix-Distance Index
Autoregressive ﬁlter

Autoregressive Moving Average Filter

Autoregressive Moving Average Filter with Exogenous
Variables

Additive White Gaussian Noise
Bit Error Rate

Blocking Error Rate

Block Linear Minimum Mean Squared Error Detector
Best Linear Unbiased Estimator
Blind Mixing Matrix Recovery
Blind Multi-user Detection
Basis Pursuit

Binary Phase Shift Keying
Base Station

Blind Source Deconvolution
Blind Source Recovery

Blind Source Separation
Cumulative Density Function

Code Division Multiple Access

184

CDMA2000
CPICH
CRC
DM
DPCCH
DPCH
DPDCH
DS-CDMA
EPP

F B

FDD
FEC

FF

FIR
GGASF
GICA or geo-ICA
GTF
HASF
ICA

IID or iid
IIR
IP-LP
lS-95

KL
LMMSE
LMMSE
MA
MAI
MCBD
MF
MIMO
MISI

Wireless Standard Name (U.S. equivalent of WCDMA)
Common Pilot Channel

Cyclic Redundancy Check

Distortion Measure

Dedicated Physical Control Channel

Dedicated Physical Channel

Dedicated Physical Data Channel

Direct Sequence Code Division Multiple Access
Exploratory Projection Pursuit

F eedbaek Network

Frequency Domain Duplex

Forwared Error Correction Codes

Feedforward Network

Finite Impulse Response

Generalized Gaussian Adaptive Score Function
Geometric Independent Component Analysis
Global Transfer Function

Hyperbolic Adaptive Score Function
Independent Component Analysis

Independent Identically Distributed

Inﬁnite Impulse Response

Interior Point Linear Programming

EIA Interim Standard 95 (also called CDMAone)
Kullback Lieblar Divergence

Linear Minimum Mean Squared Error

Linear Minimum Mean Squared Error Detector
Moving Average Filter

Multiple Access Interference

Multichannel Blind Source Deconvolution
Matched Filter

Multiple Input Multiple Output

Multichannel Inter-symbol Interference

185

 

MLE
MS
MSE
MUD
OSE
OVSF

P/ S
PCA
pdf
PDF
PN
QIM
QoS
QPSK
RAKE-BSR
RAKE-PCA
S/P

SF
SINRM
SM
SNR
SUD
TB
TDD
TF
TFCI
TPC

UMTS
WCDMA
ZF

Maximum Likelihood Estimation
Mobile Station

Mean Squared Error

Multi-user Detection
Overcomplete Source Estimation

Orthogonal Variable Spreading Factor Codes (Variable Length
Walsh Codes)

Parallel to Serial Converter

Principal Component Analysis

Probability Density Function

Probability Distribution Function
Pseudo-noise

Quadratic Information Measure

Quality of Service

Quaternary Phase Shiﬁ Keying

RAKE with Blind Source Recovery Detector
RAKE with Principle Component Analysis Detector
Serial to Parallel Converter

Spreading Factor

Maximum Signal to Interference plus Noise Ratio
Separation Measure

Signal to Noise Ratio

Single User Detection

Transport Block

Time Domain Duplex

Transfer Function

Transport Format Combination Indicator
Transmit Power Control

User Equipment

Universal Mobile Telecommunication System
Wideband Code Division Multiple Access

Zero Forcing

186

Appendix B

Kullback Lieblar Divergence: A
Performance Functional for Blind Source

Recovery

Kullback Lieblar Divergence (or the relative entropy) [3 8, 69, 72, 100] between two

probability laws Q(X = xi) = qi and P(X = x,) = p,- is deﬁned as

KX= jp,m(%)dx (13.47)
xeX

The relative entropic divergence of the mutual probablilty distriburion function
(pdf) of a random output vector y (Kullback-Lieblar divergence) with respect to the
product of marginal distribution functions is a measure of the independence of the
maginal output variables. Kullback—Lieblar divergence is, therefore, an appropriate
performance measure for blind source recovery based on the minimization of mutual

information.

187

In the continuous case this relation is given by

 

 

 

L(y>= pronn ,IW) dy (8.48)
yEY pri(yi)
'=l

where

py (y) is the probability density function of the random output vector y
p Yi (Yr) is the probability density function of the ith component of the output vector y

This functional L0») is a “distance” measure with the following properties

i) L( y) 2 0 (13.49)
ii) L(y) = 0 if‘fpyb’) = H pyi (yi) (B50)
i=1

This measure provides an estimate of the degree of dependence among the various
components of the recovered output signal vector and is appropriate to be used as the
objective functional in the optimization framework of the BSR problem.

For the discrete case, the functional becomes

 

L(y)= Z py(y)ln "py(y) (B-51)
yEY pri (yi)
i=1

 

 

The functional can be further simpliﬁed assuming the statistical properties of the
output signals to be ergodic.

188

 

k
L(y(k)): Z py(y(k))1n py(y( » (3.52)

”Y H py, (yi(k))
i=1

 

 

Further simpliﬁcation can be achieved using the assumption that as the algorithm
approaches convergence the components of the output vector y(k) will become
statistically less dependent, ultimately approaching independence as close as possible.
Therefore the above functional can be re-written in its simpliﬁed mutual information
form using the entropy of signals, namely,

n
L(y(k)) =-H(y(k))+Z;Hi(yi(k)) (3.53)
,=
where

H(y(k)): Entropy of the signal vector y(k), given by

r— I py(y)ln|py(y)|dy Continuous Case
: _ = er .
H(y) E[lnlpy 00'] T - Z Py (Y)ln|Py()’)l Discrete Case (B 54)
L er

 

H ,- ( yi(k)) : Marginal entropy of a component signal yi(k).

189

Appendix C ‘

State Space Network: Specialized

Filtering Structures

One of the advantages of using state space model is that most signal processing ﬁlter
structures form special cases where the constituent matrices of the dynamic state equation
take speciﬁc forms. As an example, implementation for both the Inﬁnite Impulse
Response (IIR) and the Finite Impulse Response (FIR) ﬁlters using state space is

discussed in this appendix [109, 110].
CI. IIR Filtering

Consider the case where the matrices A and B are set in the canonical form I (or the

controller form). In this case A and B are conveniently represented as

F1411 A12 ’4qu ‘11
1 0 o 0

A: : z = z : andB= : (c.1)
_ o 0 1 o _ _o_

 

 

 

 

where

190

A : matrix of dimension Lm x Lm.

A1}- : Block sub-matrix of dimension m x m, may be simpliﬁed to a diagonal matrix
I : Identity matrix of dimension m x m

0 : Zero matrix of dimension m x m

B: matrix of dimension Lm x m

 

 

The state matrix is given by
_ X1001
Xk = X(k) = X73“) ((3.2)
_X L (k )1
where

X (k) : is a Lm x m dimensional state vector for the ﬁlter, and each
X J- (k): is an m x m dimensional component state vector

For this model the state model reduces to the following set of equations

representing an IIR ﬁltering structure.

L
X1(k+l) = z Alej(k)+m(k)
j=l
X2(k+1)=X1(k)

. (C.3)
XL(k+1)=XL—1(k)

L
y(k) = 2 C jX ,- (k) + Dm(k)
j=l

For an IIR ﬁlter represented in the state space canonical form I, there is no update

law required for B, while for the matrix A, it is needed update only the ﬁrst block row and

191

therefore the state equation update law (see section 2.1.2) reduces to the following block

update rule

arr"
= -77-—- = WU: .)T/1k+1 = “771k+1X,T(k) (C.4)
6A1j 1}

The special structure in matrix A also effects the update law for the co-state

equations (propagating in future time) which reduces to

CT aLk

k= k1 k
11()12(+)+C1ay()

CT aLk

12(k) = 13(k+1)+C2 zayk --(k)

(C.5)

T aLk

3L0?) = C 6” --(k)

Solving speciﬁcally for time k and then using time shift for time k+i, we obtain

the recursive form for the update as

CT aLk

2100:: ayk(lc+j— 1)

L k (06)
11(k+1)= ZCT— 61’

j=1 k

—(k+j)

which is structurally similar to the natural gradient algorithm derived for blind source

separation/ deconvolution by [155] and can be implemented in a similar fashion by using

the usual time delayed version of the algorithm implemented by buffer storage memory.
Using (C.6), we can further simplify (C.4) for the update of the block sub-

matrices in A as

L
—an1 C176 ———-—(k + j)XT (07)

192

C.2. FIR Filtering

In this case, the ﬁrst row of block sub-matrices A1 j in (CI) is zero, therefore the above

ﬁltering model (C.3) reduces to the FIR ﬁltering structure. In this case the state space
model reduces to

X1(k+1)=m(k)
X2(k+1)=X1(k)
5 (C8)
XL(k+1)=XL—1(k)

L
y(k) = 2 C jX j (k) + Dm(k)
j=l

where the state dynamic equations reduce merely to the delays of the mixture inputs i.e.
X1(k)=m(k—l)
.Xz(k)=M(k-2) (09)
:XL(k) = 17106 -L)

In this case only matrices C and D need to be updated. Using the MIMO
controller form, both matrices A and B contain only ﬁxed block identity and zero sub-
matrices and are conveniently absorbed in (C9).

In case of non-minimum phase environments, the demixing network is typically
constituted as a double-sided FIR ﬁlter [14, 129]. Discrete time ﬁnite length double sided
FIR ﬁlter can be used to approximate the otherwise unstable theoretical demixing
network due to the presence of poles outside the unit circle [22]. The implementation of

these non-causal double sided ﬁlter is handled by using a time delayed version of the

above algorithm with the corresponding increased requirement of buffered storage. The

193

 

primary advantage of these doubly ﬁnite ﬁlters being their ability to converge at “stable”
solutions even for non-minimum phase environments [14, 129].

One possible method to deal with the problem of the ﬁlter approximation length
and resulting delay is in the frequency domain implementations of the algorithm, where
the converged solution represents a 1024 or higher tap double sided FIR ﬁlter in
equivalent frequency domain. The result can be converted to the time domain equivalent
by appropriately using the inverse FFT followed by chopping or windowing techniques to
optimally contain maximum possible ﬁlter power spectral density (PSD) while
minimizing algorithmic delay and hence the buffering memory requirement [73, 76].
However, recent studies have shown that the frequency domain techniques have an upper

bound on the performance as the ﬁlters lengths become large [18, 90].

194

 

Appendix D I

Performance Benchmarks for Blind

Source Recovery

Appendix D.1. Conventional Performance Benchmarks

In this section we will provide an overview of several performance benchmarks that have
been employed by various researchers for the problem of blind source recovery [113,

131].

D.l.l.Signal to Noise Ratio (SNR)

Signal to noise ratio is a well-known communication benchmark, which deﬁnes the ratio
between the desired signal and the unwanted noise/distortion powers usually expressed in

st.
P
SNR=1010g %, (D.l.lO)
n
where P, — represents the signal power, and

195

P" — represents the noise and/or unwanted signal power
This benchmark is applicable in situations where the desired signal is known and is

therefore not very suitable for BSR problems where no such reference signal exists.
D.l.2. Multichannel Intersymbol Interference (MISI)

MISI is an extension of the communication benchmark inter-symbol interference, which
quantiﬁes the smearing of a communication channel by an adjacent channel due to their
overlapping temporal and/or spectral characteristics [52, 98]. MISI is useful as a
dispersion measure as it is insensitive to the overall gain and mean group delay. In the
case of BSR type problems, MISI is a measure to determine the “distance” between the
global, i.e., the combined mixing and demixing network transfer functions and an identity
impulse transfer function. This is a measure of the global diagonalization and

permutation as achieved by the demixing network [9 and the references therein]

MSIk =

 

 

ﬁlzjzplerl-maxpuler-ILﬁIZerlGrvl-leGpvll

i=1 maxPJ leij I H max?” lGP'j I
(D. 1 .1 1)

where for a linear case

G(z) = H (z) * 17(2) - represents Global Transfer Function,
H(z) = [A6, Be, Ce, De] — Transfer Function of Environment
H (z) = [A, B, C, D]-— Transfer Function of Network

This is the performance benchmark of choice for most of the research on BSR. However,

the main drawback of this benchmark is that it requires knowledge of the mixing

196

environment, which is available ofﬂine only in synthetic simulation environments and
hence cannot be employed in practical blind scenarios.

Another variation of the above performance measure uses a squared version of the
global transfer function matrix GM, i.e., the performance measure relies on L2 norm
instead of the L1 norm used above. This technique has an additional inherent ﬂaw that

small error terms will be de-emphasized.
D.l.3.Maximum signal to interference plus noise ratio (SINRM)

This measure uses the maximum signal to interference plus noise ratio as a measure to
determine separation. Interference in source k is constituted by all sources j such that j at

k. Then SINR of source k at output 1' of separator W is deﬁned as [32]

H
W: a
SINRk(w,-)=a,fl ' k

  

(13.1.12)

where

w,- - is the im column of separator matrix W

ak - is the kth column of mixing matrix A

a]? - is the input power of source k
Rvk - is the temporal mean (for nonstationary sources) of the correlation matrix of the
noise plus interference for source k, deﬁned by
Rvk = Rx — eﬁakaf (D.1.13)
Again this measure requires knowledge of mixing transfer function, which is

unknown in blind problems.

197

D.l.4.Mean Squared Error (MSE)

Mean squared error is probably the best-known adaptation measure. MSE uses the L2

norm of the error as a measure to determine the convergence of an algorithm. In the
absence of a desired reference to compute an explicit error measure as in the BSR case,
this error may be computed as the difference between the current and the previous value
of an adapted parameter or using a probabilistic score function. Although using MSE one
can determine whether an algorithm has reached its steady state but there is no way to
determine whether the algorithm has converged to the desired solution or a spurious local
minimum, and as such is not suitable for quantiﬁcation of BSR algorithm. For any

parameter y , MSE is given by

2
MSE(7) = WC) ”(k-m (D.1.14)

2
I7(k)-W(7(k))l
Some other performance indices proposed in [113] for synthetic audio

environments are given below
D.l.5.Distortion Measure

The distortion in the jth separated output can be deﬁned as

Dj =1010g(E{(mJ-,Sj —ajyj)2}/E{(mj,sj,sj)2}] (D.1.15)

where

aj = E{m2_ }/E{y12} - is a scaling factor
Jr"

J

E {.} - is the statistical expectation operator

198

 

y j - is the corresponding output for the j‘h source, and

mks]. - is the contribution of thej‘h source to the ith microphone.

D.l.6. Separation Measure

The quality of separation for the fh separated output can be deﬁned as
2
2
Sj=1010g E{(yj,sj) } E [ZyJ-W] (D.1.16)
199]
where
y j, Si - represents the f" output when only ith source is active.

The last two benchmarks can also be represented in the frequency domain and are
applicable for synthetic evaluations only, where the original source signals are known

and their contribution to an observed mixture & output can be estimated. ,

199

 

 

 

Appendix D.2. Quadratic Information Measure (QIM)

Quadratic Information Measure (QIM) is a novel performance index to measure the
statistical independence of data sequences. In this work, QIM has been proposed and
applied it to the framework of blind source recovery (BSR) that includes blind source
separation, deconvolution and equalization. This performance index is capable of
measuring the mutual independence of data sequences directly from the data. This
information theoretic independence measure is derived based on Renyi’s quadratic
entropy [100] estimated by a ﬁnite data length Parzen window [93] using a gaussian
kernel. Simulation results are presented to validate the performance of the proposed

benchmark and compare it with other established benchmarks.
D.2.].Practical Issues in Estimating Independence of Output Sequences

Stochastic blind signal processing tasks such as blind source separation, deconvolution
and equalization have been the focus of wide interest during the last two decades. This
wide interest is due to several attractive and diverse application domains for these kinds
of problems that include blind classiﬁcation of topics in an intemet chatroom,
communication systems, audio and acoustics, ﬁnance and marketing, astronomy and
physics, bio-medical, space and geo-exploratory applications. However, a daunting task
remains to reliably classify the performance of a number of proposed algorithms
especially in a fashion that makes them suitable for practical implementation. Most of the

presented research on the topic relies on performance benchmarks that require knowledge

200

 

of either the original source signals or the mixing environment transfer function. Both
these quantities are otherwise assumed to be unknown (no precise knowledge of the
above mentioned quantities is actually what makes these tasks blind). Therefore, these
performance benchmarks render themselves useful for synthetic simulations only. In a
practical situation, while making observations or recording a sequence of available
information, precise knowledge of the original signals or the environment transfer
function is never available. Compounding this situation with the inherent indeterminacies
of permutation and scaling in the blind signal recovery problem, the dilemma of having
an unknown number of sources, the unknown order of ﬁlters required say for a
deconvolution type of setup etc. severely limits the capability to determine the quality of
signal separation algorithms in practical situations.

The use of a new benchmark has been pr0posed to determine mutual
independence of a batch of signals for source recovery formulations. Unlike most other
simulation performance benchmarks employed for BSR, this benchmark can be
computed directly from the output data of the BSR network. It may also be applied
directly to the observed mixture data in order to quantify the transformation effect of the
BSR network. This novel benchmark is based on Renyi’s Mutual Information, see [97,
150]. In the context of BSR, the use of this independence performance benchmark ensues
the debate on the correspondence between statistical independence and source recovery.
In this appendix, an overview of Renyi’s quadratic entropy is ﬁrst provided and it is
discussed that how it can be utilized for estimation of signal independence. This is
followed by a discussion on how to apply QIM to BSR type problems. At the end,

simulated results are presented and observations are made from this study.

201

D.2.2.Independence Benchmark using Quadratic Mutual Information

A general benchmark for determination of statistical independence based on Renyi’s
quadratic mutual information has been proposed in this work [131]. Also discussed are
salient features of this measure and use of Parzen Window method for its estimation. For
a more detailed discussion on Renyi’s quadratic mutual information, see [97, 100]. The
framework was applied to the problem of blind source separation by Principe et. al. for
the development of a generalized information theoretic learning framework; where the
learning of the demixing structure can be derived directly from data. However, the
resulting update laws are computationally very expensive. Instead, it is proposed to
utilize a ﬁnite data-length quadratic entropic measure as a performance benchmark for

achieving statistical independence.

D.2.2.l. Renyi’s Entropy

Renyi’s entropy deﬁnition is derived from his proposed theory of means [100] and is
given by

N
H arr-{Z 12mm») (13.2.17)
k=1

where,

(p(.) - is a continuous and strictly monotonic function subclass of Kolmogorov-Nagumo

functions [69, 92]. To satisfy the constraints of an information measure

(D.2.18)

x Shannon's Entropy
(0(x) = (l—a) ; .,
2 x Renyr s Entropy

202

 

I(pk) - any information measure, e.g., I(pk)=—log(pk)is Hartley’s information

measure (or Shannon’s info-theoretic measure [38]).

Simplifying the above relation, (D.2. l 7)

N
HRa zl—lzlog[2p?];a>0,a¢l (D.2.19)
" k=1

Renyi’s quadratic entropy is the special case for a = 2, i.e.,

N
HR2 = —log[kzl 11,3] (D.2.20)

D.2.2.2. Parzen Window Estimator

For the purpose of estimating the squared probability density for the quadratic mutual
information, a Parzen kernel estimator [93] is employed. The Parzen estimator takes the

form of a convolution of an estimator kernel with the observations. Using this estimator,

the pdf py (z) of a random vector y e 91'" is given by

N
16y(z)=%2x(z—y.) (13.2.21)
i=1

where

y, e 93'" is the ith observation vector
x(.) is a kernel function that satisﬁes the conditions for a pdf. There are several such

well-known kernel functions, e.g., the gaussian kernel, the cauchy kernel or the uniform
kernel [93]. For ease of implementation, the Gaussian kernel is chosen with covariance

matrix 2 , i.e.,

203

 

 

m) = G(z,2) = (D222)

1 exp[ zTZ_lz]
M 1
(271') A Isl/2 2
This choice is motivated by an integral property of the Gaussian kernel, which
results in an efﬁcient and exact computation of Renyi’s quadratic entropy, for the case of

2 gaussian kernels

N
Lim 2 6(2), -y,-,21)G(Zk _yjaZZ)
= EG(z-yi,21)G(z—yj,22)dz = 007 _yjrzl +2:2)
where

y,- e 91'" and y j e 93'" are two data vectors in the space

21 and 22 are covariance matrices for corresponding Gaussian kernels.

D.2.3.Quadratic Independence Measure (QIM)

The proposed performance benchmark is based on the Cauchy-Schwartz Inequality. For
L2 norm, the inequality is
2 2 2
1x1 urn 2(xTy) «1224)

Using the above inequality, the Kullback-Lieblar divergence is “approximated”

by taking the logarithm of both sides of (D224) and rearranging

2 2
logmxllz— 2 0 (D.2.25)
(xTy)

For two pdfs f (x) and g(x) , the above inequality can be used as a divergence

measure between the two pdfs. The relation is given by

204

 

 

( EoﬂxfdxX [jogefdx]

[ J: f(x)g(x)dx)2

 

Pcs<f,g)=log (D226)

with the following properties
. 0 3 PC, (.) s 1
0 PCS(.) = 0 if and only if f (x) = g(x)

. Pcs(f,g)=Pcs(gaf)

expressing (D.2.26) in terms ofquadratic entropy

Paws) = 2m2 (17m )—(HR2 (f(X))+HR2 (g(x))) (112.27)

In order to apply this criterion for the measurement of independence in the source
separation framework, let
f (x) represent the joint probability density estimate of the signal vector using Parzen
estimator, and
g(x) represent the product of marginal probability density estimates using the Parzen

estimator

For k observations of an output vector Y 69?” , the above relation can be

expressed as

Pcs(y) = -108[MC1(Y)] (D228)
where

M010”) -represents the normalized cross information measure

205

2
MUCIPU')
N

MJ1P(Y)HMMP(Y1)
1:1

[VIC/(Y):

 

M J,P(Y ) - represents the joint information measure

1 k k
MJ,,,(Y) :k—Z-ZZU’IGu/Y— 13,2020]

M PC,P(Yj , Y,) - represents the partial marginal information measure

MPCIP(Yj’YI)

:i(o(rj, -Y,.,,2a 21),I:)1 ..N

#"I—-

M M”, (Yr) - represents the marginal information measure

1 k
MMP(Y,')=—JZ:1(MPCIP(YJ', Y),...)I=1N

kh-

M UC,p(Y ) - represents unnormalized cross information measure

1 k {N
MUCIP(Y)=;Z HMPCIP(Yj’YI)

)=1 1:1

In case of linear instantaneous and convolutive mixtures, separability can be
deﬁned as a combination of output independence and diagonalization of the global
transfer function [7, 28, 36, 37, 106, 110]. The diagonalization measure ensures that all
sources are extracted and that the algorithm did not converge to a spurious solution in the
data space, where estimated data sources are independent but different from the original
sources. In a completely blind scenario, there is no known criterion to determine this
diagonalization without explicit knowledge of the mixing transfer function. However, for

a ﬁnite known number of network outputs practically one can verify that all the separated

206

(D229)

(D230)

(D231)

(D232)

(D233)

- 1: at)!!!”

 

 

sources are distinct and independent. This ensures separation of all non-gaussian source
distributions, whereas for a gaussian source this ensures separation from other

distributions only, while the reverberative effects might still be present.

 

 

207

 

Quadratic Independence Measure vs. Data Length

1 ' i i I ' "l w ' ii

0.9 >- /I/
0.8 - / .

0.7L / —

 

 

 

Mary)
\

0.6 ~ —
0.5

0.4/ -

0.3 L
10

I
I

 

 

 

103 10‘ 105
Number of Data Samples

(a)

 

 

 

 

 

Quadratic Independence Measure vs. Data Length

0.9

 

1/

0.8L ' 1
0.7 ~ 4
0.6 ~ 1
0.5 . \ .
0.4 - \ -
0.3 ~

PCSM

0.1r

 

 

 

 

o 1 . 1 I 1 I 144 L . r 1 1 r 1 I 1 . 1 1
10 1o3 10‘ 10
Number of Data Samples

0))

 

 

 

 

Figure D. 1. Effect of length of observation sequence on (a) Cross Information Measure,

(b) Quadratic Independence Measure (QIM)

208

The effectiveness of this performance measure depends on the length as well as
the probability distributions of the constituents in an observation sequence. In Fig. D], a
graph for QIM performance measure is being provided as a function of length of
observation data. The data has three constituents that include speech, communication

signals, and gaussian noise.

D.2.4. Simulations Results

Due to space limitation, only one simulation is being provided. The mixing environment

is given by
m-l n—l
Z Aim(k — i) : Z Bis(k — i) + v(k) (D234)
j=0 i=0
where
'1 1 -1 0.5 0.8 —0.7 0.06 0.4 —0.5
A0=l -1 1 ,A1= 0.8 0.3 —o.2 ,A2= 0.16 —0.1 -0.4
_1 -1 1 —o.1 -0.5 0.4 —0.3 —0.06 0.3
"1 0.6 0.8 w 0.5 0.5 0.6 .125 0.06 0.2
BO: 0.3 1 0.1 ,3]: -0.3 0.2 —0.3 ,32: -o.1 0 0.4
_O.6 —0.8 1 —0.2 -o.43 0.6 0.08 —o.13 0.3

 

v(k) - Additive Gaussian noise

The feedforward separation results for each channel using MISI and QIM

(computed with a batch of 1000 samples) are shown below

209

 

 

 

Blind Source Recovery - Convergence of MISI Index

 

 

 

 

 

1 If I I I
a a _
,. a): _
£3 4 - x... -
(2 2 ,— XWM -‘
0 1 (WWW 1““ names”..- 1:22 ......
o 0.5 1 1.5 2 2 5 3
1o . . , , 1 x 10‘
\

ISI fory2
0'1
3/
<"
I

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0 I I I I
0 0.5 1 1.5 2 2 5 3
x10‘
I I I I I
8 —<
>10 6 '— .1
g 4 .. J
m
_ 2 _ _
o I l I I I
0 0.5 1 1.5 2 2.5 3
number of iterations x 10‘
(a)
I cs(y) - Cauchy Schwartz Quadratic Mutual Information
3 I I T I I
2 _.
1 _. -1
WW
0 I I I I I
0 0.5 1 1.5 2 2.5 3
Vc(y) - Cross I nforrnation Potential x 10‘
1 T T I I T
0.5 e r
0 I L P I I
0 0.5 1 1.5 2 2.5 3
number of Iterations x 10‘
(b)

Figure D.2. (a) Convergence of MISI performance Index, (b) Corresponding Performance

of QIM performance Index

210

 

 

D.2.5. Observations

The proposed quadratic independence benchmark [131] has been thoroughly investigated
for BSR problems that include both minimum phase and non-minimum phase systems.
Due to the normalized nature of QIM, it is suitable for performance comparison of
demixing achieved by different algorithms. The primary inhibition is that
computationally the benchmark has a quadratic relationship with the length of the data
set. Therefore for practical situations, it is recommended to use either the QIM measure
with a shorter data set or another computationally inexpensive measure such as MSE to
determine the convergence of the algorithm. Once the algorithm has converged, a larger

data chunk may be used to determine the achieved performance level using QIM.

211

Appendix D.3. Algebraic Matrix Distance Index

(AMDI)

Algebraic Matrix Distance Index (AMDI) [132-134, 137] is primarily a scaling, sign and
permutation invariant numerical index to estimate the “similarity” between the column-
bases of two matrices with conformal dimensions. These attributes make AMDI suitable
for comparison of any two matrices of similar dimensions including results for algorithms
such as AICA and geo-ICA. Unlike the multichannel intersymbol interference (MISI,
also called Amari Performance Index), which measures the diagonalization of the global
solution (if it can be calculated) in ICA or BSS. AMDI is used to compares the estimated
mixing matrix with the synthetic or known mixing matrix to gauge the performance of
the algorithms that compute the mixing matrix instead of the demixing matrix.
Fru'thermore, for the case of overcomplete ICA, although no MISI can be calculated, yet
AMDI can be used to verify the correctness of the estimated mixing matrix.

This Algebraic Matrix-Distance Index (AMDI) is a symmetric “matrix-distance”
measure. Unlike geometric measure [115, 117], AMDI estimates the distance between
two column-normalized matrices independent of possible sign and permutation

ambiguities. The columns of the matrices are normalized by projecting them onto the

212

 

 

conformal unit hyper-sphere by dividing all the elements of each column by the norm of

the corresponding column. The AMDI is given by

n-ZmaxQW'*H|) n-Zmax(|W'*H|)
AAﬂ)1(W, H) : rows + 00’s (D.3. 1)

n n

 

where
W, H — are two column-normalized matrices of similar dimensions.
The Algebraic Matrix Distance Index (AMDI) has the following properties:
0 Os AW] (W ,H ) S 2; AMDI(W,H) = 2 only if both matrices are orthogonal to
each other in the n—d space, also
0 AAﬂJI(W,H)=AAﬂ)I(H,W)
In order to better visualize the convergence properties of ICA, at times it is useful to
compute the normalized AMDI instead. The normalization is done by the maximum

AMDI during the estimation phase, which is typically AMDI at initialization.

213

Appendix E

Multi-Input Multi-Output (MIMO)

Canonical Form

This appendix describes the steps required to convert a MINIO ﬁlter deﬁnition, as used in
all simulation examples to the canonical state space representation. For all the state space
BSR simulations and derivations, the network structure is assumed to be in MIMO

controller form [17, 103]. In this form, the matrices A and B have the structure

 

 

 

 

F—QI ’Qz "QM-l "QM1 ’1]
I 0 0 O 0
A: o 1 o 0 ,B= 0 (El)
I 1 ° 0
_ o 0 1 o _ _o_
C=lfi P2 PM_1 PMLD=IP01 (152)

where the state space network/ﬁlter can be represented by

H(z)=1P<z>112<z>rl (13.3)
with
P(z) : ﬁzz-z" (13.4)
i=0

214

Q(Z) = 29,-z“ with Q0 = IN (13.5)
i=0

The polynomial matrices P(z) and Q(z) of (E3) are derived from a MIMO transfer

as explained in the following section.

E.1. Transformations required for a MIMO Transfer

Function

Consider a pxm multi-input multi-output transfer function of the form

 

 

"11(2) "12(2) "1111(2) -
d11(2) 6112(2) d1m(2)
"21(2) "22(2) "2111(2)
H(z) = (121(2) d22(2) d2m(Z) (E6)
npl(z) "p2(z) "pm(z)
_dpl(z) dp2(z) dpm(z)d

 

 

Deﬁning, the monic least common denominator of each column i to be l,(z) of

dimension say her, i.e.,
I1(2) é LCM (6111‘ x (121' x ' ' ' dpi) (E7)

We can represent each term of the transfer function as

 

 

n--(z) If (z)
nr<z>_ n,(.) _ '1 dry-<2) (er)
2(2) ‘ die) ’ I-(z) '
. Wi’ 1M} .
Now, deﬁning the modiﬁed numerator polynomials of dimension lxn as
A 1 °
511(2) = "y(2){ 1(2) dij-(z)} (E9)

215

Using the above deﬁnitions, we can represent (E.6) as

H(z) :

OI'

H(z):

PM
11(2)
521(2)
11(2)

ﬁp1(z)

7"1_2(_Z_)
12(2)
522(2)
12(2)

ﬁp2(z)

 

 

_ I1(2)

711(2)
7121(2)

 

_ﬁpl (Z)

therefore, by deﬁning

N(z)é

and

D(z) é

r711(2)
521(2)

 

Lﬁpl (Z)

"11(2)

I2(2)

0

0 12(2)

 

0

h-

0

(EH) can be expressed as

H(z) : N(z)D“l (z)

ﬁlm(z)q

[m(z)
ﬁ2m(z)
Im(z)

ﬁpm(z)

 

4n(z) _

ﬁlm(z)1

ﬁ2m(z)

 

ﬁpm (2)2

ﬁlm(z)1

ﬁzm(2)

 

' 1
11(2)

0

 

 

ﬁpm(z)d

 

h-

0

‘.’ /.<>

0

 

%m (Z)_

(E.10)

 

(E.ll)

(E.12)

(13.13)

(E.l4)

Note that (EM) and (E3) although similar are not equivalent. To express the above

matrices N(z) and D(z) in (BM) as P(z) and Q(z) of (E3), we need to re-arrange the

216

polynomials in the matrices to be in sub-matrices containing co-efﬁcients of descending

power of z, i.e., for a causal P(z), we have

P(z) é

 

and

(2(2) é

 

 

where,

51 1,0
521,0

 

npl,0

11 1,0

 

-— — -' F'—— — —— 1
"12,0 "1m,0 "11,1 "12,1 "1m,1
522,0 ﬁZm,O + 521,1 522,1 7721111
51220 ﬁpm,0d _ﬁpu ﬁp2,l ﬁpde
"-- —- — q T
"l l,n "12,71 nlm,n
572131 722m 72111,» _n
...... + . : . z
_ﬁplm 5,12," ﬁpm,n_ _
"in,l 0 0 ‘
__1 . . . Z +'
0 : : :
l22,0 . O 0 11ml-
. I WILM O O -
0 1pm.0_ 0 122,114
+ o
L _ 0 0 [pmMJ

P(z) : [P0 +1012:l + ------ + Pnz'"]

 

 

 

 

 

 

 

 

 

Q(z) : [Q1271 + ...... + QMz‘M ] and Q0 : I

ﬁij, k : represents the k’h ﬁlter tap for the if” modiﬁed numerator ﬁlter, and

Iij,

217

 

 

 

 

_1 +
(13.15)
1
. (E.16)
z—M
(E.17)
(E.18)

k : represents the k’h ﬁlter tap for the if” common column denominator ﬁlter

BIBLIOGRAPHY

218

BIBLIOGRAPHY

[1] 3GPP, Multiplexing and Channel Coding (FDD), in rs 25.212, V3.3.0. 2002.

[2] 3GPP, Physical channels and mapping of transport channels onto physical
channels (FDD), in TS 25.211, V3.3.0. 2002.

[3] 3GPP, Spreading and Modulation (FDD), in TS 25.213, V4.5.0. 2002.

[4] 3GPP, UE Radio Transmission and Reception (FDD), in TR 25.101, V4.5.0.
2002.

[5] Amari, 8., Theory of Adaptive Pattern Classiﬁers. IEEE Trans. on Electronic
Computers, 1967. EC-l6(3): p. 299-307.

[6] Amari, S., Differential-Geometrical Methods in Statistics Lecture Notes in
Statistics 1985. 28.

[7] Amari, S., Learning and Statistical Inference, in Hand-book of Brain Theory and
Neural Networks, M.A. Arbib, Editor. 1995, MIT Press: Cambridge, MA. p. 522-526.

[8] Amari, S. Neural Learning in Structured Parameter Spaces-Natural Riemannian
gradient. in Neural Info. Processing Systems, NIPS-96. 1996: MIT Press.

[9] Amari, S., Natural Gradient Works Efﬁciently in Learning. Neural Computation,
1998.10:p.251—276.

[10] Amari, S., Natural Gradient Learning for Over- and Undercomplete Bases in ICA.
Neural Computation, 1999. 11(8): p. 1875-1883.

[1 1] Amari, S. and J.F. Cardoso, Blind Source Separation: Semi-Parametric Statistical
Approach. IEEE Trans. on Signal Process, 1997. 45(11): p. 2692—2700.

[12] Amari, S., T.-P. Chen, and A. Cichocki, Stability Analysis of Adaptive Blind
Source Separation. Neural Networks, 1997. 10(8): p. 1345—1352.

[13] Amari, S., A. Cichocki, and H.H. Yang, A New Learning Algorithm for Blind
Signal Separation, in Advances in Neural Information Processing Systems. 1996, MIT
Press: Cambridge, MA. p. 757-763.

[14] Amari, 8., SC. Douglas, A. Cichocki, and H.H. Yang, Multichannel Blind
Deconvolution and Equalization using the Natural Gradient, in Proceeding of IEEE
Workshop on Signal Processing, Adv. in Wireless Communications. 1997: Paris, France.
p. 101—104.

219

[15] Amari, S. and M. Kawanabe, Information Geometry of Estimating Functions in
Semiparametric Statistical Models. Bernoulli, 1997. 3: p. 29-54.

[16] Amari, S. and H. Nagaoka, Methods of Information Geometry. 1999: AMS and
Oxford University Press. '

[l7] Antsaklis, PI. and AN. Michel, Linear Systems. 1997: Electrical and Computer
Engineering Series, McGraw-Hill.

[l8] Araki, S., S. Makino, R. Mukai, T. Nishikawa, and H. Saruwatari, Fundamental
Limitation of Frequency Domain Blind Source Separation for Convolved Mixture of
Speech, in 3rd Int'l Symposium on ICA and BSS, ICA-2001. 2001: San Diego, CA. p.
132-137.

[19] Attick, J.J., Could Information Theory provide an Ecological Theory of Sensory
Processing ? Network: Computation in Neural Systems, 1992. 3(2): p. 213-251.

[20] Barlow, H.B., The coding of sensory messages, in Current Problems in Animal
Behavior, W.H. Thorpe and CL. Zangwill, Editors. 1961, Cambridge University Press:
MA. p. 331-361.

[21] Bell, AJ. and TJ. Sejnowski, An Inforrnation-Maximization Approach to Blind
Separation and Blind Deconvolution. Neural Computation, 1995. 7: p. 1129—1159.

[22] Benveniste, A., M. Goursat, and G. Ruget, Robust Identiﬁcation of a Non-
minimum Phase System: Blind Adjustment of a Linear equalizer in Data
Communications. IEEE Trans. on Automatic Control, 1980. 25: p. 385—399.

[23] Boﬁll, P. and M. Zibulevsky, Blind Separation of More Sources than Mixtures
using the Sparsity of their Short Term Fourier Transform, in Proc. of Int’l Symposium on

Independent Component Analysis and Blind Source Separation, ICA-2000. 2000:
Helsinki, Finland. p. 87-92.

[24] Boﬁll, P. and M. Zibulevsky, Underdetermined Blind Source Separation using
Sparse Representation. Signal Processing, 2001. 81(1 1): p. 2353-2362.

[25] Broeck, C.V.d. and P. Reimann, Unsupervised Learning by Examples: On-line
versus Off-line. Physical Review Letters, 1996. 76: p. 2181-2191.

[26] Cao, J. and N. Murata, A Stable and Robust ICA Algorithm Based on t-
distribution and Generalized Gaussian Distribution Models, in Neural Networks for
Signal Processing IX. 1999. p. 283-292.

[27] Cardoso, J.F., Blind Signal Processing: Statistical Principles. Proceedings of the
IEEE, 1998. 90(8): p. 2009—20026.

[28] Cardoso, J.F., Entropic Contrasts for Source Separation: Geometry and Stability,
Chapter 4, in Unsupervised Adaptive Filtering. 1999, John Wiley & Sons.

220

[29] Cardoso, J.F. and B. Laheld, Equivariant Adaptive Source Separation. IEEE
Trans. on Signal Processing, 1996. 44: p. 3017—3030.

[30] Charlier, G., Application, de la theorie des probabilites a l'astronomie, Part of the
traite. 1931, Paris: Gauthier-Villars. '

[31] Chen, S.S., D.L. Donoho, and M.A. Saunders, Atomic Decomposition by Basis
Pursuit, in Technical Report. 1996: Dept. of Statistics, Stanford University, CA

[32] Chevalier, P., On the Performance of Higher Order Blind Source Separation
Methods, in IEEE-ATHOS workshop on HOS. 1995: Begur Spain. p. 30-34.

[33] Cichocki, A. and S. Amari, Adaptive Blind Signal and Image Processing. 2002:
John Wiley and Sons.

[34] Cichocki, A., S. Amari, and J. Cao, Neural Network Models for Blind Separation
of Time Delayed and Convolved Signals. IECE Trans. Fundamentals, 1997. E82-A(9): p.
1595-1603.

[35] Cichocki, A., 8.1. Amari, and J. Cao. Blind Separatrion of Delayed and
Convolved Sources with Self Adaptive Learning Rate. in Int'l Symposium on Nonlinear
Theory and Applications, NOLTA-96. 1996. Kochi, Japan.

[36] Cichocki, A., R. Unbehauen, and E. Rummert, Robust learning algorithm for
blind separation of signals. Electronics Letters, 1994. 30(17): p. 1386-1387.

[37] Common, P., Independent Component Analysis, a new concept ? Signal
Processing, 1994. 36(3): p. 287-314.

[38] Cover, T.M. and J.A. Thomas, Elements of Information Theory. 1991: Wiley
Interscience Series.

[39] Cristescu, R., T. Ristaniemi, J. Joutsensalo, and J. Karhunen, Blind Separation of
Convolved Mixtures for CDMA Systems, in Proc. of the European Signal Processing
Conference, EUSIPCO-2000. 2000: Tampere, Finland. p. 619-622.

[40] Erten, G. and FM. Salam, Real Time Separation of Audio Signals Using Digital
Signal Processors, in Proceedings of the 40th Midwest Symposium on Circuits and
Systems. 1998. p. 1237 -1240.

[41] Erten, G. and F .M. Salam, Voice Output Extraction by Signal Separation, in
Proceedings of the 1998 IEEE International Symposium on Circuits and Systems. 1998.
p. 5—8.

[42] Erten, G. and FM. Salam, Voice Extraction by On-line Signal Separation and

Recovery. IEEE Transactions on Circuits and Systems 11: Analog and Digital Signal
Processing, 1999. 46(7): p. 915—922.

221

[43] Gaeta, M. and J .-L. Lacounme, Source Sparation without a priori Kowledge: The
Mximum Lkelihood Slution, in Proc. Eur. Signal Processing Conf. 1990. p. 621-624.

[44] Gesbert, D., J. Sorelius, P. Stoica, and A. Paulraj, Blind Multi-User MMSE
Detectior for CDMA Signals in 181 channels. IEEE Communications Letters, 1999. 3(8):
p. 233-235.

[45] Gharbi, A.B.A. and FM. Salam, Separation of Mixed Signals in Dynamic
Environments: Formulation and some Implementation, in Proc. of 37th Midwest

Symposium on Circuits and Systems. 1994: Lafayette, LA. p. 587-590.

[46] Gharbi, A.B.A. and EM. Salam, Implementation and Test Results of a Chip for
the Separation of Mixed Signals. IEEE Transactions on Circuits and Systems, 1995.
42(11): p. 748—751.

[47] Gharbi, A.B.A. and FM. Salam, Algorithms for Blind Signal Separation and
Recovery in Static and Dynamic Environments, in Proceedings of 1997 Int’l Symp. on
Circuits and Systems. 1997: Hong Kong. p. 713—716.

[48] Girolami, M., An Alternative Perspective on Adaptive Independent Component
Analysis Algorithms, in Technical Report, Computing and Information Systems. 1998:
Paisley University, Scotland.

[49] Girolami, M., A]. Bell, and T.J. Sejnowski, A Unifying Information Theoretic
Framework for Independent Component Analysis. Int’l Journal on Mathematical and
Computer Modeling 2000. 38: p. 1—21. -

[50] Girolami, M. and C. Fyfe, Extraction of Independent Signal Sources using a
Deﬂationary Exploratory Projection Pursuit Network with Lateral Inhibition. IEE
Proceedings on Vision, Image and Signal Processing, 1997. 14(5): p. 299-306.

[51] Girolami, M. and C. Fyfe, Generalised Independent Component Analysis through
Un-supervised Learning With Emergent Bussgang Properties, in Proc. International
Conference on Neural Networks. 1997: Houston, TX. p. 1788-1791.

[52] Haykin, 8., Adaptive Filter Theory. 4 ed. 1996: Prentice-Hall.

[53] Haykin, S., Neural Networks: A Comprehensive Foundation. 2 ed. 1998: Prentice
Hall.

[54] Haykin, S., Unsupervised Adaptive Filtering, ed. S. Haykin. Vol. I and II. 2000:
John Wiley and Sons.

[55] Heikkila, M.J., A Novel Blind Adaptive Algorithm for Channel Equalization in
WCDMA Downlink, in 12th IEEE International Symposium on Personal, Indoor and
Mobile Radio Communications, PIMRC-Ol. 2001: San Diego, CA. p. A-41--A-45.

222

[56] Herault, J. and C. Jutten, Space or Time Adaptive Signal Processing by Neural
Network Models AIP Conference Proceedings, 1986. 151: p. 206-211.

[57] Holma, H. and A. Toskala, WCDMA for UMTS Radio Access for Third
Generation Mobile Communications. 2001: John Wiley & Sons, Inc.

[58] Honig, M., U. Madhow, and S. Verdr'r, Blind Adaptive Multiuser Detection. IEEE
Trans. on Information Theory, 1995. 41(4): p. 944-960.

[59] Hooli, K., M. Latva-Aho, and M. Juntti, Performance Evaluation of Adaptive
Chip-Level Channel Equalizers in WCDMA Downlink, in IEEE International Conference
on Communications. 2001. p. 1974 —l979.

[60] Hyvarinen, A., J. Karhunen, and E. Oja, Independent Component Analysis. 2001:
Wiley Inter-Science Series.

[61] Hyvarinen, A. and E. Oja, A Fast Fixed-Point Algorithm for Independent
Component Analysis. Neural Computation, 1997. 9: p. 1483—1492.

[62] J outsensalo, J. and T. Ristaniemi, Blind Multi-user Detection by Fast Fixed Point
Algorithm without Prior Knowledge of Symbol Level Timing, in Proc. of the IEEE
Signal Processing Workshop on Higher Order Statistics. 1999: Ceasarea, Israel. p. 305-
308.

[63] Jutten, C. and J. Herault, Blind separation of sources, an adaptive algorithm based
on neuromimetic architecture. Signal Processing, 1991. 24(1): p. 1-31. .

[64] Karhunen, J ., Neural Approaches to Independent Component Analysis and Source
Separation, in Proc. of 4th European Symp. on Artiﬁcial Neural Networks (ESANN-96).
1996: Bruges, Belgium. p. 249-266.

[65] Karhunen, J. and J. Joutsensalo, Representation and Separation of Signals Using
Non-linear PCA Type Learning. Neural Networks, 1994. 7(1): p. 113-127.

[66] Karkkéiinen, K.H.A. and RA. Leppéinen, The Inﬂuence of Initial-Phases of a PN
Code Set on the Performance of an Asynchronous DS-CDMA System. Wireless Personal
Communications, 2000. 3(3): p. 279-293.

[67] Karvanen, J., E. J., and K. V., Adaptive Score Functions for Maximum
Likelihood ICA. Journal of VLSI Signal Processing, 2002. 32: p. 83-92.

[68] Karvanen, J. and V. Koivunen, Blind Separation Methods Based on Pearson
system its Extensions. Signal Processing, 2002. 82(4): p. 663-673.

[69] Kendall, M. and A. Stuart, The Advanced Theory of Statistics. Vol. I. 1977:
Charles Griffin and Co. Ltd.

223

 

[70] Kenney, J.F. and ES. Keeping, The k-Statistics, in Mathematics of Statistics, V.
Nostrand, Editor. 1962: Princeton, NJ. p. 99-100.

[71] Khalil, H.K., Nonlinear Systems. 3 ed. 2002: Prentice Hall.

[72] Kullback, S., Topics in Statistical Information Theory. 1987, New York:
Springer-Verlag.

[73] Lambert, R., Multichannel Blind Deconvolution: FIR Matrix Algebra and
Separation of Multipath Mixtures. 1996.

[74] Lambert, R. and A.J. Bell, Blind separation of Multiple Speakers in a Multipath
Environment, in Int’l Conference of Acoustics, Speech and Signal Processing, ICASSP-
97. 1997: Munich, Germany. p. 423-426.

[75] Lee, T.W., A.J. Bell, and R. Lambert, Blind separation of Delayed and Convolved
Sources. Advances in Neural Information Processing Systems, 1997. 9: p. 758-764.

[76] Lee, T.W., A.J. Bell, and R. Orglmeister, Blind Source Separation of Real-World
Signals, in Proc. of IEEE Int’l Conference on Neural Networks ICNN-97. 1997: Houston,
TX. p. 2129—2135.

[77] Lee, T.W. and M. Girolami, Advances in Independent Component Analysis.
2000: Springer Verlag.

[78] Lee, T.-W., M. Girolami, and T.J. Sejnowski, Independent Component Analysis
using an Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian
Sources. Neural Computation, 1999. 11(2): p. 417—441.

[79] Lee, T.W. and T.J. Sejnowski, Independent Component Analysis for Sub-
Gaussian and Super-Gaussian Mixtures, in 4th Joint Symposium on Neural Computation.
1997, Institute for Neural Computation. p. 132—140.

[80] Lewicki, M. and T.J. Sejnowski, Learning Overcomplete Representations. Neural
Computation 2000. 12: p. 337-365.

[81] Lewis, FL. and V.L. Syrmos, Optimal Control. 2 ed. 1995: Wiley Interscience
Series.

[82] Li, H. and H.V. Poor, Iterative Channel Estimation and Multiuser Detection in
Multipath CDMA Channels, in Proc. of IEEE Int’l Conference on Communications 2003:
Anchorage, Alaska, USA. p. to appear.

[83] Linsker, R., A local learning rule that enables information maximization for
arbitrary input distributions. Neural Computations, 1997. 9(8): p. 1661-1665.

[84] Mahram, H.H., H.E. B011, and G. Alirezaei, Performance Evaluation of Advanced
Receivers for WCDMA Downlink Detection, in Proc. of 5th Int’l Symposium on

224

 

Wireless Personal Multimedia Communications, WPMC-2002. 2002: Honolulu, Hawaii,
USA.

[85] Mathis, H., T.P. Von-Hoff, and M. Joho, Blind Separation of Signals with Mixed
Kurtosis Signs using Threshold Activation Functions. IEEE Transactions on Neural
Networks, 2001. 12(3): p. 618—624.

[86] Michel, AN. and D. Liu, Qualitative Analysis and Synthesis of Recurrent Neural
Networks. 2002, New York: Marcel Dekker.

[87] Miller, J.H. and 1B. Thomas, Detectors for Discrete-Time Signals in Non-
Gaussian noise. IEEE Transactions on Information Theory, 1972. IT-18(2): p. 241-250.

[88] Nadal, J .P. and N. Parga, Redundancy Reduction and Independent Component

Analysis: Conditions on Cumulants and Adaptive approaches. Neural Computation,
1997. 9(7): p. 1421-1456.

[89] Nicholls, J ., From Neuron to Brain. 4 ed. 2001: Sinauer Associates Incorporated.

[90] Nishikawa, T., H. Saruwatari, and K. Shikano, Blind Source Separation Based on
Multi-Stage ICA Using Frequency-Domain ICA and Time-Domain ICA, in Int'l
conference on Fundamentals of Electronics, Communications and Computer Sciences,
ICFS-2002. 2002. p. 7-12.

[91] Opper, M, Online versus Ofﬂine Learning from Random Examples: General
Results. Physics Review Letters, 1996.77: p. 4671-4674.

[92] Papoulis, A., Probability, Random Variables and Stochastic Processes. 1984, New
York: McGraw-Hill.

[93] Parzen, E., On the Estimation of a Probability Density Function and the Mode.
Annals Mathematical Statistics, 1962.33: p. 1065-1076.

[94] Pearlrnutter, B. and L. Parra, A Context-Sensitive Generalization of Independent
Component Analysis, in 1996 International Conference on Neural Information
Processing. 1996: Hong Kong.

[95] Pearson, K., Contributions to the Mathematical Study of Evolution. Philosophical
Transactions of the Royal Society A, 1894. 185: p. 71-110.

[96] Prator, O., C. Unger, A. Zoch, and GP. Fettweis, Performance of Adaptive Chip
Equalization for the WCDMA Downlink in Fast Changing Environments, in IEEE 7th
lnt’l Symposium on Spread Spectrum Technology and Applications, ISSSTA-02. 2002:
Prague, Czech Republic. p. 273-277.

[97] Principe, J ., D. Xu, and J. Fisher, Information-Theoretic Learning, Chapter 7,, in
Unsupervised Adaptive F iltering,, S. Haykin, Editor. 1999, John Wiley & Sons.

225

 

[98] Proakis, J .G., Digital Communications. 4 ed. 2001: McGraw-Hill.

[99] Puntonet, CG. and A. Prieto, An Adaptive Geometrical Procedure for Blind
Separation of Sources. Neural Processing Letters, 1995. 2.

[100] Renyi, A., On measures of Entropy and Information. 1 ed. Selected Papers of
Alfred Renyi. Vol. 2. 1976: Budapest Akademia Kiado.

[101] Ristaniemi, T. and J. Joutsensalo, Advanced ICA-based Receivers for Block
Fading DS-CDMA Channels. Signal Processing, 2002. 82(3): p. 417-431.

[102] Roth, Z. and Y. Baram, Multi-dimensional Density Shaping by Sigmoids. IEEE
Transactions on Neural Networks, 1996. 7(5): p. 1291-1298.

[103] Rugh, W.J., Linear System Theory. 2 ed. 1996: Prentice Hall, NJ.

[104] Rumelhart, D.E., G.E. Hinton, and RJ. Williams, Learning Internal
Representations by Error Propagation, in Parallel Distributed Processing, S. Amari,
Editor. 1986, MIT Press: Cambridge, MA. p. 318-362.

[105] Sabala, I., A. Cichocki, and S. Amari. Relationships between Instantaneous Blind
Source Separation and Multichannel Blind Deconvolution. in Proc. Int. Joint Conference
on Neural Networks. 1998. Alaska USA.

[106] Salam, F .M., An Adaptive Network for Blind Separation of Independent Signals,
in Proc. of International Symposium on Circuits and Systems. 1993: Chicago, IL. p. 431—
434.

[107] Salam, F.M. and G. Erten, Blind Signal Separation and Recovery in Dynamic
Environments, in Proceedings of 3rd IEEE Workshop on Nonlinear Signal and Image
Processing 1997: Mackinac Island, MI.

[108] Salam, F.M. and G. Erten, Exact Entropy Series Representation for Blind Source
Separation, in Proceedings of IEEE SMC Conference. 1999. p. 553—558.

[109] Salam, F.M. and G. Erten, The State Space Framework for Blind Dynamic Signal
Extraction and Recovery, in Proceedings of the IEEE International Symposium on
Circuits and Systems. 1999. p. 66—69.

[110] Salam, F.M., G. Erten, and K. Waheed, Blind Source Recovery: Algorithms for
Static and Dynamic Environments, in Proc. of INNS-IEEE International Joint Conference
on Neural Networks. 2001. p. 902—907.

[111] Salam, F.M., A.B.A. Gharbi, and G. Erten, F orrnulation and Algorithms for Blind
Signal Recovery, in Proceedings of the 40th Midwest Symposium on Circuits and
Systems. 1998. p. 1233—1236.

226

 

[112] Salam, F.M. and K. Waheed, State-Space Feedforward and Feedback Structures
for Blind Source Recovery, in Proc. of 3rd International Conference on Independent
Component Analysis and Blind Signal Separation. 2001: San Diego, CA. p. 248—253.

[113] Schobben, D., K. Torkkola, and P. Smaragdis, Evaluation of Signal Separation
Methods, in Proc. of lst Intl. workshop on ICA and BSS, ICA-99. 1999: Aussois, France.
p. 261-266.

[114] Taleb, A. and C. Jutten, Sources Separation in Post-Nonlinear Mixtures. IEEE
Transactions on Signal Processing, 1999. 10(47): p. 2807-2820.

[115] Theis, F.J., A. Jung, E.W. Lang, and CG. Puntonet, A Theoretical Model for
Linear Geometric ICA, in Proc. of 3rd Int’l Symposium on Independent Component
Analysis and Blind Source Separation, ICA-2001. 2001. p. 349-354.

[116] Theis, F .J . and E.W. Lang, Formalization of Two-Step Approach to Overcomplete
BSS; , in IASTED Int’l Conf. on Signal & Image Proc. SIP-2002. 2002: Hawaii, USA.

[117] Theis, F.J. and E.W. Lang, Geometric Overcomplete ICA, in Proc. of 10th
European Symp. on Artiﬁcial Neural Networks, ESANN 2002. 2002: Bruges, Belgium.
p.217-223,.

[118] Torkkola, K., Blind Separation of Convolved Sources based on Information
Maximization, in IEEE Workshop on Neural Networks for Signal Processing. 1996:
Kyoto, Japan. p. 423—432.

[119] Tsypkin, Y.Z., Foundation of the Theory of Learning Systems. 1973, New York:
Academic Press.

[120] Verdu, S. Minimum Probability of Error for Asynchronous Gaussian
Multiple-Access Channels. in IEEE Transactions on Information Theory. 1986.

[121] Verdu, S., Multiuser Detection. 1998: Cambridge University Press.

[122] Waheed, K., K. Desai, and F.M. Salem, Adaptive RAKE-Blind Source Recovery
Algorithms for 3GPP UMTS/WCDMA Downlink Receivers, in 2003 IEEE International
Conference on Robotics, Intelligent Systems & Signal processing, RISSP-2003. 2003:
Changsha, Hunan, China. p. to appear.

[123] Waheed, K., K. Desai, and F.M. Salem, Blind Multi User Detection in DS-CDMA
Systems using the Natural Gradient based Symbol Recovery Structures, in 4th

International Symposium on Independent Component Analysis and Blind Source
Separation, ICA-2003. 2003: Nara, Japan. p. 727-732.

[124] Waheed, K., K. Desai, and F.M. Salem, Natural Gradient based Blind Multi User
Detection in QPSK DS-CDMA Systems, in 2003 IEEE-INNS Joint International
Conference on Neural Networks, IJCNN-2003. 2003: Portland OR. p. to appear.

227

[125] Waheed, K., K. Desai, and F.M. Salem, Blind Info-Theoretic Multi-User
Detection Algorithms for DS-CDMA and WCDMA Downlink Systems. IEEE
Transactions on Neural Networks, submitted 2003: p. to appear.

[126] Waheed, K. and F .M. Salam, Blind Source Recovery: Some Implementation and

Performance Issues, in Proceedings of the 44th IEEE Midwest Symposium on Circuits
and Systems. 2001: Dayton, OH. p. 694-697.

[127] Waheed, K. and F .M. Salam, Blind Source Recovery using an Adaptive
Generalized Gaussian Score Function, in 45th IEEE Int’l Midwest Symposium on
Circuits and Systems. 2002: Tulsa, OK p. 418-421.

[128] Waheed, K. and F.M. Salam, Cascaded Structures for Blind Source Recovery, in
45th IEEE Int’l Midwest Symposium on Circuits and Systems. 2002: Tulsa, OK. p. 656-
659.

[129] Waheed, K. and F.M. Salam, State Space Blind Source Recovery of Non-
minimum Phase Environments, in 45th IEEE International Midwest Symposium on
Circuits and Systems. 2002: Tulsa, OK. p. 422-425.

[130] Waheed, K. and F .M. Salam, State-Space Blind Source Recovery for Mixtures of
Multiple Source Distributions, in Proc. of IEEE Int’l Symposium on Circuits and
Systems. 2002: Scottsdale, Arizona. p. 197—200.

[131] Waheed, K. and F.M. Salam, A Data-Derived Quadratic Independence Measure
for Adaptive Blind Source Recovery in Practical Applications, in 45th IEEE Int’l
Midwest Symp. on Circuits and Systems. 2002 Tulsa, OK. p. 473-476.

[132] Waheed, K. and F.M. Salem, Algebraic Independent Component Analysis, in
2003 IEEE International Conference on Robotics, Intelligent Systems & Signal
processing, RISSP-2003. 2003: Changsha, Hunan, China. p. to appear.

[133] Waheed, K. and F.M. Salem, Algebraic Independent Component Analysis: An
approach for Separation of Overcomplete Speech Mixtures, in 2003 IEEE-INNS Joint

International Conference on Neural Networks, IJCNN-2003. 2003: Portland OR. p. to
appear.

[134] Waheed, K. and F.M. Salem, Algebraic Overcomplete Independent Component

Analysis, in 4th International Symposium on Independent Component Analysis and Blind
Source Separation, ICA-2003. 2003: Nara, Japan. p. 1077-1082.

[135] Waheed, K. and F.M. Salem, Blind Source Recovery: A State Space Formulation,
in Contemporary Issues in System Stability and Control with Applications, D. Liu and
P]. Antsaklis, Editors. 2003, Birkhauser Boston, MA. p. 171-196.

[136] Waheed, K. and F.M. Salem, New Hyperbolic Source Density Models for Blind
Source Recovery Score Functions, in IEEE International Symposium on Circuits and
Systems. 2003: Bangkok, Thailand. p. to appear.

228

 

[137] Waheed, K. and F.M. Salem, Algebraic Independent Component Analysis and its
Application to Overcomplete Blind Source Separation. Neural Processing Letters,
submitted 2003: p. to appear.

[138] Waheed, K. and F.M. Salem, Blind Source Recovery for Multiple Source
Distributions using Adaptive Score Functions. Neural Processing Letters, submitted
2003: p. to appear.

[139] Waheed, K. and F.M. Salem, Blind Source Recovery: A Framework in the State
Space. Journal of Machine Learning Research, submitted 2003: p. to appear.

[140] Waheed, K. and F.M. Salem, State Space Feedforward and Feedback Structures
for Blind Source Recovery. Neural Processing Letters, submitted 2003: p. to appear.

[141] Wang, X. and H.V. Poor, Blind Equalization and Multiuser Detection in
Dispersive CDMA Channels. IEEE Transactions on Communications, 1998. 46(1): p. 91-
103.

[142] Wang, X. and H.V. Poor, Subspace Methods for Blind Joint Channel Estimation
and Multiuser Detection in CDMA Systems. Wireless Networks, 2000. 6(1): p. 59-71.

[143] Website, 3rd Generation Partnership Project Website. hm://W.3mg. 2003.
[144] Website, Blind Source Recovery. http://www.cg:.msu.edu/bsr. 2003.
[145] Website, CDMA Group website. http://www.cdg.org[index.asp. 2003.

[146] Werner, S. and J. Lilleberg, Downlink Channel Decorrelation in CDMA Systems
with long Codes, in IEEE Vehicular Technology Conference, VTC-99 1999: Houston,
Texas, USA. p. 1614 -l617.

[147] Widrow, B., A Statistical Theory of Adaptation. 1963: Oxford Pergamon Press.

[148] Woods, J.W. and H. Stark, Probability and Random Processes with Applications
to Signal Processing. 3 ed. 2001: Prentice Hall.

[149] Wu, Y., Q. Zhang, W. Zhu, and S.-Y. Kung, Spreading Code Assignment in an
Ad-hoc DS-CDMA wireless network, in International Conference on Communications,
ICC-2002. 2002: New York, NY. p. 3066 —3070.

[150] Xu, D., Energy, Entropy and Information Potential in Neurocomputing, in Ph.D.
Thesisl998, Department of Electrical Engineering, University of Florida: Florida.

[151] Yang, H.H., S. Amari, and A. Cichocki, Information Back-Propagation for Blind
Separation of Sources from Non-linear Mixtures, in Proc. of Int’l Conference on Neural
Networks ICNN-97. 1997: Houston, TX. p. 2141—2146.

229

[152] Zhang, J.-G., A.B. Sharmaand, and WC. Kwong, Optical CDMA-based ATM
Switches Supporting Equal-Bit-Rate and Variable-Bit-Rate Communication Services, in
IEEE Global Telecommunications Conference, GLOBECOM -98. 1998. p. 3221 —3226.

[153] Zhang, L.Q., S. Amari, and A. Cichocki, Natural Gradient Approach to Blind
Separation of Over- and Under-complete Mixtures, in First International Workshop on

Independent Component Analysis and signal Separation (ICA99). 1999: Aussois, France.
p. 455-460.

[154] Zhang, L.Q., S. Amari, and A. Cichocki, Semiparametric Approach to
Multichannel Blind Deconvolution of Non-minimum Phase Systems. Advances in Neural
Information Processing Systems 2000. 12: p. 363—369.

[155] Zhang, L.Q. and A. Cichocki, Blind Deconvolution of Dynamical Systems: A
State Space Approach. Journal of Signal Processing, 2000. 4(2 ): p. 111-130.

[156] Zhang, L.Q., A. Cichocki, and S. Amari, Multichannel Blind Deconvolution of
Nonminimum Phase Systems using Information Backpropagation, in Proc. of 5th Int’l
Conf. on Neural Information Processing. 1999: Perth, Australia. p. 210—216.

[157] Zhang, L.Q., A. Cichocki, and S. Amari, Kalman Filter and State-Space Approach
to Multichannel Blind Deconvolution, in IEEE Workshop on Neural Networks for Signal
Processing. 2000: Sydney, Australia. p. 425—434.

[158] Zibulevsky, M. and BA. Pearlrnutter, Blind Source Separation by Sparse
Decomposition in a Signal Dictionary. Neural Computation, 2001. 13(4): p. 863-882.

230

llfl