Adaptive Independent Component Analysis:
Theoretical Formulations and Application to CDMA Communication System
with Electronics Implementation
By
Zaid Albataineh

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Electrical Engineering – Doctor of Philosophy
2014

ABSTRACT
Adaptive Algorithms for Independent Component Analysis:
Formulations and Application to CDMA Communication systems with
Electronic Implementation
By
Zaid Albataineh
Blind Source Separation (BSS) is a key unsupervised stochastic approach that seeks to
estimate the underlying source signals from their mixtures with minimal assumptions
about the source signals and/or the mixing environment. BSS has been an active area of
research and in recent years has been applied to numerous domains, including biomedical
engineering, image processing, wireless communications, speech enhancement, and remote
sensing. Most recently, Independent Component Analysis (ICA) has become a vital
analytical approach in BSS. Despite active research in BSS, however, many
foundational issues remain regarding convergence speed, performance quality, and
robustness in realistic or adverse environments. Furthermore, some of the developed
BSS methods are computationally expensive, sensitive to additive and background noise,
and unsuitable for real-time or real-world implementation.
In this thesis, we first formulate new, effective ICA-based measures and their
corresponding robust adaptive algorithms for BSS in dynamic “convolutive mixture”
environments, and we demonstrate their superior performance over competing
algorithms. We then tailor their application to wireless (CDMA) communication
systems and acoustic separation systems. Finally, we explore a system realization of one
of the developed algorithms on ASIC or FPGA platforms in terms of real-time speed,
effectiveness, cost, and economies of scale.

Firstly, we propose a new class of divergence measures for Independent
Component Analysis (ICA) for estimating sources from mixtures. The Convex Cauchy–Schwarz
Divergence (CCS-DIV) is formed by integrating convex functions into the
Cauchy–Schwarz inequality. The new measure is symmetric and convex with respect to
the joint probability, and its degree of convexity can be tuned by a convexity
parameter. A non-parametric ICA algorithm derived from the proposed divergence is
developed by exploiting the convexity parameter and employing Parzen window-based
distribution estimates. The new contrast function yields effective parametric and
non-parametric ICA-based computational algorithms. Moreover, two pairwise iterative
schemes are proposed to tackle the high dimensionality of the sources.
Secondly, a new blind detection algorithm based on fourth-order cumulant
matrices is presented and applied to the multiuser symbol estimation problem in Direct
Sequence Code Division Multiple Access (DS-CDMA) systems. In addition, we propose
three new blind receiver schemes based on state-space structures. These
so-called blind state-space receivers (BSSR) require no knowledge of the propagation
parameters or the users' spreading code sequences; instead, they rely on the assumption
that the source signals are statistically independent.
Lastly, a system realization of one of the developed algorithms has been explored
on ASIC and FPGA platforms in terms of cost, effectiveness, and economies of scale.
Based on our findings regarding current state-of-the-art electronics, programmable FPGA
designs are deemed the most effective technology for ICA hardware
implementation at this time.

I would like to dedicate this dissertation to my loving Father, Mohammad Taisser,
who sadly passed away in 2001. I wish he had had the chance to be here today. A great and
special feeling of gratitude goes to my loving Mother, Bahjh, whose constant love and care are
the reason for who and what I am. My gratitude and my love are beyond words, to her and
to all my sisters and brothers, namely Jedah, Ziad, Jawdat, Jehan, Fareed and Jomana.
Lastly, I would like to dedicate this dissertation to my country, Jordan.

ACKNOWLEDGMENTS
First of all, I wish to express my great gratitude to my advisor, Dr. Fathi M.
Salem, for his guidance and support throughout all stages of this research. His advice and
encouragement have provided me with the skills needed to develop and refine this
research. Special thanks to Dr. Hayder Radha, Dr. Yang Wang, and Dr. Shantanu
Chakrabartty for agreeing to serve on my committee.
I would like to acknowledge and thank Michigan State University (MSU) for the
educational opportunity that allowed me to conduct my study and research. I also wish to
thank all my friends and colleagues in the ECE Department at MSU. Huge thanks go to
all my friends who have assisted and supported me in so many ways during my study,
especially Reemon Haddad, Yazan Smadi, Taghleb, Saad, Zuhair, Khalil, Abu Rawhi and
Mustafa who have been friends and colleagues. I would also like to thank Yarmouk
University for supporting me financially during my study at MSU.
Lastly, I would like to express my great gratitude to my mother, brothers and
sisters for all their support and encouragement.

PREFACE
In Chapter 1, we provide the background needed to understand Blind Source
Separation problems, including the definitions and theoretical background of
blind techniques. We also present our motivations, problem definitions, and the
concepts that are essential to grasp the topics covered in the different chapters of
this dissertation, namely Blind Techniques in Wireless Communication, the State Space
Approach, and the Adaptive Framework for Blind Source Separation.
In Chapter 2, we perform a thorough review of BSS/ICA algorithms, give an
overview of ICA algorithms, and emphasize the approaches that influenced
our work. We also study some of the methods that have been developed to solve the ICA
problem for instantaneous and convolutive mixtures.
In Chapter 3, we propose a new Convex Cauchy–Schwarz Divergence (CCS-DIV)
measure for Blind Source Separation (BSS) and unsupervised learning of acoustic and
speech signals. The CCS-DIV measure is developed by integrating convex functions into
the Cauchy–Schwarz inequality. By including a convexity parameter, the measure
offers a broad control range over its convexity. With this measure, a new CCS–ICA algorithm
is constructed, and a non-parametric form is developed incorporating Parzen window-based
distribution estimates. Moreover, the convergence speed of the CCS–ICA algorithm can be
controlled. In addition, two pairwise iterative schemes are proposed to tackle
high-dimensional BSS problems.
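To make the two ingredients named above concrete, the following is a minimal Python sketch, not the CCS–ICA algorithm of Chapter 3. It builds Gaussian Parzen-window estimates of a joint density and of its marginals on a grid, and then evaluates the plain Cauchy–Schwarz divergence (CS-DIV) between the joint estimate and the product of the marginals, which is zero when the two factorize. The convex CCS-DIV replaces this baseline with a convexity-controlled form that is not reproduced here; the Gaussian kernel, the bandwidth h, the grid limits, and all function names are illustrative assumptions.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian Parzen kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (math.sqrt(2.0 * math.pi) * h)


def parzen_1d(samples, grid, h):
    """Parzen-window estimate of a 1-D density at each grid point."""
    n = len(samples)
    return [sum(gaussian_kernel(g - s, h) for s in samples) / n for g in grid]


def parzen_2d(xs, ys, grid, h):
    """Product-kernel Parzen estimate of the joint density on grid x grid."""
    n = len(xs)
    # Kernel values K(g - sample) for every grid point and every sample.
    kx = [[gaussian_kernel(g - x, h) for x in xs] for g in grid]
    ky = [[gaussian_kernel(g - y, h) for y in ys] for g in grid]
    return [[sum(kx[a][m] * ky[b][m] for m in range(n)) / n
             for b in range(len(grid))]
            for a in range(len(grid))]


def cs_divergence(p, q):
    """Cauchy-Schwarz divergence -log((sum pq)^2 / (sum p^2 * sum q^2)).

    p and q are nonnegative 2-D arrays sampled on the same uniform grid; the
    grid-cell area cancels in the ratio, so plain sums are enough.
    """
    spq = sum(a * b for row_p, row_q in zip(p, q) for a, b in zip(row_p, row_q))
    spp = sum(a * a for row in p for a in row)
    sqq = sum(b * b for row in q for b in row)
    return math.log(spp) + math.log(sqq) - 2.0 * math.log(spq)


def cs_dependence(xs, ys, h=0.5, lo=-3.0, hi=3.0, points=61):
    """CS-DIV between the joint estimate and the product of the marginal estimates."""
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    joint = parzen_2d(xs, ys, grid, h)
    px = parzen_1d(xs, grid, h)
    py = parzen_1d(ys, grid, h)
    product = [[a * b for b in py] for a in px]
    return cs_divergence(joint, product)


if __name__ == "__main__":
    # Every (x, y) pair occurs once, so the empirical joint factorizes.
    xs = [-1, -1, -1, 0, 0, 0, 1, 1, 1]
    ys = [-1, 0, 1, -1, 0, 1, -1, 0, 1]
    print(cs_dependence(xs, ys))                  # zero up to rounding
    print(cs_dependence([-1, 0, 1], [-1, 0, 1]))  # y = x: clearly positive
```

With the usage lines at the bottom, the fully crossed sample set gives a divergence that is zero up to rounding, while the perfectly dependent pair y = x gives a clearly positive value. In an ICA setting, the demixing matrix would be adjusted to drive such a dependence measure toward zero.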
In Chapter 4, we present a frequency-domain method based on robust independent
component analysis (RICA) to address the multichannel blind source separation (BSS)
problem of convolutive speech mixtures in highly reverberant environments. We
apply the algorithm to separate the source signals under adverse conditions, i.e., high
reverberation when only short observation signals are available. Furthermore,
we study the impact of several parameters of the frequency-domain method on separation
performance, e.g., the overlap ratio and the window type. We also compare
different techniques for resolving the permutation ambiguity.
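As an illustration of where the overlap ratio and the window type enter a frequency-domain method, here is a minimal Python sketch of a generic short-time Fourier front end. It is a sketch under stated assumptions rather than the Chapter 4 implementation: the window set ('rect', 'hann', 'hamming'), the parameter names, and the naive DFT are chosen only for clarity. Each returned per-bin sequence is what a frequency-domain method treats as an approximately instantaneous mixture, provided the frames are long compared with the room impulse responses.

```python
import cmath
import math


def make_window(kind, n):
    """Periodic analysis window of length n ('rect', 'hann' or 'hamming')."""
    if kind == "rect":
        return [1.0] * n
    if kind == "hann":
        return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]
    if kind == "hamming":
        return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / n) for k in range(n)]
    raise ValueError(f"unknown window type: {kind}")


def frame_signal(x, frame_len, overlap_ratio, window="hann"):
    """Cut x into windowed frames; the hop size is frame_len * (1 - overlap_ratio)."""
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must lie in [0, 1)")
    hop = max(1, int(round(frame_len * (1.0 - overlap_ratio))))
    w = make_window(window, frame_len)
    return [[x[start + k] * w[k] for k in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]


def dft(frame):
    """Naive O(n^2) DFT of one frame; an FFT would be used in practice."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def stft_bins(x, frame_len, overlap_ratio, window="hann"):
    """Return one complex time series per frequency bin (bins x frames)."""
    spectra = [dft(fr) for fr in frame_signal(x, frame_len, overlap_ratio, window)]
    # Transpose: for bin f, collect that bin's value from every frame.
    return [[spec[f] for spec in spectra] for f in range(frame_len)]
```

For example, `stft_bins([1.0] * 16, 8, 0.5, "rect")` yields 8 frequency bins with 3 frames each, because a 50% overlap of 8-sample frames gives a hop of 4 samples. Applying the same front end to every microphone signal and stacking the sensors at a given bin gives the per-frequency mixture; separating each bin independently is what leaves the scaling and permutation ambiguities to be resolved afterwards.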
In Chapter 5, a new blind detection algorithm based on fourth-order cumulant
matrices is presented and applied to the multiuser symbol estimation problem in Direct
Sequence Code Division Multiple Access (DS-CDMA) systems. The goal of blind detection is to
estimate multiple symbol sequences in the downlink of a DS-CDMA communication
system using only the received wireless data, without any knowledge of the users'
spreading codes. The proposed algorithm exploits the properties of higher-order cumulant
matrices to reduce the computational load and enhance performance.
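For reference, the sketch below computes sample fourth-order cross-cumulants of zero-mean real data and the cumulant matrix Q(M) with entries Q_ij = Σ_kl cum(x_i, x_j, x_k, x_l) M_kl, the standard construction used by JADE-type methods. We assume, without the text confirming it, that the cumulant matrices of Chapter 5 follow this definition; the function names are hypothetical. For real BPSK-like symbols (±1) the kurtosis cum(x, x, x, x) = E[x^4] − 3(E[x^2])^2 equals −2, whereas all fourth-order cumulants of Gaussian signals are zero, which is why fourth-order statistics are insensitive to additive Gaussian noise while remaining informative about the user symbols.

```python
def _sample_mean_of_product(*seqs):
    """Sample mean of the element-wise product of equally long sequences."""
    total = 0.0
    for values in zip(*seqs):
        prod = 1.0
        for v in values:
            prod *= v
        total += prod
    return total / len(seqs[0])


def cum4(xi, xj, xk, xl):
    """Sample fourth-order cross-cumulant of zero-mean, real-valued sequences."""
    m = _sample_mean_of_product
    return (m(xi, xj, xk, xl)
            - m(xi, xj) * m(xk, xl)
            - m(xi, xk) * m(xj, xl)
            - m(xi, xl) * m(xj, xk))


def cumulant_matrix(x, weight):
    """Cumulant matrix Q(M) with entries Q_ij = sum_kl cum(x_i, x_j, x_k, x_l) M_kl.

    x is a list of d zero-mean (ideally whitened) sensor sequences and weight
    is a d x d matrix M. Because the cumulant is symmetric in k and l, using
    M_kl or M_lk gives the same result.
    """
    d = len(x)
    return [[sum(cum4(x[i], x[j], x[k], x[l]) * weight[k][l]
                 for k in range(d) for l in range(d))
             for j in range(d)]
            for i in range(d)]
```

With two orthogonal ±1 sequences and M = I, the resulting matrix is diagonal with entries −2. In JADE-type methods, the unitary rotation that jointly diagonalizes several such matrices is what separates the users after whitening.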
In Chapter 6, we develop three update laws based on state-space structures in order
to enhance the performance of the blind detector. Bit error rate (BER) simulations of
these methods are shown for different numbers of users, signal-to-noise ratios (SNR), and
numbers of symbols per user, in comparison with blind multiuser detectors
(BMUD), linear minimum mean squared error (LMMSE) detectors, and other conventional
detectors.
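Since Chapter 6 compares detectors through BER-versus-SNR curves, the following minimal Python sketch shows how one such point is measured and checked against theory. It uses a single-user BPSK link in additive white Gaussian noise as the reference case, not the multiuser DS-CDMA simulation of the dissertation, and it assumes that SNR means Eb/N0 with unit-energy symbols; the function names are illustrative.

```python
import math
import random


def bit_error_rate(sent, detected):
    """Fraction of positions where the detected symbol differs from the sent one."""
    if len(sent) != len(detected) or not sent:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(s != d for s, d in zip(sent, detected)) / len(sent)


def bpsk_ber_theory(ebn0_db):
    """Single-user BPSK over AWGN: Q(sqrt(2 Eb/N0)) = 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))


def simulate_bpsk_ber(ebn0_db, n_bits, seed=0):
    """Monte Carlo BER of a sign detector for unit-energy BPSK in AWGN."""
    rng = random.Random(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebn0))  # noise std, i.e. sqrt(N0 / 2) with Eb = 1
    sent = [rng.choice((-1, 1)) for _ in range(n_bits)]
    detected = [1 if s + rng.gauss(0.0, sigma) >= 0 else -1 for s in sent]
    return bit_error_rate(sent, detected)
```

At 0 dB the simulated BER should approach the theoretical value 0.5 erfc(1) ≈ 0.0786 as the number of bits grows; multiuser detectors are commonly judged by how closely their curves approach this single-user bound.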
In Chapter 7, we introduce a constrained blind multiuser detector that imposes a
regularization parameter to cope with the ill-conditioning of the covariance matrix and
thereby mitigate performance degradation.
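The following minimal Python sketch illustrates the effect of a regularization parameter on an ill-conditioned covariance matrix. It assumes the common diagonal-loading (Tikhonov) form R + λI, which the text does not confirm is the exact constraint used in Chapter 7, and it restricts itself to a 2 × 2 example so that the eigenvalues have a closed form; the function names are hypothetical.

```python
import math


def diagonal_load(r, lam):
    """Return the regularized matrix R + lam * I (diagonal loading)."""
    n = len(r)
    return [[r[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]


def symmetric_2x2_eigenvalues(r):
    """Eigenvalues (ascending) of a real symmetric 2 x 2 matrix."""
    a, b, c = r[0][0], r[0][1], r[1][1]
    mid = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mid - rad, mid + rad


def condition_number_2x2(r):
    """Largest over smallest eigenvalue of a positive definite 2 x 2 matrix."""
    lo, hi = symmetric_2x2_eigenvalues(r)
    return hi / lo


if __name__ == "__main__":
    # Two nearly collinear directions give an ill-conditioned covariance.
    r = [[1.0, 0.999], [0.999, 1.0]]
    print(condition_number_2x2(r))                      # about 2000
    print(condition_number_2x2(diagonal_load(r, 0.1)))  # about 21
```

Loading the diagonal with λ = 0.1 lowers the condition number from about 2000 to about 21, at the price of a small bias in any detector built from the loaded matrix; the choice of λ trades these two effects against each other.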

In Chapter 8, we investigate ICA algorithms in terms of hardware
implementation. Although software implementation is important for investigating the
capabilities of ICA algorithms and simulating significant aspects of their applications,
hardware implementation provides real-time solutions and exploits parallelism
to achieve fast convergence. Furthermore, software implementations may suffer from
insufficient memory because of the large data sets and high dimensionality of ICA
applications. Hardware implementations on integrated circuits (ICs) are therefore a promising
way to implement ICA algorithms: their high-speed processing and parallel
architectures allow them to outperform software implementations in terms of
memory and convergence speed. Finally, in Chapter 9, we conclude the dissertation and
highlight directions for future work.

TABLE OF CONTENTS
LIST OF TABLES ......................................................................................................... xiii
LIST OF FIGURES ....................................................................................................... xiv
LIST OF ALGORITHMS ........................................................................................... xviii
Chapter 1 .................................................................................................................... 1
Introduction ......................................................................................................................... 1
1.1 Blind Techniques in Wireless Communication ......................................... 5
1.1.1 Direct-Sequence Code Division Multiple Access ................................. 7
1.1.2 Why Blind.............................................................................................. 9
1.1.3 CDMA Signal Models ................................................................ 10
1.1.3.1 DS-CDMA Receiver Signal Model .......................................... 10
1.1.3.2 WCDMA Receiver Signal Model ............................................. 13
1.2 The State Space Framework .................................................................... 14
1.3 Adaptive Framework for Blind Source Separation ................................. 15
1.3.1 Invertible Mixtures .............................................................................. 16
1.3.1.1 Instantaneous mixtures .............................................................. 16
1.3.1.2 Convolutive Mixtures ............................................................... 18
1.3.2 Underdetermined mixtures .................................................................. 20
1.4 Dissertation Contributions ....................................................................... 21
Chapter 2 .................................................................................................................. 26
Literature Review.............................................................................................................. 26
2.1 Introduction.............................................................................................. 26
2.2 Principal Component Analysis (PCA) ..................................................... 30
2.3 Independent Component Analysis (ICA) ................................................ 32
2.3.1 The Instantaneous ICA Framework ..................................................... 33
2.3.1.1 Preprocessing ............................................................................ 34
2.3.1.2 Nonlinear function choice (Activation function) ..................... 35
2.3.1.3 The Learning Update Rules ...................................................... 36
2.3.1.3.1 Batch Learning (Offline learning) ...................................... 36
2.3.1.3.2 Stochastic gradient (Online learning) ................................ 36
2.3.1.4 ICA based on Maximization of non-Gaussianity...................... 36
2.3.1.4.1 Kurtosis Measure ................................................................ 37
2.3.1.4.2 Gradient algorithm using kurtosis ....................................... 38
2.3.1.5 Fixed-point algorithm ................................................................ 39
2.3.1.6 Negentropy Measure ................................................................. 40
2.3.1.7 ICA based on Maximum Likelihood Estimation ...................... 43
2.3.1.8 ICA based on Entropy Maximization ....................................... 46
2.3.1.9 Bell-Sejnowski method ............................................................. 46
2.3.1.10 ICA based on Tensorial Methods............................................ 48
2.3.1.11 PARAllel FACtor (PARAFAC) algorithms ........................... 49
2.3.1.12 Joint Approximate Diagonalization (JAD) .......................... 50
2.3.2 The Convolutive ICA Mixtures ........................................................... 50
2.3.2.1 Time-Domain Methods ............................................................. 54
2.3.2.2 TRINICON Blind Source Separation ....................................... 55
2.3.2.3 Frequency-Domain Methods ................................................... 57
2.3.2.3.1 Lee’s approach .................................................................... 57
2.3.2.3.2 Smaragdis’s approach .......................................................... 59
2.3.2.3.3 Independent Vector Analysis (IVA) ................................... 60
2.3.2.3.4 Parra’s approach ................................................................. 60
2.3.2.3.5 Recursive Convolutive ICA ................................................ 62
2.4 Ambiguities in ICA algorithms ............................................................... 62
2.4.1 Scale ambiguity ................................................................................... 63
2.4.2 Permutation ambiguity ........................................................................ 65
2.4.3 Circularity of Fast Fourier Transform (FFT) ....................................... 66
2.5 Performance Metrics of ICA methods ..................................................... 67
2.5.1 Instantaneous case ............................................................................... 67
2.5.1.1 Performance Matrix G .............................................................. 67
2.5.1.2 SNR measure ........................................................................... 67
2.5.2 Convolutive case.................................................................................. 68
2.5.2.1 Performance Index ................................................................... 68
2.5.2.2 Mutual Information measure.................................................... 68
2.5.2.3 Performance Evaluation ............................................................ 68
Chapter 3 .................................................................................................................. 70
Convex Cauchy–Schwarz Independent Component Analysis for Blind Source Separation ........ 70
3.1 Introduction.............................................................................................. 71
3.2 A Brief Description of Previous Divergence Measures .......................... 74
3.2.1 Previous Divergence Measures ........................................................... 75
3.2.2 The proposed Divergence Measure ..................................................... 78
3.2.3 Link to other Divergences .................................................................. 80
3.2.4 Geometrical Interpretation of the Proposed Divergence for α = 1 and α = −1 ........ 81
3.2.5 Evaluation of Divergence Measures .................................................... 82
3.3 Convex Cauchy–Schwarz Divergence Independent Component Analysis (CCS–ICA) ........ 87
3.4 Scenario of two or three source signals ................................................... 93
3.5 Computational Complexity...................................................................... 96
3.6 Simulation Results ................................................................................... 96
3.6.1 Sensitivity of CCS-DIV measure ................................................. 97
3.6.2 The performance and the convergence speed of the proposed CCS-ICA algorithms versus the existing ICA-based algorithms .......... 100
3.6.3 Experiments on Speech and Music Signals ............................... 109
3.7 Conclusion ............................................................................................. 114
Chapter 4 ................................................................................................................ 116
A RobustICA-Based Algorithm for Blind Separation of Convolutive Mixtures ........... 116
4.1 Introduction............................................................................................ 116
4.2 Convolutive Mixtures ............................................................................ 120
4.2.1 Problem Definition ............................................................................ 120
4.3 Recursively Regularized ICA ................................................................ 123
4.4 The presented method Based on RobustICA framework ...................... 125
4.4.1 Step 1: Preprocessing (Data Whitening) ............................................ 125
4.4.2 Step 2: Determining the rotation matrix (unitary matrix) ...... 127
4.5 Scaling and Permutation Ambiguities ................................................... 131
4.5.1 Estimation of the diagonal matrix ................................................... 132
4.5.2 Estimation of the permutation matrix .......................................... 133
4.6 Experimental Results .............................................................................. 138
4.6.1 Section 1 ............................................................................................ 139
4.6.2 Section 2 ............................................................................................ 144
4.7 Conclusion ............................................................................................. 148
Chapter 5 ................................................................................................................ 149
Robust Blind Multiuser Detection Algorithm Using Fourth Order Cumulant Matrices ........ 149
5.1 Introduction............................................................................................ 149
5.2 DS-CDMA Signal Model ...................................................................... 152
5.3 Robust and Fast Independent Component Analysis (ICA).................... 154
5.3.1 The Preprocessing.............................................................................. 155
5.3.2 The FastICA Algorithm ..................................................................... 156
5.3.3 The Robust ICA algorithm ................................................................ 158
5.4 The Proposed Detection Algorithm Based On Cumulant Matrices....... 159
5.4.1 Step 1: Preprocessing (Data Whitening) ............................................ 159
5.4.2 Step 2: Determining the rotation matrix (unitary matrix) U ............. 161
5.5 Simulation Results ................................................................................. 163
5.5.1 Performance ....................................................................................... 163
5.5.2 Measure of Computation ................................................................... 168
5.6 Conclusion ............................................................................................. 172
Chapter 6 ................................................................................................................ 173
Adaptive Blind Multiuser Detection DS-CDMA Based on State Space Approach ....... 173
6.1 Introduction............................................................................................ 174
6.2 DS-CDMA Signal Model ...................................................................... 179
6.3 Conventional Blind Linear Multiuser Detectors.................................... 182
6.3.1 Single user detection (SUD) Detector ............................................... 182
6.3.2 Rake Detector .................................................................................... 182
6.3.3 LMMSE Detector .............................................................................. 183
6.4 The Proposed Detection Schemes Based On state space framework .... 184
6.4.1 Step 1: Preprocessing (Data Whitening) ............................................ 185
6.4.2 Step 2a: Determining the rotation matrix (unitary matrix) U based on the feedforward structure ........ 186
6.4.3 Step 2b: Determining the rotation matrix (unitary matrix) U based on the feedback structure I ........ 188
6.4.4 Step 2c: Determining the rotation matrix (unitary matrix) U based on the feedback structure II ........ 191
6.4.5 The proposed adaptive detectors ....................................................... 193
6.5 Simulation Results ................................................................................. 193
6.6 Conclusion ............................................................................................. 205
Chapter 7 ................................................................................................................ 206
Constrained Blind Multiuser Detection for DS-CDMA System .................................... 206
7.1 Introduction............................................................................................ 206
7.2 DS-CDMA Signal Model ...................................................................... 208
7.3 Conventional Blind Multiuser Detection ............................................... 209
7.4 The Proposed Detection Scheme ........................................................... 212
7.5 Simulation Results ................................................................................. 215
7.6 Conclusion ............................................................................................. 219
Chapter 8 ................................................................................................................ 220
Hardware Implementation .............................................................................................. 220
8.1 Introduction............................................................................................ 221
8.2 Comparative Study of Existing Solutions to implement ICA Algorithms ........ 221
8.2.1 Analog CMOS Integrated Circuit .................................................... 222
8.2.2 Mixed Signal Techniques (Analog and Digital Circuit).................... 223
8.2.3 ASIC Solutions .................................................................................. 224
8.2.4 FPGA Solutions ................................................................................. 224
8.3 Multiplier Design ................................................................................... 229
8.4 Simulation Results ................................................................................. 231
8.5 Conclusion ............................................................................................. 233
Chapter 9 ................................................................................................................ 236
Conclusion and Future Work .......................................................................................... 236
9.1 Conclusion ............................................................................................. 236
9.2 Future Work ........................................................................................... 239
APPENDIX .................................................................................................................... 243
BIBLIOGRAPHY .......................................................................................................... 246

LIST OF TABLES
Table 3.1: The performance of the ICA algorithm based on the proposed divergence .. 226
Table 3.2: The computational load, in seconds, of the ICA algorithm based on the
proposed divergence and other widely used ICA algorithms, each entry averages
over the corresponding number of trials. Observation mixtures consist of two
source signals that follow the same ........................................................................ 104
Table 3.3: The corresponding variance of the performance. .......................................... 104
Table 3.4: Kurtosis Values of the different probability density functions that are used in the
ICA experiments ..................................................................................................... 107
Table 3.5: The performance of the ICA algorithm based on the proposed divergence in
terms of Amari error (multiplied by 100). Each entry averages over the
corresponding number of trials. .............................................................................. 107
Table 3.6: The computational load, in seconds, of the ICA algorithm based on the
proposed divergence and other widely used ICA algorithms, each entry averages
over the corresponding number of trials. ................................................................ 108
Table 3.7: The performance of the ICA algorithm based on the proposed divergence and
other widely used ICA algorithms in terms of Amari error (multiplied by 100). Each
entry averages over the corresponding number of trials. ........................................ 108
Table 8.1: Comparison of Analog, Mixed signal and ASIC Solutions ........................... 226
Table 8.2: Comparison of FPGA Solutions .................................................................... 227
Table 8.3: Comparison Results Among Various ICA Implementations......................... 228

LIST OF FIGURES
Figure 1.1: Illustration of the BSS Problem [2] .................................................................. 2
Figure 1.2: Block diagram of blind source separation ........................................................ 3
Figure 1.3: Wireless Communication Scenario [2]............................................................ 6
Figure 1.4: Conceptual State-Space model which illustrates the general linear ............... 15
Figure 2.1: Classifications for BSS problem. ................................................................... 28
Figure 2.2: Block diagram of the Convolutive Mixtures. ................................................. 52
Figure 2.3: Lee’s Block diagram ...................................................................................... 58
Figure 2.4: Smardagdis’s Block diagram.......................................................................... 59
Figure 2.5: Illustration of permutation ambiguity in frequency domain. ......................... 62
Figure 3.1: Illustration of Geometrical Interpretation of the proposed Divergence ......... 83
Figure 3.2: Different divergence measures versus the joint probability P_{x1,x2}(A, A) .... 85
Figure 3.3: CCS-DIV and α-DIV versus the joint probability P_{x1,x2}(A, A) ................... 85
Figure 3.4: CCS-DIV with various alphas versus the joint probability P_{x1,x2}(A, A) ..... 86
Figure 3.5: The surfaces and Contours of CCS-DIV vs CS-DIV .................................... 86
Figure 3.6: Comparison of (a) CCS-DIV with α = 1, (b) CCS-DIV with α = -1, (c) KL-DIV, (d) E-DIV, (e) CS-DIV, (f) C-DIV with α = -1 of demixed signals as a function
of the demixing parameters θ1 and θ2. .................................................................... 99
Figure 3.7: Comparison of SIRs (dB) of demixed two speeches and music signals by
using different ICA algorithms in parametric BSS task. ........................................ 105
Figure 3.8: Comparison of learning curves of C-ICA, E-ICA, KL-ICA, and ................ 106
Figure 3.9: Comparison of SIRs (dB) of demixed two speeches and music signals by
using different ICA algorithms in parametric BSS task-- random initial value. .... 111
Figure 3.10: Comparison of learning curves of C-ICA, E-ICA, KL-ICA, and CCS-ICA
with α=1, and α=-1 in a two-source BSS task with random initial value. .............. 111
Figure 3.11: Comparison of SIRs (dB) of demixed two speeches and ........................... 112
Figure 3.12: Comparison of learning curves of C-ICA, E-ICA, ..................................... 112
Figure 3.13: Comparison of SIRs (dB) of demixed two speeches and ........................... 113
Figure 3.14: The original signals and de-mixed signals by using................................... 113
Figure 3.15: Comparison of SIRs (dB) of demixed two speeches and music signals by
using different ICA algorithms in instantaneous BSS task with additive Gaussian
noise. ....................................................................................................................... 114
Figure 4.1: configuration of the two experimental setups that were conducted by ........ 138
Figure 4.2: Results obtained in Test1 experiments. The SIR performance of ................ 140
Figure 4.3: Results obtained in Test1 experiments. The SDR performance ................... 141
Figure 4.4: Corresponding CPU time for each method. ................................................. 142
Figure 4.5: Results obtained in Test2 experiments. The SIR performance ............. 143
Figure 4.6: Results obtained in Test2 experiments. The SIR performance of the .......... 143
Figure 4.7: Results obtained in the Test1 experiments [130]. Best performance ........... 145
Figure 4.8: Results obtained in the Test1 experiments [130]. ........................................ 146
Figure 4.9: Results obtained in the Test2 experiments [130]. ........................................ 146
Figure 4.10: Results obtained in the Test2 experiments [130]. ...................................... 147
Figure 4.11: Impact of FFT length, 2-by-2 case, Results obtained in the....................... 147
Figure 5.1: Average BER as a function of SNR for 30 users and 100 runs.................... 165
Figure 5.2: Average BER as a function of SNR for different sample lengths T with 30
users and 100 runs. Black triangle right lines: T = 10^4. Black circle lines: T = 9000.
Black hexagram lines: T = 8000. Black square lines: T = 7000. Red triangle up lines:
T = 6000. Blue circle lines: T = 5000. Blue hexagram lines: T = 4000. Blue square
lines: T = 3000. Blue triangle right lines: T = 2000. ................................................. 166
Figure 5.3: Average BER as a function of SNR for different Users K with signal blocks
composed of T = 3500 samples and 100 runs. Black triangle right lines: K = 10.
Black circle lines: K = 20. Red triangle up lines: K = 30. Blue square lines: K=40.
Blue hexagram lines: K=50. ................................................................................... 167

xv

Figure 5.4: Average extraction quality as a function of computational cost for different mixture sizes K with signal blocks composed of T = 3500 samples and 1000 mixture realizations. Solid lines: K = 5. Dashed lines: K = 10. Dotted lines: K = 20.
Figure 5.5: Average extraction quality as a function of computational cost for different mixture sizes K with signal blocks composed of T = 3500 samples and 1000 mixture realizations. Solid lines: K = 30. Dashed lines: K = 40.
Figure 5.6: Average extraction quality as a function of computational cost for different mixture sizes K with signal blocks of different lengths T and 1000 mixture realizations. Solid lines: K = 20. Dashed lines: K = 30. Dotted lines: K = 40.
Figure 6.1: Signal generation model for a typical QPSK DS-CDMA system
Figure 6.2: Feedback Demixing Structure I
Figure 6.3: Feedback Demixing Structure II
Figure 6.4: Average BER as a function of SNR for DS-CDMA downlink using Gold codes, G = 63. (a) 30 users. (b) 50 users.
Figure 6.5: Corresponding CPU time for each method.
Figure 6.6: Average BER as a function of SNR for DS-CDMA downlink using OVSF codes, G = 64. (a) 30 users. (b) 50 users.
Figure 6.7: Average BER as a function of SNR for WCDMA downlink.
Figure 6.8: Average BER as a function of SNR for WCDMA downlink using OVSF codes, G = 64. (a) 30 users. (b) 50 users.
Figure 6.9: Average BER as a function of SNR for DS-CDMA downlink.
Figure 6.10: Average BER as a function of SNR for WCDMA downlink.
Figure 6.11: Average BER as a function of SNR for various numbers of users K
Figure 6.12: Average BER as a function of SNR for various sample sets M
Figure 7.1: Average BER as a function of SNR for 15 users
Figure 7.2: Average BER as a function of SNR for 15 users with L = 2N, L = 3N.
Figure 7.3: Average BER as a function of SNR for 15 users for various
Figure 7.4: Average BER as a function of SNR for 15 users with L = 1000.
Figure 8.1: The proposed Mixer (Multiplier) schematic.
Figure 8.2: Voltage Conversion Gain versus IF
Figure 8.3: Voltage Conversion Gain versus IF
Figure 8.4: Voltage Conversion Gain versus IF
Figure 8.5: Voltage Conversion Gain versus IF
Figure 8.6: The proposed Mixer (Multiplier) Layout.

LIST OF ALGORITHMS
Algorithm 3.1: ICA Based on the gradient descent
Algorithm 3.2: ICA Based on pairwise gradient descent scheme
Algorithm 3.3: ICA Based on pairwise Jacobi scheme
Algorithm 6.1: RAKE based FastICA method
Algorithm 6.2: RAKE based RICA method

Chapter 1

Introduction

Blind techniques have been studied since the 1980s, when the first blind adaptive
equalizers were designed for digital communication [1]. The problem these techniques
addressed was to estimate an unknown, linear, stationary single-input single-output (SISO)
channel without any knowledge of the input signal. The word “blind” indicates that the
signal processing techniques recover both the unknown mixing system and the unknown
sources based only on the observations [1], [2].
The Blind Source Separation (BSS) problem was formulated in the framework of
neural modeling around 1982 by Bernard Ans, Jeanny Herault and Christian Jutten [1]. It
then gained considerable attention in more diverse research areas after Comon published
his pioneering paper on Independent Component Analysis (ICA) in a signal processing
journal in 1994 [7]. In 1995, Bell and Sejnowski boosted the ICA topic by developing the
infomax algorithm [18]. Meanwhile, the well-known JADE algorithm had been proposed
by Cardoso and Souloumiac in 1993 [16]. Although various BSS algorithms with numerous
contrast functions have been developed over the last decade, BSS is still considered one
of the most important research topics in signal processing and continues to generate
considerable interest [1], [3].

Figure 1.1: Illustration of the BSS Problem [2]
BSS is an unsupervised stochastic method that separates the underlying source
signals from their mixtures without any knowledge of the source signals or the mixing
process. Recently, Independent Component Analysis (ICA) has become a vital approach in
BSS (Figure 1.1). ICA has been an important topic in many research areas [1-3], [12-27],
e.g., biomedical engineering, image processing, wireless communication systems, speech
enhancement, and remote sensing. ICA is related to Principal Component Analysis (PCA)
and Factor Analysis (FA) in multivariate analysis. PCA and FA, however, are second-order
methods, which correspond to assuming that the components or factors have a Gaussian
distribution. ICA, in contrast, is a statistical technique that exploits higher-order
statistics (HOS); its goal is to represent a set of random variables as a linear
transformation of statistically independent components.
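To illustrate why second-order processing alone is not enough, the following NumPy sketch whitens a two-source mixture (the PCA-like, second-order step) and then resolves the remaining rotation with a kurtosis-based fixed-point iteration. That iteration is the symmetric FastICA update with a cubic nonlinearity, a standard choice used here purely for illustration and not the algorithm developed in this dissertation. The uniform sources, the mixing matrix `A_mix`, the sample size, and the random seed are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20000
# Two independent, zero-mean, unit-variance sub-Gaussian (uniform) sources
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, T))
A_mix = np.array([[1.0, 0.6], [0.4, 1.0]])   # hypothetical mixing matrix
x = A_mix @ s

# Second-order step (PCA whitening): afterwards cov(z) = I,
# but z is still an unknown rotation of the sources
x = x - x.mean(axis=1, keepdims=True)
eigval, E = np.linalg.eigh(np.cov(x))
V = E @ np.diag(eigval ** -0.5) @ E.T
z = V @ x

# Higher-order step: symmetric FastICA with the cubic (kurtosis) nonlinearity
W = np.linalg.qr(rng.standard_normal((2, 2)))[0]   # random orthogonal start
for _ in range(200):
    # Row-wise update w <- E[z (w^T z)^3] - 3 w
    W_new = ((W @ z) ** 3) @ z.T / T - 3.0 * W
    # Symmetric decorrelation W <- (W W^T)^(-1/2) W, computed via the SVD
    U, _, Vt = np.linalg.svd(W_new)
    W_new = U @ Vt
    # Converged when each new row is parallel (up to sign) to the old one
    done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < 1e-10
    W = W_new
    if done:
        break

G_global = W @ V @ A_mix   # close to a signed permutation matrix after separation
print(np.round(np.abs(G_global), 2))
```

The absolute values are taken because, for sub-Gaussian sources, this update flips the sign of each row at every iteration; only the direction of each row matters for separation.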
ICA aims to recover both the unknown source signals and the mixing system, or one
of them, from the observed system outputs alone. This research topic draws on several
fields, including signal processing, information theory, statistics and probability, and
neural networks. ICA has applications in image processing, wireless communications,
biomedical engineering, and audio source separation, and the ICA problem also appears
in many multi-sensor systems [1]. ICA methods are essentially based on parameter
estimation, which requires a model of the separating system, an objective criterion, and
an optimization method.

Figure 1.2: Block diagram of blind source separation
The BSS problem can be expressed as in Figure 1.2, where S = {s1, s2, … , sn}
represents the n source signals, X = {x1, x2, … , xm} represents the m observed signals,
and A or H represents the mixing system, depending on whether it is static or dynamic.
Two mixture models have been considered. The first, linear static mixtures, assumes that
the mixing system A is memoryless; these are called linear instantaneous mixtures. The
second, linear convolutive mixtures, assumes that the mixing system H has memory (and
may vary with time). To solve the BSS problem, assumptions are needed; otherwise the
problem is ill-posed. The most common assumption in the BSS/ICA field is mutual
statistical independence among the original sources. Although this assumption is
sometimes difficult to establish, it is realistic and fully justified in several applications.
Other suitable assumptions can also be used successfully, depending on the application.
In this dissertation, we study the two models of mixing systems: the instantaneous
mixture model in (1.1) and the convolutive mixture model in (1.2), respectively.

$$ x(t) = A\,s(t) + v(t), \qquad t = 1, 2, \dots, T \qquad (1.1) $$

where
• $A = [a_{ij}]_{m \times n}$ is the memoryless mixing matrix;
• $s(t) = [s_{jt}]_{n \times T}$ contains the $n$ original source signals of the system, where $T$ is the sample length;
• $x(t) = [x_{it}]_{m \times T}$ contains the $m$ observed signals of the system;
• $v(t) = [v_{it}]_{m \times T}$ is the noise, usually assumed to be additive white Gaussian noise (AWGN).
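Model (1.1) can be simulated directly. The NumPy sketch below is illustrative only: the Laplacian sources, the random mixing matrix, the noise level, and the helper name `instantaneous_mixture` are assumptions, while the array shapes follow the definitions above (sources n × T, observations m × T).

```python
import numpy as np

def instantaneous_mixture(A, s, noise_std, rng):
    """Return (x, v) with x = A s + v, where v is i.i.d. Gaussian noise (AWGN).

    A: (m, n) memoryless mixing matrix; s: (n, T) source signals.
    """
    m, T = A.shape[0], s.shape[1]
    v = noise_std * rng.standard_normal((m, T))
    return A @ s + v, v

rng = np.random.default_rng(1)
n, m, T = 3, 4, 1000
s = rng.laplace(size=(n, T))        # hypothetical super-Gaussian sources
A = rng.standard_normal((m, n))     # hypothetical m x n mixing matrix
x, v = instantaneous_mixture(A, s, noise_std=0.1, rng=rng)
print(x.shape)  # (4, 1000)
```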

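$$ x(t) = \sum_{d=1}^{L} H_d\, s(t - d) + v(t), \qquad t = 1, 2, \dots, T \qquad (1.2) $$

where
• $H_d = [h_{ijd}]_{m \times n}$ is the mixing matrix at the $d$th delay, $d = 1, 2, \dots, L$;
• $h_{ijd}$ is the impulse-response coefficient from source $j$ to sensor $i$ at delay $d$; in a time-varying channel these coefficients also depend on the time instant $t$, i.e., $h_{ijd}(t)$.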
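The noise-free part of (1.2) is a sum of delayed matrix products. The sketch below, built around the hypothetical helper `convolutive_mixture`, implements exactly that sum with delays d = 1, …, L and zero initial conditions (s(t) = 0 before the first sample); the storage layout `H[d - 1] = H_d` and all sizes are assumptions.

```python
import numpy as np

def convolutive_mixture(H, s):
    """Noise-free x(t) = sum_{d=1}^{L} H_d s(t - d), with s(t) = 0 before t = 1.

    H: (L, m, n) array with H[d - 1] = H_d; s: (n, T). Returns x of shape (m, T).
    """
    L, m, _ = H.shape
    T = s.shape[1]
    x = np.zeros((m, T))
    for d in range(1, min(L, T - 1) + 1):
        # Columns t >= d receive H_d applied to the source samples delayed by d
        x[:, d:] += H[d - 1] @ s[:, :T - d]
    return x

rng = np.random.default_rng(2)
H = rng.standard_normal((4, 2, 2))   # hypothetical L = 4 taps, 2 x 2 system
s = rng.standard_normal((2, 50))
x = convolutive_mixture(H, s)
```

Each sensor output is the sum over sources of a scalar FIR convolution, so the result can be cross-checked against `np.convolve` with a leading zero tap (there is no zero-delay term in (1.2)).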

We study these two models in detail in the next chapter. In terms of applications, we
address BSS in the cocktail party problem [3], [38], [120]. In addition, we address its
application in wireless communication systems, specifically blind multiuser detection in
CDMA systems [89], [75], [59]. Furthermore, a system realization of one of the developed
algorithms is explored on ASIC or FPGA platforms, factoring in cost, effectiveness, and
economies of scale.

1.1 Blind Techniques in Wireless Communication
In telecommunication systems [74], [75], [89], [91], the most essential challenge
has been to support simultaneous multiuser access in order to build more efficient
wireless systems. Several state-of-the-art approaches have been proposed in the literature
[74], [85], [94] to overcome this challenge, such as training-based systems. These
techniques periodically require each user to send a known training sequence so that the
receiver can estimate the parameters of the propagation channel, whose distortions are
caused by multiple reflections of the radio waves on the obstacles encountered, e.g.,
buildings, cars, and trees. According to [60], [74], it has been reported that 20% of the
bandwidth is devoted to the training sequence in a GSM system and up to 40% in a UMTS
system. In spite of the good performance of training-based techniques, their cost in terms
of bandwidth tends to be significantly large, and the efficiency of most communication
systems depends on the bandwidth and the transmitted power. Blind multiuser techniques
are therefore a promising area in wireless communication systems because they can
ensure a high communication rate and spectral efficiency by reducing or eliminating the
training sequences.

Figure 1.3: Wireless Communication Scenario [2]
Blind techniques are an attractive field of work for the following reasons: 1) they
reduce the need for training sequences; and 2) they can back up training-based systems in
fast time-varying channels and under severe multipath fading. Blind techniques also help
to recover signals in situations where a training sequence cannot be used, such as
eavesdropping [59], [74]. For these reasons, we are motivated to do additional research in
this area in order to design new multiuser detectors that perform well in multipath
propagation environments and are more robust to outliers and to different types of noise.

1.1.1 Direct-Sequence Code Division Multiple Access
Code division multiple access (CDMA) is a multiplexing technique, or channel
access method, that allows several users to access the same multi-point transmission
medium (RF channel) asynchronously and simultaneously, to transmit over it, and to
share its capacity. CDMA is used by various radio communication technologies, such as
CDMAONE (the 2G mobile phone standard), CDMA2000 (the third-generation successor of
CDMAONE), WCDMA (the third-generation standard used by GSM carriers), and Evolved
High-Speed Packet Access (HSPA+) [74]. Although LTE (4G) is in operation at many
cellular companies inside and outside the U.S., the networks are still not fully built out,
and LTE coverage is still not universal. Thus, most of the older 2G and 3G systems are
still in service, or at least operate in parallel with 4G. For example, among U.S.
companies, AT&T and T-Mobile use GSM/WCDMA/HSPA, while Verizon, Sprint, and
MetroPCS use cdma2000/EV-DO. Moreover, the LTE wireless interface is incompatible
with 2G and 3G networks, so it must be operated on a separate wireless spectrum.
Although 3G is expected to be replaced by 4G technologies eventually, it will take a long
time before LTE coverage is fully developed and universal, especially in countries such
as India, Pakistan, and Iraq [141], [142].
One of the most interesting concepts in data communication is allowing multiple
users to share and send information simultaneously over a single communication
channel, i.e., several users share a band of frequencies (the bandwidth). CDMA is
suitable for satisfying the demand for higher data rates and has an inherent capability to
resist interference and provide a secure channel. Nevertheless, multiuser detection in
CDMA systems usually suffers from multiple-access interference (MAI) and inter-symbol
interference (ISI) due to the non-ideal cross-correlations between users’ spreading
sequences. The concept of spread spectrum communication was originally proposed for
military purposes and used in anti-jamming techniques; later, spread spectrum techniques
were adopted for civilian purposes.
In highly loaded systems [74], [88], conventional detectors are an unsuitable
choice, since most of them suffer from external interference sources, such as adjacent
channel interference or jamming, and treat the interference as additional background
noise. These drawbacks have motivated the development of numerous interference
rejection techniques [89], [91].
In the CDMA literature [74], [76], most conventional multiuser detection methods
assume low statistical correlation between the desired user and the interfering ones,
which motivates the use of the second-order statistics (SOS) of the received data. A
relatively new idea is therefore to extend this work with higher-order statistics (HOS) in
order to make the methods robust against imperfect cross-correlation and the near-far
problem, which are additional drawbacks of conventional detection methods. HOS-based
BSS methods [1], [2], [8-21] are able to recover the signals from the mixtures without any
knowledge of the waveform structure (modulation) or the mixing coefficients. The
adaptive LMMSE detector has been proposed to avoid the complex matrix inversion
operation, but it still needs the spreading codes of all users; therefore, the adaptive
MMSE detector might not be realistic in a downlink receiver. For this reason, one of the
main emphases of this dissertation is to develop and implement BSS/ICA algorithms that
assist multiuser detection methods in mitigating different types of interference in CDMA
systems, especially in DS-CDMA and WCDMA downlink systems.

1.1.2 Why Blind
Blind multiuser detectors are promising because they require only the received
signal, with no prior knowledge of training signals or user spreading codes, to equalize
the channels and estimate the symbol sequences of all users in CDMA systems. In
contrast, training-based systems [74], [85], [86] periodically require the user to send a
known training sequence so that the receiver can estimate the parameters of the
propagation channel. In spite of their good performance, these techniques are costly in
terms of bandwidth, and the efficiency of most communication systems depends on the
bandwidth and the transmitted power. Blind techniques can also perform well and be more
robust in estimating the symbols in ill-conditioned environments, e.g., under severe
multipath fading channels. If prior information such as spatial knowledge or a set of
short training sequences is available, it can be incorporated to construct semi-blind
detectors [75], [78], [89], [91], [96].
The reasons to apply blind multiuser detection are as follows [2]:
1) Training examples for interference are often not available [74], [78].
2) In rapidly time-varying channels, training may not be efficient [79].
3) The capacity of the system can be increased by eliminating or reducing
training sets [60], [74].
4) Multipath fading during the training period may lead to poor source or
channel estimates [87-89].
5) Training in distributed systems requires synchronization and/or sending a
training set each time a new link is set up, which may not be feasible
in a multiuser scenario [90-93].

1.1.3 CDMA Signal Models
In this section, we briefly present the signal model for a CDMA implementation
that uses only one layer of spreading codes. We then describe the DS-CDMA and
WCDMA signal models in a typical synchronous CDMA system, as employed in indoor
ATM and certain ad hoc wireless networks [75], [81], [89].
1.1.3.1 DS-CDMA Receiver Signal Model
In a DS-CDMA system, several users share the medium simultaneously by using
their own signatures. The simplest received signal model r(t) before filtering in a symbol
interval is given by

$$ r(t) = \sum_{m=1}^{M} \sum_{k=1}^{K} \sum_{l=0}^{L-1} \alpha_{lm}\, b_{k,m}\, s_k(t - mT_b - d_l T_c) + n(t) \qquad (1.3) $$

where
• $l$, $k$, and $m$ are the path, user, and symbol indices, respectively;
• $\alpha_{lm}$ is the path gain; in the downlink model the path gain does not differ among users, because all users’ signals are sent together, so the path gain $\alpha_{lm}$ and the propagation delay factor $d_l$ depend only on the path;
• $b_{k,m}$ is the $m$th symbol of user $k$;
• $s_k(\cdot)$ is the spreading code (chip sequence) of user $k$;
• $d_l$ is the propagation delay factor, $d_l \in \{0, 1, \dots, (C-1)/2\}$;
• $C$ is the number of chips per symbol;
• $T_b$ and $T_c$ are the symbol and chip durations, respectively, and $t$ is time;
• $n(t)$ is additive white Gaussian noise (AWGN).
In this dissertation, the system is assumed to be time-invariant, which means that
the channel parameters vary much more slowly than the transmitted symbol rate. Let G be
the length of the spreading code sequence, K the number of users, and L the number of
channel paths; the vector form of (1.3) then becomes

$$ \mathbf{r} = \mathbf{H}\mathbf{S}\mathbf{b} + \mathbf{n} \qquad (1.4) $$

where r is the received signal vector, H is a (G + L − 1) × G matrix that represents the
multipath propagation coefficients, S is the G × K spreading-code matrix, b is the
K-dimensional vector of data symbols, and n is the (G + L − 1)-dimensional channel noise
vector with covariance matrix Q. This received-signal model is suitable for deriving
linear symbol detectors such as the MF, the RAKE, the LMMSE, and the blind detectors
based on the FastICA and RobustICA algorithms [75], [88], [89].
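The vector model (1.4) can be assembled explicitly. The sketch below is an illustration under stated assumptions rather than the simulation setup of this dissertation: the codes are columns of a Sylvester Hadamard matrix (Walsh codes, which coincide with OVSF codes of the same length up to ordering), normalized to unit energy, and the three channel taps, the noise level, and the helper names `hadamard` and `channel_matrix` are hypothetical. H is built as the (G + L − 1) × G Toeplitz convolution matrix of the taps, and the LMMSE detector uses the effective signatures P = HS, i.e., it needs the codes of all users, as noted in Section 1.1.1.

```python
import numpy as np

def hadamard(order):
    """Sylvester construction of a 2**order x 2**order +/-1 Hadamard matrix."""
    Hd = np.array([[1.0]])
    for _ in range(order):
        Hd = np.block([[Hd, Hd], [Hd, -Hd]])
    return Hd

def channel_matrix(h, G):
    """(G + L - 1) x G Toeplitz matrix that linearly convolves G chips with taps h."""
    L = len(h)
    Hc = np.zeros((G + L - 1, G))
    for j in range(G):
        Hc[j:j + L, j] = h
    return Hc

rng = np.random.default_rng(3)
G, K = 16, 8
S = hadamard(4)[:, :K] / np.sqrt(G)     # G x K unit-energy spreading codes
h = np.array([1.0, 0.5, 0.2])           # hypothetical taps (downlink: same for all users)
Hc = channel_matrix(h, G)
b = rng.choice([-1.0, 1.0], size=K)     # BPSK symbols of the K users
sigma = 0.05
r = Hc @ S @ b + sigma * rng.standard_normal(G + len(h) - 1)   # model (1.4)

P = Hc @ S                              # effective signatures after the channel
b_mf = np.sign(P.T @ r)                 # matched filter to the effective signatures
# LMMSE detector W = P^T (P P^T + sigma^2 I)^(-1), assuming unit-power symbols
b_lmmse = np.sign(P.T @ np.linalg.solve(P @ P.T + sigma**2 * np.eye(len(r)), r))
print(int(np.sum(b_mf != b)), int(np.sum(b_lmmse != b)))
```

The matched filter ignores the MAI and ISI that the channel introduces, while the LMMSE detector suppresses them using knowledge of every user's signature, which is exactly the information a blind downlink receiver does not have.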
An alternative signal model, proposed in [89], [81], is the linear convolutive model

$$ \mathbf{r}_n = \mathbf{H}_0 \mathbf{b}_n + \mathbf{H}_1 \mathbf{b}_{n-1} + \mathbf{n}_n = \mathbf{H}\bar{\mathbf{b}}_n + \mathbf{n}_n \qquad (1.5) $$

where
• $\mathbf{r}_n$ is the received signal vector;
• $\mathbf{b}_n = [b_1(n), \dots, b_K(n)]^T$ contains the current bits of all users;
• $\mathbf{H}_0 = [\mathbf{h}_1, \dots, \mathbf{h}_K]$ is the signature matrix of the current bits of all users, including the MAI, with

$$ \mathbf{h}_k = \big[\,\mathbf{0}^T,\; h_k(0),\; \dots,\; h_k(N - d_k - 1)\,\big]^T \qquad (1.6) $$

• $\mathbf{H}_1 = [\bar{\mathbf{h}}_1, \dots, \bar{\mathbf{h}}_K]$ is the signature matrix of the previous bits of all users, including the ISI, with

$$ \bar{\mathbf{h}}_k = \big[\,h_k(N - d_k),\; \dots,\; h_k(N + M - 1),\; \mathbf{0}^T\,\big]^T \qquad (1.7) $$

  where the zero blocks pad $\mathbf{h}_k$ and $\bar{\mathbf{h}}_k$ to length $N$, the dimension of $\mathbf{r}_n$;
• $\mathbf{H} = [\mathbf{H}_0, \mathbf{H}_1]$ is the signature matrix of all users;
• $\mathbf{b}_{n-1} = [b_1(n-1), \dots, b_K(n-1)]^T$ contains the previous bits of all users;
• $\bar{\mathbf{b}}_n = [\mathbf{b}_n^T, \mathbf{b}_{n-1}^T]^T$ contains the bits of all users;
• $\mathbf{n}_n = [n(nN), \dots, n(nN + N - 1)]^T$ is an independent white Gaussian noise vector.
In uplink (asynchronous) CDMA systems, one can assume that the columns of H0 and
H1 are linearly independent; therefore, H has full column rank 2K, as shown in [88], [89].
In downlink (synchronous) CDMA communication [75], [81], H is a full-rank matrix with
fewer restrictions, as seen in [89]. Although our proposed algorithms also work well in
asynchronous CDMA systems [83], [81], the main focus of this dissertation is the
synchronous CDMA communication system.
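The structure of (1.5)–(1.7) can be checked numerically. The sketch below makes several assumptions that are not part of the source model: each composite signature h_k(0), …, h_k(N + M − 1) is a random ±1 code of N chips convolved with a random (M + 1)-tap channel, the user delays d_k are fixed integers with d_k + M ≤ N (so that only the current and previous bits overlap a window), and noise is omitted. It builds H_0 and H_1 from (1.6) and (1.7) and confirms that one N-chip window of the received chip stream equals H b̄_n.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, K, n_sym = 8, 2, 3, 6                 # chips/symbol, signature overhang, users, symbols
d = np.array([0, 2, 3])                     # hypothetical user delays d_k (d_k + M <= N)
codes = rng.choice([-1.0, 1.0], size=(K, N))
taps = rng.standard_normal((K, M + 1))
# Composite signatures h_k(0), ..., h_k(N + M - 1): code convolved with channel
h = np.array([np.convolve(codes[k], taps[k]) for k in range(K)])
b = rng.choice([-1.0, 1.0], size=(K, n_sym))

# Chip stream: bit j of user k starts at chip j*N + d_k
y = np.zeros((n_sym + 1) * N + M)
for k in range(K):
    for j in range(n_sym):
        start = j * N + d[k]
        y[start:start + N + M] += b[k, j] * h[k]

H0 = np.zeros((N, K))
H1 = np.zeros((N, K))
for k in range(K):
    H0[d[k]:, k] = h[k, :N - d[k]]               # (1.6): d_k zeros, then h_k(0..N-d_k-1)
    H1[:M + d[k], k] = h[k, N - d[k]:N + M]      # (1.7): h_k(N-d_k..N+M-1), then zeros
H = np.hstack([H0, H1])                          # H = [H_0, H_1]

n = 3                                            # any window index with n >= 1
r_n = y[n * N:(n + 1) * N]                       # noise-free received vector r_n
b_bar = np.concatenate([b[:, n], b[:, n - 1]])   # [b_n^T, b_{n-1}^T]^T
print(np.allclose(r_n, H @ b_bar))  # True
```

The check also shows why H must be the horizontal concatenation [H_0, H_1]: it multiplies the stacked 2K-dimensional bit vector b̄_n.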
1.1.3.2 WCDMA Receiver Signal Model
In a WCDMA system, the presence of scrambling codes distinguishes the system from
the DS-CDMA system. The main source of MAI in the WCDMA system is the intra-cell
multiuser signals that share the same multipath channels. The simplest received signal
model r(t) before filtering in a symbol interval is given by

$$ r(t) = \sum_{m=1}^{M} \sum_{k=1}^{K} \sum_{l=0}^{L-1} \alpha_{lm}\, b_{k,m}\, c_k(t - d_l T_c)\, s_k(t - mT_b - d_l T_c) + n(t) \qquad (1.8) $$

where $c_k(t) \in \{\pm 1 \pm j\}$ are the complex cell-specific scrambling sequences, and the
rest of the variables are defined as in model (1.3). The received signal at the UE/MS is
passed through a chip-matched filter and sampled at the chip rate. The received vector r
in this case can be expressed as

$$ \mathbf{r} = \mathbf{H}\mathbf{C}\mathbf{S}\mathbf{b} + \mathbf{n} \qquad (1.9) $$

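where C is the G × G complex diagonal scrambling matrix with $\mathbf{C}\mathbf{C}^H = \mathbf{I}_{G \times G}$, and
the rest of the variables are defined as in (1.4). The form of C is given by

$$ \mathbf{C} = \mathrm{diag}[\,c_1,\; c_2,\; \dots,\; c_G\,] \qquad (1.10) $$

where $c_i \in \{\pm 1 \pm j\}$ for all $1 \le i \le G$; for $\mathbf{C}\mathbf{C}^H = \mathbf{I}$ to hold exactly, the chips must be
normalized by $1/\sqrt{2}$, i.e., $c_i \in \{(\pm 1 \pm j)/\sqrt{2}\}$.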
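The unitarity condition on C can be checked directly. In the sketch below the scrambling chips are drawn at random as a stand-in for an actual WCDMA scrambling sequence (an assumption); it shows that C C^H = I holds when the chips are scaled by 1/√2 and that the unscaled chips ±1 ± j give C C^H = 2I instead.

```python
import numpy as np

rng = np.random.default_rng(5)
G = 16
# Hypothetical QPSK scrambling chips (+-1 +- j), scaled so that |c_i| = 1
c = (rng.choice([-1.0, 1.0], size=G) + 1j * rng.choice([-1.0, 1.0], size=G)) / np.sqrt(2)
C = np.diag(c)                                   # (1.10): C = diag[c_1, ..., c_G]
print(np.allclose(C @ C.conj().T, np.eye(G)))    # True

C_raw = np.sqrt(2) * C                           # unscaled chips +-1 +- j
print(np.allclose(C_raw @ C_raw.conj().T, 2 * np.eye(G)))   # True
```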

1.2 The State Space Framework
In this dissertation, we investigate the state space framework in order to design a
blind multiuser detector based on the state space approach. Besides being one of the most
powerful tools for Blind Source Separation, this model interests us for several reasons
[2], [95], [135]:
1. Flexible and universal linear model: the mixing and filtering processes in the
state space approach may follow different mathematical and/or physical models, e.g.,
multichannel deconvolution (MCD), Finite Impulse Response (FIR), Moving
Average (MA), Autoregressive (AR), and Autoregressive Moving Average
(ARMA) models [2].
2. Many canonical realizations: the approach provides many canonical realizations
of the same dynamic system through equivalent transformations.
3. The linear state space approach is an extension of the static (instantaneous) ICA
model, and it is easy to extend it further to a flexible nonlinear model [2], [95].
4. The state space model provides two subsystems with different update
approaches: 1) a linear, memoryless output layer, and 2) a nonlinear or linear
recurrent network.
5. Recoverability: the inverse of the state space representation depends on the
invertibility of the mixing matrix between the input and the output.

Figure 1.4: Conceptual State-Space model which illustrates the general linear
state-space mixing and self-adaptive demixing model for Dynamic ICA [2].
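As an illustration of the first reason above (an FIR filter is one special case of the state-space model), the sketch below realizes the FIR mixture x(k) = Σ_{p=0}^{L−1} H_p s(k − p) in the linear state-space form ξ(k+1) = A ξ(k) + B s(k), x(k) = C ξ(k) + D s(k). This form, the shift-register choice of state, and the helper names are assumptions made for illustration; A, B, C, and D here are state-space matrices, not the mixing matrix A of (1.1) or the scrambling matrix C of (1.10).

```python
import numpy as np

def fir_state_space(Hs):
    """Shift-register state-space realization (A, B, C, D) of an FIR mixing filter.

    Hs: (L, m, n) array with Hs[p] = H_p and L >= 2. The state xi(k) stacks the
    past inputs s(k-1), ..., s(k-L+1), so its dimension is n * (L - 1).
    """
    L, m, n = Hs.shape
    dim = n * (L - 1)
    A = np.zeros((dim, dim))
    A[n:, :-n] = np.eye(dim - n)     # move each stored input one slot further back
    B = np.zeros((dim, n))
    B[:n, :] = np.eye(n)             # newest input s(k) enters the first slot
    C = np.hstack(list(Hs[1:]))      # [H_1, ..., H_{L-1}] acts on the past inputs
    D = Hs[0]                        # H_0 acts on the current input
    return A, B, C, D

def simulate(A, B, C, D, s):
    """Run x(k) = C xi(k) + D s(k), xi(k+1) = A xi(k) + B s(k) from xi(0) = 0."""
    xi = np.zeros(A.shape[0])
    out = []
    for k in range(s.shape[1]):
        out.append(C @ xi + D @ s[:, k])
        xi = A @ xi + B @ s[:, k]
    return np.array(out).T

rng = np.random.default_rng(7)
Hs = rng.standard_normal((3, 2, 2))   # hypothetical L = 3 taps, 2 x 2 system
s = rng.standard_normal((2, 40))
x_ss = simulate(*fir_state_space(Hs), s)
```

Other canonical realizations of the same filter are obtained by a change of state coordinates ξ → Tξ with an invertible T, which is the flexibility mentioned in the second reason.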

1.3 Adaptive Framework for Blind Source Separation
Generally, the BSS consists of recovering the unobserved sources, denoted in
vector notation as	s(t) = (s1 (t), s2 (t), … , sn (t))T 		 ∈ R௡ , with assuming zero mean and
stationary with observed mixtures, x(t) = (x1 (t), x2 (t), … , xm (t))T 		 ∈ R௠ , which can be
written:
x(t) = Φ(s(t))
15

(1.11)

Where Φ is an unknown mapping from R௡ in	R௠ , and t represents the sample
index, which can stand for instance of time; so, we can divide the BSS problems based on
the type of Φ mapping into two groups, one called the invertible mixtures and second
called Underdetermined mixtures “non-invertible” [1].

1.3.1 Invertible Mixtures
If the mapping Φ is invertible, which requires m ≥ n, where m and n are the
number of sensors and the number of sources, respectively, then identifying Φ or its
inverse leads directly to source separation, i.e., it provides estimated sources y(t) such
that

$$ y(t) = W x(t) = W H s(t) = P D s(t) \qquad (1.12) $$

where
P is a generalized permutation matrix;
D is a diagonal scaling matrix.
This equation shows the typical indeterminacies of the BSS problem. Depending on
the nature of the mixing, the BSS problem can be more or less complicated. For example,
for simple instantaneous mixtures, where Φ is restricted to a mixing matrix A, the sources
are estimated up to a permutation and a scale. Invertible mixtures can be divided into the
following two categories [1]:
1.3.1.1 Instantaneous mixtures
One modest approximation is to assume that the mixing system A in Figure 1.2 is
instantaneous. With A the mixing matrix, n(t) an additive noise, s(t) the source signals,
and x(t) the observed signals, the instantaneous model can be expressed as follows:

$$ x(t) = A s(t) + n(t), \qquad t = 1, 2, \dots, T \qquad (1.13) $$

$$
\begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_M(t) \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{MN}
\end{bmatrix}
\begin{bmatrix} s_1(t) \\ s_2(t) \\ \vdots \\ s_N(t) \end{bmatrix}
+
\begin{bmatrix} n_1(t) \\ n_2(t) \\ \vdots \\ n_M(t) \end{bmatrix}
\qquad (1.14)
$$

For notational and mathematical convenience, we assume that the number of
observed signals equals the number of source signals, i.e., m = n. We also assume that
the mixing matrix A has full rank and that the noise in the mixtures is additive white
Gaussian noise.
The BSS problem aims to retrieve the original sources from the given
observations. In the instantaneous model, we only need to estimate an unmixing matrix W
equal to the inverse of the mixing matrix, W = A⁻¹, to recover the original signals s(t)
almost directly. The task is thus to estimate W ≈ A⁻¹ from a given set of observations
x(t) and retrieve the original signals via the linear transform

$$ y(t) = W x(t) \approx s(t), \qquad t = 1, 2, \dots, T \qquad (1.15) $$

To measure the performance of a separation algorithm, we use the performance
matrix G,

$$ G = W A \approx I \qquad (1.16) $$

where I is the identity matrix. Ideally, the performance matrix G of an efficient separation
algorithm is close to the identity matrix. Because the separated sources y(t) may not have
the same order and scale as the original sources, G should be an identity matrix up to
scaling and permutation.
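One common way to quantify how close G is to a scaled permutation is the Amari performance index, which is zero exactly when each row and each column of G has a single nonzero entry. The index and the helper name `amari_index` are used here only as an illustrative assumption; this section does not specify which scalar summary of G the later chapters use. The sketch also shows that W = P D A⁻¹, with hypothetical P and D, is as good a separator as A⁻¹ itself, which is the indeterminacy expressed by (1.12).

```python
import numpy as np

def amari_index(G):
    """Normalized Amari index of a square matrix G, in [0, 1]; 0 iff G = P D."""
    Pabs = np.abs(G)
    n = Pabs.shape[0]
    row_term = (Pabs / Pabs.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col_term = (Pabs / Pabs.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (row_term.sum() + col_term.sum()) / (2.0 * n * (n - 1))

A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])             # hypothetical full-rank mixing matrix
Perm = np.eye(3)[[2, 0, 1]]                 # permutation matrix P
Dsc = np.diag([2.0, -0.5, 1.5])             # diagonal scaling matrix D
W = Perm @ Dsc @ np.linalg.inv(A)           # separator with permuted, rescaled outputs
G = W @ A                                   # performance matrix (1.16)
print(amari_index(G) < 1e-10, amari_index(np.eye(3) + 0.5) > 0.1)   # True True
```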
1.3.1.2 Convolutive Mixtures
A convolutive mixture can be considered a natural extension of the instantaneous
BSS problem. Assume that an m-dimensional vector of received discrete-time signals
x(k) = [x1(k), x2(k), … , xm(k)]^T at time k is produced from an n-dimensional vector of
source signals s(k) = [s1(k), s2(k), … , sn(k)]^T, where m ≥ n, by a stable mixture model [2]:

$$ x(k) = \sum_{p=-\infty}^{\infty} H_p\, s(k - p) = H_p * s(k), \qquad \text{with} \quad \sum_{p=-\infty}^{\infty} \lVert H_p \rVert < \infty \qquad (1.17) $$

where ∗ represents the linear convolution operator and H_p is an m × n matrix of mixing
coefficients at time lag p.
Assume that the elements h_ijp denote the coefficients of the Finite Impulse
Response (FIR) filter H_p, and L is the maximum unknown channel length. Then, the
noise-free convolutive model is written as follows:

$$ x(k) = \sum_{p=0}^{L-1} H_p\, s(k - p) \qquad (1.18) $$

Thus, one can find an approximate inverse channel matrix W_p in order to recover
the source signals s(k) = [s1(k), s2(k), … , sn(k)]^T such that

$$ y(k) = W_p * x(k) = \sum_{p=0}^{Q-1} W_p\, x(k - p) = \hat{s}(k) \qquad (1.19) $$

where Q is the length of the inverse channel impulse response. There are two
approaches to solving this problem and recovering the source signals. Time-domain
approaches have several general drawbacks. For example, Q should be selected to be at
least equal to the unknown true channel length L; therefore, for long mixing filters, i.e.,
long transfer functions, the computation is expensive [2], [133]. Using IIR filters instead
of long FIR filters to overcome this problem suffers from instability and from the need to
invert non-minimum phase filters [2], [38], [133]. Moreover, time-domain approaches are
sensitive to channel order mismatch; see [2], [133] for a recent survey. Nevertheless,
time-domain methods are suitable and very efficient for short mixing filters, such as in
communication channels [133]. Because of these limitations, we focus our study on
frequency-domain approaches to solve the cocktail party problem [3], [38]. The main
advantage of a frequency-domain BSS approach is that any instantaneous ICA algorithm
can be applied to solve the convolutive BSS problem. On the other hand, the main
challenge of BSS in the frequency domain is dealing with the permutation and scaling
ambiguities; see [1], [3], [38] for a recent survey.
One can re-map the aforementioned BSS model into the frequency domain by
applying the Discrete Fourier Transform (DFT) to the observed signals x(k), which
transforms it into an instantaneous mixture problem in each frequency bin:

$$ x(k) = H * s(k) \;\;\longrightarrow\;\; x(q, w) \approx H(w)\, s(q, w) \qquad (1.20) $$

where w is a frequency index, q is a frame index, s(q, w) = [s1(q, w), … , sn(q, w)]^T, and
x(q, w) = [x1(q, w), … , xm(q, w)]^T.
Relation (1.20) holds exactly only when the time convolution is circular, e.g., for
periodic signals s(t) [3]; for ordinary linear convolution it holds only approximately. To
make the approximation accurate, the Fourier transform length must be significantly
larger than the maximum length L of the mixing channels [3]. In [38], spectral smoothing
was used to mitigate the circularity effect in frequency-domain BSS methods. We study
these effects in detail in the next chapter.
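The effect of the FFT length on (1.20) can be checked numerically for a single channel. The sketch below, with a hypothetical exponentially decaying 32-tap filter and white Gaussian sources, splits a linearly convolved signal into non-overlapping rectangular frames of length `n_fft` and measures the relative error of X(q, w) ≈ H(w) S(q, w) in each frame. The error comes from the first L − 1 samples of every frame, where linear and circular convolution differ, so it shrinks as `n_fft` grows relative to L. The helper name `narrowband_error` and all parameter values are assumptions.

```python
import numpy as np

def narrowband_error(h, n_fft, rng, frames=20):
    """Mean relative error of X(q, w) ~ H(w) S(q, w) for frames of x = h * s."""
    L = len(h)
    T = n_fft * frames + L
    s = rng.standard_normal(T)
    x = np.convolve(h, s)[:T]                 # linear (not circular) convolution
    Hw = np.fft.rfft(h, n_fft)                # channel response on the n_fft-point grid
    errs = []
    for q in range(1, frames):                # frames q >= 1 have a full input history
        seg = slice(q * n_fft, (q + 1) * n_fft)
        Xw = np.fft.rfft(x[seg])
        Sw = np.fft.rfft(s[seg])
        errs.append(np.linalg.norm(Xw - Hw * Sw) / np.linalg.norm(Xw))
    return float(np.mean(errs))

rng = np.random.default_rng(8)
h = rng.standard_normal(32) * np.exp(-np.arange(32) / 8.0)   # hypothetical channel, L = 32
for n_fft in (64, 256, 1024, 4096):
    print(n_fft, round(narrowband_error(h, n_fft, rng), 3))
```

Because the mismatch energy is roughly fixed by the channel while the frame energy grows with `n_fft`, the relative error decays roughly as the square root of L divided by the FFT length.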

1.3.2 Underdetermined mixtures
If the number of observations (sensors) $m$ is less than the number of sources $n$, the mixing process is said to be underdetermined (not invertible) [1], [52], [53]. In the frequency domain, separation can be achieved up to scaling and permutation ambiguities under the assumption that the mixing matrix $H(f)$ has full column rank at each frequency bin. When the number of source signals exceeds the number of sensors, however, this assumption no longer holds: $H(f)$ then has more columns than rows, so it cannot have full column rank and has no left pseudo-inverse. The problem is therefore considerably more difficult. Much work has been done on separating underdetermined instantaneous mixtures [1], whereas comparatively little has addressed the underdetermined convolutive case [1], [38]. The best-known algorithm of this kind is DUET, proposed by Rickard et al. [2], [3], [38], [117]. DUET assumes a specific delay model that only works for audio signals with small delays, e.g. in hearing aids. It performs separation with two sensors, from which it computes two parameters: the relative amplitude and phase differences of each source between the two sensors. Several papers have extended and improved DUET [3], but its performance in real reverberant environments remains limited. One promising approach is to move the underdetermined convolutive problem into the frequency domain, where techniques developed for underdetermined instantaneous mixtures can be applied [1], [3], [38].
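The two features DUET relies on can be illustrated with a short sketch. It is a simplification under stated assumptions, not Rickard et al.'s algorithm: a single FFT frame replaces the short-time Fourier transform, only one source is active, and the second sensor receives a circularly delayed, attenuated copy of the first. The helper name `duet_features` is hypothetical. For each frequency bin $\omega$ the ratio of the two sensor spectra is $a e^{-i\omega\delta}$, so its magnitude gives the relative amplitude $a$ and its phase gives the delay $\delta$, provided $|\omega\delta| < \pi$, which is why the method is restricted to small delays.

```python
import numpy as np

def duet_features(x1, x2, eps=1e-12):
    """Per-bin relative amplitude and delay of sensor 2 with respect to sensor 1.

    Simplified sketch: one FFT frame instead of the STFT used by DUET.
    """
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    omega = 2 * np.pi * np.arange(len(X1)) / len(x1)   # radian frequency of each bin
    ratio = X2 / (X1 + eps)                            # = a * exp(-1j * omega * delta)
    amplitude = np.abs(ratio)
    delay = np.zeros_like(omega)
    delay[1:] = -np.angle(ratio[1:]) / omega[1:]       # skip the DC bin (omega = 0)
    return amplitude, delay

rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
x1 = s                                   # sensor 1
x2 = 0.5 * np.roll(s, 2)                 # sensor 2: attenuation 0.5, circular delay of 2 samples
amp, dly = duet_features(x1, x2)
low = slice(1, 128)                      # bins with |omega * delay| < pi (no phase wrapping)
print(np.median(amp[low]), np.median(dly[low]))
```

In the full method such per-point estimates are computed over many time-frequency points and clustered, one cluster per source, to build separation masks.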

1.4 Dissertation Contributions
The contributions of this thesis are summarized as follows:
• We review BSS/ICA algorithms, give an overview of ICA algorithms with emphasis on the approaches that influenced our work, and study methods developed to solve the ICA problem for instantaneous and convolutive mixtures.

• Independent Component Analysis (ICA) is a crucial tool in Blind Source Separation (BSS). In this thesis, we present a new Convex Cauchy–Schwarz Divergence (CCS-DIV) measure for BSS and unsupervised learning of acoustic and speech signals. The CCS-DIV measure is developed by integrating convex functions into the Cauchy–Schwarz inequality, and a convexity parameter gives broad control over the convexity of the measure. Based on this measure, we construct a new CCS–ICA algorithm and develop a non-parametric form of it that uses Parzen window density estimation. Furthermore, the convergence speed of the CCS–ICA algorithm can be controlled. Several case-study scenarios are carried out on instantaneous and noisy mixtures of speech signals. Finally, the superiority of the proposed CCS–ICA algorithm is demonstrated through performance-metric comparisons with FastICA, RobustICA, convex ICA (C-ICA), and other existing algorithms based on mutual information and Jensen's inequality.
• We propose two pairwise iterative schemes for non-parametric ICA, based on the high-performance CCS-DIV measure, to tackle the high-dimensionality problem. These schemes enable fast and efficient de-mixing of sources in real-world applications where the dimensionality of the sources is high. The performance superiority of the proposed schemes is demonstrated through metric comparisons with FastICA, RobustICA, convex ICA (C-ICA), and other leading algorithms.

• We propose a frequency-domain method based on robust independent component analysis (RICA) to address the multichannel BSS problem for convolutive speech mixtures in highly reverberant environments. We apply regularization to tackle the ill-conditioning of the covariance matrix and to mitigate the performance degradation of frequency-domain methods. The resulting algorithm separates the source signals under adverse conditions, i.e. high reverberation when only short observation signals are available. Furthermore, we study the impact of several parameters on separation performance, e.g. the overlap ratio and the window type of the frequency-domain method. We also compare different techniques for solving the permutation ambiguity. Through simulations and real-world experiments, we verify the superiority of the presented algorithm over other BSS algorithms, such as recursive regularized ICA (RR-ICA), independent vector analysis (IVA), and others.
• Code Division Multiple Access (CDMA) is a channel access method, based on spread-spectrum technology, that is used by various radio technologies, including mobile standards such as CDMA2000 and WCDMA. We address the problem of blind multiuser equalization in wideband CDMA systems with noisy multipath propagation. We propose three new blind receiver schemes based on state-space structures. These so-called blind state-space receivers (BSSR) require no knowledge of the propagation parameters or of the users' spreading code sequences; they rely only on the statistical independence of the source signals. We also derive three update laws to enhance the performance of the blind detector. Additionally, we upgrade three semi-blind adaptive detectors that combine the RAKE receiver with the stochastic gradient algorithms used in several blind adaptive signal processing algorithms, namely FastICA, RobustICA, and principal component analysis (PCA). Bit error rate (BER) simulations of these methods are shown for different numbers of users, signal-to-noise ratios (SNR), and numbers of symbols per user, in comparison with blind multiuser detectors (BMUD), the linear minimum mean squared error (LMMSE) detector, and other conventional detectors. The results show that the proposed algorithms outperform the other detectors in estimating the symbol signals from the mixed received CDMA signals. Moreover, the new blind detectors mitigate multiple access interference (MAI) in CDMA.
• A new blind detection algorithm based on fourth-order cumulant matrices is presented and applied to the multiuser symbol estimation problem in Direct Sequence Code Division Multiple Access (DS-CDMA) systems. Blind detection here means estimating multiple symbol sequences in the downlink of a DS-CDMA communication system using only the received wireless data, without any knowledge of the user spreading codes. The proposed algorithm exploits properties of higher-order cumulant matrices to reduce the computational load and enhance performance. BER simulations of this algorithm are shown for different numbers of users, SNRs, and numbers of symbols per user, in comparison with the FastICA and RobustICA algorithms. The results show that the proposed algorithm outperforms both ICA-based detectors in estimating the symbol signals from the received mixed signals. Moreover, the proposed blind detector is computationally fast and converges quickly when extracting user symbols.

• In DS-CDMA communication systems, blind multiuser detection is used to reduce computational complexity and to mitigate multiple access interference (MAI) in the detector. Ill-conditioning of the covariance matrix of the received signals degrades the performance of the LMMSE detector, especially when the SNR is high and only a small data set is available for estimating the covariance matrix. In this thesis, we introduce a constrained blind multiuser detector that imposes a regularization parameter to cope with the ill-conditioning of the covariance matrix and to mitigate this performance degradation. Simulation results show that the proposed method improves the performance of blind multiuser detection and outperforms conventional multiuser detectors.
• Lastly, we investigate ICA algorithms in terms of hardware implementation. Although software implementation is important for investigating the capabilities of ICA algorithms and for simulating significant aspects of their applications, hardware implementation provides real-time solutions and the parallelism needed for fast convergence.


Chapter 2

Literature Review

In this chapter, we review BSS/ICA algorithms, give an overview of ICA algorithms with emphasis on the approaches that influenced our work, and study methods that have been developed to solve the ICA problem for instantaneous and convolutive mixtures, respectively. For a more thorough review of ICA problems, applications, and methods, we refer the reader to the following books: “Handbook of Blind Source Separation: Independent Component Analysis and Applications” [1], “Independent Component Analysis in frequency domain” [3], and “Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications” [2].

2.1 Introduction
Independent component analysis (ICA) is regarded as a vital technique in blind source separation (BSS). ICA algorithms based on the information-theoretic approach are attractive and have been an active topic in signal processing for the last two decades because of their potential in areas such as biomedical engineering, wireless communication systems, and audio separation and identification. The goal of ICA is to recover the original source signals from their mixtures without further knowledge of the mixing coefficients or the original sources. ICA is a statistical technique that exploits higher-order statistics (HOS), where the goal is to represent a set of random variables as a linear transformation of statistically independent components [1-3].
The idea of BSS was first introduced in the early 1980s by J. Herault, C. Jutten, and B. Ans, who worked in neurophysiology [1]. They proposed a blind method to separate the nerve impulses coming from different parts of the human body [5], [40], [41]. Telecommunications applications of ICA have also been proposed, for example in MIMO systems [62], [101], [117], [118]. ICA has been investigated and implemented in several other applications, such as audio and biomedical signal processing and feature extraction. An extensive review of the history of ICA and its applications is given in [53], [56], [57]. BSS methods have many interesting applications. In finance, BSS algorithms are used to find independent factors in data [74]. In image processing, they help estimate the best independent basis for compression or denoising [14-16]. They are also used to analyze biomedical signals such as the EEG [11] and the ECG [1-2]. In audio, BSS algorithms are used to identify sounds or to separate audio signals, as in the cocktail party problem [1-3], [51]. One of the most interesting applications of BSS, however, is in wireless communication systems.


[Figure 2.1 diagram. Temporal-spatial mixing, i.e. multichannel blind deconvolution (MBD): $x(k) = \sum_{p=0}^{L} H_p(k)\,s(k-p) + n(k)$. Single-channel blind deconvolution (SBD): $x(k) = \sum_{p=0}^{L} h_p(k)\,s(k-p) + n(k)$. Instantaneous BSS/ICA (spatial case): $x(k) = A s(k) + n(k)$. Supervised single-channel deconvolution: $x(k) = \sum_{p=0}^{L} h_p(k)\,s(k-p) + n(k)$, where training sources are available. Principal component analysis: $x(k) = A s(k) + n(k)$, where $A$ is an orthogonal matrix.]
Figure 2.1: Classification of BSS problems.
More than four research communities have worked on BSS, and on independent component analysis (ICA) in particular [1], [2]. One of the best-known illustrations of the BSS problem is the cocktail party problem: at a cocktail party, a listener can use their ears to pick out one specific sound source among all the other sounds in the room, such as other people talking or music. Researchers developed the BSS idea by emulating the way the brain tackles this problem. Much research has been done in the BSS and ICA areas over the last two decades.
Recently, researchers have proposed several BSS methods based on time-frequency analysis [3], [50] or on regularization algorithms [20], [21]. Matrix factorization based on time-frequency analysis has been used to improve performance and speed up convergence, as in [4], [36], [47], and [49]. The single-channel BSS problem has been investigated extensively in [14], [19], and [42]. Independent vector analysis has been employed for joint BSS over multiple datasets [2], and sparse analysis has been studied both to estimate the demixing matrix W by quadratic programming and to deal with nonnegative BSS problems; sparse component analysis was proposed in [50]. However, none of these methods considers non-stationary conditions, uncertainty in the parameters of the general ICA model, or the effects of noise.
On the other hand, the mixing system in real-world recordings can be modeled in various ways, depending on how the mixing coefficients and the source signals vary over time. Two non-stationary scenarios can arise: the sources may be moving, or sources may disappear. Several works have addressed each scenario individually. Usually, in the moving-source scenario, the source distributions and the number of sources are assumed to be fixed, and an adaptive BSS algorithm is used to compensate for the variations of the mixing matrix. In [118], a Markov process was used to model the variation of the source signals. In [119], the status of the source signals was detected with a 3-D tracker and, when the sources were moving, a beamforming algorithm was added to the BSS method. Several researchers have characterized time-varying source distributions by automatic relevance determination (ARD) techniques [120], [121], [122].

The switching ICA (S-ICA) algorithm [123] was proposed to detect which sources are absent and which are present. S-ICA uses a hidden Markov model to represent the status of the source signals and assumes that the generative model is fixed. The replacement of source signals was also studied in [123]. Online variational Bayesian (VB) learning was used in [120] to derive a new online ICA algorithm that separates dependent source signals. In [125], an ICA algorithm based on a piecewise stationary non-Gaussian model was proposed to separate non-Gaussian signals whose distributions vary over time. Figure 2.1 summarizes the classification of BSS problems.

2.2 Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is one of the best-known algorithms in multivariate analysis and data mining. It was established by Pearson [2], who proposed a general framework for PCA in a biological context. Since then, many efficient and powerful adaptive PCA algorithms have been proposed and developed [1].
Generally speaking, PCA aims to derive a smaller set of variables with less redundancy while retaining as much of the information in the original variables as possible. In other words, PCA is a mathematical tool that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The most important objectives of PCA are dimensionality reduction, determination of linear combinations of variables, feature selection (choosing the most useful variables), visualization of multidimensional data, identification of underlying variables, and identification of groups of outliers.
Assume we have a random vector $x$ with $m$ elements, observed at $T$ time instants. To transform the observations $x$ into a set of values of linearly uncorrelated variables $y_p$, we apply PCA to the observed data $x$ as follows. The first step is to remove the DC component (the mean) of the observed data:
$$x \leftarrow x - \mathrm{E}[x] \qquad (2.1)$$

The operator $\mathrm{E}[\cdot]$ denotes expectation. In this dissertation, we use the expectation operator for the theoretical analysis. In practical simulations, how the expectation is evaluated depends on the type of learning algorithm: for batch (offline) learning algorithms we use the sample mean, whereas for stochastic learning algorithms we drop the expectation and use the instantaneous value of the expression inside it. PCA can then be converted into the eigenvalue problem of the covariance matrix of $x$, which is essentially equivalent to the well-known Karhunen-Loeve transform used in signal processing:
$$R_{xx} = \mathrm{E}[x(t)\,x^T(t)] = V \Lambda V^T \in \mathbb{R}^{m\times m}, \qquad t = 1, 2, \dots, T \qquad (2.2)$$

where $x^T$ is the transpose of $x$, $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \dots, \lambda_m\}$ is a diagonal matrix containing the $m$ eigenvalues, and $V = [v_1, v_2, \dots, v_m] \in \mathbb{R}^{m\times m}$ is the corresponding orthogonal (unitary) matrix whose columns are the unit-length eigenvectors, called principal eigenvectors. The Karhunen-Loeve transform [2] defines a linear transformation of an observed vector $x$ as follows:
$$y_p = V_S^T x \qquad (2.3)$$

where $x = [x_1(t), x_2(t), \dots, x_m(t)]^T$ is the zero-mean observed (input) vector, $y_p = [y_1(t), y_2(t), \dots, y_n(t)]^T$ is the output vector, referred to as the vector of principal components (PCs), and $V_S = [v_1, v_2, \dots, v_n] \in \mathbb{R}^{m\times n}$ is the matrix of signal-subspace eigenvectors, with orthonormal columns $v_i = [v_{i1}, v_{i2}, \dots, v_{im}]^T$, i.e. $v_i^T v_j = \delta_{ij}$ for all $i, j$, where $\delta_{ij}$ is the Kronecker delta. The vectors $v_i$, $i = 1, 2, \dots, n$, are the eigenvectors of the covariance matrix $R_{xx}$ associated with its $n$ largest eigenvalues, and the variances of the PCs $y_i$ are the corresponding principal eigenvalues. Therefore, we can reformulate the equations as
$$R_{xx} v_i = \lambda_i v_i, \qquad i = 1, 2, \dots, n \qquad (2.4)$$

where $v_i$ are the eigenvectors, $\lambda_i$ are the corresponding eigenvalues, and $R_{xx} = \mathrm{E}[xx^T]$ is the covariance matrix of the zero-mean signal $x$. Equation (2.4) can also be written in matrix form as $V^T R_{xx} V = \Lambda$, where $\Lambda$ is the diagonal matrix of eigenvalues of $R_{xx}$. To compute the eigenvalues and eigenvectors of $R_{xx}$, one can use the singular value decomposition (SVD) [2]; transforming the observations into a set of orthogonal (decorrelated) signals in this way is referred to as prewhitening or decorrelation of the input data. The principal components produced by PCA are uncorrelated, i.e. orthogonal. However, uncorrelatedness is weaker than statistical independence, so PCA alone does not recover the sources from the observed mixtures.
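A minimal NumPy sketch of (2.1)-(2.4), assuming the observations are stored column-wise in an $m \times T$ array and that the expectation is replaced by the sample mean (the batch setting). The function name `pca` and the test matrix are illustrative choices, not taken from the dissertation. The check at the end confirms that the covariance of the principal components is the diagonal matrix of the retained eigenvalues.

```python
import numpy as np

def pca(x, n_components):
    """PCA of an m x T data matrix via the eigen-decomposition of R_xx, cf. (2.1)-(2.4)."""
    x = x - x.mean(axis=1, keepdims=True)          # (2.1): remove the mean (DC component)
    r_xx = x @ x.T / x.shape[1]                    # sample estimate of E[x x^T], cf. (2.2)
    eigvals, eigvecs = np.linalg.eigh(r_xx)        # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # sort eigenpairs by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    v_s = eigvecs[:, :n_components]                # signal-subspace eigenvectors, m x n
    y_p = v_s.T @ x                                # (2.3): principal components, n x T
    return y_p, eigvals[:n_components], v_s

rng = np.random.default_rng(1)
a = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 0.2]])
x = a @ rng.standard_normal((3, 5000))             # correlated 3-channel observations
y_p, lam, v_s = pca(x, n_components=2)
cov_y = y_p @ y_p.T / y_p.shape[1]
print(np.round(cov_y, 6))                          # approximately diag(lam): PCs are uncorrelated
```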

2.3 Independent Component Analysis (ICA)
Independent Component Analysis (ICA) is a vital technique for BSS [1]. P. Comon [7] was the first to describe the fundamentals of this technique and formally defined it in 1994. ICA has since proved attractive and has been applied in many diverse fields as a method for retrieving the original sources from linear mixtures of independent components.

2.3.1 The Instantaneous ICA Framework
An instantaneous ICA mixture model with $n$ source signals is defined as
$$x = As + v \qquad (2.5)$$
where $A$ is an $m \times n$ mixing matrix, $x$ contains the $m$ observed mixed signals as in (2.6), $s$ contains the $n$ source signals as in (2.7), and $v$ is additive Gaussian noise.
$$x = [x_1, x_2, \dots, x_m]^T \qquad (2.6)$$
$$s = [s_1, s_2, \dots, s_n]^T \qquad (2.7)$$

In general, the ICA framework relies on the following assumptions:
1. The source signals $s$ are statistically independent, which means that $p(s) = p(s_1, s_2, \dots, s_n) = p(s_1)\,p(s_2)\cdots p(s_n)$.
2. At most one source signal has a Gaussian distribution, since the mixing matrix $A$ is not identifiable when more than one independent component is Gaussian [12], [14].
For simplicity, we assume for the rest of the analysis that $A$ is square, i.e. $m = n$. The main idea of ICA is to recover the source signals from the observed signals without any knowledge of the source signals or the mixing matrix. To achieve this, the ICA algorithm computes a weighting (demixing) matrix $W$ such that $W^T$ equals the inverse of $A$, up to the scaling and permutation ambiguities inherent in ICA. The estimated source signals $y$ are then given by
$$y = W^T x \qquad (2.8)$$
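ICA can determine $W$ only up to a scaling and permutation of the sources. This follows directly from the noise-free model: for any permutation matrix $P$ and any invertible diagonal matrix $D$ (symbols introduced here only for this illustration), the mixture factors as below, and the entries of $PDs$ are still mutually independent. The best any ICA method can therefore achieve is $W^T = PDA^{-1}$, which returns the sources reordered and rescaled.

```latex
x = As = \left(A D^{-1} P^{T}\right)\left(P D s\right),
\qquad
W^{T} x = P D A^{-1} x = P D s .
```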

Because (2.8) is a linear transformation, each independent component can be estimated individually as $y = w^T x$, where $w$ is a column of the demixing matrix $W$ in (2.8). ICA methods are usually divided into two steps: preprocessing (pre-whitening) and rotation. Pre-whitening solves roughly half of the ICA problem and is based on second-order statistics (SOS), whereas the rotation needed to separate the mixtures is performed by the ICA method itself using higher-order statistics. In the next section, we analyze a basic pre-whitening approach.
2.3.1.1 Preprocessing
Some adaptive ICA algorithms require pre-whitening, also called sphering or normalized spatial decorrelation. Preprocessing consists of two steps. The first step centers the mixed signals $x$ by removing their mean, as in (2.1). After centering, we write the centered matrix $\dot{x}$ as
$$\dot{x} = \begin{bmatrix} \dot{x}_1^T \\ \dot{x}_2^T \\ \vdots \\ \dot{x}_n^T \end{bmatrix} = \begin{bmatrix} \dot{x}_1(1) & \dot{x}_1(2) & \cdots & \dot{x}_1(T) \\ \dot{x}_2(1) & \dot{x}_2(2) & \cdots & \dot{x}_2(T) \\ \vdots & \vdots & \ddots & \vdots \\ \dot{x}_n(1) & \dot{x}_n(2) & \cdots & \dot{x}_n(T) \end{bmatrix} \qquad (2.9)$$

The second step whitens the centered data. The eigenvalue decomposition [28] (equivalent here to the SVD, because the covariance matrix is symmetric positive semidefinite) can be used to decompose the covariance matrix of $\dot{x}$:
$$C_x = \mathrm{E}[\dot{x}\dot{x}^T] = E D E^T \qquad (2.10)$$
where $E$ is the orthogonal matrix of eigenvectors of $C_x$ and $D$, given in (2.11), is the diagonal matrix of its eigenvalues:
$$D = \mathrm{diag}(d_1, d_2, \dots, d_n) \qquad (2.11)$$
The whitening of $\dot{x}$ is then expressed by
$$Z = D^{-1/2} E^T \dot{x} = V \dot{x} \qquad (2.12)$$
where $V = D^{-1/2} E^T$ is the whitening matrix, or Mahalanobis transform, of $\dot{x}$. Equation (2.12) linearly transforms the centered matrix $\dot{x}$ into a matrix $Z$ whose covariance matrix equals the identity matrix [2], since $\mathrm{E}[ZZ^T] = D^{-1/2}E^T(EDE^T)ED^{-1/2} = I$. In other words, the rows of $Z$ are uncorrelated and have unit variance. Figure 2.2 illustrates three basic transformations of the observed data $x$: pre-whitening, PCA, and ICA, respectively.
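The centering and whitening steps (2.9)-(2.12) can be sketched in NumPy as follows, assuming the mixtures are stored row-wise in an $n \times T$ array and expectations are replaced by sample means. The sources and the mixing matrix in the usage lines are hypothetical test data. After whitening, the sample covariance of `z` is the identity matrix, as (2.12) predicts.

```python
import numpy as np

def whiten(x):
    """Center and whiten an n x T mixture matrix, following (2.1) and (2.9)-(2.12)."""
    x_dot = x - x.mean(axis=1, keepdims=True)      # centered matrix, cf. (2.9)
    c_x = x_dot @ x_dot.T / x_dot.shape[1]         # sample covariance C_x, cf. (2.10)
    d, e = np.linalg.eigh(c_x)                     # C_x = E D E^T
    v = np.diag(1.0 / np.sqrt(d)) @ e.T            # whitening matrix V = D^{-1/2} E^T
    return v @ x_dot, v                            # Z = V x_dot, cf. (2.12)

rng = np.random.default_rng(2)
T = 10_000
s = np.vstack([rng.laplace(size=T),                        # super-Gaussian source
               rng.uniform(-np.sqrt(3), np.sqrt(3), T)])   # sub-Gaussian source
a = np.array([[1.0, 0.6], [0.4, 1.0]])                     # hypothetical mixing matrix
z, v = whiten(a @ s)
print(np.round(z @ z.T / T, 6))                            # approximately the identity
```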
In the following sections, we discuss some basic approaches to ICA of instantaneous mixtures. We assume that the additive white Gaussian noise term $v$ in (2.5) is negligible, or has been reduced to a negligible level by the preprocessing described in the previous section.
2.3.1.2 Nonlinear function choice (Activation function):
The nonlinear function, or activation function, $\vartheta(y)$ is the model of the source signals. It is very important to select a nonlinear function that suits the source signals, and there has been much research on this topic. Suitable choices for super-Gaussian and sub-Gaussian sources were proposed by Hyvarinen in [14], [15], as follows. For super-Gaussian sources, i.e. source signals with positive kurtosis such as Laplacian signals:
$$\vartheta_{SUP}(y) = -2\tanh(y) \qquad (2.13)$$

For sub-Gaussian sources, i.e. source signals with negative kurtosis such as uniformly distributed signals:
$$\vartheta_{SUB}(y) = \tanh(y) - y \qquad (2.14)$$
2.3.1.3 The Learning Update Rules
Update rules fall into two categories according to the learning procedure: online learning and offline learning [2].
2.3.1.3.1 Batch Learning (Offline learning):
Batch learning algorithms use an update rule that requires the whole training sample at every iteration. In batch learning, the update rule usually relies on the expectation over the observed data $x$, which in practice is approximated by the sample mean over the observed data.
2.3.1.3.2 Stochastic gradient (Online learning):
Online learning algorithms use an update rule that does not require the whole training sample at every iteration. Mathematically, these update rules do not rely on the expectation over the observed data $x$. In other words, an online learning rule can be obtained from a batch learning rule by dropping the expectation operator from the offline update rule.
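The difference between the two kinds of rule can be seen on a toy problem chosen only for illustration (it is not an ICA rule): minimizing $\frac{1}{2}\mathrm{E}[\|x-w\|^2]$, whose gradient step is $w \leftarrow w + \mu\,\mathrm{E}[x-w]$. The batch version replaces the expectation by the sample mean over all $T$ samples; the online version drops the expectation and uses one sample $x(t)$ per step. Both approach $\mathrm{E}[x]$, the online one with a residual fluctuation controlled by the step size $\mu$. The function names and step sizes are hypothetical.

```python
import numpy as np

def batch_step(w, x, mu):
    # Offline rule: E[x - w] is replaced by the sample mean over all T columns of x.
    return w + mu * np.mean(x - w[:, None], axis=1)

def online_step(w, x_t, mu):
    # Online rule: the expectation is dropped; only the current sample x(t) is used.
    return w + mu * (x_t - w)

rng = np.random.default_rng(7)
true_mean = np.array([1.0, -2.0])
x = true_mean[:, None] + rng.standard_normal((2, 20_000))   # observations, one per column

w_batch = np.zeros(2)
for _ in range(200):                       # every iteration uses the whole sample
    w_batch = batch_step(w_batch, x, mu=0.1)

w_online = np.zeros(2)
for t in range(x.shape[1]):                # one pass, one sample per update
    w_online = online_step(w_online, x[:, t], mu=0.002)

print(w_batch, w_online)
```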
2.3.1.4 ICA based on Maximization of non-Gaussianity
In this section, we study ICA based on the non-Gaussianity criterion. The non-Gaussianity approach is motivated by the Central Limit Theorem (CLT), which states that a sum of independent random variables tends to be closer to a Gaussian distribution than each individual variable. Consequently, for whitened data (Section 2.3.1.1), finding an independent source is equivalent to finding the direction $w$ that gives a component of maximum non-Gaussianity [2], [12-16].
For the sake of illustration, assume that the observed data are $x = As$ and the weight vector is $w$. To find one of the independent components of $x$, i.e. $y = w^T x$, the vector $w^T$ should be one of the rows of the inverse matrix $A^{-1}$. In general, with $v = A^T w$,
$$y = w^T x = v^T s \qquad (2.15)$$
so $y$ is a linear combination of the sources. By the CLT, $y$ is least Gaussian when only one element of $v$ is nonzero, i.e. when $y$ equals one of the sources up to scale. Therefore, by maximizing the non-Gaussianity of $y$ with respect to $w$, we obtain one of the independent components present in $x$. The same argument applies to whitened data before ICA methods are applied. There are different criteria for measuring non-Gaussianity; next, we study some of them.
2.3.1.4.1 Kurtosis Measure
Kurtosis is the fourth-order cumulant of a random variable; in its normalized form it is dimensionless. The normalized kurtosis of a zero-mean random variable can be expressed in terms of its second- and fourth-order moments as follows:
$$\mathrm{kurt}(y) = \frac{\mathrm{E}[y^4]}{(\mathrm{E}[y^2])^2} - 3 \qquad (2.16)$$

An important feature of kurtosis is that $\mathrm{kurt}(y)$ is zero for Gaussian random variables. Kurtosis therefore measures the relative peakedness or flatness of a distribution: positive kurtosis indicates super-Gaussian data, and negative kurtosis indicates sub-Gaussian data. For optimization purposes, the kurtosis in (2.16) can be expressed as
$$\mathrm{kurt}(y) = \mathrm{E}[y^4] - 3(\mathrm{E}[y^2])^2 \qquad (2.17)$$

This expression is easier to optimize because the denominator is omitted; since $(\mathrm{E}[y^2])^2$ is always positive, omitting it does not change the sign of the kurtosis, and for unit-variance data (2.16) and (2.17) coincide. In this approach, the data usually have to be whitened so that the signals are uncorrelated and have unit variance. ICA methods are then applied to find the direction of $w$ for which the kurtosis of $y_i = w^T x$ is maximized, i.e. the direction of the most non-Gaussian component. The projection $y_i = w^T x$ then gives the separated component.
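A small sketch of the normalized kurtosis (2.16) and of the kurtosis-sign rule for choosing between the activation functions (2.13) and (2.14). The sample sizes, the seed, and the helper names `kurt` and `activation` are illustrative assumptions. The theoretical kurtosis values are +3 for the Laplacian, -1.2 for the uniform, and 0 for the Gaussian distribution, so the sample estimates should land close to them.

```python
import numpy as np

def kurt(y):
    """Normalized kurtosis (2.16) of a 1-D sample; the sample is centered first."""
    y = y - y.mean()
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0

def activation(y):
    """Pick the nonlinearity from the sign of the sample kurtosis of y."""
    if kurt(y) > 0:
        return -2.0 * np.tanh(y)        # super-Gaussian model, (2.13)
    return np.tanh(y) - y               # sub-Gaussian model, (2.14)

rng = np.random.default_rng(3)
n = 200_000
k_lap = kurt(rng.laplace(size=n))          # theoretical value: +3
k_uni = kurt(rng.uniform(-1, 1, size=n))   # theoretical value: -1.2
k_gau = kurt(rng.standard_normal(n))       # theoretical value: 0
print(round(k_lap, 2), round(k_uni, 2), round(k_gau, 2))
```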
2.3.1.4.2 Gradient algorithm using kurtosis
After pre-whitening, the observation signals can be expressed as $z = Vx$, so that the estimated signal $y$ becomes
$$y = w^T z = w^T V x \qquad (2.18)$$

In general, we begin from a random initial vector $w$ and then search for a direction of $w$ in which the kurtosis of the estimated signal $y = w^T z$ increases. More precisely, we maximize the absolute value of the kurtosis, which makes the method suitable for both super-Gaussian and sub-Gaussian signals. Gradient ascent on $|\mathrm{kurt}(w^T z)|$ under the constraint $\|w\|^2 = 1$ uses the gradient
$$\frac{\partial\,|\mathrm{kurt}(w^T z)|}{\partial w} = 4\,\mathrm{sgn}\big(\mathrm{kurt}(w^T z)\big)\left\{\mathrm{E}\big[z\,(w^T z)^3\big] - 3w\|w\|^2\right\} \qquad (2.19)$$

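Since only the direction of $w$ matters, we can simplify the gradient by omitting the scalar factor 4 and the second term, which is parallel to $w$ and therefore only rescales the effective step once $w$ is normalized. The update then becomes
$$w \leftarrow w + \gamma\,\Delta w, \qquad \Delta w = \mathrm{sgn}\big(\mathrm{kurt}(w^T z)\big)\,\mathrm{E}\big[z\,(w^T z)^3\big], \qquad w \leftarrow \frac{w}{\|w\|} \qquad (2.20)$$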
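A minimal batch implementation of the update (2.20), assuming whitened data stored row-wise in an $n \times T$ array and the sample mean in place of the expectation. The step size $\gamma = 0.1$, the iteration count, and the two synthetic unit-variance sources (one Laplacian, one uniform) mixed by a hypothetical matrix are illustrative choices. For whitened data and $\|w\| = 1$ we have $\mathrm{E}[y^2] = 1$, so the kurtosis (2.17) reduces to $\mathrm{E}[y^4] - 3$.

```python
import numpy as np

def whiten(x):
    """Center and whiten an n x T data matrix: Z = D^{-1/2} E^T x_dot, cf. (2.12)."""
    x_dot = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(x_dot @ x_dot.T / x_dot.shape[1])
    return np.diag(d ** -0.5) @ e.T @ x_dot

def kurtosis_gradient_ica(z, gamma=0.1, n_iter=500, seed=0):
    """One-unit gradient ascent on |kurt(w^T z)| following (2.20); z must be whitened."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        k = np.mean(y ** 4) - 3.0                     # (2.17) with E[y^2] = 1
        dw = np.sign(k) * (z * y ** 3).mean(axis=1)   # sgn(kurt) E[z (w^T z)^3]
        w = w + gamma * dw
        w /= np.linalg.norm(w)                        # keep ||w|| = 1
    return w

rng = np.random.default_rng(4)
T = 20_000
s = np.vstack([rng.laplace(size=T) / np.sqrt(2),          # unit-variance super-Gaussian
               rng.uniform(-np.sqrt(3), np.sqrt(3), T)])   # unit-variance sub-Gaussian
a = np.array([[1.0, 0.6], [0.4, 1.0]])                    # hypothetical mixing matrix
z = whiten(a @ s)
w = kurtosis_gradient_ica(z)
y = w @ z
corr = [abs(np.corrcoef(y, s_i)[0, 1]) for s_i in s]
print(np.round(corr, 3))   # one entry should be close to 1
```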

2.3.1.5 Fixed point algorithm
In [14], [15], Hyvarinen proposed an ICA algorithm that uses a Newton-type method (with Lagrange multipliers) to maximize the kurtosis, increasing the speed and robustness of the previous gradient-based ICA algorithm. The derivation of this fixed-point algorithm is discussed in depth in [14]. The resulting update law is
$$w^{+} = \mathrm{E}\big[z\,(w^T z)^3\big] - 3w \qquad (2.21)$$

The basic scheme for estimating one independent component is as follows:
1. Pre-whiten the data, i.e. $z = Vx$.
2. Choose an initial weight vector $w$ with $\|w\| = 1$.
3. Compute the updated weight vector $w^{+} = \mathrm{E}[z(w^T z)^3] - 3w$.
4. Normalize and update the weight vector: $w = w^{+}/\|w^{+}\|$, where $\|\cdot\|$ is the Euclidean norm.
5. Go back to step 3 until convergence.
To estimate several components, one can apply the previous scheme $N$ times to obtain all the sources (components) in the observed data. Each newly estimated vector must be kept orthogonal to the previously estimated ones; otherwise the algorithm may estimate the same component again. The scheme becomes:
1. Pre-whiten the data, i.e. $z = Vx$.
2. Choose an initial weight vector $w$ with $\|w\| = 1$.
3. Compute the updated weight vector $w^{+} = \mathrm{E}[z(w^T z)^3] - 3w$.
4. Using the projection matrix $B$, set $w^{+} = w^{+} - BB^T w^{+}$.
5. Normalize and update the weight vector: $w = w^{+}/\|w^{+}\|$.
6. Go back to step 3 until convergence.
In practice, the columns of the projection matrix $B$ are the vectors $w$ already computed for the previous components. The projection in step 4 removes from $w^{+}$ its components along those vectors, which enables the algorithm to converge to a component different from the ones already found.
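The deflation scheme above, with the kurtosis fixed-point update (2.21), can be sketched as follows. This is an illustrative sketch under assumptions: whitened data stored row-wise, sample means in place of expectations, a convergence test of our own choosing ($|w^{+T}w| \approx 1$, i.e. the direction no longer changes up to sign), and synthetic test sources including a binary, BPSK-like signal. The function name is hypothetical.

```python
import numpy as np

def whiten(x):
    """Center and whiten an n x T data matrix, cf. (2.12)."""
    x_dot = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(x_dot @ x_dot.T / x_dot.shape[1])
    return np.diag(d ** -0.5) @ e.T @ x_dot

def kurtosis_fixed_point_deflation(z, n_components, n_iter=200, tol=1e-10, seed=0):
    """Estimate n_components demixing vectors from whitened z, one at a time."""
    rng = np.random.default_rng(seed)
    b = np.zeros((z.shape[0], 0))                        # B: columns are the vectors found so far
    for _ in range(n_components):
        w = rng.standard_normal(z.shape[0])
        w /= np.linalg.norm(w)                           # step 2: ||w|| = 1
        for _ in range(n_iter):
            y = w @ z
            w_new = (z * y ** 3).mean(axis=1) - 3.0 * w  # step 3: update (2.21)
            w_new -= b @ (b.T @ w_new)                   # step 4: w+ <- w+ - B B^T w+
            w_new /= np.linalg.norm(w_new)               # step 5: normalize
            done = abs(abs(w_new @ w) - 1.0) < tol       # same direction up to sign
            w = w_new
            if done:
                break
        b = np.column_stack([b, w])
    return b.T                                           # row i is w_i^T

rng = np.random.default_rng(8)
T = 20_000
s = np.vstack([rng.laplace(size=T) / np.sqrt(2),           # super-Gaussian, unit variance
               rng.uniform(-np.sqrt(3), np.sqrt(3), T),    # sub-Gaussian, unit variance
               rng.choice([-1.0, 1.0], size=T)])           # binary (BPSK-like), unit variance
a = np.array([[1.0, 0.5, 0.3], [0.2, 1.0, 0.4], [0.6, 0.1, 1.0]])  # hypothetical mixing matrix
z = whiten(a @ s)
w_rows = kurtosis_fixed_point_deflation(z, n_components=3)
y = w_rows @ z
c = np.abs(np.corrcoef(np.vstack([y, s]))[:3, 3:])   # |correlation| of estimates vs sources
print(np.round(c, 2))
```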
2.3.1.6 Negentropy Measure
Let $J$ denote the negentropy of a random vector $y$, a normalized version of its differential entropy. Negentropy is an information-theoretic tool that measures the distance of a random variable from the Gaussian distribution with the same covariance. Mathematically, $J(y)$ is expressed as
$$J(y) = H(y_{GAUSS}) - H(y) \qquad (2.22)$$
where $H(\cdot)$ denotes differential entropy and $y_{GAUSS}$ is a Gaussian random vector with the same covariance matrix as $y$.

Statistically, negentropy is an appropriate measure of non-Gaussianity [2]: it is nonnegative and equals zero only for Gaussian variables. Instead of estimating the negentropy directly, one can use the approximation proposed in [11] for a zero-mean, unit-variance $y$:
$$J(y) \approx \frac{1}{12}\,\mathrm{E}[y^3]^2 + \frac{1}{48}\,\mathrm{kurt}(y)^2 \qquad (2.23)$$

Replacing the higher-order cumulants with the expectation of a non-quadratic function $G$ simplifies this approximation, and the negentropy can be approximated as
$$J(y) \propto \big(\mathrm{E}[G(y)] - \mathrm{E}[G(v)]\big)^2 \qquad (2.24)$$
where $v$ is a Gaussian variable with zero mean and unit variance. The FastICA algorithm, which is built on this approximation of negentropy, is structured and analyzed in depth by Hyvarinen in [12].
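The negentropy approximation can be evaluated numerically with $G(u) = \log\cosh(u)$, a common choice of non-quadratic function. This sketch is an illustration under assumptions: the data are standardized to zero mean and unit variance, the Gaussian reference term $\mathrm{E}[G(v)]$ is estimated by Monte Carlo, and any positive constant factor is ignored. The names `logcosh` and `negentropy_approx` are hypothetical. Non-Gaussian samples should score clearly higher than a Gaussian one, whose value is close to zero.

```python
import numpy as np

def logcosh(u):
    # G(u) = log cosh(u), evaluated stably as log(e^u + e^-u) - log 2
    return np.logaddexp(u, -u) - np.log(2.0)

def negentropy_approx(y, n_ref=1_000_000, seed=0):
    """Squared difference E[G(y)] - E[G(v)] with G = log cosh, v ~ N(0, 1)."""
    y = (y - y.mean()) / y.std()                         # standardize the sample
    v = np.random.default_rng(seed).standard_normal(n_ref)
    return (np.mean(logcosh(y)) - np.mean(logcosh(v))) ** 2

rng = np.random.default_rng(9)
n = 100_000
j_lap = negentropy_approx(rng.laplace(size=n))
j_uni = negentropy_approx(rng.uniform(-1.0, 1.0, size=n))
j_gau = negentropy_approx(rng.standard_normal(n))
print(j_lap, j_uni, j_gau)   # the Gaussian value should be the smallest, close to zero
```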
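A gradient algorithm for maximizing this approximation, from which the fixed-point algorithm is derived, is (after pre-whitening)
$$\Delta w = \mu\,\mathrm{E}\big[z\,g(w^T z)\big], \qquad w \leftarrow \frac{w}{\|w\|} \qquad (2.25)$$
where $\mu = \mathrm{E}[G(w^T z)] - \mathrm{E}[G(v)]$ and $g(y) = \partial G(y)/\partial y$.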

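One of the most common choices for the derivative $g(y)$ of the non-quadratic function, among others, is
$$g(y) = \tanh(\alpha y), \qquad 1 \le \alpha \le 2 \qquad (2.26)$$
which corresponds to $G(y) = \frac{1}{\alpha}\log\cosh(\alpha y)$.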

In practice, the maxima of the negentropy approximation of $w^T z$ occur at certain optima of $\mathrm{E}[G(w^T z)]$ under the constraint $\|w\|^2 = 1$. To find these optima, one can set the gradient of the Lagrangian to zero (Kuhn-Tucker conditions) [2]:
$$F(z, w) = \mathrm{E}\big[z\,g(w^T z)\big] + \lambda w = 0 \qquad (2.27)$$

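Newton's method is used to solve this equation. Because the data are whitened, $\mathrm{E}[zz^T] = I$, so the Jacobian of $F$ can be approximated as
$$\frac{\partial F}{\partial w} = \mathrm{E}\big[zz^T g'(w^T z)\big] + \lambda I \approx \mathrm{E}[zz^T]\,\mathrm{E}\big[g'(w^T z)\big] + \lambda I = \big\{\mathrm{E}[g'(w^T z)] + \lambda\big\} I \qquad (2.28)$$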

According to Newton's method, the update rule becomes
$$w^{+} = w - \left[\frac{\partial F}{\partial w}\right]^{-1} F \qquad (2.29)$$

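Substituting (2.27) and (2.28) into (2.29) and multiplying both sides by $-\{\mathrm{E}[g'(w^T z)] + \lambda\}$, which only rescales $w^{+}$ and is undone by the subsequent normalization, gives the FastICA update rule:
$$w^{+} = \mathrm{E}\big[z\,g(w^T z)\big] - \mathrm{E}\big[g'(w^T z)\big]\,w \qquad (2.30)$$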

The basic FastICA scheme for estimating one independent component from whitened data $z$ is as follows [12-15]:
1. Choose an initial weight vector $w$.
2. Compute the updated weight vector $w^{+} = \mathrm{E}[z\,g(w^T z)] - \mathrm{E}[g'(w^T z)]\,w$, where $g$ is the derivative of a non-quadratic function $G$, e.g. $g(y) = y^3$ or (2.26), and $g'$ is the derivative of $g$.
3. Normalize and update the weight vector: $w = w^{+}/\|w^{+}\|$, where $\|\cdot\|$ is the Euclidean norm.
4. Go back to step 2 until convergence.
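As a consistency check, choosing $g(y) = y^3$ (so $g'(y) = 3y^2$) in step 2 recovers the kurtosis-based fixed-point rule (2.21). For whitened data $\mathrm{E}[zz^T] = I$, and with $\|w\| = 1$:

```latex
\mathrm{E}\big[g'(w^T z)\big] = 3\,\mathrm{E}\big[(w^T z)^2\big]
  = 3\,w^T \mathrm{E}[zz^T]\,w = 3\|w\|^2 = 3,
\qquad\text{so}\qquad
w^{+} = \mathrm{E}\big[z\,(w^T z)^3\big] - 3w .
```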
To estimate several components, one can apply the previous scheme $N$ times to obtain all the sources (components) in the observed data, keeping each newly estimated vector orthogonal to the previous ones so that the same component is not estimated again:
1. Pre-whiten the data, i.e. $z = Vx$.
2. Choose an initial weight vector $w$ with $\|w\| = 1$.
3. Compute the updated weight vector $w^{+} = \mathrm{E}[z\,g(w^T z)] - \mathrm{E}[g'(w^T z)]\,w$.
4. Set $w^{+} = w^{+} - BB^T w^{+}$, where the columns of $B$ are the previously estimated vectors.
5. Normalize and update the weight vector: $w = w^{+}/\|w^{+}\|$.
6. Go back to step 3 until convergence.
Instead of estimating the independent components one by one, all of them can be estimated simultaneously. In this case the one-unit update is run in parallel for every weight vector, and after each step a symmetric decorrelation is applied so that the vectors remain orthonormal and do not converge to the same component:
$$ W = W (W^T W)^{-1/2} \tag{2.31} $$
where $W = [w_1, w_2, \ldots, w_N]$ is the matrix whose columns are the vectors $w_i$.
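A minimal sketch of the symmetric (parallel) variant, assuming whitened data and the tanh nonlinearity: `symmetric_decorrelation` implements (2.31) through an eigendecomposition of $W^T W$, and `fastica_symmetric` (a hypothetical name) applies the one-unit update (2.30) to every column of W before decorrelating. It is an illustration, not the dissertation's implementation.

```python
import numpy as np


def symmetric_decorrelation(W):
    """Eq. (2.31): W (W^T W)^{-1/2}, where the columns of W are the vectors w_i."""
    d, E = np.linalg.eigh(W.T @ W)             # W^T W is symmetric positive definite
    return W @ (E @ np.diag(d ** -0.5) @ E.T)


def fastica_symmetric(z, alpha=1.0, max_iter=200, tol=1e-8, seed=0):
    """Run the one-unit update (2.30) on all columns in parallel, then decorrelate."""
    rng = np.random.default_rng(seed)
    n, T = z.shape
    W = symmetric_decorrelation(rng.standard_normal((n, n)))
    for _ in range(max_iter):
        G = np.tanh(alpha * (W.T @ z))         # row i holds g(w_i^T z)
        G_prime = alpha * (1.0 - G ** 2)
        W_new = z @ G.T / T - W * G_prime.mean(axis=1)   # column i: eq. (2.30) for w_i
        W_new = symmetric_decorrelation(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=0)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W
```

Unlike deflation, no component is privileged: estimation errors of one vector do not propagate to the ones estimated after it.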

2.3.1.7 ICA based on Maximum Likelihood Estimation
In this part, we employ Maximum Likelihood (ML) as a contrast function for ICA; ML-based ICA was developed in [28], [29]. Let the unmixing matrix satisfy $W^T \approx A^{-1}$, and recall the instantaneous mixture model (2.5):
$$ \mathbf{x} = A\mathbf{s} $$

Then the estimated signals are
$$ \mathbf{y} = W^T \mathbf{x} $$
By a basic property of linearly transformed random vectors,
$$ p_x(\mathbf{x}) = |\det(A^{-1})|\, p_s(\mathbf{s}) \tag{2.32} $$

Assuming statistical independence of the estimated source signals y and $p_s(\mathbf{s}) \approx p_y(\mathbf{y})$, the probability density of the observed data x is
$$ p_x(\mathbf{x}) = |\det(W)|\, p_y(\mathbf{y}) = |\det(W)| \prod_{i=1}^{n} p_i(y_i) \tag{2.33} $$

where $y_i = w_i^T \mathbf{x}$ and $w_i$ is the i-th column of W. Therefore, $p_x(\mathbf{x})$ can be expressed as
$$ p_x(\mathbf{x}) = |\det(W)| \prod_{i=1}^{n} p_i(w_i^T \mathbf{x}) \tag{2.34} $$

Constructing the likelihood function of W as the product of these densities evaluated at the observed data gives
$$ L(W) = \prod_{i=1}^{n} p_i(w_i^T \mathbf{x})\, |\det(W)| \tag{2.35} $$

We then maximize the expectation of the log-likelihood:
$$ E[\log L(W)] = E\Big[\log\Big\{\prod_{i=1}^{n} p_i(w_i^T \mathbf{x})\, |\det(W)|\Big\}\Big] = E\Big[\sum_{i=1}^{n} \log p_i(w_i^T \mathbf{x})\Big] + \log|\det(W)| \tag{2.36} $$

Hence, the ML contrast function of W is
$$ G(W) = E\Big[\sum_{i=1}^{n} \log p_i(w_i^T \mathbf{x})\Big] + \log|\det(W)| \tag{2.37} $$

We now use a gradient ascent approach to maximize the ML contrast function G(W) with respect to W. One can show that
$$ \frac{\partial G(W)}{\partial W} = (W^T)^{-1} + E[\vartheta(W\mathbf{x})\, \mathbf{x}^T] \tag{2.38} $$

where $\vartheta(W\mathbf{x}) = \vartheta(\mathbf{y}) = [\vartheta_1(y_1), \vartheta_2(y_2), \ldots, \vartheta_n(y_n)]^T$ is a nonlinear (activation) function that models the source signals. It is derived from the source densities as
$$ \vartheta_i(y_i) = \frac{\partial}{\partial y_i} \log p(y_i) = \frac{1}{p(y_i)} \frac{\partial p(y_i)}{\partial y_i} \tag{2.39} $$

The update rule for ML estimation using gradient ascent is
$$ W = W + \gamma \Delta W \tag{2.40} $$
where
$$ \Delta W \propto \frac{\partial G(W)}{\partial W} = (W^T)^{-1} + E[\vartheta(W\mathbf{x})\, \mathbf{x}^T] $$
and γ is the learning rate (step size).
In [12], the same result was obtained by minimizing the Kullback-Leibler (KL) divergence between the joint distribution and the product of the marginal distributions of the estimated signals $y_i$. The key point of Amari's work [2], [71] is that the parameter space of this optimization problem has a Riemannian rather than a Euclidean metric structure, so the steepest ascent direction is given by the natural gradient instead of the ordinary gradient. The natural gradient update is obtained by right-multiplying the gradient in the previous update rule by $W^T W$:
$$ W = W + \gamma \frac{\partial G(W)}{\partial W} W^T W $$
Then,
$$ \Delta W \propto \frac{\partial G(W)}{\partial W} W^T W = \{I + E[\vartheta(\mathbf{y})\, \mathbf{y}^T]\}\, W \tag{2.41} $$

Notably, the natural gradient method can be viewed as an attempt to implement Newton's method by approximating the inverse Hessian as $(\nabla^2 G)^{-1} \approx W^T W$.
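The batch natural-gradient rule (2.40)-(2.41) can be sketched as follows. Assumptions, not taken from the text: the score is approximated by $\vartheta(y) = -\tanh(y)$ (the score of $p(y) \propto 1/\cosh y$, a common choice for super-Gaussian sources), the data are whitened first to speed up convergence, the row convention $\mathbf{y} = W\mathbf{x}$ of (2.38) and (2.41) is used, and the step size and iteration count are illustrative. `natural_gradient_ica` is a hypothetical name.

```python
import numpy as np


def natural_gradient_ica(x, gamma=0.1, n_iter=500):
    """Batch natural-gradient ML ICA; returns the unmixing matrix for centered x (y = W x)."""
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    V = E @ np.diag(d ** -0.5) @ E.T           # whitening (not required by (2.41))
    z = V @ x
    n, T = z.shape
    W = np.eye(n)
    for _ in range(n_iter):
        y = W @ z
        theta = -np.tanh(y)                    # assumed score, super-Gaussian prior
        dW = (np.eye(n) + theta @ y.T / T) @ W # eq. (2.41)
        W = W + gamma * dW                     # eq. (2.40)
    return W @ V                               # overall unmixing matrix
```

No matrix inverse appears in the loop: right-multiplying by $W^T W$ replaces $(W^T)^{-1}$ by W, which is the practical advantage of the natural gradient over (2.38).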
2.3.1.8 ICA based on Entropy Maximization
The differential entropy of a random vector s with density $p(\mathbf{s})$ can be expressed as [2], [36]
$$ H(\mathbf{s}) = -\int p(\mathbf{s}) \log p(\mathbf{s})\, d\mathbf{s} \tag{2.42} $$

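Let J denote the negentropy of a random vector s, a normalized version of its differential entropy. Negentropy is an information-theoretic tool that measures the distance of a random variable from the Gaussian distribution with the same covariance:
$$ J(\mathbf{s}) = H(\mathbf{s}_{\mathrm{gauss}}) - H(\mathbf{s}) \tag{2.43} $$
where $\mathbf{s}_{\mathrm{gauss}}$ is a Gaussian random vector with the same covariance as s. Because the Gaussian has the largest entropy among all distributions with a given covariance, J is nonnegative and equals zero only for Gaussian s.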

In addition, mutual information is a natural measure of the statistical dependence between random variables; P. Comon showed that it is a good metric of statistical dependence [7]. If the random variables $\mathbf{s} = \{s_1, s_2, \ldots, s_n\}$ are statistically independent, the mutual information $I(\mathbf{s})$ equals zero. The mutual information is defined as
$$ I(\mathbf{s}) = \sum_{i=1}^{n} H(s_i) - H(\mathbf{s}) \tag{2.44} $$

2.3.1.9 Bell-Sejnowski method
An ICA algorithm based on minimizing the mutual information was proposed by Bell and Sejnowski [18], [2], who use mutual information to measure the independence between random variables. Let W be the unmixing matrix and $\mathbf{y} = W^T \mathbf{x}$ the estimated source signals. The mutual information is then
$$ I(\mathbf{y}) = \sum_{i=1}^{N} H(y_i) - H(\mathbf{y}) = \sum_{i=1}^{N} H(y_i) - H(W^T \mathbf{x}) \tag{2.45} $$

Differential entropy is in general not invariant under invertible maps; for an invertible linear map it changes by the log-determinant, $H(W^T \mathbf{x}) = H(\mathbf{x}) + \log|\det(W)|$. The mutual information therefore becomes
$$ I(\mathbf{y}) = \sum_{i=1}^{N} H(y_i) - H(\mathbf{x}) - \log|\det(W)| \tag{2.46} $$

The optimization problem can be stated as follows: minimize the mutual information $I(\mathbf{y})$ with respect to the unmixing matrix W, so that the estimated signals y become as statistically independent as possible. The differential entropy of each output can be rewritten as
$$ H(y_i) = -E[\log p(y_i)] \tag{2.47} $$

and the mutual information becomes
$$ I(\mathbf{y}) = -\sum_{i=1}^{N} E[\log p(y_i)] - H(\mathbf{x}) - \log|\det(W)| \tag{2.48} $$

Because the ICA model assumes statistically independent sources s, the estimated signals y can be constrained to be uncorrelated with unit variance. Under this constraint the mutual information cost becomes almost identical to the ML cost function G(W), because det(W) is constant:
$$ E[\mathbf{y}\mathbf{y}^T] = I \;\Rightarrow\; W^T E[\mathbf{x}\mathbf{x}^T] W = I \;\Rightarrow\; \det(W^T)\det(E[\mathbf{x}\mathbf{x}^T])\det(W) = 1 $$
so det(W) is fixed by the data covariance. Moreover, $H(\mathbf{x})$ does not depend on W, so it can be omitted from the MI contrast function:
$$ I(\mathbf{y}) = -\sum_{i=1}^{N} E[\log p(y_i)] - \log|\det(W)| \tag{2.49} $$
$$ I(\mathbf{y}) = -\Big[E\Big[\sum_{i=1}^{N} \log p_i(w_i^T \mathbf{x})\Big] + \log|\det(W)|\Big] \tag{2.50} $$

Now recall the ML contrast function of W:
$$ G(W) = E\Big[\sum_{i=1}^{N} \log p_i(w_i^T \mathbf{x})\Big] + \log|\det(W)| \tag{2.51} $$

Apart from the minus sign, the two contrast functions are identical, i.e. $I(\mathbf{y}) = -G(W)$. Minimizing the MI cost with respect to W therefore yields the same update rule as ML estimation:
$$ \Delta W \propto (W^T)^{-1} + E[\vartheta(W\mathbf{x})\, \mathbf{x}^T] \tag{2.52} $$
In conclusion, ICA algorithms based on different measures of statistical independence (the MI, ML and KL criteria) lead to the same update rules.
2.3.1.10 ICA based on Tensorial Methods
A tensor is a multilinear operator derived from the Taylor series of the log-characteristic function $f(w) = E[\exp(jwx)]$, where x is a zero-mean random variable. The Taylor series of $\log f(w)$ is
$$ \log f(w) = \kappa_1 (jw) + \kappa_2 \frac{(jw)^2}{2!} + \cdots + \kappa_r \frac{(jw)^r}{r!} + \cdots \tag{2.53} $$

The coefficients $\kappa_i$, $i = 1, 2, \ldots$, are called cumulants. In the multivariate case they are called cross-cumulants, analogous to cross-covariances. In the BSS problem, the kurtosis of y is its fourth-order cumulant:
$$ \mathrm{kurt}(y) = \mathrm{cum}(y, y, y, y) \tag{2.54} $$
If $y = \sum_i w_i x_i$, the multilinearity of cumulants gives
$$ \mathrm{kurt}\Big(\sum_i w_i x_i\Big) = \mathrm{cum}\Big(\sum_i w_i x_i, \sum_j w_j x_j, \sum_k w_k x_k, \sum_l w_l x_l\Big) = \sum_{ijkl} w_i w_j w_k w_l\, \mathrm{cum}(x_i, x_j, x_k, x_l) \tag{2.55} $$
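The multilinearity property (2.55) can be checked numerically with sample cumulants. In this sketch (hypothetical helper names `cumulant4_tensor` and `kurt`), the fourth-order cross-cumulant of zero-mean variables is computed as $E[x_ix_jx_kx_l] - E[x_ix_j]E[x_kx_l] - E[x_ix_k]E[x_jx_l] - E[x_ix_l]E[x_jx_k]$; with sample moments, $\mathrm{kurt}(w^T x)$ then equals the tensor contracted with w four times, up to rounding.

```python
import numpy as np


def cumulant4_tensor(x):
    """Sample cum(x_i, x_j, x_k, x_l) for the rows of x, returned as an (n, n, n, n) array."""
    x = x - x.mean(axis=1, keepdims=True)      # cumulant formula below assumes zero mean
    T = x.shape[1]
    m4 = np.einsum('it,jt,kt,lt->ijkl', x, x, x, x) / T   # E[x_i x_j x_k x_l]
    c = x @ x.T / T                                         # E[x_i x_j]
    return (m4
            - np.einsum('ij,kl->ijkl', c, c)
            - np.einsum('ik,jl->ijkl', c, c)
            - np.einsum('il,jk->ijkl', c, c))


def kurt(y):
    """kurt(y) = cum(y, y, y, y) = E[y^4] - 3 (E[y^2])^2 for zero-mean y, eq. (2.54)."""
    y = y - y.mean()
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2
```

Super-Gaussian variables such as the Laplacian give positive kurtosis and sub-Gaussian ones such as the uniform give negative kurtosis, which is why the sign of the tensor eigenvalues carries information about the sources.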


The fourth-order cumulants define the cumulant tensor, which plays for fourth-order statistics the role that the covariance matrix plays for second-order moments. Because the cumulant tensor is symmetric, it always has an eigenvalue decomposition, as shown in [1]. Treating the tensor as a linear operator F acting on matrices, an eigenmatrix V with eigenvalue λ satisfies
$$ F(V) = \lambda V \tag{2.56} $$

Likewise, for the pre-whitened data $\mathbf{z} = VA\mathbf{s} = W^T \mathbf{s}$, where W is the unmixing matrix, the matrices W and $W^T$ are orthogonal. The eigenmatrices of the tensor F are $V = w_m w_m^T$, where $w_m$ is the m-th row of W, and the corresponding eigenvalues are the kurtoses of the independent components [1], [16], [27]:
$$ F_{ij}(V) = \sum_{kl} V_{kl}\, \mathrm{cum}(z_i, z_j, z_k, z_l) = \sum_{kl} w_{mk} w_{ml}\, \mathrm{cum}(z_i, z_j, z_k, z_l) = \cdots = w_{mi} w_{mj}\, \mathrm{kurt}(s_m) \tag{2.57} $$

In other words, the unmixing matrix W of the independent sources can be estimated from the eigenmatrices of the tensor. This holds when the eigenvalues (kurtoses) are distinct; otherwise the problem becomes difficult to solve.
2.3.1.11 PARAllel FACtor (PARAFAC) algorithms
Several BSS algorithms based on the parallel factor (PARAFAC) model have been proposed, e.g. [11], [59], [60], [86]. PARAFAC is a multilinear tool that decomposes a tensor into a sum of rank-1 tensors.

2.3.1.12 Joint Approximation Diagonalisation (JAD)
To overcome this difficulty of the tensor eigen-decomposition, Cardoso [16] proposed the JADE algorithm, which jointly diagonalizes several matrices obtained from the cumulant tensor F. Since the tensor F is a linear combination of terms $w_i w_i^T$, its output for any matrix has an eigenvalue-decomposition form, i.e. the matrix $Q = WFW^T$ is diagonal. The unmixing matrix W can therefore be estimated by minimizing the off-diagonal terms, or equivalently maximizing the diagonal terms, of Q. The cost function of JADE proposed by Cardoso is
$$ \max_W J_{\mathrm{JADE}}(W) = \max_W \sum_i \|\mathrm{diag}(W F_i W^T)\|^2 \tag{2.58} $$
where $F_i$ denotes the tensor applied to different matrices $V_i$, for instance the eigenmatrices of the tensor. The JADE algorithm is less effective in terms of convergence and computational cost, especially in high dimensions [2], [16], [17].

2.3.2 The Convolutive ICA Mixtures
In the previous sections, we investigated several methods for the instantaneous case in the ICA framework. All of them separate the source signals well from instantaneous linear mixtures. In practice, however, when these methods are applied to real-life applications, e.g. a multipath channel in communications or a reverberant room in sound separation, they fail to recover the source signals. The main reason is that the instantaneous model cannot represent the delayed and filtered (multipath) contributions of each source to each sensor. Figure 2.2 illustrates a single-channel convolution and deconvolution process. The multichannel deconvolution problem can be considered a natural extension of the instantaneous BSS problem. Assume that an m-dimensional vector of received discrete-time signals $x(k) = [x_1(k), x_2(k), \ldots, x_m(k)]^T$ at time k is produced from an n-dimensional vector of source signals $s(k) = [s_1(k), s_2(k), \ldots, s_n(k)]^T$, where $m \ge n$, by a stable mixture model [2]:
$$ x(k) = \sum_{p=-\infty}^{\infty} H_p\, s(k-p) = H_k * s(k), \qquad \text{with } \sum_{p=-\infty}^{\infty} \|H_p\| < \infty \tag{2.59} $$
where $*$ denotes the convolution operator and $H_p$ is an $(m \times n)$ matrix of mixing coefficients at time lag p.
One can define the transfer function
$$ H(z) = \sum_{p=-\infty}^{\infty} H_p z^{-p} \tag{2.60} $$
where $z^{-1}$ represents the unit time-delay operator, i.e. $z^{-p}[s_i(k)] = s_i(k-p)$. Generally speaking, the goal of multichannel deconvolution is to recover the source signals, up to possible scaling and time delays, from the received signals using approximate knowledge of the source signal distributions and statistics.

[Figure 2.2 (a): Convolution and deconvolution of a single channel. The source s(k) passes through the channel $h_p$, giving $x(k) = \sum_p h_p s(k-p)$; the deconvolution filter $w_p$ gives $y(k) = \sum_p w_p x(k-p)$; the cascade system is $y(k) = \sum_p g_p s(k-p)$ with $g_p = w_p * h_p$.]
[Figure 2.2 (b): Multichannel blind deconvolution (MBD). The n sources s(k) pass through the unknown mixing system H(z), noise v(k) is added to give the m observations x(k), and the unmixing system W(z) produces the n outputs y(k).]
Figure 2.2: Block diagram of the Convolutive Mixtures.
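Typically, we assume that every source signal $s_i(k)$ is an i.i.d. (independent and identically distributed) sequence. The convolutive mixture model can be written as
$$ \begin{bmatrix} x_1(n) \\ x_2(n) \\ \vdots \\ x_M(n) \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1N} \\ a_{21} & \cdots & a_{2N} \\ \vdots & \ddots & \vdots \\ a_{M1} & \cdots & a_{MN} \end{bmatrix} * \begin{bmatrix} s_1(n) \\ s_2(n) \\ \vdots \\ s_N(n) \end{bmatrix} \tag{2.61} $$
where each entry $a_{ij}$ is the filter from source j to sensor i and $*$ denotes convolution.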

Applying the Short-Time Fourier Transform (STFT) to this model has two advantages:
• In the frequency domain, signals become more super-Gaussian, which is more suitable for ICA learning algorithms.
• In the frequency domain, linear convolution can be approximated by multiplication.
$$ \mathrm{STFT}\left\{ \begin{bmatrix} x_1(n) \\ \vdots \\ x_M(n) \end{bmatrix} \right\} = \mathrm{STFT}\left\{ \begin{bmatrix} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{M1} & \cdots & a_{MN} \end{bmatrix} * \begin{bmatrix} s_1(n) \\ \vdots \\ s_N(n) \end{bmatrix} \right\} \tag{2.62} $$

Assuming $M = N$, the STFT of the convolutive model becomes
$$ \begin{bmatrix} x_1(f,t) \\ x_2(f,t) \\ \vdots \\ x_N(f,t) \end{bmatrix} = \begin{bmatrix} A_{11}(f) & \cdots & A_{1N}(f) \\ A_{21}(f) & \cdots & A_{2N}(f) \\ \vdots & \ddots & \vdots \\ A_{N1}(f) & \cdots & A_{NN}(f) \end{bmatrix} \begin{bmatrix} s_1(f,t) \\ s_2(f,t) \\ \vdots \\ s_N(f,t) \end{bmatrix} \tag{2.63} $$
$$ \Rightarrow\; x(f,t) = A_f\, s(f,t), \qquad f = 1, \ldots, F \tag{2.64} $$

where F is the number of FFT points. We use the STFT instead of the Fourier transform because the signals are only locally stationary: the STFT divides each signal into short overlapping frames. In other words, we transform the convolutive mixture problem into one instantaneous problem per frequency bin by assuming statistical independence among the frequency bins. The convolution becomes a multiplication only approximately, when the window is much longer than the mixing filter, i.e. $F \gg T$, where T is the filter length. In practice this is not easy to achieve, and since the data are complex-valued, the stability of the algorithm is affected [3], [39], [71]. In addition, the scaling and permutation ambiguities affect this approach, as will be explained later.
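The bin-wise model (2.64) can be checked numerically. The sketch below is an illustration under simplifying assumptions (a single zero-padded block instead of overlapping STFT frames, random 8-tap FIR filters; the helper names `convolutive_mix` and `per_bin_mixing` are hypothetical): when the DFT length is at least the block length plus the filter length minus one, the DFT of each mixture equals $A_f$ times the DFT of the sources in every bin, where $A_f$ holds the DFTs of the mixing filters.

```python
import numpy as np


def convolutive_mix(h, s):
    """x_i = sum_j h_ij * s_j (full linear convolution); h has shape (M, N, L)."""
    M, N, L = h.shape
    x = np.zeros((M, s.shape[1] + L - 1))
    for i in range(M):
        for j in range(N):
            x[i] += np.convolve(h[i, j], s[j])
    return x


def per_bin_mixing(h, n_fft):
    """A_f for every bin: the n_fft-point DFT of each filter, shape (bins, M, N)."""
    return np.moveaxis(np.fft.rfft(h, n=n_fft, axis=2), 2, 0)


rng = np.random.default_rng(0)
h = rng.standard_normal((2, 2, 8))     # hypothetical 8-tap mixing filters
s = rng.standard_normal((2, 64))       # one block of 64 source samples
x = convolutive_mix(h, s)              # 64 + 8 - 1 = 71 samples per sensor
n_fft = 128                            # >= 71, so no circular wrap-around
A_f = per_bin_mixing(h, n_fft)
S_f = np.fft.rfft(s, n=n_fft, axis=1).T        # (bins, N)
X_f = np.fft.rfft(x, n=n_fft, axis=1).T        # (bins, M)
X_model = np.einsum('fmn,fn->fm', A_f, S_f)    # x(f) = A_f s(f) in every bin, eq. (2.64)
```

With overlapping STFT frames shorter than this bound, the equality holds only approximately, which is the $F \gg T$ condition discussed above.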
2.3.2.1 Time-Domain Methods
The source signals can also be estimated by estimating the unmixing coefficients directly in the time domain. The convolutive mixture model can be expressed as
$$ x_i(n) = \sum_{j=1}^{N} \sum_{d=1}^{T} a_{ijd}\, s_j(n-d), \qquad i = 1, 2, \ldots, N \tag{2.65} $$
To estimate the source signals from the mixtures in this model, one can estimate the unmixing filter coefficients $w_{ijd}$ with an FIR (feedforward) architecture:
$$ y_i(n) = \sum_{j=1}^{N} \sum_{d=1}^{T} w_{ijd}\, x_j(n-d), \qquad i = 1, 2, \ldots, N \tag{2.66} $$
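The FIR structure of (2.66) is a bank of N × N filters. The sketch below (hypothetical name `fir_filter_bank`; lags indexed from d = 0 so that an instantaneous tap is included, unlike the d = 1 start written above) applies such a bank to a multichannel signal. The same routine implements the mixing model (2.65) when the coefficients $a_{ijd}$ are used instead of $w_{ijd}$. The demonstration uses a simple mixture whose exact FIR inverse is known.

```python
import numpy as np


def fir_filter_bank(w, x):
    """y_i(n) = sum_j sum_d w[i, j, d] x_j(n - d), lags d = 0..T-1, zero initial state."""
    n_out, n_in, T = w.shape
    length = x.shape[1]
    y = np.zeros((n_out, length))
    for d in range(min(T, length)):
        x_delayed = np.zeros((n_in, length))
        x_delayed[:, d:] = x[:, :length - d]   # x_j(n - d)
        y += w[:, :, d] @ x_delayed
    return y


rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))        # two hypothetical source signals
a = 0.7
mix = np.zeros((2, 2, 2))              # x1(n) = s1(n) + a s2(n-1), x2(n) = s2(n)
mix[0, 0, 0], mix[0, 1, 1], mix[1, 1, 0] = 1.0, a, 1.0
x = fir_filter_bank(mix, s)            # mixing, eq. (2.65)
unmix = np.zeros((2, 2, 2))            # exact FIR inverse of this particular mixture
unmix[0, 0, 0], unmix[0, 1, 1], unmix[1, 1, 0] = 1.0, -a, 1.0
y = fir_filter_bank(unmix, x)          # unmixing, eq. (2.66)
```

In blind separation the coefficients of `unmix` are of course unknown and must be learned; the example only illustrates the filtering structure.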

Delay compensation is a major issue in time-domain models, and several researchers have proposed methods to address it. One of them is the feedback architecture proposed by Torkkola [2], [39], [133] to remove temporal dependencies and stabilize the cross-weights; some of this work used an IIR structure. Lee [36], [38] presented the following IIR structure to separate the source signals from the mixtures:
$$ y_i(n) = x_i(n) - \sum_{j=1}^{N} \sum_{d=1}^{T} w_{ijd}\, y_j(n-d), \qquad i = 1, 2, \ldots, N \tag{2.67} $$
or, in matrix form,
$$ y(n) = x(n) - W_0\, y(n) - \sum_{k=1}^{L} W_k\, y(n-k) \tag{2.68} $$
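Equation (2.68) contains y(n) on both sides; moving $W_0 y(n)$ to the left gives $(I + W_0)\,y(n) = x(n) - \sum_{k=1}^{L} W_k\, y(n-k)$, which can be solved sample by sample. The sketch below only runs this recursion for given weights (the learning rules that follow are not included); `feedback_unmix` is a hypothetical name and zero initial state is assumed.

```python
import numpy as np


def feedback_unmix(W0, Wk, x):
    """Run the feedback (IIR) structure (2.68) sample by sample.

    W0: (N, N) instantaneous cross-weights; Wk: (L, N, N), where Wk[k - 1] is W_k.
    Solves (I + W0) y(n) = x(n) - sum_{k=1..L} W_k y(n - k) with zero initial state.
    """
    N, T = x.shape
    L = Wk.shape[0]
    M = np.eye(N) + W0
    y = np.zeros((N, T))
    for n in range(T):
        rhs = x[:, n].copy()
        for k in range(1, min(L, n) + 1):
            rhs -= Wk[k - 1] @ y[:, n - k]     # feedback from past outputs
        y[:, n] = np.linalg.solve(M, rhs)
    return y
```

If the observations were generated as $x(n) = (I + W_0)s(n) + \sum_k W_k s(n-k)$, the recursion returns y = s exactly, which is a convenient sanity check. The recursion is stable only when the corresponding IIR system is stable, which is the instability issue mentioned below.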

To estimate the unmixing matrices, Lee maximizes the joint entropy $H(g(y))$, where $g(\cdot)$ is the sigmoid function used in the Bell-Sejnowski method, and derives the following time-domain update rules:
$$ \Delta W_0 = -(I + W_0)(I + E[\varphi(y)\, y^T]) \tag{2.69} $$
$$ \Delta W_k = -(I + W_k)\, E[\varphi(y)\, y^T(n-k)] \tag{2.70} $$
where $\varphi(y) = -\partial \log p(y) / \partial y$.
Time-domain methods have several drawbacks for recovering the source signals. For long mixing filters, i.e. long transfer functions, the computation becomes too expensive [2], [133], [71]. Using an IIR filter instead of a long FIR filter reduces this cost but suffers from instability and requires inverting non-minimum-phase filters [39]. On the other hand, time-domain methods are suitable and efficient for short mixing filters, such as communication channels [2]. In addition, Torkkola proposed a feedback structure to overcome the spectral whitening problem of the feedforward structure [2], [38], [133].
Next, we describe the TRINICON algorithm and several frequency-domain methods for the convolutive mixture problem.
2.3.2.2 TRINICON Blind Source Separation
The TRINICON algorithm is a time-domain approach proposed by Buchner et al. [131], [1]. Its main drawback is its sensitivity to outliers, which makes it not robust, especially for real-world recordings; see [133], [3] and [38].
In this work, the authors use multivariate models in the cost function to capture the whole temporal structure of the original sources. In effect, they simplify the optimal time-domain BSS formulation by segmenting the observed signals into blocks. Let $L_B$ be the length of each block. The separation model of each block is
$$ y(b, \mathrm{win}) = x(b, \mathrm{win})\, W(b) \tag{2.71} $$

where
• b denotes the block index;
• $\mathrm{win} \in \{0, \ldots, L_B - 1\}$ is the time-shift index within the block;
• $x(b, \mathrm{win}) = [x_1(b, \mathrm{win}), \ldots, x_M(b, \mathrm{win})]$ contains the M observed signals segmented into blocks of length $L_B$;
• $y(b, \mathrm{win}) = [y_1(b, \mathrm{win}), \ldots, y_N(b, \mathrm{win})]$ contains the N estimated source signals;
• W(b) is the separation matrix for block b, given by
$$ W(b) = \begin{bmatrix} W_{11}(b) & \cdots & W_{1M}(b) \\ W_{21}(b) & \cdots & W_{2M}(b) \\ \vdots & \ddots & \vdots \\ W_{N1}(b) & \cdots & W_{NM}(b) \end{bmatrix} \tag{2.72} $$

Let L be the length of the FIR filters; the m-th mixture is then modeled as
$$ x_m(b, \mathrm{win}) = \big[x_m(bL + \mathrm{win}), \ldots, x_m((b-2)L + 1 + \mathrm{win})\big] \tag{2.73} $$
Therefore, the n-th output signal is modeled as
$$ y_n(b, \mathrm{win}) = \big[y_n(bL + \mathrm{win}), \ldots, y_n(bL - D + 1 + \mathrm{win})\big] = \sum_{m=1}^{M} x_m(b, \mathrm{win})\, W_{mn}(b) \tag{2.74} $$
where D is the number of time lags.
The TRINICON algorithm estimates the demixing matrices $W_{mn}(b)$ based on three common optimization criteria:
• minimization of the cross-correlation of the outputs over multiple time lags (nonwhiteness);
• non-Gaussianity, as in standard ICA algorithms;
• minimization of the cross-correlation of the outputs at different time instants (nonstationarity).
2.3.2.3 Frequency-Domain Methods:
In this section, we describe several ICA methods for convolutive mixtures in the frequency domain.
2.3.2.3.1 Lee’s approach
Lee [3], [38] proposed an FIR unmixing structure that moves from the time domain to the frequency domain in order to separate the sources and avoid convolution in the time domain. In addition, he developed an update rule for the unmixing matrix $W_f$ of each frequency bin, similar to the natural gradient:
$$ \Delta W_f = \big(I + E\big[\mathrm{STFT}\{\varphi(y(n))\}\, y^H(f,t)\big]\big) W_f \tag{2.75} $$
Lee's method uses both domains: the time domain is used to exploit the nonlinearity $\varphi(y)$, whereas the frequency domain is used only for the unmixing process. The framework of Lee's method is shown in Figure 2.3.

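Figure 2.3: Lee's block diagram. [Mixtures $x_1, x_2, \ldots, x_n$ → L-point STFT → X(f) → per-bin unmixing $W_1, \ldots, W_n$ → S(f) → L-point ISTFT → $S_1, S_2, S_3$; a time-domain source model is fed back through an STFT.]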
The main drawback of this method is its extra computational complexity: it must move to and from the frequency domain at each update step in order to apply the nonlinearity. According to Lee's results, his method did not encounter the permutation problem.

Figure 2.4: Smaragdis's block diagram. [Mixtures $x_1, x_2, \ldots, x_n$ → L-point STFT → X(f) → per-bin unmixing $W_1, \ldots, W_n$ → S(f) → L-point ISTFT → $S_1, S_2, S_3$.]
2.3.2.3.2 Smaragdis's approach
Some researchers use only the frequency domain for the convolutive problem: both the source modeling and the unmixing are performed in the frequency domain, which avoids the complexity of the previous method. Figure 2.4 shows the framework of Smaragdis's approach [3], [39], in which the system is adapted in the frequency domain for each bin individually.
Since source signals tend to be more super-Gaussian in the frequency domain, one can minimize the Kullback-Leibler divergence to estimate the source signals in each bin. Amari derived an update rule for complex data:
$$ \Delta W_f = \gamma\big(I + E[\varphi(y(f,t))\, y^H(f,t)]\big) W_f \tag{2.76} $$
where γ is the learning rate and $\varphi(y) = \partial \log p_y(y) / \partial y$.

Smaragdis noted that most problems in convolutive mixtures arise from the permutation and scaling ambiguities. He also proposed zero-padding before the FFT in order to smooth the spectra. Smaragdis's framework appears to be a robust and general solution for convolutive mixture problems.
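For one frequency bin, the complex update (2.76) can be sketched as below. Assumptions, not taken from the text: the sources are circular and super-Gaussian with prior $p(y) \propto \exp(-|y|)$, whose score gives the common polar nonlinearity $\varphi(y) = -y/|y|$; the bin is whitened first; the step size and iteration count are illustrative; `complex_ng_ica` is a hypothetical name. The scaling and permutation of each bin remain arbitrary, which is the subject of Section 2.4.

```python
import numpy as np


def complex_ng_ica(X, gamma=0.1, n_iter=500, eps=1e-12):
    """Natural-gradient ICA for one frequency bin, eq. (2.76); X is complex (N x T)."""
    X = X - X.mean(axis=1, keepdims=True)
    N, T = X.shape
    d, E = np.linalg.eigh(X @ X.conj().T / T)     # Hermitian covariance of the bin
    V = np.diag(d ** -0.5) @ E.conj().T           # complex whitening matrix
    Z = V @ X
    W = np.eye(N, dtype=complex)
    for _ in range(n_iter):
        Y = W @ Z
        phi = -Y / (np.abs(Y) + eps)              # assumed score of p(y) ~ exp(-|y|)
        dW = (np.eye(N) + phi @ Y.conj().T / T) @ W
        W = W + gamma * dW                        # eq. (2.76) with step gamma
    return W @ V                                  # unmixing matrix W_f for the raw bin
```

In a full frequency-domain system this routine would be run independently for every bin of the STFT, which is exactly why a per-bin rescaling and permutation alignment step is needed afterwards.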
2.3.2.3.3 Independent Vector Analysis (IVA)
Independent Vector Analysis (IVA), developed by Intae et al. [36], [40], extends the ICA model to multivariate sources. In addition, the authors model the dependence among frequency bins in the adaptive learning rule to reduce the possibility of permutations. As in time-domain methods, IVA updates all variables at the same time, so it may converge to local minima. Furthermore, IVA converges slowly because of the high dimensionality of its contrast function, and it is considered too expensive for real-time implementation.
2.3.2.3.4 Parra’s approach:
Parra and Spence proposed an ICA algorithm based on the non-stationarity and second-order statistics (SOS) of signals to solve the convolutive mixture problem. Signals are non-stationary if their statistics vary in time. Mathematically, a signal x(n) is non-stationary if $C_x(n) \ne C_x(n+d)$, where $C_x(n) = E[x(n)x^T(n)]$ is the covariance matrix of x and d is a constant time shift. Now assume a noisy convolutive mixture model:
$$ x(n) = A * s(n) + e(n) \tag{2.77} $$

Its STFT form is
$$ x(f,t) = A(f)\, s(f,t) + e(f,t), \qquad f = 1, 2, \ldots, F \tag{2.78} $$
Then, the covariance of the observed data x in the frequency domain, estimated over the k-th time interval, is
$$ C_x(f,k) = E[x x^H] = A_f\, C_s(f,k)\, A_f^H + C_e(f,k) \tag{2.79} $$

where $C_s(f,k)$ is the source covariance and $C_e(f,k)$ is the noise covariance. Let $\hat{C}_y(f,k)$ denote the estimated source covariance (the covariance of the outputs y) and $\hat{C}_e(f,k)$ the estimated noise covariance. An appropriate error measure is
$$ \mathrm{Error}[k] = C_x(f,k) - A_f\, \hat{C}_y(f,k)\, A_f^H - \hat{C}_e(f,k) \tag{2.80} $$
with the cost function
$$ J(A_f, \hat{C}_y, \hat{C}_e) = \sum_k \|\mathrm{Error}[k]\|_F^2 \tag{2.81} $$
To estimate each of the parameters $A_f$, $\hat{C}_y$ and $\hat{C}_e$, one computes the derivatives of J with respect to each of them: $\partial J / \partial A_f$, $\partial J / \partial \hat{C}_y$ and $\partial J / \partial \hat{C}_e$.
Using a stable FIR unmixing filter $W_f$, with $y(f,t) = W_f\, x(f,t)$ the estimated sources, the output covariance can be written as
$$ \hat{C}_y(f,k) = E[y y^H] = W_f\, [C_x(f,k) - C_e(f,k)]\, W_f^H \tag{2.82} $$
and the cost function becomes
$$ J(W_f, \hat{C}_y, \hat{C}_e) = \sum_k \big\| W_f\, [C_x(f,k) - \hat{C}_e(f,k)]\, W_f^H - \hat{C}_y(f,k) \big\|_F^2 \tag{2.83} $$

According to the analysis in [62], the unmixing matrix $W_f$ can be estimated using the gradient of this cost function with respect to $W_f$, $\hat{C}_y$ and $\hat{C}_e$. In his paper, Parra proposed two methods to recover the source signals $y(f,t)$: a least-squares estimator and a maximum-likelihood estimator. Wang [3] addressed cyclostationary convolutive mixtures and proposed an algorithm that combines the fourth- and second-order statistics of the data to improve performance.

2.3.2.3.5 Recursive Convolutive ICA
RR-ICA was proposed by F. Nesta et al. [130]. It is a frequency-domain approach for separating source signals from short data sets in highly reverberant conditions, designed to speed up convergence and improve robustness. Its main drawback, however, is its sensitivity to outliers in real-world recordings (see Chapter 4).

2.4 Ambiguities in ICA algorithms
In general, all ICA methods share the following ambiguities.
Scale ambiguity: the energies (variances) of the independent components cannot be identified. Since both the mixing matrix A and the source signals s are unknown, any scalar multiplication of a source can be compensated by the corresponding column of A, so the scale is lost in the demixing process.

Figure 2.5: Illustration of permutation ambiguity in frequency domain. [Mixtures $x_1, x_2, \ldots, x_n$ → L-point STFT → X(f,t) → per-bin unmixing $W_1, \ldots, W_n$ → Y(f,t) → L-point ISTFT → outputs that differ (≠) from $y_1, y_2, y_3$.]
Permutation ambiguity: the order of the independent components cannot be identified. Mathematically, the ambiguities of the ICA model can be expressed as
$$ \mathbf{x} = A\mathbf{s} = (ADP)(P^{-1}D^{-1}\mathbf{s}) \tag{2.84} $$

Recall the performance matrix
$$ G = WA \approx DP $$
where D is a non-singular diagonal matrix representing the scale ambiguity and P is a permutation matrix (an identity matrix with permuted rows) representing the permutation ambiguity. In general, therefore, ICA methods recover the source signals s from the observed signals x only up to arbitrary scaling and permutation. In the instantaneous case these ambiguities are usually harmless and can be ignored. In some convolutive ICA models, however, especially those working in the frequency domain, they must be addressed, as we will see.

2.4.1 Scale ambiguity
Generally speaking, ICA algorithms cannot determine the energies (variances) of the independent components. In the instantaneous mixture problem this ambiguity is usually ignored, since the source signals can be normalized without loss; the unmixed signals are simply amplified or attenuated after separation.
In the frequency domain, the ICA algorithm solves one instantaneous problem for each of the L frequency bins, so the scaling ambiguity has a real effect: an arbitrary scaling introduced by the update rule of each bin distorts the spectrum of the separated signals. If the arbitrary scales are not uniform across frequency, the signal envelope may change after separation.

Researchers have proposed several ways to tackle this ambiguity. In Smaragdis's work [3], [39], the unmixing matrix is kept normalized, $\|W_f\| = 1$, to remove the scale of the data; this also speeds up the convergence of the natural gradient. The normalization can be expressed as
$$ W_f \leftarrow W_f\, \|W_f\|^{-1/N} \tag{2.85} $$
Another idea is to constrain the diagonal elements of the unmixing matrix to unity, i.e. $[W_f]_{ii} = 1$. This constraint ensures that there is no spectral deformation of the observed signals.
Scaling ambiguity and the Minimal Distortion Principle (MDP)
For simplicity, assume there is no permutation ambiguity, i.e. $\Gamma(f) = I$. The estimated source signals at each frequency are then
$$ Y(f) = W(f)X(f) \approx D(f)S(f) \tag{2.86} $$
Thus, the estimated signals Y(f) are versions of the source signals S(f) scaled by the diagonal matrix D(f). Multiplying both sides of (2.86) by $W^{-1}(f)$ gives
$$ W^{-1}(f)Y(f) \approx W^{-1}(f)D(f)S(f) \tag{2.87} $$
Since
$$ W(f) = D(f)H^{-1}(f) \tag{2.88} $$
it follows that
$$ W^{-1}(f)Y(f) \approx H(f)S(f) \tag{2.89} $$
Under the Minimal Distortion Principle, the n-th source is scaled with respect to its image at the n-th microphone [3], [60], [129]. Therefore, the rescaled output signals are given by
$$ Y_{\mathrm{scaled}}(f) \approx \mathrm{diag}(W^{-1}(f))\, Y(f) \tag{2.90} $$
where $\mathrm{diag}(\cdot)$ keeps only the diagonal elements of its argument.

2.4.2 Permutation ambiguity
In general, ICA algorithms suffer from the permutation problem [38], since they cannot recover the source signals in a fixed order. Although this ambiguity is usually ignored for instantaneous mixtures, especially in the time domain, it has a real effect on convolutive mixtures in the frequency domain, as shown in Figure 2.5. Any arbitrary permutation of the source signals along the frequency axis leads to incomplete separation of the sources. Several researchers have therefore proposed methods that impose some coupling between frequency bins to counteract permutations along frequency.
The main cause of the permutation ambiguity is the assumption of statistical independence between frequency bins. Lee applies the source model, i.e. the nonlinearity, in the time domain, which couples the bins; consequently, he never reported the permutation problem.
Permutation Ambiguity
Permutation ambiguity is one of the main challenges of frequency-domain BSS. Many techniques have been proposed to cope with it, but it remains an open issue [3]. Since in this dissertation we choose to develop a robust ICA algorithm in the frequency domain, we pay close attention to investigating this ambiguity and developing a robust method to overcome it.
There are two main groups of solutions to the permutation ambiguity in the frequency domain:
• methods based on geometric information such as the Time Difference of Arrival (TDOA) and the Direction of Arrival (DOA) [3], [38], [72], [128];
• clustering-based methods [57], [60], [3].
In terms of performance, the first group generally performs better than the second, especially with small data samples. However, it is often impractical, since geometric information about real environments is usually unavailable. The second group performs better than the first when a large data set is available, because its clustering techniques (based e.g. on correlation or distance) benefit from more samples, and it is more robust in real-world scenarios. For more details, refer to [3], [64].

2.4.3 Circularity of Fast Fourier Transform (FFT)
It is well known that a time-domain signal can be transformed to the frequency domain by the Fourier transform. In practice, we compute it with the Discrete Fourier Transform over blocks of F samples. This approximation forces the signal to be periodic, with a period equal to the block duration $T = F/f_s$, where $f_s$ is the sampling frequency. However, [55], [50], [72] and [63] reported that this simplification does not correspond to a realizable time-domain filter. Therefore, the transfer functions of such filters may be unstable and exhibit overshoots in their frequency response. For more details refer to [38], [3].
There are two solutions to mitigate the circularity effect of the FFT: 1) increasing the DFT length F [72], and 2) imposing a smoothing function that modifies the frequency response of the filter, as in [38].
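The wrap-around effect, and why the first remedy (a longer DFT) works, can be seen in a few lines. Assumptions for illustration only: one random 32-tap filter and one 64-sample block; `product_in_frequency` is a hypothetical helper. With F = 64 points the frequency-domain product equals a circular convolution whose tail folds back onto the first samples; with F = 128 ≥ 64 + 32 − 1 it equals the linear convolution.

```python
import numpy as np


def product_in_frequency(h, s, n_fft):
    """Multiply n_fft-point DFTs of h and s and return the time-domain result."""
    return np.fft.irfft(np.fft.rfft(h, n=n_fft) * np.fft.rfft(s, n=n_fft), n=n_fft)


rng = np.random.default_rng(0)
h = rng.standard_normal(32)      # mixing filter, 32 taps (hypothetical values)
s = rng.standard_normal(64)      # one block of source samples
linear = np.convolve(h, s)       # true linear convolution, 64 + 32 - 1 = 95 samples

short = product_in_frequency(h, s, 64)    # F < 95: circular convolution (wrap-around)
long_ = product_in_frequency(h, s, 128)   # F >= 95: identical to the linear convolution
```

The last 31 samples of the linear convolution are added onto the first 31 samples of `short`; this aliasing is the time-domain counterpart of the non-realizable filters described above.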

2.5 Performance Metrics of ICA methods

2.5.1 Instantaneous case

2.5.1.1 Performance Matrix G
To measure the performance of ICA algorithms, one can use the performance matrix G [2]:
$$ G = WA \approx I $$
Ideally, the unmixing matrix W equals the inverse of the mixing matrix, $A^{-1}$, so that G would be the identity matrix. Since ICA methods separate the sources only up to permutation and scaling, in practice G is close to a scaled permutation matrix, and its structure is a good indicator of the quality of separation. The performance matrix G also reveals the correspondence between the original sources and the estimated ones.
2.5.1.2 SNR measure:
In practice, the signal-to-noise ratio (SNR) can be used as a measure of separation quality [2]:
$$ \mathrm{SNR} = 10 \log_{10} \frac{\sum_i s^2(i)}{\sum_i \big(s(i) - y(i)\big)^2} \tag{2.91} $$
It compares the energy of the original signal with the energy of the difference between the original and the estimated signal. To use this metric, the signals must be compared at the same variance and polarity, since ICA methods separate the sources only up to permutation and scale.
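A sketch of (2.91) that first brings the two signals to the same variance and polarity, as required above. Normalizing both to zero mean and unit variance and flipping the sign of y when it is negatively correlated with s is one possible choice (an assumption, not necessarily the procedure used in this dissertation); `snr_db` is a hypothetical name. The permutation must already be resolved, i.e. y must be the estimate of s.

```python
import numpy as np


def snr_db(s, y):
    """SNR of eq. (2.91) in dB after matching the variance and polarity of y to s."""
    s = (s - s.mean()) / s.std()
    y = (y - y.mean()) / y.std()
    if np.dot(s, y) < 0:                  # resolve the sign ambiguity
        y = -y
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - y) ** 2))
```

For example, an estimate equal to −3 times the source plus a little noise still yields a high SNR, because the scale and sign are removed before the comparison.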

2.5.2 Convolutive case

2.5.2.1 Performance Index:
From a statistical point of view, the performance index was established in [2] using the performance matrix G as follows:
$$\mathrm{PI} = \sum_{i=1}^{m}\left[\sum_{j=1}^{m}\frac{|G_{ij}|}{\max_k |G_{ik}|} - 1\right] + \sum_{j=1}^{m}\left[\sum_{i=1}^{m}\frac{|G_{ij}|}{\max_k |G_{kj}|} - 1\right] \qquad (2.92)$$
For an ideal performance matrix $G$ (a scaled permutation matrix), this index attains its minimum value of zero; the larger the value of PI, the worse the performance of the algorithm.
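A direct transcription of (2.92), using the row maxima in the first sum and the column maxima in the second, is sketched below; the two test matrices are illustrative.

```python
def performance_index(G):
    """Performance index (2.92): 0 for a scaled permutation matrix, larger is worse."""
    m = len(G)
    a = [[abs(v) for v in row] for row in G]
    row_max = [max(row) for row in a]
    col_max = [max(a[i][j] for i in range(m)) for j in range(m)]
    row_term = sum(sum(a[i][j] / row_max[i] for j in range(m)) - 1 for i in range(m))
    col_term = sum(sum(a[i][j] / col_max[j] for i in range(m)) - 1 for j in range(m))
    return row_term + col_term

print(performance_index([[0.0, 2.0], [-1.0, 0.0]]))   # 0.0: perfect separation
print(performance_index([[1.0, 0.5], [0.5, 1.0]]))    # 2.0: residual crosstalk
```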
2.5.2.2 Mutual Information measure:
Reiss et al. [56] take advantage of mutual information as a measure of statistical independence to create a performance index. A time-series method to estimate the mutual information is developed in [71].
2.5.2.3 Performance Evaluation
From (2), the separated sources are given by
$$s_i(t) = \sum_{j=1}^{m} W_{ij} * x_j(t) \qquad (2.93)$$
where $*$ denotes convolution.

According to [64], the power of a separated source $s_i(t)$ can be divided into two portions: the first, $p_{ii}$, is the power contributed by source $i$; the second, $p_{ik}$, is the power of the crosstalk signals $s_{ik}(t)$ coming from the other sources $k \neq i$. The output SIR is then defined as the ratio of the power of the first portion to that of the second:
$$\mathrm{SIR}_i = 10\log_{10}\frac{p_{ii}}{\sum_{k\neq i} p_{ik}} = 10\log_{10}\frac{\sum_t s_{ii}^2(t)}{\sum_t \sum_{k\neq i} s_{ik}^2(t)} \qquad (2.94)$$

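In this dissertation, we calculate the SIR for source $i$ as follows:
$$\mathrm{SIR}_i = 10\log_{10}\frac{p_{ii}}{\sum_{k\neq i} p_{ik}} = 10\log_{10}\frac{\sum_t\left(\sum_{j=1}^{m} W_{ij} * x_{ji}(t)\right)^2}{\sum_t\sum_{k\neq i}\left(\sum_{j=1}^{m} W_{ij} * x_{jk}(t)\right)^2} \qquad (2.95)$$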

We deal with speech signals convolved with pre-measured real-world room impulse responses or with artificially generated room impulse responses (RIRs). We only need access to the observed (microphone) signals $x_{ji}(t)$ recorded when only the $i$th source is active. The input SIR, i.e., the SIR obtained without any processing, serves as a baseline. Alternatively, we refer to the evaluation criteria in [126], [127] to study the performance of our algorithm.
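A minimal sketch of (2.95), assuming each $W_{ij}$ is an FIR filter stored as a list of taps and `images[j][k]` holds the microphone-$j$ signal $x_{jk}(t)$ recorded when only source $k$ is active (both data layouts, and the toy signals, are assumptions made for illustration). With identity "filters" the result is the input SIR baseline.

```python
import math

def conv(x, h):
    """Linear convolution W_ij * x(t) for an FIR filter h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for a, xa in enumerate(x):
        for b, hb in enumerate(h):
            y[a + b] += xa * hb
    return y

def output_component(W, images, i, k):
    """Contribution of source k to output i: sum_j W_ij * x_jk(t)."""
    parts = [conv(images[j][k], W[i][j]) for j in range(len(W))]
    n = max(len(p) for p in parts)
    return [sum(p[t] for p in parts if t < len(p)) for t in range(n)]

def output_sir_db(W, images, i):
    """Output SIR of (2.95) for separated source i."""
    target = sum(v * v for v in output_component(W, images, i, i))
    interference = sum(v * v for k in range(len(W)) if k != i
                       for v in output_component(W, images, i, k))
    return 10.0 * math.log10(target / interference)

s0 = [1.0, -1.0, 1.0, -1.0]
s1 = [1.0, 1.0, -1.0, -1.0]
# Hypothetical source images: each microphone hears its own source plus 10% crosstalk.
images = [[s0, [0.1 * v for v in s1]],
          [[0.1 * v for v in s0], s1]]
W_identity = [[[1.0], [0.0]], [[0.0], [1.0]]]   # "no processing" baseline filters
print(output_sir_db(W_identity, images, 0))     # about 20 dB input SIR
```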

Chapter 3

Convex Cauchy–Schwarz Independent Component Analysis for Blind Source Separation

Independent Component Analysis (ICA) is a powerful tool in Blind Signal Processing (BSP). We present a new high-performance Convex Cauchy–Schwarz Divergence (CCS-DIV) measure for Blind Source Separation (BSS) and unsupervised learning of acoustic and speech signals. The CCS-DIV measure is developed by integrating convex functions into the Cauchy–Schwarz inequality. By including a convexity parameter, the measure offers a broad range of control over its convexity. Based on this measure, a new CCS–ICA algorithm is constructed, and a non-parametric form is developed using Parzen window-based density estimation. Furthermore, a pairwise iterative scheme is employed to tackle the high-dimensional problem in BSS. We present two pairwise non-parametric ICA algorithms, based on gradient descent and on the Jacobi iterative method. Several case-study scenarios are carried out on noise-free and noisy mixtures of speech and music signals. Finally, the superiority of the proposed CCS–ICA algorithm is demonstrated through performance-metric comparisons with FastICA, RobustICA, convex ICA (C-ICA), and other leading existing algorithms.

3.1 Introduction
Blind Signal Processing (BSP) is one of the most challenging and rapidly emerging areas in signal processing. BSP has a solid theoretical foundation and numerous potential applications, and it remains an important area of research and development in many domains, e.g., biomedical engineering, image processing, communication systems, speech enhancement, and remote sensing. BSP techniques do not assume full a priori knowledge about the mixing environment, the source signals, etc., and do not require any training samples. BSP includes three major areas: Blind Signal Separation (BSS), Independent Component Analysis (ICA), and Multichannel Blind Deconvolution (MBD) [1], [2].
In the following, we provide a brief, focused overview. ICA is considered a key component of BSS and unsupervised learning algorithms [1]. ICA is related to Principal Component Analysis (PCA) and Factor Analysis (FA) in multivariate analysis and data mining, which are second-order methods in which the components are assumed to be Gaussian [8–20], [1], [2]. In contrast, ICA is a statistical technique that exploits higher-order statistics (HOS); its goal is to represent a set of random variables as a linear transformation of statistically independent components.
ICA techniques are based on the assumptions of non-Gaussianity and independence of the sources. Let an $M \times T$ observation matrix $X = [x_1, x_2, \ldots, x_M]^T$ be obtained from $M$ statistically independent sources $S = [s_1, s_2, \ldots, s_M]^T$ by $X = AS$, where $A$ is an $M \times M$ invertible mixing matrix. The estimated sources are modeled by $Y = WX$, where $W$ is a demixing matrix. The goal of ICA is to determine a demixing matrix $W$ that recovers the source signals. To find $W$, ICA uses the non-Gaussianity of the sources together with an independence measure. Such a measure can be based on mutual information, on higher-order statistics (HOS) such as kurtosis, or on joint approximate diagonalization. In other words, the demixing matrix is obtained by optimizing such a contrast function.
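Kurtosis is the simplest of these higher-order measures. The sketch below (sample sizes, seed, and distributions are chosen only for illustration) estimates the excess kurtosis of Gaussian, uniform, and Laplacian samples, and of a normalized sum of two independent uniform signals. The mixture is closer to Gaussian (excess kurtosis nearer 0) than either source, which is why maximizing non-Gaussianity can undo the mixing.

```python
import math
import random

def excess_kurtosis(samples):
    """Sample excess kurtosis m4 / m2^2 - 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m4 / (m2 * m2) - 3.0

rng = random.Random(0)
n = 100_000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
uniform_a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
uniform_b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
laplacian = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]
mixture = [(a + b) / math.sqrt(2.0) for a, b in zip(uniform_a, uniform_b)]

for name, data in [("gaussian", gaussian), ("uniform", uniform_a),
                   ("laplacian", laplacian), ("mixture", mixture)]:
    print(name, round(excess_kurtosis(data), 2))
# Roughly: gaussian ~ 0, uniform ~ -1.2, laplacian ~ 3, mixture ~ -0.6
```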
Furthermore, cumulants, the likelihood function, negentropy, kurtosis, and mutual information have been used to obtain the demixing matrix in different ICA-based algorithms [1]. Comon [7] was the first to describe the fundamentals of ICA. More recently, Robust Independent Component Analysis (R-ICA) was proposed in [11]; it uses a truncated polynomial expansion, rather than the output marginal probability density functions, to simplify the estimation process. In [14–15], the authors present ICA based on mutual information, formulated by minimizing the difference between the sum of the marginal entropies of the outputs and their joint entropy.
The so-called convex ICA [20] is established by incorporating a convex function into a Jensen's inequality-based divergence measure. Xu et al. [21] used an approximation of the Kullback–Leibler (KL) divergence based on the Cauchy–Schwarz inequality. Boscolo et al. [22] established nonparametric ICA by minimizing the mutual information contrast function using the Parzen window density estimate.
A new contrast function based on a nonparametric distribution was developed by Chien and Chen [23], [24] to construct an ICA algorithm. They used the cumulative distribution function (CDF) to obtain a uniform distribution from the observation data. Moreover, Matsuyama et al. [25] proposed the alpha-divergence approach, and the f-divergence was proposed by Csiszár et al. [4], [6], [26]. Other studies have applied nonnegative matrix factorization (NMF) to the BSS problem [4], imposing nonnegativity constraints while minimizing the approximation error. The Euclidean distance and the KL divergence were used as error functions for NMF problems in [26].
In addition, the maximum-likelihood (ML) criterion [27] is another tool for BSS algorithms [27]–[29]. It estimates the demixing matrix by maximizing the likelihood of the observed data; however, the ML estimator requires knowledge of all the source distributions. More recently, Fujisawa et al. [28] proposed a divergence measure that is highly robust to outliers, called the gamma divergence. In addition, the beta divergence was proposed in [31] and investigated by others in [4].
In this chapter, we develop an improved measure of dependency among signals and then construct the corresponding parametric and non-parametric ICA algorithms. We name this new family of dependency divergences the Convex Cauchy–Schwarz Divergence (CCS-DIV), owing to its use of the Cauchy–Schwarz inequality. The new measure is developed by conjugating a convex function into the Cauchy–Schwarz inequality-based divergence measure. Because its curvature is controlled by a convexity parameter, this contrast function offers a wide range of effective curvature. The corresponding convex Cauchy–Schwarz divergence ICA (CCS–ICA) employs the Parzen window density approximation to capture the non-Gaussian structure of the source densities. We also present two effective pairwise ICA algorithms: one based on gradient descent and the other based on Jacobi optimization. The links between CCS-DIV, E-DIV, KL-DIV, and CS-DIV are also shown.

The efficacy of the ICA algorithms based on the proposed CCS-DIV is verified through several ICA experiments. CCS–ICA successfully solves the BSS of speech and music signals with and without additive (Gaussian) noise, and it outperforms the other existing ICA-based algorithms included in our comparisons.
The chapter is organized as follows. Section II presents a brief description of
several divergence measures. Section III proposes the new convex Cauchy–Schwarz
divergence measure. Section IV presents the CCS–ICA method. The comparative
simulation results and conclusions are given in Section V and Section VI, respectively.

3.2 A Brief Description of Previous Divergence Measures
Divergences, or their counterparts, (dis)similarity measures, play an important role in the areas of neural computation, pattern recognition, learning, estimation, inference, and optimization [4]. In general, they measure a quasi-distance or directed difference between two probability distributions $p$ and $q$, and they can also be defined for unconstrained arrays and patterns. Divergence measures are commonly used to quantify the distance or difference between two $n$-dimensional probability distributions $p = (p_1, p_2, \ldots, p_n)$ and $q = (q_1, q_2, \ldots, q_n)$. Moreover, a divergence measure is a fundamental tool for measuring the dependence between observed variables and for building an ICA-based procedure.
In this dissertation, we are mostly interested in distance-type measures that are separable, i.e., that satisfy $D(p\|q) = \sum_{i=1}^{n} d(p_i, q_i) \ge 0$, with equality if and only if $p = q$. Such measures are not necessarily symmetric, in the sense that $D(p\|q) = D(q\|p)$ need not hold, and they do not necessarily satisfy the triangle inequality $D(p\|q) \le D(p\|z) + D(z\|q)$ for another distribution $z$.
Usually, the vector $p$ corresponds to the observed data and the vector $q$ to the estimated or expected data, which are subject to constraints imposed by the assumed models. For the BSS (ICA and NMF) problem, $p$ corresponds to the observed data matrix $X$ and $q$ to the estimated matrix $Y = WX$. An information divergence measures the distance between two probability distributions; note, however, that the distance-type measures under consideration are not necessarily metrics on the space $P$ of all probability distributions [4].
A distance between two pdfs is a metric if the following conditions hold: (i) $D(p\|q) = \sum_{i=1}^{n} d(p_i, q_i) \ge 0$, with equality if and only if $p = q$; (ii) $D(p\|q) = D(q\|p)$; and (iii) $D(p\|q) \le D(p\|z) + D(z\|q)$. Distances that are not metrics are referred to as divergences [4]. Next, we review the most common divergence measures for one-dimensional probability distributions.
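A quick numerical check of why KL-DIV is a divergence rather than a metric: for two illustrative discrete distributions, swapping the arguments changes its value, violating condition (ii), whereas it is zero when the arguments coincide.

```python
import math

def kl(p, q):
    """Discrete KL divergence D(p||q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.3]
q = [0.5, 0.5]
print(round(kl(p, q), 4), round(kl(q, p), 4))   # 0.0823 0.0872: not symmetric
```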

3.2.1 Previous Divergence Measures

Shannon theory gives the KL divergence (KL-DIV) [1], [4], which is the relative entropy between the joint distribution of two continuous variables $x_1$ and $x_2$, $p(x_1, x_2)$, and the product of their marginal distributions, $p(x_1)p(x_2)$. KL-DIV is given by
$$D_{KL}(x_1, x_2) = H\big(p(x_1)\big) + H\big(p(x_2)\big) - H\big(p(x_1, x_2)\big) \qquad (3.1)$$
$$D_{KL}(x_1, x_2) = \iint p(x_1, x_2)\,\log\frac{p(x_1, x_2)}{p(x_1)\,p(x_2)}\,dx_1\,dx_2 \qquad (3.2)$$

where $D_{KL}(x_1, x_2) \ge 0$, with equality if and only if $p(x_1, x_2) = p(x_1)p(x_2)$, i.e., when $x_1$ and $x_2$ are independent of each other. Xu [21] developed the Euclidean divergence (E-DIV) and the Cauchy–Schwarz divergence (CS-DIV) by inserting the joint distribution of two variables and the product of their marginal distributions into the Euclidean distance and the Cauchy–Schwarz inequality, respectively. E-DIV and CS-DIV are given, respectively, by
$$D_E(x_1, x_2) = \iint \big(p(x_1, x_2) - p(x_1)\,p(x_2)\big)^2\,dx_1\,dx_2 \qquad (3.3)$$
$$D_{CS}(x_1, x_2) = \log\frac{\iint p(x_1, x_2)^2\,dx_1\,dx_2 \cdot \iint p(x_1)^2\,p(x_2)^2\,dx_1\,dx_2}{\left[\iint p(x_1, x_2)\,p(x_1)\,p(x_2)\,dx_1\,dx_2\right]^2} \qquad (3.4)$$
where $D_E(x_1, x_2) \ge 0$ and $D_{CS}(x_1, x_2) \ge 0$, with equality if and only if $p(x_1, x_2) = p(x_1)p(x_2)$, i.e., when the variables are independent of each other. These divergences are reasonable contrast functions for the ICA method, serving as measures of dependence. Furthermore, the alpha divergence (α-DIV) was developed by Amari et al. [2], [4]; it can also be used as a measure of dependence. α-DIV is given by
$$D_{\alpha}(x_1, x_2, \alpha) = \iint \left[\frac{1-\alpha}{2}\,p(x_1, x_2) + \frac{1+\alpha}{2}\,p(x_1)\,p(x_2) - p(x_1, x_2)^{\frac{1-\alpha}{2}}\big(p(x_1)\,p(x_2)\big)^{\frac{1+\alpha}{2}}\right] dx_1\,dx_2 \qquad (3.5)$$
Matsuyama [25] introduced the alpha-ICA algorithm by using α-DIV as the contrast function of the ICA method. In the case $\alpha = -1$, α-DIV is equivalent to KL-DIV [4], [6].

Csiszár [26] introduced an interesting divergence measure called the f-divergence (f-DIV), given by
$$D_f(x_1, x_2) = \iint p(x_1, x_2)\, f\!\left(\frac{p(x_1, x_2)}{p(x_1)\,p(x_2)}\right) dx_1\,dx_2 \qquad (3.6)$$

where $f(\cdot)$ denotes a convex function satisfying $f(t) \ge 0$ for $t \ge 0$, $f(1) = 0$, and $f'(1) = 0$. In addition, Csiszár shows that α-DIV is a special case of f-DIV obtained with the following convex function:
$$f(t) = \frac{4}{1-\alpha^2}\left[\frac{1-\alpha}{2} + \frac{1+\alpha}{2}\,t - t^{\frac{1+\alpha}{2}}\right], \qquad t \ge 0 \qquad (3.7)$$
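The convex function (3.7) is easy to evaluate directly. Differentiating twice gives $f''(t) = t^{(\alpha-3)/2} > 0$ for $t > 0$, so $f$ is convex for every $\alpha \neq \pm 1$, not only for $|\alpha| < 1$; for example, $\alpha = 3$ gives $f(t) = (t-1)^2/2$. The sketch below checks $f(1) = 0$, $f'(1) = 0$, and the second derivative numerically; the finite-difference step sizes and the tested values of $\alpha$ are arbitrary choices.

```python
def f_alpha(t, alpha):
    """Convex function (3.7); defined for t >= 0 and alpha != +1, -1."""
    c = 4.0 / (1.0 - alpha ** 2)
    return c * ((1.0 - alpha) / 2 + (1.0 + alpha) / 2 * t - t ** ((1.0 + alpha) / 2))

def derivative(g, t, h=1e-5):
    """Central first difference."""
    return (g(t + h) - g(t - h)) / (2 * h)

def second_derivative(g, t, h=1e-4):
    """Central second difference."""
    return (g(t + h) - 2 * g(t) + g(t - h)) / (h * h)

for alpha in (-3.0, -0.5, 0.5, 3.0):
    g = lambda t, a=alpha: f_alpha(t, a)
    # f(1), f'(1), numerical f''(2), and the closed form 2^((alpha - 3) / 2)
    print(alpha, g(1.0), round(derivative(g, 1.0), 6),
          round(second_derivative(g, 2.0), 4), round(2.0 ** ((alpha - 3) / 2), 4))
```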

Furthermore, Zhang [31] developed a general divergence function by integrating the α-DIV and f-DIV functions into the following form:
$$D_Z(x_1, x_2) = \frac{4}{1-\alpha^2}\left\{\frac{1-\alpha}{2}\iint f\big(p(x_1, x_2)\big)\,dx_1\,dx_2 + \frac{1+\alpha}{2}\iint f\big(p(x_1)\,p(x_2)\big)\,dx_1\,dx_2 - \iint f\!\left(\frac{1-\alpha}{2}\,p(x_1, x_2) + \frac{1+\alpha}{2}\,p(x_1)\,p(x_2)\right) dx_1\,dx_2\right\} \qquad (3.8)$$
Lin [32] developed the Jensen–Shannon divergence (JS-DIV) by inserting the Shannon entropy $H[\cdot]$ into Jensen's inequality; JS-DIV is given by
$$D_{JS}(x_1, x_2) = H\big(\lambda p(x_1, x_2) + (1-\lambda)p(x_1)p(x_2)\big) - \lambda H\big(p(x_1, x_2)\big) - (1-\lambda)H\big(p(x_1)p(x_2)\big) \qquad (3.9)$$

where $0 \le \lambda \le 1$ is a weighting parameter between the joint distribution and the product of the corresponding marginal distributions. $D_{JS}(x_1, x_2) \ge 0$, with equality if and only if $p(x_1, x_2) = p(x_1)p(x_2)$. Recently, Chien [20] proposed the convex divergence (C-DIV), which is developed by inserting the convex function $f(\cdot)$ into Jensen's inequality. C-DIV is given by
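$$D_C(x_1, x_2, \alpha) = \frac{4}{1-\alpha^2}\Bigg\{\lambda\iint\left[\frac{1-\alpha}{2} + \frac{1+\alpha}{2}\,p(x_1, x_2) - p(x_1, x_2)^{\frac{1+\alpha}{2}}\right] dx_1\,dx_2 + (1-\lambda)\iint\left[\frac{1-\alpha}{2} + \frac{1+\alpha}{2}\,p(x_1)p(x_2) - \big(p(x_1)p(x_2)\big)^{\frac{1+\alpha}{2}}\right] dx_1\,dx_2 - \iint\left[\frac{1-\alpha}{2} + \frac{1+\alpha}{2}\big(\lambda p(x_1, x_2) + (1-\lambda)p(x_1)p(x_2)\big) - \big(\lambda p(x_1, x_2) + (1-\lambda)p(x_1)p(x_2)\big)^{\frac{1+\alpha}{2}}\right] dx_1\,dx_2\Bigg\} \qquad (3.10)$$
In the case $\alpha = 1$, C-DIV is equivalent to JS-DIV. $D_C(x_1, x_2, \alpha) \ge 0$, with equality if and only if $p(x_1, x_2) = p(x_1)p(x_2)$, which means that $x_1$ and $x_2$ are independent of each other.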

3.2.2 The Proposed Divergence Measure

Although a wide range of measures exists, performance, especially in audio and speech applications, still requires improvement. An improved measure should provide geometric properties suited to a contrast function that is explored by a dynamic (e.g., gradient) search in the parameter space of demixing matrices. The motivation here is to introduce a simple measure with controllable convexity in order to control convergence to the optimal solution.
To improve the performance of the divergence measure and speed up convergence, this chapter presents a novel divergence based on conjugating the convex function into the Cauchy–Schwarz inequality. In this context, we use the convexity parameter $\alpha$ to control the convexity of the divergence function and to speed up convergence in the ICA and NMF algorithms. Inserting the joint distribution $p(x_1, x_2)$ and the product of the marginal distributions $p(x_1)p(x_2)$ into the convex function $f(\cdot)$ in (3.7) and applying the Cauchy–Schwarz inequality yields
$$\Big|\big\langle f\big(p(x_1, x_2)\big), f\big(p(x_1)p(x_2)\big)\big\rangle\Big|^2 \le \big\langle f\big(p(x_1, x_2)\big), f\big(p(x_1, x_2)\big)\big\rangle \cdot \big\langle f\big(p(x_1)p(x_2)\big), f\big(p(x_1)p(x_2)\big)\big\rangle \qquad (3.11)$$

where $\langle\cdot,\cdot\rangle$ is an inner product. Based on this inequality, a new symmetric divergence measure is proposed:
$$D_{CCS}(x_1, x_2, \alpha) = \log\frac{\iint f^2\big(p(x_1, x_2)\big)\,dx_1\,dx_2 \cdot \iint f^2\big(p(x_1)\,p(x_2)\big)\,dx_1\,dx_2}{\left[\iint f\big(p(x_1, x_2)\big)\, f\big(p(x_1)\,p(x_2)\big)\,dx_1\,dx_2\right]^2} \qquad (3.12)$$
where $D_{CCS}(x_1, x_2, \alpha) \ge 0$, with equality when $p(x_1, x_2) = p(x_1)p(x_2)$, i.e., when $x_1$ and $x_2$ are independent. This divergence function is then used to develop the ICA and NMF algorithms. Notably, $D_{CCS}(x_1, x_2, \alpha)$ is symmetric in the joint distribution and the product of the marginal densities. This symmetry does not hold for KL-DIV, α-DIV, and f-DIV. Additionally, CCS-DIV is tunable through the convexity parameter $\alpha$.
In contrast to C-DIV and α-DIV, the range of the convexity parameter $\alpha$ is not restricted to $|\alpha| \le 1$. For $\alpha = 1$ and $\alpha = -1$, where (3.7) takes the indeterminate form 0/0, CCS-DIV can be derived using l'Hôpital's rule, i.e., by differentiating the numerator and denominator of each part of $D_{CCS}(x_1, x_2, \alpha)$ with respect to $\alpha$. The resulting CCS-DIVs for $\alpha = 1$ and $\alpha = -1$ are given by (3.13) and (3.14), respectively.
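The limits referred to above can be made explicit. Applying l'Hôpital's rule in $\alpha$ to (3.7) gives $f(t) \to t\log t - t + 1$ as $\alpha \to 1$ and $f(t) \to t - \log t - 1$ as $\alpha \to -1$. The latter is the negative of the integrand $\log t - t + 1$ used for $\alpha = -1$; this is harmless because $f$ enters (3.12) only as $f^2 f^2/(f f)^2$, so $D_{CCS}$ is unchanged when $f$ changes sign. The sketch below checks both limits numerically for a few values of $t$ (the offset $10^{-6}$ is an arbitrary choice).

```python
import math

def f_alpha(t, alpha):
    """Convex function (3.7) for alpha != +1, -1."""
    c = 4.0 / (1.0 - alpha ** 2)
    return c * ((1.0 - alpha) / 2 + (1.0 + alpha) / 2 * t - t ** ((1.0 + alpha) / 2))

def f_plus_one(t):
    """Limit of (3.7) as alpha -> 1."""
    return t * math.log(t) - t + 1.0

def f_minus_one(t):
    """Limit of (3.7) as alpha -> -1 (the negative of log t - t + 1)."""
    return t - math.log(t) - 1.0

delta = 1e-6
for t in (0.15, 0.35, 0.8, 2.0):
    print(t, f_alpha(t, 1.0 - delta), f_plus_one(t),
          f_alpha(t, -1.0 + delta), f_minus_one(t))
```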

3.2.3 Link to Other Divergences:

CCS-DIV distinguishes itself from previous divergences in the literature by incorporating the convex function into (not merely applying a function to) the Cauchy–Schwarz inequality, in order to guarantee convexity of the new divergence. This chapter thus develops a framework for generating a family of dependency measures by conjugating a convex function into the Cauchy–Schwarz inequality. Such convexity is expected (as evidenced by our experiments) to reduce local minima near the optimal solution and to ease the search over the non-linear surface of the contrast function. It also offers flexibility when scaling to high-dimensional data. The motivation behind this divergence is to render CS-DIV convex, similar to f-DIV. In this work, we focus on the convex function $f(t)$ in (3.7) and its corresponding CCS-DIVs in (3.13) and (3.14). For $\alpha = 1$ and $\alpha = -1$, CCS-DIV is implicitly based on the Shannon entropy (KL divergence) and on Rényi's quadratic entropy, respectively. It can also be shown that, in contrast to CS-DIV, the CCS-DIVs for $\alpha = 1$ and $\alpha = -1$ are convex functions.

$$D_{CCS}(x_1, x_2, 1) = \log\frac{\iint\big(p(x_1, x_2)\log p(x_1, x_2) - p(x_1, x_2) + 1\big)^2 dx_1\,dx_2 \cdot \iint\big(p(x_1)p(x_2)\log\big(p(x_1)p(x_2)\big) - p(x_1)p(x_2) + 1\big)^2 dx_1\,dx_2}{\left[\iint\big(p(x_1, x_2)\log p(x_1, x_2) - p(x_1, x_2) + 1\big)\big(p(x_1)p(x_2)\log\big(p(x_1)p(x_2)\big) - p(x_1)p(x_2) + 1\big)\,dx_1\,dx_2\right]^2} \qquad (3.13)$$
$$D_{CCS}(x_1, x_2, -1) = \log\frac{\iint\big(\log p(x_1, x_2) - p(x_1, x_2) + 1\big)^2 dx_1\,dx_2 \cdot \iint\big(\log\big(p(x_1)p(x_2)\big) - p(x_1)p(x_2) + 1\big)^2 dx_1\,dx_2}{\left[\iint\big(\log p(x_1, x_2) - p(x_1, x_2) + 1\big)\big(\log\big(p(x_1)p(x_2)\big) - p(x_1)p(x_2) + 1\big)\,dx_1\,dx_2\right]^2} \qquad (3.14)$$

3.2.4 Geometrical Interpretation of the Proposed Divergence for α = 1 and α = −1
For simplicity, let us define the following terms:
$$V_J = \iint p(x_1, x_2)^2\,dx_1\,dx_2, \qquad V_M = \iint \big(p(x_1)p(x_2)\big)^2\,dx_1\,dx_2, \qquad V_C = \iint p(x_1, x_2)\,p(x_1)\,p(x_2)\,dx_1\,dx_2$$
$$V_{JJ} = \begin{cases}\iint\big(p(x_1, x_2)\log p(x_1, x_2) - p(x_1, x_2) + 1\big)^2 dx_1\,dx_2, & \alpha = 1\\[4pt] \iint\big(\log p(x_1, x_2) - p(x_1, x_2) + 1\big)^2 dx_1\,dx_2, & \alpha = -1\end{cases}$$
$$V_{MM} = \begin{cases}\iint\big(p(x_1)p(x_2)\log\big(p(x_1)p(x_2)\big) - p(x_1)p(x_2) + 1\big)^2 dx_1\,dx_2, & \alpha = 1\\[4pt] \iint\big(\log\big(p(x_1)p(x_2)\big) - p(x_1)p(x_2) + 1\big)^2 dx_1\,dx_2, & \alpha = -1\end{cases}$$
$$V_{CC} = \begin{cases}\iint\big(p(x_1, x_2)\log p(x_1, x_2) - p(x_1, x_2) + 1\big)\big(p(x_1)p(x_2)\log\big(p(x_1)p(x_2)\big) - p(x_1)p(x_2) + 1\big)\,dx_1\,dx_2, & \alpha = 1\\[4pt] \iint\big(\log p(x_1, x_2) - p(x_1, x_2) + 1\big)\big(\log\big(p(x_1)p(x_2)\big) - p(x_1)p(x_2) + 1\big)\,dx_1\,dx_2, & \alpha = -1\end{cases}$$
With these terms, CCS-DIV and CS-DIV can be expressed as
$$D_{CCS} = \log V_{JJ} + \log V_{MM} - 2\log V_{CC} \qquad (3.15)$$
$$D_{CS} = \log V_J + \log V_M - 2\log V_C \qquad (3.16)$$

In Figure 3.1, we illustrate the geometrical interpretation of the proposed divergence (CCS-DIV), which parallels that of the Cauchy–Schwarz divergence (CS-DIV). Geometrically, the angle between the joint pdf and the product of the marginal pdfs in CCS-DIV is given by
$$\theta_{CCS} = \arccos\left(\frac{V_{CC}}{\sqrt{V_{JJ}V_{MM}}}\right), \quad\text{analogous to}\quad \theta_{CS} = \arccos\left(\frac{V_C}{\sqrt{V_J V_M}}\right) \qquad (3.17)$$
where $\arccos$ denotes the inverse cosine. By (3.15), $D_{CCS} = -\log\big(\cos^2\theta_{CCS}\big)$, so the divergence vanishes when $\theta_{CCS} = 0$. In fact, the convex function $f$ renders CS-DIV a convex contrast function for the cases $\alpha = 1$ and $\alpha = -1$. Moreover, it gives the proposed measure an advantage over CS-DIV in terms of speed and accuracy.
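For discretized densities, the terms in (3.15)–(3.17) reduce to inner products of two vectors, here the hypothetical vectors `u` (standing in for $f(p(x_1, x_2))$) and `v` (standing in for $f(p(x_1)p(x_2))$), both assumed positive so the logarithms are defined. The sketch checks numerically that $D = -\log(\cos^2\theta)$ and that the divergence vanishes when the two vectors coincide.

```python
import math

def cs_type_divergence(u, v):
    """log V_JJ + log V_MM - 2 log V_CC, as in (3.15), for discretized vectors u, v."""
    vjj = sum(a * a for a in u)
    vmm = sum(b * b for b in v)
    vcc = sum(a * b for a, b in zip(u, v))
    return math.log(vjj) + math.log(vmm) - 2.0 * math.log(vcc)

def angle(u, v):
    """theta of (3.17); the cosine is clipped to 1 to guard against rounding."""
    vjj = sum(a * a for a in u)
    vmm = sum(b * b for b in v)
    vcc = sum(a * b for a, b in zip(u, v))
    return math.acos(min(1.0, vcc / math.sqrt(vjj * vmm)))

u = [0.2, 0.5, 0.1]      # stand-in for f(p(x1, x2)) on a 3-point grid
v = [0.3, 0.4, 0.2]      # stand-in for f(p(x1) p(x2))
theta = angle(u, v)
print(cs_type_divergence(u, v), -math.log(math.cos(theta) ** 2))   # equal values
print(cs_type_divergence(u, u))                                     # 0.0
```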

Figure 3.1: Illustration of the geometrical interpretation of the proposed divergence. [The vectors $f(p(x_1, x_2))$ and $f(p(x_1)p(x_2))$, with squared lengths $V_{JJ}$ and $V_{MM}$, enclose the angle $\theta_{CCS}$, so that $V_{CC} = \cos(\theta_{CCS})\sqrt{V_{JJ}V_{MM}}$ and $D_{CCS} = -\log\big((\cos\theta_{CCS})^2\big)$.]

3.2.5 Evaluation of Divergence Measures

In this section, the relations among KL-DIV, E-DIV, CS-DIV, JS-DIV, α-DIV, C-DIV, and the proposed CCS-DIV are discussed. C-DIV, α-DIV, and the proposed CCS-DIV with $\alpha = 1$, $\alpha = 0$, and $\alpha = -1$ are evaluated. Without loss of generality, a simple case is considered: two binary random variables $\{x_1, x_2\}$ taking values in the events $\{A, B\}$, as in [20] and [24].
The joint probabilities are $p_{x_1,x_2}(A, A)$, $p_{x_1,x_2}(A, B)$, $p_{x_1,x_2}(B, A)$, and $p_{x_1,x_2}(B, B)$, and the marginal probabilities are $p_{x_1}(A)$, $p_{x_1}(B)$, $p_{x_2}(A)$, and $p_{x_2}(B)$. The different divergence measures are tested by fixing the marginal probabilities, e.g., $p_{x_1}(A) = 0.7$, $p_{x_1}(B) = 0.3$, $p_{x_2}(A) = 0.5$, and $p_{x_2}(B) = 0.5$, and letting the joint probabilities $p_{x_1,x_2}(A, A)$ and $p_{x_1,x_2}(B, A)$ vary freely in the intervals (0, 0.7) and (0, 0.3), respectively.
Figure 3.2 shows the different divergence measures versus the joint probability $p_{x_1,x_2}(A, A)$. All the divergence measures reach the same minimum at $p_{x_1,x_2}(A, A) = 0.35 = p_{x_1}(A)\,p_{x_2}(A)$, where the two random variables are independent. Figure 3.3 shows CCS-DIV and α-DIV at different values of $\alpha$, which controls the slope of the curves. Among these measures, the steepest curve is obtained by CCS-DIV with $\alpha = -1$. Figure 3.4 shows CCS-DIV for a wider range of $\alpha$, including positive values greater than +1 and negative values less than −1.
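The binary-event experiment can be reproduced with a short script. With the marginals fixed as above, the remaining joint probabilities follow from $p_{x_1,x_2}(A, A) = a$: $p(A,B) = 0.7 - a$, $p(B,A) = 0.5 - a$, and $p(B,B) = a - 0.2$, which is why $a$ ranges over (0.2, 0.5) on the horizontal axes of Figures 3.2–3.4. The sketch evaluates KL-DIV, CS-DIV, and CCS-DIV with $\alpha = \pm 1$ (using the limiting forms of (3.7)) on a grid and confirms that each reaches its minimum at $a = 0.35$; the grid spacing is an arbitrary choice and plotting is omitted.

```python
import math

P1 = {"A": 0.7, "B": 0.3}   # fixed marginal of x1
P2 = {"A": 0.5, "B": 0.5}   # fixed marginal of x2

def joint(a):
    """Joint pmf p(x1, x2) with p(A, A) = a; other cells follow from the marginals."""
    return {("A", "A"): a, ("A", "B"): 0.7 - a, ("B", "A"): 0.5 - a, ("B", "B"): a - 0.2}

def kl_div(a):
    return sum(p * math.log(p / (P1[i] * P2[j])) for (i, j), p in joint(a).items())

def cauchy_schwarz_type(a, f):
    """log(V_JJ V_MM / V_CC^2) with f applied to the joint and product-of-marginal pmfs."""
    J = joint(a)
    u = [f(p) for p in J.values()]
    v = [f(P1[i] * P2[j]) for (i, j) in J]
    vjj = sum(x * x for x in u)
    vmm = sum(y * y for y in v)
    vcc = sum(x * y for x, y in zip(u, v))
    return math.log(vjj * vmm / vcc ** 2)

divergences = {
    "KL-DIV": kl_div,
    "CS-DIV": lambda a: cauchy_schwarz_type(a, lambda t: t),
    "CCS-DIV, alpha=1": lambda a: cauchy_schwarz_type(a, lambda t: t * math.log(t) - t + 1),
    "CCS-DIV, alpha=-1": lambda a: cauchy_schwarz_type(a, lambda t: t - math.log(t) - 1),
}

grid = [0.2 + k / 1000 for k in range(1, 300)]   # a in (0.2, 0.5)
for name, div in divergences.items():
    best = min(grid, key=div)
    print(f"{name}: minimum at p(A,A) = {best:.3f}")   # 0.350 for every measure
```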

Notably, CCS-DIV works with any value of $\alpha$, and decreasing $\alpha$ effectively increases the slope of the “learning” curve; in contrast, C-DIV and α-DIV work only for $|\alpha| \le 1$. Furthermore, the CCS-DIV curves become flatter as $\alpha$ increases (see Figure 3.4), similar to E-DIV [6] and C-DIV [20] with $\alpha = 1$. Moreover, as shown in Figures 3.2 and 3.4, CCS-DIV with $\alpha \ge -1$ is comparatively sensitive to the probability model and reaches the minimum divergence effectively. Therefore, CCS-DIV with $\alpha \ge -1$ should be a good choice of contrast function for devising the ICA algorithm.
It is also worthwhile to compare the proposed measure with the Cauchy–Schwarz measure. Figure 3.5 shows both divergence measures versus the joint probabilities $p_{x_1,x_2}(A, A)$ and $p_{x_1,x_2}(B, A)$. According to Figure 3.5, both measures reach the same minimum on the line $p_{x_1,x_2}(A, A) = 1.5\,p_{x_1,x_2}(B, A)$, where the two random variables become independent. The graphs in Figure 3.5 also show that, in contrast to CCS-DIV, CS-DIV is not a convex function of the pdfs.

Figure 3.2: Different divergence measures versus the joint probability $p_{x_1,x_2}(A, A)$. [Curves: KL-DIV, E-DIV, CS-DIV, CCS-DIV with α = −1 and α = 1, C-DIV with α = 1 and α = −1, and α-DIV with α = 1; horizontal axis: $p(A, A)$ from 0.2 to 0.5; vertical axis: divergence measure.]

Figure 3.3: CCS-DIV and α-DIV versus the joint probability $p_{x_1,x_2}(A, A)$. [Curves: α-DIV and CCS-DIV, each with α = −1, 0, and 1; horizontal axis: $p(A, A)$ from 0.2 to 0.5; vertical axis: divergence measure.]

Figure 3.4: CCS-DIV with various values of α versus the joint probability $p_{x_1,x_2}(A, A)$. [Curves: α = −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5; horizontal axis: $p(A, A)$ from 0.2 to 0.5; vertical axis: divergence measure.]

Figure 3.5: The surfaces and contours of CCS-DIV vs. CS-DIV. [Panels: surface and contour of CCS-DIV; surface and contour of CS-DIV; axes: $P_{x_1,x_1}$ and $P_{x_2,x_1}$.]

3.3 Convex Cauchy–Schwarz Divergence Independent Component Analysis
(CCS–ICA)
In this section, we develop the ICA algorithm using CCS-DIV as the contrast function. Consider a simple system described in vector–matrix form by
$$x = Hs + v \qquad (3.18)$$
where $x = [x_1, \ldots, x_M]^T$ is a mixture vector, $s = [s_1, \ldots, s_M]^T$ is a source signal vector, $v = [v_1, \ldots, v_M]^T$ is an additive noise vector, and $H$ is an unknown full-rank $M \times M$ mixing matrix. To obtain a good estimate $Y = Wx$ of the source signals $s$, the contrast function CCS-DIV is minimized; the components of $Y$ then become least dependent, which happens when the demixing matrix $W$ becomes a rescaled permutation of $H^{-1}$. Following the standard ICA procedure, the original data $x$ are first preprocessed by removing the mean ($E[x] = 0$) and applying the whitening matrix $V = \Lambda^{-1/2}E^T$, where $E$ is the eigenvector matrix and $\Lambda$ the eigenvalue matrix of the autocorrelation $R_{xx} = E[xx^T]$. The whitening matrix $V$ is chosen so that the $M \times T$ whitened data $X_t = Vx = \Lambda^{-1/2}E^T x$ have identity covariance, $R_{X_tX_t} = I_M$. The demixing matrix can then be estimated by, e.g., the gradient descent algorithm [2], [13]:
$$W(k+1) = W(k) - \gamma\,\frac{\partial D_{CCS}\big(X, W(k)\big)}{\partial W(k)} \qquad (3.19)$$
where $k$ is the iteration index and $\gamma$ is the step size, or learning rate. The update term in the gradient descent is thus composed of the differentials of CCS-DIV with respect to each element $w_{ml}$ of the $M \times M$ demixing matrix $W$.
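For two channels, the whitening step can be written in closed form, which makes a convenient self-contained sketch (the mixing matrix, sample size, seed, and uniform sources are illustrative assumptions; for general $M$ an eigendecomposition routine would be used instead). It removes the mean, estimates the covariance, builds $V = \Lambda^{-1/2}E^T$ from the eigenpairs of the 2 × 2 covariance, and checks that the whitened data have identity covariance.

```python
import math
import random

def whiten_2d(x1, x2):
    """Whiten two channels: remove the mean, then apply V = Lambda^{-1/2} E^T."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    r11 = sum(a * a for a in x1) / n
    r22 = sum(b * b for b in x2) / n
    r12 = sum(a * b for a, b in zip(x1, x2)) / n
    # Closed-form eigendecomposition of the symmetric matrix [[r11, r12], [r12, r22]].
    half_trace = (r11 + r22) / 2
    disc = math.sqrt(((r11 - r22) / 2) ** 2 + r12 ** 2)
    lam = (half_trace + disc, half_trace - disc)
    phi = 0.5 * math.atan2(2 * r12, r11 - r22)       # angle of the leading eigenvector
    E = ((math.cos(phi), math.sin(phi)), (-math.sin(phi), math.cos(phi)))  # rows e1, e2
    V = [[E[k][0] / math.sqrt(lam[k]), E[k][1] / math.sqrt(lam[k])] for k in range(2)]
    z1 = [V[0][0] * a + V[0][1] * b for a, b in zip(x1, x2)]
    z2 = [V[1][0] * a + V[1][1] * b for a, b in zip(x1, x2)]
    return V, z1, z2

rng = random.Random(1)
s1 = [rng.uniform(-1, 1) for _ in range(5000)]
s2 = [rng.uniform(-1, 1) for _ in range(5000)]
H = [[1.0, 0.6], [0.4, 1.0]]                          # hypothetical mixing matrix
x1 = [H[0][0] * a + H[0][1] * b for a, b in zip(s1, s2)]
x2 = [H[1][0] * a + H[1][1] * b for a, b in zip(s1, s2)]
V, z1, z2 = whiten_2d(x1, x2)
n = len(z1)
print(sum(a * a for a in z1) / n, sum(b * b for b in z2) / n,
      sum(a * b for a, b in zip(z1, z2)) / n)         # approximately 1, 1, 0
```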

The differentials $\partial D_{CCS}\big(X, W(k)\big)/\partial w_{ml}(k)$, $1 \le m, l \le M$, are calculated using different probability models and CCS-DIV measures, as in [2], [20], and [24]. The update procedure (3.19) stops when the absolute increment of the CCS-DIV measure falls below a predefined threshold. During the iterations, each row of $W$ is normalized as $w_m = w_m/\|w_m\|$, where $\|\cdot\|$ denotes a norm. Furthermore, CCS-DIV can be used in the natural gradient form to increase the efficiency of the ICA-based algorithm, i.e.,
$$W(k+1) = W(k) - \gamma\,\frac{\partial D_{CCS}\big(X, W(k)\big)}{\partial W(k)}\,W^T(k)\,W(k) \qquad (3.20)$$

The natural gradient KL-ICA algorithm [28] tends to converge to matrices with large scaling values, especially if the initial demixing matrix and the learning rate are not carefully selected. This problem is particularly hard to overcome when a highly non-linear function is present in KL-ICA, although many regularization algorithms have been proposed to stabilize KL-ICA and improve its convergence speed, as in [2], [13], and [28].
In general, dealing with the scale indeterminacy of the demixed signals in the natural gradient form is often difficult. Here, the ICA algorithm based on CCS-DIV mitigates this problem through the selection of an appropriate learning rate.
In setting up the CCS–ICA algorithm based on the proposed CCS-DIV measure $D_{CCS}(x_1, x_2, \alpha)$, the vector $x_1$ usually corresponds to the observed data and the vector $x_2$ to the estimated or expected data. The CCS–ICA algorithm is established as follows.
Assume that the demixed signals are $Y_t = WX_t$, with the $m$th component denoted by $y_{mt} = w_m X_t$. Then, using CCS-DIV with the built-in convexity parameter $\alpha$ as the contrast function, we get
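$$D_{CCS}(Y_t, y_{mt}, \alpha) = \log\frac{\iint f^2\big(p(Y_t)\big)\,dy_1\,dy_2 \cdot \iint f^2\Big(\prod_{m=1}^{M} p(y_{mt})\Big)\,dy_1\,dy_2}{\left[\iint f\big(p(Y_t)\big)\, f\Big(\prod_{m=1}^{M} p(y_{mt})\Big)\,dy_1\,dy_2\right]^2} \qquad (3.21)$$
We use the Lebesgue measure [5] to approximate the integral with respect to the joint distribution of $Y_t = \{y_1, y_2, \ldots, y_M\}$. The contrast function thus becomes
$$D_{CCS}(Y_t, y_{mt}, \alpha) = \log\frac{\sum_{t=1}^{T} f^2\big(p(WX_t)\big) \cdot \sum_{t=1}^{T} f^2\Big(\prod_{m=1}^{M} p(w_m X_t)\Big)}{\left[\sum_{t=1}^{T} f\big(p(WX_t)\big)\, f\Big(\prod_{m=1}^{M} p(w_m X_t)\Big)\right]^2} \qquad (3.22)$$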

The adaptive CCS–ICA algorithms use the differential of the proposed divergence, $\partial D_{CCS}(Y_t, y_{mt}, \alpha)/\partial w_{ml}$, which is derived in Appendix A. Note that the derivative of the determinant of the demixing matrix, $\det(W)$, with respect to the element $w_{ml}$ equals the cofactor of entry $(m, l)$ of $W$, i.e., $\partial\det(W)/\partial w_{ml} = W_{ml}$. The joint distribution of the output is determined by $p(Y_t) = p(X_t)/|\det(W)|$, as shown in Appendix A.
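Both facts used here are easy to check numerically. The sketch below, using a 3 × 3 matrix with arbitrary entries, compares the cofactor with a finite-difference derivative of $\det(W)$, and compares $-\operatorname{sign}(\det W)\,W_{ml}/|\det W|^2$ with a finite-difference derivative of $1/|\det(W)|$, the factor through which $p(Y_t)$ depends on $W$.

```python
def det3(W):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (W[0][0] * (W[1][1] * W[2][2] - W[1][2] * W[2][1])
            - W[0][1] * (W[1][0] * W[2][2] - W[1][2] * W[2][0])
            + W[0][2] * (W[1][0] * W[2][1] - W[1][1] * W[2][0]))

def cofactor(W, m, l):
    """Cofactor of entry (m, l): (-1)^(m+l) times the 2x2 minor."""
    minor = [[W[i][j] for j in range(3) if j != l] for i in range(3) if i != m]
    return (-1) ** (m + l) * (minor[0][0] * minor[1][1] - minor[0][1] * minor[1][0])

def finite_difference(g, W, m, l, h=1e-6):
    """Central difference of a scalar function g with respect to entry W[m][l]."""
    Wp = [row[:] for row in W]
    Wm = [row[:] for row in W]
    Wp[m][l] += h
    Wm[m][l] -= h
    return (g(Wp) - g(Wm)) / (2 * h)

W = [[0.9, -0.2, 0.4], [0.3, 1.1, -0.5], [-0.6, 0.2, 0.8]]
d = det3(W)
sign = 1.0 if d > 0 else -1.0
for m in range(3):
    for l in range(3):
        c = cofactor(W, m, l)
        print(m, l, c, finite_difference(det3, W, m, l),
              -sign * c / d ** 2,
              finite_difference(lambda M: 1.0 / abs(det3(M)), W, m, l))
```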

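For simplicity, we write $D_{CCS}(Y_t, y_{mt}, \alpha)$ as a function of three variables:
$$D_{CCS}(Y_t, y_{mt}, \alpha) = \log\left(\frac{V_1 V_2}{V_3^2}\right) \qquad (3.23)$$
Then,
$$\frac{\partial D_{CCS}(Y_t, y_{mt}, \alpha)}{\partial w_{ml}} = \frac{V_1' V_2 V_3 + V_1 V_2' V_3 - 2V_1 V_2 V_3'}{V_1 V_2 V_3} \qquad (3.24)$$
where
$$V_1 = \sum_{t=1}^{T} f^2(Y_t), \qquad V_1' = \sum_{t=1}^{T} 2f(Y_t)\, f'(Y_t)\, Y_t'$$
$$V_2 = \sum_{t=1}^{T} f^2(y_{mt}), \qquad V_2' = \sum_{t=1}^{T} 2f(y_{mt})\, f'(y_{mt})\, y_{mt}'$$
$$V_3 = \sum_{t=1}^{T} f(Y_t)\, f(y_{mt}), \qquad V_3' = \sum_{t=1}^{T} f'(Y_t)\, f(y_{mt})\, Y_t' + \sum_{t=1}^{T} f(Y_t)\, f'(y_{mt})\, y_{mt}'$$
$$Y_t = p(WX_t), \qquad y_{mt} = \prod_{n=1}^{M} p(w_n X_t)$$
$$Y_t' = \frac{\partial Y_t}{\partial w_{ml}} = -\frac{p(X_t)}{|\det(W)|^2}\cdot\frac{\partial\det(W)}{\partial w_{ml}}\cdot\operatorname{sign}\big(\det(W)\big), \qquad \frac{\partial\det(W)}{\partial w_{ml}} = W_{ml}$$
$$y_{mt}' = \frac{\partial y_{mt}}{\partial w_{ml}} = \left[\prod_{j\neq m} p(w_j X_t)\right]\frac{\partial p(w_m X_t)}{\partial (w_m X_t)}\cdot x_l$$
where $x_l$ denotes the $l$th entry of $X_t$.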
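Because (3.23) is $\log V_1 + \log V_2 - 2\log V_3$, its derivative follows from the chain rule as $V_1'/V_1 + V_2'/V_2 - 2V_3'/V_3$, which over the common denominator $V_1V_2V_3$ reads $(V_1'V_2V_3 + V_1V_2'V_3 - 2V_1V_2V_3')/(V_1V_2V_3)$. The sketch below checks this expression against a finite difference, using arbitrary smooth positive functions $V_i(w)$ chosen purely for illustration (in CCS–ICA they would be the sums defined above).

```python
import math

def contrast(v1, v2, v3):
    """D = log(V1 V2 / V3^2), the three-variable form of (3.23)."""
    return math.log(v1 * v2 / v3 ** 2)

def contrast_derivative(v1, v2, v3, d1, d2, d3):
    """Chain rule: (V1' V2 V3 + V1 V2' V3 - 2 V1 V2 V3') / (V1 V2 V3)."""
    return (d1 * v2 * v3 + v1 * d2 * v3 - 2.0 * v1 * v2 * d3) / (v1 * v2 * v3)

# Hypothetical smooth, positive V_i(w) and their derivatives, for checking only.
V = [lambda w: 2.0 + w ** 2, lambda w: 3.0 + math.sin(w), lambda w: 1.5 + 0.5 * w]
dV = [lambda w: 2.0 * w, lambda w: math.cos(w), lambda w: 0.5]

w0, h = 0.7, 1e-6
analytic = contrast_derivative(*(f(w0) for f in V), *(g(w0) for g in dV))
numeric = (contrast(*(f(w0 + h) for f in V)) - contrast(*(f(w0 - h) for f in V))) / (2 * h)
print(analytic, numeric)   # the two values agree closely
```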
In general, the accuracy with which an ICA algorithm estimates the demixing matrix is limited by the lack of knowledge of the true source probability densities. To address this, non-parametric densities based on Parzen window estimation are used in [1], [13], [15], [43], since the resulting distribution shape is data-driven and flexibly formed from kernel functions with bandwidth $h$. In this work, a novel non-parametric CCS–ICA algorithm is also presented, which minimizes CCS-DIV to generate the demixed signals $Y = [y_1, y_2, \ldots, y_M]^T$.

The demixed signals are described by the following univariate and multivariate distributions [43]:
$$p(y_m) = \frac{1}{Th}\sum_{t=1}^{T}\vartheta\left(\frac{y_m - y_{mt}}{h}\right) \qquad (3.25)$$
$$p(Y) = \frac{1}{Th^M}\sum_{t=1}^{T}\varphi\left(\frac{Y - Y_t}{h}\right) \qquad (3.26)$$
where the univariate Gaussian kernel is
$$\vartheta(u) = (2\pi)^{-1/2}\, e^{-u^2/2}$$
and the multivariate Gaussian kernel is
$$\varphi(u) = (2\pi)^{-M/2}\, e^{-u^T u/2}.$$
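A direct implementation of (3.25) and (3.26) with Gaussian kernels is sketched below; the sample values and the bandwidth $h = 0.4$ are arbitrary illustrative choices. The checks confirm that the univariate estimate integrates to about one and that the multivariate formula reduces to the univariate one for $M = 1$.

```python
import math

def gauss_kernel(u):
    """Univariate Gaussian kernel (2*pi)^(-1/2) exp(-u^2 / 2)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_1d(y, samples, h):
    """Univariate Parzen estimate (3.25): (1 / (T h)) sum_t K((y - y_t) / h)."""
    return sum(gauss_kernel((y - yt) / h) for yt in samples) / (len(samples) * h)

def parzen_md(Y, samples, h):
    """Multivariate Parzen estimate (3.26) with the isotropic Gaussian kernel."""
    T, M = len(samples), len(Y)
    norm = (2.0 * math.pi) ** (-M / 2) / (T * h ** M)
    return norm * sum(math.exp(-0.5 * sum(((a - b) / h) ** 2 for a, b in zip(Y, Yt)))
                      for Yt in samples)

samples = [-1.0, 0.0, 0.5, 2.0]
h = 0.4
step = 0.001
area = sum(parzen_1d(-6.0 + k * step, samples, h) for k in range(12001)) * step
print(round(area, 6))                                                  # close to 1
print(parzen_1d(0.3, samples, h), parzen_md([0.3], [[s] for s in samples], h))
```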

The Gaussian kernel used in non-parametric ICA is a smooth function. We note that a learning algorithm based on non-parametric ICA performs better than one based on parametric ICA. By substituting (3.25) and (3.26), with $Y_t = WX_t$ and $y_{mt} = w_m X_t$, into (3.22), the non-parametric CCS-DIV becomes
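$$D_{CCS}(Y_t, y_{mt}, \alpha) = \log\frac{\sum_{t=1}^{T} f^2\big(p(WX_t)\big)\cdot\sum_{t=1}^{T} f^2\left(\prod_{m=1}^{M}\frac{1}{Th}\sum_{i=1}^{T}\vartheta\left(\frac{w_m(X_t - X_i)}{h}\right)\right)}{\left[\sum_{t=1}^{T} f\big(p(WX_t)\big)\cdot f\left(\prod_{m=1}^{M}\frac{1}{Th}\sum_{i=1}^{T}\vartheta\left(\frac{w_m(X_t - X_i)}{h}\right)\right)\right]^2} \qquad (3.27)$$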
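and its derivative is

$$\frac{\partial D_{CCS}(Y_t, y_{mt}, \alpha)}{\partial w_{ml}} = \frac{V_1'}{V_1} + \frac{V_2'}{V_2} - \frac{2V_3'}{V_3} = \frac{V_1' V_2 V_3 + V_1 V_2' V_3 - 2 V_1 V_2 V_3'}{V_1 V_2 V_3} \qquad (3.28)$$

where

$$V_1 = \sum_{t=1}^{T} f^2(Y_t), \qquad V_1' = \sum_{t=1}^{T} 2 f(Y_t) f'(Y_t) Y_t'$$

$$V_2 = \sum_{t=1}^{T} f^2(y_{mt}), \qquad V_2' = \sum_{t=1}^{T} 2 f(y_{mt}) f'(y_{mt}) y_{mt}'$$

$$V_3 = \sum_{t=1}^{T} f(Y_t) f(y_{mt}), \qquad V_3' = \sum_{t=1}^{T} f'(Y_t) f(y_{mt}) Y_t' + \sum_{t=1}^{T} f(Y_t) f'(y_{mt}) y_{mt}'$$

Here $Y_t = p(WX_t) = p(X_t)/|\det(W)|$, so that

$$Y_t' = \frac{\partial Y_t}{\partial w_{ml}} = -\frac{p(X_t)}{|\det(W)|^2} \cdot \frac{\partial \det(W)}{\partial w_{ml}} \cdot \operatorname{sign}(\det(W)),$$

where $\partial \det(W)/\partial w_{ml} = W_{ml}$ is the $(m, l)$ cofactor of $W$ and $\operatorname{sign}(\cdot)$ is the sign function. Similarly,

$$y_{mt} = \prod_{j=1}^{M} \frac{1}{Th}\sum_{i=1}^{T} \vartheta\left(\frac{y_{jt} - y_{ji}}{h}\right) = \prod_{j=1}^{M} \frac{1}{Th}\sum_{i=1}^{T} \vartheta\left(\frac{w_j (X_t - X_i)}{h}\right)$$

and, using $\vartheta'(u) = -u\,\vartheta(u)$,

$$y_{mt}' = \frac{\partial y_{mt}}{\partial w_{ml}} = -\frac{1}{Th}\sum_{i=1}^{T} \vartheta\left(\frac{w_m (X_t - X_i)}{h}\right) \cdot \frac{w_m (X_t - X_i)}{h} \cdot \frac{X_{tl} - X_{il}}{h} \cdot \left[\prod_{j \neq m}^{M} p(w_j X_t)\right],$$

where $X_{tl}$ and $X_{il}$ denote the $l$th entries of $X_t$ and $X_i$, respectively.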
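A minimal plug-in estimate of (3.27) is sketched below. Two assumptions are made and should be read as such. First, the convex function f is not defined in this section, so the sketch borrows the α-family generator f(t) = 4/(1 − α²)·[(1 − α)/2 + (1 + α)/2·t − t^((1+α)/2)] used in convex-divergence (C-DIV) ICA. Second, the joint density is estimated directly from Y = WX with (3.26) instead of through p(X_t)/|det(W)|. The analytic derivative (3.28) is not reproduced; a finite-difference or automatic-differentiation gradient of this function could stand in for it.

```python
import numpy as np

def convex_f(t, alpha):
    """ASSUMED convex generator f (alpha-family as in C-DIV); requires -1 < alpha < 1."""
    a = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha ** 2) * ((1.0 - alpha) / 2.0 + a * t - t ** a)

def ccs_div_nonparametric(W, X, h, alpha):
    """Plug-in estimate of Eq. (3.27) for the outputs Y = W X.

    X: (M, T) whitened observations, W: (M, M) demixing matrix.
    Joint density at each sample: Eq. (3.26) applied to Y directly (simplification).
    Product of marginals at each sample: Eq. (3.25) per output, multiplied together.
    """
    Y = W @ X
    M, T = Y.shape
    diff = (Y[:, :, None] - Y[:, None, :]) / h            # (M, T, T): (y_mt - y_mi) / h
    k1 = np.exp(-0.5 * diff ** 2) / np.sqrt(2.0 * np.pi)  # univariate kernels per output
    marginals = k1.sum(axis=2) / (T * h)                  # (M, T): p(y_m) at each sample
    prod_marg = marginals.prod(axis=0)                    # (T,): product of the marginals
    joint = k1.prod(axis=0).sum(axis=1) / (T * h ** M)    # product kernel equals phi
    fj = convex_f(joint, alpha)
    fp = convex_f(prod_marg, alpha)
    return np.log(np.sum(fj ** 2) * np.sum(fp ** 2) / np.sum(fj * fp) ** 2)
```

By the Cauchy–Schwarz inequality the returned value is non-negative, and it is unchanged when the outputs are permuted or sign-flipped, mirroring the permutation and sign ambiguities of ICA.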
Remark: This non-parametric CCS-DIV may suffer from insufficient data and a high computational cost in high-dimensional spaces, especially when estimating the joint distribution. In this case, the pairwise iterative scheme proposed in [20], [38] should be used to mitigate this potential drawback.


Algorithm 3.1: ICA based on gradient descent
Input: (M × T) matrix of realizations X, initial demixing matrix W = I_M, maximum number of iterations Itr, step size γ (e.g., γ = 0.3), convexity parameter α (e.g., α = −0.99999)
Perform pre-whitening:
    X = V X = Λ^(−1/2) E^T X, where Λ and E are the eigenvalue and eigenvector matrices of the covariance of X
For each iteration 1, …, Itr do
    For each t = 1, …, T do
        Evaluate the proposed contrast function and its derivative ∂D_CCS(Y_t, y_mt, α)/∂w_ml
    End For
    Update the demixing matrix:
        W = W − γ ∂D_CCS(X, W)/∂W
    Normalize W
    Check convergence: ‖ΔD_CCS‖ ≤ ε (e.g., ε = 10^(−4))
End For
Output: demixing matrix W and estimated signals y
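The pre-whitening and update steps of Algorithm 3.1 can be sketched as below. The contrast gradient itself is left abstract (any array `grad` with the shape of W), and two details are assumptions rather than statements from the text: the rows of X are centered before whitening, and "Normalize W" is read as scaling each row of W to unit norm.

```python
import numpy as np

def prewhiten(X):
    """Pre-whitening of Algorithm 3.1: X <- V X with V = Lambda^(-1/2) E^T.

    Lambda and E are the eigenvalues and eigenvectors of the sample covariance.
    Centering the rows first is an added assumption (zero-mean sources).
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, E = np.linalg.eigh(cov)
    V = np.diag(eigvals ** -0.5) @ E.T
    return V @ Xc, V

def gradient_step(W, grad, gamma):
    """One update W <- W - gamma * dD/dW, then scale each row to unit norm (assumed meaning)."""
    W_new = W - gamma * grad
    return W_new / np.linalg.norm(W_new, axis=1, keepdims=True)
```

After `prewhiten`, the sample covariance of the returned data is the identity, which is what allows the demixing matrix to be restricted to rotations in the pairwise schemes that follow.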

3.4 Scenario of two or three source signals
Generally speaking, the non-parametric ICA algorithm suffers from insufficient data and a high computational cost in high-dimensional spaces, especially when estimating the joint distribution. Several previous reports in the literature, e.g., [13], [16], suggest applying pairwise iterative schemes to tackle this high-dimensional problem for non-parametric ICA algorithms. However, to our knowledge, no results have been reported on how the performance holds up under the pairwise scheme, either in terms of computational complexity or in terms of the accuracy of the non-parametric ICA algorithm. In this work, we present two effective pairwise ICA algorithms: one based on gradient descent and the other based on Jacobi optimization [16].


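Without loss of generality (for pre-whitened observations, where the demixing matrix is orthogonal), one can represent the demixing matrix $W$ as a product of rotation matrices parameterized by unknown angles $\theta_{ij} \in [-\pi/4, \pi/4]$, one for each pair $(i, j)$ of observed signals. Specifically, define the pairwise rotation matrix

$$W(\theta_{ij}) = \begin{bmatrix} \cos\theta_{ij} & -\sin\theta_{ij} \\ \sin\theta_{ij} & \cos\theta_{ij} \end{bmatrix} \qquad (3.29)$$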

The idea is to make each pair of estimated (marginal) outputs as independent as possible, i.e., to minimize their dependency. Comon [6] showed that mutual independence among the $M$ whitened observed signals can be attained by maximizing the independence between each pair of them. We present two algorithms that address the high-dimensional problem in the non-parametric scheme. First, we adopt the non-parametric gradient descent algorithm within the pairwise iterative scheme of Algorithm 3.2.
Algorithm 3.2: ICA based on the pairwise gradient descent scheme
Input: (M × T) matrix of realizations X, initial demixing matrix W = I_M, number of iterations Itr, step size γ (e.g., γ = 0.3), convexity parameter α (e.g., α = −0.99999)
For each i = 1, …, M − 1 do
    For each j = i + 1, …, M do
        Initialize the pairwise demixing matrix W_2 = I_2
        Repeat until convergence:
            Update W_2 by applying Algorithm 3.1 to the pair of rows X([i j], :)
        Initialize the rotation matrix R = I_M
        Update the rotation matrix: R([i j], [i j]) = W_2
        Update the demixing matrix: W = R W
        Update the observation matrix: X = R X
    End For
End For
Output: demixing matrix W and demixed sources in X
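The pairwise bookkeeping of Algorithm 3.2 (embedding a 2 × 2 solution into R = I_M, then W ← RW and X ← RX) can be sketched as follows. The callback `solve_pair` is a hypothetical stand-in for running Algorithm 3.1 on one pair of rows; only the embedding and accumulation logic is illustrated.

```python
import numpy as np

def embed_pair(M, i, j, W2):
    """R = I_M with R([i j], [i j]) = W2, as in Algorithms 3.2 and 3.3."""
    R = np.eye(M)
    R[np.ix_([i, j], [i, j])] = W2
    return R

def pairwise_sweep(X, solve_pair):
    """One sweep over all pairs i < j.

    solve_pair(X_pair) -> 2x2 demixing matrix for the two selected rows
    (hypothetical callback standing in for Algorithm 3.1).
    Returns the accumulated demixing matrix W and the updated observations.
    """
    M = X.shape[0]
    W = np.eye(M)
    for i in range(M - 1):
        for j in range(i + 1, M):
            W2 = solve_pair(X[[i, j], :])
            R = embed_pair(M, i, j, W2)
            W = R @ W        # accumulate all rotations applied so far
            X = R @ X        # apply only the new rotation to the current data
    return W, X
```

Because X is updated with the new rotation R only, the returned observations always equal W applied to the original data.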


Second, we propose a CCS-ICA algorithm based on the Jacobi pairwise scheme, given in Algorithm 3.3. This algorithm finds the rotation matrix in (3.29) that attains the minimum of the CCS-DIV. Specifically, we restrict the angles to a grid, $\theta_{ij} \in [-\pi/4 : \theta_g : \pi/4]$, where $\theta_g$ is the grid step, for instance $\theta_g = \pi/64$. Then, for each pair $(i, j)$ of the observation data, we select the demixing matrix $W_2$ that attains the minimum of the CCS-DIV over this grid. Please refer to Algorithm 3.3 for more details.
Algorithm 3.3: ICA based on the pairwise Jacobi scheme
Input: (M × T) matrix of realizations X, initial demixing matrix W = I_M, number of iterations Itr, step size γ (e.g., γ = 0.3), convexity parameter α (e.g., α = −0.99999)
Perform pre-whitening:
    X = V X = Λ^(−1/2) E^T X
For each i = 1, …, M − 1 do
    For each j = i + 1, …, M do
        For each θ_1 = −π/4 : π/64 : π/4 do
            W_2 = [cos θ_1  −sin θ_1; sin θ_1  cos θ_1]
            Evaluate D_CCS(X([i j], :), W_2 X([i j], :), α) over all t = 1, …, T
        End For
        Select W_2 = arg min over W_2 of D_CCS(X([i j], :), W_2 X([i j], :), α)
        Initialize the rotation matrix R = I_M
        Update the rotation matrix: R([i j], [i j]) = W_2
        Update the demixing matrix: W = R W
        Update the observation matrix: X = R X
    End For
End For
Output: demixing matrix W and estimated sources in X
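The grid search of Algorithm 3.3 can be sketched as below. To keep the example short and self-contained, the contrast is passed in as a function; the test uses a simple kurtosis-based stand-in, not the CCS-DIV, and a plug-in CCS-DIV estimate could be substituted without changing the search logic. The grid spacing defaults to π/64, as in the text.

```python
import numpy as np

def rotation(theta):
    """Pairwise rotation matrix of Eq. (3.29)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def jacobi_pairwise(X, contrast, step=np.pi / 64):
    """Sketch of Algorithm 3.3 for pre-whitened X of shape (M, T).

    For each pair (i, j), keep the grid angle in [-pi/4, pi/4] that minimizes
    contrast(rotated pair). `contrast` is a placeholder for the CCS-DIV.
    """
    M = X.shape[0]
    W = np.eye(M)
    thetas = np.arange(-np.pi / 4, np.pi / 4 + step / 2, step)   # 33 angles for step = pi/64
    for i in range(M - 1):
        for j in range(i + 1, M):
            pair = X[[i, j], :]
            scores = [contrast(rotation(th) @ pair) for th in thetas]
            W2 = rotation(thetas[int(np.argmin(scores))])
            R = np.eye(M)
            R[np.ix_([i, j], [i, j])] = W2
            W = R @ W
            X = R @ X
    return W, X

def neg_abs_kurtosis(Y):
    """Stand-in contrast (NOT the CCS-DIV): -sum_m |E[y^4] / E[y^2]^2 - 3|."""
    m2 = np.mean(Y ** 2, axis=1)
    m4 = np.mean(Y ** 4, axis=1)
    return -np.sum(np.abs(m4 / m2 ** 2 - 3.0))
```

Restricting the angles to [−π/4, π/4] suffices because rotations differing by π/2 only permute and flip the outputs.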


3.5

Computational Complexity

Given $T$ realizations of $M$ observation signals, the computational complexity of the proposed algorithms depends on $T$ and on the number of observation signals $M$, and is approximately $O\left(\frac{M(M-1)}{2} T^2\right)$: there are $M(M-1)/2$ signal pairs, and each kernel density estimate evaluates $T$ kernels at each of $T$ sample points. Computational complexity has traditionally been a figure of merit for ICA algorithms. With the advent of Graphics Processing Units (GPUs) (see Nvidia.com, e.g.) and more powerful computing platforms, performance accuracy holds more merit. In our comparison of ICA algorithms, we employ several metrics, including computational load (time) and accuracy.

In this work, we employ an adaptive sampling technique that improves performance in terms of accuracy and computational load together. The technique samples the signal into small time blocks in order to evaluate the integral in the proposed divergence and to reduce the computational complexity. Thus, we introduce a sampling factor $T_s$ and evaluate the proposed divergence at every $T_s$-th instant. The computational complexity of the proposed algorithm is therefore reduced by the square of the sampling factor $T_s$, to less than $O\left(\frac{M(M-1)}{2}\left(\frac{T}{T_s}\right)^2\right)$. Namely, we quantize the area of integration of the proposed divergence into $T/T_s$ equal segments to evaluate the proposed divergence.
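The effect of the sampling factor on the cost estimate can be made concrete with a few lines of arithmetic. The sketch assumes, as one reading consistent with the stated (T/T_s)² reduction, that every T_s-th sample is kept both as an evaluation point and as a kernel centre; the helper names are hypothetical.

```python
import numpy as np

def pairwise_cost(M, T, Ts=1):
    """Kernel-evaluation count ~ M(M-1)/2 * (T/Ts)^2 (order-of-magnitude estimate)."""
    n_pairs = M * (M - 1) // 2
    n = T // Ts              # samples kept when the divergence is evaluated every Ts-th instant
    return n_pairs * n * n

def subsample(X, Ts):
    """Keep every Ts-th time instant (column) of an (M, T) array."""
    return np.asarray(X)[:, ::Ts]
```

For example, with M = 8 and T = 4000, choosing T_s = 100 reduces the estimate from 28 × 16,000,000 to 28 × 1,600 kernel evaluations.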

3.6

Simulation Results
Several experiments are conducted to compare the performance of different ICA-based algorithms, covering a diversity of experimental data and conditions.


3.6.1

Sensitivity of CCS-DIV measure

This experiment evaluates the proposed CCS-DIV measure with respect to the sensitivity of the probability model of the discrete variables. The results indicate that the CCS-DIV with α = 1 and α = −1 successfully reaches the minimum point of the measure. To investigate the sensitivity of the CCS-DIV with α = 1 and α = −1, we consider the case in [20], [34], [35], where the mixed signals are X = AS. The simulated experiments in [20], [35] were performed for two sources (M = 2) with the demixing matrix

$$W = \begin{bmatrix} \cos\theta_1 & \sin\theta_1 \\ \cos\theta_2 & \sin\theta_2 \end{bmatrix} \qquad (3.24)$$

where $W$ is a parameterized matrix that establishes a polar coordinate system. The row vectors of $W$ have unit norm and correspond to counterclockwise rotations by $\theta_1$ and $\theta_2$, respectively. The rows of $W$ are orthogonal when $\theta_2 = \theta_1 \pm \pi/2$. Notably, the amplitude should not affect the independent sources: rescaling a row of $W$ only rescales the corresponding output, so restricting the rows to unit norm loses no generality. By varying $\theta_1$ and $\theta_2$, we obtain different demixing matrices. Consider the simple case of mixtures of two zero-mean continuous variables, one with a sub-Gaussian distribution and the other with a super-Gaussian distribution. For the sub-Gaussian distribution, we use the uniform distribution
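$$p(s_1) = \begin{cases} \frac{1}{2\tau_1}, & s_1 \in (-\tau_1, \tau_1) \\ 0, & \text{otherwise} \end{cases} \qquad (3.25)$$

and for the super-Gaussian distribution, we use the Laplacian distribution

$$p(s_2) = \frac{1}{2\tau_2}\exp\left[-\frac{|s_2|}{\tau_2}\right] \qquad (3.26)$$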

In this task, $T = 1000$ data samples are randomly generated using $\tau_1 = 3$ and $\tau_2 = 1$. The kurtosis values of the two signals are −1.2 and 2.99, respectively, evaluated using $\mathrm{Kurt}(s) = E[s^4]/(E[s^2])^2 - 3$.
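The sources of this experiment, the kurtosis formula, and the polar demixing matrix (3.24) can be reproduced approximately as follows. The sketch uses NumPy's uniform and Laplace generators (NumPy's Laplace scale parameter plays the role of τ2 in (3.26)); the function names are illustrative. With T = 1000 the sample kurtosis of a Laplacian source fluctuates noticeably around its theoretical value of 3, consistent with the reported 2.99.

```python
import numpy as np

def kurt(s):
    """Kurt(s) = E[s^4] / E[s^2]^2 - 3, with moments about zero as in the text."""
    return np.mean(s ** 4) / np.mean(s ** 2) ** 2 - 3.0

def sources(T, tau1=3.0, tau2=1.0, rng=None):
    """Uniform on (-tau1, tau1) (sub-Gaussian) and Laplacian with scale tau2 (super-Gaussian)."""
    rng = np.random.default_rng() if rng is None else rng
    return np.vstack([rng.uniform(-tau1, tau1, T), rng.laplace(0.0, tau2, T)])

def polar_demixing(theta1, theta2):
    """Eq. (3.24): each row (cos theta, sin theta) has unit norm."""
    return np.array([[np.cos(theta1), np.sin(theta1)],
                     [np.cos(theta2), np.sin(theta2)]])
```

With the identity mixing matrix used below, `polar_demixing(0, np.pi / 2)` is the identity, i.e., the point where the divergences are expected to reach their minimum.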

Without loss of generality, we take the mixing matrix to be the 2 × 2 identity matrix; thus, $x_1 = s_1$ and $x_2 = s_2$ [6], [20], [25]. The normalized divergence measures of the demixed signals and their sensitivity to variations of the demixing matrix are shown in Figure 3.6, where the demixing matrix is parameterized by the polar angles $\theta_1$ and $\theta_2$. A wide variety of demixing matrices is considered by taking both $\theta_1$ and $\theta_2$ over the interval from 0 to π. Figure 3.6 evaluates the CCS-DIV along with the E-DIV, KL-DIV, and C-DIV with α = 1 and α = −1. As is clearly seen, the minimum (i.e., zero) divergence is achieved at the same point, $\{\theta_1 = 0, \theta_2 = \pi/2\}$, for all of these measures.
In addition, no local minima are found. The values of the CCS-DIV with α = 1 are low and flat for $\theta_2$ between 0.5 and 2.5, similar to the other divergence measures in [20], [25]. In contrast, the CCS-DIV with α = −1 has a more pronounced convex shape over the same range, so steepest descent leads more directly to its minimum point. Observe that the CCS-DIV with α = 1 is flat with respect to both $\theta_1$ and $\theta_2$. For other values of α, the CCS-DIV as a contrast function can produce large decremental steps of the demixing matrix toward the solution. Again, the graphs in Figure 3.6 show that the CS-DIV is not a convex function, in contrast to the CCS-DIV.

[Figure: six surface plots of the normalized divergence of the demixed signals over the demixing angles θ1 and θ2 (horizontal axes, 0 to about 4 rad): (a) CCS-DIV with α = 1, (b) CCS-DIV with α = −1, (c) KL-DIV, (d) E-DIV, (e) CS-DIV, (f) C-DIV with α = −1.]

Figure 3.6: Comparison of (a) CCS-DIV with α = 1, (b) CCS-DIV with α = −1, (c) KL-DIV, (d) E-DIV, (e) CS-DIV, and (f) C-DIV with α = −1 of the demixed signals as a function of the demixing parameters θ1 and θ2.

3.6.2

The performance and convergence speed of the proposed CCS-ICA algorithms versus existing ICA-based algorithms
In this section, Monte Carlo simulations are carried out. The number of sources is assumed to equal the number of observations (sensors), and all algorithms use the same whitening method. The experiments were carried out in MATLAB on an Intel Core i5 2.4-GHz CPU with 4 GB of RAM. Each table entry is the average over the corresponding number of independent Monte Carlo trials, in each of which the mixing matrix is chosen randomly.
First, we compare the performance and convergence speed of the gradient descent ICA algorithms based on the CCS-DIV, CS-DIV, E-DIV, KL-DIV, and C-DIV with α = 1 and α = −1. In all tasks, the standard gradient descent method is used to devise the parameterized and non-parameterized ICA algorithms, with step sizes γ = 0.7 and γ = 0.3 for the CCS-DIV with α = 1 and α = −1, respectively; γ = 0.3 for the CS-DIV; γ = 0.06 for the E-DIV; γ = 0.17 for the KL-DIV, as in [14]; and γ = 0.008 and γ = 0.1 for the C-DIV with α = −1 and α = 1, respectively, as in [13]. Throughout the comparison, the bandwidth is set as a function of the sample size, namely $h = 1.06\,T^{-1/5}$ [13-15]. To study the parametric scenario, we use mixtures of two sources with the mixing matrix $A = [[0.5\;\; 0.6]^T\; [0.3\;\; 0.4]^T]$, whose determinant is $\det(A) = 0.02$. One source has a uniform (sub-Gaussian) distribution and the other a Laplacian (super-Gaussian) distribution, with kurtosis values −1.2109 and 3.0839, respectively. $T = 1000$ samples are used, with a learning rate γ = 0.3 and 250 iterations. The gradient descent ICA algorithms based on the CCS-DIV, CS-DIV, E-DIV, KL-DIV, and C-DIV with α = 1 and α = −1 are implemented to recover the source signals.

The initial demixing matrix W is taken as the identity matrix. Fig. 3.7 compares the SIRs of the demixed signals obtained with the various ICA-based algorithms. Clearly, the parameterized CCS-ICA algorithm outperforms all other ICA algorithms in this scenario, with signal-to-interference ratios (SIRs) of 41.9 dB and 32 dB, respectively. Additionally, Fig. 3.8 shows the learning curves of the parameterized CCS-ICA algorithm with α = 1 and α = −1 compared with the other ICA algorithms, plotting the divergence measures versus the number of iterations (epochs). As shown in Fig. 3.8, the convergence speed of the CCS-ICA algorithm is comparable to that of the C-ICA and KL-ICA algorithms.
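The two-source parametric setup described above can be reproduced in a few lines, for instance to check the stated determinant and bandwidth. The source parameters τ1 = 3 and τ2 = 1 are borrowed from Section 3.6.1 as an assumption, since they are not restated for this task, and the random seed is arbitrary.

```python
import numpy as np

T = 1000
A = np.array([[0.5, 0.3],
              [0.6, 0.4]])               # columns [0.5 0.6]^T and [0.3 0.4]^T
h = 1.06 * T ** (-1 / 5)                 # bandwidth rule h = 1.06 T^(-1/5)

rng = np.random.default_rng(0)           # arbitrary seed
S = np.vstack([rng.uniform(-3.0, 3.0, T),    # sub-Gaussian (uniform) source, assumed tau1 = 3
               rng.laplace(0.0, 1.0, T)])    # super-Gaussian (Laplacian) source, assumed tau2 = 1
X = A @ S                                # observed mixtures
W0 = np.eye(2)                           # initial demixing matrix (identity)
```

The small determinant (0.02) means the two columns of A are nearly collinear, which makes this a fairly ill-conditioned separation problem.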
Furthermore, Tables 3.1 and 3.2 summarize the accuracy and the computational load, respectively, of the proposed non-parametric ICA algorithms with α = −1 against several other algorithms, i.e., CS-DIV, E-DIV, KL-DIV, C-DIV with α = −1, and IK-DIV. CCS2 and CCS3 denote Algorithm 3.2 and Algorithm 3.3, respectively. We also compare them with other benchmark algorithms such as FastICA [8], RobustICA [7], JADE [11], and RapidICA [42], using the default parameter settings of their toolboxes and publications. In this task, the ICA algorithms are used to separate mixtures of two sub-Gaussian signals, two super-Gaussian signals, and one sub-Gaussian and one super-Gaussian signal. We use the following distributions. For the sub-Gaussian case, we use the uniform distribution
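$$p(s_1) = \begin{cases} \frac{1}{2\tau_1}, & s_1 \in (-\tau_1, \tau_1) \\ 0, & \text{otherwise} \end{cases} \qquad (3.27)$$

and the Rayleigh distribution

$$p(s_2) = s_2 \exp\left[-\frac{s_2^2}{2}\right], \quad s_2 \ge 0 \qquad (3.28)$$

For the super-Gaussian case, we use the Laplacian distribution

$$p(s_3) = \frac{1}{2\tau_2}\exp\left[-\frac{|s_3|}{\tau_2}\right] \qquad (3.29)$$

and the log-normal distribution

$$p(s_4) = \frac{1}{s_4\sqrt{2\pi}}\exp\left[-\frac{(\log s_4)^2}{2}\right], \quad s_4 > 0 \qquad (3.30)$$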

Again, $T = 1000$ data samples are randomly generated using $\tau_1 = 3$ and $\tau_2 = 1$. The kurtosis values of all the aforementioned signals are −1.2, 2.99, −0.7224, and 8.4559, respectively, evaluated using $\mathrm{Kurt}(s) = E[s^4]/(E[s^2])^2 - 3$.

Several patterns can be observed in Tables 3.1, 3.2, and 3.3. The algorithms based on the proposed measure show the best performance in terms of accuracy (in most cases) and stability. The proposed algorithm CCS3 exhibits speed and stability comparable to those of KL and ED. The proposed divergence clearly improves on the CS-DIV in terms of stability and performance. Notably, most of the presented divergences (including the KL-DIV) struggle to separate the Rayleigh distributions $(s_2, s_2)$, except the proposed divergence and the C-DIVs. Moreover, Table 3.3 supports our point in this chapter: thanks to its convexity, the proposed algorithm is more stable than the CS-DIV, which makes the divergence more robust against variations of the parameters.
It is also evident that the non-parametric methods perform better, in terms of accuracy and stability, than methods based on non-Gaussianity such as JADE, FastICA, and the other algorithms. JADE performs better than FastICA, RobustICA, and RapidICA in terms of accuracy in some cases, but these latter algorithms are faster than JADE, especially RapidICA and RobustICA. Table 3.7 summarizes the performance of the aforementioned algorithms in a more complex separation process, in which different randomly generated source signals (see Table 3.4) and mixing matrices are employed. Performance is measured by the standard (Amari) error metric multiplied by 100, see [1], [13], and all results are averaged over a number of independent Monte Carlo runs. Table 3.7 again demonstrates that the non-parametric ICA based on the proposed divergence provides the best accuracy (in most cases). In terms of speed, however, RapidICA, FastICA, RobustICA, and JADE perform better. These faster algorithms could therefore be used to initialize the higher-performance methods in order to reduce the overall computational load.
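The tables report an Amari-type error. Its exact normalization is not restated in this chapter, so the sketch below uses one common form as an assumption: d(P) = (1/2M)·Σ_i(Σ_j|p_ij|/max_j|p_ij| − 1) + (1/2M)·Σ_j(Σ_i|p_ij|/max_i|p_ij| − 1), with P = WA. It is zero exactly when P is a scaled permutation matrix.

```python
import numpy as np

def amari_error(W, A):
    """Amari-type error of P = W A (assumed normalization; multiply by 100 for the tables)."""
    P = np.abs(W @ A)
    M = P.shape[0]
    rows = (P.sum(axis=1) / P.max(axis=1) - 1.0).sum()   # row-wise deviation from one dominant entry
    cols = (P.sum(axis=0) / P.max(axis=0) - 1.0).sum()   # column-wise deviation
    return (rows + cols) / (2.0 * M)
```

Because the index is invariant to row scaling and permutation of P, it measures separation quality without having to resolve the ICA ambiguities first.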
Since the comparison between ICA algorithms relies on two criteria, namely accuracy and computational load, the tradeoff between them must be assessed for each targeted application. We also note that, with the advent of GPUs, computational load/speed becomes less of a factor and accuracy becomes the more important metric. Table 3.5 summarizes the performance of CCS-ICA (Algorithm 3.3) for different values of $T_s$ (1, 10, 100, 1000), and Table 3.6 shows the corresponding computational load in seconds. Based on these results, the CCS-ICA (Algorithm 3.3) achieves its best combination of accuracy and speed with $T_s = 100$. For brevity, further results of the non-parametric CCS-ICA algorithm are available at http://www.egr.msu.edu/bsr/.
To check the robustness of the proposed algorithm, we also made the initial demixing matrix W random. Figures 3.9 and 3.10 show, respectively, the SIRs of the demixed signals and the learning curves of C-ICA, E-ICA, KL-ICA, and CCS-ICA with α = 1 and α = −1 in a two-source BSS task with a random initial demixing matrix.
Table 3.1: The performance of the ICA algorithm based on the proposed divergence and other widely used ICA algorithms in terms of Amari error (multiplied by 100). Each entry is averaged over the corresponding number of trials. The observation mixtures consist of two source signals that follow the same distribution, as denoted in the corresponding example.

| Samples | Trials | FastICA | JADE | RobustICA | RapidICA | IK-DIV | CS-DIV | KL-ICA | ED-DIV | C-DIV+ | C-DIV− | CCS-DIV1+ | CCS-DIV1− | CCS-DIV2+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 100 | 6.16 | 4.77 | 5.27 | 5.07 | 3.32 | 2.66 | 1.75 | 2.04 | 2.17 | 2.36 | 2.25 | 2.40 | 1.86 |
| 1000 | 100 | 22.34 | 18.51 | 28.29 | 20.26 | 6.78 | 7.39 | 8.13 | 5.12 | 5.38 | 3.83 | 8.92 | 5.80 | 3.55 |
| 1000 | 100 | 2.45 | 2.10 | 2.24 | 2.14 | 2.31 | 2.21 | 2.31 | 2.31 | 2.65 | 2.50 | 2.19 | 1.94 | 1.84 |
| 1000 | 100 | 3.34 | 3.03 | 3.13 | 3.29 | 1.93 | 2.02 | 1.93 | 1.71 | 2.04 | 1.90 | 1.97 | 1.93 | 1.82 |
| 1000 | 100 | 5.11 | 4.53 | 5.39 | 5.17 | 2.44 | 2.07 | 2.06 | 2.24 | 2.56 | 2.10 | 2.50 | 2.33 | 2.21 |

Table 3.2: The computational load, in seconds, of the ICA algorithm based on the proposed divergence and other widely used ICA algorithms. Each entry is averaged over the corresponding number of trials. The observation mixtures consist of two source signals that follow the same distribution, as denoted in the corresponding example.

| Samples | Trials | FastICA | JADE | RobustICA | RapidICA | IK-DIV | CS-DIV | KL-ICA | ED-DIV | C-DIV+ | C-DIV− | CCS-DIV2+ | CCS-DIV2− | CCS-DIV3+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 20.1 | 22.1 | 19.5 | 20.1 | 24.1 | 24.1 | 22.2 | 22.2 | 19.3 |
| 1000 | 100 | 0.0 | 0.1 | 0.0 | 0.0 | 20.1 | 21.3 | 19.2 | 20.2 | 23.3 | 23.3 | 19.1 | 19.1 | 21.2 |
| 1000 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 19.1 | 20.7 | 19.1 | 22.1 | 25.1 | 25.1 | 18.1 | 18.1 | 20.2 |
| 1000 | 100 | 0.0 | 0.1 | 0.0 | 0.0 | 20.4 | 24.3 | 19 | 23.1 | 24.1 | 24.1 | 19.1 | 19.1 | 19.2 |
| 1000 | 100 | 0.0 | 0.0 | 0.0 | 0.0 | 20.2 | 20.1 | 20.1 | 22.1 | 21.4 | 21.4 | 18.1 | 18.1 | 19.2 |

Table 3.3: The corresponding variance of the performance.

| Samples | Trials | FastICA | JADE | RobustICA | RapidICA | IK-DIV | CS-DIV | KL-ICA | ED-DIV | C-DIV+ | C-DIV− | CCS-DIV2+ | CCS-DIV2− | CCS-DIV3+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 100 | 11.02 | 12.07 | 38.05 | 11.74 | 3.76 | 2.58 | 0.81 | 1.15 | 1.39 | 0.98 | 1.53 | 1.91 | 0.72 |
| 1000 | 100 | 102.53 | 211.75 | 332.76 | 95.06 | 16.43 | 37.71 | 8.87 | 8.87 | 6.76 | 3.92 | 28.35 | 10.63 | 6.19 |
| 1000 | 100 | 1.11 | 1.80 | 1.71 | 1.27 | 1.63 | 1.54 | 1.66 | 1.66 | 1.60 | 1.35 | 2.39 | 1.93 | 1.17 |
| 1000 | 100 | 18.47 | 15.34 | 17.44 | 14.64 | 1.51 | 1.52 | 0.95 | 0.78 | 1.07 | 1.08 | 1.19 | 1.51 | 0.83 |
| 1000 | 100 | 13.91 | 12.88 | 13.90 | 14.16 | 2.14 | 2.75 | 1.63 | 1.72 | 2.25 | 1.16 | 2.04 | 1.92 | 1.37 |

These results agree with those in the previous sections. We also report the time per epoch for the different divergence measures. Furthermore, Figure 3.12 shows the learning curves of the CCS-DIV measure with several convexity parameter values in a three-source BSS task. The mixed signals are obtained with the mixing matrix

$$A = [[0.3\;\; 0.2\;\; 0.4]^T\; [0.4\;\; 0.8\;\; 0.7]^T\; [0.5\;\; 0.6\;\; 0.3]^T]$$

and three Laplacian distributions with $\tau_1 = 1$, $\tau_2 = 0.5$, and $\tau_3 = 1.5$, respectively. Each source has $T = 1000$ samples, and the kurtosis values of the three sources are 3.22, 3.08, and 2.57, respectively. In this task, the standard gradient descent method (3.19) is used to devise the parameterized ICA algorithms based on the CCS-DIV with γ = 0.7, the CS-DIV with γ = 0.3, the E-DIV with γ = 0.06, the KL-DIV with γ = 0.17 as in [25], and the C-DIV with γ = 0.008 as in [20]. The CCS-ICA with α = −1 and the C-ICA with α = −1 attain the same convergence speed, see Figure 3.12. Moreover, Figure 3.11 depicts the SIRs of the demixed signals for all the algorithms; the CCS-ICA with α = −1 performs better than all the other algorithms.

[Figure: bar chart of SIR #1 and SIR #2 (dB, 0 to 60) for C-DIV (α = −1), C-DIV (α = 1), KL-DIV, E-DIV, CS-DIV, CCS-DIV (α = −1), and CCS-DIV (α = 1).]

Figure 3.7: Comparison of SIRs (dB) of demixed speech and music signals using different ICA algorithms in a parametric BSS task.

[Figure: learning curves, divergence measure (0 to 15) versus number of epochs (10^0 to 10^3, logarithmic scale), for C-ICA (α = −1), E-ICA, KL-ICA, CS-ICA, CCS-ICA (α = −1), C-ICA (α = 1), and CCS-ICA (α = 1).]

Figure 3.8: Comparison of learning curves of C-ICA, E-ICA, KL-ICA, and CCS-ICA with α = 1 and α = −1 in a two-source BSS task.

Table 3.4: Kurtosis values of the different probability density functions used in the ICA experiments.

| Signal | Kurtosis | Signal | Kurtosis |
|---|---|---|---|
| s1 | −1.2116 | s12 | −0.65419 |
| s2 | 2.9324 | s13 | −0.33421 |
| s3 | −1.3995 | s14 | −1.6935 |
| s4 | 136.0108 | s15 | −0.86239 |
| s5 | 11.6452 | s16 | −0.60566 |
| s6 | 4.219 | s17 | −0.75488 |
| s7 | −1.2065 | s18 | −0.65645 |
| s8 | 3.1965 | s19 | −0.81022 |
| s9 | 3.4302 | s20 | −0.7692 |
| s10 | −1.3049 | s21 | −0.27737 |
| s11 | −1.6805 | s22 | −0.56816 |

Table 3.5: The performance of the ICA algorithm based on the proposed divergence in terms of Amari error (multiplied by 100). Each entry is averaged over the corresponding number of trials.

| Dimensions M | Samples T | Trials | CCS3 at 0.1T | CCS3 at 0.01T | CCS3 at 0.001T | CCS3 at 1 |
|---|---|---|---|---|---|---|
| 2 | 1000 | 1024 | 4.6 | 2.9 | 2.1 | 2 |
| 2 | 2000 | 1024 | 3.6 | 2.3 | 1.9 | 1.8 |
| 2 | 4000 | 1024 | 2.8 | 1.9 | 1.6 | 1.4 |
| 2 | 8000 | 1024 | 2.2 | 1.6 | 1.1 | 1.2 |
| 4 | 1000 | 250 | 5.8 | 3.8 | 2.4 | 2.5 |
| 4 | 2000 | 250 | 5 | 2.9 | 2 | 1.8 |
| 4 | 4000 | 250 | 3.5 | 2.5 | 1.6 | 1.6 |
| 4 | 8000 | 250 | 2.7 | 2.2 | 1.3 | 1.3 |
| 8 | 1000 | 100 | 5.6 | 3.8 | .5 | 3.2 |
| 8 | 2000 | 100 | 3.7 | 3.1 | 2.2 | 3 |
| 8 | 4000 | 100 | 3.1 | 2.6 | 2.2 | 2.8 |
| 8 | 8000 | 100 | 3.0 | 2.2 | 1.9 | 1.9 |
| 16 | 1000 | 25 | 20.5 | 15.8 | 8.6 | 5.5 |
| 16 | 2000 | 25 | 12.6 | 10.1 | 7 | 5.1 |
| 16 | 4000 | 25 | 8.6 | 8 | 4.5 | 4.2 |
| 16 | 8000 | 25 | 5.8 | 3.9 | 1.9 | 2.9 |
| 20 | 1000 | 10 | 27.7 | 15.1 | 13.7 | 8.9 |
| 20 | 2000 | 10 | 22.8 | 11.3 | 12 | 7.2 |
| 20 | 4000 | 10 | 15.6 | 9 | 7.2 | 5.3 |
| 20 | 8000 | 10 | 9.8 | 6.3 | 3 | 2.3 |

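Table 3.7: The performance of the ICA algorithm based on the proposed divergence and other widely used ICA algorithms in terms of Amari error (multiplied by 100). Each entry is averaged over the corresponding number of trials.

| Dimensions M | Samples T | Trials | JADE | FastICA | RapidICA | RobustICA | CS-DIV | C-DIV | KL-DIV | CCS2 | CCS3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1000 | 512 | 5.6 | 7.3 | 6.1 | 7.2 | 2.5 | 2.2 | 2.3 | 2.1 | 2 |
| 2 | 2000 | 512 | 5.1 | 5.9 | 5.5 | 6 | 1.9 | 1.9 | 1.9 | 1.9 | 1.8 |
| 2 | 4000 | 512 | 3.1 | 4.1 | 3.5 | 4.3 | 1.6 | 1.6 | 1.6 | 1.6 | 1.4 |
| 2 | 8000 | 512 | 2.4 | 2.6 | 2.5 | 2.6 | 1.4 | 1.4 | 1.4 | 1.4 | 1.1 |
| 4 | 1000 | 200 | 8 | 9.7 | 9.1 | 9.8 | 3.1 | 3.1 | 3.1 | 3.1 | 2.5 |
| 4 | 2000 | 200 | 5.4 | 7.3 | 6.5 | 7.2 | 2.9 | 2.9 | 2.9 | 2.9 | 1.8 |
| 4 | 4000 | 200 | 4.2 | 4.2 | 4.1 | 4.3 | 1.4 | 1.4 | 1.4 | 1.4 | 1.6 |
| 4 | 8000 | 200 | 2.1 | 2.7 | 2.5 | 2.7 | 1.5 | 1.5 | 1.5 | 1.5 | 1.2 |
| 8 | 1000 | 75 | 10.5 | 10.3 | 9.6 | 11.2 | 4.6 | 4.6 | 4.6 | 4.6 | 3.2 |
| 8 | 2000 | 75 | 8.1 | 8.0 | 7.6 | 8.2 | 3.9 | 3.9 | 3.9 | 3.9 | 3 |
| 8 | 4000 | 75 | 5.7 | 4.1 | 4.4 | 3.9 | 2.3 | 2.3 | 2.3 | 2.3 | 2.8 |
| 8 | 8000 | 75 | 2.7 | 3.1 | 3.0 | 3.2 | 2 | 2 | 2 | 2 | 1.9 |
| 16 | 1000 | 15 | 8 | 9.7 | 9.1 | 9.8 | 8.1 | 8.1 | 8.1 | 8.1 | 5.5 |
| 16 | 2000 | 15 | 5.4 | 7.3 | 6.5 | 7.2 | 6.9 | 6.9 | 6.9 | 6.9 | 5.1 |
| 16 | 4000 | 15 | 4.2 | 4.2 | 4.1 | 4.3 | 5.6 | 5.6 | 5.6 | 5.6 | 4.2 |
| 16 | 8000 | 15 | 2.1 | 2.7 | 2.5 | 2.7 | 3.6 | 3.6 | 3.6 | 3.6 | 2.9 |
| 20 | 1000 | 5 | 22.3 | 21.1 | 20.1 | 26.2 | 14.1 | 11.1 | 14.1 | 11.1 | 8.9 |
| 20 | 2000 | 5 | 15.7 | 15.6 | 15.2 | 16.2 | 9.3 | 13.3 | 10.3 | 8.3 | 7.2 |
| 20 | 4000 | 5 | 7.8 | 7.2 | 7.1 | 7.2 | 7.5 | 7.6 | 6.4 | 6.7 | 5.3 |
| 20 | 8000 | 5 | 4.5 | 4.1 | 3.9 | 4.0 | 4.7 | 5.3 | 4.6 | 4.4 | 2.3 |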

Table 3.6: The computational load, in seconds, of the ICA algorithm based on the proposed divergence and other widely used ICA algorithms. Each entry is averaged over the corresponding number of trials.

| Dimensions M | Samples T | Trials | CCS3 at 0.1T | CCS3 at 0.01T | CCS3 at 0.001T | CCS3 at 1 |
|---|---|---|---|---|---|---|
| 2 | 1000 | 1024 | 0.4 | 2.8 | 29.8 | 28 |
| 2 | 2000 | 1024 | 0.5 | 4.8 | 44.8 | 96.4 |
| 2 | 4000 | 1024 | 0.8 | 8 | 77.9 | 342.9 |
| 2 | 8000 | 1024 | 1.5 | 10.6 | 137 | 1073 |
| 4 | 1000 | 250 | 1.8 | 24 | 218.1 | 237.9 |
| 4 | 2000 | 250 | 4.3 | 39 | 344.8 | 630.3 |
| 4 | 4000 | 250 | 5.9 | 47.9 | 593.4 | 2348.6 |
| 4 | 8000 | 250 | 10.2 | 83.6 | 1105 | 7737.1 |
| 8 | 1000 | 100 | 19.3 | 128.7 | 1053 | 1174 |
| 8 | 2000 | 100 | 31.5 | 201.7 | 1743 | 3347 |
| 8 | 4000 | 100 | 46.5 | 266.4 | 3109 | 11705 |
| 8 | 8000 | 100 | 74.2 | 241.8 | 5534 | 42115 |
| 16 | 1000 | 25 | 170.6 | 909.5 | 6282 | 4376.2 |
| 16 | 2000 | 25 | 242.3 | 1171 | 9320 | 17918.3 |
| 16 | 4000 | 25 | 305.5 | 1403 | 14717 | 58894.6 |
| 16 | 8000 | 25 | 329.9 | 2297 | 25658 | 10483.4 |
| 20 | 1000 | 10 | 339 | 1195.7 | 9605 | 11355.2 |
| 20 | 2000 | 10 | 427.4 | 1724.2 | 14708 | 27504.8 |
| 20 | 4000 | 10 | 607.6 | 2398.3 | 23634 | 52536.6 |
| 20 | 8000 | 10 | 900 | 3754.5 | 42538 | 97312.1 |

3.6.3

Experiments on Speech and Music Signals

Two experiments are presented in this section to evaluate the CCS-ICA algorithm, both involving speech and music signals under different conditions. The source signals are two speech signals from different male speakers and a music signal. The first experiment separates the three source signals from their mixtures $X = AS$, where the 3 × 3 mixing matrix is

$$A = [[0.8\;\; 0.3\;\; -0.3]^T\; [0.2\;\; -0.8\;\; 0.7]^T\; [0.3\;\; 0.2\;\; 0.3]^T].$$

The three signals are taken from the ICA '99 conference BSS test sets at http://sound.media.mit.edu/ica-bench/ [24], [66], with an 8 kHz sampling rate. The non-parameterized CCS-ICA algorithms (as well as the other algorithms) with α = 1 and α = −1 are applied to this task. The resulting waveforms are acquired, and the signal-to-interference ratio (SIR) of each estimated source is calculated as follows.
Given the source signals $S = \{s_1, s_2, \ldots, s_M\}$ and the demixed signals $Y = \{y_1, y_2, \ldots, y_M\}$, the SIR in decibels is

$$\mathrm{SIR\ (dB)} = 10 \log_{10} \frac{\sum_{m=1}^{M} \|s_m\|^2}{\sum_{m=1}^{M} \|y_m - s_m\|^2} \qquad (3.31)$$
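Equation (3.31) presumes that each y_m has already been matched to s_m in order, sign, and scale. The sketch below assumes the order is known and resolves sign and scale with a per-signal least-squares gain (an assumed alignment step, not one stated in the text) before applying (3.31); the function names are hypothetical.

```python
import numpy as np

def align_gain(S, Y):
    """Scale each y_m by the least-squares gain c_m = <s_m, y_m> / <y_m, y_m> (fixes sign and scale)."""
    c = np.sum(S * Y, axis=1) / np.sum(Y * Y, axis=1)
    return c[:, None] * Y

def sir_db(S, Y):
    """Eq. (3.31): 10 log10( sum_m ||s_m||^2 / sum_m ||y_m - s_m||^2 )."""
    return 10.0 * np.log10(np.sum(S ** 2) / np.sum((Y - S) ** 2))
```

For example, a demixed output equal to 1.1 times the sources leaves a residual of 0.1·S, i.e., an SIR of 10·log10(100) = 20 dB.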

The summary results are depicted in Figure 3.13, which also shows the SIRs of the other algorithms, namely JADE, FastICA, RobustICA, KL-ICA, and C-ICA with α = 1 and α = −1. As shown in Figure 3.13, the proposed CCS-ICA algorithm achieves significant improvements in terms of SIR. Consistent with the previous figures, the proposed algorithm obtains the best performance among this host of algorithms.

A second experiment examines the comparative performance in the presence of additive noise. We now consider the model $x = As + v$, with the same source signals, additive noise, and the mixing matrix

$$A = [[0.8\;\; 0.3\;\; -0.3]^T\; [0.2\;\; -0.8\;\; 0.7]^T\; [0.3\;\; 0.2\;\; 0.3]^T].$$

The noise $v$ is an M × T matrix whose columns have zero mean and covariance $\sigma^2 I$, and it is independent of the source signals. Figure 3.14 shows the separated source signals in the noisy BSS model with SNR = 20 dB, and Figure 3.15 presents the SIRs of all the algorithms. The proposed algorithm has the best performance, even though its performance decreases in the noisy BSS model. Notably, the SIRs of JADE, FastICA, and RobustICA are very low, since they rely on the non-Gaussianity criterion, which is unreliable in a Gaussian-noise environment. In contrast, C-ICA, KL-ICA, and the proposed algorithm, which are based on different mutual information measures, achieve reasonable results. We note that the CCS-DIV can also be used to recover source signals from convolutive mixtures in the frequency domain, as in [3], [38].
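A noisy mixture at a prescribed SNR can be generated as sketched below. The SNR definition (average power of AS over average noise power, computed over all channels) is an assumption, since the text does not spell it out; the mixing matrix is the one given above, written with its columns as listed, and the Laplacian sources are placeholders for the speech and music signals.

```python
import numpy as np

def add_noise(X_clean, snr_db, rng):
    """x = A s + v with v ~ N(0, sigma^2 I).

    sigma^2 is chosen so that mean(X_clean^2) / sigma^2 = 10^(snr_db / 10)
    (assumed SNR definition, averaged over all channels and samples).
    """
    signal_power = np.mean(X_clean ** 2)
    sigma = np.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return X_clean + sigma * rng.standard_normal(X_clean.shape)

A = np.array([[0.8, 0.2, 0.3],
              [0.3, -0.8, 0.2],
              [-0.3, 0.7, 0.3]])     # columns [0.8 0.3 -0.3]^T, [0.2 -0.8 0.7]^T, [0.3 0.2 0.3]^T
```

Because the noise is Gaussian, criteria that rely purely on non-Gaussianity see a contaminated signal at every sample, which is one way to understand the low SIRs reported for JADE, FastICA, and RobustICA in this setting.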

[Figure: bar chart of SIR #1 and SIR #2 (dB, 0 to 60) for C-DIV (α = −1), C-DIV (α = 1), KL-DIV, E-DIV, CS-DIV, CCS-DIV (α = −1), and CCS-DIV (α = 1).]

Figure 3.9: Comparison of SIRs (dB) of demixed speech and music signals using different ICA algorithms in a parametric BSS task with a random initial value.

[Figure: learning curves, divergence measure (0 to 15) versus number of epochs (10^0 to 10^3, logarithmic scale), for C-ICA (α = −1), E-ICA, KL-ICA, CS-ICA, CCS-ICA (α = −1), C-ICA (α = 1), and CCS-ICA (α = 1).]

Figure 3.10: Comparison of learning curves of C-ICA, E-ICA, KL-ICA, and CCS-ICA with α = 1 and α = −1 in a two-source BSS task with a random initial value.

[Figure: bar chart of SIR #1, SIR #2, and SIR #3 (dB, 0 to 35) for C-DIV (α = −1), C-DIV (α = 1), KL-DIV, E-DIV, CCS-DIV (α = −1), and CCS-DIV (α = 1).]

Figure 3.11: Comparison of SIRs (dB) of demixed speech and music signals using different ICA algorithms in a parametric BSS task.

[Figure: learning curves, divergence measure (0 to 15) versus number of epochs (10^0 to 10^3, logarithmic scale), for C-ICA (α = −1), E-ICA, KL-ICA, CS-ICA, CCS-ICA (α = −1), C-ICA (α = 1), and CCS-ICA (α = 1).]

Figure 3.12: Comparison of learning curves of C-ICA, E-ICA, KL-ICA, and CCS-ICA with α = 1 and α = −1 in a three-source BSS task.

[Figure: bar chart of SIR #1, SIR #2, and SIR #3 (dB, 0 to 50) for the compared ICA algorithms.]

Figure 3.13: Comparison of SIRs (dB) of demixed speech and music signals using different ICA algorithms in an instantaneous BSS task.

[Figure: three panels, each overlaying an original signal (1 to 3) and the corresponding demixed signal, plotted against the sample index (0 to 7 × 10^4).]

Figure 3.14: The original signals and demixed signals obtained with the CCS-ICA algorithm in an instantaneous BSS task with additive Gaussian noise.

[Figure: bar chart of SIR #1, SIR #2, and SIR #3 (dB, 0 to 20) for the compared ICA algorithms.]

Figure 3.15: Comparison of SIRs (dB) of demixed speech and music signals using different ICA algorithms in an instantaneous BSS task with additive Gaussian noise.

3.7

Conclusion
A novel divergence measure is presented based on integrating convex functions into the Cauchy–Schwarz inequality. This divergence measure is used as a contrast function to develop new ICA algorithms that solve the Blind Source Separation (BSS) problem. The algorithms derived from the CCS-DIV can be controlled to attain the steepest descent toward the minimum value. In addition, a pairwise iterative scheme is employed to address the high-dimensional problem in BSS, and two pairwise non-parametric ICA algorithms are developed based on the proposed divergence. Several examples and experiments are carried out to show the improved performance of the proposed divergence, and this chapter compares its performance with a host of leading ICA algorithms. We have also developed non-parametric CCS-ICA approaches to demixing in which the source densities are estimated with Parzen windows. The convergence speed of the parameterized CCS-ICA procedure is evaluated and compared with that of other algorithms. The proposed CCS-ICA algorithms attain the highest SIR in the separation of speech and music signals relative to other leading ICA-based algorithms.

Chapter 4

A RobustICA-Based Algorithm for Blind
Separation of Convolutive Mixtures


We propose a frequency-domain method based on robust independent component

analysis (RICA) to address the multichannel Blind Source Separation (BSS) problem of
the convolutive speech mixtures in highly reverberant environments. We impose
regularization processes to tackle the ill-conditioning problem of the covariance matrix
and to mitigate the performance degradation in frequency-domain methods. We apply the algorithm to separate the source signals in adverse conditions, i.e., highly reverberant environments in which only short observation signals are available. Furthermore, we study the impact of several parameters on the separation performance, e.g., the overlap ratio and the window type in the frequency-domain method. We also compare different techniques for solving the permutation ambiguity. Through simulations and real-world experiments, we verify the superiority of the presented algorithm over other BSS algorithms, e.g., recursive regularized ICA (RR-ICA), independent vector analysis (IVA), and others.

4.1 Introduction
Blind Source Separation (BSS) has a solid theoretical foundation and many potential applications. BSS has long been an important topic of research and development in many areas, such as biomedical engineering, image processing, communication systems, speech enhancement, and remote sensing. BSS techniques require neither prior knowledge of the mixing matrix or the source signals nor any training data [1], [2].
Independent Component Analysis (ICA) is a powerful tool for BSS and Multichannel Blind Deconvolution (MBD), and a key component of BSS and unsupervised learning algorithms. ICA is related to Principal Component Analysis (PCA) and Factor Analysis (FA) in multivariate analysis and data mining, especially when these correspond to second-order methods in which the components or factors are assumed to be Gaussian [1], [3], [6]. However, ICA is a statistical technique that exploits higher-order statistics (HOS), where the goal is to represent a set of random variables as a linear transformation of statistically independent components [1]. ICA methods usually assume certain properties of the sources or the mixing system in order to derive a separation criterion that imposes the same properties on their estimates.
For ICA of speech signals, several approaches have been proposed for the simple case of instantaneous linear mixtures [12-18]. However, convolutive linear mixtures are more realistic in real-world applications [1-3]. Several convolutive ICA approaches have been proposed, both time-domain [3], [4] and frequency-domain [35]-[42] methods; see [3], [38] for more details on existing convolutive ICA methods.
For speech signals, one can exploit the inherent non-stationarity of natural speech by using second-order statistics (SOS) methods [2]. If the mixing environment is assumed to be stationary, even over a short period, one can exploit higher-order statistics, e.g., through the Joint Approximate Diagonalization (JAD) problem as in [16], [17]. According to [42], [143], online BSS algorithms can be adapted in the time domain under non-stationary conditions. However, the time-domain approach suffers from slow convergence, lack of stability, and high computational complexity. Alternatively, a block online frequency-domain BSS algorithm is proposed in [38], in which the separation is applied to individual blocks of the input data over time.

Furthermore, one can assume that the mixing environment is stationary over short time windows, which means that the source signals do not change their locations during this interval. This requires choosing a time window that guarantees that the separation algorithm is accurate enough given the observed data within that window. For more details, refer to [133], which presents a recent time-domain ICA algorithm for short mixtures.
The recursively regularized ICA algorithms proposed in [130] allow a large number of demixing matrices to be estimated even from a short amount of data. Despite the good performance of this algorithm, it falls into the semi-blind category since it relies on prior knowledge about the acoustic source signals, i.e., the acoustic propagation and the spectral characteristics of the source signals. In [37], [50], the authors studied the relationship between the number of frames of the STFT analysis and frequency-domain BSS algorithms, and showed that frequency-domain BSS algorithms are significantly affected by the number of mixing matrices to be estimated. In [37], [50], they also proposed applying the ICA adaptation to a group of frequencies so that the STFT size can remain large enough to achieve accurate separation. However, this method assumes that the acoustic propagation can be approximated by an anechoic model, an assumption that becomes less accurate as the direct-to-reverberant ratio (DRR) decreases. There are also several drawbacks to separating acoustic sources with frequency-domain methods [130]. First, a highly reverberant environment forces us to increase the number of demixing matrices to ensure an efficient estimation of the source signals. This requirement is not easy to satisfy, especially when only short observations of the source signals are available. Therefore, inspired by the work of V. Zarzoso and P. Comon [11], this chapter considers several challenges of convolutive mixtures in the frequency domain in order to develop a RobustICA-based algorithm in the frequency domain. These challenges can be summarized as follows.
• Increasing the robustness of the BSS algorithm to adverse conditions, e.g., short signal length, additive noise, long reverberation time, and moving sources.
• Optimizing the implementation for real-time operation [42], so that a real-time DSP processor can handle the computational cost without interruptions or distortions.
• Effectively treating the scaling and permutation problems in the frequency domain.
• Reducing the computational complexity of frequency-domain ICA algorithms.
• Controlling the accuracy of the ICA algorithm, especially when only short mixtures are available and the demixing matrices are not constrained by any anechoic model.

The remainder of the chapter is organized as follows. Section 4.2 gives a brief description of convolutive mixtures and the problem statement. Section 4.3 reviews the recursively regularized ICA. Section 4.4 presents the RobustICA-based method in the frequency domain. Section 4.5 addresses the scaling and permutation ambiguities of ICA in the frequency domain. Comparative experimental results and conclusions are given in Section 4.6 and Section 4.7, respectively.

4.2 Convolutive Mixtures

A convolutive mixture can be considered a natural extension of the instantaneous BSS problem. Assume that an m-dimensional vector of received discrete-time signals x(k) = [x_1(k), x_2(k), …, x_m(k)]^T at time k is produced from an n-dimensional vector of source signals s(k) = [s_1(k), s_2(k), …, s_n(k)]^T, where m ≥ n, through a stable mixture model [2]:

$$x(k) = \sum_{p=-\infty}^{\infty} H_p\, s(k-p) = H_p * s(k), \quad \text{with} \quad \sum_{p=-\infty}^{\infty} \|H_p\| < \infty \qquad (4.1)$$

where * represents the linear convolution operator and H_p is an (m x n) matrix of mixing coefficients at time lag p.

4.2.1 Problem Definition

Assume that the elements h_{jip} denote the coefficients of the Finite Impulse Response (FIR) filters H_p, and that L is the unknown maximum channel length. Then the noise-free convolutive model is written as

$$x(k) = \sum_{p=0}^{L-1} H_p\, s(k-p) \qquad (4.2)$$
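To make the FIR model (4.2) concrete, the following minimal Python sketch builds the noise-free convolutive mixture directly from the lag matrices H_p. The function name `convolutive_mix`, the nested-list data layout, and the zero initial conditions (s(k) = 0 for k < 0) are illustrative assumptions, not part of the method.

```python
def convolutive_mix(H, s):
    """Noise-free FIR convolutive mixture, (4.2): x(k) = sum_{p=0}^{L-1} H_p s(k - p).

    H: list of L mixing matrices H_p, each m x n (nested lists).
    s: list of n source sequences of equal length.
    Returns m observed sequences; sources are assumed zero before k = 0.
    """
    L, m, n = len(H), len(H[0]), len(H[0][0])
    K = len(s[0])
    x = [[0.0] * K for _ in range(m)]
    for k in range(K):
        for p in range(min(L, k + 1)):  # only lags with k - p >= 0
            for j in range(m):
                for i in range(n):
                    x[j][k] += H[p][j][i] * s[i][k - p]
    return x


# Two sources, two sensors, L = 2 taps: an echo of source 1 on sensor 1 only.
H = [[[1.0, 0.0], [0.0, 1.0]],   # H_0
     [[0.5, 0.0], [0.0, 0.0]]]   # H_1
s = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(convolutive_mix(H, s))  # [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0]]
```

The impulse in source 1 appears on sensor 1 at lag 0 and again, attenuated, at lag 1, which is exactly the time-lag structure that a demixing filter W_q has to undo.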

Thus, one can find an approximate inverse channel (demixing) filter W_q that recovers the source signals s(k) = [s_1(k), s_2(k), …, s_n(k)]^T such that

$$y(k) = W_q * x(k) = \sum_{q=0}^{Q-1} W_q\, x(k-q) = \hat{s}(k) \qquad (4.3)$$

where Q is the length of the inverse channel impulse response. There are two approaches to solving this problem and recovering the source signals: time-domain and frequency-domain methods. Time-domain approaches have several general drawbacks. First, Q should be at least equal to the unknown true channel length L; therefore, for long mixing filters (i.e., long transfer functions), the computation becomes too expensive [2], [3]. Using IIR filters instead of long FIR filters to overcome this problem suffers from instability and requires inverting non-minimum-phase filters [2], [3], [133]. Moreover, time-domain approaches are sensitive to channel-order mismatch [3]. Nevertheless, time-domain methods are suitable and very efficient for short mixing filters, such as communication channels [2], [36]. Given these limitations, we focus on frequency-domain approaches to solve the cocktail party problem. The main advantage of a frequency-domain BSS approach is that any instantaneous ICA algorithm can be applied to the convolutive BSS problem. On the other hand, the main challenge of frequency-domain BSS is dealing with the permutation and scaling ambiguities; see [3], [38] for a recent survey. The convolutive model can be mapped into the frequency domain by applying the Discrete Fourier Transform (DFT) to the observed signals x(k), which transforms it into an instantaneous mixture problem:

$$x(k) = H * s(k) \;\Longrightarrow\; x(q,w) \approx H(w)\, s(q,w) \qquad (4.3)$$

where w is a frequency index, q is a frame index, s(q,w) = [s_1(q,w), …, s_n(q,w)]^T, and x(q,w) = [x_1(q,w), …, x_m(q,w)]^T.
The previous equation holds exactly only for periodic signals s(t), i.e., when the time convolution is circular. To make the linear convolution approximately circular [1], the Fourier transform length must be significantly larger than the maximum length L of the mixing channels [6]. In [38], [130], a spectral smoothing approach was imposed to mitigate the circularity effect in frequency-domain BSS methods. In practice, to avoid convergence to local minima during separation, the observed signals can be separated at each frequency bin. The observed signal x_s(t) is sampled at discrete time instants n_s with sampling frequency f_s and then transformed into the time-frequency domain x_s(q,w) using the short-time Fourier transform (STFT) applied to T overlapping samples of the observed signal. The time-frequency representation of each sensor signal at frame q can be expressed as

$$x_s(q,w) = \sum_{n_s} x_s(n_s)\, \mathrm{win}\!\left(\frac{n_s - q \cdot \mathrm{shift}}{f_s}\right) e^{-j2\pi w \frac{n_s}{f_s}}, \quad \forall\, w \in \left\{0, \frac{f_s}{T}, \dots, \frac{(T-1) f_s}{T}\right\} \qquad (4.4)$$

where win(·) denotes the windowing function. Here we use the Hanning window, since it is typical for acoustic signals. The Hanning window [6] is given by

$$\mathrm{win}(n_s) = \frac{1}{2}\left(1 + \cos\left(\frac{2\pi n_s}{T}\right)\right) \qquad (4.6)$$
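The STFT of (4.4) with the Hanning window (4.6) can be sketched as follows. This is a minimal illustration under stated assumptions: frames start at multiples of a hop size `shift` (standing for the frame shift in (4.4)), bin k corresponds to w = k·f_s/T, and the time origin is the start of each frame, so the result differs from (4.4) only by a frame-dependent phase factor that is common to all sensors. The function names are hypothetical.

```python
import cmath
import math


def hann(T):
    """Hanning window (4.6), shifted so that frame sample n maps to n_s = n - T/2."""
    return [0.5 * (1 + math.cos(2 * math.pi * (n - T / 2) / T)) for n in range(T)]


def stft(x, T, shift):
    """Return frames X[q][k]: windowed T-point DFT of x starting at sample q*shift.

    Bin k corresponds to w = k * f_s / T. The time origin is the start of each
    frame, so X differs from (4.4) by a frame-dependent phase factor.
    """
    win = hann(T)
    frames = []
    for q in range((len(x) - T) // shift + 1):
        seg = [x[q * shift + n] * win[n] for n in range(T)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / T) for n in range(T))
                       for k in range(T)])
    return frames


X = stft([1.0] * 16, T=4, shift=2)  # 50% overlap
print(len(X))  # 7 frames
```

With a constant input, each frame gives X[q][0] close to 2 (the window sum) and X[q][1] close to -1; the 50% overlap is what lets consecutive frames share half of their samples.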

In a real-world scenario, we use the reverberation time T_60 to approximately define the length of the impulse response, since the impulse response functions h(t) are theoretically infinite. The reverberation time T_60 is the time required for the sound energy to decay by 60 dB, after which the sound is considered to have died away. Therefore, the convolutive ICA model can be approximated by a set of instantaneous ICA models:

$$x(q,w) = H(w)\, s(q,w) \qquad (4.7)$$

where w represents the frequency bin, q denotes the time frame of the short-time Fourier transform, x(q,w) is the column vector of observed signals in the frequency domain, s(q,w) is the column vector of original source signals, and H(w) is an M x N mixing matrix in the frequency domain.

For the sake of simplicity, let us assume that the number of source signals N equals the number of observed signals M. Then, by applying an ICA algorithm to x(q,w) at each frequency bin, one can recover the estimated source signals as

$$y(q,w) = W(w)\, x(q,w) \qquad (4.8)$$

where W(w) is the demixing matrix at frequency bin w. Moreover, owing to the well-known conjugate symmetry of the Fourier transform of real signals, one only needs to find the demixing matrices W(w) for half of the frequency bins, w ∈ (0, …, T/2), and then use the symmetry property to obtain the others.
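The per-bin instantaneous model (4.7)-(4.8) and the half-spectrum shortcut can be checked numerically. The sketch below, a hypothetical single-sensor, single-source example, shows that the DFT of a circular convolution factorizes exactly into H(w)S(w) at every bin, and that the spectrum of a real frame is conjugate-symmetric, so bins above T/2 carry no new information. The signal, filter, and function names are illustrative assumptions.

```python
import cmath


def dft(x):
    """Plain T-point DFT: X[k] = sum_n x[n] exp(-j 2 pi k n / T)."""
    T = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / T) for n in range(T))
            for k in range(T)]


def circular_convolve(h, s):
    """Circular convolution of a short filter h with a length-T signal s."""
    T = len(s)
    return [sum(h[p] * s[(n - p) % T] for p in range(len(h))) for n in range(T)]


s = [1.0, 2.0, 0.0, -1.0, 3.0, 0.0, 0.0, 1.0]   # real source frame, T = 8
h = [1.0, 0.5, 0.25]                             # hypothetical 3-tap channel
T = len(s)
X = dft(circular_convolve(h, s))
Hf = dft(h + [0.0] * (T - len(h)))
S = dft(s)
# Bin-wise instantaneous model: X[k] = H(k) S[k] holds exactly for circular convolution.
print(max(abs(X[k] - Hf[k] * S[k]) for k in range(T)))       # close to 0
# Conjugate symmetry of a real frame: bin T - k mirrors bin k.
print(max(abs(S[T - k] - S[k].conjugate()) for k in range(1, T)))  # close to 0
```

For a linear (non-circular) convolution the factorization is only approximate, which is why the DFT length must greatly exceed the channel length L.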

4.3 Recursively Regularized ICA
The recursively regularized ICA algorithms [130] were proposed to estimate a large number of demixing matrices even from a short amount of data. Although this algorithm performs well, it falls into the semi-blind category since it relies on prior knowledge about the acoustic source signals, i.e., the acoustic propagation and the spectral characteristics of the source signals.
In general, BSS assumes that the source signals overlap in time. For acoustic signals, one can assume that the sources are sparse in the time-frequency domain, which means that at each time-frequency point there is one dominant source. Acoustic source signals also usually exhibit temporal continuity in the frequency domain.

First, recall the estimated source signals y(t,w):

$$y(t,w) = W(t)\, x(t,w) \qquad (4.9)$$

The update law of the mixing matrix based on natural gradient optimization [2] is as follows:

$$y(t,w) = [H(t)]^{-1}\, x(t,w) \qquad (4.10)$$

$$\Delta H(t) = H(t)\left(I - E\left[\varphi(y(t,w))\, y(t,w)^H\right]\right) \qquad (4.11)$$

Then

$$H_{new}(t) = H(t) + \mu\, \Delta H(t) \qquad (4.12)$$

During the update, all coefficients of the mixing matrix H_new(t) are updated using the gradient based on the Kullback-Leibler divergence, g = E[φ(y(t,w)) y(t,w)^H].
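As a hedged illustration of the natural-gradient idea behind (4.9)-(4.12), the sketch below implements the standard batch natural-gradient rule for the demixing matrix, W <- W + μ(I - E[φ(y)y^H])W, at one frequency bin. Since y = W x = H^{-1} x by (4.9)-(4.10), acting on W is equivalent to acting on H^{-1}. The score function φ(y) = y/|y| and the function names are assumptions for illustration, not the choices made in [130].

```python
def phi(y):
    """Assumed score function for complex speech bins: phi(y) = y / |y| elementwise."""
    return [v / abs(v) if v != 0 else 0j for v in y]


def natural_gradient_step(W, frames, mu):
    """One batch step W <- W + mu (I - E[phi(y) y^H]) W at a single frequency bin."""
    n = len(W)
    ys = [[sum(W[i][j] * x[j] for j in range(n)) for i in range(n)] for x in frames]
    ps = [phi(y) for y in ys]
    # Sample estimate of E[phi(y) y^H].
    C = [[sum(p[i] * y[j].conjugate() for p, y in zip(ps, ys)) / len(ys)
          for j in range(n)] for i in range(n)]
    G = [[(1.0 if i == j else 0.0) - C[i][j] for j in range(n)] for i in range(n)]
    dW = [[sum(G[i][k] * W[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[W[i][j] + mu * dW[i][j] for j in range(n)] for i in range(n)]


qpsk = [1, 1j, -1, -1j]
frames = [[2 * a, 2 * b] for a in qpsk for b in qpsk]  # independent sources of modulus 2
W = natural_gradient_step([[1.0, 0.0], [0.0, 1.0]], frames, mu=0.1)
print(W)  # approximately [[0.9, 0], [0, 0.9]]
```

For sources of modulus 2 the diagonal of E[φ(y)y^H] equals 2 and its off-diagonal terms vanish, so the step shrinks W toward the unit-modulus fixed point without rotating it.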
According to [72], [130], weighting the instantaneous gradient improves the estimation in the adaptation process. Therefore, the gradient expectation developed in [130] is given by

$$\begin{aligned} \Delta\hat{H}_i(k) &= \hat{H}_i(k)\left[I - E\left[\varphi(y(t,w))\, y(t,w)^H\right]\right] \\ &= E\left[\hat{H}_i(k)\left(I - \varphi(y(t,w))\, y(t,w)^H\right)\right] \\ &\cong \sum_{q} \mathbb{C}(q,w) \odot \hat{H}_i(k)\left(I - \varphi(y(q,w))\, y(q,w)^H\right) \end{aligned} \qquad (4.13)$$

where ⊙ is the Hadamard (element-wise) product and ℂ(q,w) is a weight matrix constructed as

$$\mathbb{C}(q,w) = \left[C_1(q,w), C_2(q,w), \dots, C_N(q,w)\right] \qquad (4.14)$$

The generic weighting column vector C_m(q,w) is defined as

$$C_m(q,w) = \frac{c_m(q,w)}{N_t}\,[1, 1, \dots, 1]^T \qquad (4.15)$$

where N_t is the number of time frames over which the gradient is averaged and c_m(q,w) is a weight.
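The weighted gradient (4.13)-(4.15) is a frame-by-frame sum in which column m of the per-frame term is scaled by c_m(q,w)/N_t. The following is a minimal sketch, again assuming φ(y) = y/|y| and hypothetical function names. With all weights c_m(q,w) = 1 it reduces to the plain sample average Ĥ(I - E[φ(y)y^H]).

```python
def phi(y):
    """Assumed score function phi(y) = y / |y| (elementwise)."""
    return [v / abs(v) if v != 0 else 0j for v in y]


def weighted_gradient(H, ys, c):
    """Weighted gradient estimate (4.13)-(4.15) at one frequency bin.

    H:  current n x n mixing-matrix estimate.
    ys: list of N_t frames y(q, w), each a length-n list.
    c:  list of N_t weight lists; c[q][m] plays the role of c_m(q, w) in (4.15).
    Returns sum_q C(q, w) (Hadamard) H (I - phi(y_q) y_q^H).
    """
    n, Nt = len(H), len(ys)
    acc = [[0j] * n for _ in range(n)]
    for y, cq in zip(ys, c):
        p = phi(y)
        G = [[(1.0 if i == j else 0.0) - p[i] * y[j].conjugate() for j in range(n)]
             for i in range(n)]
        for i in range(n):
            for m in range(n):
                term = sum(H[i][k] * G[k][m] for k in range(n))
                # Column m of C(q, w) is (c_m(q, w) / N_t) [1, ..., 1]^T.
                acc[i][m] += (cq[m] / Nt) * term
    return acc


qpsk = [1, 1j, -1, -1j]
ys = [[2 * a, 2 * b] for a in qpsk for b in qpsk]
I2 = [[1.0, 0.0], [0.0, 1.0]]
full = weighted_gradient(I2, ys, [[1.0, 1.0]] * len(ys))  # uniform weights
half = weighted_gradient(I2, ys, [[1.0, 0.0]] * len(ys))  # second column switched off
```

Setting c_2(q,w) = 0 in the second call zeroes the entire second column of the estimate, which shows how the weights control the contribution of each output to the update.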

4.4 The Presented Method Based on the RobustICA Framework

In this section, a new strategy is proposed based on the kurtosis-based RobustICA method [11], [38]. First, recall the time-frequency representation of the observed vector in (4.7):

$$x(q,w) = H(w)\, s(q,w) \qquad (4.16)$$

The aim of this study is to estimate the demixing matrix W(w) from the observed vector x(q,w) under the assumption that the impulse responses of all mixing filters are constant during the recording. The estimated source vector at each frequency bin is

$$y(q,w) = W(w)\, x(q,w) \qquad (4.17)$$

4.4.1 Step 1: Preprocessing (Data Whitening)

In the preprocessing step, the demixing matrix W(w) is determined up to a unitary matrix U(w) using second-order statistics (SOS). This step is used to reduce the noise and to eliminate redundancy in the data at each frequency bin. The K x K covariance matrix R of the noise-free observed signals can be expressed as

$$R(q,w) = E\left[x(q,w)\, x(q,w)^H\right], \quad \forall\, w = 0, \dots, \frac{T}{2} \qquad (4.18)$$

By substituting x(q,w) from (4.16), and assuming unit-variance, mutually uncorrelated sources (E[s(q,w)s(q,w)^H] = I), one obtains

$$R(q,w) = H(w)\, E\left[s(q,w)\, s(q,w)^H\right] H(w)^H = H(w)\, H(w)^H \qquad (4.19)$$

Tikhonov regularization [76] is a well-known and effective way to avoid ill-conditioned matrices. By imposing it to avoid the ill-posed problem, (4.19) becomes

$$R(q,w) + cI = H(w)\, H(w)^H \qquad (4.20)$$

where I is a K x K identity matrix and c = m(tr(R(q,w)) + λ_max) is the regularization parameter, with m a positive constant and λ_max the maximum eigenvalue of the estimated covariance matrix R(q,w). Note that the regularization here simply adds an energy constraint that makes the covariance matrix well conditioned. Therefore, R(q,w) + cI can be decomposed as

$$R(q,w) + cI = V(w)\, \Lambda(w)\, V^H(w) \qquad (4.21)$$

where V(w) is a K x K matrix satisfying

$$V(w)\, V^H(w) = V^H(w)\, V(w) = I_K \qquad (4.22)$$

and Λ(w) is a K x K diagonal matrix. From (4.20) and (4.21), the K x K matrix H(w) can be written as

$$H(w) = V(w)\, \Lambda(w)^{1/2}\, U^H(w) \qquad (4.23)$$

where U(w) is a K x K unitary matrix, U U^H = I_K.

The whitening step then uses V(w) and Λ(w) so that the K x T whitened data matrix Z(q,w) has identity covariance, R_ZZ(q,w) = I_K:

$$Z(q,w) = \Lambda^{-1/2}(w)\, V^H(w)\, x(q,w) \qquad (4.24)$$

$$Z(q,w) = U^H(w)\, s(q,w) \qquad (4.25)$$

The source signals can then be recovered with a linear zero-forcing (ZF) equalizer, giving the estimated K x T source matrix

$$y(q,w) = U(w)\, Z(q,w) \qquad (4.26)$$

After the preprocessing step, estimating the source signals y(q,w) reduces to determining the K x K unitary (rotation) matrix U(w).
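The preprocessing step (4.18)-(4.24) can be sketched for the two-sensor case, where the Hermitian eigendecomposition has a closed form. The sketch assumes K = 2, replaces expectations by sample means over frames, and uses the hypothetical parameter name `m_reg` for the positive constant m in c = m(tr(R) + λ_max). With m_reg = 0 the whitened data have identity sample covariance; with m_reg > 0 the regularized matrix R + cI is better conditioned, at the cost of the whitened covariance no longer being exactly the identity.

```python
import math


def covariance(frames):
    """Sample estimate of R = E[x x^H] from the frames x(q, w) of one bin."""
    n, T = len(frames[0]), len(frames)
    return [[sum(x[i] * x[j].conjugate() for x in frames) / T for j in range(n)]
            for i in range(n)]


def eig_hermitian_2x2(R):
    """Eigenvalues (descending) and unit eigenvectors of a 2x2 Hermitian matrix."""
    a, b, d = R[0][0].real, R[0][1], R[1][1].real
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    lams = [mid + rad, mid - rad]
    if abs(b) < 1e-15:  # already diagonal
        return lams, ([[1, 0], [0, 1]] if a >= d else [[0, 1], [1, 0]])
    vecs = []
    for lam in lams:
        v = [b, lam - a]  # satisfies (R - lam I) v = 0
        nrm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
        vecs.append([v[0] / nrm, v[1] / nrm])
    return lams, vecs


def regularized_whitener(R, m_reg):
    """Lambda^{-1/2} V^H for R + cI, with c = m_reg * (tr R + lambda_max) as in (4.20)."""
    lam_max = eig_hermitian_2x2(R)[0][0]
    c = m_reg * ((R[0][0] + R[1][1]).real + lam_max)
    Rc = [[R[i][j] + (c if i == j else 0) for j in range(2)] for i in range(2)]
    lams, vecs = eig_hermitian_2x2(Rc)
    # Row k of the whitener is v_k^H / sqrt(lambda_k), cf. (4.24).
    return [[v.conjugate() / math.sqrt(lam) for v in vec] for lam, vec in zip(lams, vecs)], lams


qpsk = [1, 1j, -1, -1j]
A = [[1, 0.5], [0.3j, 1]]  # hypothetical 2x2 mixing matrix at one bin
frames = [[A[0][0] * s1 + A[0][1] * s2, A[1][0] * s1 + A[1][1] * s2]
          for s1 in qpsk for s2 in qpsk]
R = covariance(frames)
Wh, _ = regularized_whitener(R, m_reg=0.0)
Z = [[sum(Wh[i][j] * x[j] for j in range(2)) for i in range(2)] for x in frames]
print(covariance(Z))  # approximately the 2x2 identity
```

After whitening, only the 2x2 unitary rotation U(w) remains to be found, which is the task of Step 2.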
4.4.2 Step 2: Determining the Rotation Matrix (Unitary Matrix) U(w)

One way of finding the rotation matrix U(w) is to maximize the normalized fourth-order marginal cumulant (kurtosis contrast) of the whitened data Z in (4.25). To estimate U(w) in (4.26), this chapter exploits the statistical independence of the equalized source vector. More precisely, U(w) is estimated by exploiting the independence of the estimated source vector y(q,w) at each frequency bin through the normalized fourth-order marginal cumulant:

$$K(q,w) = \frac{E\left[|y(q,w)|^4\right] - 2E^2\left[|y(q,w)|^2\right] - \left|E\left[y(q,w)^2\right]\right|^2}{E^2\left[|y(q,w)|^2\right]} \qquad (4.27)$$

where E[·] is the expectation operator. Based on the deflation approach to ICA [97], one estimated source signal is extracted as

$$y_i(q,w) = u_i^H(w)\, x(q,w) \qquad (4.28)$$

where (·)^H is the conjugate-transpose operator, u_i(w) is the ith column vector of the demixing matrix U(w), and y_i(q,w) is the ith source signal at the wth frequency bin and qth frame. According to [2], [3], the column vector u_i(w) of U(w) can be estimated for all users by batch gradient descent:

$$u_i^{l+1} = u_i^{l} - \mu\, \Delta G_i^{l} \qquad (4.29)$$

where l denotes the iteration index, u_i^l is the ith column vector of the demixing matrix U(w) at the lth iteration, and ΔG_i^l is the gradient of the contrast measure used to update u_i^l. The gradient depends on the cost function that ICA maximizes or minimizes in order to extract the source signal. Here, this chapter uses ICA techniques based on the kurtosis criterion given in (4.27), restated as

$$K(q,w) = \frac{E\left[|y(q,w)|^4\right] - 2E^2\left[|y(q,w)|^2\right] - \left|E\left[y(q,w)^2\right]\right|^2}{E^2\left[|y(q,w)|^2\right]} \qquad (4.30)$$
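The kurtosis contrast (4.27)/(4.30) is straightforward to evaluate once expectations are replaced by sample means over the frames of one bin. The following is a minimal sketch; the function name is hypothetical.

```python
def kurtosis(y):
    """Normalized complex kurtosis (4.27)/(4.30), expectations replaced by sample means."""
    T = len(y)
    m2 = sum(abs(v) ** 2 for v in y) / T    # E[|y|^2]
    m4 = sum(abs(v) ** 4 for v in y) / T    # E[|y|^4]
    pseudo = sum(v * v for v in y) / T      # E[y^2]
    return (m4 - 2 * m2 ** 2 - abs(pseudo) ** 2) / m2 ** 2


print(kurtosis([1, 1j, -1, -1j]))  # -1.0 (QPSK)
print(kurtosis([1, -1]))           # -2.0 (BPSK)
```

Constant-modulus signals are strongly sub-Gaussian: with this normalization QPSK gives -1 and BPSK gives -2, while the |E[y^2]|^2 term is what distinguishes the complex (circular) case from the real one.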

Following RobustICA's search method for the kurtosis criterion in (4.30), the optimal step size is chosen as

$$\mu_{opt} = \arg\max_{\mu} \left|K\left(y(q,w) + \mu\, g(q,w)\right)\right| \qquad (4.31)$$

where g is the gradient of the kurtosis contrast K(·). The optimal step size μ_opt can be obtained algebraically instead of using the exact line search as in [8], [28], which avoids intensive computation and other limitations noted in [11]. The globally optimal step size μ_opt is easy to find for criteria whose stationary points in μ are the roots of a polynomial, e.g., the kurtosis [11-16], the constant modulus [13], [82], and the constant power [2] criteria.
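The algebraic step-size search (4.31)-(4.33) can be sketched as follows. Writing a(μ) = E[|y + μg|^2] and e(μ) = E[|y + μg|^4] - |E[(y + μg)^2]|^2, the contrast is K(μ) = e(μ)/a(μ)^2 - 2, so dK/dμ = (e'a - 2ea')/a^3 and its numerator is a polynomial of degree at most four, matching (4.32). This sketch rests on stated assumptions: it finds the roots numerically with the Durand-Kerner iteration instead of Ferrari's formula [134], treats y and g as the sample sequences y_i(q,w) and g_i(q,w) over the frames of one bin, and uses hypothetical helper names.

```python
def kurtosis(y):
    """Normalized complex kurtosis (4.30), expectations replaced by sample means."""
    T = len(y)
    m2 = sum(abs(v) ** 2 for v in y) / T
    m4 = sum(abs(v) ** 4 for v in y) / T
    pseudo = sum(v * v for v in y) / T
    return (m4 - 2 * m2 ** 2 - abs(pseudo) ** 2) / m2 ** 2


# Polynomials are coefficient lists, lowest degree first.
def padd(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def pder(p):
    return [i * p[i] for i in range(1, len(p))]


def poly_roots(p, iters=300):
    """All complex roots of p via Durand-Kerner (stand-in for Ferrari's formula)."""
    n = len(p) - 1
    monic = [c / p[-1] for c in p]
    z = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iters):
        for i in range(n):
            val = 0j
            for c in reversed(monic):
                val = val * z[i] + c
            den = 1 + 0j
            for j in range(n):
                if j != i:
                    den *= z[i] - z[j]
            z[i] -= val / den
    return z


def optimal_step(y, g):
    """Real mu maximizing |K(y + mu g)| over the real roots of the step-size polynomial."""
    T = len(y)
    # |y + mu g|^2 per sample, as a quadratic in real mu.
    quad = [[abs(a) ** 2, 2 * (a * b.conjugate()).real, abs(b) ** 2] for a, b in zip(y, g)]
    a = [sum(q[k] for q in quad) / T for k in range(3)]            # E|y + mu g|^2
    b = [0.0] * 5
    for q in quad:
        b = padd(b, pmul(q, q))
    b = [v / T for v in b]                                          # E|y + mu g|^4
    c = [sum(u * u for u in y) / T,
         sum(2 * u * v for u, v in zip(y, g)) / T,
         sum(v * v for v in g) / T]                                 # E[(y + mu g)^2]
    cc = [v.real for v in pmul(c, [v.conjugate() for v in c])]     # |E[(y + mu g)^2]|^2
    e = padd(b, [-v for v in cc])
    # K = e / a^2 - 2, so dK/dmu = (e' a - 2 e a') / a^3; the numerator has degree <= 4.
    p = padd(pmul(pder(e), a), [-2 * v for v in pmul(e, pder(a))])
    scale = max(abs(v) for v in p)
    while len(p) > 1 and abs(p[-1]) <= 1e-9 * scale:
        p.pop()  # drop leading coefficients that cancel numerically
    if len(p) < 2:
        return 0.0
    cands = [r.real for r in poly_roots(p) if abs(r.imag) < 1e-6]
    if not cands:
        return 0.0
    return max(cands, key=lambda mu: abs(kurtosis([u + mu * v for u, v in zip(y, g)])))


# Two independent QPSK sources (all 16 symbol pairs); y and g are 45-degree rotations.
qpsk = [1, 1j, -1, -1j]
s1 = [a for a in qpsk for _ in qpsk]
s2 = [b for _ in qpsk for b in qpsk]
r = 2 ** -0.5
y = [r * (a + b) for a, b in zip(s1, s2)]
g = [r * (a - b) for a, b in zip(s1, s2)]
mu = optimal_step(y, g)
print(round(abs(mu), 6))  # 1.0: y + mu g isolates one source
```

In this constructed example the kurtosis at μ = 0 is -0.5, while the selected roots μ = ±1 each remove one source and reach |K| = 1, so a single step of the line search already separates the pair.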
RobustICA therefore estimates the ith source signal with an optimal step size at the lth iteration, for each (wth) frequency bin and qth frame, as follows:
• Step 1) Initialize the weight vector u(w).
• Step 2) Compute the coefficients of the optimal step-size polynomial. For the kurtosis contrast, this polynomial is

$$p(\mu) = \sum_{k=0}^{4} a_k\, \mu^k \qquad (4.32)$$

where the coefficients a_k are obtained at each iteration from the observed signal block and the current values of u and g. Details can be found in [7], [11].
• Step 3) Extract the roots μ_k of the optimal step-size polynomial. The roots can be obtained with Ferrari's formula as in [134].
• Step 4) Select the optimal step-size root as

$$\mu_{opt}^{l} = \arg\max_{\mu_k} \left|K\left(y_i^{l}(q,w) + \mu_k\, g_i^{l}(q,w)\right)\right| \qquad (4.33)$$

• Step 5) Update the weight vector:

$$u_i^{l+1} = u_i^{l} + \mu_{opt}^{l}\, g_i^{l} \qquad (4.34)$$

where g_i^l is the ith gradient of the kurtosis contrast K(·) at the lth iteration, so that the update moves along the same line that is searched in (4.33).
• Step 6) Normalize the weight vector:

$$u_i^{l+1} = \frac{u_i^{l+1}}{\left\|u_i^{l+1}\right\|} \qquad (4.35)$$

where ‖u‖ denotes the norm of u.
• Step 7) Return to Step 2 until convergence.

When the learning converges to a previously extracted source, the old and new weight vectors point in the same direction and the absolute value of their dot product approaches 1. To prevent this locking, the deflation method proposed in [97] keeps different vectors from converging to the same maximum: each vector of U = {u_1, u_2, …, u_n} is orthogonalized before each iteration. Based on Gram-Schmidt orthogonalization, the deflation scheme estimates one independent component at each iteration step. The Gram-Schmidt orthogonalization of the (i+1)th component can be expressed as

$$u_{i+1}^{l+1} = u_{i+1}^{l} - \sum_{j=1}^{i} \left(u_j^{l\,H}\, u_{i+1}^{l}\right) u_j^{l} \qquad (4.36)$$

$$u_{i+1}^{l+1} = \frac{u_{i+1}^{l+1}}{\left\|u_{i+1}^{l+1}\right\|} \qquad (4.37)$$

where the new weight vector u_{i+1} is obtained by subtracting from the old weight vector its projections onto the previously extracted vectors.
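The orthogonalization (4.36)-(4.37) for complex weight vectors uses the Hermitian inner product u_j^H u_{i+1}. Below is a minimal sketch with hypothetical names, where `previous` holds the already extracted unit vectors u_1, ..., u_i.

```python
import math


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def gram_schmidt(u_new, previous):
    """(4.36)-(4.37): remove projections onto previously extracted vectors, then normalize."""
    w = list(u_new)
    for uj in previous:
        proj = inner(uj, w)  # u_j^H u_{i+1}
        w = [wi - proj * uji for wi, uji in zip(w, uj)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in w))
    return [c / norm for c in w]


r = 2 ** -0.5
u1 = [r, 1j * r]                       # previously extracted unit vector
u2 = gram_schmidt([1.0, 0.0], [u1])
print(u2)  # approximately [0.7071, -0.7071j]
```

Using the plain transpose instead of the conjugate transpose would leave a residual component along u1 for complex vectors, so the Hermitian form is the one that guarantees orthogonality here.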
The following steps summarize the presented algorithm:
Start
Compute the time-frequency representation as in (4.4).
For each frequency bin w = 1, …, T/2:
    Preprocess the observed data x(q,w), imposing the Tikhonov regularization parameter to avoid ill-conditioning of the covariance matrix and to mitigate performance degradation:

    $$\alpha = m \cdot \mathrm{tr}\left(E\left[x(q,w)\, x^H(q,w)\right]\right) \qquad (4.38)$$

    where m is a positive constant and tr(·) denotes the trace of the estimated covariance matrix of the observed signals.
    Initialize the K x K matrix U(w) to the identity matrix I, where K is the number of users.
    For each user k = 1, …, K:
        Initialize the column vector u_k of U(w).
        While not converged:
            Evaluate y_k(q,w) as in (4.28).
            Select the optimal step-size root μ_k as in (4.33).
            Update the weight vector as in (4.34).
            Orthogonalize and normalize as in (4.36) and (4.37), respectively.
            Check for convergence; if converged, exit the while loop.
        Extract the kth user:

        $$y_k(q,w) = u_k^H(w)\, z_k(q,w) \qquad (4.39)$$

        Deflate by subtracting the estimated kth source contribution from z_k(q,w) [97]:

        $$z_{k+1}(q,w) = z_k(q,w) - h\, y_k(q,w) \qquad (4.40)$$

        where h is the source direction estimated via least squares:

        $$h = \frac{z_k(q,w)\, y_k^H(q,w)}{y_k(q,w)\, y_k^H(q,w)} \qquad (4.41)$$

        Save u_k in U(w).
    End the loop over k.
    Save the demixing matrix U(w).
End the loop over w.
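The deflation step (4.40)-(4.41) amounts to a least-squares fit of each sensor sequence to the extracted source. The sketch below assumes z is stored as K sensor sequences over the frames of one bin; the function name and the test signals are hypothetical.

```python
def deflate(z, y):
    """(4.40)-(4.41): least-squares direction h = z y^H / (y y^H) and residual z - h y.

    z: list of K sensor sequences (length T each); y: extracted source sequence.
    """
    energy = sum(abs(v) ** 2 for v in y)
    h = [sum(zk[t] * y[t].conjugate() for t in range(len(y))) / energy for zk in z]
    residual = [[zk[t] - hk * y[t] for t in range(len(y))] for zk, hk in zip(z, h)]
    return h, residual


y = [1, 1j, -1, -1j]
o = [1, -1, 1, -1]  # interference orthogonal to y (sum of o * conj(y) is 0)
z = [[2 * a + b for a, b in zip(y, o)], [-a for a in y]]
h, res = deflate(z, y)
print(h)  # approximately [2, -1]
```

Because the interference is orthogonal to y, the fit recovers the exact contribution of y on each sensor and the residual keeps only the remaining source.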

4.5 Scaling and Permutation Ambiguities

Let U(w) be the unitary matrix computed at each bin. The least-squares estimate of the mixing matrix H(w) is then

$$H_{LS}(w) = x(q,w)\, y^H(q,w) \left(y(q,w)\, y^H(q,w)\right)^{+} \qquad (4.42)$$

where

$$y(q,w) = U(w)\, x(q,w) \qquad (4.43)$$

and (·)^+ denotes the Moore-Penrose pseudo-inverse. The estimated mixing matrix H_LS(w) can be expressed in terms of the true mixing matrix H(w) as

$$H_{LS}(w) = H(w)\, D^{-1}(w)\, \Gamma^{-1}(w) \qquad (4.44)$$

where D(w) is an unknown diagonal matrix and Γ(w) is an unknown permutation matrix. Therefore, D(w) and Γ(w) must be estimated to resolve the scaling and permutation ambiguities.

4.5.1 Estimation of the Diagonal Matrix D(w)

Several methods to compensate for the scale ambiguity have been proposed in the literature. Here, we estimate the diagonal matrix D(w) using the minimal distortion principle [3], [38], [129]. D(w) is given in [3] as

$$D(w) = \mathrm{diag}\left[A\, H_{LS}(w)\right] \qquad (4.45)$$

$$D(w) = \mathrm{diag}\left[A\, x(q,w)\, y^H(q,w)\left(y(q,w)\, y^H(q,w)\right)^{+}\right] \qquad (4.46)$$

where A ∈ ℝ^{n×m} is a matrix whose entries are all 1/m, m is the number of observations, n is the number of sources, and diag[C] returns a matrix that contains the diagonal elements of C and sets all off-diagonal elements to zero. The interpretation of (4.46), in the sense of perfect separation, is that each estimated source is averaged over the sensors as if all other sources were turned off. In other words, the minimal distortion principle assumes that the nth source is scaled with respect to its image at the nth microphone [129]. Therefore, the rescaled source signals can be expressed as

$$y^{rescaled}(q,w) \cong D(w)\, y(q,w) \qquad (4.47)$$

$$y^{rescaled}(q,w) \cong \mathrm{diag}\left[A\, x(q,w)\, y^H(q,w)\left(y(q,w)\, y^H(q,w)\right)^{+}\right] y(q,w) \qquad (4.48)$$
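The minimal distortion rescaling (4.42)-(4.48) can be sketched for n = 2 sources, where the pseudo-inverse in (4.42) reduces to an ordinary 2x2 inverse (assuming y y^H is nonsingular). The example uses a hypothetical mixing matrix and deliberately mis-scaled separated outputs; the function names are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix (stands in for the pseudo-inverse)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def minimal_distortion(X, Y):
    """(4.42), (4.45)-(4.47) for n = 2: returns D (diagonal entries) and D y.

    X: m observed sequences; Y: 2 separated sequences (same length).
    """
    m, n, T = len(X), len(Y), len(Y[0])
    XYh = [[sum(X[i][t] * Y[j][t].conjugate() for t in range(T)) for j in range(n)]
           for i in range(m)]
    YYh = [[sum(Y[i][t] * Y[j][t].conjugate() for t in range(T)) for j in range(n)]
           for i in range(n)]
    H_ls = matmul(XYh, inv2(YYh))
    # diag[A H_LS] with A full of 1/m: average each column of H_LS over the sensors.
    D = [sum(H_ls[k][i] for k in range(m)) / m for i in range(n)]
    return D, [[D[i] * Y[i][t] for t in range(T)] for i in range(n)]


qpsk = [1, 1j, -1, -1j]
s1 = [a for a in qpsk for _ in qpsk]
s2 = [b for _ in qpsk for b in qpsk]
X = [[a + 0.5 * b for a, b in zip(s1, s2)],   # hypothetical mixing [[1, 0.5], [0.2, 1]]
     [0.2 * a + b for a, b in zip(s1, s2)]]
Y = [[2 * a for a in s1], [0.5 * b for b in s2]]  # separated but arbitrarily scaled
D, Y_rescaled = minimal_distortion(X, Y)
print(D)  # approximately [0.3, 1.5]
```

Each rescaled output equals the average over the two sensors of that source's image: 0.6 s_1 = ((1 + 0.2)/2) s_1 and 0.75 s_2 = ((0.5 + 1)/2) s_2, independent of the arbitrary scales 2 and 0.5 left by the separation stage.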

4.5.2 Estimation of the Permutation Matrix Γ(w)

Although several recent works in the literature have proposed methods to estimate the permutation matrix Γ(w), it remains a very challenging problem. With n source signals in the BSS problem, there are n! possible permutations at each bin, which yields a complex combinatorial problem.
As mentioned previously, several techniques to solve the permutation problem exist in the literature [3]. In this chapter, we review and evaluate them in terms of computational complexity and performance.
These methods can be divided into two main groups for solving the permutation ambiguity in the frequency domain:
1. Methods based on geometric information, such as the Time Difference of Arrival (TDOA) and the Direction of Arrival (DOA) [3], [37], [38], [50].
2. Methods based on clustering techniques [64], [67], [69], [70].
Many of these techniques are based on geometric information, such as estimation of the direction of arrival (DOA) and the time difference of arrival (TDOA), as in [50]. Other techniques depend on the coherence of the unmixing filter coefficients. In other words, these techniques take advantage of prior knowledge about the mixing filters and constrain the mixing matrix H(f) to be continuous in the frequency domain [130]. Furthermore, in [62], Parra imposes smoothness on the demixing filter values in the frequency domain.
In addition, the frequency-domain update rule is restricted so that it corresponds to a filter of limited length in the time domain. Such a restriction may not be appropriate in reverberant environments, since a long filter is necessary to cover all reverberations.
Although this can be avoided by choosing a large frame size, doing so increases the overall complexity, especially when only short mixtures are available. Based on the properties of speech, other categories of methods have been proposed in the literature to estimate the permutation matrix and perform the spectral alignment.
The most common is based on the inter-frequency correlation of speech envelopes [61], [65]. The inter-frequency correlation technique exploits the nature of speech production: all spectral components of a speech signal increase as the talker speaks louder. Several weighting techniques and criteria have been proposed to impose frequency coupling between adjacent frequency bins; for more details, see [60], [61], [65]. Although these techniques perform well in simulations, they are less effective when applied to real recording rooms because they suffer from error propagation. For example, if an error occurs at a certain frequency bin, it increases the likelihood of errors at the following frequency bins.
Therefore, the methods in [3], [38] avoid error propagation by estimating a frequency-independent reference profile, called a centroid, for each separated source using a clustering-based method, and then permuting the n frequency-dependent profiles at each frequency bin so that each matches a different frequency-independent reference profile.
The main steps of the clustering-based techniques are as follows:
1) Define the quantities used in the clustering, such as the signal envelopes or the log-power of the source profiles.
2) Choose the measure used to determine the matching level between the centroids and the profiles, such as correlation or distance.
3) Choose the clustering technique.

In [69], the profile Ψ_i(q,w) of a separated signal y_i is chosen to be its envelope, Ψ_i(q,w) = |y_i(q,w)|. In [70], the profile Ψ_i(q,w) of a separated signal y_i is chosen to be a certain dominance measure, whereas in [67] it is defined as the centered log-power spectral density, where the log-power profile is given by

$$\Psi_i(q,w) = \log\left[W_{i,:}(w)\, R_x(q,w)\, W_{i,:}^H(w)\right] \qquad (4.49)$$

In clustering-based approaches, the length T_f of the profiles is also an important parameter for accuracy, especially for short signals. In practice, the profiles are computed over overlapping frames spanning the whole signal. Once the profiles of the separated signals are obtained, the centroids are computed in order to perform the clustering.
Clustering-based techniques rely on the assumption that profiles coming from the same source at different frequency bins match each other more closely than profiles coming from other sources.
The most common methods for associating each source profile with a centroid at each frequency bin either 1) maximize correlation measures [69], [70], or 2) minimize distance measures across the n! possible permutations at each frequency bin [67]. These methods update the centroids and the permutation matrices iteratively: they update the centroids first, and then permute the source profiles to match each centroid using one of the two measures, i.e., distance [67] or correlation [69], [70].
Although these iterative methods perform well, they are significantly more expensive in computational complexity, since they evaluate the n! possible permutations at each frequency bin. To avoid this drawback, Nion [60] proposes a more efficient clustering strategy that updates all permutation matrices and centroids at once; that is, the updates of the centroids and of the permutation matrices are not interleaved. This modification reduces the computational complexity of the iterative methods. The method can be summarized as follows:
Step 1. Determine and compute the centroids as follows.
Consider the $n \times T_f$ matrix $\mathcal{F}(w)$ formed from the $n$ profiles $\Psi_i(w)$, $i = 1, \dots, n$. Concatenating the matrices $\mathcal{F}(w)$ for all $w = 1, \dots, F$ yields the $Fn \times T_f$ matrix $G(w)$. To ensure that the $Fn$ profile points in $G(w)$ vary smoothly with time, the profiles must be computed over overlapping frames. These $Fn$ profile points are then classified into $n$ clusters by applying the k-means algorithm to the $Fn \times T_f$ matrix $G(w)$, which yields a frequency-independent $n \times T_f$ centroid matrix $M = [m_1^T, m_2^T, \dots, m_n^T]^T$. Each centroid is obtained by summing all the points of its cluster, i.e., the points at minimum distance from that centroid.

The k-means algorithm also returns the list of indices assigned to each of the $n$ clusters. In our simulations, approximately $F$ points are assigned to each cluster, which indicates that the aforementioned property of speech holds. Therefore, only the frequency-independent $n \times T_f$ centroid matrix $M$ is needed in the subsequent computations, which reduces the computational complexity.
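The clustering in Step 1 can be sketched as follows. This is a minimal NumPy illustration, not Nion's implementation in [60]: it assumes the $F$ per-bin matrices $\mathcal{F}(w)$ are stacked in an array of shape (F, n, T_f), uses plain Lloyd k-means with squared Euclidean distance and a farthest-point initialization, and takes the mean of each cluster as its centroid. The function name `profile_centroids` and its parameters are hypothetical.

```python
import numpy as np

def profile_centroids(profiles, n_iter=100, seed=0):
    """Cluster the F*n profile points into n clusters with k-means.

    profiles: array of shape (F, n, T_f), one n x T_f profile matrix per bin.
    Returns the n x T_f centroid matrix M and the cluster label of every point.
    """
    F, n, T = profiles.shape
    G = profiles.reshape(F * n, T)        # Fn x T_f matrix of profile points
    rng = np.random.default_rng(seed)
    # farthest-point initialization: start from a random point, then repeatedly
    # add the point farthest from the centroids chosen so far
    idx = [int(rng.integers(F * n))]
    for _ in range(1, n):
        d = ((G[:, None, :] - G[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    M = G[idx].copy()
    for _ in range(n_iter):
        # assign every point to its nearest centroid (squared Euclidean distance)
        d = ((G[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its points (keep it if empty)
        new_M = np.array([G[labels == c].mean(axis=0) if np.any(labels == c) else M[c]
                          for c in range(n)])
        if np.allclose(new_M, M):
            break
        M = new_M
    return M, labels
```

When the profiles of the different sources are well separated, each bin contributes one point to each cluster, so every cluster receives about $F$ points, in line with the observation above.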

Step 2. Estimating the permutation matrices
The previous step reduces the computations to finding, at each frequency bin, the $n \times n$ permutation matrix $\Gamma(w)$ such that $G(w)\Gamma(w)$ matches the frequency-independent $n \times T_f$ centroid matrix $M$. One can therefore minimize the distance given in [67]:

$$\min_{\Gamma(w)} \left\| M - G(w)\,\Gamma(w) \right\|_F^2, \qquad \forall\, w = 1, 2, \dots, F \qquad (4.50)$$

or maximize the correlation criterion given in [69], [70]:

$$\max_{\Gamma(w)} \sum_{i=1}^{n} \Phi\left\langle m_i,\ [G(w)\,\Gamma(w)]_{:,i} \right\rangle, \qquad \forall\, w = 1, 2, \dots, F \qquad (4.51)$$

where $\Phi\langle \cdot, \cdot \rangle$ is the correlation coefficient.
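Because $M$ is fixed, (4.50) or (4.51) can be solved independently at each bin by exhaustive search over the $n!$ permutations, which is cheap for the small $n$ considered here. The sketch below is a hypothetical illustration (`align_bin` and `align_all` are not taken from [60], [67], [69] or [70]); it stores profiles as rows of $n \times T_f$ arrays and permutes rows, which is the row-wise counterpart of $G(w)\Gamma(w)$ in (4.50) and (4.51).

```python
import itertools
import numpy as np

def align_bin(M, Fw, criterion="distance"):
    """Best row permutation of Fw with respect to the centroids M (exhaustive search).

    M, Fw: arrays of shape (n, T_f); row i of Fw is the profile of output i at
    one frequency bin. criterion: "distance" (cf. (4.50)) or "correlation" (cf. (4.51)).
    """
    n = M.shape[0]
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n)):
        P = Fw[list(perm)]                       # candidate reordering of the profiles
        if criterion == "distance":
            score = -np.sum((M - P) ** 2)        # minus the squared Frobenius distance
        else:
            score = sum(np.corrcoef(M[i], P[i])[0, 1] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm

def align_all(M, profiles, criterion="distance"):
    """Solve each bin independently; M is not re-estimated between bins."""
    return [align_bin(M, Fw, criterion) for Fw in profiles]
```

A returned tuple `perm` means that output `perm[i]` of that bin is assigned to centroid `i`.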

In terms of performance, the first group generally outperforms the second group, especially when only a small data sample is available. However, it is not optimal in practice, since geometric information about the real environment is usually unavailable. In that sense, the second group performs better than the first, especially when a large data set is available, because its methods are based on clustering techniques (e.g., correlation or distance measures) and are more robust in real-world scenarios. For more details, refer to [3], [38].


4.6 Experimental Results
In this section, we examine the performance of the RobustICA-based algorithm developed in this chapter. The time-frequency representation of the observed data is computed with the Short-Time Fourier Transform (STFT), as explained earlier. Next, the demixing matrix is estimated for each frequency bin, and the scale and permutation ambiguities are then resolved using the techniques described above.
This section is divided into two subsections. First, we illustrate the performance of the RobustICA-based algorithm with different permutation methods from the literature [3], [38], and we study the effect of the window type and of the overlap parameter on its performance.

Figure 4.1: Configuration of the two experimental setups conducted by Francesco Nesta in [130]: (a) the room used for Test1; (b) the classroom used for Test2.

Second, we evaluate the presented algorithm in two real-world scenarios recorded under adverse conditions in [130], and compare it with other state-of-the-art methods, [130], [36], [62] and [67], labeled “RR-ICA”, “IVA”, “Parra” and “Pham”, respectively. In this chapter, performance is evaluated with the BSS_EVAL toolbox proposed in [126], [127], using time-invariant filters of 1024 taps to compute the signal-to-interference ratio (SIR) and the source-to-distortion ratio (SDR).
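For intuition about what SIR and SDR measure, the following sketch computes BSS_EVAL-style ratios in the simplest case, where only a time-invariant gain (a 1-tap filter) is allowed instead of the 1024-tap filters used here, and noise is ignored. It is an illustrative simplification, not the BSS_EVAL toolbox itself; the function name `sir_sdr_gain_only` is hypothetical.

```python
import numpy as np

def sir_sdr_gain_only(estimate, sources, j):
    """Simplified BSS_EVAL-style SIR and SDR (dB) for one estimated source.

    estimate: shape (N,); sources: shape (n, N) true sources; j: target index.
    Only a gain is allowed on each source (1-tap distortion filters).
    """
    s_j = sources[j]
    # projection onto the target source: the part counted as "target"
    s_target = (estimate @ s_j) / (s_j @ s_j) * s_j
    # least-squares projection onto the span of all sources
    coeffs, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    p_all = sources.T @ coeffs
    e_interf = p_all - s_target          # contribution of the other sources
    e_artif = estimate - p_all           # everything not explained by the sources
    e_total = e_interf + e_artif
    sir = 10 * np.log10((s_target @ s_target) / (e_interf @ e_interf))
    sdr = 10 * np.log10((s_target @ s_target) / (e_total @ e_total))
    return sir, sdr
```

For example, an estimate equal to the target plus 0.1 times an orthogonal, equal-energy interfering source gives SIR = SDR = 20 dB; adding an artifact that lies outside the span of the sources lowers SDR but leaves SIR unchanged.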

4.6.1 Section 1

In this subsection, we study the computational complexity and the performance of the presented algorithm combined with each of the following methods for solving the scale and permutation ambiguities of frequency-domain BSS:
Method1 is the RobustICA-based algorithm with clustering of envelope profiles, using a distance measure and an iterative procedure [8].
Method2 is the RobustICA-based algorithm with clustering of log-power profiles, using a correlation measure and an iterative procedure [21].
Method3 is the RobustICA-based algorithm with clustering of envelope profiles, using a distance measure and a k-means procedure [8], [60].
Method4 is the RobustICA-based algorithm with clustering of log-power profiles, using a correlation measure and a k-means procedure [21], [60].
Method5 is the RobustICA-based algorithm with clustering of dominance profiles, using a correlation measure and an iterative procedure [22].
Method6 is the RobustICA-based algorithm with clustering of dominance profiles, using a correlation measure and a k-means procedure [22], [60].

[Figure: SIR versus signal length (0.5 s, 1 s, 2 s, 4 s, 9 s) for methods 1 to 6.]
Figure 4.2: Results obtained in the Test1 experiments. SIR performance of the presented algorithm with various permutation solvers.

In this subsection, we use the real-world recordings, named Test1, from the experiments conducted in [130]. We would like to thank the authors, who provided these recordings on their website “http://bssnesta.webatu.com/testhscma.html”. The two sources were recorded at $f_s = 16$ kHz with two microphones spaced $d = 0.02$ m apart to avoid spatial aliasing. The room had a moderate reverberation time of 160 ms and dimensions of 3.5 m × 5.1 m × 2.6 m, as shown in Figure 4.1. The signal duration was fixed to 9 s.

[Figure: SDR versus signal length (0.5 s, 1 s, 2 s, 4 s, 9 s) for methods 1 to 6.]
Figure 4.3: Results obtained in the Test1 experiments. SDR performance of the presented algorithm with various permutation solvers.

Figures 4.2 and 4.3 show the performance of the RobustICA-based algorithm with the aforementioned permutation solvers in terms of SIR and SDR, respectively. In comparison, the dominance profiles are more robust to the signal length than the other profiles, whereas the envelope profiles are more sensitive to the signal length than the log-power profiles. Moreover, the dominance-profile approach performs the same with the iterative procedure as with the k-means procedure. Figure 4.4 shows the corresponding CPU time that each permutation method needs to solve the permutation ambiguity.

[Figure: CPU time versus signal length (0.5 s, 1 s, 2 s, 4 s, 9 s) for methods 1 to 6.]
Figure 4.4: Corresponding CPU time for each method.
Based on these observations, we use the dominance-profile approach with the iterative procedure after the RobustICA-based algorithm in the rest of the experiments. Figure 4.5 illustrates the impact of the window type on the SIR and SDR of the proposed algorithm, and Figure 4.6 shows its performance versus the overlap parameter. The best performance was achieved within a certain range of overlap ratios. Based on these results, we use an overlap of 0.65 with a Hamming window.
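The chosen analysis parameters translate into a frame shift as follows. This minimal sketch computes a Hamming-windowed STFT with an overlap ratio of 0.65, assuming a frame length of 1024 samples (a hypothetical choice here, since the FFT length is varied in Figure 4.11), which gives a hop of about 358 samples. The function name `stft` is hypothetical.

```python
import numpy as np

def stft(x, n_fft=1024, overlap=0.65):
    """Hamming-windowed STFT; returns an array of shape (n_fft // 2 + 1, n_frames)."""
    hop = max(1, int(round(n_fft * (1 - overlap))))   # frame shift in samples
    window = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop             # assumes len(x) >= n_fft
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T               # rows: frequency bins, columns: frames
```

For a 1 s signal at 16 kHz, this gives 42 frames of 513 frequency bins; each row of the output is one frequency bin, on which the per-bin demixing is then carried out.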

[Figure: SDR, SIR and SAR for the different window types.]
Figure 4.5: Results obtained in the Test2 experiments. Performance (SDR, SIR and SAR) of the presented algorithm with various window types.

[Figure: SDR, SIR and SAR versus overlap ratio (0 to 0.95).]
Figure 4.6: Results obtained in the Test2 experiments. Performance (SDR, SIR and SAR) of the presented algorithm with various overlap ratios.


4.6.2 Section 2

In this subsection, we separate two mixture observations of two sources. We use both real-world recordings, Test1 and Test2, from the experiments conducted in [130] (see Figure 4.1). Test2 uses real-world recordings made under adverse reverberant conditions, as shown in Figure 4.1: the room is a reverberant classroom with dimensions 4.75 m (length) × 5.92 m (width) × 4.5 m (height), and the reverberation time is around $T_{60} = 700$ ms. The performance is averaged over ten random pairs of sources, and the signal duration was fixed to 9 s. After obtaining the demixing matrix $W$ for each frequency bin, we used the inverse Fourier transform to obtain the demixing filters in the time domain.
The competing algorithms were configured as follows:
IVA (independent vector analysis) [17]: step size 0.1 and 1000 iterations.
Parra’s method [38]: 1000 iterations.
Pham’s algorithm [12], [20]: FFT overlap of 75% and a window size of 5.
RR-ICA: the algorithm reported in [130].
Figures 4.7 and 4.8, and Figures 4.9 and 4.10, summarize the performance of the presented algorithm versus the other algorithms considered in [130] for the Test1 and Test2 configurations, respectively. Each graph reports the best performance of each algorithm over the FFT sizes tested. The RobustICA-based algorithm outperforms the other algorithms for every signal length tested, in terms of both SIR and SDR.
Moreover, Figure 4.11 illustrates the impact of the FFT length on the SIR of the proposed algorithm. The presented algorithm performs well relative to the other algorithms, especially for moderate FFT lengths.
These results suggest that the presented algorithm is stable under high reverberation and under variations of the observation parameters, and that it converges quickly. Owing to the optimal step size and the deflation and regularization techniques, the presented algorithm is robust and performs well even in adverse conditions. Therefore, it is well suited to solving the convolutive BSS problem for real-world recordings in adverse conditions.
[Figure: SIR versus signal length (0.5, 1, 2, 4, 9 s) for the proposed method, RR-ICA, IVA, Parra's method and Pham's method.]
Figure 4.7: Results obtained in the Test1 experiments [130]. Best SIR performance of the given algorithms for different signal lengths.

[Figure: SDR versus signal length (0.5, 1, 2, 4, 9 s) for the proposed method, RR-ICA, IVA, Parra's method and Pham's algorithm.]
Figure 4.8: Results obtained in the Test1 experiments [130]. Best SDR performance of the given algorithms for different signal lengths.

[Figure: SIR versus signal length (0.5, 1, 2, 4, 9 s) for the proposed method, RR-ICA, IVA, Parra's method and Pham's algorithm.]
Figure 4.9: Results obtained in the Test2 experiments [130]. Best SIR performance of the given algorithms for different signal lengths.

[Figure: SDR versus signal length (0.5, 1, 2, 4, 9 s) for the proposed method, RR-ICA, IVA, Parra's method and Pham's algorithm.]
Figure 4.10: Results obtained in the Test2 experiments [130]. Best SDR performance of the given algorithms for different signal lengths.

[Figure: SIR versus FFT length (512 to 16384) for the proposed method, RR-ICA, IVA, Parra's method and Pham's algorithm.]
Figure 4.11: Impact of the FFT length (2-by-2 case). Results obtained in the Test2 experiments [130].


4.7 Conclusion
This chapter presented the RobustICA-based algorithm for solving the frequency-domain BSS problem for convolutive acoustic mixtures under several adverse conditions. Through real-world experiments, we showed that the presented algorithm outperforms other popular algorithms in the literature in terms of performance and computational complexity. Moreover, we compared several permutation solvers in terms of computational complexity and performance in order to provide the RobustICA-based algorithm with an efficient frequency-dependent permutation scheme. We also studied the effect of several parameters on the separation performance, including the window type, and showed that the performance improves within a certain range of frame overlap. Finally, we showed that the system works efficiently with around 0.5–10 seconds of input data, which is close to real-time operation. Accordingly, the proposed algorithm is suitable for real-time operation and for a large number of applications that require it.


Chapter 5

Robust Blind Multiuser Detection Algorithm Using Fourth Order Cumulant Matrices

A new blind detection algorithm based on fourth-order cumulant matrices is presented and applied to the multiuser symbol estimation problem in Direct Sequence Code Division Multiple Access (DS-CDMA) systems. Blind detection aims to estimate multiple symbol sequences in the downlink of a DS-CDMA communication system using only the received wireless data, without any knowledge of the user spreading codes. The proposed algorithm exploits properties of higher-order cumulant matrices to reduce the computational load and enhance performance. Bit error rate (BER) simulations of this algorithm are shown for different numbers of users, signal-to-noise ratios (SNR) and numbers of symbols per user, in comparison with the FastICA and RobustICA algorithms. The results show that the proposed algorithm outperforms both ICA-based detectors in estimating the symbol signals from the received mixed signals. Moreover, the proposed blind detector is computationally fast and converges quickly when extracting user symbols.

5.1 Introduction
The performance of communication systems hinges on fast and reliable data/symbol transfer among users; ideally, data should reach each user with as few errors as possible. The Code Division Multiple Access (CDMA) family of systems continues to be the most widely deployed and popular multiple access scheme, mainly because of its soft multiple-access characteristics, robustness against fading and anti-interference capability. In schemes based on non-orthogonal multiple access, such as direct sequence code division multiple access (DS-CDMA), multiple-access interference (MAI) is the factor that limits capacity. To alleviate MAI, a variety of multiuser detectors/receivers have been proposed; however, most of them require either the spreading sequence of the desired user or a training (pilot) sequence. When neither is readily available, due to computational delays or time constraints, extracting the broadcast information generally becomes a blind source separation (BSS) problem [74-80].
Conventional, or single-user detection (SUD), methods treat MAI as external noise. An alternative approach models the structure of MAI, as in the work on optimum multiuser detection [76]. However, it has been shown that these detectors either require complete knowledge of the MAI and training data, or involve a long decoding delay [74]. To overcome these limitations, several classes of efficient blind detectors have been proposed in the literature. The blind detection techniques in the wireless communication literature, e.g., [75], primarily use the second-order statistics (SOS) and some higher-order statistics (HOS) of the received data. Independent Component Analysis (ICA) is a statistical technique that exploits HOS; its goal is to represent a set of random variables as a linear transformation of statistically independent components [78]. ICA-based techniques rely on the assumption that the sources are non-Gaussian and independent. For example, the FastICA algorithm is applied to symbol detection in DS-CDMA systems in [77], although its convergence is not guaranteed. RAKE-ICA is proposed in [75]. More recently, blind multiuser detection based on Tikhonov regularization has been proposed in [76]; it requires prior knowledge of the signature sequence and timing of the desired user.
ICA-based algorithms have been among the most active approaches to solving BSS problems during the last decade [1], [2], with numerous applications in speech, image and biomedical signal processing. Several approaches have been established for constructing ICA algorithms [9]–[20], one of which is based on information theory. The Information Maximization (Infomax) algorithm proposed in [18], for example, is derived from HOS. It extracts the source signals from the mixed signals, but at a high computational cost. To reduce this complexity, the FastICA algorithm, based on an approximation of negentropy and a Newton iteration, was subsequently developed.
A comparative study of ICA-based algorithms for the DS-CDMA detection problem [75], [88] shows that FastICA performs comparatively well in extracting the unknown symbols; however, it also involves high computational complexity and exhibits slow convergence.
Recently, Zarzoso et al. [11] proposed an ICA-based algorithm with a new contrast function that avoids permutation ambiguity and achieves quantitatively better separation quality than the so-called conventional ICA algorithm (C-ICA). Furthermore, Zarzoso and Comon [1] introduced a line search into the iterative maximization of the kurtosis contrast function to make FastICA more robust and more computationally efficient. Using an algebraically computed optimal step size, they showed that the extraction quality is better than that of FastICA. The resulting RobustICA algorithm extracts components efficiently, primarily when the number of components (sources) is small. There also exist nonlinear ICA-based algorithms that separate dependent source signals [89], as well as algorithms that can extract only one signal [90].
This chapter presents a blind DS-CDMA detection technique based on fourth-order cumulant matrices [21], [34]. This technique is effective in multiuser CDMA environments because no prior information about the users' codes is required at the receiver. The approach is blind because the spreading codes of all the users, the characteristics of the environment and the transmitted symbols are all assumed unknown. Simulation scenarios are carried out to observe the bit error rate as a function of (i) the signal-to-noise ratio, (ii) the number of users and (iii) the number of symbols per user. Furthermore, the performance of the proposed algorithm is quantified, and the three ICA-based algorithms are compared in terms of performance and computational complexity. The remainder of the chapter is organized as follows. Section 5.2 briefly describes the DS-CDMA signal model with multipath fading. Section 5.3 briefly discusses Robust and Fast Independent Component Analysis (ICA) and their signal model. Section 5.4 proposes the new detector based on fourth-order cumulant matrices. Comparative simulation results are given in Section 5.5, followed by the conclusions.

5.2 DS-CDMA Signal Model
Synchronous downlink CDMA is employed, for example, in Evolved High-Speed Packet Access (HSPA+) in 4G systems, where it keeps the transmitted bandwidth constant regardless of the bit rate and thereby solves the problem of non-symmetric user bandwidths. Synchronous DS-CDMA has been used in satellite systems, indoor ATM and certain ad hoc wireless networks because of its attractive features, namely anti-interference capability and robustness against fading. DS-CDMA systems allow several users to share the medium simultaneously, each using its own signature (spreading code). A synchronous DS-CDMA system assigns shorter spreading codes to higher-rate users and longer spreading codes to lower-rate users, while keeping the chip rate constant. A typical DS-CDMA system model is given, e.g., in [75].
The simplest received signal model $r(t)$ is

$$r(t) = \sum_{m=1}^{M} \sum_{k=1}^{K} \sum_{l=0}^{L} \alpha_{lm}\, b_{k,m}\, s_k(t - mT_b - d_l T_c) + n(t) \qquad (5.1)$$

where
• $l$, $k$, $m$ are the path, user and symbol indices, respectively.
• $\alpha_{lm}$ is the path gain. In the downlink, all users' coded signals are transmitted together, so the path gain $\alpha_{lm}$ and the propagation delay factor $d_l$ are the same for all users and depend only on the signal paths.
• $b_{k,m}$ is the $m$th symbol of the $k$th user.
• $s_k(\cdot)$ is the spreading code of the $k$th user.
• $d_l$ is the propagation delay factor, $d_l \in \{0, 1, \dots, \frac{C-1}{2}\}$ (assumed to span at most half of the code sequence of length $C$).
• $t$, $T_b$, $T_c$ are the time, the symbol duration and the chip duration, respectively.
• $n(t)$ is additive white Gaussian noise (AWGN).

The received signal is assumed to be properly sampled and synchronized discrete data. Let $G$ be the length of the code sequence, $K$ the number of users and $L$ the number of channel paths. The vector form of (5.1) then becomes

$$r = ASb + n \qquad (5.2)$$

where $r$ is the received signal vector, $A$ is a $(G + L - 1) \times G$ matrix representing the multipath propagation coefficients, $S$ is a $G \times K$ block-diagonal matrix, $b$ is a $K$-dimensional vector of data symbols, and $n$ is the $(G + L - 1)$-dimensional channel noise vector with covariance matrix $Q$. The received signal model (5.2) is suitable for deriving linear symbol detectors such as the matched filter (MF), the RAKE receiver, the LMMSE detector and the blind detectors based on the FastICA and RobustICA algorithms.
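The model (5.2) can be simulated in the single-path case ($L = 1$, so that $A$ reduces to the $G \times G$ identity). The sketch below is a minimal illustration under assumptions that the text does not state: random $\pm 1$ spreading codes (instead of Gold codes), BPSK symbols, and an SNR defined as unit code energy over the per-chip noise variance. The function `simulate_downlink` is hypothetical.

```python
import numpy as np

def simulate_downlink(G=31, K=8, M=1000, snr_db=10.0, seed=0):
    """Synchronous single-path downlink, r = S b + n, for M symbol intervals.

    Returns r (G x M received chips), S (G x K unit-energy codes) and
    b (K x M BPSK symbols). Random +/-1 codes stand in for Gold codes.
    """
    rng = np.random.default_rng(seed)
    S = rng.choice([-1.0, 1.0], size=(G, K)) / np.sqrt(G)   # column k: code of user k
    b = rng.choice([-1.0, 1.0], size=(K, M))                # i.i.d. BPSK symbols
    sigma = 10 ** (-snr_db / 20)                             # per-chip noise std
    n = sigma * rng.standard_normal((G, M))                  # AWGN
    return S @ b + n, S, b
```

Each column of `r` is one received symbol interval. A blind detector only sees `r`, while `S` and `b` are kept for evaluation; with known codes, for instance, a decorrelating detector `np.sign(np.linalg.pinv(S) @ r)` recovers the symbols at high SNR.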

5.3 Robust and Fast Independent Component Analysis (ICA)
A simple Blind Source Separation (BSS) model with $n$ source signals and $n$ observations is defined as follows. Let the sources form the vector

$$S = [s_1, s_2, \dots, s_n]^T \qquad (5.3)$$

and let the observations form the vector

$$X = [x_1, x_2, \dots, x_n]^T \qquad (5.4)$$

The (static) linear BSS model is

$$X = AS + N \qquad (5.5)$$

where $A$ is an $n \times n$ (invertible) mixing matrix and the vector $N$ is additive Gaussian noise. ICA assumes that at most one source signal has a Gaussian distribution.
The main idea of ICA is to recover the source signals from the observed signals without a priori knowledge of the source signals or the mixing matrix. To achieve this, an ICA-based algorithm iteratively computes a weighting matrix $W$ that incrementally approximates the inverse of the mixing matrix $A$. The estimated source signal $Y$ is thus given by

$$Y = WX \qquad (5.6)$$

where

$$W = [w_{jk}], \quad 1 \le j, k \le n \qquad (5.7)$$

Since this is a linear transformation, one can estimate one of the independent components in the form $wX$, where $w$ is a row vector of the matrix $W$ in (5.7). The estimation can be achieved by maximizing the non-Gaussianity of $wX$.
Generally speaking, the FastICA algorithm entails two steps: the preprocessing step, and finding the rotation matrix $W$ that maximizes the non-Gaussianity of $Y = WX$.

5.3.1 The Preprocessing

This involves two computations. The first centers the mixed signal matrix $X$ by removing its mean:

$$\bar{X} = X - E[X] \qquad (5.8)$$

where $\bar{X}$ is the zero-mean version of $X$ and $E[\cdot]$ is (an estimate of) the expectation operator. The second whitens the centered mixed signals so that they have unit variance. To that end, the singular value decomposition (SVD) [29] is applied to the covariance matrix of $\bar{X}$:

$$C_x = E[\bar{X}\bar{X}^T] = EDE^T \qquad (5.9)$$

where $E$ is the orthogonal matrix of eigenvectors of $C_x$ and $D$ is the diagonal matrix of its eigenvalues:

$$D = \mathrm{diag}(d_1, d_2, \dots, d_n) \qquad (5.10)$$

The whitening of $\bar{X}$ is thus expressed as

$$Z = (D^{-1/2}E^T)\,\bar{X} = V\bar{X} \qquad (5.11)$$

where $V = D^{-1/2}E^T$ is the whitening matrix that pre-multiplies $\bar{X}$. Equation (5.11) shows that the centered matrix $\bar{X}$ is linearly transformed into the matrix $Z$. The covariance matrix of $Z$ equals the identity matrix $I$ [2]; in other words, the components of $Z$ are uncorrelated.
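The centering and whitening of (5.8) to (5.11) can be sketched in a few lines of NumPy. This is an illustrative sketch using the sample mean and sample covariance. It uses a symmetric eigendecomposition (`numpy.linalg.eigh`), which for a covariance matrix yields the same factors $E$ and $D$ as the SVD mentioned above. The function name `whiten` is hypothetical.

```python
import numpy as np

def whiten(X):
    """Center the n x N data matrix X and whiten it with V = D^{-1/2} E^T."""
    Xc = X - X.mean(axis=1, keepdims=True)       # (5.8): remove the sample mean
    Cx = Xc @ Xc.T / Xc.shape[1]                 # (5.9): sample covariance
    d, E = np.linalg.eigh(Cx)                    # eigenvalues d, orthonormal eigenvectors E
    V = np.diag(1.0 / np.sqrt(d)) @ E.T          # (5.11): whitening matrix
    return V @ Xc, V
```

After this step, the sample covariance of the returned `Z` is the identity, so the remaining ICA step only needs to find a rotation.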

5.3.2 The FastICA Algorithm

The batch FastICA algorithm [25], [26] maximizes a measure of non-Gaussianity by means of a fixed-point iteration. The Central Limit Theorem (CLT) [2] implies that a sum of (many) independent sources tends toward a Gaussian distribution. Consequently, for whitened data, an independent source can be found by finding the direction vector $w$ that gives the projected component maximum non-Gaussianity. Moreover, if only one of the independent components (ICs) is Gaussian, the ICA algorithm can still estimate the ICs. The authors of [25], [26] use FastICA, a fixed-point iteration scheme, to find the maximum of the non-Gaussianity. The basic FastICA scheme for one independent component is outlined as follows:
• Choose an initial value for the weight vector $w$.
• Compute the updated weight vector

$$w^+ = E\{Z\,[g(wZ)]^T\} - E\{g'(wZ)\}\,w \qquad (5.12)$$

where $g$ is a non-quadratic function such as $g(y) = y^3$, $y = wZ$, and $g'$ is the derivative of $g$.
• Normalize the weight vector:

$$w^+ = \frac{w^+}{\|w^+\|} \qquad (5.13)$$

where $\|\cdot\|$ is the norm operator.
• Return to the second step until convergence.
When the old and new vectors $w$ point in the same direction, the update has converged and the absolute value of their dot product is close to 1. In the same context, the deflation method proposed in [1], [25] prevents different vectors from converging to the same solution. It orthogonalizes the vectors of $W = \{w_1, w_2, \dots, w_n\}$ by Gram-Schmidt orthogonalization, so that the deflation scheme estimates one independent component at a time. The Gram-Schmidt orthogonalization of the $(k+1)$th component can be expressed as

$$w_{k+1}^+ = w_{k+1} - \sum_{j=1}^{k} (w_{k+1}^T w_j)\, w_j \qquad (5.14)$$

$$w_{k+1} = \frac{w_{k+1}^+}{\|w_{k+1}^+\|} \qquad (5.15)$$

i.e., the new weight vector $w_{k+1}$ is obtained by subtracting its projections onto the previously estimated weight vectors and normalizing the result.
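The one-unit update (5.12) and (5.13) with $g(y) = y^3$ and the Gram-Schmidt deflation (5.14) and (5.15) can be combined as in the following minimal sketch. It assumes already whitened data $Z$ (one row per component) and treats $w$ as a 1-D array. Because, with $g(y) = y^3$, the sign of $w$ may flip between iterations, convergence is tested on the absolute dot product of successive vectors. `fastica_deflation` is a hypothetical name, not the implementation of [25], [26].

```python
import numpy as np

def fastica_deflation(Z, n_components, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA with g(y) = y**3 and Gram-Schmidt deflation.

    Z: whitened data, shape (n, N). Returns W with one unit-norm row per component.
    """
    rng = np.random.default_rng(seed)
    n, N = Z.shape
    W = np.zeros((n_components, n))
    for k in range(n_components):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w @ Z
            # (5.12) with g(y) = y^3 and g'(y) = 3 y^2
            w_new = (Z * y ** 3).mean(axis=1) - 3 * (y ** 2).mean() * w
            # (5.14): remove projections onto the k vectors already found
            w_new -= W[:k].T @ (W[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)            # (5.13) / (5.15)
            converged = abs(w_new @ w) > 1 - tol      # same direction up to sign
            w = w_new
            if converged:
                break
        W[k] = w
    return W
```

The estimated components are `W @ Z`; they are recovered up to order and sign, as is usual in ICA.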


5.3.3 The RobustICA Algorithm

Recently, Zarzoso and Comon [1] improved the robustness of the FastICA algorithm by using an exact line search along the gradient direction to choose the optimal step size

$$\mu_{opt} = \arg\max_{\mu} |K(w + \mu g)|$$

where $g$ is the gradient of the kurtosis contrast function $K(\cdot)$. At each iteration, the RobustICA algorithm computes the optimal step size as follows [11]:
Choose an initial value for the weight vector $w$.
Compute the coefficients of the optimal step-size polynomial. For the kurtosis contrast function, this polynomial is

$$p(\mu) = \sum_{k=0}^{4} a_k \mu^k \qquad (5.16)$$

where the coefficients $a_k$ are obtained at each iteration from the observed signal block and the current values of $w$ and $g$. Details can be found in [7].
Compute the roots $\mu_k$ of the optimal step-size polynomial. The roots can be obtained using Ferrari's formula [36].
Select the optimal step-size root as

$$\mu_{opt} = \arg\max_{\mu_k} |K(w + \mu_k g)| \qquad (5.17)$$

Update the weight vector:

$$w^+ = w + \mu_{opt}\, g \qquad (5.18)$$

Normalize the weight vector:

$$w^+ = \frac{w^+}{\|w^+\|} \qquad (5.19)$$

where $\|\cdot\|$ is the norm operator.
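The exact line search can be illustrated for real-valued data with the normalized kurtosis $K(y) = E[y^4]/E[y^2]^2 - 3$. Along the line $w + \mu g$, both $E[y^4]$ and $E[y^2]$ are polynomials in $\mu$, and the degree-5 terms in the numerator of $dK/d\mu$ cancel, which leaves a degree-4 polynomial as in (5.16). This sketch finds the roots with `numpy.roots` (a companion-matrix eigenvalue method) rather than Ferrari's formula. The names `kurtosis` and `robustica_step` are hypothetical, and this is not the reference RobustICA code of [1], [11].

```python
import numpy as np

def kurtosis(y):
    """Normalized kurtosis K(y) = E[y^4] / E[y^2]^2 - 3 of a real, zero-mean signal."""
    m2 = np.mean(y ** 2)
    return np.mean(y ** 4) / m2 ** 2 - 3.0

def robustica_step(w, Z):
    """One iteration of a RobustICA-style exact line search on the kurtosis.

    Z: whitened real data, shape (n, N); w: current extracting vector, shape (n,).
    Returns the updated unit-norm vector.
    """
    y = w @ Z
    m2, m4 = np.mean(y ** 2), np.mean(y ** 4)
    # gradient of K with respect to w (real-valued case)
    g = 4.0 / m2 ** 2 * ((Z * y ** 3).mean(axis=1) - (m4 / m2) * (Z * y).mean(axis=1))
    a, b = y, g @ Z                      # outputs along the line: a + mu * b
    # E[(a + mu b)^4] and E[(a + mu b)^2] as polynomials in mu
    P4 = np.poly1d([np.mean(b ** 4), 4 * np.mean(a * b ** 3), 6 * np.mean(a ** 2 * b ** 2),
                    4 * np.mean(a ** 3 * b), np.mean(a ** 4)])
    P2 = np.poly1d([np.mean(b ** 2), 2 * np.mean(a * b), np.mean(a ** 2)])
    # dK/dmu = 0  <=>  P4' P2 - 2 P4 P2' = 0; the mu^5 terms cancel, leaving degree 4
    p = P4.deriv() * P2 - 2 * P4 * P2.deriv()
    roots = np.roots(p.coeffs[-5:])
    mus = roots[np.abs(roots.imag) <= 1e-8 * (1 + np.abs(roots.real))].real
    if mus.size == 0:
        return w / np.linalg.norm(w)
    # (5.17): keep the real root that maximizes |K(w + mu g)|
    mu_opt = max(mus, key=lambda mu: abs(kurtosis(a + mu * b)))
    w_new = w + mu_opt * g                           # (5.18)
    return w_new / np.linalg.norm(w_new)             # (5.19)
```

Because the step is chosen globally along the line rather than with a fixed learning rate, a few iterations typically suffice for well-conditioned data.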

5.4 The Proposed Detection Algorithm Based On Cumulant Matrices
In this section, a new blind detection strategy is proposed based on cumulant matrices, as used in the JADE, SHIBBS and JAD algorithms [2], [16], [17], [19]. First, recall the received signal in matrix form (5.2):

$$r = ASb + n$$

The aim is to detect the symbol vector $b$ from the received data $r$ under the following assumptions:
• AS1) The $G \times K$ matrix $H := AS$ has full column rank.
• AS2) The symbol signals $b$ are non-Gaussian, independent and identically distributed (i.i.d.).
• AS3) The additive noise vector is white and independent of the source signals.
• AS4) The power of the transmitted symbol signals is normalized to unity.
• AS5) In this chapter, all signals are real and the number of users $K$ is known.
The method involves two steps:

5.4.1 Step 1: Preprocessing (Data Whitening)

This is a common preprocessing step, after which the symbol signals are determined up to a unitary matrix using second-order statistics (SOS). The step reduces the noise and eliminates correlations among the data components. Under assumptions AS1, AS2, AS3 and AS4, the $G \times G$ covariance matrix $R$ of the noiseless transmitted signals can be expressed as

$$R = E[rr^T] - \sigma^2 I_G \qquad (5.20)$$

Substituting $r$ from (5.2) into (5.20) gives

$$R = H\,E[bb^T]\,H^T = HH^T \qquad (5.21)$$

Under AS1, $R$ can be decomposed as

$$R = V \Lambda V^T \qquad (5.22)$$

where $V$ is a $G \times G$ matrix of eigenvectors satisfying

$$VV^T = V^TV = I_G \qquad (5.23)$$

and $\Lambda$ is a $G \times G$ diagonal matrix of real eigenvalues. Thus, the $G \times K$ matrix $H$ can be expressed through its singular value decomposition (SVD) [29] as

$$H = V \Lambda^{1/2} U^T \qquad (5.24)$$

where $U$ is a $K \times K$ full-rank unitary matrix with $UU^T = I_K$. Since $H$ has rank $K$, only the $K$ nonzero eigenvalues in $\Lambda$ and the corresponding $K$ columns of $V$ enter (5.24) and (5.25).
The whitening step obtains $V$ so that the $K \times 1$ whitened data vector $Z$ has covariance equal to the identity matrix, $R_{ZZ} = I_K$. Specifically,

$$Z = \Lambda^{-1/2} V^T r \qquad (5.25)$$

$$Z = U^T b + \Lambda^{-1/2} V^T n \qquad (5.26)$$

Thus, the transmitted symbols can be recovered with a linear zero-forcing (ZF) equalizer, which gives the $K \times 1$ estimated symbol vector

$$\hat{b} = UZ \qquad (5.27)$$

After the preprocessing step, detecting the symbol signal $\hat{b}$ reduces to determining the $K \times K$ unitary (rotation) matrix $U$.
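In practice, $\sigma^2$, $\Lambda$ and $V$ must be estimated from data. The sketch below assumes, beyond what the text states, that $\sigma^2$ is estimated as the mean of the $G - K$ smallest eigenvalues of the sample covariance. It keeps only the $K$ dominant eigenpairs, so that $Z$ has $K$ entries per symbol interval as in (5.25). The function `cdma_whiten` is hypothetical.

```python
import numpy as np

def cdma_whiten(r, K):
    """Whiten G x M received data onto the K-dimensional signal subspace.

    Returns Z (K x M) and the K x G whitening matrix W, with Z = W r as in (5.25).
    """
    r = r - r.mean(axis=1, keepdims=True)          # zero-mean received signal
    Rr = r @ r.T / r.shape[1]                      # sample estimate of E[r r^T]
    lam, V = np.linalg.eigh(Rr)                    # eigenvalues in ascending order
    sigma2 = lam[:-K].mean() if lam.size > K else 0.0   # assumed noise-power estimate
    lam_s = lam[-K:] - sigma2                      # K largest eigenvalues of R, cf. (5.20)
    V_s = V[:, -K:]                                # corresponding eigenvectors
    W = np.diag(lam_s ** -0.5) @ V_s.T             # Lambda^{-1/2} V^T
    return W @ r, W
```

If the whitening is accurate, `W @ S` is (close to) the orthogonal matrix $U^T$ of (5.26), so that only a rotation remains to be estimated.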


5.4.2 Step 2: Determining the rotation matrix (unitary matrix) U.

One way of finding the rotation matrix $U$ is based on fourth-order cumulant matrices. To estimate $U$ in (5.27), we exploit the statistical independence of the equalized symbol vector. More precisely, $U$ is estimated by exploiting the fact that, for independent symbols, many elements of the fourth-order cumulant matrices (say, $Q$) are equal to zero [18]. The well-known Joint Approximate Diagonalization of Eigen-matrices (JADE) algorithm [18] is efficient and robust when separating a small number of sources. JADE uses both second-order decorrelation (the whitening process) and fourth-order cumulant matrices (the rotation process) to separate the source signals from the mixed signals. Hyvärinen et al. [18], [19] show that JADE is not efficient when the sources are numerous, because the size of the fourth-order cumulant set grows with the fourth power of the number of sources. This requires a large memory, in addition to incurring a high cost for the eigenmatrix calculation. Following [21], [34], however, one can retain the essence of JADE by using only a portion of the fourth-order cumulant matrices, up to the number of users $K$, and thus avoid the eigenmatrix calculation. The fourth-order cumulants of the independent zero-mean symbols $\hat b$, for $1 \le i, j, k, l \le K$, are

$$Q_{\hat b} = \mathrm{cum}(\hat b_i, \hat b_j, \hat b_k, \hat b_l) = E[\hat b_i \hat b_j \hat b_k \hat b_l] - E[\hat b_i \hat b_j]E[\hat b_k \hat b_l] - E[\hat b_i \hat b_k]E[\hat b_j \hat b_l] - E[\hat b_i \hat b_l]E[\hat b_j \hat b_k] \qquad (5.28)$$

Because of AS4, the symbols of vector $\hat b$ are assumed to have unit variance, and, being independent, they are uncorrelated. Thus, the fourth-order cumulants become

$$Q_{\hat b} = \mathrm{cum}(\hat b_i, \hat b_j, \hat b_k, \hat b_l) = E[\hat b_i \hat b_j \hat b_k \hat b_l] - \delta_{ij}\delta_{kl} - \delta_{ik}\delta_{jl} - \delta_{il}\delta_{jk} \qquad (5.29)$$

where $\delta_{ij}$ is the Kronecker delta, equal to 1 if $i = j$ and zero otherwise. Because the symbols $\hat b$ are independent, many of the cumulant elements are zero.
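The sample version of (5.28) can be computed directly. This sketch builds the full $K \times K \times K \times K$ cumulant tensor with `numpy.einsum`, which is only practical for small $K$ because of the $K^4$ growth discussed above. The function name `cumulant4` is hypothetical. For unit-variance BPSK symbols, the diagonal cumulants equal $E[b^4] - 3 = -2$ and the others are close to zero, while a rotated (mixed) version has clearly nonzero cross-cumulants.

```python
import numpy as np

def cumulant4(Y):
    """Sample fourth-order cumulant tensor cum(y_i, y_j, y_k, y_l), following (5.28).

    Y: array of shape (K, N). Returns a K x K x K x K array.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)                     # zero-mean data
    N = Y.shape[1]
    C2 = Y @ Y.T / N                                          # E[y_i y_j]
    M4 = np.einsum('in,jn,kn,ln->ijkl', Y, Y, Y, Y) / N       # E[y_i y_j y_k y_l]
    return (M4 - np.einsum('ij,kl->ijkl', C2, C2)
               - np.einsum('ik,jl->ijkl', C2, C2)
               - np.einsum('il,jk->ijkl', C2, C2))
```

For example, rotating two independent BPSK symbols by 45 degrees gives $\mathrm{cum}(z_1, z_1, z_2, z_2) \approx -1$, which illustrates why the whitened vector $Z$ generally has nonzero cross-cumulants while the symbols do not.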

Accordingly, the fourth-order cumulant sets of the symbols $\hat b$ (5.28) and of the whitened vector $Z$ (5.25) can be written, respectively, as

$$Q_{\hat b} = \mathrm{cum}(\hat b_i, \hat b_j, \hat b_k, \hat b_l) \begin{cases} \neq 0, & i = j = k = l \\ = 0, & \text{otherwise} \end{cases} \qquad \forall\, 1 \le i, j, k, l \le K \qquad (5.30)$$

$$Q_{Z} = \mathrm{cum}(Z_i, Z_j, Z_k, Z_l) \neq 0 \qquad \forall\, 1 \le i, j, k, l \le K \qquad (5.31)$$

Owing to this difference between (5.30) and (5.31), the symbols can be detected from the received signals. Here, the objective function proposed in [21] is used to determine the unitary matrix $U$ and then to estimate the symbol vector $\hat b$ in (5.27):

$$U:\ \left\{ C(U, \hat b) = \sum_{i,k=1}^{K} \mathrm{cum}^2(\hat b_i, \hat b_j, \hat b_k, \hat b_l),\ \ \forall\, 1 \le i = j,\ k = l \le K \right\} \quad \text{subject to} \quad \min\left(1 - \frac{\mathrm{trace}(U)}{K}\right) \qquad (5.32)$$

The iteration of the proposed algorithm stops when the unitary matrix $U$ is close to the identity matrix $I$, which can be expressed mathematically as

$$1 - \frac{\mathrm{trace}(U)}{K} \le \epsilon \qquad (5.33)$$

where $\epsilon$ is a threshold value, e.g., $\epsilon = e^{-3}$.
The steps of the proposed algorithm, based on the fourth-order cumulant matrices, are as follows:

• Pre-processing
  • Remove the mean of the received signal.
  • Compute the covariance matrix R.
  • Compute the whitened vector Z.
• Repeat
  • Calculate the cumulant matrices $Q_{\hat b}$ (29).
  • Call the joint diagonalization for U in (5.32).
• Continue until $1 - \operatorname{trace}(U)/K \le \epsilon$, where $\epsilon$ is the threshold.
• End
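The steps above can be sketched as follows for real-valued data. This is a minimal JADE-style illustration under our own assumptions, not the author's implementation: it uses all K² fourth-order cumulant matrices of the whitened data (rather than the reduced set described in the text), a Jacobi-rotation joint diagonalization in the style of Cardoso and Souloumiac's JADE, and the stopping rule (5.33) applied to the rotation accumulated during each sweep. The function names and the random mixing matrix are hypothetical.

```python
import numpy as np


def whiten(x):
    """Pre-processing: remove the mean, compute R, return whitened Z and W = R^(-1/2)."""
    x = x - x.mean(axis=1, keepdims=True)
    R = x @ x.T / x.shape[1]
    d, E = np.linalg.eigh(R)
    W = E @ np.diag(d ** -0.5) @ E.T
    return W @ x, W


def cumulant_matrices(Z):
    """Stack (K*K, K, K) of matrices Q[:, :, k, l] built from (29) with sample moments of Z."""
    K, M = Z.shape
    Q = np.einsum('im,jm,km,lm->ijkl', Z, Z, Z, Z) / M
    I = np.eye(K)
    Q -= (np.einsum('ij,kl->ijkl', I, I) + np.einsum('ik,jl->ijkl', I, I)
          + np.einsum('il,jk->ijkl', I, I))
    return Q.reshape(K, K, K * K).transpose(2, 0, 1)


def joint_diagonalize(A, eps=np.exp(-3), max_sweeps=100):
    """Jacobi joint diagonalization of the stacked symmetric matrices A (n, K, K)."""
    K = A.shape[1]
    V = np.eye(K)
    for _ in range(max_sweeps):
        U = np.eye(K)  # rotation accumulated over this sweep
        for p in range(K - 1):
            for q in range(p + 1, K):
                g1 = A[:, p, p] - A[:, q, q]
                g2 = A[:, p, q] + A[:, q, p]
                ton, toff = g1 @ g1 - g2 @ g2, 2.0 * (g1 @ g2)
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                c, s = np.cos(theta), np.sin(theta)
                G = np.eye(K)
                G[p, p], G[p, q], G[q, p], G[q, q] = c, -s, s, c
                A = G.T @ A @ G   # rotate every cumulant matrix at once
                U = U @ G
        V = V @ U
        if 1.0 - np.trace(U) / K <= eps:  # stopping rule (5.33)
            break
    return V


def estimate_symbols(x, eps=np.exp(-3)):
    """Whiten x, jointly diagonalize its cumulant matrices, return b_hat and the separator."""
    Z, W = whiten(x)
    V = joint_diagonalize(cumulant_matrices(Z), eps)
    return V.T @ Z, V.T @ W


rng = np.random.default_rng(0)
b = rng.choice([-1.0, 1.0], size=(3, 5000))  # K = 3 BPSK users (assumption)
A = rng.normal(size=(3, 3))                  # hypothetical real mixing matrix
b_hat, B = estimate_symbols(A @ b, eps=1e-10)
P = B @ A  # global matrix; close to a signed permutation when separation succeeds
```

The example passes a tighter threshold than $e^{-3}$ so that the Jacobi sweeps run closer to convergence; the recovered symbols are, as usual in ICA, defined only up to permutation and sign.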

5.5 Simulation Results

5.5.1 Performance

Simulated DS-CDMA downlink data in the presence of AWGN are used to verify the effectiveness of the proposed algorithm and to compare it with detectors based on the FastICA and RobustICA algorithms. Short Gold codes of length $N_C = 31$ chips were used as spreading codes, and the maximum number of users is K = 30. All users' signals were assumed to be transmitted with the same power. Monte Carlo simulations were executed to verify the validity and effectiveness of the proposed algorithm in comparison with the two other detectors. Figure 5.1 shows the simulated BER vs. SNR of the three detectors, with the parameters set as follows: number of symbols M = 3500, number of users K = 30, number of paths L = 1, and SNR values from -10 dB to 5 dB. Figure 5.1 demonstrates that the proposed algorithm, based on the fourth-order cumulant matrices (SJAD), improves the performance of the CDMA system, consistently producing the lowest BER of the three detectors. It is also observed that the FastICA and RobustICA detectors perform almost identically.
Figure 5.2 shows the simulated BER vs. SNR for various numbers of symbols M with 30 users (K = 30). The proposed algorithm performs more consistently as M increases, and it outperforms the RobustICA detector. Furthermore, Figure 5.2.c shows that the RobustICA detector, in contrast to the proposed method, performs poorly as M decreases. Figure 5.3 shows the simulated BER vs. SNR for a medium number of symbols, M = 3500, and various numbers of users K. The proposed method performs more consistently and improves as K decreases. It also mitigates the MAI and outperforms the other detectors, as shown in Figure 5.3. The FastICA detector also performs well as K decreases (Figure 5.3.b), in contrast to the RobustICA detector (Figure 5.3.c). In summary, the proposed detector performs well in estimating and recovering symbols in the DS-CDMA system.

[Figure 5.1 plot: BER (log scale, 10^0 to 10^-6) versus SNR (dB, -10 to 5) for the SJAD, FICA and RICA detectors]

Figure 5.1: Average BER as a function of SNR for 30 users and 100 runs.

[Figure 5.2 plots: BER (log scale, 10^0 to 10^-6) versus SNR (dB, -10 to 10) for M = 2000 to 10000 symbols in steps of 1000; (a) detector based on the SJAD algorithm, (b) detector based on the FastICA algorithm, (c) detector based on the RobustICA (RICA) algorithm]

Figure 5.2: Average BER as a function of SNR for different numbers of symbols M with 30 users and 100 runs. Black triangle-right lines: M = 10^4. Black circle lines: M = 9000. Black hexagram lines: M = 8000. Black square lines: M = 7000. Red triangle-up lines: M = 6000. Blue circle lines: M = 5000. Blue hexagram lines: M = 4000. Blue square lines: M = 3000. Blue triangle-right lines: M = 2000.

[Figure 5.3 plots: BER (log scale, 10^0 to 10^-8) versus SNR (dB, -10 to 8) for K = 10, 20, 30, 40 and 50 users; (a) detector based on the SJAD algorithm, (b) detector based on the FastICA algorithm, (c) detector based on the RobustICA (RICA) algorithm]

Figure 5.3: Average BER as a function of SNR for different numbers of users K with signal blocks of T = 3500 samples and 100 runs. Black triangle-right lines: K = 10. Black circle lines: K = 20. Red triangle-up lines: K = 30. Blue square lines: K = 40. Blue hexagram lines: K = 50.

5.5.2 Measure of Computation

Computational complexity is widely measured by the number of iterations an algorithm requires. This measure is not fully justifiable, because an algorithm may require only a few iterations to converge while each iteration involves heavy computation. Such a measure also ignores the fact that computation time depends on the actual algorithmic implementation. In this work, therefore, complexity is measured by the number of real-valued floating-point operations (flops) required for an algorithm to reach a solution. Flop-count details can be found in [7], [11]. As a natural measure of extraction quality, the average signal mean square error (SMSE) is employed as a contrast function of the independence criterion and is defined as follows:
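$$\mathrm{SMSE} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{SMSE}_{k,l(k)} \qquad (5.34)$$

where

$$\mathrm{SMSE}_{k,l} = E\left[\,\left|s_k - \alpha_l \hat s_l\right|^2\,\right]$$

and

$$\alpha_l = \frac{E[s_k \hat s_l^{*}]}{E[|\hat s_l|^2]}$$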
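The index (5.34) can be computed as sketched below. Two details are our assumptions, since the text does not fix them: the average is taken as $1/K$ over the K sources, and the estimate $l(k)$ paired with source k is the one giving the lowest $\mathrm{SMSE}_{k,l}$, which handles the permutation ambiguity of ICA. The function names are hypothetical.

```python
import numpy as np


def smse_pair(s_k, s_hat_l):
    """SMSE_{k,l} = E|s_k - alpha_l * s_hat_l|^2 with alpha_l = E[s_k s_hat_l^*] / E|s_hat_l|^2."""
    alpha = np.mean(s_k * np.conj(s_hat_l)) / np.mean(np.abs(s_hat_l) ** 2)
    return np.mean(np.abs(s_k - alpha * s_hat_l) ** 2)


def smse(S, S_hat):
    """Average SMSE over the K sources (rows of S); l(k) is the best-matching row of S_hat."""
    per_source = [min(smse_pair(s_k, s_hat) for s_hat in S_hat) for s_k in S]
    return float(np.mean(per_source))


def smse_db(S, S_hat):
    """SMSE in dB, as plotted in the quality-cost figures."""
    return 10.0 * np.log10(smse(S, S_hat))


rng = np.random.default_rng(0)
S = rng.choice([-1.0, 1.0], size=(2, 1000))          # two BPSK sources
S_perm = np.vstack([-2.0 * S[1], 0.5 * S[0]])        # permuted, rescaled copy: SMSE is 0
S_noisy = S + 0.1 * rng.normal(size=S.shape)         # noisy estimates: SMSE near 0.01
```

Because $\alpha_l$ absorbs any scale and sign, a permuted and rescaled copy of the sources scores zero, which is the behavior one wants from a separation index.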

To study the performance of the proposed detector and to compare the three detectors in terms of computational complexity, the SMSE performance index in (5.34) is averaged over 1000 independent realizations of the received data. The extracted symbols are computed directly from the observed received data. Figures 5.4 and 5.5 summarize the performance and complexity variations obtained for T = 3500 with different numbers of users K. As these figures show, the best and fastest performance is provided by the proposed algorithm based on fourth-order cumulant matrices, especially when K is relatively small. Moreover, the FastICA algorithm performs as well as the proposed algorithm as K increases, whereas the RobustICA detector performs poorly as K increases. In the same context, the performance of the algorithms is also evaluated for varying block sizes T.
[Figure 5.4 plot: separation quality-cost trade-off, real case, noiseless mixtures, T = 3500 samples; SMSE (dB, 0 to -40) versus complexity per source per sample (flops, 10^1 to 10^5) for SJAD, FastICA and RobustICA]

Figure 5.4: Average extraction quality as a function of computational cost for different mixture sizes K with signal blocks of T = 3500 samples and 1000 mixture realizations. Solid lines: K = 5. Dashed lines: K = 10. Dotted lines: K = 20.

[Figure 5.5 plot: separation quality-cost trade-off, real case, noiseless mixtures, T = 3500 samples; SMSE (dB, 0 to -30) versus complexity per source per sample (flops, 10^1 to 10^5) for SJAD, FastICA and RobustICA]

Figure 5.5: Average extraction quality as a function of computational cost for different mixture sizes K with signal blocks of T = 3500 samples and 1000 mixture realizations. Solid lines: K = 30. Dashed lines: K = 40.

Figure 5.6 shows the average SMSE curves for different numbers of users K and different block sizes T. The figure shows that the proposed detector is considerably more efficient than the other two detectors. Overall, the best performance is provided by the proposed detector, which achieves a given performance level at a lower cost.

[Figure 5.6 plots: separation quality-cost trade-off, real case, noiseless mixtures, for T = 1000, 2000, 3000, 4000 and 5000 samples; SMSE (dB, 0 to -30) versus complexity per source per sample (flops, 10^1 to 10^5) for SJAD, FastICA and RobustICA]

Figure 5.6: Average extraction quality as a function of computational cost for different mixture sizes K with different signal block sizes T and 1000 mixture realizations. Solid lines: K = 20. Dashed lines: K = 30. Dotted lines: K = 40.

5.6 Conclusion
In this chapter, we have investigated three adaptive algorithms for user detection in CDMA systems: the proposed algorithm based on fourth-order cumulant matrices, FastICA, and RobustICA. The results show that the proposed algorithm performs better than the other two user detectors. They also show that the proposed algorithm can mitigate multiple access interference (MAI), thus improving on the performance of conventional detection. Furthermore, the proposed detector shows the most consistent improvement as M (the number of symbols) increases. We also assessed the computational complexity of the three user-detection algorithms, using the average signal mean square error (SMSE) as a contrast function of the independence criterion. The results show that the proposed detector provides faster and more robust performance.


Chapter 6

Adaptive Blind Multiuser Detection for DS-CDMA
Based on a State-Space Approach

Code Division Multiple Access (CDMA) is a channel access method used by various radio technologies and is based on spread-spectrum technology. CDMA is used as the access method in many mobile standards, such as CDMA2000 and WCDMA. We address the problem of blind multiuser equalization in wideband CDMA systems in noisy multipath propagation environments. Herein, we propose three new blind receiver schemes based on state-space structures. These so-called blind state-space receivers (BSSR) do not require knowledge of the propagation parameters or of the users' spreading code sequences; instead, they rely on the assumption that the source signals are statistically independent. We also develop and derive three update laws to enhance the performance of the blind detector. Additionally, we upgrade three semi-blind adaptive detectors based on cooperation between the RAKE receiver and the stochastic gradient algorithms used in several blind adaptive signal processing methods, namely FastICA, RobustICA, and principal component analysis (PCA). Bit error rate (BER) simulations of these methods are shown for different numbers of users, signal-to-noise ratios (SNR) and numbers of symbols per user, in comparison with blind multiuser detectors (BMUD), the linear minimum mean squared error (LMMSE) detector and other conventional detectors. The results show that the proposed algorithms outperform the other detectors in estimating the symbol signals from the mixed CDMA received signals. Moreover, the new blind detectors mitigate multiple access interference (MAI) in CDMA.

6.1 Introduction
Code Division Multiple Access (CDMA) is a channel access method used by various radio technologies and is based on spread-spectrum technology, as in third-generation (3G) cellular telephony, terrestrial and satellite communication systems, and indoor wireless networks [1-2], [9]. Although LTE (4G) is in operation at many cellular companies inside and outside the U.S., the networks are still not fully built out, and LTE coverage is still not universal. Thus, most of the older 2G and 3G systems are still in service, or at least work in parallel with 4G; in the U.S., for example, AT&T and T-Mobile use GSM/WCDMA/HSPA, while Verizon, Sprint, and MetroPCS use cdma2000/EV-DO. Moreover, the LTE wireless interface is incompatible with 2G and 3G networks, so it must be operated on a separate wireless spectrum. Although 3G is expected to be replaced by 4G technologies eventually, it will take a long time before LTE coverage is fully developed and universal, especially in some countries such as India and Iraq [141-142].
Like any radio communication system, a CDMA system is interference limited, and it suffers from different types of interference, namely internal multiple access interference (MAI) due to the non-ideal cross-correlations between users' spreading sequences, narrow-band interference, inter-symbol interference (ISI), and noise at the receiver. These impairments degrade the performance of a CDMA system. In highly loaded systems, conventional detectors are an unsuitable choice, since most of them suffer from external interference sources such as adjacent channel interference or jamming, and they treat the interference as additional background noise. In practice, the primary source of interference in a CDMA system is MAI. This has motivated the development of numerous interference rejection techniques to overcome MAI and the near-far problem of the conventional receiver. Several state-of-the-art approaches, such as training-based systems, have been proposed in the literature to overcome this challenge. Also, most conventional detectors for CDMA signals are based on the second-order statistics of the user codes.
In CDMA systems, multiuser detection has been presented in several works to enhance channel capacity and mitigate MAI. Multiuser detection was first established to obtain an optimum multiuser detector for multiuser Gaussian channels [1]. Because the computational complexity of the optimal detector makes it unrealistic, several suboptimum detectors have also been proposed [6-8]. In [1] and [32-36], training sequence techniques were used to present suboptimal detectors, e.g., the adaptive linear detector and the zero-forcing detector. In [6], a suboptimal detector based on the linear minimum mean square error (LMMSE) method was proposed. In [8], X. Wang and H. Poor proposed the blind MMSE and blind decorrelating detectors. In [31-36], adaptive blind detectors were proposed based on minimum output energy combined with constrained optimization methods. Several subspace approaches have been proposed in the literature, e.g., [20], [36]. In [10], several types of group-blind linear detectors were proposed to enhance performance for uplink and downlink channels. The key idea of these detectors is to take advantage of a cross-correlation matrix constructed by exploiting the correlation between successive received signals. These detectors are computationally complex to implement, and they require knowledge of the timing and spreading waveform of the desired user.
The aforementioned techniques periodically require the user to send a training sequence known to the receiver, so that the receiver can estimate the parameters of the propagation channel, which arise from multiple reflections of the radio waves on obstacles such as buildings, cars, and trees. Furthermore, according to [42], 20% of the bandwidth is devoted to the training sequence in GSM and up to 40% in UMTS. In spite of the good performance of these techniques, their cost in terms of bandwidth tends to be significantly large. Adaptive signal processing techniques therefore seem more suitable for CDMA systems under highly dynamic conditions caused by the mobility of the terminal, the short-code case, and the random nature of channel access. In particular, blind adaptive techniques have been a hot topic over the last decade, because their potential to eliminate or reduce training sets helps ensure a high communication rate. Moreover, blind techniques are an attractive way to work alongside training-based systems, in order to 1) reduce the training sequence and 2) back up the training-based systems, especially in fast time-varying channels and under severe multipath fading. Blind techniques also help to recover the signals in other situations, such as 1) eavesdropping, where a training sequence is not available, and 2) when the receiver fails to keep track of the desired user. The underlying user symbol sequences are usually mutually independent; this is the key assumption that makes the CDMA system a suitable environment for blind techniques, such as information maximization [1] and minimum mutual information [6]. Moreover, the adaptive LMMSE detector has been proposed to avoid the complex matrix inversion operation, but this detector still needs the spreading codes of all users. Therefore, the MMSE detector might not be realistic in the downlink receiver, and it might also be insecure in the downlink case, although it seems more suitable for the uplink case. This chapter therefore aims to recover the source symbol sequences from the linear convolutive received mixture without any knowledge of the user spreading codes and without channel identification.
Simply put, this chapter proposes blind adaptive detectors based on state-space structures, using the natural gradient method, for multipath channels in CDMA systems. Three update laws are derived based on the state-space structures, and then three blind state-space receivers (BSSR) are developed for MAI and ISI suppression and symbol estimation. The second contribution of this chapter is three semi-blind adaptive algorithms based on cooperation between the RAKE receiver and the stochastic gradient algorithms used in several blind adaptive signal processing methods, namely FastICA, RobustICA, and principal component analysis (PCA). In addition, this chapter exploits higher-order statistics (HOS) to make the methods robust and secure against imperfect cross-correlation and the near-far problem, which are further drawbacks of conventional detection methods. Simulations are carried out to study and verify the effectiveness of the proposed methods for symbol estimation. We examine the bit error rate as a function of the signal-to-noise ratio, the number of users, and the number of symbols per user. Finally, the proposed algorithms are compared with conventional ones in terms of performance and computational complexity.
Throughout this chapter, lower-case letters denote scalars, bold lower-case letters denote vectors, and bold upper-case letters denote matrices. The following symbols are used:
$(\cdot)^T$ refers to the transpose operator;
$(\cdot)^H$ refers to the Hermitian transpose operator;
$\operatorname{trace}(\cdot)$ refers to the trace operator;
$j = \sqrt{-1}$ refers to the imaginary unit;
$\operatorname{diag}(\cdot)$ refers to a diagonal matrix;
$\operatorname{sgn}(\cdot)$ refers to the sign operator;
$E[\cdot]$ refers to statistical expectation.
The remainder of the chapter is organized as follows. Section 6.2 briefly describes and derives the synchronous CDMA signal model in multipath fading. Section 6.3 reviews conventional linear multiuser detectors, and Section 6.4 proposes the new detection schemes based on the state-space approach. Comparative simulation results and conclusions are given in the subsequent sections.

[Figure 6.1 block diagram: User 1 to User K data b_k(t), each with I and Q (×j) branches and spreading signal s_k(t), summed (∑) and passed through a multipath AWGN channel to give the received signal r(t)]

Figure 6.1: Signal generation model for a typical QPSK DS-CDMA system

6.2 DS-CDMA Signal Model
In this section, we briefly present the signal model for a DS-CDMA implementation that uses only one layer of spreading codes. Typical synchronous DS-CDMA systems are employed in indoor ATM and certain ad hoc wireless networks [74], [75]. In a DS-CDMA system, several users share the medium simultaneously by using their own signatures. The simplest model of the received signal r(t) before filtering in a symbol interval, as in Figure 6.1, is given by [75]
$$r(t) = \sum_{m=1}^{M} \sum_{k=1}^{K} \sum_{l=0}^{L} \alpha_{lm}\, b_{k,m}\, s_k\!\left(t - mT_b - d_l T_c\right) + n(t) \qquad (6.1)$$
where
• l, k, m are the path, user and symbol indices, respectively.
• $\alpha_{lm}$ is the path gain; in the downlink model, the path gain does not differ among users because all users' signals are sent together, so the path gain $\alpha_{lm}$ and the propagation delay factor $d_l$ depend only on the path.
• $b_{k,m}$ is the symbol.
• $s_k(\cdot)$ is the spreading code (chip sequence).
• $d_l$ is the propagation delay factor, $d_l \in \{0, 1, \dots, (C-1)/2\}$.
• C is the number of chips per symbol.
• $T_b$, $T_c$, and t are the symbol duration, the chip duration, and time, respectively.
• n(t) is additive white Gaussian noise (AWGN).

In this chapter, the system is assumed to be time-invariant, which means that the channel parameters vary much more slowly than the transmitted symbol rate. Let G be the length of the code sequence, K the number of users, and L the number of channel paths. The vector form of (6.1) then becomes

$$\mathbf r = \mathbf H \mathbf S \mathbf b + \mathbf n \qquad (6.2)$$

where $\mathbf r$ is the received signal vector, $\mathbf H$ is a $(G+L-1) \times G$ matrix that represents the multipath propagation coefficients, $\mathbf S$ is a $G \times K$ block diagonal matrix, $\mathbf b$ is a K-dimensional vector of data symbols, and $\mathbf n$ is the $(G+L-1)$-dimensional channel noise vector with covariance matrix $\mathbf Q$. The received signal model (6.2) is suitable for deriving linear symbol detectors such as the MF, the RAKE, the LMMSE and the blind detectors based on the FastICA and RobustICA algorithms [2], [11], [15]. In addition, an alternative linear convolutive signal model is proposed in [80], [81]:
$$\mathbf r_n = \mathbf H_0 \mathbf b_n + \mathbf H_1 \mathbf b_{n-1} + \mathbf n_n = \mathbf H \bar{\mathbf b}_n + \mathbf n_n \qquad (6.3)$$

where
• $\mathbf r_n$ is the received signal vector;
• $\mathbf b_n = [b_1(n), \dots, b_K(n)]^T$ contains the current bits of all users;
• $\mathbf H_0 = [\mathbf h_1, \dots, \mathbf h_K]$ is the signature matrix of the current bits of all users, including MAI, with
$$\mathbf h_k = \left[\mathbf 0^T,\ h_k(0),\ \dots,\ h_k(N - d_k - 1)\right]^T;$$
• $\mathbf H_1 = [\bar{\mathbf h}_1, \dots, \bar{\mathbf h}_K]$ is the signature matrix of the previous bits of all users, including ISI, with
$$\bar{\mathbf h}_k = \left[h_k(N - d_k),\ \dots,\ h_k(N + M - 1),\ \mathbf 0^T\right]^T;$$
• $\mathbf H = [\mathbf H_0,\ \mathbf H_1]$ is the signature matrix of all users;
• $\mathbf b_{n-1} = [b_1(n-1), \dots, b_K(n-1)]^T$ contains the previous bits of all users;
• $\bar{\mathbf b}_n = [\mathbf b_n^T,\ \mathbf b_{n-1}^T]^T$ contains the bits of all users;
• $\mathbf n_n = [n(nN), \dots, n(nN + N - 1)]^T$ is an independent white Gaussian noise vector.
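To make the block structure of (6.3) concrete, the sketch below builds $\mathbf H_0$ and $\mathbf H_1$ from chip sequences and a chip-spaced channel, then checks that $\mathbf H_0 \mathbf b_n + \mathbf H_1 \mathbf b_{n-1}$ reproduces the noiseless chip-level received signal. It assumes a downlink channel shared by all users, zero user delays ($d_k = 0$), a channel no longer than one symbol, and random ±1 codes in place of Gold codes; the channel taps and function name are hypothetical.

```python
import numpy as np


def signature_matrices(codes, h):
    """H0 (current bits) and H1 (previous bits) of (6.3) for a chip-spaced channel h.

    Assumptions: downlink, so every user sees the same taps h; user delays d_k = 0;
    len(h) - 1 <= N so that ISI only reaches the next symbol interval.
    """
    K, N = codes.shape
    H0 = np.zeros((N, K))
    H1 = np.zeros((N, K))
    for k in range(K):
        g = np.convolve(codes[k], h)   # channel-convolved signature, length N + len(h) - 1
        H0[:, k] = g[:N]               # part inside the current symbol interval
        H1[:len(g) - N, k] = g[N:]     # tail spilling into the next interval (ISI)
    return H0, H1


rng = np.random.default_rng(1)
K, N, n_sym = 4, 31, 6
codes = rng.choice([-1.0, 1.0], size=(K, N)) / np.sqrt(N)  # random codes (assumption)
h = np.array([1.0, 0.5, -0.2])                              # hypothetical 3-tap channel
b = rng.choice([-1.0, 1.0], size=(K, n_sym))

H0, H1 = signature_matrices(codes, h)
H = np.hstack([H0, H1])                   # H = [H0, H1]
chips = (codes.T @ b).T.reshape(-1)       # transmitted chip stream, symbol after symbol
y = np.convolve(chips, h)                 # noiseless chip-level received signal

n = 3
b_bar = np.concatenate([b[:, n], b[:, n - 1]])
print(np.allclose(y[n * N:(n + 1) * N], H @ b_bar))  # prints: True
```

Adding `sigma * rng.normal(size=N)` to each window would give the noise term $\mathbf n_n$.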

In uplink (asynchronous) CDMA systems, one can assume that the columns of $\mathbf H_0$ and $\mathbf H_1$ are linearly independent; therefore, $\mathbf H$ has full column rank 2K, as shown in [87], [89]. For downlink (synchronous) CDMA communication, in contrast, the full-rank condition on $\mathbf H$ is much more restrictive [91]. This chapter focuses on the synchronous CDMA communication system, although the proposed algorithms also work well in asynchronous CDMA systems [92], [88].

6.3 Conventional Blind Linear Multiuser Detectors
In this section, we briefly describe the conventional linear multiuser detectors, such as the matched filter (MF), the Rake receiver and the LMMSE detector, in a multipath environment. In addition, we briefly describe the group-blind detectors in [g], which include the blind decorrelating detector and the blind linear hybrid detector. For more details on these linear detectors, see [74], [86-95].

6.3.1 Single-User Detection (SUD) Detector

The SUD is a standard MF detector that exploits the user's signature to make the best estimate of the user's sequence from the data received at the MS. This detector completely ignores the MAI caused by the other users sharing the resources [74]. The MF detector for the ith user in a DS-CDMA system can be expressed as

$$\hat{\mathbf b}^{D}_{i,\mathrm{MF}} = \mathbf S_i^H \mathbf x \qquad (6.4)$$

where $\mathbf S_i = \operatorname{diag}(\bar{\mathbf s}_i, \bar{\mathbf s}_i, \dots, \bar{\mathbf s}_i)$, $\bar{\mathbf s}_i = [0\ 0\ \dots\ \mathbf s_i\ \dots\ 0]$, $\mathbf s_i$ is the ith user's signature code, $\mathbf x$ is the received data, and $\hat{\mathbf b}^{D}_{i,\mathrm{MF}}$ is the estimated DS-CDMA symbol vector; see [74].

6.3.2 Rake Detector

Perhaps the best-known special case of linear multiuser detection is the Rake detector, which consists of multiple chip-delayed SUD fingers in parallel. In this chapter, we implement the Rake receiver with knowledge of both the channel delays and the channel coefficients. The RAKE detector for a DS-CDMA system can be expressed mathematically as

$$\hat{\mathbf b}^{D}_{i,\mathrm{RAKE}} = \mathbf S_i^H \bar{\mathbf H}^H \mathbf x \qquad (6.5)$$

where $\bar{\mathbf H}$ is the estimated channel matrix and $\hat{\mathbf b}^{D}_{i,\mathrm{RAKE}}$ is the estimated ith user's symbol.
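Detectors (6.4) and (6.5) amount to correlating the received chips with the user's signature, either as transmitted (MF) or after convolution with the known channel (RAKE). The sketch below shows this for a single noiseless user with a hypothetical two-tap channel; with `h = [1.0]` the RAKE reduces to the MF. It does not model MAI, so it says nothing about multiuser performance.

```python
import numpy as np


def rake_detect(codes, h, y, i, n_sym):
    """RAKE decisions for user i, cf. (6.5): correlate the received chips with the
    channel-convolved signature g_i = s_i * h; h = [1.0] reduces this to the MF (6.4)."""
    N = codes.shape[1]
    g = np.convolve(codes[i], h)
    y = np.concatenate([y, np.zeros(len(g))])   # pad so the last window is complete
    stats = [g @ y[m * N:m * N + len(g)] for m in range(n_sym)]
    return np.sign(stats)


rng = np.random.default_rng(2)
N, n_sym = 31, 200
codes = rng.choice([-1.0, 1.0], size=(1, N)) / np.sqrt(N)  # single user (assumption)
b = rng.choice([-1.0, 1.0], size=(1, n_sym))
h = np.array([1.0, 0.3])                                   # hypothetical known channel

chips = (codes.T @ b).T.reshape(-1)
y = np.convolve(chips, h)                                  # noiseless multipath reception
b_rake = rake_detect(codes, h, y, 0, n_sym)
b_mf = rake_detect(codes, np.array([1.0]), chips, 0, n_sym)  # single path: plain MF
```

Each RAKE decision statistic collects the symbol energy from both paths, while the small overlap with neighboring symbols appears only through the channel tail.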

6.3.3 LMMSE Detector

While conventional linear detectors based on the least squares (LS), zero-forcing (ZF) and BLUE algorithms perform poorly, especially in the presence of colored noise, the LMMSE detector is considered one of the best linear detectors for DS-CDMA systems [74], [75]. The LMMSE detector can be expressed as

$$\hat{\mathbf b}^{D}_{i,\mathrm{LMMSE}} = \mathbf S_i^H \bar{\mathbf H}^H \left(\sigma^2 \bar{\mathbf H} \bar{\mathbf H}^H + \bar{\mathbf Q}\right)^{-1} \mathbf x \qquad (6.6)$$

where $(\sigma^2 \bar{\mathbf H} \bar{\mathbf H}^H + \bar{\mathbf Q}) = \mathbf R = E[\mathbf x \mathbf x^H]$ is the autocorrelation of the received data at the MS, and $\sigma^2$ is the average transmitted power. Implementing the LMMSE receiver has several drawbacks; the main one is that computing the inverse of the autocorrelation matrix $\mathbf R$ is very expensive. Instead of inverting $\mathbf R$, one can use its eigendecomposition as follows [5], [74]:

$$\hat{\mathbf b}^{W}_{i,\mathrm{LMMSE}} = \mathbf S_i^H \bar{\mathbf H}^H \left(\mathbf V_s \mathbf D_s^{-1} \mathbf V_s^H\right) \mathbf x \qquad (6.7)$$

where $\mathbf V_s$ contains the estimated eigenvectors of the autocorrelation matrix $\mathbf R$ and $\mathbf D_s$ contains the corresponding eigenvalues. Additionally, one can use adaptive algorithms to estimate the LMMSE user symbols, as in [135].
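The two forms (6.6) and (6.7) can be compared numerically. The sketch below assumes real BPSK symbols, a single-path channel (so the effective signatures are the columns of a hypothetical matrix `H`), and a sample estimate of $\mathbf R$; keeping all eigenpairs reproduces the direct inverse exactly, while keeping only the K signal eigenpairs gives the subspace form.

```python
import numpy as np


def lmmse_filter(H, R, i, n_signal=None):
    """Weight vector w_i such that b_hat_i = w_i^T r.

    n_signal=None: w_i = R^{-1} h_i, the direct inverse in (6.6).
    n_signal=K:    w_i = V_s D_s^{-1} V_s^T h_i, the eigendecomposition form of (6.7).
    """
    if n_signal is None:
        return np.linalg.solve(R, H[:, i])
    d, V = np.linalg.eigh(R)                    # eigenvalues in ascending order
    Vs, ds = V[:, -n_signal:], d[-n_signal:]    # keep the n_signal largest eigenpairs
    return Vs @ ((Vs.T @ H[:, i]) / ds)


rng = np.random.default_rng(3)
K, N, M, sigma = 4, 31, 2000, 0.05
H = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)  # effective signatures (single path)
b = rng.choice([-1.0, 1.0], size=(K, M))
r = H @ b + sigma * rng.normal(size=(N, M))
R = r @ r.T / M                                         # sample autocorrelation

w_inv = lmmse_filter(H, R, 0)
w_sub = lmmse_filter(H, R, 0, n_signal=K)
b_inv, b_sub = np.sign(w_inv @ r), np.sign(w_sub @ r)
```

The subspace form only needs the K dominant eigenpairs, which is what makes it attractive when K is much smaller than the spreading gain.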


6.4 The Proposed Detection Schemes Based on the State-Space Framework
In this section, a new blind detection strategy based on the state-space framework [75] is proposed. We propose three blind multiuser detectors, based on the feedforward structure, feedback structure I, and feedback structure II, respectively.
First, recall the received signal model (6.3):
$$\mathbf r_n = \mathbf H_0 \mathbf b_n + \mathbf H_1 \mathbf b_{n-1} + \mathbf n_n$$
The aim of this chapter is to detect the symbol vector $\mathbf b$ from the received data $\mathbf r$ under the following assumptions:
• AS1) The G×K matrices $\mathbf H_0$ and $\mathbf H_1$ are of full column rank.
• AS2) The symbol signals $\mathbf b$ are white, independent and identically distributed (i.i.d.).
• AS3) The additive noise vector is white and independent of the source signals.
• AS4) The power of the transmitted symbol signals is normalized to unity.
• AS5) The maximum lag of the entire multipath channel, $\max(\tau_l)$, is smaller than the spreading gain G of the CDMA system.
• AS6) The CDMA system is not over-saturated, which means that the number of users K is less than the spreading gain G.
• AS7) The channel is assumed to be slowly fading and wide-sense stationary.

Each method involves two steps: first, a preprocessing stage; second, a rotation stage based on the state-space structures. In the next subsection, we explain the preprocessing stage (whitening), and then we derive the three methods, one per state-space structure, in separate subsections.

6.4.1 Step 1: Preprocessing (Data Whitening)

In the preprocessing step, the symbol signals are detected up to a unitary matrix using second-order statistics (SOS). This step reduces the noise and eliminates redundancy in the data. Under assumptions AS1, AS2, AS3 and AS4, the G×G covariance matrix $\mathbf C$ of the noiseless transmitted signals can be expressed as

$$\mathbf C = E[\mathbf r \mathbf r^H] - \sigma^2 \mathbf I_G \qquad (6.8)$$

Without loss of generality, we consider a simpler two-tap model and then generalize it by induction. Substituting $\mathbf r$ into (6.8) under assumptions AS1-AS7, the covariance matrix $\mathbf C$ can be expressed as

$$\mathbf C = \mathbf H_0 E[\mathbf b \mathbf b^H] \mathbf H_0^H + \mathbf H_1 E[\mathbf b \mathbf b^H] \mathbf H_1^H = \mathbf H_0 \mathbf H_0^H + \mathbf H_1 \mathbf H_1^H \qquad (6.9)$$

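Under AS1, $\mathbf H_0 \mathbf H_0^H$ and $\mathbf H_1 \mathbf H_1^H$ can be decomposed, respectively, as

$$\mathbf C_0 = \mathbf V_0 \boldsymbol\Lambda_0 \mathbf V_0^H \qquad (6.10)$$

$$\mathbf C_1 = \mathbf V_1 \boldsymbol\Lambda_1 \mathbf V_1^H \qquad (6.11)$$

where $\mathbf V_0$ and $\mathbf V_1$ are K×K matrices satisfying

$$\mathbf V_0 \mathbf V_0^H = \mathbf V_0^H \mathbf V_0 = \mathbf I_K \qquad (6.12)$$

$$\mathbf V_1 \mathbf V_1^H = \mathbf V_1^H \mathbf V_1 = \mathbf I_K \qquad (6.13)$$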

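and $\boldsymbol\Lambda_0$ and $\boldsymbol\Lambda_1$ are K×K diagonal matrices containing the significant eigenvalues. From (6.3), the G×K matrices $\mathbf H_0$ and $\mathbf H_1$ can then be represented, respectively, as

$$\mathbf H_0 = \mathbf V_0 \boldsymbol\Lambda_0^{1/2} \mathbf U_0^H \qquad (6.14)$$

$$\mathbf H_1 = \mathbf V_1 \boldsymbol\Lambda_1^{1/2} \mathbf U_1^H \qquad (6.15)$$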
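where $\mathbf U_0$ and $\mathbf U_1$ are K×K full-rank unitary matrices, with $\mathbf U_0 \mathbf U_0^H = \mathbf I_K$ and $\mathbf U_1 \mathbf U_1^H = \mathbf I_K$. The whitening step obtains the matrices $\mathbf V_0$ and $\mathbf V_1$ such that the K×1 whitened data vector $\mathbf r_n^w$ has an identity covariance matrix, $\mathbf C = \mathbf I_K$, as follows:

$$\mathbf r_n^w = \boldsymbol\Lambda_0^{-1/2} \mathbf V_0^H \mathbf r_n + \boldsymbol\Lambda_1^{-1/2} \mathbf V_1^H \mathbf r_n \qquad (6.16)$$

Therefore,

$$\mathbf r_n^w = \mathbf U_0^H \mathbf b_n + \mathbf U_1^H \mathbf b_{n-1} + \left(\boldsymbol\Lambda_0^{-1/2} \mathbf V_0^H + \boldsymbol\Lambda_1^{-1/2} \mathbf V_1^H\right) \mathbf n_n \qquad (6.17)$$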
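A single-tap version of this whitening step can be sketched as follows: subtract the known noise power as in (6.8), keep the K dominant eigenpairs, and form $\boldsymbol\Lambda^{-1/2}\mathbf V^H \mathbf r$ as in (6.16). The sketch assumes $\mathbf H_1 = \mathbf 0$ and a known $\sigma^2$; it does not attempt the two-tap combination of (6.16). The random signature matrix is hypothetical. If whitening works, $\boldsymbol\Lambda^{-1/2}\mathbf V^H \mathbf H_0$ plays the role of $\mathbf U_0^H$ in (6.17) and should be close to orthogonal.

```python
import numpy as np


def whiten_received(r, K, sigma2):
    """Return r_w = Lambda^{-1/2} V^H r and the whitening matrix, keeping the K
    dominant eigenpairs of C = E[r r^H] - sigma^2 I, cf. (6.8) and (6.16)."""
    dim = r.shape[0]
    R = r @ r.conj().T / r.shape[1]           # sample estimate of E[r r^H]
    C = R - sigma2 * np.eye(dim)              # noise-free covariance (6.8)
    lam, V = np.linalg.eigh(C)                # ascending eigenvalues
    lam, V = lam[-K:], V[:, -K:]              # K significant eigenpairs
    W = np.diag(lam ** -0.5) @ V.conj().T
    return W @ r, W


rng = np.random.default_rng(4)
K, G, M, sigma = 3, 16, 20000, 0.1
H0 = rng.normal(size=(G, K)) / np.sqrt(G)     # hypothetical single-tap signature matrix
b = rng.choice([-1.0, 1.0], size=(K, M))
r = H0 @ b + sigma * rng.normal(size=(G, M))
r_w, W = whiten_received(r, K, sigma ** 2)
U_H = W @ H0    # analogue of U_0^H in (6.17): approximately orthogonal
```

After this step only a K×K rotation remains to be found, which is what the state-space structures below estimate.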
The transmitted symbols can then be recovered based on the state-space structures. After the preprocessing step, detecting the symbol signal $\hat{\mathbf b}_n$ reduces to determining the K×K unitary matrices $\mathbf U_k$ (rotation matrices). The derivations of the three proposed algorithms, based on the feedforward structure, feedback structure I and feedback structure II, respectively, are presented next.

6.4.2 Step 2a: Determining the rotation matrix (unitary matrix) U based on the feedforward structure
The output of the feedforward structure is given in [75] as follows:

$$\mathbf y_n = \mathbf U_0 \mathbf r_n^w + \sum_{k=1}^{K} \mathbf U_k \mathbf r_{n-k}^w \qquad (6.18)$$

Again, we consider a simpler two-tap model, for which the feedforward output is

$$\mathbf y_n = \mathbf U_0 \mathbf r_n^w + \mathbf U_1 \mathbf r_{n-1}^w \qquad (6.19)$$

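This convolutive filter can be rewritten in the following augmented form:

$$\begin{bmatrix} \mathbf y_n \\ \mathbf r_{n-1}^w \end{bmatrix} = \begin{bmatrix} \mathbf U_0 & \mathbf U_1 \\ \mathbf 0 & \mathbf I \end{bmatrix} \begin{bmatrix} \mathbf r_n^w \\ \mathbf r_{n-1}^w \end{bmatrix} \qquad (6.20)$$

Define

$$\tilde{\mathbf Y} = \begin{bmatrix} \mathbf y_n \\ \mathbf r_{n-1}^w \end{bmatrix}, \qquad \tilde{\mathbf U} = \begin{bmatrix} \mathbf U_0 & \mathbf 0 \\ \mathbf U_1 & \mathbf I \end{bmatrix}, \qquad \tilde{\mathbf R} = \begin{bmatrix} \mathbf r_n^w \\ \mathbf r_{n-1}^w \end{bmatrix}$$

Then (6.20) becomes

$$\tilde{\mathbf Y} = \tilde{\mathbf U}^T \tilde{\mathbf R} \qquad (6.21)$$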

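Based on the ICA algorithm [2], [13], [14], the update law for each weight column of the de-mixing matrix $\tilde{\mathbf U}$ is

$$\mathbf u^{+} = \mathbf u - \mu E\left[\tilde{\mathbf R}\, G\!\left(\mathbf u^T \tilde{\mathbf R}\right)\right] \qquad (6.22)$$

where $\mathbf u$ is a column vector of $\tilde{\mathbf U}$, $\mu$ is the step size and G is the score function. Here,

$$\mathbf u = \begin{bmatrix} \mathbf U_0 \\ \mathbf U_1 \end{bmatrix} \qquad (6.23)$$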

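Then

$$\begin{bmatrix} \mathbf u_0^{+} \\ \mathbf u_1^{+} \end{bmatrix} = \begin{bmatrix} \mathbf u_0 \\ \mathbf u_1 \end{bmatrix} - \mu \begin{bmatrix} \mathbf r_n^w \\ \mathbf r_{n-1}^w \end{bmatrix} G(\mathbf y_n) \qquad (6.24)$$

where $\mathbf u_0$ and $\mathbf u_1$ are the column vectors of $\mathbf U_0$ and $\mathbf U_1$, respectively. Therefore, the update laws for the individual columns are

$$\mathbf u_0^{+} = \mathbf u_0 - \mu \mathbf r_n^w G(\mathbf y_n) \qquad (6.25)$$

$$\mathbf u_1^{+} = \mathbf u_1 - \mu \mathbf r_{n-1}^w G(\mathbf y_n) \qquad (6.26)$$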

By induction, the update law for the kth lag element $\mathbf u_k$ is

$$\mathbf u_k^{+} = \mathbf u_k - \mu \mathbf r_{n-k}^w G(\mathbf y_n) \qquad (6.27)$$
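One stochastic step of (6.25)-(6.27) can be sketched as below. We read the updates in the $\tilde{\mathbf Y} = \tilde{\mathbf U}^T \tilde{\mathbf R}$ convention of (6.21), so column j of each $\mathbf U_k$ is the vector $\mathbf u_k$ that produces output j, and all columns are updated at once with an outer product. The cubic score $G(y) = y^3$ is a hypothetical choice (the text does not fix G), and the expectation is replaced by a single sample without any normalization step.

```python
import numpy as np


def cubic_score(y):
    """Hypothetical score function G(y) = y^3 (a kurtosis-type choice, not fixed by the text)."""
    return y ** 3


def feedforward_step(U, r_hist, mu, score=cubic_score):
    """One stochastic update of (6.25)-(6.27).

    U      : list [U_0, ..., U_L] of K x K matrices (column j produces output j, as in (6.21))
    r_hist : list [r_n^w, r_{n-1}^w, ..., r_{n-L}^w] of whitened K-vectors
    """
    y = sum(Uk.T @ rk for Uk, rk in zip(U, r_hist))   # output y_n, cf. (6.18) and (6.21)
    g = score(y)
    # u_k+ = u_k - mu * r_{n-k}^w * G(y_n), applied to every column of U_k
    U_new = [Uk - mu * np.outer(rk, g) for Uk, rk in zip(U, r_hist)]
    return U_new, y


U_new, y = feedforward_step([np.eye(2), np.zeros((2, 2))],
                            [np.array([1.0, 0.0]), np.array([0.0, 1.0])], mu=0.1)
```

In a full detector this step would be repeated over the received blocks, typically with some form of normalization to keep the rotation matrices bounded.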
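[Figure 6.2 block diagram: whitened input r_n^w enters a summing node (+) followed by W_0^{-1} to give y_n; feedback branches pass y_n through delays z^{-1}I and weights W_1, ..., W_k back to the summing node]

Figure 6.2: Feedback Demixing Structure I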

6.4.3 Step 2b: Determining the rotation matrix (unitary matrix) U based on feedback structure I
The output of feedback structure I, shown in Figure 6.2, is

$$\mathbf y_n = \mathbf U_0^{-1}\left(\mathbf r_n^w + \sum_{k=1}^{K} \mathbf U_k \mathbf y_{n-k}\right) \qquad (6.28)$$

For two taps, the feedback structure I model becomes

$$\mathbf y_n = \mathbf U_0^{-1}\left(\mathbf r_n^w + \mathbf U_1 \mathbf y_{n-1}\right) \qquad (6.29)$$

However, one can re-write the previous convolutive filter in the following
augmented form

188

൤

r୬୵
U 								Uଵ y୬
൨=ቂ ଴
ቃቂ
ቃ
y୬ିଵ
0													I y୬ିଵ

Or

(6.30)

y୬
U 								Uଵ ିଵ r୬୵
ቃ ൤
൨
ቂy ቃ = ቂ ଴
y୬ିଵ
୬ିଵ
0													I
y୬
ቂy ቃ =
୬ିଵ

ଵ

ୢୣ୲	(୙బ )

൤

I						 − Uଵ
r୵
൨൤ ୬ ൨
0													U଴ y୬ିଵ

(6.31)

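Define

$$\tilde{\mathbf{Y}} = \begin{bmatrix} \mathbf{y}_n \\ \mathbf{y}_{n-1} \end{bmatrix}, \qquad \tilde{\mathbf{U}} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ -\mathbf{U}_1 & \mathbf{U}_0 \end{bmatrix}, \qquad \tilde{\mathbf{R}} = \begin{bmatrix} \mathbf{r}_n^{w} \\ \mathbf{y}_{n-1} \end{bmatrix}$$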

Thus, the augmented expression becomes

$$\tilde{\mathbf{Y}} = \tilde{\mathbf{U}}^{T}\tilde{\mathbf{R}} \qquad (6.32)$$

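Based on the ICA algorithm [2], the update law for a weight column of the de-mixing matrix $\tilde{\mathbf{U}}$ is

$$\mathbf{u}^{+} = \mathbf{u} - \mu\,E\big[\tilde{\mathbf{R}}\,G(\mathbf{u}^{T}\tilde{\mathbf{R}})\big] \qquad (6.33)$$

where $\mathbf{u}$ is a column vector of $\tilde{\mathbf{U}}$, $\mu$ is the step size and $G$ is the score function.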
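For a column taken from the first block column of $\tilde{\mathbf{U}}$,

$$\mathbf{u}_0 = \begin{bmatrix} \mathbf{I}_0 \\ -\mathbf{U}_{1,0} \end{bmatrix} \qquad (6.34)$$

and the update law is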

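$$\begin{bmatrix} \mathbf{i}^{+} \\ \mathbf{u}_1^{+} \end{bmatrix} = \begin{bmatrix} \mathbf{i} \\ \mathbf{u}_1 \end{bmatrix} - \mu \begin{bmatrix} \mathbf{r}_n^{w} \\ \mathbf{y}_{n-1} \end{bmatrix} G\big(\mathbf{r}_n^{w} - \mathbf{u}_1\mathbf{y}_{n-1}\big) \qquad (6.35)$$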

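Similarly, for a column taken from the second block column of $\tilde{\mathbf{U}}$,

$$\mathbf{u}_1 = \begin{bmatrix} \mathbf{0}_1 \\ \mathbf{U}_{0,1} \end{bmatrix} \qquad (6.36)$$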

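Then

$$\begin{bmatrix} \mathbf{0}^{+} \\ \mathbf{u}_0^{+} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{u}_0 \end{bmatrix} - \mu \begin{bmatrix} \mathbf{r}_n^{w} \\ \mathbf{y}_{n-1} \end{bmatrix} G(\mathbf{u}_0\mathbf{y}_{n-1}) \qquad (6.37)$$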

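The update laws for the individual columns are

$$\mathbf{u}_0^{+} = \mathbf{u}_0 - \mu\,\mathbf{y}_{n-1}\,G(\mathbf{u}_0\mathbf{y}_{n-1}) \qquad (6.38)$$

$$\mathbf{u}_1^{+} = \mathbf{u}_1 - \mu\,\mathbf{y}_{n-1}\,G(\mathbf{r}_n^{w} - \mathbf{u}_1\mathbf{y}_{n-1}) \qquad (6.39)$$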

By induction, the update law for the $k$th lag element $\mathbf{u}_k$ is

$$\mathbf{u}_k^{+} = \mathbf{u}_k - \mu\,\mathbf{y}_{n-k}\,G(\mathbf{r}_n^{w} - \mathbf{u}_k\mathbf{y}_{n-k}) \qquad (6.40)$$
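To make the recursion of feedback structure I concrete, the sketch below treats the single-channel case, in which every tap is a scalar so that each product in (6.28) and (6.38) to (6.40) is an ordinary product. It computes the output from (6.28) and applies the update laws exactly as written; `tanh` is an assumed placeholder for the score function $G$, and the name `feedback1_step` and the synthetic input are hypothetical.

```python
import numpy as np


def feedback1_step(u, r_n, y_past, mu=1e-3, G=np.tanh):
    """One step of feedback structure I with scalar taps.

    u      : array (K+1,); u[0] = u_0 and u[k] = u_k for k = 1..K
    r_n    : current whitened sample r_n^w (scalar)
    y_past : array (K,); y_past[k-1] = y_{n-k}
    """
    K = len(u) - 1
    # Output, eq. (6.28): y_n = u_0^{-1} (r_n + sum_k u_k y_{n-k})
    y_n = (r_n + np.dot(u[1:], y_past)) / u[0]
    u_new = np.array(u, dtype=float)
    # Eq. (6.38): u_0^+ = u_0 - mu * y_{n-1} G(u_0 y_{n-1})
    u_new[0] = u[0] - mu * y_past[0] * G(u[0] * y_past[0])
    # Eqs. (6.39)-(6.40): u_k^+ = u_k - mu * y_{n-k} G(r_n - u_k y_{n-k})
    for k in range(1, K + 1):
        u_new[k] = u[k] - mu * y_past[k - 1] * G(r_n - u[k] * y_past[k - 1])
    return y_n, u_new


# Usage on a synthetic scalar sequence with K = 2 feedback taps
rng = np.random.default_rng(0)
u = np.array([1.0, 0.0, 0.0])
y_past = np.zeros(2)
for r_n in rng.standard_normal(200):
    y_n, u = feedback1_step(u, r_n, y_past)
    y_past = np.concatenate(([y_n], y_past[:-1]))  # shift the output history
```

Because $u_0$ divides the output, a practical implementation would also guard against $u_0$ approaching zero; the sketch leaves that out for brevity.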

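Figure 6.3: Feedback Demixing Structure II (block diagram with input $\mathbf{r}_n^{w}$, $\mathbf{W}_0$, summing nodes, output $\mathbf{y}_n$, and feedback taps $\mathbf{W}_1, \ldots, \mathbf{W}_k$ with delays $z^{-1}\mathbf{I}$)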

6.4.4 Step 2c: Determining the rotation matrix (unitary matrix) U based on the feedback structure II
The output of feedback structure II, shown in Figure 6.3, is

$$\mathbf{y}_n = \mathbf{U}_0\,\mathbf{r}_n^{w} + \sum_{k=1}^{K}\mathbf{U}_k\,\mathbf{y}_{n-k} \qquad (6.41)$$

Again, consider the two-tap version of the feedback structure II model:

$$\mathbf{y}_n = \mathbf{U}_0\,\mathbf{r}_n^{w} - \mathbf{U}_1\,\mathbf{y}_{n-1} \qquad (6.42)$$

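The convolutive filter above can be rewritten in the following augmented form:

$$\begin{bmatrix} \mathbf{y}_n \\ \mathbf{y}_{n-1} \end{bmatrix} = \begin{bmatrix} \mathbf{U}_0 & -\mathbf{U}_1 \\ \mathbf{0} & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{r}_n^{w} \\ \mathbf{y}_{n-1} \end{bmatrix} \qquad (6.43)$$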

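Define

$$\tilde{\mathbf{Y}} = \begin{bmatrix} \mathbf{y}_n \\ \mathbf{y}_{n-1} \end{bmatrix}, \qquad \tilde{\mathbf{U}} = \begin{bmatrix} \mathbf{U}_0 & \mathbf{0} \\ -\mathbf{U}_1 & \mathbf{I} \end{bmatrix}, \qquad \tilde{\mathbf{R}} = \begin{bmatrix} \mathbf{r}_n^{w} \\ \mathbf{y}_{n-1} \end{bmatrix}$$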

Thus, the previous expression becomes

$$\tilde{\mathbf{Y}} = \tilde{\mathbf{U}}^{T}\tilde{\mathbf{R}} \qquad (6.44)$$

Based on the ICA algorithm [2], the update law for a weight column of the de-mixing matrix $\tilde{\mathbf{U}}$ is

$$\mathbf{u}^{+} = \mathbf{u} - \mu\,E\big[\tilde{\mathbf{R}}\,G(\mathbf{u}^{T}\tilde{\mathbf{R}})\big] \qquad (6.45)$$

where $\mathbf{u}$ is a column vector of $\tilde{\mathbf{U}}$, $\mu$ is the step size and $G$ is the score function.
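Here, partitioning the column conformably with $\tilde{\mathbf{U}}$,

$$\mathbf{u} = \begin{bmatrix} \mathbf{u}_0 \\ -\mathbf{u}_1 \end{bmatrix} \qquad (6.46)$$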

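Then

$$\begin{bmatrix} \mathbf{u}_0^{+} \\ \mathbf{u}_1^{+} \end{bmatrix} = \begin{bmatrix} \mathbf{u}_0 \\ \mathbf{u}_1 \end{bmatrix} - \mu \begin{bmatrix} \mathbf{r}_n^{w} \\ \mathbf{y}_{n-1} \end{bmatrix} G(\mathbf{y}_n) \qquad (6.47)$$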

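The update laws for the individual columns are

$$\mathbf{u}_0^{+} = \mathbf{u}_0 - \mu\,\mathbf{r}_n^{w}\,G(\mathbf{y}_n) \qquad (6.48)$$

$$\mathbf{u}_1^{+} = \mathbf{u}_1 - \mu\,\mathbf{y}_{n-1}\,G(\mathbf{y}_n) \qquad (6.49)$$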

By induction, the update law for the $k$th lag element $\mathbf{u}_k$ is

$$\mathbf{u}_k^{+} = \mathbf{u}_k - \mu\,\mathbf{y}_{n-k}\,G(\mathbf{y}_n) \qquad (6.50)$$
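In feedback structure II the delayed outputs $\mathbf{y}_{n-k}$, rather than delayed inputs, drive the lag-$k$ taps. The NumPy sketch below shows one adaptation step under explicit assumptions: it uses the column convention $y = \mathbf{u}^{T}\tilde{\mathbf{R}}$, so the output (6.41) is formed as $\mathbf{U}_0^{T}\mathbf{r}_n^{w} + \sum_k \mathbf{U}_k^{T}\mathbf{y}_{n-k}$; it replaces expectations by instantaneous values; and it takes `tanh` as a placeholder score function. The name `feedback2_step` and the synthetic data are hypothetical.

```python
import numpy as np


def feedback2_step(U0, Ufb, r_n, y_past, mu=1e-3, G=np.tanh):
    """One adaptation step of feedback structure II, following (6.48)-(6.50).

    U0     : array (M, M), the feedforward matrix U_0
    Ufb    : array (K, M, M); Ufb[k-1] is the feedback tap U_k
    r_n    : array (M,), whitened input r_n^w
    y_past : array (K, M); y_past[k-1] is y_{n-k}
    """
    # Output (6.41) in the column convention: U_0^T r_n + sum_k U_k^T y_{n-k}
    y_n = U0.T @ r_n + np.einsum("kij,ki->j", Ufb, y_past)
    g = G(y_n)
    U0_new = U0 - mu * np.outer(r_n, g)                     # (6.48)
    Ufb_new = Ufb - mu * np.einsum("ki,j->kij", y_past, g)  # (6.49)-(6.50)
    return y_n, U0_new, Ufb_new


# Usage: M = 2 channels, K = 2 feedback taps, synthetic whitened input
rng = np.random.default_rng(0)
M, K = 2, 2
U0, Ufb = np.eye(M), np.zeros((K, M, M))
y_past = np.zeros((K, M))
for r_n in rng.standard_normal((300, M)):
    y_n, U0, Ufb = feedback2_step(U0, Ufb, r_n, y_past)
    y_past = np.vstack([y_n, y_past[:-1]])  # newest output first
```

Unlike structure I, no matrix inverse appears in the output path, so each step costs only matrix-vector products.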

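Algorithm 6.1: RAKE based FastICA method
Input: $(M \times T)$ matrix of realizations $\mathbf{r}$; initial demixing matrix $\mathbf{W} = \mathbf{I}_G$; number of iterations $Itr$; step size $\gamma$ (e.g., $\gamma = 0.3$); estimated channel matrix $\mathbf{H}$; nonlinearity $g(\mathbf{y}) = \mathbf{y}^{3}$
Perform pre-whitening: $\mathbf{r} \leftarrow \mathbf{V}\mathbf{r} = \boldsymbol{\Lambda}^{-1/2}\mathbf{E}^{T}\mathbf{r}$
For each $i = 1, \ldots, N$:
    $\mathbf{R} = \mathbf{W}\mathbf{H}^{H}\mathbf{r}(:, i)$
    $\mathbf{W} = E\{[g(\mathbf{W}\mathbf{R})]^{T}\} - E\{g'(\mathbf{W}\mathbf{R})\}\,\mathbf{W}$
    $\mathbf{W} = \mathbf{W} / \mathrm{norm}(\mathbf{W})$
    $\hat{\mathbf{b}}^{D}_{i,\mathrm{ICA}}(:, i) = \mathbf{S}_i^{H}\mathbf{W}\mathbf{H}^{H}\mathbf{r}$
End For
Output: the estimated symbols $\hat{\mathbf{b}}^{D}_{\mathrm{ICA}}$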
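The core of Algorithm 6.1 is pre-whitening followed by a FastICA fixed-point iteration with $g(y) = y^{3}$. As a minimal sketch on a toy real-valued mixture, the code below implements the textbook one-unit FastICA rule $\mathbf{w} \leftarrow E\{\mathbf{z}\,g(\mathbf{w}^{T}\mathbf{z})\} - E\{g'(\mathbf{w}^{T}\mathbf{z})\}\,\mathbf{w}$ with renormalization, and whitening $\mathbf{V} = \boldsymbol{\Lambda}^{-1/2}\mathbf{E}^{T}$ from the eigendecomposition of the sample covariance. It deliberately omits the RAKE-specific quantities $\mathbf{H}$ and $\mathbf{S}_i$; the mixing matrix, sample size, seeds and function names are illustrative assumptions.

```python
import numpy as np


def whiten(X):
    """Pre-whitening z = V x with V = Lambda^{-1/2} E^T (rows of X are channels)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    lam, E = np.linalg.eigh(np.cov(Xc))
    V = np.diag(lam ** -0.5) @ E.T
    return V @ Xc, V


def fastica_one_unit(Z, n_iter=200, tol=1e-10, seed=0):
    """One-unit FastICA with g(y) = y**3 (so g'(y) = 3 y**2) on whitened data Z."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z
        # Fixed point: w <- E{z g(w^T z)} - E{g'(w^T z)} w, then renormalize
        w_new = (Z * y**3).mean(axis=1) - 3.0 * np.mean(y**2) * w
        w_new /= np.linalg.norm(w_new)
        done = abs(abs(w_new @ w) - 1.0) < tol  # converged up to sign
        w = w_new
        if done:
            break
    return w


# Usage: two unit-variance uniform (sub-Gaussian) sources, linearly mixed
rng = np.random.default_rng(1)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
Z, V = whiten(A @ S)
y = fastica_one_unit(Z) @ Z
best_corr = max(abs(np.corrcoef(y, s)[0, 1]) for s in S)
```

The recovered component matches one source only up to sign and scale, which is why the check uses the absolute correlation.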


6.4.5 The proposed adaptive detectors

In this section, we develop three adaptive detectors based on Independent Component Analysis (ICA), Robust ICA (RICA) and Principal Component Analysis (PCA). Given the RAKE receiver structure in (10), the adaptive-weight RAKE receiver for a DS-CDMA system can be expressed mathematically as

$$\hat{\mathbf{b}}^{D}_{i,\mathrm{RAKE}} = \mathbf{S}_i^{H}\mathbf{W}\mathbf{H}^{H}\mathbf{r} \qquad (53)$$

where $\mathbf{H}$ is the estimated channel matrix, $\hat{\mathbf{b}}^{D}_{i,\mathrm{RAKE}}$ is the estimated symbol of the $i$th user, and $\mathbf{W}$ is a $G \times G$ weighting matrix. Algorithms 6.1, 6.2 and 6.3 estimate the matrix $\mathbf{W}$ adaptively based on the FastICA, Robust ICA and PCA algorithms, respectively.

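Algorithm 6.2: RAKE based RICA method
Input: $(M \times T)$ matrix of realizations $\mathbf{r}$; initial demixing matrix $\mathbf{W} = \mathbf{I}_G$; number of iterations $Itr$; step size $\gamma$ (e.g., $\gamma = 0.3$); estimated channel matrix $\mathbf{H}$; $\mathbf{g}$ is the gradient of the kurtosis contrast $K(\cdot)$
Perform pre-whitening: $\mathbf{r} \leftarrow \mathbf{V}\mathbf{r} = \boldsymbol{\Lambda}^{-1/2}\mathbf{E}^{T}\mathbf{r}$
For each $i = 1, \ldots, N$:
    $\mathbf{R} = \mathbf{W}\mathbf{H}^{H}\mathbf{r}(:, i)$
    $\Delta\mathbf{W} = (\mathbf{I}_G - \mathbf{R}\mathbf{R}^{H})\,\mathbf{W}$
    $\mu_{\mathrm{opt}} = \arg\max_{\mu} |K(\mathbf{y} + \mu\mathbf{g})|$
    $\mathbf{W} = \mathbf{W} + \mu_{\mathrm{opt}}\,\Delta\mathbf{W}$
    $\mathbf{W} = \mathbf{W} / \mathrm{norm}(\mathbf{W})$
    $\hat{\mathbf{b}}^{D}_{i,\mathrm{RICA}}(:, i) = \mathbf{S}_i^{H}\mathbf{W}\mathbf{H}^{H}\mathbf{r}$
End For
Output: the estimated symbols $\hat{\mathbf{b}}^{D}_{\mathrm{RICA}}$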
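Algorithm 6.2 relies on RobustICA's idea of choosing the step size $\mu_{\mathrm{opt}}$ that maximizes the absolute kurtosis along the gradient direction. The sketch below illustrates that idea for a single extracting vector on real-valued, whitened toy data. As a simplifying assumption, it replaces RobustICA's exact algebraic line search (a root-finding step) with a dense grid search over $\mu$, and it normalizes the gradient first; the function names, the grid and the data are hypothetical.

```python
import numpy as np


def abs_kurtosis_rows(Y):
    """|normalized kurtosis| of each row of Y: |E[y^4] / E[y^2]^2 - 3| (real case)."""
    m2 = np.mean(Y**2, axis=1)
    m4 = np.mean(Y**4, axis=1)
    return np.abs(m4 / m2**2 - 3.0)


def kurtosis_gradient(w, X):
    """Gradient of K(w) = E[y^4]/E[y^2]^2 - 3, with y = w^T x, with respect to w."""
    y = w @ X
    m2, m4 = np.mean(y**2), np.mean(y**4)
    return 4.0 * (X * y**3).mean(axis=1) / m2**2 - 4.0 * m4 * (X * y).mean(axis=1) / m2**3


def kurtosis_line_search_step(w, X, mus=np.linspace(-5.0, 5.0, 401)):
    """One step w <- w + mu_opt * g, mu_opt = argmax_mu |K(w + mu g)| (grid search)."""
    g = kurtosis_gradient(w, X)
    g /= np.linalg.norm(g)
    cand = w[None, :] + mus[:, None] * g[None, :]  # one candidate vector per mu
    k = abs_kurtosis_rows(cand @ X)
    w_new = cand[int(np.nanargmax(k))]
    return w_new / np.linalg.norm(w_new)


# Usage: whitened mixture of two uniform sources, ten line-search steps
rng = np.random.default_rng(2)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 4000))
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S
lam, E = np.linalg.eigh(np.cov(X))
Z = np.diag(lam ** -0.5) @ E.T @ (X - X.mean(axis=1, keepdims=True))
w = np.array([1.0, 0.0])
for _ in range(10):
    w = kurtosis_line_search_step(w, Z)
best_corr = max(abs(np.corrcoef(w @ Z, s)[0, 1]) for s in S)
```

Because the normalized kurtosis is scale invariant, only the direction of $\mathbf{w} + \mu\mathbf{g}$ matters, which is what makes searching over a single scalar $\mu$ sufficient.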

6.5 Simulation Results
In this section, a series of simulations is carried out to verify the proposed algorithms in a downlink DS-CDMA system in the presence of AWGN. We assume a constant spreading gain of $N_G = 63$ for Gold codes. The received CDMA signal passes through $L = 5$ multipath channels with delays of {0, 1, 2, 3, 4} chips, represented by the complex attenuation coefficients $h_0 = 0.25 + j0.18$, $h_1 = 0.21 + j0.14$, $h_2 = 0.18 + j0.11$, $h_3 = 0.14 + j0.11$ and $h_4 = 0.11 + j0.07$, respectively. For sub-Gaussian sources, i.e., source signals with negative kurtosis, we use the model function

$$G_{\mathrm{SUB}}(\hat{b}) = \hat{b} - \big(\tanh(\mathrm{Re}\{\hat{b}\}) + j\tanh(\mathrm{Im}\{\hat{b}\})\big)$$

Monte Carlo simulations were run to verify the validity of the algorithms. The signal-to-noise ratio (SNR) used as the figure of merit is simply the ratio of the energy per bit to the power spectral density (PSD) of the noise. Moreover, all user symbols are assumed to be transmitted with the same power.
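For reference, the model function $G_{\mathrm{SUB}}$ and the five-path channel used in these simulations can be expressed in a few lines of NumPy. This is a sketch under stated assumptions: the channel is modeled as an FIR filter whose $k$th tap $h_k$ is delayed by $k$ chips, the output is truncated to the input length, and the names `H_TAPS`, `g_sub` and `multipath` are hypothetical.

```python
import numpy as np

# Complex path gains h_0..h_4 listed above (path k delayed by k chips)
H_TAPS = np.array([0.25 + 0.18j, 0.21 + 0.14j, 0.18 + 0.11j, 0.14 + 0.11j, 0.11 + 0.07j])


def g_sub(b):
    """Sub-Gaussian model function: G_SUB(b) = b - (tanh(Re b) + j tanh(Im b))."""
    b = np.asarray(b, dtype=complex)
    return b - (np.tanh(b.real) + 1j * np.tanh(b.imag))


def multipath(chips, taps=H_TAPS):
    """Pass a chip sequence through the L-path channel, keeping len(chips) samples."""
    return np.convolve(chips, taps)[: len(chips)]


# Usage: a random +-1 chip sequence of length 63 through the channel
rng = np.random.default_rng(0)
chips = rng.choice([-1.0, 1.0], size=63)
received = multipath(chips)
scores = g_sub(received)
```

Applying `tanh` separately to the real and imaginary parts keeps the nonlinearity well defined for complex symbol estimates.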
Figures 6.4(a) and 6.4(b) show the BER versus SNR of the three proposed detectors and the other detectors for K = 30 and K = 50 users, respectively. The other parameters were set as follows: number of symbols M = 1000, number of paths L = 5, and SNR values from −10 dB to 30 dB. The RAKE receiver based on the RobustICA algorithm uses the following parameters: the absolute normalized kurtosis is maximized for all sources; 1e−3 is the threshold for the statistically significant termination test; the maximum number of iterations per extracted source is 1000; prewhitening is performed via the SVD of the observed data matrix; deflation uses orthogonalization; and the extracting vectors are initialized to an identity matrix of suitable dimensions.

Figure 6.4 panels: (a) number of users K = 30, Gold codes G = 63; (b) number of users K = 50, Gold codes G = 63. Each panel plots BER (log scale) versus SNR (dB) for the LMMSE, MF, RAKE, FB2, FB1, FF, BMUD, ICA, PCA and RICA detectors.
Figure 6.4: Average BER as a function of SNR for DS-CDMA downlink. Using
Gold codes G=63. (a) Using 30 users (b) Using 50 users

In Figure 6.4, the proposed algorithms improve the performance of the CDMA system: blind multiuser detection based on the second feedback structure gives the lowest BER and outperforms the other detectors. We also observe that the proposed algorithms work well at high SNR, a regime that most likely causes an ill-posed problem for the LMMSE receiver, especially when the sample set T is fairly small. Moreover, the performance of blind multiuser detection degrades as the number of users increases, as shown in Figure 6.4(b).
To complete the discussion, Figure 6.5 compares the computational complexity of the proposed blind multiuser equalizers by showing the average CPU time of each method. The price paid for the improved BER performance is a higher computational complexity in terms of CPU time. We also study the effect of OVSF codes in Figure 6.6, which shows that, in general, using OVSF codes enhances the performance of the proposed methods.
Figure 6.5 plot: bar chart of CPU time (vertical axis 0 to 14) for ICA, PCA, FF, FB1, FB2, RICA and BMUD.

Figure 6.5: Corresponding CPU time for each method.

Figure 6.6 panels: (a) number of users K = 30, OVSF codes G = 64; (b) number of users K = 50, OVSF codes G = 64. Each panel plots BER (log scale) versus SNR (dB) for the LMMSE, MF, RAKE, FB2, FB1, FF, BMUD, ICA, PCA and RICA detectors.
Figure 6.6: Average BER as a function of SNR for DS-CDMA downlink. Using
OVSF codes G=64. (a) Using 30 users (b) Using 50 users

In the WCDMA system, we assume the channel coefficients $h_0 = 0.25 + j0.18$, $h_1 = 0.21 + j0.14$, $h_2 = 0.18 + j0.11$, $h_3 = 0.14 + j0.11$ and $h_4 = 0.11 + j0.07$, and a channel chip rate of 1.25 mega chips per second (Mcps). The user-specific codes are of two types, namely Gold codes with spreading gain G = 63 and OVSF (Walsh-Hadamard) codes with spreading gain G = 64. The long scrambling code has a frame length of 10 ms. Figures 6.7 and 6.8 show the performance of the various methods in terms of BER for the WCDMA downlink scenario.
We observe that the LMMSE detector is slightly better than some of the presented detectors under good SNR conditions. However, the proposed algorithm based on the second feedback structure again outperforms all detectors at all SNRs and gives the lowest BER.

Figure 6.7 panels: (a) number of users K = 30, Gold codes G = 63; (b) number of users K = 50, Gold codes G = 63. Each panel plots BER (log scale) versus SNR (dB) for the LMMSE, MF, RAKE, FB2, FB1, FF, BMUD, ICA, PCA and RICA detectors.
Figure 6.7: Average BER as a function of SNR for WCDMA downlink.
Using Gold codes G=63. (a) Using 30 users (b) Using 50 users

Figure 6.8 panels: (a) number of users K = 30, OVSF codes G = 64; (b) number of users K = 50, OVSF codes G = 64. Each panel plots BER (log scale) versus SNR (dB) for the LMMSE, MF, RAKE, FB2, FB1, FF, BMUD, ICA, PCA and RICA detectors.
Figure 6.8: Average BER as a function of SNR for WCDMA downlink.
Using OVSF codes G=64. (a) Using 30 users (b) Using 50 users
It is also worthwhile to compare the presented algorithms on a large data sample set. Figures 6.9 and 6.10 show the performance of the various detectors with a fairly long sample set, M = 3000, in the DS-CDMA and WCDMA systems, respectively. Notably, the LMMSE detector becomes better than the other detectors under good SNR conditions. Nevertheless, the proposed algorithm based on the second feedback structure still outperforms the LMMSE detector at all SNRs below 22 dB.
Figure 6.9 panels: (a) number of users K = 30, Gold codes G = 63; (b) number of users K = 30, OVSF codes G = 64. Each panel plots BER (log scale) versus SNR (dB) for the LMMSE, MF, RAKE, FB2, FB1, FF, BMUD, ICA, PCA and RICA detectors.
Figure 6.9: Average BER as a function of SNR for DS-CDMA downlink.
For 30 users (a) Using Gold codes G=63. (b) Using OVSF codes G=64.

Figure 6.10 panels: (a) number of users K = 30, Gold codes G = 63; (b) number of users K = 30, OVSF codes G = 64. Each panel plots BER (log scale) versus SNR (dB) for the LMMSE, MF, RAKE, FB2, FB1, FF, BMUD, ICA, PCA and RICA detectors.
Figure 6.10: Average BER as a function of SNR for WCDMA downlink.
For 30 users (a) Using Gold codes G=63. (b) Using OVSF codes G=64

Finally, we study the effect of the number of users and of the size of the sample set on the performance of the proposed method in Figures 6.11 and 6.12, respectively. In Figure 6.11, the simulation results show the BER versus SNR for various numbers of users K, with 500 symbols per user, for blind multiuser detection based on the second feedback structure. The improvement provided by the proposed detector diminishes as K increases. Figure 6.12 shows the BER versus SNR with 30 users (K = 30) for various sample-set sizes M. Although the proposed algorithm appears robust to the sample-set size and performs well, its performance clearly improves more consistently as M increases, and it mitigates the MAI. Overall, the proposed algorithm outperforms the other algorithms in most cases and better solves the symbol estimation problem in the DS-CDMA/WCDMA downlink, especially when the sample set is relatively small.

Figure 6.11 plot (panel title: Number of SampleSet M = 3000, Gold code): BER (log scale) versus SNR (dB, −10 to 30) for K = 10, 15, 20, 30, 40 and 50 users.

Figure 6.11: Average BER as a function of SNR for various numbers of users K

Figure 6.12 plot (panel title: Number of users K = 30, Gold code): BER (log scale) versus SNR (dB, −10 to 30) for M = 3000, 2000, 1000, 800, 400 and 200.

Figure 6.12: Average BER as a function of SNR for various sample sets M


6.6 Conclusion
This chapter presented both theoretical derivations and simulation demonstrations of blind multiuser detectors based on state-space structures in CDMA systems. We also developed three blind multiuser detectors based on three adaptive algorithms, namely ICA, RICA and PCA. The results appear to show that the proposed algorithms perform well in the symbol estimation problem in DS-CDMA systems and outperform the other conventional detectors and the adaptive MMSE detector. Our results also show that multiple access interference (MAI) can be mitigated by the proposed algorithms, thus improving the performance of blind multiuser detection. Although the proposed methods improve as the size of the sample set increases, they perform well even when the sample sets are small, unlike the LMMSE detector. Moreover, unlike that of the LMMSE detector, the complexity of the proposed methods remains constant rather than increasing exponentially. Finally, unlike the adaptive LMMSE detector, the proposed methods place no restriction on the spreading codes, since they do not require the spreading codes of the interfering users. They are therefore a more suitable choice for the downlink, and they also work in the uplink.


7 Chapter 7

Constrained Blind Multiuser Detection for DS-CDMA System

In direct-sequence code division multiple access (DS-CDMA) communication systems, blind multiuser detection is used to improve computational efficiency and to mitigate multiple access interference (MAI) at the detector. An ill-conditioned covariance matrix of the received signals degrades the performance of the linear minimum mean-squared error (LMMSE) detector, especially when the signal-to-noise ratio is high and only a small data set is available for covariance estimation. In this chapter, we introduce a constrained blind multiuser detector that imposes a regularization parameter to cope with the ill-conditioning of the covariance matrix and to mitigate the resulting performance degradation. Simulation results show that the proposed method improves the performance of blind multiuser detection and outperforms conventional multiuser detectors.

7.1 Introduction
Multiuser detection has been a significant topic in communication systems for the past decades because of its potential to suppress multiple access interference (MAI) efficiently in CDMA systems. Recently, significant attention has been given to blind multiuser detection, which requires prior knowledge only of the signature sequence and timing of the desired user [74-79].
In CDMA systems, multiuser detection has been studied in several works to enhance channel capacity and to mitigate MAI. Multiuser detection was first established in [76] to obtain an optimum multiuser detector for the Gaussian multiple-access channel. In addition, several suboptimum detectors have been proposed in [76] and [77], because the computational complexity of the optimal detector makes it impractical. In [76], [81] and [82], training-sequence techniques were used to develop suboptimal detectors, e.g., the adaptive linear detector and the zero-forcing detector. In [75], a suboptimal detector based on the linear minimum mean square error (LMMSE) method was proposed.
In general, an ill-posed linear equation problem arises in blind multiuser detection from an ill-conditioned covariance matrix, which degrades the performance of LMMSE detectors [76], [81]. This problem affects multiuser detection (MUD) when only a small number of symbols per user is observed, especially in high signal-to-noise ratio environments. For example, when data are transmitted over slowly time-varying channels, only small data blocks within the channel coherence time can be used. Therefore, several works, e.g., [76], [80], have suggested regularization techniques to deal with the ill-posed problem and obtain a stable blind multiuser detection solution.
This chapter develops a blind linear DS-CDMA detection technique based on the minimum output energy (MOE). The key idea of the proposed detector is to improve the conventional blind detector by imposing a new constraint on the cost function and adding a regularization parameter to the covariance matrix to avoid singularity, especially in high-SNR environments [74]. The main focus of this chapter is to study the effect of the new constraint and the regularization parameter in the presence of additive white Gaussian noise (AWGN) for long and short data sets. Furthermore, we study the robustness of the proposed detector by examining the effect of a mismatch between the original code sequences and the estimated ones, which most likely happens in multipath channels. Simulations are carried out to observe variations in the bit error rate as a function of the signal-to-noise ratio, the number of users and the number of symbols per user, and the performance of the proposed algorithm is compared with that of subspace blind multiuser detection [87]. The remainder of the chapter is organized as follows. Section 7.2 describes and derives the synchronous CDMA signal model. Section 7.3 presents the conventional detectors. Section 7.4 presents the proposed detection scheme. Comparative simulation results and conclusions are given in Sections 7.5 and 7.6, respectively.

7.2 DS-CDMA Signal Model
Assume a downlink (synchronous, baseband) DS-CDMA system with $K$ users. At the receiver, the received signal during the $i$th symbol, sampled at the output of the chip-matched filter with chip interval $T_c$, is given in vector form by [75]

$$\mathbf{r}(i) = \sum_{k=1}^{K} A_k\,b_k[i]\,\mathbf{s}_k + \mathbf{n}(i), \qquad \forall\, i = 0, 1, 2, \ldots, L-1 \qquad (7.1)$$

where
• $\mathbf{r}(i) = [r_1(i), r_2(i), \ldots, r_N(i)]^{T}$, where $N$ denotes the processing gain;
• $A_k$ is the received amplitude;
• $b_k[i] \in \{\pm 1\}$ is the $i$th transmitted symbol of the $k$th user; the symbols are assumed to be independent;
• $\mathbf{s}_k$ is the normalized signature waveform of the $k$th user;
• $\mathbf{n}[i]$ is additive white Gaussian noise (AWGN), assumed to have zero mean and covariance matrix $\sigma^{2}\mathbf{I}_N$, where $\mathbf{I}_N$ is the $N \times N$ identity matrix;
• $L$ is the number of symbols per user.

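Without loss of generality, we assume that the signature waveforms $\mathbf{s}_k$ are linearly independent and that the noise $\mathbf{n}[i]$ is independent of the user data. The covariance matrix of the received signal, $\mathbf{C} = E[\mathbf{r}\mathbf{r}^{T}]$, is then

$$\mathbf{C} = E[\mathbf{r}\mathbf{r}^{T}] = \sum_{k=1}^{K} A_k^{2}\,\mathbf{s}_k\mathbf{s}_k^{T} + \sigma^{2}\mathbf{I}_N \qquad (7.2)$$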
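The signal model (7.1) and the covariance (7.2) can be checked numerically. The sketch below draws random ±1 codes scaled to unit norm as a stand-in for the signatures $\mathbf{s}_k$ (an assumption; the simulations in this chapter use Gold codes), generates $L$ received vectors, and compares the sample covariance with (7.2). The function names and parameter values are hypothetical.

```python
import numpy as np


def received_block(S, A, sigma, L, rng):
    """Synchronous DS-CDMA model (7.1): r(i) = sum_k A_k b_k[i] s_k + n(i), i = 0..L-1.

    S : (N, K) matrix whose columns are the unit-norm signatures s_k.
    Returns R (N, L) with r(i) as its columns, and the bits B (K, L).
    """
    N, K = S.shape
    B = rng.choice([-1.0, 1.0], size=(K, L))     # independent BPSK symbols b_k[i]
    noise = sigma * rng.standard_normal((N, L))  # AWGN with covariance sigma^2 I_N
    return S @ (A[:, None] * B) + noise, B


def theoretical_cov(S, A, sigma):
    """Eq. (7.2): C = sum_k A_k^2 s_k s_k^T + sigma^2 I_N."""
    return (S * A**2) @ S.T + sigma**2 * np.eye(S.shape[0])


rng = np.random.default_rng(0)
N, K, L, sigma = 31, 5, 20000, 0.5
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)  # unit-norm random codes
A = np.ones(K)
R, B = received_block(S, A, sigma, L, rng)
C_hat = R @ R.T / L                                    # sample covariance
max_err = np.abs(C_hat - theoretical_cov(S, A, sigma)).max()
```

With many samples the sample covariance approaches (7.2); with few samples at high SNR it becomes ill-conditioned, which is the situation the rest of this chapter addresses.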

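According to [74], the linear detector for the $i$th transmitted symbol of user 1 is described by a weight vector $\mathbf{m}_1 \in \mathbb{R}^{N}$, and its decision output is

$$\hat{b}_1(i) = \mathrm{sgn}\big(\mathbf{m}_1^{T}\mathbf{r}(i)\big) = \mathrm{sgn}\big(\mathbf{r}^{T}(i)\,\mathbf{m}_1\big) \qquad (7.3)$$

Denoting $\mathbf{R} = [\mathbf{r}(0), \mathbf{r}(1), \ldots, \mathbf{r}(L-1)]^{T}$ and $\mathbf{b}_L = [b_1(0), b_1(1), \ldots, b_1(L-1)]^{T}$, equation (7.3) can be written in vector form as

$$\mathbf{R}\,\mathbf{m}_1 = \mathbf{b}_L \qquad (7.4)$$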

7.3 Conventional Blind Multiuser Detection
Although conventional linear detectors based on least squares (LS), zero forcing (ZF) and the best linear unbiased estimator (BLUE) perform poorly, especially in the presence of colored noise, the LMMSE detector is considered one of the best linear detectors for DS-CDMA systems [74]-[85]. The LMMSE detector can be expressed as follows:
$$\mathbf{m}_{\mathrm{opt}} = \arg\min_{\mathbf{m}} E\big[\|b_k - \mathbf{m}^{H}\mathbf{r}\|^{2}\big] \qquad (7.5)$$

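The MOE detector, which for the ideal covariance matrix coincides with the LMMSE detector up to a positive scale factor, is given by [1], [7] as

$$\mathbf{m}_1 = \big(\mathbf{s}_1^{T}\mathbf{C}^{-1}\mathbf{s}_1\big)^{-1}\mathbf{C}^{-1}\mathbf{s}_1 \qquad (7.6)$$

subject to $\mathbf{m}^{H}\mathbf{s}_k = 1$.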

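According to [74], the ideal bit error rate (BER) of the minimum output energy (MOE) detector under the constraint $\mathbf{m}^{H}\mathbf{s}_k = 1$ is given by the approximation

$$P_e \approx Q\!\left(\frac{A_1}{\sigma\sqrt{\mathbf{m}_1^{T}\mathbf{m}_1}}\right) = Q\!\left(\frac{A_1}{\sigma\,\|\mathbf{m}_1\|}\right) \qquad (7.7)$$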

where

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty}\exp\!\left(-\frac{t^{2}}{2}\right)dt \qquad (7.8)$$
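Equations (7.6) to (7.8) translate directly into code. In this sketch, $Q(x)$ is evaluated through the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$, and the MOE weight vector is computed with a linear solve instead of an explicit inverse; the function names and the toy orthogonal-code check are illustrative assumptions.

```python
import math

import numpy as np


def q_func(x):
    """Gaussian tail probability (7.8), via Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def moe_detector(C, s1):
    """MOE detector (7.6): m_1 = C^{-1} s_1 / (s_1^T C^{-1} s_1), so that m_1^T s_1 = 1."""
    c_inv_s = np.linalg.solve(C, s1)
    return c_inv_s / (s1 @ c_inv_s)


def ber_approx(m1, A1, sigma):
    """Approximate BER (7.7): P_e ~ Q(A_1 / (sigma * ||m_1||))."""
    return q_func(A1 / (sigma * np.linalg.norm(m1)))


# Toy check with orthogonal signatures, where the MOE detector reduces to s_1
sigma = 0.5
S = np.eye(8)[:, :3]                  # three orthogonal unit-norm signatures
C = S @ S.T + sigma**2 * np.eye(8)    # (7.2) with A_k = 1
m1 = moe_detector(C, S[:, 0])
pe = ber_approx(m1, 1.0, sigma)       # equals Q(1 / sigma) = Q(2) here
```

With orthogonal codes there is no MAI, so the detector reduces to the matched filter and (7.7) becomes the single-user BPSK error probability.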

In practice, however, the covariance matrix is estimated from the observations as

$$\mathbf{C} \approx \tilde{\mathbf{C}} = \frac{1}{L}\sum_{i=0}^{L-1}\mathbf{r}(i)\,\mathbf{r}^{T}(i) \qquad (7.9)$$

As reported in the literature, the detector suffers from ill-conditioning, especially when the SNR is high and the data sample set is small, which degrades its performance.

In practice, the estimated covariance matrix $\tilde{\mathbf{C}}$ is sometimes ill-conditioned; inverting it then leads to an ill-posed problem, which degrades the robustness of the detector. Several methods have been proposed in the literature to avoid this ill-posed problem; the two most important are the subspace decomposition algorithm and the regularization method, proposed in [76] and [79], respectively. Here, we impose a new constraint to speed up convergence and improve performance, and we use a new regularization rule to avoid the ill-posed problem. The detector based on the subspace decomposition algorithm (SBMUD) is given by [74]-[79]
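$$\mathbf{m}_{\mathrm{SBMUD}} = \big[\mathbf{s}_k^{H}\,\mathbf{U}_s\mathbf{D}_s^{-1}\mathbf{U}_s^{H}\,\mathbf{s}_k\big]^{-1}\,\mathbf{U}_s\mathbf{D}_s^{-1}\mathbf{U}_s^{H}\,\mathbf{s}_k \qquad (7.10)$$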

SBMUD addresses the ill-posed problem by dividing the ill-conditioned matrix into two subspaces, the signal subspace and the noise subspace. The main drawback of this method is that the performance of the detector depends entirely on the estimated covariance matrix, so performance can degrade when the estimated covariance deviates significantly from the ideal covariance.
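A minimal sketch of the subspace detector (7.10) follows. Here $\mathbf{U}_s$ and $\mathbf{D}_s$ are taken as the $K$ dominant eigenvectors and eigenvalues of the estimated covariance, which assumes the number of active users $K$ is known; the function name and test data are illustrative.

```python
import numpy as np


def subspace_detector(C_hat, s_k, K):
    """SBMUD (7.10): m = U_s D_s^{-1} U_s^H s_k / (s_k^H U_s D_s^{-1} U_s^H s_k).

    U_s, D_s: the K largest eigenvectors/eigenvalues of the estimated covariance.
    """
    lam, U = np.linalg.eigh(C_hat)       # eigenvalues in ascending order
    Us, Ds = U[:, -K:], lam[-K:]         # signal subspace
    v = Us @ ((Us.conj().T @ s_k) / Ds)  # U_s D_s^{-1} U_s^H s_k
    return v / (s_k.conj() @ v)


# Toy check: two orthogonal users in N = 6 dimensions, small noise
S = np.eye(6)[:, :2]
C_hat = S @ S.T + 0.01 * np.eye(6)
m = subspace_detector(C_hat, S[:, 0], K=2)
```

Only the $K$ largest eigenpairs are inverted, so the small, poorly estimated noise eigenvalues never enter the weight vector; this is how the subspace approach sidesteps the ill-conditioning.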
Li Hu [76] applies Tikhonov regularization [80] to mitigate the ill-posed problem. Regularization is well known to be an effective way to avoid an ill-conditioned matrix. The detector based on Tikhonov regularization is given by
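$$\mathbf{m}_{\mathrm{regular}} = \Big[\mathbf{s}_k^{H}\big[\tilde{\mathbf{C}} + \alpha\mathbf{I}\big]^{-1}\mathbf{s}_k\Big]^{-1}\big[\tilde{\mathbf{C}} + \alpha\mathbf{I}\big]^{-1}\mathbf{s}_k \qquad (7.11)$$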

where $\alpha$ is the regularization parameter. Note that the regularization method simply adds an energy term that makes the covariance matrix well conditioned. In [76], two rules for choosing $\alpha$, based on Tikhonov [80], are used: (a) $\alpha = m \cdot \mathrm{tr}(\tilde{\mathbf{C}})$, where $m$ is a positive constant and $\mathrm{tr}(\cdot)$ denotes the trace of the estimated covariance matrix $\tilde{\mathbf{C}}$; and (b) $\alpha = c \cdot \lambda_{\max}$, where $c$ is a positive constant and $\lambda_{\max}$ is the maximum eigenvalue of the estimated covariance matrix $\tilde{\mathbf{C}}$.
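The regularized detector (7.11) and the two rules for $\alpha$ can be sketched as follows. The example builds a small-sample, high-SNR covariance estimate ($L = 2N$ samples) and shows that adding $\alpha\mathbf{I}$ lowers its condition number. The random codes, the constant $m = 0.01$, the noise level and the function names are assumptions for illustration.

```python
import numpy as np


def regularized_detector(C_hat, s_k, alpha):
    """Tikhonov-regularized detector (7.11): [C + aI]^{-1} s_k / (s_k^T [C + aI]^{-1} s_k)."""
    v = np.linalg.solve(C_hat + alpha * np.eye(C_hat.shape[0]), s_k)
    return v / (s_k @ v)


def alpha_trace(C_hat, m):
    """Rule (a): alpha = m * tr(C_hat)."""
    return m * np.trace(C_hat)


def alpha_max_eig(C_hat, c):
    """Rule (b): alpha = c * lambda_max(C_hat)."""
    return c * np.linalg.eigvalsh(C_hat)[-1]


# Small sample set (L = 2N) at high SNR: the sample covariance is ill-conditioned
rng = np.random.default_rng(0)
N, K, L, sigma = 31, 15, 62, 0.01
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
R = S @ rng.choice([-1.0, 1.0], size=(K, L)) + sigma * rng.standard_normal((N, L))
C_hat = R @ R.T / L
a = alpha_trace(C_hat, m=0.01)
m1 = regularized_detector(C_hat, S[:, 0], a)
cond_before = np.linalg.cond(C_hat)
cond_after = np.linalg.cond(C_hat + a * np.eye(N))
```

For a positive semidefinite $\tilde{\mathbf{C}}$, adding $\alpha\mathbf{I}$ shifts every eigenvalue by $\alpha$, so the ratio $(\lambda_{\max} + \alpha)/(\lambda_{\min} + \alpha)$ is always smaller than $\lambda_{\max}/\lambda_{\min}$; that is the mechanism behind both rules.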

7.4 The Proposed Detection Scheme
In this section, a new blind detection strategy is proposed based on the minimum output energy [74]. First, recall the received signal model (7.1), written compactly as

$$\mathbf{r} = \mathbf{H}\mathbf{b} + \mathbf{n} \qquad (7.12)$$

Without loss of generality, we assume that the first user is the desired user, whose symbols are estimated with a weight vector $\mathbf{w} \in \mathbb{R}^{N}$. The output is therefore

$$y = \mathbf{w}^{H}\mathbf{r} \qquad (7.13)$$

The output power $E[y^{2}]$ can be expressed as

$$E[y^{2}] = \mathbf{w}^{H}\mathbf{R}\,\mathbf{w} \qquad (7.14)$$

where $\mathbf{R} = E[\mathbf{r}\mathbf{r}^{H}]$ is the covariance matrix and $E[\cdot]$ is the expectation operator. To avoid ill-conditioning of the covariance matrix $\mathbf{R}$ of the received signal, we impose a regularization parameter on the energy function under the following two constraints:

$$\begin{cases} \mathbf{w}^{H}\mathbf{s}_1 = 1 \\ \sum_{k=2}^{K}\mathbf{w}^{H}\mathbf{s}_k = 0 \end{cases} \qquad (7.15)$$

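The proposed blind linear detector can therefore be expressed as the following constrained optimization problem:

$$\mathbf{w}_{\mathrm{opt}} = \arg\min_{\mathbf{w}}\Big\{E\big[\|b_1 - \mathbf{w}^{H}\mathbf{r}\|^{2}\big] + c\|\mathbf{w}\|^{2}\Big\} \quad \text{subject to} \quad \begin{cases} \mathbf{w}^{H}\mathbf{s}_1 = 1 \\ \sum_{k=2}^{K}\mathbf{w}^{H}\mathbf{s}_k = 0 \end{cases} \qquad (7.16)$$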

This constrained problem can be solved with the method of Lagrange multipliers, giving the cost (energy) function

$$J = \mathbf{w}^{H}\mathbf{R}\mathbf{w} + c\|\mathbf{w}\|^{2} + \gamma_1(\mathbf{w}^{H}\mathbf{s}_1 - 1) + \gamma_2\Big(\sum_{k=2}^{K}\mathbf{w}^{H}\mathbf{s}_k\Big) \qquad (7.17)$$

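where $c = m\,(\mathrm{tr}(\mathbf{C}) + \lambda_{\max})$ is the regularization parameter, $m$ is a positive constant, and $\gamma_1$ and $\gamma_2$ are the Lagrange multipliers. Therefore, the gradient of $J$ is

$$\mathbf{g} = 2\mathbf{R}\mathbf{w} + 2c\mathbf{w} + \gamma_1\mathbf{s}_1 + \gamma_2\sum_{k=2}^{K}\mathbf{s}_k \qquad (7.18)$$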

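Define

$$\mathbf{s}_2 = \sum_{k=2}^{K}\mathbf{s}_k$$

so that the gradient $\mathbf{g}$ in (7.18) becomes

$$\mathbf{g} = 2\mathbf{R}\mathbf{w} + 2c\mathbf{w} + \gamma_1\mathbf{s}_1 + \gamma_2\mathbf{s}_2 \qquad (7.19)$$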

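Setting $\mathbf{g} = \mathbf{0}$ and absorbing the sign into the multipliers $\gamma_1$ and $\gamma_2$ gives

$$\mathbf{w}_{\mathrm{opt}} = \frac{1}{2}[\mathbf{R} + c\mathbf{I}]^{-1}[\gamma_1\mathbf{s}_1 + \gamma_2\mathbf{s}_2] \qquad (7.20)$$

where $\mathbf{I}$ is an $N \times N$ identity matrix. Under the first constraint, we have

$$\mathbf{w}^{H}\mathbf{s}_1 = 1 \qquad (7.21)$$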

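Substituting (7.20),

$$\Big[\frac{\gamma_1}{2}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_1 + \frac{\gamma_2}{2}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_2\Big]^{H}\mathbf{s}_1 = 1 \qquad (7.22)$$

Therefore,

$$\gamma_1 = \big[\mathbf{s}_1^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_1\big]^{-1}\Big[2 - \gamma_2\,\mathbf{s}_2^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_1\Big] \qquad (7.23)$$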

Now, under the second constraint, we have

$$\sum_{k=2}^{K}\mathbf{w}^{H}\mathbf{s}_k = 0 \qquad (7.24)$$

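Then

$$\sum_{k=2}^{K}\Big[\frac{\gamma_1}{2}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_1 + \frac{\gamma_2}{2}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_2\Big]^{H}\mathbf{s}_k = 0$$

that is,

$$\frac{\gamma_1}{2}\mathbf{s}_1^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\sum_{k=2}^{K}\mathbf{s}_k + \frac{\gamma_2}{2}\sum_{k=2}^{K}\mathbf{s}_k^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\sum_{k=2}^{K}\mathbf{s}_k = 0$$

$$\frac{\gamma_1}{2}\mathbf{s}_1^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_2 + \frac{\gamma_2}{2}\sum_{k=2}^{K}\mathbf{s}_k^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_2 = 0$$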

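Define

$$s_{11} = \mathbf{s}_1^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_1, \qquad s_{21} = \sum_{k=2}^{K}\mathbf{s}_k^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_1,$$

$$s_{12} = \mathbf{s}_1^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_2, \qquad s_{22} = \sum_{k=2}^{K}\mathbf{s}_k^{H}[\mathbf{R} + c\mathbf{I}]^{-1}\mathbf{s}_2.$$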

Therefore,

$$\gamma_1 = s_{11}^{-1}[2 - \gamma_2 s_{21}] \qquad (7.25)$$

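Multiplying the last equation by 2 and substituting (7.25) gives

$$s_{11}^{-1}[2 - \gamma_2 s_{21}]\,s_{12} + \gamma_2 s_{22} = 0$$

$$\big[2s_{11}^{-1}s_{12} - \gamma_2 s_{11}^{-1}s_{21}s_{12}\big] + \gamma_2 s_{22} = 0$$

$$\gamma_2 = 2\big[s_{11}^{-1}s_{21}s_{12} - s_{22}\big]^{-1}s_{11}^{-1}s_{12} \qquad (7.26)$$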
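The closed-form solution of Section 7.4 can be evaluated in a few lines. This sketch assumes real-valued signatures (consistent with $\mathbf{w} \in \mathbb{R}^{N}$), forms $c = m(\mathrm{tr}(\hat{\mathbf{R}}) + \lambda_{\max})$ as stated after (7.17), solves the two linear constraint equations $\gamma_1 s_{11} + \gamma_2 s_{21} = 2$ and $\gamma_1 s_{12} + \gamma_2 s_{22} = 0$ (the forms the two constraints take after substituting (7.20)), and returns $\mathbf{w}_{\mathrm{opt}}$ from (7.20). The covariance estimate, $m = 0.01$ and the function name are illustrative assumptions; at least two users are needed so that $\mathbf{s}_2$ is nonzero.

```python
import numpy as np


def constrained_detector(R_hat, S, m=0.01):
    """Regularized two-constraint MOE detector of Section 7.4 (user 1 desired).

    R_hat : (N, N) estimated covariance; S : (N, K) signatures, column 0 = s_1.
    """
    N = R_hat.shape[0]
    c = m * (np.trace(R_hat) + np.linalg.eigvalsh(R_hat)[-1])  # regularization
    s1 = S[:, 0]
    s2 = S[:, 1:].sum(axis=1)                                  # s_2 = sum_{k>=2} s_k
    # Columns t1 = [R + cI]^{-1} s_1 and t2 = [R + cI]^{-1} s_2
    T = np.linalg.solve(R_hat + c * np.eye(N), np.column_stack([s1, s2]))
    t1, t2 = T[:, 0], T[:, 1]
    s11, s21 = s1 @ t1, s2 @ t1
    s12, s22 = s1 @ t2, s2 @ t2
    # Solve g1*s11 + g2*s21 = 2 and g1*s12 + g2*s22 = 0 (cf. (7.25)-(7.26))
    g2 = 2.0 * s12 / (s11 * (s21 * s12 / s11 - s22))
    g1 = (2.0 - g2 * s21) / s11
    return 0.5 * (g1 * t1 + g2 * t2)                           # (7.20)


rng = np.random.default_rng(0)
N, K = 31, 15
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
R_hat = S @ S.T + 1e-3 * np.eye(N)
w = constrained_detector(R_hat, S)
```

Checking that the returned vector satisfies $\mathbf{w}^{T}\mathbf{s}_1 = 1$ and $\mathbf{w}^{T}\mathbf{s}_2 = 0$ is a quick way to validate any implementation of the multiplier formulas.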

7.5 Simulation Results
In this section, simulated DS-CDMA downlink data in the presence of AWGN are used to verify the proposed method and compare it with the subspace blind multiuser detector (SBMUD). We used short Gold spreading codes of length C = 31 chips; thus, the maximum number of users is K = 30, and all users' signals are assumed to be transmitted with the same power. Monte Carlo simulations were run to verify the validity of the algorithms. Figure 7.1 shows the BER versus SNR for all detectors. The parameters were set as follows: number of symbols L = 500 and 1000, number of users K = 15, and SNR values from 0 dB to 12 dB. Figure 7.1 shows that the proposed algorithm improves the performance of the system, giving a lower BER than the SBMUD and outperforming it. Furthermore, Figure 7.2 shows that the proposed algorithm slightly outperforms the regularized algorithm [76] for small sample sets, namely L = 2N and L = 3N.
Figure 7.1 plot: BER (log scale, 10^0 to 10^-5) versus SNR (dB, 0 to 12); curves: ideal, SBMUD-L=1000, SBMUD-L=500, Alg+regul-L=1000, Alg+regul-L=500.

Figure 7.1: Average BER as a function of SNR for 15 users

Figure 7.2 plot: BER (log scale) versus SNR (dB, 0 to 12); curves: ideal, regular-3N [3], regular-2N [3], Alg-3N, Alg-2N, Alg+regul-3N, Alg+regul-2N.

Figure 7.2: Average BER as a function of SNR for 15 users with L=2N, L=3N.
Figure 7.3 shows the BER versus the regularization parameter m for 15 users, L = 1000 and L = 500 samples, and various SNRs at $\mathrm{SIR}_k = -20$ dB and $\mathrm{SIR}_k = 0$ dB. The BER at $\mathrm{SIR}_k = -20$ dB is clearly worse than that at $\mathrm{SIR}_k = 0$ dB. Furthermore, the regularized detector performs well and outperforms direct matrix inversion (DMI), which corresponds to m = 0, at all SNR values. Moreover, the regularization parameter can be chosen as m = 0.025. We also study the effect of signature waveform mismatch on the BER performance, following [3].

Figure 7.3 panels: BER (log scale, 10^0 to 10^-6) versus the regularization parameter m (0 to 0.4) for SNR = 0, 5, 10, 12 and 15 dB. (a) L = 1000, $\mathrm{SIR}_k = 0$ dB; (b) L = 1000, $\mathrm{SIR}_k = -20$ dB; (c) L = 500, $\mathrm{SIR}_k = 0$ dB; (d) L = 500, $\mathrm{SIR}_k = -20$ dB.
Figure 7.3: Average BER as a function of the regularization parameter m for 15 users, for various sample sizes L


Figure 7.4 shows the BER performance for channel coefficients $\mathbf{H}_1 = [0.7, 0.2, 0.1]$, with $\langle \mathbf{s}_1 | \tilde{\mathbf{s}}_1 \rangle = 0.7032$, and $\mathbf{H}_2 = [0.65, 0.15, 0.3]$, with $\langle \mathbf{s}_1 | \tilde{\mathbf{s}}_1 \rangle = 0.6597$, respectively, where $\tilde{\mathbf{s}}_1$ is the effective signature waveform vector, for L = 1000 and K = 15 users. The proposed detector performs almost as well with mismatch as without it. As shown in Figure 7.4, the performance degrades as the mismatch increases; nevertheless, the proposed algorithm still gives reasonable performance, close to the matched case. Overall, the proposed detector better solves the symbol estimation problem in DS-CDMA systems and avoids ill-conditioning in the matrix inversion.

Figure 7.4 panels: BER (log scale, 10^0 to 10^-4) over a horizontal axis from 0 to 15 (labeled m in the plot). Left panel curves: No mismatch, Ideal for H1, with mismatch H1; right panel curves: No mismatch, Ideal for H2, with mismatch H2.

Figure 7.4: Average BER as a function of SNR for 15 users with L=1000.


7.6 Conclusion
In this study, we have developed a constrained blind multiuser detector based on minimum output energy. Furthermore, we use the regularization method to avoid singularity of the covariance matrix and the resulting ill-posed problem. The results appear to show that the proposed method performs well in the symbol estimation problem in DS-CDMA systems and outperforms the other detectors. Our results also show that multiple access interference (MAI) can be mitigated by the proposed method, thus improving on the performance of conventional detection. Furthermore, the results show that the proposed detector maintains robust performance as the mismatch between the original and estimated sequences increases.


8 Chapter 8

Hardware Implementation

In this chapter, we investigate ICA algorithms in terms of hardware implementation. Although software implementation is important for investigating the capabilities of ICA algorithms and simulating significant aspects of their applications, hardware implementation provides real-time solutions and exploits parallelism to achieve fast convergence. Furthermore, software implementations may suffer from insufficient memory because of the large data sets and high dimensionality of ICA applications. Hardware implementations, executed on integrated circuits (ICs), are therefore a promising approach for ICA algorithms: their high-speed processing and parallel architecture allow them to outperform software implementations in terms of memory sufficiency and convergence speed [105].


8.1 Introduction
During the last decade, several works have implemented ICA algorithms on fully
analog CMOS circuits, mixed-signal (analog and digital) integrated circuits,
application-specific integrated circuits (ASICs) and field-programmable gate arrays
(FPGAs).
In both analog and mixed-signal CMOS integrated circuits, designers can create a fully
customized design based on analog or mixed-signal CMOS technologies. Although these
approaches use silicon more efficiently, their costs are significantly higher, especially in
terms of design expense and fabrication process. Therefore, digital ASICs and FPGAs are
generally considered efficient ways to implement ICA algorithms. Furthermore, FPGAs,
being based on reconfigurable technology, can be considered the most promising
technique for implementing ICA algorithms in terms of cost, since they allow the end user
to modify and reconfigure designs multiple times.
The survey in [99] studies the implementation of ICA algorithms using very large scale
integration (VLSI) approaches. As mentioned above, several designers have also
implemented ICA algorithms using analog and mixed-signal integrated technologies, e.g.,
[100], [104] and [106].

8.2 Comparative Study of Existing Solutions to Implement ICA Algorithms
Over the last decade, VLSI technologies have offered several advantages that make
them a strong choice for implementing ICA algorithms.


8.2.1 Analog CMOS Integrated Circuits

Analog integrated circuits are usually the first choice for implementing ICA algorithms
when low circuit delay and low power consumption are the priorities. Analog circuit
design lets designers work at the transistor level and control the necessary
interconnections. An analog implementation can therefore use the minimum number of
transistors and the shortest interconnections, achieving high circuit density and, in turn,
low circuit delay and power consumption.
In the implementation process, the design of an ICA algorithm can be divided into several
groups based on function. Moreover, one can design a simple modular structure, such as a
2 x 2 input-output structure, and readily extend it to any size. This approach also allows
the design area to be controlled according to the application, and the circuit can be
connected to external peripherals through analog-to-digital and digital-to-analog
converters on or off the chip.
In [101], Cohen and Andreou proposed two chips implementing the Herault-Jutten (H-J)
ICA algorithm for speech signals, fabricated in a 2-μm n-well CMOS process. Gharbi and
Salam [136] proposed a chip design for the H-J algorithm in 2-μm CMOS technology. More
recently, Cho and Lee [100] presented an analog CMOS chip implementing the InfoMax
ICA algorithm in a 0.6-μm p-well AMS CMOS process. Their design consisted of a
multiplier circuit, weight-update circuits and a quadrature function circuit.
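The H-J network targeted by these chips can be sketched in software for the 2 x 2 case. This is an illustration only, assuming the common recursive form y = x - Cy with the cross-weight rule ΔC_ij = μ f(y_i) g(y_j) for i ≠ j. The nonlinearities f(y) = y^3 and g(y) = tanh(y), the step size, and the sources and mixing matrix are hypothetical choices, not the parameters of the chips in [101] or [136].

```python
import numpy as np


def herault_jutten(x, mu=1e-3, f=lambda v: v**3, g=np.tanh):
    """Sample-by-sample 2x2 Herault-Jutten recursive network (software sketch).

    x: (2, T) array of mixtures.
    Returns the outputs y (2, T) and the final cross-coupling matrix C,
    whose diagonal stays zero because only the cross weights adapt.
    """
    C = np.zeros((2, 2))
    y = np.zeros_like(x)
    eye = np.eye(2)
    for t in range(x.shape[1]):
        y[:, t] = np.linalg.solve(eye + C, x[:, t])    # solves y = x - C y
        update = mu * np.outer(f(y[:, t]), g(y[:, t]))  # dC_ij = mu f(y_i) g(y_j)
        np.fill_diagonal(update, 0.0)                   # no self-coupling
        C += update
    return y, C


rng = np.random.default_rng(1)
s = rng.uniform(-1.0, 1.0, size=(2, 500))    # two hypothetical sub-Gaussian sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])        # hypothetical mixing matrix
y, C = herault_jutten(A @ s)
```

For a unit-diagonal mixing matrix, exact separation corresponds to C equal to the off-diagonal part of A; whether and how fast the network reaches it depends on the source statistics and the step size. In the analog chips, the matrix solve and the weight integration are carried out by circuit dynamics rather than by explicit arithmetic.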
Analog integrated circuit design has several drawbacks, such as the high expense of the
workstation-based development system and a slow turnaround time (approximately eight
weeks). It can therefore be considered unsuitable for the rapid implementation of most
ICA designs [99]. Moreover, analog CMOS circuits can suffer from transistor mismatch,
which degrades circuit performance. Transistor mismatch arises from edge effects,
striation effects and random variations; for more details, refer to [137].
Despite these drawbacks, analog integrated circuits remain very efficient designs.
Transistor mismatch can also be reduced by increasing the width or length of the
transistors and by using concentric layout structures, which ensure that matched
transistors share the same surrounding structures.
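To see why enlarging the transistors helps, one can use the widely cited Pelgrom model, in which the standard deviation of the threshold-voltage mismatch scales as A_VT / sqrt(W·L). The sketch below assumes that model. The matching coefficient A_VT is a hypothetical placeholder, because real values depend on the process.

```python
import math


def mismatch_sigma_mv(a_vt_mv_um, width_um, length_um):
    """Pelgrom-style threshold-voltage mismatch: sigma = A_VT / sqrt(W * L)."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)


A_VT = 5.0                                      # hypothetical coefficient in mV*um
small = mismatch_sigma_mv(A_VT, 1.0, 0.5)
large = mismatch_sigma_mv(A_VT, 2.0, 1.0)       # 4x the gate area
print(small / large)                            # about 2: 4x area halves sigma
```

The inverse-square-root dependence means that halving the mismatch costs four times the area, which is one reason matching is traded against circuit density.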

8.2.2 Mixed-Signal Techniques (Analog and Digital Circuits)

Mixed-signal techniques provide an alternative to purely analog integrated circuits. They
combine analog and digital circuitry to take advantage of the fast switching and ease of
implementation of digital circuits for certain functions, e.g., digital-to-analog conversion.
Although mixed-signal circuits outperform some analog ICA implementations, they still
suffer from the high cost of workstation-based development and long prototype
turnaround times.
In [102], the authors simplified the H-J and InfoMax ICA algorithms and proposed a new
mixed-signal chip design. In [105], the authors used a parallel VLSI architecture to
implement a feed-forward network; the chip was fabricated in a 0.5-μm two-poly,
three-metal CMOS technology.
For these reasons, analog CMOS and mixed-signal techniques provide the end user with
efficient full-custom solutions for ICA algorithms. However, they require substantial
knowledge of transistor-level design and physical design issues, and they remain
expensive in terms of both development time and workstation cost.

8.2.3 ASIC Solutions

Application-specific integrated circuits (ASICs) belong, together with FPGAs, to the
digital VLSI technologies. ASICs typically contain about ten million gates and allow the
end user to take advantage of the large libraries provided by IC vendors; for this reason,
ASICs are called semi-custom solutions. Although ASICs increase the design risk and cost
because they are not programmable, they provide solutions for very complex ICA
algorithms with compact circuit designs and low power consumption.
Table 8.1 compares ICA implementations based on analog ICs, mixed-signal ICs and
ASICs. The compact ASIC design has the smallest chip area in the table, which results in
the best low-power performance.

8.2.4 FPGA Solutions

Field-programmable gate arrays (FPGAs) are general-purpose devices fabricated to
standard specifications by hardware vendors. They allow end users to implement a
specific task or design on them. End users can modify their designs several times and
program the interconnections in a few hours instead of waiting several weeks for final
fabrication. Therefore, FPGAs outperform the other VLSI technologies in terms of
turnaround time and development expense. Typically, FPGAs contain from more than
2,000 up to 2 million gates.
Many works in the literature [103]-[115] have implemented ICA algorithms on FPGA
technologies, e.g., [103], [104] and [106]. According to [138], the growth of FPGA/ASIC
technologies has gone far beyond Moore's Law. Tables 8.2 and 8.3 summarize recent
FPGA solutions for implementing ICA algorithms. Although FPGAs outperform the other
VLSI technologies thanks to their reconfigurable and reusable features, they usually
suffer from higher circuit delay, which restricts their capacity. For more details, refer
to [99].


Table 8.1: Comparison of Analog, Mixed signal and ASIC Solutions
| Research Group | ICA Algorithm | VLSI Category | Fabrication Technology | Chip Size | Input x Output | Voltage (V) |
|---|---|---|---|---|---|---|
| Cohen and Andreou [101] | Herault-Jutten | Analog | 2 um n-well 2M2P CMOS | N/A | 2x2 | -2.5 to 2.5 |
| Gharbi and Salam [136] | Herault-Jutten | Analog | 2 um CMOS | 2.22 x 2.25 | 2x2 | N/A |
| Cho and Lee [100] | Infomax ICA | Analog | 0.6 um p-well 3M2P CMOS | | 4x4 | -2.5 to 2.5 |
| Celik et al. [102] | H-J, Infomax ICA | Mixed | 0.5 um 3M2P CMOS | 2.8 x 2.8 | 3x3 | N/A |
| Du et al. [104] | Parallel ICA | ASIC | 0.18 um 6M1P CMOS | 1.191 x 1.191 | 1 x 1 sequential (3x3) | -1.8 to 1.8 |

Table 8.2: Comparison of FPGA Solutions
Research Group

ICA algorithm

Lim et al. [99]

MI and DO ICNNs

Nordin et al. [109]

Pipelined InfoMax

Satter and Charay.
[113]
Wei and Charo.
[111]
Kim et al. [103]

Infomax ICA

Xilinx Virtex E

Infomax ICA

Xilinx Virtex E

InfoMax ICA

Du and Qi [104]

Parallel ICA

Research Group

Frequency

Samples

Lim et al. [99]

155K Hz MI, 3.62 KHz
(DO)
N/A

1500

Nordin et al. [109]

FPGA
Xilinx Virtex
XCV 812E
N/A

Altera
EP20K600E
Xilinx Virtex
V1000E

N/A

Satter and Charay.
[113]
Wei and Charo.
[111]
Kim et al. [103]

60M Hz

2500

12.288 M Hz

N/A

N/A

N/A

Du and Qi [104]

20.161 MHz

60000

Research Group

Capacity (Million Gates)

Design Utilization

Lim et al. [99]

25

Nordin et al. [109]

N/A

0.7 % MI, 0.5%
DO
N/A

Satter and Charay.
[113]
Wei and Charo.
[111]
Kim et al. [103]

0.6

6%

0.6

15%

0.6

N/A

Du and Qi [104]

1.0

92 %


Table 8.3: Comparison Results Among Various ICA Implementations
| | [101] | [103] | [102] | [103] | [104] | [112] |
|---|---|---|---|---|---|---|
| Application | Speech | Image | Image | Speech | EEG | EEG |
| Algorithm | ICA | pICA | pICA | FastICA | InfoMax | FastICA |
| Number of Channels / Weight Vectors (WVs) | 2 | 20 (WVs) | 4 (WVs) | 2 | 4 | 8 |
| Implementation Approach | ASIC | FPGA | ASIC | FPGA | FPGA | FPGA |
| Speed (MHz) | 12.288 | 35.92 | 20.161 | 50 | 68 | 100 |
| Power Dissipation (mW) | 98.8 | N/A | N/A | N/A | N/A | 16.35 |
| Gates (Million) | 0.0114 | N/A | 0.2295 | N/A | 0.315 | 0.272 |
| Computation Time (s) | 60 | 1129.5 | N/A | 0.003 | N/A | 0.29 |

8.3 Multiplier Design
In this section, we design a down-conversion mixer (multiplier), shown in Figure 8.1. The
structure is a modified Gilbert-cell double-balanced mixer, which offers good
port-to-port isolation and low even-order distortion.
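The down-conversion function can be illustrated by treating the mixer as an ideal multiplier. This is a simplification: the real Gilbert-cell switching stage also generates LO harmonics, which are ignored here. The sketch assumes an LO of 1.65 GHz, which is one of the two LO frequencies that translate the 1.9 GHz RF tone to the 250 MHz IF used in Section 8.4, and a hypothetical 20 GHz simulation sampling rate. The function name is illustrative.

```python
import numpy as np


def ideal_mixer_spectrum(f_rf, f_lo, fs, n):
    """Multiply unit RF and LO tones sample by sample; return (freqs, |X|)."""
    t = np.arange(n) / fs
    rf = np.cos(2 * np.pi * f_rf * t)
    lo = np.cos(2 * np.pi * f_lo * t)
    mixed = rf * lo                                 # ideal multiplier output
    spectrum = np.abs(np.fft.rfft(mixed))
    freqs = np.arange(spectrum.size) * (fs / n)     # bin k -> k * fs / n
    return freqs, spectrum


freqs, mag = ideal_mixer_spectrum(1.9e9, 1.65e9, 20e9, 2000)
top_two = np.sort(freqs[np.argsort(mag)[-2:]])
print(top_two)   # difference (IF) tone and sum tone
```

The product identity cos(a)cos(b) = [cos(a - b) + cos(a + b)] / 2 places half of the amplitude at the 250 MHz difference frequency and half at the 3.55 GHz sum frequency; the IF output stage keeps the difference term.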

The circuit consists of an RF stage (M_N1, M_N2), an LO switching circuit (M_N3 to M_N6),
a current-injection circuit (M_P1 to M_P3), boosting inductors (L1, L2), a load-resistance
stage (R1, R2) and an output driver stage (M_N1, M_N2). The transconductance stage ("RF
stage") amplifies the input differential RF signals; it is composed of stacked NMOS-PMOS
transistors. The differential RF and LO signals are mixed by operating the LO signals as an
ideal switching function. The current-injection circuit (M_P1 to M_P3) is added to improve
the conversion gain and linearity. The output driver transistors (M_N1, M_N2) are
common-source stages that match the 50 Ω output characteristic impedance.
The parasitic capacitance at the source node of the LO switching circuit (M_N3 to M_N6)
significantly affects the mixer performance.


Figure 8.1: The proposed Mixer (Multiplier) schematic.
We therefore use boosting inductors (L1, L2) that resonate with this parasitic capacitance
to improve the performance, specifically the linearity. Transistors (M_N3 to M_N6) form a
bias circuit that provides bias current to the other stages of the circuit.
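The boosting-inductor idea reduces to the LC resonance condition f0 = 1 / (2π sqrt(LC)). The sketch below assumes resonance at the 1.9 GHz RF frequency and a hypothetical 0.5 pF parasitic capacitance; the actual parasitic value of this design is not given in the text, and the function names are illustrative.

```python
import math


def boosting_inductance(f_res_hz, c_par_farads):
    """Inductance resonating with c_par at f_res: L = 1 / ((2*pi*f)^2 * C)."""
    return 1.0 / ((2 * math.pi * f_res_hz) ** 2 * c_par_farads)


def resonant_frequency(l_henries, c_farads):
    """LC resonance frequency f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(l_henries * c_farads))


C_PAR = 0.5e-12                              # hypothetical parasitic capacitance
L_BOOST = boosting_inductance(1.9e9, C_PAR)
print(f"{L_BOOST * 1e9:.1f} nH")             # about 14 nH for these assumed values
```

At resonance the inductor cancels the capacitive susceptance at the switching-stage source node, which is why the node presents a high impedance to the RF signal.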

We use the current-injection technique to maintain the total current of the mixer while
reducing the flicker noise of the switching stage, since less noise is generated at the
output when less current flows through the switching stage. The drawback of current
injection is that the parasitic capacitance at the source of the switching stage becomes
larger. The two inductors are therefore placed between the RF input stage and the
switching stage. Furthermore, the series-resonant inductors provide a high impedance
that improves the conversion gain with good gain flatness and linearity.


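The noise figure (NF) quantifies the amount of noise contributed by the circuit. A mixer
translates both the RF signal and its image to the same IF. Hence, for a noiseless mixer,
the output SNR is half the input SNR, so the single-sideband noise figure (NF_SSB) of a
noiseless mixer is 3 dB. The NF and the relation between the SSB and double-sideband
(DSB) noise figures are

$$\mathrm{NF} = 10\log_{10}\left(\frac{\mathrm{SNR}_{RF}}{\mathrm{SNR}_{IF}}\right) \quad \text{[dB]}$$

$$\mathrm{NF}_{SSB} = 3\ \text{dB} + \mathrm{NF}_{DSB}$$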
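The two NF relations can be checked numerically with a small sketch. The function names are illustrative, SNRs are passed as linear power ratios, and the "3 dB" of the SSB/DSB relation is taken as 10 log10(2).

```python
import math


def noise_figure_db(snr_in, snr_out):
    """NF = 10 log10(SNR_in / SNR_out), with SNRs given as linear ratios."""
    return 10.0 * math.log10(snr_in / snr_out)


def ssb_from_dsb_db(nf_dsb_db):
    """SSB NF is the DSB NF plus 10 log10(2), i.e. about 3 dB."""
    return nf_dsb_db + 10.0 * math.log10(2.0)


# A noiseless mixer that halves the SNR (signal from one band, noise from two)
print(round(noise_figure_db(100.0, 50.0), 2))   # 3.01
print(round(ssb_from_dsb_db(7.2), 2))           # 10.21
```

The second call converts a DSB noise figure of 7.2 dB into the corresponding SSB value.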

A mixer's performance depends on its power consumption, conversion gain, linearity and
noise figure. A figure of merit (FoM) combines these important parameters into a single
quantity that characterizes the performance of the circuit. We use the following FoM to
evaluate the mixer:

$$\mathrm{FoM} = \frac{\mathrm{Gain(abs)} \cdot \mathrm{IIP3(mW)}}{\mathrm{NF(abs)} \cdot V_{dd} \cdot \mathrm{Power(mW)}}$$

where IIP3 is the input third-order intercept point, V_dd is the supply voltage, NF is the
noise figure of the circuit and Power is the power consumption.
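The FoM expression mixes dB and linear quantities, so the unit conversions matter. The sketch below makes two assumptions that the text does not state explicitly: "Gain(abs)" is treated as a voltage ratio 10^(G/20), since the reported gain is a voltage conversion gain, and "NF(abs)" as a power ratio 10^(NF/10). The function name is illustrative.

```python
def mixer_fom(gain_db, iip3_dbm, nf_db, vdd_v, power_mw):
    """FoM = Gain(abs) * IIP3(mW) / (NF(abs) * Vdd * Power(mW)).

    Assumptions: the conversion gain is a voltage gain, 10^(G/20);
    NF is a power ratio, 10^(NF/10); IIP3 is converted from dBm to mW.
    """
    gain_abs = 10 ** (gain_db / 20.0)
    iip3_mw = 10 ** (iip3_dbm / 10.0)
    nf_abs = 10 ** (nf_db / 10.0)
    return gain_abs * iip3_mw / (nf_abs * vdd_v * power_mw)


print(mixer_fom(0.0, 0.0, 0.0, 1.0, 1.0))   # 1.0: every factor is unity
```

Evaluating the FoM for the design of Section 8.4 would also require its power consumption, which this section does not report, so no value is computed for it here.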

8.4 Simulation Results
In this section, we present simulation results for the mixer circuit. The mixer is designed
in the TSMC 0.18-μm CMOS RF process and simulated using the Cadence tools.
The proposed mixer described in the previous section operates around 1.9 GHz and is
biased at a 1.2 V supply voltage. As shown in Figure 8.2, with an RF power of -30 dBm, an
LO power of 5 dBm and an IF frequency of 250 MHz, the conversion gain is
15.9 ± 0.4 dB. The results show good gain flatness within the IF band. Figure 8.3 shows the
DSB NF of our design as a function of the IF frequency: the DSB noise figure is less than
7.2 dB, so the SSB noise figure is less than about 10.2 dB (7.2 dB + 3 dB).
Figure 8.4 presents the conversion gain versus the supply voltage; the mixer works well
with a low supply voltage. Figure 8.4 also shows the conversion gain versus the LO power,
which demonstrates the stability of our design and its high gain over a wide LO range.
The IIP3 is measured with a two-tone test. As shown in Figure 8.5, the mixer achieves an
IIP3 of 10.25 dBm and a 1-dB compression point of -0.8 dBm. The resulting FoM is 0.194,
which outperforms results reported in the literature.
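The two-tone measurement mentioned above is usually reduced to an IIP3 value with the standard small-signal extrapolation IIP3 = P_in + (P_fund - P_IM3) / 2, where all powers are per tone and in dBm. The sketch below applies that formula to hypothetical readings, not to the simulated data of Figure 8.5.

```python
def iip3_from_two_tone(p_in_dbm, p_fund_out_dbm, p_im3_out_dbm):
    """Two-tone estimate: IIP3 = P_in + (P_fund - P_IM3) / 2 (per tone, dBm).

    Valid in the small-signal region, where the fundamental grows 1 dB/dB
    and the third-order intermodulation product grows 3 dB/dB.
    """
    return p_in_dbm + (p_fund_out_dbm - p_im3_out_dbm) / 2.0


# Hypothetical readings: -20 dBm per input tone, -5 dBm fundamental, -65 dBm IM3
print(iip3_from_two_tone(-20.0, -5.0, -65.0))  # 10.0
```

The input level should be kept well below the 1-dB compression point so that the 1:3 slope assumption holds.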

Figure 8.2: Voltage Conversion Gain versus IF


Figure 8.3: DSB Noise Figure versus IF

8.5 Conclusion
In this chapter, we used the TSMC 0.18-μm CMOS process to simulate the mixer design.
The mixer demonstrates high linearity, high gain and a good noise figure: it achieves a
15.9 dB conversion gain, an IIP3 of about 10 dBm and a DSB noise figure of 7.2 dB.
Combining current injection with boosting inductors yields a high-performance mixer
(multiplier) at a low supply voltage.


Figure 8.4: Voltage Conversion Gain versus Supply Voltage and LO Power

Figure 8.5: Two-Tone Test for IIP3


Figure 8.6: The proposed Mixer (Multiplier) Layout.


9 Chapter 9

Conclusion and Future Work

In this chapter we conclude the dissertation and highlight directions for future work.

9.1 Conclusion
In this dissertation, Chapter 1 provided the background needed to discuss the blind
source separation problem. The benefits of blind techniques were discussed, along with
their applications in wireless communications and speech enhancement.
Chapter 2 presented a thorough review of BSS/ICA algorithms, emphasizing the
approaches that influenced our work. It also studied some of the methods developed to
solve ICA problems for instantaneous and convolutive mixtures.
In Chapter 3, a novel class of divergence measures was presented, based on integrating
convex functions into the Cauchy–Schwarz inequality. This divergence measure was used
as a contrast function to develop new ICA algorithms for the blind source separation
(BSS) problem. The algorithms derived from the CCS-DIV can be controlled to attain the
steepest descent towards the minimum value. A pairwise iterative scheme was employed
to address the high-dimensional BSS problem, and two pairwise non-parametric ICA
algorithms were developed based on the proposed divergence. Several examples and
experiments demonstrated the improved performance of the proposed divergence, and
the chapter compared its metric performance with a host of leading ICA algorithms. We
also developed non-parametric CCS–ICA approaches to demixing in which the source
densities are estimated with Parzen windows. The convergence speed of the
parameterized CCS–ICA procedure was evaluated and compared with that of other
algorithms. The proposed CCS–ICA algorithms attained the highest SIR in the separation
of speech and music signals relative to other leading ICA-based algorithms.
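The Parzen-window density estimate used by the non-parametric CCS–ICA variants can be sketched as an average of Gaussian kernels centred on the samples. The Gaussian kernel and Silverman's rule-of-thumb bandwidth below are common choices assumed for illustration. They are not necessarily the kernel or bandwidth used in Chapter 3, and the function name is hypothetical.

```python
import numpy as np


def parzen_density(samples, points, h):
    """Gaussian Parzen-window estimate of a 1-D density.

    samples: (n,) observed values, e.g. one demixed output y_m
    points:  (k,) locations where the density is evaluated
    h:       kernel width (bandwidth)
    """
    u = (points[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / h


rng = np.random.default_rng(0)
y_m = rng.standard_normal(500)
h = 1.06 * y_m.std() * y_m.size ** (-1 / 5)    # Silverman's rule of thumb
grid = np.linspace(-8.0, 8.0, 2001)
p_hat = parzen_density(y_m, grid, h)
area = p_hat.sum() * (grid[1] - grid[0])       # should be close to 1
```

Because the estimate is a smooth function of the samples, and hence of the demixing weights, it can be differentiated inside a gradient-based contrast such as the CCS-DIV.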
In Chapter 4, we presented a RobustICA-based algorithm to solve the frequency-domain
BSS problem for convolutive acoustic mixtures under several adverse conditions.
Real-world experiments showed the superiority of the presented algorithm over other
popular algorithms in the literature in terms of performance and computational
complexity. Moreover, we compared several permutation solvers in terms of
computational complexity and performance in order to provide the RobustICA-based
algorithm with an efficient frequency-dependent permutation scheme. We also studied
the effect of several parameters on the separation performance of the presented
algorithm, including the type of window, and showed that the performance improves
within a certain range of overlap. Finally, we showed that the system works efficiently
with around 0.5–10 seconds of input data, which is close to real-time operation.
Accordingly, the proposed algorithm is suitable for real-time operation and, as a result,
for a large number of real-time applications.


Chapter 5 investigated three adaptive algorithms for user detection in CDMA systems: the
proposed algorithm based on fourth-order cumulant matrices, FastICA and RobustICA. The
results show that the proposed algorithm performs better than the other two user
detectors. They also show that the proposed algorithm can mitigate Multiple Access
Interference (MAI), thus improving the performance of conventional detection.
Furthermore, the proposed detector displays the most consistent improvement as M (the
number of symbols) increases. We also assessed the computational complexity of the
three user-detection algorithms, employing the average signal mean square error (SMSE)
as a contrast function of the independence criterion. The results show that the proposed
detector provides faster and more robust performance.
Chapter 6 carried out both simulation and theoretical demonstrations of the blind
multiuser detector based on the space state structures in the CDMA system. We also
developed three blind multiuser detectors based on three algorithms: ICA, RICA and PCA.
The results indicate that the proposed algorithms perform well in the symbol-estimation
problem in DS/CDMA systems and outperform the conventional detectors and the
adaptive MMSE detector. Our results also show that Multiple Access Interference (MAI)
can be mitigated by the proposed algorithms, thus improving the performance of blind
multiuser detection. Although the proposed methods improve as the sample size
increases, the results show that they perform well even with small sample sets, unlike the
LMMSE detector. Moreover, unlike that of the LMMSE detector, the complexity of the
proposed methods remains constant rather than increasing exponentially. Finally, unlike
the adaptive LMMSE detector, the proposed algorithms place no restriction on the
spreading codes, since they do not require the spreading codes of the interfering users.
They are therefore a more suitable choice for the downlink, and they also work in the
uplink.
In Chapter 7, we introduced constrained blind multiuser detection, imposing a
regularization parameter to cope with the ill-conditioning of the covariance matrix and to
mitigate performance degradation.
In Chapter 8, we investigated ICA algorithms from the standpoint of hardware
implementation. Hardware implementations on integrated circuits (ICs) offer real-time
operation and parallel architectures, and they avoid the memory limitations that large,
high-dimensional ICA data sets impose on software implementations. We compared
analog, mixed-signal, ASIC and FPGA solutions, and we designed and simulated a
low-voltage down-conversion mixer (multiplier) in a TSMC 0.18-μm CMOS process.

9.2 Future Work
This section outlines directions for future work on blind source separation and its
implementation. Specifically, we itemize the following research activities:
Optimization: We presented a new Convex Cauchy–Schwarz Divergence (CCS-DIV)
measure for blind source separation (BSS) and unsupervised learning of acoustic and
speech signals. The CCS-DIV measure is developed by integrating convex functions into
the Cauchy–Schwarz inequality. Through its convexity parameter, the measure offers a
broad range of control over its convexity. Based on this measure, a new CCS–ICA
algorithm was structured, and a non-parametric form was developed that incorporates
the Parzen window-based distribution. Moreover, the CCS–ICA algorithm offers a
controllable convergence speed. Several case studies were carried out on instantaneous
and noisy mixtures of speech signals, and the superiority of the proposed CCS–ICA
algorithm was demonstrated in metric comparisons with FastICA, RobustICA, convex ICA
(C-ICA) and other leading algorithms.
Gradient-type algorithms can be considered robust optimization techniques, but they
usually suffer from several drawbacks in terms of convergence and stability: their
convergence is relatively slow, and their stability depends on the choice of the learning
rate. Therefore, the optimization could be upgraded to faster and more robust methods,
such as the fast relative Newton method [8] and decoupled optimization [9].
Online implementation: One of the most challenging questions about any proposed
algorithm is whether it can work online. Real-time implementation is very important for
measuring the efficiency of the proposed algorithm. Therefore, our new algorithms will
be extended to online implementation. An interesting approach is to combine
block-based and real-time processing, as in a block LMS-type structure [2], [42], [48], [71].
In this approach, incoming data are stored in a series of buffers that are processed
sequentially, producing the results block by block. The challenge is then to find the
buffer length that allows the separation to be performed with acceptable performance,
while the real-time DSP processor handles the computational cost without interruptions
or distortions. This block length must satisfy two criteria: it should be short enough for
the mixing environment to be considered stationary within a block, yet long enough for
the separation to succeed with excellent performance.
This is the same idea as in the non-stationary mixing case in [1], [98], [111], [114] and
[120]. Therefore, finding an optimal block length might solve both problems at the same
time.
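The buffer-per-block structure can be illustrated with a generic block LMS adaptive filter, in which the weights are frozen within each block and updated once per block with the block-averaged gradient. This sketch identifies a hypothetical FIR system rather than separating sources. The block length, step size and system are illustrative assumptions, and choosing the block length for separation is exactly the open question described above.

```python
import numpy as np


def block_lms(x, d, n_taps, block_len, mu):
    """Block LMS adaptive FIR filter.

    The weights are frozen within each block of `block_len` samples and
    updated once per block with the gradient averaged over that block.
    """
    w = np.zeros(n_taps)
    x_pad = np.concatenate([np.zeros(n_taps - 1), x])
    for start in range(0, len(x) - block_len + 1, block_len):
        grad = np.zeros(n_taps)
        for n in range(start, start + block_len):
            u = x_pad[n:n + n_taps][::-1]    # [x(n), x(n-1), ..., x(n-n_taps+1)]
            e = d[n] - w @ u                  # error with the block's frozen weights
            grad += e * u
        w += mu * grad / block_len
    return w


rng = np.random.default_rng(0)
h_true = np.array([0.5, -0.3, 0.2])          # hypothetical unknown system
x = rng.standard_normal(4000)
d = np.convolve(x, h_true)[:len(x)]           # desired signal d(n) = sum_k h_k x(n-k)
w = block_lms(x, d, n_taps=3, block_len=8, mu=0.1)
```

Longer blocks give smoother gradient estimates and fewer updates per second, which eases the real-time load, but they also react more slowly when the mixing changes.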
Underdetermined mixtures
If the number of observations (sensors) m is less than the number of sources n, the
mixing process is called underdetermined (not invertible) [1], [52], [53]. In the frequency
domain, separation can be achieved up to scaling and permutation ambiguities under the
assumption that the mixing matrix H(f) has full column rank at each frequency bin.
However, when there are more source signals than sensors, this assumption is no longer
valid: H(f) cannot have full column rank and therefore has no left pseudo-inverse, which
makes the problem considerably more difficult. Much work has been done on separating
underdetermined instantaneous mixtures [1], but relatively few works address the
underdetermined convolutive case [1], [38]. The best-known algorithm for this case is
DUET, proposed by Rickard et al. [2], [3], [38], [117]. DUET assumes a specific delay model
that only works for audio signals with small delays, e.g., in hearing aids. It performs
separation with two sensors by computing two parameters, the amplitude and phase
differences between the sensor signals associated with each source. Several papers have
developed and enhanced DUET [3], but its performance in real reverberant environments
remains limited. One promising approach is to transfer methods for underdetermined
instantaneous mixtures to the frequency domain in order to tackle the underdetermined
convolutive problem, as presented in [1], [3], [38].
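The two DUET parameters can be sketched as a per-point feature computation on two-sensor STFT coefficients. This is a simplified version of the feature step only. It assumes the anechoic small-delay model X2 = a e^{-jωδ} X1, uses the plain attenuation ratio instead of the symmetric attenuation of the original method, and omits the 2-D histogram clustering and masking stages. The function name and test values are hypothetical.

```python
import numpy as np


def duet_features(X1, X2, omega):
    """Relative attenuation and delay at each time-frequency point (DUET-style).

    X1, X2: STFT coefficients of the two sensors (same shape, X1 nonzero)
    omega:  angular frequency in rad/s of each coefficient (broadcastable, nonzero)
    The delay estimate is unambiguous only while |omega * delay| < pi.
    """
    ratio = X2 / X1
    attenuation = np.abs(ratio)
    delay = -np.angle(ratio) / omega
    return attenuation, delay


# Synthetic single-source check under the anechoic model X2 = a e^{-j omega delta} X1
rng = np.random.default_rng(0)
freqs_hz = np.linspace(100.0, 1000.0, 64)
omega = 2 * np.pi * freqs_hz[:, None]            # one row per frequency bin
X1 = rng.standard_normal((64, 20)) + 1j * rng.standard_normal((64, 20))
a_true, delta_true = 0.7, 2e-4                    # hypothetical attenuation, delay (s)
X2 = a_true * np.exp(-1j * omega * delta_true) * X1
att, dly = duet_features(X1, X2, omega)
```

With several sparse sources, each time-frequency point is assumed to be dominated by one source, so the (attenuation, delay) pairs form one cluster per source. The phase-wrapping limit explains why the model is restricted to small delays.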


APPENDIX


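Convex Cauchy–Schwarz Divergence and its Derivative
Assume the demixed signals are $Y_t = W X_t$, where the mth component is $y_{mt} = w_m X_t$.
We express the CCS-DIV as a contrast function with a convexity parameter α as follows:

$$D_{CH}(Y_t, y_{mt}, \alpha) = \log \frac{\iint f^2\big(p(Y_t)\big)\,dy_1\,dy_2 \cdot \iint f^2\big(\prod_{m=1}^{N} p(y_{mt})\big)\,dy_1\,dy_2}{\left[\iint f\big(p(Y_t)\big)\, f\big(\prod_{m=1}^{N} p(y_{mt})\big)\,dy_1\,dy_2\right]^2}$$

By using the Lebesgue measure [5] to approximate the integrals with respect to the joint
distribution of $Y_t = \{y_1, y_2, \ldots, y_N\}$, the contrast function becomes

$$D_{CH}(Y_t, y_{mt}, \alpha) = \log \frac{\sum_{t=1}^{T} f^2\big(p(W X_t)\big) \cdot \sum_{t=1}^{T} f^2\big(\prod_{m=1}^{N} p(w_m X_t)\big)}{\left[\sum_{t=1}^{T} f\big(p(W X_t)\big)\, f\big(\prod_{m=1}^{N} p(w_m X_t)\big)\right]^2}$$

For simplicity, let

$$V_1 = \sum_{t=1}^{T} f^2(Y_t), \qquad V_1' = \sum_{t=1}^{T} 2 f(Y_t) f'(Y_t) Y_t'$$

$$V_2 = \sum_{t=1}^{T} f^2(y_{mt}), \qquad V_2' = \sum_{t=1}^{T} 2 f(y_{mt}) f'(y_{mt}) y_{mt}'$$

$$V_3 = \sum_{t=1}^{T} f(Y_t) f(y_{mt}), \qquad V_3' = \sum_{t=1}^{T} f'(Y_t) f(y_{mt}) Y_t' + \sum_{t=1}^{T} f(Y_t) f'(y_{mt}) y_{mt}'$$

and the convex function is

$$f(t) = \frac{4}{1-\alpha^2}\left[\frac{1-\alpha}{2} + \frac{1+\alpha}{2}\,t - t^{\frac{1+\alpha}{2}}\right], \qquad f'(t) = \frac{2}{1-\alpha}\left[1 - t^{\frac{\alpha-1}{2}}\right]$$

Then, with

$$Y_t = p(W X_t) \quad \text{and} \quad y_{mt} = \prod_{m=1}^{M} p(w_m X_t),$$

$$Y_t' = \frac{\partial Y_t}{\partial w_{ml}} = -\frac{p(X_t)}{|\det(W)|^2} \cdot \frac{\partial \det(W)}{\partial w_{ml}} \cdot \operatorname{sign}\big(\det(W)\big), \qquad \text{where } \frac{\partial \det(W)}{\partial w_{ml}} = W_{ml},$$

i.e., the (m, l) cofactor of W, and

$$y_{mt}' = \frac{\partial y_{mt}}{\partial w_{ml}} = \left[\prod_{j=1,\, j \neq m}^{M} p(w_j X_t)\right] \cdot \frac{\partial p(w_m X_t)}{\partial (w_m X_t)} \cdot x_l,$$

where $x_l$ denotes the $l$th entry of $X_t$.

Thus, we rewrite the CCS-DIV as

$$D_{CH}(Y_t, y_{mt}, \alpha) = \log \frac{V_1 \cdot V_2}{[V_3]^2}$$

and its derivative becomes

$$\frac{\partial D_{CH}(Y_t, y_{mt}, \alpha)}{\partial w_{ml}} = \frac{V_3^2}{V_1 V_2} \cdot \frac{V_1' V_2 V_3^2 + V_1 V_2' V_3^2 - 2 V_1 V_2 V_3 V_3'}{V_3^4} = \frac{V_1' V_2 V_3 + V_1 V_2' V_3 - 2 V_1 V_2 V_3'}{V_1 V_2 V_3}$$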
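The sample form of the CCS-DIV and its derivative can be checked numerically with a short sketch. The functions below implement f, f', D = log(V1 V2 / V3^2) and the derivative expressed through V1', V2' and V3', taking per-sample density values and their derivatives Y_t' and y_mt' as inputs. The density values and derivatives are random placeholders chosen only to exercise the formulas, α = 0.5 is an arbitrary choice, and the function names are hypothetical. Because f(1) = 0, density values equal to 1 everywhere would make V3 vanish, so the placeholders avoid t = 1.

```python
import numpy as np


def convex_f(t, alpha):
    """f(t) = 4/(1-alpha^2) [(1-alpha)/2 + (1+alpha)/2 t - t^((1+alpha)/2)], alpha != +-1."""
    return 4.0 / (1.0 - alpha**2) * ((1.0 - alpha) / 2 + (1.0 + alpha) / 2 * t
                                     - t ** ((1.0 + alpha) / 2))


def convex_f_prime(t, alpha):
    """f'(t) = 2/(1-alpha) [1 - t^((alpha-1)/2)]."""
    return 2.0 / (1.0 - alpha) * (1.0 - t ** ((alpha - 1.0) / 2))


def ccs_div(joint, product, alpha):
    """Sample form D = log(V1 V2 / V3^2).

    joint:   values of p(W X_t), t = 1..T
    product: values of the product of marginal densities at the same samples
    """
    fj, fp = convex_f(joint, alpha), convex_f(product, alpha)
    v1, v2, v3 = np.sum(fj**2), np.sum(fp**2), np.sum(fj * fp)
    return np.log(v1 * v2 / v3**2)


def ccs_div_grad(joint, product, d_joint, d_product, alpha):
    """dD/dw_ml from V1', V2', V3'; d_joint = Y_t' and d_product = y_mt' per sample."""
    fj, fp = convex_f(joint, alpha), convex_f(product, alpha)
    gj, gp = convex_f_prime(joint, alpha), convex_f_prime(product, alpha)
    v1, v2, v3 = np.sum(fj**2), np.sum(fp**2), np.sum(fj * fp)
    v1p = np.sum(2 * fj * gj * d_joint)
    v2p = np.sum(2 * fp * gp * d_product)
    v3p = np.sum(gj * fp * d_joint) + np.sum(fj * gp * d_product)
    return (v1p * v2 * v3 + v1 * v2p * v3 - 2 * v1 * v2 * v3p) / (v1 * v2 * v3)


# Hypothetical density values and derivatives, only to exercise the formulas
ALPHA = 0.5
rng = np.random.default_rng(0)
p_joint = rng.uniform(0.1, 0.5, 50)
p_prod = rng.uniform(0.1, 0.5, 50)
d_joint = 0.1 * rng.uniform(-1.0, 1.0, 50)
d_prod = 0.1 * rng.uniform(-1.0, 1.0, 50)
grad = ccs_div_grad(p_joint, p_prod, d_joint, d_prod, ALPHA)
```

By the Cauchy–Schwarz inequality, V3^2 ≤ V1 V2, so D is non-negative. D is zero when the joint density equals the product of the marginals, which is the independence condition that the contrast drives towards. A central finite difference of `ccs_div` along (d_joint, d_prod) should match `grad`.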


BIBLIOGRAPHY

[1] P. Comon, C. Jutten (eds.), “Handbook of Blind Source Separation Independent
Component Analysis and Applications.” (Academic Press, Oxford, 2010).
[2] A. Cichocki, S.-I. Amari, Adaptive Blind Signal and Image Processing: Learning
Algorithms and Applications, John Wiley & Sons, Inc., 2002.
[3] M. S. Pedersen, J. Larsen, U. Kjems, and L. C. Parra, “A survey of convolutive
blind source separation methods,” in Springer Handbook of Speech Processing. New
York: Springer, 2007.
[4] A. Cichocki, R. Zdunek, S.-I. Amari, Nonnegative matrix and tensor
factorizations: applications to exploratory multi-way analysis and Blind Source
Separation, John Wiley & Sons, Inc., 2009.
[5] S. Boyd and L. Vandenberghe Convex Optimization, 2004: Cambridge Univ.
Press.
[6] C. E. Shannon "A mathematical theory of communication,” Bell Syst. Tech. J.,
vol. 27, pp. 379 –423, 1948.
[7] P. COMON, ``Independent Component Analysis, a new concept,'' Signal
Processing, Elsevier, 36(3):287--314, April 1994PDF Special issue on Higher-Order
Statistics.
[8] M. Zibulevsky, “Blind source separation with relative Newton method,” in
Proc. ICA 2003, 2003, pp. 897–902.
[9] M. Anderson,
T. Adali, and X.-L. Li, “Joint blind source separation
performance analysis,” IEEE Trans. Signal Process. vol. 60, no. 4, pp. 1672–1683, Apr.
2012.
[10] X.-L. Li and X.-D. Zhang, “Nonorthogonal joint diagonalization free of
degenerate solution,” IEEE Trans. Signal Process. , vol. 55, no. 51, pp. 1803–1814, 2007.
[11] V. Zarzoso and P. Comon, “Robust Independent Component Analysis by
Iterative Maximization of the Kurtosis Contrast with Algebraic Optimal Step Size,” IEEE
Transactions on Neural Networks, vol. 21, no. 2, pp. 248–261, 2010.
[12] E. Oja and Z. Yuan "The fastICA algorithm revisited: Convergence-analysis",
IEEE Trans. Neural Netw., vol. 17, no. 6, pp.1370 -1381 2006
[13] A.-J. van der Veen and A. Paulraj, “An analytical constant modulus algorithm,”
IEEE Trans. Signal Process., vol. 44, pp. 1136– 1155, 1996.

247

[14] Hyvarinen. A, "Fast and robust fixed-point algorithm for independent component
analysis". IEEE Transactions on Neural Network, vol. 10, no. 3, pp. 626–634, May 1999.
[15] Hyvarinen. A. E.Oja, "A fast fixed-point algorithm for independent component
analysis,” Neural Computation, vol. 9, no. 7, pp. 1483–1492, 1997.
[16] F. Cardoso. On the performance of orthogonal source separation algorithms. In
Proc. EUSIPCO, pages 776–779, 1994a.
[17] Jean-François Cardoso, “High-order contrasts for independent component
analysis,” Neural Computation, vol. 11, no 1, pp. 157–192, Jan. 1999.
[18] A. Bell, T. J. Sejnowski, “An information-maximization approach to blind
separation and blind deconvolution”, Neural Computation, 7:1129-1159, 1995.
[19] L, Xianhua; Cardoso, J-Francois; Randall, Robert B. " Very fast blind source
separation by signal to noise ratio based stopping threshold for the SHIBBS/SJAD
algorithm", Mechanical Systems and Signal Processing, Volume 24, Issue 7, p. 20962103. 2010.
[20] Jen-Tzung Chien, Hsin-Lung Hsieh, "Convex Divergence ICA for Blind Source
Separation,” Audio, Speech, and Language Processing, IEEE Transactions on, On
page(s): 302–313 Volume: 20, Issue: 1, Jan. 2012
[21] D. Xu, J. C. Principe, J. Fisher III and H.-C. Wu "A novel measure for
independent component analysis (ICA),” Proc. Int. Conf. Acoust., Speech, Signal
Process., vol. 2, pp. 1161–1164, 1998
[22] R. Boscolo, H. Pan and V. P. Roychowdhury "Independent component analysis
based on nonparametric density estimation,” IEEE Trans. Neural Netw., vol. 15, no. 1,
pp. 55–65, 2004.
[23] Y. Chen "Blind separation using convex function,” IEEE Trans. Signal Process.,
vol. 53, no. 6, pp. 2027–2035, 2005.
[24] J.-T. Chien and B.-C. Chen "A new independent component analysis for speech
recognition and separation,” IEEE Trans. Audio, Speech Lang. Process., vol. 14, no. 4,
pp. 1245–1254, 2006
[25] Y. Matsuyama, N. Katsumata, Y. Suzuki and S. Imahara "The $\alpha$-ICA
algorithm,” Proc. Int. Workshop Ind. Compon. Anal. Blind Signal Separat., pp. 297 –302,
2000
[26] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's Divergences for Non-Negative
Matrix Factorization: Family of New Algorithms," in Proc. 6th Int. Conf. Independent
Component Analysis and Blind Signal Separation, Charleston, SC, USA,
Mar. 5–8, 2006, Springer LNCS 3889, pp. 32–39.

[27] E. Moulines, J.-F. Cardoso, and E. Gassiat, "Maximum likelihood for blind
separation and deconvolution of noisy signals using mixture models," in Proc. Int. Conf.
Acoust., Speech, Signal Process., Apr. 1997, vol. 5, pp. 3617–3620.
[28] D. T. Pham and P. Garat, "Blind separation of mixture of independent sources
through a quasi-maximum likelihood approach," IEEE Trans. Signal Process., vol. 45,
no. 7, pp. 1712–1725, Jul. 1997.
[29] B. A. Pearlmutter and L. C. Parra, “Maximum likelihood blind source separation:
A context-sensitive generalization of ICA,” Adv. Neural Inf. Process. Syst., pp. 613–619,
Dec. 1996.
[30] H. Fujisawa and S. Eguchi, "Robust parameter estimation with a small
bias against heavy contamination," J. Multivariate Anal., vol. 99, pp. 2053–2081, 2008.
[31] J. Zhang, "Divergence function, duality, and convex analysis," Neural Comput.,
vol. 16, pp. 159–195, 2004.
[32] J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Inf.
Theory, vol. 37, no. 1, pp. 145–151, 1991.
[33] S. C. Douglas and X. Sun, "Convolutive blind separation of speech mixtures using the
natural gradient," Speech Commun., vol. 39, pp. 65–78, 2002.
[34] T. Yoshioka, T. Nakatani, M. Miyoshi, and H. G. Okuno, "Blind separation and
dereverberation of speech mixtures by joint optimization," IEEE Trans. Audio, Speech,
Lang. Process., vol. 19, no. 1, p. 69, 2011.
[35] A. Cichocki, R. Zdunek, S. Amari, G. Hori and K. Umeno, “Blind Signal
Separation Method and System Using Modular and Hierarchical-Multilayer Processing
for Blind Multidimensional Decomposition, Identification, Separation or Extraction,”
Patent pending, No. 2006-124167, RIKEN, Japan, March 2006.
[36] I. Lee, T. Kim, and T.-W. Lee, "Independent vector analysis for
convolutive blind speech separation," in Blind Speech Separation. Springer, Sept.
2007.
[37] H. K. Solvang, Y. Nagahara, S. Araki, H. Sawada, and S. Makino, "Frequency-domain
Pearson distribution approach for independent component analysis (FD-Pearson-ICA) in
blind source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4,
p. 639, 2009.
[38] H. Sawada, S. Araki, and S. Makino, "Frequency-domain blind source separation,"
in Blind Speech Separation. Springer, Sept. 2007.

[39] D. Schobben, K. Torkkola, and P. Smaragdis, “Evaluation of blind signal
separation methods,” in Proc. Int. Workshop Ind. Compon. Anal. Blind Signal Separation
1999, pp. 261–266.
[40] H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano, "Blind source
separation based on a fast-convergence algorithm combining ICA and beamforming,"
IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 2, p. 666, 2006.
[41] S. Y. Low, S. Nordholm, and R. Togneri, "Convolutive blind signal separation with
post-processing," IEEE Trans. Speech Audio Process., vol. 12, no. 5, p. 539, 2004.
[42] Y. Takahashi, H. Saruwatari, and K. Shikano, "Real-time implementation of blind
spatial subtraction array for hands-free robot spoken dialogue system," in Proc.
IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS 2008), pp. 1687–1692, 2008.
[43] Y. Na and J. Yu, "Kernel and spectral methods for solving the permutation
problem in frequency domain BSS," in Proc. 2012 Int. Joint Conf. Neural Networks
(IJCNN), pp. 1–8, 2012.
[44] A. Chen, “Fast kernel density independent component analysis,” in Proc. 6th Int.
Conf. ICA BSS, 2006, vol. 3889, pp. 24–31.
[45] F. Nesta, P. Svaizer, and M. Omologo, "Convolutive BSS of short mixtures by ICA
recursively regularized across frequencies," IEEE Trans. Audio, Speech, Lang. Process.,
vol. 19, no. 3, pp. 624–639, 2011.
[46] M. Triki and D. T. M. Slock, "Iterated delay and predict equalization for blind
speech dereverberation," in Proc. Int. Workshop Acoust. Echo, Noise Contr., 2006.
[47] S. C. Douglas and M. Gupta, "Scaled natural gradient algorithms for
instantaneous and convolutive blind source separation," in Proc. ICASSP, vol. II,
pp. 637–640, 2007.
[48] Y. S. Choi, H. C. Shin, and W. J. Song, "Robust regularization for normalized
LMS algorithms," IEEE Trans. Circuits Syst., vol. 53, no. 8, pp. 627–631, 2006.
[49] T. S. Wada and B.-H. Juang, "Acoustic echo cancellation based on independent
component analysis and integrated residual echo enhancement," in Proc. WASPAA,
pp. 205–208, 2009.
[50] H. Sawada, S. Araki, R. Mukai, and S. Makino, "Blind extraction of dominant
target sources using ICA and time–frequency masking," IEEE Trans. Audio, Speech,
Lang. Process., vol. 14, no. 6, pp. 2165–2173, 2006.

[51] P. Bofill and M. Zibulevsky, "Underdetermined blind source separation using
sparse representations," Signal Process., vol. 81, no. 11, pp. 2353–2362, 2001.
[52] P. Georgiev, F. Theis, and A. Cichocki, "Sparse component analysis and blind
source separation of underdetermined mixtures," IEEE Trans. Neural Netw., vol. 16,
no. 4, pp. 992–996, 2005.
[53] S. Araki, S. Makino, T. Nishikawa, and H. Saruwatari, "Fundamental limitation
of frequency domain blind source separation for convolutive mixture of speech," in Proc.
ICASSP, pp. 2737–2740, 2001.
[54] S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, and H. Saruwatari,
"Equivalence between frequency domain blind source separation and frequency domain
adaptive beamforming for convolutive mixtures," EURASIP J. Appl. Signal Process.,
vol. 2003, pp. 1157–1166, 2003.
[55] E. Robledo-Arnuncio, H. Sawada, and S. Makino, "Frequency domain blind
source separation of a reduced amount of data using frequency normalization," in Proc.
ICASSP, vol. 5, pp. 837–840, 2006.
[56] J. Reiss, N. Mitianoudis, and M. Sandler, "Computation of generalized mutual
information from multichannel audio data," in Proc. 110th Audio Engineering Society
Int. Conf., Amsterdam, Netherlands, 2001.
[57] N. Murata, S. Ikeda, and A. Ziehe, "An approach to blind source separation based
on temporal structure of speech signals," Neurocomputing, vol. 41, no. 1–4,
pp. 1–24, 2001.
[58] L. De Lathauwer and A. de Baynast, "Blind deconvolution of DS-CDMA signals
by means of decomposition in rank-(1,L,L) terms," IEEE Trans. Signal Process., vol.
56, no. 4, pp. 1562–1571, 2008.
[59] D. Nion and L. De Lathauwer, "A block component model-based blind DS-CDMA
receiver," IEEE Trans. Signal Process., vol. 56, pp. 5567–5579, 2008.
[60] D. Nion, K. N. Mokios, N. D. Sidiropoulos, and A. Potamianos, "Batch and
adaptive PARAFAC-based blind separation of convolutive speech mixtures," IEEE
Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1193–1207, 2010.
[61] K. Rahbar and J.-P. Reilly, "A frequency domain method for blind source
separation of convolutive audio mixtures," IEEE Trans. Speech Audio Process., vol. 13,
no. 5, pp. 832–844, 2005. [Online]. Available:
http://www.ece.mcmaster.ca/~reilly/kamran/id18.htm
[62] L. Parra and C. Spence, "Convolutive blind separation of non-stationary
sources," IEEE Trans. Speech Audio Process., vol. 8, no. 3, pp. 320–327, 2000.
[Online]. Available: http://ida.first.fhg.de/~harmeli/download/download_convbss.html

[63] Dinh-Tuan Pham, Zaher El-Chami, Alexandre Guerin, and Christine Serviere, "Modelling
the short time Fourier transform ratio and application to underdetermined audio source
separation," in Tulay Adali, Christian Jutten, João Marcos Travassos Romano, and Allan
Kardec Barros, editors, Independent Component Analysis and Signal Separation,
pp. 98–105. Springer, 2009.
[64] C. Servière and D.-T. Pham, "Permutation correction in the frequency
domain in blind separation of speech mixtures," EURASIP J. Appl. Signal Process.,
no. 1, pp. 1–16, 2006.
[65] N. Mitianoudis and M. Davies, "Audio source separation of convolutive
mixtures," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 489–497, 2003.
[66] A. Westner and J. V. M. Bove, "Blind separation of real world audio signals using
overdetermined mixtures," in Proc. ICA'99, 1999. [Online]. Available:
http://sound.media.mit.edu/ica-bench
[67] D.-T. Pham, C. Servière, and H. Boumaraf, "Blind separation of
convolutive audio mixtures using nonstationarity," in Proc. Int. Workshop Indep. Compon.
Anal. Blind Signal Separation (ICA'03), pp. 981–986, 2003.
[68] L. Xianhua, "Blind Source Separation methods and their mechanical
applications," PhD thesis, University of New South Wales, Aug. 2006.
[69] H. Sawada, R. Mukai, S. Araki, and S. Makino, "A robust and precise method for
solving the permutation problem of frequency-domain blind source separation," IEEE
Trans. Speech Audio Process., vol. 12, no. 5, pp. 530–538, 2004.
[70] H. Sawada, S. Araki, and S. Makino, "MLSP 2007 data analysis competition:
Frequency-domain blind source separation for convolutive mixtures of speech/audio
signals," in Proc. MLSP'07, pp. 45–50, 2007.
[71] N. Mitianoudis, “Audio Source Separation using Independent Component
Analysis”, PhD thesis, University of London, April, 2004.
[72] F. Nesta, “Techniques for robust source separation and localization in adverse
environment”, PhD thesis, University of Trento, April, 2010.
[73] L.-W. Chan and S.-M. Cha, "Selection of independent factor model in
finance," in Proc. Int. Workshop on Independent Component Analysis and Blind
Source Separation (ICA), San Diego, USA, 2001.
[74] X. D. Wang and H. V. Poor, Wireless Communication Systems: Advanced
Techniques for Signal Reception. Prentice Hall, 2004.
[75] K. Waheed and F. Salem, “Blind information-theoretic multiuser detection
algorithms for DS-CDMA and WCDMA downlink systems,” IEEE Trans. Neural Netw.,
vol. 16, no. 4, pp. 937–948, Jul. 2005.
[76] L. Hu, X. Zhou, L. Zhang, "Blind Multiuser Detection Based on Tikhonov
Regularization", IEEE Communications Letters, vol. 15, no. 5, May 2011.
[77] G. T. Raja and O. Reddy, "Improved ICA based multi-user detection of DS-CDMA,"
in Proc. ICETET, 2008, pp. 238–241.
[78] X. G. Doukopoulos and G. V. Moustakides, "Adaptive power techniques for blind
channel estimation in CDMA systems," IEEE Trans. Signal Process., vol. 53, no. 3,
pp. 1110–1120, 2005.
[79] S. Cui, M. Kisialiou, Z.-Q. Luo, and Z. Ding, "Robust blind multiuser detection
against signature waveform mismatch based on second-order cone programming," IEEE
Trans. Commun., vol. 4, no. 4, pp. 1285–1291, 2005.
[80] A. N. Tikhonov and V. I. Arsenin, Solutions of Ill-Posed Problems. Washington/New
York: Winston, distributed by Halsted Press, 1977.
[81] G. B. G. Zhang and L. Zhang, "Group-blind intersymbol multiuser detection for
downlink CDMA with multipath," IEEE Trans. Wireless Commun., vol. 4,
no. 2, pp. 434–443, 2005.
[82] R. C. de Lamare, M. Haardt, and R. Sampaio-Neto, "Blind adaptive constrained
reduced-rank parameter estimation based on constant modulus design for CDMA
interference suppression," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2470–2482,
2008.
[83] Z. Albataineh and F. Salem, "Robust blind multiuser detection DS-CDMA algorithm
using simplified fourth order cumulant matrices," in Proc. IEEE Int. Symp. Circuits and
Systems (ISCAS), pp. 1946–1949, 2013.
[84] Z. Albataineh and F. Salem, "New blind multiuser detection DS-CDMA algorithm
based on Extension of Efficient FAST Independent Component Analysis (EF-ICA)," in Proc.
4th Int. Conf. Intelligent Systems Modelling & Simulation (ISMS), pp. 543–548, 2013.
[85] W. N. Yuan, Y. F. Tu, and P. Z. Fan, "Optimal training sequences for cyclic-prefix-based
single-carrier multi-antenna systems with space-time block-coding," IEEE
Trans. Wireless Commun., vol. 7, no. 11, pp. 4047–4050, 2008.
[86] N. D. Sidiropoulos, G. B. Giannakis, and R. Bro, "Blind PARAFAC receivers for
DS-CDMA systems," IEEE Trans. Signal Process., vol. 48, pp. 810–823, 2000.
[87] X. Wang and H. V. Poor, "Blind multiuser detection: A subspace approach,"
IEEE Trans. Inf. Theory, vol. 44, pp. 677–690, 1998.
[88] T. Huovinen, “Independent Component Analysis in DS-CDMA Multiuser
Detection and Interference Cancellation”, PhD thesis, Tampere University of
Technology, 2008.
[89] X. Wang and A. Host-Madsen, "Group-blind multiuser detection for uplink
CDMA," IEEE J. Sel. Areas Commun., vol. 17, no. 11, pp. 1971–1984, 1999.
[90] P.-Y. Qiu, Z.-T. Huang, W.-L. Jiang, and C. Zhang, "Improved blind spreading
sequence estimation algorithm for the direct sequence spread spectrum signals," IET
Signal Process., vol. 2, no. 2, pp. 139–146, 2008.
[91] X. X. Zhang and T. S. Qiu, "Blind multiuser detection based on improved
Infomax and Fast ICA," in Proc. 2nd Int. Conf. Advanced Computer
Control, Shenyang, 2010, pp. 476–479.
[92] Y. Washizawa, Y. Yamashita, T. Tanaka, and A. Cichocki, "Blind extraction of
global signal from multi-channel noisy observations," IEEE Trans. Neural Netw., vol. 21,
no. 9, pp. 1472–1481, 2010.
[93] M. K. Tsatsanis and Z. Xu, "Performance analysis of minimum variance CDMA
receivers," IEEE Trans. Signal Process., vol. 46, no. 11, pp. 3014–3022, 1998.
[94] A. Bayati, S. Prakriya and S. Prasad, “Semi-blind space-time receiver for
multiuser detection of DS/CDMA signals in multipath channels”, The Institution of
Engineering and Technology, vol. 153, no. 3, 2006.
[95] F. M. Salam and G. Erten, "The state space framework for blind dynamic signal
extraction and recovery," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'99),
vol. 5, pp. 66–69, 1999.
[96] T. Koivisto and V. Koivunen, "Blind despreading of short-code DS-CDMA signals in
asynchronous multi-user systems," Signal Process., vol. 87, no. 11, pp. 2560–2568, 2007.
[97] E. Ollila, "The deflation-based fastICA estimator: Statistical analysis revisited,"
IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1527–1541, 2010.
[98] G. Zhou, Z. Yang, S. Xie, and J. M. Yang, "Online blind source separation using
incremental nonnegative matrix factorization with volume constraint," IEEE Trans.
Neural Netw., vol. 22, no. 4, pp. 550–560, 2011.
[99] H. Du, H. Qi, and X. Wang, "Comparative study of VLSI solutions to
independent component analysis," IEEE Trans. Ind. Electron., vol. 54, no. 1,
pp. 548–558, 2007.
[100] K. S. Cho and S. Y. Lee, "Implementation of InfoMax ICA algorithm with analog
CMOS circuits," in Proc. Int. Workshop Independent Compon. Anal. Blind Signal
Separat., pp. 70–73, 2001.
[101] M. H. Cohen and A. G. Andreou, "Analog CMOS integration and
experimentation with an autoadaptive independent component analyzer," IEEE Trans.
Circuits Syst. II, Analog Digit. Signal Process., vol. 42, no. 2, pp. 65–77, 1995.

[102] A. Celik, M. Stanacevic, and G. Cauwenberghs, "Mixed-signal real-time adaptive
blind source separation," in Proc. IEEE Int. Symp. Circuits Syst., pp. 760–763, 2004.
[103] C. M. Kim, H. M. Park, T. Kim, Y. K. Choi, and S. Y. Lee, "FPGA
implementation of ICA algorithm for blind signal separation and adaptive noise
canceling," IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1038–1046, 2003.
[104] H. Du and H. Qi, "A reconfigurable FPGA system for parallel independent
component analysis," EURASIP J. Embedded Syst., vol. 2006, no. 23025, pp. 1–12,
2006.
[105] H. Du, H. Qi, and G. D. Peterson, "Parallel ICA and its hardware implementation
in hyperspectral image analysis," Proc. SPIE, vol. 5439, pp. 74–83, 2004.
[106] C. Charoensak and F. Sattar, "A single-chip FPGA design for real-time ICA-based
blind source separation algorithm," in Proc. IEEE Int. Symp. Circuits Syst., vol. 6,
pp. 5822–5825, 2005.
[107] K. K. Shyu, M. H. Lee, Y. T. Wu, and P. L. Lee, "Implementation of pipelined
fastICA on FPGA for real-time blind source separation," IEEE Trans. Neural Netw.,
vol. 19, no. 6, pp. 958–970, 2008.
[108] M. Kim, K. Ichige, and H. Arai, "Design of Jacobi EVD processor based on
CORDIC for DOA estimation with MUSIC algorithm," in Proc. PIMRC Conf., vol. 1,
pp. 120–124, 2002.
[109] A. Nordin, C. Hsu, and H. Szu, "Design of FPGA ICA for hyperspectral imaging
processing," Proc. SPIE, Wavelet Appl. VIII, vol. 4391, pp. 444–454, 2001.
[110] Y.-H. Shih, T.-J. Chen, C.-H. Yang, and H. Chiueh, "Hardware-efficient EVD processor
architecture in FastICA for epileptic seizure detection," in Proc. Signal & Information
Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific,
pp. 1–4, Dec. 3–6, 2012.
[111] W.-Y. Shih, K.-J. Huang, C.-K. Chen, W.-C. Fang, G. Cauwenberghs, and
T.-P. Jung, "An effective chip implementation of a real-time eight-channel EEG
signal processor based on on-line recursive ICA algorithm," in Proc. 2012 IEEE
Biomedical Circuits and Systems Conference (BioCAS), pp. 192–195.
[112] Lan-Da Van, Di-You Wu, and Chien-Shiun Chen, "Energy-Efficient FastICA
Implementation for Biomedical Signal Separation," IEEE Trans. Neural Networks,
vol.22, no.11, pp.1809-1822, Nov. 2011.
[113] F. Sattar and C. Charayaphan, "Low-cost design and implementation of an ICA-based
blind source separation algorithm," in Proc. 15th Annu. IEEE Int. ASIC/SOC Conf.,
pp. 15–19, 2002.

[114] M. T. Akhtar, T.-P. Jung, S. Makeig, and G. Cauwenberghs, "Recursive independent
component analysis for online blind source separation," in Proc. IEEE Int. Symp. on
Circuits and Systems, May 20–23, 2012.
[115] J. J. Rodriguez-Andina, M. J. Moure, and M. D. Valdes, "Features, design tools, and
application domains of FPGAs," IEEE Trans. Ind. Electron., vol. 54, no. 4,
pp. 1810–1823, Aug. 2007.
[116] O. Lopez, J. Alvarez, J. Doval-Gandoy, F. D. Freijedo, A. Nogueiras, A. Lago, and
C. M. Penalver, "Comparison of the FPGA implementation of two multilevel space
vector PWM algorithms," IEEE Trans. Ind. Electron., vol. 55, no. 4, pp. 1537–1547,
Apr. 2008.
[117] A. Jourjine, S. Rickard, and O. Yilmaz, "Blind separation of disjoint orthogonal
signals: Demixing N sources from 2 mixtures," in Proc. ICASSP'00, pp. 2985–2988,
Istanbul, Turkey, 2000.
[118] R. Everson and S. Roberts, "Blind source separation for non-stationary mixing,"
J. VLSI Signal Process., vol. 26, no. 1–2, pp. 15–23, 2000.
[119] S. M. Naqvi, Y. Zhang, and J. A. Chambers, "Multimodal blind source separation
for moving sources," in Proc. Int. Conf. Acoust., Speech Signal Process., pp. 125–128,
2009.
[120] J.-T. Chien and H.-L. Hsieh, "Nonstationary source separation using sequential
and variational Bayesian learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 24,
no. 5, pp. 681–694, 2013.
[121] D. J. C. MacKay “Probable networks and plausible predictions—a review of
practical Bayesian methods for supervised neural networks," Netw., Comput. Neural
Syst., vol. 6, no. 3, pp. 469–505, 1995.
[122] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," J.
Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
[123] R. A. Choudrey and S. J. Roberts, "Bayesian ICA with hidden Markov sources,"
in Proc. Int. Workshop Independ. Compon. Anal. Blind Signal Separat., pp. 809–814, 2003.
[124] Q. Huang, J. Yang, and Y. Zhou, "Bayesian nonstationary source separation,"
Neurocomputing, vol. 71, no. 7–9, pp. 1714–1729, 2008.
[125] Z. Koldovsky, J. Malek, P. Tichavsky, Y. Deville, and S. Hosseini, "Blind
separation of piecewise stationary non-Gaussian sources," Signal Process., vol. 89,
no. 12, pp. 2570–2584, 2009.
[126] E. Vincent, C. Fevotte, and R. Gribonval, "Performance measurement in blind
audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4,
pp. 1462–1469, 2006.

[127] E. Vincent, S. Araki, and P. Bofill, "The 2008 signal separation evaluation
campaign: A community-based approach to large-scale evaluation," in Proc. 8th Int.
Conf. Independent Component Analysis and Signal Separation (ICA '09), Berlin,
Heidelberg: Springer-Verlag, 2009, pp. 734–741.
[128] F. Nesta, P. Svaizer, and M. Omologo, "Robust two-channel TDOA estimation for
multiple speaker localization by using recursive ICA and a state coherence transform,"
in Proc. ICASSP, Taipei, Taiwan, 2009.
[129] K. Matsuoka and S. Nakashima, "Minimal distortion principle for blind source
separation," in Proc. Int. Symp. ICA and Blind Signal Separation, San Diego, CA, USA,
Dec. 2001.
[130] F. Nesta, P. Svaizer, and M. Omologo, "Convolutive BSS of short mixtures by
ICA recursively regularized across frequencies," IEEE Trans. Audio, Speech, Lang.
Process., vol. 19, no. 3, pp. 624–639, 2011. [Online]. Available:
http://bssnesta.webatu.com/testhscma.html
[131] H. Buchner, R. Aichner, and W. Kellermann, "TRINICON: A versatile framework
for multichannel blind signal processing," in Proc. IEEE Int. Conf. Acoustics, Speech
and Signal Processing, vol. 3, pp. 889–892, Montreal, Canada, May 17–21, 2004.
[132] R. Aichner, H. Buchner, F. Yan, and W. Kellermann, "A real-time
blind source separation scheme and its application to reverberant and noisy acoustic
environments," Signal Process., vol. 86, no. 6, pp. 1260–1277, 2006.
[133] Z. Koldovsky and P. Tichavsky, "Time-domain blind audio source separation
using advanced component clustering and reconstruction," in Proc. HSCMA,
Trento, Italy, May 2008.
[134] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery,
Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge, UK:
Cambridge University Press, 1992.
[135] K. Waheed, "Blind Source Recovery: Theoretical Formulations and Application
to CDMA Communication Systems," PhD thesis, Michigan State University, 2003.
[136] Gharbi, “Blind Source of Unknown Sources in dynamic Environments:
Theoretical Formulation and Micro-Electronic Implementation”, PhD thesis, Michigan
State University, 1996.
[137] T. Serrano-Gotarredona and B. Linares-Barranco, "CMOS transistor mismatch
model valid from weak to strong inversion," in Proc. Conf. Eur. Solid-State Circuits,
pp. 627–630, 2003.
[138] Silicon Moore's Law, Intel Corporation, 2004. [Online]. Available:
http://www.intel.com/research/silicon/mooreslaw.htm
[139] O. L. Frost III, "An algorithm for linearly constrained adaptive array processing,"
Proc. IEEE, vol. 60, no. 8, pp. 926–935, Aug. 1972.
[140] Z. Albataineh and F. Salem, "Blind multiuser detection DS-CDMA algorithm using
H-DE and ICA algorithms," in Proc. 4th Int. Conf. Intelligent Systems
Modelling & Simulation (ISMS), pp. 569–574, 2013.
[141] http://www.bloomberg.com/slideshow/2013-09-19/countries-with-the-most-4gmobile-users.html#slide1
[142] http://gigaom.com/2013/09/20/mapping-out-the-worlds-lte-coverage-its-in-fewerplaces-than-you-think/
