This is to certify that the dissertation entitled

BLIND SEPARATION OF UNKNOWN SOURCES IN DYNAMIC ENVIRONMENTS: THEORETICAL FORMULATION AND MICRO-ELECTRONIC IMPLEMENTATION

presented by Ammar B. Gharbi has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering.

Blind Separation of Unknown Sources in Dynamic Environments: Theoretical Formulation and Micro-Electronic Implementation

By

Ammar B. Gharbi

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1996

ABSTRACT

Blind Separation of Unknown Sources in Dynamic Environments: Theoretical Formulation and Micro-Electronic Implementation

By Ammar B. Gharbi

Novel learning algorithms for the problem of blind separation of signals in static and dynamic environments are developed. The derived learning rules of these algorithms are based on the minimization of mutual information functionals. A novel learning rule is initially derived from the decorrelation of the output signals. This rule is then enhanced by including higher order terms to test for the independence of the signals. Optimization theory is utilized to derive a general framework for developing an update rule for the parameters of a linear dynamical network model. Higher order statistics are also explored to develop an approximate expression of the mutual information, which depends on the unknown probability density functions of the output signals.

The modeling of the environment is considered an important factor in the development of the update law. Keeping the analog implementation of the algorithms in mind, state space realizations of the network which minimize the number of parameters are considered. Such a choice of realization results in a reduction of the complexity of the network and of the corresponding circuit blocks.

Computer simulations are conducted to evaluate the performance of the developed algorithms. A circuit implementation of one of the developed algorithms for the dynamic case is described, and its performance is verified using the PSPICE circuit simulator.

In summary, the main contributions of this thesis are the development of:

1. a novel update law for the static environment case based on the decorrelation of the output signals and its invariants;
2. a novel energy function that characterizes the statistical independence of signals using higher order statistics for both the feedforward and the feedback structures;
3. a novel framework to derive the update laws for the parameters of a dynamic network using optimization theory and the calculus of variations; and
4. a circuit realization of one of the developed algorithms for a dynamic feedback network.

Copyright by Ammar B. Gharbi, 1996

To my wife Nancy, my parents and my family

ACKNOWLEDGEMENTS

I would like to express my sincere thanks and appreciation to Professor Fathi Salam for his guidance, support, and inspiration throughout the course of this research.
His valuable advice, kind consideration, and understanding have been an incentive for the completion of this dissertation.

Sincere gratitude is extended to the members of my guidance committee, Professor Hassan Khalil, Professor Don Reinhart and Professor Charles MacCluer, for their valuable assistance and service on my committee.

I would like to express my sincere thanks and gratitude to my wife Nancy for her patience, love, affection, support, encouragement and sacrifice during every step that I made towards the completion of this research.

Last, but not least, I would like to express my sincere thanks and appreciation to my parents and every member of my family for their patience, support and sacrifice while awaiting this day.

TABLE OF CONTENTS

LIST OF TABLES ..... x
LIST OF FIGURES ..... xi

1 Introduction ..... 1

2 Literature Review ..... 5
  2.1 Neuromimetic Algorithm ..... 8
    2.1.1 Derivation of the H-J Update Law ..... 10
    2.1.2 Gradient Descent Method ..... 11
    2.1.3 Evaluation of the Update Law ..... 13
    2.1.4 Computer Simulations ..... 14
  2.2 Mutual Information Approach ..... 24
  2.3 Information Theoretic Approach ..... 29
  2.4 Algebraic Approach ..... 37

3 Blind Separation in a Static Environment ..... 44
  3.1 Problem Definition and Architecture ..... 44
  3.2 Theoretical Solution ..... 45
  3.3 Energy Function ..... 46
  3.4 Update Law Derivation ..... 49
  3.5 Computer Simulations ..... 51
  3.6 First Improvement ..... 54
    3.6.1 Computer Simulations ..... 55
  3.7 Second Improvement ..... 64
    3.7.1 Computer Simulations ..... 65
  3.8 Observations and Remarks ..... 75

4 Higher Order Statistics ..... 76
  4.1 Mutual Information ..... 77
    4.1.1 Cumulants and Moments ..... 80
    4.1.2 Edgeworth Expansion ..... 81
    4.1.3 Entropy Approximation ..... 83
  4.2 Independence Criteria ..... 86
  4.3 Static Case: Feedforward Network Structure ..... 87
    4.3.1 Computer Simulations ..... 90
  4.4 Static Case: Feedback Network Structure ..... 94
    4.4.1 Computer Simulations ..... 97

5 Blind Separation in a Dynamic Environment: Feedforward Structure ..... 103
  5.1 Problem Definition ..... 103
  5.2 Existence of a Theoretical Solution ..... 104
  5.3 State Realization ..... 106
    5.3.1 Controllable/Observable Canonical Form ..... 107
    5.3.2 Features of the Realization ..... 111
  5.4 Adaptive Optimal Control Theory ..... 114
    5.4.1 Computer Implementation ..... 116
  5.5 Update Law Derivation ..... 117
    5.5.1 Nonlinear Dynamic Modeling ..... 117
    5.5.2 Linear Dynamical Case ..... 122
  5.6 Computer Simulations ..... 124
  5.7 Observations ..... 137

6 Blind Separation in a Dynamic Environment: Feedback Structure ..... 138
  6.1 Architecture ..... 138
  6.2 State Representation ..... 141
  6.3 Update Law Derivation ..... 142
  6.4 Computer Simulation Results ..... 143
    6.4.1 The Approach Based on Mutual Information ..... 144
    6.4.2 An Approach Based on the Neuromimetic Algorithm ..... 152
  6.5 Sequential Update ..... 169
  6.6 Time and Weight Scaling ..... 200
    6.6.1 Mathematical Development ..... 200
    6.6.2 Example ..... 208
  6.7 Observations ..... 210

7 Micro-electronic Implementation ..... 211
  7.1 CMOS Building Blocks ..... 211
    7.1.1 Transistor ..... 211
    7.1.2 Current Mirror ..... 212
    7.1.3 Transconductance Amplifier ..... 212
    7.1.4 Hyperbolic Sine Function ..... 214
    7.1.5 Gilbert Multiplier ..... 215
    7.1.6 Vector Multipliers ..... 216
    7.1.7 Current to Voltage Converter ..... 218
    7.1.8 Second Order Section ..... 219
    7.1.9 Second Order Filter Design ..... 222
    7.1.10 Circuit Implementation of Second Order Filter ..... 223
    7.1.11 Circuit Realization for n-dimensional Filter ..... 225
  7.2 Circuit Realization ..... 229
    7.2.1 Implementation of the output equation ..... 229
    7.2.2 Implementation of the state equation ..... 231
    7.2.3 Implementation of the C update equation ..... 231
    7.2.4 Implementation of the D update equation ..... 242
  7.3 Circuit Simulation ..... 244

8 Conclusion

A Definitions
  A.1 Statistical Definitions
  A.2 Matrix Notations

B Proofs
  B.1 Proof of Theorem 3.5
  B.2 CMOS Circuit Function Derivation

C Matlab
  C.1 Derivation of f_a
  C.2 Matlab Source Code

D PSPICE Files
  D.1 Circuit File
  D.2 Library File

BIBLIOGRAPHY

LIST OF TABLES

2.1 Information Theoretic Algorithm
3.1 Simulation results for the algorithm defined by equation (3.23), f(x) = x^3 and g(x) = x
3.2 Simulation results for the algorithm defined by equation (3.23), f(x) = \sinh x and g(x) = \tanh x
3.3 Simulation results for the algorithm defined by equation (3.27), f(x) = x^3 and g(x) = x
3.4 Simulation results for the algorithm defined by equation (3.27), f(x) = \sinh x and g(x) = \tanh x
3.5 Results of simulation comparison between the algorithms defined by equations (3.23) and (3.27)
4.1 Conversion of moments and cumulants
4.2 Hermite polynomials derived from a normal baseline density function \alpha(y)
5.1 Number of parameters for two realizations
5.2 Different nonlinearity functions and their derivatives
7.1 Parameters of the Environment Circuit Realization
7.2 Parameters of the Environment Circuit Realization

LIST OF FIGURES

2.1 Herault and Jutten Architecture ..... 9
2.2 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, y_i^3\, y_j ..... 18
2.3 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, y_i^3\, y_j. The solid curve is the unknown sources. The dotted curve, superimposed onto the solid curve, is the network output ..... 19
2.4 Performance of the H-J algorithm.
The parameters are updated according to \dot{d}_{ij} = \eta \sinh y_i \tanh y_j ..... 20
2.5 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta \sinh y_i \tanh y_j. The solid curve is the unknown sources. The dotted curve, superimposed onto the solid curve, is the network output ..... 21
2.6 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, q_{ii}\, y_i\, y_j ..... 22
2.7 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, q_{ii}\, y_i\, y_j. The solid curve is the unknown sources. The dotted curve, superimposed onto the solid curve, is the network output ..... 23
2.8 Feedforward Architecture ..... 24
2.9 Nonlinear functions ..... 27
2.10 Parameter convergence of the algorithm defined by equation (2.49) ..... 28
2.11 Bell and Sejnowski's Architecture ..... 30
2.12 Performance of the Bell and Sejnowski algorithm ..... 33
2.13 Performance of the Bell and Sejnowski algorithm using speech signals ..... 35
2.14 Performance of the Bell and Sejnowski algorithm using sine signals ..... 36
2.15 Cardoso's Architecture ..... 37
2.16 Cardoso algorithm for n = 2 ..... 39
2.17 Cardoso algorithm for n = 3 ..... 40
2.18 Continuous-time algorithm for n = 2 ..... 42
2.19 Continuous-time algorithm for n = 3 ..... 43
3.1 Static Environment Architecture ..... 45
3.2 Performance of the algorithm defined by equation (3.19) ..... 53
3.3 Performance of the algorithm defined by equation (3.23) for \eta(t) = 100, f(x) = x^3 and g(x) = x ..... 57
3.4 Performance of the algorithm defined by equation (3.23) for \eta(t) = 10, f(x) = x^3 and g(x) = x ..... 58
3.5 Performance of the algorithm defined by equation (3.23) for \eta(t) = 100 e^{-t}, f(x) = x^3 and g(x) = x ..... 59
3.6 Performance of the algorithm defined by equation (3.23) for \eta(t) = 100, f(x) = \sinh x and g(x) = \tanh x ..... 61
3.7 Performance of the algorithm defined by equation (3.23) for \eta(t) = 10, f(x) = \sinh x and g(x) = \tanh x ..... 62
3.8 Performance of the algorithm defined by equation (3.23) for \eta(t) = 100 e^{-t} ..... 63
3.9 Performance of the algorithm defined by equation (3.27) for \eta(t) = 100, f(x) = x^3 and g(x) = x ..... 67
3.10 Performance of the algorithm defined by equation (3.27) for \eta(t) = 10, f(x) = x^3 and g(x) = x ..... 68
3.11 Performance of the algorithm defined by equation (3.27) for \eta(t) = 10, f(x) = x^3 and g(x) = x ..... 69
3.12 Performance of the algorithm defined by equation (3.27) for \eta(t) = 100, f(x) = \sinh x and g(x) = \tanh x ..... 71
3.13 Performance of the algorithm defined by equation (3.27) for \eta(t) = 10, f(x) = \sinh x and g(x) = \tanh x ..... 72
3.14 Performance of the algorithm defined by equation (3.27) for \eta(t) = 100 e^{-t}, f(x) = \sinh x and g(x) = \tanh x ..... 73
4.1 Feedforward Architecture ..... 78
4.2 Nonlinear functions. Graphs (a)-(d) show the plots of the functions f_a(y) through f_d(y) ..... 91
4.3 Natural logarithmic function and a few orders of approximation ..... 91
4.4 Computer simulation of the feedforward structure using f_a ..... 93
4.5 Computer simulation of the feedforward structure using f_d ..... 95
4.6 Performance of the algorithm defined by the nonlinear function f_a .....
98 Performance of the Algorithm defined by the nonlinear function fd . . 99 Performance of the Algorithm defined by the nonlinear function fa using random Gaussian and a sine function ............... 100 Performance of the Algorithm defined by the nonlinear function fd using a random Gaussian and a sine function .............. 101 xii 4.10 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 5.13 6.1 6.2 6.3 6.4 6.5 .6.6 6.7 6.8 Output of the Algorithm defined by the nonlinear function fa using random Gaussian and a sine function .................. 102 Feedforward Architecture ......................... 103 (a) State Equation 5a,, = A;,-x,-,- + bgjuj, (b) Output Equation yij = egg-xi,- + (1,321, and (c) Output Equation y,- = ciTx, + dfu ........ 113 Bode plot of the environment model defined by H (s) ......... 125 Bode plot of the network model defined by H ‘(s) ........... 126 Dynamic Forward Structure. Update only the D matrix according to D = 7[D’T — fa(y)eT] using two sine waveforms, 7 = 0.005 and D0 = 0128 Dynamic Forward Structure. Update only the 0 matrix ac- cording to C = —17f.,(y)xT using two sine waveforms, r) = [1000 100 1 100 1 100 10] and Co = 0.86“ ................ 129 Dynamic Forward Structure. Update only the D matrix according to D = 7[D'T -— fd(y)eT] using two sine waveforms, 7 = 0.5 and D0 = O 130 Dynamic Forward Structure. Update both the C and D matrices ac- cording to D = 7[D"'T - fa(y)eT] and C = —17f,,(y)xT using two sine waveforms, 7 = 0.005, r) = [100 10 1 1000 5 2000 10]. All initial conditions are zero ............................ 131 Dynamic Forward Structure. Update only the D matrix according to D = -7fa(y)yT using two sine waveforms, 7 = 0.005 and D0 = 0 . . 132 Dynamic Forward Structure. Update only the D matrix according to D = —7f.1(y)yT using two sine waveforms, 7 = 0.5 and D0 = O . . . . 133 Dynamic Forward Structure. Test Simulation of the update law D = —7f., (y)yT at convergence using two sine waveforms .......... 134 Update of the D Matrix using D = -7y3y ............... 135 Update of the D Matrix using D = -7 sinhytanh y .......... 136 Feedback Processing Structure ...................... 141 Frequency response of the environment model ............. 145 Frequency response of the network model ................ 146 6,3 = mjfa(yg)$.‘j and 1);," = 10 ...................... 149 (3,3 = mjfa(y,')$,'j and 17;," = 101 ..................... 150 (3,3 = mjfa(y.-)z.-j and 17.3 = 10° ..................... 151 (5,3 = qgjfa(y,-)z,-j and flij = 10-2 ..................... 152 Update all the parameters of the 0 matrix according to a, = mjfa(y,-)z.-j, (012, ms, 1121, 1722) = (10, 0.1, 2, 0.1) and Co = 0.80 . . 153 xiii 6.9 Update all the parameters of the C matrix according to a, = 17,-jf0(y,-):c,-,-, (1)12, 1713, 1721, 1722) = (10, 0.1, 2, 0.1) and Co = 0.5C . . 6.10 Update all the parameters of the C matrix according to c.)- = mjfa(y.-)z,~,-, (1)12, 1113, 1121, 1722) = (10, 0.1, 2, 0.1) and Co = 0.1C . . 6.11 Update all the parameters of the C matrix according to (2;,- = mjfa(y,-):c,-,-, (7)12, 1713, 1721, 7122) = (10, 0.1, 2, 0.1) and .Co = 0 . . . . 6.12 Update all the parameters of the D matrix according to dij = 7 fa(y,-)y,-, 7 = 0.0005 and (due, duo) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(47rt) .................. 6.13 Update all the parameters of the D matrix according to d5,- = 7 fa (y;)y,-, 7 = 0.0005 and (d120, (1210) = (0.0, 1.0). The unknown sources are 31 (t) = sin(27rt) and 32(t) = sin(47rt) .................. 
6.14 Update all the parameters of the D matrix according to 3.3 = 7 fa(y,-)y,-, 7 = 0.0005 and (due, dglo) = (1.0, 0.0). The unknown sources are 31 (t) = sin(27rt) and 32(t) = sin(41rt) .................. 6.15 Update all the parameters of the D matrix according to d}; = 7 fa(y,~)y,-, 7 = 0.0005 and (d120, £1210) = (1.0, 1.0). The unknown sources are .91 (t) = sin(21rt) and 32(t) = sin(41rt) .................. 6.16 Update all the parameters of the D matrix according to d;,- = 7 fa(y,-)y,- and 7 = 0.0005 for different initial conditions. The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(47rt) ................ 6.17 Dynamic Feedback Structure. Update only the C matrix according to C = r;f.,¢(y)xT using two sine waveforms, 17 = [100 10 100 10] and Co = 0.8C”I ................................ 6.18 Dynamic Feedback Structure. Update only the D matrix according to D = 17f,¢(y)yT using two sine waveforms, 17 = 0.2 and D0 = 0 ..... 6.19 ng = 11.331535, and 17.3 = 2000 ....................... 6.20 égj=mjygxgj andmj=20 . . . . . . . . . . . . ~. ........... 6.21 Cg; = 17.331ng and 17,3 = 500 ....................... 6.22 6.3 = ngjygzgj and 17;]: = 10 ........................ i 6.23 Update all the parameters of the C matrix according to é.-,- = flijygitgj, (1712, 1713, rm, 1122) = (103,10,102,1) and Co = 0.8C .......... 6.24 Update all the parameters of the C matrix according to 6,-1- = ngjygrgj, (1712, 1713, 1721, rm) = (103,10,102,1) and Co = 0.5C .......... 6.25 Update all the parameters of the C matrix according to a, = flijyixij, (1712, 7113, 1121, 1722) = (103,1O,102,1) and Co = 0.1C .......... xiv 154 155 156 161 163 170 171 174 175 176 6.26 6.27 6.28 6.29 6.30 6.31 6.32 6.33 6.34 6.35 6.36 Update all the parameters of the C matrix according to ij = mJ-ygxgj, (7’12, 1113, 1721, rm) = (103,10,102,1) and Co = 0 ............ Update all the parameters of the C matrix according to a, = 17;jy,:c,-j, (1712, 1713, 1721, 1722) = (103,10,102,1) and Co = O. The unknown sources are 31(t) = sin(27rt) and 32(t) = square(47rt) ............. Update all the parameters of the C matrix according to (3,, = mjygxgj, (1)12, 1113, 1121, 1722) = (103,10,105,102) and Co = 0. The unknown sources are 31(t) = sawtooth(27rt) and 32(t) = square(47rt) . . . ' . . . Update all the parameters of the C matrix according to a, = 17,-,- sinh ygtanh 2:,-,-, (1712, 1713, 1721, 1722) = (103,10,102,10) and Co = 0. The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(47rt) . . . . Update all the parameters of the C matrix according to a, = flijyg3$ij, (1m, 1713, 1721, 1722) = (103,10,102,10) and Co = 0. The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(41rt) ............ Update all the parameters of the D matrix according to d}; = 7ygej, - 7 = 0.2 and (d120, (1210) = (0.0, 0.0). The unknown sources are 31(t) = sin(27rt) and 52(t) = sin(41rt) ...................... Update all the parameters of the D matrix according to d5,- = 7yf'yj, 7 = 0.2 and ((1120, duo) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(41rt) ...................... Update all the parameters of the D matrix according to d5,- = 7sinh ygtanh 3],, 7 = 0.2 and (d120, (1210) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(41rt) ............ Update all the parameters of the D matrix according to d},- = 7sinh y,tanh y,, 7 = 0.2 and ((1120, duo) = (0.0, 0.9). The unknown sources are 31(t) = sin(27rt) and 32(t) = sin(41rt) ............ 
Update all the parameters of the D matrix according to (iij = 7sinh ygtanh 3],, 7 = 0.2 and (d120, 61210) = (0.9, 0.0). The unknown sources are .91 (t) = sin(27rt) and 32(t) = sin(41rt) ............ Update all the parameters of the D matrix according to dij = 7sinh ygtanh 3],, 7 = 0.2 and (d120, (1210) = (0.9, 0.9). The unknown sources are .91 (t) = sin(21rt) and 32(t) = sin(41rt) ............ 6.37 Update all the parameters of the D matrix according to d},- = 7sinh ygtanh y,- and 7 = 0.2 for different initial conditions. The un- known sources are 31(t) = sin(27rt) and 32(t) = sin(47rt) ........ XV 177 179 180 181 184 185 186 187 188 6.38 Update all the parameters of the D matrix according to d},- = 7sinh 3(7ch 317, 7 = 0.2 and (d120, (1210) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = square(47rt) ......... 6.39 Update all the parameters of the D matrix according to d},- = 7sinh ygtanh y,-, 7 = 0.2 and (d120, (1210) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = sawtooth(41rt) ........ 6.40 Update all the parameters of the D matrix according to d},- = 7sinhygtanh 7],, 7 = 0.2 and (d120, (I210) = (0.0, 0.0). The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) ...... 6.41 Update only one parameter of each matrix: update the parameters on and d1; with 17 = 20, 7 = 0.2 and (c130, duo) = (0.843, (1:2). The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) 6.42 Update only one parameter of each matrix: update the parameters cm and du with 17 = 20, 7 = 0.2 and (c130, (1120) = (4'3, 0.0). The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(47rt) 6.43 Update only one parameter of each matrix: update the parameters on and d1; with 17 = 40, 7 = 0.2 and (c130, dlgo) = ((33, 0.0). The unknown sources are .91 (t) = sawtooth(21rt) and 32(t) = square(47rt) 6.44 Update only one parameter of each matrix: update the parameters on and (In with 17 = 40, 7 = 0.2 and (c130, (1120) = (c'f3, 0.0). The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) 6.45 Update all parameters of each matrix. Co = 0.8C" and D0 = D‘, 17 = [103 10 102 10] , 7 = 0.2 and T = 500 The unknown sources are . 31(t) = sawtooth(21rt) and 32(t) = square(41rt) ............ 6.46 Update all parameters of each matrix. Co = 080" and D0 = D‘, 17 = [103 10 104 102], 7 = 0.2 and T = 500. The unknown sources are 31(t) = sawtooth(27rt) and 32(t) = square(41rt) ............ 6.47 Update all parameters of each matrix. Co = 080" and D0 = D‘, 17 = [103 10 10‘ 102], 7 = 0.2 and T = 1000. The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) ............ 6.48 Update all parameters of each matrix. Co = 080" and D0 = 0, 17 = [103 10 10‘ 102], 7 = 0.2 and T = 1000. The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) ............ 6.49 Update all parameters of each matrix. Co = 0.80" and D0 = O, 17 = [1000 10 100 10], 7 = 0.2 and T = 10000. The unknown sources are 31(t) = sawtooth(27rt) and 32(t) = square(47rt) ............ xvi 191 192 193 194 195 6.50 6.51 6.52 6.53 6.54 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.10 7.11 7.12 7.13 7.14 Update all parameters of each matrix. Co = 0.80“ and D0 = 0, 17 = [1000 10 500 10], 7 = 0.2 and T = 10000. The unknown sources are 31 (t) = sawtooth(21rt) and 32(t) = square(41rt) ............ Update all parameters of each matrix. Co = 080" and D0 = O, 17 = [200 10 200 10], 7 = 0.2 and T = 10000. The unknown sources are 31 (t) = sawtooth(21rt) and 32(t) = square(47rt) ............ 
6.52 Unknown sources, environment output and network output at the initial time t = 0. C_0 = 0.8C^*, D_0 = 0, \eta = [200 10 200 10], \gamma = 0.2 and T = 10000. The unknown sources are s_1(t) = sawtooth(2\pi t) and s_2(t) = square(4\pi t)
6.53 Unknown sources, environment output and network output at the final time t = T. C_0 = 0.8C^*, D_0 = 0, \eta = [200 10 200 10], \gamma = 0.2 and T = 10000. The unknown sources are s_1(t) = sawtooth(2\pi t) and s_2(t) = square(4\pi t)
6.54 Test case: unknown sources, environment output and network output at the final time t = T. C_0 = 0.8C^*, D_0 = 0, \eta = [200 10 200 10], \gamma = 0.2 and T = 10000. The unknown sources are s_1(t) = sin(2\pi t) and s_2(t) = sawtooth(4\pi t)
7.1 Current mirror: (a) n-channel, (b) p-channel ..... 212
7.2 Transconductance design ..... 213
7.3 PSPICE simulation of a transconductance amplifier ..... 214
7.4 Circuit realization of the hyperbolic tangent and hyperbolic sine functions ..... 215
7.5 PSPICE simulation of the hyperbolic sine circuit for different biasing voltages ..... 216
7.6 1-dimensional Gilbert multiplier ..... 217
7.7 PSPICE simulation of the modified Gilbert multiplier using different biasing voltages ..... 217
7.8 n-dimensional Gilbert multiplier ..... 218
7.9 Current to voltage converter ..... 219
7.10 Second order section circuit ..... 220
7.11 PSPICE simulations of the second order section for bias voltages of 625, 650, 675, 700 and 725 mV ..... 222
7.12 Filter design. Circuit realization of the controllable canonical form of a second order filter ..... 224
7.13 Filter design. Circuit realization of the controllable canonical form of a second order filter ..... 226
7.14 PSPICE simulations of the second order filter ..... 226
7.15 Filter design. Circuit realization of the controllable canonical form of an n-th order filter ..... 227
7.16 Circuit realization of the output equation ..... 230
7.17 Circuit realization of the update equation of the C matrix parameters ..... 232
7.18 Circuit realization of the hyperbolic tangent and hyperbolic sine functions ..... 234
7.19 Block diagram of the overall circuit, including network and learning ..... 235
7.20 Unknown sources ..... 236
7.21 Environment circuit model ..... 236
7.22 Frequency response of the environment circuit realization ..... 238
7.23 Transient response of the environment circuit realization ..... 239
7.24 Parameters' convergence of the network circuit realization ..... 240
7.25 Output of the network circuit realization at convergence ..... 241

CHAPTER 1

Introduction

The blind separation of sources is a challenging signal processing problem with significant potential applications. The problem is informally described as follows: several unknown but independent temporal signals propagate through a mixing and/or filtering, natural or synthetic, environment. By observing the outputs of this environment, a network (e.g., a system, a neural network, or a device) is configured to counteract the effect of the environment and adaptively recover the original signals. For this processing, only the property of signal independence is assumed; no additional a priori knowledge of the original signals is required. This processing represents a form of self (or unsupervised) learning.
The weak assumptions and the self-learning capability render such a network attractive from the viewpoint of real-world applications where training is not an option. The blind separation approach has great advantages over existing adaptive filtering algorithms. For example, when the mixture of other signals is labeled as noise in this approach, no specific a priori knowledge about any of the signals is assumed; only that the processed signals are independent. This is in contrast with the noise cancelation method proposed by Widrow et al. [1], which requires that a reference signal be correlated exclusively with the part of the waveform (i.e., the noise) that needs to be filtered out. This latter requirement entails specific a priori knowledge about the noise as well as the signal(s).

The blind separation of sources is valuable in numerous major applications, in areas as diverse as telecommunication systems, sonar and radar systems, audio and acoustics, image/information processing, and biomedical engineering. Consider, e.g., the audio and sonar application where the original signals are sounds and the mixed signals are the outputs of several microphones or sensors placed at different vantage points. A network would receive, via each microphone, a mixture of sounds that are usually delayed relative to one another and/or to the original sounds. The network's role is then to dynamically reproduce the original signals, where each separated signal can be subsequently channeled for further processing or transmission. Similar application scenarios can be described in situations involving heart-rate measurements, communication in noisy environments, engine diagnostics, and uncorrupted cellular phone communications.

Adequate models of the environment that include time delays or filtering are very important when the sensors are viewed as part of the environment. In many applications where the sensors (e.g., microphones) include their own dynamics, the model of the environment should include time delays or filtering. Otherwise, assuming purely static models of the environment would result in highly sensitive and unrobust processing for any network. Indeed, in order to render the network operable in real-world applications, robust operation must be ensured in the face of parameter variations, dynamic influences and signal delays that often result in asynchronous signal propagation. Examples of such effects include filters in the case of an audio system, or delay lines and/or echoes as in the case of a radar system. The environment should therefore be modeled as a dynamic linear (or even nonlinear) system [2, 3].

The challenges for this area reside in the development of a mathematical analysis and framework for the problem of blind separation of sources. This is the angle from which this research has been approached: a review of the current literature has been conducted to study the various approaches to this problem, and a general framework to develop an update law for the network parameters, based on optimization theory, the calculus of variations and higher order statistics, has been proposed.

The thesis is divided into eight chapters. An introduction to this work is presented in Chapter 1; also included are the definition of the problem and some applications which illustrate the need for a solution. Chapter 2 presents an overview of the literature. The pioneering work of Herault and Jutten is presented first, followed by the work of Cardoso, Amari and Sejnowski and their coworkers.
This review presents the different approaches that have been attempted in finding a solution to this problem, including neuro-biologically based findings, algebraic methods and higher order statistics methods. The chapter summarizes a snapshot of the status of current approaches in the field. In Chapter 3, a solution to the problem in a static environment is developed based on the decorrelation condition of the output vector. However, decorrelation characterizes only second order statistics, which are not sufficient for solving this problem. Therefore, improved versions of the algorithm, built upon the works of Herault, Jutten and Amari, are developed, and computer simulations are given which validate the performance of these proposed algorithms. In Chapter 4, higher order statistics are introduced in order to develop an approximation of an energy function. The developed energy function approximation does not assume that the output signals have unit variance, and it uses a higher order approximation of the natural logarithmic function. Such an energy function requires more computation; however, it is shown that the adaptive laws based on it perform well where other existing algorithms fail to perform the same task. In Chapter 5, the environment and network models are represented by a dynamical system described by a matrix transfer function. The existence of a theoretical solution to the problem is presented. Then, a state space realization of the network is developed. Such a realization is represented by the least number of parameters; this translates into efficient computations and would eventually result in reduced chip area in an electronic implementation. Optimization theory and the calculus of variations are explored in this chapter to establish a general framework for updating the parameters of the dynamic network. Computer simulations show the performance and the limitations of the developed algorithms. In Chapter 6, a feedback structure of the dynamic network is considered. The update laws derived there are based on an extension of the static case. Computer simulations for the feedback structure are presented and discussed. In Chapter 7, the basic building blocks for the micro-electronic circuit implementation of the algorithm are developed. Consequently, the complete realization of a feedback neural network architecture with learning is presented and supported by PSPICE computer simulations. Finally, concluding remarks and directions for future work are presented in Chapter 8.

CHAPTER 2

Literature Review

In this chapter, a review of the present literature is presented in order to reveal the different approaches already considered in solving the problem of the blind separation of signals. These efforts have helped direct the path that this research has taken in tackling the problem. Some of the ideas herein will be explored as possible solutions and will be incorporated in the development of this work. A general overview is due, however, in order to give credit to some of the scientists who have contributed to the evolution of this exciting area of intelligent nonlinear signal processing. Yet, at the outset, we concede that our overview is not comprehensive and, very likely, overlooks some portion of this revolutionary literature.

The blind separation of signals approach, motivated from neuro-biology, was first introduced by Herault and Jutten in the late 1980's.
They developed an adaptive algorithm based on neuro-biological findings [4, 5, 6, 7]. Some theoretical analysis was later developed which provided some validation of the algorithm [8, 9, 10, 11]. This line of work, based on the neuromimetic approach, was further considered by Karhunen [12] and by Cichocki and Moszczynski [13].

The studies on the blind separation of signals are based on the assumption that no detailed a priori knowledge of the unknown sources is available. However, the unknown sources are assumed to be (statistically) independent. Thus, to solve the problem one has to render the components of the output of the network statistically independent. This hypothesis is the basis for the development of various energy functions, sometimes called contrast functions or independence criteria. Despite the fact that decorrelation of the components of a signal vector is a weaker condition than independence, some researchers have used such necessary, but not sufficient, conditions to develop criteria and corresponding algorithms [14]. However, decorrelation describes only the 2nd order statistics of the output signal vector. Therefore, these criteria are valid only for a small set of input signals, and higher order statistics need to be considered when developing more capable algorithms.

Since the cross cumulants of independent signals are zero, several contrast functions have been considered in solving the problem. For example, when independence is measured in terms of the cancelation of fourth order cumulants of the outputs, a cubic nonlinearity, similar to that defined in [4], appears in the update of the algorithm [15, 16, 17]. Cardoso, on the other hand, focused on the algebraic properties of fourth order cumulants and considered the problem as a series of whitening and diagonalization processes [18, 19, 20, 21, 22]. These criteria are necessary, but not sufficient, as most of these conditions are constrained to some specified set of inputs. Therefore, the celebrated work of Comon [23] is considered an important framework. There, the mutual information is taken as an energy function and is expressed in terms of the marginal probability density functions. However, knowledge of the densities is not accessible under the hypotheses of the problem; therefore, an Edgeworth expansion was considered to approximate them. Amari et al. [24] followed the same line of work by taking a Gram-Charlier expansion. Bell and Sejnowski [25, 26] also considered the optimization of only one term of the mutual information, but had to resort to expressing it in terms of a nonlinear function of the output signals in order to capture the statistical information of the output.

It should be noted that an important technical difficulty faces the identifiability of the solution to the problem of the blind separation of signals. Due to lack of information regarding such items as the signal power, the spectral content or the modulation scheme, the outputs of the separating network cannot be ordered corresponding to the order of the source signals. Thus, the signals can be identified only up to an indetermination in terms of scale and order. This identifiability problem was first treated by Giannakis et al. [27], who used third order cumulants, and it was further addressed by Tong et al. [28, 29, 30, 31], who used fourth order cumulants.
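As an aside to the cumulant-based criteria mentioned above, the following short MATLAB check, added here for illustration and not taken from the thesis, estimates one fourth order cross cumulant, cum_{22}(a, b) = E[a^2 b^2] - E[a^2]E[b^2] - 2 E[ab]^2 (valid for zero-mean signals). It is approximately zero for independent sources and, in general, clearly nonzero for their linear mixtures; the particular source waveforms and sample size are illustrative assumptions.

% Illustrative check (not from the thesis): a fourth order cross cumulant
N  = 10000;  t = (0:N-1).'/N;
s1 = sign(sin(2*pi*7*t));                        % zero-mean square wave
s2 = sin(2*pi*13*t);                             % zero-mean sine wave
e  = [1.0 0.6; 0.4 1.0] * [s1.'; s2.'];          % linear mixtures of the two sources

% cum22(a,b) = E[a^2 b^2] - E[a^2]E[b^2] - 2 E[ab]^2, for zero-mean signals
cum22 = @(a,b) mean(a.^2 .* b.^2) - mean(a.^2)*mean(b.^2) - 2*mean(a.*b)^2;

cum22(s1, s2)                % approximately zero: the sources are independent
cum22(e(1,:).', e(2,:).')    % clearly nonzero: the mixtures are statistically dependent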
Several algorithms have been proposed in the literature for separating signals based on the availability of prior spatial, temporal or statistical information. This direction of work defines another approach to blind separation, because necessary and sufficient conditions of statistical independence are hard to satisfy. For example, the MUSIC [32] and ESPRIT [33] algorithms take advantage of the structural information of the channels, based on the Vandermonde matrix channel characterization, to obtain an estimate of the parameters. Unfortunately, such structural information is not always available. Other algorithms that exploit the temporal structure of a communication channel, while assuming no a priori spatial knowledge, have been proposed in the literature. These techniques rely on the constant modulus property [34], a discrete alphabet [35], self-coherence [36] and the finite alphabet property [37]. Still other algorithms were developed based on a priori knowledge of the statistical information of the signals: when the sources have known probability densities, the maximum likelihood estimator is used to provide a solution to the problem [38, 39, 40].

So far, this review has focused on the case where a static model of the environment is considered. What about the case where the environment is modeled as a dynamic system? In this case, two lines of work can be described: digital versus continuous. Most of the studies in the literature have tackled this problem using FIR (finite impulse response) filters. For example, Moulines et al. [41] considered the subspace method to decompose the signal-plus-noise space from the noise space in order to recover the unknown sources. Van Gerven and Van Compernolle [42] used a criterion based on second order statistics, while Nguyen Thi and Jutten [43] considered criteria based on the cancelation of fourth order cumulants. In these works, the goal was to determine the FIR coefficients that separate the signals. Bell and Sejnowski considered an information-theoretic approach [44, 45]. In all these works, FIR filters were used to model both the environment and the network. This thesis intends to address the problem by considering an environment and a network model that are described by a continuous linear dynamical system [46, 47], and to define a general framework for solving the problem.

Several analog implementations of blind separation algorithms have been reported in the literature. Vittoz and Arreguit [48] and Cohen and Andreou [49] have considered the implementation of the static H-J algorithm. On the other hand, Gharbi and Salam have considered an analog implementation of the dynamic H-J algorithm [2, 50, 51, 52, 47].

2.1 Neuromimetic Algorithm

Herault and Jutten pioneered a new paradigm in the area of blind separation of sources in the Ph.D. work of Jutten under the supervision of his advisor, Herault [4]. The problem was labeled the Independent Component Analyzer (ICA) [4, 6] because of its similarity to Principal Component Analysis [53, 54]. In [8, 9, 10, 7, 5], Herault and Jutten proposed an algorithm for the separation of independent sources. It was assumed that the medium is linear and static. The inputs to the network are the measured signals e_i(t), 1 <= i <= n, which are linear combinations of the original signals, as depicted in Figure 2.1.
Figure 2.1. Herault and Jutten architecture: e_i = \sum_{j=1}^{n} a_{ij} s_j and y_i = e_i - \sum_{j \neq i} d_{ij} y_j.

e_i(t) = \sum_{j=1}^{n} a_{ij}\, s_j(t)                                    (2.1)

which, in vector form, can be expressed as

e(t) = A\, s(t)                                                            (2.2)

A is an n x n matrix whose components a_{ij} are unknown. The matrix A models the static mixing environment and is assumed to be nonsingular. Furthermore, for normalized mixing, its diagonal entries are all ones and each off-diagonal element is less than one in absolute value.

Herault and Jutten used a recursive architecture made up of fully interconnected outputs. Each output y_i(t), 1 <= i <= n, receives the mixed signal e_i(t) and a weighted sum of all other outputs, -\sum_{j \neq i} d_{ij} y_j(t). Thus,

y_i(t) = e_i(t) - \sum_{j \neq i} d_{ij}\, y_j(t),   1 <= i <= n           (2.3)

which, in vector form, becomes

y(t) = e(t) - D\, y(t)                                                     (2.4)

D is an n x n weight matrix whose main diagonal is zero. The problem of separation of signals now translates into retrieving the original signals s(t). In the limit, it is thus desired to have

y(t) = P\, s(t)                                                            (2.5)

where P is a generalized permutation matrix, i.e., a matrix obtained from a nonsingular diagonal matrix by row and/or column permutations.

Herault and Jutten were motivated by neuro-biological evidence that information about the speed and position of body joints is mixed before being sent to the brain by two different types of nerve fibers. The brain, however, is perfectly capable of separating (and recovering) the speed and position signals. This biological and intuitive inspiration led Herault and Jutten to propose an update law, reminiscent of the Hebbian rule, that presumes the independence of the original signals:

\dot{d}_{ij} = \eta_{ij}\, f(y_i)\, g(y_j),   i \neq j                     (2.6)

where f(.) and g(.) are two nonlinear odd functions and \eta_{ij} is a constant learning rate. Despite the fact that their original idea came from a neuro-biological inspiration, the authors supplied an initial mathematical reasoning for the update law.

2.1.1 Derivation of The H-J Update Law

In [10], the authors introduced a justification of the update law defined by (2.6). Their derivation, together with some issues that may not have been considered during the development of the algorithm, is presented here. The algorithm was developed by assuming that the network is near convergence, meaning that the first (n - 1) outputs are assumed to have converged: y_i = a_{ii} s_i for all i < n. Then, the last output can be expressed as

y_n = e_n - \sum_{j \neq n} d_{nj}\, y_j                                   (2.7)
    = \sum_{j} a_{nj}\, s_j - \sum_{j=1}^{n-1} d_{nj}\, y_j                (2.8)
    = \sum_{j=1}^{n-1} (a_{nj} - d_{nj}\, a_{jj})\, s_j + a_{nn}\, s_n     (2.9)

Since the sources s_j are independent, the cross correlation between two different outputs is null. Therefore, the output power of y_n can be expressed as

\langle y_n^2 \rangle = \sum_{j=1}^{n-1} (a_{nj} - d_{nj}\, a_{jj})^2 \langle s_j^2 \rangle + a_{nn}^2 \langle s_n^2 \rangle     (2.10)

Thus, in order to minimize the power of y_n, it is required that

a_{nj} - d_{nj}\, a_{jj} = 0,   \forall j \neq n                           (2.11)

In this case, the power of the n-th output y_n(t) is proportional to that of s_n(t). This relationship holds only for the n-th output, under the assumption that all other (n - 1) signals are already at the desired solution.

The authors proposed to develop the algorithm based on the principle of minimizing the individual output powers using the gradient descent method, which is presented in the next section. However, minimizing the individual powers of the outputs does not necessarily lead to the minimization of the total energy function, which is the sum of all individual powers. They assumed that the only contribution from the total energy to a particular parameter d_{ij} comes from the corresponding output signal y_i.
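Before turning to the gradient-descent derivation, the architecture (2.2)-(2.4) and the target solution (2.11) can be made concrete with a minimal MATLAB sketch for n = 2. The mixing matrix, source frequencies and sample grid are illustrative assumptions; with the normalized mixing assumed above (unit diagonal), equation (2.11) gives d_{ij} = a_{ij}, so that I + D = A and the output reproduces the sources exactly.

% Minimal sketch of the static H-J architecture for n = 2 (illustrative values)
A = [1.0 0.6; 0.4 1.0];                     % normalized mixing matrix, unit diagonal
t = linspace(0, 2e-3, 2000);
s = [sin(2*pi*1000*t); sin(2*pi*2000*t)];   % independent sources
e = A * s;                                  % observed mixtures, e = A s as in (2.2)

D = A - eye(2);            % weights satisfying (2.11): d_ij = a_ij when a_jj = 1
y = (eye(2) + D) \ e;      % output of (2.4): y = (I + D)^(-1) e, equal to s at this solution

max(abs(y(:) - s(:)))      % essentially zero: the sources are recovered exactly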
Later in this work, a different update law, obtained from the sum of all individual output powers, will be considered as an energy function for the problem. Because the algorithm developed from that energy function merely enforces decorrelation of two output signals, it fails to separate sources. However, including the nonlinearity functions f and g as in (2.6) provides some improvement.

2.1.2 Gradient Descent Method

As was discussed earlier, the authors in [10] proposed that the network parameters d_{ij} be updated using the gradient descent method in order to minimize the output power of the individual signals, defined as

E_i = \frac{1}{2} y_i^2                                                    (2.12)

Thus the parameters were updated as follows:

\dot{d}_{ij} = -\eta \frac{\partial E_i}{\partial d_{ij}} = -\eta\, y_i \frac{\partial y_i}{\partial d_{ij}}     (2.13)

It is given that

y_m = e_m - \sum_{k \neq m} d_{mk}\, y_k = e_m - \sum_{k} (1 - \delta_{mk})\, d_{mk}\, y_k     (2.14)

Now, differentiate both sides of the above equation with respect to d_{ij} in order to obtain the expression for \partial y_m / \partial d_{ij}:

\frac{\partial y_m}{\partial d_{ij}} = -\sum_{k} (1 - \delta_{mk}) \left[ \frac{\partial d_{mk}}{\partial d_{ij}}\, y_k + d_{mk} \frac{\partial y_k}{\partial d_{ij}} \right]     (2.15)

Rearrange the above equation to obtain

\sum_{k} \left[ \delta_{mk} + (1 - \delta_{mk})\, d_{mk} \right] \frac{\partial y_k}{\partial d_{ij}} = \delta_{mi}\, (\delta_{ij} - 1)\, y_j     (2.16)

Let

Q = (I + D)^{-1}                                                           (2.17)

and the previous equation, written in matrix form, becomes

\frac{\partial y}{\partial d_{ij}} = Q\, v, \quad \text{with } v_m = \delta_{mi}\, (\delta_{ij} - 1)\, y_j     (2.18)

Therefore,

\frac{\partial y_m}{\partial d_{ij}} = \left[ \frac{\partial y}{\partial d_{ij}} \right]_m     (2.19)
= [Q\, v]_m                                                                (2.20)
= \sum_{k} q_{mk}\, v_k                                                    (2.21)
= \sum_{k} q_{mk}\, y_j\, \delta_{ki}\, (\delta_{ij} - 1)                  (2.22)
= q_{mi}\, y_j\, (\delta_{ij} - 1)                                         (2.23)

Consequently,

\dot{d}_{ij} = -\eta\, y_i \frac{\partial y_i}{\partial d_{ij}}            (2.24)
= -\eta\, q_{ii}\, (\delta_{ij} - 1)\, y_i\, y_j                           (2.25)
= \eta\, q_{ii}\, y_i\, y_j\, (1 - \delta_{ij})                            (2.26)

2.1.3 Evaluation of the update law

The update law defined in (2.26) always converges to a symmetric solution matrix, thus limiting the admissible environments to those modeled by a symmetric matrix. However, the general environment cannot always be modeled by a symmetric matrix; this represents a drawback of the update law defined by (2.26). Also, when equation (2.26) is expressed on average as

\langle \dot{d}_{ij} \rangle = \eta\, q_{ii}\, \langle y_i\, y_j \rangle\, (1 - \delta_{ij})     (2.27)

this rule tests only the decorrelation of the output signal y(t). The goal, however, is to develop an update law which pushes for the independence of the components of y(t). To do so, the rule was modified in order to accommodate higher order moments by imposing nonlinear functions which produce various moments of the output, as follows:

\dot{d}_{ij} = \eta\, q_{ii}\, f(y_i)\, g(y_j)\, (1 - \delta_{ij})          (2.28)

where f and g are two odd functions. These two functions should be different; otherwise, the matrix D becomes symmetric. The use of two such functions introduces higher order moments. To illustrate this, consider the Taylor series expansions of the two odd functions:

f(x) = \sum_{k} \alpha_{2k+1}\, x^{2k+1}  \quad and \quad  g(x) = \sum_{l} \beta_{2l+1}\, x^{2l+1}     (2.29)

Thus, the equilibria of equation (2.28) on the average satisfy

\langle \dot{d}_{ij} \rangle = \eta\, q_{ii} \sum_{k,l} \alpha_{2k+1}\, \beta_{2l+1}\, \langle y_i^{2k+1}\, y_j^{2l+1} \rangle = 0     (2.30)

Assuming \alpha_{2k+1} and \beta_{2l+1} are not zero, this results in

\langle y_i^{2k+1}\, y_j^{2l+1} \rangle = 0,   \forall k, l                (2.31)

This condition means that all joint odd moments are zero, which is stronger than decorrelation. It is, however, implied by the independence of the components of the signal vector y(t) when these signals have even probability density functions. As a last approximation, the authors used the assumption that the diagonal entries q_{ii} are very close to one, since the off-diagonal entries d_{ij}, i \neq j, are less than one. Thus, the proposed update law becomes as defined by equation (2.6).
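The following worked instance, added here for illustration, makes condition (2.31) concrete for the particular pair used in the simulations below, f(x) = x^3 and g(x) = x. In this case (2.30) reduces to

\langle \dot{d}_{ij} \rangle = \eta\, q_{ii}\, \langle y_i^3\, y_j \rangle\, (1 - \delta_{ij}) = 0
\quad \Longrightarrow \quad \langle y_i^3\, y_j \rangle = 0, \qquad i \neq j .

If the components of y are independent with even (hence zero-mean) densities, then \langle y_i^3\, y_j \rangle = \langle y_i^3 \rangle \langle y_j \rangle = 0, so the separating solution is indeed an equilibrium of (2.28). The converse does not follow: the vanishing of this single odd joint moment does not by itself imply independence.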
2.1.4 Computer Simulations

Computer simulations were performed for the H-J algorithm defined by (2.6). The unknown sources are two sine waveforms with respective frequencies 1 kHz and 2 kHz. The environment mixing matrix is assumed to be

A = [ 1.0  0.6 ; 0.4  1.0 ]

The learning rate is taken to be \eta = 100 and the initial conditions are all zero. Figure 2.2 shows the performance of the algorithm when the odd functions f and g are, respectively, the cubic and the linear functions:

f(x) = x^3  and  g(x) = x

One observes that the off-diagonal coefficients of the D matrix converge to the desired values. In Figure 2.2, the D matrix converges to

D^* = [ 0.0000  0.4004 ; 0.6004  0.0000 ]

In addition, Figure 2.2 displays the performance index, which is defined next.

Performance Index

In the problem of blind separation of signals, one desires to design a network such that its output is a replica of the unknown sources. Because of the lack of knowledge of the unknown sources and of the mixing matrix, one does not expect to completely identify the unknown sources: the order of the components of the output cannot be determined, nor can their corresponding magnitudes. Consequently, one can identify the unknown sources only up to a permutation and a scaling. This is defined as the wave-preserving property [28]:

Definition 1 (wave-preserving property) The signals y(t) and s(t) satisfy the wave-preserving property if and only if

y(t) = I_P\, \Gamma\, s(t) = P\, s(t)                                       (2.32)

where I_P is a permutation matrix, \Gamma is a diagonal matrix, and P is the resulting generalized permutation matrix. The definitions of a permutation matrix and of a generalized permutation matrix are given in Appendix A.

In the case of the H-J work, the network input-output relationship is described by equation (2.4). For the sake of generality, let us assume that such a relationship is represented by

y = W\, e                                                                  (2.33)

Thus, considering equations (2.32) and (2.33), one obtains

P = W\, A                                                                  (2.34)

So, at convergence, the gain matrix WA, the product of the matrices W and A, has to be a generalized permutation matrix in order to claim that the network has converged to a solution that separates the mixed signals. The following mapping

I: R^{n \times n} \to R_+, \qquad P \mapsto I(P) = \sum_{i} \left[ \sum_{j} \frac{|p_{ij}|}{\max_k |p_{ik}|} - 1 \right] + \sum_{j} \left[ \sum_{i} \frac{|p_{ij}|}{\max_k |p_{kj}|} - 1 \right]     (2.35)

is always nonnegative and is zero if and only if the matrix P is a generalized permutation matrix. Therefore, this mapping is taken as a performance index for any given blind signal separation algorithm. Its plot versus the evolution time of the algorithm represents a measure of the convergence of that algorithm.

Thus, one clearly observes that the performance index shown in Figure 2.2 approaches zero, and consequently the gain matrix P = WA becomes

P = [ 1.0002  -0.0006 ; -0.0005  1.0004 ]

Figure 2.3 displays the response of the network after convergence. The output signals of the network, y_i(t), are approximately the unknown sources s_i(t); the error between them is of the order of a few milli-units.

Computer simulations were also performed using different types of odd functions. In this case,

f(x) = \sinh x  and  g(x) = \tanh x

Figures 2.4 and 2.5 show the performance of the algorithm defined by (2.6) using these odd functions. The settings of this simulation are exactly the same as above in terms of initial conditions, learning rate, mixing matrix and unknown sources. With a training time of 0.1 s, the network had not yet converged; therefore, a longer training time, namely 0.2 s, was allocated. In this case, the parameters converged to near the desired values, as shown in Figure 2.4.
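A compact MATLAB sketch of the simulation just described is given below. It integrates the H-J rule (2.6) with a forward-Euler step and evaluates the performance index (2.35) on the gain matrix (I + D)^{-1} A. The mixing matrix, learning rate, training time and the choice f(x) = x^3, g(x) = x mirror the setting above, while the integration step size is an assumption; the script is an illustrative reconstruction, not the thesis code of Appendix C.

% Illustrative reconstruction of the simulation of Section 2.1.4 (forward-Euler integration)
A   = [1.0 0.6; 0.4 1.0];          % static mixing environment (2.2)
eta = 100;                         % learning rate
dt  = 1e-6;  T = 0.1;              % integration step (assumed) and training time, in seconds
D   = zeros(2);                    % zero initial conditions

for k = 1:round(T/dt)
    t = k*dt;
    s = [sin(2*pi*1000*t); sin(2*pi*2000*t)];    % unknown 1 kHz and 2 kHz sources
    e = A*s;                                     % mixtures seen by the network
    y = (eye(2) + D) \ e;                        % network output, equation (2.4)
    dD = eta * (y.^3) * y.';                     % f(y_i) g(y_j) with f(x) = x^3, g(x) = x
    dD(logical(eye(2))) = 0;                     % update off-diagonal terms only (i ~= j)
    D = D + dt*dD;                               % Euler step of the H-J rule (2.6)
end

P    = abs((eye(2) + D) \ A);                    % gain matrix of (2.34), in magnitude
pidx = sum(sum(P,2)./max(P,[],2) - 1) + sum(sum(P,1)./max(P,[],1) - 1)   % index (2.35)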
Also, Figure 2.5 shows that the output is a near replica of the unknown sources.

When computer simulations were performed for the algorithm defined by equation (2.26), the network converged to a symmetric matrix, as anticipated in the discussion of the algorithm development. Figures 2.6 and 2.7 show the performance. If one considers deriving the update law based on minimizing the total energy of the signal, or any other energy function of the form

E = \frac{1}{2} \sum_{m} \phi(y_m)                                         (2.36)

one could use equation (2.23) to obtain

\dot{d}_{ij} = -\eta \sum_{m} \phi'(y_m)\, q_{mi}\, y_j\, (\delta_{ij} - 1)     (2.37)

Figure 2.2. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, y_i^3\, y_j.

Figure 2.3. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, y_i^3\, y_j. The solid curve is the unknown sources; the dotted curve, superimposed onto the solid curve, is the network output.

Figure 2.4. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta \sinh y_i \tanh y_j.

Figure 2.5. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta \sinh y_i \tanh y_j. The solid curve is the unknown sources; the dotted curve, superimposed onto the solid curve, is the network output.

Figure 2.6. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, q_{ii}\, y_i\, y_j.

Figure 2.7. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, q_{ii}\, y_i\, y_j. The solid curve is the unknown sources; the dotted curve, superimposed onto the solid curve, is the network output.

2.2 Mutual Information Approach

In [24], Amari et al. considered the feedforward architecture of Figure 2.8 for the blind separation problem.

Figure 2.8. Feedforward architecture: the sources s(t) are mixed by A into the observations e(t), which the weight matrix W maps into the outputs y(t).

They used the Independent Component Analysis (ICA) framework formulated by Comon [23], which is based on minimizing the dependency among the output components. This dependency is measured by the Kullback-Leibler divergence between the joint and the marginal probability density functions:

I(y) = \int f_y(y) \ln \frac{f_y(y)}{\prod_i f_{y_i}(y_i)}\, dy            (2.38)

Equation (2.38) is minimal, and equal to zero, only when all the components y_i of the output vector are statistically independent.
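The step from the divergence form (2.38) to the entropy form used next can be made explicit by expanding the logarithm of the quotient (a short derivation added here for clarity; each marginal term collapses because only y_i appears in its integrand):

I(y) = \int f_y(y) \ln f_y(y)\, dy - \sum_i \int f_y(y) \ln f_{y_i}(y_i)\, dy
     = -H(y) - \sum_i \int f_{y_i}(y_i) \ln f_{y_i}(y_i)\, dy_i
     = -H(y) + \sum_i H(y_i) .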
The averaged mutual information can thus be expressed in terms of entropy as

I(y) = -H(y) + \sum_i H(y_i)                                               (2.39)

where H(y) is the entropy of y, a measure of the uncertainty of the occurrence of the event produced by y, as defined in Appendix A.

When using the mutual information as an independence criterion, partial knowledge of the statistical information of the output is needed, since the chosen independence criterion is a function of the probability density of the output. To surmount that constraint, Amari [24], as well as Comon [23], used a truncation of an infinite series expansion of the probability density functions. Unlike Comon, who used an Edgeworth expansion, Amari employed a Gram-Charlier expansion to approximate the probability distribution of the output. The fourth-order Gram-Charlier expansion of the marginal density f_{y_i}(y_i) is

f_{y_i}(y_i) \approx \alpha(y_i) \left[ 1 + \frac{\kappa_{3i}}{3!} H_3(y_i) + \frac{\kappa_{4i}}{4!} H_4(y_i) \right]     (2.40)

where \kappa_{3i} = m_{3i}, \kappa_{4i} = m_{4i} - 3, m_{ki} = E[y_i^k] is the k-th moment of y_i, \alpha(y) = \frac{1}{\sqrt{2\pi}} e^{-y^2/2}, and H_k(y) are the Chebyshev-Hermite polynomials defined by

(-1)^k \frac{d^k \alpha(y)}{dy^k} = H_k(y)\, \alpha(y)                     (2.41)

It is assumed here that the variance terms are unity, i.e.,

m_{2i} = 1, \quad \forall i                                                (2.42)

A second approximation, of the natural logarithmic function, was used in order to be able to compute the integrals:

\ln(1 + x) \approx x - \frac{x^2}{2}                                       (2.43)

Assuming that the weighting matrix W is nonsingular, the following holds:

f_y(y) = \frac{f_x(x)}{|W|}                                                (2.44)

Therefore, H(y) can be rewritten as

H(y) = E[\ln |W|] - E[\ln f_x(x)] = H(x) + E[\ln |W|]                      (2.45)

Thus,

I(y) = \sum_i H(y_i) - E[\ln |W|] - H(x)                                   (2.46)

Using the above approximations and the gradient descent method to minimize the mutual information, the authors arrived at the following update law [24]:

\dot{W} = \eta\, [W^{-T} - f(y)\, x^T]                                     (2.47)

where

f(y) = \frac{3}{4} y^{11} + \frac{25}{4} y^{9} - \frac{14}{3} y^{7} - \frac{47}{4} y^{5} + \frac{29}{4} y^{3}     (2.48)

This function is plotted in Figure 2.9, along with other functions that have been considered in the literature.

Figure 2.9. Nonlinear functions.

However, using the information theory perspective [55] and assuming the mixing matrix to be nonsingular, the above update law can be rewritten as

\dot{W} = \eta\, [I - f(y)\, y^T]\, W                                      (2.49)

Computer simulations of this algorithm were performed. The unknown sources are assumed to be two sine waveforms with respective frequencies 1 Hz and 2 Hz. The mixing matrix is chosen randomly:

A = [ 0.7012  0.7622 ; 0.9103  0.2625 ]

The learning rate is \eta = 0.1 and the random initial condition is

W_0 = [ 0.0475  0.3282 ; 0.7361  0.6326 ]

According to this setting, the initial gain matrix is

G_0 = W_0 A = [ 0.3321  0.1223 ; 1.0920  0.7271 ]

It is clear that the gain matrix G_0 is not a generalized permutation matrix. The goal is thus for the network to update its parameters so that the gain matrix becomes a generalized permutation matrix. Figure 2.10 shows the performance of the algorithm using the settings described above. One observes that all the parameters of the matrix W converge to

W = [ -0.6835  1.9155 ; 2.1673  -1.6387 ]

which corresponds to the gain matrix

G = WA = [ 1.2644  -0.0183 ; 0.0280  1.2218 ]

Figure 2.10. Parameter convergence of the algorithm defined by equation (2.49).
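A minimal MATLAB sketch of the rule (2.49) with the nonlinearity (2.48) follows. The sources, mixing matrix, learning rate and initial weights are those of the setting just described; the Euler step size and iteration count are illustrative assumptions, and a small step is used because the high-order polynomial f makes the early transient stiff.

% Illustrative discrete-time simulation of the natural-gradient form (2.49)
A   = [0.7012 0.7622; 0.9103 0.2625];           % mixing matrix used above
W   = [0.0475 0.3282; 0.7361 0.6326];           % random initial weights used above
eta = 0.1;
dt  = 1e-4;  T = 90;                            % small Euler step (assumed) and duration

f = @(y) (3/4)*y.^11 + (25/4)*y.^9 - (14/3)*y.^7 - (47/4)*y.^5 + (29/4)*y.^3;   % (2.48)

for k = 1:round(T/dt)
    t = k*dt;
    s = [sin(2*pi*t); sin(2*pi*2*t)];            % 1 Hz and 2 Hz sine sources
    y = W * (A*s);                               % network output y = W e
    W = W + dt * eta * (eye(2) - f(y)*y.') * W;  % Euler step of (2.49)
end

G = W*A                                          % should tend to a generalized permutation matrix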
The reason is that Herault and Jutten had assumed a special architecture of the environment by taking the diagonal entries of the $A$ matrix to be one. Thus, when developing the update law, they did not consider any law for the diagonal elements of the network matrix. Therefore, the network converged to the exact inverse of the environment model. By doing so, the identifiability issue was eliminated. However, in the work of Amari et al., no such special structure is assumed. Therefore, the problem is not completely identifiable. Consequently, one should not expect to obtain the inverse of the mixing matrix as the solution to the problem.

The authors' main contribution is their analytical derivation of a nonlinear function $f(\cdot)$, which previously had been chosen in an ad hoc manner in the literature. However, only a 2nd-order approximation of the logarithmic function was considered, and the output signals were assumed to have unit variance (2.42). This certainly simplifies the computations, but it does not generate the correct nonlinear function, since the output signal variances contribute to the higher-order terms. Therefore, in Chapter 4, a new energy function based on a 3rd-order approximation of the logarithmic function, with no unit-variance assumption, will be derived. A justification of this derivation will become apparent in that chapter. Consequently, an update law based on the proposed energy function will perform certain separation tasks for which the update laws defined by equations (2.47) and (2.49) will be shown to fail.

2.3 Information Theoretic Approach

Bell and Sejnowski [45, 45, 25] considered a feedforward neural network described by Figure 2.11, where $y$ is the output of the neural network, $x$ is its input vector, $W$ is the parameter matrix, and $b$ is the bias vector.

Figure 2.11. Bell and Sejnowski's Architecture

In their work, the authors considered a criterion that maximizes the information transferred through the network, which is defined by

$$I(x,y) = E\left[\ln \frac{f_{xy}(x,y)}{f_x(x)\, f_y(y)}\right] \qquad (2.50)$$

Using the properties of the entropy given in Appendix A, one can express equation (2.50) as

$$I(x,y) = H(y) + H(x) - H(x,y) \qquad (2.51)$$

or

$$I(x,y) = H(y) - H(y|x) \qquad (2.52)$$

$H(y)$ is the entropy of $y$ and $H(y|x)$ is the entropy of $y$ not generated from the input $x$. The entropy of $y$ is defined as

$$H(y) = -E[\ln f_y(y)] = -\int_{-\infty}^{+\infty} f_y(y)\,\ln f_y(y)\, dy$$

where $f_y(y)$ is the probability density function (pdf) of $y$. These authors state that $H(y|x)$ does not depend on $W$. Thus, maximizing the information transferred through the network is equivalent to maximizing the entropy of $y$:

$$\frac{\partial I(x,y)}{\partial W} = \frac{\partial H(y)}{\partial W}$$

Assuming that the nonlinearity $g$ is invertible, the following holds:

$$f_y(y) = \frac{f_x(x)}{|J|}$$

where $|J|$ is the determinant of the Jacobian matrix $J = \frac{\partial y}{\partial x}$. So $H(y)$ can be rewritten as

$$H(y) = E[\ln|J|] - E[\ln f_x(x)] = H(x) + E[\ln|J|]$$

Thus,

$$\frac{\partial H(y)}{\partial W} = E\left[\frac{\partial \ln|J|}{\partial W}\right]$$

Consequently, the parameters will be updated according to

$$\Delta W = \eta\, E\left[\frac{\partial \ln|J|}{\partial W}\right] \qquad (2.53)$$

Bell and Sejnowski suggested an approximation by dropping the expected value:

$$\Delta W = \eta\, \frac{\partial \ln|J|}{\partial W} \qquad (2.54)$$

To compute the gradient, a compact form for $\ln|J|$ must be found:

$$[J]_{ij} = \frac{\partial y_i}{\partial x_j} = g'(u_i)\frac{\partial u_i}{\partial x_j} = g'(u_i)\, w_{ij} = \left[\Lambda_{g'(u)}\, W\right]_{ij}$$

Thus, $|J|$ can be expressed as

$$|J| = \left|\Lambda_{g'(u)}\, W\right| = \left|\Lambda_{g'(u)}\right|\cdot|W| = |W|\prod_i g'(u_i) \qquad (2.55)$$

where $\Lambda_v = \mathrm{diag}(v_1,\cdots,v_n)$. Hence

$$\ln|J| = \ln|W| + \sum_m \ln g'(u_m) \qquad (2.56)$$

Therefore,

$$\Delta w_{ij} = \eta\left[\frac{1}{|W|}\frac{\partial |W|}{\partial w_{ij}} + \frac{1}{g'(u_i)}\frac{\partial g'(u_i)}{\partial w_{ij}}\right] \qquad (2.57)$$
Since

$$\frac{\partial g'(u_i)}{\partial w_{ij}} = g''(u_i)\, x_j \qquad (2.58)$$

and

$$|W| = \sum_j w_{ij}\, \mathrm{cof}\, w_{ij} \qquad (2.59)$$

the element-wise update becomes

$$\Delta w_{ij} = \eta\left[\frac{\mathrm{cof}\, w_{ij}}{|W|} + \frac{g''(u_i)}{g'(u_i)}\, x_j\right] \qquad (2.60)$$

In matrix form,

$$\Delta W = \eta\left[W^{-T} + \frac{g''(u)}{g'(u)}\, x^T\right] \qquad (2.61)$$

In particular, if

$$y = g(u) = \frac{1}{1+e^{-u}} \qquad (2.62)$$

then

$$\Delta W = \eta\left[W^{-T} + (1-2y)\, x^T\right] \qquad (2.63)$$

Computer simulations were performed in order to study the update law defined by (2.63). Two prototype sine functions of respective frequencies $1\,kHz$ and $2\,kHz$ were used as unknown sources. Computer simulations with various mixing matrices, initial conditions, and learning rates were performed. They all revealed that the algorithm failed to separate the signals. Figure 2.12 shows one example of its performance. Observe that the performance index did not converge to zero and that the gain matrix $P = WA$ is not a generalized permutation matrix. One could attribute the failure of the update law defined by (2.61) to the fact that it is an approximation of (2.53) in which the expected value was dropped.

Figure 2.12. Performance of the Bell and Sejnowski Algorithm

In their work [25, 45, 26, 44], the authors did not simulate the update law as it is defined in (2.61). Instead, they considered the following algorithm. Given $n$ mixtures defined on a time interval $[0,T]$, the authors first defined

$$\delta(t) = (1-2y(t))\,x(t)^T$$

Then the weights were updated according to the steps given in Table 2.1.

Table 2.1. Information Theoretic Algorithm

Step 1. Pick a random time point $t_i$ such that $t_i + \tau \le T$.
Step 2. Compute $\Delta W_i = \sum_{t=t_i}^{t_i+\tau} \delta(t)$.
Step 3. Evaluate $W_i^{-T}$.
Step 4. Compute $\Delta W_i = W_i^{-T} + \Delta W_i$.
Step 5. Update the weights $W_{i+1} = W_i + \eta\,\Delta W_i$.
Step 6. Compute the performance index $P_i$.
Step 7. While $P_i > P_{\mathrm{desired}}$, go back to step 1.

Computer simulations of this algorithm were conducted using speech and prototype sine signals. The algorithm performed well when speech signals were used: it was able to separate mixtures of two, three, and five speech segments of different speakers. Figure 2.13 shows one example of its performance. However, the algorithm failed to separate prototype sine signals; observe Figure 2.14.

One undesirable feature of the algorithm developed by the authors is the choice of a random time $t_i$ at every epoch. The authors claimed that such a choice of $t_i$ would guarantee the input stationarity assumption. However, this choice makes a digital signal processing implementation of the algorithm cumbersome, since it requires the storage of all the data in memory. Consequently, the algorithm does not perform in real time. Most applications of the blind separation of signals require real-time performance. Thus, the algorithm, as it is, is not attractive and needs to be modified.

One other undesirable feature of the algorithm is the computation of the inverse of the weight matrix $W$. By analyzing the steps of the algorithm closely, one concludes that the computation of the inverse occurs only after all the weight increments have been accumulated.

Figure 2.13. Performance of the Bell and Sejnowski Algorithm Using Speech Signals
Figure 2.14. Performance of the Bell and Sejnowski Algorithm Using Sine Signals

This means that the input signals are frozen until that computation is completed. Thus, the algorithm is not suitable for real-time performance as it stands. One should reconsider how and/or when this expression is computed in order to make the algorithm perform in real time. One solution would be to compute the inverse while accumulating the weights, so that weight accumulation and inverse computation are performed in parallel.

As a final remark, the algorithm defined by (2.63) is based on the maximization of the information transferred through the network, defined by equation (2.52), with an appropriate choice of the neuron's nonlinear function. One can interpret this development as the minimization of the first term of the mutual information defined by equation (2.39), namely the negative entropy of the output vector $-H(y)$, which is equivalent to the maximization of $H(y)$.

2.4 Algebraic Approach

In [21, 56, 57, 22], Cardoso et al. proposed a series of two processing stages, as shown in Figure 2.15, to achieve blind signal separation. The separating matrix $W$ was factored as the product of a whitening matrix $B$ and an orthogonalizing matrix $U$: $W = UB$. Algorithms for updating the matrices $B$ and $U$ were developed, then combined to obtain a one-stage update for the matrix $W$.

Figure 2.15. Cardoso's Architecture

The matrix $B$ was updated such that the output vector $z$ was white, i.e. $R_z = I$, where $z = Bx$. This was obtained by minimizing the distance between the matrices $R_z$ and $I$. Thus, the appropriate error function was the Kullback-Leibler divergence [58] defined as

$$E_B(B) = \mathrm{Trace}(R_z) - \ln|R_z| - n \qquad (2.64)$$

The matrix $B$ was then updated along its gradient descent:

$$\Delta B = -\eta\,\frac{\partial E_B}{\partial B} = -\eta\left[zz^T - I\right]B \qquad (2.65)$$

The matrix $U$ was updated such that $E_U(U) = E\,f(y) = E\,f(Uz)$ was minimized. Since this minimization was not constrained and may lead to a solution that did not preserve orthogonality, the matrix $U$ was updated according to

$$\Delta U = -\eta\left[f'(y)\,y^T - y\,f'(y)^T\right]U \qquad (2.66)$$

By combining the whitening and the orthogonality stages, Cardoso et al. obtained an overall update for the matrix $W$:

$$\Delta W = \eta\left[I - yy^T - f'(y)\,y^T + y\,f'(y)^T\right]W \qquad (2.67)$$

Computer simulations were performed to study the update law defined by (2.67). When the network dimension was two, the algorithm was able to separate signals regardless of the initial conditions and the learning rate $\eta$. Figure 2.16 presents an example of its performance.

When developing the algorithm described by (2.67), the authors considered an Euler approximation of the gradient of the considered energy (contrast) functions. The developed algorithm is suitable for digital implementation.

Figure 2.16. Cardoso Algorithm for $n = 2$

Figure 2.17. Cardoso Algorithm for $n = 3$

Here, I will consider the
continuous-time realization of the algorithm, investigate its performance, and compare it to its discrete-time counterpart as defined by (2.67). The continuous-time algorithm is defined by the instantaneous update equation

$$\dot{W} = \eta\left[I - yy^T - f'(y)\,y^T + y\,f'(y)^T\right]W \qquad (2.68)$$

Starting from the same conditions, computer simulations for both realizations were performed, and the CPU time taken by each algorithm was compared. The discrete-time version of the algorithm took more time (74.85 s) than its continuous-time counterpart (50.93 s). Examples of simulations for a two-dimensional network are presented in Figures 2.16 and 2.18. However, when the network is of dimension 3, the algorithm converges, but not to an acceptable solution. Observe in Figures 2.17 and 2.19 how more than 3 entries of the generalized permutation matrix converge to non-zero values; thus, the solution is not acceptable. One concludes that the algorithm developed by Cardoso may not work for networks of dimension higher than 2.

Figure 2.18. Continuous-time Algorithm for $n = 2$

Figure 2.19. Continuous-time Algorithm for $n = 3$

CHAPTER 3

Blind Separation in a Static Environment

In this chapter, a blind separation algorithm in a static environment will be developed. The mathematical analysis to develop adaptive laws for the problem will be presented. The law will be derived based on the nonlinear decorrelation condition of the output. The resulting update laws will then be enhanced to test for the independence of the output. Computer simulations will be presented to demonstrate the performance of these novel algorithms.

3.1 Problem Definition and Architecture

The problem can be posed as follows: unknown source signals are emitted from unknown sources, and these signals are mixed according to some unknown model that describes the medium through which they have traveled. The goal, then, is to construct a system that recovers these unknown sources based only on measurements of the mixtures of the original unknown sources. It is also assumed that the environment is a static model; it can therefore be represented by a static matrix that characterizes its input-output relationship. Figure 3.1 describes the architecture of the system. The unknown sources, the output of the environment, and the output of the network are respectively labeled $s(t)$, $x(t)$, and $y(t)$.

Figure 3.1. Static Environment Architecture

3.2 Theoretical Solution

Given the measured vector $x(t)$, which is the only given data, the neural network should be constructed to adaptively change the parameters $W$ so that its output $y(t)$ and the original signal $s(t)$ satisfy the wave-preserving property defined in Section 2.1. This property implies that one is not after an exact replica of the unknown sources, but rather a permutation of them, each up to a constant scaling.
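Throughout the simulations in this chapter, success is judged by whether the gain matrix $G = WA$ is a generalized permutation matrix. The helper below is a hypothetical stand-in for the performance index used in the plots (its exact formula is not given in this excerpt): it returns zero exactly when each row and column of $G$ has a single dominant entry and all other entries vanish.

```python
import numpy as np

def gp_distance(G):
    """Distance of a gain matrix G = W @ A from a generalized permutation matrix.

    For each row and each column, everything except the single largest-magnitude
    entry should be (near) zero.  This index is zero exactly for a generalized
    permutation matrix.  NOTE: this is an assumed stand-in for the dissertation's
    performance index, used here only for illustration.
    """
    G = np.abs(np.asarray(G, dtype=float))
    row_residual = (G.sum(axis=1) - G.max(axis=1)) / (G.max(axis=1) + 1e-12)
    col_residual = (G.sum(axis=0) - G.max(axis=0)) / (G.max(axis=0) + 1e-12)
    return float(row_residual.sum() + col_residual.sum())

A = np.array([[1.0, 0.4], [0.6, 1.0]])
W_separating = np.linalg.inv(A)                 # a separating solution: G = I
W_failing = np.array([[0.5, 0.5], [0.5, 0.5]])  # a non-separating network
print(gp_distance(W_separating @ A), gp_distance(W_failing @ A))
```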
Now, assume that the correct update law for the problem has been developed and the system has converged to a solution $W^*$. Then,

$$y(t) = W^* x(t) = W^* A\, s(t) \qquad (3.1)$$

Therefore, by combining equations (2.32) and (3.1), one obtains

$$W^* A = P\,\Gamma \qquad (3.2)$$

Recall that $P$ and $\Gamma$ are respectively a permutation matrix and a nonsingular diagonal matrix. Equation (3.2) can be rewritten as

$$W^* = P\,\Gamma\, A^{-1} \qquad (3.3)$$

This equation shows that a theoretical solution to the problem does exist and that there are infinitely many possible solutions, thanks to the freedom that the matrices $P$ and $\Gamma$ provide. Also, equation (3.2) requires that the gain matrix $G = WA$ be a generalized permutation matrix.

Knowing that a theoretical solution exists, one can now investigate possible ways to construct a network with appropriate update laws that perform the task of separating signals. To do so, an appropriate energy function is defined next.

3.3 Energy Function

To develop an update law, an energy function that characterizes the problem is now defined. If this were a recognition or classification problem, the energy function would have been chosen as an error function: a difference between the desired output vector $s(t)$ and the observed output vector $y(t)$, defined as

$$\epsilon(W) = \frac{1}{2}\int \left(s(t) - y(t)\right)^2 dt \qquad (3.4)$$

This criterion would, therefore, require knowledge of the target signal. However, by assumption, there is no a priori knowledge of the signal vector. In this problem, it is expected that the network will reproduce, regenerate, and recover these sources without any direct knowledge of them. Thus, such a traditional energy function is not appropriate.

On the other hand, some kind of knowledge about the original signals is assumed: they are independent. This feature, or criterion, will be the basis for defining the energy function analogous to the one defined by (3.4). In order to define the energy function that fits the problem at hand, one needs to rely on the definition of the independence of signals [59].

Definition 2 (Deterministic Independence) The components of an $n$-dimensional time-varying signal vector $y(t) = [y_1(t) \cdots y_n(t)]^T$ are linearly independent over a time interval $[0,T]$ if and only if their auto-correlation matrix

$$R_y(t) = \int_0^t y(\tau)\, y(\tau)^T d\tau$$

is positive definite for all $t \le T$.

Based on this definition, one possible energy function could be based on identifying necessary and sufficient conditions for the autocorrelation matrix to be positive definite. Theorem 1 provides this characterization [60].

Theorem 1 (Positive Definite) If a matrix $M$ is positive definite, then

$$|M| \le \prod_{k=1}^n m_{kk} \qquad (3.5)$$

with equality if $M$ is diagonal. The operator $|M|$ denotes the determinant in the case of a square matrix $M$.

The proof of this theorem is by induction and is shown in Appendix B.1. Using Theorem 1, one can develop a criterion, or energy function, that is always positive but is minimum when the matrix is diagonal. To develop such a function, some equivalent statements of Theorem 1 will be presented. Let $M$ be a positive definite matrix. Then,

$$M \;\text{positive definite} \;\Rightarrow\; \prod_{k=1}^n m_{kk} \ge |M| \;\Rightarrow\; \ln\prod_{k=1}^n m_{kk} \ge \ln|M| \;\Rightarrow\; \sum_{k=1}^n \ln m_{kk} - \ln|M| \ge 0$$

with equality if and only if $M$ is diagonal. Define the mapping

$$J:\; \mathbb{R}^{n\times n} \to \mathbb{R}^+,\qquad M \mapsto J(M) = \frac{1}{2}\left[\sum_{k=1}^n \ln m_{kk} - \ln|M|\right] \qquad (3.6)$$

When the energy function defined by (3.6) is minimized, one obtains a diagonal matrix.
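As a small numerical check (not from the dissertation), the mapping $J$ of (3.6) can be evaluated directly; it is non-negative for positive definite matrices and vanishes only for diagonal ones. The example matrices below are arbitrary.

```python
import numpy as np

def J(M):
    """Energy function of equation (3.6): J(M) = 1/2 [sum_k ln m_kk - ln |M|]."""
    M = np.asarray(M, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(M))) - np.log(np.linalg.det(M)))

# J is zero for a diagonal (decorrelated) matrix and positive otherwise.
R_diag = np.diag([2.0, 0.5, 1.3])
R_corr = np.array([[2.0, 0.8, 0.1],
                   [0.8, 0.5, 0.2],
                   [0.1, 0.2, 1.3]])   # positive definite, but not diagonal
print(J(R_diag))   # 0.0
print(J(R_corr))   # > 0
```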
Thus, this energy function will be used in this problem with the autocorrelation matrix as its argument, since decorrelation of the components of the output signal vector is obtained when the autocorrelation matrix of the output vector is diagonal. In the work presented in this chapter, an update law that pushes only for the decorrelation condition will be developed first. Then, some techniques found in the literature [8, 24] will be used to push for the independence condition. Also, different numbers of measurements and sensors will be considered; thus, the matrix $W$ is $n \times m$ with $m \neq n$. A special form of the update law for the different scenarios relating the number of sensors and measurements will be given.

The autocorrelation matrix of the output signal vector is defined as

$$R_y(t) = \int_0^t y(\tau)\, y(\tau)^T d\tau = \langle yy^T\rangle_t \qquad (3.7)$$

Thus, using the mapping defined by equation (3.6), the energy function is defined as

$$\Phi = J(R_y(t)) = \frac{1}{2}\left[\sum_{k=1}^n \ln\langle y_k^2\rangle_t - \ln\left|\langle yy^T\rangle_t\right|\right] \qquad (3.8)$$

The matrix $W$ will be updated along the gradient descent of $\Phi$:

$$\dot{W} = -\eta\,\frac{\partial \Phi}{\partial W} \qquad (3.9)$$

where

$$\frac{\partial \Phi}{\partial W} = \frac{1}{2}\sum_{k=1}^n \frac{1}{\langle y_k^2\rangle_t}\frac{\partial \langle y_k^2\rangle_t}{\partial W} - \frac{1}{2\left|\langle yy^T\rangle_t\right|}\frac{\partial \left|\langle yy^T\rangle_t\right|}{\partial W} \qquad (3.10)$$

3.4 Update law derivation

First compute $\frac{\partial \langle y_k^2\rangle_t}{\partial w_{ij}}$. Since

$$y_k = \sum_l w_{kl}\, x_l \qquad (3.11)$$

it follows that

$$\frac{\partial y_k}{\partial w_{ij}} = \sum_l \delta_{ki}\,\delta_{lj}\, x_l = \delta_{ki}\, x_j$$

Consequently,

$$\frac{\partial \langle y_k^2\rangle_t}{\partial w_{ij}} = 2\,\delta_{ki}\,\langle y_k x_j\rangle_t$$

So,

$$\frac{1}{2}\sum_{k=1}^n \frac{1}{\langle y_k^2\rangle_t}\frac{\partial \langle y_k^2\rangle_t}{\partial w_{ij}} = \frac{\langle y_i x_j\rangle_t}{\langle y_i^2\rangle_t} = \left[\left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yx^T\rangle_t\right]_{ij} \qquad (3.12)$$

The other term is

$$\frac{\partial \left|\langle yy^T\rangle_t\right|}{\partial W} \qquad (3.13)$$

In [60], it was derived that for any square matrix $M$ and any matrix $X$ such that $XMX^T$ is nonsingular,

$$\frac{\partial \left|XMX^T\right|}{\partial X} = \left|XMX^T\right|\left[(XMX^T)^{-1}XM + (XMX^T)^{-T}XM^T\right] \qquad (3.14)$$

In addition, if $M$ is symmetric, equation (3.14) becomes

$$\frac{\partial \left|XMX^T\right|}{\partial X} = 2\left|XMX^T\right|(XMX^T)^{-1}XM \qquad (3.15)$$

Applying equation (3.15) to equation (3.13),

$$\frac{\partial \left|\langle yy^T\rangle_t\right|}{\partial W} = 2\left\langle \left|Wxx^TW^T\right|\left(Wxx^TW^T\right)^{-1}Wxx^T\right\rangle_t = 2\left\langle \left|yy^T\right|\left(yy^T\right)^{-1}yx^T\right\rangle_t \qquad (3.16)$$

Substituting equations (3.12) and (3.16) into (3.10), one obtains

$$\dot{W} = \eta\left[\frac{\left\langle \left|yy^T\right|\left(yy^T\right)^{-1}yx^T\right\rangle_t}{\left|\langle yy^T\rangle_t\right|} - \left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yx^T\rangle_t\right] \qquad (3.17)$$

If $W$ is a square nonsingular matrix, then (3.17) becomes

$$\dot{W} = \eta\left[W^{-T} - \left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yx^T\rangle_t\right] \qquad (3.18)$$

$$= \eta\left[I - \left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yy^T\rangle_t\right]W^{-T} \qquad (3.19)$$

$$= \eta\left[I - \left(\mathrm{diag}\, R_y\right)^{-1}R_y\right]W^{-T} \qquad (3.20)$$

At steady state, equation (3.19) gives

$$\dot{W} = 0 \;\Rightarrow\; I = \left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yy^T\rangle_t \;\Rightarrow\; \langle yy^T\rangle_t = \mathrm{diag}\,\langle yy^T\rangle_t \;\Rightarrow\; \langle y_i y_j\rangle_t = 0,\;\forall i \ne j$$

that is, the components of the signal vector $y(t)$ are decorrelated. Thus, the algorithm defined by equation (3.19) tests only for decorrelation.

In order to compute the weight update according to equation (3.19), the time average $\langle yy^T\rangle_t$ defined by equation (3.7) must be computed. To do so, a state matrix $Z(t)$ is defined by

$$\dot{Z}(t) = y(t)\,y(t)^T, \qquad Z(t_0) = 0 \qquad (3.21)$$

Thus,

$$Z(t) = \int_{t_0}^{t} y(\tau)\,y(\tau)^T d\tau = \langle yy^T\rangle_t$$

Therefore, equation (3.19) simplifies to

$$\dot{W} = \eta\left[I - \left(\mathrm{diag}\, Z\right)^{-1}Z\right]W^{-T} \qquad (3.22)$$

By running equations (3.21) and (3.22) simultaneously, the algorithm defined by (3.19) is implemented.

3.5 Computer Simulations

Computer simulations were performed to study the derived algorithm. Starting with two sine waveforms as the unknown sources and mixing them by a matrix $A$, the mixture signal vector is obtained. This signal is then fed to the feedforward neural network. If each output of the network is a constant multiple of one of the two unknown sources, then the algorithm has succeeded in separating the signals. Separation is also achieved if the forward loop gain matrix $P = WA$ is a generalized permutation matrix, where a generalized permutation (GP) matrix is one that has only one nonzero entry in each row and column.
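A minimal forward-Euler sketch of the coupled system (3.21)-(3.22) is given below; it is not taken from the dissertation, and the step size, learning rate, and run length are illustrative assumptions. Consistent with the discussion that follows, this decorrelation-only rule is not expected to drive the gain matrix to a generalized permutation matrix.

```python
import numpy as np

# Euler-discretized sketch of (3.21)-(3.22):
#   dZ/dt = y y^T,   dW/dt = eta [ I - (diag Z)^{-1} Z ] W^{-T}
dt, eta = 1e-3, 10.0                                     # illustrative values
t = np.arange(0.0, 1.0, dt)
s = np.vstack([np.sin(2*np.pi*1*t), np.sin(2*np.pi*2*t)])   # two sine sources
A = np.array([[1.0, 0.4], [0.6, 1.0]])                      # illustrative mixing matrix
x = A @ s                                                    # measured mixtures

W = np.array([[0.1352, 0.4553], [0.7832, 0.3495]])  # initial condition used in Figure 3.2
Z = np.zeros((2, 2))
for k in range(t.size):
    y = W @ x[:, k]
    Z += dt * np.outer(y, y)                          # equation (3.21)
    if k > 50:                                        # wait until diag(Z) is nonzero
        update = np.eye(2) - np.diag(1.0 / np.diag(Z)) @ Z
        W += dt * eta * update @ np.linalg.inv(W).T   # equation (3.22)

# (3.19) only enforces decorrelation, so W @ A need not be a GP matrix.
print("gain matrix W A:\n", W @ A)
```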
Numerous simulations for various initial conditions and learning rates were performed in order to study the algorithm as defined by (3.20). In all these simulations, the algorithm failed to separate the signals. An example of such a simulation is shown in Figure 3.2. The initial condition is

$$W_0 = \begin{bmatrix} 0.1352 & 0.4553 \\ 0.7832 & 0.3495 \end{bmatrix}$$

The network converged to

$$W_f = \begin{bmatrix} 0.2265 & 0.4187 \\ 0.8116 & 0.2758 \end{bmatrix}$$

which gives the overall gain matrix

$$W_f A = \begin{bmatrix} 0.1277 & 0.3293 \\ 0.9226 & -0.3700 \end{bmatrix}$$

Figure 3.2 shows that the performance index does not go to zero, implying that the gain matrix $WA$ is not a generalized permutation matrix. This can also be observed by looking at the plots of its entries and noting that none of them goes to zero. If the algorithm had converged to a separating matrix $W$, then two entries of the gain matrix would go to zero. However, the figure shows that all the entries converged to nonzero values.

Figure 3.2. Performance of the Algorithm defined by equation (3.19)

3.6 First Improvement

The algorithm defined in (3.19) failed to perform a separation of signals because it tested only for decorrelation of the output signals. In [8, 9, 10], Herault and Jutten also arrived initially at an algorithm that was similar to the one derived in (3.19). They claimed that independence would be satisfied if higher-order moments of the output were generated within the update rule. Therefore, odd nonlinear functions, which exhibit infinitely many orders of moments when expanded in their Taylor series, were introduced. Thus, by considering a similar approach [8], as discussed in Chapter 2, (3.19) becomes

$$\dot{W} = \eta\left[I - \left(\mathrm{diag}\,\langle f(y)\,g(y)^T\rangle_t\right)^{-1}\langle f(y)\,g(y)^T\rangle_t\right]W^{-T} \qquad (3.23)$$

At steady state, (3.23) gives

$$\dot{W} = 0 \;\Rightarrow\; I = \left(\mathrm{diag}\,\langle f(y)g(y)^T\rangle_t\right)^{-1}\langle f(y)g(y)^T\rangle_t \;\Rightarrow\; \langle f(y)g(y)^T\rangle_t = \mathrm{diag}\,\langle f(y)g(y)^T\rangle_t$$

$$\Rightarrow\; \langle f(y_i)\,g(y_j)\rangle_t = 0,\;\forall i \ne j \;\Rightarrow\; \Big\langle \sum_{l,k\;\mathrm{odd}} \alpha_l\,\beta_k\, y_i^l\, y_j^k \Big\rangle_t = 0,\;\forall i \ne j \;\Rightarrow\; \langle y_i^l\, y_j^k\rangle_t = 0,\;\forall i \ne j,\;\forall l,k\;\mathrm{odd}$$

where the coefficients $\alpha_l$ and $\beta_k$ are the Taylor series coefficients of the functions $f$ and $g$, respectively. This shows that the algorithm provides independent output signals as a solution to the problem, which is the desired goal.

3.6.1 Computer Simulations

To evaluate the weight update according to (3.23), the average $\langle f(y)\,g(y)^T\rangle_t$ must be computed. Thus, the state matrix defined by

$$\dot{Z}(t) = f(y(t))\,g(y(t))^T, \qquad Z(t_0) = 0 \qquad (3.24)$$

is introduced. Consequently, (3.23) becomes

$$\dot{W} = \eta\left[I - \left(\mathrm{diag}\, Z\right)^{-1}Z\right]W^{-T} \qquad (3.25)$$

By running (3.24) and (3.25) simultaneously, the algorithm defined by (3.23) is realized. Computer simulations for the algorithm defined by (3.23) were performed using different nonlinear functions; the different cases are considered below.

Case 1: $f(x) = x^3$ and $g(x) = x$

Numerous simulations were performed using different initial conditions and learning rates. The following set of simulations was obtained by using the same initial condition

$$W_0 = \begin{bmatrix} 0.4523 & 0.9317 \\ 0.8089 & 0.6516 \end{bmatrix}$$

but different learning rates. Figures 3.3 through 3.5 show the performance of algorithm (3.23) when the learning rate is $\eta(t) = 100$, $10$, and $100e^{-5t}$, respectively. In Figure 3.3, one can observe that the algorithm converges. However, it exhibits some oscillations and never settles to a constant level of zero, though it oscillates around it. This
Figure 3.4 shows the simulations under the same conditions when n = 10. It can be observed that a longer time is needed for the algorithm to converge. Therefore, a way of combining the observed behaviors in figures 3.3 and 3.4 is to consider a time-varying learning rate. Thus, when r] = 1006‘“, one observes a smooth convergence of the algorithm as shown in figure 3.5. Table 3.1 shows the performance of the algorithm for 20 different random initial conditions, but all having the same time-varying learning rate 17 = 100e'5‘. It can be observed that the algorithm converges in all cases. Case 2 f(x) = sinha: g(x) = tanh :1: Numerous simulations were performed using different initial conditions and learn- ing rates. The following set of simulations was obtained by using the same initial conditions W0 = 0.1352 0.4553 0.7832 0.3495 and variable learning rates. Figures 3.6 through 3.8 show the performance of algo- rithm (3.23) when the learning rate is y(t) = 100, 10, 1006‘5t Vt Observe that a learning rate 1] of 10 was too small to obtain any concluding results within the time span for training. When 17 is 100, one is able to observe that the network is converging to a generalized permutation matrix. However, it can be observed that the parameters exhibit some oscillations around a constant level. These oscillations were eliminated by considering a learning rate that decays with time. The appropriate function is 17(t) = 1006'“. A set of 20 simulations of the algorithm with different initial conditions was performed. Table 3.2 represents the performance of the algorithm in all these simulations. The initial conditions were chosen randomly. The 57 Simulation of eq4.m (eta,b) = (100.0) (l0,l) = (0.8499.0.03366) 1.— gos- $06- 04- '8 £02] 0 on 02 as o} 03 06 0} d8 d9 1 12» 15' ‘2‘ W a C C : : : : :— as» 2 g g 05 0,6W/U— ,5 2 WM 04 5 ° W a 8%» 03 1 .0fl) as 1 0.2981 0.9919 —0.0267 1.0163 W’ [08456 0.6035] WA‘[ 0.8377 0.0196 Figure 3.3. Performance of Algorithm defined by equation (3.23) for 17(t) = 100. f(x) = $3 and 9(3) = 1‘ 58 Simulation of eq4.m (eta,b) = (10.0) (l0.l) 2 (0849901006) A 5 Performance Index a o o o . l 1 V—_\ 1 5 < a. .5 C7": g 2 § 0.6 NV ‘3 o 5 ~ § 30.4 L/\ 2 0b 8 0’20 0:5 1 4'50 04.5 1 —0.0546 1.0416 -0.0267 1.0163 W = [ 0.8341 0.0246 ] WA = [ 0.8377 0.0196 Figure 3.4. Performance of Algorithm defined by equation (3.23) for 17(t) = 10. f(x) = $3 and y(t) = x 59 Simulation of eq4.m (eta,b) = (100,5) (I0,l) 2 (03499002937) d I I x 0.8 § g 0.6 - E 0.4 - E 8 0.2 L I j I I P x 1 l 00 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.2- 1.5» < § 1 PW :‘ 1 C75 5'085"’\"* 13 v 9 ' § 05 § 0.6 M7 % €04 W, § 0km — c 0.2 - - -0.5 . - 0 0.5 1 0 0.5 1 0.2919 0.9937 -0.0078 0.9992 W" [ 0.8490 0.5987] WA“ [ 0.8473 0.0056 Figure 3.5. Performance of Algorithm defined by equation (3.23) for y(t) = 1008‘“. f(x) = $3 and g(x) = x 60 Table 3.1. Simulation Results Algorithm defined by equation (3.23). f (:c) = 2:3 and 9(3) = 1' Simulation # Final Performance Index 1 0.1490 g 2 0.0125 3 0.0132 4 0.1692 5 0.0134 6 0.1036 7 0.0924 8 0.0157 9 0.0587 10 0.2035 11 0.0560 12 0.1302 13 0.0164 14 0.0266 15 0.7045 16 0.0377 17 0.0209 18 0.0244 19 0.0417 20 0.0526 61 Simulation of eq2.m (eta,b) = (100.0) (|0,l) = (0849902341) 1" 0| 1 in m Petformance Index 0 o in O 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 .° m _L g... E ' g 0.5 g 2 §0.4 g u 2‘ 3 ° 0.2 2 0 c9 0 ~ - —0.5 - - 0 0.5 1 0 0.5 1 0.1076 0.4627 -0.0395 0.4903 W — [ 0.6920 0.5060 J WA = [ 0.6838 0.0273 Figure 3.6. 
Figure 3.6. Performance of the Algorithm defined by equation (3.23) for $\eta(t) = 100$, $f(x) = \sinh x$ and $g(x) = \tanh x$

Figure 3.7. Performance of the Algorithm defined by equation (3.23) for $\eta(t) = 10$, $f(x) = \sinh x$ and $g(x) = \tanh x$

Figure 3.8. Performance of the Algorithm defined by equation (3.23) for $\eta(t) = 100e^{-5t}$, $f(x) = \sinh x$ and $g(x) = \tanh x$

Table 3.2. Simulation results for the algorithm defined by equation (3.23) with $f(x) = \sinh x$ and $g(x) = \tanh x$: final performance index for 20 random initial conditions.
Runs 1-10: 0.0066, 0.1168, 0.0517, 0.1163, 0.0543, 0.0462, 0.1836, 0.2338, 0.0062, 0.0192
Runs 11-20: 0.0084, 0.0620, 0.0347, 0.0410, 0.0038, 0.2772, 0.0165, 0.0180, 0.0510, 0.0863

The results show that convergence was obtained regardless of the initial conditions.

3.7 Second Improvement

In this section, the focus changes to an implementation viewpoint in order to modify the derived algorithm defined by (3.23). Recall that this equation is

$$\dot{W} = \eta\left[I - \left(\mathrm{diag}\,\langle f(y)\,g(y)^T\rangle_t\right)^{-1}\langle f(y)\,g(y)^T\rangle_t\right]W^{-T}$$
Simulations that have a final performance index of more than n/4, in this case 0.5 since 11 = 2, are considered as failures. These simulations are labeled by an asterisk‘ in Table 3.3. Case 2: f(x) = sinhx and g(x) = tanha: Numerous simulations were performed using different initial conditions and learn- ing rates. The following set of simulations was obtained by using the same initial conditions used to test the previous algorithm in order to provide some comparison between the two. The initial weight is 0.0475 0.3282 0.7361 0.6326 67 Simulation of eq18.m (eta,b) = (100.0) (l0,l) = (08406006222) coop out II Performance Index 0 i0 i- L. o o d o to o (a) o h .0 (n o a: o N o m o co cub 1- 0.8 < §0.8W E 0.6: C 30.6 M‘ I § 0.4L § 8 50.4, g 0.2 a: 00 0:5 1 '0‘20 05 1 0.2276 0.7785 -0.0076 0.7838 W" [0.7458 0.5348] WA"[ 0.7410 0.0161] Figure 3.9. Performance of Algorithm defined by equation (3.27) for 17(t) = 100. f(x) = $3 and g(x) = 3 68 Simulation of eq18.m (eta,b) = (10,0) (I0,l) = (0840602191) cub l Perlormance Index o o o o is o I in j 0° )— F 1» 0.0» < §0.BQ=\ #_ 3 0.6’ § 3 g0.6\/\ E 0.4. §04 E 02 5' \/\’ 8 '1 a 0.21 s 0» N 3 o 0 a - -0.2 1 - o 0.5 1 0 0.5 1 0.1960 0.7498 —0.0366 0.7754 W‘[0.7373 0.4790]WA"[ 0.7514 —0.0470 Figure 3.10. Performance of Algorithm defined by equation (3.27) for 17(t) = 10. f(x) = $3 and 9(9) = z 69 Simulation of eq1 8.m (eta,b) a (100,5) (I0,l) = (08406001327) 99.0 hard: Performance Index 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 ’ 0.87 < 0.810 3 0.6» g» 1% 00.6V\/\ 2 0.4: E... 5 .2 . .é . 0.2 W § 0% <9 0 - -0.2 2 1 0 1 0 0.5 1 0.2357 0.7910 -0.0020 0.7924 W: [0.7341 0.5163]WA'[ 0.7332 0.0031] Figure 3. 31.1 Performance of Algorithm defined by equation (3. 27) for 17(13): 10. f(x) - $3 and y(t) - 3 70 Table 3.3. Simulation Results Algorithm defined by equation (3.27). f (2:) = 2:3 and y(t) = z Simulation # Final Performance Index 1 0.0211 2 0.0733 3 0.3574 4 1.5079“ 5 0.0138 6 0.0298 7 0.1022 8 0.2058 9 1.0605“ 10 0.0478 11 0.0578 12 0.0228 13 0.0544 14 1.3249“ 15 1.4127" 16 0.0254 17 0.0332 18 0.0636 19 0.0415 20 0.0467 71 while the learning rates varied. Figures 3.12 through 3.14 show the performance of algorithm (3.27) when the learning rates are 17(t) = 100, 10, 1006'“ Vt, respectively. It can also be noted that, in the set of simulations, a time-varying learning rate 17 = 1006'“ Vt, had to be introduced in order to eliminate oscillations, as observed in Figure 3.12, when the learning rate is of a constant value equal 100. Figure 3.14 shows the results for such a time-varying learning rate. On the other hand, when the eta = 10, such a learning is to too small to enable the network to arrive at the desired solutions. Simulation of eq16.m (eta,b) = (100,0) (l0,l) n: (0878601088) g 0.8 - 20.6 - E 0.4 8 0.2 00 1 » 0.8 k §0.s W i; 0.6» 30.6: E 0.4» — 0 g 0.2» g, 0 W4: 00 0:5 1 ”0'20 05 1 W= [313233 322332] WA= [ 33333” 3.313%] Figure 3.12. Performance of Algorithm defined by equation (3.27) for y(t) = 100. f(x) = sinhz and y(t) = tanha: 72 Simulation of eq16.m (eta,b) = (10.0) (l0,l) :1: (08786008517) Performance Index 0 0 rs 01 .0 O N r— P )— 1 ' 0.8- < 3 . $0.8 W .E 0.6 16 o 2 0.4* S 0.6 '3 .5 0.2 a \/ 0.4 l\ g 0 v V /o:5—_\ 0.20 - -0.2 - - 1 0 0.5 1 0.2183 0.7657 —0.0144 0.7757 W: [ 0. 7540 0. 5420 1 WA " [ 0.7485 0.0181 1 Figure 3.13. Performance of Algorithm defined by equation (3.27) for y(t) = f(x) = sinhzr: and 9(2) = tanhz Table 3.4 shows the results of twenty simulations of the algorithm defined by (3.27) with different initial conditions. 
It should be observed that four simulations out of these twenty did not succeed in separating the signals. They are marked with a star in Table 3.4. Table 3.5 shows the results of computer simulations between algorithms defined by (3.23) and (3.27). One may conclude that both have similar performance. 73 Simulation of eq16.m (eta,b) .-. (100,5) (I0,I) = (08786006408) Performance Index .0 .° .0 a a: 00 0.2 00 0:1 0:2 0:3 0:4 0:5 04.6 0:7 0:8 0:9 1 0.8 ’ 0.8 r g E 0.6 ’ 0.6 ' 5 \ 5 g z 0.4: ‘3' 0 4» '3 W’ % 0.2» $02» g o W¥ (5 0 K A - -0.2 - » 0 0.5 1 0 0.5 1 0.2402 0.7836 0.0065 0.7791 W " [0.7457 0.5223] WA ‘ ] 0.7456 0.0004] Figure 3.14. Performance of Algorithm defined by equation (3.27) for 17(t) = 1006”“. f(x) = sinha: and g(x) = tanha: 74 Table 3.4. Simulation Results Algorithm defined by equation (3.27). f (0:) = sinhx and 9(2) = tanh :1: Simulation # Final Performance Index 1 0.2664 2 0.2652 3 0.0806 4 0.8735* 5 0.0906 6 0.0832 7 0.1283 8 0.7715"r 9 0.1956 10 0.2490 11 0.1415 12 0.0141 13 1.7091* 14 1.0756* 15 0.0776 16 0.3482 17 0.1142 18 0.0558 19 0.3650 20 0.0484 Table 3.5. Results of Simulation Comparison between Algorithms defined by equa- tions (3.23) and (3.27) 75 Initial Conditions Final Results Of (3.23) Final Results Of (3.27) __ W0 f WA WA “'1' 0.9347 0.5194 0.8805 —0.0079 0.9906 0.0055 _ 0.3835 0.8310 ] 0.0082 0.8692 —0.0127 0.7380 If = 0.0366 If = 0.0433 0.0475 0.3282 -0.0054 0.3223 0.0229 0.3824 _ 0.7361 0.6326 . 0.8063 —0.0169 _ 0.7008 0.0089 . 7 If = 0.0966 If = 0.1283 0.1351 0.4553 —0.0091 0.4634 0.0069 0.4423 _ 0.7832 0.3495 . 0.6913 0.0163 . 1 0.8857 0.0091 J 7 If = 0.0916 I; = 0.0545 0.1351 0.4553 —0.0091 0.4634 0.0069 0.4423 [0.7832 0.3495 ] 0.6913 0.0163 . 0.8857 0.0091 2 I, = 0.0916 If = 0.0545 3.8 Observation and Remarks In this chapter, an algorithm based on the decorrelation condition between the com- ponents of output of the signal vector was developed. The resulting algorithm was improved in order to test for the independence between the components of the out- put vector. It was shown that the algorithm defined by the first modification always converged to a separating network regardless of the initial conditions. The algorithm of the second modification is more attractive since it eliminates the computation of the inverse of a matrix. However, such elimination resulted in an algorithm that has a 20% chance of failure rate. CHAPTER 4 Higher Order Statistics It was proved in the literature [8, 23, 61] that second order statistics are not suf- ficient to solve the problem of the blind separation of sources. One will also recall that the static algorithm, developed in Chapter. 3, was initially based on the second order statistics, since the correlation defined such characteristics as the statistical description of a signal. Some techniques widely used in the literature were used to improve the algorithm performance to test for higher order statistics by generating infinitely many orders of moments through the injection of odd nonlinear functions f and g in the algorithm. It will be shown how these functions can be determined by defining an independence criterion that defines the problem. It was proved in the literature, that cumulants up to the fourth order are sufficient to approximate the probability density functions of a random variable using the fourth order Edgeworth approximation [62]. 
Consequently, higher order statistics have enabled several researchers to analyze the problem to some extent: Amari et al. [24, 63, 64, 65], Comon et al. [23, 66, 22], and Tong et al. [29, 28]. One should keep in mind, however, that these algorithms were developed for a static environment or for discrete dynamic Finite Impulse Response (FIR) models; none of these approaches addressed a general dynamical environment as it will be defined in this work. Such a dynamic system has memory represented by its dynamical states. Through the framework of optimal control theory, or the calculus of variations, and through the use of higher order statistics, this research will focus on developing new algorithms.

In this chapter, an independence criterion will be developed. It will be used to derive update laws for the parameters of static feedforward and feedback network models. The developed independence criterion will also be used to derive update laws for the parameters of dynamic feedforward and feedback network models in the subsequent two chapters. To introduce higher order statistics, one must first give the definition of statistical independence.

Definition 3 (Statistical Independence) The components of a random signal vector $y$ are statistically independent if and only if their joint probability density function (pdf) $p_y(y)$ is the product of all individual marginal probability density functions $p_{y_i}(y_i)$. Symbolically,

$$p_y(y) = \prod_i p_{y_i}(y_i) \qquad (4.1)$$

4.1 Mutual information

One way to measure the independence of the components of a random vector is to measure the distance between the right- and left-hand sides of equation (4.1). When such a distance is zero, the components of the random vector are statistically independent. A well-known distance in the literature is the Kullback divergence functional [58]

$$I(y) = \int f_y(u)\,\ln\frac{f_y(u)}{\prod_i f_{y_i}(u_i)}\, du \qquad (4.2)$$

where $f_y(y)$ is the pdf of the random vector $y$. The functional $I(y)$ is always positive and is zero if the components of the random vector $y$ are statistically independent. It defines the level of dependence between the components of the signal; therefore, it represents a good functional for characterizing statistical independence. $I(y)$ can be expressed in terms of entropy as defined in Appendix A:

$$I(y) = -H(y) + \sum_i H(y_i) \qquad (4.3)$$

Some properties of entropy are presented in Appendix A.

Consider the vector $y \in \mathcal{R}^m$ to be the output of a system described by the linear mapping $W = [W_1\; W_2]$ due to the input signal vector $x \in \mathcal{R}^n$, as shown in Figure 4.1.

Figure 4.1. Feedforward Architecture

Define the vector

$$z = [y_1 \cdots y_m,\; x_{m+1} \cdots x_n]^T \qquad (4.4)$$

Then,

$$z = W x \qquad (4.5)$$

where

$$W = \begin{bmatrix} W_1 & W_2 \\ 0 & I_{n-m} \end{bmatrix} \qquad (4.6)$$

$W$ is a nonsingular square matrix under the assumption that $W_1$ is also nonsingular. Using equation (4.5), a relationship between the probability density functions of the random vectors $z$ and $x$ can be obtained:

$$f_z = \frac{f_x}{|W|} \qquad (4.7)$$

Since $z = [y_1\cdots y_m,\, x_{m+1}\cdots x_n]$, one can express the pdf of $y$ in terms of that of $\bar{x} = [x_1 \cdots x_m]$:

$$f_y = \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} f_z\; dx_{m+1}\cdots dx_n \qquad (4.8)$$

$$= \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \frac{f_x}{|W|}\; dx_{m+1}\cdots dx_n \qquad (4.9)$$

$$= \frac{f_{\bar{x}}}{|W|} \qquad (4.10)$$

Finally, using the entropy definition and the equation above,

$$H(y) = -E[\ln f_y] = \ln|W| + H(\bar{x})$$

Therefore, equation (4.3) becomes

$$I(y) = -H(\bar{x}) - \ln|W| + \sum_i H(y_i) \qquad (4.11)$$

In equation (4.11), one may compute the first two terms of the expression.
However, the summation term is unknown since we have no knowledge of the probality densities f2.- H(y.-) = -/f2.-1nf2.dya (4.12) Thus, one must approximate the probability density f,“ and also its logarithmic ln f,“ in order to obtain some approximate expression of the entropy as defined by equation (4.12). To compute such an approximation, one would first need to introduce moments 80 and cumulants. Then, an Edgeworth expansion will be used to approximate the probality density f,“ and also its logarithmic ln f0:- 4.1.1 Cumulants and Moments Given a random variable x, the nth order moment is defined as Pa = Elma] Moments are also often defined as the formal power series expansion of the moment generation function defined as x k I: 1: M(fi) = E[efi”] = E]; (file!) ] = ; éTEl-Tkl = 225%" The cumulant generation function is defined as I: Km) = 1114409) = lnElei’l = g "*5 where n), is the cumulant of the random variable 2:. One can express the moments in terms of the cumulants, and vice versa, by solving the equation 2‘12?— Table 4.1 shows the relationship between them for the first few orders of moments and cumulants. Cumulants proved to be computationally efficient compared to moments despite the fact that the two quantities are equivalent. Some of the most important features of cumulants , for statistically independent signals [67], are e the cumulant of the sum is the sum of the cumulants, 81 Table 4.1. Conversion of moments and cumulants Cumulants in terms of moments ] Moments in terms of cumulants '31 = #1 #1 = I‘31 ”2:”2—f‘1 #2=K2+n§ I‘33 = fl3-3fl1#2+2I-‘i P3 = 53+351K2+I€¥ 64 = #4 - 4111113 - 314% + 12112;“? - 61“} #4 = 64 + 46163 + 343 + 6626? + 51’ e ~the cross cumulants are zero, 0 the Edgeworth expansion is most conveniently expressed in terms of cumulants, and 0 most pdfs of practical signals can be approximated by a finite number of cumu- lants. 4.1.2 Edgeworth Expansion The Edgeworth expansion of a given distribution density function f (2') is formally defined as a density function having cumulants x..- that are constructed from a modi- fication of a baseline density function fo(:r:) having cumulants u; [67], f(x) = 5(4) f 0011—7; (413) k=0 where hk(:r) defines a family of orthogonal functions known as the Hermite polynomial functions as seen below fclklfit) — — k— hlc(x) "' ( 1) f0(3) (4'14) 82 Here the 14’s are the pseudo-moment that satisfy the equation 00 flk oo . 1: 2(1):,- — 11,-)-k—' = ln 2 ”VIC—l (4.15) k=0 ' k=0 ' The choice of the baseline depends on the the statistical properties of the variable that is to be approximated. However, if no such information is available, it is most convenient to use the normal density function 1 22 f0($) 0(1‘ ) me ( ) as a baseline. When fo(a:) = 0(3) and the equation (4.13) is truncated at some finite number of terms, the resulting density function approximation is called a Gram- Charlier series [68]. Expansions based on other than normal distributions are rare. There are, however, some that are based on the x2 distribution density function [62]. Normal Baseline Expansion It is intended here to determine an expression for the Edgeworth expansion when the baseline is the normal distribution 0(3) as defined by (4.16). The cumulants of 0(3) can be extracted from its cumulant generating function 32 I: In E[expfl:c] = —2— = Erik-k7 k . Therefore, all the cumulants of 0(3) are zero except V2 = 1. 
Thus, in this case, the first two pseudo-moments p; are zero and the rest are the unmodified corresponding cumulants of the distribution density function f (2:), namely 10),. Consequently, the Edgeworth expansion of a density function with respect to a normal baseline is f(x) = 5(2) [1 + $5202) + 535(5) + - - -] (4.17) 83 Table 4.2. Hermite Polynomials derived from a normal baseline density function 0(2) h1(a:) = a: h2($) = $2 '— 1 h3(:r:) = 9:3 — 3x h4(:r) = :1:4 — 6.732 + 3 Equation (4.14) defines the Hermite polynomials. In the case of a normal baseline, they are easy to compute. Table 4.2 presents few orders of these polynomials. 4.1.3 Entropy Approximation The tools to approximate the entropies are now developed. For ease of notation, a random variable .7: having a density function f (2:) will be considered. Its fourth order Gram-Charlier approximation will be as follows: it 1: f(x) 2. 2(2) [1 + 33-1120») + fibre] = 00110) Therefore, H(x) = — [101010) 22 z —/f(:r) ln 0(2) dz — /0(x)p(a:)lnp(:c) d2: However, -ln0(:c) = --ln (4.18) (4.19) (4.20) 84 Therefore, —/f(2:)ln0(:r) d2: = éln27r/f(2:) dr+%/xzf(x) dz: 1 '2 ElDZW-Fég- In order to compute the second term in equation (4.19), one must approximate the logarithmic function as 2 $3 111(1+5) = :r- 52+ 3+0(x2) (4.21) Therefore, 122(2) = 0141—11-10:: 1—112+§1p(x)-113 = §!1’c-%-h3+!—h4- '1'[3—!-h 3 '1'?!1 4]2 LCM—ha +41’1“] 2 2 K K ’03 K4 3 __h2_ 4 __h2_ —h3 h4 2 3123 2 4!2 314! 3 2 3 ”4 3+ ”3‘4 2 “3‘4 2 3. ”313,13 + 3. 413h4+ 31271"3 I“ +— 31412 —h3h4 +33! = §—!h33:+-!’ —h4— + Consequently, 11411111114) =1w. o m—_. -: ..... I ' \ \ 3, ' —2o ~""-<' : \ \ I I .40 Graph (a) \\ I Graph (b) ‘ \ ” —60 Graph (c) \ , Graph (d) -80 -1” 1 L 1 I 0.5 1 1 .5 2 -2 Figure 4.2. Nonlinear function. Graphs (a)-(d) show the plots of the functions fa(y) through fd(y) 0.3 r 0.2 - .- 0.1 ~ -2; —— Log ._._._. 1atordor 0* >~-3 - - - - 2nd order >~ ........... 3rd ordar .0.‘ '- -4 -o.2 ~ -5 " _6' _ —O.3 - -7 L L A j -0. l j_ A I -1 -o.5 o 0.5 1 $.46 —o.2 o 0.2 0.4 X X Figure 4.3. Natural Logarithmic function and few orders of approximations 92 The initial weights are also randomly chosen and are defined by the matrix 0.5277 0.4755 0 = 0.7625 —0.5303 Thus, the initial overall gain matrix is 0.3356 0.5572 G0 = W014 = -0.0647 —0.1067 Clearly, such a matrix is not a generalized permutation matrix. Thus, it is desired that the network would update its parameters such that the gain matrix WfA, where W, is the final weight matrix, is a generalized permutation matrix. Using the algorithm defined by equation (4.45) and the nonlinear function fa defined by equation (4.43) and considering a learning rate 17 = 0.005, the network converges to —0.0703 0.7625 1.8137 -—-0.5303 Thus, 0.0013 0.5535 W,A.-. 0.7729 0.0049 is a generalized matrix. Consequently, the algorithm converges to a separating solu- tion. Figure 4.4 shows the results of such a simulations. Using the same mixing matrix A and initial weight matrix W0, and considering a learning rate r) = 0.1, the algorithm defined by equation (4.45) and the nonlinear 93 3rd odor, last step approx. (‘), random and sine input 2.5 x '8 2 E 8 1.5 5 E 1 § (L 0.5 0 M.An..._1““ .... . . . . . .. O 10 20 3O 40 5O 60 70 80 90 2 ' 1 . < ‘9 1.5' 3 § 2 / 8’ 1’ 23' g 05 2 0.5 1: 'o 8 .§ 9 ° 9 c 3-05 . 8 O -1 O 20 40 60 80 0 20 4O 60 80 Figure 4.4. 
Computer Simulation of Feedforward Structure Using fa 94 function fd defined by equation (4.47) converges to the final weight matrix —1.4506 3.3833 W, = 2.2009 —0.2815 Thus, final overall gain matrix 0.0127 1.2280 G f = WfA = 1.6472 —0.0006 is also a generalized permutation matrix. Figure 4.5 shows the results of the sim- ulations. These two simulations are a sample of numerous simulations that were conducted to analyze the performance of the algorithms for various initial conditions. In all cases, the algorithm always converges to a desired solution. 4.4 Static Case: Feedback Network Structure Recall that the mutual information is the algebraic sum of the joint entropy and the sum of all marginal entropies, as described in equation (4.3). Because of the opposite signs that appear next to each of these two terms, one can deduce that the minimization of the mutual information is somehow, but not exactly, equivalent to the minimization of the sum of marginal entropies and/or the maximization of the joint entropy. In the work of Bell and Sejnowski [45], the authors approached the problem by maximizing the joint entropy in order to develop an update law for the network. On the other hand, I propose an update law based on the minimization of the sum of marginal entropies. Thus, the theorem below is stated. Theorem 2 Under the assumption that qm, = 0 Vm aé i, the update law (£3 = nfa(9.-)yj (4.49) 95 2nd oder, first step approx. (‘), random and sine input 1" 0| l '0 Performance Index _. 01 .° 01 c,4: 8 8r 8 8 8 5‘ 8 8 Generalized Matrix WA 2O 4O 60 80 Figure 4.5. Computer Simulation of Feedforward Structure Using fd 96 is derived by minimizing the functional q) = Z H(yi) (4.50) Proof: The parameters of the network are updated using the gradient descent method d.” = _0— 4.51 J adij ( ) However, using equation (2.23), the gradient of the function is expressed as 695 _ 695 09... 6d,,- “ 5,; 6y... 8d,,- (4'52) 34> _ -— ; Eqmiy] (4.53) By considering the assumption that qm, = 0 Vm 51$ i, then 04> 343 adij — -9" ayi y] (454) = -q.-.-fa(ye)yj (4'55) Therefore , the parameters of the D matrix are updated according to 4.3 = flfa(yr)yj (4.56) The algorithm described by equation (4.56) provides a justification of specialized view of the algorithm developed by Herault and Jutten based on neuromimetic ap- proach [8], since Equation (4.56) is analogous to equation (2.6) where f (1:) = fa(:c) and y(t) = :c. 97 4.4.1 Computer Simulations Using the feedback structure as defined by Figure 2.1, the performance of the al- gorithm defined by the nonlinear function fa, as described by equation (4.43), will be compared to that defined by the nonlinear function fd which is developed in [24] and is described by equation (4.47). Using this structure, the update law for the parameters is as follows: 6..- = 4.7.09.9.- z'aé 73a = a.d (4.57) Computer simulations using 2 sine functions of respective frequencies 1H 2 and 2H 2 are first conducted. The mixing matrix is 1.0 0.4 A = (4.58) 0.6 1.0 The initial weight matrix is zero and the learning rates are the, following 1]“ = 0.005 17.1 = 0.1 (4.59) Figures 4.6 and 4.7 show the performance of both algorithms. One can observe that the algorithm defined by fa succeeded in separating the original sources, whereas the algorithm defined by fd fails to perform separation. Computer simulations are also performed using a Gaussian random signal and a sine function of frequency 2H 2. Also, other simulations using a Gaussian random signal and a square function are completed. 
In each case, the algorithm defined by for fails to separate the signals. On the other hand, the algorithm defined by fa performs the separation task successfully. Figures 4.8and 4.9 show the performance of both algorithms in the case of a Gaussian random signal and sine function were considered as the unknown sources. Figure 4.10 shows the original sources, the mixtures and the 98 New Algorithm e . n ................................................................................................. - ................................................................................................. . o c - Performance Index ..................................................................................................... 50 60 7O 80 90 1 00 0.8 * 1 .5 . 0.6 g 1 w E 2 ’5 0.4 ' E 0.5 3 E ‘c’ 0.2 8 O 4 O ‘ ‘ -O.5 * ‘ 0 50 100 O 50 100 0.0000 0.4001 ] W _ 1.0000 —0.0001 f“ 0.6001 0.0000 -1 _. (1 “W A" [ —0.0001 1.0001 Figure 4.6. Performance of the Algorithm defined by the nonlinear function f,, 99 AmariAlgorithm Eternun. .............................................................................................. E o IL1.6- L 1 1 l l I l 4 l I O 10 20 30 4O 50 60 70 80 90 100 0.6 ------------------- 12 0.4 -------------------- g 1 g) $0.8» :5 0.2r _g 3 gas :2 o (90.4]. -O.2 ‘ ‘ 0.2 ‘ ‘ O 50 100 O 50 100 _ 0.0000 0.0222 _, _ 0.9916 0.3797 W" [0.2229 00000] (”W’) A“ [0.3789 0.9154 Figure 4.7. Performance of the Algorithm defined by the nonlinear function fd 100 output of the network at convergence. New Algorithm Performance Index 0.8' 1.5: Generalized Matrix 9 of i 010203040 010203040 _ 0.0000 0.3940 _, _ 1.0002 0.0078 WI " [0.6003 0.0000] (I +Wf) A‘ [ —0.0004 0.9953] Figure 4.8. Performance of the Algorithm defined by the nonlinear function fa using random Gaussian and a sine function 101 Amari Algorithm Performance Index 0.6 ----------------------- _x 1 0.4 ----------------------- g t "’ 2 % 0.2 E 3 E o 5 K 8 -O.2 - 4 O _ - . O 1 O 20 O 1 O 20 30 _ 0.0000 0.2250 _, _ 0.4949 0.1591 W” [0.3199 0.0000] (MW!) A" [0.2323 0.4908] Figure 4.9. Performance of the Algorithm defined by the nonlinear function f4 using a random Gaussian and a sine function 102 5' 1 '5 0 ‘3‘) 0 -5 - . _1 . . 50 50.5 51 50 50.5 51 30 301 - - '1 1-201 5 N20 re 7 ID 72' E 310’ 2.0m [fin :10 ill” Illll 0 7 . n 0 7 7 -4 -2‘ O 2 4 -1 -O.5 0 05 1 5 5. ‘ ° 3 °W)WW -5 g - _5 . . 50 50.5 51 50 50.5 51 5 2. 3'. 0 g. 0 "550 505 51 "250 505 51 x10‘4 ' x10“ ' 2' 5. 5 ° 2 I I O ;_21 g. _4 A . _5 . . 50 50.5 51 50 50.5 51 Figure 4.10. Output of the Algorithm defined by the nonlinear function fa using random Gaussian and a sine function ' CHAPTER 5 Blind Separation in a Dynamic Environment: FeedForward Structure 5.1 Problem Definition In Chapter '3, static modeling for both the environment and the network was con- sidered. However, media cannot always be modeled as such. Therefore, one must consider more realistic environments, define their models and develop an update law to recover the original signals. Such an environment will be modeled as a linear dynamical system. Consequently, the network will also be modeled as a linear dy- namical system. Figure 5.1 depicts the architecture of this situation. The realiza- A M v e(t) i=Ax+Be y y=Cx+De 1? s(t) bl ml 0' 3M G NI. XI )0 + + Figure 5.1. Feedforward Architecture 103 HG li: 104 tion 5 = (A, B, C', D) represents the parameters of the environment, whereas D = (A, B, C, D) is that of the network. These parameters dictate the dynamics of both the environment and the network systems. 
While the realization T) is constant and models the behavior of the environment, the realization D of the network are yet to be modeled or defined. The goal here is to find an adaptive law (algorithm), in such a way, that when these parameters converge to some stable solution, the network output signals are replicas or similar to the original unknown sources. Or, by using the definition of wave-preserving, it is desired that the output vector preserves the waveforms of the components of the original signals. So, the question at hand is how should one update these parameters to arrive at such values? and, do such values exist in the first place? In this chapter, first, the existence of a theoretical solution to the problem will be shown. Then, Optimization Theory will be utilized to develop an adaptive rule based on a criterion that defines the independence of the output signals. A state representation of the network model that renders its implementation in VLSI attractive, and which also minimizes the number of parameters that characterize the network, will be deve10ped. The algorithm will be tested via computer simulations. Limitations and some possible improvements of the algorithm will also be discussed. 5-2 Existence of a Theoretical Solution Suppose that an update necessary to recover the original signals is developed and that the parameters of the neural networks have converged to the realization D“ = (A', B’, C”, D'). Then, because of the wave preserving property, there exists a Permutation matrix P and a. positive definite diagonal matrix I‘ such that flfi=PFfifi Co 105 There also exists a nonsingular matrix T such that x = Tic. Thus, differentiating both sides with respect to time: szx => A’x+B'e=TAi+TBs => A“): + B'(Cx + D5) = TAT-1x + TBs => A‘x + B’Ds = (T/iT"l — B‘CT‘1)x + TBS (5.1) Also, y: PI‘s => PI‘s = C'X-l-D'e => PI‘s = C‘x + D'(C’x + Ds) => PFs = (C‘ + D‘C‘T’l)x + D‘Ds (5.2) Therefore, A" = TAT"1 — B"C-'T'1 (5.3) B‘D = TB (5.4) C“ + D"C'T‘1 = 0 (5.5) D'D = I (5.6) Consequently, A‘ = T (A + BD“C’) :r-1 (5.7) B' = TED" (5.8) C" = --P1‘1')-1(7:I‘-l (5.9) D‘ = PI‘D“ (5.10) Pa: 31* Th 3?; the In! 36.0 THU 106 Particular Case Suppose that T = PI‘ = I where I is the identity square matrix, then A" = A—BD-‘C‘ (5.11) B“ = BI)“ (5.12) C" = D-IC' (5.13) D‘ = f)‘1 (5.14) Thus, it is shown that the theoretical solution to the problem exists, and that if the appropriate energy function of the independence criterion is defined for the problem, then the parameters of the system will converge to one of the solutions defined by equations (5.7)-(5.10). However, before going into defining such energy functions, possible realizations of the system will be discussed; 5.3 State Realization In this section, different network structures and their suitability to the problem of the separation of signals will be introduced. The network is assumed to be a multi-input multi-output system described by Y(s) = H(3)U(s) (5.15) H (s) is the m x r system’s transfer matrix where r and m are, respectively, the number of input and output signals. The goal is to find a dynamical system D = (A, B, C, D) whose transfer function is the m x r matrix H (s) where H(s) = C(sI — A)“B + D (5.16) 107 However, before starting the development of the structure, the following assumptions on the transfer function matrix H (s) will be considered. 
(A1) H (s) is stable and minimum phase (A2) Each entry of the matrix H (s) is in its irreducible form (A3) entries of H (s) have no common poles The rationale behind these assumptions will be explained in the development of the structure. 5.3.1 Controllable / Observable Canonical Form The literature provides one with various realizations of H (s), [59]. Of these, a canoni- cal controllable realization will be considered. The procedure to develop the proposed structure is simple and straight forward. A main feature of the structure is that there is no direct coupling between the input signals. This realization has very nice features, in terms of its perfect fitness to the problem of blind separation, and also in terms of circuit implementation. It presents the least number of parameters of the state representation of the transfer matrix H (3). Such features will enable one to define simple update laws for the parameters and to analyze their performance based on the independence criterion of the output signals. The structure generates a bank of filters that are jointly decoupled. The analysis provided below will display such char- acteristics of the developed realization. Some figures are also presented to illustrate this point. Each output is described as K-(s) = imam-(s) = inks) (5.17) j=1 '=l 108 According to assumption (A2), it is assumed that each H,,-(s) is irreducible. Thus, we choose to represent H;,-(s) by the canonical controllable realization D;,- = (Aiji bija Cg, dij)- A.,eR"i:‘""-‘1 1,0512%: . . -_ A 1 . . ’7": 55,1.) 0 1 o o 1 53,1.) 0 5S? 0 o 1 o 55,?) o = + u, 53‘0") o o o o 1 mgr-1) 0 £351) _agfij) _as;) J $355) 1 y.- = as? as? ass-1) cw + a... c§j62“‘i where Org-c) are the coefficients of the characteristic polynomial of the matrix Ag, defined by det (AI — A”) = A“? + adj-EV“?1 + - - - + erg-“VIM + aff") (5.18) The sub-realization D5,- : (Aij, bij, c5, dij) provides us with a set of different filtered versions of the 1"” input 11,- that affect only the 2'“ output 3],. Figure 5.2(a) presents the diagram of the state representation the sub-realization ’ng, whereas Figure 5.2(b) shows the combination of the state and the output representation of the sub-realization D5, = (A;,, b;,-, c3}, dij). The combination of these realizations 13.3, j = 1, - - - ,r, will provide us with the 2'“ output 3],. Figure 5.2(c) shows such a structure. But, let’s proceed to see how it is obtained. T To do so, one must define the vector x; = [ x3; . . . x5 ] to be a vector in ’R’“ 109 where p,- = 237:1 11.5. Then, A.eR:axua 3,53..." 51,-, A51 0 0 1 - bu 0 0 O 0 x.J = 0 A” 0 Xi + 0 b.J 0 l1 0 0 1 Sq, ‘ L 0 0 Ag,- . L 0 0 b.-,. y,- = [cg CiTj cgJX, + [(1.1 db. £11,)“ CaG'R‘“ dgE‘R' Therefore, 5:. = A;x.-+Bgu (5.19) y.- = ciTx,+d,7'u . (5.20) An important feature of this realization is the inherent parallel structure that it possesses, as shown in Figure 5.2(c). This structure clearly decouples the inputs from each other. This will be used as an advantage when an update law that pushes for the output independence is to be developed. At the final stage, all the filters will be combined and the structure below will T be developed. Thus, the vector x = [ x? . . . x; 1 in R” is defined, where n = V1 llOI 110 22;, [15. Then, AERnx" BERnx' , r. n . . ’7‘“. *1 A] ‘ 0 0 X1 Bl - 0 X. = 0 ' Ag‘ 0 X. + B, u - 0 . . . 1 1. - - 1 P a - - y] c? . . . O . . . 0 x1 df 0 y. = 0 Cz‘r 0 X; ‘1' d? 11 0 y... 0 0 c; x... 
d: ‘ v a) b b - Cekmxn De‘Rmxr Thus, 51 = Ax + Bu (5.21) y = Cx + Du (5.22) where A, B, C and D are defined above. The realization D = (A, B, C, D) is called the controllable canonical realization of H (s). In [59], it is discussed that D};- = (A3}, b5, cijT, (1.3) is called the observable canon- ical realization of H;j(s). Therefore, by following the same procedure, as discussed above, to develop a controllable canonical realization of H (.3), one can demonstrate that DT = (AT, BT, CT, D7) is an observable canonical realization of H (3). By assumption (A2) and (A3), this structure is irreducible. If these two assump- tions are not verified, then a more compact realization can be found [59]. But, for the 111 sake of generality, it is safe to have such assumptions. Assumption (A1) is introduced because, in real life applications and in computer implementation, such a feature is needed in order to guarantee stability. 5.3.2 Features of the realization In the course of development of the realization, some features of the proposed realiza- tion have been mentioned. These include the decoupling between inputs, the inherent parallel structure of the realization, and of most importance, the reduced number of parameters by which the system can be defined. This last feature is very impor- tant in terms of circuit implementation because, it implies reduction in the circuit micro-electronic implemtation area. Starting with a transfer matrix whose entries have no common poles, the state representation of such a transfer matrix would require as many poles as the transfer matrix has. This number is equal to the size of the state vector x, namely n. Despite the fact that the matrix A is an n x 12 matrix, it is being represented by only 12 = 2,},- p;,-, since each matrix A;,- is represented only by the coefficients of is characteristic polynomial, and the remaining entries are all ones and zeros. This represents a reduction from n2 to just n parameters to represent the matrix A. The matrix C is an m x n matrix. However, only 11 parameters of the matrix are nonzero. The other n(m — 1) are all zeros. This again represents a reduction in the number of parameters from nm to just 17.. The matrix B is formed by only zeros and ones. Thus, there is no parameter for this matrix. Since B is an n x 1' matrix, this again represents a reduction of nr parameters that must be updated. Finally the D matrix is represented by m x r nonzero parameters. Table 5.3.2 shows the number of parameters that would represent a general m x r transfer matrix with no common poles between its entries in one column. The other 112 Table 5.1. Number of parameters for 2 realizations Realization A n B nr nm n mr mr n +nr+nm+mr 2n+mr column shows the number of parameters for the controllable canonical realization that has been defined above. One observes that the order of the number of parameters that define the system has been reduced from 0(n2) to 0(n). 113 (xi-A.1)" drl u) _— “ +131'An)- E E E * 1:1" m x . x.. :3 + —> y. Hg — u, +(Sl‘A‘j)-l E E E L___ (11,7) . - x.. 11 dir (I) F—- 1. E2] — “r-———>[:l-A.-.)" E 5 5 Ha» m ’ it h‘ Figure 5.2. (a) State Equation x5,- = A,,-x.~,- + bijuJ', (b) Output Equation 31,-, = chgj + dgju, and (c) Output Equation 3;.- = cfx; + diTu 114 5.4 Adaptive Optimal Control Theory Optimal adaptive control theory will be utilized to derive an update law for the blind separation problem in a dynamical environment. The derivation will be valid for both deterministic and stochastic cases. 
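Before turning to the derivation, the realization of Section 5.3 and the parameter count of Table 5.1 can be illustrated with a short sketch. The helper names and the sample coefficients below are assumptions made only for illustration; they are not code from the thesis.

```python
import numpy as np

# Sketch of the controllable canonical realization of Section 5.3:
# one companion-form chain per entry H_ij(s), block-diagonal A,
# with input/output wiring carried by B, C and D.

def siso_companion(a, b, d=0.0):
    """H(s) = (b[0] s^{p-1} + ... + b[p-1]) / (s^p + a[0] s^{p-1} + ... + a[p-1]) + d."""
    p = len(a)
    A = np.diag(np.ones(p - 1), 1)           # ones on the superdiagonal
    A[-1, :] = -np.asarray(a, float)[::-1]   # last row: [-a_p ... -a_1]
    bvec = np.zeros(p); bvec[-1] = 1.0
    c = np.asarray(b, float)[::-1]           # [b_p ... b_1]
    return A, bvec, c, float(d)

def mimo_controllable(entries):
    """entries[i][j] = (a_ij, b_ij, d_ij) for H_ij(s); returns (A, B, C, D)."""
    m, r = len(entries), len(entries[0])
    subs = [[siso_companion(*entries[i][j]) for j in range(r)] for i in range(m)]
    sizes = [[subs[i][j][0].shape[0] for j in range(r)] for i in range(m)]
    n = sum(sum(row) for row in sizes)
    A = np.zeros((n, n)); B = np.zeros((n, r))
    C = np.zeros((m, n)); D = np.zeros((m, r))
    k = 0
    for i in range(m):
        for j in range(r):
            Aij, bij, cij, dij = subs[i][j]
            p = sizes[i][j]
            A[k:k + p, k:k + p] = Aij        # block-diagonal dynamics
            B[k:k + p, j] = bij              # input j drives this chain only
            C[i, k:k + p] = cij              # output i reads this chain only
            D[i, j] = dij
            k += p
    return A, B, C, D

# Only the n companion-row entries, the n nonzero entries of C, and the
# m*r entries of D are free parameters, as summarized in Table 5.1.
A, B, C, D = mimo_controllable([[([3, 2], [1, 1], 0.0), ([5, 6], [0, 1], 0.0)],
                                [([4, 3], [2, 0], 0.0), ([7, 12], [1, 2], 0.0)]])
print(A.shape, B.shape, C.shape, D.shape)    # (8, 8) (8, 2) (2, 8) (2, 2)
```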
The update law will be derived for a general nonlinear dynamical case. Then, the derivation for the linear dynamical case will be straightforward. Suppose that the independence of the output is characterized by

$$\phi(\mathbf{y}, \mathbf{w}) = \phi(t, \mathbf{x}, \mathbf{w}) \quad (5.23)$$

The goal is then to optimize the functional defined by (5.23) subject to the dynamics of the following system

$$\dot{\mathbf{x}} = f(t, \mathbf{x}, \mathbf{w}_1), \quad \text{with } \mathbf{x}(t_0) = \mathbf{x}_0 \quad (5.24)$$
$$\mathbf{y} = g(t, \mathbf{x}, \mathbf{w}_2) \quad (5.25)$$

where $f$ and $g$ are two twice-differentiable continuous functions, and $\mathbf{w}_1$ and $\mathbf{w}_2$ are the parameters of the network model. To accomplish the task of optimization, the following performance index is introduced

$$J(\mathbf{x}, \mathbf{w}) = \int_0^T \mathcal{L}(t, \mathbf{x}, \dot{\mathbf{x}}, \lambda, \mathbf{w})\, dt$$

where $\mathbf{w} = [\mathbf{w}_1, \mathbf{w}_2]$ and $\mathcal{L}(\cdot)$ is the Lagrangian function defined as

$$\mathcal{L}(t, \mathbf{x}, \dot{\mathbf{x}}, \lambda, \mathbf{w}) = \phi(t, \mathbf{x}, \mathbf{w}_2) + \lambda^T\left(\dot{\mathbf{x}} - f(t, \mathbf{x}, \mathbf{w}_1)\right)$$

By introducing the Lagrange multipliers $\lambda(t)$, the functional $J(\mathbf{x}, \mathbf{w})$ is optimized by considering the variables $\mathbf{w}(t)$, $\mathbf{x}(t)$ and $\lambda(t)$ as independent. There is no constraint on the final-time condition of the system. Thus, the optimization of the functional $J$ is called a free-end variational problem. The variation of the integral leads to the following well-known Euler equations for the variables $\mathbf{x}(t)$ and $\lambda(t)$:

$$\frac{\partial \mathcal{L}}{\partial \lambda} - \frac{d}{dt}\left\{\frac{\partial \mathcal{L}}{\partial \dot{\lambda}}\right\} = 0 \quad (5.26)$$
$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}} - \frac{d}{dt}\left\{\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{x}}}\right\} = 0 \quad (5.27)$$

with boundary condition

$$\left.\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{x}}}\right|_{t=T} = 0 \quad (5.28)$$

Equation (5.26) is equivalent to

$$\dot{\mathbf{x}} = f(t, \mathbf{x}, \mathbf{w}_1), \quad \text{with } \mathbf{x}(0) = \mathbf{x}_0 \quad (5.29)$$

while equations (5.27) and (5.28) are equivalent to

$$\dot{\lambda} = -\left(\frac{\partial f}{\partial \mathbf{x}}\right)^T \lambda + \frac{\partial \phi}{\partial \mathbf{x}}, \quad \text{with } \lambda(T) = 0 \quad (5.30)$$

The parameters will be updated using the gradient of the performance index

$$\dot{\mathbf{w}}_i = \eta\, \frac{\partial J}{\partial \mathbf{w}_i} = \eta \int_0^t \frac{\partial \mathcal{L}}{\partial \mathbf{w}_i}\, d\sigma, \quad i = 1, 2 \quad (5.31)$$

where $\eta$ is the learning rate and is positive (negative) if the goal is to maximize (minimize) the performance index. Note here that $\lambda$ is obtained by performing a back propagation through time starting at time $t = T$ with zero initial value. Equation (5.31) can be simplified to

$$\dot{\mathbf{w}}_1 = -\eta \int_0^t \frac{\partial (\lambda^T f)}{\partial \mathbf{w}_1}\, d\sigma \quad (5.32)$$
$$\dot{\mathbf{w}}_2 = \eta \int_0^t \frac{\partial \phi}{\partial \mathbf{w}_2}\, d\sigma \quad (5.33)$$

The parameters could also be updated using an instantaneous update

$$\dot{\mathbf{w}}_i = \eta\, \frac{\partial \mathcal{L}}{\partial \mathbf{w}_i}, \quad i = 1, 2 \quad (5.34)$$

which results in the following update laws for $\mathbf{w}_1$ and $\mathbf{w}_2$:

$$\dot{\mathbf{w}}_1 = -\eta\, \frac{\partial (\lambda^T f)}{\partial \mathbf{w}_1} \quad (5.35)$$
$$\dot{\mathbf{w}}_2 = \eta\, \frac{\partial \phi}{\partial \mathbf{w}_2} \quad (5.36)$$

Remark: one can observe that the update law for the parameters $\mathbf{w}_2$ depends only on the independence criterion function $\phi$, whereas that of $\mathbf{w}_1$ depends on the choice of both the nonlinear function $f$ and the independence criterion $\phi$, since the dynamics of $\lambda$ depend on the gradients along the state $\mathbf{x}$.

5.4.1 Computer Implementation

The computer simulation of the algorithm proceeds as follows:

1. Perform a forward integration over the time interval $[0, T]$ of the dynamic state equation and output equation defined by (5.24) and (5.25).

2. Perform a backward integration of the system's adjoint equation defined by the dynamics of equation (5.30).

3. Update the network parameters using either set of equations defined by (5.32)-(5.33) or (5.35)-(5.36).

4. While the parameters have not converged, go back to step 1.

This research has focused on the update of the parameters $\mathbf{w}_2$ while keeping the parameters $\mathbf{w}_1$ at their nominal solutions. In this case, one observes that the update of $\mathbf{w}_2$ does not depend on the adjoint state. Therefore, step 2 described above can be eliminated while the focus is on the update of the $\mathbf{w}_2$ parameters.

5.5 Update Law Derivation

5.5.1 Nonlinear Dynamic Modeling

In this section, the output of the network is modeled as a nonlinear function of the weighted sum of the input vector $\mathbf{e}$ and state vector $\mathbf{x}$:

$$\dot{\mathbf{x}} = f(t, \mathbf{x}, \mathbf{w}_1) = f(A\mathbf{x} + B\mathbf{e}) = f(\mathbf{u}) \quad (5.37)$$

and $\mathbf{y} = g(t,$
x, wz) = g(Cx + D8) = g(V) (5-38) ‘< ll Then, the update law for the parameters, as well as the state equation of the adjoint system, will be derived. Update law Derivation for A and B Let a = a, 6. Then, _ a(/\Tf) 1’ 00;,“ afm(um) _ _ )m_____ a; 30:3 0:: 172m: f ( )6a.,- 051' = 118 Thus, one needs to compute 351,1: ‘1 an", 0a,,- Bum (9me 5E;- _ P 6338? -- $677" 6PJCP — Cm'CJ Bump = a; = 6m;6 -:c =6mgl“ 2p: 3m,- p Ep: 10: p .7 where 6 is the Kronecker-delta defined as 1 ifz' = ' 6,,- = J O ifz' 751' Therefore, dgj = —T]ZAmffn(um)6mixj = -7If,"(ui))‘i$j 5,-1- = -n21\mfr’n(um)5miej = —’7ff("i))‘iej In matrix form, A = -17Aft(u) /\ XT B = —7]Afr(u) /\ 8T where A. = diag(zl,zg, - . .,z,.) Table 5.5.1 presents some nonlinear functions and their corresponding Anti- hebbian like update rules for the matrices A and B. Update Law Derivation for C and D The independence criterion ¢ is a function of the output y, 45 = ¢(y) Let Oz = c, d again. Then, 645(11) 60.3 61,-,- =77 Th 119 Table 5 2. Different Nonlinearity functions and their derivatives Function Derivative Anti-Hebb Anti-Hebb f(u) f’(u) for A for B u 1 AxT AeT Tit—‘3 f(u) - f(u) Af(u)_,(u)2AxT Af(u)_fm2).eT tanh(u) 1 — f(U)2 A1_f(u)2)tXT A1_f(u)2AeT arctan(u) 1%; A 141/RT A141 z\e e"“2 -2u f (u) A-2uf(u)AxT A-2uf(u)).e _ 3¢(y)0ym — 712 By". CagJ' _ 395(31) )39m(vm) _ '12 By... 60,-, 17236;:9 I m( vm)a—:m Thus, we need to compute 3724' avm cm 66" = 2p: E3212? = $6,,“ (Swat? = 6";ng 0v", _ 0d,”, 561—1; _ p —ad:jep 2:6,,"- 6,,Jep- — 6 me,- Therefore, 5 9, Cu = ’72— fig—1) m(vm)5mexj = ng.(vs)— a: - 3 g' 3 d..- = n}:— ¢—-:f::) ..(um)6m.-e.- = ng:(v.-)— 45. 3y,- ej In matrix form, - 6 C = ”A9'(V)-3_:XT 120 59¢ D = ”quv) b—y—eT Below, two simple independence criteria are listed. 1. ¢(y) = £2,312 = §yTy. Therefore, 13,-, = 71930031131 3:5 = ng£(v.-)y.~ej 2. ¢(y) =% ,y3. Therefore, «2.. = 129512.55.- Jij = 7795(vi)y? 61‘ Adj oint State Equation Derivation N ow, the adjoint state equation will be computed. 3f T 6f,~( j) = f§(uj)g—: — I . . axm — fj(u1);aJm_a—£ f,’-(u,-) Z ajm6mi = f;- ("0013' = [ATAI'(“)] ij [555)] = 65(1) 6x 5 0x,- ' _. 22% — gay,- 6x.- (5.39) (5.40) 121 _ 34> g)( _ Z— 1(vJ)-6—_ 8:.“ 34> , = ;— 9j(vj) CJi = [CTJ Ag groggy ¢]_ Thus, in compact form: . 345 A=-ATA ...A+CTA..— If) aflay In conclusion, given a network whose states follow the dynamics described by = f(Ax + Be), x0 = x(to) (5,41) and whose output is y = 9(0): + De) (5.42) The dynamics of the adjoint system are described by A = —ATA,.(..)A + 07%,“)??? (5-43) The parameters are updated according to: A' = —nA,.(u)AzT (5.44) B = —nAf:(u)AeT . (5.45) C = 5A,.(,,%§zT (5.46) 11 34‘ = ”A91(v)-a-§CT (5.47) 122 5.5.2 Linear Dynamical Case The derivation of the update laws for this model is a special case of the nonlinear dynamic model discussed in the previous section. In this case, the functions f and g in equations (5.41) and (5.42) are the identity functions. Thus, their derivatives are - the identity matrices. Consequently, equation (5.43) is simplified to: 1 = —ATA + 0ng (5.48) and (5.44)-(5.47) are simply: A = -n)~ 5T (5.49) B = -17).! (5.50) C" = 17%ng (5.51) 1') = 52—3-3 (5.52) In order to compute the term 33, one can make direct computation as was pursued in Chapter 4 for the static case. One, however, can define new variables and transform the problem into one that is similar to the static case. 
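Before transforming the problem, the instantaneous updates (5.48)-(5.52) for the linear dynamical case can be summarized in a short sketch. It uses the simple criterion $\phi(\mathbf{y}) = \frac{1}{2}\mathbf{y}^T\mathbf{y}$ listed above, so $\partial\phi/\partial\mathbf{y} = \mathbf{y}$; the learning rate, step size, initialization, and the Euler scheme are illustrative assumptions rather than the thesis settings.

```python
import numpy as np

# Sketch of the instantaneous updates of Section 5.5.2 (linear f and g).
# Adjoint dynamics (5.48):  lam' = -A^T lam + C^T (dphi/dy)
# Parameter updates (5.49)-(5.52):
#   A' = -eta lam x^T,  B' = -eta lam e^T,  C' = eta (dphi/dy) x^T,  D' = eta (dphi/dy) e^T

def train_linear_network(e_seq, n, m, r, eta=1e-3, dt=1e-3):
    A = -np.eye(n); B = 0.1 * np.random.randn(n, r)
    C = np.zeros((m, n)); D = np.eye(m, r)
    x = np.zeros(n); lam = np.zeros(n)          # state and adjoint state
    for e in e_seq:
        y = C @ x + D @ e                       # output equation (g = identity)
        dphi_dy = y                             # gradient of phi = 0.5 * y^T y
        lam = lam + dt * (-A.T @ lam + C.T @ dphi_dy)
        A = A + dt * (-eta * np.outer(lam, x))
        B = B + dt * (-eta * np.outer(lam, e))
        C = C + dt * ( eta * np.outer(dphi_dy, x))
        D = D + dt * ( eta * np.outer(dphi_dy, e))
        x = x + dt * (A @ x + B @ e)            # state equation (f = identity)
    return A, B, C, D

# Illustrative usage: two mixed sinusoidal observations, a 4-state network.
t = np.arange(0.0, 2.0, 1e-3)
e_seq = np.stack([np.sin(2 * np.pi * t) + 0.5 * np.sin(4 * np.pi * t),
                  0.5 * np.sin(2 * np.pi * t) + np.sin(4 * np.pi * t)], axis=1)
A, B, C, D = train_linear_network(e_seq, n=4, m=2, r=2)
```

The sign of the learning rate selects maximization or minimization of the criterion, as noted in Section 5.4.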
This will be accomplished by considering the output equation of the linear dynamical system y = Cx + De (5.53) Then, define the vectors 3] and 5:, and the matrix W as ‘(t ll )4: II II (5.54) 123 However, y D C e x 0 I x Therefore, equation (5.53) can be written as y = W52 Consequently, the update law for the parameters of the matrix W is W = "[1474” — 5(9):?!) (1 e {a, b, c, d} First, one needs to find W‘T W-T: D(DTD)“1 0 —CTD(DTD)’1 I since D7 0 D(DTD)’1 0 I 0 CT I —CTD(DTD)‘1 I 0 I Consequently, the update law for the matrix C _ C = "flfa(Y)xT and that of matrix D is I D = v[D(DTD)-‘ — fa(y)eT] (5.55) (5.55) (5.57) (5.58) (5.59) (5.50) (5.61) 124 If D is a square nonsingular matrix, then equation (5.61) becomes D = )[D‘T—f.(y)eT]' (5.62) = 7(1 — fa(Y)yT]D‘T (5-63) 5.6 Computer Simulations Computer simulations of the update laws of the parameters of the matrices C and D, as defined by equation (5.60) and (5.62), were performed. The considered unknown sources are two sine waveforms with respective frequencies 1H2 and 2H 2. The envi- ronment model is the dynamical system represented by the following transfer matrix —1 .92 a s 1.0% 068:5 = Ht(3)-l (5 64) :2 28 3 32i6si8 0234-98440 1‘0: +133+42 One theoretical solution to the problem is the transfer function H ‘(s) whose canonical H(s) = state representation, D‘ = (A’, B‘, C‘, D‘), is defined by 0 1 0 0 0 0 0 0 0‘ —2 —3 0 0 0 0 0 1 0 0 0 -3 0 0 0 0 0 1 A“: 0 0 0 0 1 0 0 8‘: 0 0 0 0 0 -20 —9 0 0 1 0 0 0 0 0 0 0 1 0 0 _ 0 0 0 0 0 —42 ‘13. _0 1‘ = 13 5 —1.2 0 0 0 0 . 0*: 1.0 0.6 0 0 0 -—-3.6 —1.2 —34 -7.0‘ 0.2 1.0 125 The bode plots of the environment, defined by H(s), and the desired realization of the network model, defined by H ‘(s), are shown respectively in figures 5.3 and 5.4. Gain dB Phase deg f (rad/sec) 1 (rad/soc) Figure 5.3. Bode plot of the environment model defined by H(s) First, computer simulations were performed by updating only the parameters of one matrix while holding the parameters of the other matrix at their nominal values, as defined by D‘. In either experiments, the algorithm failed to converge to the desired solution. Figures 5.5 and 5.6 show the the algorithm fails to converge to the desired solution. Figure 5.5 shows the performance of the algorithm by updating the parameters of the D matrix while those of the C matrix are held at their nominal values. On the other hand, Figure 5.6 shows the performance of the algorithm by 126 Galn dB Phase deg f (rad/soc) f (rad/soc) Figure 5.4. Bode plot of the network model defined by H‘(s) 127 updating only the C matrix and fixing the parameters of the D matrix to their nominal values. Observe that the algorithm fails to converge in both simulations. The algorithm also fails to accomplish the convergence by considering the nonlinear functions fb, fc, fd as well as the cubic and the sine hyperbolic ones. Figure 5.7 shows a case of simulations when the nonlinear function is fd. The reason for the failure of the algorithm is the fact that the problem is not completely identifiable. Recall that from equations (5.7)-(5.10), the system has infinitely many solutions that it could converge to. The solution defined by D‘ = (A’, B‘, C‘, D‘) is one particular solution. The realization D}; = (A‘, 8", PC“, PD") is also another possible solution where P is generalized permutation matrix. Thus, by keeping one matrix constant at its nominal value, one is limiting the set of all possible solutions to one particular point in the set of equilibrium points. 
Next, computer simulations were performed by considering the simultaneous up- date of both parameters of the matrices C and D. Figure 5.8 shows the respective performance of when two sine functions are assumed to be the unknown sources. One observes that the all the parameters converge, but not to the desired values. On the other hand, if one considers an update law for the matrix D defined by (4.56) D = —7fa(Y)yTa a E {0, b: C: d} (5'65) the algorithm demonstrated its capability to converge only when the nonlinear func- tion is defined by fa. Figure 5.9 and 5.10 are samples of such simulations. One can observe the convergence of the matrix D in Figure 5.9 as all its parameters become equal to the desired values in a finite time. On the other hand, the parameters of the D matrix do converge in Figure 5.10, but not to the desired values. Figure 5.11 shows the test simulation of the algorithm defined by equation (5.65) using the nonlinear 128 Dynamic Environment eta = 0.005 0.4 - 0.3 ~ 0.1 - O l l l l l l I 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 d12 (I12 3 0.67 (121 s 0.37 0.8 ' vv’rvw vvrrv—vwrvrrv mv—V AAA- AAAA AAA- ijf V‘vf t-o-o—o-co------—----o—o- O 5 10 15 20 O 5 10 15 20 time (sec) time (sec) Figure 5.5. Dynamic Forward Structure. Update only the D matrix according to D = 7[D’T -— fa(y)eT] using two sine waveforms, 7 = 0.005 and D0 = 0 129 c1 1 012 C13 250 1 3 - O ' _1 . -2 —3 -4 -5 b _8 b _7 A 200 400 600 O 200 400 600 021 022 024 O O 5 200 r —3 ' .4 . -5 -5 -15 i -7 .8 ’ —20 -9 . -26 -10 r O O 500 Figure 5. 6. Dynamic Forward Structure. Update only the C matrix according to C = -17f..(y)xT using two sine waveforms, n = [1000 100 1 100 1 100 10] and Co- — 0. 80“ 130 Dynamic Environment eta = 0.5 0.08 - 0.06 - g 0.04 ~ 0.02 - o I L 1 l l 1 J 0 0.05 0.1 0.2 0.25 0.3 0.35 0.4 d12 d12 . 0.35 (121 = 0.051 0.51— ------—.——._._.-_._._.- 0.2,- _ .................... - 0.15 0.4» W N N “'5 :5 O 1 0.2 0.05 0 . - 5 - 0 1 L e a 0 5 10 15 20 0 5 10 15 20 time (see) time (sec) Figure 5.7. Dynamic Forward Structure. Update only the D matrix according to D = 7[D‘T — fd(y)eT] using two sine waveforms, 7 = 0.5 and D0 = 0 131 011 c12 C13 40' 5’—---—--—‘----- 2’ [ . 20 O _ ........... -.-._ ......... l __2 i _w i WM 0 A O 2% 400 600 <32 200 400 $5) 0 200 ago 600 40 5 100' O -C-O—O-l_._l d12 (A . d2 0 on 0 200 400 600 0 200 400 500 time (see) time (sec) Figure 5.8. .Dynamic Forward Structure, Update both the C and D matrices ac- cording to D = 7[D'T - fa(y)eT] and C = —17f.,(y)xT using two sine waveforms, 7 = 0.005, 17 = [100 10 1 1000 5 2000 10]. All initial conditions are zero 132 function fa at convergence. Obeserve that the output of the network is almost an exact replica of the unknown sources. Dynamic Environment eta = 0.005 0.25 — 0.2 r .— 0.15 — 0.1 ~ 0.05 - o 1 l l 1 l l I 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 012 012 = 0.6 021 = 0.2 0.31 0.25- 0.6 ------ _ - A -22- fl- 0'2’ " 0.15» N N 5 ° 4 5 0.1 0'2 0.05 0 . - . - 0 - - - - 0 5 10 15 20 0 5 10 15 20 time (sec) time (see) Figure 5.9. Dynamic Forward Structure. Update only the D matrix according to D = —7f,,(y)yT using two sine waveforms, 7 = 0.005 and D0 = 0 Other nonlinear function, reported in [8], were also considered. Computer simu- lation of the following algorithms were performed 1') = —7y3y (5.66) D = —-7 sinh y tanh y (5.67) 133 Dynamic Environment eta = 0.1 0.1- 0.08" 0.065 g 0.04- 0.02“ 0 .0.02 l l l J l l J 0 0.05 0.1 0. 
1 5 0.2 0.25 0.3 0.35 d12 d12 :- 0.34 (121 I 0.081 0.15’ 0.4’ N N 0.1 A a r =5 0.05’ 0.2 0 0 ‘ ‘ 4 4 -0.05 L ‘ * ‘ 0 5 10 15 20 0 5 10 15 20 time (sec) time (89¢) Figure 5.10. Dynamic Forward Structure. Update only the D matrix according to D = -7f.1(y)yT using two sine waveforms, 7 = 0.5 and D0 = 0 134 '50 / ..1 . e A A 20 20.5 21 21 .5 22 20 20.5 21 21 .5 22 20 20.5 21 21 .5 22 -%0 20.5 21 21 .5 22 ...1 A A A - 29 ,0-3 20.5 21 21.5 22 2 20 20.5 21 21.5 22 '220 20.5 21 21.5 22 Figure 5.11. Dynamic Forward Structure. Test Simulation of the update law D = --7fa (y)yT at convergence using two sine waveforms 134 _1 A A A n 1 A A A 20 20.5 21 21.5 22 20 20.5 21 21.5 22 2 . Figure 5.11. Dynamic Forward Structure. Test Simulation of the update law D = -7fa (y)yT at convergence using two sine waveforms 135 Figures 5.12 and 5.13 show their performance. Observe that both nonlinear functions are capable of separating signals. The reason of exploring the algorithms defined by equations (5.66) and (5.67) is to show that the nonlinear function fa performs as equal to other nonlinear functions that have shown to provide excellent performance, but that were chosen heuristically[8]. This is in contrast to the function fa that was developed based on a well defined energy function and independence criterion. However, the function fd, reported in [24], did not provide the same performance. This is due to the poor approximation of entropy approximation and/or the assumption of the unit variance. Dynamic Environment eta = 1 0.3 ' 0.2 ' 8 0.1 "' o I I I I I I 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 (112 (112 II 0.6 (121 I: 0.2 0.8 ' O 5 10 time (sec) 15 20 0.3 [ d12 O 5 1O 15 20 time (sec) Figure 5.12. Update of the D Matrix using D = —7y3y 136 Dynamic Environment eta = 1 0.4 - 0.3 r 0.1 - O l I L I I I I 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 d1 2 d12 = 0.6 d21 :- 0.2 0.8 - 0.4 . 0 5 10 15 20 0 5 1O 15 20 time (sec) time (sec) Figure 5.13. Update of the D Matrix using D = —7 sinhytanh y 137 5.7 Observations Despite the limited results that one obtained in this chapter, a general framework to derive an update law was developed. Such a framework can be applied to any network structure. It was also observed that the introduced forward structure , in this chapter, is subject to an identifiability problem due the freedom of the possible solutions that the network could converge to. Thus, the task is to define an identifiable structure and to use the herein developed tools to define the update law for that structure. It is intended to do so in the following chapter. CHAPTER 6 Blind Separation in a Dynamic Environment: Feedback Structure In this chapter, an architecture different from the one discussed in the previous chapter is considered. It does not represent a particular network architecture, nor does it limit the type of environments that can be considered. In this research, however, it will be shown that any type of environment can be considered. The state space representation of the network will be presented and the existence of a theoretical solution to the problem shown. Then, an update law for the parameters of the network based on an analogy of the static case will be developed. 6.1 Architecture General systems are defined by their input-output relationship. For time-invariant systems, such a relation is described by a transfer function L(s) which relates the input S (s) to the output E(s) E(s) = L(s)S(3) (6.1) 138 139 If s 6 'R" and e 6 72'", then L(s) is an m x 1' matrix r 1 L11 L..- 1... 
$$L(s) = \begin{bmatrix} L_{11} & \cdots & L_{1r} \\ \vdots & \ddots & \vdots \\ L_{m1} & \cdots & L_{mr} \end{bmatrix} \quad (6.2)$$

which can be factored as

$$L(s) = \begin{bmatrix} 1 & \bar{H}_{12} & \cdots & \bar{H}_{1r} \\ \bar{H}_{21} & 1 & \cdots & \bar{H}_{2r} \\ \vdots & & \ddots & \vdots \\ \bar{H}_{m1} & \bar{H}_{m2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} L_{11} & 0 & \cdots & 0 \\ 0 & L_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & L_{mm} \end{bmatrix} \quad (6.3)$$

where

$$\bar{H}_{ij} = \frac{L_{ij}}{L_{jj}} \quad (6.4)$$

Let $Z(s)$ denote the diagonal matrix of $L(s)$ and $\bar{H}(s)$ the first factor above. Thus, equation (6.1) is equivalent to the following two equations:

$$\bar{S}(s) = Z(s)S(s) \quad (6.5)$$

and

$$E(s) = \bar{H}(s)\bar{S}(s) \quad (6.6)$$

Therefore, the system described by equation (6.1) is equivalent to the cascaded processing of two systems described by equations (6.5) and (6.6). Equation (6.5) performs the pre-filtering process, while equation (6.6) performs the mixing and/or filtering process that describes the model of the environment.

In the blind separation of signals, the goal is to recover the unknown sources. This requires the development of a network that performs the inverse function of the pre-filtering and mixing/filtering processes. Thus, such a network is also a cascade of two processes. The first one is devoted to reversing the mixing and/or filtering process, while the second one is devoted to the inverse filtering. Let $\bar{\mathbf{y}}$ and $\mathbf{y}$ denote the respective outputs of these processes. A feedback structure that models the inverse of the mixing and/or filtering process is as follows:

$$\bar{Y}(s) = E(s) - H(s)\bar{Y}(s) \quad (6.7)$$

where $H(s)$ is a matrix with zero diagonal entries. The goal of such a process is to recover a replica of $\bar{S}$. Therefore, using equations (6.6) and (6.7), the following condition is obtained:

$$I + H(s) = \bar{H}(s) \quad (6.8)$$

Thus, with an appropriate update law for the parameters of the matrix transfer function $H(s)$, the problem has the solution

$$H_{ij}(s) = \bar{H}_{ij}(s), \quad \forall i \neq j \quad (6.9)$$

Finally, to recover the original signal $\mathbf{s}$, some filter $K(s)$ is applied to $\bar{Y}$, which results in an output

$$Y(s) = K(s)\bar{Y}(s) \quad (6.10)$$
$$= K(s)\bar{S}(s) \quad (6.11)$$
$$= K(s)Z(s)S(s) \quad (6.12)$$

If the goal is to recover $S(s)$ exactly, then

$$K(s) = Z(s)^{-1} \quad (6.13)$$

However, no knowledge of $Z(s)$ is available. Thus, one settles for a solution that recovers the unknown sources up to a filtering. Figure 6.1 shows a schematic diagram of these processes.

Figure 6.1. Feedback Processing Structure: pre-filtering $Z(s)$ and mixing/filtering $\bar{H}(s)$ on the environment side, followed by the feedback separation network $H(s)$ and the inverse filtering $K(s)$.

6.2 State Representation

In the last chapter, the canonical controllable realization was used to define the state space realization of a given transfer function matrix. Such a realization will also be used here to model the network. However, one should keep in mind that the transfer matrix $H(s)$ has zero diagonal entries. Therefore, the state space representation of $H_{ii}(s)$ is $\mathcal{D}_{ii} = (0, 0, 0, 0)$. Consequently, the $i$-th entries of the matrices $A_i$, $b_i$, $c_i^T$ and $d_i^T$, as defined in equations (5.19) and (5.20), are null. Despite the fact that these entries are redundant to the network model, they are preserved for the sake of clarity of the presentation. A canonical controllable realization $\mathcal{D} = (A, B, C, D)$ for $H(s)$ is thus defined by equations (5.21) and (5.22). The state equation is described by

$$\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{y} \quad (6.14)$$

while the output equation is described by

$$\mathbf{y} = \mathbf{e} - C\mathbf{x} - D\mathbf{y} \quad (6.15)$$

Having defined the structure and the realization of the network model, the update laws for the parameters $C$ and $D$ are now ready to be developed.

6.3 Update Law Derivation

To develop the update law for the feedback structure, one first defines the stacked vectors $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{e}}$ and the matrix $W$ as
D C y = x = W = (6.16) x 0 0 I However, e D C y = — _ y (6.17) 143 Therefore, equation (6.15) can be rewritten as ~ 5' = é - W? (6.18) Using Theorem 2, the parameters of the matrix W are updated according to W = 0.000? a e {a, b, c, d} (6.19) This will reduce to = nfa(Y)XT (6'20) D = 7fa(y)yT (6'21) where 1) and 7 are respectively the learning rates of the matrices C and D. 6.4 Computer Simulation Results To study the performance of the algorithm and the feedback structure, one needs to consider two approaches depending on the origin of the nonlinear function. Two different approaches will be considered. One approach will be based on the nonlinear functions fa and fd, which were developed in Chapter 4, based on the mutual infor- mation functional. The other will based on selecting the nonlinear functions reported in [8]. These two parts define functions that were developed based on the mutual in- formation approach and others that were based on the neuromemitic approach. This comparative study will be considered as a mean to quantify the performance of the developed nonlinear function fa vis-c‘z-vis other existing nonlinear functions. Computer simulations were conducted for the proposed architecture. A two di- mensional network was considered. The two unknown sources were filtered and mixed 144 according to the following transfer function defined by 0.6 3+1 3+4 ‘ 11(3): 1 W = 1 H12“) #4403454: 1 Hats) 1 Therefore, the canonical controllable realization is D = (fl, 3, C, D) defined below as 1 0 0 0 0 O 0 0 0 0 0 1 0 0 0 0 0 _ 0 —42 —13 0 0 0 _ 0 l A: B: 0 0 O 0 1 0 0 0 0 0 0 —40 —13 0 1 O 0 0 0 0 0 0 _0 0 _ 0 -22.8 —4.8 0 0 0 _ 1.0 0.6 C: D: 0 0 0 —13.6 —3.2 0 L0'4 1.0 The bode plots of 312(3) and 1.121(3) are shown in Figure 6.2. Also, the bode plot of H(s) = H(s)‘l is shown in Figure 6.3. 6.4.1 The Approach Based on Mutual Information The algorithm defined by equations (6.20) and (6.21) will be studied by exploring the nonlinear functions fa and fa. Algorithm defined by fa Computer simulations of the algorithm defined by the equations (6.20) and (6.21) and the nonlinear function fa was studied. C = Ufa(y)xT D = 7fa(Y)yT (6.22) (6.23) 145 Gain 03 Phase deg _1 L .' _1 L 10° 105 10° 105 i (rad/sec) i (rad/sec) Figure 6.2. Frequency response of the environment model 146 Gain dB Phase deg f(rad/sec) I (rad/sec) Figure 6.3. Frequency response of the network model 147 In order to study its performance thoroughly, the parameters of the network will be initially updated separately. This was accomplished by updating some parameters of the network, while holding the remaining parameters, constant to their nominal values. The first set of simulations considered the update of the individual parameters of the C matrix, namely €12, c13, 621 and 022. The initial value of each of these parameters was equal to 0.8 of their nominal values. Also, the unknown sources , were assumed to be 2 sine waveforms with respective frequencies 1H 2 and 2H 2. Figures 6.4 through 6.7 display the results of such simulation. One could observe that all the parameters converge individually to the desired values. It is important to note that the learning rates of each of the parameters of the C matrix are different, [1712, 1713, 1121, rm] = [101 10’1 10° 10’2]. Once it was demonstrated that each of the parameters converged to the desired value, the update law of all the parameters of the C matrix were combined. Computer simulation were conducted assuming different initial conditions of the parameters of the C matrix. 
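A minimal sketch of one adaptation step of the feedback structure is given below, combining the implicit output equation (6.15) with the update rules (6.20) and (6.21). The exact form of $f_a$ is defined in Chapter 4 and is not reproduced here, so the cubic used below is only a stand-in placeholder; the rates, step size, and toy signals are likewise illustrative assumptions.

```python
import numpy as np

f_a = lambda y: y ** 3                      # placeholder for the nonlinearity f_a

def feedback_step(x, C, D, A, B, e, eta, gamma, dt):
    # Output equation (6.15): y = e - C x - D y  =>  (I + D) y = e - C x
    y = np.linalg.solve(np.eye(D.shape[0]) + D, e - C @ x)
    C = C + dt * eta * np.outer(f_a(y), x)      # update (6.20): C' = eta f_a(y) x^T
    D = D + dt * gamma * np.outer(f_a(y), y)    # update (6.21): D' = gamma f_a(y) y^T
    np.fill_diagonal(D, 0.0)                    # H(s) has zero diagonal entries
    x = x + dt * (A @ x + B @ y)                # state equation (6.14)
    return x, C, D, y

# Toy usage: a 2-output network with a 2-dimensional internal state.
A = -np.eye(2); B = np.eye(2)
C = np.zeros((2, 2)); D = np.zeros((2, 2))
x = np.zeros(2)
for t in np.arange(0.0, 1.0, 1e-3):
    e = np.array([np.sin(2 * np.pi * t), np.sin(4 * np.pi * t)])
    x, C, D, y = feedback_step(x, C, D, A, B, e, eta=1.0, gamma=0.2, dt=1e-3)
```

Solving (6.15) for $\mathbf{y}$ requires $I + D$ to be nonsingular; since $D$ is initialized at zero in the simulations, this holds at the start of adaptation.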
Figures 6.8 through 6.11 represent the results of simulations when the initial conditions are respectively 0.8C", 0.5C‘, 0.10“ and 0. In all these simulations, one could observe that all the parameters of the C matrix converge to the desired values. So far, only simulation results of the performance of the parameters of the C matrix were considered. These simulations showed excellent results in terms of the performance of the algorithm in separating the unknown sources. The performance of the algorithm in updating the parameters of the D matrix is considered next. Computer simulations for different initial conditions of the D matrix, while the parameters of the C matrix are held at their nominal values. were considered. Figures 6.12 through 6.15 show the results of the simulations when the initial conditions are respectively (d12, d21) = (0,0), (0,1), (1,0) and (1,1). Figure 6.16 combines the 148 performance of the algorithm for all these tests. One could observe that that the parameters of the D matrix also converged to the desired ’values. Despite the fact that the matrices C and D converged to the desired parameters when each one of these matrices was updated separately, some parameters of the network failed to converge when both matrices are updated simultaneously. Therefore, sequential update will be considered in the next section to resolve this problem. Algorithm defined by fd When the nonlinear function fd is assumed to define the nonlinear function of the algorithm described by equations (6.20) (6.21), the parameters of the matrices C and D converged to some undesirable solution, causing the algorithm to fail in separating the unknown sources. Figures 6.17 and 6.18 display the performance of the algorithm when each matrix is updated separately. This last result shows again the superior performance of the algorithm defined by the nonlinear function fa compared to the one defined by fd, as developed in [24]. -16 149 Update the parameter c12 while keeping all other parameters fixed -17 -18 -19 —22 —23 -24 0 -.---0--. _ WW 11% i"""'l L I I 20 40 60 C12 3 -23.42 ”I '1' I!" ’ l I 80 100 time (sec) 120 140 160 Figure 6.4. 6.3 = mjfa(y.°)2.'j and "ii = 10 -16 -17 -18 -19 -20 -22 -23 -24 150 Update the parameter 012 while keeping all other parameters fixed I I " ‘. C12 :- -23.42 l l l l O 20 4O 60 80 100 120 140 160 180 200 time (sec) Figure 6.5. 65,- = mjfa(y.-)x.-j and 17,-,- = 101 -10.5 —11 -11.5 -12 -12.5 -13 -13.5 -14 O I I I l 151 Update the parameter c21 while keeping all other parameters fixed c21 = -13.47 20 4O 6O 80 100 120 140 160 180 200 time (see) Figure 6.6 Cij = ngjfa(y.-)x.-j and 1751' = 100 152 Update the parameter (:22 while keeping all other parameters fixed I N \l l I N ‘0 l c22 = -3.202 _3.2_.-._._.._. _3.3 1 L 1 l 1 l l 1 1 I O 20 4O 60 80 100 1 20 1 4O 1 60 1 80 200 time (sec) Figure 6.7. c1)- = ngjfa(y.-):r,-J- and 17.5 = 10"2 6.4.2 An Approach Based on Neuromemitic Update of C Matrix The parameters of the C matrix will be updated according to a decorrelation between the output and the state. Therefore, its parameters will be updated according to: (:5,- = flijyixij (6-24) Computer simulations were performed for each of the parameters of the matrix C individually. Then, the update of all of these parameters together was performed 153 C12 C13 Figure 6.8. Update all the parameters of the 0' matrix according to (3,5 = 7].-]- fa(y,-)a:,-J-, (’712, 7’13, 7’21, "22) = (101 01a 2a 01) and CO = 080 154 c12 c13 100 c21 100 O 50 100 Figure 6.9. 
Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} f_a(y_i)x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10, 0.1, 2, 0.1)$ and $C_0 = 0.5C^*$.

Figure 6.10. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} f_a(y_i)x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10, 0.1, 2, 0.1)$ and $C_0 = 0.1C^*$.

Figure 6.11. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} f_a(y_i)x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10, 0.1, 2, 0.1)$ and $C_0 = 0$.

Figure 6.12. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$, $\gamma = 0.0005$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.13. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$, $\gamma = 0.0005$ and $(d_{12,0}, d_{21,0}) = (0.0, 1.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.14. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$, $\gamma = 0.0005$ and $(d_{12,0}, d_{21,0}) = (1.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.15. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$, $\gamma = 0.0005$ and $(d_{12,0}, d_{21,0}) = (1.0, 1.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.16. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$ and $\gamma = 0.0005$ for different initial conditions. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.17. Dynamic Feedback Structure. Update only the $C$ matrix according to $\dot{C} = \eta f_d(\mathbf{y})\mathbf{x}^T$ using two sine waveforms, $\eta = [100\ 10\ 100\ 10]$ and $C_0 = 0.8C^*$.

Figure 6.18. Dynamic Feedback Structure.
Update only the D matrix according to D = nfd(y)yT using two sine waveforms, 17 = 0.2 and D0 = 0 164 while keeping the parameters of the matrix D fixed at their nominal values at all times. These simulations were performed with two sine waveforms of respective frequencies f1 = 1H2 and f2 = 2H 2. The initial conditions for each case were such that the parameters to be updated were 0.8 of the nominal values. Figures 6.19 through 6.22 show, respectively, the convergence of the parameters 612, cm, 621, (:22. Each one of these parameters required a different learning rate. The fact that each of these parameters is the coefficient of some power 3, where s is the time derivative operator, may reveal the reason for this behavior. Therefore, there is an order of time scale between these parameters. Later in this chapter, this concept will be introduced. In addition, it will be shown that, under a well defined transformation, such a time scale could be reduced, as well as all the parameters of C, to be in the same order of magnitude. Figure 6.23 shows the convergence of the parameters of the matrix C when all the parameters were updated simultaneously. Observe that all the parameters have converged to the correct values. K Using the update law defined by (6.24), a set of simulations was conducted to study the performance of the algorithm for the C matrix based on the choice of the initial conditions. Figures 6.23 through 6.26 show the performance when the initial conditions were respectively 0.8, 0.5, 0.1 and 0.0 of the nominal value of the C matrix. Regardless of the initial conditions, the parameters converge to the desired values. These simulations were also repeated with square and sawtooth waveform functions. The same results were obtained as when using sine waveforms. Some sample results of using different unknown sources were as follows. For instance, Figure 6.27 shows the performance of a sine waveform of frequency 1H2 and a square waveform of frequency 2H 2. The initial conditions were all at 0. Figure 6.28 shows the performance of a sawtooth waveform of frequency 1H 2 and a square waveform of frequency 2H z. The initial conditions were all at 0. The update law defined by equation (6.24) was modified to include some nonlin— 165 earities. This was motivated from the work of Herault and Jutten [8, 9]. The update equations were the following: (3:5 = nut/fixa- (5-25) and a, = 17,-,- sinh y.- tanh 33,-,- (6.26) Computer simulation of both algorithms were conducted using the same initial condi- tions, and mixing matrix, and unknown sources. The results of these two simulations are displayed respectively in Figures 6.29 and 6.30. One observes that both algorithms converge. Update of Matrix D Starting from the energy function that defines the total power in the signal, one would derive an derive an update law for the parameters of the matrix D similar to (5.40) d5,- = 7.13/{61' Computer simulations were performed using 2 sine waveforms of respective frequen- cies 1112 and 2H 2. Zero initial conditions were considered. The results of such a simulation is displayed in Figure 6.31. It should be noted that the network failed to converge to the desired values. On the other hand, successful update of the D matrix, based on the rules developed in [8], was also accomplished. do = vomit/j (6.27) 166 and dgj = 7,-1- sinh y,- tanh y, (6.28) The performance of the algorithms defined by equations (6.27) and (6.28) are shown respectively in Figures 6.32 and 6.33. 
The learning rate 7 is taken to be the same for all the parameters in each of the above simulations, and is equal to 0.2. The algorithm defined by equation (6.28) will now be considered and its perfor- mance will be studied. First, its performance for different initial conditions will be investigated. Figures 6.33 through 6.36 represent a set of simulations for 4 different initial conditions. The algorithm always converges to the desired values regardless of the initial condition. Figure 6.37 shows the convergence of the algorithm defined by equation (6.28) for the above initial conditions when plotted together. The performance of the algorithm for various unknown types of sources, sawtooth, square and sine functions was also studied. Figures 6.38 through 6.40 show respec- tively the performance of the algorithm for the following scenarios: (i) using a sine and a square of respective frequencies 1H2 and 2H 2, (ii) using a sine and a sawtooth of respective frequencies 1H2 and 2H 2, and (iii) using a a sawtooth and a square of respective frequencies 1H2 and 2H 2. One can observe that the parameters converge to the desired values in all these scenarios. In conclusion, it was demonstrated that the algorithms for. updating the matrices C and D have converged when each of these parameters is updated individually by considering the mutual information approach or the neuromimetic approach. It was also shown that both algorithms perform well under various initial conditions, for different types of unknown sources, as well as for different learning rates in the case of the C matrix parameters. It was also shown that the algorithm, defined by the function fd, which was developed in [24] failed to perform the separation task. It 167 was also shown that the algorithm defined by fa has equivalent performance to those defined based on the neuromimetic approach. The novelty of our approach resides in the fact that the nonlinear function f,I was developed based on the minimization of an energy function that defines the independence of signals while the other functions in the literature were motivated from biological inspirations. Update C and D In the previous two sections, the performance of the algorithms for updating the matrices C and D, separately was explored. In this section, both update laws will be combined. In this case, the parameters will be updated simultaneously. The parameters of the C matrix were updated according to a, = 770316150 (6-29) and those of the D matrix were updated according to dgj = 7;, sinh y; tanh g,» (6.30) One parameter at a time In the first set of simulations, only one parameter of each of the matrices C and D, namely cm and d12 were updated. These two parameters will be updated according to the equations (6.29) and (6.30). In the first simulation, the parameter d12 was initially set at its nominal value, £2 = 12, while the parameter cm is initially set to be 0.8 of its nominal value, offs = 0.8613. The results of this simulations are presented in Figure 6.41. In this case, both parameters converge to the desired values as it can be observed in Figure 6.41. In the next simulation, the initial conditions of the parameters were reversed. The parameter cm was initially set at its nominal value, cfl’3 = Cisa while the parameter 168 cm is initially set to be 0.0, diz = 0.0. The results of this simulations are presented in Figure 6.42. One, however, observes that the parameter c13 did not converge yet, as it is shown in Figure 6.42. 
This is due to its small learning rate 1713 = 20. Thus, when the learning rate is increased to 1713 = 40, both parameters converge, as displayed in Figure 6.43. It should be noted that, in Figure 6.43, despite the fact that the parameter cm was initially at the desired value, it was pushed away from it while the parameter d12 was still searching for the path to correct solution. Ultimately, both parameters converge to the desired values. Figure 6.44 shows the phase plot of the parameters d12 and c13. All the parameters together The previous section proved that the update laws of the matrices C and D were successful when only some parameters of each matrix were updated. The idea behind this set of simulations was to study any coupling effect between the two update laws. It was concluded that the algorithm performs well. In this section, all the parameters of the matrices C and D will be updated and the algorithm’s performance will be observed. A set of simulations will be conducted in which some parameters were given their nominal values as the initial conditions, while starting others at some values other than their nominal values. The unknown sources used, for this case, were the sawtooth and the square waveforms with respective frequencies f; = 1H2 and f2 = 2H 2. Figures 6.45 through 6.48 show the performance of the network for different initial conditions, while the learning rates for the matrices C and D were held constant at all simulations. Figure 6.45 shows that the algorithm could not converge for the specified time range and learning rate. Thus, a simulation with a higher learning rate for the parameters 621 and 622 was required, as shown in Figure 6.46. However, a longer running time was also needed. Thus, when T = 1000, the algorithm was able to converge. Figure 6.47 shows that all the parameters. did 169 converge, except for one, on. One possible route to solve this problem is to consider sequential update. 6.5 Sequential Update In Figure 6.47, it was noted that all the parameters converged except for one parame- ter, 6;]. In order to eliminate this phenomenon, we consider updating the parameters ”sequentially”. This was accomplished by updating the set of parameters of the C and D matrices at alternate time intervals of length AT. This means that starting at time to, and without loss of generality, the parameters of the matrix D will be updated during the time interval [to+ (k —1)AT,to kAT] while those of the C matrix will be held constant. However, during the time interval [to + IcAT, to + (k + 1)AT], the order of update will be switched so that the parameters of C will now be updated, while those of D will remain constant at their nominal values. Figures 6.49 through 6.51 show the performance of the algorithm when such a sequential training method is applied with AT = 50 seconds. Figure 6.49 shows the performance when the learning rates of the parameters of the C matrix were 77 = [103 10 102 10] and that of the D matrix is 7 = 0.2. Observe in this case that the parameter 621 is converging but a longer time is required for it to converge, or a larger learning rate is required. Also, one observes that in Figure 6.49, all the parameters of the C matrix, except €21, converged in a small period of time, while exhibiting some oscillation. This is due to the large learning rates at which they were updated. Thus, smaller update rates were considered and taken to be 7; = [200 10 200 10]. The results of these simulations are presented in Figure 6.51. 
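The alternating schedule described above can be stated compactly in the sketch below; the function and variable names are illustrative, and the trivial stand-in step functions would be replaced by the update rules (6.29) and (6.30).

```python
import numpy as np

# Sketch of the sequential schedule of Section 6.5: the D parameters adapt
# during one window of length dT while C is frozen, then the roles switch.

def sequential_training(times, dT, step_C, step_D, state):
    t0 = times[0]
    for t in times:
        window = int((t - t0) // dT)
        state = step_D(state, t) if window % 2 == 0 else step_C(state, t)
    return state

# Example with stand-in steps and dT = 50 s, as in the simulations.
times = np.arange(0.0, 1000.0, 0.01)
state = {"C": np.zeros((2, 6)), "D": np.zeros((2, 2))}
state = sequential_training(times, 50.0,
                            step_C=lambda s, t: s, step_D=lambda s, t: s)
```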
Also, Figure 6.50 shows a simulation with $\eta = [1000\ 10\ 500\ 10]$. Figure 6.52 shows the response of the network at the start of the update. One observes that the output signals $\mathbf{y}(t)$ are different from the desired signals $\mathbf{s}(t)$. Recall that the initial condition of the system was not set at the nominal values, but rather $C_0 = 0.8C^*$ and $D_0 = 0$. Figure 6.53 shows the performance of the algorithm after all the parameters have converged. It can clearly be observed that the original signals are recovered and that the error is on the order of $10^{-3}$. The network is then tested by changing the unknown sources to a pair of sawtooth and sine functions. Figure 6.54 shows the signal recovery performed by the network. The final values to which the network converged were $[c_{12}\ c_{13}\ c_{21}\ c_{22}] = [-21.9783\ -4.7968\ -13.4989\ -3.3635]$ and $[d_{12}\ d_{21}] = [0.3992\ 0.6006]$.

Figure 6.19. Update of the parameter $c_{12}$ only, according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$ with $\eta_{ij} = 2000$; $c_{12}$ converges to $-23.25$.

Figure 6.20. Update of the parameter $c_{13}$ only, according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$ with $\eta_{ij} = 20$.

Figure 6.21. Update of the parameter $c_{21}$ only, according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$ with $\eta_{ij} = 500$; $c_{21}$ converges to $-13.59$.

Figure 6.22. Update of the parameter $c_{22}$ only, according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$ with $\eta_{ij} = 10$; $c_{22}$ converges to $-3.225$.

Figure 6.23. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^2, 1)$ and $C_0 = 0.8C^*$.

Figure 6.24. Same update law and learning rates with $C_0 = 0.5C^*$.

Figure 6.25. Same update law and learning rates with $C_0 = 0.1C^*$.

Figure 6.26. Same update law and learning rates with $C_0 = 0$.

Figure 6.27. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^2, 1)$ and $C_0 = 0$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.28. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^5, 10^2)$ and $C_0 = 0$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.29. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij}\sinh(y_i)\tanh(x_{ij})$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^2, 10)$ and $C_0 = 0$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.30. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} y_i^3 x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^2, 10)$ and $C_0 = 0$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.31. Update all the parameters of the $D$ matrix according to the energy-based rule of Section 6.4.2, $\gamma = 0.2$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.32. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma\, y_i^3 y_j$, $\gamma = 0.2$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.33. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma\sinh(y_i)\tanh(y_j)$, $\gamma = 0.2$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.34. Same update law with $(d_{12,0}, d_{21,0}) = (0.0, 0.9)$.

Figure 6.35. Same update law with $(d_{12,0}, d_{21,0}) = (0.9, 0.0)$.

Figure 6.36. Same update law with $(d_{12,0}, d_{21,0}) = (0.9, 0.9)$.

Figure 6.37. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma\sinh(y_i)\tanh(y_j)$ and $\gamma = 0.2$ for different initial conditions. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.38. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma\sinh(y_i)\tanh(y_j)$, $\gamma = 0.2$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.39. Same update law; the unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \mathrm{sawtooth}(4\pi t)$.

Figure 6.40. Same update law; the unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.41. Update only one parameter of each matrix: $c_{13}$ and $d_{12}$ with $\eta = 20$, $\gamma = 0.2$ and $(c_{13,0}, d_{12,0}) = (0.8c_{13}^*, d_{12}^*)$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.42. Update only one parameter of each matrix: $c_{13}$ and $d_{12}$ with $\eta = 20$, $\gamma = 0.2$ and $(c_{13,0}, d_{12,0}) = (c_{13}^*, 0.0)$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.43. Same experiment with $\eta = 40$ and $(c_{13,0}, d_{12,0}) = (c_{13}^*, 0.0)$.

Figure 6.44. Phase plot of the convergence of the parameters $d_{12}$ and $c_{13}$ with $\eta = 40$, $\gamma = 0.2$ and $(c_{13,0}, d_{12,0}) = (c_{13}^*, 0.0)$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.45. Update all parameters of each matrix. $C_0 = 0.8C^*$, $D_0 = D^*$, $\eta = [10^3\ 10\ 10^2\ 10]$, $\gamma = 0.2$ and $T = 500$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.46. Update all parameters of each matrix. $C_0 = 0.8C^*$, $D_0 = D^*$, $\eta = [10^3\ 10\ 10^4\ 10^2]$, $\gamma = 0.2$ and $T = 500$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.47. Update all parameters of each matrix. $C_0 = 0.8C^*$, $D_0 = D^*$, $\eta = [10^3\ 10\ 10^4\ 10^2]$, $\gamma = 0.2$ and $T = 1000$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.48. Update all parameters of each matrix. $C_0 = 0.8C^*$, $D_0 = 0$, $\eta = [10^3\ 10\ 10^4\ 10^2]$, $\gamma = 0.2$ and $T = 1000$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.49. Sequential update of all parameters. $C_0 = 0.8C^*$, $D_0 = 0$, $\eta = [1000\ 10\ 100\ 10]$, $\gamma = 0.2$ and $T = 10000$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.50. Sequential update with $\eta = [1000\ 10\ 500\ 10]$, $\gamma = 0.2$ and $T = 10000$; otherwise as in Figure 6.49.

Figure 6.51. Sequential update with $\eta = [200\ 10\ 200\ 10]$, $\gamma = 0.2$ and $T = 10000$; otherwise as in Figure 6.49.

Figure 6.52. Unknown sources, environment output and network output at the initial time $t = 0$ for the simulation of Figure 6.51.

Figure 6.53. Unknown sources, environment output and network output at the final time $t = T$ for the simulation of Figure 6.51.

Figure 6.54. Test case: unknown sources, environment output and network output at the final time $t = T$ for the converged network of Figure 6.51. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \mathrm{sawtooth}(4\pi t)$.

6.6 Time and Weight Scaling

It has been observed that the algorithm fails to converge when the parameters of the $C$ matrix are too large. This is not a limitation of the algorithm, but rather a limitation of the computer implementation. Thus, a scaling of these parameters is proposed in order to make them comparable in order of magnitude and to bring them into a range that is acceptable for hardware implementation.

6.6.1 Mathematical Development

Consider a single-input single-output dynamical system described by its transfer function

$$H(s) = \frac{\beta_1 s^{n-1} + \beta_2 s^{n-2} + \cdots + \beta_{n-1} s + \beta_n}{s^n + a_1 s^{n-1} + a_2 s^{n-2} + \cdots + a_{n-1} s + a_n} + d$$

and its canonical controllable realization $\mathcal{D} = (A, \mathbf{b}, \mathbf{c}^T, d)$. Such a realization was presented in Chapter 5 to be of the form

$$\dot{\mathbf{x}} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}u$$

The diagonal transformation $T_M$ is defined as

$$T_M = \kappa\,\mathrm{diag}\left(\epsilon^{n-1}, \epsilon^{n-2}, \ldots, \epsilon, 1\right)$$

Under the transformation $T_M$, the realization of $H(s)$ in the new coordinates is $\tilde{\mathcal{D}} = (\tilde{A}, \tilde{\mathbf{b}}, \tilde{\mathbf{c}}^T, \tilde{d})$, defined as

$$\tilde{A} = T_M A T_M^{-1} = \begin{bmatrix} 0 & \epsilon & 0 & \cdots & 0 \\ 0 & 0 & \epsilon & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \epsilon \\ -\tilde{a}_n & -\tilde{a}_{n-1} & -\tilde{a}_{n-2} & \cdots & -\tilde{a}_1 \end{bmatrix}, \qquad \tilde{\mathbf{b}} = T_M \mathbf{b} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \kappa \end{bmatrix}$$

$$\tilde{\mathbf{c}}^T = \mathbf{c}^T T_M^{-1} = \begin{bmatrix} \tilde{\beta}_n & \tilde{\beta}_{n-1} & \cdots & \tilde{\beta}_1 \end{bmatrix}, \qquad \tilde{d} = d$$

where

$$\tilde{a}_i = \frac{a_i}{\epsilon^{\,i-1}}, \qquad \tilde{\beta}_i = \frac{\beta_i}{\kappa\,\epsilon^{\,i-1}}, \qquad i = 1, \ldots, n$$

One observes that, under such a transformation,

- the coefficients of $\tilde{A}$ and $\tilde{\mathbf{c}}$ are both inversely proportional to some power of $\epsilon$;
- the nonzero parameter of $\tilde{\mathbf{b}}$ is proportional to $\kappa$, while those of $\tilde{\mathbf{c}}$ are inversely proportional to $\kappa$;
- the parameter $\tilde{d}$ is independent of $\epsilon$ and $\kappa$;
- the parameters of $\tilde{A}$ are independent of $\kappa$.

Using this transformation, one can scale the parameters that are to be updated up or down in order to make their numerical computation possible on digital machines. The scaling is achieved by the appropriate choice of $\epsilon$ and depends on the application at hand; for example, large denominator coefficients are scaled down by choosing $\epsilon$ larger than 1, as in the example below. There is therefore no single optimum value of $\epsilon$ for all possible environments. However, one should have some knowledge of the desired environment, such as its frequency bandwidth, and use that information to select the appropriate value of $\epsilon$. The coefficient $\kappa$ can be considered as a normalizing coefficient for the parameters of $\tilde{\mathbf{c}}$. The following example shows how the choice of $\epsilon$ can be made.

6.6.2 Example

The following example illustrates the necessity of scaling in order to make the digital implementation of the algorithm possible. Given the transfer function

$$H(s) = \frac{(s+100)(s+200)}{(s+1000)(s+2000)(s+3000)} + 0.6$$

or, in expanded form,

$$H(s) = \frac{s^2 + 3\times 10^2\, s + 2\times 10^4}{s^3 + 6\times 10^3\, s^2 + 11\times 10^6\, s + 6\times 10^9} + 0.6$$

the canonical controllable realization of $H(s)$ is

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6\times 10^9 & -11\times 10^6 & -6\times 10^3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{c}^T = \begin{bmatrix} 2\times 10^4 & 3\times 10^2 & 1 \end{bmatrix}, \quad d = 0.6$$

By considering the transformation $T_M$ with $\epsilon = 10^3$ and $\kappa = 1$, $H(s)$ can be realized by

$$\tilde{A} = \begin{bmatrix} 0 & 10^3 & 0 \\ 0 & 0 & 10^3 \\ -6\times 10^3 & -11\times 10^3 & -6\times 10^3 \end{bmatrix}, \quad \tilde{\mathbf{b}} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \tilde{\mathbf{c}}^T = \begin{bmatrix} 0.02 & 0.3 & 1 \end{bmatrix}, \quad \tilde{d} = 0.6$$

and, by considering the transformation $T_M$ with $\epsilon = 10^2$ and $\kappa = 1$, $H(s)$ can be realized by

$$\tilde{A} = \begin{bmatrix} 0 & 10^2 & 0 \\ 0 & 0 & 10^2 \\ -6\times 10^5 & -11\times 10^4 & -6\times 10^3 \end{bmatrix}, \quad \tilde{\mathbf{b}} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \tilde{\mathbf{c}}^T = \begin{bmatrix} 2 & 3 & 1 \end{bmatrix}, \quad \tilde{d} = 0.6$$

One may conclude that, in order to scale the parameters of the vector $\tilde{\mathbf{c}}$, one should choose $\epsilon$ on the order of the zeros of the transfer function, whereas $\epsilon$ should be on the order of the poles if one wishes to scale the parameters of the matrix $\tilde{A}$. While $\epsilon$ accomplishes the time scaling task, $\kappa$ performs magnitude scaling: by taking a value of $\kappa$ different from 1, one can trade magnitude between the $\tilde{\mathbf{b}}$ and $\tilde{\mathbf{c}}$ vectors.

6.7 Observations

In this chapter, a solution to the blind separation of signals with a dynamic network model was developed. The network parameters of the proposed feedback structure, with the update law defined by Theorem 2, were shown to converge to the desired values. In addition, other update laws based on the neuromimetic approach were considered in order to provide a comparative evaluation of the developed algorithm against existing ones in the literature.
Thanks to the special structure of the state space realization of the network model, an update law of the C matrix based on the decorrelation between the output vector and the state vector was developed. Also, such a representation of the network model is suitable for circuit implementation which will be presented in the following chapter. CHAPTER 7 Micro-electronic Implementation 7 .1 CMOS Building Blocks 7 .1 .1 Transistor The transistor represents the basic building block of a circuit. Designers must have a good understanding of its functionality in order to produce any circuit. Models which accurately define the characteristics of the transistor can be extremely complex and not at all possible for manual calculation. On the other hand, if a computer is employed to handle such a complex model, circuit design can become simple enough to model it. The general expression for the drain-to-source current of a transistor operation in a sub-threshold region is given below 14, = 10 cw“ (e‘v‘ — 6"“) At saturation, the drain voltage Vd is much higher than the gate voltage V}. Therefore, the above expression of drain-to-source current at saturation is Ida = 10 envy—V, 211 212 7 .1.2 Current mirror In circuit design, it is often necessary to have a copy of a current to perform different operations on that particular current. Thus, a circuit that provides a replica of a current is needed. The circuit shown in Figure 7.1 performs such a function. Both n-channel and p—channel current mirrors are shown. (a) (b) Figure 7.1. Current Mirror (a) n-channel, (b) p-channel 7 .1.3 Transconductance Amplifier Figure 7.2(a) represents the circuit diagram of a transconductance amplifier. The circuit will exhibit a sigmoidal function of the differential voltage V1 and V2 presented at the gates of the transistors M1 and M2 , respectively. The circuit is being driven by a biasing current 11, which is controlled by the voltage Vb applied to the transistor Mb. 1.... = I. tanh g—(V. — v2) (7.1) 213 where [0 = 1010 eKVb (7.2) w is the width to length ratio of the biasing transistor Mb. For small-signal analysis the current-voltage relationship of a transconductance amplifier described by equation (7.1) is simply Iout = ng/r (7.3) where V,-,, = V1 - V2 and gm is the transconductance defined as _ 610111 16 9m - av... 'Vm=° = 507023 (7“) It is important that the transconductance depends on the biasing voltage Vb. This pa- (a) Original Transconductance Amplifier (b) Improved Transconductance Amplifier Figure 7.2. Transconductance Design 214 rameter will be the key to adjusting the behavior of circuits that include the transcon- ductance amplifiers as will be shown in the next section. In [69], the author proposed an improved design of the transconductance amplifier as shown in Figure 7.2(b). This design enables the transconductance amplifier to increase its dynamic range. Circuit simulations of this design are shown in Figure 7.3. Figure 7.3(a) shows the output current of the transconductance amplifier for different width to length ratios of the biasing transistor Mb, while Figure 7.3(b) shows the same response of the amplifier for different biasing voltages V2,. m .1111. um 1.1m- (a) % = 2, 3, 4, 5, 8 (b) Vb =. 625, 250, 675, 700, 725mV Figure 7.3. PSPICE simulation of a transconductance amplifier 7 .1.4 Hyperbolic Sine Function A circuit that exhibits the sine hyperbolic function is shown in Figure 7.4. In [48], it 215 Figure 7.4. 
Circuit Realization of tan and sine hyperbolic functions was derived that the output current of this circuit can be expressed as [out = 2one"(V"'6V) sinh 10V where V1, is the biasing voltage. A PSPICE simulation of the circuit is shown in Figure 7.5 7.1.5 Gilbert Multiplier A circuit that produces the product of two voltages is needed in order to perform the implementation of algorithms that involves such an operation. In [70], the author presented the 1-dimensional multiplier as shown in Figure 7.6. It is now known in literature as the Gilbert multiplier. The output current of this circuit is 1: x 101“: I], tanh 5(I/1 — I/g) tanh '2-(I/3 -' I4) (7.5) For small-signal analysis, the current described by equation (7.5) can be expressed as 10111 0< 11(V1 - V2)(V3 - V11) (7.6) 216 “some mu. buffi- l‘. WW" 11.11 t: 1mm: 11.0 ,u'.. ...., . . .................. n-I.fl.floII'.‘-Il I)" I!" I.“ 1.6!? I.“ 1.!" I.“ I1” I." Oman-l JED]! I‘ll 05111.njh am]! 11 Figure 7.5. PSPICE simulation of the sine hyperbolic circuit for different biasing voltages Vb PSPICE simulation of the circuit for various biasing voltages is presented in Figure 7.7. Observe the linear range for which the current is linear. It is desirable that one defines the range of operation to be within that linear range. 7 .1 .6 Vector Multipliers Using the design of a single dimensional Gilbert multiplier, one may design the multi- dimensional multiplier. Such a design is shown in the Figure 7.8easily. In this case, the output current is 1..., = Ithanh g(x, — tad) tanh 125(11- — V..,) (7.7) For small-signal analysis, the current expressed by equation (7.7) is approximated by 1out 0( IbH(Xi - ‘40fo - I/ref) (7-8) 217 Figure 7.6. l-dimensional Gilbert multiplier mm- 1.. am»: 01:11.10 " I m. 11.- my Figure 7.7. PSPICE simulation of of Modified Gilbert Multiplier using different bi- asing voltages 218 Thus, this circuit realizes the dot product of two n-dimensional vectors X and Y and gives such a value as a current. If one wishes to convert such a quantity to a voltage, then s/he can utilize the linear resistors to do so. 1%. Li .. y i i" Figure 7.8. n-dimensional Gilbert multiplier 7 .1.7 Current to Voltage Converter All the basic analog components that have been presented so far produce a current output function. If one wishes to convert the current quantity to a voltage quantity, he or she can rely on the linear resistors. Linear resistor can be used to realize the Conversion of a current to a voltage. A circuit realization of such a block is shown in Figure 7.9. 219 out Figure 7.9. Current to Voltage Converter 7 .1.8 Second Order Section The second order section is a circuit that can be used to generate a response that Can be represented by two poles in the complex plane. By adjusting its parameters, one can position the locations of these poles anywhere in the complex plane. A realization of such a circuit was presented by Mead in [71] and is shown in Figure 7.10. By performing a small-signal analysis, one can derive the transfer function of such a circuit. The transconductance amplifier A2 is a follower integrator and is described by the following equation 02132 2 92(02 —' 01) (7.9) At the node (1), the current generated by the transconductance amplifiers A1 and A3 220 ' A. v, + u V... 2: CI Figure 7.10. 
Second Order Section Circuit will charge or discharge the capacitor C1 C1131 = g1(u — v1)+ 93(01— 02) (7.10) By taking the Laplace transform of equations (7.9) and (7.10) 3021/2 = 92(1/2 — 1/1) (7.11) sC'lVl = 91(U - V1) + 93(V1- V2) (7-12) and combining these two equations, the following relationship relating the output V2 221 to the input 11 can be obtained Xe. = 9. U (301 + 91 — 93)(1+ 372) + g3 _ 91 — 720182 + (C1 + 7291 — 72908 + 91 l = 7.13 1372.32 + (7'1 + 1'2)(1 — a)s +1 ( ) where 01‘ T, = __ 7.14 g.- ( ) and a = TI 33 (7.15) 7’1 + 7'2 .91 In [71], the authors concentrated their analysis on the case when C1 = C2 = C. In this particular case, the expression for a in (7.15) is, therefore, reduced to “=gfg gm) 1 2 and the transfer function described by (7.13) is expressed as V; 1 U - 7232 + 27(1- a)s +1 (7.17) Figure 7.11 shows the frequency response of the second order section for different values of a. Based on the small-signal analysis, the second order section is stable for any a satisfying 0 < a < 1. Also, in the case where a = 1, the poles are purely imaginary and the circuit is unstable. However, a large-signal analysis for the circuit developed experimentally by Watts [69] shows that the second order section can be unstable 222 Ice-l au- 80m- bum- M. “‘11!“ 31.00.)! m: "I ”UVY 1 MU“. . ”M1 LIX III 1m Ln! m: 9'11}! 0'1”! l'tIJI‘V101071UI Figure 7.11. PSPICE Simulations of Second Order Section for V1,, = 625, 650, 675, 700, 725mV within the specified range of stability based on the small-signal analysis. It was proven that this nonlinear circuit exhibits some instabilities. Therefore, an improved design of the second order section was developed based on the improved design of the transconductance amplifier that is shown in Figure 7.2(b). 7 .1.9 Second Order Filter Design The design of the second order section developed by Mead and Watts birthed the inspiration to develop the circuit realization of the state equation for the canonical controllable state representation of a dynamical system. To start, a circuit realization for a 2-dimensional system using the second order section will be developed. Then, the development of the circuit realization for higher dimensions will be straight forward. The second order section uses the transconductance amplifier A1 as an integrator follower and thus it realizes a low pass filter. However, here a pure integrator for A1 will be implemented. Therefore, Al was modified to produce a pure integrator circuit ‘1 223 instead. 7.1.10 Circuit Implementation of Second Order Filter Consider the second order filter described by the transfer function H(s) = :8) = 52 + ablls + (1; (7.18) The canonical controllable state representation of H(s) is 231 = :02 (7.19) :02 = —a2x1 — (11172 + 0111 (7.20) and the output equation is y = 2:1 (7.21) It will be shown that a circuit realization of the filter described by equation (7.18) is shown in Figure 7.12. A1 is a pure integrator. Therefore 01131 = 91(01 - Vrcf) (7.22) At node (2), Kirchoff’s current law suggests that 02132 = 92(0 — '02) + 9304.; " ”1) . (7-23) Let 2; = v.- - V"; and 0 = u — ‘43". Therefore, equations (7.22) and (7.22) become 012.1 = 9122 0252 = 92(11 - 22) - 9321) 224 Figure 7.12. Filter Design. Circuit realization of controllable canonical form of a second order filter which give us the generalized canonical controllable realization defined by a, = 3122 (7.24) 2. = was — £122. 
+ £12 (7.25) The canonical controllable realization is obtained by using the transformation T9,, 1 defined by :01 = 21 and 2:2 = fizz. Therefore 271 = 1132 (7.26) 1 1 l 232 = ———$1 — —$2 + —‘II (7.27) T173 T2 T172 where Cl C: 02 T] = — T2 = — T3 = — 91 92 93 225 Equations (7.26) and (7.27) describe the canonical controllable of (7.18) where 1 1 1 02 = — a] = — bl = _— T1T3 T2 717'3 It is important to note that the developed second order filter described by equation (7.25) is always stable since the coefficients a1 and a; are positive. In circuit imple— mentation, it is often assumed that all the capacitors C,- are equal to C, which is in the order of a few pico F ayrads. The characteristics of the second order filter, i.e. its pole locations and gain, are controlled by the biasing voltages of the transconductance amplifiers shown in Figure 7.12. Figure 7.13 shows the transistor level design of the filter. PSPICE simulations of this circuit were performed. Figures 7.14(a) and 7.l4(b) show the frequency response of the filter for various biasing voltages of transconduc- tance A2 and A3, respectively. Observe how changing V1,, and V1,, impacts the gain as well as the pole locations of the filter. Having developed the circuit realization of a second order filter, that of an n-dimensional filter becomes straight forward as it is shown in the next section. 7.1.11 Circuit Realization for n-dimensional Filter Figure 7.15 represents the circuit realization of an n-dimensional filter described by the equation _ Y(8) _ b1 - U(s) — s" + ale"-l + - - - + ands + an H (s) (7.28) By looking at Figure 7.15, A,- is a pure integrator. Therefore, 0.13; = 91(vi+1 - Vrcf) (7.29) 226 Figure 7.13. Filter Design. Circuit realization of controllable canonical form of a second order filter mm 1-1 10mm 11.11.31 W‘ 11.0 with rw imam 11.11.11 m ”I my _v 0" (a) Vb, = 650, 675, 700, 725mV (b) Vb, = 650, 675, 700, 725mV Figure 7.14. PSPICE Simulations of second order filter 227 At node (vn), Kirchoff’s current law suggests that n-l Cnén = Z 550/”; - v.) + gn(u - vn) (7.30) i=1 Let 2.- = v,- — 14,; and i1 = u — va. Therefore. equations (7.29) and (7.30) become A.‘ ‘7» __EF" 1 A." ‘7. _31" 1 A: ’ _- - vbl A. u .. V +A, ._ .. .. -- v _l - v,. v. ’ '- L. c v. == C» v __ c. V H -r- 1,— . ' =4,— Figure 7.15. Filter Design, Circuit realization of a controllable canonical form of a n“ order filter Cd; = gg25+1 i=1,---,n—1 n-l Cnén = — Z 3523 + git“; — 2") i=1 228 which gives us the generalized canonical controllable realization defined by i,- = %z,~+1 i=1,---,n—1 (7.31) 2,, — —%zl—...—g—gizn_l—-.-—%—"—zn+%’ia (7.32) Based on the developed small-signal analysis, one can use the Routh Hurwitz criteria to design the necessary parameters to obtain a stable filter. Define T.- = 1'} f.- = 11, Vi = 1,--- n -1 and in = Tn. Therefore e uations C.’ C" a , q (7.31) and (7.32) become 2.; = T;Z,'+1 2: 1,° H ,n —1 (7.33) £7; = —7_-121 _ ° ° ° — fn-lZn-l — ° ° ° — fnZn + Tna (7.34) In matrix form Azennxn b36Rn :- II rp 1i - n I- d 2.1 0 T1 0 ' ' ' 0 21 0 :22 O 0 12 . - - 0 22 0 = + 11 if 0 0 0 0 Tn...1 Z.‘ 0 in .4 "fl ' ' ' ' ' ' —fn—l -731 Zn Tn Define the transformation T as ’1 0 0 i 0 0 0 T: i 0 $2113- 0 0 0 0 _0 0 0 11;:qu 229 Therefore r 0 1 0 0 l 0 O 0 1 0 0 A=TA,T“= 5 5 b=sz= 0 0 0 0 1 0 —a.. -a1 b1 where 75111.7. 
"-1 ai=71n—i+1'—,.J:‘i——f=7‘n-i+1 H “G j=n—i J ‘ j=n—i+l and 71—1 11 b1 =7'n H73: H13 j=l j=l 7 .2 Circuit Realization Having introduced the library of basic building blOcks for analog CMOS implemen- tations, it is possible to develop a circuit realization of the network and the update laws. 7.2.1 Implementation of the output equation The output equation 31; = 8i - 63X; - dgTy (7.35) is simplified to y; = e.- - étTi. - 3T5! (7.36) 230 where z.- is a vector having one dimension less than 2 and is obtained by eliminating the 2"” component of 2. For the algorithm development, equation (7.35) was used rather than (7.36) in order to create a matching in the dimensions for clarity. It should be recalled that m.- = 0, C.'.' = 0 and d,-.- = 0. Equation (7.36) is realized by using a transconductance amplifier to convert the input voltage e.- into a current. In addition, two multi-dimensional vector multipliers are used to compute the product 6,732.- and dTS'. The currents coming out of the vector multipliers are subtracted from that of the transconductance amplifier to obtain the y.- in current form. Then, a linear resistor is used to convert the current into a voltage. Figure 7.16 shows the schematic of circuit realization. ._, c-jyfli c: X ’ [1]—W- d. 9 X Figure 7.16. Circuit Realization of the output equation 231 7 .2.2 Implementation of the state equation The state equation is described by the transfer function II r q |- -I P q 233;) o 1 0 0 35,1.) 0 if? 0 0 1 0 .53) 0 = i E E + E u,(7.37) 233“?“ 0 0 0 0 1 mfg-“1"“ 0 55:0) . __aff-‘fl _ag)‘ . 1390') ‘ _flij‘ The circuit implementation of this transfer function was completed in the previous section. Figure 7.15 shows the schematic for a filter of order n. Thus, to implement the equation (7.37) one needs to designs a pgjth order filter. In this case I: Any-1 (151') = ij..k+1 H T] (7.38) l=u5j-k+l and I‘ij fiij = H T] (7'39) (=1 7 .2.3 Implementation of the C update equation The update law of the parameters of the C matrix is described by (EV-c) = nff’ygxli) ' Vj 75 2' and k = 1, ~ - - , p,,- (7.40) ’J ‘1 Equation (7.40) is implemented using a 1-dimensional gilbert multiplier. The output current of the multiplier is dumped to a capacitor Cg“) as shown in Figure 7.17. In this case (1‘) _ V ._ V (1.) tanth tanhxgi—"L (7.41) I: .(l‘) 'l") = C c I 2 2 t1 31' b 232 Therefore éif) = 773:) (935:) - Vrcf) (yi - Vrcf) (7-42) where (’F) = 155;) (fi)2 ”'1 0;) 2kT (7.43) Thus, the leaning rate of such parameter is controlled the bias voltage of the multi- plier, VCon 0 V... Xk —.. k I X C. Y: ——-I vc‘, Figure 7.17. Circuit Realization the update equation of the C matrix parameters 7 .2.4 Implementation of the D update equation The update law of the parameters of the C matrix is described by (£3 = 751' sinh y,- tanh yj Vj 7Q i (7.44) Equation (7.44) is implemented using the sine hyperbolic circuit and the transcon- ductance amplifier that were described in the previous section. Figure 7.18 shows the 233 circuit implementation. The algebraic sum of all the current at the (+) node of the capacitor 0.3 is sinh K Iout = Iout tanh g(yj _ Vrc'f) (7'45) where I ”M = 2one"(V"+6V) sinh x(y; — an) (7.46) out The current described by equation (7.45) is dumped to the capacitor C5,. Therefore, when equations (7.45) and (7.46) are combined together, one obtains d..- = ‘m sinh x(y. - v...) tanh g(y. — v...) (7.47) where 2w] eK(VDb.-j+6V) 70' = o 0.. (7.48) It is important to observe the parameters that the the learning rate 7.3 depends on. 
They are the biasing voltage VDbU; the ratio of width of the biasing transistor to the width of the arm transistors w; and the offset voltage 6V. In the circuit implementation, all these parameters will be held constant except the bias gate voltage VD], w which will control the learning rate of the parameter dij. 7 .3 Circuit Simulation In this section, the circuit simulation of a 2 x 2 network is performed using the circuit simulator PSPICE. The block diagram of the overall circuit including network and learning is shown in Figure 7.19. V. Figure 7.18. Circuit Realization of tan and sine hyperbolic functions The unknown signals are two sine waveforms with respective frequencies 1kHz and 2kHz as shown in Figure 7.20. These two signals are filtered and mixed in order to obtain the signals e1 (t) and 82(t) which are the output signals of the environment as presented in Figure 7.21. The transfer functions Hub) and 1.1721(3), shown in Figure 7.21, are two second order filters whose state and output equations are respectively defined by equations (7.37) and (7.35), with the exception that d5,- = l in this case. ii: = A.,-x.~,-+3.~,u,- (7-49) e, = Eng5j + u.- + Jgj‘uj (7.50) The circuit implementation of equation (7.49) is shown in Figure 7.15. However, the circuit implementation of equation (7.50) is obtained in a similar fashion as that of the output equation of the network model described by equation (7.35) whose circuit implementation is shown in Figure 7.16. The parameters of the matrix 1,-1- are defined by the biasing voltages of the transconductance amplifiers and the capacitors shown in Figure 7.15. The parameters of the column vector 6.3 and the parameter 3;, are defined by the voltages at the cor- 235 Order Filter Update y] 4—dimemional Cl __,. Veda, 4' yl Multiplier d21 yl 4—dimensionnl 32 __.. we.“ #- Y2 Multiplier x2 02 Updue C Y2 Filter Figure 7.19. Block diagram of overall circuit including network and learning 236 Date/flu m: Iliad/DI 17.10.11 1'... oooooo 21.0 ZJSVT' * ------------------------------------------------------------------------------------ 1 . l 2.50V, 1.4! --------------------------------------------------------------------------- VL‘IJ) 2.55 -------------------------------------------------------------------------------------- 1 z.sov. 2.4SVOH- v --------------- v v “‘i o. 1.0- 2.“ Lu- l.“ the 0V“!!! Figure 7.20. Unknown sources 810) fin“) ‘ 82(t)l—-. H2] (S) Figure 7.21. Environment Circuit Model 237 Table 7.1. Parameters of the Environment Circuit Realization [ A12 I A21 1 V,1 0.69V v; 0.69V V,2 0.69V V,2 0.69V V3 0.71V V3 0.71v Cl 5pF Cl 4pF Cg 5pF Cg 4pF ; 512 521 V5,, 2.60V V5,, 2.65V V5,, 2.40v V5,, 2.45v d1 d2 VJ“ 2.70V 14,-” 2.56V VJ” 2.54v VJ” 2.70V responding gates of the transistor at the positive arm of the gilbert multiplier. Table 7.3 shows all these parameters. A PSPICE simulation of the frequency response of the filters 312(3) and {121(3) was conducted. The results of this simulation are shown in Figure 7.22. Observe that the cutoff frequencies of the filters H12 and H21 are within the 1kH z and the 5kHz range, which is the operating frequency range. The transient response of these filters is shown in Figure 7.23 where the input signals are the two sine waveforms shown in Figure 7.20. The output of the environment is fed to the transconductance amplifier as shown in Figure 7.16. 
Unlike the environment circuit model whose parameters (3.-j and d,- are constant and are defined in Table 7.3, the parameters (3.-,- and d,- of the network model are controlled by the ‘ time-varying charges, accross the capacitors shown in figures 7.17 and 7.18, defined by equations (7.42) and (7.47). The initial voltages across these capacitors are shown in Table 7.3. Figure 7.24 shows the results of 238 and cm: 'qu Deco/1':- m- 1112)". 10:31:41 Menu-I 37.0 Io? . can r -------------------- ~ --------------------- ' -------------------- ~ --------------------- . -------------------- 1 I : I 100.0” ‘ «.10.; L --------------------------------------------------------------------------------------------------------- 4 unusual”) «autumn». alumnae); -xmma.nu oxncxnaann animxnu ormmcann- manta“) ‘.M1r-...-.OOOQCO-'G-¢.--1..‘OQOOQOOOCO ’M4 . . . . . I . Ion-r III-r ‘m1 I I 'lvl-l . . ...... I I ...... . . soc-r Cmi 10” ~ L ; L 1 . III III- 10.: x . on: was 10... alumnus) dualism“) OXDIIIIZ.I13I «aunt-u). ”(I‘LIHI annual“: «Dunstan: Julianna) m Into/fl- “: 13133106 Ohio-I! Macao- 87.0 1.”? ----------------------------------------------------- ‘ """""""""""""""""""""""""" '1 ‘0--. ........................................................................................................ I O“ ' Brownian!) -n(m1.n4|0 :mmaann -!D(m3.lllb Ou|m3.nii downturn) donut-u)- IDIIIQO-Ill) 5°” '1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I .1 p i V“""""""""“""""““"' . 10!“ 1m- 1. . In: to”. soon: finin‘n.ln| -xmnn.lnu .xmma.nn annuals“. ”(ELEM -!DIIID.III) ammonia) annual“) M 32' Figure 7.22. Frequency Response of the Environment Circuit Realization 239 Dace/Tau rum 18/23/9‘ 10:31:49 - MOIOCUOI 27.0 SOIIAT ------------------------------------------------------------------------------------- 1 . 0 0A”. : I _sm .5 .............................................................................................................. ormmiunn -ID(XD1.IJ‘I ormxnaaan -1Dlm2.mflo xmxnaann -Iblm).lll) OID(XDC.I11) quantum) SMT """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" 1 o. 1.0.. a.” 1.0- I.“ the 02041311.!3” actual“) oxmxnaann -rmma.uu ormmzann -!D(m3.lld) othXlltllll -xn¢nu.mu Tl— Figure 7.23. Transient Response of The Environment Circuit Realization the network performance for the first 50msec. Observe that all the parameters have converged to the values as shown in Table 7.3. By fixing the parameters of the network to these final values, the network reproduced the original signals as shown in Figure 7.25. The transistor PSPICE model and the PSPICE source code used to simulate the developed circuit implementation is given in Appendix D. In this chapter, a subthreshold circuit implementation of the dynamic feedback network and the learning algorithm was developed. The performance analysis of circuit realization was conducted using two prototype signals as the unknown sources. It was observed that the network parameters converge to a separating solution and thus recover the original unknown sources. 240 Table 7.2. Parameters of the Environment Circuit Realization Parameter Initial Final d12 1.0V 4.80V d21 4.0V 4.80V €11 1.0V 2.31V cm 4.0V 2.20V €21 1.0V 2.73V €22 4.0V 2.36V Date/1‘1. MI 11/34/06 O‘IIIIIOI WI 37.. S.W1 - . ....................................... f I I I C . . 1 V O. 
u 9'" ‘Ifi\‘r‘ 'l-A‘ 'fi' Hflw ' ' ' V r I f V” ‘ V r V ................... 0. 1b 30- )u d- 3” OVIIJZ) .VICZI) OVISOI) 'VIJOZ) .VICCI) ’VIO“) Figure 7.24. Parameters’ Convergence of the Network Circuit Realization 241 0.0““ 036.: filter Date/ft- MI 11/20/06 21:01.07 Mot-tum 37.0 a-WT oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo ‘ . . . . . : Figure 7.25. Output of the Network Circuit Realization at Convergence CHAPTER 8 Conclusion In this work, a novel algorithm for the blind separation of signals in static environment based on the decorrelation condition of the output signals was developed. It was then improved to test for independence. An energy function for the problem based on the minimization of the mutual information functional was also developed. A fourth order Edgeworth expansion was used in order to find an approximate expression of the probability density function. The new energy function considered a larger set of signals as compared to the existing energy function, since the assumption of unit variance of the output was not considered. Also, the energy function represented a better estimate of the mutual information functional. A mathematical framework for the development of update laws for the network parameters based on the theory of adaptive optimal control theory was also developed. A realization of the environment that represents the least number of parameters was also introduced. A forward and an feedback structure were considered to model the environment and hence the network. Computer simulations showed that the update of the parameters of the matrix D converge for both structures. However, successful learning of the C matrix was realized only for the feedback structure. The coupling of both learning rules was eliminated by considering a sequential update method. Basic building blocks for the circuit implementation of an algorithm in the dynamic feedback 242 243 network model were developed. Overall circuit implementation of the algorithm was also developed and tested for prototype waveforms. APPENDICES APPENDIX A Definitions A.1 Statistical Definitions Definition 4 (Entropy) Entropy is a measure of uncertainty of the occurrence of an event in an ensemble of experiments described by the random variable (rv) x [72]. It is defined as: H(X) = -E{1n fx(X)} where fx(x) is the probability density function (pdf) of the rv x. Definition 5 (Joint Entropy) The joint entropy of two random variables x and y is [72]: H (X.y) = -E{lnfxy(X.y)} where fxy(x,y) is the joint pdf of the rv’s x and y Definition 6 (Mutual Information) The mutual information among the compo- nents of a random vector x is defined as [72]: '_ n fX(x) I(x)—E{l ___lli “(x0 (A.1) 244 245 . In the above definitions, E {} denotes the expected value and is defined as: Definition 7 (Expected Value) The expected value of a function of an rv x is defined as [72]: was )}= [_°° g((x)f. xdx) Definition 8 (Conditional Probability Denesity) f(X.y) = f (XIy)f(y) = f (yIX)f(X) Proposition 1 (Properties of Entropy) P1 H(x) Z 0 P2 H(XJ) = H(X) + H(yIX) = I160+ H(XIy) P3 H(X.Y)SH(X)+H(y) P4 H(XIY) S 1100 P5 H(x1,x2,-~ ,xn) = H(x1)+ H(x2lx1) +H(x3|x2,x1) + - - - + H(xnlxn-1, - - - ,xl) P6 H(x1,x2,-~,x,,) S H(x1)+ H(xz) +--- +H(x,,) P7 Equality in P6 will hold if x; ’s are statistically independent. 
Proof of P7 In this case, by the definition of statistical independence, one has fx(x) 2 1'1,- f,;,(x,-) "Eiln fx(X)} = ”Eilanx.(xi)l = ‘ZEilnfx.(xi)} H(xlix2i ' ' ' ix") 246 = ZH(X.‘) A.2 Matrix Notations Definition 9 A permutation matrix is a square matrix whose rows and columns con- tain only one nonzero entry that is equal to 1. Definition 10 A generalized permutation matrix is a square matrix whose rows and columns contain only one nonzero entry. APPENDIX B Proofs B.1 Proof of Theorem 3.5 The proof of this theorem is conducted by induction. However, the following theorem is needed. Theorem 3 Let A be a positive definite n X n matrix, and B be the (n +1) x (n +1) matrix Then (i) [B] S alAl, with equality if and only ifb = 0; (ii) B is positive definite if and only if IBI > 0 Proof of theorem 3: Define the (n + 1) x (n + 1) matrix 1,. —A-1b 0T 1 247 248 Then T 0 P BP = 0T 0 — bTA‘lb so that [B] = IPTBPI = IAI I (a — bTA'lb) A is positive definite. Then, A‘1 is positive definite. Therefore, bTA‘lb > 0,Vb. Consequently, we prove (i). To prove (ii), we observe that |B| > 0 if and only if a -— bTA'lb > 0, which is the case if and only if PTBP is positive definite. Since P is positive definite, so is B. D Proof of theorem 1: As stated earlier, the proof will be conducted by induction. If M is a positive scalar, then equation (3.5) is always true. Now, we will assume that the equation (3.5) true for matrix M of rank 12. Theorem 3 shows that equation (3.5) is also true for (n + 1). Cl B.2 CMOS Circuit Function Derivation Derivation of The 'h'ansconductance Amplifier Function Figure 7.2(a) represents the circuit diagram of a transconductance amplifier. The circuit will exhibit a sigmoidal function of the differential voltage V1 and V2 presented at the gates of the transistors M1 and M2, respectively. The circuit is being driven by a biasing current 15 which is controlled by the voltage Vb applied to the transistor Mb. Using KCL at node (1), we have: 11+12=Ib 249 where I] = 10 eKVI—VI 12 = Io enV2—V, 15 = on e‘V" (l — e’V’) Solving for e'V‘ to obtain: V w eKVb Substituting the above expression in the expressions of currents 11 and 12, one obtains: V e" 1 _ "Vb NV: V, e 12 = on e" 85V] +enV2 +108an To acquire the difference of the two currents, a current mirror circuit is employed. The current mirror circuit produces a replica of the current 11 flowing through M4. Thus, 10.: = 11 - 12 K V1 RV; C ‘- C = 21210 e‘v" Multiply numerator and denominator by e"(V1+V2)/ 2 eIt(V1 -V2)/2 _ e-K(V1-V2)/2 en(Vl-Vg)/2 + ‘3'”‘(VI-Vzl/2 + weK(Vb-(V1+V2)/2) Vb Iout = on 6" However, V1+V2 2 Vb<< 250 Therefore, eK(V1-V2)/2 _ e-K(V1-V2)/2 Iout = on er: b e"(Vi-V2)/2 + 3"‘(V1-Vzl/2 which is [out = on e‘V" tanhn(V1 — V2)/2 Derivation of Hyperbolic Sine Function A circuit that exhibits the sine hyperbolic function is shown in figure 7.4. The output current is expressed as [out = 1+ - I“ where I+ = h(V,6V) and 1+ = h(—6V, V) Now, it is desired to deternime the function h(V1, V2). 11 = IoC‘VI-V’ [2 = Ioean-V, 1;, = one"V’(l — ev’) Using KCL at node V,,: h+h=h+Ah 251 Substituting for all expressions and solve for e'v’ to obtain: KVb —v, w e = (1 — A)e"V1 + e""2 + w eKVb 6 Therefore with unity feedback (A = 1): K V1 8 Vb e 65V; + w 65V), I} = 1.0108 Multiplying numerator and denominator by e"V2 to obtain: x(V1 —V2) KVb e 11 = 1.0106 1+ w eK(Vb"’V2) However, Vb << V2. Thus, e"(V°’V’) << 1. Consequently, I = h(Vla V2) = 11 = one‘V" e"(V“V’) Coming back to find the output current of figure 7.4: I... 
h(V,6V) — h(—6V, V) one ..v, [adv-(W) _ e..(—asv-V)] x(Vb—JV) [65V _ -..v] = one e = 2one"(V°‘6V) sinth APPENDIX C Matlab C.1 Derivation of fa Consider the assumption that the output signals have zero mean. cumulants are expressed as follows: N2i = #2.‘=E[yil Kai = flair-Elyil N4; = #4.‘ - 31“; = Ell/fl “ 3Elyil2 Express the approximation of the entropy as 2881n 21r — 288K2,‘ - 48kg,- — 12x3,- + 12%,...- "‘ = 576 +1216}, — 17045;"; — 496mg, — 213x;- 576 Compute 91- _ 2a K2; _ 576 a _ _96531' + 2453;54" '— 3408fl3if€3i "" 1984,03,: K3.‘ — 576 252 Therefore, the 253 8H,- —24I€4i + 12mg,- + 36K},- — 3408K§i54g — 85253,- 541 576 Compute also 8’62; 6 6 — ' = — .2 = . . awij — 8212,,- [II 2’] 6w,,- Ell/J EDI/.173] 5K3. __ a -__ a 3 __ 2 ' 6ng — awij [#3'] _ ngjEiyi] _ EBB/,- $1] 3541‘ a _. _ __ 2. = 3 . - - . . 5.0,]. - awijlu. 3112.] E[4y,x,] 12,12, 3W1] C.2 Matlab Source Code XXZ221%!XXZXZXZXXZZZZXZZXXXZZZXXZXXZZX function F 8 func(unit-var,order); 2%XXXx121222XX%%%%%%%%%%ZZZZXXXZXXXXXZ if unit-var '3 1 k3-[O 1 o o 01’; k4 . [1 o o o -3]’; p3 - [o o 3 o -3]’; p4 - [o 4 o —12 01’; else k3 - [o 1 o o 01’; k4 - [1 o o o -3]’; p3 - [o o 3 o 01’; p4 - [o 4 o o 01'; end if order =8 3 n 3 13; H3 - 96*pad(k3,n) - 24*pad(conv(k3,k4),n) + 3408*pad(conv(conv(k4,k4),k3),n) + 1984*pad(conv(conv(k3,k3),k3),n); H4 I 24*pad(k4,n) - 12*pad(conv(k3,k3),n) + 3408*pad(conv(conv(k3,k3),k4),n) - 36*pad(conv(k4,k4),n) + 852*pad(conv(conv(k4,k4),k4),n); H3 8 conv(H3,p3); H4 I conv(H4,p4); c 8 576; elseif order =3 2 n 3 9; H3 I -8*pad(k3,n) + 60*pad(conv(k3,k4),n); H4 - -2*pad(k4,n) + 30*pad(conv(k3,k3),n) + 9*pad(conv(k4,k4),n); 254 H3 8 conv(H3,p3); H4 8 conv(H4,p4); c t 48; else fprintf(’\n No order is defined ..\n’); break; end if unit_var =3 1 H2 8 c*pad([1 O],length(H3)); else H2 - pad(0,1ength(H3)); end F ' (H2 + H3 + H4)/c; XXXXXXXXXXXXZXXXZXXZZXXXXXXXXXZXXXXXXX function y = pad(x,n); 11111111211ZZXXXXXZXXZXXXXZXXXZXXXZZXX y 8 zeroan,1); x I x(z); m 3 length(x); y(n-m+1:n) = x; APPENDIX D PSPICE Files - D.1 Circuit File 2 by 2 Separation Network Circuit ***************** * Apply Voltages ***************** Vdd 80 0 5.0V V38 70 0 0.0V Vref 3 0 2.50 meul 4 0 0.70 VbIV 5 0 10 Vdp 6 0 2.8 Vdm 7 0 2.2 V31 11 0 sin(2.5 0.1 1k) V32 12 0 sin(2.5 0.1 2k) de11 121 0 2.70 de12 122 0 2.54 Vcb11 141 0 2.60 Vcb12 142 0 2.40 de21 221 0 2.56 de22 222 0 2.70 Vcb21 241 0 2.65 Vcb22 242 0 2.45 255 *Vd11 *Vd12 *Vc11 *Vc12 *Vd21 *Vd22 *Vc21 *Vc22 Vbsosll Vbsosl2 Vbsosl3 Vbsos21 Vbsos22 Vbsos23 ****************************** * Identify Nodes ****************************** t * meul ******************** 321 322 341 342 421 422 441 442 151 152 153 251 252 253 Vref 4 s1 s2 db11 db12 xb11 xb12 cb11 cb12 d11 d12 111 0000 0000 c$<> 0 0 0 e1+ 21 e2+ 31 e1- 22 e2- 32 .70 .54 .60 .40 MMNM .56 .70 .65 .45 [ONION 0.69 0.69 0.71 0.69 0.69 0.71 3 11 12' 121 122 131 132 141 142 321 322 331 Vblsosl 151 Vb2sosl 251 Vbisos2 152 Vb2sos2 252 Vblsos3 153 Vb2sos3 253 db21 db22 xb21 xb22 cb21 cb22 d21 d22 121 256 221 222 231 232 241 242 421 422 431 257 * 112 332 x22 432 * c11 341 C21 441 * c12 342 C22 442 * y1 61 y2 62 ************************************************************ * MYSOS s2 xb12 xb11 Vblsosl Vblsos2 Vb18083 Vdd Vgnd Vref * 1 2 3 11 12 13 8 7 9 44*4444444*:s*******4444444***********************s********* XSOSEI 12 132 131 151 152 153 80 7O 3 MYSDS CE11 131 O SpF CE12 132 0 SpF ****************************************************** * Subcircuit GMULDI Vdd Vss V1 Vref V2 Vref Vb I+ I- t 8 7 1 2 3 4 5 21 22 
****************************************************** * realize e1 8 cb11*111 + cb12*xb12 + db11*31 + db12*s2 xs11 so 70 141 3 131 3 4 21 22 GMULDl x312 so 70 142 3 132 3 4 21 22 GMULDi xs13 so 70 121 3 11 3 4 21 22 GMULDI x314 so 70 122 3 12 3 4 21 22 ' GMULDI *********#**********#***************#*#***t***************** * MYSOS 31 xb22 xb21 Vblsosl Vb1sos2 Vblsos3 Vdd Vgnd Vref **************#********************************************* XSOSE2 11 232 231 251 252 253 80 70 3 MYSOS CE21 231 O 4pF C222 232 O 4pF * realize e2 8 cb21*x21 + cb22*xb22 + db21*s1 + db22*s2 XE21 80 70 241 3 231 3 4 31 32 GMULDI X322 80 70 242 3 232 3 4 31 32 GMULDi X823 80 70 221 3 11 3 4 31 32 GMULDl XE24 80 70 222 3 12 3 4 31 32 GMULDI ***********t**********t*********************************** * MYSOS y2 112 :11 Vblsosl Vbisos2 VbisosS Vdd Vgnd Vref *ttttttttttttt***#*******it*tt******#******#************** XSOSY1 62 332 331 151 152 153 80 7O 3 MYSOS CY11 331 0 5p? CY12 332 0 5p? 4*444444444444444444444444:4*:444*********¢****s****** * Subcircuit GMULDI Vdd Vss V1 Vref V2 Vref Vb I+ I- *4*44444*4444*44*4444*44444**4444444444444444444444444 * realize y1 8 e1 - c11*111 - c12*112 - d12*y2 ’ XY11 80 7O 3 341 331 3 4 21 22 GMULDI XY12 80 70 3 342 332 3 4 21 22 GMULDi XY13 80 7O 3 322 62 3 _4 21 22 GMULD1 258 *********************************III************************ * MYSOS y1 :22 x21 Vbisosi Vblsos2 Vb13083 Vdd Vgnd Vref ********************************************************** XSOSY2 11 432 431 251 252 253 80 70 3 MYSOS CY21 431 O 4pF CY22' 432 0 4p? ****************************************************** * Subcircuit GMULDI Vdd Vss V1 Vref V2 Vref Vb I+ I- * 8 7 1 2 3 4 5 21 22 ****¢***************4*************************4444:444 * realize y2 8 e2 - c21*x21 - c22*x22 - d21*y1 XY21 8O 70 3 441 431 3 4 31 32 GMULDI XY22 80 7O 3 442 432 3 ‘4 31 32 GMULD1 XY23 80 7O 3 421 61 3 4 31 32 GMULD1 4444444444444444*##4##********s*********¢*****44:44:44 * Subcircuit MYIV Vdd Vss I+ I- Vbias Vout * 8 7 1 2 4 3 4444444444444444*44*4444444444444444444444:44444444444 XIV1 80 70 21 22 5 7 MYIV XIV2 80 70 31 32 5 72 MYIV 444444444444444444:44444444444444444444444*: * Subcircuit TRANSAMl V1 V2 Vb Vout Vdd Vgnd #**************#**************************** * Use Follower XF1 71 61 4 61 80 7O TRANSAMI XF2 72 62 4 62 80 7O TRANSAMl 4:44:4444444444444444:44:: * Update equation of dij 44444444444444444444444444 * Subcircuit BLOCK Vdd Vss yi Vdm yj Vb Vdp dij Vref 1812 80 7O 61 6 62 4 7 322 3 F68 CD12 322 O SnF * Subcircuit BLOCK Vdd Vss y2 Vdm y2 Vb Vdp d21 Vref X821 80 70 62 6 61 4 7 421 3 F68 CD21 421 0 5nF ************************** * Update equations of cij ***##********************t * Subcircuit HGMUL Vdd Vss yi Vref xij Vref Vbias cij XMUL11 80 7O 61 3 331 3 4 341 HGMUL XHUL12 80 70 61 3 332 3 4 342 HGMUL XHUL21 80 70 62 3 431 3 4 441 UGMUL XMUL22 80 7O 62 3 432 3 4 442 HGMUL CC11 341 0 1nf CC12 342 O inf 259 CC21 441 0 inf CC22 442 0 1nf *4444s4ts4#44 * Simulations 44:44:4444444 .LIB comp.lib .TRAN .Sm 50m .IC V(322) ' 1 .IC V(421) 4 .IC V(341) .IC V(342) .IC v(441) .IC V(442) .OP .PROBE .END m.»4.p - D.2 Library File ************************************************ * Transistor Models ***********¢********s*****s444444444444444444444 .MODEL nmos NMDS LEVEL=2 + VT080.783736 KP-5.46E-5 GAMMA-0.5262 LAMBDA-3.533329£-2 + TOX=412E-10 CGSOI2.6695658-10 CGDO-2.6695652-1O CJ-1.134E-4 + MJ-O.708 CJSH-4.77E-1O MJSH-O.253 XJ=0.1500U + TPG=1.0000 LDIO.212340U NSUB-5.86OE+15 NFS=9.54427OE+11 + NEFF=1.0 NSS=1.0000E+12 DELTA-1.99612 
VHAX-57874.1 + U0-651 UEXP-O.177364 UCRIT830664.6 PB-O.800 PHI-0.6 + RSH=33.4OO CGBD-4.250255E-10 * .MDDEL pmos PMOS LEVEL-2 + VT08-0.807 KP-2.13E-5 GAMMA-0.5644 LAMBDA-5.6594915-2 + TOX'412E-1O CGSO=3.143031E-10 CGDOI3.143O31E-10 CJ-2.54E-4 + MJ-0.553 CJSU=3.31E-1O MJSH80.352 XJ-0.05U + TPGI-1.O LDtO.25U NSUB-6.74OOE+15 NFS=1.000E+11 + NEFF-1.001 NSS=1.0000E+12 DELTA-1.001368E-6 VMAx-37082.8 + U0=253.977 UEXP-O.2458 UCRIT=16929.2 PB-D.800 PHI80.6 + RSH-121.6000 CGBOI4.574377E-10 ********#******************************************** * Subcircuit FGB Vdd V88 81 Vdn S2 Vb Vdp Vout Vref * 1 2 3 4 5 6 7 8 9 t*****************************#*****#****t**#*¢****** .SUBCKT FGB 1 2 3 4 5 6 7 8 9 260 X1 1 2 3 4 9 6 11 12 5 FG X2 1 2 7 3 9 6 12 11 5 FG X3 1 12 10 PMIRRDR X4 2 10 8 NMIRROR X5 1 11 8 PMIRRUR .ENDS ******************************************************** * Subcircuit FG Vdd Vss V1 Vd V2 Vb Vout1 Vout2 Vref * 1 2 3 4 5 6 7 8 9 ****************************************************#*** .SUBCKT F6 1 2 3 4 5 6 7 8 9 M1 1 3 12 nmos H-6.0U L-6.0U M2 11 4 12 nmos H86.0U L86.OU M3 12 6 2 nmos H86.OU L-6.0U M4 12 13 2 nmos H-6.0U L-6.0U M5 13 13 2 nmos “-6.00 L86.OU M6 14 13 2 nmos U-6.0U L86.OU M7 7 9 14 nmos U-6.0U L-6.0U M8 8 5 14 nnos “-6.00 L-6.0U M9 1 11 11 pmos W36.0U L86.OU M10 1 11 13 pmos H-6.0U L86.OU .ENDS *********s***s*********4444444444* * Subcircuit PMirror Vdd Vref Vmir * 1 2 3 *tttttt*****#******************ttt .SUBCKT PMIRROR 1 2 3 M1 1 2 2 1 pmos U-6.0U L-6.0U M2 1 2 3 1 pmos H-6.0U L-6.0U .ENDS ************#*************#******t * Subcircuit NMirror Vss Vref Vnir * 1 2 3 ********************************** .SUBCKT NMIRROR 1 2 3 M1 2 2 1 1 nmos H86.OU L-6.0U M2 3 2 1 1 nmos H-6.0U L-6.0U .ENDS ti*t*ttt***********t**************************** * Subcircuit UGMUL Vdd Vss V1 V2 V3 V4 Vb Vout 4 8 7 1 2 3 4 5 6 * Schematic: Mead’s book page 95 *#**#**##*****#*#***it##***#******************** .SUBCKT HGMUL 8 7 1 2 3 4 5 6 M1 12 1 13 7 nmos H-6.0U L-6.0U HHMMMNMNMM 261 M2 16 2 13 7 nmos “-6.00 L-6.0U M3 12 12 8 8 pmos “=6.0U L-6.0U M4 16 16 8 8 pmos “-6.00 L-6.00 M5 15 12 8 8 pmos “-6.00 L=6.00 M6 18 16 8 8 pmos “-6.00 L=6.0U M7 11 4 15 8 pmos “-6.00 L-6.00 M8 14 4 18 8 pmos “-6.00 L-6.0U M9 14 3 15 8 pmos “-6.00 L-6.0U M10 11 3 18 8 pmos “-6.00 L-6.0U M11 11 11 7 7 nmos “-6.00 L-6.0U M12 14 14 7 7 nmos “-6.00 L-6.0U M13 17 11 7 7 nmos “-6.00 L-6.0U M14 6 14 7 7 nmos “-6.00 L-6.00 M15 17 17 8 8 pmos “-6.00 L-6.0U M16 6 17 8 8 pmos “-6.00 L-6.0U Mb 13 5 7 7 nmos “-6.00 L-6.0U .ENDS *******************t****tttttt****#************* * Subcircuit GMULD1 Vdd Vss V1 V2 V3 V4 Vb I+ I- * 8 7 1 2 3 4 5 21 22 * Schematic: Mead’s book page 95 tit******it***********t****¢#******************t .SUBCKT GMULD1 8 7 1 2 3 4 5 21 22 M1 12 1 13 7 nmos “-6.00 L-6.0U 142' 16 2 13 7 nmos “-6.00 L-6.0U M3 12 12 8 8 pmos “-6.00 L-6.0U M4 16 16 8 8 pmos “-6.00 L-6.0U M5 15 12 8 8 pnos “-6.00 L-6.0U M6 18 16 8 8 pmos “-6.00 L-6.0U M7 11 4 15 8 pmos “-6.00 L-6.00 M8 14 4 18 8 pmos “-6.00 L-6.0U M9 14 3 15 8 pmos “-6.00 L-6.0U M10 11 3 18 8 pnos “-6.00 L-6.0U M11 11 11 7 7 nmos “-6.00 L-6.0U M12 14 14 7 7 nmos “-6.00 L-6.0U M13 21 11 7 7 nmos “-6.00 L-6.00 M14 22 14 7 7 nmos “-6.00 L-6.0U Mb 13 5 7 7 nmos “-6.00 L-6.0U .ENDS .ENDS ##-*********************¥***************************** * Subcircuit MYIV Vdd Vss I+ I- Vbias Vout - 8 7 1 2 4 3 *****t************************************************ .SUBCKT’MYIV 8 7 1 2 3 4 262 - Mirror current out of 1, add to to current coming from 2 M1 8 1 1 8 pmos “-4.00 L=12.00 M2 8 
1 2 8 pmos “-4.00 L=12.00 * Output gain Stage M3 8 2 4 7 nmos “-4.00 L-12.0U M4 4 3 7 7 nmos “-4.00 L=12.00 .ENDS **********************#44444444444444444444444 * Subcircuit STRANSAM V1 V2 Vb Vout Vdd Vgnd * 1 2 3 4 5 6 * “/L . 1/1 *********************4***************s******** .SUBCKT STRANSAM 1 2 3 4 5 6 M1 7 1 9 6 nmos “-6.00 L-6.00 M2 4 2 10 6 nmos “-6.00 L-6.00 M5 9 9 8 6 nmos “-6.00 L-6.00 M6 10 10 8 6 nmos “-6.00 L-6.0U Mb 8 3 6 6 nmos “-6.00 L-6.00 M3 7 7 5 5 pmos “-6.00 L-6.00 M4 4 7 5 5 pnos “-6.00 L-6.00 .ENDS **************#**********************¥******** * Subcircuit TRANSAMl V1 V2 Vb Vout Vdd Vgnd * . 1 2 3 4 5 6 4 “/L - 1/1 #4444444:44:44:444*44*444444444444444444444444 .SUBCKT TRANSAMi 1 2 3 4 5 6 M1 7 1 8 6 nmos “-6.00 L-6.0U M2 4 2 8 6 nmos “-6.00 L-6.00 Mb 8 3 6 6 nmos “-6.00 L-6.0U M3 7 7 5 5 pmos “-6.00 L-6.00 M4 4 7 5 5 pnos “-6.00 L-6.0U .ENDS t**t****************************************t* * Subcircuit TRANSAM V1 V2 Vb Vout Vdd Vgnd * 1 2 3 4 5 6 * “/L - 1/3 ***t***********#*#*#********#***************#* .SUBCKT TRANSAM13 1 2 3 4 5 6 M1 7 1 s 6 nmos “-6.00 L-6.0U M2 4 2 s 6 nmos “-6.00 L-6.0U Mb 6 3 6 6 nmos w-6.ou L-18.00 M3 7 7 5 5 pmos “-6.00 L-6.00 M4 4 7 5 5 pmos “-6.00 ]..-6.00 .ENDS 263 #**********************IN:*************************IkIHIIHI * Subcircuit MYSDS V1 V2 V3 Vb1 Vb2 Vb3 Vdd Vgnd Vref * 1‘ 2 3 11 12 13 s 7 9 * C(Al) > G(A3) annualan:************************************************ .susch MYSOS 1 2 3 11 12 13 s 7 9 XA1 1 2 1 1 2 8 7 STRANSAM XA2 2 9 12 3 8 7 STRANSAM XA3 2 3 13 2 8 7 TRANSAM13 .ENDS BIBLIOGRAPHY [1] BIBLIOGRAPHY B. Widrow et al., “Adaptive noise cancellation: principles and applications,” IEEE Proceedings, vol. 63, pp. 1691-1717, Apr. 1975. [2] F. M. Salam, “An adaptive network for blind separation of independent signals,” [3] [4] [5] [6] [7] [8] [9] Proc. of IEEE International Symposium on Circuits and Systems, vol. 1, pp. 431- 434, May 1993. K. S. Narendra and K. Parthasaraphy, “Identification and control of dynamical systems using neural networks,” IEEE Trans. on Neural Networks, vol. 1, pp. 4— 27, March 1991. C. Jutten, Calcul neuromime’tique et traitement du signal, analyse en com- posantes inde’pendantes. These de doctorat d’etat, Université Scientifique et Médicale-Institut Nationale Polytechnique, Grenoble, France, 1987. C. Jutten and J. Herault, “Space or time signal processing by neural networks models,” Proceedings on Neural Networks for Computing, vol. 1, pp. 151 - 170, Apr. 1986. C. Jutten, J. Herault and A. Guerin, “In.c.a.: An independent components anal- yser based on neuromimetic network,” Artificial Intelligence and Cognitive Sci- ences, pp. 201-219, 1988. J. Herault and C. Jutten, “Space or time adaptive signal processing by neural network models,” Neural Network for computing, AIP Conference Proceedings, pp. 207-211, Apr. 1986. J. Herault and C. Jutten, “Blind separation of sources, part 1: An adaptive algorithm based on neuromimetic architecture,” Signal Processing, vol. 24, pp. 1- 10, July 1991. P. Comon, J. Herault and C. Jutten, “Blind separation of sources, part 2: Problem statement,” Signal Processing, vol. 24, pp. 11—20, July 1991. 264 265 [10] E. Sorouchyari, “Blind separation of sources, part 3: Stability analysis,” Signal Processing, vol. 24, pp. 21—29, July 1991. [11] J.C.Fort, “Stability of the jh sources separation algorithm,” Signal Processing, vol. 8, no. 1, pp. 35-42, 1991. [12] J. Karhunen and J. 
Joutsensalo, “Representation and separation of signals using nonlinear pca type learning,” Neural Networks, vol. 7, no. 1, pp. 113—127, 1993. [13] A. Chichocki and L. Moszcznski, “New learning algorithm for blind separation of sources,” Electronic Letters, vol. 28, pp. 1986—1887, 1992. [14] K. Meriam, A. Belouchrani and J. F. Cardoso, “Assymptotic performance of second order blind source separation,” Proc. of IEEE International Symposium on Speech and Signal Processing, vol. 4, pp. 277-280, 1994. [15] E. Moreau and O. Macchi, “New self-adaptive algorithms for source separation based on contrast functions,” Proc. of IEEE International Symposium on Speech and Signal Processing, vol. 1, pp. 515—519, 1993. [16] N. Delfosse and P. Loubaton, “Adaptive separation of independent sources: A deflection approach,” Proc. of IEEE International Symposium on Speech and Signal Processing, vol. 4, pp. 41—44, April 1994. [17] J. Lacome and P. Ruiz, “Source identification: A solution based on cumu- lants,” Proc. of IEEE International Symposium on Speech and Signal Processing, vol. Workshop, 1988. [18] J. F. Cardoso, “Source separation using higher order moments,” Proc. of IEEE International Symposium on Speech and Signal Processing, pp. 2109-2112, 1989. [19] J. F. Cardoso, “Super-symmetric decomposition of the fourth order cumulant tensor, blind identification of more sources than sensors,” Proc. of IEEE Inter- national Symposium on Speech and Signal Processing, pp. 14-17, 1991. [20] J. F. Cardoso, “Iterative techniques for blind source Separation using only fourth order cumulants,” European Signal Processing Conference, pp. 739-742, 1992. [21] J. F. Cardoso and B. Laheld, “Equivariant adaptive source separation,” submitted to_IEEE Trans. on Signal Processing, 1996. [22] [23] [24] [25] [26] [271 [28] [29] [30] [31] [32] 266 J. F. Cardoso and P. Comon, “Independent component analysis, a survey of some algebraic methods,” Proc. of IEEE International Symposium on Circuits and Systems, vol. 2, pp. 93-96, May 1996. P. Comon, “Independent component analysis: A new concepts?,” Signal Pro- cessing, vol. 36, no. 3, pp. 287-314, 1994. S. Amari, A. Cichocki ans H. Yang, “A new learning algorithm for blind signal separation,” MIT Press (in press), 1996. A. Bell and J. Sejnowski, “Fast blind separation based on information theory,” Proc. 0f IEEE International Symposium on Nonlinear Theory and its Applica- tions, pp. 43—47, December 1995. T. Bell and T. J. Sejnowski, “A non-linear information maximisation algorithm that performs blind separation,” Advances in Neural Information Processing Sys- tems 7, MIT Press, pp. 467-474, 1995. G. Giannakis, Y. Inouye and J.M. Mendel, “Cumulants based identification of multichannel moving average models,” IEEE Trans. on Automatic Control, vol. 34, pp. 783—787, 1989. L. Tong, R. Liu, V. C. Soonand Y. Huang, “Inderminacy and identifiability of blind identification,” IEEE Trans. on Signal Processing, vol. 38, pp. 409—509, May 1991. L. Tong, Y. Inouye and R. Liu, “Wave-preserving blind estimation of multiple independent sources,” IEEE Trans. on Signal Processing, vol. 41, pp. 2461—2470, July 1993. X. Ling, Y. Huang and R. Liu, “A neural network for blind signal separation,” Proc. of IEEE International Symposium on Circuits and Systems, vol. 27, pp. 69— 72, May 1994. V. C. Soon, L. Tong, F. Huang and R. Liu, “An extended fourth order blind iden- tification algorithm in spatially correlated noise,” Proc. of IEEE International Symposium on Speech and Signal Processing, pp. 1365-1368, 1990. R. 
O. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE trans. on Antennas Propagation, vol. AP, pp. 276-280, March 1986. [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] 267 R. Roy and T. Kailath, “Esprit-estimation of signal paramater via rotational- invariance techiques,” IEEE trans. on signal Processing, vol. 37, pp. 984—993, July 1989. B. Agee, “Blind separation and capture of communication signals using a multi- target constant modulus beamformer,” IEEE Millitary Communication Confer- ence, pp. 340—346, 1989. A. Swindlehurst, S Daas and J. Yang, “Analysis of a decision directed beam- former,” IEEE Trans. on Signal Processing, vol. 43, pp. 2920-2927, December 1995. B. Agee, A. Schell and W. Gardner, “Spectral self-coherence restoral: A new approach to blind adaptive signal extraction using antenna arrays,” IEEE Pro- ceedings, vol. 78, pp. 753-767, April 1990. S. Talwar, M. Viberg and A. Paulraj, “Blind separation of synchronous co- channel digital signals using an antenna array,” IEEE Trans. on Signal Pro- cessing, vol. 44, pp. 1184—1197, May 1996. A. Belouchrani and F. F. Cardoso, “Maximum likelihood source separation by the expectation-maximization techniquez, Deterministic and stochastic imple— mentation,” Proc. 0f IEEE International Symposium on Nonlinear Theory and its Applications, pp. 49—53, December 1995. D. Pham, P. Garrat and C. Jutten, “Separation of a mixture of independent sources through a maximum likelyhood approaCh,” European Signal Processing Conference, pp. 771—774, 1992. G. Giannakis and M. Tsatsanis, “A unifying maximum likelyhood view of cu- mulants and polyspectral measures for non-gaussian signal classification and es- timation,” IEEE Trans. on Information Theory, vol. 38, pp. 386-406, March 1992. E. Moulines, P. Duhamel, J. F. Cardoso and S. Marrague, “Subspace methods for the blind identification of multichannel fir filters,” IEEE Trans. on Signal Processing, vol. 43, pp. 516-525, February 1995. S. Gerven and D. Compernolle, “Feedforward and feedback in a symmetric adap- tive noice canceller: Stability analysis in simplified case,” Signal Processing, pp. 1081-1084, 1992. [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] 268 H. Thi and C. Jutten, “Blind source separation for convolutive mixtures,” Signal Processing, vol. 45, pp. 209-229, 1995. T. Bell and T. J. Sejnowski, “Blind separation and blind deconvolution: An information-theoretic approach,,” Proc. of IEEE International Symposium on Speech and Signal Processing, pp. 3415-3418, May 1995. A. Bell and T. Sejnowski, “An information-maximization approach to blind sep- aration and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129— 1159, 1995. A. B. Gharbi and F. M. Salam, “Blind separation of independent sources in linear dynamical media,” Proc. 0f IEEE International Symposium on Nonlinear Theory and its Applications, December 5-9 1993. A. B. Gharbi and F. M. Salam, “Separation of mixed signals: Formulation and some implementations,” Proc. of IEEE Midwest Symposium on Circuits and Systems, August 4-6 1994. I E. Vittoz and X. Arreguit, “Cmos integration of herault-jutten cells for sep- aration of sources,” Proceedings Workshop on Analog VLSI and Neural Systems, May 1989. M. H. Cohen and G. Andreou, “Current-mode subthreshold mos imple- mentation of herault-jutten autoadaptive network,” IEEE Journal of Solid-State Circuits, vol. 27, pp. 714—727, May 1992. A. B. Gharbi and F. M. 
[50] A. B. Gharbi and F. M. Salam, “Implementation and experimental results of a chip for the separation of mixed and filtered signals,” Journal of Circuits, Systems and Computers, vol. 6, pp. 115-139, April 1996.
[51] A. B. Gharbi and F. M. Salam, “Test results of a chip for the separation of mixed and filtered signals,” IEEE Trans. on Circuits and Systems, vol. 42, pp. 748-751, November 1995.
[52] A. B. Gharbi and F. M. Salam, “Implementation and test results of a chip for the separation of mixed signals,” Proc. of IEEE International Symposium on Circuits and Systems, vol. 1, pp. 271-274, May 1995.
[53] E. Oja, “A simplified neuron model as a principal component analyzer,” Journal of Mathematical Biology, vol. 15, pp. 267-273, 1982.
[54] E. Oja, “Principal components, minor components, and subspaces,” Neural Networks, vol. 1, pp. 61-68, 1992.
[55] S. Amari, Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, vol. 28. New York: Springer, 1985.
[56] J. F. Cardoso, “The invariant approach to source separation,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 55-60, December 1995.
[57] C. Jutten and J. F. Cardoso, “Separation of sources: Really blind?,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 79-84, December 1995.
[58] S. Kullback, Topics in Statistical Information Theory. New York: Springer-Verlag, 1987.
[59] Chi-Tsong Chen, Linear System Theory and Design. New York: Holt, Rinehart and Winston, 1984.
[60] J. R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: John Wiley Series in Probability and Mathematical Statistics, 1988.
[61] D. Yellin and E. Weinstein, “Multichannel signal separation: Methods and analysis,” IEEE Trans. on Signal Processing, vol. 44, pp. 106-118, January 1996.
[62] John E. Kolassa, Series Approximation Methods in Statistics. New York: Springer-Verlag, 1994.
[63] A. Cichocki, S. Amari, M. Adachi and W. Kasprzak, “Self-adaptive neural networks for blind separation of sources,” Proc. of IEEE International Symposium on Circuits and Systems, vol. 2, pp. 157-160, May 1996.
[64] S. Amari, A. Cichocki and H. H. Yang, “Recurrent neural networks for blind separation of sources,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 37-42, December 1995.
[65] A. Cichocki, W. Kasprzak and S. Amari, “Multi-layer neural network with local adaptive learning rules for blind separation of source signals,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 61-65, December 1995.
[66] L. De Lathauwer, P. Comon, B. De Moor and J. Vandewalle, “Application in independent component analysis,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 91-96, December 1995.
[67] Peter McCullagh, Tensor Methods in Statistics. New York: Chapman and Hall, 1987.
[68] C. V. L. Charlier, Application de la théorie des probabilités à l'astronomie (part of the Traité). Paris: Gauthier-Villars, 1931.
[69] L. Watts, D. A. Kerns, R. F. Lyon and C. A. Mead, “Improved implementation of the silicon cochlea,” IEEE Journal of Solid-State Circuits, vol. 27, pp. 692-700, May 1992.
[70] B. Gilbert, “A precise four-quadrant multiplier with subnanosecond response,” IEEE Journal of Solid-State Circuits, vol. SC, no. 3, pp. 365-372, 1968.
[71] C. Mead, Analog VLSI and Neural Systems. New York: Prentice Hall, 1989.
[72] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 1984.