This is to certify that the dissertation entitled

BLIND SEPARATION OF UNKNOWN SOURCES IN DYNAMIC ENVIRONMENTS: THEORETICAL FORMULATION AND MICRO-ELECTRONIC IMPLEMENTATION

presented by Ammar B. Gharbi has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering.

Blind Separation of Unknown Sources in Dynamic Environments: Theoretical Formulation and Micro-Electronic Implementation

By

Ammar B. Gharbi

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1996

ABSTRACT

Blind Separation of Unknown Sources in Dynamic Environments: Theoretical Formulation and Micro-Electronic Implementation

By Ammar B. Gharbi

Novel learning algorithms for the problem of blind separation of signals in static and dynamic environments are developed. The derived learning rules of these algorithms are based on the minimization of mutual information functionals. A novel learning rule is initially derived from the decorrelation of the output signals. This rule is then enhanced by including higher order terms to test for the independence of the signals. Optimization theory is utilized to derive a general framework for developing an update rule for the parameters of a linear dynamical network model. Higher order statistics are also explored to develop an approximate expression of the mutual information, which depends on the unknown probability density functions of the output signals.

The modeling of the environment is considered an important factor in the development of the update law. Keeping the analog implementation of the algorithms in mind, state space realizations of the network which minimize the number of parameters are considered. Such a choice of realization results in a reduction of the complexity of the network and of the corresponding circuit blocks.

Computer simulations are conducted to evaluate the performance of the developed algorithms. A circuit implementation of one of the developed algorithms for the dynamic case is described, and its performance is verified using the PSPICE circuit simulator.

In summary, the main contributions of this thesis are the development of:

1. a novel update law for the static environment case based on the decorrelation of the output signals and its invariants;
2. a novel energy function that characterizes the statistical independence of signals using higher order statistics for both the feedforward and the feedback structures;
3. a novel framework to derive the update laws for the parameters of a dynamic network using optimization theory and the calculus of variations; and
4. a circuit realization of one of the developed algorithms for a dynamic feedback network.

Copyright by Ammar B. Gharbi, 1996

To my wife Nancy, my parents and my family

ACKNOWLEDGEMENTS

I would like to express my sincere thanks and appreciation to Professor Fathi Salam for his guidance, support, and inspiration throughout the course of this research.
His valuable advice, kind consideration, and understanding have been an incentive for the completion of this dissertation.

Sincere gratitude is extended to the members of my guidance committee, Professor Hassan Khalil, Professor Don Reinhart and Professor Charles MacCluer, for their valuable assistance and service on my committee.

I would like to express my sincere thanks and gratitude to my wife Nancy for her patience, love, affection, support, encouragement and sacrifice during every step that I made towards the completion of this research.

Last, but not least, I would like to express my sincere thanks and appreciation to my parents and every member of my family for their patience, support and sacrifice while awaiting this day.

TABLE OF CONTENTS

LIST OF TABLES ..... x
LIST OF FIGURES ..... xi

1 Introduction ..... 1

2 Literature Review ..... 5
  2.1 Neuromimetic Algorithm ..... 8
    2.1.1 Derivation of the H-J Update Law ..... 10
    2.1.2 Gradient Descent Method ..... 11
    2.1.3 Evaluation of the Update Law ..... 13
    2.1.4 Computer Simulations ..... 14
  2.2 Mutual Information Approach ..... 24
  2.3 Information Theoretic Approach ..... 29
  2.4 Algebraic Approach ..... 37

3 Blind Separation in a Static Environment ..... 44
  3.1 Problem Definition and Architecture ..... 44
  3.2 Theoretical Solution ..... 45
  3.3 Energy Function ..... 46
  3.4 Update Law Derivation ..... 49
  3.5 Computer Simulations ..... 51
  3.6 First Improvement ..... 54
    3.6.1 Computer Simulations ..... 55
  3.7 Second Improvement ..... 64
    3.7.1 Computer Simulations ..... 65
  3.8 Observations and Remarks ..... 75

4 Higher Order Statistics ..... 76
  4.1 Mutual Information ..... 77
    4.1.1 Cumulants and Moments ..... 80
    4.1.2 Edgeworth Expansion ..... 81
    4.1.3 Entropy Approximation ..... 83
  4.2 Independence Criteria ..... 86
  4.3 Static Case: Feedforward Network Structure ..... 87
    4.3.1 Computer Simulations ..... 90
  4.4 Static Case: Feedback Network Structure ..... 94
    4.4.1 Computer Simulations ..... 97

5 Blind Separation in a Dynamic Environment: Feedforward Structure ..... 103
  5.1 Problem Definition ..... 103
  5.2 Existence of a Theoretical Solution ..... 104
  5.3 State Realization ..... 106
    5.3.1 Controllable/Observable Canonical Form ..... 107
    5.3.2 Features of the Realization ..... 111
  5.4 Adaptive Optimal Control Theory ..... 114
    5.4.1 Computer Implementation ..... 116
  5.5 Update Law Derivation ..... 117
    5.5.1 Nonlinear Dynamic Modeling ..... 117
    5.5.2 Linear Dynamical Case ..... 122
  5.6 Computer Simulations ..... 124
  5.7 Observations ..... 137

6 Blind Separation in a Dynamic Environment: Feedback Structure ..... 138
  6.1 Architecture ..... 138
  6.2 State Representation ..... 141
  6.3 Update Law Derivation ..... 142
  6.4 Computer Simulation Results ..... 143
    6.4.1 The Approach Based on Mutual Information ..... 144
    6.4.2 An Approach Based on the Neuromimetic Algorithm ..... 152
  6.5 Sequential Update ..... 169
  6.6 Time and Weight Scaling ..... 200
    6.6.1 Mathematical Development ..... 200
    6.6.2 Example ..... 208
  6.7 Observations ..... 210

7 Micro-electronic Implementation ..... 211
  7.1 CMOS Building Blocks ..... 211
    7.1.1 Transistor ..... 211
    7.1.2 Current Mirror ..... 212
    7.1.3 Transconductance Amplifier ..... 212
    7.1.4 Hyperbolic Sine Function ..... 214
    7.1.5 Gilbert Multiplier ..... 215
    7.1.6 Vector Multipliers ..... 216
    7.1.7 Current to Voltage Converter ..... 218
    7.1.8 Second Order Section ..... 219
    7.1.9 Second Order Filter Design ..... 222
    7.1.10 Circuit Implementation of Second Order Filter ..... 223
    7.1.11 Circuit Realization for n-dimensional Filter ..... 225
  7.2 Circuit Realization ..... 229
    7.2.1 Implementation of the output equation ..... 229
    7.2.2 Implementation of the state equation ..... 231
    7.2.3 Implementation of the C update equation ..... 231
    7.2.4 Implementation of the D update equation ..... 242
  7.3 Circuit Simulation ..... 244

8 Conclusion

A Definitions
  A.1 Statistical Definitions
  A.2 Matrix Notations

B Proofs
  B.1 Proof of Theorem 3.5
  B.2 CMOS Circuit Function Derivation

C Matlab
  C.1 Derivation of f_a
  C.2 Matlab Source Code

D PSPICE Files
  D.1 Circuit File
  D.2 Library File

BIBLIOGRAPHY

LIST OF TABLES

2.1 Information Theoretic Algorithm
3.1 Simulation results for the algorithm defined by equation (3.23), f(x) = x^3 and g(x) = x
3.2 Simulation results for the algorithm defined by equation (3.23), f(x) = \sinh x and g(x) = \tanh x
3.3 Simulation results for the algorithm defined by equation (3.27), f(x) = x^3 and g(x) = x
3.4 Simulation results for the algorithm defined by equation (3.27), f(x) = \sinh x and g(x) = \tanh x
3.5 Results of simulation comparison between the algorithms defined by equations (3.23) and (3.27)
4.1 Conversion of moments and cumulants
4.2 Hermite polynomials derived from a normal baseline density function \alpha(y)
5.1 Number of parameters for two realizations
5.2 Different nonlinearity functions and their derivatives
7.1 Parameters of the Environment Circuit Realization
7.2 Parameters of the Environment Circuit Realization

LIST OF FIGURES

2.1 Herault and Jutten Architecture ..... 9
2.2 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, y_i^3\, y_j ..... 18
2.3 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, y_i^3\, y_j. The solid curve is the unknown sources. The dotted curve, superimposed onto the solid curve, is the network output ..... 19
2.4 Performance of the H-J algorithm.
The parameters are updated according to \dot{d}_{ij} = \eta \sinh y_i \tanh y_j ..... 20
2.5 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta \sinh y_i \tanh y_j. The solid curve is the unknown sources. The dotted curve, superimposed onto the solid curve, is the network output ..... 21
2.6 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, q_{ii}\, y_i\, y_j ..... 22
2.7 Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, q_{ii}\, y_i\, y_j. The solid curve is the unknown sources. The dotted curve, superimposed onto the solid curve, is the network output ..... 23
2.8 Feedforward Architecture ..... 24
2.9 Nonlinear functions ..... 27
2.10 Parameter convergence of the algorithm defined by equation (2.49) ..... 28
2.11 Bell and Sejnowski's Architecture ..... 30
2.12 Performance of the Bell and Sejnowski algorithm ..... 33
2.13 Performance of the Bell and Sejnowski algorithm using speech signals ..... 35
2.14 Performance of the Bell and Sejnowski algorithm using sine signals ..... 36
2.15 Cardoso's Architecture ..... 37
2.16 Cardoso algorithm for n = 2 ..... 39
2.17 Cardoso algorithm for n = 3 ..... 40
2.18 Continuous-time algorithm for n = 2 ..... 42
2.19 Continuous-time algorithm for n = 3 ..... 43
3.1 Static Environment Architecture ..... 45
3.2 Performance of the algorithm defined by equation (3.19) ..... 53
3.3 Performance of the algorithm defined by equation (3.23) for \eta(t) = 100, f(x) = x^3 and g(x) = x ..... 57
3.4 Performance of the algorithm defined by equation (3.23) for \eta(t) = 10, f(x) = x^3 and g(x) = x ..... 58
3.5 Performance of the algorithm defined by equation (3.23) for \eta(t) = 100 e^{-t}, f(x) = x^3 and g(x) = x ..... 59
3.6 Performance of the algorithm defined by equation (3.23) for \eta(t) = 100, f(x) = \sinh x and g(x) = \tanh x ..... 61
3.7 Performance of the algorithm defined by equation (3.23) for \eta(t) = 10, f(x) = \sinh x and g(x) = \tanh x ..... 62
3.8 Performance of the algorithm defined by equation (3.23) for \eta(t) = 100 e^{-t} ..... 63
3.9 Performance of the algorithm defined by equation (3.27) for \eta(t) = 100, f(x) = x^3 and g(x) = x ..... 67
3.10 Performance of the algorithm defined by equation (3.27) for \eta(t) = 10, f(x) = x^3 and g(x) = x ..... 68
3.11 Performance of the algorithm defined by equation (3.27) for \eta(t) = 10, f(x) = x^3 and g(x) = x ..... 69
3.12 Performance of the algorithm defined by equation (3.27) for \eta(t) = 100, f(x) = \sinh x and g(x) = \tanh x ..... 71
3.13 Performance of the algorithm defined by equation (3.27) for \eta(t) = 10, f(x) = \sinh x and g(x) = \tanh x ..... 72
3.14 Performance of the algorithm defined by equation (3.27) for \eta(t) = 100 e^{-t}, f(x) = \sinh x and g(x) = \tanh x ..... 73
4.1 Feedforward Architecture ..... 78
4.2 Nonlinear functions. Graphs (a)-(d) show the plots of the functions f_a(y) through f_d(y) ..... 91
4.3 Natural logarithmic function and a few orders of approximation ..... 91
4.4 Computer simulation of the feedforward structure using f_a ..... 93
4.5 Computer simulation of the feedforward structure using f_d ..... 95
4.6 Performance of the algorithm defined by the nonlinear function f_a .....
98 Performance of the Algorithm defined by the nonlinear function fd . . 99 Performance of the Algorithm defined by the nonlinear function fa using random Gaussian and a sine function ............... 100 Performance of the Algorithm defined by the nonlinear function fd using a random Gaussian and a sine function .............. 101 xii 4.10 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 5.13 6.1 6.2 6.3 6.4 6.5 .6.6 6.7 6.8 Output of the Algorithm defined by the nonlinear function fa using random Gaussian and a sine function .................. 102 Feedforward Architecture ......................... 103 (a) State Equation 5a,, = A;,-x,-,- + bgjuj, (b) Output Equation yij = egg-xi,- + (1,321, and (c) Output Equation y,- = ciTx, + dfu ........ 113 Bode plot of the environment model defined by H (s) ......... 125 Bode plot of the network model defined by H ‘(s) ........... 126 Dynamic Forward Structure. Update only the D matrix according to D = 7[D’T — fa(y)eT] using two sine waveforms, 7 = 0.005 and D0 = 0128 Dynamic Forward Structure. Update only the 0 matrix ac- cording to C = —17f.,(y)xT using two sine waveforms, r) = [1000 100 1 100 1 100 10] and Co = 0.86“ ................ 129 Dynamic Forward Structure. Update only the D matrix according to D = 7[D'T -— fd(y)eT] using two sine waveforms, 7 = 0.5 and D0 = O 130 Dynamic Forward Structure. Update both the C and D matrices ac- cording to D = 7[D"'T - fa(y)eT] and C = —17f,,(y)xT using two sine waveforms, 7 = 0.005, r) = [100 10 1 1000 5 2000 10]. All initial conditions are zero ............................ 131 Dynamic Forward Structure. Update only the D matrix according to D = -7fa(y)yT using two sine waveforms, 7 = 0.005 and D0 = 0 . . 132 Dynamic Forward Structure. Update only the D matrix according to D = —7f.1(y)yT using two sine waveforms, 7 = 0.5 and D0 = O . . . . 133 Dynamic Forward Structure. Test Simulation of the update law D = —7f., (y)yT at convergence using two sine waveforms .......... 134 Update of the D Matrix using D = -7y3y ............... 135 Update of the D Matrix using D = -7 sinhytanh y .......... 136 Feedback Processing Structure ...................... 141 Frequency response of the environment model ............. 145 Frequency response of the network model ................ 146 6,3 = mjfa(yg)$.‘j and 1);," = 10 ...................... 149 (3,3 = mjfa(y,')$,'j and 17;," = 101 ..................... 150 (3,3 = mjfa(y.-)z.-j and 17.3 = 10° ..................... 151 (5,3 = qgjfa(y,-)z,-j and flij = 10-2 ..................... 152 Update all the parameters of the 0 matrix according to a, = mjfa(y,-)z.-j, (012, ms, 1121, 1722) = (10, 0.1, 2, 0.1) and Co = 0.80 . . 153 xiii 6.9 Update all the parameters of the C matrix according to a, = 17,-jf0(y,-):c,-,-, (1)12, 1713, 1721, 1722) = (10, 0.1, 2, 0.1) and Co = 0.5C . . 6.10 Update all the parameters of the C matrix according to c.)- = mjfa(y.-)z,~,-, (1)12, 1113, 1121, 1722) = (10, 0.1, 2, 0.1) and Co = 0.1C . . 6.11 Update all the parameters of the C matrix according to (2;,- = mjfa(y,-):c,-,-, (7)12, 1713, 1721, 7122) = (10, 0.1, 2, 0.1) and .Co = 0 . . . . 6.12 Update all the parameters of the D matrix according to dij = 7 fa(y,-)y,-, 7 = 0.0005 and (due, duo) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(47rt) .................. 6.13 Update all the parameters of the D matrix according to d5,- = 7 fa (y;)y,-, 7 = 0.0005 and (d120, (1210) = (0.0, 1.0). The unknown sources are 31 (t) = sin(27rt) and 32(t) = sin(47rt) .................. 
6.14 Update all the parameters of the D matrix according to 3.3 = 7 fa(y,-)y,-, 7 = 0.0005 and (due, dglo) = (1.0, 0.0). The unknown sources are 31 (t) = sin(27rt) and 32(t) = sin(41rt) .................. 6.15 Update all the parameters of the D matrix according to d}; = 7 fa(y,~)y,-, 7 = 0.0005 and (d120, £1210) = (1.0, 1.0). The unknown sources are .91 (t) = sin(21rt) and 32(t) = sin(41rt) .................. 6.16 Update all the parameters of the D matrix according to d;,- = 7 fa(y,-)y,- and 7 = 0.0005 for different initial conditions. The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(47rt) ................ 6.17 Dynamic Feedback Structure. Update only the C matrix according to C = r;f.,¢(y)xT using two sine waveforms, 17 = [100 10 100 10] and Co = 0.8C”I ................................ 6.18 Dynamic Feedback Structure. Update only the D matrix according to D = 17f,¢(y)yT using two sine waveforms, 17 = 0.2 and D0 = 0 ..... 6.19 ng = 11.331535, and 17.3 = 2000 ....................... 6.20 égj=mjygxgj andmj=20 . . . . . . . . . . . . ~. ........... 6.21 Cg; = 17.331ng and 17,3 = 500 ....................... 6.22 6.3 = ngjygzgj and 17;]: = 10 ........................ i 6.23 Update all the parameters of the C matrix according to é.-,- = flijygitgj, (1712, 1713, rm, 1122) = (103,10,102,1) and Co = 0.8C .......... 6.24 Update all the parameters of the C matrix according to 6,-1- = ngjygrgj, (1712, 1713, 1721, rm) = (103,10,102,1) and Co = 0.5C .......... 6.25 Update all the parameters of the C matrix according to a, = flijyixij, (1712, 7113, 1121, 1722) = (103,1O,102,1) and Co = 0.1C .......... xiv 154 155 156 161 163 170 171 174 175 176 6.26 6.27 6.28 6.29 6.30 6.31 6.32 6.33 6.34 6.35 6.36 Update all the parameters of the C matrix according to ij = mJ-ygxgj, (7’12, 1113, 1721, rm) = (103,10,102,1) and Co = 0 ............ Update all the parameters of the C matrix according to a, = 17;jy,:c,-j, (1712, 1713, 1721, 1722) = (103,10,102,1) and Co = O. The unknown sources are 31(t) = sin(27rt) and 32(t) = square(47rt) ............. Update all the parameters of the C matrix according to (3,, = mjygxgj, (1)12, 1113, 1121, 1722) = (103,10,105,102) and Co = 0. The unknown sources are 31(t) = sawtooth(27rt) and 32(t) = square(47rt) . . . ' . . . Update all the parameters of the C matrix according to a, = 17,-,- sinh ygtanh 2:,-,-, (1712, 1713, 1721, 1722) = (103,10,102,10) and Co = 0. The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(47rt) . . . . Update all the parameters of the C matrix according to a, = flijyg3$ij, (1m, 1713, 1721, 1722) = (103,10,102,10) and Co = 0. The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(41rt) ............ Update all the parameters of the D matrix according to d}; = 7ygej, - 7 = 0.2 and (d120, (1210) = (0.0, 0.0). The unknown sources are 31(t) = sin(27rt) and 52(t) = sin(41rt) ...................... Update all the parameters of the D matrix according to d5,- = 7yf'yj, 7 = 0.2 and ((1120, duo) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(41rt) ...................... Update all the parameters of the D matrix according to d5,- = 7sinh ygtanh 3],, 7 = 0.2 and (d120, (1210) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = sin(41rt) ............ Update all the parameters of the D matrix according to d},- = 7sinh y,tanh y,, 7 = 0.2 and ((1120, duo) = (0.0, 0.9). The unknown sources are 31(t) = sin(27rt) and 32(t) = sin(41rt) ............ 
Update all the parameters of the D matrix according to (iij = 7sinh ygtanh 3],, 7 = 0.2 and (d120, 61210) = (0.9, 0.0). The unknown sources are .91 (t) = sin(27rt) and 32(t) = sin(41rt) ............ Update all the parameters of the D matrix according to dij = 7sinh ygtanh 3],, 7 = 0.2 and (d120, (1210) = (0.9, 0.9). The unknown sources are .91 (t) = sin(21rt) and 32(t) = sin(41rt) ............ 6.37 Update all the parameters of the D matrix according to d},- = 7sinh ygtanh y,- and 7 = 0.2 for different initial conditions. The un- known sources are 31(t) = sin(27rt) and 32(t) = sin(47rt) ........ XV 177 179 180 181 184 185 186 187 188 6.38 Update all the parameters of the D matrix according to d},- = 7sinh 3(7ch 317, 7 = 0.2 and (d120, (1210) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = square(47rt) ......... 6.39 Update all the parameters of the D matrix according to d},- = 7sinh ygtanh y,-, 7 = 0.2 and (d120, (1210) = (0.0, 0.0). The unknown sources are 31(t) = sin(21rt) and 32(t) = sawtooth(41rt) ........ 6.40 Update all the parameters of the D matrix according to d},- = 7sinhygtanh 7],, 7 = 0.2 and (d120, (I210) = (0.0, 0.0). The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) ...... 6.41 Update only one parameter of each matrix: update the parameters on and d1; with 17 = 20, 7 = 0.2 and (c130, duo) = (0.843, (1:2). The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) 6.42 Update only one parameter of each matrix: update the parameters cm and du with 17 = 20, 7 = 0.2 and (c130, (1120) = (4'3, 0.0). The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(47rt) 6.43 Update only one parameter of each matrix: update the parameters on and d1; with 17 = 40, 7 = 0.2 and (c130, dlgo) = ((33, 0.0). The unknown sources are .91 (t) = sawtooth(21rt) and 32(t) = square(47rt) 6.44 Update only one parameter of each matrix: update the parameters on and (In with 17 = 40, 7 = 0.2 and (c130, (1120) = (c'f3, 0.0). The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) 6.45 Update all parameters of each matrix. Co = 0.8C" and D0 = D‘, 17 = [103 10 102 10] , 7 = 0.2 and T = 500 The unknown sources are . 31(t) = sawtooth(21rt) and 32(t) = square(41rt) ............ 6.46 Update all parameters of each matrix. Co = 080" and D0 = D‘, 17 = [103 10 104 102], 7 = 0.2 and T = 500. The unknown sources are 31(t) = sawtooth(27rt) and 32(t) = square(41rt) ............ 6.47 Update all parameters of each matrix. Co = 080" and D0 = D‘, 17 = [103 10 10‘ 102], 7 = 0.2 and T = 1000. The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) ............ 6.48 Update all parameters of each matrix. Co = 080" and D0 = 0, 17 = [103 10 10‘ 102], 7 = 0.2 and T = 1000. The unknown sources are 31(t) = sawtooth(21rt) and 32(t) = square(41rt) ............ 6.49 Update all parameters of each matrix. Co = 0.80" and D0 = O, 17 = [1000 10 100 10], 7 = 0.2 and T = 10000. The unknown sources are 31(t) = sawtooth(27rt) and 32(t) = square(47rt) ............ xvi 191 192 193 194 195 6.50 6.51 6.52 6.53 6.54 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.10 7.11 7.12 7.13 7.14 Update all parameters of each matrix. Co = 0.80“ and D0 = 0, 17 = [1000 10 500 10], 7 = 0.2 and T = 10000. The unknown sources are 31 (t) = sawtooth(21rt) and 32(t) = square(41rt) ............ Update all parameters of each matrix. Co = 080" and D0 = O, 17 = [200 10 200 10], 7 = 0.2 and T = 10000. The unknown sources are 31 (t) = sawtooth(21rt) and 32(t) = square(47rt) ............ 
6.52 Unknown sources, environment output and network output at the initial time t = 0. C_0 = 0.8C^*, D_0 = 0, \eta = [200 10 200 10], \gamma = 0.2 and T = 10000. The unknown sources are s_1(t) = sawtooth(2\pi t) and s_2(t) = square(4\pi t)
6.53 Unknown sources, environment output and network output at the final time t = T. C_0 = 0.8C^*, D_0 = 0, \eta = [200 10 200 10], \gamma = 0.2 and T = 10000. The unknown sources are s_1(t) = sawtooth(2\pi t) and s_2(t) = square(4\pi t)
6.54 Test case: unknown sources, environment output and network output at the final time t = T. C_0 = 0.8C^*, D_0 = 0, \eta = [200 10 200 10], \gamma = 0.2 and T = 10000. The unknown sources are s_1(t) = sin(2\pi t) and s_2(t) = sawtooth(4\pi t)
7.1 Current mirror: (a) n-channel, (b) p-channel ..... 212
7.2 Transconductance design ..... 213
7.3 PSPICE simulation of a transconductance amplifier ..... 214
7.4 Circuit realization of the hyperbolic tangent and hyperbolic sine functions ..... 215
7.5 PSPICE simulation of the hyperbolic sine circuit for different biasing voltages ..... 216
7.6 1-dimensional Gilbert multiplier ..... 217
7.7 PSPICE simulation of the modified Gilbert multiplier using different biasing voltages ..... 217
7.8 n-dimensional Gilbert multiplier ..... 218
7.9 Current to voltage converter ..... 219
7.10 Second order section circuit ..... 220
7.11 PSPICE simulations of the second order section for bias voltages of 625, 650, 675, 700 and 725 mV ..... 222
7.12 Filter design. Circuit realization of the controllable canonical form of a second order filter ..... 224
7.13 Filter design. Circuit realization of the controllable canonical form of a second order filter ..... 226
7.14 PSPICE simulations of the second order filter ..... 226
7.15 Filter design. Circuit realization of the controllable canonical form of an n-th order filter ..... 227
7.16 Circuit realization of the output equation ..... 230
7.17 Circuit realization of the update equation of the C matrix parameters ..... 232
7.18 Circuit realization of the hyperbolic tangent and hyperbolic sine functions ..... 234
7.19 Block diagram of the overall circuit, including network and learning ..... 235
7.20 Unknown sources ..... 236
7.21 Environment circuit model ..... 236
7.22 Frequency response of the environment circuit realization ..... 238
7.23 Transient response of the environment circuit realization ..... 239
7.24 Parameters' convergence of the network circuit realization ..... 240
7.25 Output of the network circuit realization at convergence ..... 241

CHAPTER 1

Introduction

The blind separation of sources is a challenging signal processing problem with significant potential applications. The problem is informally described as follows: several unknown but independent temporal signals propagate through a mixing and/or filtering, natural or synthetic, environment. By observing the outputs of this environment, a network (e.g., a system, a neural network, or a device) is configured to counteract the effect of the environment and adaptively recover the original signals. For this processing, only the property of signal independence is assumed; no additional a priori knowledge of the original signals is required. This processing represents a form of self (or unsupervised) learning.
The weak assumptions and the self-learning capability render such a network attractive from the viewpoint of real-world applications where training is not an option. The blind separation approach has great advantages over existing adaptive filtering algorithms. For example, when the mixture of other signals is labeled as noise in this approach, no specific a priori knowledge about any of the signals is assumed; only that the processed signals are independent. This is in contrast with the noise cancelation method proposed by Widrow et al. [1], which requires that a reference signal be correlated exclusively with the part of the waveform (i.e., the noise) that needs to be filtered out. This latter requirement entails specific a priori knowledge about the noise as well as the signal(s).

The blind separation of sources is valuable in numerous major applications, in areas as diverse as telecommunication systems, sonar and radar systems, audio and acoustics, image/information processing, and biomedical engineering. Consider, e.g., the audio and sonar application where the original signals are sounds and the mixed signals are the outputs of several microphones or sensors placed at different vantage points. A network would receive, via each microphone, a mixture of sounds that are usually delayed relative to one another and/or to the original sounds. The network's role is then to dynamically reproduce the original signals, where each separated signal can be subsequently channeled for further processing or transmission. Similar application scenarios can be described in situations involving heart-rate measurements, communication in noisy environments, engine diagnostics, and uncorrupted cellular phone communications.

Adequate models of the environment that include time delays or filtering are very important when the sensors are viewed as part of the environment. In many applications where the sensors (e.g., microphones) include their own dynamics, the model of the environment should include time delays or filtering. Otherwise, assuming purely static models of the environment would result in highly sensitive and unrobust processing for any network. Indeed, in order to render the network operable in real-world applications, robust operation must be ensured in the face of parameter variations, dynamic influences and signal delays that often result in asynchronous signal propagation. Examples of such effects include filters in the case of an audio system, or delay lines and/or echoes as in the case of a radar system. The environment should therefore be modeled as a dynamic linear (or even nonlinear) system [2, 3].

The challenges for this area reside in the development of a mathematical analysis and framework for the problem of blind separation of sources. This is the angle from which this research has been approached: a review of the current literature has been conducted to study the various approaches to this problem, and a general framework to develop an update law for the network parameters, based on optimization theory, the calculus of variations and higher order statistics, has been proposed.

The thesis is divided into eight chapters. An introduction to this work is presented in Chapter 1; also included are the definition of the problem and some applications which illustrate the need for a solution. Chapter 2 presents an overview of the literature. The pioneering work of Herault and Jutten is presented first, followed by the work of Cardoso, Amari and Sejnowski and their coworkers.
This review presents the different approaches that have been attempted in finding a solution to this problem, including neuro-biologically based findings, algebraic methods and higher order statistics methods. The chapter summarizes a snapshot of the status of current approaches in the field. In Chapter 3, a solution to the problem in a static environment is developed based on the decorrelation condition of the output vector. However, decorrelation characterizes only second order statistics, which are not sufficient for solving this problem. Therefore, improved versions of the algorithm, built upon the works of Herault, Jutten and Amari, are developed, and computer simulations are given which validate the performance of these proposed algorithms. In Chapter 4, higher order statistics are introduced in order to develop an approximation of an energy function. The developed energy function approximation does not assume that the output signals have unit variance, and it uses a higher order approximation of the natural logarithmic function. Such an energy function requires more computation; however, it is shown that the adaptive laws based on it perform well where other existing algorithms fail to perform the same task. In Chapter 5, the environment and network models are represented by a dynamical system described by a matrix transfer function. The existence of a theoretical solution to the problem is presented. Then, a state space realization of the network is developed. Such a realization is represented by the least number of parameters; this translates into efficient computations and would eventually result in reduced chip area in an electronic implementation. Optimization theory and the calculus of variations are explored in this chapter to establish a general framework for updating the parameters of the dynamic network. Computer simulations show the performance and the limitations of the developed algorithms. In Chapter 6, a feedback structure of the dynamic network is considered. The update laws derived there are based on an extension of the static case. Computer simulations for the feedback structure are presented and discussed. In Chapter 7, the basic building blocks for the micro-electronic circuit implementation of the algorithm are developed. Consequently, the complete realization of a feedback neural network architecture with learning is presented and supported by PSPICE computer simulations. Finally, concluding remarks and directions for future work are presented in Chapter 8.

CHAPTER 2

Literature Review

In this chapter, a review of the present literature is presented in order to reveal the different approaches already considered in solving the problem of the blind separation of signals. These efforts have helped direct the path that this research has taken in tackling the problem. Some of the ideas herein will be explored as possible solutions and will be incorporated in the development of this work. A general overview is due, however, in order to give credit to some of the scientists who have contributed to the evolution of this exciting area of intelligent nonlinear signal processing. Yet, at the outset, we concede that our overview is not comprehensive and, very likely, overlooks some portion of this revolutionary literature.

The blind separation of signals approach, motivated from neuro-biology, was first introduced by Herault and Jutten in the late 1980's.
They developed an adaptive algorithm based on neuro-biological findings [4, 5, 6, 7]. Some theoretical analysis was later developed which provided some validation of the algorithm [8, 9, 10, 11]. This line of work, based on the neuromimetic approach, was further considered by Karhunen [12] and by Cichocki and Moszczynski [13].

The studies on the blind separation of signals are based on the assumption that no detailed a priori knowledge of the unknown sources is available. However, the unknown sources are assumed to be (statistically) independent. Thus, to solve the problem one has to render the components of the output of the network statistically independent. This hypothesis is the basis for the development of various energy functions, sometimes called contrast functions or independence criteria. Despite the fact that decorrelation of the components of a signal vector is a weaker condition than independence, some researchers have used such necessary, but not sufficient, conditions to develop criteria and corresponding algorithms [14]. However, decorrelation describes only the 2nd order statistics of the output signal vector. Therefore, these criteria are valid only for a small set of input signals, and higher order statistics need to be considered when developing more capable algorithms.

Since the cross cumulants of independent signals are zero, several contrast functions have been considered in solving the problem. For example, when independence is measured in terms of the cancelation of fourth order cumulants of the outputs, a cubic nonlinearity, similar to that defined in [4], appears in the update of the algorithm [15, 16, 17]. Cardoso, on the other hand, focused on the algebraic properties of fourth order cumulants and considered the problem as a series of whitening and diagonalization processes [18, 19, 20, 21, 22]. These criteria are necessary, but not sufficient, as most of these conditions are constrained to some specified set of inputs. Therefore, the celebrated work of Comon [23] is considered an important framework. There, the mutual information is taken as an energy function and is expressed in terms of the marginal probability density functions. However, knowledge of the densities is not accessible under the hypotheses of the problem; therefore, an Edgeworth expansion was considered to approximate them. Amari et al. [24] followed the same line of work by taking a Gram-Charlier expansion. Bell and Sejnowski [25, 26] also considered the optimization of only one term of the mutual information, but had to resort to expressing it in terms of a nonlinear function of the output signals in order to capture the statistical information of the output.

It should be noted that an important technical difficulty faces the identifiability of the solution to the problem of the blind separation of signals. Due to lack of information regarding such items as the signal power, the spectral content or the modulation scheme, the outputs of the separating network cannot be ordered corresponding to the order of the source signals. Thus, the signals can be identified only up to an indetermination in terms of scale and order. This identifiability problem was first treated by Giannakis et al. [27], who used third order cumulants, and it was further addressed by Tong et al. [28, 29, 30, 31], who used fourth order cumulants.
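As an aside to the cumulant-based criteria mentioned above, the following short MATLAB check, added here for illustration and not taken from the thesis, estimates one fourth order cross cumulant, cum_{22}(a, b) = E[a^2 b^2] - E[a^2]E[b^2] - 2 E[ab]^2 (valid for zero-mean signals). It is approximately zero for independent sources and, in general, clearly nonzero for their linear mixtures; the particular source waveforms and sample size are illustrative assumptions.

% Illustrative check (not from the thesis): a fourth order cross cumulant
N  = 10000;  t = (0:N-1).'/N;
s1 = sign(sin(2*pi*7*t));                        % zero-mean square wave
s2 = sin(2*pi*13*t);                             % zero-mean sine wave
e  = [1.0 0.6; 0.4 1.0] * [s1.'; s2.'];          % linear mixtures of the two sources

% cum22(a,b) = E[a^2 b^2] - E[a^2]E[b^2] - 2 E[ab]^2, for zero-mean signals
cum22 = @(a,b) mean(a.^2 .* b.^2) - mean(a.^2)*mean(b.^2) - 2*mean(a.*b)^2;

cum22(s1, s2)                % approximately zero: the sources are independent
cum22(e(1,:).', e(2,:).')    % clearly nonzero: the mixtures are statistically dependent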
Several algorithms have been proposed in the literature for separating signals based on the availability of prior spatial, temporal or statistical information. This direction of work defines another approach to blind separation, because necessary and sufficient conditions of statistical independence are hard to satisfy. For example, the MUSIC [32] and ESPRIT [33] algorithms take advantage of the structural information of the channels, based on the Vandermonde matrix channel characterization, to obtain an estimate of the parameters. Unfortunately, such structural information is not always available. Other algorithms that exploit the temporal structure of a communication channel, while assuming no a priori spatial knowledge, have been proposed in the literature. These techniques rely on the constant modulus property [34], a discrete alphabet [35], self-coherence [36] and the finite alphabet property [37]. Still other algorithms were developed based on a priori knowledge of the statistical information of the signals: when the sources have known probability densities, the maximum likelihood estimator is used to provide a solution to the problem [38, 39, 40].

So far, this review has focused on the case where a static model of the environment is considered. What about the case where the environment is modeled as a dynamic system? In this case, two lines of work can be described: digital versus continuous. Most of the studies in the literature have tackled this problem using FIR (finite impulse response) filters. For example, Moulines et al. [41] considered the subspace method to decompose the signal-plus-noise space from the noise space in order to recover the unknown sources. Van Gerven and Van Compernolle [42] used a criterion based on second order statistics, while Nguyen Thi and Jutten [43] considered criteria based on the cancelation of fourth order cumulants. In these works, the goal was to determine the FIR coefficients that separate the signals. Bell and Sejnowski considered an information-theoretic approach [44, 45]. In all these works, FIR filters were used to model both the environment and the network. This thesis intends to address the problem by considering an environment and a network model that are described by a continuous linear dynamical system [46, 47], and to define a general framework for solving the problem.

Several analog implementations of blind separation algorithms have been reported in the literature. Vittoz and Arreguit [48] and Cohen and Andreou [49] have considered the implementation of the static H-J algorithm. On the other hand, Gharbi and Salam have considered an analog implementation of the dynamic H-J algorithm [2, 50, 51, 52, 47].

2.1 Neuromimetic Algorithm

Herault and Jutten pioneered a new paradigm in the area of blind separation of sources in the Ph.D. work of Jutten under the supervision of his advisor, Herault [4]. The problem was labeled the Independent Component Analyzer (ICA) [4, 6] because of its similarity to Principal Component Analysis [53, 54]. In [8, 9, 10, 7, 5], Herault and Jutten proposed an algorithm for the separation of independent sources. It was assumed that the medium is linear and static. The inputs to the network are the measured signals e_i(t), 1 <= i <= n, which are linear combinations of the original signals, as depicted in Figure 2.1.
Figure 2.1. Herault and Jutten architecture: e_i = \sum_{j=1}^{n} a_{ij} s_j and y_i = e_i - \sum_{j \neq i} d_{ij} y_j.

e_i(t) = \sum_{j=1}^{n} a_{ij}\, s_j(t)                                    (2.1)

which, in vector form, can be expressed as

e(t) = A\, s(t)                                                            (2.2)

A is an n x n matrix whose components a_{ij} are unknown. The matrix A models the static mixing environment and is assumed to be nonsingular. Furthermore, for normalized mixing, its diagonal entries are all ones and each off-diagonal element is less than one in absolute value.

Herault and Jutten used a recursive architecture made up of fully interconnected outputs. Each output y_i(t), 1 <= i <= n, receives the mixed signal e_i(t) and a weighted sum of all other outputs, -\sum_{j \neq i} d_{ij} y_j(t). Thus,

y_i(t) = e_i(t) - \sum_{j \neq i} d_{ij}\, y_j(t),   1 <= i <= n           (2.3)

which, in vector form, becomes

y(t) = e(t) - D\, y(t)                                                     (2.4)

D is an n x n weight matrix whose main diagonal is zero. The problem of separation of signals now translates into retrieving the original signals s(t). In the limit, it is thus desired to have

y(t) = P\, s(t)                                                            (2.5)

where P is a generalized permutation matrix, i.e., a matrix obtained from a nonsingular diagonal matrix by row and/or column permutations.

Herault and Jutten were motivated by neuro-biological evidence that information about the speed and position of body joints is mixed before being sent to the brain by two different types of nerve fibers. The brain, however, is perfectly capable of separating (and recovering) the speed and position signals. This biological and intuitive inspiration led Herault and Jutten to propose an update law, reminiscent of the Hebbian rule, that presumes the independence of the original signals:

\dot{d}_{ij} = \eta_{ij}\, f(y_i)\, g(y_j),   i \neq j                     (2.6)

where f(.) and g(.) are two nonlinear odd functions and \eta_{ij} is a constant learning rate. Despite the fact that their original idea came from a neuro-biological inspiration, the authors supplied an initial mathematical reasoning for the update law.

2.1.1 Derivation of The H-J Update Law

In [10], the authors introduced a justification of the update law defined by (2.6). Their derivation, together with some issues that may not have been considered during the development of the algorithm, is presented here. The algorithm was developed by assuming that the network is near convergence, meaning that the first (n - 1) outputs are assumed to have converged: y_i = a_{ii} s_i for all i < n. Then, the last output can be expressed as

y_n = e_n - \sum_{j \neq n} d_{nj}\, y_j                                   (2.7)
    = \sum_{j} a_{nj}\, s_j - \sum_{j=1}^{n-1} d_{nj}\, y_j                (2.8)
    = \sum_{j=1}^{n-1} (a_{nj} - d_{nj}\, a_{jj})\, s_j + a_{nn}\, s_n     (2.9)

Since the sources s_j are independent, the cross correlation between two different outputs is null. Therefore, the output power of y_n can be expressed as

\langle y_n^2 \rangle = \sum_{j=1}^{n-1} (a_{nj} - d_{nj}\, a_{jj})^2 \langle s_j^2 \rangle + a_{nn}^2 \langle s_n^2 \rangle     (2.10)

Thus, in order to minimize the power of y_n, it is required that

a_{nj} - d_{nj}\, a_{jj} = 0,   \forall j \neq n                           (2.11)

In this case, the power of the n-th output y_n(t) is proportional to that of s_n(t). This relationship holds only for the n-th output, under the assumption that all other (n - 1) signals are already at the desired solution.

The authors proposed to develop the algorithm based on the principle of minimizing the individual output powers using the gradient descent method, which is presented in the next section. However, minimizing the individual powers of the outputs does not necessarily lead to the minimization of the total energy function, which is the sum of all individual powers. They assumed that the only contribution from the total energy to a particular parameter d_{ij} comes from the corresponding output signal y_i.
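Before turning to the gradient-descent derivation, the architecture (2.2)-(2.4) and the target solution (2.11) can be made concrete with a minimal MATLAB sketch for n = 2. The mixing matrix, source frequencies and sample grid are illustrative assumptions; with the normalized mixing assumed above (unit diagonal), equation (2.11) gives d_{ij} = a_{ij}, so that I + D = A and the output reproduces the sources exactly.

% Minimal sketch of the static H-J architecture for n = 2 (illustrative values)
A = [1.0 0.6; 0.4 1.0];                     % normalized mixing matrix, unit diagonal
t = linspace(0, 2e-3, 2000);
s = [sin(2*pi*1000*t); sin(2*pi*2000*t)];   % independent sources
e = A * s;                                  % observed mixtures, e = A s as in (2.2)

D = A - eye(2);            % weights satisfying (2.11): d_ij = a_ij when a_jj = 1
y = (eye(2) + D) \ e;      % output of (2.4): y = (I + D)^(-1) e, equal to s at this solution

max(abs(y(:) - s(:)))      % essentially zero: the sources are recovered exactly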
Later in this work, a different update law, obtained from the sum of all individual output powers, will be considered as an energy function for the problem. Because the algorithm developed from that energy function merely enforces decorrelation of two output signals, it fails to separate sources. However, including the nonlinearity functions f and g as in (2.6) provides some improvement.

2.1.2 Gradient Descent Method

As was discussed earlier, the authors in [10] proposed that the network parameters d_{ij} be updated using the gradient descent method in order to minimize the output power of the individual signals, defined as

E_i = \frac{1}{2} y_i^2                                                    (2.12)

Thus the parameters were updated as follows:

\dot{d}_{ij} = -\eta \frac{\partial E_i}{\partial d_{ij}} = -\eta\, y_i \frac{\partial y_i}{\partial d_{ij}}     (2.13)

It is given that

y_m = e_m - \sum_{k \neq m} d_{mk}\, y_k = e_m - \sum_{k} (1 - \delta_{mk})\, d_{mk}\, y_k     (2.14)

Now, differentiate both sides of the above equation with respect to d_{ij} in order to obtain the expression for \partial y_m / \partial d_{ij}:

\frac{\partial y_m}{\partial d_{ij}} = -\sum_{k} (1 - \delta_{mk}) \left[ \frac{\partial d_{mk}}{\partial d_{ij}}\, y_k + d_{mk} \frac{\partial y_k}{\partial d_{ij}} \right]     (2.15)

Rearrange the above equation to obtain

\sum_{k} \left[ \delta_{mk} + (1 - \delta_{mk})\, d_{mk} \right] \frac{\partial y_k}{\partial d_{ij}} = \delta_{mi}\, (\delta_{ij} - 1)\, y_j     (2.16)

Let

Q = (I + D)^{-1}                                                           (2.17)

and the previous equation, written in matrix form, becomes

\frac{\partial y}{\partial d_{ij}} = Q\, v, \quad \text{with } v_m = \delta_{mi}\, (\delta_{ij} - 1)\, y_j     (2.18)

Therefore,

\frac{\partial y_m}{\partial d_{ij}} = \left[ \frac{\partial y}{\partial d_{ij}} \right]_m     (2.19)
= [Q\, v]_m                                                                (2.20)
= \sum_{k} q_{mk}\, v_k                                                    (2.21)
= \sum_{k} q_{mk}\, y_j\, \delta_{ki}\, (\delta_{ij} - 1)                  (2.22)
= q_{mi}\, y_j\, (\delta_{ij} - 1)                                         (2.23)

Consequently,

\dot{d}_{ij} = -\eta\, y_i \frac{\partial y_i}{\partial d_{ij}}            (2.24)
= -\eta\, q_{ii}\, (\delta_{ij} - 1)\, y_i\, y_j                           (2.25)
= \eta\, q_{ii}\, y_i\, y_j\, (1 - \delta_{ij})                            (2.26)

2.1.3 Evaluation of the update law

The update law defined in (2.26) always converges to a symmetric solution matrix, thus limiting the admissible environments to those modeled by a symmetric matrix. However, the general environment cannot always be modeled by a symmetric matrix; this represents a drawback of the update law defined by (2.26). Also, when equation (2.26) is expressed on average as

\langle \dot{d}_{ij} \rangle = \eta\, q_{ii}\, \langle y_i\, y_j \rangle\, (1 - \delta_{ij})     (2.27)

this rule tests only the decorrelation of the output signal y(t). The goal, however, is to develop an update law which pushes for the independence of the components of y(t). To do so, the rule was modified in order to accommodate higher order moments by imposing nonlinear functions which produce various moments of the output, as follows:

\dot{d}_{ij} = \eta\, q_{ii}\, f(y_i)\, g(y_j)\, (1 - \delta_{ij})          (2.28)

where f and g are two odd functions. These two functions should be different; otherwise, the matrix D becomes symmetric. The use of two such functions introduces higher order moments. To illustrate this, consider the Taylor series expansions of the two odd functions:

f(x) = \sum_{k} \alpha_{2k+1}\, x^{2k+1}  \quad and \quad  g(x) = \sum_{l} \beta_{2l+1}\, x^{2l+1}     (2.29)

Thus, the equilibria of equation (2.28) on the average satisfy

\langle \dot{d}_{ij} \rangle = \eta\, q_{ii} \sum_{k,l} \alpha_{2k+1}\, \beta_{2l+1}\, \langle y_i^{2k+1}\, y_j^{2l+1} \rangle = 0     (2.30)

Assuming \alpha_{2k+1} and \beta_{2l+1} are not zero, this results in

\langle y_i^{2k+1}\, y_j^{2l+1} \rangle = 0,   \forall k, l                (2.31)

This condition means that all joint odd moments are zero, which is stronger than decorrelation. It is, however, implied by the independence of the components of the signal vector y(t) when these signals have even probability density functions. As a last approximation, the authors used the assumption that the diagonal entries q_{ii} are very close to one, since the off-diagonal entries d_{ij}, i \neq j, are less than one. Thus, the proposed update law becomes as defined by equation (2.6).
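The following worked instance, added here for illustration, makes condition (2.31) concrete for the particular pair used in the simulations below, f(x) = x^3 and g(x) = x. In this case (2.30) reduces to

\langle \dot{d}_{ij} \rangle = \eta\, q_{ii}\, \langle y_i^3\, y_j \rangle\, (1 - \delta_{ij}) = 0
\quad \Longrightarrow \quad \langle y_i^3\, y_j \rangle = 0, \qquad i \neq j .

If the components of y are independent with even (hence zero-mean) densities, then \langle y_i^3\, y_j \rangle = \langle y_i^3 \rangle \langle y_j \rangle = 0, so the separating solution is indeed an equilibrium of (2.28). The converse does not follow: the vanishing of this single odd joint moment does not by itself imply independence.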
2.1.4 Computer Simulations

Computer simulations were performed for the H-J algorithm defined by (2.6). The unknown sources are two sine waveforms with respective frequencies 1 kHz and 2 kHz. The environment mixing matrix is assumed to be

A = [ 1.0  0.6 ; 0.4  1.0 ]

The learning rate is taken to be \eta = 100 and the initial conditions are all zero. Figure 2.2 shows the performance of the algorithm when the odd functions f and g are, respectively, the cubic and the linear functions:

f(x) = x^3  and  g(x) = x

One observes that the off-diagonal coefficients of the D matrix converge to the desired values. In Figure 2.2, the D matrix converges to

D^* = [ 0.0000  0.4004 ; 0.6004  0.0000 ]

In addition, Figure 2.2 displays the performance index, which is defined next.

Performance Index

In the problem of blind separation of signals, one desires to design a network such that its output is a replica of the unknown sources. Because of the lack of knowledge of the unknown sources and of the mixing matrix, one does not expect to completely identify the unknown sources: the order of the components of the output cannot be determined, nor can their corresponding magnitudes. Consequently, one can identify the unknown sources only up to a permutation and a scaling. This is defined as the wave-preserving property [28]:

Definition 1 (wave-preserving property) The signals y(t) and s(t) satisfy the wave-preserving property if and only if

y(t) = I_P\, \Gamma\, s(t) = P\, s(t)                                       (2.32)

where I_P is a permutation matrix, \Gamma is a diagonal matrix, and P is the resulting generalized permutation matrix. The definitions of a permutation matrix and of a generalized permutation matrix are given in Appendix A.

In the case of the H-J work, the network input-output relationship is described by equation (2.4). For the sake of generality, let us assume that such a relationship is represented by

y = W\, e                                                                  (2.33)

Thus, considering equations (2.32) and (2.33), one obtains

P = W\, A                                                                  (2.34)

So, at convergence, the gain matrix WA, the product of the matrices W and A, has to be a generalized permutation matrix in order to claim that the network has converged to a solution that separates the mixed signals. The following mapping

I: R^{n \times n} \to R_+, \qquad P \mapsto I(P) = \sum_{i} \left[ \sum_{j} \frac{|p_{ij}|}{\max_k |p_{ik}|} - 1 \right] + \sum_{j} \left[ \sum_{i} \frac{|p_{ij}|}{\max_k |p_{kj}|} - 1 \right]     (2.35)

is always nonnegative and is zero if and only if the matrix P is a generalized permutation matrix. Therefore, this mapping is taken as a performance index for any given blind signal separation algorithm. Its plot versus the evolution time of the algorithm represents a measure of the convergence of that algorithm.

Thus, one clearly observes that the performance index shown in Figure 2.2 approaches zero, and consequently the gain matrix P = WA becomes

P = [ 1.0002  -0.0006 ; -0.0005  1.0004 ]

Figure 2.3 displays the response of the network after convergence. The output signals of the network, y_i(t), are approximately the unknown sources s_i(t); the error between them is of the order of a few milli-units.

Computer simulations were also performed using different types of odd functions. In this case,

f(x) = \sinh x  and  g(x) = \tanh x

Figures 2.4 and 2.5 show the performance of the algorithm defined by (2.6) using these odd functions. The settings of this simulation are exactly the same as above in terms of initial conditions, learning rate, mixing matrix and unknown sources. With a training time of 0.1 s, the network had not yet converged; therefore, a longer training time, namely 0.2 s, was allocated. In this case, the parameters converged to near the desired values, as shown in Figure 2.4.
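A compact MATLAB sketch of the simulation just described is given below. It integrates the H-J rule (2.6) with a forward-Euler step and evaluates the performance index (2.35) on the gain matrix (I + D)^{-1} A. The mixing matrix, learning rate, training time and the choice f(x) = x^3, g(x) = x mirror the setting above, while the integration step size is an assumption; the script is an illustrative reconstruction, not the thesis code of Appendix C.

% Illustrative reconstruction of the simulation of Section 2.1.4 (forward-Euler integration)
A   = [1.0 0.6; 0.4 1.0];          % static mixing environment (2.2)
eta = 100;                         % learning rate
dt  = 1e-6;  T = 0.1;              % integration step (assumed) and training time, in seconds
D   = zeros(2);                    % zero initial conditions

for k = 1:round(T/dt)
    t = k*dt;
    s = [sin(2*pi*1000*t); sin(2*pi*2000*t)];    % unknown 1 kHz and 2 kHz sources
    e = A*s;                                     % mixtures seen by the network
    y = (eye(2) + D) \ e;                        % network output, equation (2.4)
    dD = eta * (y.^3) * y.';                     % f(y_i) g(y_j) with f(x) = x^3, g(x) = x
    dD(logical(eye(2))) = 0;                     % update off-diagonal terms only (i ~= j)
    D = D + dt*dD;                               % Euler step of the H-J rule (2.6)
end

P    = abs((eye(2) + D) \ A);                    % gain matrix of (2.34), in magnitude
pidx = sum(sum(P,2)./max(P,[],2) - 1) + sum(sum(P,1)./max(P,[],1) - 1)   % index (2.35)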
Also, Figure 2.5 shows that the output is a near replica of the unknown sources.

When computer simulations were performed for the algorithm defined by equation (2.26), the network converged to a symmetric matrix, as anticipated in the discussion of the algorithm development. Figures 2.6 and 2.7 show the performance. If one considers deriving the update law based on minimizing the total energy of the signal, or any other energy function of the form

E = \frac{1}{2} \sum_{m} \phi(y_m)                                         (2.36)

one could use equation (2.23) to obtain

\dot{d}_{ij} = -\eta \sum_{m} \phi'(y_m)\, q_{mi}\, y_j\, (\delta_{ij} - 1)     (2.37)

Figure 2.2. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, y_i^3\, y_j.

Figure 2.3. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, y_i^3\, y_j. The solid curve is the unknown sources; the dotted curve, superimposed onto the solid curve, is the network output.

Figure 2.4. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta \sinh y_i \tanh y_j.

Figure 2.5. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta \sinh y_i \tanh y_j. The solid curve is the unknown sources; the dotted curve, superimposed onto the solid curve, is the network output.

Figure 2.6. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, q_{ii}\, y_i\, y_j.

Figure 2.7. Performance of the H-J algorithm. The parameters are updated according to \dot{d}_{ij} = \eta\, q_{ii}\, y_i\, y_j. The solid curve is the unknown sources; the dotted curve, superimposed onto the solid curve, is the network output.

2.2 Mutual Information Approach

In [24], Amari et al. considered the feedforward architecture of Figure 2.8 for the blind separation problem.

Figure 2.8. Feedforward architecture: the sources s(t) are mixed by A into the observations e(t), which the weight matrix W maps into the outputs y(t).

They used the Independent Component Analysis (ICA) framework formulated by Comon [23], which is based on minimizing the dependency among the output components. This dependency is measured by the Kullback-Leibler divergence between the joint and the marginal probability density functions:

I(y) = \int f_y(y) \ln \frac{f_y(y)}{\prod_i f_{y_i}(y_i)}\, dy            (2.38)

Equation (2.38) is minimal, and equal to zero, only when all the components y_i of the output vector are statistically independent.
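The step from the divergence form (2.38) to the entropy form used next can be made explicit by expanding the logarithm of the quotient (a short derivation added here for clarity; each marginal term collapses because only y_i appears in its integrand):

I(y) = \int f_y(y) \ln f_y(y)\, dy - \sum_i \int f_y(y) \ln f_{y_i}(y_i)\, dy
     = -H(y) - \sum_i \int f_{y_i}(y_i) \ln f_{y_i}(y_i)\, dy_i
     = -H(y) + \sum_i H(y_i) .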
The averaged mutual information can thus be expressed in terms of entropy as

I(y) = -H(y) + \sum_i H(y_i)                                               (2.39)

where H(y) is the entropy of y, a measure of the uncertainty of the occurrence of the event produced by y, as defined in Appendix A.

When using the mutual information as an independence criterion, partial knowledge of the statistical information of the output is needed, since the chosen independence criterion is a function of the probability density of the output. To surmount that constraint, Amari [24], as well as Comon [23], used a truncation of an infinite series expansion of the probability density functions. Unlike Comon, who used an Edgeworth expansion, Amari employed a Gram-Charlier expansion to approximate the probability distribution of the output. The fourth-order Gram-Charlier expansion of the marginal density f_{y_i}(y_i) is

f_{y_i}(y_i) \approx \alpha(y_i) \left[ 1 + \frac{\kappa_{3i}}{3!} H_3(y_i) + \frac{\kappa_{4i}}{4!} H_4(y_i) \right]     (2.40)

where \kappa_{3i} = m_{3i}, \kappa_{4i} = m_{4i} - 3, m_{ki} = E[y_i^k] is the k-th moment of y_i, \alpha(y) = \frac{1}{\sqrt{2\pi}} e^{-y^2/2}, and H_k(y) are the Chebyshev-Hermite polynomials defined by

(-1)^k \frac{d^k \alpha(y)}{dy^k} = H_k(y)\, \alpha(y)                     (2.41)

It is assumed here that the variance terms are unity, i.e.,

m_{2i} = 1, \quad \forall i                                                (2.42)

A second approximation, of the natural logarithmic function, was used in order to be able to compute the integrals:

\ln(1 + x) \approx x - \frac{x^2}{2}                                       (2.43)

Assuming that the weighting matrix W is nonsingular, the following holds:

f_y(y) = \frac{f_x(x)}{|W|}                                                (2.44)

Therefore, H(y) can be rewritten as

H(y) = E[\ln |W|] - E[\ln f_x(x)] = H(x) + E[\ln |W|]                      (2.45)

Thus,

I(y) = \sum_i H(y_i) - E[\ln |W|] - H(x)                                   (2.46)

Using the above approximations and the gradient descent method to minimize the mutual information, the authors arrived at the following update law [24]:

\dot{W} = \eta\, [W^{-T} - f(y)\, x^T]                                     (2.47)

where

f(y) = \frac{3}{4} y^{11} + \frac{25}{4} y^{9} - \frac{14}{3} y^{7} - \frac{47}{4} y^{5} + \frac{29}{4} y^{3}     (2.48)

This function is plotted in Figure 2.9, along with other functions that have been considered in the literature.

Figure 2.9. Nonlinear functions.

However, using the information theory perspective [55] and assuming the mixing matrix to be nonsingular, the above update law can be rewritten as

\dot{W} = \eta\, [I - f(y)\, y^T]\, W                                      (2.49)

Computer simulations of this algorithm were performed. The unknown sources are assumed to be two sine waveforms with respective frequencies 1 Hz and 2 Hz. The mixing matrix is chosen randomly:

A = [ 0.7012  0.7622 ; 0.9103  0.2625 ]

The learning rate is \eta = 0.1 and the random initial condition is

W_0 = [ 0.0475  0.3282 ; 0.7361  0.6326 ]

According to this setting, the initial gain matrix is

G_0 = W_0 A = [ 0.3321  0.1223 ; 1.0920  0.7271 ]

It is clear that the gain matrix G_0 is not a generalized permutation matrix. The goal is thus for the network to update its parameters so that the gain matrix becomes a generalized permutation matrix. Figure 2.10 shows the performance of the algorithm using the settings described above. One observes that all the parameters of the matrix W converge to

W = [ -0.6835  1.9155 ; 2.1673  -1.6387 ]

which corresponds to the gain matrix

G = WA = [ 1.2644  -0.0183 ; 0.0280  1.2218 ]

Figure 2.10. Parameter convergence of the algorithm defined by equation (2.49).
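A minimal MATLAB sketch of the rule (2.49) with the nonlinearity (2.48) follows. The sources, mixing matrix, learning rate and initial weights are those of the setting just described; the Euler step size and iteration count are illustrative assumptions, and a small step is used because the high-order polynomial f makes the early transient stiff.

% Illustrative discrete-time simulation of the natural-gradient form (2.49)
A   = [0.7012 0.7622; 0.9103 0.2625];           % mixing matrix used above
W   = [0.0475 0.3282; 0.7361 0.6326];           % random initial weights used above
eta = 0.1;
dt  = 1e-4;  T = 90;                            % small Euler step (assumed) and duration

f = @(y) (3/4)*y.^11 + (25/4)*y.^9 - (14/3)*y.^7 - (47/4)*y.^5 + (29/4)*y.^3;   % (2.48)

for k = 1:round(T/dt)
    t = k*dt;
    s = [sin(2*pi*t); sin(2*pi*2*t)];            % 1 Hz and 2 Hz sine sources
    y = W * (A*s);                               % network output y = W e
    W = W + dt * eta * (eye(2) - f(y)*y.') * W;  % Euler step of (2.49)
end

G = W*A                                          % should tend to a generalized permutation matrix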
The reason is that Herault and Jutten had assumed a special architecture of the environment by taking the diagonal entries of the $A$ matrix to be one. Thus, when developing the update law, they did not consider any law for the diagonal elements of the network matrix. Therefore, the network converged to the exact inverse of the environment model. By doing so, the identifiability issue was eliminated. However, in the work of Amari et al., no such special structure is assumed. Therefore, the problem is not completely identifiable. Consequently, one should not expect to obtain the inverse of the mixing matrix as the solution to the problem.

The authors' main contribution is their analytical derivation of a nonlinear function $f(\cdot)$, which previously had been chosen in an ad hoc manner in the literature. However, only a 2nd-order approximation of the logarithmic function was considered, and the output signals were assumed to have unit variance (2.42). This certainly simplifies the computations, but it does not generate the correct nonlinear function, since the output signal variances contribute to the higher-order terms. Therefore, in Chapter 4, a new energy function based on a 3rd-order approximation of the logarithmic function, with no unit-variance assumption, will be derived. A justification of this derivation will become apparent in that chapter. Consequently, an update law based on the proposed energy function will perform certain separation tasks for which the update laws defined by equations (2.47) and (2.49) will be shown to fail.

2.3 Information Theoretic Approach

Bell and Sejnowski [45, 45, 25] considered a feedforward neural network described by Figure 2.11, where $y$ is the output of the neural network, $x$ is its input vector, $W$ is the parameter matrix, and $b$ is the bias vector.

Figure 2.11. Bell and Sejnowski's Architecture

In their work, the authors considered a criterion that maximizes the information transferred through the network, which is defined by

$$I(x,y) = E\left[\ln \frac{f_{xy}(x,y)}{f_x(x)\, f_y(y)}\right] \qquad (2.50)$$

Using the properties of the entropy given in Appendix A, one can express equation (2.50) as

$$I(x,y) = H(y) + H(x) - H(x,y) \qquad (2.51)$$

or

$$I(x,y) = H(y) - H(y|x) \qquad (2.52)$$

$H(y)$ is the entropy of $y$ and $H(y|x)$ is the entropy of $y$ not generated from the input $x$. The entropy of $y$ is defined as

$$H(y) = -E[\ln f_y(y)] = -\int_{-\infty}^{+\infty} f_y(y)\,\ln f_y(y)\, dy$$

where $f_y(y)$ is the probability density function (pdf) of $y$. These authors state that $H(y|x)$ does not depend on $W$. Thus, maximizing the information transferred through the network is equivalent to maximizing the entropy of $y$:

$$\frac{\partial I(x,y)}{\partial W} = \frac{\partial H(y)}{\partial W}$$

Assuming that the nonlinearity $g$ is invertible, the following holds:

$$f_y(y) = \frac{f_x(x)}{|J|}$$

where $|J|$ is the determinant of the Jacobian matrix $J = \frac{\partial y}{\partial x}$. So $H(y)$ can be rewritten as

$$H(y) = E[\ln|J|] - E[\ln f_x(x)] = H(x) + E[\ln|J|]$$

Thus,

$$\frac{\partial H(y)}{\partial W} = E\left[\frac{\partial \ln|J|}{\partial W}\right]$$

Consequently, the parameters will be updated according to

$$\Delta W = \eta\, E\left[\frac{\partial \ln|J|}{\partial W}\right] \qquad (2.53)$$

Bell and Sejnowski suggested an approximation by dropping the expected value:

$$\Delta W = \eta\, \frac{\partial \ln|J|}{\partial W} \qquad (2.54)$$

To compute the gradient, a compact form for $\ln|J|$ must be found:

$$[J]_{ij} = \frac{\partial y_i}{\partial x_j} = g'(u_i)\frac{\partial u_i}{\partial x_j} = g'(u_i)\, w_{ij} = \left[\Lambda_{g'(u)}\, W\right]_{ij}$$

Thus, $|J|$ can be expressed as

$$|J| = \left|\Lambda_{g'(u)}\, W\right| = \left|\Lambda_{g'(u)}\right|\cdot|W| = |W|\prod_i g'(u_i) \qquad (2.55)$$

where $\Lambda_v = \mathrm{diag}(v_1,\cdots,v_n)$. Hence

$$\ln|J| = \ln|W| + \sum_m \ln g'(u_m) \qquad (2.56)$$

Therefore,

$$\Delta w_{ij} = \eta\left[\frac{1}{|W|}\frac{\partial |W|}{\partial w_{ij}} + \frac{1}{g'(u_i)}\frac{\partial g'(u_i)}{\partial w_{ij}}\right] \qquad (2.57)$$
Since

$$\frac{\partial g'(u_i)}{\partial w_{ij}} = g''(u_i)\, x_j \qquad (2.58)$$

and

$$|W| = \sum_j w_{ij}\, \mathrm{cof}\, w_{ij} \qquad (2.59)$$

the element-wise update becomes

$$\Delta w_{ij} = \eta\left[\frac{\mathrm{cof}\, w_{ij}}{|W|} + \frac{g''(u_i)}{g'(u_i)}\, x_j\right] \qquad (2.60)$$

In matrix form,

$$\Delta W = \eta\left[W^{-T} + \frac{g''(u)}{g'(u)}\, x^T\right] \qquad (2.61)$$

In particular, if

$$y = g(u) = \frac{1}{1+e^{-u}} \qquad (2.62)$$

then

$$\Delta W = \eta\left[W^{-T} + (1-2y)\, x^T\right] \qquad (2.63)$$

Computer simulations were performed in order to study the update law defined by (2.63). Two prototype sine functions of respective frequencies $1\,kHz$ and $2\,kHz$ were used as unknown sources. Computer simulations with various mixing matrices, initial conditions, and learning rates were performed. They all revealed that the algorithm failed to separate the signals. Figure 2.12 shows one example of its performance. Observe that the performance index did not converge to zero and that the gain matrix $P = WA$ is not a generalized permutation matrix. One could attribute the failure of the update law defined by (2.61) to the fact that it is an approximation of (2.53) in which the expected value was dropped.

Figure 2.12. Performance of the Bell and Sejnowski Algorithm

In their work [25, 45, 26, 44], the authors did not simulate the update law as it is defined in (2.61). Instead, they considered the following algorithm. Given $n$ mixtures defined on a time interval $[0,T]$, the authors first defined

$$\delta(t) = (1-2y(t))\,x(t)^T$$

Then the weights were updated according to the steps given in Table 2.1.

Table 2.1. Information Theoretic Algorithm

Step 1. Pick a random time point $t_i$ such that $t_i + \tau \le T$.
Step 2. Compute $\Delta W_i = \sum_{t=t_i}^{t_i+\tau} \delta(t)$.
Step 3. Evaluate $W_i^{-T}$.
Step 4. Compute $\Delta W_i = W_i^{-T} + \Delta W_i$.
Step 5. Update the weights $W_{i+1} = W_i + \eta\,\Delta W_i$.
Step 6. Compute the performance index $P_i$.
Step 7. While $P_i > P_{\mathrm{desired}}$, go back to step 1.

Computer simulations of this algorithm were conducted using speech and prototype sine signals. The algorithm performed well when speech signals were used: it was able to separate mixtures of two, three, and five speech segments of different speakers. Figure 2.13 shows one example of its performance. However, the algorithm failed to separate prototype sine signals; observe Figure 2.14.

One undesirable feature of the algorithm developed by the authors is the choice of a random time $t_i$ at every epoch. The authors claimed that such a choice of $t_i$ would guarantee the input stationarity assumption. However, this choice makes a digital signal processing implementation of the algorithm cumbersome, since it requires the storage of all the data in memory. Consequently, the algorithm does not perform in real time. Most applications of the blind separation of signals require real-time performance. Thus, the algorithm, as it is, is not attractive and needs to be modified.

One other undesirable feature of the algorithm is the computation of the inverse of the weight matrix $W$. By analyzing the steps of the algorithm closely, one concludes that the computation of the inverse occurs only after all the weight increments have been accumulated.

Figure 2.13. Performance of the Bell and Sejnowski Algorithm Using Speech Signals
Figure 2.14. Performance of the Bell and Sejnowski Algorithm Using Sine Signals

This means that the input signals are frozen until that computation is completed. Thus, the algorithm is not suitable for real-time performance as it stands. One should reconsider how and/or when this expression is computed in order to make the algorithm perform in real time. One solution would be to compute the inverse while accumulating the weights, so that weight accumulation and inverse computation are performed in parallel.

As a final remark, the algorithm defined by (2.63) is based on the maximization of the information transferred through the network, defined by equation (2.52), with an appropriate choice of the neuron's nonlinear function. One can interpret this development as the minimization of the first term of the mutual information defined by equation (2.39), namely the negative entropy of the output vector $-H(y)$, which is equivalent to the maximization of $H(y)$.

2.4 Algebraic Approach

In [21, 56, 57, 22], Cardoso et al. proposed a series of two processing stages, as shown in Figure 2.15, to achieve blind signal separation. The separating matrix $W$ was factored as the product of a whitening matrix $B$ and an orthogonalizing matrix $U$: $W = UB$. Algorithms for updating the matrices $B$ and $U$ were developed, then combined to obtain a one-stage update for the matrix $W$.

Figure 2.15. Cardoso's Architecture

The matrix $B$ was updated such that the output vector $z$ was white, i.e. $R_z = I$, where $z = Bx$. This was obtained by minimizing the distance between the matrices $R_z$ and $I$. Thus, the appropriate error function was the Kullback-Leibler divergence [58] defined as

$$E_B(B) = \mathrm{Trace}(R_z) - \ln|R_z| - n \qquad (2.64)$$

The matrix $B$ was then updated along its gradient descent:

$$\Delta B = -\eta\,\frac{\partial E_B}{\partial B} = -\eta\left[zz^T - I\right]B \qquad (2.65)$$

The matrix $U$ was updated such that $E_U(U) = E\,f(y) = E\,f(Uz)$ was minimized. Since this minimization was not constrained and may lead to a solution that did not preserve orthogonality, the matrix $U$ was updated according to

$$\Delta U = -\eta\left[f'(y)\,y^T - y\,f'(y)^T\right]U \qquad (2.66)$$

By combining the whitening and the orthogonality stages, Cardoso et al. obtained an overall update for the matrix $W$:

$$\Delta W = \eta\left[I - yy^T - f'(y)\,y^T + y\,f'(y)^T\right]W \qquad (2.67)$$

Computer simulations were performed to study the update law defined by (2.67). When the network dimension was two, the algorithm was able to separate signals regardless of the initial conditions and the learning rate $\eta$. Figure 2.16 presents an example of its performance.

When developing the algorithm described by (2.67), the authors considered an Euler approximation of the gradient of the considered energy (contrast) functions. The developed algorithm is suitable for digital implementation.

Figure 2.16. Cardoso Algorithm for $n = 2$

Figure 2.17. Cardoso Algorithm for $n = 3$

Here, I will consider the
continuous-time realization of the algorithm, investigate its performance, and compare it to its discrete-time counterpart as defined by (2.67). The continuous-time algorithm is defined by the instantaneous update equation

$$\dot{W} = \eta\left[I - yy^T - f'(y)\,y^T + y\,f'(y)^T\right]W \qquad (2.68)$$

Starting from the same conditions, computer simulations for both realizations were performed, and the CPU time taken by each algorithm was compared. The discrete-time version of the algorithm took more time (74.85 s) than its continuous-time counterpart (50.93 s). Examples of simulations for a two-dimensional network are presented in Figures 2.16 and 2.18. However, when the network is of dimension 3, the algorithm converges, but not to an acceptable solution. Observe in Figures 2.17 and 2.19 how more than 3 entries of the generalized permutation matrix converge to non-zero values; thus, the solution is not acceptable. One concludes that the algorithm developed by Cardoso may not work for networks of dimension higher than 2.

Figure 2.18. Continuous-time Algorithm for $n = 2$

Figure 2.19. Continuous-time Algorithm for $n = 3$

CHAPTER 3

Blind Separation in a Static Environment

In this chapter, a blind separation algorithm in a static environment will be developed. The mathematical analysis to develop adaptive laws for the problem will be presented. The law will be derived based on the nonlinear decorrelation condition of the output. The resulting update laws will then be enhanced to test for the independence of the output. Computer simulations will be presented to demonstrate the performance of these novel algorithms.

3.1 Problem Definition and Architecture

The problem can be posed as follows: unknown source signals are emitted from unknown sources, and these signals are mixed according to some unknown model that describes the medium through which they have traveled. The goal, then, is to construct a system that recovers these unknown sources based only on measurements of the mixtures of the original unknown sources. It is also assumed that the environment is a static model; it can therefore be represented by a static matrix that characterizes its input-output relationship. Figure 3.1 describes the architecture of the system. The unknown sources, the output of the environment, and the output of the network are respectively labeled $s(t)$, $x(t)$, and $y(t)$.

Figure 3.1. Static Environment Architecture

3.2 Theoretical Solution

Given the measured vector $x(t)$, which is the only given data, the neural network should be constructed to adaptively change the parameters $W$ so that its output $y(t)$ and the original signal $s(t)$ satisfy the wave-preserving property defined in Section 2.1. This property implies that one is not after an exact replica of the unknown sources, but rather a permutation of them, each up to a constant scaling.
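Throughout the simulations in this chapter, success is judged by whether the gain matrix $G = WA$ is a generalized permutation matrix. The helper below is a hypothetical stand-in for the performance index used in the plots (its exact formula is not given in this excerpt): it returns zero exactly when each row and column of $G$ has a single dominant entry and all other entries vanish.

```python
import numpy as np

def gp_distance(G):
    """Distance of a gain matrix G = W @ A from a generalized permutation matrix.

    For each row and each column, everything except the single largest-magnitude
    entry should be (near) zero.  This index is zero exactly for a generalized
    permutation matrix.  NOTE: this is an assumed stand-in for the dissertation's
    performance index, used here only for illustration.
    """
    G = np.abs(np.asarray(G, dtype=float))
    row_residual = (G.sum(axis=1) - G.max(axis=1)) / (G.max(axis=1) + 1e-12)
    col_residual = (G.sum(axis=0) - G.max(axis=0)) / (G.max(axis=0) + 1e-12)
    return float(row_residual.sum() + col_residual.sum())

A = np.array([[1.0, 0.4], [0.6, 1.0]])
W_separating = np.linalg.inv(A)                 # a separating solution: G = I
W_failing = np.array([[0.5, 0.5], [0.5, 0.5]])  # a non-separating network
print(gp_distance(W_separating @ A), gp_distance(W_failing @ A))
```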
Now, assume that the correct update law for the problem has been developed and the system has converged to a solution $W^*$. Then,

$$y(t) = W^* x(t) = W^* A\, s(t) \qquad (3.1)$$

Therefore, by combining equations (2.32) and (3.1), one obtains

$$W^* A = P\,\Gamma \qquad (3.2)$$

Recall that $P$ and $\Gamma$ are respectively a permutation matrix and a nonsingular diagonal matrix. Equation (3.2) can be rewritten as

$$W^* = P\,\Gamma\, A^{-1} \qquad (3.3)$$

This equation shows that a theoretical solution to the problem does exist and that there are infinitely many possible solutions, thanks to the freedom that the matrices $P$ and $\Gamma$ provide. Also, equation (3.2) requires that the gain matrix $G = WA$ be a generalized permutation matrix.

Knowing that a theoretical solution exists, one can now investigate possible ways to construct a network with appropriate update laws that perform the task of separating signals. To do so, an appropriate energy function is defined next.

3.3 Energy Function

To develop an update law, an energy function that characterizes the problem is now defined. If this were a recognition or classification problem, the energy function would have been chosen as an error function: a difference between the desired output vector $s(t)$ and the observed output vector $y(t)$, defined as

$$\epsilon(W) = \frac{1}{2}\int \left(s(t) - y(t)\right)^2 dt \qquad (3.4)$$

This criterion would, therefore, require knowledge of the target signal. However, by assumption, there is no a priori knowledge of the signal vector. In this problem, it is expected that the network will reproduce, regenerate, and recover these sources without any direct knowledge of them. Thus, such a traditional energy function is not appropriate.

On the other hand, some kind of knowledge about the original signals is assumed: they are independent. This feature, or criterion, will be the basis for defining the energy function analogous to the one defined by (3.4). In order to define the energy function that fits the problem at hand, one needs to rely on the definition of the independence of signals [59].

Definition 2 (Deterministic Independence) The components of an $n$-dimensional time-varying signal vector $y(t) = [y_1(t) \cdots y_n(t)]^T$ are linearly independent over a time interval $[0,T]$ if and only if their auto-correlation matrix

$$R_y(t) = \int_0^t y(\tau)\, y(\tau)^T d\tau$$

is positive definite for all $t \le T$.

Based on this definition, one possible energy function could be based on identifying necessary and sufficient conditions for the autocorrelation matrix to be positive definite. Theorem 1 provides this characterization [60].

Theorem 1 (Positive Definite) If a matrix $M$ is positive definite, then

$$|M| \le \prod_{k=1}^n m_{kk} \qquad (3.5)$$

with equality if $M$ is diagonal. The operator $|M|$ denotes the determinant in the case of a square matrix $M$.

The proof of this theorem is by induction and is shown in Appendix B.1. Using Theorem 1, one can develop a criterion, or energy function, that is always positive but is minimum when the matrix is diagonal. To develop such a function, some equivalent statements of Theorem 1 will be presented. Let $M$ be a positive definite matrix. Then,

$$M \;\text{positive definite} \;\Rightarrow\; \prod_{k=1}^n m_{kk} \ge |M| \;\Rightarrow\; \ln\prod_{k=1}^n m_{kk} \ge \ln|M| \;\Rightarrow\; \sum_{k=1}^n \ln m_{kk} - \ln|M| \ge 0$$

with equality if and only if $M$ is diagonal. Define the mapping

$$J:\; \mathbb{R}^{n\times n} \to \mathbb{R}^+,\qquad M \mapsto J(M) = \frac{1}{2}\left[\sum_{k=1}^n \ln m_{kk} - \ln|M|\right] \qquad (3.6)$$

When the energy function defined by (3.6) is minimized, one obtains a diagonal matrix.
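As a small numerical check (not from the dissertation), the mapping $J$ of (3.6) can be evaluated directly; it is non-negative for positive definite matrices and vanishes only for diagonal ones. The example matrices below are arbitrary.

```python
import numpy as np

def J(M):
    """Energy function of equation (3.6): J(M) = 1/2 [sum_k ln m_kk - ln |M|]."""
    M = np.asarray(M, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(M))) - np.log(np.linalg.det(M)))

# J is zero for a diagonal (decorrelated) matrix and positive otherwise.
R_diag = np.diag([2.0, 0.5, 1.3])
R_corr = np.array([[2.0, 0.8, 0.1],
                   [0.8, 0.5, 0.2],
                   [0.1, 0.2, 1.3]])   # positive definite, but not diagonal
print(J(R_diag))   # 0.0
print(J(R_corr))   # > 0
```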
Thus, this energy function will be used in this problem with the autocorrelation matrix as its argument, since decorrelation of the components of the output signal vector is obtained when the autocorrelation matrix of the output vector is diagonal. In the work presented in this chapter, an update law that pushes only for the decorrelation condition will be developed first. Then, some techniques found in the literature [8, 24] will be used to push for the independence condition. Also, different numbers of measurements and sensors will be considered; thus, the matrix $W$ is $n \times m$ with $m \neq n$. A special form of the update law for the different scenarios relating the number of sensors and measurements will be given.

The autocorrelation matrix of the output signal vector is defined as

$$R_y(t) = \int_0^t y(\tau)\, y(\tau)^T d\tau = \langle yy^T\rangle_t \qquad (3.7)$$

Thus, using the mapping defined by equation (3.6), the energy function is defined as

$$\Phi = J(R_y(t)) = \frac{1}{2}\left[\sum_{k=1}^n \ln\langle y_k^2\rangle_t - \ln\left|\langle yy^T\rangle_t\right|\right] \qquad (3.8)$$

The matrix $W$ will be updated along the gradient descent of $\Phi$:

$$\dot{W} = -\eta\,\frac{\partial \Phi}{\partial W} \qquad (3.9)$$

where

$$\frac{\partial \Phi}{\partial W} = \frac{1}{2}\sum_{k=1}^n \frac{1}{\langle y_k^2\rangle_t}\frac{\partial \langle y_k^2\rangle_t}{\partial W} - \frac{1}{2\left|\langle yy^T\rangle_t\right|}\frac{\partial \left|\langle yy^T\rangle_t\right|}{\partial W} \qquad (3.10)$$

3.4 Update law derivation

First compute $\frac{\partial \langle y_k^2\rangle_t}{\partial w_{ij}}$. Since

$$y_k = \sum_l w_{kl}\, x_l \qquad (3.11)$$

it follows that

$$\frac{\partial y_k}{\partial w_{ij}} = \sum_l \delta_{ki}\,\delta_{lj}\, x_l = \delta_{ki}\, x_j$$

Consequently,

$$\frac{\partial \langle y_k^2\rangle_t}{\partial w_{ij}} = 2\,\delta_{ki}\,\langle y_k x_j\rangle_t$$

So,

$$\frac{1}{2}\sum_{k=1}^n \frac{1}{\langle y_k^2\rangle_t}\frac{\partial \langle y_k^2\rangle_t}{\partial w_{ij}} = \frac{\langle y_i x_j\rangle_t}{\langle y_i^2\rangle_t} = \left[\left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yx^T\rangle_t\right]_{ij} \qquad (3.12)$$

The other term is

$$\frac{\partial \left|\langle yy^T\rangle_t\right|}{\partial W} \qquad (3.13)$$

In [60], it was derived that for any square matrix $M$ and any matrix $X$ such that $XMX^T$ is nonsingular,

$$\frac{\partial \left|XMX^T\right|}{\partial X} = \left|XMX^T\right|\left[(XMX^T)^{-1}XM + (XMX^T)^{-T}XM^T\right] \qquad (3.14)$$

In addition, if $M$ is symmetric, equation (3.14) becomes

$$\frac{\partial \left|XMX^T\right|}{\partial X} = 2\left|XMX^T\right|(XMX^T)^{-1}XM \qquad (3.15)$$

Applying equation (3.15) to equation (3.13),

$$\frac{\partial \left|\langle yy^T\rangle_t\right|}{\partial W} = 2\left\langle \left|Wxx^TW^T\right|\left(Wxx^TW^T\right)^{-1}Wxx^T\right\rangle_t = 2\left\langle \left|yy^T\right|\left(yy^T\right)^{-1}yx^T\right\rangle_t \qquad (3.16)$$

Substituting equations (3.12) and (3.16) into (3.10), one obtains

$$\dot{W} = \eta\left[\frac{\left\langle \left|yy^T\right|\left(yy^T\right)^{-1}yx^T\right\rangle_t}{\left|\langle yy^T\rangle_t\right|} - \left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yx^T\rangle_t\right] \qquad (3.17)$$

If $W$ is a square nonsingular matrix, then (3.17) becomes

$$\dot{W} = \eta\left[W^{-T} - \left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yx^T\rangle_t\right] \qquad (3.18)$$

$$= \eta\left[I - \left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yy^T\rangle_t\right]W^{-T} \qquad (3.19)$$

$$= \eta\left[I - \left(\mathrm{diag}\, R_y\right)^{-1}R_y\right]W^{-T} \qquad (3.20)$$

At steady state, equation (3.19) gives

$$\dot{W} = 0 \;\Rightarrow\; I = \left(\mathrm{diag}\,\langle yy^T\rangle_t\right)^{-1}\langle yy^T\rangle_t \;\Rightarrow\; \langle yy^T\rangle_t = \mathrm{diag}\,\langle yy^T\rangle_t \;\Rightarrow\; \langle y_i y_j\rangle_t = 0,\;\forall i \ne j$$

that is, the components of the signal vector $y(t)$ are decorrelated. Thus, the algorithm defined by equation (3.19) tests only for decorrelation.

In order to compute the weight update according to equation (3.19), the time average $\langle yy^T\rangle_t$ defined by equation (3.7) must be computed. To do so, a state matrix $Z(t)$ is defined by

$$\dot{Z}(t) = y(t)\,y(t)^T, \qquad Z(t_0) = 0 \qquad (3.21)$$

Thus,

$$Z(t) = \int_{t_0}^{t} y(\tau)\,y(\tau)^T d\tau = \langle yy^T\rangle_t$$

Therefore, equation (3.19) simplifies to

$$\dot{W} = \eta\left[I - \left(\mathrm{diag}\, Z\right)^{-1}Z\right]W^{-T} \qquad (3.22)$$

By running equations (3.21) and (3.22) simultaneously, the algorithm defined by (3.19) is implemented.

3.5 Computer Simulations

Computer simulations were performed to study the derived algorithm. Starting with two sine waveforms as the unknown sources and mixing them by a matrix $A$, the mixture signal vector is obtained. This signal is then fed to the feedforward neural network. If each output of the network is a constant multiple of one of the two unknown sources, then the algorithm has succeeded in separating the signals. Separation is also achieved if the forward loop gain matrix $P = WA$ is a generalized permutation matrix, where a generalized permutation (GP) matrix is one that has only one nonzero entry in each row and column.
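A minimal forward-Euler sketch of the coupled system (3.21)-(3.22) is given below; it is not taken from the dissertation, and the step size, learning rate, and run length are illustrative assumptions. Consistent with the discussion that follows, this decorrelation-only rule is not expected to drive the gain matrix to a generalized permutation matrix.

```python
import numpy as np

# Euler-discretized sketch of (3.21)-(3.22):
#   dZ/dt = y y^T,   dW/dt = eta [ I - (diag Z)^{-1} Z ] W^{-T}
dt, eta = 1e-3, 10.0                                     # illustrative values
t = np.arange(0.0, 1.0, dt)
s = np.vstack([np.sin(2*np.pi*1*t), np.sin(2*np.pi*2*t)])   # two sine sources
A = np.array([[1.0, 0.4], [0.6, 1.0]])                      # illustrative mixing matrix
x = A @ s                                                    # measured mixtures

W = np.array([[0.1352, 0.4553], [0.7832, 0.3495]])  # initial condition used in Figure 3.2
Z = np.zeros((2, 2))
for k in range(t.size):
    y = W @ x[:, k]
    Z += dt * np.outer(y, y)                          # equation (3.21)
    if k > 50:                                        # wait until diag(Z) is nonzero
        update = np.eye(2) - np.diag(1.0 / np.diag(Z)) @ Z
        W += dt * eta * update @ np.linalg.inv(W).T   # equation (3.22)

# (3.19) only enforces decorrelation, so W @ A need not be a GP matrix.
print("gain matrix W A:\n", W @ A)
```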
Numerous simulations for various initial conditions and learning rates were performed in order to study the algorithm as defined by (3.20). In all these simulations, the algorithm failed to separate the signals. An example of such a simulation is shown in Figure 3.2. The initial condition is

$$W_0 = \begin{bmatrix} 0.1352 & 0.4553 \\ 0.7832 & 0.3495 \end{bmatrix}$$

The network converged to

$$W_f = \begin{bmatrix} 0.2265 & 0.4187 \\ 0.8116 & 0.2758 \end{bmatrix}$$

which gives the overall gain matrix

$$W_f A = \begin{bmatrix} 0.1277 & 0.3293 \\ 0.9226 & -0.3700 \end{bmatrix}$$

Figure 3.2 shows that the performance index does not go to zero, implying that the gain matrix $WA$ is not a generalized permutation matrix. This can also be observed by looking at the plots of its entries and noting that none of them goes to zero. If the algorithm had converged to a separating matrix $W$, then two entries of the gain matrix would go to zero. However, the figure shows that all the entries converged to nonzero values.

Figure 3.2. Performance of the Algorithm defined by equation (3.19)

3.6 First Improvement

The algorithm defined in (3.19) failed to perform a separation of signals because it tested only for decorrelation of the output signals. In [8, 9, 10], Herault and Jutten also arrived initially at an algorithm that was similar to the one derived in (3.19). They claimed that independence would be satisfied if higher-order moments of the output were generated within the update rule. Therefore, odd nonlinear functions, which exhibit infinitely many orders of moments when expanded in their Taylor series, were introduced. Thus, by considering a similar approach [8], as discussed in Chapter 2, (3.19) becomes

$$\dot{W} = \eta\left[I - \left(\mathrm{diag}\,\langle f(y)\,g(y)^T\rangle_t\right)^{-1}\langle f(y)\,g(y)^T\rangle_t\right]W^{-T} \qquad (3.23)$$

At steady state, (3.23) gives

$$\dot{W} = 0 \;\Rightarrow\; I = \left(\mathrm{diag}\,\langle f(y)g(y)^T\rangle_t\right)^{-1}\langle f(y)g(y)^T\rangle_t \;\Rightarrow\; \langle f(y)g(y)^T\rangle_t = \mathrm{diag}\,\langle f(y)g(y)^T\rangle_t$$

$$\Rightarrow\; \langle f(y_i)\,g(y_j)\rangle_t = 0,\;\forall i \ne j \;\Rightarrow\; \Big\langle \sum_{l,k\;\mathrm{odd}} \alpha_l\,\beta_k\, y_i^l\, y_j^k \Big\rangle_t = 0,\;\forall i \ne j \;\Rightarrow\; \langle y_i^l\, y_j^k\rangle_t = 0,\;\forall i \ne j,\;\forall l,k\;\mathrm{odd}$$

where the coefficients $\alpha_l$ and $\beta_k$ are the Taylor series coefficients of the functions $f$ and $g$, respectively. This shows that the algorithm provides independent output signals as a solution to the problem, which is the desired goal.

3.6.1 Computer Simulations

To evaluate the weight update according to (3.23), the average $\langle f(y)\,g(y)^T\rangle_t$ must be computed. Thus, the state matrix defined by

$$\dot{Z}(t) = f(y(t))\,g(y(t))^T, \qquad Z(t_0) = 0 \qquad (3.24)$$

is introduced. Consequently, (3.23) becomes

$$\dot{W} = \eta\left[I - \left(\mathrm{diag}\, Z\right)^{-1}Z\right]W^{-T} \qquad (3.25)$$

By running (3.24) and (3.25) simultaneously, the algorithm defined by (3.23) is realized. Computer simulations for the algorithm defined by (3.23) were performed using different nonlinear functions; the different cases are considered below.

Case 1: $f(x) = x^3$ and $g(x) = x$

Numerous simulations were performed using different initial conditions and learning rates. The following set of simulations was obtained by using the same initial condition

$$W_0 = \begin{bmatrix} 0.4523 & 0.9317 \\ 0.8089 & 0.6516 \end{bmatrix}$$

but different learning rates. Figures 3.3 through 3.5 show the performance of algorithm (3.23) when the learning rate is $\eta(t) = 100$, $10$, and $100e^{-5t}$, respectively. In Figure 3.3, one can observe that the algorithm converges. However, it exhibits some oscillations and never settles to a constant level of zero, though it oscillates around it. This
Figure 3.4 shows the simulations under the same conditions when n = 10. It can be observed that a longer time is needed for the algorithm to converge. Therefore, a way of combining the observed behaviors in figures 3.3 and 3.4 is to consider a time-varying learning rate. Thus, when r] = 1006‘“, one observes a smooth convergence of the algorithm as shown in figure 3.5. Table 3.1 shows the performance of the algorithm for 20 different random initial conditions, but all having the same time-varying learning rate 17 = 100e'5‘. It can be observed that the algorithm converges in all cases. Case 2 f(x) = sinha: g(x) = tanh :1: Numerous simulations were performed using different initial conditions and learn- ing rates. The following set of simulations was obtained by using the same initial conditions W0 = 0.1352 0.4553 0.7832 0.3495 and variable learning rates. Figures 3.6 through 3.8 show the performance of algo- rithm (3.23) when the learning rate is y(t) = 100, 10, 1006‘5t Vt Observe that a learning rate 1] of 10 was too small to obtain any concluding results within the time span for training. When 17 is 100, one is able to observe that the network is converging to a generalized permutation matrix. However, it can be observed that the parameters exhibit some oscillations around a constant level. These oscillations were eliminated by considering a learning rate that decays with time. The appropriate function is 17(t) = 1006'“. A set of 20 simulations of the algorithm with different initial conditions was performed. Table 3.2 represents the performance of the algorithm in all these simulations. The initial conditions were chosen randomly. The 57 Simulation of eq4.m (eta,b) = (100.0) (l0,l) = (0.8499.0.03366) 1.— gos- $06- 04- '8 £02] 0 on 02 as o} 03 06 0} d8 d9 1 12» 15' ‘2‘ W a C C : : : : :— as» 2 g g 05 0,6W/U— ,5 2 WM 04 5 ° W a 8%» 03 1 .0fl) as 1 0.2981 0.9919 —0.0267 1.0163 W’ [08456 0.6035] WA‘[ 0.8377 0.0196 Figure 3.3. Performance of Algorithm defined by equation (3.23) for 17(t) = 100. f(x) = $3 and 9(3) = 1‘ 58 Simulation of eq4.m (eta,b) = (10.0) (l0.l) 2 (0849901006) A 5 Performance Index a o o o . l 1 V—_\ 1 5 < a. .5 C7": g 2 § 0.6 NV ‘3 o 5 ~ § 30.4 L/\ 2 0b 8 0’20 0:5 1 4'50 04.5 1 —0.0546 1.0416 -0.0267 1.0163 W = [ 0.8341 0.0246 ] WA = [ 0.8377 0.0196 Figure 3.4. Performance of Algorithm defined by equation (3.23) for 17(t) = 10. f(x) = $3 and y(t) = x 59 Simulation of eq4.m (eta,b) = (100,5) (I0,l) 2 (03499002937) d I I x 0.8 § g 0.6 - E 0.4 - E 8 0.2 L I j I I P x 1 l 00 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.2- 1.5» < § 1 PW :‘ 1 C75 5'085"’\"* 13 v 9 ' § 05 § 0.6 M7 % €04 W, § 0km — c 0.2 - - -0.5 . - 0 0.5 1 0 0.5 1 0.2919 0.9937 -0.0078 0.9992 W" [ 0.8490 0.5987] WA“ [ 0.8473 0.0056 Figure 3.5. Performance of Algorithm defined by equation (3.23) for y(t) = 1008‘“. f(x) = $3 and g(x) = x 60 Table 3.1. Simulation Results Algorithm defined by equation (3.23). f (:c) = 2:3 and 9(3) = 1' Simulation # Final Performance Index 1 0.1490 g 2 0.0125 3 0.0132 4 0.1692 5 0.0134 6 0.1036 7 0.0924 8 0.0157 9 0.0587 10 0.2035 11 0.0560 12 0.1302 13 0.0164 14 0.0266 15 0.7045 16 0.0377 17 0.0209 18 0.0244 19 0.0417 20 0.0526 61 Simulation of eq2.m (eta,b) = (100.0) (|0,l) = (0849902341) 1" 0| 1 in m Petformance Index 0 o in O 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 .° m _L g... E ' g 0.5 g 2 §0.4 g u 2‘ 3 ° 0.2 2 0 c9 0 ~ - —0.5 - - 0 0.5 1 0 0.5 1 0.1076 0.4627 -0.0395 0.4903 W — [ 0.6920 0.5060 J WA = [ 0.6838 0.0273 Figure 3.6. 
Figure 3.6. Performance of the Algorithm defined by equation (3.23) for $\eta(t) = 100$, $f(x) = \sinh x$ and $g(x) = \tanh x$

Figure 3.7. Performance of the Algorithm defined by equation (3.23) for $\eta(t) = 10$, $f(x) = \sinh x$ and $g(x) = \tanh x$

Figure 3.8. Performance of the Algorithm defined by equation (3.23) for $\eta(t) = 100e^{-5t}$, $f(x) = \sinh x$ and $g(x) = \tanh x$

Table 3.2. Simulation results for the algorithm defined by equation (3.23) with $f(x) = \sinh x$ and $g(x) = \tanh x$: final performance index for 20 random initial conditions.
Runs 1-10: 0.0066, 0.1168, 0.0517, 0.1163, 0.0543, 0.0462, 0.1836, 0.2338, 0.0062, 0.0192
Runs 11-20: 0.0084, 0.0620, 0.0347, 0.0410, 0.0038, 0.2772, 0.0165, 0.0180, 0.0510, 0.0863

The results show that convergence was obtained regardless of the initial conditions.

3.7 Second Improvement

In this section, the focus changes to an implementation viewpoint in order to modify the derived algorithm defined by (3.23). Recall that this equation is

$$\dot{W} = \eta\left[I - \left(\mathrm{diag}\,\langle f(y)\,g(y)^T\rangle_t\right)^{-1}\langle f(y)\,g(y)^T\rangle_t\right]W^{-T}$$
Simulations that have a final performance index of more than n/4, in this case 0.5 since 11 = 2, are considered as failures. These simulations are labeled by an asterisk‘ in Table 3.3. Case 2: f(x) = sinhx and g(x) = tanha: Numerous simulations were performed using different initial conditions and learn- ing rates. The following set of simulations was obtained by using the same initial conditions used to test the previous algorithm in order to provide some comparison between the two. The initial weight is 0.0475 0.3282 0.7361 0.6326 67 Simulation of eq18.m (eta,b) = (100.0) (l0,l) = (08406006222) coop out II Performance Index 0 i0 i- L. o o d o to o (a) o h .0 (n o a: o N o m o co cub 1- 0.8 < §0.8W E 0.6: C 30.6 M‘ I § 0.4L § 8 50.4, g 0.2 a: 00 0:5 1 '0‘20 05 1 0.2276 0.7785 -0.0076 0.7838 W" [0.7458 0.5348] WA"[ 0.7410 0.0161] Figure 3.9. Performance of Algorithm defined by equation (3.27) for 17(t) = 100. f(x) = $3 and g(x) = 3 68 Simulation of eq18.m (eta,b) = (10,0) (I0,l) = (0840602191) cub l Perlormance Index o o o o is o I in j 0° )— F 1» 0.0» < §0.BQ=\ #_ 3 0.6’ § 3 g0.6\/\ E 0.4. §04 E 02 5' \/\’ 8 '1 a 0.21 s 0» N 3 o 0 a - -0.2 1 - o 0.5 1 0 0.5 1 0.1960 0.7498 —0.0366 0.7754 W‘[0.7373 0.4790]WA"[ 0.7514 —0.0470 Figure 3.10. Performance of Algorithm defined by equation (3.27) for 17(t) = 10. f(x) = $3 and 9(9) = z 69 Simulation of eq1 8.m (eta,b) a (100,5) (I0,l) = (08406001327) 99.0 hard: Performance Index 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 ’ 0.87 < 0.810 3 0.6» g» 1% 00.6V\/\ 2 0.4: E... 5 .2 . .é . 0.2 W § 0% <9 0 - -0.2 2 1 0 1 0 0.5 1 0.2357 0.7910 -0.0020 0.7924 W: [0.7341 0.5163]WA'[ 0.7332 0.0031] Figure 3. 31.1 Performance of Algorithm defined by equation (3. 27) for 17(13): 10. f(x) - $3 and y(t) - 3 70 Table 3.3. Simulation Results Algorithm defined by equation (3.27). f (2:) = 2:3 and y(t) = z Simulation # Final Performance Index 1 0.0211 2 0.0733 3 0.3574 4 1.5079“ 5 0.0138 6 0.0298 7 0.1022 8 0.2058 9 1.0605“ 10 0.0478 11 0.0578 12 0.0228 13 0.0544 14 1.3249“ 15 1.4127" 16 0.0254 17 0.0332 18 0.0636 19 0.0415 20 0.0467 71 while the learning rates varied. Figures 3.12 through 3.14 show the performance of algorithm (3.27) when the learning rates are 17(t) = 100, 10, 1006'“ Vt, respectively. It can also be noted that, in the set of simulations, a time-varying learning rate 17 = 1006'“ Vt, had to be introduced in order to eliminate oscillations, as observed in Figure 3.12, when the learning rate is of a constant value equal 100. Figure 3.14 shows the results for such a time-varying learning rate. On the other hand, when the eta = 10, such a learning is to too small to enable the network to arrive at the desired solutions. Simulation of eq16.m (eta,b) = (100,0) (l0,l) n: (0878601088) g 0.8 - 20.6 - E 0.4 8 0.2 00 1 » 0.8 k §0.s W i; 0.6» 30.6: E 0.4» — 0 g 0.2» g, 0 W4: 00 0:5 1 ”0'20 05 1 W= [313233 322332] WA= [ 33333” 3.313%] Figure 3.12. Performance of Algorithm defined by equation (3.27) for y(t) = 100. f(x) = sinhz and y(t) = tanha: 72 Simulation of eq16.m (eta,b) = (10.0) (l0,l) :1: (08786008517) Performance Index 0 0 rs 01 .0 O N r— P )— 1 ' 0.8- < 3 . $0.8 W .E 0.6 16 o 2 0.4* S 0.6 '3 .5 0.2 a \/ 0.4 l\ g 0 v V /o:5—_\ 0.20 - -0.2 - - 1 0 0.5 1 0.2183 0.7657 —0.0144 0.7757 W: [ 0. 7540 0. 5420 1 WA " [ 0.7485 0.0181 1 Figure 3.13. Performance of Algorithm defined by equation (3.27) for y(t) = f(x) = sinhzr: and 9(2) = tanhz Table 3.4 shows the results of twenty simulations of the algorithm defined by (3.27) with different initial conditions. 
It should be observed that four simulations out of these twenty did not succeed in separating the signals. They are marked with a star in Table 3.4. Table 3.5 shows the results of computer simulations between algorithms defined by (3.23) and (3.27). One may conclude that both have similar performance. 73 Simulation of eq16.m (eta,b) .-. (100,5) (I0,I) = (08786006408) Performance Index .0 .° .0 a a: 00 0.2 00 0:1 0:2 0:3 0:4 0:5 04.6 0:7 0:8 0:9 1 0.8 ’ 0.8 r g E 0.6 ’ 0.6 ' 5 \ 5 g z 0.4: ‘3' 0 4» '3 W’ % 0.2» $02» g o W¥ (5 0 K A - -0.2 - » 0 0.5 1 0 0.5 1 0.2402 0.7836 0.0065 0.7791 W " [0.7457 0.5223] WA ‘ ] 0.7456 0.0004] Figure 3.14. Performance of Algorithm defined by equation (3.27) for 17(t) = 1006”“. f(x) = sinha: and g(x) = tanha: 74 Table 3.4. Simulation Results Algorithm defined by equation (3.27). f (0:) = sinhx and 9(2) = tanh :1: Simulation # Final Performance Index 1 0.2664 2 0.2652 3 0.0806 4 0.8735* 5 0.0906 6 0.0832 7 0.1283 8 0.7715"r 9 0.1956 10 0.2490 11 0.1415 12 0.0141 13 1.7091* 14 1.0756* 15 0.0776 16 0.3482 17 0.1142 18 0.0558 19 0.3650 20 0.0484 Table 3.5. Results of Simulation Comparison between Algorithms defined by equa- tions (3.23) and (3.27) 75 Initial Conditions Final Results Of (3.23) Final Results Of (3.27) __ W0 f WA WA “'1' 0.9347 0.5194 0.8805 —0.0079 0.9906 0.0055 _ 0.3835 0.8310 ] 0.0082 0.8692 —0.0127 0.7380 If = 0.0366 If = 0.0433 0.0475 0.3282 -0.0054 0.3223 0.0229 0.3824 _ 0.7361 0.6326 . 0.8063 —0.0169 _ 0.7008 0.0089 . 7 If = 0.0966 If = 0.1283 0.1351 0.4553 —0.0091 0.4634 0.0069 0.4423 _ 0.7832 0.3495 . 0.6913 0.0163 . 1 0.8857 0.0091 J 7 If = 0.0916 I; = 0.0545 0.1351 0.4553 —0.0091 0.4634 0.0069 0.4423 [0.7832 0.3495 ] 0.6913 0.0163 . 0.8857 0.0091 2 I, = 0.0916 If = 0.0545 3.8 Observation and Remarks In this chapter, an algorithm based on the decorrelation condition between the com- ponents of output of the signal vector was developed. The resulting algorithm was improved in order to test for the independence between the components of the out- put vector. It was shown that the algorithm defined by the first modification always converged to a separating network regardless of the initial conditions. The algorithm of the second modification is more attractive since it eliminates the computation of the inverse of a matrix. However, such elimination resulted in an algorithm that has a 20% chance of failure rate. CHAPTER 4 Higher Order Statistics It was proved in the literature [8, 23, 61] that second order statistics are not suf- ficient to solve the problem of the blind separation of sources. One will also recall that the static algorithm, developed in Chapter. 3, was initially based on the second order statistics, since the correlation defined such characteristics as the statistical description of a signal. Some techniques widely used in the literature were used to improve the algorithm performance to test for higher order statistics by generating infinitely many orders of moments through the injection of odd nonlinear functions f and g in the algorithm. It will be shown how these functions can be determined by defining an independence criterion that defines the problem. It was proved in the literature, that cumulants up to the fourth order are sufficient to approximate the probability density functions of a random variable using the fourth order Edgeworth approximation [62]. 
Consequently, higher order statistics have enabled several researchers to analyze the problem to some extent: Amari et al. [24, 63, 64, 65], Comon et al. [23, 66, 22], and Tong et al. [29, 28]. One should keep in mind, however, that these algorithms were developed for a static environment or for discrete dynamic Finite Impulse Response (FIR) models; none of these approaches addressed a general dynamical environment as it will be defined in this work. Such a dynamic system has memory represented by its dynamical states. Through the framework of optimal control theory, or the calculus of variations, and through the use of higher order statistics, this research will focus on developing new algorithms.

In this chapter, an independence criterion will be developed. It will be used to derive update laws for the parameters of static feedforward and feedback network models. The developed independence criterion will also be used to derive update laws for the parameters of dynamic feedforward and feedback network models in the subsequent two chapters. To introduce higher order statistics, one must first give the definition of statistical independence.

Definition 3 (Statistical Independence) The components of a random signal vector $y$ are statistically independent if and only if their joint probability density function (pdf) $p_y(y)$ is the product of all individual marginal probability density functions $p_{y_i}(y_i)$. Symbolically,

$$p_y(y) = \prod_i p_{y_i}(y_i) \qquad (4.1)$$

4.1 Mutual information

One way to measure the independence of the components of a random vector is to measure the distance between the right- and left-hand sides of equation (4.1). When such a distance is zero, the components of the random vector are statistically independent. A well-known distance in the literature is the Kullback divergence functional [58]

$$I(y) = \int f_y(u)\,\ln\frac{f_y(u)}{\prod_i f_{y_i}(u_i)}\, du \qquad (4.2)$$

where $f_y(y)$ is the pdf of the random vector $y$. The functional $I(y)$ is always positive and is zero if the components of the random vector $y$ are statistically independent. It defines the level of dependence between the components of the signal; therefore, it represents a good functional for characterizing statistical independence. $I(y)$ can be expressed in terms of entropy as defined in Appendix A:

$$I(y) = -H(y) + \sum_i H(y_i) \qquad (4.3)$$

Some properties of entropy are presented in Appendix A.

Consider the vector $y \in \mathcal{R}^m$ to be the output of a system described by the linear mapping $W = [W_1\; W_2]$ due to the input signal vector $x \in \mathcal{R}^n$, as shown in Figure 4.1.

Figure 4.1. Feedforward Architecture

Define the vector

$$z = [y_1 \cdots y_m,\; x_{m+1} \cdots x_n]^T \qquad (4.4)$$

Then,

$$z = W x \qquad (4.5)$$

where

$$W = \begin{bmatrix} W_1 & W_2 \\ 0 & I_{n-m} \end{bmatrix} \qquad (4.6)$$

$W$ is a nonsingular square matrix under the assumption that $W_1$ is also nonsingular. Using equation (4.5), a relationship between the probability density functions of the random vectors $z$ and $x$ can be obtained:

$$f_z = \frac{f_x}{|W|} \qquad (4.7)$$

Since $z = [y_1\cdots y_m,\, x_{m+1}\cdots x_n]$, one can express the pdf of $y$ in terms of that of $\bar{x} = [x_1 \cdots x_m]$:

$$f_y = \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} f_z\; dx_{m+1}\cdots dx_n \qquad (4.8)$$

$$= \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \frac{f_x}{|W|}\; dx_{m+1}\cdots dx_n \qquad (4.9)$$

$$= \frac{f_{\bar{x}}}{|W|} \qquad (4.10)$$

Finally, using the entropy definition and the equation above,

$$H(y) = -E[\ln f_y] = \ln|W| + H(\bar{x})$$

Therefore, equation (4.3) becomes

$$I(y) = -H(\bar{x}) - \ln|W| + \sum_i H(y_i) \qquad (4.11)$$

In equation (4.11), one may compute the first two terms of the expression.
However, the summation term is unknown since we have no knowledge of the probality densities f2.- H(y.-) = -/f2.-1nf2.dya (4.12) Thus, one must approximate the probability density f,“ and also its logarithmic ln f,“ in order to obtain some approximate expression of the entropy as defined by equation (4.12). To compute such an approximation, one would first need to introduce moments 80 and cumulants. Then, an Edgeworth expansion will be used to approximate the probality density f,“ and also its logarithmic ln f0:- 4.1.1 Cumulants and Moments Given a random variable x, the nth order moment is defined as Pa = Elma] Moments are also often defined as the formal power series expansion of the moment generation function defined as x k I: 1: M(fi) = E[efi”] = E]; (file!) ] = ; éTEl-Tkl = 225%" The cumulant generation function is defined as I: Km) = 1114409) = lnElei’l = g "*5 where n), is the cumulant of the random variable 2:. One can express the moments in terms of the cumulants, and vice versa, by solving the equation 2‘12?— Table 4.1 shows the relationship between them for the first few orders of moments and cumulants. Cumulants proved to be computationally efficient compared to moments despite the fact that the two quantities are equivalent. Some of the most important features of cumulants , for statistically independent signals [67], are e the cumulant of the sum is the sum of the cumulants, 81 Table 4.1. Conversion of moments and cumulants Cumulants in terms of moments ] Moments in terms of cumulants '31 = #1 #1 = I‘31 ”2:”2—f‘1 #2=K2+n§ I‘33 = fl3-3fl1#2+2I-‘i P3 = 53+351K2+I€¥ 64 = #4 - 4111113 - 314% + 12112;“? - 61“} #4 = 64 + 46163 + 343 + 6626? + 51’ e ~the cross cumulants are zero, 0 the Edgeworth expansion is most conveniently expressed in terms of cumulants, and 0 most pdfs of practical signals can be approximated by a finite number of cumu- lants. 4.1.2 Edgeworth Expansion The Edgeworth expansion of a given distribution density function f (2') is formally defined as a density function having cumulants x..- that are constructed from a modi- fication of a baseline density function fo(:r:) having cumulants u; [67], f(x) = 5(4) f 0011—7; (413) k=0 where hk(:r) defines a family of orthogonal functions known as the Hermite polynomial functions as seen below fclklfit) — — k— hlc(x) "' ( 1) f0(3) (4'14) 82 Here the 14’s are the pseudo-moment that satisfy the equation 00 flk oo . 1: 2(1):,- — 11,-)-k—' = ln 2 ”VIC—l (4.15) k=0 ' k=0 ' The choice of the baseline depends on the the statistical properties of the variable that is to be approximated. However, if no such information is available, it is most convenient to use the normal density function 1 22 f0($) 0(1‘ ) me ( ) as a baseline. When fo(a:) = 0(3) and the equation (4.13) is truncated at some finite number of terms, the resulting density function approximation is called a Gram- Charlier series [68]. Expansions based on other than normal distributions are rare. There are, however, some that are based on the x2 distribution density function [62]. Normal Baseline Expansion It is intended here to determine an expression for the Edgeworth expansion when the baseline is the normal distribution 0(3) as defined by (4.16). The cumulants of 0(3) can be extracted from its cumulant generating function 32 I: In E[expfl:c] = —2— = Erik-k7 k . Therefore, all the cumulants of 0(3) are zero except V2 = 1. 
Thus, in this case, the first two pseudo-moments p; are zero and the rest are the unmodified corresponding cumulants of the distribution density function f (2:), namely 10),. Consequently, the Edgeworth expansion of a density function with respect to a normal baseline is f(x) = 5(2) [1 + $5202) + 535(5) + - - -] (4.17) 83 Table 4.2. Hermite Polynomials derived from a normal baseline density function 0(2) h1(a:) = a: h2($) = $2 '— 1 h3(:r:) = 9:3 — 3x h4(:r) = :1:4 — 6.732 + 3 Equation (4.14) defines the Hermite polynomials. In the case of a normal baseline, they are easy to compute. Table 4.2 presents few orders of these polynomials. 4.1.3 Entropy Approximation The tools to approximate the entropies are now developed. For ease of notation, a random variable .7: having a density function f (2:) will be considered. Its fourth order Gram-Charlier approximation will be as follows: it 1: f(x) 2. 2(2) [1 + 33-1120») + fibre] = 00110) Therefore, H(x) = — [101010) 22 z —/f(:r) ln 0(2) dz — /0(x)p(a:)lnp(:c) d2: However, -ln0(:c) = --ln (4.18) (4.19) (4.20) 84 Therefore, —/f(2:)ln0(:r) d2: = éln27r/f(2:) dr+%/xzf(x) dz: 1 '2 ElDZW-Fég- In order to compute the second term in equation (4.19), one must approximate the logarithmic function as 2 $3 111(1+5) = :r- 52+ 3+0(x2) (4.21) Therefore, 122(2) = 0141—11-10:: 1—112+§1p(x)-113 = §!1’c-%-h3+!—h4- '1'[3—!-h 3 '1'?!1 4]2 LCM—ha +41’1“] 2 2 K K ’03 K4 3 __h2_ 4 __h2_ —h3 h4 2 3123 2 4!2 314! 3 2 3 ”4 3+ ”3‘4 2 “3‘4 2 3. ”313,13 + 3. 413h4+ 31271"3 I“ +— 31412 —h3h4 +33! = §—!h33:+-!’ —h4— + Consequently, 11411111114) =1w. o m—_. -: ..... I ' \ \ 3, ' —2o ~""-<' : \ \ I I .40 Graph (a) \\ I Graph (b) ‘ \ ” —60 Graph (c) \ , Graph (d) -80 -1” 1 L 1 I 0.5 1 1 .5 2 -2 Figure 4.2. Nonlinear function. Graphs (a)-(d) show the plots of the functions fa(y) through fd(y) 0.3 r 0.2 - .- 0.1 ~ -2; —— Log ._._._. 1atordor 0* >~-3 - - - - 2nd order >~ ........... 3rd ordar .0.‘ '- -4 -o.2 ~ -5 " _6' _ —O.3 - -7 L L A j -0. l j_ A I -1 -o.5 o 0.5 1 $.46 —o.2 o 0.2 0.4 X X Figure 4.3. Natural Logarithmic function and few orders of approximations 92 The initial weights are also randomly chosen and are defined by the matrix 0.5277 0.4755 0 = 0.7625 —0.5303 Thus, the initial overall gain matrix is 0.3356 0.5572 G0 = W014 = -0.0647 —0.1067 Clearly, such a matrix is not a generalized permutation matrix. Thus, it is desired that the network would update its parameters such that the gain matrix WfA, where W, is the final weight matrix, is a generalized permutation matrix. Using the algorithm defined by equation (4.45) and the nonlinear function fa defined by equation (4.43) and considering a learning rate 17 = 0.005, the network converges to —0.0703 0.7625 1.8137 -—-0.5303 Thus, 0.0013 0.5535 W,A.-. 0.7729 0.0049 is a generalized matrix. Consequently, the algorithm converges to a separating solu- tion. Figure 4.4 shows the results of such a simulations. Using the same mixing matrix A and initial weight matrix W0, and considering a learning rate r) = 0.1, the algorithm defined by equation (4.45) and the nonlinear 93 3rd odor, last step approx. (‘), random and sine input 2.5 x '8 2 E 8 1.5 5 E 1 § (L 0.5 0 M.An..._1““ .... . . . . . .. O 10 20 3O 40 5O 60 70 80 90 2 ' 1 . < ‘9 1.5' 3 § 2 / 8’ 1’ 23' g 05 2 0.5 1: 'o 8 .§ 9 ° 9 c 3-05 . 8 O -1 O 20 40 60 80 0 20 4O 60 80 Figure 4.4. 
Computer Simulation of Feedforward Structure Using fa 94 function fd defined by equation (4.47) converges to the final weight matrix —1.4506 3.3833 W, = 2.2009 —0.2815 Thus, final overall gain matrix 0.0127 1.2280 G f = WfA = 1.6472 —0.0006 is also a generalized permutation matrix. Figure 4.5 shows the results of the sim- ulations. These two simulations are a sample of numerous simulations that were conducted to analyze the performance of the algorithms for various initial conditions. In all cases, the algorithm always converges to a desired solution. 4.4 Static Case: Feedback Network Structure Recall that the mutual information is the algebraic sum of the joint entropy and the sum of all marginal entropies, as described in equation (4.3). Because of the opposite signs that appear next to each of these two terms, one can deduce that the minimization of the mutual information is somehow, but not exactly, equivalent to the minimization of the sum of marginal entropies and/or the maximization of the joint entropy. In the work of Bell and Sejnowski [45], the authors approached the problem by maximizing the joint entropy in order to develop an update law for the network. On the other hand, I propose an update law based on the minimization of the sum of marginal entropies. Thus, the theorem below is stated. Theorem 2 Under the assumption that qm, = 0 Vm aé i, the update law (£3 = nfa(9.-)yj (4.49) 95 2nd oder, first step approx. (‘), random and sine input 1" 0| l '0 Performance Index _. 01 .° 01 c,4: 8 8r 8 8 8 5‘ 8 8 Generalized Matrix WA 2O 4O 60 80 Figure 4.5. Computer Simulation of Feedforward Structure Using fd 96 is derived by minimizing the functional q) = Z H(yi) (4.50) Proof: The parameters of the network are updated using the gradient descent method d.” = _0— 4.51 J adij ( ) However, using equation (2.23), the gradient of the function is expressed as 695 _ 695 09... 6d,,- “ 5,; 6y... 8d,,- (4'52) 34> _ -— ; Eqmiy] (4.53) By considering the assumption that qm, = 0 Vm 51$ i, then 04> 343 adij — -9" ayi y] (454) = -q.-.-fa(ye)yj (4'55) Therefore , the parameters of the D matrix are updated according to 4.3 = flfa(yr)yj (4.56) The algorithm described by equation (4.56) provides a justification of specialized view of the algorithm developed by Herault and Jutten based on neuromimetic ap- proach [8], since Equation (4.56) is analogous to equation (2.6) where f (1:) = fa(:c) and y(t) = :c. 97 4.4.1 Computer Simulations Using the feedback structure as defined by Figure 2.1, the performance of the al- gorithm defined by the nonlinear function fa, as described by equation (4.43), will be compared to that defined by the nonlinear function fd which is developed in [24] and is described by equation (4.47). Using this structure, the update law for the parameters is as follows: 6..- = 4.7.09.9.- z'aé 73a = a.d (4.57) Computer simulations using 2 sine functions of respective frequencies 1H 2 and 2H 2 are first conducted. The mixing matrix is 1.0 0.4 A = (4.58) 0.6 1.0 The initial weight matrix is zero and the learning rates are the, following 1]“ = 0.005 17.1 = 0.1 (4.59) Figures 4.6 and 4.7 show the performance of both algorithms. One can observe that the algorithm defined by fa succeeded in separating the original sources, whereas the algorithm defined by fd fails to perform separation. Computer simulations are also performed using a Gaussian random signal and a sine function of frequency 2H 2. Also, other simulations using a Gaussian random signal and a square function are completed. 
In each case, the algorithm defined by for fails to separate the signals. On the other hand, the algorithm defined by fa performs the separation task successfully. Figures 4.8and 4.9 show the performance of both algorithms in the case of a Gaussian random signal and sine function were considered as the unknown sources. Figure 4.10 shows the original sources, the mixtures and the 98 New Algorithm e . n ................................................................................................. - ................................................................................................. . o c - Performance Index ..................................................................................................... 50 60 7O 80 90 1 00 0.8 * 1 .5 . 0.6 g 1 w E 2 ’5 0.4 ' E 0.5 3 E ‘c’ 0.2 8 O 4 O ‘ ‘ -O.5 * ‘ 0 50 100 O 50 100 0.0000 0.4001 ] W _ 1.0000 —0.0001 f“ 0.6001 0.0000 -1 _. (1 “W A" [ —0.0001 1.0001 Figure 4.6. Performance of the Algorithm defined by the nonlinear function f,, 99 AmariAlgorithm Eternun. .............................................................................................. E o IL1.6- L 1 1 l l I l 4 l I O 10 20 30 4O 50 60 70 80 90 100 0.6 ------------------- 12 0.4 -------------------- g 1 g) $0.8» :5 0.2r _g 3 gas :2 o (90.4]. -O.2 ‘ ‘ 0.2 ‘ ‘ O 50 100 O 50 100 _ 0.0000 0.0222 _, _ 0.9916 0.3797 W" [0.2229 00000] (”W’) A“ [0.3789 0.9154 Figure 4.7. Performance of the Algorithm defined by the nonlinear function fd 100 output of the network at convergence. New Algorithm Performance Index 0.8' 1.5: Generalized Matrix 9 of i 010203040 010203040 _ 0.0000 0.3940 _, _ 1.0002 0.0078 WI " [0.6003 0.0000] (I +Wf) A‘ [ —0.0004 0.9953] Figure 4.8. Performance of the Algorithm defined by the nonlinear function fa using random Gaussian and a sine function 101 Amari Algorithm Performance Index 0.6 ----------------------- _x 1 0.4 ----------------------- g t "’ 2 % 0.2 E 3 E o 5 K 8 -O.2 - 4 O _ - . O 1 O 20 O 1 O 20 30 _ 0.0000 0.2250 _, _ 0.4949 0.1591 W” [0.3199 0.0000] (MW!) A" [0.2323 0.4908] Figure 4.9. Performance of the Algorithm defined by the nonlinear function f4 using a random Gaussian and a sine function 102 5' 1 '5 0 ‘3‘) 0 -5 - . _1 . . 50 50.5 51 50 50.5 51 30 301 - - '1 1-201 5 N20 re 7 ID 72' E 310’ 2.0m [fin :10 ill” Illll 0 7 . n 0 7 7 -4 -2‘ O 2 4 -1 -O.5 0 05 1 5 5. ‘ ° 3 °W)WW -5 g - _5 . . 50 50.5 51 50 50.5 51 5 2. 3'. 0 g. 0 "550 505 51 "250 505 51 x10‘4 ' x10“ ' 2' 5. 5 ° 2 I I O ;_21 g. _4 A . _5 . . 50 50.5 51 50 50.5 51 Figure 4.10. Output of the Algorithm defined by the nonlinear function fa using random Gaussian and a sine function ' CHAPTER 5 Blind Separation in a Dynamic Environment: FeedForward Structure 5.1 Problem Definition In Chapter '3, static modeling for both the environment and the network was con- sidered. However, media cannot always be modeled as such. Therefore, one must consider more realistic environments, define their models and develop an update law to recover the original signals. Such an environment will be modeled as a linear dynamical system. Consequently, the network will also be modeled as a linear dy- namical system. Figure 5.1 depicts the architecture of this situation. The realiza- A M v e(t) i=Ax+Be y y=Cx+De 1? s(t) bl ml 0' 3M G NI. XI )0 + + Figure 5.1. Feedforward Architecture 103 HG li: 104 tion 5 = (A, B, C', D) represents the parameters of the environment, whereas D = (A, B, C, D) is that of the network. These parameters dictate the dynamics of both the environment and the network systems. 
While the realization T) is constant and models the behavior of the environment, the realization D of the network are yet to be modeled or defined. The goal here is to find an adaptive law (algorithm), in such a way, that when these parameters converge to some stable solution, the network output signals are replicas or similar to the original unknown sources. Or, by using the definition of wave-preserving, it is desired that the output vector preserves the waveforms of the components of the original signals. So, the question at hand is how should one update these parameters to arrive at such values? and, do such values exist in the first place? In this chapter, first, the existence of a theoretical solution to the problem will be shown. Then, Optimization Theory will be utilized to develop an adaptive rule based on a criterion that defines the independence of the output signals. A state representation of the network model that renders its implementation in VLSI attractive, and which also minimizes the number of parameters that characterize the network, will be deve10ped. The algorithm will be tested via computer simulations. Limitations and some possible improvements of the algorithm will also be discussed. 5-2 Existence of a Theoretical Solution Suppose that an update necessary to recover the original signals is developed and that the parameters of the neural networks have converged to the realization D“ = (A', B’, C”, D'). Then, because of the wave preserving property, there exists a Permutation matrix P and a. positive definite diagonal matrix I‘ such that flfi=PFfifi Co 105 There also exists a nonsingular matrix T such that x = Tic. Thus, differentiating both sides with respect to time: szx => A’x+B'e=TAi+TBs => A“): + B'(Cx + D5) = TAT-1x + TBs => A‘x + B’Ds = (T/iT"l — B‘CT‘1)x + TBS (5.1) Also, y: PI‘s => PI‘s = C'X-l-D'e => PI‘s = C‘x + D'(C’x + Ds) => PFs = (C‘ + D‘C‘T’l)x + D‘Ds (5.2) Therefore, A" = TAT"1 — B"C-'T'1 (5.3) B‘D = TB (5.4) C“ + D"C'T‘1 = 0 (5.5) D'D = I (5.6) Consequently, A‘ = T (A + BD“C’) :r-1 (5.7) B' = TED" (5.8) C" = --P1‘1')-1(7:I‘-l (5.9) D‘ = PI‘D“ (5.10) Pa: 31* Th 3?; the In! 36.0 THU 106 Particular Case Suppose that T = PI‘ = I where I is the identity square matrix, then A" = A—BD-‘C‘ (5.11) B“ = BI)“ (5.12) C" = D-IC' (5.13) D‘ = f)‘1 (5.14) Thus, it is shown that the theoretical solution to the problem exists, and that if the appropriate energy function of the independence criterion is defined for the problem, then the parameters of the system will converge to one of the solutions defined by equations (5.7)-(5.10). However, before going into defining such energy functions, possible realizations of the system will be discussed; 5.3 State Realization In this section, different network structures and their suitability to the problem of the separation of signals will be introduced. The network is assumed to be a multi-input multi-output system described by Y(s) = H(3)U(s) (5.15) H (s) is the m x r system’s transfer matrix where r and m are, respectively, the number of input and output signals. The goal is to find a dynamical system D = (A, B, C, D) whose transfer function is the m x r matrix H (s) where H(s) = C(sI — A)“B + D (5.16) 107 However, before starting the development of the structure, the following assumptions on the transfer function matrix H (s) will be considered. 
(A1) H (s) is stable and minimum phase (A2) Each entry of the matrix H (s) is in its irreducible form (A3) entries of H (s) have no common poles The rationale behind these assumptions will be explained in the development of the structure. 5.3.1 Controllable / Observable Canonical Form The literature provides one with various realizations of H (s), [59]. Of these, a canoni- cal controllable realization will be considered. The procedure to develop the proposed structure is simple and straight forward. A main feature of the structure is that there is no direct coupling between the input signals. This realization has very nice features, in terms of its perfect fitness to the problem of blind separation, and also in terms of circuit implementation. It presents the least number of parameters of the state representation of the transfer matrix H (3). Such features will enable one to define simple update laws for the parameters and to analyze their performance based on the independence criterion of the output signals. The structure generates a bank of filters that are jointly decoupled. The analysis provided below will display such char- acteristics of the developed realization. Some figures are also presented to illustrate this point. Each output is described as K-(s) = imam-(s) = inks) (5.17) j=1 '=l 108 According to assumption (A2), it is assumed that each H,,-(s) is irreducible. Thus, we choose to represent H;,-(s) by the canonical controllable realization D;,- = (Aiji bija Cg, dij)- A.,eR"i:‘""-‘1 1,0512%: . . -_ A 1 . . ’7": 55,1.) 0 1 o o 1 53,1.) 0 5S? 0 o 1 o 55,?) o = + u, 53‘0") o o o o 1 mgr-1) 0 £351) _agfij) _as;) J $355) 1 y.- = as? as? ass-1) cw + a... c§j62“‘i where Org-c) are the coefficients of the characteristic polynomial of the matrix Ag, defined by det (AI — A”) = A“? + adj-EV“?1 + - - - + erg-“VIM + aff") (5.18) The sub-realization D5,- : (Aij, bij, c5, dij) provides us with a set of different filtered versions of the 1"” input 11,- that affect only the 2'“ output 3],. Figure 5.2(a) presents the diagram of the state representation the sub-realization ’ng, whereas Figure 5.2(b) shows the combination of the state and the output representation of the sub-realization D5, = (A;,, b;,-, c3}, dij). The combination of these realizations 13.3, j = 1, - - - ,r, will provide us with the 2'“ output 3],. Figure 5.2(c) shows such a structure. But, let’s proceed to see how it is obtained. T To do so, one must define the vector x; = [ x3; . . . x5 ] to be a vector in ’R’“ 109 where p,- = 237:1 11.5. Then, A.eR:axua 3,53..." 51,-, A51 0 0 1 - bu 0 0 O 0 x.J = 0 A” 0 Xi + 0 b.J 0 l1 0 0 1 Sq, ‘ L 0 0 Ag,- . L 0 0 b.-,. y,- = [cg CiTj cgJX, + [(1.1 db. £11,)“ CaG'R‘“ dgE‘R' Therefore, 5:. = A;x.-+Bgu (5.19) y.- = ciTx,+d,7'u . (5.20) An important feature of this realization is the inherent parallel structure that it possesses, as shown in Figure 5.2(c). This structure clearly decouples the inputs from each other. This will be used as an advantage when an update law that pushes for the output independence is to be developed. At the final stage, all the filters will be combined and the structure below will T be developed. Thus, the vector x = [ x? . . . x; 1 in R” is defined, where n = V1 llOI 110 22;, [15. Then, AERnx" BERnx' , r. n . . ’7‘“. *1 A] ‘ 0 0 X1 Bl - 0 X. = 0 ' Ag‘ 0 X. + B, u - 0 . . . 1 1. - - 1 P a - - y] c? . . . O . . . 0 x1 df 0 y. = 0 Cz‘r 0 X; ‘1' d? 11 0 y... 0 0 c; x... 
d: ‘ v a) b b - Cekmxn De‘Rmxr Thus, 51 = Ax + Bu (5.21) y = Cx + Du (5.22) where A, B, C and D are defined above. The realization D = (A, B, C, D) is called the controllable canonical realization of H (s). In [59], it is discussed that D};- = (A3}, b5, cijT, (1.3) is called the observable canon- ical realization of H;j(s). Therefore, by following the same procedure, as discussed above, to develop a controllable canonical realization of H (.3), one can demonstrate that DT = (AT, BT, CT, D7) is an observable canonical realization of H (3). By assumption (A2) and (A3), this structure is irreducible. If these two assump- tions are not verified, then a more compact realization can be found [59]. But, for the 111 sake of generality, it is safe to have such assumptions. Assumption (A1) is introduced because, in real life applications and in computer implementation, such a feature is needed in order to guarantee stability. 5.3.2 Features of the realization In the course of development of the realization, some features of the proposed realiza- tion have been mentioned. These include the decoupling between inputs, the inherent parallel structure of the realization, and of most importance, the reduced number of parameters by which the system can be defined. This last feature is very impor- tant in terms of circuit implementation because, it implies reduction in the circuit micro-electronic implemtation area. Starting with a transfer matrix whose entries have no common poles, the state representation of such a transfer matrix would require as many poles as the transfer matrix has. This number is equal to the size of the state vector x, namely n. Despite the fact that the matrix A is an n x 12 matrix, it is being represented by only 12 = 2,},- p;,-, since each matrix A;,- is represented only by the coefficients of is characteristic polynomial, and the remaining entries are all ones and zeros. This represents a reduction from n2 to just n parameters to represent the matrix A. The matrix C is an m x n matrix. However, only 11 parameters of the matrix are nonzero. The other n(m — 1) are all zeros. This again represents a reduction in the number of parameters from nm to just 17.. The matrix B is formed by only zeros and ones. Thus, there is no parameter for this matrix. Since B is an n x 1' matrix, this again represents a reduction of nr parameters that must be updated. Finally the D matrix is represented by m x r nonzero parameters. Table 5.3.2 shows the number of parameters that would represent a general m x r transfer matrix with no common poles between its entries in one column. The other 112 Table 5.1. Number of parameters for 2 realizations Realization A n B nr nm n mr mr n +nr+nm+mr 2n+mr column shows the number of parameters for the controllable canonical realization that has been defined above. One observes that the order of the number of parameters that define the system has been reduced from 0(n2) to 0(n). 113 (xi-A.1)" drl u) _— “ +131'An)- E E E * 1:1" m x . x.. :3 + —> y. Hg — u, +(Sl‘A‘j)-l E E E L___ (11,7) . - x.. 11 dir (I) F—- 1. E2] — “r-———>[:l-A.-.)" E 5 5 Ha» m ’ it h‘ Figure 5.2. (a) State Equation x5,- = A,,-x.~,- + bijuJ', (b) Output Equation 31,-, = chgj + dgju, and (c) Output Equation 3;.- = cfx; + diTu 114 5.4 Adaptive Optimal Control Theory Optimal adaptive control theory will be utilized to derive an update law for the blind separation problem in a dynamical environment. The derivation will be valid for both deterministic and stochastic cases. 
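Before turning to the derivation, the realization of Section 5.3 and the parameter count of Table 5.1 can be illustrated with a short sketch. The helper names and the sample coefficients below are assumptions made only for illustration; they are not code from the thesis.

```python
import numpy as np

# Sketch of the controllable canonical realization of Section 5.3:
# one companion-form chain per entry H_ij(s), block-diagonal A,
# with input/output wiring carried by B, C and D.

def siso_companion(a, b, d=0.0):
    """H(s) = (b[0] s^{p-1} + ... + b[p-1]) / (s^p + a[0] s^{p-1} + ... + a[p-1]) + d."""
    p = len(a)
    A = np.diag(np.ones(p - 1), 1)           # ones on the superdiagonal
    A[-1, :] = -np.asarray(a, float)[::-1]   # last row: [-a_p ... -a_1]
    bvec = np.zeros(p); bvec[-1] = 1.0
    c = np.asarray(b, float)[::-1]           # [b_p ... b_1]
    return A, bvec, c, float(d)

def mimo_controllable(entries):
    """entries[i][j] = (a_ij, b_ij, d_ij) for H_ij(s); returns (A, B, C, D)."""
    m, r = len(entries), len(entries[0])
    subs = [[siso_companion(*entries[i][j]) for j in range(r)] for i in range(m)]
    sizes = [[subs[i][j][0].shape[0] for j in range(r)] for i in range(m)]
    n = sum(sum(row) for row in sizes)
    A = np.zeros((n, n)); B = np.zeros((n, r))
    C = np.zeros((m, n)); D = np.zeros((m, r))
    k = 0
    for i in range(m):
        for j in range(r):
            Aij, bij, cij, dij = subs[i][j]
            p = sizes[i][j]
            A[k:k + p, k:k + p] = Aij        # block-diagonal dynamics
            B[k:k + p, j] = bij              # input j drives this chain only
            C[i, k:k + p] = cij              # output i reads this chain only
            D[i, j] = dij
            k += p
    return A, B, C, D

# Only the n companion-row entries, the n nonzero entries of C, and the
# m*r entries of D are free parameters, as summarized in Table 5.1.
A, B, C, D = mimo_controllable([[([3, 2], [1, 1], 0.0), ([5, 6], [0, 1], 0.0)],
                                [([4, 3], [2, 0], 0.0), ([7, 12], [1, 2], 0.0)]])
print(A.shape, B.shape, C.shape, D.shape)    # (8, 8) (8, 2) (2, 8) (2, 2)
```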
The update law will be derived for a general nonlinear dynamical case. Then, the derivation for the linear dynamical case will be straightforward. Suppose that the independence of the output is characterized by

$$\phi(\mathbf{y}, \mathbf{w}) = \phi(t, \mathbf{x}, \mathbf{w}) \quad (5.23)$$

The goal is then to optimize the functional defined by (5.23) subject to the dynamics of the following system

$$\dot{\mathbf{x}} = f(t, \mathbf{x}, \mathbf{w}_1), \quad \text{with } \mathbf{x}(t_0) = \mathbf{x}_0 \quad (5.24)$$
$$\mathbf{y} = g(t, \mathbf{x}, \mathbf{w}_2) \quad (5.25)$$

where $f$ and $g$ are two twice-differentiable continuous functions, and $\mathbf{w}_1$ and $\mathbf{w}_2$ are the parameters of the network model. To accomplish the task of optimization, the following performance index is introduced

$$J(\mathbf{x}, \mathbf{w}) = \int_0^T \mathcal{L}(t, \mathbf{x}, \dot{\mathbf{x}}, \lambda, \mathbf{w})\, dt$$

where $\mathbf{w} = [\mathbf{w}_1, \mathbf{w}_2]$ and $\mathcal{L}(\cdot)$ is the Lagrangian function defined as

$$\mathcal{L}(t, \mathbf{x}, \dot{\mathbf{x}}, \lambda, \mathbf{w}) = \phi(t, \mathbf{x}, \mathbf{w}_2) + \lambda^T\left(\dot{\mathbf{x}} - f(t, \mathbf{x}, \mathbf{w}_1)\right)$$

By introducing the Lagrange multipliers $\lambda(t)$, the functional $J(\mathbf{x}, \mathbf{w})$ is optimized by considering the variables $\mathbf{w}(t)$, $\mathbf{x}(t)$ and $\lambda(t)$ as independent. There is no constraint on the final-time condition of the system. Thus, the optimization of the functional $J$ is called a free-end variational problem. The variation of the integral leads to the following well-known Euler equations for the variables $\mathbf{x}(t)$ and $\lambda(t)$:

$$\frac{\partial \mathcal{L}}{\partial \lambda} - \frac{d}{dt}\left\{\frac{\partial \mathcal{L}}{\partial \dot{\lambda}}\right\} = 0 \quad (5.26)$$
$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}} - \frac{d}{dt}\left\{\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{x}}}\right\} = 0 \quad (5.27)$$

with boundary condition

$$\left.\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{x}}}\right|_{t=T} = 0 \quad (5.28)$$

Equation (5.26) is equivalent to

$$\dot{\mathbf{x}} = f(t, \mathbf{x}, \mathbf{w}_1), \quad \text{with } \mathbf{x}(0) = \mathbf{x}_0 \quad (5.29)$$

while equations (5.27) and (5.28) are equivalent to

$$\dot{\lambda} = -\left(\frac{\partial f}{\partial \mathbf{x}}\right)^T \lambda + \frac{\partial \phi}{\partial \mathbf{x}}, \quad \text{with } \lambda(T) = 0 \quad (5.30)$$

The parameters will be updated using the gradient of the performance index

$$\dot{\mathbf{w}}_i = \eta\, \frac{\partial J}{\partial \mathbf{w}_i} = \eta \int_0^t \frac{\partial \mathcal{L}}{\partial \mathbf{w}_i}\, d\sigma, \quad i = 1, 2 \quad (5.31)$$

where $\eta$ is the learning rate and is positive (negative) if the goal is to maximize (minimize) the performance index. Note here that $\lambda$ is obtained by performing a back propagation through time starting at time $t = T$ with zero initial value. Equation (5.31) can be simplified to

$$\dot{\mathbf{w}}_1 = -\eta \int_0^t \frac{\partial (\lambda^T f)}{\partial \mathbf{w}_1}\, d\sigma \quad (5.32)$$
$$\dot{\mathbf{w}}_2 = \eta \int_0^t \frac{\partial \phi}{\partial \mathbf{w}_2}\, d\sigma \quad (5.33)$$

The parameters could also be updated using an instantaneous update

$$\dot{\mathbf{w}}_i = \eta\, \frac{\partial \mathcal{L}}{\partial \mathbf{w}_i}, \quad i = 1, 2 \quad (5.34)$$

which results in the following update laws for $\mathbf{w}_1$ and $\mathbf{w}_2$:

$$\dot{\mathbf{w}}_1 = -\eta\, \frac{\partial (\lambda^T f)}{\partial \mathbf{w}_1} \quad (5.35)$$
$$\dot{\mathbf{w}}_2 = \eta\, \frac{\partial \phi}{\partial \mathbf{w}_2} \quad (5.36)$$

Remark: one can observe that the update law for the parameters $\mathbf{w}_2$ depends only on the independence criterion function $\phi$, whereas that of $\mathbf{w}_1$ depends on the choice of both the nonlinear function $f$ and the independence criterion $\phi$, since the dynamics of $\lambda$ depend on the gradients along the state $\mathbf{x}$.

5.4.1 Computer Implementation

The computer simulation of the algorithm proceeds as follows:

1. Perform a forward integration over the time interval $[0, T]$ of the dynamic state equation and output equation defined by (5.24) and (5.25).

2. Perform a backward integration of the system's adjoint equation defined by the dynamics of equation (5.30).

3. Update the network parameters using either set of equations defined by (5.32)-(5.33) or (5.35)-(5.36).

4. While the parameters have not converged, go back to step 1.

This research has focused on the update of the parameters $\mathbf{w}_2$ while keeping the parameters $\mathbf{w}_1$ at their nominal solutions. In this case, one observes that the update of $\mathbf{w}_2$ does not depend on the adjoint state. Therefore, step 2 described above can be eliminated while the focus is on the update of the $\mathbf{w}_2$ parameters.

5.5 Update Law Derivation

5.5.1 Nonlinear Dynamic Modeling

In this section, the output of the network is modeled as a nonlinear function of the weighted sum of the input vector $\mathbf{e}$ and state vector $\mathbf{x}$:

$$\dot{\mathbf{x}} = f(t, \mathbf{x}, \mathbf{w}_1) = f(A\mathbf{x} + B\mathbf{e}) = f(\mathbf{u}) \quad (5.37)$$

and $\mathbf{y} = g(t,$
x, wz) = g(Cx + D8) = g(V) (5-38) ‘< ll Then, the update law for the parameters, as well as the state equation of the adjoint system, will be derived. Update law Derivation for A and B Let a = a, 6. Then, _ a(/\Tf) 1’ 00;,“ afm(um) _ _ )m_____ a; 30:3 0:: 172m: f ( )6a.,- 051' = 118 Thus, one needs to compute 351,1: ‘1 an", 0a,,- Bum (9me 5E;- _ P 6338? -- $677" 6PJCP — Cm'CJ Bump = a; = 6m;6 -:c =6mgl“ 2p: 3m,- p Ep: 10: p .7 where 6 is the Kronecker-delta defined as 1 ifz' = ' 6,,- = J O ifz' 751' Therefore, dgj = —T]ZAmffn(um)6mixj = -7If,"(ui))‘i$j 5,-1- = -n21\mfr’n(um)5miej = —’7ff("i))‘iej In matrix form, A = -17Aft(u) /\ XT B = —7]Afr(u) /\ 8T where A. = diag(zl,zg, - . .,z,.) Table 5.5.1 presents some nonlinear functions and their corresponding Anti- hebbian like update rules for the matrices A and B. Update Law Derivation for C and D The independence criterion ¢ is a function of the output y, 45 = ¢(y) Let Oz = c, d again. Then, 645(11) 60.3 61,-,- =77 Th 119 Table 5 2. Different Nonlinearity functions and their derivatives Function Derivative Anti-Hebb Anti-Hebb f(u) f’(u) for A for B u 1 AxT AeT Tit—‘3 f(u) - f(u) Af(u)_,(u)2AxT Af(u)_fm2).eT tanh(u) 1 — f(U)2 A1_f(u)2)tXT A1_f(u)2AeT arctan(u) 1%; A 141/RT A141 z\e e"“2 -2u f (u) A-2uf(u)AxT A-2uf(u)).e _ 3¢(y)0ym — 712 By". CagJ' _ 395(31) )39m(vm) _ '12 By... 60,-, 17236;:9 I m( vm)a—:m Thus, we need to compute 3724' avm cm 66" = 2p: E3212? = $6,,“ (Swat? = 6";ng 0v", _ 0d,”, 561—1; _ p —ad:jep 2:6,,"- 6,,Jep- — 6 me,- Therefore, 5 9, Cu = ’72— fig—1) m(vm)5mexj = ng.(vs)— a: - 3 g' 3 d..- = n}:— ¢—-:f::) ..(um)6m.-e.- = ng:(v.-)— 45. 3y,- ej In matrix form, - 6 C = ”A9'(V)-3_:XT 120 59¢ D = ”quv) b—y—eT Below, two simple independence criteria are listed. 1. ¢(y) = £2,312 = §yTy. Therefore, 13,-, = 71930031131 3:5 = ng£(v.-)y.~ej 2. ¢(y) =% ,y3. Therefore, «2.. = 129512.55.- Jij = 7795(vi)y? 61‘ Adj oint State Equation Derivation N ow, the adjoint state equation will be computed. 3f T 6f,~( j) = f§(uj)g—: — I . . axm — fj(u1);aJm_a—£ f,’-(u,-) Z ajm6mi = f;- ("0013' = [ATAI'(“)] ij [555)] = 65(1) 6x 5 0x,- ' _. 22% — gay,- 6x.- (5.39) (5.40) 121 _ 34> g)( _ Z— 1(vJ)-6—_ 8:.“ 34> , = ;— 9j(vj) CJi = [CTJ Ag groggy ¢]_ Thus, in compact form: . 345 A=-ATA ...A+CTA..— If) aflay In conclusion, given a network whose states follow the dynamics described by = f(Ax + Be), x0 = x(to) (5,41) and whose output is y = 9(0): + De) (5.42) The dynamics of the adjoint system are described by A = —ATA,.(..)A + 07%,“)??? (5-43) The parameters are updated according to: A' = —nA,.(u)AzT (5.44) B = —nAf:(u)AeT . (5.45) C = 5A,.(,,%§zT (5.46) 11 34‘ = ”A91(v)-a-§CT (5.47) 122 5.5.2 Linear Dynamical Case The derivation of the update laws for this model is a special case of the nonlinear dynamic model discussed in the previous section. In this case, the functions f and g in equations (5.41) and (5.42) are the identity functions. Thus, their derivatives are - the identity matrices. Consequently, equation (5.43) is simplified to: 1 = —ATA + 0ng (5.48) and (5.44)-(5.47) are simply: A = -n)~ 5T (5.49) B = -17).! (5.50) C" = 17%ng (5.51) 1') = 52—3-3 (5.52) In order to compute the term 33, one can make direct computation as was pursued in Chapter 4 for the static case. One, however, can define new variables and transform the problem into one that is similar to the static case. 
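Before transforming the problem, the instantaneous updates (5.48)-(5.52) for the linear dynamical case can be summarized in a short sketch. It uses the simple criterion $\phi(\mathbf{y}) = \frac{1}{2}\mathbf{y}^T\mathbf{y}$ listed above, so $\partial\phi/\partial\mathbf{y} = \mathbf{y}$; the learning rate, step size, initialization, and the Euler scheme are illustrative assumptions rather than the thesis settings.

```python
import numpy as np

# Sketch of the instantaneous updates of Section 5.5.2 (linear f and g).
# Adjoint dynamics (5.48):  lam' = -A^T lam + C^T (dphi/dy)
# Parameter updates (5.49)-(5.52):
#   A' = -eta lam x^T,  B' = -eta lam e^T,  C' = eta (dphi/dy) x^T,  D' = eta (dphi/dy) e^T

def train_linear_network(e_seq, n, m, r, eta=1e-3, dt=1e-3):
    A = -np.eye(n); B = 0.1 * np.random.randn(n, r)
    C = np.zeros((m, n)); D = np.eye(m, r)
    x = np.zeros(n); lam = np.zeros(n)          # state and adjoint state
    for e in e_seq:
        y = C @ x + D @ e                       # output equation (g = identity)
        dphi_dy = y                             # gradient of phi = 0.5 * y^T y
        lam = lam + dt * (-A.T @ lam + C.T @ dphi_dy)
        A = A + dt * (-eta * np.outer(lam, x))
        B = B + dt * (-eta * np.outer(lam, e))
        C = C + dt * ( eta * np.outer(dphi_dy, x))
        D = D + dt * ( eta * np.outer(dphi_dy, e))
        x = x + dt * (A @ x + B @ e)            # state equation (f = identity)
    return A, B, C, D

# Illustrative usage: two mixed sinusoidal observations, a 4-state network.
t = np.arange(0.0, 2.0, 1e-3)
e_seq = np.stack([np.sin(2 * np.pi * t) + 0.5 * np.sin(4 * np.pi * t),
                  0.5 * np.sin(2 * np.pi * t) + np.sin(4 * np.pi * t)], axis=1)
A, B, C, D = train_linear_network(e_seq, n=4, m=2, r=2)
```

The sign of the learning rate selects maximization or minimization of the criterion, as noted in Section 5.4.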
This will be accomplished by considering the output equation of the linear dynamical system y = Cx + De (5.53) Then, define the vectors 3] and 5:, and the matrix W as ‘(t ll )4: II II (5.54) 123 However, y D C e x 0 I x Therefore, equation (5.53) can be written as y = W52 Consequently, the update law for the parameters of the matrix W is W = "[1474” — 5(9):?!) (1 e {a, b, c, d} First, one needs to find W‘T W-T: D(DTD)“1 0 —CTD(DTD)’1 I since D7 0 D(DTD)’1 0 I 0 CT I —CTD(DTD)‘1 I 0 I Consequently, the update law for the matrix C _ C = "flfa(Y)xT and that of matrix D is I D = v[D(DTD)-‘ — fa(y)eT] (5.55) (5.55) (5.57) (5.58) (5.59) (5.50) (5.61) 124 If D is a square nonsingular matrix, then equation (5.61) becomes D = )[D‘T—f.(y)eT]' (5.62) = 7(1 — fa(Y)yT]D‘T (5-63) 5.6 Computer Simulations Computer simulations of the update laws of the parameters of the matrices C and D, as defined by equation (5.60) and (5.62), were performed. The considered unknown sources are two sine waveforms with respective frequencies 1H2 and 2H 2. The envi- ronment model is the dynamical system represented by the following transfer matrix —1 .92 a s 1.0% 068:5 = Ht(3)-l (5 64) :2 28 3 32i6si8 0234-98440 1‘0: +133+42 One theoretical solution to the problem is the transfer function H ‘(s) whose canonical H(s) = state representation, D‘ = (A’, B‘, C‘, D‘), is defined by 0 1 0 0 0 0 0 0 0‘ —2 —3 0 0 0 0 0 1 0 0 0 -3 0 0 0 0 0 1 A“: 0 0 0 0 1 0 0 8‘: 0 0 0 0 0 -20 —9 0 0 1 0 0 0 0 0 0 0 1 0 0 _ 0 0 0 0 0 —42 ‘13. _0 1‘ = 13 5 —1.2 0 0 0 0 . 0*: 1.0 0.6 0 0 0 -—-3.6 —1.2 —34 -7.0‘ 0.2 1.0 125 The bode plots of the environment, defined by H(s), and the desired realization of the network model, defined by H ‘(s), are shown respectively in figures 5.3 and 5.4. Gain dB Phase deg f (rad/sec) 1 (rad/soc) Figure 5.3. Bode plot of the environment model defined by H(s) First, computer simulations were performed by updating only the parameters of one matrix while holding the parameters of the other matrix at their nominal values, as defined by D‘. In either experiments, the algorithm failed to converge to the desired solution. Figures 5.5 and 5.6 show the the algorithm fails to converge to the desired solution. Figure 5.5 shows the performance of the algorithm by updating the parameters of the D matrix while those of the C matrix are held at their nominal values. On the other hand, Figure 5.6 shows the performance of the algorithm by 126 Galn dB Phase deg f (rad/soc) f (rad/soc) Figure 5.4. Bode plot of the network model defined by H‘(s) 127 updating only the C matrix and fixing the parameters of the D matrix to their nominal values. Observe that the algorithm fails to converge in both simulations. The algorithm also fails to accomplish the convergence by considering the nonlinear functions fb, fc, fd as well as the cubic and the sine hyperbolic ones. Figure 5.7 shows a case of simulations when the nonlinear function is fd. The reason for the failure of the algorithm is the fact that the problem is not completely identifiable. Recall that from equations (5.7)-(5.10), the system has infinitely many solutions that it could converge to. The solution defined by D‘ = (A’, B‘, C‘, D‘) is one particular solution. The realization D}; = (A‘, 8", PC“, PD") is also another possible solution where P is generalized permutation matrix. Thus, by keeping one matrix constant at its nominal value, one is limiting the set of all possible solutions to one particular point in the set of equilibrium points. 
Next, computer simulations were performed by considering the simultaneous up- date of both parameters of the matrices C and D. Figure 5.8 shows the respective performance of when two sine functions are assumed to be the unknown sources. One observes that the all the parameters converge, but not to the desired values. On the other hand, if one considers an update law for the matrix D defined by (4.56) D = —7fa(Y)yTa a E {0, b: C: d} (5'65) the algorithm demonstrated its capability to converge only when the nonlinear func- tion is defined by fa. Figure 5.9 and 5.10 are samples of such simulations. One can observe the convergence of the matrix D in Figure 5.9 as all its parameters become equal to the desired values in a finite time. On the other hand, the parameters of the D matrix do converge in Figure 5.10, but not to the desired values. Figure 5.11 shows the test simulation of the algorithm defined by equation (5.65) using the nonlinear 128 Dynamic Environment eta = 0.005 0.4 - 0.3 ~ 0.1 - O l l l l l l I 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 d12 (I12 3 0.67 (121 s 0.37 0.8 ' vv’rvw vvrrv—vwrvrrv mv—V AAA- AAAA AAA- ijf V‘vf t-o-o—o-co------—----o—o- O 5 10 15 20 O 5 10 15 20 time (sec) time (sec) Figure 5.5. Dynamic Forward Structure. Update only the D matrix according to D = 7[D’T -— fa(y)eT] using two sine waveforms, 7 = 0.005 and D0 = 0 129 c1 1 012 C13 250 1 3 - O ' _1 . -2 —3 -4 -5 b _8 b _7 A 200 400 600 O 200 400 600 021 022 024 O O 5 200 r —3 ' .4 . -5 -5 -15 i -7 .8 ’ —20 -9 . -26 -10 r O O 500 Figure 5. 6. Dynamic Forward Structure. Update only the C matrix according to C = -17f..(y)xT using two sine waveforms, n = [1000 100 1 100 1 100 10] and Co- — 0. 80“ 130 Dynamic Environment eta = 0.5 0.08 - 0.06 - g 0.04 ~ 0.02 - o I L 1 l l 1 J 0 0.05 0.1 0.2 0.25 0.3 0.35 0.4 d12 d12 . 0.35 (121 = 0.051 0.51— ------—.——._._.-_._._.- 0.2,- _ .................... - 0.15 0.4» W N N “'5 :5 O 1 0.2 0.05 0 . - 5 - 0 1 L e a 0 5 10 15 20 0 5 10 15 20 time (see) time (sec) Figure 5.7. Dynamic Forward Structure. Update only the D matrix according to D = 7[D‘T — fd(y)eT] using two sine waveforms, 7 = 0.5 and D0 = 0 131 011 c12 C13 40' 5’—---—--—‘----- 2’ [ . 20 O _ ........... -.-._ ......... l __2 i _w i WM 0 A O 2% 400 600 <32 200 400 $5) 0 200 ago 600 40 5 100' O -C-O—O-l_._l d12 (A . d2 0 on 0 200 400 600 0 200 400 500 time (see) time (sec) Figure 5.8. .Dynamic Forward Structure, Update both the C and D matrices ac- cording to D = 7[D'T - fa(y)eT] and C = —17f.,(y)xT using two sine waveforms, 7 = 0.005, 17 = [100 10 1 1000 5 2000 10]. All initial conditions are zero 132 function fa at convergence. Obeserve that the output of the network is almost an exact replica of the unknown sources. Dynamic Environment eta = 0.005 0.25 — 0.2 r .— 0.15 — 0.1 ~ 0.05 - o 1 l l 1 l l I 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 012 012 = 0.6 021 = 0.2 0.31 0.25- 0.6 ------ _ - A -22- fl- 0'2’ " 0.15» N N 5 ° 4 5 0.1 0'2 0.05 0 . - . - 0 - - - - 0 5 10 15 20 0 5 10 15 20 time (sec) time (see) Figure 5.9. Dynamic Forward Structure. Update only the D matrix according to D = —7f,,(y)yT using two sine waveforms, 7 = 0.005 and D0 = 0 Other nonlinear function, reported in [8], were also considered. Computer simu- lation of the following algorithms were performed 1') = —7y3y (5.66) D = —-7 sinh y tanh y (5.67) 133 Dynamic Environment eta = 0.1 0.1- 0.08" 0.065 g 0.04- 0.02“ 0 .0.02 l l l J l l J 0 0.05 0.1 0. 
1 5 0.2 0.25 0.3 0.35 d12 d12 :- 0.34 (121 I 0.081 0.15’ 0.4’ N N 0.1 A a r =5 0.05’ 0.2 0 0 ‘ ‘ 4 4 -0.05 L ‘ * ‘ 0 5 10 15 20 0 5 10 15 20 time (sec) time (89¢) Figure 5.10. Dynamic Forward Structure. Update only the D matrix according to D = -7f.1(y)yT using two sine waveforms, 7 = 0.5 and D0 = 0 134 '50 / ..1 . e A A 20 20.5 21 21 .5 22 20 20.5 21 21 .5 22 20 20.5 21 21 .5 22 -%0 20.5 21 21 .5 22 ...1 A A A - 29 ,0-3 20.5 21 21.5 22 2 20 20.5 21 21.5 22 '220 20.5 21 21.5 22 Figure 5.11. Dynamic Forward Structure. Test Simulation of the update law D = --7fa (y)yT at convergence using two sine waveforms 134 _1 A A A n 1 A A A 20 20.5 21 21.5 22 20 20.5 21 21.5 22 2 . Figure 5.11. Dynamic Forward Structure. Test Simulation of the update law D = -7fa (y)yT at convergence using two sine waveforms 135 Figures 5.12 and 5.13 show their performance. Observe that both nonlinear functions are capable of separating signals. The reason of exploring the algorithms defined by equations (5.66) and (5.67) is to show that the nonlinear function fa performs as equal to other nonlinear functions that have shown to provide excellent performance, but that were chosen heuristically[8]. This is in contrast to the function fa that was developed based on a well defined energy function and independence criterion. However, the function fd, reported in [24], did not provide the same performance. This is due to the poor approximation of entropy approximation and/or the assumption of the unit variance. Dynamic Environment eta = 1 0.3 ' 0.2 ' 8 0.1 "' o I I I I I I 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 (112 (112 II 0.6 (121 I: 0.2 0.8 ' O 5 10 time (sec) 15 20 0.3 [ d12 O 5 1O 15 20 time (sec) Figure 5.12. Update of the D Matrix using D = —7y3y 136 Dynamic Environment eta = 1 0.4 - 0.3 r 0.1 - O l I L I I I I 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 d1 2 d12 = 0.6 d21 :- 0.2 0.8 - 0.4 . 0 5 10 15 20 0 5 1O 15 20 time (sec) time (sec) Figure 5.13. Update of the D Matrix using D = —7 sinhytanh y 137 5.7 Observations Despite the limited results that one obtained in this chapter, a general framework to derive an update law was developed. Such a framework can be applied to any network structure. It was also observed that the introduced forward structure , in this chapter, is subject to an identifiability problem due the freedom of the possible solutions that the network could converge to. Thus, the task is to define an identifiable structure and to use the herein developed tools to define the update law for that structure. It is intended to do so in the following chapter. CHAPTER 6 Blind Separation in a Dynamic Environment: Feedback Structure In this chapter, an architecture different from the one discussed in the previous chapter is considered. It does not represent a particular network architecture, nor does it limit the type of environments that can be considered. In this research, however, it will be shown that any type of environment can be considered. The state space representation of the network will be presented and the existence of a theoretical solution to the problem shown. Then, an update law for the parameters of the network based on an analogy of the static case will be developed. 6.1 Architecture General systems are defined by their input-output relationship. For time-invariant systems, such a relation is described by a transfer function L(s) which relates the input S (s) to the output E(s) E(s) = L(s)S(3) (6.1) 138 139 If s 6 'R" and e 6 72'", then L(s) is an m x 1' matrix r 1 L11 L..- 1... 
$$L(s) = \begin{bmatrix} L_{11} & \cdots & L_{1r} \\ \vdots & \ddots & \vdots \\ L_{m1} & \cdots & L_{mr} \end{bmatrix} \quad (6.2)$$

which can be factored as

$$L(s) = \begin{bmatrix} 1 & \bar{H}_{12} & \cdots & \bar{H}_{1r} \\ \bar{H}_{21} & 1 & \cdots & \bar{H}_{2r} \\ \vdots & & \ddots & \vdots \\ \bar{H}_{m1} & \bar{H}_{m2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} L_{11} & 0 & \cdots & 0 \\ 0 & L_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & L_{mm} \end{bmatrix} \quad (6.3)$$

where

$$\bar{H}_{ij} = \frac{L_{ij}}{L_{jj}} \quad (6.4)$$

Let $Z(s)$ denote the diagonal matrix of $L(s)$ and $\bar{H}(s)$ the first factor above. Thus, equation (6.1) is equivalent to the following two equations:

$$\bar{S}(s) = Z(s)S(s) \quad (6.5)$$

and

$$E(s) = \bar{H}(s)\bar{S}(s) \quad (6.6)$$

Therefore, the system described by equation (6.1) is equivalent to the cascaded processing of two systems described by equations (6.5) and (6.6). Equation (6.5) performs the pre-filtering process, while equation (6.6) performs the mixing and/or filtering process that describes the model of the environment.

In the blind separation of signals, the goal is to recover the unknown sources. This requires the development of a network that performs the inverse function of the pre-filtering and mixing/filtering processes. Thus, such a network is also a cascade of two processes. The first one is devoted to reversing the mixing and/or filtering process, while the second one is devoted to the inverse filtering. Let $\bar{\mathbf{y}}$ and $\mathbf{y}$ denote the respective outputs of these processes. A feedback structure that models the inverse of the mixing and/or filtering process is as follows:

$$\bar{Y}(s) = E(s) - H(s)\bar{Y}(s) \quad (6.7)$$

where $H(s)$ is a matrix with zero diagonal entries. The goal of such a process is to recover a replica of $\bar{S}$. Therefore, using equations (6.6) and (6.7), the following condition is obtained:

$$I + H(s) = \bar{H}(s) \quad (6.8)$$

Thus, with an appropriate update law for the parameters of the matrix transfer function $H(s)$, the problem has the solution

$$H_{ij}(s) = \bar{H}_{ij}(s), \quad \forall i \neq j \quad (6.9)$$

Finally, to recover the original signal $\mathbf{s}$, some filter $K(s)$ is applied to $\bar{Y}$, which results in an output

$$Y(s) = K(s)\bar{Y}(s) \quad (6.10)$$
$$= K(s)\bar{S}(s) \quad (6.11)$$
$$= K(s)Z(s)S(s) \quad (6.12)$$

If the goal is to recover $S(s)$ exactly, then

$$K(s) = Z(s)^{-1} \quad (6.13)$$

However, no knowledge of $Z(s)$ is available. Thus, one settles for a solution that recovers the unknown sources up to a filtering. Figure 6.1 shows a schematic diagram of these processes.

Figure 6.1. Feedback Processing Structure: pre-filtering $Z(s)$ and mixing/filtering $\bar{H}(s)$ on the environment side, followed by the feedback separation network $H(s)$ and the inverse filtering $K(s)$.

6.2 State Representation

In the last chapter, the canonical controllable realization was used to define the state space realization of a given transfer function matrix. Such a realization will also be used here to model the network. However, one should keep in mind that the transfer matrix $H(s)$ has zero diagonal entries. Therefore, the state space representation of $H_{ii}(s)$ is $\mathcal{D}_{ii} = (0, 0, 0, 0)$. Consequently, the $i$-th entries of the matrices $A_i$, $b_i$, $c_i^T$ and $d_i^T$, as defined in equations (5.19) and (5.20), are null. Despite the fact that these entries are redundant to the network model, they are preserved for the sake of clarity of the presentation. A canonical controllable realization $\mathcal{D} = (A, B, C, D)$ for $H(s)$ is thus defined by equations (5.21) and (5.22). The state equation is described by

$$\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{y} \quad (6.14)$$

while the output equation is described by

$$\mathbf{y} = \mathbf{e} - C\mathbf{x} - D\mathbf{y} \quad (6.15)$$

Having defined the structure and the realization of the network model, the update laws for the parameters $C$ and $D$ are now ready to be developed.

6.3 Update Law Derivation

To develop the update law for the feedback structure, one first defines the stacked vectors $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{e}}$ and the matrix $W$ as
D C y = x = W = (6.16) x 0 0 I However, e D C y = — _ y (6.17) 143 Therefore, equation (6.15) can be rewritten as ~ 5' = é - W? (6.18) Using Theorem 2, the parameters of the matrix W are updated according to W = 0.000? a e {a, b, c, d} (6.19) This will reduce to = nfa(Y)XT (6'20) D = 7fa(y)yT (6'21) where 1) and 7 are respectively the learning rates of the matrices C and D. 6.4 Computer Simulation Results To study the performance of the algorithm and the feedback structure, one needs to consider two approaches depending on the origin of the nonlinear function. Two different approaches will be considered. One approach will be based on the nonlinear functions fa and fd, which were developed in Chapter 4, based on the mutual infor- mation functional. The other will based on selecting the nonlinear functions reported in [8]. These two parts define functions that were developed based on the mutual in- formation approach and others that were based on the neuromemitic approach. This comparative study will be considered as a mean to quantify the performance of the developed nonlinear function fa vis-c‘z-vis other existing nonlinear functions. Computer simulations were conducted for the proposed architecture. A two di- mensional network was considered. The two unknown sources were filtered and mixed 144 according to the following transfer function defined by 0.6 3+1 3+4 ‘ 11(3): 1 W = 1 H12“) #4403454: 1 Hats) 1 Therefore, the canonical controllable realization is D = (fl, 3, C, D) defined below as 1 0 0 0 0 O 0 0 0 0 0 1 0 0 0 0 0 _ 0 —42 —13 0 0 0 _ 0 l A: B: 0 0 O 0 1 0 0 0 0 0 0 —40 —13 0 1 O 0 0 0 0 0 0 _0 0 _ 0 -22.8 —4.8 0 0 0 _ 1.0 0.6 C: D: 0 0 0 —13.6 —3.2 0 L0'4 1.0 The bode plots of 312(3) and 1.121(3) are shown in Figure 6.2. Also, the bode plot of H(s) = H(s)‘l is shown in Figure 6.3. 6.4.1 The Approach Based on Mutual Information The algorithm defined by equations (6.20) and (6.21) will be studied by exploring the nonlinear functions fa and fa. Algorithm defined by fa Computer simulations of the algorithm defined by the equations (6.20) and (6.21) and the nonlinear function fa was studied. C = Ufa(y)xT D = 7fa(Y)yT (6.22) (6.23) 145 Gain 03 Phase deg _1 L .' _1 L 10° 105 10° 105 i (rad/sec) i (rad/sec) Figure 6.2. Frequency response of the environment model 146 Gain dB Phase deg f(rad/sec) I (rad/sec) Figure 6.3. Frequency response of the network model 147 In order to study its performance thoroughly, the parameters of the network will be initially updated separately. This was accomplished by updating some parameters of the network, while holding the remaining parameters, constant to their nominal values. The first set of simulations considered the update of the individual parameters of the C matrix, namely €12, c13, 621 and 022. The initial value of each of these parameters was equal to 0.8 of their nominal values. Also, the unknown sources , were assumed to be 2 sine waveforms with respective frequencies 1H 2 and 2H 2. Figures 6.4 through 6.7 display the results of such simulation. One could observe that all the parameters converge individually to the desired values. It is important to note that the learning rates of each of the parameters of the C matrix are different, [1712, 1713, 1121, rm] = [101 10’1 10° 10’2]. Once it was demonstrated that each of the parameters converged to the desired value, the update law of all the parameters of the C matrix were combined. Computer simulation were conducted assuming different initial conditions of the parameters of the C matrix. 
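A minimal sketch of one adaptation step of the feedback structure is given below, combining the implicit output equation (6.15) with the update rules (6.20) and (6.21). The exact form of $f_a$ is defined in Chapter 4 and is not reproduced here, so the cubic used below is only a stand-in placeholder; the rates, step size, and toy signals are likewise illustrative assumptions.

```python
import numpy as np

f_a = lambda y: y ** 3                      # placeholder for the nonlinearity f_a

def feedback_step(x, C, D, A, B, e, eta, gamma, dt):
    # Output equation (6.15): y = e - C x - D y  =>  (I + D) y = e - C x
    y = np.linalg.solve(np.eye(D.shape[0]) + D, e - C @ x)
    C = C + dt * eta * np.outer(f_a(y), x)      # update (6.20): C' = eta f_a(y) x^T
    D = D + dt * gamma * np.outer(f_a(y), y)    # update (6.21): D' = gamma f_a(y) y^T
    np.fill_diagonal(D, 0.0)                    # H(s) has zero diagonal entries
    x = x + dt * (A @ x + B @ y)                # state equation (6.14)
    return x, C, D, y

# Toy usage: a 2-output network with a 2-dimensional internal state.
A = -np.eye(2); B = np.eye(2)
C = np.zeros((2, 2)); D = np.zeros((2, 2))
x = np.zeros(2)
for t in np.arange(0.0, 1.0, 1e-3):
    e = np.array([np.sin(2 * np.pi * t), np.sin(4 * np.pi * t)])
    x, C, D, y = feedback_step(x, C, D, A, B, e, eta=1.0, gamma=0.2, dt=1e-3)
```

Solving (6.15) for $\mathbf{y}$ requires $I + D$ to be nonsingular; since $D$ is initialized at zero in the simulations, this holds at the start of adaptation.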
Figures 6.8 through 6.11 represent the results of simulations when the initial conditions are respectively 0.8C", 0.5C‘, 0.10“ and 0. In all these simulations, one could observe that all the parameters of the C matrix converge to the desired values. So far, only simulation results of the performance of the parameters of the C matrix were considered. These simulations showed excellent results in terms of the performance of the algorithm in separating the unknown sources. The performance of the algorithm in updating the parameters of the D matrix is considered next. Computer simulations for different initial conditions of the D matrix, while the parameters of the C matrix are held at their nominal values. were considered. Figures 6.12 through 6.15 show the results of the simulations when the initial conditions are respectively (d12, d21) = (0,0), (0,1), (1,0) and (1,1). Figure 6.16 combines the 148 performance of the algorithm for all these tests. One could observe that that the parameters of the D matrix also converged to the desired ’values. Despite the fact that the matrices C and D converged to the desired parameters when each one of these matrices was updated separately, some parameters of the network failed to converge when both matrices are updated simultaneously. Therefore, sequential update will be considered in the next section to resolve this problem. Algorithm defined by fd When the nonlinear function fd is assumed to define the nonlinear function of the algorithm described by equations (6.20) (6.21), the parameters of the matrices C and D converged to some undesirable solution, causing the algorithm to fail in separating the unknown sources. Figures 6.17 and 6.18 display the performance of the algorithm when each matrix is updated separately. This last result shows again the superior performance of the algorithm defined by the nonlinear function fa compared to the one defined by fd, as developed in [24]. -16 149 Update the parameter c12 while keeping all other parameters fixed -17 -18 -19 —22 —23 -24 0 -.---0--. _ WW 11% i"""'l L I I 20 40 60 C12 3 -23.42 ”I '1' I!" ’ l I 80 100 time (sec) 120 140 160 Figure 6.4. 6.3 = mjfa(y.°)2.'j and "ii = 10 -16 -17 -18 -19 -20 -22 -23 -24 150 Update the parameter 012 while keeping all other parameters fixed I I " ‘. C12 :- -23.42 l l l l O 20 4O 60 80 100 120 140 160 180 200 time (sec) Figure 6.5. 65,- = mjfa(y.-)x.-j and 17,-,- = 101 -10.5 —11 -11.5 -12 -12.5 -13 -13.5 -14 O I I I l 151 Update the parameter c21 while keeping all other parameters fixed c21 = -13.47 20 4O 6O 80 100 120 140 160 180 200 time (see) Figure 6.6 Cij = ngjfa(y.-)x.-j and 1751' = 100 152 Update the parameter (:22 while keeping all other parameters fixed I N \l l I N ‘0 l c22 = -3.202 _3.2_.-._._.._. _3.3 1 L 1 l 1 l l 1 1 I O 20 4O 60 80 100 1 20 1 4O 1 60 1 80 200 time (sec) Figure 6.7. c1)- = ngjfa(y.-):r,-J- and 17.5 = 10"2 6.4.2 An Approach Based on Neuromemitic Update of C Matrix The parameters of the C matrix will be updated according to a decorrelation between the output and the state. Therefore, its parameters will be updated according to: (:5,- = flijyixij (6-24) Computer simulations were performed for each of the parameters of the matrix C individually. Then, the update of all of these parameters together was performed 153 C12 C13 Figure 6.8. Update all the parameters of the 0' matrix according to (3,5 = 7].-]- fa(y,-)a:,-J-, (’712, 7’13, 7’21, "22) = (101 01a 2a 01) and CO = 080 154 c12 c13 100 c21 100 O 50 100 Figure 6.9. 
Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} f_a(y_i)x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10, 0.1, 2, 0.1)$ and $C_0 = 0.5C^*$.

Figure 6.10. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} f_a(y_i)x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10, 0.1, 2, 0.1)$ and $C_0 = 0.1C^*$.

Figure 6.11. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} f_a(y_i)x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10, 0.1, 2, 0.1)$ and $C_0 = 0$.

Figure 6.12. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$, $\gamma = 0.0005$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.13. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$, $\gamma = 0.0005$ and $(d_{12,0}, d_{21,0}) = (0.0, 1.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.14. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$, $\gamma = 0.0005$ and $(d_{12,0}, d_{21,0}) = (1.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.15. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$, $\gamma = 0.0005$ and $(d_{12,0}, d_{21,0}) = (1.0, 1.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.16. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma f_a(y_i)y_j$ and $\gamma = 0.0005$ for different initial conditions. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.17. Dynamic Feedback Structure. Update only the $C$ matrix according to $\dot{C} = \eta f_d(\mathbf{y})\mathbf{x}^T$ using two sine waveforms, $\eta = [100\ 10\ 100\ 10]$ and $C_0 = 0.8C^*$.

Figure 6.18. Dynamic Feedback Structure.
Update only the D matrix according to D = nfd(y)yT using two sine waveforms, 17 = 0.2 and D0 = 0 164 while keeping the parameters of the matrix D fixed at their nominal values at all times. These simulations were performed with two sine waveforms of respective frequencies f1 = 1H2 and f2 = 2H 2. The initial conditions for each case were such that the parameters to be updated were 0.8 of the nominal values. Figures 6.19 through 6.22 show, respectively, the convergence of the parameters 612, cm, 621, (:22. Each one of these parameters required a different learning rate. The fact that each of these parameters is the coefficient of some power 3, where s is the time derivative operator, may reveal the reason for this behavior. Therefore, there is an order of time scale between these parameters. Later in this chapter, this concept will be introduced. In addition, it will be shown that, under a well defined transformation, such a time scale could be reduced, as well as all the parameters of C, to be in the same order of magnitude. Figure 6.23 shows the convergence of the parameters of the matrix C when all the parameters were updated simultaneously. Observe that all the parameters have converged to the correct values. K Using the update law defined by (6.24), a set of simulations was conducted to study the performance of the algorithm for the C matrix based on the choice of the initial conditions. Figures 6.23 through 6.26 show the performance when the initial conditions were respectively 0.8, 0.5, 0.1 and 0.0 of the nominal value of the C matrix. Regardless of the initial conditions, the parameters converge to the desired values. These simulations were also repeated with square and sawtooth waveform functions. The same results were obtained as when using sine waveforms. Some sample results of using different unknown sources were as follows. For instance, Figure 6.27 shows the performance of a sine waveform of frequency 1H2 and a square waveform of frequency 2H 2. The initial conditions were all at 0. Figure 6.28 shows the performance of a sawtooth waveform of frequency 1H 2 and a square waveform of frequency 2H z. The initial conditions were all at 0. The update law defined by equation (6.24) was modified to include some nonlin— 165 earities. This was motivated from the work of Herault and Jutten [8, 9]. The update equations were the following: (3:5 = nut/fixa- (5-25) and a, = 17,-,- sinh y.- tanh 33,-,- (6.26) Computer simulation of both algorithms were conducted using the same initial condi- tions, and mixing matrix, and unknown sources. The results of these two simulations are displayed respectively in Figures 6.29 and 6.30. One observes that both algorithms converge. Update of Matrix D Starting from the energy function that defines the total power in the signal, one would derive an derive an update law for the parameters of the matrix D similar to (5.40) d5,- = 7.13/{61' Computer simulations were performed using 2 sine waveforms of respective frequen- cies 1112 and 2H 2. Zero initial conditions were considered. The results of such a simulation is displayed in Figure 6.31. It should be noted that the network failed to converge to the desired values. On the other hand, successful update of the D matrix, based on the rules developed in [8], was also accomplished. do = vomit/j (6.27) 166 and dgj = 7,-1- sinh y,- tanh y, (6.28) The performance of the algorithms defined by equations (6.27) and (6.28) are shown respectively in Figures 6.32 and 6.33. 
The learning rate 7 is taken to be the same for all the parameters in each of the above simulations, and is equal to 0.2. The algorithm defined by equation (6.28) will now be considered and its perfor- mance will be studied. First, its performance for different initial conditions will be investigated. Figures 6.33 through 6.36 represent a set of simulations for 4 different initial conditions. The algorithm always converges to the desired values regardless of the initial condition. Figure 6.37 shows the convergence of the algorithm defined by equation (6.28) for the above initial conditions when plotted together. The performance of the algorithm for various unknown types of sources, sawtooth, square and sine functions was also studied. Figures 6.38 through 6.40 show respec- tively the performance of the algorithm for the following scenarios: (i) using a sine and a square of respective frequencies 1H2 and 2H 2, (ii) using a sine and a sawtooth of respective frequencies 1H2 and 2H 2, and (iii) using a a sawtooth and a square of respective frequencies 1H2 and 2H 2. One can observe that the parameters converge to the desired values in all these scenarios. In conclusion, it was demonstrated that the algorithms for. updating the matrices C and D have converged when each of these parameters is updated individually by considering the mutual information approach or the neuromimetic approach. It was also shown that both algorithms perform well under various initial conditions, for different types of unknown sources, as well as for different learning rates in the case of the C matrix parameters. It was also shown that the algorithm, defined by the function fd, which was developed in [24] failed to perform the separation task. It 167 was also shown that the algorithm defined by fa has equivalent performance to those defined based on the neuromimetic approach. The novelty of our approach resides in the fact that the nonlinear function f,I was developed based on the minimization of an energy function that defines the independence of signals while the other functions in the literature were motivated from biological inspirations. Update C and D In the previous two sections, the performance of the algorithms for updating the matrices C and D, separately was explored. In this section, both update laws will be combined. In this case, the parameters will be updated simultaneously. The parameters of the C matrix were updated according to a, = 770316150 (6-29) and those of the D matrix were updated according to dgj = 7;, sinh y; tanh g,» (6.30) One parameter at a time In the first set of simulations, only one parameter of each of the matrices C and D, namely cm and d12 were updated. These two parameters will be updated according to the equations (6.29) and (6.30). In the first simulation, the parameter d12 was initially set at its nominal value, £2 = 12, while the parameter cm is initially set to be 0.8 of its nominal value, offs = 0.8613. The results of this simulations are presented in Figure 6.41. In this case, both parameters converge to the desired values as it can be observed in Figure 6.41. In the next simulation, the initial conditions of the parameters were reversed. The parameter cm was initially set at its nominal value, cfl’3 = Cisa while the parameter 168 cm is initially set to be 0.0, diz = 0.0. The results of this simulations are presented in Figure 6.42. One, however, observes that the parameter c13 did not converge yet, as it is shown in Figure 6.42. 
This is due to its small learning rate 1713 = 20. Thus, when the learning rate is increased to 1713 = 40, both parameters converge, as displayed in Figure 6.43. It should be noted that, in Figure 6.43, despite the fact that the parameter cm was initially at the desired value, it was pushed away from it while the parameter d12 was still searching for the path to correct solution. Ultimately, both parameters converge to the desired values. Figure 6.44 shows the phase plot of the parameters d12 and c13. All the parameters together The previous section proved that the update laws of the matrices C and D were successful when only some parameters of each matrix were updated. The idea behind this set of simulations was to study any coupling effect between the two update laws. It was concluded that the algorithm performs well. In this section, all the parameters of the matrices C and D will be updated and the algorithm’s performance will be observed. A set of simulations will be conducted in which some parameters were given their nominal values as the initial conditions, while starting others at some values other than their nominal values. The unknown sources used, for this case, were the sawtooth and the square waveforms with respective frequencies f; = 1H2 and f2 = 2H 2. Figures 6.45 through 6.48 show the performance of the network for different initial conditions, while the learning rates for the matrices C and D were held constant at all simulations. Figure 6.45 shows that the algorithm could not converge for the specified time range and learning rate. Thus, a simulation with a higher learning rate for the parameters 621 and 622 was required, as shown in Figure 6.46. However, a longer running time was also needed. Thus, when T = 1000, the algorithm was able to converge. Figure 6.47 shows that all the parameters. did 169 converge, except for one, on. One possible route to solve this problem is to consider sequential update. 6.5 Sequential Update In Figure 6.47, it was noted that all the parameters converged except for one parame- ter, 6;]. In order to eliminate this phenomenon, we consider updating the parameters ”sequentially”. This was accomplished by updating the set of parameters of the C and D matrices at alternate time intervals of length AT. This means that starting at time to, and without loss of generality, the parameters of the matrix D will be updated during the time interval [to+ (k —1)AT,to kAT] while those of the C matrix will be held constant. However, during the time interval [to + IcAT, to + (k + 1)AT], the order of update will be switched so that the parameters of C will now be updated, while those of D will remain constant at their nominal values. Figures 6.49 through 6.51 show the performance of the algorithm when such a sequential training method is applied with AT = 50 seconds. Figure 6.49 shows the performance when the learning rates of the parameters of the C matrix were 77 = [103 10 102 10] and that of the D matrix is 7 = 0.2. Observe in this case that the parameter 621 is converging but a longer time is required for it to converge, or a larger learning rate is required. Also, one observes that in Figure 6.49, all the parameters of the C matrix, except €21, converged in a small period of time, while exhibiting some oscillation. This is due to the large learning rates at which they were updated. Thus, smaller update rates were considered and taken to be 7; = [200 10 200 10]. The results of these simulations are presented in Figure 6.51. 
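The alternating schedule described above can be stated compactly in the sketch below; the function and variable names are illustrative, and the trivial stand-in step functions would be replaced by the update rules (6.29) and (6.30).

```python
import numpy as np

# Sketch of the sequential schedule of Section 6.5: the D parameters adapt
# during one window of length dT while C is frozen, then the roles switch.

def sequential_training(times, dT, step_C, step_D, state):
    t0 = times[0]
    for t in times:
        window = int((t - t0) // dT)
        state = step_D(state, t) if window % 2 == 0 else step_C(state, t)
    return state

# Example with stand-in steps and dT = 50 s, as in the simulations.
times = np.arange(0.0, 1000.0, 0.01)
state = {"C": np.zeros((2, 6)), "D": np.zeros((2, 2))}
state = sequential_training(times, 50.0,
                            step_C=lambda s, t: s, step_D=lambda s, t: s)
```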
Also, Figure 6.50 shows a simulation with $\eta = [1000\ 10\ 500\ 10]$. Figure 6.52 shows the response of the network at the start of the update. One observes that the output signals $\mathbf{y}(t)$ are different from the desired signals $\mathbf{s}(t)$. Recall that the initial condition of the system was not set at the nominal values, but rather $C_0 = 0.8C^*$ and $D_0 = 0$. Figure 6.53 shows the performance of the algorithm after all the parameters have converged. It can clearly be observed that the original signals are recovered and that the error is on the order of $10^{-3}$. The network is then tested by changing the unknown sources to a pair of sawtooth and sine functions. Figure 6.54 shows the signal recovery performed by the network. The final values to which the network converged were $[c_{12}\ c_{13}\ c_{21}\ c_{22}] = [-21.9783\ -4.7968\ -13.4989\ -3.3635]$ and $[d_{12}\ d_{21}] = [0.3992\ 0.6006]$.

Figure 6.19. Update of the parameter $c_{12}$ only, according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$ with $\eta_{ij} = 2000$; $c_{12}$ converges to $-23.25$.

Figure 6.20. Update of the parameter $c_{13}$ only, according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$ with $\eta_{ij} = 20$.

Figure 6.21. Update of the parameter $c_{21}$ only, according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$ with $\eta_{ij} = 500$; $c_{21}$ converges to $-13.59$.

Figure 6.22. Update of the parameter $c_{22}$ only, according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$ with $\eta_{ij} = 10$; $c_{22}$ converges to $-3.225$.

Figure 6.23. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^2, 1)$ and $C_0 = 0.8C^*$.

Figure 6.24. Same update law and learning rates with $C_0 = 0.5C^*$.

Figure 6.25. Same update law and learning rates with $C_0 = 0.1C^*$.

Figure 6.26. Same update law and learning rates with $C_0 = 0$.

Figure 6.27. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^2, 1)$ and $C_0 = 0$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.28. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} y_i x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^5, 10^2)$ and $C_0 = 0$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.29. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij}\sinh(y_i)\tanh(x_{ij})$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^2, 10)$ and $C_0 = 0$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.30. Update all the parameters of the $C$ matrix according to $\dot{c}_{ij} = \eta_{ij} y_i^3 x_{ij}$, $(\eta_{12}, \eta_{13}, \eta_{21}, \eta_{22}) = (10^3, 10, 10^2, 10)$ and $C_0 = 0$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.31. Update all the parameters of the $D$ matrix according to the energy-based rule of Section 6.4.2, $\gamma = 0.2$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.32. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma\, y_i^3 y_j$, $\gamma = 0.2$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.33. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma\sinh(y_i)\tanh(y_j)$, $\gamma = 0.2$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.34. Same update law with $(d_{12,0}, d_{21,0}) = (0.0, 0.9)$.

Figure 6.35. Same update law with $(d_{12,0}, d_{21,0}) = (0.9, 0.0)$.

Figure 6.36. Same update law with $(d_{12,0}, d_{21,0}) = (0.9, 0.9)$.

Figure 6.37. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma\sinh(y_i)\tanh(y_j)$ and $\gamma = 0.2$ for different initial conditions. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \sin(4\pi t)$.

Figure 6.38. Update all the parameters of the $D$ matrix according to $\dot{d}_{ij} = \gamma\sinh(y_i)\tanh(y_j)$, $\gamma = 0.2$ and $(d_{12,0}, d_{21,0}) = (0.0, 0.0)$. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.39. Same update law; the unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \mathrm{sawtooth}(4\pi t)$.

Figure 6.40. Same update law; the unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.41. Update only one parameter of each matrix: $c_{13}$ and $d_{12}$ with $\eta = 20$, $\gamma = 0.2$ and $(c_{13,0}, d_{12,0}) = (0.8c_{13}^*, d_{12}^*)$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.42. Update only one parameter of each matrix: $c_{13}$ and $d_{12}$ with $\eta = 20$, $\gamma = 0.2$ and $(c_{13,0}, d_{12,0}) = (c_{13}^*, 0.0)$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.43. Same experiment with $\eta = 40$ and $(c_{13,0}, d_{12,0}) = (c_{13}^*, 0.0)$.

Figure 6.44. Phase plot of the convergence of the parameters $d_{12}$ and $c_{13}$ with $\eta = 40$, $\gamma = 0.2$ and $(c_{13,0}, d_{12,0}) = (c_{13}^*, 0.0)$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.45. Update all parameters of each matrix. $C_0 = 0.8C^*$, $D_0 = D^*$, $\eta = [10^3\ 10\ 10^2\ 10]$, $\gamma = 0.2$ and $T = 500$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.46. Update all parameters of each matrix. $C_0 = 0.8C^*$, $D_0 = D^*$, $\eta = [10^3\ 10\ 10^4\ 10^2]$, $\gamma = 0.2$ and $T = 500$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.47. Update all parameters of each matrix. $C_0 = 0.8C^*$, $D_0 = D^*$, $\eta = [10^3\ 10\ 10^4\ 10^2]$, $\gamma = 0.2$ and $T = 1000$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.48. Update all parameters of each matrix. $C_0 = 0.8C^*$, $D_0 = 0$, $\eta = [10^3\ 10\ 10^4\ 10^2]$, $\gamma = 0.2$ and $T = 1000$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.49. Sequential update of all parameters. $C_0 = 0.8C^*$, $D_0 = 0$, $\eta = [1000\ 10\ 100\ 10]$, $\gamma = 0.2$ and $T = 10000$. The unknown sources are $s_1(t) = \mathrm{sawtooth}(2\pi t)$ and $s_2(t) = \mathrm{square}(4\pi t)$.

Figure 6.50. Sequential update with $\eta = [1000\ 10\ 500\ 10]$, $\gamma = 0.2$ and $T = 10000$; otherwise as in Figure 6.49.

Figure 6.51. Sequential update with $\eta = [200\ 10\ 200\ 10]$, $\gamma = 0.2$ and $T = 10000$; otherwise as in Figure 6.49.

Figure 6.52. Unknown sources, environment output and network output at the initial time $t = 0$ for the simulation of Figure 6.51.

Figure 6.53. Unknown sources, environment output and network output at the final time $t = T$ for the simulation of Figure 6.51.

Figure 6.54. Test case: unknown sources, environment output and network output at the final time $t = T$ for the converged network of Figure 6.51. The unknown sources are $s_1(t) = \sin(2\pi t)$ and $s_2(t) = \mathrm{sawtooth}(4\pi t)$.

6.6 Time and Weight Scaling

It has been observed that the algorithm fails to converge when the parameters of the $C$ matrix are too large. This is not a limitation of the algorithm, but rather a limitation of the computer implementation. Thus, a scaling of these parameters is proposed in order to make them comparable in order of magnitude and to bring them into a range that is acceptable for hardware implementation.

6.6.1 Mathematical Development

Consider a single-input single-output dynamical system described by its transfer function

$$H(s) = \frac{\beta_1 s^{n-1} + \beta_2 s^{n-2} + \cdots + \beta_{n-1} s + \beta_n}{s^n + a_1 s^{n-1} + a_2 s^{n-2} + \cdots + a_{n-1} s + a_n} + d$$

and its canonical controllable realization $\mathcal{D} = (A, \mathbf{b}, \mathbf{c}^T, d)$. Such a realization was presented in Chapter 5 to be of the form

$$\dot{\mathbf{x}} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}u$$

The diagonal transformation $T_M$ is defined as

$$T_M = \kappa\,\mathrm{diag}\left(\epsilon^{n-1}, \epsilon^{n-2}, \ldots, \epsilon, 1\right)$$

Under the transformation $T_M$, the realization of $H(s)$ in the new coordinates is $\tilde{\mathcal{D}} = (\tilde{A}, \tilde{\mathbf{b}}, \tilde{\mathbf{c}}^T, \tilde{d})$, defined as

$$\tilde{A} = T_M A T_M^{-1} = \begin{bmatrix} 0 & \epsilon & 0 & \cdots & 0 \\ 0 & 0 & \epsilon & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \epsilon \\ -\tilde{a}_n & -\tilde{a}_{n-1} & -\tilde{a}_{n-2} & \cdots & -\tilde{a}_1 \end{bmatrix}, \qquad \tilde{\mathbf{b}} = T_M \mathbf{b} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \kappa \end{bmatrix}$$

$$\tilde{\mathbf{c}}^T = \mathbf{c}^T T_M^{-1} = \begin{bmatrix} \tilde{\beta}_n & \tilde{\beta}_{n-1} & \cdots & \tilde{\beta}_1 \end{bmatrix}, \qquad \tilde{d} = d$$

where

$$\tilde{a}_i = \frac{a_i}{\epsilon^{\,i-1}}, \qquad \tilde{\beta}_i = \frac{\beta_i}{\kappa\,\epsilon^{\,i-1}}, \qquad i = 1, \ldots, n$$

One observes that, under such a transformation,

- the coefficients of $\tilde{A}$ and $\tilde{\mathbf{c}}$ are both inversely proportional to some power of $\epsilon$;
- the nonzero parameter of $\tilde{\mathbf{b}}$ is proportional to $\kappa$, while those of $\tilde{\mathbf{c}}$ are inversely proportional to $\kappa$;
- the parameter $\tilde{d}$ is independent of $\epsilon$ and $\kappa$;
- the parameters of $\tilde{A}$ are independent of $\kappa$.

Using this transformation, one can scale the parameters that are to be updated up or down in order to make their numerical computation possible on digital machines. The scaling is achieved by the appropriate choice of $\epsilon$ and depends on the application at hand; for example, large denominator coefficients are scaled down by choosing $\epsilon$ larger than 1, as in the example below. There is therefore no single optimum value of $\epsilon$ for all possible environments. However, one should have some knowledge of the desired environment, such as its frequency bandwidth, and use that information to select the appropriate value of $\epsilon$. The coefficient $\kappa$ can be considered as a normalizing coefficient for the parameters of $\tilde{\mathbf{c}}$. The following example shows how the choice of $\epsilon$ can be made.

6.6.2 Example

The following example illustrates the necessity of scaling in order to make the digital implementation of the algorithm possible. Given the transfer function

$$H(s) = \frac{(s+100)(s+200)}{(s+1000)(s+2000)(s+3000)} + 0.6$$

or, in expanded form,

$$H(s) = \frac{s^2 + 3\times 10^2\, s + 2\times 10^4}{s^3 + 6\times 10^3\, s^2 + 11\times 10^6\, s + 6\times 10^9} + 0.6$$

the canonical controllable realization of $H(s)$ is

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6\times 10^9 & -11\times 10^6 & -6\times 10^3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{c}^T = \begin{bmatrix} 2\times 10^4 & 3\times 10^2 & 1 \end{bmatrix}, \quad d = 0.6$$

By considering the transformation $T_M$ with $\epsilon = 10^3$ and $\kappa = 1$, $H(s)$ can be realized by

$$\tilde{A} = \begin{bmatrix} 0 & 10^3 & 0 \\ 0 & 0 & 10^3 \\ -6\times 10^3 & -11\times 10^3 & -6\times 10^3 \end{bmatrix}, \quad \tilde{\mathbf{b}} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \tilde{\mathbf{c}}^T = \begin{bmatrix} 0.02 & 0.3 & 1 \end{bmatrix}, \quad \tilde{d} = 0.6$$

and, by considering the transformation $T_M$ with $\epsilon = 10^2$ and $\kappa = 1$, $H(s)$ can be realized by

$$\tilde{A} = \begin{bmatrix} 0 & 10^2 & 0 \\ 0 & 0 & 10^2 \\ -6\times 10^5 & -11\times 10^4 & -6\times 10^3 \end{bmatrix}, \quad \tilde{\mathbf{b}} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \tilde{\mathbf{c}}^T = \begin{bmatrix} 2 & 3 & 1 \end{bmatrix}, \quad \tilde{d} = 0.6$$

One may conclude that, in order to scale the parameters of the vector $\tilde{\mathbf{c}}$, one should choose $\epsilon$ on the order of the zeros of the transfer function, whereas $\epsilon$ should be on the order of the poles if one wishes to scale the parameters of the matrix $\tilde{A}$. While $\epsilon$ accomplishes the time scaling task, $\kappa$ performs magnitude scaling: by taking a value of $\kappa$ different from 1, one can trade magnitude between the $\tilde{\mathbf{b}}$ and $\tilde{\mathbf{c}}$ vectors.

6.7 Observations

In this chapter, a solution to the blind separation of signals with a dynamic network model was developed. The network parameters of the proposed feedback structure, with the update law defined by Theorem 2, were shown to converge to the desired values. In addition, other update laws based on the neuromimetic approach were considered in order to provide a comparative evaluation of the developed algorithm against existing ones in the literature.
Thanks to the special structure of the state space realization of the network model, an update law of the C matrix based on the decorrelation between the output vector and the state vector was developed. Also, such a representation of the network model is suitable for circuit implementation which will be presented in the following chapter. CHAPTER 7 Micro-electronic Implementation 7 .1 CMOS Building Blocks 7 .1 .1 Transistor The transistor represents the basic building block of a circuit. Designers must have a good understanding of its functionality in order to produce any circuit. Models which accurately define the characteristics of the transistor can be extremely complex and not at all possible for manual calculation. On the other hand, if a computer is employed to handle such a complex model, circuit design can become simple enough to model it. The general expression for the drain-to-source current of a transistor operation in a sub-threshold region is given below 14, = 10 cw“ (e‘v‘ — 6"“) At saturation, the drain voltage Vd is much higher than the gate voltage V}. Therefore, the above expression of drain-to-source current at saturation is Ida = 10 envy—V, 211 212 7 .1.2 Current mirror In circuit design, it is often necessary to have a copy of a current to perform different operations on that particular current. Thus, a circuit that provides a replica of a current is needed. The circuit shown in Figure 7.1 performs such a function. Both n-channel and p—channel current mirrors are shown. (a) (b) Figure 7.1. Current Mirror (a) n-channel, (b) p-channel 7 .1.3 Transconductance Amplifier Figure 7.2(a) represents the circuit diagram of a transconductance amplifier. The circuit will exhibit a sigmoidal function of the differential voltage V1 and V2 presented at the gates of the transistors M1 and M2 , respectively. The circuit is being driven by a biasing current 11, which is controlled by the voltage Vb applied to the transistor Mb. 1.... = I. tanh g—(V. — v2) (7.1) 213 where [0 = 1010 eKVb (7.2) w is the width to length ratio of the biasing transistor Mb. For small-signal analysis the current-voltage relationship of a transconductance amplifier described by equation (7.1) is simply Iout = ng/r (7.3) where V,-,, = V1 - V2 and gm is the transconductance defined as _ 610111 16 9m - av... 'Vm=° = 507023 (7“) It is important that the transconductance depends on the biasing voltage Vb. This pa- (a) Original Transconductance Amplifier (b) Improved Transconductance Amplifier Figure 7.2. Transconductance Design 214 rameter will be the key to adjusting the behavior of circuits that include the transcon- ductance amplifiers as will be shown in the next section. In [69], the author proposed an improved design of the transconductance amplifier as shown in Figure 7.2(b). This design enables the transconductance amplifier to increase its dynamic range. Circuit simulations of this design are shown in Figure 7.3. Figure 7.3(a) shows the output current of the transconductance amplifier for different width to length ratios of the biasing transistor Mb, while Figure 7.3(b) shows the same response of the amplifier for different biasing voltages V2,. m .1111. um 1.1m- (a) % = 2, 3, 4, 5, 8 (b) Vb =. 625, 250, 675, 700, 725mV Figure 7.3. PSPICE simulation of a transconductance amplifier 7 .1.4 Hyperbolic Sine Function A circuit that exhibits the sine hyperbolic function is shown in Figure 7.4. In [48], it 215 Figure 7.4. 
Circuit Realization of tan and sine hyperbolic functions was derived that the output current of this circuit can be expressed as [out = 2one"(V"'6V) sinh 10V where V1, is the biasing voltage. A PSPICE simulation of the circuit is shown in Figure 7.5 7.1.5 Gilbert Multiplier A circuit that produces the product of two voltages is needed in order to perform the implementation of algorithms that involves such an operation. In [70], the author presented the 1-dimensional multiplier as shown in Figure 7.6. It is now known in literature as the Gilbert multiplier. The output current of this circuit is 1: x 101“: I], tanh 5(I/1 — I/g) tanh '2-(I/3 -' I4) (7.5) For small-signal analysis, the current described by equation (7.5) can be expressed as 10111 0< 11(V1 - V2)(V3 - V11) (7.6) 216 “some mu. buffi- l‘. WW" 11.11 t: 1mm: 11.0 ,u'.. ...., . . .................. n-I.fl.floII'.‘-Il I)" I!" I.“ 1.6!? I.“ 1.!" I.“ I1” I." Oman-l JED]! I‘ll 05111.njh am]! 11 Figure 7.5. PSPICE simulation of the sine hyperbolic circuit for different biasing voltages Vb PSPICE simulation of the circuit for various biasing voltages is presented in Figure 7.7. Observe the linear range for which the current is linear. It is desirable that one defines the range of operation to be within that linear range. 7 .1 .6 Vector Multipliers Using the design of a single dimensional Gilbert multiplier, one may design the multi- dimensional multiplier. Such a design is shown in the Figure 7.8easily. In this case, the output current is 1..., = Ithanh g(x, — tad) tanh 125(11- — V..,) (7.7) For small-signal analysis, the current expressed by equation (7.7) is approximated by 1out 0( IbH(Xi - ‘40fo - I/ref) (7-8) 217 Figure 7.6. l-dimensional Gilbert multiplier mm- 1.. am»: 01:11.10 " I m. 11.- my Figure 7.7. PSPICE simulation of of Modified Gilbert Multiplier using different bi- asing voltages 218 Thus, this circuit realizes the dot product of two n-dimensional vectors X and Y and gives such a value as a current. If one wishes to convert such a quantity to a voltage, then s/he can utilize the linear resistors to do so. 1%. Li .. y i i" Figure 7.8. n-dimensional Gilbert multiplier 7 .1.7 Current to Voltage Converter All the basic analog components that have been presented so far produce a current output function. If one wishes to convert the current quantity to a voltage quantity, he or she can rely on the linear resistors. Linear resistor can be used to realize the Conversion of a current to a voltage. A circuit realization of such a block is shown in Figure 7.9. 219 out Figure 7.9. Current to Voltage Converter 7 .1.8 Second Order Section The second order section is a circuit that can be used to generate a response that Can be represented by two poles in the complex plane. By adjusting its parameters, one can position the locations of these poles anywhere in the complex plane. A realization of such a circuit was presented by Mead in [71] and is shown in Figure 7.10. By performing a small-signal analysis, one can derive the transfer function of such a circuit. The transconductance amplifier A2 is a follower integrator and is described by the following equation 02132 2 92(02 —' 01) (7.9) At the node (1), the current generated by the transconductance amplifiers A1 and A3 220 ' A. v, + u V... 2: CI Figure 7.10. 
Second Order Section Circuit will charge or discharge the capacitor C1 C1131 = g1(u — v1)+ 93(01— 02) (7.10) By taking the Laplace transform of equations (7.9) and (7.10) 3021/2 = 92(1/2 — 1/1) (7.11) sC'lVl = 91(U - V1) + 93(V1- V2) (7-12) and combining these two equations, the following relationship relating the output V2 221 to the input 11 can be obtained Xe. = 9. U (301 + 91 — 93)(1+ 372) + g3 _ 91 — 720182 + (C1 + 7291 — 72908 + 91 l = 7.13 1372.32 + (7'1 + 1'2)(1 — a)s +1 ( ) where 01‘ T, = __ 7.14 g.- ( ) and a = TI 33 (7.15) 7’1 + 7'2 .91 In [71], the authors concentrated their analysis on the case when C1 = C2 = C. In this particular case, the expression for a in (7.15) is, therefore, reduced to “=gfg gm) 1 2 and the transfer function described by (7.13) is expressed as V; 1 U - 7232 + 27(1- a)s +1 (7.17) Figure 7.11 shows the frequency response of the second order section for different values of a. Based on the small-signal analysis, the second order section is stable for any a satisfying 0 < a < 1. Also, in the case where a = 1, the poles are purely imaginary and the circuit is unstable. However, a large-signal analysis for the circuit developed experimentally by Watts [69] shows that the second order section can be unstable 222 Ice-l au- 80m- bum- M. “‘11!“ 31.00.)! m: "I ”UVY 1 MU“. . ”M1 LIX III 1m Ln! m: 9'11}! 0'1”! l'tIJI‘V101071UI Figure 7.11. PSPICE Simulations of Second Order Section for V1,, = 625, 650, 675, 700, 725mV within the specified range of stability based on the small-signal analysis. It was proven that this nonlinear circuit exhibits some instabilities. Therefore, an improved design of the second order section was developed based on the improved design of the transconductance amplifier that is shown in Figure 7.2(b). 7 .1.9 Second Order Filter Design The design of the second order section developed by Mead and Watts birthed the inspiration to develop the circuit realization of the state equation for the canonical controllable state representation of a dynamical system. To start, a circuit realization for a 2-dimensional system using the second order section will be developed. Then, the development of the circuit realization for higher dimensions will be straight forward. The second order section uses the transconductance amplifier A1 as an integrator follower and thus it realizes a low pass filter. However, here a pure integrator for A1 will be implemented. Therefore, Al was modified to produce a pure integrator circuit ‘1 223 instead. 7.1.10 Circuit Implementation of Second Order Filter Consider the second order filter described by the transfer function H(s) = :8) = 52 + ablls + (1; (7.18) The canonical controllable state representation of H(s) is 231 = :02 (7.19) :02 = —a2x1 — (11172 + 0111 (7.20) and the output equation is y = 2:1 (7.21) It will be shown that a circuit realization of the filter described by equation (7.18) is shown in Figure 7.12. A1 is a pure integrator. Therefore 01131 = 91(01 - Vrcf) (7.22) At node (2), Kirchoff’s current law suggests that 02132 = 92(0 — '02) + 9304.; " ”1) . (7-23) Let 2; = v.- - V"; and 0 = u — ‘43". Therefore, equations (7.22) and (7.22) become 012.1 = 9122 0252 = 92(11 - 22) - 9321) 224 Figure 7.12. Filter Design. Circuit realization of controllable canonical form of a second order filter which give us the generalized canonical controllable realization defined by a, = 3122 (7.24) 2. = was — £122. 
+ £12 (7.25) The canonical controllable realization is obtained by using the transformation T9,, 1 defined by :01 = 21 and 2:2 = fizz. Therefore 271 = 1132 (7.26) 1 1 l 232 = ———$1 — —$2 + —‘II (7.27) T173 T2 T172 where Cl C: 02 T] = — T2 = — T3 = — 91 92 93 225 Equations (7.26) and (7.27) describe the canonical controllable of (7.18) where 1 1 1 02 = — a] = — bl = _— T1T3 T2 717'3 It is important to note that the developed second order filter described by equation (7.25) is always stable since the coefficients a1 and a; are positive. In circuit imple— mentation, it is often assumed that all the capacitors C,- are equal to C, which is in the order of a few pico F ayrads. The characteristics of the second order filter, i.e. its pole locations and gain, are controlled by the biasing voltages of the transconductance amplifiers shown in Figure 7.12. Figure 7.13 shows the transistor level design of the filter. PSPICE simulations of this circuit were performed. Figures 7.14(a) and 7.l4(b) show the frequency response of the filter for various biasing voltages of transconduc- tance A2 and A3, respectively. Observe how changing V1,, and V1,, impacts the gain as well as the pole locations of the filter. Having developed the circuit realization of a second order filter, that of an n-dimensional filter becomes straight forward as it is shown in the next section. 7.1.11 Circuit Realization for n-dimensional Filter Figure 7.15 represents the circuit realization of an n-dimensional filter described by the equation _ Y(8) _ b1 - U(s) — s" + ale"-l + - - - + ands + an H (s) (7.28) By looking at Figure 7.15, A,- is a pure integrator. Therefore, 0.13; = 91(vi+1 - Vrcf) (7.29) 226 Figure 7.13. Filter Design. Circuit realization of controllable canonical form of a second order filter mm 1-1 10mm 11.11.31 W‘ 11.0 with rw imam 11.11.11 m ”I my _v 0" (a) Vb, = 650, 675, 700, 725mV (b) Vb, = 650, 675, 700, 725mV Figure 7.14. PSPICE Simulations of second order filter 227 At node (vn), Kirchoff’s current law suggests that n-l Cnén = Z 550/”; - v.) + gn(u - vn) (7.30) i=1 Let 2.- = v,- — 14,; and i1 = u — va. Therefore. equations (7.29) and (7.30) become A.‘ ‘7» __EF" 1 A." ‘7. _31" 1 A: ’ _- - vbl A. u .. V +A, ._ .. .. -- v _l - v,. v. ’ '- L. c v. == C» v __ c. V H -r- 1,— . ' =4,— Figure 7.15. Filter Design, Circuit realization of a controllable canonical form of a n“ order filter Cd; = gg25+1 i=1,---,n—1 n-l Cnén = — Z 3523 + git“; — 2") i=1 228 which gives us the generalized canonical controllable realization defined by i,- = %z,~+1 i=1,---,n—1 (7.31) 2,, — —%zl—...—g—gizn_l—-.-—%—"—zn+%’ia (7.32) Based on the developed small-signal analysis, one can use the Routh Hurwitz criteria to design the necessary parameters to obtain a stable filter. Define T.- = 1'} f.- = 11, Vi = 1,--- n -1 and in = Tn. Therefore e uations C.’ C" a , q (7.31) and (7.32) become 2.; = T;Z,'+1 2: 1,° H ,n —1 (7.33) £7; = —7_-121 _ ° ° ° — fn-lZn-l — ° ° ° — fnZn + Tna (7.34) In matrix form Azennxn b36Rn :- II rp 1i - n I- d 2.1 0 T1 0 ' ' ' 0 21 0 :22 O 0 12 . - - 0 22 0 = + 11 if 0 0 0 0 Tn...1 Z.‘ 0 in .4 "fl ' ' ' ' ' ' —fn—l -731 Zn Tn Define the transformation T as ’1 0 0 i 0 0 0 T: i 0 $2113- 0 0 0 0 _0 0 0 11;:qu 229 Therefore r 0 1 0 0 l 0 O 0 1 0 0 A=TA,T“= 5 5 b=sz= 0 0 0 0 1 0 —a.. -a1 b1 where 75111.7. 
"-1 ai=71n—i+1'—,.J:‘i——f=7‘n-i+1 H “G j=n—i J ‘ j=n—i+l and 71—1 11 b1 =7'n H73: H13 j=l j=l 7 .2 Circuit Realization Having introduced the library of basic building blOcks for analog CMOS implemen- tations, it is possible to develop a circuit realization of the network and the update laws. 7.2.1 Implementation of the output equation The output equation 31; = 8i - 63X; - dgTy (7.35) is simplified to y; = e.- - étTi. - 3T5! (7.36) 230 where z.- is a vector having one dimension less than 2 and is obtained by eliminating the 2"” component of 2. For the algorithm development, equation (7.35) was used rather than (7.36) in order to create a matching in the dimensions for clarity. It should be recalled that m.- = 0, C.'.' = 0 and d,-.- = 0. Equation (7.36) is realized by using a transconductance amplifier to convert the input voltage e.- into a current. In addition, two multi-dimensional vector multipliers are used to compute the product 6,732.- and dTS'. The currents coming out of the vector multipliers are subtracted from that of the transconductance amplifier to obtain the y.- in current form. Then, a linear resistor is used to convert the current into a voltage. Figure 7.16 shows the schematic of circuit realization. ._, c-jyfli c: X ’ [1]—W- d. 9 X Figure 7.16. Circuit Realization of the output equation 231 7 .2.2 Implementation of the state equation The state equation is described by the transfer function II r q |- -I P q 233;) o 1 0 0 35,1.) 0 if? 0 0 1 0 .53) 0 = i E E + E u,(7.37) 233“?“ 0 0 0 0 1 mfg-“1"“ 0 55:0) . __aff-‘fl _ag)‘ . 1390') ‘ _flij‘ The circuit implementation of this transfer function was completed in the previous section. Figure 7.15 shows the schematic for a filter of order n. Thus, to implement the equation (7.37) one needs to designs a pgjth order filter. In this case I: Any-1 (151') = ij..k+1 H T] (7.38) l=u5j-k+l and I‘ij fiij = H T] (7'39) (=1 7 .2.3 Implementation of the C update equation The update law of the parameters of the C matrix is described by (EV-c) = nff’ygxli) ' Vj 75 2' and k = 1, ~ - - , p,,- (7.40) ’J ‘1 Equation (7.40) is implemented using a 1-dimensional gilbert multiplier. The output current of the multiplier is dumped to a capacitor Cg“) as shown in Figure 7.17. In this case (1‘) _ V ._ V (1.) tanth tanhxgi—"L (7.41) I: .(l‘) 'l") = C c I 2 2 t1 31' b 232 Therefore éif) = 773:) (935:) - Vrcf) (yi - Vrcf) (7-42) where (’F) = 155;) (fi)2 ”'1 0;) 2kT (7.43) Thus, the leaning rate of such parameter is controlled the bias voltage of the multi- plier, VCon 0 V... Xk —.. k I X C. Y: ——-I vc‘, Figure 7.17. Circuit Realization the update equation of the C matrix parameters 7 .2.4 Implementation of the D update equation The update law of the parameters of the C matrix is described by (£3 = 751' sinh y,- tanh yj Vj 7Q i (7.44) Equation (7.44) is implemented using the sine hyperbolic circuit and the transcon- ductance amplifier that were described in the previous section. Figure 7.18 shows the 233 circuit implementation. The algebraic sum of all the current at the (+) node of the capacitor 0.3 is sinh K Iout = Iout tanh g(yj _ Vrc'f) (7'45) where I ”M = 2one"(V"+6V) sinh x(y; — an) (7.46) out The current described by equation (7.45) is dumped to the capacitor C5,. Therefore, when equations (7.45) and (7.46) are combined together, one obtains d..- = ‘m sinh x(y. - v...) tanh g(y. — v...) (7.47) where 2w] eK(VDb.-j+6V) 70' = o 0.. (7.48) It is important to observe the parameters that the the learning rate 7.3 depends on. 
They are the biasing voltage VDbU; the ratio of width of the biasing transistor to the width of the arm transistors w; and the offset voltage 6V. In the circuit implementation, all these parameters will be held constant except the bias gate voltage VD], w which will control the learning rate of the parameter dij. 7 .3 Circuit Simulation In this section, the circuit simulation of a 2 x 2 network is performed using the circuit simulator PSPICE. The block diagram of the overall circuit including network and learning is shown in Figure 7.19. V. Figure 7.18. Circuit Realization of tan and sine hyperbolic functions The unknown signals are two sine waveforms with respective frequencies 1kHz and 2kHz as shown in Figure 7.20. These two signals are filtered and mixed in order to obtain the signals e1 (t) and 82(t) which are the output signals of the environment as presented in Figure 7.21. The transfer functions Hub) and 1.1721(3), shown in Figure 7.21, are two second order filters whose state and output equations are respectively defined by equations (7.37) and (7.35), with the exception that d5,- = l in this case. ii: = A.,-x.~,-+3.~,u,- (7-49) e, = Eng5j + u.- + Jgj‘uj (7.50) The circuit implementation of equation (7.49) is shown in Figure 7.15. However, the circuit implementation of equation (7.50) is obtained in a similar fashion as that of the output equation of the network model described by equation (7.35) whose circuit implementation is shown in Figure 7.16. The parameters of the matrix 1,-1- are defined by the biasing voltages of the transconductance amplifiers and the capacitors shown in Figure 7.15. The parameters of the column vector 6.3 and the parameter 3;, are defined by the voltages at the cor- 235 Order Filter Update y] 4—dimemional Cl __,. Veda, 4' yl Multiplier d21 yl 4—dimensionnl 32 __.. we.“ #- Y2 Multiplier x2 02 Updue C Y2 Filter Figure 7.19. Block diagram of overall circuit including network and learning 236 Date/flu m: Iliad/DI 17.10.11 1'... oooooo 21.0 ZJSVT' * ------------------------------------------------------------------------------------ 1 . l 2.50V, 1.4! --------------------------------------------------------------------------- VL‘IJ) 2.55 -------------------------------------------------------------------------------------- 1 z.sov. 2.4SVOH- v --------------- v v “‘i o. 1.0- 2.“ Lu- l.“ the 0V“!!! Figure 7.20. Unknown sources 810) fin“) ‘ 82(t)l—-. H2] (S) Figure 7.21. Environment Circuit Model 237 Table 7.1. Parameters of the Environment Circuit Realization [ A12 I A21 1 V,1 0.69V v; 0.69V V,2 0.69V V,2 0.69V V3 0.71V V3 0.71v Cl 5pF Cl 4pF Cg 5pF Cg 4pF ; 512 521 V5,, 2.60V V5,, 2.65V V5,, 2.40v V5,, 2.45v d1 d2 VJ“ 2.70V 14,-” 2.56V VJ” 2.54v VJ” 2.70V responding gates of the transistor at the positive arm of the gilbert multiplier. Table 7.3 shows all these parameters. A PSPICE simulation of the frequency response of the filters 312(3) and {121(3) was conducted. The results of this simulation are shown in Figure 7.22. Observe that the cutoff frequencies of the filters H12 and H21 are within the 1kH z and the 5kHz range, which is the operating frequency range. The transient response of these filters is shown in Figure 7.23 where the input signals are the two sine waveforms shown in Figure 7.20. The output of the environment is fed to the transconductance amplifier as shown in Figure 7.16. 
Unlike the environment circuit model whose parameters (3.-j and d,- are constant and are defined in Table 7.3, the parameters (3.-,- and d,- of the network model are controlled by the ‘ time-varying charges, accross the capacitors shown in figures 7.17 and 7.18, defined by equations (7.42) and (7.47). The initial voltages across these capacitors are shown in Table 7.3. Figure 7.24 shows the results of 238 and cm: 'qu Deco/1':- m- 1112)". 10:31:41 Menu-I 37.0 Io? . can r -------------------- ~ --------------------- ' -------------------- ~ --------------------- . -------------------- 1 I : I 100.0” ‘ «.10.; L --------------------------------------------------------------------------------------------------------- 4 unusual”) «autumn». alumnae); -xmma.nu oxncxnaann animxnu ormmcann- manta“) ‘.M1r-...-.OOOQCO-'G-¢.--1..‘OQOOQOOOCO ’M4 . . . . . I . Ion-r III-r ‘m1 I I 'lvl-l . . ...... I I ...... . . soc-r Cmi 10” ~ L ; L 1 . III III- 10.: x . on: was 10... alumnus) dualism“) OXDIIIIZ.I13I «aunt-u). ”(I‘LIHI annual“: «Dunstan: Julianna) m Into/fl- “: 13133106 Ohio-I! Macao- 87.0 1.”? ----------------------------------------------------- ‘ """""""""""""""""""""""""" '1 ‘0--. ........................................................................................................ I O“ ' Brownian!) -n(m1.n4|0 :mmaann -!D(m3.lllb Ou|m3.nii downturn) donut-u)- IDIIIQO-Ill) 5°” '1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I .1 p i V“""""""""“""""““"' . 10!“ 1m- 1. . In: to”. soon: finin‘n.ln| -xmnn.lnu .xmma.nn annuals“. ”(ELEM -!DIIID.III) ammonia) annual“) M 32' Figure 7.22. Frequency Response of the Environment Circuit Realization 239 Dace/Tau rum 18/23/9‘ 10:31:49 - MOIOCUOI 27.0 SOIIAT ------------------------------------------------------------------------------------- 1 . 0 0A”. : I _sm .5 .............................................................................................................. ormmiunn -ID(XD1.IJ‘I ormxnaaan -1Dlm2.mflo xmxnaann -Iblm).lll) OID(XDC.I11) quantum) SMT """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" 1 o. 1.0.. a.” 1.0- I.“ the 02041311.!3” actual“) oxmxnaann -rmma.uu ormmzann -!D(m3.lld) othXlltllll -xn¢nu.mu Tl— Figure 7.23. Transient Response of The Environment Circuit Realization the network performance for the first 50msec. Observe that all the parameters have converged to the values as shown in Table 7.3. By fixing the parameters of the network to these final values, the network reproduced the original signals as shown in Figure 7.25. The transistor PSPICE model and the PSPICE source code used to simulate the developed circuit implementation is given in Appendix D. In this chapter, a subthreshold circuit implementation of the dynamic feedback network and the learning algorithm was developed. The performance analysis of circuit realization was conducted using two prototype signals as the unknown sources. It was observed that the network parameters converge to a separating solution and thus recover the original unknown sources. 240 Table 7.2. Parameters of the Environment Circuit Realization Parameter Initial Final d12 1.0V 4.80V d21 4.0V 4.80V €11 1.0V 2.31V cm 4.0V 2.20V €21 1.0V 2.73V €22 4.0V 2.36V Date/1‘1. MI 11/34/06 O‘IIIIIOI WI 37.. S.W1 - . ....................................... f I I I C . . 1 V O. 
u 9'" ‘Ifi\‘r‘ 'l-A‘ 'fi' Hflw ' ' ' V r I f V” ‘ V r V ................... 0. 1b 30- )u d- 3” OVIIJZ) .VICZI) OVISOI) 'VIJOZ) .VICCI) ’VIO“) Figure 7.24. Parameters’ Convergence of the Network Circuit Realization 241 0.0““ 036.: filter Date/ft- MI 11/20/06 21:01.07 Mot-tum 37.0 a-WT oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo ‘ . . . . . : Figure 7.25. Output of the Network Circuit Realization at Convergence CHAPTER 8 Conclusion In this work, a novel algorithm for the blind separation of signals in static environment based on the decorrelation condition of the output signals was developed. It was then improved to test for independence. An energy function for the problem based on the minimization of the mutual information functional was also developed. A fourth order Edgeworth expansion was used in order to find an approximate expression of the probability density function. The new energy function considered a larger set of signals as compared to the existing energy function, since the assumption of unit variance of the output was not considered. Also, the energy function represented a better estimate of the mutual information functional. A mathematical framework for the development of update laws for the network parameters based on the theory of adaptive optimal control theory was also developed. A realization of the environment that represents the least number of parameters was also introduced. A forward and an feedback structure were considered to model the environment and hence the network. Computer simulations showed that the update of the parameters of the matrix D converge for both structures. However, successful learning of the C matrix was realized only for the feedback structure. The coupling of both learning rules was eliminated by considering a sequential update method. Basic building blocks for the circuit implementation of an algorithm in the dynamic feedback 242 243 network model were developed. Overall circuit implementation of the algorithm was also developed and tested for prototype waveforms. APPENDICES APPENDIX A Definitions A.1 Statistical Definitions Definition 4 (Entropy) Entropy is a measure of uncertainty of the occurrence of an event in an ensemble of experiments described by the random variable (rv) x [72]. It is defined as: H(X) = -E{1n fx(X)} where fx(x) is the probability density function (pdf) of the rv x. Definition 5 (Joint Entropy) The joint entropy of two random variables x and y is [72]: H (X.y) = -E{lnfxy(X.y)} where fxy(x,y) is the joint pdf of the rv’s x and y Definition 6 (Mutual Information) The mutual information among the compo- nents of a random vector x is defined as [72]: '_ n fX(x) I(x)—E{l ___lli “(x0 (A.1) 244 245 . In the above definitions, E {} denotes the expected value and is defined as: Definition 7 (Expected Value) The expected value of a function of an rv x is defined as [72]: was )}= [_°° g((x)f. xdx) Definition 8 (Conditional Probability Denesity) f(X.y) = f (XIy)f(y) = f (yIX)f(X) Proposition 1 (Properties of Entropy) P1 H(x) Z 0 P2 H(XJ) = H(X) + H(yIX) = I160+ H(XIy) P3 H(X.Y)SH(X)+H(y) P4 H(XIY) S 1100 P5 H(x1,x2,-~ ,xn) = H(x1)+ H(x2lx1) +H(x3|x2,x1) + - - - + H(xnlxn-1, - - - ,xl) P6 H(x1,x2,-~,x,,) S H(x1)+ H(xz) +--- +H(x,,) P7 Equality in P6 will hold if x; ’s are statistically independent. 
Proof of P7 In this case, by the definition of statistical independence, one has fx(x) 2 1'1,- f,;,(x,-) "Eiln fx(X)} = ”Eilanx.(xi)l = ‘ZEilnfx.(xi)} H(xlix2i ' ' ' ix") 246 = ZH(X.‘) A.2 Matrix Notations Definition 9 A permutation matrix is a square matrix whose rows and columns con- tain only one nonzero entry that is equal to 1. Definition 10 A generalized permutation matrix is a square matrix whose rows and columns contain only one nonzero entry. APPENDIX B Proofs B.1 Proof of Theorem 3.5 The proof of this theorem is conducted by induction. However, the following theorem is needed. Theorem 3 Let A be a positive definite n X n matrix, and B be the (n +1) x (n +1) matrix Then (i) [B] S alAl, with equality if and only ifb = 0; (ii) B is positive definite if and only if IBI > 0 Proof of theorem 3: Define the (n + 1) x (n + 1) matrix 1,. —A-1b 0T 1 247 248 Then T 0 P BP = 0T 0 — bTA‘lb so that [B] = IPTBPI = IAI I (a — bTA'lb) A is positive definite. Then, A‘1 is positive definite. Therefore, bTA‘lb > 0,Vb. Consequently, we prove (i). To prove (ii), we observe that |B| > 0 if and only if a -— bTA'lb > 0, which is the case if and only if PTBP is positive definite. Since P is positive definite, so is B. D Proof of theorem 1: As stated earlier, the proof will be conducted by induction. If M is a positive scalar, then equation (3.5) is always true. Now, we will assume that the equation (3.5) true for matrix M of rank 12. Theorem 3 shows that equation (3.5) is also true for (n + 1). Cl B.2 CMOS Circuit Function Derivation Derivation of The 'h'ansconductance Amplifier Function Figure 7.2(a) represents the circuit diagram of a transconductance amplifier. The circuit will exhibit a sigmoidal function of the differential voltage V1 and V2 presented at the gates of the transistors M1 and M2, respectively. The circuit is being driven by a biasing current 15 which is controlled by the voltage Vb applied to the transistor Mb. Using KCL at node (1), we have: 11+12=Ib 249 where I] = 10 eKVI—VI 12 = Io enV2—V, 15 = on e‘V" (l — e’V’) Solving for e'V‘ to obtain: V w eKVb Substituting the above expression in the expressions of currents 11 and 12, one obtains: V e" 1 _ "Vb NV: V, e 12 = on e" 85V] +enV2 +108an To acquire the difference of the two currents, a current mirror circuit is employed. The current mirror circuit produces a replica of the current 11 flowing through M4. Thus, 10.: = 11 - 12 K V1 RV; C ‘- C = 21210 e‘v" Multiply numerator and denominator by e"(V1+V2)/ 2 eIt(V1 -V2)/2 _ e-K(V1-V2)/2 en(Vl-Vg)/2 + ‘3'”‘(VI-Vzl/2 + weK(Vb-(V1+V2)/2) Vb Iout = on 6" However, V1+V2 2 Vb<< 250 Therefore, eK(V1-V2)/2 _ e-K(V1-V2)/2 Iout = on er: b e"(Vi-V2)/2 + 3"‘(V1-Vzl/2 which is [out = on e‘V" tanhn(V1 — V2)/2 Derivation of Hyperbolic Sine Function A circuit that exhibits the sine hyperbolic function is shown in figure 7.4. The output current is expressed as [out = 1+ - I“ where I+ = h(V,6V) and 1+ = h(—6V, V) Now, it is desired to deternime the function h(V1, V2). 11 = IoC‘VI-V’ [2 = Ioean-V, 1;, = one"V’(l — ev’) Using KCL at node V,,: h+h=h+Ah 251 Substituting for all expressions and solve for e'v’ to obtain: KVb —v, w e = (1 — A)e"V1 + e""2 + w eKVb 6 Therefore with unity feedback (A = 1): K V1 8 Vb e 65V; + w 65V), I} = 1.0108 Multiplying numerator and denominator by e"V2 to obtain: x(V1 —V2) KVb e 11 = 1.0106 1+ w eK(Vb"’V2) However, Vb << V2. Thus, e"(V°’V’) << 1. Consequently, I = h(Vla V2) = 11 = one‘V" e"(V“V’) Coming back to find the output current of figure 7.4: I... 
h(V,6V) — h(—6V, V) one ..v, [adv-(W) _ e..(—asv-V)] x(Vb—JV) [65V _ -..v] = one e = 2one"(V°‘6V) sinth APPENDIX C Matlab C.1 Derivation of fa Consider the assumption that the output signals have zero mean. cumulants are expressed as follows: N2i = #2.‘=E[yil Kai = flair-Elyil N4; = #4.‘ - 31“; = Ell/fl “ 3Elyil2 Express the approximation of the entropy as 2881n 21r — 288K2,‘ - 48kg,- — 12x3,- + 12%,...- "‘ = 576 +1216}, — 17045;"; — 496mg, — 213x;- 576 Compute 91- _ 2a K2; _ 576 a _ _96531' + 2453;54" '— 3408fl3if€3i "" 1984,03,: K3.‘ — 576 252 Therefore, the 253 8H,- —24I€4i + 12mg,- + 36K},- — 3408K§i54g — 85253,- 541 576 Compute also 8’62; 6 6 — ' = — .2 = . . awij — 8212,,- [II 2’] 6w,,- Ell/J EDI/.173] 5K3. __ a -__ a 3 __ 2 ' 6ng — awij [#3'] _ ngjEiyi] _ EBB/,- $1] 3541‘ a _. _ __ 2. = 3 . - - . . 5.0,]. - awijlu. 3112.] E[4y,x,] 12,12, 3W1] C.2 Matlab Source Code XXZ221%!XXZXZXZXXZZZZXZZXXXZZZXXZXXZZX function F 8 func(unit-var,order); 2%XXXx121222XX%%%%%%%%%%ZZZZXXXZXXXXXZ if unit-var '3 1 k3-[O 1 o o 01’; k4 . [1 o o o -3]’; p3 - [o o 3 o -3]’; p4 - [o 4 o —12 01’; else k3 - [o 1 o o 01’; k4 - [1 o o o -3]’; p3 - [o o 3 o 01’; p4 - [o 4 o o 01'; end if order =8 3 n 3 13; H3 - 96*pad(k3,n) - 24*pad(conv(k3,k4),n) + 3408*pad(conv(conv(k4,k4),k3),n) + 1984*pad(conv(conv(k3,k3),k3),n); H4 I 24*pad(k4,n) - 12*pad(conv(k3,k3),n) + 3408*pad(conv(conv(k3,k3),k4),n) - 36*pad(conv(k4,k4),n) + 852*pad(conv(conv(k4,k4),k4),n); H3 8 conv(H3,p3); H4 I conv(H4,p4); c 8 576; elseif order =3 2 n 3 9; H3 I -8*pad(k3,n) + 60*pad(conv(k3,k4),n); H4 - -2*pad(k4,n) + 30*pad(conv(k3,k3),n) + 9*pad(conv(k4,k4),n); 254 H3 8 conv(H3,p3); H4 8 conv(H4,p4); c t 48; else fprintf(’\n No order is defined ..\n’); break; end if unit_var =3 1 H2 8 c*pad([1 O],length(H3)); else H2 - pad(0,1ength(H3)); end F ' (H2 + H3 + H4)/c; XXXXXXXXXXXXZXXXZXXZZXXXXXXXXXZXXXXXXX function y = pad(x,n); 11111111211ZZXXXXXZXXZXXXXZXXXZXXXZZXX y 8 zeroan,1); x I x(z); m 3 length(x); y(n-m+1:n) = x; APPENDIX D PSPICE Files - D.1 Circuit File 2 by 2 Separation Network Circuit ***************** * Apply Voltages ***************** Vdd 80 0 5.0V V38 70 0 0.0V Vref 3 0 2.50 meul 4 0 0.70 VbIV 5 0 10 Vdp 6 0 2.8 Vdm 7 0 2.2 V31 11 0 sin(2.5 0.1 1k) V32 12 0 sin(2.5 0.1 2k) de11 121 0 2.70 de12 122 0 2.54 Vcb11 141 0 2.60 Vcb12 142 0 2.40 de21 221 0 2.56 de22 222 0 2.70 Vcb21 241 0 2.65 Vcb22 242 0 2.45 255 *Vd11 *Vd12 *Vc11 *Vc12 *Vd21 *Vd22 *Vc21 *Vc22 Vbsosll Vbsosl2 Vbsosl3 Vbsos21 Vbsos22 Vbsos23 ****************************** * Identify Nodes ****************************** t * meul ******************** 321 322 341 342 421 422 441 442 151 152 153 251 252 253 Vref 4 s1 s2 db11 db12 xb11 xb12 cb11 cb12 d11 d12 111 0000 0000 c$<> 0 0 0 e1+ 21 e2+ 31 e1- 22 e2- 32 .70 .54 .60 .40 MMNM .56 .70 .65 .45 [ONION 0.69 0.69 0.71 0.69 0.69 0.71 3 11 12' 121 122 131 132 141 142 321 322 331 Vblsosl 151 Vb2sosl 251 Vbisos2 152 Vb2sos2 252 Vblsos3 153 Vb2sos3 253 db21 db22 xb21 xb22 cb21 cb22 d21 d22 121 256 221 222 231 232 241 242 421 422 431 257 * 112 332 x22 432 * c11 341 C21 441 * c12 342 C22 442 * y1 61 y2 62 ************************************************************ * MYSOS s2 xb12 xb11 Vblsosl Vblsos2 Vb18083 Vdd Vgnd Vref * 1 2 3 11 12 13 8 7 9 44*4444444*:s*******4444444***********************s********* XSOSEI 12 132 131 151 152 153 80 7O 3 MYSDS CE11 131 O SpF CE12 132 0 SpF ****************************************************** * Subcircuit GMULDI Vdd Vss V1 Vref V2 Vref Vb I+ I- t 8 7 1 2 3 4 5 21 22 
****************************************************** * realize e1 8 cb11*111 + cb12*xb12 + db11*31 + db12*s2 xs11 so 70 141 3 131 3 4 21 22 GMULDl x312 so 70 142 3 132 3 4 21 22 GMULDi xs13 so 70 121 3 11 3 4 21 22 GMULDI x314 so 70 122 3 12 3 4 21 22 ' GMULDI *********#**********#***************#*#***t***************** * MYSOS 31 xb22 xb21 Vblsosl Vb1sos2 Vblsos3 Vdd Vgnd Vref **************#********************************************* XSOSE2 11 232 231 251 252 253 80 70 3 MYSOS CE21 231 O 4pF C222 232 O 4pF * realize e2 8 cb21*x21 + cb22*xb22 + db21*s1 + db22*s2 XE21 80 70 241 3 231 3 4 31 32 GMULDI X322 80 70 242 3 232 3 4 31 32 GMULDi X823 80 70 221 3 11 3 4 31 32 GMULDl XE24 80 70 222 3 12 3 4 31 32 GMULDI ***********t**********t*********************************** * MYSOS y2 112 :11 Vblsosl Vbisos2 VbisosS Vdd Vgnd Vref *ttttttttttttt***#*******it*tt******#******#************** XSOSY1 62 332 331 151 152 153 80 7O 3 MYSOS CY11 331 0 5p? CY12 332 0 5p? 4*444444444444444444444444:4*:444*********¢****s****** * Subcircuit GMULDI Vdd Vss V1 Vref V2 Vref Vb I+ I- *4*44444*4444*44*4444*44444**4444444444444444444444444 * realize y1 8 e1 - c11*111 - c12*112 - d12*y2 ’ XY11 80 7O 3 341 331 3 4 21 22 GMULDI XY12 80 70 3 342 332 3 4 21 22 GMULDi XY13 80 7O 3 322 62 3 _4 21 22 GMULD1 258 *********************************III************************ * MYSOS y1 :22 x21 Vbisosi Vblsos2 Vb13083 Vdd Vgnd Vref ********************************************************** XSOSY2 11 432 431 251 252 253 80 70 3 MYSOS CY21 431 O 4pF CY22' 432 0 4p? ****************************************************** * Subcircuit GMULDI Vdd Vss V1 Vref V2 Vref Vb I+ I- * 8 7 1 2 3 4 5 21 22 ****¢***************4*************************4444:444 * realize y2 8 e2 - c21*x21 - c22*x22 - d21*y1 XY21 8O 70 3 441 431 3 4 31 32 GMULDI XY22 80 7O 3 442 432 3 ‘4 31 32 GMULD1 XY23 80 7O 3 421 61 3 4 31 32 GMULD1 4444444444444444*##4##********s*********¢*****44:44:44 * Subcircuit MYIV Vdd Vss I+ I- Vbias Vout * 8 7 1 2 4 3 4444444444444444*44*4444444444444444444444:44444444444 XIV1 80 70 21 22 5 7 MYIV XIV2 80 70 31 32 5 72 MYIV 444444444444444444:44444444444444444444444*: * Subcircuit TRANSAMl V1 V2 Vb Vout Vdd Vgnd #**************#**************************** * Use Follower XF1 71 61 4 61 80 7O TRANSAMI XF2 72 62 4 62 80 7O TRANSAMl 4:44:4444444444444444:44:: * Update equation of dij 44444444444444444444444444 * Subcircuit BLOCK Vdd Vss yi Vdm yj Vb Vdp dij Vref 1812 80 7O 61 6 62 4 7 322 3 F68 CD12 322 O SnF * Subcircuit BLOCK Vdd Vss y2 Vdm y2 Vb Vdp d21 Vref X821 80 70 62 6 61 4 7 421 3 F68 CD21 421 0 5nF ************************** * Update equations of cij ***##********************t * Subcircuit HGMUL Vdd Vss yi Vref xij Vref Vbias cij XMUL11 80 7O 61 3 331 3 4 341 HGMUL XHUL12 80 70 61 3 332 3 4 342 HGMUL XHUL21 80 70 62 3 431 3 4 441 UGMUL XMUL22 80 7O 62 3 432 3 4 442 HGMUL CC11 341 0 1nf CC12 342 O inf 259 CC21 441 0 inf CC22 442 0 1nf *4444s4ts4#44 * Simulations 44:44:4444444 .LIB comp.lib .TRAN .Sm 50m .IC V(322) ' 1 .IC V(421) 4 .IC V(341) .IC V(342) .IC v(441) .IC V(442) .OP .PROBE .END m.»4.p - D.2 Library File ************************************************ * Transistor Models ***********¢********s*****s444444444444444444444 .MODEL nmos NMDS LEVEL=2 + VT080.783736 KP-5.46E-5 GAMMA-0.5262 LAMBDA-3.533329£-2 + TOX=412E-10 CGSOI2.6695658-10 CGDO-2.6695652-1O CJ-1.134E-4 + MJ-O.708 CJSH-4.77E-1O MJSH-O.253 XJ=0.1500U + TPG=1.0000 LDIO.212340U NSUB-5.86OE+15 NFS=9.54427OE+11 + NEFF=1.0 NSS=1.0000E+12 DELTA-1.99612 
VHAX-57874.1 + U0-651 UEXP-O.177364 UCRIT830664.6 PB-O.800 PHI-0.6 + RSH=33.4OO CGBD-4.250255E-10 * .MDDEL pmos PMOS LEVEL-2 + VT08-0.807 KP-2.13E-5 GAMMA-0.5644 LAMBDA-5.6594915-2 + TOX'412E-1O CGSO=3.143031E-10 CGDOI3.143O31E-10 CJ-2.54E-4 + MJ-0.553 CJSU=3.31E-1O MJSH80.352 XJ-0.05U + TPGI-1.O LDtO.25U NSUB-6.74OOE+15 NFS=1.000E+11 + NEFF-1.001 NSS=1.0000E+12 DELTA-1.001368E-6 VMAx-37082.8 + U0=253.977 UEXP-O.2458 UCRIT=16929.2 PB-D.800 PHI80.6 + RSH-121.6000 CGBOI4.574377E-10 ********#******************************************** * Subcircuit FGB Vdd V88 81 Vdn S2 Vb Vdp Vout Vref * 1 2 3 4 5 6 7 8 9 t*****************************#*****#****t**#*¢****** .SUBCKT FGB 1 2 3 4 5 6 7 8 9 260 X1 1 2 3 4 9 6 11 12 5 FG X2 1 2 7 3 9 6 12 11 5 FG X3 1 12 10 PMIRRDR X4 2 10 8 NMIRROR X5 1 11 8 PMIRRUR .ENDS ******************************************************** * Subcircuit FG Vdd Vss V1 Vd V2 Vb Vout1 Vout2 Vref * 1 2 3 4 5 6 7 8 9 ****************************************************#*** .SUBCKT F6 1 2 3 4 5 6 7 8 9 M1 1 3 12 nmos H-6.0U L-6.0U M2 11 4 12 nmos H86.0U L86.OU M3 12 6 2 nmos H86.OU L-6.0U M4 12 13 2 nmos H-6.0U L-6.0U M5 13 13 2 nmos “-6.00 L86.OU M6 14 13 2 nmos U-6.0U L86.OU M7 7 9 14 nmos U-6.0U L-6.0U M8 8 5 14 nnos “-6.00 L-6.0U M9 1 11 11 pmos W36.0U L86.OU M10 1 11 13 pmos H-6.0U L86.OU .ENDS *********s***s*********4444444444* * Subcircuit PMirror Vdd Vref Vmir * 1 2 3 *tttttt*****#******************ttt .SUBCKT PMIRROR 1 2 3 M1 1 2 2 1 pmos U-6.0U L-6.0U M2 1 2 3 1 pmos H-6.0U L-6.0U .ENDS ************#*************#******t * Subcircuit NMirror Vss Vref Vnir * 1 2 3 ********************************** .SUBCKT NMIRROR 1 2 3 M1 2 2 1 1 nmos H86.OU L-6.0U M2 3 2 1 1 nmos H-6.0U L-6.0U .ENDS ti*t*ttt***********t**************************** * Subcircuit UGMUL Vdd Vss V1 V2 V3 V4 Vb Vout 4 8 7 1 2 3 4 5 6 * Schematic: Mead’s book page 95 *#**#**##*****#*#***it##***#******************** .SUBCKT HGMUL 8 7 1 2 3 4 5 6 M1 12 1 13 7 nmos H-6.0U L-6.0U HHMMMNMNMM 261 M2 16 2 13 7 nmos “-6.00 L-6.0U M3 12 12 8 8 pmos “=6.0U L-6.0U M4 16 16 8 8 pmos “-6.00 L-6.00 M5 15 12 8 8 pmos “-6.00 L=6.00 M6 18 16 8 8 pmos “-6.00 L=6.0U M7 11 4 15 8 pmos “-6.00 L-6.00 M8 14 4 18 8 pmos “-6.00 L-6.0U M9 14 3 15 8 pmos “-6.00 L-6.0U M10 11 3 18 8 pmos “-6.00 L-6.0U M11 11 11 7 7 nmos “-6.00 L-6.0U M12 14 14 7 7 nmos “-6.00 L-6.0U M13 17 11 7 7 nmos “-6.00 L-6.0U M14 6 14 7 7 nmos “-6.00 L-6.00 M15 17 17 8 8 pmos “-6.00 L-6.0U M16 6 17 8 8 pmos “-6.00 L-6.0U Mb 13 5 7 7 nmos “-6.00 L-6.0U .ENDS *******************t****tttttt****#************* * Subcircuit GMULD1 Vdd Vss V1 V2 V3 V4 Vb I+ I- * 8 7 1 2 3 4 5 21 22 * Schematic: Mead’s book page 95 tit******it***********t****¢#******************t .SUBCKT GMULD1 8 7 1 2 3 4 5 21 22 M1 12 1 13 7 nmos “-6.00 L-6.0U 142' 16 2 13 7 nmos “-6.00 L-6.0U M3 12 12 8 8 pmos “-6.00 L-6.0U M4 16 16 8 8 pmos “-6.00 L-6.0U M5 15 12 8 8 pnos “-6.00 L-6.0U M6 18 16 8 8 pmos “-6.00 L-6.0U M7 11 4 15 8 pmos “-6.00 L-6.00 M8 14 4 18 8 pmos “-6.00 L-6.0U M9 14 3 15 8 pmos “-6.00 L-6.0U M10 11 3 18 8 pnos “-6.00 L-6.0U M11 11 11 7 7 nmos “-6.00 L-6.0U M12 14 14 7 7 nmos “-6.00 L-6.0U M13 21 11 7 7 nmos “-6.00 L-6.00 M14 22 14 7 7 nmos “-6.00 L-6.0U Mb 13 5 7 7 nmos “-6.00 L-6.0U .ENDS .ENDS ##-*********************¥***************************** * Subcircuit MYIV Vdd Vss I+ I- Vbias Vout - 8 7 1 2 4 3 *****t************************************************ .SUBCKT’MYIV 8 7 1 2 3 4 262 - Mirror current out of 1, add to to current coming from 2 M1 8 1 1 8 pmos “-4.00 L=12.00 M2 8 
1 2 8 pmos “-4.00 L=12.00 * Output gain Stage M3 8 2 4 7 nmos “-4.00 L-12.0U M4 4 3 7 7 nmos “-4.00 L=12.00 .ENDS **********************#44444444444444444444444 * Subcircuit STRANSAM V1 V2 Vb Vout Vdd Vgnd * 1 2 3 4 5 6 * “/L . 1/1 *********************4***************s******** .SUBCKT STRANSAM 1 2 3 4 5 6 M1 7 1 9 6 nmos “-6.00 L-6.00 M2 4 2 10 6 nmos “-6.00 L-6.00 M5 9 9 8 6 nmos “-6.00 L-6.00 M6 10 10 8 6 nmos “-6.00 L-6.0U Mb 8 3 6 6 nmos “-6.00 L-6.00 M3 7 7 5 5 pmos “-6.00 L-6.00 M4 4 7 5 5 pnos “-6.00 L-6.00 .ENDS **************#**********************¥******** * Subcircuit TRANSAMl V1 V2 Vb Vout Vdd Vgnd * . 1 2 3 4 5 6 4 “/L - 1/1 #4444444:44:44:444*44*444444444444444444444444 .SUBCKT TRANSAMi 1 2 3 4 5 6 M1 7 1 8 6 nmos “-6.00 L-6.0U M2 4 2 8 6 nmos “-6.00 L-6.00 Mb 8 3 6 6 nmos “-6.00 L-6.0U M3 7 7 5 5 pmos “-6.00 L-6.00 M4 4 7 5 5 pnos “-6.00 L-6.0U .ENDS t**t****************************************t* * Subcircuit TRANSAM V1 V2 Vb Vout Vdd Vgnd * 1 2 3 4 5 6 * “/L - 1/3 ***t***********#*#*#********#***************#* .SUBCKT TRANSAM13 1 2 3 4 5 6 M1 7 1 s 6 nmos “-6.00 L-6.0U M2 4 2 s 6 nmos “-6.00 L-6.0U Mb 6 3 6 6 nmos w-6.ou L-18.00 M3 7 7 5 5 pmos “-6.00 L-6.00 M4 4 7 5 5 pmos “-6.00 ]..-6.00 .ENDS 263 #**********************IN:*************************IkIHIIHI * Subcircuit MYSDS V1 V2 V3 Vb1 Vb2 Vb3 Vdd Vgnd Vref * 1‘ 2 3 11 12 13 s 7 9 * C(Al) > G(A3) annualan:************************************************ .susch MYSOS 1 2 3 11 12 13 s 7 9 XA1 1 2 1 1 2 8 7 STRANSAM XA2 2 9 12 3 8 7 STRANSAM XA3 2 3 13 2 8 7 TRANSAM13 .ENDS BIBLIOGRAPHY [1] BIBLIOGRAPHY B. Widrow et al., “Adaptive noise cancellation: principles and applications,” IEEE Proceedings, vol. 63, pp. 1691-1717, Apr. 1975. [2] F. M. Salam, “An adaptive network for blind separation of independent signals,” [3] [4] [5] [6] [7] [8] [9] Proc. of IEEE International Symposium on Circuits and Systems, vol. 1, pp. 431- 434, May 1993. K. S. Narendra and K. Parthasaraphy, “Identification and control of dynamical systems using neural networks,” IEEE Trans. on Neural Networks, vol. 1, pp. 4— 27, March 1991. C. Jutten, Calcul neuromime’tique et traitement du signal, analyse en com- posantes inde’pendantes. These de doctorat d’etat, Université Scientifique et Médicale-Institut Nationale Polytechnique, Grenoble, France, 1987. C. Jutten and J. Herault, “Space or time signal processing by neural networks models,” Proceedings on Neural Networks for Computing, vol. 1, pp. 151 - 170, Apr. 1986. C. Jutten, J. Herault and A. Guerin, “In.c.a.: An independent components anal- yser based on neuromimetic network,” Artificial Intelligence and Cognitive Sci- ences, pp. 201-219, 1988. J. Herault and C. Jutten, “Space or time adaptive signal processing by neural network models,” Neural Network for computing, AIP Conference Proceedings, pp. 207-211, Apr. 1986. J. Herault and C. Jutten, “Blind separation of sources, part 1: An adaptive algorithm based on neuromimetic architecture,” Signal Processing, vol. 24, pp. 1- 10, July 1991. P. Comon, J. Herault and C. Jutten, “Blind separation of sources, part 2: Problem statement,” Signal Processing, vol. 24, pp. 11—20, July 1991. 264 265 [10] E. Sorouchyari, “Blind separation of sources, part 3: Stability analysis,” Signal Processing, vol. 24, pp. 21—29, July 1991. [11] J.C.Fort, “Stability of the jh sources separation algorithm,” Signal Processing, vol. 8, no. 1, pp. 35-42, 1991. [12] J. Karhunen and J. 
Joutsensalo, “Representation and separation of signals using nonlinear pca type learning,” Neural Networks, vol. 7, no. 1, pp. 113—127, 1993. [13] A. Chichocki and L. Moszcznski, “New learning algorithm for blind separation of sources,” Electronic Letters, vol. 28, pp. 1986—1887, 1992. [14] K. Meriam, A. Belouchrani and J. F. Cardoso, “Assymptotic performance of second order blind source separation,” Proc. of IEEE International Symposium on Speech and Signal Processing, vol. 4, pp. 277-280, 1994. [15] E. Moreau and O. Macchi, “New self-adaptive algorithms for source separation based on contrast functions,” Proc. of IEEE International Symposium on Speech and Signal Processing, vol. 1, pp. 515—519, 1993. [16] N. Delfosse and P. Loubaton, “Adaptive separation of independent sources: A deflection approach,” Proc. of IEEE International Symposium on Speech and Signal Processing, vol. 4, pp. 41—44, April 1994. [17] J. Lacome and P. Ruiz, “Source identification: A solution based on cumu- lants,” Proc. of IEEE International Symposium on Speech and Signal Processing, vol. Workshop, 1988. [18] J. F. Cardoso, “Source separation using higher order moments,” Proc. of IEEE International Symposium on Speech and Signal Processing, pp. 2109-2112, 1989. [19] J. F. Cardoso, “Super-symmetric decomposition of the fourth order cumulant tensor, blind identification of more sources than sensors,” Proc. of IEEE Inter- national Symposium on Speech and Signal Processing, pp. 14-17, 1991. [20] J. F. Cardoso, “Iterative techniques for blind source Separation using only fourth order cumulants,” European Signal Processing Conference, pp. 739-742, 1992. [21] J. F. Cardoso and B. Laheld, “Equivariant adaptive source separation,” submitted to_IEEE Trans. on Signal Processing, 1996. [22] [23] [24] [25] [26] [271 [28] [29] [30] [31] [32] 266 J. F. Cardoso and P. Comon, “Independent component analysis, a survey of some algebraic methods,” Proc. of IEEE International Symposium on Circuits and Systems, vol. 2, pp. 93-96, May 1996. P. Comon, “Independent component analysis: A new concepts?,” Signal Pro- cessing, vol. 36, no. 3, pp. 287-314, 1994. S. Amari, A. Cichocki ans H. Yang, “A new learning algorithm for blind signal separation,” MIT Press (in press), 1996. A. Bell and J. Sejnowski, “Fast blind separation based on information theory,” Proc. 0f IEEE International Symposium on Nonlinear Theory and its Applica- tions, pp. 43—47, December 1995. T. Bell and T. J. Sejnowski, “A non-linear information maximisation algorithm that performs blind separation,” Advances in Neural Information Processing Sys- tems 7, MIT Press, pp. 467-474, 1995. G. Giannakis, Y. Inouye and J.M. Mendel, “Cumulants based identification of multichannel moving average models,” IEEE Trans. on Automatic Control, vol. 34, pp. 783—787, 1989. L. Tong, R. Liu, V. C. Soonand Y. Huang, “Inderminacy and identifiability of blind identification,” IEEE Trans. on Signal Processing, vol. 38, pp. 409—509, May 1991. L. Tong, Y. Inouye and R. Liu, “Wave-preserving blind estimation of multiple independent sources,” IEEE Trans. on Signal Processing, vol. 41, pp. 2461—2470, July 1993. X. Ling, Y. Huang and R. Liu, “A neural network for blind signal separation,” Proc. of IEEE International Symposium on Circuits and Systems, vol. 27, pp. 69— 72, May 1994. V. C. Soon, L. Tong, F. Huang and R. Liu, “An extended fourth order blind iden- tification algorithm in spatially correlated noise,” Proc. of IEEE International Symposium on Speech and Signal Processing, pp. 1365-1368, 1990. R. 
O. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE trans. on Antennas Propagation, vol. AP, pp. 276-280, March 1986. [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] 267 R. Roy and T. Kailath, “Esprit-estimation of signal paramater via rotational- invariance techiques,” IEEE trans. on signal Processing, vol. 37, pp. 984—993, July 1989. B. Agee, “Blind separation and capture of communication signals using a multi- target constant modulus beamformer,” IEEE Millitary Communication Confer- ence, pp. 340—346, 1989. A. Swindlehurst, S Daas and J. Yang, “Analysis of a decision directed beam- former,” IEEE Trans. on Signal Processing, vol. 43, pp. 2920-2927, December 1995. B. Agee, A. Schell and W. Gardner, “Spectral self-coherence restoral: A new approach to blind adaptive signal extraction using antenna arrays,” IEEE Pro- ceedings, vol. 78, pp. 753-767, April 1990. S. Talwar, M. Viberg and A. Paulraj, “Blind separation of synchronous co- channel digital signals using an antenna array,” IEEE Trans. on Signal Pro- cessing, vol. 44, pp. 1184—1197, May 1996. A. Belouchrani and F. F. Cardoso, “Maximum likelihood source separation by the expectation-maximization techniquez, Deterministic and stochastic imple— mentation,” Proc. 0f IEEE International Symposium on Nonlinear Theory and its Applications, pp. 49—53, December 1995. D. Pham, P. Garrat and C. Jutten, “Separation of a mixture of independent sources through a maximum likelyhood approaCh,” European Signal Processing Conference, pp. 771—774, 1992. G. Giannakis and M. Tsatsanis, “A unifying maximum likelyhood view of cu- mulants and polyspectral measures for non-gaussian signal classification and es- timation,” IEEE Trans. on Information Theory, vol. 38, pp. 386-406, March 1992. E. Moulines, P. Duhamel, J. F. Cardoso and S. Marrague, “Subspace methods for the blind identification of multichannel fir filters,” IEEE Trans. on Signal Processing, vol. 43, pp. 516-525, February 1995. S. Gerven and D. Compernolle, “Feedforward and feedback in a symmetric adap- tive noice canceller: Stability analysis in simplified case,” Signal Processing, pp. 1081-1084, 1992. [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] 268 H. Thi and C. Jutten, “Blind source separation for convolutive mixtures,” Signal Processing, vol. 45, pp. 209-229, 1995. T. Bell and T. J. Sejnowski, “Blind separation and blind deconvolution: An information-theoretic approach,,” Proc. of IEEE International Symposium on Speech and Signal Processing, pp. 3415-3418, May 1995. A. Bell and T. Sejnowski, “An information-maximization approach to blind sep- aration and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129— 1159, 1995. A. B. Gharbi and F. M. Salam, “Blind separation of independent sources in linear dynamical media,” Proc. 0f IEEE International Symposium on Nonlinear Theory and its Applications, December 5-9 1993. A. B. Gharbi and F. M. Salam, “Separation of mixed signals: Formulation and some implementations,” Proc. of IEEE Midwest Symposium on Circuits and Systems, August 4-6 1994. I E. Vittoz and X. Arreguit, “Cmos integration of herault-jutten cells for sep- aration of sources,” Proceedings Workshop on Analog VLSI and Neural Systems, May 1989. M. H. Cohen and G. Andreou, “Current-mode subthreshold mos imple- mentation of herault-jutten autoadaptive network,” IEEE Journal of Solid-State Circuits, vol. 27, pp. 714—727, May 1992. A. B. Gharbi and F. M. 
[50] A. B. Gharbi and F. M. Salam, “Implementation and experimental results of a chip for the separation of mixed and filtered signals,” Journal of Circuits, Systems and Computers, vol. 6, pp. 115-139, April 1996.
[51] A. B. Gharbi and F. M. Salam, “Test results of a chip for the separation of mixed and filtered signals,” IEEE Trans. on Circuits and Systems, vol. 42, pp. 748-751, November 1995.
[52] A. B. Gharbi and F. M. Salam, “Implementation and test results of a chip for the separation of mixed signals,” Proc. of IEEE International Symposium on Circuits and Systems, vol. 1, pp. 271-274, May 1995.
[53] E. Oja, “A simplified neuron model as a principal component analyzer,” Journal of Mathematical Biology, vol. 15, pp. 267-273, 1982.
[54] E. Oja, “Principal components, minor components, and subspaces,” Neural Networks, vol. 1, pp. 61-68, 1992.
[55] S. Amari, Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, vol. 28. New York: Springer, 1985.
[56] J. F. Cardoso, “The invariant approach to source separation,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 55-60, December 1995.
[57] C. Jutten and J. F. Cardoso, “Separation of sources: Really blind?,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 79-84, December 1995.
[58] S. Kullback, Topics in Statistical Information Theory. New York: Springer-Verlag, 1987.
[59] Chi-Tsong Chen, Linear System Theory and Design. New York: Holt, Rinehart and Winston, 1984.
[60] J. R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: John Wiley Series in Probability and Mathematical Statistics, 1988.
[61] D. Yellin and E. Weinstein, “Multichannel signal separation: Methods and analysis,” IEEE Trans. on Signal Processing, vol. 44, pp. 106-118, January 1996.
[62] John E. Kolassa, Series Approximation Methods in Statistics. New York: Springer-Verlag, 1994.
[63] A. Cichocki, S. Amari, M. Adachi and W. Kasprzak, “Self-adaptive neural networks for blind separation of sources,” Proc. of IEEE International Symposium on Circuits and Systems, vol. 2, pp. 157-160, May 1996.
[64] S. Amari, A. Cichocki and H. H. Yang, “Recurrent neural networks for blind separation of sources,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 37-42, December 1995.
[65] A. Cichocki, W. Kasprzak and S. Amari, “Multi-layer neural network with local adaptive learning rules for blind separation of source signals,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 61-65, December 1995.
[66] L. De Lathauwer, P. Comon, B. De Moor and J. Vandewalle, “Application in independent component analysis,” Proc. of IEEE International Symposium on Nonlinear Theory and its Applications, pp. 91-96, December 1995.
[67] Peter McCullagh, Tensor Methods in Statistics. New York: Chapman and Hall, 1987.
[68] C. V. L. Charlier, Application de la théorie des probabilités à l'astronomie (part of the Traité). Paris: Gauthier-Villars, 1931.
[69] L. Watts, D. A. Kerns, R. F. Lyon and C. A. Mead, “Improved implementation of the silicon cochlea,” IEEE Journal of Solid-State Circuits, vol. 27, pp. 692-700, May 1992.
[70] B. Gilbert, “A precise four-quadrant multiplier with subnanosecond response,” IEEE Journal of Solid-State Circuits, vol. SC, no. 3, pp. 365-372, 1968.
[71] C. Mead, Analog VLSI and Neural Systems. New York: Prentice Hall, 1989.
[72] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 1984.