Blind Source Recovery: Theoretical Formulations, Implementations and Application to CDMA Communication Systems

By

Khurram Waheed

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical and Computer Engineering

2003

ABSTRACT

Blind Source Recovery: Theoretical Formulations, Implementations and Application to CDMA Communication Systems

By Khurram Waheed

Blind Source Recovery (BSR) is an autonomous (or unsupervised) stochastic adaptation approach that denotes recovering signals from measurements in environments that may include convolution, transients, and even possible nonlinearity. The primary goal of BSR is to recover the original source signals, as best as possible, even in the absence of precise environment identifiability. A discrete-time optimization framework for BSR has been adopted based on minimization of the Kullback-Leibler divergence, subject to the constraints of a state-space representation, using the Riemannian contra-variant gradient adaptation.

The modeling of the environment and its representation is vital in the proposed framework. The adoption of the state space framework allows for the derivation of more general update laws capable of catering to most known filtering paradigms. For the interesting case of multi-variable linear time-invariant dynamic BSR, parametric update laws for discrete-time canonical feedforward and feedback state space configurations have been derived. Higher Order Statistics have been explored to obtain adaptation algorithms that are purely blind to the actual source distributions. The BSR cases of undercomplete and overcomplete mixtures have also been investigated. A new two-stage BSR algebraic algorithm for sparse static overcomplete mixtures has been developed.

Further, the BSR framework has been successfully formulated for multi-user detection in modern CDMA wireless communication networks. Promising new results clearly demonstrate the effectiveness and practicality of the formulated approach.

Computer simulations have been widely conducted during the course of this research to evaluate the performance of all the developed algorithms for a variety of scenarios. In most cases, the performance verification has been done using actual speech and communication data in the presence of noise. In summary, the main contributions of this thesis are:

• Info-theoretic update laws for a multi-variable dynamic state space network were investigated. Natural gradient based BSR update laws for the cases of minimum phase and non-minimum phase mixing environments, as well as both feedforward and feedback canonical state space configurations, were developed and implemented.

• An exploration and extension of parametric source distribution models and derivation of adaptive score-functions (or non-linearities) for the BSR of sources with multiple source distributions.
• The development of the Algebraic ICA (AICA) Algorithm, which is a new ICA algorithm for blind, sparse, static mixing matrix recovery. It enables BSR from overcomplete mixtures of speech and other sparse distributions using a combination of AICA and interior point linear programming (IP-LP) techniques.

• Development and simulation of new Blind Multi-user Detection algorithms for DS-CDMA and WCDMA wireless communication networks based on the BSR framework.

Dedicated to the most important people in my life:
My parents, my wife and family

ACKNOWLEDGEMENTS

I would like to express my sincere thanks and appreciation to Professor Fathi M. Salam for his guidance, support and inspiration throughout the course of this research. Without his valuable, kind consideration, understanding, firm belief in my capabilities and stimulation of my mind, this dissertation would not be an object of my pride.

Sincere gratitude is extended to the members of my guidance committee: Professor Habib Salehi from the Department of Statistics, Professor Hassan Khalil, Professor John R. Deller Jr., Professor Ning Xi and Professor Tongtong Li from the Department of Electrical and Computer Engineering for their valuable contributions to my career and service on my committee.

I would not be who I am today without the unconditional love, support and encouragement of my parents Abdul Waheed and Najma. They have shaped me into the individual I am and inspired me to pursue my dreams in life.

My sincere thanks and gratitude to my wife Ayesha for her patience, love, affection, support, encouragement and sacrifice during every step that I made towards the completion of this research. My love and adoration to my kids Rafay and Iqra, who have enriched my life but have had to spend most of their time without me.

Last, but not least, I would like to express my sincere thanks to my brother, sisters, in-laws and their family members for their spiritual and moral support during the course of this research.

TABLE OF CONTENTS

LIST OF TABLES .......... xiv
LIST OF FIGURES .......... xv

Chapter 1 INTRODUCTION .......... 1
1.1 Background .......... 3
1.2 Importance of Modeling .......... 4
1.3 The State Space Framework .......... 4
1.4 Problem Definition: Adaptive Framework for Blind Source Recovery .......... 6
1.4.1 Blind Source Separation: An Example .......... 8
1.5 Organization of the dissertation .......... 11

Chapter 2 REVIEW OF LITERATURE .......... 14
2.1 The Natural Gradient Framework .......... 18
2.1.1 Definition of the Natural Gradient .......... 19
2.1.2 Theorem 2.1 .......... 20
2.1.3 The Natural Gradient Learning .......... 21
2.2 Blind Source Separation .......... 22
2.2.1 Theorem 2.2 .......... 23
2.3 Blind Source Deconvolution .......... 24
2.3.1 Theorem 2.3 .......... 24
2.4 Equivariant Performance of the Natural Gradient .......... 25

Chapter 3 BLIND SOURCE RECOVERY: A FRAMEWORK IN THE STATE SPACE .......... 26
3.1 Formulation of the Optimization Structure .......... 27
3.1.1 Algorithms for the Nonlinear Dynamic Case .......... 28
3.1.2 Algorithm for Nonlinear Dynamic Canonical Neural Structure .......... 31
3.2 Algorithms for the Linear Dynamic Case .......... 34
3.2.1 Extensions to the Natural Gradient .......... 37
3.3 Blind Source Recovery Algorithms for Minimum Phase Environments .......... 38
3.3.1 Theorem 3.1: Feedforward Minimum Phase Demixing Network .......... 38
3.3.1.1 Derivation of Update Laws for Feedforward BSR .......... 39
3.3.1.2 Auxiliary Conditions for Convergence .......... 41
3.3.1.2.1 Remark 3.1 .......... 41
3.3.1.2.2 Remark 3.2 .......... 42
3.3.2 Theorem 3.2: Feedback Minimum Phase Demixing .......... 43
3.3.2.1 Derivation of Update Laws for Feedback BSR .......... 43
3.3.2.2 Auxiliary Conditions for Convergence .......... 46
3.3.2.2.1 Remark 3.3 .......... 46
3.4 Blind Source Recovery Algorithms for Non-minimum Phase Environments .......... 47
3.4.1 Theorem 3.3: Feedforward Non-minimum Phase Demixing .......... 47
3.4.1.1 Derivation of Update Laws for Feedforward Non-minimum Phase BSR .......... 48
3.4.1.1.1 Remark 3.4 .......... 55
3.5 Simulation Results .......... 56
3.5.1 BSR Simulations - Minimum Phase Mixing Environments .......... 58
3.5.1.1 Simulation I: Feedforward Recovery of Minimum Phase Mixing .......... 61
3.5.1.2 Simulation II: Feedback Recovery of Minimum Phase Mixing .......... 65
3.5.1.3 Simulation III: Feedforward Recovery of Non-minimum Phase Mixing .......... 69
3.6 Remarks .......... 74

Chapter 4 BLIND SOURCE RECOVERY USING ADAPTIVE SCORE FUNCTIONS .......... 75
4.1 Generalized Gaussian Distribution Model .......... 77
4.1.1 Generalized Kurtosis Measures .......... 79
4.1.2 Relationship between B_i(σ_i, α_i) and the generalized kurtosis .......... 81
4.1.3 Derivation of Generalized Gaussian Adaptive Score Function (GGASF) .......... 82
4.2 Hyperbolic Source Density Models .......... 84
4.2.1 Background .......... 84
4.2.1.1 Pearson Distribution Model .......... 84
4.2.1.2 Generalized Cauchy Distribution .......... 86
4.2.2 Proposed Hyperbolic Distribution Models .......... 87
4.2.2.1 Hyperbolic Sub-Gaussian Model .......... 87
4.2.2.2 Hyperbolic Super-Gaussian Model .......... 89
4.2.3 Derivation of Hyperbolic Adaptive Score Function (HASF) .......... 91
4.3 Adaptive Estimation of the Fisher (or normalized) Kurtosis .......... 94
4.4 Parameter Regulation for the Proposed Adaptive Score Functions .......... 95
4.4.1 Choice of parameters for Generalized Gaussian Adaptive Score Function .......... 96
4.4.2 Choice of parameters for Hyperbolic Adaptive Score Function .......... 96
4.5 Simulation Results .......... 97
4.5.1 Environment Model .......... 98
4.5.2 Demixing Network Structure .......... 98
4.6 Remarks .......... 103

Chapter 5 BLIND SOURCE RECOVERY OF UNDERCOMPLETE AND OVERCOMPLETE MIXTURES .......... 105
5.1 BSR of Undercomplete Mixtures .......... 107
5.1.1 Efficient Reduction of Mixture Dimensions .......... 108
5.1.1.1 Algebraic Principal Component Analysis .......... 108
5.1.1.2 Adaptive Principal Component Analysis .......... 109
5.2 BSR of Overcomplete Mixtures .......... 110
5.2.1 Two Stage Approach to Overcomplete ICA .......... 110
5.3 Algebraic Independent Component Analysis .......... 112
5.3.1 The Algebraic ICA Algorithm .......... 116
5.3.2 Algebraic Matrix-Distance Index (AMDI) .......... 120
5.3.3 Resolution of AICA Local Minima and Ambiguities .......... 121
5.3.4 Inferring Sources from Overcomplete Mixtures .......... 123
5.3.5 Simulation Examples .......... 124
5.3.5.1 Simulation I: 4 x 2 Laplacian Data .......... 125
5.3.5.2 Simulation II: Overcomplete Speech Data (3 x 2 Case) .......... 127
5.3.5.3 Simulation III: 5 x 3 Speech Data .......... 130
5.3.5.4 Simulation IV: Independent Identically Distributed (iid) Laplacian Data .......... 134
5.4 Remarks .......... 136

Chapter 6 BLIND SOURCE RECOVERY APPROACHES FOR WIRELESS DS-CDMA AND WCDMA DOWNLINK SYSTEMS .......... 138
6.1 User Detection in CDMA Systems .......... 139
6.2 CDMA Downlink Communication Channel .......... 143
6.3 Downlink Signal Models for CDMA Systems .......... 146
6.3.1 DS-CDMA Signal Model .......... 147
6.3.2 WCDMA (UMTS FDD) Signal Model .......... 151
6.4 CDMA User Detection Structures .......... 154
6.4.1 Conventional Detection Schemes .......... 154
6.4.1.1 Matched Filter (MF) Detector .......... 155
6.4.1.2 RAKE Detector .......... 156
6.4.1.3 Linear Minimum Mean Squared Error (LMMSE) Detector .......... 157
6.4.1.4 Channel Estimation Using the Common Pilot Channel (CPICH) .......... 158
6.4.1.4.1 Channel Estimation Procedure .......... 159
6.4.2 Proposed Blind Source Recovery Detection Schemes .......... 160
6.4.2.1 Natural Gradient Blind Multi-User Detection (BMUD) Algorithms .......... 160
6.4.2.1.1 Feedforward BMUD Configuration .......... 163
6.4.2.1.2 Feedback BMUD Configuration I .......... 164
6.4.2.1.3 Feedback BMUD Configuration II .......... 165
6.4.2.2 RAKE-Blind Source Recovery (RAKE-BSR) and RAKE-Principal Component Analysis (RAKE-PCA) Detectors .......... 167
6.5 Simulation Results .......... 169
6.5.1 Simulation I: DS-CDMA System .......... 170
6.5.2 Simulation II: WCDMA (UMTS FDD) System .......... 173
6.6 Remarks .......... 176

Chapter 7 CONCLUSIONS .......... 178

APPENDICES

Appendix A List of Abbreviations and Acronyms .......... 184
Appendix B Kullback-Leibler Divergence: A Performance Functional for Blind Source Recovery .......... 187
Appendix C State Space Network: Specialized Filtering Structures .......... 190
C.1. IIR Filtering .......... 190
C.2. FIR Filtering .......... 193
Appendix D Performance Benchmarks for Blind Source Recovery .......... 195
Appendix D.1. Conventional Performance Benchmarks .......... 195
D.1.1. Signal to Noise Ratio (SNR) .......... 195
D.1.2. Multichannel Intersymbol Interference (MISI) .......... 196
D.1.3. Maximum signal to interference plus noise ratio (SINRM) .......... 197
D.1.4. Mean Squared Error (MSE) .......... 198
D.1.5. Distortion Measure .......... 198
D.1.6. Separation Measure .......... 199
Appendix D.2. Quadratic Information Measure (QIM) .......... 200
D.2.1. Practical Issues in Estimating Independence of Output Sequences .......... 200
D.2.2. Independence Benchmark using Quadratic Mutual Information .......... 202
D.2.2.1. Renyi's Entropy .......... 202
D.2.2.2. Parzen Window Estimator .......... 203
D.2.3. Quadratic Independence Measure (QIM) .......... 204
D.2.4. Simulation Results .......... 209
D.2.5. Observations .......... 211
Appendix D.3. Algebraic Matrix Distance Index (AMDI) .......... 212
Appendix E Multi-Input Multi-Output (MIMO) Canonical Form .......... 214
E.1. Transformations required for a MIMO Transfer Function .......... 215

BIBLIOGRAPHY .......... 219

LIST OF TABLES

Table 3.1. Quantitative Signal Recovery Comparison of the proposed algorithms .......... 73

LIST OF FIGURES

Figure 1.1. General framework for Blind Source Recovery .......... 6
Figure 1.2. Blind Source Separation (a) Original Sources (b) Observed Mixtures .......... 9
Figure 1.3. Blind Source Separation (a) Separated Signals (b) Global Transfer Function .......... 10
Figure 3.1. State Space Mixing Environment .......... 34
Figure 3.2. State Space Demixing Network .......... 35
Figure 3.3. State Space Feedback Demixing Structure .......... 43
Figure 3.4. Minimum Phase Mixing Environment (a) Environment Transmission Zeros and Poles, (b) Impulse Response of the Environment Model .......... 59
Figure 3.5. Minimum Phase Mixing Environment, Impulse Response of (a) Theoretical Environment Inverse, (b) Estimated Demixing Network .......... 60
Figure 3.6. Minimum Phase Mixing Environment, Final Global Transfer Function .......... 61
Figure 3.7. Minimum Phase Feedforward BSR - All poles at the theoretical solution: (a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index .......... 62
Figure 3.8. Minimum Phase Feedforward BSR - All poles set to zero: (a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index .......... 63
Figure 3.9. Minimum Phase Feedforward BSR - All poles initialized arbitrarily: (a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index .......... 64
Figure 3.10. Minimum Phase Feedback BSR - All poles at the theoretical solution: (a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index .......... 66
Figure 3.11. Minimum Phase Feedback BSR - All poles set at zero: (a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index .......... 67
Figure 3.12. Minimum Phase Feedback BSR - All poles initialized arbitrarily: (a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index .......... 68
Figure 3.13. Non-minimum Phase Mixing Environment (a) Environment Transmission Zeros and Poles, (b) Impulse Response of the Environment Model .......... 70
Figure 3.14. Non-minimum Phase Mixing Environment, Impulse Response of (a) Theoretical Environment Inverse, (b) Estimated Demixing Network .......... 71
Figure 3.15. Non-minimum Phase Mixing Environment, (a) Final Global Transfer Function, (b) Demixing Network Transmission Zeros and Poles .......... 72
Figure 3.16. Non-minimum Phase Mixing Environment, Convergence of MISI Performance Index .......... 73
Figure 4.1. Some members of the Generalized Gaussian Distribution Family .......... 78
Figure 4.2. Fisher Kurtosis Characteristics for the Generalized Gaussian Family .......... 81
Figure 4.3. Scaled Generalized Gaussian Score Function for different values of parameter α .......... 83
Figure 4.4. Fisher Kurtosis Characteristics for the symmetric Pearson Model as a function of μ .......... 85
Figure 4.5. (a) Proposed Symmetric Sub-gaussian Density Model, (b) Fisher Kurtosis range of the proposed model; for 0 ≤ k ≤ 3 and λ_i = 1 .......... 88
Figure 4.6. (a) Proposed Symmetric Super-gaussian Density Model, (b) Fisher Kurtosis range of the proposed model; for 0 ≤ k ≤ 3; λ_i = 1 .......... 90
Figure 4.7. Score Functions based on the proposed: (a) hyperbolic sub-gaussian model, (b) hyperbolic super-gaussian model; for 0 ≤ k ≤ 3; λ_i = 1 .......... 92
Figure 4.8. Proposed Adaptive Hyperbolic Score Function: some possible combinations .......... 94
Figure 4.9. (a) Mixing Environment Transfer Function, (b) Theoretical Environment Inverse .......... 99
Figure 4.10. Feedforward BSR using Adaptive Score Functions: (a) Estimated Demixing Network Transfer Function, (b) Convergence of MISI Index .......... 100
Figure 4.11. Feedforward BSR using Adaptive Score Functions: (b) Convergence of Batch Kurtosis, (c) Final Global Transfer Function .......... 101
Figure 4.12. Feedback BSR using Adaptive Score Functions: (a) Convergence of MISI Index, (b) Convergence of Batch Kurtosis .......... 102
Figure 4.13. Feedback BSR using Adaptive Score Functions: Final Global Transfer Function .......... 103
Figure 5.1. A scatter plot example of the projection of two sources onto the 2 x 2 mixture space .......... 115
Figure 5.2. Convergence of the weights in the 2 x 2 mixture space; w_i(0) denotes the initial weight locations while w_i(∞) denotes the final weight location .......... 119
Figure 5.3. BMMR for Laplacian Data using AICA: (a) Interim Results, (b) Final Convergence .......... 126
Figure 5.4. Overcomplete Speech Mixture (3 x 2 Case): (a) Original speech sources, (b) Mixed speech .......... 128
Figure 5.5. Overcomplete Speech Mixture (3 x 2 Case): (a) Recovered speech, (b) Convergence of Normalized AMDI over 100 experiments .......... 129
Figure 5.6. Overcomplete Speech Mixture (5 x 3 Case): 3 Overcomplete Mixtures .......... 131
Figure 5.7. Overcomplete Speech Mixture (5 x 3 Case): Original 5 Speech Utterances .......... 132
Figure 5.8. Overcomplete Speech Mixture (5 x 3 Case): Recovered 5 Speech Utterances .......... 133
Figure 5.9. Unambiguous AICA estimate using multiple runs: (a) using H1, (b) using H2 .......... 135
Figure 6.1. Signal Generation Model for a QPSK DS-CDMA system .......... 147
Figure 6.2. Signal generation based on the proposed 3GPP UMTS FDD standard .......... 152
Figure 6.3. Channel Estimation Using CPICH .......... 160
Figure 6.4. Feedforward Demixing Structure .......... 163
Figure 6.5. Feedback Demixing Structure I .......... 164
Figure 6.6. Feedforward Demixing Structure II .......... 165
Figure 6.7. DS-CDMA downlink performance for K=20, 50 users (a) using Gold Codes G=63, (b) using OVSF Codes G=64 .......... 172
Figure 6.8. WCDMA Downlink System for K=20, 50 users (a) Performance Comparison with Perfect Channel Estimation, (b) Performance Comparison with Imperfect Channel Estimate .......... 174
Figure 6.9. WCDMA Downlink System for K=20, 50 users: Performance Comparison with Imperfect Channel Estimate and Inter-Cellular Interference .......... 175
Figure D.1. Effect of length of observation sequence on (a) Cross Information Measure, (b) Quadratic Independence Measure (QIM) .......... 208
Figure D.2. (a) Convergence of MISI performance Index, (b) Corresponding Performance of QIM performance Index .......... 210
Chapter 1

INTRODUCTION

Blind Source Recovery (BSR) is an unsupervised adaptation technique to stochastically recover independent original source signals from corrupted measurements caused by source propagation through a noisy, linear or nonlinear, dynamic mixing and filtering environment. The measurements from this environment are the only inputs to an adaptive info-theoretic BSR network, which seeks to counteract, to the extent possible, the effects of the environment [34, 64, 139, 140]. Ultimately, the purpose of BSR is to develop an optimal autonomous blind estimate of the original signals [8, 9, 53, 110].

Blind Source Recovery (BSR) is a challenging signal processing formulation that in the linear case includes the well-known adaptive filtering sub-problems of blind source separation and blind source deconvolution. Blind Source Separation (BSS) is the process of recovering a number of original stochastically independent sources/signals when only their linear static mixtures are available. Blind Source Deconvolution (BSD) deals with deconvolving the effects of a temporal or a spatial mixing linear filter on signals without a priori information about the filtering medium/network or the original sources/signals [33, 54].

This dissertation presents a generalized framework based on the multi-variable canonical state space representation of both the mixing environment and the demixing system. Various filtering paradigms have been consequently derived as special cases from the proposed framework. Simulation results verifying the theoretical developments have also been included in this work [109, 110, 126].

For this unsupervised adaptive filtering task, only the property of signal independence is assumed. No additional a priori knowledge of the original signals/sources is assumed. BSR requires few assumptions and possesses a self-learning capability, which renders such networks attractive from the viewpoint of real world applications where on-line training is often desired.

The challenges for BSR reside in the development of sound mathematical analyses and a framework capable of handling a variety of diverse problems. While other approaches to BSR have used more conventional signal processing structures [33, 60, 77], we propose a comprehensive BSR framework based on multi-variable state-space representations, optimization theory, the calculus of variations and higher order statistics. The framework is utilized to develop natural gradient adaptation algorithms for both minimum phase and non-minimum phase environments [110, 112, 126, 129, 135, 139]. Separate update laws for the feedforward and feedback recovering (i.e., both demixing and deconvolving) network parameters have been derived and verified by simulations [112, 140]. Further, the algorithms have been extended using adaptive score functions derived from comprehensive source distribution models for unmitigated recovery from mixtures of multiple source distributions [127, 130, 136, 138]. The problems of undercomplete and overcomplete BSR have also been extensively studied.
A new solution, called Algebraic Independent Component Analysis (AICA), for the case of static, overcomplete mixtures of sparse sources has been proposed [132-134, 137]. The proposed adaptive BSR framework is also applied to modern CDMA wireless communication networks (such as DS-CDMA, WCDMA, etc.) to efficiently address the problem of multi-user detection in the downlink channel, with favorable results compared to conventional techniques [122-125].

1.1 Background

Interest in the field of blind source separation and deconvolution has grown dramatically during recent years, motivated to a large extent by its similarity to the mixed signal separation capability of the human brain. The brain makes use of unknown nonlinear, parallel and complex dynamic signal processing with auto-learning and self-organization ability to perform such tasks. The peripheral nervous system integrates complex external stimuli and endogenous information into packets, which are transformed, filtered and transmitted in a manner that is yet to be completely understood. This complex mixture of information is received by the central nervous system (brain), split again into original information, and relevant information relayed to various sections of the cerebral cortex for further processing and action [89].

BSR is valuable in numerous applications that include telecommunication systems, sonar, acoustics and radar systems, mining and oil drilling systems, image enhancement, feature extraction and biomedical signal processing. Consider, for example, the audio and sonar applications where the original signals are sounds, and the mixed signals are the output of several microphones or sensors placed at different vantage points. A network would receive, via each sensor, a mixture of original sounds that usually undergo multi-path delays. The network's role in this scenario will be to dynamically reproduce, as closely as possible, the original signals. These separated signals can subsequently be channeled for further processing or transmission. Similar application scenarios can be described in situations involving measurement of neural, cardiac or other vital biological parameters, communication in noisy environments, engine or plant diagnostics, and cellular mobile communications.

1.2 Importance of Modeling

Development of adequate models of the environment that include time delays or filtering, multi-path effects, time-varying parameters or sensor dynamics is important in desired practical applications. The choice of inadequate models of the environment may result in highly sensitive and non-robust processing by a network. Indeed, in order to render the network operable in real world scenarios, robust operations must account for parameter variations, dynamic influences and signal delays that often result in asynchronous signal propagation. The environment needs to be modeled as an appropriate dynamic linear (or even nonlinear) system [53, 86, 151].

1.3 The State Space Framework

The state space formulation provides a general framework capable of dealing with a variety of situations. The multi-variable state space provides a compact and computationally attractive representation of multi-input multi-output (MIMO) filters.
This state space representation allows for the derivation of generalized iterative update laws for BSR. The state notion abridges weighted past as well as filtered versions of the input signals and can be easily extended to include non-linear networks [17, 103]. There are several reasons for choosing this framework:

• Transfer function models, although equivalent to state space models when initial conditions are zero, do not exploit common features that may be present in real dynamic systems. State space models give an efficient internal description of a system. Further, this choice allows various equivalent state space realizations for a system, the more important being the canonical observable and controllable forms.

• The inverse of a state space representation is easily derived, subject to the invertibility of the instantaneous relational mixing matrix between input and output; in case this matrix is not square, the condition reduces to the existence of the pseudo-inverse of this matrix. This feature ensures recoverability of the original sources provided the environment model is invertible.

• Parameterization methods are well known for specific classes of models. In particular, the state space model enables a much more general description than standard finite/infinite impulse response (FIR/IIR) convolutive filtering. All conventional (dynamic) filtering models, like AR, MA, ARMA, ARMAX and Gamma filters, can be considered as special cases of flexible state space models.

• The linear state space formulation of a problem is conveniently extendable to include non-linear and time-varying models. The state space models are ideal candidates for generalized non-linear model representations that include, e.g., neural networks.

1.4 Problem Definition: Adaptive Framework for Blind Source Recovery

In the most general setting, the mixing/convolving environment may be represented by an unknown dynamic process H with inputs being the n-dimensional independent sources s and the outputs being the m-dimensional measurements m. In this extreme case, no structure is assumed about the model of the environment. The environment may also be modeled as a dynamic system with fixed but unknown parameters.

The processing network ℋ must be constructed with the capability to compute the "inverse" (or the "closest to an inverse") of the environment model without assuming any knowledge of the mixing environment or the distribution structure of the unknown sources.

Figure 1.1. General framework for Blind Source Recovery

It is possible that an augmented network be constructed so that the inverse of the environment is merely a subsystem of the network with learning. In this case, even if the environment is unstable (e.g., due to the existence of non-minimum phase zeros in a linear state space model), the overall augmented network may represent a nonlinear adaptive dynamic system, which may converge towards the desired parameters as a stable equilibrium point or set, thus achieving the global task of blind identification [126, 156].
For linear filtering environments, this problem may be presented as

y = ℋ * m = ℋ * H * s = P * D * s    (1.1)

where

P is a generalized permutation matrix,
D is a diagonal matrix of filters, with each diagonal filter having only one non-zero tap, and
* represents ordinary matrix multiplication for the static mixing case, while it represents polynomial matrix multiplication for the multi-tap deconvolution problems.

Possible differences in the dimensions of the signal, mixture and output vectors stem from the blindness of the problem. However, in most cases it is very difficult, if not impossible, to determine the number of unknown sources directly from the observed mixtures [4, 6]. This results in three types of BSR problems, i.e.,

• For m = n, the problem is referred to as simply BSR.
• For m > n, the problem is termed undercomplete BSR, due to the higher dimensionality of the projected mixture space.
• For m < n, the problem is called overcomplete BSR, due to the scarcity of dimension in the mixture space.

For notational and mathematical convenience, in Chapter 3 we will derive all BSR algorithms for the case where n = m = M, unless otherwise specified. This convenient choice allows for unambiguous mathematical manipulations in the derivation of all the algorithms. The undercomplete mixture case of BSR, where m > n, can be easily reduced to the aforementioned case by appropriately discarding some measurements using pre-processors, e.g., Principal Component Analysis (PCA), eigen-decomposition, etc. [33, 52, 53, 60]. On the other hand, the overcomplete mixture case, for m < n, is in general not tractable and cannot be handled by the existing BSR algorithms in any straightforward fashion. A solution for a special case of overcomplete BSR, where the underlying source distributions are sparse, is discussed in chapter 5.

1.4.1 Blind Source Separation: An Example

In this example, we mix three signals using an arbitrarily chosen normalized mixing matrix. The original sources are

s(t) = [ 0.1 sin(400t) cos(30t);  0.08 sign(sin(500t + 9 cos(40t)));  n(t) ]

where n(t) is uniformly distributed noise. The nonlinearity used in this case is based on the Edgeworth expansion [8, 47]. The simulation results are presented below.

Figure 1.2. Blind Source Separation (a) Original Sources (b) Observed Mixtures

Figure 1.3. Blind Source Separation (a) Separated Signals (b) Global Transfer Function
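A comparable static mixing setup can be generated with the short sketch below. It is only a sketch: the time grid, the noise amplitude, and the random unit-norm mixing matrix are assumptions standing in for the unspecified choices of the actual simulation.

```python
import numpy as np

# Sketch of the three test sources of Section 1.4.1 on an assumed time grid.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 5000)
s = np.vstack([
    0.1 * np.sin(400 * t) * np.cos(30 * t),                # amplitude-modulated sinusoid
    0.08 * np.sign(np.sin(500 * t + 9 * np.cos(40 * t))),  # switching (sign) waveform
    0.05 * rng.uniform(-1.0, 1.0, t.size),                 # uniformly distributed noise (assumed scale)
])

# Hypothetical normalized mixing matrix standing in for the unspecified one;
# rows are scaled to unit norm so each mixture stays comparably scaled.
H = rng.uniform(size=(3, 3))
H /= np.linalg.norm(H, axis=1, keepdims=True)

m = H @ s   # observed static mixtures, one row per sensor
```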
1.5 Organization of the dissertation

This dissertation is divided into seven chapters and five appendices. Some basic introduction, motivating applications and the definition of the problem are presented in Chapter 1. An overview of the current literature on the problems of BSS and BSD is presented in Chapter 2. This is followed by a concise discussion of the natural gradient learning approach and its application to blind source recovery networks. In Chapter 3, we start with a discussion of the multivariable optimization framework for the blind source recovery problem. We initially present BSR update laws previously derived in a more general nonlinear mixing setting; the framework is then specialized to the linear dynamical state space representation. The natural gradient based update laws are derived for more efficient estimation of the recovery network parameters. Separate update laws are derived for the minimum and non-minimum phase environments. Further classification is done on the basis of the BSR network structure, which can be in either a feedforward or a feedback formulation. Simulation results and a comparison of the performance of all the proposed BSR structures make up the rest of the chapter.

Chapter 4 discusses the issue of score function adaptation for all the update laws proposed in Chapter 3. Two separate adaptive score functions are proposed for BSR, based on the Generalized Gaussian distribution model and the proposed Hyperbolic distribution models, respectively. A discussion on adaptive estimation of the batch output kurtosis, parameter regulation for the proposed score functions and their effectiveness for various source distributions is included. The simulation results using the proposed score functions are also presented. In Chapter 5, we discuss the recovery issues related to undercomplete and overcomplete mixtures, which may arise due to an excess or deficiency of dimension in the mixture space with respect to the original source space. For the case of overcomplete sparse distributions, a new ICA algorithm called Algebraic Independent Component Analysis (AICA) is presented and related implementation issues are discussed. Simulation examples using both independent identically distributed (iid) data and speech are included as a demonstration of the AICA recovery performance. Chapter 6 is devoted to the application of BSR approaches to CDMA wireless communication networks. The focus is primarily placed on the forward link (or downlink) for both a generic DS-CDMA implementation and a UMTS FDD (or WCDMA) wireless network. Comprehensive signal models are initially developed and then cast into convolutive equivalents. Three adaptive BSR algorithms for a user's signal detection are subsequently derived. These proposed detection schemes are then compared to conventional detection schemes for CDMA using extensive simulation results that outline the effectiveness of the proposed approaches. Chapter 7 summarizes the developments made in this work and also suggests various possible future extensions to the work presented in this dissertation.

Several appendices have been included to supplement the main text of this dissertation. Appendix A lists the various abbreviations and acronyms used throughout this work.
In Appendix B, we present the Kullback-Leibler divergence [72], or the relative entropy measure, which is used as the objective performance functional to be minimized in the BSR optimization framework. Appendix C presents specific implementation simplifications in the state space BSR framework for the cases of finite impulse response (FIR) and infinite impulse response (IIR) filters. Various performance benchmarks for BSR are presented in Appendix D. Two of these benchmarks, namely the Quadratic Information Measure (QIM) and the Algebraic Matrix Distance Index (AMDI), have been developed during the course of this work. Lastly, Appendix E presents a transformation technique to transform a generic MIMO filtering structure to a canonical state space formulation.

Chapter 2

REVIEW OF LITERATURE

Blind Source Separation (BSS) by Independent Component Analysis (ICA) using information theoretic approaches has been a hot research topic because of its potential applications in signal processing, such as in speech recognition systems, telecommunications, time series analysis and medical signal processing. The goal of ICA is to recover independent sources given only sensor observations that are unknown linear mixtures of the unobserved independent source signals. In contrast to correlation-based transformations such as Principal Component Analysis (PCA), ICA not only decorrelates the signals (2nd order statistics) but also reduces higher-order statistical dependencies, attempting to make the signals as independent as possible [53].

Two different research communities have considered the analysis of independent components. On one hand, the study of separating mixed sources observed in an array of sensors has been a classical and difficult signal processing problem. The seminal work on blind source separation was by Herault and Jutten [56], where they introduced an adaptive algorithm in a simple feedback architecture that was able to separate several unknown independent sources. Their approach was further developed by Jutten and Herault [63] and Karhunen and Joutsensalo [65]; Cichocki, Unbehauen and Rummert [36] and Comon [37] elaborated the concept of independent component analysis and proposed cost functions related to the approximate minimization of mutual information between the sensors.

In parallel to blind source separation studies, unsupervised learning rules based on information theory were proposed by Linsker [83]. The goal was to maximize the mutual information between the inputs and outputs of a neural network. This approach is related to the principle of redundancy reduction suggested by Barlow [20] as a coding strategy in neurons. Each neuron should encode features that are as statistically independent as possible from other neurons over a natural ensemble of inputs; decorrelation as a strategy for visual processing was explored by Atick [19]. Salam et al. [45, 106] first proposed the use of linear state space models for BSS and derived a novel adaptive algorithm based on multivariable optimization theory. Nadal and Parga [88] showed that in the low-noise case, the maximum of the mutual information between the input and output of a neural network implied that the output distribution was factorial; that is, the multivariate probability density function (pdf) can be factorized as a product of marginal pdfs.
Roth and Baram [102] and Bell and Sejnowski [21] independently derived stochastic gradient learning rules for this maximization and applied them, respectively, to forecasting, time series analysis, and the blind separation of sources. Bell and Sejnowski [21] put the blind source separation problem into an information-theoretic framework and demonstrated the separation and deconvolution of mixed sources. Their adaptive methods are more plausible from a neural processing perspective than the cumulant-based cost functions proposed by Comon [37]. A similar adaptive method for source separation was proposed by Cardoso and Laheld [29].

Other algorithms for performing ICA have been proposed from different viewpoints. Maximum Likelihood Estimation (MLE) approaches to ICA were first proposed by Gaeta and Lacoume [43] and elaborated by Pearlmutter and Parra [94]. Girolami and Fyfe [51], motivated by information-theoretic indices for Exploratory Projection Pursuit (EPP), used marginal negentropy as a projection index and showed that kurtosis-seeking projection pursuit will extract one of the underlying sources from a linear mixture. A multiple output EPP network was eventually developed to allow full separation of all the underlying sources [50]. Bell and Sejnowski [21] have pointed out a similarity between their infomax algorithm and the Bussgang algorithm in signal processing. Torkkola [118] and Girolami [48] also made theoretical and implementation developments using the infomax algorithm. Lambert [73, 74] proposed extensions of BSS algorithms using frequency domain polynomial matrix algebra; further, he also elucidated the connection between three different Bussgang cost functions. Amari [7, 8] proposed natural gradient based learning rules based on information geometry, while Amari, Cichocki and Yang [13] proposed implementations based on the use of higher order statistics and source pdf approximations using Gram-Charlier and Edgeworth series expansions [30]. Gharbi and Salam [45, 47] extended their state space algorithms derived using higher order statistics. Girolami and Fyfe [51] chose negentropy as a projection pursuit index, developing a learning rule that is able to blindly separate mixed sub- and super-Gaussian sources. Lee, Girolami, Bell and Sejnowski [76, 77, 79] have shown the interconnection of all the above mentioned approaches, unifying them under the banner of the extended infomax algorithm. Amari, Douglas, Cichocki and Yang [14] proposed a comprehensive algorithm using the natural gradient for multichannel blind source separation problems, constraining both the mixing and demixing environments to be represented by FIR filters. Several research groups have further worked to optimize these algorithms for various domains of application, especially for mixtures of speech and the equalization of communication channels.

It has been proved that the algorithms for blind source separation and blind source deconvolution give superior performance when optimized using the Riemannian contra-variant or natural gradient [6, 9, 16], or equivalently the relative gradient [29]. The use of the state space for BSS was first proposed by Salam [106]. Gharbi and Salam [45, 47, 111] extended these algorithms for continuous time static and dynamic environments using multivariable optimization theory and the natural gradient. Zhang, Cichocki et al.
[153, 155, 156] followed suit with several research papers focused on the derivation of state space algorithms based on Lie group structures. They have also advocated the use of Kalman filters for state estimation [157]. Salam and Erten [109] proposed a general framework for state space blind source recovery. More recently, Waheed and Salam [110, 112, 126, 129] have proposed extended discrete time state space algorithms, for static and dynamic environments, in both feedforward and feedback configurations for Blind Source Recovery (BSR). They have demonstrated the application of the algorithms to both minimum and non-minimum phase systems, and also to a variety of unknown source distributions using adaptive non-linearities.

Our current research proposes extended discrete-time natural (or Riemannian contravariant) gradient algorithms for the Blind Source Recovery (BSR) or multichannel blind deconvolution (MCBD) problems, fused with the use of the state-space formulation and optimization theory. BSS in this approach can be viewed as a trivial static case. Before providing details of the conducted research, the notion of the natural gradient and its application to the problems of BSS and BSD is concisely discussed. The choice of the natural gradient for the problems of Blind Source Recovery (BSR) or Multichannel Blind Deconvolution (MCBD) has been made due to its superior and equivariant performance, which is preserved in the state-space formulation of this research.

2.1 The Natural Gradient Framework

For any general nonlinear optimization framework, the stochastic gradient method [5, 104, 119, 147] is a popular learning method. However, in many cases the parameter space is not Euclidean but has a Riemannian metric structure. In these cases, the ordinary gradient does not give the steepest direction of a target function; rather, the steepest direction is given by the natural (or contravariant) gradient. The Riemannian metric structures were introduced by means of information geometry [6-9, 16]. The natural gradient is discussed explicitly in the space of matrices for blind source separation (BSS), and in the space of linear dynamical systems for the blind source deconvolution (BSD) problem.

A performance comparison between the ordinary stochastic gradient and the natural gradient has been presented in [9]. The reference studies the asymptotic behavior of online natural gradient learning. For online adaptation, the training examples can be used only once, as they appear during the learning/adaptation phase. Therefore, the asymptotic performance of online learning/adaptation cannot be better than the optimal batch procedure, where all the examples can be reused again and again. It has been proved that natural gradient online learning gives the Fisher-efficient estimator in the sense of asymptotic statistics when the loss function is differentiable, so that it is asymptotically equivalent to the optimal batch procedure, see [7, 91]. However, when the loss function is non-differentiable, the accuracy of asymptotic online learning is worse than that of batch learning by a factor of 2, for example see [25]. In this section, what is meant by the natural gradient is briefly outlined, and its relationship to the ordinary stochastic gradient is established.
2.1.1 Definition of the Natural Gradient

Let S = {w ∈ R^n} be a parameter space on which a function L(w) is defined. When S is a Euclidean space with an orthonormal coordinate system w, the squared length of a small incremental vector connecting w and w + dw is given by

|dw|^2 = Σ_{i=1}^{n} (dw_i)^2    (2.1)

where dw_i are the components of dw. However, when the coordinate system is non-orthonormal, the squared length is given by the quadratic form

|dw|^2 = Σ_{i,j} g_{ij}(w) dw_i dw_j    (2.2)

When S is a curved manifold, there are no orthonormal linear coordinates, and the length of dw is always written as in equation (2.2). Such a space is a Riemannian space. The n x n matrix G = (g_{ij}) is called the Riemannian metric tensor, and it depends in general on w. It reduces to

g_{ij}(w) = δ_{ij} = { 1, i = j;  0, i ≠ j }    (2.3)

in the Euclidean orthonormal case, so that G is the unit matrix I in this case. The steepest descent direction of a function L(w) at w is defined by the vector dw that minimizes L(w + dw), where |dw| has a fixed length, that is, under the constraint

|dw|^2 = ε^2    (2.4)

for a sufficiently small constant ε, see [7, 9, 16].

2.1.2 Theorem 2.1

The steepest descent direction of L(w) in a Riemannian space is given by

−∇̃L(w) = −G^{-1}(w) ∇L(w)    (2.5)

where G^{-1} = (g^{ij}) is the inverse of the metric G = (g_{ij}) and ∇L is the conventional gradient,

∇L(w) = ( ∂L(w)/∂w_1, ...., ∂L(w)/∂w_n )^T

the superscript T denoting transposition. Here

∇̃L(w) = G^{-1}(w) ∇L(w)    (2.6)

is defined to be the natural gradient of L in the Riemannian space. Thus, −∇̃L represents the steepest descent direction of L. Using the tensorial notation, this is nothing but the contravariant form of −∇L. Observe that when the space is Euclidean and the coordinate system is orthonormal, this reduces to ∇̃L = ∇L.

The natural gradient learning has been briefly outlined, and its use for the problems of BSS and BSR (or MCBD) is shown below. For complete proofs of the presented theorems, please see [9, 16].

2.1.3 The Natural Gradient Learning

Let us consider an information source that generates a sequence of independent random variables s_1, s_2, ..., s_t, ..., subject to the same probability distribution q(s). The random signals s_t are processed by an adaptive processor (e.g., a neural network) that has a set of adjustable parameters w. Let l(s, w) be a loss function when signal s is processed by the processor whose parameter is w. Then the risk function, or the average loss, is

L(w) = E[l(s, w)]    (2.7)

where E denotes the expectation with respect to s. Learning is a procedure to search for the optimal w* that minimizes L(w). The stochastic gradient descent learning method can be formulated in general as

w_{t+1} = w_t − η_t C(w_t) ∇l(s_t, w_t)    (2.8)

where η_t is a learning rate that may depend on t and C(w) is a suitably chosen positive definite matrix [5]. In the natural gradient online learning method, it is proposed to put C(w) equal to G^{-1}(w) when the Riemannian structure is defined as in (2.6). This suggests the natural gradient descent algorithm of the form

w_{t+1} = w_t − η_t ∇̃L(w_t)    (2.9)

where η_t is the learning rate that determines the step size.
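As a toy numerical illustration of the difference between the ordinary update (2.8) with C(w) = I and the natural gradient update (2.9), the following sketch applies both to a simple quadratic loss. The loss, the metric G, and the step sizes are illustrative assumptions only; in the BSR setting G arises from the information geometry of the demixing model rather than being chosen by hand.

```python
import numpy as np

# Toy comparison of the ordinary step (2.8) with C(w) = I against the natural
# gradient step (2.9).  Loss L(w) = 0.5 * w^T A w with an assumed ill-conditioned
# curvature A; the metric G is assumed equal to A, so the natural step is well
# scaled in every direction, while the ordinary step size is limited by the
# largest eigenvalue of A.
A = np.diag([100.0, 1.0])          # assumed curvature (condition number 100)
G = A.copy()                       # assumed Riemannian metric
w_ord = np.array([1.0, 1.0])
w_nat = w_ord.copy()
eta_ord, eta_nat = 0.015, 0.5      # 0.015 is near the stability limit 2/100

for _ in range(100):
    w_ord = w_ord - eta_ord * (A @ w_ord)                    # ordinary gradient step
    w_nat = w_nat - eta_nat * np.linalg.solve(G, A @ w_nat)  # G^{-1} * gradient

print(np.linalg.norm(w_ord))   # ~0.22: still far from the minimum at the origin
print(np.linalg.norm(w_nat))   # ~1e-30: essentially converged
```

The point of the sketch is only that rescaling the gradient by G^{-1} removes the dependence of the usable step size on the worst-conditioned direction, which is the property exploited by the BSS and BSD update laws below.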
Two illustrative examples of natural gradient learning are presented next: Blind Source Separation (BSS) and Blind Source Deconvolution (BSD). A brief introduction to the setup of each problem is followed by the derivation of update laws based on the natural gradient. This serves to outline both problems and also to motivate the use of the natural gradient in the search for an adaptive solution.

2.2 Blind Source Separation

Consider $n$ signal sources that produce $n$ independent signals $s_i(t)$, $i = 1, \ldots, n$, at discrete times $t = 1, 2, \ldots$. Assume that the $s_i(t)$ are independent at different times and that the expectations of the $s_i$ are zero. Let $r(s)$ be the joint probability density function of $s$; it can then be written in the product form
$$r(s) = \prod_{i=1}^{n} r_i(s_i) \qquad (2.10)$$
Consider the case where there is no direct access to the source signals $s(t)$ and only their $n$ instantaneous mixtures $m(t)$ can be observed,
$$m(t) = H s(t), \quad \text{or} \quad m_i(t) = \sum_{j=1}^{n} H_{ij}\, s_j(t) \qquad (2.11)$$
where $H = (H_{ij})$ is an $n \times n$ nonsingular mixing matrix that does not depend on $t$, and $m = (m_1, \ldots, m_n)^T$ is the vector of observed mixtures.

Blind source separation is the problem of recovering the original signals $s(t)$, $t = 1, 2, \ldots$, from the observed signals $m(t)$, $t = 1, 2, \ldots$ [63]. If $H$ is known, this is trivial, because $s(t) = H^{-1} m(t)$. The term "blind" implies that neither the mixing matrix $H$ nor the probability density functions $r_i(s_i)$ are known. A typical algorithm transforms $m(t)$ into
$$y(t) = W_t\, m(t) \qquad (2.12)$$
where $W_t$ is an estimate of $H^{-1}$ and is modified by an update equation of the form (2.9). The specific form of the update law for BSS can be derived using the following theorem by Amari [9].

2.2.1 Theorem 2.2

The natural gradient in the matrix space is given by
$$\tilde{\nabla} l = (\nabla l)\, W^T W \qquad (2.13)$$
Therefore, the natural gradient update equation for BSS is
$$\Delta W_t = \eta_t \left( I - \varphi(y)\, y^T \right) W_t \qquad (2.14)$$
where $\varphi(y)$ is the column vector
$$\varphi(y) = [\varphi_1(y_1), \ldots, \varphi_n(y_n)]^T, \qquad \varphi_i(y_i) = -\frac{d}{dy_i} \log r_i(y_i) \qquad (2.15)$$
The efficiency of this equation has been studied from the statistical and information-geometrical points of view [11, 15]. A short simulation sketch of this update is given below.
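The following is a minimal simulation sketch of the BSS update (2.14)-(2.15). It assumes two super-gaussian (Laplacian) sources and uses tanh as a stand-in for the unknown optimal score function; the mixing matrix, sample sizes, and learning rate are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 20000
s = rng.laplace(size=(n, T))              # super-gaussian sources (unknown to the algorithm)
H = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # unknown mixing matrix, eq. (2.11)
m = H @ s                                 # observed mixtures

W = np.eye(n)                             # demixing estimate W_t, eq. (2.12)
eta = 0.01

for t in range(T):
    y = W @ m[:, t]                       # current output
    phi = np.tanh(y)                      # assumed score function, stand-in for eq. (2.15)
    # natural gradient update, eq. (2.14)
    W = W + eta * (np.eye(n) - np.outer(phi, y)) @ W

print("global transfer W*H (should approach a scaled permutation):")
print(W @ H)

As the outputs approach independence, the product W*H approaches a scaled permutation matrix, which is the well-known indeterminacy of blind separation.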
2.3 Blind Source Deconvolution

When the original signals $s(t)$ are mixed not only instantaneously but also with past signals, the problem is called blind source deconvolution or equalization. Introducing the time-delay operator $z^{-1}$,
$$z^{-1} s(t) = s(t-1) \qquad (2.16)$$
the mixing matrix filter $H$ is denoted by
$$H(z) = \sum_{k=0}^{\infty} H_k z^{-k} \qquad (2.17)$$
where the $H_k$ are $m \times m$ matrices. The observed mixtures are
$$m(t) = H(z)\, s(t) = \sum_{k} H_k\, s(t-k) \qquad (2.18)$$
To recover the original independent sources, the finite impulse response model
$$W(z) = \sum_{k=0}^{d} W_k z^{-k} \qquad (2.19)$$
of degree $d$ is used. The original signals are recovered by
$$y(t) = W_t(z)\, m(t) \qquad (2.20)$$
where $W_t$ is adaptively modified by
$$W_{t+1}(z) = W_t(z) - \eta_t\, \tilde{\nabla} l\{ m_t, m_{t-1}, \ldots, W_t \} \qquad (2.21)$$
and the natural gradient in this case is given by the following theorem [9].

2.3.1 Theorem 2.3

The natural gradient on the manifold of systems is given by
$$\tilde{\nabla} l = \nabla l(z)\, W^T(z^{-1})\, W(z) \qquad (2.22)$$
where the operator $z^{-1}$ should be applied appropriately.

2.4 Equivariant Performance of the Natural Gradient

An algorithm for source separation of instantaneous additive mixtures is defined to have equivariant performance if its behavior does not depend explicitly on the form of the mixing matrix $H$ [29]. This definition can be extended to source deconvolution [13] by stating that a deconvolution algorithm is equivariant if its behavior depends only on the combined filter $G(z,k)$. Using (2.21), the update algorithm can be expressed in the form
$$\Delta W(z,k) = -\eta_t\, \tilde{\nabla} l\{ m_t, m_{t-1}, \ldots \} = -\eta_t\, \frac{\partial l(z)}{\partial X(z)}\, W(z,k) \qquad (2.23)$$
where $dX(z) = dW(z)\{W(z)\}^{-1}$. Post-multiplying both sides of (2.23) by $H(z)$ gives
$$\Delta G(z,k) = -\eta_t\, \frac{\partial l(z)}{\partial X(z)}\, G(z,k) \qquad (2.24)$$
where $G(z,k) = W(z,k) * H(z)$. Note that both (2.24) and the definition of the gradient can be expressed in a form that is independent of $H(z)$, which proves the equivariant property of the natural gradient algorithms.

Chapter 3
BLIND SOURCE RECOVERY: A FRAMEWORK IN THE STATE SPACE

Blind Source Recovery (BSR) denotes the recovery of original sources/signals from environments that may include convolution, temporal variation, and even nonlinearity. It also implies the recovery of sources even in the absence of precise environment identifiability. This chapter describes a generalized BSR formulation, see Salem et al. [109, 110], achieved by applying stochastic optimization principles to the Kullback-Leibler divergence as a performance functional, subject to the constraints of the general (i.e., nonlinear and time-varying) state-space representation. This technique is used to derive update laws for nonlinear time-varying dynamical systems, which are subsequently specialized to canonical neural and time-invariant linear systems. Some of this work has also been implemented on dedicated hardware/software platforms [40-42, 46].

The state-space feedforward and feedback demixing network structures have been exploited to develop natural gradient based learning rules. These adaptation algorithms are capable of handling most filtering paradigms and can be conveniently extended to nonlinear models. As special cases, distinct linear state-space algorithms are presented for minimum phase and non-minimum phase mixing environment models. Conventional (FIR/IIR) filtering models can subsequently be derived from this general structure (see Appendix C). Illustrative simulation examples are presented to demonstrate the online adaptation capabilities of the developed BSR algorithms.

3.1 Formulation of the Optimization Structure

This section formally describes the state-space BSR optimization framework, see [109, 112] for more details. The BSR stochastic gradient update laws are derived via the proposed constrained multivariable optimization framework [109, 110] using the Kullback-Leibler divergence as the performance functional (see Appendix B). This performance functional is a well-known information-theoretic "distance" measure, appropriate for quantifying the mutual dependence of the signals in a mixture. Using the calculus of variations and optimization theory, the update laws are first derived for a general nonlinear dynamic structure. Subsequently, the update algorithm is specialized to the case of a linear dynamical state-space model.
For the case of a discrete-time, independent, wide-sense stationary output vector $y(k)$, the Kullback-Leibler divergence (or relative entropy) is given by
$$L(y(k)) = -H(y(k)) + \sum_{i=1}^{n} H_i(y_i(k)) \qquad (3.1)$$
where $H(y(k))$ is the entropy of the signal vector $y(k)$,
$$H(y) = -E[\ln p_y(y)] = \begin{cases} -\int_{y} p_y(y) \ln p_y(y)\, dy & \text{continuous case} \\ -\sum_{y} p_y(y) \ln p_y(y) & \text{discrete case} \end{cases}$$
and $H_i(y_i(k))$ is the marginal entropy of the component signal $y_i(k)$.

3.1.1 Algorithms for the Nonlinear Dynamic Case

Assume that the environment can be modeled by the following nonlinear discrete-time dynamic forward model
$$X_e(k+1) = f_e(X_e(k), s(k), h_1) \qquad (3.2)$$
$$m(k) = g_e(X_e(k), s(k), h_2) \qquad (3.3)$$
where $s(k)$ is the n-dimensional vector of original source signals, $m(k)$ the m-dimensional vector of measurements, $X_e(k)$ the $M_e$-dimensional state vector of the environment, $h_1$ and $h_2$ constant parameter vectors (or matrices) of the state and output equations respectively, and $f_e(\cdot)$, $g_e(\cdot)$ are differentiable nonlinear functions [86] that specify the structure of the environment. It is further assumed that existence and uniqueness of solutions hold for any given initial conditions $X_e(t_0)$ and sources $s(k)$, i.e., a Lipschitz condition on $f_e(\cdot)$ is satisfied [71].

The processing (demixing) network may be represented by a dynamic feedforward or feedback network. Focusing on a feedforward network model, assume the network is represented by
$$X(k+1) = f(X(k), m(k), w_1) \qquad (3.4)$$
$$y(k) = g(X(k), m(k), w_2) \qquad (3.5)$$
where $m(k)$ is the m-dimensional measurement vector, $y(k)$ the N-dimensional vector of network outputs, $X(k)$ the M-dimensional state vector of the processing network, $w_1$ and $w_2$ the parameters of the network state and output equations, and $f(\cdot)$, $g(\cdot)$ are differentiable nonlinear functions defining the structure of the demixing network. Existence and uniqueness of solutions of the nonlinear difference equations is likewise assumed for the network model for any given initial conditions $X(t_0)$ and measurement vector $m(k)$.

To derive the update law, the notation is abused for convenience so that $y(k)$ in (3.1) is written as $y_k$ and $L(y(k))$ is generalized to $L^k(y_k)$, so that the functional may also depend on the time index $k$. The following constrained optimization problem can thus be formulated [109, 112]:

Minimize
$$J_0(w_1, w_2) = \sum_{k=k_0}^{k_1} L^k(y_k) \qquad (3.6)$$
subject to
$$X_{k+1} = f^k(X_k, m_k, w_1) \qquad (3.7)$$
$$y_k = g^k(X_k, m_k, w_2) \qquad (3.8)$$
with the initial condition $X_{k_0}$, where $[k_0, k_1)$ is the discrete-time frame used for the computation of the cost functional. The augmented cost functional to be optimized becomes
$$J(w_1, w_2) = \sum_{k=k_0}^{k_1} L^k(y_k) + \lambda_{k+1}^T \left( f^k(X_k, m_k, w_1) - X_{k+1} \right) \qquad (3.9)$$
where $\lambda_k$ is the Lagrange variable [81]. Define the Hamiltonian as
$$H^k = L^k(y_k) + \lambda_{k+1}^T f^k(X_k, m_k, w_1)$$

For the proposed cascade (neural) network structure, the state equations take the lower-triangular form
$$\begin{aligned} X_1^{k+1} &= f_1^k(X_1^k, m_k, w_1) \\ X_2^{k+1} &= f_2^k(X_1^k, X_2^k, m_k, w_2) \\ &\;\;\vdots \\ X_n^{k+1} &= f_n^k(X_1^k, \ldots, X_n^k, m_k, w_n) \\ y_k &= g(X_1^k, \ldots, X_n^k, m_k, w_y) \end{aligned} \qquad (3.17)$$
with the initial conditions $X_{1,k_0}, X_{2,k_0}, \ldots, X_{n,k_0}$. The augmented cost functional to be minimized becomes
$$J(w_1, \ldots, w_n, w_y) = \sum_{k=k_0}^{k_1} L^k(y_k) + \sum_{i=1}^{n} \left(\lambda_i^{k+1}\right)^T \left( f_i^k(X_1^k, \ldots, X_i^k, m_k, w_i) - X_i^{k+1} \right) \qquad (3.18)$$
Define the Hamiltonian as
$$H^k = L^k(y_k) + \sum_{i=1}^{n} \left(\lambda_i^{k+1}\right)^T f_i^k(X_1^k, \ldots, X_i^k, m_k, w_i) \qquad (3.19)$$
Consequently, the necessary conditions for optimality are
$$X_i^{k+1} = \frac{\partial H^k}{\partial \lambda_i^{k+1}} = f_i^k(X_1^k, \ldots, X_i^k, m_k, w_i), \quad i = 1, 2, \ldots, n \qquad (3.20)$$
$$\lambda_i^k = \frac{\partial H^k}{\partial X_i^k}, \quad i = 1, 2, \ldots, n \qquad (3.21)$$
and the changes in the weighting parameters become
$$\Delta w_i = -\eta_k \frac{\partial H^k}{\partial w_i} = -\eta_k \left( \frac{\partial f_i^k}{\partial w_i} \right)^T \lambda_i^{k+1}, \quad i = 1, 2, \ldots, n \qquad (3.22)$$
$$\Delta w_y = -\eta_k \frac{\partial H^k}{\partial w_y} = -\eta_k \frac{\partial L^k}{\partial w_y} \qquad (3.23)$$
In this proposed neural structure, the update of the states takes a structural lower-triangular form, while the updates of the co-states and the weighting parameters assume upper-triangular structures.

3.2 Algorithms for the Linear Dynamic Case

In the linear dynamic case, the environment model is assumed to be in the state-space form
$$X_e(k+1) = A_e X_e(k) + B_e s(k) \qquad (3.24)$$
$$m(k) = C_e X_e(k) + D_e s(k) \qquad (3.25)$$

Figure 3.1. State Space Mixing Environment

In this case the feedforward separating network attains the state-space form
$$X(k+1) = A\, X(k) + B\, m(k) \qquad (3.26)$$
$$y(k) = C\, X(k) + D\, m(k) \qquad (3.27)$$
The existence of an explicit solution in this case has been shown in [106, 107]. This existence ensures that the network has the capacity to compensate for the environment and consequently recover the original signals, see Figures 3.1 and 3.2.

Figure 3.2. State Space Demixing Network

Specializing the general BSR algorithm to the linear network case, one forms the (Kullback-Leibler divergence) performance ("distance") measure, see (3.6), as
$$J_0(A, B, C, D) = \sum_{k=k_0}^{k_1} L^k(y_k) \qquad (3.28)$$
subject to (3.26) and (3.27) with the initial condition $X_{k_0}$. The augmented cost functional to be optimized becomes
$$J(A, B, C, D) = \sum_{k=k_0}^{k_1} L^k(y_k) + \lambda_{k+1}^T \left( A X_k + B m_k - X_{k+1} \right) \qquad (3.29)$$
Again, define the Hamiltonian as
$$H^k = L^k(y_k) + \lambda_{k+1}^T \left( A X_k + B m_k \right) \qquad (3.30)$$
For the linear time-invariant case, the ordinary stochastic gradient update laws are given by [109, 110]
$$X_{k+1} = \frac{\partial H^k}{\partial \lambda_{k+1}} = A X_k + B m_k \qquad (3.31)$$
$$\lambda_k = \frac{\partial H^k}{\partial X_k} = A^T \lambda_{k+1} + C^T \frac{\partial L^k}{\partial y_k} \qquad (3.32)$$
$$\Delta A = -\eta_k \frac{\partial H^k}{\partial A} = -\eta_k\, \lambda_{k+1} X_k^T \qquad (3.33)$$
$$\Delta B = -\eta_k \frac{\partial H^k}{\partial B} = -\eta_k\, \lambda_{k+1} m_k^T \qquad (3.34)$$
$$\Delta C = -\eta_k \frac{\partial H^k}{\partial C} = -\eta_k \frac{\partial L^k}{\partial C} = -\eta_k\, \varphi(y_k)\, X_k^T \qquad (3.35)$$
$$\Delta D = -\eta_k \frac{\partial H^k}{\partial D} = -\eta_k \frac{\partial L^k}{\partial D} = -\eta_k \left( \varphi(y_k)\, m_k^T - [D]^{-T} \right) \qquad (3.36)$$
The update laws derived above constitute a comprehensive theoretical algorithm, providing the updates for the states $X_k$, the co-states $\lambda_k$, and all the parametric matrices of the state space. The invertibility of the state space, as discussed in [45, 106], is guaranteed if the matrix $D$ is invertible. In the above laws, $\eta_k$ is the positive learning rate of the algorithm, which may be adaptive; $[D]^{-T}$ denotes the transpose of the inverse of $D$ if it is square, or the transpose of its pseudo-inverse if $D$ is not square; and $\varphi(y)$ is a vector of the usual nonlinearity or score function [67, 127, 136, 138], acting componentwise on the output vector $y$, with each component given by
$$\varphi(y_i) = -\frac{\partial \log p(y_i)}{\partial y_i} = -\frac{p'(y_i)}{p(y_i)} \qquad (3.37)$$
where $p(y_i)$ is the (estimate of the) probability density function of each source [127, 130, 136].

The update laws (3.35) and (3.36) are similar to the standard gradient descent results [12, 13, 155], indicating their optimality for the Euclidean parametric structure. Although the update laws above are non-causal for the parametric matrices $A$ and $B$, they can easily be implemented using frame-buffering and memory storage [52, 54]. A sketch of one adaptation step of these update laws is given below.
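The following is a minimal Python/NumPy sketch of a single adaptation step of the stochastic gradient state-space laws (3.31)-(3.36). It assumes a square, invertible D, a tanh score function as a stand-in for (3.37), and that the co-state lambda_{k+1} is supplied from a buffered backward pass; all names and dimensions are illustrative assumptions, not the dissertation's implementation.

import numpy as np

def bsr_step(A, B, C, D, X, lam_next, m, eta=1e-3):
    """One stochastic-gradient BSR step, eqs. (3.31)-(3.36).

    X        : current network state X_k
    lam_next : co-state lambda_{k+1} (obtained from a frame-buffered backward pass)
    m        : current measurement vector m_k
    """
    y = C @ X + D @ m                          # output equation (3.27)
    phi = np.tanh(y)                           # assumed score function, stand-in for (3.37)

    X_next = A @ X + B @ m                     # state update (3.31)
    lam = A.T @ lam_next + C.T @ phi           # co-state update (3.32)

    A_new = A - eta * np.outer(lam_next, X)    # (3.33)
    B_new = B - eta * np.outer(lam_next, m)    # (3.34)
    C_new = C - eta * np.outer(phi, X)         # (3.35)
    D_new = D - eta * (np.outer(phi, m) - np.linalg.inv(D).T)   # (3.36)

    return A_new, B_new, C_new, D_new, X_next, lam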
An inherent delay or latency in the recovered signals is related to the length of the buffer and the dimension of the state space.

3.2.1 Extensions to the Natural Gradient

In this section, the linear state-space algorithms for Blind Source Recovery (BSR) derived above are extended using the natural gradient [6, 7, 9, 14, 16]. Specialized algorithms for the class of minimum phase mixing environments are presented in both feedforward and feedback state-space configurations [110, 112, 126]. For a non-minimum phase mixing environment, the requisite demixing system becomes unstable due to the presence of poles outside the unit circle; these unstable poles are needed to cancel the non-minimum phase transmission zeros of the environment. To avoid this instability, the natural gradient algorithm may be derived under the constraint that the demixing system is a double-sided FIR filter, i.e., instead of determining the IIR inverse of the environment, the inverse is approximated by an all-zero non-causal filter [11, 129]. Double-sided filters have been shown to approximate IIR filters adequately, at least in magnitude, with a certain associated delay [22].

The minimum phase algorithms have lower computational requirements and exhibit asymptotically better convergence characteristics than the algorithm for non-minimum phase mixing environments when applied to the same class of minimum phase systems. The minimum phase algorithm is also more robust with respect to the choice of the order of the demixing/deconvolving filter per channel and to the multi-source distribution composition of the mixtures, which may include at most one purely gaussian source [33]. The non-minimum phase algorithm gives good performance for non-gaussian mixtures and/or the non-gaussian content of a complex multi-source distribution mixture. A gaussian source, although separated from the other sources, may still be auto-convolved (note that a sum of scaled copies of a gaussian signal remains gaussian). Also note that the algorithms presented in this chapter focus primarily on the adaptive determination of zeros. The poles are kept fixed, either at zero or at randomly selected stable locations; otherwise they are assumed known from other methods, e.g., adaptive state estimation techniques such as recursive least squares (RLS) or the Kalman filter [155, 157].

3.3 Blind Source Recovery Algorithms for Minimum Phase Environments

3.3.1 Theorem 3.1: Feedforward Minimum Phase Demixing Network

Assume the (mixing) environment is modeled by a multiple-input, multiple-output (MIMO) minimum phase transfer function. Then the update laws for the zeros of the feedforward state-space demixing network using the natural gradient are given by
$$\Delta C(k) = \eta \left( \left( I_N - \varphi(y(k))\, y^T(k) \right) C(k) - \varphi(y(k))\, X^T(k) \right) \qquad (3.38)$$
$$\Delta D(k) = \eta \left( I_N - \varphi(y(k))\, y^T(k) \right) D(k) \qquad (3.39)$$
where $\varphi(y)$ is the nonlinear score function given by (3.37).

3.3.1.1 Derivation of Update Laws for Feedforward BSR

For the feedforward demixing network, the linear state-space representation is assumed to be as in (3.26)-(3.27). A formal formulation has been developed for deriving the feedforward update laws using the output equation (3.27) and natural gradient learning.
This leads to modified update laws for the matrices C and D on the Riemannian manifold. These new update laws, based on the natural gradient, have better convergence performance than those based on the stochastic gradient, (3.35) and (3.36). In the following derivation the instantaneous time index $k$ is dropped for convenience.

Defining the augmented vectors $\bar{y}$ and $\bar{x}$ and the matrix $\bar{W}$ as
$$\bar{y} = \begin{bmatrix} y \\ X \end{bmatrix}, \quad \bar{x} = \begin{bmatrix} m \\ X \end{bmatrix}, \quad \bar{W} = \begin{bmatrix} D & C \\ 0 & I_{(L-1)m} \end{bmatrix} \qquad (3.40)$$
the augmented output equation becomes
$$\bar{y} = \bar{W} \bar{x} \qquad (3.41)$$
The update law for this augmented parameter matrix $\bar{W}$ is similar in form to (3.36), the stochastic gradient law for the static mixing case. Thus the update is
$$\Delta \bar{W} = \eta \left[ \bar{W}^{-T} - \varphi(\bar{y})\, \bar{x}^T \right] \qquad (3.42)$$
where, for the general case in which D may not be square (assuming the pseudo-inverse of D exists),
$$\bar{W}^{-T} = \begin{bmatrix} D(D^T D)^{-1} & 0 \\ -C^T D(D^T D)^{-1} & I_{(L-1)m} \end{bmatrix}$$
Factoring out the augmented weight term $\bar{W}^{-T}$, (3.42) can be written as
$$\Delta \bar{W} = \eta \left[ I - \varphi(\bar{y})\, \bar{x}^T \bar{W}^T \right] \bar{W}^{-T} \qquad (3.43)$$
Post-multiplying by $\bar{W}^T \bar{W}$, the update law becomes
$$\Delta \bar{W} = \eta \left[ I - \varphi(\bar{y})\, \bar{x}^T \bar{W}^T \right] \bar{W} \qquad (3.44)$$
Using (3.41), the update rule can be written as
$$\Delta \bar{W} = \eta \left[ I - \varphi(\bar{y})\, \bar{y}^T \right] \bar{W} \qquad (3.45)$$
Writing (3.45) in terms of the original state-space variables gives
$$\Delta \begin{bmatrix} D & C \\ 0 & I_{(L-1)m} \end{bmatrix} = \eta \left( I - \varphi\!\begin{bmatrix} y \\ X \end{bmatrix} \begin{bmatrix} y^T & X^T \end{bmatrix} \right) \begin{bmatrix} D & C \\ 0 & I_{(L-1)m} \end{bmatrix} \qquad (3.46)$$
Considering the update laws for the matrices C and D only, the relation is
$$[\Delta D \;\; \Delta C] = \eta\, [D \;\; C] - \eta\, [\varphi(y) y^T \;\; \varphi(y) X^T] \begin{bmatrix} D & C \\ 0 & I_{(L-1)m} \end{bmatrix}$$
Therefore the final instantaneous update laws for the matrices C and D are
$$\Delta C(k) = \eta_k \left( \left( I_N - \varphi(y(k))\, y^T(k) \right) C(k) - \varphi(y(k))\, X^T(k) \right) \qquad (3.47)$$
$$\Delta D(k) = \eta_k \left( I_N - \varphi(y(k))\, y^T(k) \right) D(k) \qquad (3.48)$$
Equating the lower block row of (3.46), its increments must vanish, which yields
$$\Delta 0 = 0 = -\eta\, \varphi(X)\, y^T D, \quad \text{so that} \quad \varphi(X)\, y^T = 0 \qquad (3.51)$$
Also
$$\Delta I_{(L-1)m} = 0 = I_{(L-1)m} - \varphi(X)\, y^T C - \varphi(X)\, X^T$$
By a rearrangement of terms, the condition is $\varphi(X) y^T C + \varphi(X) X^T = I_{(L-1)m}$; using relation (3.51), it reduces to
$$\varphi(X)\, X^T = I_{(L-1)m} \qquad (3.52)$$

3.3.1.2.1 Remark 3.1: The natural gradient update laws derived in [155] are in exact agreement with the update laws (3.47)-(3.48), see also [107, 110]; however, the derivation approaches and methodologies are different.

3.3.1.2.2 Remark 3.2: The new auxiliary relations (3.51) and (3.52) define supplementary stochastic conditions on the nonlinear "correlation" between the outputs and the states of the demixing network. In this feedforward minimum phase case, these relations do not explicitly appear in the update laws and thus cannot be used directly to simplify (3.47)-(3.48). (In contrast, similar auxiliary conditions derived for the feedback structure do simplify the resulting update laws, see the proof of Theorem 3.2.) For effective convergence of the algorithm, however, these conditions need to be satisfied in a stochastic sense, i.e., $E[\varphi(X) y^T] = 0$ and $E[\varphi(X) X^T] = I_{(L-1)m}$. The first equality signifies that the outputs and the states of the network are "uncorrelated" in this nonlinear sense; it supplements the condition $E[\varphi(y) X^T] \rightarrow 0$, which must be satisfied in (3.47) at convergence. Viewing the states as prior signal information, the causal algorithm necessarily includes only the causal correlation, while the auxiliary condition (3.51) captures the noncausal correlation. The second condition signifies that the states of the demixing network need to form a white process. It is noted that, in the special case of FIR filtering where the states are simply delayed versions of previous measurements, i.e., $X_i(k) = m_{k-i}$, the second condition (3.52) implies that whitening the measurement signals prior to applying the algorithm would enhance the convergence of the update law. These conditions are in addition to the stability conditions of the algorithm [11, 27, 109]. A short sketch of one step of the updates (3.47)-(3.48) is given below.
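The following is a minimal sketch of a single step of the feedforward minimum phase natural gradient updates (3.47)-(3.48), equivalently (3.38)-(3.39). The state X is assumed to be produced by the fixed A, B recursion (3.26), which is not shown; tanh is used as a stand-in score function and all names are illustrative assumptions.

import numpy as np

def ff_min_phase_step(C, D, X, m, eta=1e-3):
    """One natural-gradient step for the feedforward minimum phase network."""
    y = C @ X + D @ m                              # output equation (3.27)
    phi = np.tanh(y)                               # assumed stand-in for the score (3.37)
    I = np.eye(y.shape[0])
    C_new = C + eta * ((I - np.outer(phi, y)) @ C - np.outer(phi, X))   # (3.47)
    D_new = D + eta * (I - np.outer(phi, y)) @ D                        # (3.48)
    return C_new, D_new, y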
3.3.2 Theorem 3.2: Feedback Minimum Phase Demixing Network

Figure 3.3. State Space Feedback Demixing Structure

Assume the MIMO (mixing) environment is modeled by a minimum phase linear dynamic system or transfer function. Then the update laws for the zeros of the feedback state-space demixing network using the natural gradient are given by
$$\Delta D = \eta \left[ (I_N + D) \left( \varphi(y)\, y^T - I_N \right) \right] \qquad (3.53)$$
$$\Delta C = \eta \left[ (I_N + D)\, \varphi(y)\, X^T \right] \qquad (3.54)$$
where $\varphi(y)$ is the appropriate score function given by (3.37).

3.3.2.1 Derivation of Update Laws for Feedback BSR

An alternate BSR structure was presented in [112], in which the demixing network comprises a feedback transfer function, see Figure 3.3. The feedback path is composed of tunable poles and zeros, while the feedforward path is assumed to have fixed parameters. Using only the update laws that adaptively determine the zeros of this feedback path can therefore alter the poles of the overall demixing structure. The network equations for the feedback path are (see Figure 3.3)
$$z(k) = C\, X(k) + D\, y(k) \qquad (3.55)$$
Defining
$$e(k) = m(k) - z(k) \qquad (3.56)$$
the output of the feedback structure is given by $y(k) = H_{ff} * e(k)$. For simplicity, assume $H_{ff} = I$, so that
$$y(k) = m(k) - z(k) \qquad (3.57)$$
Rearranging terms, the output equation becomes
$$(I_N + D)\, y(k) + C\, X(k) = m(k) \qquad (3.58)$$
In matrix form, (3.58) can be written as
$$\begin{bmatrix} I_N + D & C \\ 0 & I_{(L-1)m} \end{bmatrix} \begin{bmatrix} y \\ X \end{bmatrix} = \begin{bmatrix} m \\ X \end{bmatrix} \qquad (3.59)$$
Defining the augmented vectors $\tilde{X}$ and $\tilde{Y}$ and the matrix $\tilde{W}$ as
$$\tilde{X} = \begin{bmatrix} m \\ X \end{bmatrix}, \quad \tilde{Y} = \begin{bmatrix} y \\ X \end{bmatrix}, \quad \tilde{W} = \begin{bmatrix} I_N + D & C \\ 0 & I_{(L-1)m} \end{bmatrix}$$
(3.59) can be expressed in the compact form
$$\tilde{X} = \tilde{W} \tilde{Y}, \quad \text{or} \quad \tilde{Y} = \bar{W} \tilde{X} \quad \text{where} \quad \bar{W} = \tilde{W}^{-1} \qquad (3.60)$$
Using the natural gradient, the update law for $\bar{W}$ is
$$\Delta \bar{W} = \eta \left[ \bar{W}^{-T} - \varphi(\tilde{Y}) \tilde{X}^T \right] \bar{W}^T \bar{W}, \quad \text{or} \quad \Delta \bar{W} = \eta \left[ I_{M+N} - \varphi(\tilde{Y}) \tilde{X}^T \bar{W}^T \right] \bar{W} \qquad (3.61)$$
Differentiating $\tilde{W} \bar{W} = I$,
$$\tilde{W}' \bar{W} + \tilde{W} \bar{W}' = 0$$
and rearranging terms,
$$\bar{W}' = -\tilde{W}^{-1} \tilde{W}' \bar{W} = -\tilde{W}^{-1} \tilde{W}' \tilde{W}^{-1} \qquad (3.62)$$
Using a first-order Euler approximation for the derivatives in (3.62), the variation in $\bar{W}$ can be expressed in terms of the variation of the parameter matrix $\tilde{W}$ as
$$\Delta \bar{W} = -\tilde{W}^{-1} (\Delta \tilde{W})\, \tilde{W}^{-1} \qquad (3.63)$$
Also note that, from (3.60),
$$\tilde{Y}^T = \tilde{X}^T \bar{W}^T \qquad (3.64)$$
Using (3.63) and (3.64), the update law (3.61) becomes
$$-\tilde{W}^{-1} (\Delta \tilde{W})\, \tilde{W}^{-1} = \eta \left[ I - \varphi(\tilde{Y}) \tilde{Y}^T \right] \bar{W}, \quad \text{or} \quad -\tilde{W}^{-1} \Delta \tilde{W} = \eta \left[ I - \varphi(\tilde{Y}) \tilde{Y}^T \right], \quad \text{since } \bar{W} \tilde{W} = I.$$
Arranging terms,
$$\Delta \tilde{W} = -\eta\, \tilde{W} \left[ I - \varphi(\tilde{Y}) \tilde{Y}^T \right] = \eta\, \tilde{W} \left[ \varphi(\tilde{Y}) \tilde{Y}^T - I \right] \qquad (3.65)$$
Inserting the definitions of $\tilde{W}$ and $\tilde{Y}$, (3.65) becomes
$$\Delta \begin{bmatrix} I_N + D & C \\ 0 & I_{(L-1)m} \end{bmatrix} = \eta \begin{bmatrix} I_N + D & C \\ 0 & I_{(L-1)m} \end{bmatrix} \begin{bmatrix} \varphi(y) y^T - I_N & \varphi(y) X^T \\ \varphi(X) y^T & \varphi(X) X^T - I_{(L-1)m} \end{bmatrix}$$
The update laws for the matrices C and D are extracted by equating the first block rows:
$$\Delta(I_N + D) = \Delta D = \eta \left[ (I_N + D)\left( \varphi(y) y^T - I_N \right) + C\, \varphi(X) y^T \right] \qquad (3.66)$$
$$\Delta C = \eta \left[ (I_N + D)\, \varphi(y) X^T + C\left( \varphi(X) X^T - I_{(L-1)m} \right) \right]$$
Equating the second block rows yields the auxiliary stochastic conditions $\varphi(X) y^T = 0$ and $\varphi(X) X^T = I_{(L-1)m}$, analogous to (3.51)-(3.52). In this feedback structure, these conditions do simplify the update laws, which reduce to
$$\Delta D = \eta \left[ (I_N + D)\left( \varphi(y) y^T - I_N \right) \right], \qquad \Delta C = \eta \left[ (I_N + D)\, \varphi(y) X^T \right]$$
i.e., exactly the feedback update laws (3.53)-(3.54) of Theorem 3.2.

For the non-minimum phase case, the demixing network is taken to be a doubly finite multichannel FIR filter $W(z) = \sum_{i=0}^{L-1} W_i z^{-i}$, with $W_0 = D$ and $W_i = C_i$, $i = 1, \ldots, L-1$, and the natural gradient update for $W(z)$ takes the form of (3.84). The first term on the right-hand side of (3.84) can be simplified as
$$\left[ W^{-1}(z^{-1}) \right]^T W^T(z^{-1})\, W(z) = \left[ W(z^{-1})\, W^{-1}(z^{-1}) \right]^T W(z) = W(z) \qquad (3.85)$$
while the second term, using equality (3.75), simplifies to
$$\varphi(y_k) \sum_{i=0}^{L-1} m_{k-i}^T\, z^{-i}\; W^T(z^{-1})\, W(z) = \varphi(y_k) \sum_{i=0}^{L-1} u_{k-i}^T\, z^{-i} \qquad (3.86)$$
where $u_k^T$ is the transpose of the non-causal back-propagation filter output, defined as
$$u_k \triangleq \left[ W^T(z^{-1}) \right] y_k \triangleq D^T y_k + \sum_{i=1}^{L-1} C_i^T\, y_{k+i} \qquad (3.87)$$
Let $\underline{y}_k$ denote the $NL \times 1$ vector of outputs at the $k$-th iteration, which stacks the $k$-th and the following $L-1$ N-dimensional output vectors, i.e.,
$$\underline{y}_k \triangleq \left[ y_k^T \;\; y_{k+1}^T \;\; \cdots \;\; y_{k+L-1}^T \right]^T$$
Then the back-propagation filter can also be expressed in the compact form
$$u(k) = \bar{W}^T \underline{y}_k \qquad (3.88)$$
Using (3.85) and (3.86) in (3.84), we obtain the following update law for the MIMO FIR transfer function:
$$\Delta W(z) = \eta_k \left[ W(z) - \sum_{i=0}^{L-1} \varphi(y_k)\, u_{k-i}^T\, z^{-i} \right] \qquad (3.89)$$
The update law (3.89) is non-causal, since the computation of $u_{k-i}$, $i = 0, 1, \ldots, L-1$, requires up to $L-1$ future outputs to be available. In practice the update law can be implemented (made causal) by introducing a delay of $L-1$ iterations between the update of the parameters and the computation of the output at the $k$-th iteration (this amounts to using frame-buffering to initialize the update law, as is done in practice). Thus, a causal version of (3.89) is
$$\left[ \Delta W(z) \right]_C = \eta_k \left[ W(z) - \sum_{i=0}^{L-1} \varphi(y_{k-L+1})\, u_{k-L+1-i}^T\, z^{-i} \right] \qquad (3.90)$$
Writing the update law in the time domain for the $i$-th coefficient matrix $W_i$, we have
$$\left[ \Delta W_i \right]_C = \eta_k \left[ W_i - \varphi(y_{k-L+1})\, u_{k-L+1-i}^T \right] \qquad (3.91)$$
Defining the new time index $\bar{k} = k - L + 1$, we have
$$\left[ \Delta W_i \right]_C = \eta_k \left[ W_i - \varphi(y_{\bar{k}})\, u_{\bar{k}-i}^T \right] \qquad (3.92)$$
where the network output $y(\bar{k})$ and the back-propagation filter output $u(\bar{k})$ are computed as
$$y(\bar{k}) \triangleq y(k-L+1) = W \underline{x}_{k-L+1} = \sum_{i=1}^{L-1} C_i\, m(k-L+1-i) + D\, m(k-L+1) \qquad (3.93)$$
$$u(\bar{k}) \triangleq u(k-L+1) = \bar{W}^T \underline{y}_{k-L+1} = \sum_{i=1}^{L-1} C_i^T\, y(k-L+1+i) + D^T y(k-L+1) \qquad (3.94)$$
Therefore the explicit causal update laws for the state-space matrices C and D, using the time index $\bar{k}$, are
$$\Delta C_i = \eta_k \left[ C_i - \varphi(y(\bar{k}))\, u(\bar{k}-i)^T \right], \quad i = 1, 2, \ldots, L-1 \qquad (3.95)$$
$$\Delta D = \eta_k \left[ D - \varphi(y(\bar{k}))\, u(\bar{k})^T \right] \qquad (3.96)$$
In the state-space regime, the information back-propagation filter assumes the form
$$\tilde{x}(\bar{k}-1) = A^T \tilde{x}(\bar{k}) + C^T y(\bar{k}) \qquad (3.97)$$
$$u(\bar{k}) = B^T \tilde{x}(\bar{k}) + D^T y(\bar{k}) \qquad (3.98)$$
The non-minimum phase natural gradient algorithms [14, 126, 129, 130, 154-156] require both forward and backward in time propagation through the adaptive MIMO FIR filter. A buffer of $3L-2$ input samples is required to initiate the adaptation according to these update laws.

The non-minimum phase state-space BSR update laws, being more general, also encompass the minimum phase mixing network case. This can be verified conveniently by using definition (3.87) in the update law for the matrix D.
Note that, without any loss of generality, the delay introduced in the update law (3.96) to make it causal is ignored below for clarity:
$$\begin{aligned} \Delta D &= \eta_k \left[ D - \varphi(y(k))\, u(k)^T \right] \\ &= \eta_k \left[ D - \varphi(y(k)) \left( D^T y(k) + \sum_{i=1}^{L-1} C_i^T\, y(k+i) \right)^{\!T} \right] \\ &= \eta_k \left[ D - \varphi(y(k)) \left( y(k)^T D + \sum_{i=1}^{L-1} y(k+i)^T C_i \right) \right] \\ &= \eta_k \left[ \left( I - \varphi(y(k))\, y(k)^T \right) D - \varphi(y(k)) \sum_{i=1}^{L-1} y(k+i)^T C_i \right] \end{aligned} \qquad (3.99)$$
The first term on the right-hand side of (3.99) comprises the instantaneous minimum phase update of the matrix D (see (3.48)), while the second term constitutes the "non-causal" update term corresponding to the back-propagating processing required for the double-sided FIR demixing filter in the case of non-minimum phase environments. Similarly, the update law (3.95) can be expanded using (3.80)-(3.87) to show the explicit relationship to the minimum phase feedforward update law (3.47).

3.4.1.1.1 Remark 3.4: It should be noted that Equation (3.84) holds for the doubly infinite filter space. For an FIR space domain, it does not satisfy a Lie group in the sense of the formulation in [154]. In that event, the matrix product in (3.84) is defined differently and consequently a similar, but different, update law results.

3.5 Simulation Results

MATLAB simulation results for the three linear BSR algorithms are presented. In all three cases the environment comprises a 3 x 3 IIR mixing/convolving filter with the following model [110, 112, 155]:
$$\sum_{i=0}^{m-1} A_i\, m(k-i) = \sum_{i=0}^{n-1} B_i\, s(k-i) + v(k) \qquad (3.100)$$
where $A_i$, $B_i$ are the coefficient matrices of the $i$-th tap of the autoregressive and moving average sections of the filter respectively, $v(k)$ is additive gaussian measurement noise, and $m$, $n$ are the lengths of the autoregressive and moving average lags of the filter.

The demixing network for each simulation can be initialized in a number of ways, depending on the available environment information, which may be collected via auxiliary means [155, 157]. This may include complete, partial, or no knowledge of the poles of the mixing environment [126, 135, 139]. The primary advantage of incorporating knowledge of the poles in a BSR algorithm is a more compact demixing network representation. All three cases have been verified via simulation, and the algorithms converged satisfactorily using the online update algorithms. To make efficient use of space, only typical results are presented here, see [110, 112, 128-130, 135, 139, 140]. Unless otherwise specified, the original sources had sub-gaussian, gaussian, and super-gaussian densities respectively.

The convergence performance of the BSR algorithms, for linear and convolutive mixing, is measured using the multichannel intersymbol interference (MISI) benchmark. MISI measures how close the global transfer function comes to a diagonalization-and-permutation structure, as achieved by the demixing network, and is defined as
$$\text{MISI}_k = \sum_{i=1}^{N} \frac{\sum_{j}\sum_{p} |G_{ij}^{p}| - \max_{p,j} |G_{ij}^{p}|}{\max_{p,j} |G_{ij}^{p}|} + \sum_{j=1}^{N} \frac{\sum_{i}\sum_{p} |G_{ij}^{p}| - \max_{p,i} |G_{ij}^{p}|}{\max_{p,i} |G_{ij}^{p}|} \qquad (3.101)$$
where $G^{p}$ denotes the $p$-th impulse-response tap of the global transfer function $G(z) = W(z) * H(z)$, $H(z) = [A_e, B_e, C_e, D_e]$ is the transfer function of the environment, and $W(z) = [A, B, C, D]$ is the transfer function of the network.
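The following is a small sketch of the MISI benchmark (3.101), assuming the global transfer function is supplied as an array of impulse-response taps; the shape convention (taps, outputs, sources) and the function name are assumptions for illustration.

import numpy as np

def misi(G):
    """Multichannel intersymbol interference, eq. (3.101).

    G : array of shape (P, N, N); the P impulse-response taps of G(z) = W(z)*H(z).
    """
    absG = np.abs(G)
    total_row = absG.sum(axis=(0, 2))      # per output i: sum over taps p and sources j
    max_row = absG.max(axis=(0, 2))        # per output i: largest single tap magnitude
    total_col = absG.sum(axis=(0, 1))      # per source j
    max_col = absG.max(axis=(0, 1))
    return np.sum(total_row / max_row - 1.0) + np.sum(total_col / max_col - 1.0)

# a perfectly diagonalized, single-tap global transfer function gives MISI = 0
print(misi(np.eye(3)[None, :, :]))

The index is zero exactly when each output contains one source at a single tap, and it grows with residual cross-channel and intersymbol leakage.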
As discussed earlier, the optimal score function for the derived update laws depends on the density functions of the sources to be separated, which, upon successful convergence of the algorithm, are similar to the densities of the separated outputs. For the simulations, a hyperbolic score function [126, 127, 130, 136] has been used, which relies on the batch kurtosis of the outputs of the demixing system. This score function, given by (3.102), converges to the optimal nonlinearity for the demixing system as the network's outputs approach stochastic independence:
$$\varphi(y_i) = \begin{cases} y_i - a \tanh(b\, y_i), & K_4(y_i) \leq -\tau \\ y_i, & |K_4(y_i)| < \tau \\ y_i + a \tanh(b\, y_i), & K_4(y_i) \geq \tau \end{cases} \qquad (3.102)$$
where $K_4(y_i)$ is the batch kurtosis of the $i$-th output and $\tau$ is a small positive threshold.

Figure 3.11. Minimum Phase Feedback BSR - All poles set at zero: (a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index

Figure 3.12. Minimum Phase Feedback BSR - All poles initialized arbitrarily: (a) Demixing Network Transmission Pole-Zero Map, (b) Convergence of MISI Index

3.5.1.3 Simulation III: Feedforward Recovery of Non-minimum Phase Mixing

For the non-minimum phase case, we again choose $m = n = 3$ for the IIR mixing environment model (3.100). In this case, the coefficient matrices are given by
$$A_0 = \begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & -1 & 1 \end{bmatrix}, \quad A_1 = \begin{bmatrix} 0.5 & 0.8 & -0.7 \\ 0.8 & 0.3 & -0.2 \\ -0.1 & -0.5 & 0.4 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0.06 & 0.4 & -0.5 \\ 0.16 & -0.1 & -0.4 \\ -0.3 & -0.06 & 0.3 \end{bmatrix}$$
$$B_0 = \begin{bmatrix} 1 & 0.6 & 0.8 \\ 0.5 & 1 & 0.7 \\ 0.6 & 0.8 & 1 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 0.5 & 0.7 & 0.16 \\ 0.7 & 0.2 & -0.3 \\ -0.2 & 0.53 & 0.6 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0.425 & 0.3 & 0.7 \\ -0.1 & 0 & -0.4 \\ 0.08 & -0.13 & 0.3 \end{bmatrix}$$
The theoretical inverse of this environment model is an unstable IIR filter with a minimum of 20 states; the instability stems from two poles of the intended demixing network lying outside the unit circle. Instead, the demixing network is set up as a doubly-finite 3 x 3 FIR demixing filter with 31 taps per filter (i.e., a total of 90 states) and supplied with mixtures of multiple source distributions. The polynomial matrix W = [D C] is initialized to be full rank, while A and B are in canonical form I. Instantaneous update results after 40,000 iterations are presented in Figures 3.13-3.16. A close comparison of the theoretical environment inverse and the adaptively estimated inverse reveals that the theoretical result forms a part of the much larger (in taps) estimated inverse.

Figure 3.13. Non-minimum Phase Mixing Environment: (a) Environment Transmission Zeros and Poles, (b) Impulse Response of the Environment Model
Figure 3.14. Non-minimum Phase Mixing Environment, Impulse Response of (a) Theoretical Environment Inverse, (b) Estimated Demixing Network

Figure 3.15. Non-minimum Phase Mixing Environment: (a) Final Global Transfer Function, (b) Demixing Network Transmission Zeros and Poles

Figure 3.16. Non-minimum Phase Mixing Environment, Convergence of MISI Performance Index

A quantitative performance comparison of the blindly recovered communication signals using the presented BSR algorithms is provided in Table 3.1.

Table 3.1. Quantitative Signal Recovery Comparison of the BSR Algorithms (ISI in dB)

                     Minimum Phase Feedforward Demixing   Minimum Phase Feedback Demixing    Non-minimum
                     Poles     Poles fixed  Random,       Poles     Poles fixed  Random,     Phase
                     known     at zero      stable fixed  known     at zero      stable fixed Demixing
AM Signal            -22.55    -21.77       -26.02        -27.67    -16.47       -23.42       -17.67
PSK Signal           -26.66    -18.48       -23.52        -23.52    -18.85       -24.85       -11.43
White Noise          -21.09    -21.85       -21.57        -26.03    -18.36       -23.96       -11.39

3.6 Remarks

This chapter presents a generalized Blind Source Recovery (BSR) framework based on the theory of optimization under the constraints of a state-space representation, with the Kullback-Leibler divergence as the performance functional. The framework is then specialized to state-space blind source recovery for both static and dynamic environments, which may possess either a minimum phase or a non-minimum phase structure.
For the case of minimum phase environments, two possible state-space demixing networks, in feedforward and feedback configuration, are proposed. For non-minimum phase (convolutive) mixing, to avoid instability the demixing network is set up as a non-causal, all-zero feedforward state-space network. The update laws for all cases have been derived using the natural (or Riemannian contravariant) gradient.

The algorithms have been extensively simulated on both synthetic and physical data. The simulation results for all cases use mixtures of multiple source densities, including one gaussian source, and the algorithms succeed in recovering the sources in all cases. Furthermore, for minimum phase environments, various possible initial conditions have been explored. A comparative table for a set of communication signals has also been provided as a guideline to the separation/recovery results achievable with the proposed online adaptive algorithms.

Chapter 4
BLIND SOURCE RECOVERY USING ADAPTIVE SCORE FUNCTIONS

This chapter discusses Blind Source Recovery (BSR) of minimum phase and non-minimum phase mixtures of multiple source distributions using adaptive score functions. For linear static and dynamic convolutive mixtures, the ability to adaptively determine discerning nonlinear score functions allows recovery of the original sources independently of (and even in the absence of) precise environment identifiability.

The performance of BSR algorithms strongly depends on the choice of an appropriate score function, which appears as an element-wise nonlinearity acting on the output signals [1, 3, 14, 22, 24, 29, 30]. For a particular problem, the optimal score function depends on the distributions of the original source signals, which are unknown in a blind scenario. In such cases, unless some assumptions about the source distributions are made, any BSR algorithm will potentially be unable to deliver the desired performance. Adaptive estimation of appropriate score functions is therefore very attractive from a practical implementation viewpoint. The proposed score functions have been applied to BSR for both minimum phase and non-minimum phase mixing environment models [4, 10, 11, 16].

Several closed-form, entropy-expansion based deterministic expressions for adaptive score functions have been proposed earlier, but they are typically suited to a limited range of distributions and converge to the true estimating score function for static mixtures only [2, 5, 6, 21]. Most mixtures encountered in practical BSR problems come from sources with a variety of non-gaussian distributions, e.g., speech and acoustics, communication signals, internet data, interstellar signals, biomedical signals, etc. On the other hand, most noise phenomena or unidentified sources are assumed to possess gaussian distributions. This leads to practical situations in which one has to cope with multiple source distribution mixtures that include gaussian distributions.

This chapter describes the use of the proposed adaptive score functions for the linear convolutive class of BSR. Two adaptive score functions are presented. The first is derived from the generalized gaussian distribution model [7, 12, 26]. The second is based on proposed new hyperbolic source models, which extend the current Pearson and hyper-Cauchy distribution models [12].
The proposed score functions can be tuned online by certain control parameters. An adaptive algorithm that estimates these control parameters, and thus tunes the score functions, using the batch kurtosis of the BSR output is also presented. The primary advantage of the proposed online parameter estimation is that it renders the adaptive estimation of the demixing network completely blind: no a priori information about the distribution structure of the original sources is required. Simulation examples verifying the proposed BSR formulation (inclusive of the proposed score function estimation technique) are included as an illustration of the proposed framework.

4.1 Generalized Gaussian Distribution Model

The generalized gaussian distribution class was proposed by Miller et al. [17] and used in detection theory as a model for non-gaussian noise. This class has a symmetric, unimodal density characteristic; various classes of densities are generated by parameterizing the exponential decay of the density function. The generalized gaussian family has the density
$$f(x_i, \sigma_i, \alpha_i) = \frac{\alpha_i\, \beta_i(\sigma_i, \alpha_i)}{2\, \Gamma\!\left(\tfrac{1}{\alpha_i}\right)} \exp\!\left\{ -\left( \beta_i(\sigma_i, \alpha_i)\, |x_i| \right)^{\alpha_i} \right\} \qquad (4.1)$$
where $\alpha_i > 0$ is a positive parameter that controls the distribution's exponential rate of decay, $\sigma_i^2$ is the variance of the distribution, $\Gamma(\cdot)$ is the gamma function,
$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt \qquad (4.2)$$
and $\beta_i(\sigma_i, \alpha_i)$ is a scaling function of $\sigma_i$ and $\alpha_i$, defined as
$$\beta_i(\sigma_i, \alpha_i) \triangleq \frac{1}{\sigma_i} \left[ \frac{\Gamma\!\left(\tfrac{3}{\alpha_i}\right)}{\Gamma\!\left(\tfrac{1}{\alpha_i}\right)} \right]^{1/2} \qquad (4.3)$$
This generalized gaussian family encompasses a wide variety of distributions. For $\alpha_i = 1$, the model reduces to the laplacian probability density function
$$f(x_i, 1, \sigma_i) = \frac{\beta_i(\sigma_i, 1)}{2} \exp\!\left\{ -\beta_i(\sigma_i, 1)\, |x_i| \right\} \qquad (4.4)$$
For $\alpha_i = 2$, the result is the gaussian probability density function
$$f(x_i, 2, \sigma_i) = \frac{\beta_i(\sigma_i, 2)}{\sqrt{\pi}} \exp\!\left\{ -\left( \beta_i(\sigma_i, 2)\, |x_i| \right)^2 \right\} \qquad (4.5)$$

Figure 4.1. Some members of the Generalized Gaussian Distribution Family (sigma squared = 1): Gamma (alpha = 0.5), Laplacian (alpha = 1), Gaussian (alpha = 2), Uniform (alpha approaching infinity)

As $\alpha_i$ tends to infinity, the distribution approaches the uniform density function (4.6). A leptokurtic distribution (Fisher kurtosis > 0) represents the super-gaussian family of distributions, while a platykurtic distribution (Fisher kurtosis < 0) represents the family of sub-gaussian distributions [12, 13]. For the generalized gaussian family, the generalized Pearson kurtosis simplifies to
$$K_P(x_i) = \frac{\Gamma\!\left(\tfrac{5}{\alpha_i}\right) \Gamma\!\left(\tfrac{1}{\alpha_i}\right)}{\Gamma\!\left(\tfrac{3}{\alpha_i}\right)^2} \qquad (4.11)$$
Therefore, the standard Fisher kurtosis can be written as a function of the shaping parameter $\alpha_i$ only, i.e.,
$$K_f(x_i) = \frac{\Gamma\!\left(\tfrac{5}{\alpha_i}\right) \Gamma\!\left(\tfrac{1}{\alpha_i}\right)}{\Gamma\!\left(\tfrac{3}{\alpha_i}\right)^2} - 3 \qquad (4.12)$$
Note that for $\alpha_i = 2$ the distribution reduces to the true gaussian with zero kurtosis. The complete kurtosis characteristics of the generalized gaussian family are shown in Figure 4.2. Since the generalized gaussian family can represent only unimodal distributions, the Fisher kurtosis approaches approximately -1.2 as $\alpha_i$ approaches infinity (i.e., the Fisher kurtosis of a flat or uniform distribution).

Figure 4.2. Fisher Kurtosis Characteristics for the Generalized Gaussian Family
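The following is a minimal sketch evaluating the generalized gaussian density (4.1)-(4.3) and its Fisher kurtosis (4.12) for a few values of the shaping parameter; it uses only the standard-library gamma function, and the function names and sample parameter values are illustrative assumptions.

import numpy as np
from math import gamma

def gg_beta(sigma, alpha):
    # scaling function beta(sigma, alpha), eq. (4.3)
    return (1.0 / sigma) * np.sqrt(gamma(3.0 / alpha) / gamma(1.0 / alpha))

def gg_pdf(x, sigma=1.0, alpha=2.0):
    # generalized gaussian density, eq. (4.1)
    b = gg_beta(sigma, alpha)
    return (alpha * b / (2.0 * gamma(1.0 / alpha))) * np.exp(-(b * np.abs(x)) ** alpha)

def gg_fisher_kurtosis(alpha):
    # Fisher (excess) kurtosis as a function of alpha only, eq. (4.12)
    return gamma(5.0 / alpha) * gamma(1.0 / alpha) / gamma(3.0 / alpha) ** 2 - 3.0

for a in (0.5, 1.0, 2.0, 8.0):
    print(f"alpha = {a:4.1f}  kurtosis = {gg_fisher_kurtosis(a):7.3f}  f(0) = {gg_pdf(0.0, 1.0, a):.3f}")

The printed values reproduce the familiar landmarks: excess kurtosis 3 for the laplacian (alpha = 1), 0 for the gaussian (alpha = 2), and a value approaching -1.2 as alpha grows.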
4.1.2 Relationship between beta_i(sigma_i, alpha_i) and alpha_i

Imposing the statistical constraint that the generalized gaussian density integrates to one,
$$\int_{-\infty}^{\infty} f(x_i, \sigma_i, \alpha_i)\, dx_i = 1 \qquad (4.13)$$
the relation between $\beta_i(\sigma_i, \alpha_i)$ and $\alpha_i$ can be estimated to be
$$\beta_i(\sigma_i, \alpha_i)^{-\alpha_i} = \alpha_i\, E\!\left\{ |x_i|^{\alpha_i} \right\} \qquad (4.14)$$
This relationship is helpful in determining the final form of the generalized gaussian score function.

4.1.3 Derivation of the Generalized Gaussian Adaptive Score Function (GGASF)

Because of the highly parameterized nature of the generalized gaussian family and its ability to model a variety of unimodal source densities of interest to blind source recovery, a parametric score function (nonlinearity) can be derived from this family. This comprehensive score function inherits a convenient parametric structure from its parameterized parent density; in an adaptive setting, the regulation parameter $\alpha_i$ can be adapted during the source recovery algorithm. The element-wise nonlinear score function required for the BSR problem has the definition [1, 3, 22, 24, 29, 30]
$$\varphi = -\frac{\partial \log p(y)}{\partial y} = \left[ \varphi_1(y_1) \;\; \varphi_2(y_2) \;\; \cdots \;\; \varphi_n(y_n) \right]^T \qquad (4.15)$$
For the $i$-th output with an exponential-type density function, this simplifies to
$$\varphi_i(y_i) = -\frac{p_i'(y_i)}{p_i(y_i)} \qquad (4.16)$$
where $y_i$ is the $i$-th output of the network and $p_i(y_i)$ its statistical probability density. Using the generalized gaussian family as the candidate density in (4.16),
$$\varphi_i(y_i) = -\frac{\partial}{\partial y_i} \log \left[ \frac{\alpha_i \beta_i(\sigma_i, \alpha_i)}{2 \Gamma\!\left(\tfrac{1}{\alpha_i}\right)} \exp\!\left\{ -\left( \beta_i(\sigma_i, \alpha_i) |y_i| \right)^{\alpha_i} \right\} \right] \qquad (4.17)$$
and by simple calculus the relation reduces to
$$\varphi_i(y_i) = \alpha_i\, \beta_i(\sigma_i, \alpha_i)^{\alpha_i}\, |y_i|^{\alpha_i - 1}\, \frac{\partial |y_i|}{\partial y_i} \qquad (4.18)$$

Figure 4.3. Scaled Generalized Gaussian Score Function for different values of the parameter alpha

Using (4.14) and the definition $|y_i| = y_i\, \mathrm{sign}(y_i)$, the score function is given by
$$\varphi_i(y_i) = \frac{|y_i|^{\alpha_i - 1}\, \mathrm{sign}(y_i)}{E\!\left\{ |y_i|^{\alpha_i} \right\}} \qquad (4.19)$$
where an appropriate choice of $\alpha_i$ makes the proposed score function (or nonlinearity) suitable for a particular source distribution. This characterization of the source distribution can be done adaptively using the Fisher kurtosis estimate of the batch output of the BSR network. Figure 4.3 shows some members of the generalized gaussian score function family as a function of $\alpha_i$.

4.2 Hyperbolic Source Density Models

In this section, new probability density models are first proposed and their relationship to existing well-known density models is established. A score function is then derived for each of the proposed density models; a combination of these two score functions yields the proposed hyperbolic adaptive score function [30]. Interestingly, the proposed score function is quite similar to our earlier adaptive score function based on heuristic justifications, see Waheed et al. [28].

4.2.1 Background

4.2.1.1 Pearson Distribution Model

A simple mixture of two gaussian densities gives the Pearson density model [18, 19], i.e.,
$$f(x_i) = (1 - a_i)\, f_1(x_i) + a_i\, f_2(x_i) \qquad (4.20)$$
where
$$f_i(x_i) = \frac{1}{\sqrt{2\pi \sigma_i^2}} \exp\!\left\{ -\frac{(x_i - \mu_i)^2}{2 \sigma_i^2} \right\}$$
represents a gaussian pdf.
For the symmetric case of the Pearson model, i.e., $a_i = \tfrac{1}{2}$, $|\mu_1| = |\mu_2| = |\mu|$ and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the Pearson model possesses two distinct modes for $\mu > 0$; for $\mu = 0$ the density reduces to the regular gaussian density. This model can effectively represent gaussian and sub-gaussian densities, including bimodal (binary) densities. For the symmetric case, the Pearson density can be expressed as
$$f(x_i) = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\!\left\{ -\frac{\mu^2}{2\sigma_i^2} \right\} \exp\!\left\{ -\frac{x_i^2}{2\sigma_i^2} \right\} \cosh\!\left( \frac{\mu\, x_i}{\sigma_i^2} \right) \qquad (4.21)$$

Figure 4.4. Fisher Kurtosis Characteristics for the symmetric Pearson Model as a function of mu

The Fisher kurtosis [12, 13] of the symmetric Pearson distribution model is given by
$$K_f(\mu, \sigma^2) = \frac{-2\mu^4}{\left( \mu^2 + \sigma^2 \right)^2} \qquad (4.22)$$
It is evident that for $\mu > 0$ the kurtosis of the distribution is strictly negative, see Figure 4.4, making it a suitable model for sub-gaussian to gaussian distributions only.

4.2.1.2 Generalized Cauchy Distribution

Symmetric super-gaussian density models are best represented by the generalized Cauchy density. The generalized standard Cauchy density is given by [12, 18]
$$f(x_i) = \frac{k_i}{\left( 1 + x_i^2 \right)^{m_i}} \qquad (4.23)$$
where $m_i$ is the order of the Cauchy model (it determines the kurtosis of the distribution) and $k_i$ is a scaling parameter that satisfies the constraints of a pdf, e.g., for $m_i = 1$, $k_i = \tfrac{1}{\pi}$. The Cauchy distribution is symmetric with heavy tails and a single peak at the center. Because of the heavier tails, its kurtosis is much larger than that of a normal distribution; in short, the kurtosis for a generalized Cauchy model ($m_i > 0$) is strictly positive and has an infinite range.

4.2.2 Proposed Hyperbolic Distribution Models

4.2.2.1 Hyperbolic Sub-Gaussian Model

An alternate model is proposed based on the simplified Pearson model in (4.21). Not only can this model represent the entire kurtosis range of the original Pearson model; because of an added degree of freedom in the hyperbolic component, it can also model an additional range of bimodal densities that the original Pearson model cannot. The proposed model takes the form of a gaussian density multiplied by a hyperbolic cosine factor, $f_{\mathrm{sub}}(y_i) = N(y_i, \sigma_i^2)\, \cosh(\cdot)$ (4.24).

Figure 4.8. Proposed Adaptive Hyperbolic Score Function: some possible combinations

4.3 Adaptive Estimation of the Fisher (or Normalized) Kurtosis

To obtain an efficient adaptive estimate of the Fisher (normalized) kurtosis [12, 32] of the BSR output, a batch update algorithm is employed. One simple and efficient formulation adaptively estimates the output Fisher kurtosis using a forgetting factor:
$$K_i(n+1) = \gamma_i\, K_i(n) + (1 - \gamma_i)\, K_f(y_i(n)) \qquad (4.29)$$
where $K_i(n)$ is the adaptively determined Fisher kurtosis of the $i$-th output at the $n$-th iteration, $\gamma_i$ is the forgetting factor for the adaptive kurtosis determination of the $i$-th output, and $K_f(y_i(n))$ is the normalized (Fisher) kurtosis estimate of the $n$-th batch of the $i$-th output, computed as
$$K_f(y_i(n)) = \frac{E\!\left[ (y_i - \mu_i)^4 \right]}{\sigma_i^4} - 3 \qquad (4.30)$$
where $y_i$ denotes a batch of length N of the $i$-th BSR network output, and $\mu_i$, $\sigma_i^2$ are the mean and variance of that batch, respectively.
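The following is a minimal sketch of the running kurtosis estimator (4.29)-(4.30). The batch length, forgetting factor, and the use of a laplacian test signal are illustrative assumptions.

import numpy as np

def batch_fisher_kurtosis(y_batch):
    # normalized (Fisher) kurtosis of one output batch, eq. (4.30)
    mu = np.mean(y_batch)
    var = np.var(y_batch)
    return np.mean((y_batch - mu) ** 4) / var ** 2 - 3.0

def update_kurtosis(K_prev, y_batch, gamma=0.9):
    # running kurtosis estimate with forgetting factor, eq. (4.29)
    return gamma * K_prev + (1.0 - gamma) * batch_fisher_kurtosis(y_batch)

# usage: track the kurtosis of one demixing output over successive batches
rng = np.random.default_rng(1)
K = 0.0
for n in range(50):
    y = rng.laplace(size=256)          # stand-in for the n-th output batch
    K = update_kurtosis(K, y, gamma=0.9)
print("estimated kurtosis (laplacian target is about 3):", round(K, 2))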
4.4 Parameter Regulation for the Proposed Adaptive Score Functions

Using the estimated output Fisher kurtosis (4.29), one can discriminate between the three possible cases of gaussian, sub-gaussian, and super-gaussian distributions. This is done by appropriately selecting the score function shaping parameters of both proposed adaptive score functions. Such discrimination is effective for source recovery from linear convolutive mixtures, where a merely discriminating score function is adequate to separate sources with different distributions [8, 15, 26, 28, 30, 31, 33]. For nonlinear BSR, on the other hand, a good match between the true source distribution and the chosen score function is required to estimate the correct separating network [7, 9, 24].

4.4.1 Choice of Parameters for the Generalized Gaussian Adaptive Score Function

To employ the proposed adaptive generalized gaussian score function for linear convolutive BSR, one primarily needs only three cases of the parameter $\alpha_i$, so that a classification between super-gaussian, sub-gaussian, and gaussian distribution families can be made. In practice, a choice of $\alpha_i$ for the computation of the score function (4.19) is made from the adapted kurtosis value (4.29) by the following simple scheme:

- if $K_i(\cdot) \geq \psi$, then $1 \leq \alpha_i < 1.5$, where $\psi > 0$ is a constant threshold (super-gaussian source distributions);
- else if $K_i(\cdot) \leq -\xi$, then $\alpha_i \geq 3$, where $\xi > 0$ is a constant threshold (sub-gaussian source distributions);
- else $\alpha_i = 2$ (gaussian distribution).

Typical values chosen for $\alpha_i$ are 1 in the super-gaussian case and 3 in the sub-gaussian case.

4.4.2 Choice of Parameters for the Hyperbolic Adaptive Score Function

To employ the proposed hyperbolic adaptive score function for linear convolutive BSR, the estimated normalized batch kurtosis of the BSR network output is used to discriminate between the three possible cases of gaussian, sub-gaussian, and super-gaussian distributions. This is done by appropriate selection of the score function shaping parameters $u_i$, $a_i$, and $\beta_i$. The salient features of the proposed score function are chosen as follows:

- if $K_i(\cdot) \geq \psi$, then $0 \leq u_i \leq 1$, $a_i \geq 1$ and $\beta_i \geq 1$, where $\psi > 0$ is a constant threshold (super-gaussian source distributions);
- else if $K_i(\cdot) \leq -\xi$, then $u_i \geq 1$, $a_i \geq 1$ and $\beta_i \geq 1$, where $\xi > 0$ is a constant threshold (sub-gaussian source distributions);
- else $a_i = 0$ (gaussian distribution).

Note that for the sub-gaussian and super-gaussian cases, $a_i$ and $\beta_i$ can be fine-tuned to match the dynamic range and to further improve the signal-to-noise ratio (SNR) and intersymbol interference (ISI) of the recovered BSR sources. A short sketch of this kurtosis-driven parameter selection is given below.
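The following is a minimal sketch of the kurtosis-driven parameter selection of Sections 4.4.1-4.4.2, together with the generalized gaussian score (4.19) evaluated on an output batch. The threshold values psi and xi, the specific parameter values returned in each branch, and the semantics attached to (u, a, beta) are assumptions for illustration; the functional form of the hyperbolic score itself (4.28) is not reproduced here.

import numpy as np

def select_gg_alpha(K, psi=0.2, xi=0.2):
    # parameter choice for the generalized gaussian score (Sec. 4.4.1)
    if K >= psi:      # super-gaussian output
        return 1.0
    if K <= -xi:      # sub-gaussian output
        return 3.0
    return 2.0        # (near-)gaussian output

def gg_score(y_batch, alpha):
    # generalized gaussian adaptive score function, eq. (4.19), over one output batch
    return np.abs(y_batch) ** (alpha - 1.0) * np.sign(y_batch) / np.mean(np.abs(y_batch) ** alpha)

def select_hyperbolic_params(K, psi=0.2, xi=0.2):
    # parameter choice for the hyperbolic score (Sec. 4.4.2); returns (u, a, beta)
    if K >= psi:
        return 0.0, 1.0, 1.0     # super-gaussian branch (values assumed)
    if K <= -xi:
        return 1.0, 1.0, 1.0     # sub-gaussian branch (values assumed)
    return 1.0, 0.0, 1.0         # gaussian branch: a = 0 removes the hyperbolic term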
MISI is a measure of the global 97 transfer function diagonalization and permutation as achieved by the demixing network, see Appendix D for details. 4.5.1 Environment Model This environment model is assumed to be a 3 x 3 HR filter m—l n-l Z A,m(k — i) = Z B,s(k — i) + v(k) (4.31) j=0 i=0 where '1 1 -1 0.5 0.8 —0.7 0.06 0.4 —0.5 Ao=l —1 1 ,A1= 0.8 0.3 —0.2 ,A2= 0.16 -0.1 -0.4 _1 —1 1 —o.1 —o.5 0.4 -o.3 —0.06 0.3 ’1 0.6 0.8 0.5 0.5 0.6 .125 0.06 0.2 BO: 0.3 1 0.1 ,131 = -o.3 0.2 -0.3 .32: —o.1 o 0.4 _0.6 —0.8 1 —0.2 —0.43 0.6 0.08 —O.13 0.3 and v(k) - represents additive gaussian noise 4.5.2 Demixing Network Structure The theoretical inverse of this IIR mixing environment will be a 3 x 3 HR matrix filter with each individual element being an IIR (or ARMA) filter of order 18. For both the feedforward and the feedback demixing cases, matrix C can be initialized with very small random numbers or all zero elements, while the matrix D is chosen to be an identity matrix. The number of taps per filter was chosen to be 21. Note for these equivariant minimum phase setups, any over-determination in the number of taps does not affect the performance of the algorithm. 98 Mixing Environment (impulse) 1 0.6 5 0.5 0'4 0 Ir-__--__-___ . F N P) g o ' c“ 0.2 I €05 - o - —.-____._..-_ ' 0'5 o {o 20 '0‘2 o 10 20 '1 o 10 20 0.5 0.5 0.4 I 0.2 . o -- --------- o ----—-———-- :3 a a o I -... c -05 c -05 c ' ' -0.2 ’1 o 10 2o '1 0 1‘0 20 '0" o 10 20 0.6 1 1 0'4 0.5 0.5 cm 0,2 I :81 :8 o _.__ _________ o -——-—-——— o -------- .02 0 1O 20 '0'5 0 1O 20 '05 0 1O 20 Taps - n Taps - n Taps - n (8) Theoretical Environment Inverse 0.6 1 0.4 0.4 1 0.5 0.2 :: t.- ‘2 I ___---- 0 III— ------- l 0 I ___—___— ‘ -o.2 . . 1 - _ .4 .02 0 5 1O 15 .05 0 5 10 15 ‘0 O 5 10 15 0.2 0.5 0.4 0.1 0.2 - I - ———————— .. I. 3“ o '-r- —————— gwo s i it“ o -.“f__—__ 0.1 1 ' oz 1 '0’2 6 s 140 15 '1 6 5 1o 15 '0" o ? 1F 15 0.2 I 0.6 1 - -~-_ _____ . l - ° I - ,1 °4 8 0.. {L02 i 02 1 i o - -O.4 o I_-__ ______ I I - . 2 . .06 0 5 10 15 ‘0 0 5 10 15 '05 0 5 10 15 Taps - n Taps - n Taps - n (b) Figure 4.9. (a) Mixing Environment Transfer Function, (b) Theoretical Environment Inverse 99 Estimated Demixing Network by BSR 2 —-r 4 2 .. 1 w 2 m 1 ll 5" L 3" I 2" 0 I""‘“‘“ 0 II. ______ (M I —__ ______ . I I 4 '1 o 5 10 15 '2 o 5 1o 15 '2 o 5 10 15 0.4 2 1 0.2 1 0.5 I " LI__I_._. . N "’ .._ ___- EN 0 I I 2W 0 - - ------ 304 O I l.— .- -02 '- " -o.5 '0" o 5 10 15 '1 o 5 1o 15 '1 o 5 1o 15 0.5 0.6 1.5 .. o l.__________ 1 a 0.4 5:; 1 303.0 5 3 0.2 3 0.5 . o '1.-_ ___—___ o I_-_.. _ ---— -1 1 A = -0.2—-—-—-——-——-~ -0.5 0 5 10 15 0 5 10 15 O 5 10 15 Taps - n Taps - n Taps - n (a) Multichannel Blind Source Recovery 8 If I I I I _1 ‘5 6 ~ 1 .5. 4 — ~ ‘9 2 _ _ 0 1 1 T‘Ww‘mi "' 1 "A“ "FA 0 0.5 1 1.5 2 2.5 3 4 10 4 , , r 1 x 10 N In 5 5~ ~ Z 0 1 l l l # 0 0.5 1 1.5 2 2.5 3 I ‘r f I I x—lo‘ 8 -l a 6 ~ - 2 4- -1 (2 2L W q 0 L l l I L w 0 0.5 1 1.5 2 2.5 3 number of iterations x 10‘ (b) Figure 4.10. F eedforward BSR using Adaptive Score Functions: (a) Estimated Demixing Network Transfer Function, (b) Convergence of MISI Index 100 Convergence of Output Signal Kurtosis 4 I fi r I T 5:: : 1 11M 0,11%va WI” wfi‘i’“\'l~:fi’blfl)i"a EMMWMMXi,”~,~ml‘l“imhifilj\llhlv a” 1.1%ng M“. M '1" ll 2 0o 01.5 1 11.5 2 2T5 3 2 . . f r . x 111‘ 11 ‘ 031%., N“ “Mme/“Wm? 
WW" NW1 fiw‘wkihfi'r’ WMW l 1 l 0 0.5 1 1.5 2.5 3 2 fi T T f I _1 1 r 1 J l 0 0.5 1 1.5 2 2.5 3 numberotilontlom 4 x 10 (a) Final Global Transfer Function 1 1 ‘ 1 :2 ‘2 $3 34: 0 ——~—-—— 3.: o -—-——-——-— E o - i (D (D (D -1 -1 -1 « o 10 20 30 o 16 20 30 o 20 4o 1 1 1 3:1 53 a 0 0 (9 -1 -1 .1 1 0 1o 20 30 o 10 20 30 o 20 4o 1 1 1 1 F, 3 32 E 0 u v. 0 1 E 0 r 1 (D 0 (D -1 -1 l .1 1 o 20 40 o 10 20 30 o 20 4o Taps - n Taps - n Taps - n (b) Figure 4.11. Feedforward BSR using Adaptive Score Functions: (b) Convergence of Batch Kurtosis, (c) Final Global Transfer Function 101 Multichannel Feedback Blind Source Recovery 6 T I I I ‘1' ‘5 4L \MHMMMAV “ 5 xx- — 2 _ """\ s ‘2 ”WWW WWfi 0 l 1 1 1 0 0 5 1 1.5 2 2 5 3 6 T‘ I I I _x—lo \ \\ g 4 _ \\ - L. W .2 VW\/./’\ (7) 2 l— M...“ ... — WWW-'- 0 l _L 1 l l 0 0.5 1 1 5 2 2.5 3 6 I I I I I X1o‘ x‘\ 2 4— ‘\\r ~ g M a.) 2 F W -4 O r 1 41 1 l_ 0 0.5 1 1.5 2 2.5 3 number of iterations x 10‘ (a) Convergence of Output Signal Kurtosis for Feedback BSR 3 . r . 2» - 8‘ 1 i wwdfifyflwwwwhwa“"flfifiwnMvawfi\”é T." OW‘wa/‘W , 40 d5 1 {5 2 is 3 x4 for y2 o 05 1 15 2 25 3 2 r f I I T x 10‘ a 1 1 CC} xv 0 PW NV “(WWW-l": 1 1 1 1 w 1 '0 05 1 15 2 15 3 number of iterations x 10‘ (b) Figure 4.12. Feedback BSR using Adaptive Score Functions: (a) Convergence of MISI Index, (b) Convergence of Batch Kurtosis 102 Final Global Transfer Function after BSD 3 3 3 2 2 2 :I 1 .33 1 4 $3 1 3: a. E 0 o —— 0 o e — w o . A -1 -1 -1 -2 -2 -2 0 20 40 0 20 40 o 20 40 3[ 3 3 2 < 2 1 2 _5. 1 «a 1 «_3 1 N 3* o - S’ 0 ” 59’ ° -1 l -1 -1 -2 -2 - -2 0 20 40 o 20 40 o 20 4o 3 3 3 2 2 2 " 1 “‘ 1 «'3 1 5 o 1 5 o 4 Q o -1 -1 J -1 -2 -2 -2 o 10 20 30 o 20 40 o 20 4o Taps - n Taps - n Taps - n Figure 4.13. Feedback BSR using Adaptive Score Functions: Final Global Transfer Function 4.6 Remarks In this chapter, two separate adaptive score functions for blind source recovery algorithms have been proposed. One of the score functions is based on the generalized gaussian density model, while the other is derived using proposed hyperbolic extensions to the Pearson and hyper-Cauchy models. Further we presented a method for online adaptation of the parameters for these score functions. This online adaptation scheme has been extensively tested for both minimum phase and non-minimum phase, instantaneous as well as convolutive mixing. For linear convolutive mixing, the results using both the proposed score functions are comparable. For simulations using bimodal distributions only, the latter score function outperforms the former that is best suited for unimodal 103 distributions. For nonlinear BSR, however, the slightly different kurtosis characteristics of the two proposed distribution models may be better exploited to estimate the correct demixing network. A couple of simulation examples using two different demixing architectures have also been presented. It is observed that the proposed adaptive framework can recover sources from a complex convolutive mixture efficiently even when all the sources have different probability density structure. 104 Chapter 5 BLIND SOURCE RECOVERY OF UNDERCOMPLETE AND OVERCOMPLETE MIXTURES Blind Source Recovery is a framework for the estimation of unlmown original source signals. Most of the existing BSR literature as well as most theoretical developments in this dissertation have been done under the assumption that the number of observed mixtures is equal to the number of original sources. 
In essence, this implies that the dimension of the original sources to be estimated is known a priori and thus sufficient dimension in the mixtures is recorded. In such a quadratic dimensional scenario, the recoverability of the sources is guaranteed provided the environment is invertible [13, 14]. If the environment is linear in nature, this theoretical demixing network is unique and BSR adaptation results approach a permuted and scaled version of this solution. However, in practical settings, the number of mixtures may or may not exactly equal the number of original sources. If the dimension of the observed mixture space exceeds the dimension of the original sources, the BSR problem is termed Undercomplete BSR. Conversely, if the dimension of the mixture space is smaller than the signal space, the recovery problem is called Overcomplete BSR [1, 4, 8, 27]. This variation in dimension may arise for a number of reasons. Firstly, in a blind setting, the number of original stochastically independent sources may not be exactly known, e.g., consider stellar measurements, the cocktail party problem, financial markets, etc. Secondly, the dimension of the mixture or observation space may be determined by the structure of the problem [6, 18, 19], e.g., communication networks, the number of installed recording devices, etc. Thirdly, the additive noise in an environment may be dynamic and originate from a spurious number of sources, causing the number of actual sources to vary over time, e.g., the number of users in a wireless communication system, channel additive sources and multi-path effects in multiple access communication systems, or indeed any measurements in dynamic noisy environments [5, 8, 11]. Due to these practical issues and difficulties, it is imperative to consider the cases of undercomplete and overcomplete BSR. In this chapter, assuming the mixing to be linear in nature, a methodology is presented for using the BSR algorithms developed earlier in this work for the case of undercomplete mixtures. The case of overcomplete BSR is intractable in general; however, possible solutions exist for sparse sources and are discussed in subsequent sections [1, 4, 10, 16, 24, 25, 27].

5.1 BSR of Undercomplete Mixtures
Undercomplete BSR is a commonly encountered problem, especially in the realm of communication systems, radar and sonar measurements, or any other noisy BSR formulation where adequate sensors are employed to collect the mixed signals and/or corrupted sources. For the case of undercomplete mixtures, the original sources occupy only a subspace of the mixture dimensions. In noisy undercomplete BSR, the update laws presented in earlier chapters can be used as-is on the mixture dimensions to recover an equal number of sources. However, in such a recovered set of estimated signals, some of the sources may be recovered with multiple permutations, and deflationary approaches may be needed to efficiently extract the actual estimated sources [1, 5]. Alternately, the undercomplete BSR problem can be converted to the standard BSR problem by efficiently discarding some of the mixture data. This can be achieved by using signal pre-processors [4, 5, 7] such as PCA, projection pursuit, etc. Efficient data reduction is useful in several ways: firstly, it reduces the dimension of the BSR problem and therefore results in a reduction of computational requirements.
Secondly, data reduction techniques such as PCA-based pre-whitening may also reduce the additive noise in the mixture space, simplifying the BSR problem at hand. However, the performance of the data reduction stage is critical: any data discarded at this stage is not available to the BSR stage, and inefficient data reduction may therefore limit the performance of the overall recovery setup [5, 8]. See Chapter 6 for the performance of BSR algorithms with a data-reducing pre-processor.

5.1.1 Efficient Reduction of Mixture Dimensions
In undercomplete mixtures, efficient estimation of the desired signal subspace both reduces the overall computational load and may serve as a pre-processor that reduces the additive measurement noise in the mixture space. Two good candidates for such reduction of the data dimensions are basis pursuit and pre-whitening using principal component analysis (PCA). Under good implementations both techniques lead to similar performance [4, 5, 7]; therefore only PCA is discussed in this work. PCA may be implemented using either algebraic or adaptive techniques.

5.1.1.1 Algebraic Principal Component Analysis
In algebraic PCA, an auto-correlation matrix of the received mixture data is computed or estimated online as [6]

R(m) = α r(m) r^H(m) + (1 − α) R(m − 1)    (5.1)

An efficient data-reducing pre-whitening matrix is then constructed as

W = D̂_s^{−1/2} V̂_s^H    (5.2)

where D̂_s and V̂_s are the estimates of the n dominant eigenvalues and eigenvectors of the auto-correlation matrix R. These eigenvalues and eigenvectors can also be computed adaptively; see [8, 9].

5.1.1.2 Adaptive Principal Component Analysis
Instead of estimating the auto-correlation of the mixture data or its eigen-decomposition, the pre-whitening matrix may be adapted directly from the mixture data [6, 7] by the simple update law

W(k+1) = W(k) + η(k) [I − x(k) x^H(k)] W(k)    (5.3)

or its averaged version ΔW(k) = η(k) [I − ⟨x(k) x^H(k)⟩] W(k).
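A minimal sketch of the algebraic pre-whitening of (5.1)–(5.2), assuming batch data and a user-chosen reduced dimension n; the forgetting factor and the function name are illustrative assumptions.

```python
import numpy as np

def prewhiten(X, n, alpha=0.01):
    """Estimate a data-reducing pre-whitening matrix from mixtures X (m x T).

    R is built recursively as in (5.1); W = D^{-1/2} V^H keeps the n dominant
    eigen-directions as in (5.2)."""
    m, T = X.shape
    R = np.zeros((m, m), dtype=complex)
    for k in range(T):
        r = X[:, k:k+1]
        R = alpha * (r @ r.conj().T) + (1 - alpha) * R
    evals, evecs = np.linalg.eigh(R)            # ascending eigenvalue order
    idx = np.argsort(evals)[::-1][:n]           # n dominant components
    D, V = np.diag(evals[idx]), evecs[:, idx]
    W = np.linalg.inv(np.sqrt(D)) @ V.conj().T  # (5.2)
    return W, W @ X                             # whitening matrix, reduced data
```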
w_j(k+1) = ψ(w_j(k) + η(t) sgn(M_i − w_j(k))),   if M_i · w_j(k) ≥ 0
w_j(k+1) = ψ(w_j(k) + η(t) sgn(−M_i − w_j(k))),  if M_i · w_j(k) < 0    (5.15)

where η(t) represents a small, time-adaptive learning rate. Often the learning rate is chosen to be an exponentially decreasing function whose decay rate depends on the characteristics of the source distributions to be estimated, and ψ(·) is a nonlinear function that projects the updated weight vector back onto the unit hyper-sphere, i.e., ψ(w) = w / ‖w‖.
• Determine the change in the weight space, ΔW_i = W_i − W_{i−1}, i = 1, 2, ..., N.    (5.16)
• If |ΔW_i| < ε, where ε > 0 is a small positive-valued stopping criterion, the algorithm has converged to its final state.
• Otherwise, update the iteration counter i := i + 1 and repeat steps 2 to 9 until the algorithm converges.
For the proposed algorithm, the update follows a one-winner-takes-all rule. The algorithm reaches a fixed point (or settles down to a solution) when E(sign(M_i − w_j(k))) = 0. At this stage the AICA algorithm has converged to the median of the distribution, which for a symmetric distribution also coincides with its mean; at convergence there is no change in the weights and E(W_{k+1}) = E(W_k). In [3, 15, 16] similar conditions have been derived for geo-ICA, termed the Geometric Convergence Condition (GCC), but these conditions are in general satisfied by all self-organizing (SOM) algorithms. Both the AICA and geo-ICA algorithms resemble Kohonen's algorithm for self-organizing maps (SOM) with a trivial zero-neighborhood [7].

In a 2-sources, 2-mixtures case, the algorithm possesses only two fixed points [15],

W(∞) = H X    (5.17)

where X is a generalized permutation matrix. The convergence of the algorithm to the true solution is therefore a function of the initialization and of the distribution of the sources to be estimated. In the case of super-gaussian sources, however, only the true solution is a stable fixed point. Another observation for lower-dimension problems is that a weight vector may get trapped between two data clusters and may require re-initialization, especially if the two directions are very close to each other. For larger-dimension problems, however, the probability of this occurrence is negligible. The proposed algorithm framework has similarities to the Geometric ICA algorithm [3, 12, 15, 17] but requires significantly less computation due to the choice of a different distance measure and a reduced number of parameters to update. The use of a neighborhood function in the proposed algorithm is currently being investigated for source distributions other than the super-gaussian; there is interest in the ability to handle signals with multiple source distributions [20]. Figure 5.2 presents the results after convergence of the proposed algorithm for the 2 × 2 case with the initialization shown in Figure 5.1. The initial estimates of the weight vectors were chosen randomly. It is evident from Figure 5.2 that the algorithm caused the initialized weights to converge to the nearest desired directions.

Figure 5.2. Convergence of the weights in the 2 × 2 mixture space; w_i(0) denotes the initial weight locations while w_i(∞) denotes the final weight locations.

5.3.2 Algebraic Matrix-Distance Index (AMDI)
The Algebraic Matrix Distance Index (AMDI) is primarily a scaling-, sign- and permutation-invariant symmetric numerical index that estimates the "similarity" between the column bases of two "normalized matrices" of conformal dimensions. Unlike the Multichannel Intersymbol Interference (MISI) measure (also called the Amari index) used in ICA to measure the diagonalization of the global transfer function, AMDI is a measure of similarity between two column-normalized matrices of similar dimensions. These attributes make AMDI suitable for comparing the results of algorithms such as AICA and geo-ICA, especially for overcomplete ICA, where the computation of MISI is not possible. In order to compute this metric, the columns of the matrices are normalized by projecting them onto the conformal unit hyper-sphere, i.e., by dividing all the elements of each column by the norm of that column. The AMDI is given by

AMDI(W, H) = [n − Σ_rows max(|W^H H|)] / n + [n − Σ_cols max(|W^H H|)] / n    (5.18)

where W, H are two column-normalized matrices of similar dimensions. The Algebraic Matrix Distance Index (AMDI) has the following properties:
• 0 ≤ AMDI(W, H) ≤ 2; AMDI(W, H) = 2 only if the two matrices are orthogonal to each other in the n-d space;
• AMDI(W, H) = AMDI(H, W).
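A minimal sketch of the AMDI computation in (5.18); the column normalization is performed inside the function, and the names are illustrative assumptions.

```python
import numpy as np

def amdi(W, H):
    """Algebraic Matrix Distance Index of (5.18) for two m x n matrices.

    Columns are first projected onto the unit hyper-sphere so the index is
    invariant to scaling; taking the max over rows and columns of |W^H H|
    makes it invariant to sign and permutation."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    Hn = H / np.linalg.norm(H, axis=0, keepdims=True)
    C = np.abs(Wn.conj().T @ Hn)
    n = C.shape[0]
    return (n - C.max(axis=0).sum()) / n + (n - C.max(axis=1).sum()) / n
```

For two matrices that are identical up to column permutation and sign the index evaluates to 0, while mutually orthogonal column sets attain the maximum value of 2.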
5.3.3 Resolution of AICA Local Minima and Ambiguities
Algebraic ICA is a member of a class of data-dependent self-organizing algorithms. The performance of the algorithm is a function of the choice of the initial weights, the learning rate and, most importantly, the characteristics of the data. The AICA weight columns converge in the limit to the directions (or columns) of the actual mixing matrix, provided the data is unimodal with a high value of kurtosis and the mixing-matrix directions are sparse [3, 16, 22]. For data that does not meet these requirements, either due to lower kurtosis or due to the actual mixing-matrix directions being "close" to each other (especially in the overcomplete case), the algorithm does not give the desired estimation performance. For such cases, a better estimate of the mixing matrix may be derived using more than one trial of the algorithm. This technique is also useful when the available data is inadequate for the AICA algorithm to estimate the mixing matrix. A heuristic stepwise approach for such ambiguous cases is as follows (a sketch of the key column-matching step follows the list):
1. Initialize all the columns of the estimator matrix at the same location. For each successive trial, choose a different initial location; e.g., one way to do so is to initialize the weights at the possible m-d binary combinations of ones and zeros. For m mixtures there are 2^m − 1 such initializations, leaving aside the trivial all-zero combination.
2. Execute the AICA algorithm for each initial condition. For a sparse mixing matrix, the algorithm converges independently of the initialization provided adequate data is available [17, 24]. In most cases, the algorithm will converge during one of the trials.
3. If the algorithm fails to converge for any choice of initialization — note that in a blind scenario this condition cannot be verified, and multiple trials serve as a means of estimating the mixing matrix with a higher confidence level — the AICA estimates from all trials can be post-processed to determine the high-confidence candidate weight columns; see steps 4 to 10.
4. Determine the list of "close" columns across all trials. This can be realized algebraically by choosing the corresponding columns with dot-product values above a certain threshold α. This threshold α can be chosen as a function of the data sparseness and the desired resolution of the AICA algorithm.
5. Concatenate the column pairs with dot products better than a higher threshold β, where β ≥ α, and a common column.
6. Repeat step 5 until all possible concatenations are made (this may take η − 1 iterations for η trials, if one combination is made each time).
7. Determine the independent columns as the mixing-matrix candidate columns; the concatenated column cases are given a relatively higher rank.
8. The candidate columns are ranked as a function of their dot-product values.
9. At this point, one option is to choose the M highest-ranked columns as the final mixing-matrix candidate columns. If fewer than M candidate columns make the cut, deflationary approaches can be used after estimation of the output using the mixing-matrix estimate; the remaining columns can then be determined in a subsequent stage.
10. Alternately, when the number of candidate columns is larger than M, instead of choosing the columns via ranking, all the columns may be used in the source-recovery stage; the selection of appropriate outputs is then made by measuring the independence of the recovered outputs, e.g., using the Quadratic Information Measure [21].
Using the above technique, for a 3-d mixture space, the mixing matrix has been successfully estimated with a resolution better than a 15° (approx. 0.25 rad) hyper-cone in the weight space. See Simulation IV for further details.
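The column-matching of steps 4–8 can be sketched as below. The greedy merging, the real-valued data assumption, and the threshold value are illustrative simplifications of the heuristic, not the exact procedure used in the simulations.

```python
import numpy as np

def match_columns(trials, beta=0.95):
    """Collect high-confidence mixing-matrix columns from several AICA trials.

    trials : list of weight matrices (columns = estimated mixing directions).
    Columns whose mutual |dot product| exceeds beta are treated as the same
    direction, sign-aligned, averaged and re-normalized; candidates are
    ranked by how many trials support them (steps 4-8)."""
    cols = [w / np.linalg.norm(w) for W in trials for w in np.asarray(W).T]
    candidates, used = [], set()
    for i, wi in enumerate(cols):
        if i in used:
            continue
        group = [wi]
        for j in range(i + 1, len(cols)):
            if j not in used and abs(wi @ cols[j]) >= beta:
                group.append(np.sign(wi @ cols[j]) * cols[j])
                used.add(j)
        rep = np.mean(group, axis=0)
        candidates.append((len(group), rep / np.linalg.norm(rep)))
    candidates.sort(key=lambda t: -t[0])           # rank by trial support
    return np.column_stack([c for _, c in candidates])
```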
5.3.4 Inferring Sources from Overcomplete Mixtures
Once the mixing matrix has been estimated by the AICA algorithm, the original sources can be estimated. For quadratic ICA, the sources can be estimated by computing the inverse of the estimated mixing matrix W, i.e.,

ŝ = W^{−1} m    (5.19)

For the case of overcomplete mixtures, the recovery of the sources is in general not possible. However, for sparse super-gaussian sources, once the mixing matrix has been estimated, interior point linear programming (IP-LP) techniques [4] can be used to determine the best possible estimate of the M-d sources from the m-d mixtures, where M > m. The technique uses interior point LP with an L1-norm-based performance functional. Standard interior point LP methods can be used for this purpose according to the following procedure [4, 10]:

min c^T x                    min ‖ŝ‖_1
s.t. Ax = b, x ≥ 0     ≡     s.t. W ŝ = m    (5.20)

where
A = [W  −W]: W is the estimated mixing matrix of dimension m × M; note that by choice M = n in this case.
x = [u  v]^T: the M-d intermediate recovered sub-signals, with the constraints u ≥ 0 and v ≥ 0.
c = [1  1]^T: constrains the recovered M-d sub-signals u and v equally; 1 is an M-d vector of ones.
b = m: the m-d overcomplete mixtures.
The M-d outputs are then determined using the relation

ŝ = u − v    (5.21)

Another technique, using the shortest-path approximation, is explained in [3, 16]. It requires less computation than the LP technique; in our experience, however, the quality of the sources recovered using the LP approach is better.
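The L1-norm recovery of (5.20)–(5.21) maps directly onto a standard linear program. The sketch below uses SciPy's linprog (with its "highs" solver standing in for a bespoke interior-point implementation) and assumes a column-normalized mixing-matrix estimate W from AICA; the names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def infer_sources(W, M_obs):
    """Infer M sparse sources from m overcomplete mixtures, sample by sample.

    Solves min ||s||_1 s.t. W s = m via the split s = u - v with u, v >= 0,
    i.e. min 1^T x s.t. [W -W] x = m, x >= 0 (equations 5.20-5.21)."""
    m, M = W.shape
    A_eq = np.hstack([W, -W])            # A = [W  -W]
    c = np.ones(2 * M)                   # c = [1 ... 1]^T
    S = np.zeros((M, M_obs.shape[1]))
    for k in range(M_obs.shape[1]):
        res = linprog(c, A_eq=A_eq, b_eq=M_obs[:, k],
                      bounds=(0, None), method="highs")
        x = res.x
        S[:, k] = x[:M] - x[M:]          # s = u - v
    return S
```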
5.3.5 Simulation Examples
Four simulation examples are presented. The first demonstrates the issue of multiple fixed points (or the presence of local minima) in the AICA algorithm using independent identically distributed (IID) data. Then there are two examples using overcomplete mixtures of 3 and 5 sparse speech segments, respectively; in these sparse speech samples, words are spoken with certain time intervals between them to increase the sparseness of the actual speech signals, and the recovery of these speech signals from their 2 and 3 static mixtures, respectively, is presented. The AICA algorithm is first used to estimate the mixing matrix, while the speech sources are inferred using the L1-norm interior point linear programming. The fourth example discusses BMMR for a 6 × 3 mixture using IID laplacian sources. The motivation for this example is the case where some of the mixing directions are so close that the algebraic ICA algorithm cannot discriminate these algebraically adjacent columns, due to the relatively high overlap of the data aligned to them (resolution). For such cases, the proposed ambiguity resolution technique is applied, which allows for the exploration of the complete mixture bases in an orderly and organized manner.

5.3.5.1 Simulation I: 4 × 2 Laplacian Data
In this simulation, a synthetic random matrix mixes four independent identically distributed laplacian sources. For this simulation the mixing matrix is

H = [ 0.7231  −0.6121  −0.6587   0.8005
      0.6908   0.7908   0.7524  −0.5994 ]

All the weights for the AICA algorithm in this case are initialized along the real axis. It is observed that the algorithm converges to an undesired fixed point; this is detected by observing that only two of the weights are updated while the other two are not, see Figure 5.3.a. The algorithm in this case re-initializes the discarded weights very close to the weights that are currently being updated. This results in the convergence of the algorithm to the true solution, see Figure 5.3.b. The determined final weights for this simulation are

W = [ 0.7265   0.8039   0.6448   0.6075
      0.6872  −0.5948   0.7644  −0.7943 ]

and the achieved AMDI performance is less than 0.0003.

Figure 5.3. BMMR for Laplacian Data using AICA: (a) Interim Results; (b) Final Convergence.

5.3.5.2 Simulation II: Overcomplete Speech Data (3 × 2 Case)
A simulation example is presented for an overcomplete mixture comprising three speech signals mixed by a random mixing matrix into two mixtures. The AICA algorithm is first used to estimate the mixing matrix, while the speech sources are inferred using the L1-norm interior point linear programming. The AICA-estimated mixing matrix was very close to the original mixing matrix, with a normalized AMDI of 0.001. The recovered sources show a permutation between the first and third sources. Also notice that the recovered signals from the overcomplete mixture in Figure 5.5.a are not without artifacts, but very closely resemble the original sources in Figure 5.4.a. Observe that the separation quality achieved using separate Blind Mixing Matrix Recovery (BMMR) and source recovery steps far exceeds that of other algorithms [1, 4, 10], provided a good mixing-matrix estimate was found.

Figure 5.4. Overcomplete Speech Mixture (3 × 2 Case): (a) Original speech sources; (b) Mixed speech.

Figure 5.5. Overcomplete Speech Mixture (3 × 2 Case): (a) Recovered speech; (b) Convergence of Normalized AMDI over 100 experiments.

5.3.5.3 Simulation III: 5 × 3 Speech Data
This simulation example comprises an overcomplete speech mixture. Five utterances are mixed using a 5 × 3 mixing matrix, as given below.

H = [ 0.4872  −0.9063   0.3285   0.6203  −0.8837
      0.2021  −0.0247   0.2714  −0.6752   0.2019
     −0.8489  −0.4220   0.9047   0.3990   0.4223 ]

The AICA algorithm is first used to estimate the mixing matrix W, while the speech sources are inferred using the L1-norm interior point linear programming. The AICA algorithm converges to the estimate in approximately 25000 iterations, achieving an AMDI of 0.002. The estimated mixing matrix W exhibits a permutation, with the columns appearing in the order 4, 3, 5, 1 and 2; the second and fourth columns of W also have inverted signs.

W = [ 0.6243  −0.3298  −0.8787  −0.4861   0.9015
     −0.6724  −0.2773   0.1993  −0.2020   0.0228
      0.3949  −0.9157   0.4240   0.8445   0.4160 ]

In Figures 5.6–5.8, the 3 speech mixtures, the original 5 speech utterances and the 5 inferred speech sources are shown. Notice that the recovered signals from the overcomplete mixture in Figure 5.8 are not without artifacts (introduced primarily by the source-inferring stage) but closely resemble the original speech utterances; a listening test confirms this. The speech simulation data can be downloaded at http://www.egr.msu.edu/bsr/results. Observe that the two-stage overcomplete mixture recovery approaches [16, 24, 25] are relatively simple to implement as compared to other, more complex approaches [1, 4, 10].
Also, combined BMMR and source inferencing provides satisfactory results as long as the underlying probability distribution is sparse. The performance of the AICA algorithm depends on the quantity of preamble data available for the estimation of the weight matrix. Although the algorithm may be implemented for any dimension, the amount of data required may be immense. This can be dealt with by repeatedly training for weight estimation on a batch of data, provided the batch size is adequate to represent the actual underlying source distributions.

Figure 5.6. Overcomplete Speech Mixture (5 × 3 Case): 3 Overcomplete Mixtures.

Figure 5.7. Overcomplete Speech Mixture (5 × 3 Case): Original 5 Speech Utterances.

Figure 5.8. Overcomplete Speech Mixture (5 × 3 Case): Recovered 5 Speech Utterances.

5.3.5.4 Simulation IV: Independent Identically Distributed (IID) Laplacian Data
For this simulation, six IID laplacian sources are mixed by a randomly generated matrix of dimension 6 × 3 into three mixtures. Also presented are results for the case where the columns of the mixing matrix are close to each other and the matrix is estimated using the ambiguity resolution scheme presented in Section 5.3.3. For the mixing matrix below, columns 5 and 6 are within 0.4 rad of each other; AICA is able to estimate the mixing matrix successfully using the data from 3 trials.

H_1 = [ 0.0226  −0.4698   0.5311  −0.2640   0.9597  −0.9498
       −0.0620  −0.0327  −0.5890  −0.4616   0.0309   0.1029
        0.9978  −0.8822   0.6091  −0.8469   0.1793   0.0021 ]

Figure 5.9.a shows the results using the proposed technique; the recovered weights are shown in red, while the blue dash-dot lines show the actual mixing-matrix directions. In another simulation, the mixing matrix is

H_2 = [ 0.0839   0.6345   0.6606   0.9997   0.3875   0.3164
        0.8850   0.6525   0.7506   0.0029   0.1546  −0.0014
       −0.4579   0.2144  −0.0149   0.0248   0.8509   0.9486 ]

where the column pairs 2 and 3, and 5 and 6, are within 0.35 rad of each other. AICA can resolve these columns successfully using the results from 7 trials. The final result is shown in Figure 5.9.b.

Figure 5.9. Unambiguous AICA estimate using multiple runs: (a) using H_1; (b) using H_2.

The performance of the AICA algorithm depends on the quantity of preamble data available for the estimation of the weight matrix. Although the algorithm may be implemented for any dimension, the amount of data required may be immense. In such cases, the same data batch or a randomly drawn batch may be repeated during the weight estimation process, provided the amount of data in the batch is adequate to represent the actual source distributions [7]. The convergence of the weights certainly depends on the initialization. Another observation for lower-dimension problems is that there are more chances for a weight vector to be trapped between two algebraically close data clusters; for larger-dimension ICA problems, however, the probability of this occurrence is minimal.
5.4 Remarks
In this chapter, the adaptation issues for undercomplete and overcomplete BSR have been discussed. A new dot-product-based Algebraic ICA (AICA) algorithm has been presented. The algorithm estimates the static mixing matrix under the assumption that the original sources are unimodal and possess positive Fisher kurtosis. It is demonstrated that the proposed AICA algorithm can be used for BMMR in the case of overcomplete mixing; the original sources can then be estimated using an interior point L1-norm-based LP technique. Also presented is AMDI, a matrix-distance measure for comparing two matrices that may be both scaled and permuted with respect to each other. Further, a heuristic technique to determine the BMMR solution when the mixing-matrix columns are close to each other has been discussed. The algorithm has been successfully applied to problems where the dimension of the mixtures is half the number of original independent signal sources. An extension of the algorithm using a neighborhood function is currently being investigated for gaussian and sub-gaussian distributed signals. The speech data for the examples can be accessed at http://www.egr.msu.edu/bsr/results [26].

Chapter 6
BLIND SOURCE RECOVERY APPROACHES FOR WIRELESS DS-CDMA AND WCDMA DOWNLINK SYSTEMS

Code Division Multiple Access (CDMA) is based on spread-spectrum technology and is a dominant air interface for the proposed modern 3G and 4G wireless networks [145]. The transmitted CDMA signals propagate through noisy multipath fading communication channels before arriving at the receiver of the user equipment (UE). Classical single-user detection (SUD) algorithms do not provide the requisite performance for modern high-data-rate applications, while multi-user detection (MUD) approaches require a great deal of a-priori information not available to the UE. In this chapter, three promising adaptive natural-gradient-based info-theoretic blind user detection approaches capable of handling dynamic wireless environments are proposed. The first approach, Blind Multi-User Detection (BMUD), is the process of simultaneously estimating the multiple symbol sequences associated with all users in the downlink of a Code Division Multiple Access (CDMA) communication system using only the received wireless data. This approach is applicable to CDMA systems with short codes but becomes impractical for systems using long codes. Also proposed are two other adaptive approaches, namely RAKE-Blind Source Recovery (RAKE-BSR) and RAKE-Principal Component Analysis (RAKE-PCA), which fuse an info-theoretic stage into a generic RAKE receiver. This results in robust detection algorithms whose performance exceeds that of standard LMMSE detectors for both DS-CDMA and WCDMA systems under conditions of congestion, imprecise channel estimation and unmodeled multiple access interference (MAI).

6.1 User Detection in CDMA Systems
Conventional detection schemes for CDMA signals exploit only second-order statistics among user codes. Practically, however, the underlying user data symbol sequences are in general mutually (near-)independent. This is a key assumption, which propels the application of info-theoretic learning approaches such as information maximization [21] and minimum mutual information [14] to the realm of CDMA.
The use of these algorithms is justified since a wide-sense-stationary, slowly fading multipath CDMA environment can be conveniently represented as a linear multi-channel convolution model (see Section 6.3.1) [39, 123]. The received CDMA signal can be considered a sum of several non-gaussian random variables generated by linear convolutive transformations of statistically (near-)independent component user variables [58, 141]. This linear transformation accounts for the user spreading codes, the cell scrambling codes (in the case of a cellular architecture), the multiple channel paths and the slowly fading channel effects [141]. Our goal is to estimate a linear transformation that counteracts, as optimally as possible, the effects of this first transformation, resulting in the recovery of the original user signals under the constraint of knowing only the user's signature code (and the corresponding cell scrambling code for a two-stage implementation) [58, 62, 122, 123, 141].

For a generic DS-CDMA implementation using only a single stage of user spreading codes (e.g., PN codes, Gold codes or Walsh-Hadamard codes), info-theoretic blind source recovery techniques [62, 123, 124] can be employed to recover the user data sequences in the multi-user environment. Blind Source Recovery (BSR) is the process of estimating the original "independent" user-specific symbol sequences independent of, and even in the absence of, precise system/channel identification [109, 110]. In typical downlink signal processing, where many of the system parameters are unknown, including the codes of the co-existing users at any instant of time, blind techniques can be used to obtain a better estimate of the user-specific signals [62, 123, 124]. Alternately, Blind Multi-User Detection (BMUD) algorithms based on the natural gradient Blind Source Recovery (BSR) techniques in both feedback and feedforward structures can be used [109, 110, 135, 139]. The "quasi-orthogonality" of the spreading codes and the inherent "independence" among the various transmitted user symbol sequences form the basis of the proposed BMUD methods. The proposed structures and algorithms demonstrate promising results as compared to conventional techniques such as the Matched Filter (MF), RAKE and block LMMSE [57, 98, 120, 121]. The proposed algorithms can be implemented using either batch or the more computationally efficient instantaneous update methods [110, 112]. Although batch implementations exhibit better performance [123], they are accompanied by longer latency and more involved implementation structures not suitable for a UE/MS. Focus is therefore placed on the instantaneous (or online) performance of the BSR algorithms only, which still exceeds the performance of other approaches [122, 124]. This (online) detection technique can be easily extended to other CDMA implementations using relatively short scrambling codes, but becomes impractical in the WCDMA downlink using long scrambling codes. Although very low bit error rates (BER) can be achieved with this technique, and the detection process does not even require knowledge of the user's own signature code, the recovered signal stream is at the symbol level with no user identification [62, 123]. Further, inherent sign and permutation ambiguities exist in BSR (scaling is not relevant, as the recovered streams are either BPSK or QPSK); therefore user identification is not possible unless some preamble or pilot data is transmitted periodically [62, 123].
This periodic requirement stems from the dynamic nature of the wireless communication scenario, where users may dynamically enter or exit the system. The environment structure also varies widely due to the mobility of the MS/UE [4]. With these practical constraints in mind, new algorithms are proposed by infusing info-theoretic learning algorithms, such as static Blind Source Recovery (BSR) (or Independent Component Analysis, ICA) and Principal Component Analysis (PCA), into the existing structure of a RAKE receiver [122, 125]. The purpose of this additional info-theoretic stage is to counter, as best as possible, the unmodeled multiple access interference (MAI) and the additive (white gaussian) noise in the channel [122]. Further, the use of a simple info-theoretic stage does not make the receiver structure too complex; in fact, it is simpler than most other proposed adaptive LMMSE implementations [44, 121]. RAKE-PCA uses up to second-order statistics, whereas RAKE-BSR utilizes higher-order statistics. This results in a slightly simpler update structure for RAKE-PCA, but the performance of RAKE-BSR is found to be better than that of RAKE-PCA. Further, assuming the score function for the ICA update law is chosen properly, the resulting equalization matrix in the case of RAKE-BSR has relatively smaller element values (energy) than the corresponding matrix for RAKE-PCA, which translates into a need for fewer memory bits. Lastly, both RAKE-BSR and RAKE-PCA use all the available user information, so there are no issues of user identification or sign ambiguity in this case [122, 125]. The main advantage of both the RAKE-BSR and RAKE-PCA algorithms is the improved BER performance for the UE/MS without the need for any information beyond what a standard RAKE receiver already has. The proposed algorithms can be applied to both generic DS-CDMA and modern 3G UMTS implementations.

The remainder of the chapter is organized as follows. The next section discusses the CDMA forward communication channel (or downlink) and the limitations of conventional user detection approaches. In the third section, the signal models are derived for a generic single-stage-spreading DS-CDMA scenario and for the two-stage WCDMA (or 3GPP UMTS FDD) downlink [143]. This is followed by a section on detection algorithms for the UE/MS; the structure of well-known linear detectors such as MF, RAKE and LMMSE is briefly described first, followed by a description of the proposed info-theoretic structures for DS-CDMA and WCDMA wireless networks. The fifth section presents simulation results under various test conditions and congestion levels; the results, using BER as the performance measure, support the superiority of the proposed info-theoretic formulations for various DS-CDMA and WCDMA situations. In the concluding section, the findings of the work are listed and summarizing remarks are made on the effectiveness of the proposed algorithms.

6.2 CDMA Downlink Communication Channel
Code Division Multiple Access (CDMA) uses spread spectrum techniques in which all users share the same temporal and spectral resources [58, 98, 121]. In downlink signal processing, each user is identified by a unique signature code, which is chosen to be "orthogonal" (or quasi-orthogonal) to the signature codes allotted to the other users in the system.
Direct-Sequence Code-Division Multiple Access (DS-CDMA) is a promising data transmission technique capable of high data rates and better immunity to channel impairments and noise [98]. In DS-CDMA, the signal energy is "spread" over a wide frequency range, which reduces the effects of fading channels. CDMA systems utilize the given spectral and temporal resources more effectively than other multiple access schemes such as FDMA and TDMA [145]. This results in a larger user capacity per given resources [57, 98, 121], making CDMA systems a cost-effective choice for modern and proposed future wireless networks [1-4]. Other advantageous features include a soft capacity limit, cell frequency reuse and soft handover of users [57, 98, 121]. Wideband CDMA will be a dominant technology for third generation (3G) and future wireless communication systems and forms an integral part of the CDMA2000, 3rd Generation Partnership Project (3GPP) Universal Mobile Telecommunication System (UMTS) Frequency Domain Duplex (FDD, also called WCDMA) and Time Domain Duplex (TDD) standards [1-4, 143]. Note that the UMTS TDD standard uses multi-carrier CDMA (MC-CDMA) instead of DS-CDMA. While indoor, personal, ad-hoc and local non-cellular implementations of CDMA use one stage of signature codes for user identification [27, 149, 152], the modern 2.5G-and-beyond wireless voice-data-multimedia networks such as IS-95, CDMA2000 and UMTS FDD (or WCDMA) use both user-specific spreading codes and cell-specific long scrambling codes to permit cost-effective cellular implementations [14].

Unlike the uplink communication channel, where all user codes are known and the base station (BS) possesses superior signal processing capabilities and resources, the downlink path is constrained by the restricted computational resources and limited knowledge of the user equipment (UE)/mobile station (MS) [3, 121]. In a CDMA system, the symbol train of a user may be detected using either a single-user or a multi-user detector. A single-user detector (SUD), such as the Matched Filter (MF) detector, the Zero Forcing (ZF) detector [121] or RAKE, does not model the multiple access interference (MAI) due to the presence of other users and extraneous signals in its signal path; rather, it treats all interfering users and disturbances as additive noise. This severely limits its detection performance, and it fails to provide the performance levels vital to modern high-data-rate applications [4]. In contrast, a multi-user detector (MUD) [58, 120, 121, 146] includes all the users in the signal model, and significant improvement can be obtained with a multi-user receiver [121, 146]. However, the optimal MUD [120] is computationally intensive and requires several dynamic system parameters to be known precisely. Several linear multi-user detection (MUD) techniques, such as the best linear unbiased estimator (BLUE) [146] and the linear minimum mean squared error (LMMSE) estimator [44, 59, 146], have been proposed for the wireless downlink based on the linear convolutive channel model. In practical situations most of these estimators require extensive knowledge of the channel parameters and massive computation. Even if approximated using block-level implementations, the system performance becomes sub-optimal and degrades quickly if the a-priori channel estimates are far from the true system parameters.
Adaptive approximations of these algorithms also require some a-priori information for correct initialization and may not match the performance of their algebraic counterparts under all conditions. For the proposed 3GPP UMTS FDD [2, 3] and future tightly synchronous CDMA standards, the user-specific spreading is done by Orthogonal Variable Spreading Factor (OVSF) codes, while the cell-specific complex scrambling is done using a set of long Gold sequences (and/or Kasami sequences) [66]. Long codes have their own advantages, such as code-hopping, which results in similar performance for all users in the system (i.e., better QoS). In addition, long codes also result in improved power control for users with relatively smaller data rates and in better rejection of extra-cellular interference (i.e., lower effective BER) [58]. On the other hand, the use of long scrambling codes makes the implementation of the exact linear detection algorithms and of BMUD impractical due to excessive computational requirements [44, 146]. Further, the actual MUD model does not capture effects such as inter-cellular interference and interference from other services sharing the same temporal and spectral resources. Even the performance of block implementations of linear detectors, such as block LMMSE (B-LMMSE), is severely degraded by imprecise channel estimation (and/or an inaccurate estimate of the phase reference for the principal path), if such estimation is possible at all [59, 121, 122].

The restrictive downlink performance of conventional detectors in practical communication systems, and the higher performance requirements of modern multimedia and other dedicated user services, necessitate better detection schemes for the downlink CDMA channel.

6.3 Downlink Signal Models for CDMA Systems
In this section, the signal model for a generic DS-CDMA implementation using only one layer of spreading codes is discussed first. In the subsequent sub-section, this model is extended to modern wireless networks such as IS-95, CDMA2000 and UMTS, which utilize an additional stage of codes for cell-specific scrambling [1-4].

6.3.1 DS-CDMA Signal Model

Figure 6.1. Signal Generation Model for a QPSK DS-CDMA system.

In a typical downlink synchronous DS-CDMA system, as employed for indoor ATM and certain ad-hoc wireless networks [149, 152], each user's data is spread using an individual signature waveform (or spreading code); the data for all users is then combined and transmitted by the Base Station (BS). Each User Equipment (UE) or Mobile Station (MS) synchronizes itself with the BS using the broadcast synchronization/pilot channels; once synchronized, the BS and UE/MS can communicate on the traffic channel (comprising both data and control streams). The data transmission is assumed to be QPSK, i.e., comprising two composite data channels created by a serial-to-parallel (S/P) stage, which are constellated in quadrature. At the UE/MS receiver, the received signal is first passed through a chip-matched filter and sampled at the chip rate.
Considering a total of K active users in an L-path multipath environment and N transmitted symbols during the observation frame T_p, the received signal is given by [82, 122, 124]

r(t) = Σ_{n=1}^{N} Σ_{k=1}^{K} Σ_{l=0}^{L−1} √e_{kn}(t) b_k(n) h_l(t) s_k(t − nT − τ_l) + n(t)    (6.1)

where e_{kn} is the energy of the nth symbol of the kth user, b_k(n) ∈ {±1 ± i} is the nth complex symbol of the kth user, h_l and τ_l are the lth path's gain coefficient and delay, respectively, n(t) is the additive noise, and s_k(t) is the kth user's signature code (or spreading sequence), generated by

s_k(t) = Σ_{m=0}^{G−1} a_k(m) p(t − m T_c);   a_k(m) ∈ {−1, 1};   0 ≤ m ≤ G − 1    (6.2)

a_k(m) is a real spreading sequence (i.e., any of the standard CDMA PN codes, such as Gold, Walsh-Hadamard or Kasami codes [66]) for the kth user, containing G chips per symbol, i.e., G = T_b / T_c. p(t) is a chipping pulse of duration T_c, and T_b is the symbol period. Under the assumption of time-invariance, the model (6.1) can be written more compactly in vector-matrix form as

r̄ = H S b̄ + n̄    (6.3)

where H is a (NG + L − 1) × NG multipath propagation coefficient matrix containing the channel coefficients, S is an NG × NG block-diagonal matrix with the matrix of spreading codes forming the diagonal blocks, b̄ is an NG-d vector containing the user symbols, and n̄ is the (NG + L − 1)-d channel noise vector with covariance matrix Q. The structure of the above-defined matrices and vectors is given by

H = [ h_0       0      ...    0
      ⋮        h_0            ⋮
      h_{L−1}   ⋮       ⋱     0
      0        h_{L−1}        h_0
      ⋮                ⋱      ⋮
      0         ...    0     h_{L−1} ]

S = diag[ S̄  S̄  ...  S̄ ],   S̄ = [ s_1  s_2  ...  s_K ],
b̄ = [ b^T(n T_b)  b^T((n−1) T_b)  ...  ]^T

where b(n T_b) collects the energy-scaled symbols √e_{kn} b_k(n) of the K users transmitted during the nth symbol period.

The compact linear model (6.3) is useful in deriving linear detectors, such as the matched filter (MF) and the linear minimum mean squared error (LMMSE) detector (see Section 6.4.1), for the recovery of the transmitted symbol train of a desired user. However, the primary disadvantage of this model is the prohibitive dimension of the constituent matrices; especially with longer frame durations and larger G, the matrices become excessively large, making this model unsuitable for any real-time implementation at the UE/MS.

Alternately, the signal model can be represented as a linear convolutive model [57, 62, 122-124]: during a symbol time, the received chip data consists of the chips corresponding to the currently transmitted symbol and its delayed multipath components, as well as delayed chips from some previously transmitted symbols, plus the channel noise. In this formulation, the G chips arriving at the UE/MS during the nth symbol time are computed as the sum of the chips from the L multipaths of the nth transmitted symbol and the multipath components of the previous J − 1 symbols, where J (6.4) is set by τ_L, the maximum chip delay among the L multipaths. The nth received symbol can be expressed as

r_n(t) = Σ_{k=1}^{K} b_k(n) √e_{kn}(t) Σ_{l=0}^{L−1} h_l(t) s_k(t − nT − τ_l) + n_n(t)
        + Σ_{j=1}^{J−1} Σ_{k=1}^{K} b_k(n−j) √e_{k,n−j}(t) Σ_{l=0}^{L−1} h_l(t) s_k(t − (n−j)T − τ_l)    (6.5)
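To make the convolutive chip-level model concrete, the following sketch builds the current- and previous-symbol mixing matrices (the H_0, H_1 used later in (6.26)) from chip-spaced channel taps and the spreading codes, and synthesizes one received chip block. Integer-chip delays, the noise level and all names are illustrative assumptions.

```python
import numpy as np

def chip_channel_matrices(h, S):
    """Build H0 (current symbol) and H1 (previous symbol) chip-level mixing
    matrices from chip-spaced channel taps h (length L) and the G x K
    spreading matrix S (one column per user)."""
    L = len(h)
    G, K = S.shape
    C0 = np.zeros((G, G), dtype=complex)   # taps landing in the current symbol
    C1 = np.zeros((G, G), dtype=complex)   # tails spilling from the previous symbol
    for l, hl in enumerate(h):
        C0 += hl * np.eye(G, k=-l)
        if l > 0:
            C1 += hl * np.eye(G, k=G - l)
    return C0 @ S, C1 @ S                  # H0, H1 of size G x K

def received_block(H0, H1, b_n, b_prev, noise_std=0.05):
    """One received chip block: r_n = H0 b_n + H1 b_{n-1} + noise."""
    G = H0.shape[0]
    noise = noise_std * (np.random.randn(G) + 1j * np.random.randn(G)) / np.sqrt(2)
    return H0 @ b_n + H1 @ b_prev + noise
```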
W = D̂_s^{−1/2} V̂_s^H    (6.22)

where D̂_s and V̂_s are the estimates of the K dominant eigenvalues and eigenvectors of the auto-correlation matrix R. These eigenvalues and eigenvectors can also be computed adaptively; see [52, 62].

6.4.1.4 Channel Estimation Using the Common Pilot Channel (CPICH)
The Common Pilot Channel (CPICH) is a fixed-rate (30 kbps, G = 256) downlink physical channel that carries a pre-defined bit sequence [1-4]. Each cell has a Primary Common Pilot Channel (P-CPICH) and possibly one or more Secondary Common Pilot Channels (S-CPICH). Which configuration of these pilot channels is available, and whether they can provide a phase reference and possibly channel estimation for the dedicated data channels (DPCH), is communicated to the UE/MS by higher-level signaling. It is possible in a UMTS implementation that neither the P-CPICH nor any S-CPICH is a phase reference for any downlink DPCH [4]. If applicable, the CPICH allows the UE/MS to equalize the channel in order to achieve a phase reference with the SCH (Synchronization Channel); the CPICH also allows active estimation for power control. The Primary CPICH always uses the same channelization code.

6.4.1.4.1 Channel Estimation Procedure
To outline a simple and practical channel estimation technique using the CPICH [1, 4], assume that the transmitted pilot comprises a stream of the single symbol 1 + i and uses an all-ones spreading code (i.e., the first OVSF code).
1. For each independent multipath, multiply the incoming symbol train by the correspondingly delayed scrambling sequence.
2. Remove the modulation from the CPICH by simply multiplying the CPICH data by the conjugate of the pilot symbol 1 + i. The resulting channel estimate is noisy because of AWGN and multiple-access interference.
3. Pass the noisy channel estimate through a smoothing filter to achieve better noise immunity. This filter can be a moving-average window of length 2M + 1, e.g.,

ĥ_i = (1/(2M+1)) Σ_{k=−M}^{M} p_{i,k}    (6.23)

where p_{i,k} denotes the raw per-symbol estimate of the ith path. Since a moving UE/MS represents a dynamic channel, the channel estimates may instead be processed using a relatively small factor M followed by a single-pole smoothing filter

h̄_i = (1 − ρ) h̄_i + ρ ĥ_i,   where 0 ≤ ρ ≤ 1    (6.24)

4. Decimate or interpolate the filtered channel estimate obtained in step 3 to match the data rate of the CPICH to the data rate of the DPCCH/DPDCH.
This simple technique works well in most cases because the channel is assumed to be stable over the symbol duration.

Figure 6.3. Channel Estimation Using CPICH.
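A minimal sketch of the smoothing step in (6.23)–(6.24), assuming a vector of raw per-symbol estimates for one path; the window length and the forgetting factor are illustrative choices.

```python
import numpy as np

def smooth_channel(raw, M=8, rho=0.1):
    """Smooth raw per-symbol estimates of one path gain (complex 1-D array).

    A (2M+1)-tap moving average as in (6.23) is followed by the single-pole
    recursion of (6.24) to track slow channel variation."""
    kernel = np.ones(2 * M + 1) / (2 * M + 1)
    ma = np.convolve(raw, kernel, mode="same")    # moving-average stage (6.23)
    out = np.empty_like(ma)
    h = ma[0]
    for k, x in enumerate(ma):
        h = (1 - rho) * h + rho * x               # single-pole smoother (6.24)
        out[k] = h
    return out
```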
6.4.2 Proposed Blind Source Recovery Detection Schemes
In this section, three new MUD detection algorithms using BSR techniques are presented.

6.4.2.1 Natural Gradient Blind Multi-User Detection (BMUD) Algorithms
Blind Multi-User Detection (BMUD) is the process of blindly estimating all the user symbol sequences directly from the received composite CDMA signal using Blind Source Recovery (BSR) techniques. The BSR framework implies the recovery of original signals from environments that may include transients, convolution and even nonlinearity [109, 110, 112, 126]. The linear BSR algorithms [110, 112, 129] have been developed for linear convolutive mixing environments by the minimization of mutual information (the Kullback-Leibler divergence) using the natural gradient, subject to the structural constraints of a recovery network. The natural gradient BMUD network can be in either the feedforward or the feedback configuration [112]. The proposed BMUD algorithm adaptively estimates a set of matrices to counter the linear convolutive environment model (6.9). The justification for BMUD algorithms is based on the convenient convolutive signal model representation of DS-CDMA systems, see (6.9), and the reasonable assumption that the various transmitted user symbol sequences are mutually "independent", as they are generated independently of each other.

In this framework, both the transmitted sequences and the mixing matrices in the model (6.9) are unknown to the user. The only entity known to the user is its self-identification code. The other available prior information is the nature of the transmitted data, which is assumed to have a quaternary sub-gaussian distribution, e.g., QPSK data distorted by the multipath channel and additive noise. This is enough information to apply the info-theoretic natural gradient Blind Source Recovery (BSR) algorithms for BMUD [55, 62, 101, 123, 124].

Further assume that the DS-CDMA channel is not over-saturated and K ≤ G. The proposed BMUD algorithms do not require any pre-whitening of the received data. However, in most modern WCDMA systems, G is chosen to be very large and in general K < G. It is therefore computationally advantageous to pre-process the data for dimension reduction to K, the actual number of principal independent symbol sequences (or users) in the received data. The pre-whitening process will also remove the second-order dependence among the received data samples and some of the additive noise [33, 39, 62, 101, 123-125]. Data pre-whitening can be achieved either online, using adaptive principal component analysis (PCA) algorithms, or algebraically [33, 52, 60, 98]. The pre-whitening matrix can be computed as

W = D_s^{−1/2} V_s^H    (6.25)

where D_s and V_s are as defined in Section 6.4.1.3. The whitened version of (6.9) is given by

r_n^w = W (H_0 b_n + H_1 b_{n−1} + n_n) ≅ H̃_0 b_n + H̃_1 b_{n−1}    (6.26)

where r_n^w represents the G-d whitened data received during the nth symbol time and H̃_0, H̃_1 represent the equivalent G × K convolutive mixing matrices for the current and the previous symbol. The BMUD algorithms blindly adapt a set of matrices to estimate the independent user symbols y_n at the nth instant. This is followed by a decision stage to interpret y_n, as best as possible, and to produce the corresponding user symbol estimates b̂_n, also at the nth instant:

b̂_n = ψ(y_n)    (6.27)

where ψ(·) represents the (nonlinear) decision stage. The update laws for both feedforward and feedback structures [110, 112] are presented below; in a later section, the performance of the proposed algorithms is discussed and compared with conventional user detection algorithms.

6.4.2.1.1 Feedforward BMUD Configuration

Figure 6.4. Feedforward Demixing Structure.

For the feedforward configuration, the BMUD stage output is computed as

y_n = W_0 r_n^w + Σ_{k=1}^{K} W_k r_{n−k}^w    (6.28)

The update laws for this feedforward structure can be easily derived using the techniques outlined in Chapter 3. For the feedforward parametric matrices W_0 and W_k, the update laws are

ΔW_0 ∝ (I − φ(y_n) y_n^H) W_0    (6.29)

and

ΔW_k ∝ (I − φ(y_n) y_n^H) W_k − φ(y_n) (r_{n−k}^w)^H    (6.30)

where φ(·) is an element-wise acting score function [12, 27, 36, 37, 60, 114, 127, 130, 136] and I is a K-d identity matrix. For the initialization of the algorithm, W_0 is chosen to be either an identity or a diagonally dominant matrix, while all other matrices W_k are initialized either with random elements of very small variance or as all-zero matrices [110, 112, 155]. Note that no matrix inversion is required for the feedforward algorithm.
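A minimal sketch of the feedforward BMUD iteration (6.28)–(6.30), assuming whitened chip blocks are already available; the number of delay taps, the learning rate and the simple score function (patterned after the QPSK nonlinearity quoted later in (6.44)) are illustrative assumptions.

```python
import numpy as np

def qpsk_score(y, alpha=1.0, beta=1.0):
    """Element-wise score function for QPSK-like sub-gaussian symbols (cf. 6.44)."""
    return y - alpha * (np.tanh(beta * y.real) + 1j * np.tanh(beta * y.imag))

def ff_bmud(Rw, K, taps=2, eta=1e-3):
    """Feedforward BMUD over whitened blocks Rw (G x N, one column per symbol).

    y_n = W0 r_n + sum_k Wk r_{n-k}; natural-gradient updates (6.29)-(6.30)."""
    G, N = Rw.shape
    W0 = np.eye(K, G, dtype=complex)                # identity-like start
    Wk = [np.zeros((K, G), dtype=complex) for _ in range(taps)]
    Y = np.zeros((K, N), dtype=complex)
    for n in range(N):
        y = W0 @ Rw[:, n]
        for k, W in enumerate(Wk, start=1):
            if n - k >= 0:
                y += W @ Rw[:, n - k]
        phi = qpsk_score(y)
        E = np.eye(K) - np.outer(phi, y.conj())     # I - phi(y) y^H
        W0 += eta * E @ W0                          # (6.29)
        for k, W in enumerate(Wk, start=1):
            if n - k >= 0:                          # (6.30)
                Wk[k - 1] = W + eta * (E @ W - np.outer(phi, Rw[:, n - k].conj()))
        Y[:, n] = y
    return Y, W0, Wk
```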
6.4.2.1.2 Feedback BMUD Configuration I

Figure 6.5. Feedback Demixing Structure I.

In feedback configuration I, the output is estimated by

y_n = W_0^{−1} [ r_n^w − Σ_{k=1}^{K} W_k y_{n−k} ]    (6.31)

The update laws for this structure using the natural gradient have been derived in [112]. The update law for the matrix W_0 is given by

ΔW_0 ∝ −W_0 (I − φ(y_n) y_n^H)    (6.32)

while for the feedback matrices W_k the update law is

ΔW_k ∝ W_0 φ(y_n) y_{n−k}^H    (6.33)

The matrices in this case are initialized in a fashion similar to the feedforward case; note, however, that at least one matrix inversion is required in this formulation.

6.4.2.1.3 Feedback BMUD Configuration II

Figure 6.6. Feedback Demixing Structure II.

An alternate feedback configuration, see Waheed et al. [155], implements the feedback structure without the need for any matrix inversion. For this feedback configuration II, the BMUD stage output is computed as

y_n = W_0 r_n^w + Σ_{k=1}^{K} W_k y_{n−k}    (6.34)

The update laws for this feedback structure have been derived in [110, 112, 155] and are given by

ΔW_0 ∝ (I − φ(y_n) y_n^H) W_0    (6.35)

ΔW_k ∝ (I − φ(y_n) y_n^H) W_k + φ(y_n) y_{n−k}^H    (6.36)

If the channel can be estimated [57], the performance of the proposed BMUD algorithms may be judged by the diagonalization of the absolute value of the global transfer function, which represents the combined effect of the complex mixing and demixing transfer functions. For the two-symbol convolutive model with L ≤ G, the global transfer function of the natural gradient algorithms in the z-domain is given by [123-125]

G(z) = G_0 + G_1 z^{−1} + G_2 z^{−2}    (6.37)

where, for the feedforward configuration,

G_0 = W_0 H̃_0 = W_0 W H_0,
G_1 = W_0 H̃_1 + W_1 H̃_0 = W_0 W H_1 + W_1 W H_0,
G_2 = W_1 H̃_1 = W_1 W H_1    (6.38)

while, for feedback configuration I,

G_0 = W_0^{−1} H̃_0 = W_0^{−1} W H_0,
G_1 = W_0^{−1} (H̃_1 − W_1) = W_0^{−1} (W H_1 − W_1),
G_2 = 0    (6.39)

and, for feedback configuration II,

G_0 = W_0 H̃_0 = W_0 W H_0,
G_1 = W_0 H̃_1 + W_1 = W_0 W H_1 + W_1,
G_2 = 0    (6.40)

The proposed BMUD algorithms, as formulated, result in the direct recovery of the user symbols. The algorithms can be conveniently applied to DS-CDMA systems using only user-specific spreading sequences. They may also be extended to CDMA systems using relatively short scrambling codes, though the dimension of the matrices may still become large for DSP implementations in a UE/MS. The WCDMA system uses long codes in the downlink, making the application of these algorithms impractical because of the requisite dimension of the demixing network matrices.
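When the channel (and hence H̃_0, H̃_1) is known in a simulation, the diagonalization of the global transfer function (6.37)–(6.40) can be checked directly. The sketch below does this for the feedforward case; the simple off-peak-energy ratio is only a stand-in for the MISI index of Appendix D, which is not reproduced here.

```python
import numpy as np

def global_tf_feedforward(W0, W1, H0t, H1t):
    """Taps of the global transfer function G(z) = G0 + G1 z^-1 + G2 z^-2 (6.38)."""
    G0 = W0 @ H0t
    G1 = W0 @ H1t + W1 @ H0t
    G2 = W1 @ H1t
    return G0, G1, G2

def off_peak_energy(G_taps):
    """Crude separation diagnostic: ratio of off-peak to peak energy of sum_k |G_k|.

    A value near zero indicates the global response is close to a scaled
    permutation, i.e. good diagonalization."""
    A = sum(np.abs(G) for G in G_taps)
    peaks = A.max(axis=1).sum()
    return (A.sum() - peaks) / peaks
```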
Firstly, it can counter artifacts in the estimated channel coefficients $\hat{H}$. Secondly, the channel estimation process (as in RAKE receivers) may be limited by the receiver structure (such as the number of fingers) and may estimate only a few of the dominant channel parameters; the adaptive $W$ stage tends to counteract this anomaly, as best as possible, and provides better performance than LMMSE in such cases [122]. Thirdly, this adaptive stage minimizes the effect of the additive channel noise, which may have an unmodeled, intricate underlying structure. Lastly, the natural gradient ICA/PCA algorithms inherently reduce near-far problems by removing any ill-conditioning in the signal space for all the users in the system. As a result, all the mobile users in the system have a BER performance similar to the average BER performance of the downlink channel [57, 122].

The matrix $A$ is adaptively estimated using the update law

$A(k+1) = A(k) + \eta_t\, \Delta A(k)$   (6.42)

where

$\Delta A(k) = \begin{cases} \left(I - \varphi(y(k))\, y(k)^H\right) A(k) & \text{for static BSR (or ICA)} \\ \left(I - y(k)\, y(k)^H\right) A(k) & \text{for PCA} \end{cases}$   (6.43)

and $\varphi(\cdot)$ is a nonlinear score function [7, 11, 12, 28, 36, 126, 130, 138] which depends on the underlying distribution structure of the signals involved. For a QPSK signal, a suitable score function [136] is

$\varphi(y_i) = y_i - \alpha_i\left(\tanh\left(\beta_i \operatorname{Re}\{y_i\}\right) + i\,\tanh\left(\beta_i \operatorname{Im}\{y_i\}\right)\right)$   (6.44)

Of these proposed RAKE-BSR/RAKE-PCA structures, RAKE-BSR exhibits relatively faster and more stable convergence [110, 155]. However, in CDMA systems the underlying code structure is chosen to be "orthogonal", and RAKE-PCA may converge to a slightly lower BER solution if the channel impairments are linear in nature. Note that in (6.41), if the channel estimate $\hat{H}$ [57] is either not available or changes very dynamically, the detector can be estimated without using the channel estimate and the structure reduces to a Matched Filter BSR/PCA, i.e.,

$\hat{b}_{MF\text{-}BSR/PCA} = \begin{cases} s_1^H\, W\, r & \text{for DS-CDMA systems} \\ s_1^H\, C^H W\, r & \text{for WCDMA systems} \end{cases}$   (6.45)

The performance of this structure is better than the MF alone, and approaches RAKE performance as the underlying matrix $A$ converges. However, in this chapter, this structure will not be discussed any further.

6.5 Simulation Results

To verify the performance of the proposed BSR algorithms, a series of experiments for both DS-CDMA and WCDMA systems was conducted. Some selected results are provided for the purpose of illustration. The proposed algorithms can be easily extended to multirate CDMA transmission. However, for a clear comparison of the proposed algorithms with the conventional approaches, the simulations have been restricted to the case where all the users have the same data rate, assuming a constant spreading gain of G = 64. The channel is assumed to be wide-sense stationary and slowly fading, with several possible multiple paths to the UE/MS. The transmitted signal is also corrupted by additive white gaussian noise (AWGN) during transmission. The simulated SNR range for all simulations is from -10 to 20 dB. The dominant multi-path delays and attenuation coefficients for the signal received at the UE/MS are assumed fixed (i.e., the UE/MS is assumed to be static). Further, all the multi-path attenuation coefficients are assumed to be complex, i.e., each multipath applies both scaling and rotation to the propagated signal. For all the included simulations, the received CDMA signal comprises five multipaths with delays of 0, 1, 2, 3, 4 chips. The corresponding channel attenuation coefficients are chosen to be
h = [0.25+0.18i, 0.21+0.14i, 0.18+0.11i, 0.14+0.11i, 0.11+0.07i], respectively. The multipath channel estimate is either estimated online [57] or assumed to be known, unless otherwise specified. The score function for both the BMUD and the proposed RAKE-BSR algorithms is chosen to be $\varphi_i(n) = y_i - \left(\tanh(\operatorname{Re}\{y_i\}) + i\,\tanh(\operatorname{Im}\{y_i\})\right)$. The UE/MS and BS are assumed to be in perfect synchronism. A decaying, time-adaptive learning rate is chosen for all the adaptive algorithms. The B-LMMSE algorithm is applied on a block size of G chips; the auto-correlation matrix (for the B-LMMSE algorithm) is computed from the whole ensemble of the received data, and the conjugate of the channel filter $\hat{H}$ is applied recursively. These steps were taken to ensure that the performance of the simulated B-LMMSE is not restricted by its implementation. In order to efficiently utilize the space, for each simulation, the presentation is restricted to just two congestion scenarios of 20 users (approx. 30% congestion) and 50 users (approx. 80% congestion). The final symbol decision stage for all algorithms is given by

$\psi(y_n) = \operatorname{sign}(\operatorname{Re}\{y_n\}) + \operatorname{sign}(\operatorname{Im}\{y_n\})\, i$   (6.46)

Any sign ambiguity in the recovered symbols is fixed using the pilot bits in each user's dedicated data channel.

6.5.1 Simulation I: DS-CDMA System

In this simulation, a DS-CDMA system is simulated and modeled as in (6.9). All five multi-path channel coefficients and delays are assumed known. The system has been simulated using two types of spreading codes, namely Gold codes with a spreading gain G = 63, and OVSF (or Walsh-Hadamard) codes with a spreading gain G = 64. The simulation results are presented in Fig. 6.7. It is evident from the presented results that the performance of the proposed BMUD algorithm far exceeds any other user detection method. Under very good SNR conditions, the BER performance of B-LMMSE using Gold codes improves over the proposed BMUD algorithm. This is because the channel estimate is assumed to be perfectly known for B-LMMSE and, secondly, the BMUD algorithm is applied in instantaneous (on-line) mode with only 5000 iterations. The performance of the BMUD exceeds B-LMMSE if the algorithm is allowed to adapt longer.

Figure 6.7. DS-CDMA downlink performance for K = 20, 50 users: (a) using Gold codes, G = 63; (b) using OVSF codes, G = 64. (BER vs. SNR curves for the MF, RAKE, B-LMMSE, B-MUD, RAKE-PCA and RAKE-BSR detectors.)

The performance of the proposed RAKE-PCA and RAKE-BSR algorithms is also better than B-LMMSE at lower SNRs and under congestion of the DS-CDMA system. However, for lower congestion of the system and high SNR, the B-LMMSE performance supersedes these algorithms if the channel estimate is perfect. In case the channel estimate is imprecise [57], the proposed algorithms are always better than B-LMMSE.
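As an illustration of the final decision stage of (6.46) and the pilot-aided removal of the recovery ambiguity described above, a minimal sketch follows; the four-fold rotation candidate set, the pilot length and the bit-counting convention are illustrative assumptions rather than the exact procedure used in the simulations.

```python
import numpy as np

def qpsk_decision(y):
    # Hard decision stage of eq. (6.46)
    return np.sign(y.real) + 1j * np.sign(y.imag)

def ber_with_pilot_fix(y, b_true, n_pilot=4):
    """Resolve the sign/phase ambiguity of a recovered QPSK stream with pilot
    symbols, then count the bit error rate against the known sequence.
    y, b_true : 1-D complex arrays of recovered and transmitted symbols."""
    # Candidate rotations (the usual QPSK ambiguity set, assumed here)
    rotations = np.array([1, 1j, -1, -1j])
    pilot_err = [np.sum(qpsk_decision(rot * y[:n_pilot]) != b_true[:n_pilot])
                 for rot in rotations]
    d = qpsk_decision(rotations[int(np.argmin(pilot_err))] * y)
    # Each QPSK symbol carries two bits (real and imaginary parts)
    bit_err = np.sum(d.real != b_true.real) + np.sum(d.imag != b_true.imag)
    return bit_err / (2.0 * len(b_true))
```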
6.5.2 Simulation II: WCDMA (UMTS FDD) System

For the WCDMA downlink simulation, 5 multipaths are assumed to exist, similar to the previous simulation; all the user-specific codes are OVSF with a spreading gain G of 64. The long scrambling code has a frame length of 38400 chips (10 ms). A performance comparison of the proposed RAKE-BSR and RAKE-PCA algorithms with the conventional RAKE and B-LMMSE detectors under various network conditions is presented in Figures 6.8 and 6.9.

In the first comparison (Fig. 6.8(a)), the channel estimation is done for all the multipaths in the signal generation model. It is observed that the performance of the proposed RAKE-PCA and RAKE-BSR algorithms is very similar to B-LMMSE for lower SNR values. Under good SNR conditions, the performance of the B-LMMSE algorithm is better than the proposed algorithms. However, note that for the B-LMMSE the auto-correlation matrix is estimated using 5000 symbols, while only 1000 instantaneous adaptations (for K = 50) are done for the proposed algorithms. If more adaptations are done, the performance of the proposed algorithms will approach the LMMSE limit, but even the performance attained with a limited number of iterations exhibits their effectiveness.

Figure 6.8. WCDMA downlink system for K = 20, 50 users: (a) performance comparison with perfect channel estimation (G = 64, L = 5, RAKE fingers = 5); (b) performance comparison with an imperfect channel estimate (G = 64, L = 5, RAKE fingers = 3).

Figure 6.9. WCDMA downlink system for K = 20, 50 users: performance comparison with an imperfect channel estimate and inter-cellular interference (G = 64, L = 5, RAKE fingers = 5).

In the second comparison (Figure 6.8(b)), the channel estimate is restricted to the three dominant paths only, which correspond to the choice of delays 0, 1 and 2 in this case. It is observed that the detection performance of the proposed RAKE-BSR and RAKE-PCA algorithms exceeds B-LMMSE at all SNRs. Comparing RAKE-PCA and RAKE-BSR, it is observed that the performance of RAKE-PCA is approximately 1% better than RAKE-BSR. However, RAKE-BSR has other advantages (as discussed in section 6.4.2.2) of stability and a smaller energy of the computed filtering matrix W. Another important observation in this case is that at very low SNRs, RAKE gives the best performance, but as the SNR improves the proposed algorithms can achieve very small BERs. Therefore, an SNR-based switching mechanism can be developed to switch between the standard RAKE and the proposed algorithms, which just constitute an additional adaptive stage in the structure of the RAKE receiver.
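The adaptive stage that these detectors add to the RAKE structure is compact; the following is a minimal sketch of the A-matrix recursion of (6.42)-(6.43) with a score function of the form (6.44). The input vector x fed to the adaptive matrix is an illustrative assumption (e.g., the channel-matched received block), as are the learning rate and the unit values of the score parameters.

```python
import numpy as np

def qpsk_score(y):
    # Score of the form (6.44) with alpha_i = beta_i = 1 (an assumed setting)
    return y - (np.tanh(y.real) + 1j * np.tanh(y.imag))

def adapt_A(A, x, eta=1e-3, mode="bsr"):
    """One update of the adaptive matrix A of the RAKE-BSR / RAKE-PCA stage,
    eqs. (6.42)-(6.43).  `x` is the G-dim block assumed to drive the stage."""
    y = A @ x
    G = A.shape[0]
    if mode == "bsr":                                   # higher-order (ICA) update
        dA = (np.eye(G) - np.outer(qpsk_score(y), y.conj())) @ A
    else:                                               # second-order (PCA) update
        dA = (np.eye(G) - np.outer(y, y.conj())) @ A
    return A + eta * dA                                 # eq. (6.42)
```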
In the third scenario (Figure 6.9), the channel estimate is assumed to be imprecise as in the previous comparison, with only three of the five dominant multipaths estimated. In addition, the received signal is corrupted by extra-cellular signals from a neighboring-cell BS. This is a realistic scenario in busy metropolitan areas, where there exist several dominant multipaths and the cell size is also kept relatively small to maximize the number of users per unit area. Inter-cellular interference is also critical when the UE/MS is on the cell boundary and undergoes a soft (or softer) hand-over. The inter-cellular interference is unmodeled MAI and severely limits the performance of the detection algorithms. For the purpose of the simulation, this auxiliary BS interference is assumed to have half the energy of the primary BS. It is observed that the proposed RAKE-BSR/RAKE-PCA algorithms exhibit better immunity to this excessive unmodeled MAI and retain their qualitative performance advantage over the conventional algorithms.

6.6 Remarks

Three adaptive info-theoretic algorithms for CDMA systems have been proposed. Firstly, a purely blind multi-user detection (BMUD) algorithm has been proposed for DS-CDMA systems, or WCDMA systems using short codes. This algorithm gives better performance relative to all other user detection algorithms. This performance advantage further dominates conventional techniques under the practical constraints of limited accuracy in on-line channel estimation and unmodeled environment MAI effects. Two other info-theoretic extensions have also been proposed on top of the standard RAKE detector, namely RAKE-Blind Source Recovery (RAKE-BSR) and RAKE-Principal Component Analysis (RAKE-PCA). These detection schemes add an adaptive info-theoretic stage, based on higher-order statistics for RAKE-BSR and second-order statistics for RAKE-PCA, to the standard RAKE detector. This makes the resultant hybrid detector more robust to imperfections in channel estimates, unmodeled MAI and other slowly varying channel effects. Of these proposed algorithms, RAKE-BSR is more immune to synchronization errors between the UE/MS and BS, possesses faster convergence and is better able to cater for various dynamic and time-varying channel effects as compared to RAKE-PCA. However, for the presented WCDMA results, RAKE-PCA demonstrates a slight performance advantage. This is due to the fact that the simulated channel imperfections are only linear, the channel synchronization is perfect and the underlying code structure is orthogonal.

Chapter 7
CONCLUSIONS

This dissertation is dedicated to the stochastic blind adaptive nonlinear signal processing problem of Blind Source Recovery (BSR). Blind Source Recovery represents the unsupervised estimation of the original source signals directly from the measurement space, with or without precise MIMO environment identification. A discussion of the BSR problem definition, applications, the choice of the state space representation and practical issues is included in Chapter 1. In Chapter 2, a focused survey of the major theoretical developments in the field of BSR, as related to this work, has been provided. The natural gradient and its application to static and dynamic BSR adaptation has also been reviewed.

Chapter 3 is the crux of this work. Using state space as a compact network representation, the BSR optimization framework has been described for both non-linear and linear structures.
The natural gradient learning has been incorporated to develop equivariant linear state space update laws for the cases of minimum phase and non-minimum phase environments. In the case of minimum phase environments, separate adaptation algorithms have been developed for feedforward and feedback demixing networks. It is demonstrated, via simulations, that the derived BSR algorithms exhibit robust convergence irrespective of the adopted network structure and various parameter initializations. Further, it is observed that, based on the initialization of the parameters, the overall network may converge to different but equivalent state space representations.

Since BSR denotes blind recovery, the state space BSR algorithms have been extended using parametric source distributions. A couple of formulations are introduced in Chapter 4 with the capability to estimate the types of the underlying source distributions and eventually control the shape of the nonlinear score function in all the BSR update laws. These complementary adaptation algorithms add to the BSR framework the capability to estimate the underlying distribution structure of the original sources. This results in extensive algorithms that can be applied in an autonomous fashion to mixtures of multiple source distributions. This is an important development, as typically the nature of corruption in the measurement space is unknown. It is demonstrated via simulations that the proposed adaptive score functions can indeed efficiently capture the stochastic nature of actual sources and cause proficient convergence of the BSR algorithms for mixtures of several different types of sources.

Although it is theoretically tractable to have the dimension of the mixture space equal to the number of original sources, in a blind setting the actual number of sources is unknown. This deficiency in the a priori knowledge or the structure of a recovery problem may result in an undercomplete or an overcomplete BSR problem. The transformation of an undercomplete BSR mixture into a complete BSR mixture has been discussed and the related issues have been pointed out. For overcomplete mixtures, there exists no solution unless the mixture can be projected onto a sparse temporal or spectral analysis space. The Algebraic ICA algorithm has been proposed to perform static mixing environment estimation for sources that exhibit sparseness in the time-domain. The AICA approach has been combined with an L1-norm interior point linear programming approach to recover overcomplete mixtures of speech.

Chapter 6 discusses the application of the BSR framework to the domain of CDMA wireless communication networks. An in-depth signal model has been utilized to develop a good understanding of the basic DS-CDMA implementations and the modern 3G WCDMA or UMTS FDD implementations. Three BSR based solutions are presented for the critical forward link multi-user detection in these networks. Extensive comparisons of the proposed BSR user detection schemes with conventional downlink user detection formats indicate their effectiveness and robustness under conditions of congestion, channel estimation artifacts and unmodeled multiple access interference.

Quadratic Independence Measure (QIM) and Algebraic Matrix Distance Index (AMDI) are two new BSR performance measures that have been an outcome of this work and are presented in Appendices D.2 and D.3.
The major highlights of this work include:

• Using an info-theoretic, parametric, state space BSR optimization framework and efficient natural gradient learning, robust BSR update laws have been derived for various state space arrangements.

• Structural environment configurations such as the minimum phase and the non-minimum phase cases have been treated comprehensively. Linear feedforward and feedback structures for BSR have been proposed and implemented.

• The convergence of the natural gradient BSR algorithms has been investigated, via extensive simulations, for various demixing network structures and parameter initializations.

• Investigation into efficient parametric modeling of the source distributions, resulting in two new source density models proposed for sub- and super-gaussian sources, respectively.

• Adaptive estimation of the underlying source distributions was achieved for linear convolutive BSR. This results in completely blind algorithms, which do not even require knowledge of the type of the underlying source distributions to be estimated. The comprehensive algorithms are also capable of demixing/decorrelating mixtures of multiple source distributions.

• A new data space, cost-effective, algebraic independent component analysis algorithm, called Algebraic ICA (AICA), has been introduced. In conjunction with source inferencing algorithms for overcomplete mixtures, AICA has been applied to the realm of overcomplete blind source separation of speech with encouraging results.

• Three new BSR based multi-user detection algorithms have been proposed for the field of CDMA wireless communication networks. The proposed algorithms have been simulated for both generic DS-CDMA and modern 3G WCDMA (and CDMA2000) communication standards. The BSR MUD algorithms demonstrate a favorable performance advantage over conventional user detection schemes for CDMA wireless communication networks.

Although the dissertation concludes here, there remain several open issues in BSR yet to be tackled. Some of the possible extensions of this work are:

• For the case of nonlinear environments, efficient BSR algorithms need to be derived and their performance needs to be characterized for various different types of nonlinearities.

• BSR performance evaluation of the Algebraic ICA algorithm needs to be done on more realistic sparse overcomplete data, e.g., electromedical data (EEG and MEG, to name a few), astrophysical data, etc.

• BSR formulations have been applied to the CDMA downlink wireless communication networks in this work. Currently the work is being extended to the CDMA uplink channel. There are several other applications in communication systems, wireless networks, acoustics, imaging, text, astronomy, finance, etc., where the application of BSR techniques can be very useful.
APPENDICES

Appendix A
List of Abbreviations and Acronyms

3G: 3rd Generation (wireless networks)
3GPP: 3rd Generation Partnership Project
AICA: Algebraic Independent Component Analysis
AMDI: Algebraic Matrix-Distance Index
AR: Autoregressive Filter
ARMA: Autoregressive Moving Average Filter
ARMAX: Autoregressive Moving Average Filter with Exogenous Variables
AWGN: Additive White Gaussian Noise
BER: Bit Error Rate
BLER: Blocking Error Rate
B-LMMSE: Block Linear Minimum Mean Squared Error Detector
BLUE: Best Linear Unbiased Estimator
BMMR: Blind Mixing Matrix Recovery
BMUD: Blind Multi-user Detection
BP: Basis Pursuit
BPSK: Binary Phase Shift Keying
BS: Base Station
BSD: Blind Source Deconvolution
BSR: Blind Source Recovery
BSS: Blind Source Separation
cdf: Cumulative Density Function
CDMA: Code Division Multiple Access
CDMA2000: Wireless Standard Name (U.S. equivalent of WCDMA)
CPICH: Common Pilot Channel
CRC: Cyclic Redundancy Check
DM: Distortion Measure
DPCCH: Dedicated Physical Control Channel
DPCH: Dedicated Physical Channel
DPDCH: Dedicated Physical Data Channel
DS-CDMA: Direct Sequence Code Division Multiple Access
EPP: Exploratory Projection Pursuit
FB: Feedback Network
FDD: Frequency Domain Duplex
FEC: Forward Error Correction Codes
FF: Feedforward Network
FIR: Finite Impulse Response
GGASF: Generalized Gaussian Adaptive Score Function
GICA or geo-ICA: Geometric Independent Component Analysis
GTF: Global Transfer Function
HASF: Hyperbolic Adaptive Score Function
ICA: Independent Component Analysis
IID or iid: Independent Identically Distributed
IIR: Infinite Impulse Response
IP-LP: Interior Point Linear Programming
IS-95: EIA Interim Standard 95 (also called CDMAone)
KL: Kullback Lieblar Divergence
LMMSE: Linear Minimum Mean Squared Error
LMMSE: Linear Minimum Mean Squared Error Detector
MA: Moving Average Filter
MAI: Multiple Access Interference
MCBD: Multichannel Blind Source Deconvolution
MF: Matched Filter
MIMO: Multiple Input Multiple Output
MISI: Multichannel Inter-symbol Interference
MLE: Maximum Likelihood Estimation
MS: Mobile Station
MSE: Mean Squared Error
MUD: Multi-user Detection
OSE: Overcomplete Source Estimation
OVSF: Orthogonal Variable Spreading Factor Codes (Variable Length Walsh Codes)
P/S: Parallel to Serial Converter
PCA: Principal Component Analysis
pdf: Probability Density Function
PDF: Probability Distribution Function
PN: Pseudo-noise
QIM: Quadratic Information Measure
QoS: Quality of Service
QPSK: Quaternary Phase Shift Keying
RAKE-BSR: RAKE with Blind Source Recovery Detector
RAKE-PCA: RAKE with Principle Component Analysis Detector
S/P: Serial to Parallel Converter
SF: Spreading Factor
SINRM: Maximum Signal to Interference plus Noise Ratio
SM: Separation Measure
SNR: Signal to Noise Ratio
SUD: Single User Detection
TB: Transport Block
TDD: Time Domain Duplex
TF: Transfer Function
TFCI: Transport Format Combination Indicator
TPC: Transmit Power Control
UE: User Equipment
UMTS: Universal Mobile Telecommunication System
WCDMA: Wideband Code Division Multiple Access
ZF: Zero Forcing

Appendix B
Kullback Lieblar Divergence: A Performance Functional for Blind Source Recovery

The Kullback Lieblar Divergence (or relative entropy) [38, 69, 72, 100] between two probability laws $Q(X = x_i) = q_i$ and $P(X = x_i) = p_i$ is defined as

$K = \int_{x \in X} p(x) \ln\left(\frac{p(x)}{q(x)}\right) dx$   (B.47)
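As a small numeric illustration of (B.47) for two discrete probability laws, a minimal sketch follows; the example values are arbitrary and only meant to show that the divergence vanishes when the two laws coincide.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete Kullback-Lieblar divergence of the form (B.47), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # 0.0    (identical laws)
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))   # ~0.511 (diverging laws)
```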
The relative entropic divergence of the joint probability distribution function (pdf) of a random output vector y (the Kullback-Lieblar divergence) with respect to the product of its marginal distribution functions is a measure of the independence of the marginal output variables. The Kullback-Lieblar divergence is, therefore, an appropriate performance measure for blind source recovery based on the minimization of mutual information.

In the continuous case this relation is given by

$L(y) = \int_{y \in Y} p_y(y)\, \ln \frac{p_y(y)}{\prod_{i=1}^{n} p_{y_i}(y_i)}\, dy$   (B.48)

where
$p_y(y)$ is the probability density function of the random output vector y, and
$p_{y_i}(y_i)$ is the probability density function of the $i$th component of the output vector y.

This functional $L(y)$ is a "distance" measure with the following properties:

i) $L(y) \geq 0$   (B.49)

ii) $L(y) = 0$ iff $p_y(y) = \prod_{i=1}^{n} p_{y_i}(y_i)$   (B.50)

This measure provides an estimate of the degree of dependence among the various components of the recovered output signal vector and is appropriate to be used as the objective functional in the optimization framework of the BSR problem. For the discrete case, the functional becomes

$L(y) = \sum_{y \in Y} p_y(y)\, \ln \frac{p_y(y)}{\prod_{i=1}^{n} p_{y_i}(y_i)}$   (B.51)

The functional can be further simplified by assuming the statistical properties of the output signals to be ergodic:

$L(y(k)) = \sum_{y \in Y} p_y(y(k))\, \ln \frac{p_y(y(k))}{\prod_{i=1}^{n} p_{y_i}(y_i(k))}$   (B.52)

Further simplification can be achieved using the assumption that, as the algorithm approaches convergence, the components of the output vector y(k) become statistically less dependent, ultimately approaching independence as closely as possible. Therefore, the above functional can be re-written in its simplified mutual information form using the entropies of the signals, namely

$L(y(k)) = -H(y(k)) + \sum_{i=1}^{n} H_i(y_i(k))$   (B.53)

where H(y(k)) is the entropy of the signal vector y(k), given by

$H(y) \equiv -E\left[\ln p_y(y)\right] = \begin{cases} -\displaystyle\int_{y \in Y} p_y(y) \ln p_y(y)\, dy & \text{continuous case} \\ -\displaystyle\sum_{y \in Y} p_y(y) \ln p_y(y) & \text{discrete case} \end{cases}$   (B.54)

and $H_i(y_i(k))$ is the marginal entropy of the component signal $y_i(k)$.

Appendix C
State Space Network: Specialized Filtering Structures

One of the advantages of using a state space model is that most signal processing filter structures form special cases in which the constituent matrices of the dynamic state equation take specific forms. As an example, the implementation of both Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) filters using state space is discussed in this appendix [109, 110].

C.1. IIR Filtering

Consider the case where the matrices A and B are set in the canonical form I (or the controller form). In this case A and B are conveniently represented as

$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1L} \\ I & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & I & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} I \\ 0 \\ \vdots \\ 0 \end{bmatrix}$   (C.1)

where
A is a matrix of dimension $Lm \times Lm$,
$A_{1j}$ is a block sub-matrix of dimension $m \times m$ (which may be simplified to a diagonal matrix),
I is the identity matrix of dimension $m \times m$,
0 is the zero matrix of dimension $m \times m$, and
B is a matrix of dimension $Lm \times m$.

The state matrix is given by

$X_k = X(k) = \begin{bmatrix} X_1(k) \\ X_2(k) \\ \vdots \\ X_L(k) \end{bmatrix}$   (C.2)

where X(k) is an $Lm \times m$ dimensional state vector for the filter, and each $X_j(k)$ is an $m \times m$ dimensional component state vector.

For this model, the state model reduces to the following set of equations representing an IIR filtering structure:

$X_1(k+1) = \sum_{j=1}^{L} A_{1j} X_j(k) + m(k)$
$X_2(k+1) = X_1(k)$
$\vdots$
$X_L(k+1) = X_{L-1}(k)$
$y(k) = \sum_{j=1}^{L} C_j X_j(k) + D\, m(k)$   (C.3)
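The recursion in (C.3) amounts to one matrix update for the first state block plus pure delays for the rest. A minimal NumPy sketch of one time step is given below, treating m(k) and the state blocks as m-dimensional vectors for simplicity; the function name and data layout are illustrative.

```python
import numpy as np

def iir_canonical_step(X, m_k, A1, C, D):
    """One time step of the canonical form I IIR structure of eq. (C.3).

    X   : list [X_1(k), ..., X_L(k)] of m-dimensional state blocks
    m_k : current m-dimensional mixture sample m(k)
    A1  : list [A_11, ..., A_1L] of m x m blocks (first block row of A)
    C   : list [C_1, ..., C_L] of m x m output blocks; D : m x m matrix
    """
    # Output equation: y(k) = sum_j C_j X_j(k) + D m(k)
    y = sum(Cj @ Xj for Cj, Xj in zip(C, X)) + D @ m_k
    # State update: X_1(k+1) = sum_j A_1j X_j(k) + m(k); the rest are shifts
    X_next = [sum(A1j @ Xj for A1j, Xj in zip(A1, X)) + m_k] + X[:-1]
    return y, X_next
```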
For an IIR filter represented in the state space canonical form I, there is no update law required for B, while for the matrix A only the first block row needs to be updated. Therefore, the state equation update law (see section 2.1.2) reduces to the following block update rule:

$\Delta A_{1j} = -\eta\, \frac{\partial L_k}{\partial A_{1j}} = -\eta\, \lambda_{k+1}\, X_j^T(k)$   (C.4)

The special structure of the matrix A also affects the update law for the co-state equations (propagating in future time), which reduces to

$\lambda_1(k) = \lambda_2(k+1) + C_1^T \frac{\partial L_k}{\partial y}(k)$
$\lambda_2(k) = \lambda_3(k+1) + C_2^T \frac{\partial L_k}{\partial y}(k)$
$\vdots$
$\lambda_L(k) = C_L^T \frac{\partial L_k}{\partial y}(k)$   (C.5)

Solving specifically for time k, and then applying a time shift for time k+1, we obtain the recursive form of the update as

$\lambda_1(k) = \sum_{j=1}^{L} C_j^T \frac{\partial L_k}{\partial y}(k+j-1), \qquad \lambda_1(k+1) = \sum_{j=1}^{L} C_j^T \frac{\partial L_k}{\partial y}(k+j)$   (C.6)

which is structurally similar to the natural gradient algorithm derived for blind source separation/deconvolution in [155] and can be implemented in a similar fashion by using the usual time-delayed version of the algorithm, implemented with buffer storage memory. Using (C.6), we can further simplify (C.4) for the update of the block sub-matrices in A as

$\Delta A_{1j} \propto -\eta \sum_{l=1}^{L} C_l^T \frac{\partial L_k}{\partial y}(k+l)\, X_j^T(k)$   (C.7)

C.2. FIR Filtering

In this case, the first row of block sub-matrices $A_{1j}$ in (C.1) is zero, and therefore the above filtering model (C.3) reduces to the FIR filtering structure. In this case the state space model reduces to

$X_1(k+1) = m(k)$
$X_2(k+1) = X_1(k)$
$\vdots$
$X_L(k+1) = X_{L-1}(k)$
$y(k) = \sum_{j=1}^{L} C_j X_j(k) + D\, m(k)$   (C.8)

where the state dynamic equations reduce merely to delays of the mixture inputs, i.e.,

$X_1(k) = m(k-1),\quad X_2(k) = m(k-2),\quad \ldots,\quad X_L(k) = m(k-L)$   (C.9)

In this case only the matrices C and D need to be updated. Using the MIMO controller form, both matrices A and B contain only fixed block identity and zero sub-matrices and are conveniently absorbed in (C.9).

In the case of non-minimum phase environments, the demixing network is typically constituted as a double-sided FIR filter [14, 129]. A discrete-time, finite-length, double-sided FIR filter can be used to approximate the otherwise unstable theoretical demixing network, whose instability is due to the presence of poles outside the unit circle [22]. The implementation of these non-causal, double-sided filters is handled by using a time-delayed version of the above algorithm with the corresponding increased requirement of buffered storage. The primary advantage of these doubly finite filters is their ability to converge to "stable" solutions even for non-minimum phase environments [14, 129].

One possible way to deal with the problem of the filter approximation length and the resulting delay is a frequency domain implementation of the algorithm, where the converged solution represents a 1024 or higher tap double-sided FIR filter in the equivalent frequency domain. The result can be converted to its time domain equivalent by appropriately using the inverse FFT followed by chopping or windowing techniques to optimally contain the maximum possible filter power spectral density (PSD) while minimizing the algorithmic delay and hence the buffering memory requirement [73, 76]. However, recent studies have shown that the frequency domain techniques have an upper bound on the performance as the filter lengths become large [18, 90].
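As a rough sketch of the inverse-FFT-and-window step described in the previous paragraph, the fragment below converts one frequency-domain demixing response into a truncated, centered double-sided FIR filter; the window type and the number of retained taps are illustrative choices, not the specific settings used in the cited implementations.

```python
import numpy as np

def freq_response_to_windowed_fir(W_f, keep_taps=256):
    """Convert a length-N frequency-domain response (one complex value per
    bin) into a truncated, windowed double-sided FIR approximation."""
    w_t = np.fft.ifft(W_f)                 # equivalent time-domain taps
    w_t = np.fft.fftshift(w_t)             # center the non-causal response
    c = len(w_t) // 2
    half = keep_taps // 2
    taps = w_t[c - half:c + half]          # keep the taps around the center
    return taps * np.hanning(len(taps))    # window to limit truncation effects
```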
Appendix D
Performance Benchmarks for Blind Source Recovery

Appendix D.1. Conventional Performance Benchmarks

In this section we provide an overview of several performance benchmarks that have been employed by various researchers for the problem of blind source recovery [113, 131].

D.1.1. Signal to Noise Ratio (SNR)

The signal to noise ratio is a well-known communication benchmark, which defines the ratio between the desired signal power and the unwanted noise/distortion power, usually expressed in dB:

$SNR = 10 \log_{10} \frac{P_s}{P_n}$   (D.1.10)

where $P_s$ represents the signal power and $P_n$ represents the noise and/or unwanted signal power. This benchmark is applicable in situations where the desired signal is known; it is therefore not very suitable for BSR problems, where no such reference signal exists.

D.1.2. Multichannel Intersymbol Interference (MISI)

MISI is an extension of the communication benchmark of inter-symbol interference, which quantifies the smearing of a communication channel by an adjacent channel due to their overlapping temporal and/or spectral characteristics [52, 98]. MISI is useful as a dispersion measure since it is insensitive to the overall gain and mean group delay. In the case of BSR type problems, MISI is a measure of the "distance" between the global (i.e., the combined mixing and demixing network) transfer function and an identity impulse transfer function. It is thus a measure of the global diagonalization and permutation achieved by the demixing network [9 and the references therein]:

$MISI = \sum_{i=1}^{n} \frac{\sum_{j}\sum_{p} \left|G_{p,ij}\right| - \max_{p,j} \left|G_{p,ij}\right|}{\max_{p,j} \left|G_{p,ij}\right|} + \sum_{j=1}^{n} \frac{\sum_{i}\sum_{p} \left|G_{p,ij}\right| - \max_{p,i} \left|G_{p,ij}\right|}{\max_{p,i} \left|G_{p,ij}\right|}$   (D.1.11)

where, for the linear case, $G(z) = \hat{H}(z) * H(z)$ represents the global transfer function, $H(z) = [A_e, B_e, C_e, D_e]$ is the transfer function of the environment and $\hat{H}(z) = [A, B, C, D]$ is the transfer function of the network.

This is the performance benchmark of choice for most of the research on BSR. However, its main drawback is that it requires knowledge of the mixing environment, which is available offline only in synthetic simulation environments and hence cannot be employed in practical blind scenarios. Another variation of the above performance measure uses a squared version of the global transfer function matrix G, i.e., the performance measure relies on the L2 norm instead of the L1 norm used above. This technique has the additional inherent flaw that small error terms are de-emphasized.

D.1.3. Maximum Signal to Interference plus Noise Ratio (SINRM)

This measure uses the maximum signal to interference plus noise ratio to determine separation. The interference in source k is constituted by all sources j such that $j \neq k$. The SINR of source k at output i of the separator W is then defined as [32]

$SINR_k(w_i) = \sigma_k^2\, \frac{\left|w_i^H a_k\right|^2}{w_i^H R_{v_k} w_i}$   (D.1.12)

where
$w_i$ is the $i$th column of the separator matrix W,
$a_k$ is the $k$th column of the mixing matrix A,
$\sigma_k^2$ is the input power of source k, and
$R_{v_k}$ is the temporal mean (for nonstationary sources) of the correlation matrix of the noise plus interference for source k, defined by

$R_{v_k} = R_x - \sigma_k^2\, a_k a_k^H$   (D.1.13)

Again, this measure requires knowledge of the mixing transfer function, which is unknown in blind problems.
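For reference, a minimal sketch of the MISI computation of (D.1.11) for the static (single-tap) case follows; the convolutive case simply accumulates |G_p| over the filter taps before taking the row and column maxima. The index is zero when the global matrix is a scaled permutation.

```python
import numpy as np

def misi(G):
    """Multichannel ISI (Amari-type) index for a static global matrix G."""
    G = np.abs(np.asarray(G, dtype=complex))
    row_term = np.sum(G.sum(axis=1) / G.max(axis=1) - 1.0)
    col_term = np.sum(G.sum(axis=0) / G.max(axis=0) - 1.0)
    return row_term + col_term

print(misi([[0.0, 2.0], [0.5, 0.0]]))   # 0.0  (scaled permutation)
print(misi([[1.0, 0.3], [0.2, 1.0]]))   # 1.0  (residual cross-talk)
```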
D.1.4. Mean Squared Error (MSE)

Mean squared error is probably the best-known adaptation measure. MSE uses the L2 norm of an error signal to determine the convergence of an algorithm. In the absence of a desired reference with which to compute an explicit error, as in the BSR case, this error may be computed as the difference between the current and the previous value of an adapted parameter, or by using a probabilistic score function. Although MSE can determine whether an algorithm has reached its steady state, there is no way to determine whether the algorithm has converged to the desired solution or to a spurious local minimum; as such it is not suitable for the quantification of BSR algorithms. For any parameter $\gamma$, the MSE is given by

$MSE(\gamma) = \left|\gamma(k) - \gamma(k-1)\right|^2 \quad \text{or} \quad \left|\gamma(k) - \psi(\gamma(k))\right|^2$   (D.1.14)

Some other performance indices, proposed in [113] for synthetic audio environments, are given below.

D.1.5. Distortion Measure

The distortion in the $j$th separated output can be defined as

$D_j = 10 \log\left( E\{(m_{j,s_j} - a_j y_j)^2\} \,/\, E\{(m_{j,s_j})^2\} \right)$   (D.1.15)

where
$a_j = E\{m_{j,s_j}^2\} / E\{y_j^2\}$ is a scaling factor,
$E\{\cdot\}$ is the statistical expectation operator,
$y_j$ is the corresponding output for the $j$th source, and
$m_{i,s_j}$ is the contribution of the $j$th source to the $i$th microphone.

D.1.6. Separation Measure

The quality of separation for the $j$th separated output can be defined as

$S_j = 10 \log\left( E\{(y_{j,s_j})^2\} \,\Big/\, E\Big\{\big(\textstyle\sum_{i \neq j} y_{j,s_i}\big)^2\Big\} \right)$   (D.1.16)

where $y_{j,s_i}$ represents the $j$th output when only the $i$th source is active.

The last two benchmarks can also be represented in the frequency domain and are applicable for synthetic evaluations only, where the original source signals are known and their contributions to an observed mixture and output can be estimated.

Appendix D.2. Quadratic Information Measure (QIM)

The Quadratic Information Measure (QIM) is a novel performance index to measure the statistical independence of data sequences. In this work, QIM has been proposed and applied to the framework of blind source recovery (BSR), which includes blind source separation, deconvolution and equalization. This performance index is capable of measuring the mutual independence of data sequences directly from the data. The information theoretic independence measure is derived from Renyi's quadratic entropy [100], estimated with a finite data length Parzen window [93] using a gaussian kernel. Simulation results are presented to validate the performance of the proposed benchmark and to compare it with other established benchmarks.

D.2.1. Practical Issues in Estimating the Independence of Output Sequences

Stochastic blind signal processing tasks such as blind source separation, deconvolution and equalization have been the focus of wide interest during the last two decades. This wide interest is due to several attractive and diverse application domains for these kinds of problems, which include blind classification of topics in an internet chatroom, communication systems, audio and acoustics, finance and marketing, astronomy and physics, bio-medical, space and geo-exploratory applications. However, a daunting task remains: to reliably classify the performance of the many proposed algorithms, especially in a fashion that makes them suitable for practical implementation. Most of the presented research on the topic relies on performance benchmarks that require knowledge of either the original source signals or the mixing environment transfer function. Both of these quantities are otherwise assumed to be unknown (the absence of precise knowledge of the above-mentioned quantities is actually what makes these tasks blind). Therefore, these performance benchmarks render themselves useful for synthetic simulations only. In a practical situation, while making observations or recording a sequence of available information, precise knowledge of the original signals or the environment transfer function is never available.
Compounding this situation, the inherent indeterminacies of permutation and scaling in the blind signal recovery problem, the dilemma of an unknown number of sources, the unknown order of the filters required, say, for a deconvolution type of setup, etc., severely limit the capability to determine the quality of signal separation algorithms in practical situations.

The use of a new benchmark has been proposed to determine the mutual independence of a batch of signals for source recovery formulations. Unlike most other simulation performance benchmarks employed for BSR, this benchmark can be computed directly from the output data of the BSR network. It may also be applied directly to the observed mixture data in order to quantify the transformation effect of the BSR network. This novel benchmark is based on Renyi's mutual information, see [97, 150]. In the context of BSR, the use of this independence performance benchmark ensues the debate on the correspondence between statistical independence and source recovery. In this appendix, an overview of Renyi's quadratic entropy is first provided and it is discussed how it can be utilized for the estimation of signal independence. This is followed by a discussion on how to apply QIM to BSR type problems. At the end, simulation results are presented and observations are made from this study.

D.2.2. Independence Benchmark using Quadratic Mutual Information

A general benchmark for the determination of statistical independence based on Renyi's quadratic mutual information has been proposed in this work [131]. Also discussed are the salient features of this measure and the use of the Parzen window method for its estimation. For a more detailed discussion on Renyi's quadratic mutual information, see [97, 100]. The framework was applied to the problem of blind source separation by Principe et al. for the development of a generalized information theoretic learning framework, where the learning of the demixing structure can be derived directly from the data. However, the resulting update laws are computationally very expensive. Instead, it is proposed here to utilize a finite data-length quadratic entropic measure as a performance benchmark for achieving statistical independence.

D.2.2.1. Renyi's Entropy

Renyi's entropy definition is derived from his proposed theory of means [100] and is given by

$H_{R} = \varphi^{-1}\left( \sum_{k=1}^{N} p_k\, \varphi\big(I(p_k)\big) \right)$   (D.2.17)

where $\varphi(\cdot)$ is a continuous and strictly monotonic function from the subclass of Kolmogorov-Nagumo functions [69, 92]. To satisfy the constraints of an information measure,

$\varphi(x) = \begin{cases} x & \text{Shannon's entropy} \\ 2^{(1-\alpha)x} & \text{Renyi's entropy} \end{cases}$   (D.2.18)

and $I(p_k)$ is any information measure, e.g., $I(p_k) = -\log(p_k)$ is Hartley's information measure (or Shannon's info-theoretic measure [38]). Simplifying the above relation (D.2.17),

$H_{R\alpha} = \frac{1}{1-\alpha} \log\left( \sum_{k=1}^{N} p_k^{\alpha} \right); \quad \alpha > 0,\ \alpha \neq 1$   (D.2.19)

Renyi's quadratic entropy is the special case for $\alpha = 2$, i.e.,

$H_{R2} = -\log\left( \sum_{k=1}^{N} p_k^2 \right)$   (D.2.20)

D.2.2.2. Parzen Window Estimator

For the purpose of estimating the squared probability density for the quadratic mutual information, a Parzen kernel estimator [93] is employed. The Parzen estimator takes the form of a convolution of an estimator kernel with the observations. Using this estimator, the pdf $p_y(z)$ of a random vector $y \in \Re^m$ is given by

$\hat{p}_y(z) = \frac{1}{N} \sum_{i=1}^{N} \kappa(z - y_i)$   (D.2.21)

where $y_i \in \Re^m$ is the $i$th observation vector and $\kappa(\cdot)$ is a kernel function that satisfies the conditions of a pdf. There are several such well-known kernel functions, e.g., the gaussian kernel, the cauchy kernel or the uniform kernel [93]. For ease of implementation, the gaussian kernel is chosen with covariance matrix $\Sigma$, i.e.,

$\kappa(z) = G(z, \Sigma) = \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} z^T \Sigma^{-1} z \right)$   (D.2.22)
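A minimal sketch of the resulting plug-in estimator for a scalar sample is given below; it uses the gaussian pair property quoted in the next subsection, so that Renyi's quadratic entropy is estimated directly from the pairwise differences, with the kernel size σ as an assumed, user-chosen parameter.

```python
import numpy as np

def renyi_quadratic_entropy(y, sigma=0.5):
    """Parzen-window estimate of Renyi's quadratic entropy (D.2.20) for a
    1-D sample: H_R2 = -log( (1/N^2) sum_ij G(y_i - y_j, 2*sigma^2) )."""
    y = np.asarray(y, dtype=float)
    d = y[:, None] - y[None, :]                            # all pairwise differences
    pair_kernel = np.exp(-d**2 / (4 * sigma**2)) / np.sqrt(4 * np.pi * sigma**2)
    return -np.log(pair_kernel.mean())                     # the "information potential"
```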
There are several such well-known kernel functions, e.g., the gaussian kernel, the cauchy kernel or the uniform kernel [93]. For ease of implementation, the Gaussian kernel is chosen with covariance matrix 2 , i.e., 203 m) = G(z,2) = (D222) 1 exp[ zTZ_lz] M 1 (271') A Isl/2 2 This choice is motivated by an integral property of the Gaussian kernel, which results in an efficient and exact computation of Renyi’s quadratic entropy, for the case of 2 gaussian kernels N Lim 2 6(2), -y,-,21)G(Zk _yjaZZ) = EG(z-yi,21)G(z—yj,22)dz = 007 _yjrzl +2:2) where y,- e 91'" and y j e 93'" are two data vectors in the space 21 and 22 are covariance matrices for corresponding Gaussian kernels. D.2.3.Quadratic Independence Measure (QIM) The proposed performance benchmark is based on the Cauchy-Schwartz Inequality. For L2 norm, the inequality is 2 2 2 1x1 urn 2(xTy) «1224) Using the above inequality, the Kullback-Lieblar divergence is “approximated” by taking the logarithm of both sides of (D224) and rearranging 2 2 logmxllz— 2 0 (D.2.25) (xTy) For two pdfs f (x) and g(x) , the above inequality can be used as a divergence measure between the two pdfs. The relation is given by 204 ( EoflxfdxX [jogefdx] [ J: f(x)g(x)dx)2 Pcs- /I/ 0.8 - / . 0.7L / — Mary) \ 0.6 ~ — 0.5 0.4/ - 0.3 L 10 I I 103 10‘ 105 Number of Data Samples (a) Quadratic Independence Measure vs. Data Length 0.9 1/ 0.8L ' 1 0.7 ~ 4 0.6 ~ 1 0.5 . \ . 0.4 - \ - 0.3 ~ PCSM 0.1r o 1 . 1 I 1 I 144 L . r 1 1 r 1 I 1 . 1 1 10 1o3 10‘ 10 Number of Data Samples 0)) Figure D. 1. Effect of length of observation sequence on (a) Cross Information Measure, (b) Quadratic Independence Measure (QIM) 208 The effectiveness of this performance measure depends on the length as well as the probability distributions of the constituents in an observation sequence. In Fig. D], a graph for QIM performance measure is being provided as a function of length of observation data. The data has three constituents that include speech, communication signals, and gaussian noise. D.2.4. Simulations Results Due to space limitation, only one simulation is being provided. The mixing environment is given by m-l n—l Z Aim(k — i) : Z Bis(k — i) + v(k) (D234) j=0 i=0 where '1 1 -1 0.5 0.8 —0.7 0.06 0.4 —0.5 A0=l -1 1 ,A1= 0.8 0.3 —o.2 ,A2= 0.16 —0.1 -0.4 _1 -1 1 —o.1 -0.5 0.4 —0.3 —0.06 0.3 "1 0.6 0.8 w 0.5 0.5 0.6 .125 0.06 0.2 BO: 0.3 1 0.1 ,3]: -0.3 0.2 —0.3 ,32: -o.1 0 0.4 _O.6 —0.8 1 —0.2 -o.43 0.6 0.08 —o.13 0.3 v(k) - Additive Gaussian noise The feedforward separation results for each channel using MISI and QIM (computed with a batch of 1000 samples) are shown below 209 Blind Source Recovery - Convergence of MISI Index 1 If I I I a a _ ,. a): _ £3 4 - x... - (2 2 ,— XWM -‘ 0 1 (WWW 1““ names”..- 1:22 ...... o 0.5 1 1.5 2 2 5 3 1o . . , , 1 x 10‘ \ ISI fory2 0'1 3/ <" I 0 I I I I 0 0.5 1 1.5 2 2 5 3 x10‘ I I I I I 8 —< >10 6 '— .1 g 4 .. J m _ 2 _ _ o I l I I I 0 0.5 1 1.5 2 2.5 3 number of iterations x 10‘ (a) I cs(y) - Cauchy Schwartz Quadratic Mutual Information 3 I I T I I 2 _. 1 _. -1 WW 0 I I I I I 0 0.5 1 1.5 2 2.5 3 Vc(y) - Cross I nforrnation Potential x 10‘ 1 T T I I T 0.5 e r 0 I L P I I 0 0.5 1 1.5 2 2.5 3 number of Iterations x 10‘ (b) Figure D.2. (a) Convergence of MISI performance Index, (b) Corresponding Performance of QIM performance Index 210 D.2.5. Observations The proposed quadratic independence benchmark [131] has been thoroughly investigated for BSR problems that include both minimum phase and non-minimum phase systems. 
D.2.5. Observations

The proposed quadratic independence benchmark [131] has been thoroughly investigated for BSR problems that include both minimum phase and non-minimum phase systems. Due to the normalized nature of QIM, it is suitable for comparing the demixing performance achieved by different algorithms. Its primary limitation is that, computationally, the benchmark has a quadratic relationship with the length of the data set. Therefore, for practical situations, it is recommended either to use the QIM measure with a shorter data set or to use another computationally inexpensive measure, such as MSE, to determine the convergence of the algorithm. Once the algorithm has converged, a larger data chunk may be used to determine the achieved performance level using QIM.

Appendix D.3. Algebraic Matrix Distance Index (AMDI)

The Algebraic Matrix Distance Index (AMDI) [132-134, 137] is primarily a scaling, sign and permutation invariant numerical index to estimate the "similarity" between the column bases of two matrices with conformal dimensions. These attributes make AMDI suitable for the comparison of any two matrices of similar dimensions, including the results of algorithms such as AICA and geo-ICA. Unlike the multichannel intersymbol interference (MISI, also called the Amari performance index), which measures the diagonalization of the global solution (if it can be calculated) in ICA or BSS, AMDI compares the estimated mixing matrix with the synthetic or known mixing matrix to gauge the performance of algorithms that compute the mixing matrix instead of the demixing matrix. Furthermore, for the case of overcomplete ICA, although no MISI can be calculated, AMDI can still be used to verify the correctness of the estimated mixing matrix.

This Algebraic Matrix-Distance Index (AMDI) is a symmetric "matrix-distance" measure. Unlike geometric measures [115, 117], AMDI estimates the distance between two column-normalized matrices independently of possible sign and permutation ambiguities. The columns of the matrices are normalized by projecting them onto the conformal unit hyper-sphere, dividing all the elements of each column by the norm of the corresponding column. The AMDI is given by

$AMDI(W, H) = \frac{n - \sum_{\text{rows}} \max\left(\left|W^H H\right|\right)}{n} + \frac{n - \sum_{\text{cols}} \max\left(\left|W^H H\right|\right)}{n}$   (D.3.1)

where W and H are two column-normalized matrices of similar dimensions. The Algebraic Matrix Distance Index (AMDI) has the following properties:

• $0 \leq AMDI(W, H) \leq 2$; $AMDI(W, H) = 2$ only if the two matrices are orthogonal to each other in the n-dimensional space, and

• $AMDI(W, H) = AMDI(H, W)$

In order to better visualize the convergence properties of ICA, it is at times useful to compute the normalized AMDI instead. The normalization is done by the maximum AMDI during the estimation phase, which is typically the AMDI at initialization.

Appendix E
Multi-Input Multi-Output (MIMO) Canonical Form

This appendix describes the steps required to convert a MIMO filter definition, as used in all the simulation examples, to the canonical state space representation. For all the state space BSR simulations and derivations, the network structure is assumed to be in the MIMO controller form [17, 103]. In this form, the matrices A and B have the structure

$A = \begin{bmatrix} -Q_1 & -Q_2 & \cdots & -Q_{M-1} & -Q_M \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$   (E.1)

$C = \begin{bmatrix} P_1 & P_2 & \cdots & P_{M-1} & P_M \end{bmatrix}, \qquad D = P_0$   (E.2)

where the state space network/filter can be represented by

$H(z) = P(z)\, Q^{-1}(z)$   (E.3)

with

$P(z) = \sum_{i=0}^{n} P_i z^{-i}$   (E.4)
Transformations required for a MIMO Transfer Function Consider a pxm multi-input multi-output transfer function of the form "11(2) "12(2) "1111(2) - d11(2) 6112(2) d1m(2) "21(2) "22(2) "2111(2) H(z) = (121(2) d22(2) d2m(Z) (E6) npl(z) "p2(z) "pm(z) _dpl(z) dp2(z) dpm(z)d Defining, the monic least common denominator of each column i to be l,(z) of dimension say her, i.e., I1(2) é LCM (6111‘ x (121' x ' ' ' dpi) (E7) We can represent each term of the transfer function as n--(z) If (z) nr_ n,(.) _ '1 dry-<2) (er) 2(2) ‘ die) ’ I-(z) ' . Wi’ 1M} . Now, defining the modified numerator polynomials of dimension lxn as A 1 ° 511(2) = "y(2){ 1(2) dij-(z)} (E9) 215 Using the above definitions, we can represent (E.6) as H(z) : OI' H(z): PM 11(2) 521(2) 11(2) fip1(z) 7"1_2(_Z_) 12(2) 522(2) 12(2) fip2(z) _ I1(2) 711(2) 7121(2) _fipl (Z) therefore, by defining N(z)é and D(z) é r711(2) 521(2) Lfipl (Z) "11(2) I2(2) 0 0 12(2) 0 h- 0 (EH) can be expressed as H(z) : N(z)D“l (z) film(z)q [m(z) fi2m(z) Im(z) fipm(z) 4n(z) _ film(z)1 fi2m(z) fipm (2)2 film(z)1 fizm(2) ' 1 11(2) 0 fipm(z)d h- 0 ‘.’ /.<> 0 %m (Z)_ (E.10) (E.ll) (E.12) (13.13) (E.l4) Note that (EM) and (E3) although similar are not equivalent. To express the above matrices N(z) and D(z) in (BM) as P(z) and Q(z) of (E3), we need to re-arrange the 216 polynomials in the matrices to be in sub-matrices containing co-efficients of descending power of z, i.e., for a causal P(z), we have P(z) é and (2(2) é where, 51 1,0 521,0 npl,0 11 1,0 -— — -' F'—— — —— 1 "12,0 "1m,0 "11,1 "12,1 "1m,1 522,0 fiZm,O + 521,1 522,1 7721111 51220 fipm,0d _fipu fip2,l fipde "-- —- — q T "l l,n "12,71 nlm,n 572131 722m 72111,» _n ...... + . : . z _fiplm 5,12," fipm,n_ _ "in,l 0 0 ‘ __1 . . . Z +' 0 : : : l22,0 . O 0 11ml- . I WILM O O - 0 1pm.0_ 0 122,114 + o L _ 0 0 [pmMJ P(z) : [P0 +1012:l + ------ + Pnz'"] Q(z) : [Q1271 + ...... + QMz‘M ] and Q0 : I fiij, k : represents the k’h filter tap for the if” modified numerator filter, and Iij, 217 _1 + (13.15) 1 . (E.16) z—M (E.17) (E.18) k : represents the k’h filter tap for the if” common column denominator filter BIBLIOGRAPHY 218 BIBLIOGRAPHY [1] 3GPP, Multiplexing and Channel Coding (FDD), in rs 25.212, V3.3.0. 2002. [2] 3GPP, Physical channels and mapping of transport channels onto physical channels (FDD), in TS 25.211, V3.3.0. 2002. [3] 3GPP, Spreading and Modulation (FDD), in TS 25.213, V4.5.0. 2002. [4] 3GPP, UE Radio Transmission and Reception (FDD), in TR 25.101, V4.5.0. 2002. [5] Amari, 8., Theory of Adaptive Pattern Classifiers. IEEE Trans. on Electronic Computers, 1967. EC-l6(3): p. 299-307. [6] Amari, S., Differential-Geometrical Methods in Statistics Lecture Notes in Statistics 1985. 28. [7] Amari, S., Learning and Statistical Inference, in Hand-book of Brain Theory and Neural Networks, M.A. Arbib, Editor. 1995, MIT Press: Cambridge, MA. p. 522-526. [8] Amari, S. Neural Learning in Structured Parameter Spaces-Natural Riemannian gradient. in Neural Info. Processing Systems, NIPS-96. 1996: MIT Press. [9] Amari, S., Natural Gradient Works Efficiently in Learning. Neural Computation, 1998.10:p.251—276. [10] Amari, S., Natural Gradient Learning for Over- and Undercomplete Bases in ICA. Neural Computation, 1999. 11(8): p. 1875-1883. [1 1] Amari, S. and J.F. Cardoso, Blind Source Separation: Semi-Parametric Statistical Approach. IEEE Trans. on Signal Process, 1997. 45(11): p. 2692—2700. [12] Amari, S., T.-P. Chen, and A. Cichocki, Stability Analysis of Adaptive Blind Source Separation. Neural Networks, 1997. 10(8): p. 
1345—1352. [13] Amari, S., A. Cichocki, and H.H. Yang, A New Learning Algorithm for Blind Signal Separation, in Advances in Neural Information Processing Systems. 1996, MIT Press: Cambridge, MA. p. 757-763. [14] Amari, 8., SC. Douglas, A. Cichocki, and H.H. Yang, Multichannel Blind Deconvolution and Equalization using the Natural Gradient, in Proceeding of IEEE Workshop on Signal Processing, Adv. in Wireless Communications. 1997: Paris, France. p. 101—104. 219 [15] Amari, S. and M. Kawanabe, Information Geometry of Estimating Functions in Semiparametric Statistical Models. Bernoulli, 1997. 3: p. 29-54. [16] Amari, S. and H. Nagaoka, Methods of Information Geometry. 1999: AMS and Oxford University Press. ' [l7] Antsaklis, PI. and AN. Michel, Linear Systems. 1997: Electrical and Computer Engineering Series, McGraw-Hill. [l8] Araki, S., S. Makino, R. Mukai, T. Nishikawa, and H. Saruwatari, Fundamental Limitation of Frequency Domain Blind Source Separation for Convolved Mixture of Speech, in 3rd Int'l Symposium on ICA and BSS, ICA-2001. 2001: San Diego, CA. p. 132-137. [19] Attick, J.J., Could Information Theory provide an Ecological Theory of Sensory Processing ? Network: Computation in Neural Systems, 1992. 3(2): p. 213-251. [20] Barlow, H.B., The coding of sensory messages, in Current Problems in Animal Behavior, W.H. Thorpe and CL. Zangwill, Editors. 1961, Cambridge University Press: MA. p. 331-361. [21] Bell, AJ. and TJ. Sejnowski, An Inforrnation-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation, 1995. 7: p. 1129—1159. [22] Benveniste, A., M. Goursat, and G. Ruget, Robust Identification of a Non- minimum Phase System: Blind Adjustment of a Linear equalizer in Data Communications. IEEE Trans. on Automatic Control, 1980. 25: p. 385—399. [23] Bofill, P. and M. Zibulevsky, Blind Separation of More Sources than Mixtures using the Sparsity of their Short Term Fourier Transform, in Proc. of Int’l Symposium on Independent Component Analysis and Blind Source Separation, ICA-2000. 2000: Helsinki, Finland. p. 87-92. [24] Bofill, P. and M. Zibulevsky, Underdetermined Blind Source Separation using Sparse Representation. Signal Processing, 2001. 81(1 1): p. 2353-2362. [25] Broeck, C.V.d. and P. Reimann, Unsupervised Learning by Examples: On-line versus Off-line. Physical Review Letters, 1996. 76: p. 2181-2191. [26] Cao, J. and N. Murata, A Stable and Robust ICA Algorithm Based on t- distribution and Generalized Gaussian Distribution Models, in Neural Networks for Signal Processing IX. 1999. p. 283-292. [27] Cardoso, J.F., Blind Signal Processing: Statistical Principles. Proceedings of the IEEE, 1998. 90(8): p. 2009—20026. [28] Cardoso, J.F., Entropic Contrasts for Source Separation: Geometry and Stability, Chapter 4, in Unsupervised Adaptive Filtering. 1999, John Wiley & Sons. 220 [29] Cardoso, J.F. and B. Laheld, Equivariant Adaptive Source Separation. IEEE Trans. on Signal Processing, 1996. 44: p. 3017—3030. [30] Charlier, G., Application, de la theorie des probabilites a l'astronomie, Part of the traite. 1931, Paris: Gauthier-Villars. ' [31] Chen, S.S., D.L. Donoho, and M.A. Saunders, Atomic Decomposition by Basis Pursuit, in Technical Report. 1996: Dept. of Statistics, Stanford University, CA [32] Chevalier, P., On the Performance of Higher Order Blind Source Separation Methods, in IEEE-ATHOS workshop on HOS. 1995: Begur Spain. p. 30-34. [33] Cichocki, A. and S. Amari, Adaptive Blind Signal and Image Processing. 2002: John Wiley and Sons. [34] Cichocki, A., S. 
Amari, and J. Cao, Neural Network Models for Blind Separation of Time Delayed and Convolved Signals. IECE Trans. Fundamentals, 1997. E82-A(9): p. 1595-1603. [35] Cichocki, A., 8.1. Amari, and J. Cao. Blind Separatrion of Delayed and Convolved Sources with Self Adaptive Learning Rate. in Int'l Symposium on Nonlinear Theory and Applications, NOLTA-96. 1996. Kochi, Japan. [36] Cichocki, A., R. Unbehauen, and E. Rummert, Robust learning algorithm for blind separation of signals. Electronics Letters, 1994. 30(17): p. 1386-1387. [37] Common, P., Independent Component Analysis, a new concept ? Signal Processing, 1994. 36(3): p. 287-314. [38] Cover, T.M. and J.A. Thomas, Elements of Information Theory. 1991: Wiley Interscience Series. [39] Cristescu, R., T. Ristaniemi, J. Joutsensalo, and J. Karhunen, Blind Separation of Convolved Mixtures for CDMA Systems, in Proc. of the European Signal Processing Conference, EUSIPCO-2000. 2000: Tampere, Finland. p. 619-622. [40] Erten, G. and FM. Salam, Real Time Separation of Audio Signals Using Digital Signal Processors, in Proceedings of the 40th Midwest Symposium on Circuits and Systems. 1998. p. 1237 -1240. [41] Erten, G. and F .M. Salam, Voice Output Extraction by Signal Separation, in Proceedings of the 1998 IEEE International Symposium on Circuits and Systems. 1998. p. 5—8. [42] Erten, G. and FM. Salam, Voice Extraction by On-line Signal Separation and Recovery. IEEE Transactions on Circuits and Systems 11: Analog and Digital Signal Processing, 1999. 46(7): p. 915—922. 221 [43] Gaeta, M. and J .-L. Lacounme, Source Sparation without a priori Kowledge: The Mximum Lkelihood Slution, in Proc. Eur. Signal Processing Conf. 1990. p. 621-624. [44] Gesbert, D., J. Sorelius, P. Stoica, and A. Paulraj, Blind Multi-User MMSE Detectior for CDMA Signals in 181 channels. IEEE Communications Letters, 1999. 3(8): p. 233-235. [45] Gharbi, A.B.A. and FM. Salam, Separation of Mixed Signals in Dynamic Environments: Formulation and some Implementation, in Proc. of 37th Midwest Symposium on Circuits and Systems. 1994: Lafayette, LA. p. 587-590. [46] Gharbi, A.B.A. and EM. Salam, Implementation and Test Results of a Chip for the Separation of Mixed Signals. IEEE Transactions on Circuits and Systems, 1995. 42(11): p. 748—751. [47] Gharbi, A.B.A. and FM. Salam, Algorithms for Blind Signal Separation and Recovery in Static and Dynamic Environments, in Proceedings of 1997 Int’l Symp. on Circuits and Systems. 1997: Hong Kong. p. 713—716. [48] Girolami, M., An Alternative Perspective on Adaptive Independent Component Analysis Algorithms, in Technical Report, Computing and Information Systems. 1998: Paisley University, Scotland. [49] Girolami, M., A]. Bell, and T.J. Sejnowski, A Unifying Information Theoretic Framework for Independent Component Analysis. Int’l Journal on Mathematical and Computer Modeling 2000. 38: p. 1—21. - [50] Girolami, M. and C. Fyfe, Extraction of Independent Signal Sources using a Deflationary Exploratory Projection Pursuit Network with Lateral Inhibition. IEE Proceedings on Vision, Image and Signal Processing, 1997. 14(5): p. 299-306. [51] Girolami, M. and C. Fyfe, Generalised Independent Component Analysis through Un-supervised Learning With Emergent Bussgang Properties, in Proc. International Conference on Neural Networks. 1997: Houston, TX. p. 1788-1791. [52] Haykin, 8., Adaptive Filter Theory. 4 ed. 1996: Prentice-Hall. [53] Haykin, S., Neural Networks: A Comprehensive Foundation. 2 ed. 1998: Prentice Hall. 
[54] Haykin, S., Unsupervised Adaptive Filtering, ed. S. Haykin. Vol. I and II. 2000: John Wiley and Sons. [55] Heikkila, M.J., A Novel Blind Adaptive Algorithm for Channel Equalization in WCDMA Downlink, in 12th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC-Ol. 2001: San Diego, CA. p. A-41--A-45. 222 [56] Herault, J. and C. Jutten, Space or Time Adaptive Signal Processing by Neural Network Models AIP Conference Proceedings, 1986. 151: p. 206-211. [57] Holma, H. and A. Toskala, WCDMA for UMTS Radio Access for Third Generation Mobile Communications. 2001: John Wiley & Sons, Inc. [58] Honig, M., U. Madhow, and S. Verdr'r, Blind Adaptive Multiuser Detection. IEEE Trans. on Information Theory, 1995. 41(4): p. 944-960. [59] Hooli, K., M. Latva-Aho, and M. Juntti, Performance Evaluation of Adaptive Chip-Level Channel Equalizers in WCDMA Downlink, in IEEE International Conference on Communications. 2001. p. 1974 —l979. [60] Hyvarinen, A., J. Karhunen, and E. Oja, Independent Component Analysis. 2001: Wiley Inter-Science Series. [61] Hyvarinen, A. and E. Oja, A Fast Fixed-Point Algorithm for Independent Component Analysis. Neural Computation, 1997. 9: p. 1483—1492. [62] J outsensalo, J. and T. Ristaniemi, Blind Multi-user Detection by Fast Fixed Point Algorithm without Prior Knowledge of Symbol Level Timing, in Proc. of the IEEE Signal Processing Workshop on Higher Order Statistics. 1999: Ceasarea, Israel. p. 305- 308. [63] Jutten, C. and J. Herault, Blind separation of sources, an adaptive algorithm based on neuromimetic architecture. Signal Processing, 1991. 24(1): p. 1-31. . [64] Karhunen, J ., Neural Approaches to Independent Component Analysis and Source Separation, in Proc. of 4th European Symp. on Artificial Neural Networks (ESANN-96). 1996: Bruges, Belgium. p. 249-266. [65] Karhunen, J. and J. Joutsensalo, Representation and Separation of Signals Using Non-linear PCA Type Learning. Neural Networks, 1994. 7(1): p. 113-127. [66] Karkkéiinen, K.H.A. and RA. Leppéinen, The Influence of Initial-Phases of a PN Code Set on the Performance of an Asynchronous DS-CDMA System. Wireless Personal Communications, 2000. 3(3): p. 279-293. [67] Karvanen, J., E. J., and K. V., Adaptive Score Functions for Maximum Likelihood ICA. Journal of VLSI Signal Processing, 2002. 32: p. 83-92. [68] Karvanen, J. and V. Koivunen, Blind Separation Methods Based on Pearson system its Extensions. Signal Processing, 2002. 82(4): p. 663-673. [69] Kendall, M. and A. Stuart, The Advanced Theory of Statistics. Vol. I. 1977: Charles Griffin and Co. Ltd. 223 [70] Kenney, J.F. and ES. Keeping, The k-Statistics, in Mathematics of Statistics, V. Nostrand, Editor. 1962: Princeton, NJ. p. 99-100. [71] Khalil, H.K., Nonlinear Systems. 3 ed. 2002: Prentice Hall. [72] Kullback, S., Topics in Statistical Information Theory. 1987, New York: Springer-Verlag. [73] Lambert, R., Multichannel Blind Deconvolution: FIR Matrix Algebra and Separation of Multipath Mixtures. 1996. [74] Lambert, R. and A.J. Bell, Blind separation of Multiple Speakers in a Multipath Environment, in Int’l Conference of Acoustics, Speech and Signal Processing, ICASSP- 97. 1997: Munich, Germany. p. 423-426. [75] Lee, T.W., A.J. Bell, and R. Lambert, Blind separation of Delayed and Convolved Sources. Advances in Neural Information Processing Systems, 1997. 9: p. 758-764. [76] Lee, T.W., A.J. Bell, and R. Orglmeister, Blind Source Separation of Real-World Signals, in Proc. of IEEE Int’l Conference on Neural Networks ICNN-97. 
1997: Houston, TX. p. 2129—2135. [77] Lee, T.W. and M. Girolami, Advances in Independent Component Analysis. 2000: Springer Verlag. [78] Lee, T.-W., M. Girolami, and T.J. Sejnowski, Independent Component Analysis using an Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources. Neural Computation, 1999. 11(2): p. 417—441. [79] Lee, T.W. and T.J. Sejnowski, Independent Component Analysis for Sub- Gaussian and Super-Gaussian Mixtures, in 4th Joint Symposium on Neural Computation. 1997, Institute for Neural Computation. p. 132—140. [80] Lewicki, M. and T.J. Sejnowski, Learning Overcomplete Representations. Neural Computation 2000. 12: p. 337-365. [81] Lewis, FL. and V.L. Syrmos, Optimal Control. 2 ed. 1995: Wiley Interscience Series. [82] Li, H. and H.V. Poor, Iterative Channel Estimation and Multiuser Detection in Multipath CDMA Channels, in Proc. of IEEE Int’l Conference on Communications 2003: Anchorage, Alaska, USA. p. to appear. [83] Linsker, R., A local learning rule that enables information maximization for arbitrary input distributions. Neural Computations, 1997. 9(8): p. 1661-1665. [84] Mahram, H.H., H.E. B011, and G. Alirezaei, Performance Evaluation of Advanced Receivers for WCDMA Downlink Detection, in Proc. of 5th Int’l Symposium on 224 Wireless Personal Multimedia Communications, WPMC-2002. 2002: Honolulu, Hawaii, USA. [85] Mathis, H., T.P. Von-Hoff, and M. Joho, Blind Separation of Signals with Mixed Kurtosis Signs using Threshold Activation Functions. IEEE Transactions on Neural Networks, 2001. 12(3): p. 618—624. [86] Michel, AN. and D. Liu, Qualitative Analysis and Synthesis of Recurrent Neural Networks. 2002, New York: Marcel Dekker. [87] Miller, J.H. and 1B. Thomas, Detectors for Discrete-Time Signals in Non- Gaussian noise. IEEE Transactions on Information Theory, 1972. IT-18(2): p. 241-250. [88] Nadal, J .P. and N. Parga, Redundancy Reduction and Independent Component Analysis: Conditions on Cumulants and Adaptive approaches. Neural Computation, 1997. 9(7): p. 1421-1456. [89] Nicholls, J ., From Neuron to Brain. 4 ed. 2001: Sinauer Associates Incorporated. [90] Nishikawa, T., H. Saruwatari, and K. Shikano, Blind Source Separation Based on Multi-Stage ICA Using Frequency-Domain ICA and Time-Domain ICA, in Int'l conference on Fundamentals of Electronics, Communications and Computer Sciences, ICFS-2002. 2002. p. 7-12. [91] Opper, M, Online versus Offline Learning from Random Examples: General Results. Physics Review Letters, 1996.77: p. 4671-4674. [92] Papoulis, A., Probability, Random Variables and Stochastic Processes. 1984, New York: McGraw-Hill. [93] Parzen, E., On the Estimation of a Probability Density Function and the Mode. Annals Mathematical Statistics, 1962.33: p. 1065-1076. [94] Pearlrnutter, B. and L. Parra, A Context-Sensitive Generalization of Independent Component Analysis, in 1996 International Conference on Neural Information Processing. 1996: Hong Kong. [95] Pearson, K., Contributions to the Mathematical Study of Evolution. Philosophical Transactions of the Royal Society A, 1894. 185: p. 71-110. [96] Prator, O., C. Unger, A. Zoch, and GP. Fettweis, Performance of Adaptive Chip Equalization for the WCDMA Downlink in Fast Changing Environments, in IEEE 7th lnt’l Symposium on Spread Spectrum Technology and Applications, ISSSTA-02. 2002: Prague, Czech Republic. p. 273-277. [97] Principe, J ., D. Xu, and J. Fisher, Information-Theoretic Learning, Chapter 7,, in Unsupervised Adaptive F iltering,, S. Haykin, Editor. 1999, John Wiley & Sons. 
225 [98] Proakis, J .G., Digital Communications. 4 ed. 2001: McGraw-Hill. [99] Puntonet, CG. and A. Prieto, An Adaptive Geometrical Procedure for Blind Separation of Sources. Neural Processing Letters, 1995. 2. [100] Renyi, A., On measures of Entropy and Information. 1 ed. Selected Papers of Alfred Renyi. Vol. 2. 1976: Budapest Akademia Kiado. [101] Ristaniemi, T. and J. Joutsensalo, Advanced ICA-based Receivers for Block Fading DS-CDMA Channels. Signal Processing, 2002. 82(3): p. 417-431. [102] Roth, Z. and Y. Baram, Multi-dimensional Density Shaping by Sigmoids. IEEE Transactions on Neural Networks, 1996. 7(5): p. 1291-1298. [103] Rugh, W.J., Linear System Theory. 2 ed. 1996: Prentice Hall, NJ. [104] Rumelhart, D.E., G.E. Hinton, and RJ. Williams, Learning Internal Representations by Error Propagation, in Parallel Distributed Processing, S. Amari, Editor. 1986, MIT Press: Cambridge, MA. p. 318-362. [105] Sabala, I., A. Cichocki, and S. Amari. Relationships between Instantaneous Blind Source Separation and Multichannel Blind Deconvolution. in Proc. Int. Joint Conference on Neural Networks. 1998. Alaska USA. [106] Salam, F .M., An Adaptive Network for Blind Separation of Independent Signals, in Proc. of International Symposium on Circuits and Systems. 1993: Chicago, IL. p. 431— 434. [107] Salam, F.M. and G. Erten, Blind Signal Separation and Recovery in Dynamic Environments, in Proceedings of 3rd IEEE Workshop on Nonlinear Signal and Image Processing 1997: Mackinac Island, MI. [108] Salam, F.M. and G. Erten, Exact Entropy Series Representation for Blind Source Separation, in Proceedings of IEEE SMC Conference. 1999. p. 553—558. [109] Salam, F.M. and G. Erten, The State Space Framework for Blind Dynamic Signal Extraction and Recovery, in Proceedings of the IEEE International Symposium on Circuits and Systems. 1999. p. 66—69. [110] Salam, F.M., G. Erten, and K. Waheed, Blind Source Recovery: Algorithms for Static and Dynamic Environments, in Proc. of INNS-IEEE International Joint Conference on Neural Networks. 2001. p. 902—907. [111] Salam, F.M., A.B.A. Gharbi, and G. Erten, F orrnulation and Algorithms for Blind Signal Recovery, in Proceedings of the 40th Midwest Symposium on Circuits and Systems. 1998. p. 1233—1236. 226 [112] Salam, F.M. and K. Waheed, State-Space Feedforward and Feedback Structures for Blind Source Recovery, in Proc. of 3rd International Conference on Independent Component Analysis and Blind Signal Separation. 2001: San Diego, CA. p. 248—253. [113] Schobben, D., K. Torkkola, and P. Smaragdis, Evaluation of Signal Separation Methods, in Proc. of lst Intl. workshop on ICA and BSS, ICA-99. 1999: Aussois, France. p. 261-266. [114] Taleb, A. and C. Jutten, Sources Separation in Post-Nonlinear Mixtures. IEEE Transactions on Signal Processing, 1999. 10(47): p. 2807-2820. [115] Theis, F.J., A. Jung, E.W. Lang, and CG. Puntonet, A Theoretical Model for Linear Geometric ICA, in Proc. of 3rd Int’l Symposium on Independent Component Analysis and Blind Source Separation, ICA-2001. 2001. p. 349-354. [116] Theis, F .J . and E.W. Lang, Formalization of Two-Step Approach to Overcomplete BSS; , in IASTED Int’l Conf. on Signal & Image Proc. SIP-2002. 2002: Hawaii, USA. [117] Theis, F.J. and E.W. Lang, Geometric Overcomplete ICA, in Proc. of 10th European Symp. on Artificial Neural Networks, ESANN 2002. 2002: Bruges, Belgium. p.217-223,. [118] Torkkola, K., Blind Separation of Convolved Sources based on Information Maximization, in IEEE Workshop on Neural Networks for Signal Processing. 
1996: Kyoto, Japan. p. 423—432. [119] Tsypkin, Y.Z., Foundation of the Theory of Learning Systems. 1973, New York: Academic Press. [120] Verdu, S. Minimum Probability of Error for Asynchronous Gaussian Multiple-Access Channels. in IEEE Transactions on Information Theory. 1986. [121] Verdu, S., Multiuser Detection. 1998: Cambridge University Press. [122] Waheed, K., K. Desai, and F.M. Salem, Adaptive RAKE-Blind Source Recovery Algorithms for 3GPP UMTS/WCDMA Downlink Receivers, in 2003 IEEE International Conference on Robotics, Intelligent Systems & Signal processing, RISSP-2003. 2003: Changsha, Hunan, China. p. to appear. [123] Waheed, K., K. Desai, and F.M. Salem, Blind Multi User Detection in DS-CDMA Systems using the Natural Gradient based Symbol Recovery Structures, in 4th International Symposium on Independent Component Analysis and Blind Source Separation, ICA-2003. 2003: Nara, Japan. p. 727-732. [124] Waheed, K., K. Desai, and F.M. Salem, Natural Gradient based Blind Multi User Detection in QPSK DS-CDMA Systems, in 2003 IEEE-INNS Joint International Conference on Neural Networks, IJCNN-2003. 2003: Portland OR. p. to appear. 227 [125] Waheed, K., K. Desai, and F.M. Salem, Blind Info-Theoretic Multi-User Detection Algorithms for DS-CDMA and WCDMA Downlink Systems. IEEE Transactions on Neural Networks, submitted 2003: p. to appear. [126] Waheed, K. and F .M. Salam, Blind Source Recovery: Some Implementation and Performance Issues, in Proceedings of the 44th IEEE Midwest Symposium on Circuits and Systems. 2001: Dayton, OH. p. 694-697. [127] Waheed, K. and F .M. Salam, Blind Source Recovery using an Adaptive Generalized Gaussian Score Function, in 45th IEEE Int’l Midwest Symposium on Circuits and Systems. 2002: Tulsa, OK p. 418-421. [128] Waheed, K. and F.M. Salam, Cascaded Structures for Blind Source Recovery, in 45th IEEE Int’l Midwest Symposium on Circuits and Systems. 2002: Tulsa, OK. p. 656- 659. [129] Waheed, K. and F.M. Salam, State Space Blind Source Recovery of Non- minimum Phase Environments, in 45th IEEE International Midwest Symposium on Circuits and Systems. 2002: Tulsa, OK. p. 422-425. [130] Waheed, K. and F .M. Salam, State-Space Blind Source Recovery for Mixtures of Multiple Source Distributions, in Proc. of IEEE Int’l Symposium on Circuits and Systems. 2002: Scottsdale, Arizona. p. 197—200. [131] Waheed, K. and F.M. Salam, A Data-Derived Quadratic Independence Measure for Adaptive Blind Source Recovery in Practical Applications, in 45th IEEE Int’l Midwest Symp. on Circuits and Systems. 2002 Tulsa, OK. p. 473-476. [132] Waheed, K. and F.M. Salem, Algebraic Independent Component Analysis, in 2003 IEEE International Conference on Robotics, Intelligent Systems & Signal processing, RISSP-2003. 2003: Changsha, Hunan, China. p. to appear. [133] Waheed, K. and F.M. Salem, Algebraic Independent Component Analysis: An approach for Separation of Overcomplete Speech Mixtures, in 2003 IEEE-INNS Joint International Conference on Neural Networks, IJCNN-2003. 2003: Portland OR. p. to appear. [134] Waheed, K. and F.M. Salem, Algebraic Overcomplete Independent Component Analysis, in 4th International Symposium on Independent Component Analysis and Blind Source Separation, ICA-2003. 2003: Nara, Japan. p. 1077-1082. [135] Waheed, K. and F.M. Salem, Blind Source Recovery: A State Space Formulation, in Contemporary Issues in System Stability and Control with Applications, D. Liu and P]. Antsaklis, Editors. 2003, Birkhauser Boston, MA. p. 171-196. [136] Waheed, K. and F.M. 
Salem, New Hyperbolic Source Density Models for Blind Source Recovery Score Functions, in IEEE International Symposium on Circuits and Systems. 2003: Bangkok, Thailand. p. to appear. 228 [137] Waheed, K. and F.M. Salem, Algebraic Independent Component Analysis and its Application to Overcomplete Blind Source Separation. Neural Processing Letters, submitted 2003: p. to appear. [138] Waheed, K. and F.M. Salem, Blind Source Recovery for Multiple Source Distributions using Adaptive Score Functions. Neural Processing Letters, submitted 2003: p. to appear. [139] Waheed, K. and F.M. Salem, Blind Source Recovery: A Framework in the State Space. Journal of Machine Learning Research, submitted 2003: p. to appear. [140] Waheed, K. and F.M. Salem, State Space Feedforward and Feedback Structures for Blind Source Recovery. Neural Processing Letters, submitted 2003: p. to appear. [141] Wang, X. and H.V. Poor, Blind Equalization and Multiuser Detection in Dispersive CDMA Channels. IEEE Transactions on Communications, 1998. 46(1): p. 91- 103. [142] Wang, X. and H.V. Poor, Subspace Methods for Blind Joint Channel Estimation and Multiuser Detection in CDMA Systems. Wireless Networks, 2000. 6(1): p. 59-71. [143] Website, 3rd Generation Partnership Project Website. hm://W.3mg. 2003. [144] Website, Blind Source Recovery. http://www.cg:.msu.edu/bsr. 2003. [145] Website, CDMA Group website. http://www.cdg.org[index.asp. 2003. [146] Werner, S. and J. Lilleberg, Downlink Channel Decorrelation in CDMA Systems with long Codes, in IEEE Vehicular Technology Conference, VTC-99 1999: Houston, Texas, USA. p. 1614 -l617. [147] Widrow, B., A Statistical Theory of Adaptation. 1963: Oxford Pergamon Press. [148] Woods, J.W. and H. Stark, Probability and Random Processes with Applications to Signal Processing. 3 ed. 2001: Prentice Hall. [149] Wu, Y., Q. Zhang, W. Zhu, and S.-Y. Kung, Spreading Code Assignment in an Ad-hoc DS-CDMA wireless network, in International Conference on Communications, ICC-2002. 2002: New York, NY. p. 3066 —3070. [150] Xu, D., Energy, Entropy and Information Potential in Neurocomputing, in Ph.D. Thesisl998, Department of Electrical Engineering, University of Florida: Florida. [151] Yang, H.H., S. Amari, and A. Cichocki, Information Back-Propagation for Blind Separation of Sources from Non-linear Mixtures, in Proc. of Int’l Conference on Neural Networks ICNN-97. 1997: Houston, TX. p. 2141—2146. 229 [152] Zhang, J.-G., A.B. Sharmaand, and WC. Kwong, Optical CDMA-based ATM Switches Supporting Equal-Bit-Rate and Variable-Bit-Rate Communication Services, in IEEE Global Telecommunications Conference, GLOBECOM -98. 1998. p. 3221 —3226. [153] Zhang, L.Q., S. Amari, and A. Cichocki, Natural Gradient Approach to Blind Separation of Over- and Under-complete Mixtures, in First International Workshop on Independent Component Analysis and signal Separation (ICA99). 1999: Aussois, France. p. 455-460. [154] Zhang, L.Q., S. Amari, and A. Cichocki, Semiparametric Approach to Multichannel Blind Deconvolution of Non-minimum Phase Systems. Advances in Neural Information Processing Systems 2000. 12: p. 363—369. [155] Zhang, L.Q. and A. Cichocki, Blind Deconvolution of Dynamical Systems: A State Space Approach. Journal of Signal Processing, 2000. 4(2 ): p. 111-130. [156] Zhang, L.Q., A. Cichocki, and S. Amari, Multichannel Blind Deconvolution of Nonminimum Phase Systems using Information Backpropagation, in Proc. of 5th Int’l Conf. on Neural Information Processing. 1999: Perth, Australia. p. 210—216. [157] Zhang, L.Q., A. 
Cichocki, and S. Amari, Kalman Filter and State-Space Approach to Multichannel Blind Deconvolution, in IEEE Workshop on Neural Networks for Signal Processing. 2000: Sydney, Australia. p. 425—434. [158] Zibulevsky, M. and BA. Pearlrnutter, Blind Source Separation by Sparse Decomposition in a Signal Dictionary. Neural Computation, 2001. 13(4): p. 863-882. 230 llfl