ALGORITHMS AND ARCHITECTURES FOR ADAPTIVE SET MEMBERSHIP-BASED SIGNAL PROCESSING

By

Souheil Farah Odeh

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1990

ABSTRACT

ALGORITHMS AND ARCHITECTURES FOR ADAPTIVE SET MEMBERSHIP-BASED SIGNAL PROCESSING

By

Souheil Farah Odeh

This research is concerned with a class of set membership (SM) algorithms for estimating the parameters of linear system or signal models in which the error sequence is pointwise "energy bounded." Specifically, it is focused on the set membership weighted recursive least squares (SM-WRLS) algorithm, which works with bounding hyperellipsoidal regions to describe the solution sets which are a consequence of the error bounds. SM-WRLS is based on the familiar WRLS algorithm with the SM considerations handled through a special weighting strategy. The original version of SM-WRLS is applicable to real scalar data. In this work, a generalized SM-WRLS algorithm that can handle complex vector-input vector-output data streams is developed, which extends the applicability of this algorithm to virtually any signal processing problem involving parametric models. Further, a significant reduction in computational complexity can be achieved by employing a "suboptimal" test for information content in an incoming equation. This new strategy can be applied to virtually any version of the SM-WRLS algorithm to improve the computational complexity. The suboptimal check is argued to be a useful determiner of the ability of incoming data to shrink the ellipsoid.

The "unmodified" SM-WRLS algorithm has inherent adaptive capabilities in its own right. However, it is not possible to depend upon this algorithm to reliably behave in an adaptive manner, particularly in cases of quickly varying system dynamics. In this work, explicitly adaptive SM-WRLS algorithms are developed. Adaptation is incorporated into SM-WRLS in a very general way by introducing a flexible mechanism by which the algorithm can forget the influence of past data. The general formulation permits the extension of SM-WRLS to a wide range of adaptation strategies.

The various SM-WRLS developments are tested on models derived from real speech data. Simulation results are presented which illustrate important points about the various methods and show that the adaptive algorithms yield accurate estimates using very few of the data and quickly adapt to fast variations in the signal dynamics. It is also significant that in preliminary experiments, most of the SM-WRLS algorithms are found to be robust in small (16-bit) wordlength environments.

Finally, a parallel architecture is developed that implements the various SM-WRLS algorithms in O(m) floating point operations per equation using O(m) cells, where m represents the number of parameters estimated. A detailed analysis of the computational complexity issues is carried out.

To my parents
Farah and Georgette Odeh
for their love, support, and sacrifice

Acknowledgments

I am gratefully indebted to my thesis advisor, Professor John R. Deller, Jr., for his invaluable guidance and generous support throughout the course of this research.
I would like to express my gratitude to all the members of my Ph.D. guidance committee, Professor Donnie K. Reinhard, Professor Lionel M. Ni, Professor Majid Nayeri, and Professor Paul M. Parker, for their time and effort in reviewing and discussing my dissertation. The late Professor Harriett B. Rigas was also an important source of inspiration and guidance throughout my education, especially in difficult times.

I would also like to give special thanks to everyone in my family for their continuous love, understanding, and encouragement. I also gratefully acknowledge the partial support of this work by a grant from the Whitaker Foundation.

Table of Contents

List of Tables
List of Figures

1 Introduction and Background
  1.1 General Objectives and Scope
  1.2 Set Membership Theory
    1.2.1 The Identification Problem and Least Squares Estimation
    1.2.2 The OBE Algorithm
    1.2.3 The SM-WRLS Algorithm
      1.2.3.1 MIL-based SM-WRLS Algorithm
      1.2.3.2 GR-based SM-WRLS Algorithm

2 New Theoretical Results and Algorithms
  2.1 Introduction
  2.2 A Generalized "Non-adaptive" SM-WRLS Algorithm
  2.3 Suboptimal Tests for Innovation in SM-WRLS Algorithms
  2.4 Adaptive SM-WRLS Algorithms
    2.4.1 General Formulation
      2.4.1.1 Windowing
      2.4.1.2 Graceful Forgetting
      2.4.1.3 Selective Forgetting
    2.4.2 Exponential Forgetting Factor Adaptation
  2.5 A Survey of the Computational Complexities of Several Related Sequential Algorithms

3 Simulation Studies
  3.1 Introduction
  3.2 Simulation Results of two AR(2) models
    3.2.1 Conventional RLS and SM-WRLS Algorithms
    3.2.2 Adaptive SM-WRLS Algorithms
      3.2.2.1 Windowing
      3.2.2.2 Graceful Forgetting
      3.2.2.3 Selective Forgetting
    3.2.3 Suboptimal SM-WRLS
    3.2.4 Adaptive Suboptimal SM-WRLS
  3.3 Simulation Results of an AR(14) model
    3.3.1 Conventional RLS and SM-WRLS Algorithms
    3.3.2 Adaptive SM-WRLS Algorithms
      3.3.2.1 Windowing
      3.3.2.2 Graceful Forgetting
      3.3.2.3 Selective Forgetting
    3.3.3 Suboptimal SM-WRLS
    3.3.4 Adaptive Suboptimal SM-WRLS
  3.4 Roundoff Error Analysis

4 Architectures and Complexity Issues
  4.1 Introduction
  4.2 Parallel Architecture for SM-WRLS
  4.3 An Adaptive Compact Parallel Architecture
  4.4 Computational Complexities

5 Conclusions and Further Work
  5.1 Algorithmic Developments
    5.1.1 A Generalized SM-WRLS Algorithm
    5.1.2 Suboptimal Tests for Innovation
    5.1.3 Adaptive SM-WRLS Algorithms
  5.2 Simulation Studies
  5.3 Architectures and Complexity Issues
  5.4 Further Work

Appendix A
Bibliography

List of Tables

2.1  Computational complexities (in floating point operations (flops) per equation) for the real scalar sequential algorithms . . . 46
2.2  Computational complexities (in complex floating point operations (cflops) per equation) for the generalized sequential algorithms . . . 50
4.1  The timing diagram of the triangular array of Fig. 4.1 . . . 105
4.2  The timing diagram of the back substitution array of Fig. 4.1 . . . 106
4.3  The timing diagram of the Givens rotation (GR) array of Fig. 4.5 . . . 111
4.4  Computational complexities (in flops per equation) for the real scalar sequential and parallel GR-based SM-WRLS algorithms . . . 113
4.5  Parallel computational complexities (in flops per equation) for the various SM-WRLS algorithms using the implementations of Figs. 4.1 and 4.5 . . . 113

List of Figures

1.1  Local and global membership sets in 2-D . . . 10
1.2  The SM-WRLS algorithm based on Givens rotations . . . 16
2.3  (a) The real scalar linear model and (b) the general complex vector linear model . . . 21
3.1  The acoustic waveform of the word "four" . . . 53
3.2  The acoustic waveform of the word "six" . . . 54
3.3  The "true" parameters for the word "four". (a) Parameter a1 and (b) Parameter a2 . . . 55
3.4  The "true" parameters for the word "six". (a) Parameter a1 and (b) Parameter a2 . . . 56
3.5  Simulation results of the conventional RLS algorithm for the word four . . . 58
3.6  Simulation results of the SM-WRLS algorithm for the word four. 1.86% of the data is employed in the estimation process . . . 59
3.7  Simulation results of the conventional RLS algorithm for the word six . . . 60
3.8  Simulation results of the SM-WRLS algorithm for the word six. 2.16% of the data is employed in the estimation process . . . 61
3.9  Simulation results of the windowed SM-WRLS algorithm for the word four (l = 1000). 5.69% of the data is employed in the estimation process . . . 63
3.10 Simulation results of the windowed SM-WRLS algorithm for the word six (l = 1000). 5.44% of the data is employed in the estimation process . . . 64
3.11 Simulation results of the graceful forgetting SM-WRLS algorithm for the word four (μ = 10^-3). 6.19% of the data is employed in the estimation process . . . 66
3.12 Simulation results of the graceful forgetting SM-WRLS algorithm for the word six (μ = 10^-4). 4.89% of the data is employed in the estimation process . . . 67
3.13 Simulation results of the selective forgetting SM-WRLS algorithm for the word four. 3.6% of the data is employed in the estimation process . . . 69
3.14 Simulation results of the selective forgetting SM-WRLS algorithm for the word six. 2.83% of the data is employed in the estimation process . . . 70
3.15 Simulation results of the SM-WRLS algorithm with suboptimal data selection for the word four. 1.19% of the data is employed in the estimation process . . . 72
3.16 Simulation results of the SM-WRLS algorithm with suboptimal data selection for the word six. 1.53% of the data is employed in the estimation process . . .
T3 Simulation results of the selective forgetting SMoWRLS algorithm with suboptimal data selection for the word four. 1.89% of the data is employed in the estimation process ................... 75 Simulation results of the selective forgetting SM-WRLS algorithm with suboptimal data selection for the word six. 1.86% of the data is em- ployed in the estimation process ..................... 76 The acoustic waveform of the word “seven” ............... 77 The fourth “true” parameter (a.,) for the word “seven” ........ 77 Simulation results of the conventional RLS algorithm for the word seven 78 Simulation results of the SM-WRLS algorithm for the word seven. 7.93% of the data is employed in the estimation process ........ 79 Simulation results of the windowed SM-WRLS algorithm for the word seven (I = 500). 22.1% of the data is employed in the estimation process 80 Simulation results of the windowed SM-WRLS algorithm for the word seven (1 = 1000). 17.04% of the data is employed in the estimation process ................................... 81 Simulation results of the windowed SM-WRLS algorithm for the word seven (l = 1500). 14.34% of the data is employed in the estimation process ................................... 8?. Simulation results of the graceful forgetting 'SM-WRLS algorithm for the word seven (p = 10‘3). 18.66% of the data is employed in the estimation process ............................ 83 Simulation results of the graceful forgetting SM-WRLS algorithm for the word seven ()1 = 2 x 10‘3). 27.63% of the data is employed in the estimation process ............................ 84 Simulation results of the graceful forgetting SM-WRLS algorithm for the word seven ()1 = 4 x 10’“). 35.59% of the data is employed in the estimation process ............................ 85 Simulation results of the selective forgetting SM-WRLS algorithm for the word seven. 12.89% of the data is employed in the estimation process 86 Simulation results of the SM-WRLS algorithm with suboptimal data selection for the word seven. 4.74% of the data is employed in the estimation process ............................ 87 Simulation results of the windowed SM-WRLS algorithm with subop- timal data selection for the word seven ( = 500). 10.93% of the data is employed in the estimation process .................. 88 Simulation results of the windowed SM-WRLS algorithm with subop- timal data selection for the word seven (1 = 1000). 8.71% of the data is employed in the estimation process .................. 89 Simulation results of the selective forgetting SM-WRLS algorithm with suboptimal data selection for the word seven. 8.8% of the data is employed in the estimation process ................... 90 3.34 3.35 3.36 3.37 3.38 4.1 4.2 4.3 4.4 4.5 4.6 4.7 Simulation results of the SM-WRLS algorithm for the word four using a l6-bit wordlength. 1.96% of the data is employed in the estimation process ................................... Simulation results of the SM-WRLS algorithm for the word six using a 16-bit wordlength. 2.17% of the data is employed in the estimation process ................................... Simulation results of the windowed SM-WRLS algorithm for the word four (I = 1000) using a 16-bit wordlength. 5.4% of the data is employed in the estimation process ......................... Simulation results of the graceful forgetting SM-WRLS algorithm for the word four ()1 = 10‘3) using a 16-bit wordlength. 3.27% of the data is employed in the estimation process .................. 
Simulation results of the selective forgetting SM-WRLS algorithm for the word four using a 16-bit wordlength. 3.27% of the data is employed in the estimation process ......................... Systolic array implementation of the Givens rotation-based SM~WRLS algorithm. For simplicity of notation but without loss of generality, the figure shows a purely autoregressive case with p=3; AR(S) ...... The operations performed by the cells used in the triangular array of Fig. 4.1. (a) The Givens generation (GG) cells, (b) the Givens rotation (GR) cells, and (c) the delay element .................. The operations performed by the back substitution array. (a) The left- end processor and (b) the multiply-add units. The initial yin-,1 entering the rightmost cell is set to 0 ....................... The multiply-add unit used in Fig. 4.1 ................. A compact architecture implementation of the adaptive SM-WRLS algorithm ................................. The Operations performed by (a) the GG and (b) the GR cells used in the modules of Fig. 4.7. 6 = +1 (-1) for rotating the equation into (out of ) the system ............................ (a) The GG’ module and (b) the GR’ module used in the architecture of Fig. 4.5 ................................. xi 9] 93 94 99 101 102 103 108 109 110 Chapter 1 Introduction and Background 1.1 General Objectives and Scope This research is concerned with techniques for estimating the parameters of linear system or signal models. Set membership (SM) identification refers to a class of estimation techniques that uses certain a priori knowledge about a linear paramet- ric model to constrain the solutions to certain sets. Based on certain set-theoretic checking criteria, SM algorithms select and use only the useful data to update the parameter estimates and refine the “membership sets” to which the true parameters must belong. When data do not help refine these membership sets, the effort of up- dating the parameter estimates at those points can be avoided. The power of SM algorithms becomes more apparent when implemented using parallel architectures due to the significant reduction in the computational complexity. Because of their strong potential for application and theoretical development in virtually any signal processing problem involving parametric models (such as speech recognition, image processing, beamforming, spectral estimation, and neural networks), SM algorithms have been the subject of intense research effort in recent years [1] - [26]. Much of the recent work on parametric models is closely related to previous papers by Schweppe [27], Witsenhausen [28], and Bertsekas and Rhodes [29] which study state space sys- tems in the control and systems science domains. 1 The most widely studied class of SM algorithms involves the case in which the error sequence, say v(n), is pointwise “energy bounded,” 7(n)v2(n) <1 (1.1) where the sequence 7(n) is known or can be estimated from the data. It is this problem with which this research is concerned. (Other interesting variations involve stability constraints [30], and other noise parameter bounds [31, 32]. Veres and Norton [33] have also investigated the effects of error bounding on model structure identification.) In addition to the many advantages of the algorithms to be developed from this information, the constraint (1.1) minimizes the necessary knowledge of the input. In particular, it is not necessary to know the form of the density function for v(n). 
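To make constraint (1.1) concrete, the short sketch below (Python/NumPy; the names gamma_from_bound and satisfies_energy_bound are illustrative and not part of the algorithms developed in this work) shows one common way the sequence γ(n) arises in practice: if a pointwise magnitude bound |v(n)| < B(n) is known, then γ(n) = 1/B²(n) satisfies (1.1), and any candidate model can be screened simply by checking its residuals against the bound. Note that only the bound is used; no distributional knowledge of v(n) is required.

```python
import numpy as np

def gamma_from_bound(noise_bound):
    """Turn a pointwise magnitude bound |v(n)| < B(n) into the
    weighting sequence gamma(n) = 1/B(n)**2 used in constraint (1.1)."""
    return 1.0 / np.asarray(noise_bound, dtype=float) ** 2

def satisfies_energy_bound(residuals, gamma):
    """Check the pointwise SM constraint gamma(n) * v(n)**2 < 1 for a
    candidate model's residual sequence."""
    return bool(np.all(gamma * np.asarray(residuals, dtype=float) ** 2 < 1.0))

# Example: an error sequence known only to satisfy |v(n)| < 0.5.
rng = np.random.default_rng(0)
v = rng.uniform(-0.5, 0.5, size=200)           # any bounded error sequence
gamma = gamma_from_bound(np.full(200, 0.5))
print(satisfies_energy_bound(v, gamma))        # True: the true model is feasible
```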
Constraints of form (1.1), in conjunction with the model and data, imply pointwise “hyperstrip” regions of possible parameter sets in the parameter space which, when intersected over a given time range, usually form convex polytopes of permissible solutions for the “true” parameters. While exact descriptions of these polytopes are possible [1, 10, 11, 12, 17, 21, 24], algorithms of much lower complexity have been developed which work with a bounding hyperellipsoid, a tight superset of the polytope [5, 6, 7, 14, 20]. Such an “optimal bounding ellipsoid” (OBE) algorithm is the focus of this research. Recently, Deller [5, 6] has reformulated an OBE algorithm of Huang [3, 8, 14, 19, 20] so that it is exactly the familiar weighted recursive least squares solution [34, 35] with the SM considerations handled through a special weighting strategy. This algorithm is referred to as set membership weighted recursive least squares (SM- WRLS) to distinguish it from the original OBE algorithm. A review of SM theory and related topics is presented in the next section. The main objectives of this research are: 1. To generalize the SM-WRLS algorithm to handle complex vector-input vector- 2 output data streams. 2. To reduce the computational complexity of the algorithm. 3. To make the estimator adaptive. 4. To perform simulation studies to research the performance of the SM-WRLS algorithms. 5. To develop parallel architectures to implement the SM-WRLS algorithms. The first three objectives which are addressed in Chapter 2 are concerned with algorithmic developments centered on SM-WRLS identification. Currently, the SM— WRLS algorithm can be used in the noted application areas if the data are real numbers. It will be advantageous if this powerful algorithm can be extended to work with complex numbers. Complex-valued data are encountered in many digital signal processing and image processing problems (e.g., see [36, Chs. 2 & 8]). The current version of SM-WRLS cannot handle complex-valued data. Also, SM-WRLS has the potential for application to a wide range of problems in which the data to be processed (at each instant of time) may take the form of a vector quantity. For example, a very important research area is beamforming [37], a spatial filtering task, to which the SM-WRLS algorithm may be applied. At every time interval, the array of sensors of a narrowband beamformer provide vector outputs that need to be processed by the beamformer using complex computations. There is also potential for applying SM-WRLS to neural networks in which both the input and the output are (complex) vectors [38]. The theoretical development of a generalized SM-WRLS algorithm that can handle complex vector-input vector-output data (objective 1) is the subject of Section 2.2. The second main objective of this research is to develop a more efficient SM- WRLS algorithm which uses a suboptimal checking criterion to reduce the computa- tional complexity of SM-WRLS at the expense of using “suboptimal” weights. This suboptimal strategy can be applied to any version, adaptive or “non-adaptive,” of the SM-WltlS algorithm to improve the computational efficiency. The theoretical development of the suboptimal SM-WRLS algorithm is presented in Section 2.3. Sec- tion 2.4 demonstrates how to make the SM-WRLS algorithm explicitly adaptive (ob- jective 3) by introducing a flexible mechanism by which it can “forget” the influence of past data. 
It is to be noted, however, that the basic (unmodified) SM-WRLS algo- rithm is inherently “adaptive” compared with the RLS algorithm. This adaptation is inherent in the use of data weights which are “optimal” in the SM sense. Chapter 3 is concerned with objective 4 which discusses simulation studies of the various suboptimal and adaptive strategies presented in Chapter 2. These studies illustrate important points about these strategies and about the SM-WRLS algorithm in general. One of the advantages of the SM-WRLS formulation (contrasted with Huang’s OBE algorithm [20]) is that it immediately admits solution by contemporary paral- lel architectures. This is critical because it reduces the complexity of the algorithm from 0(m2) to 0(m), where m is the number of parameters to be estimated. The significant reduction of computational complexity and parallel hardware implementa- tion of SM algorithms improve their potential for real time applications. Chapter 4 is devoted to parallel hardware implementations (objective 5) of the SM-WRLS and dis- cussion of their advantages, particularly with regard to their improved computational complexity. Finally, Chapter 5 summarizes the main conclusions and contributions of this research and suggests possible directions for future research topics in the SM realm. 1.2 Set Membership Theory A brief background on SM theory is presented in this section which is divided into three major subsections. The first contains a brief overview of the identification problem and least squares (LS) estimation. The second gives an overview of the ()lll'l algorithm. The third presents the “scalar case” of the SM-WRLS algorithm and its formulation based on Givens rotations (GR ’3). However, the computational com plex- ities of these algorithms (and others) are presented and compared in Chapter 2. 1.2.1 The Identification Problem and Least Squares Esti- mation A well known identification problem is the estimation of the parameters of a general autoregressive moving average with exogenous input (ARMAX(p,q)) [35] model of the form ye) = fawn — 2') + )3th - j) + v(n) (1.2) i=1 j=0 in which y(n) is a scalar output of the model; w(n) is a measurable, uncorrelated, input sequence; v(n) is an uncorrelated‘ driving (or error) sequence, known to be bounded as in (1.1), which is independent of w(n); and a,’s and bj,3 are the parameters to be identified. For convenience, the following vector notations are employed 3T(n) '-'- [y(n -1)y(n - 2) - - - v(n - P)w(n)w(n - 1) - ' - w(n - 9)] (1-3) and 93; s [a,a,...a,bob,...b,] (1.4) and hence, y(n) -'_: 03::(12) + v(n) . (1.5) 1As will be noticed below, the SM-WRLS algorithm has no nominal constraint on v(n) other than (1.1). However, if v(n) is correlated, one would expect to obtain biased estimates since the solution is essentially based on the RLS method [35]. The issue of a correlated v(n) with SM processing remains open for further research. Note that this model has an important subcase of a purely order p autoregressive (AR(p)) model which is prevalent in many important applications. The dimension of 00 is defined as the integer m, m=p+q+1 (1.6) noting that m should be reduced to simply m = p for the pure AR case. Consider the LS problem [39]: Given data (or a system of observations) on the interval i 6 [1,11] (n 2 m), and some set of error minimization weights, say {A(i)}, form the overdetermined system of equations q i A(1)y(1) l \/*(2.)y(2) (1.7) i A(1)2:T(1) _. 
VAC-037(2) -* 00 shown) -», _ Maw), denoted X0090 = v(n) , - (1-8) A and find the LS estimate, say 0(n), for the vector 90. There are well known methods to solve this problem. The first is the “batch” solution given by [39] . —1 em = [xT(n)X(n)] xr(..),,(..) (1.9) with the matrix in brackets playing the role of the weighted covariance matrix, i.e., C,(n) = [XT(n)X(n)] = ih(i)z(i)zT(i) . (1.10) i=1 The second is the recursive matrix inversion lemma (MIL)-based WRLS solution given by [34, 35] P(n -1)a:(n)a:T(n)P(n —- 1) 1+ A(n)C(n) P(n) = P(n-l)-/\(n) (1.11) l A 9(n) = 9(n—1)+/\(n)P(n)a:(n)en_1(n) (1.12) where, P(n) é C;1(n) (1.13) G(n) e 2T(n)P(n-1)a:(n) (1.14) en_1(n) e y(n)-ST(n—l)a:(n) (1.15) in which C,(n) is the weighted covariance matrix defined above, P(n) is its inverse; 6(n) is the parameter vector estimate using n points of data; en_1(n) is the residual (or error) at time 71 based on 9(n — 1); and A(n) is some error minimization weight. These recursions are theoretically equivalent to the batch solution given in (1.9) at each 12. 1.2.2 The OBE Algorithm The OBE algorithm is an SM algorithm developed by Fogel, Huang, and colleagues [3, 8, 14, 19, 20, 23, 25] which can be used to identify a general ARMAX(p, q) model. An overview of this algorithm and its origin is presented in this section (paraphrased from [5, 6, 7]). Let us first present the general idea behind SM algorithms. Consider the case of a real m-dimensional parameter space; 11'". These algorithms bound the parameters in a subset of ’R’" as follows: At any given time, the model, incoming datum, and constraint (1.1) define a pointwise “hyperstrip” region of possible parameter sets in 72’”. Over a given time range, the intersection of these hyperstrips forms a convex polytope of permissible solutions for the “true” parameters; 00. The description of this polytope can be greatly simplified (mathematically) when approximated by a bounding hyperellipsoid, a tight superset of the polytope [5, 6, 7, 14, 20]. By recognizing the relationship between the conventional WRLS and SM identifi- cation, Fogel [25] showed that there is a membership set (hyperellipsoid) centered on the unweighted RLS estimate associated with the identification of constant unknown parameters of a linear system driven by uncorrelated noise, based on constraints of the form (1.1). Fogel also studied the convergence of the membership set to a single point. The subsequent papers by Fogel, Huang, and colleagues [3, 8, 14, 19, 20, ‘23] discuss a weighted approach (i.e., the OBE algorithm) based on the same principle. This algorithm uses energy constraints of form (1.1) to restrict the solutions of the linear parameters to ellipsoidal domains. At time n, the estimator, 0(n), is the center of an ellipsoidal region in R”, of the form E(n) =°_- {o | [a - é(n)]T-1(n) [o — é(n)] g 1} , o e nm (1.16) which represents the smallest bounding ellipsoid of possible solutions, 9, to which the true parameters must belong. “(n) can be interpreted as a weighted covariance matrix on the observations, 7! ’l(n) = ZA(i)c(i)zT(i) . (1.17) i=1 The algorithm, in effect, seeks the Mn) which minimizes the “volume ratio” of the sequential ellipsoids, E(n) and E(n — 1), given the incoming datum, and subject to scaling of the previous weights. For this reason, the technique is referred to as the OBE algorithm. Frequently, no such weight exists, indicating that the new datum is uninformative (in the SM sense) and the effort of updating the parameter estimates can be avoided. 
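For later reference, the conventional MIL-based WRLS recursions (1.11)-(1.15) can be stated compactly in code. The sketch below (Python/NumPy) is a minimal illustration for the real scalar case and is not the dissertation's implementation: in SM-WRLS the weight λ(n) is produced by the optimal SM weighting strategy discussed in the sequel, whereas here it is simply supplied by the caller (λ(n) = 0 leaves the estimate unchanged, i.e., the equation is rejected). The function name and the diffuse initialization are illustrative assumptions.

```python
import numpy as np

def wrls_update(theta, P, x, y, lam):
    """One step of the MIL-based WRLS recursions (1.11)-(1.15).
    theta : current estimate theta_hat(n-1), shape (m,)
    P     : inverse weighted covariance C_x^{-1}(n-1), shape (m, m)
    x, y  : regressor x(n) and scalar observation y(n)
    lam   : error minimization weight lambda(n)
    """
    G = x @ P @ x                                            # (1.14)
    eps = y - theta @ x                                      # (1.15) residual
    Px = P @ x
    P_new = P - lam * np.outer(Px, Px) / (1.0 + lam * G)     # (1.11)
    theta_new = theta + lam * (P_new @ x) * eps              # (1.12)
    return theta_new, P_new

# A diffuse start (large P) and unit weights reduce this to ordinary RLS.
m = 3
theta, P = np.zeros(m), 1e6 * np.eye(m)
theta_true = np.array([0.5, -0.3, 0.2])
rng = np.random.default_rng(1)
for _ in range(500):
    x = rng.standard_normal(m)
    y = theta_true @ x + rng.uniform(-0.1, 0.1)   # bounded error
    theta, P = wrls_update(theta, P, x, y, lam=1.0)
print(np.round(theta, 3))                         # close to theta_true
```

With λ(n) chosen by the SM criteria described below, and set to zero for uninformative data, the same two recursions become the computational core of SM-WRLS; this is the sense in which OBE-type algorithms can be viewed as WRLS with time varying weights.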
It is important to note that the OBE algorithm is derived from geometric consid- erations and is centered on three recursions [20], two of which are remarkably similar to the conventional MIL-based WRLS algorithm [34]. Careful analysis of OBE reveals that it is “WRLS with time varying weights.” This is alluded to above. Whereas the 0813’s geometric approach solves a weighted LS problem on a point-by-point basis, its recursions cannot be exactly interpreted as conventional WRLS because of the fundamental difference in their development. Recently, Deller [5, 6] has reformulated OBE into a more conventional WRLS technique referred to as SM-WRLS. The SM- WRLS formulation, which incorporates SM considerations directly into the standard WRLS recursions, is treated in the next section. 1.2.3 The SM-WRLS Algorithm 1.2.3.1 MIL-based SM-WRLS Algorithm In this section, the theoretical development of the basic MIL-based SM-WRLS algo- rithm is sketched. This algorithm, which is described in detail in [6, 7], accepts only scalar, real-valued data which makes it applicable to a specific class of problems. A brief overview of the SM theory and the main results of the basic SM-WRLS algorithm are summarized below (paraphrased from [6, 7]) to serve as a foundation for general- izing this algorithm to handle complex-valued vector-input vector-output data. The theoretical development of the generalized algorithm is presented in Chapter 2. Suppose that, at time i, y(i), 3(2), and the model of form (1.5) are given. If there is no other information about the system, the parameter vector 00 can theoretically take any real vector value. A constraint on v(i) like (1.1), however, restricts the possible range of values of 90. From (1.1) and (1.5), it is clear that (at time i) 122(1) = [31(1) — 932(i)]’ < 3112—) . (1.18) Figure 1.1: Local and global membership sets in 2-D. It is assumed that the sequence of numbers, 7(i), is known (or can be accurately estimated), and therefore, (1.18) restricts possible values of 90 to some range, say w(i), that is called a “local membership set,” to which 00 must belong at time i. If the data and the v(i) values are given over a range [1,n], then n local sets w(i), i = 1,2, . . . , n (one for each observation) can be generated. Each of these takes the form of a “hyperstrip” in ’R’”, the two-dimensional (2D) case shown in Fig. 1.1. The parameter vector 90 must simultaneously belong to each of these sets, and therefore, must belong 10 to a global membership set given by 11(72): flw(i) . (1.19) Q(i) will be a monotonically non—increasing set with i, and it will be the minimal (most restrictive) membership set known under the conditions of the problem. The global membership set, 0(n), is the intersection of the individual strips w(i) (see 1.19), which takes the form of a convex polytope in ’R'", as illustrated in Fig. 1.1 for a 2-D case. Note that the individual w(i) does not necessarily help to refine Q(i), i.e., it might be true that Q(i) = fl(i—1)flw(i) = Q(i — l) (1.20) and the corresponding data are considered “unuseful” in the SM sense. See, for example, the case of (2(4) in Fig. 1.1. The set 0(i), which requires high computational complexity algorithms [1, 10, 11, 12, 17, 21, 24] to describe and work with, provides a useful way of determining which data points are informative and which are not. Note that the center of this set provides a systematic estimate of the parameter vector 90 which would be expected to improve as i increases. 
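As a concrete illustration of the sets just described (cf. Fig. 1.1), the sketch below performs a brute-force membership test: a candidate parameter vector belongs to the local set ω(i) if it satisfies the hyperstrip inequality for the i-th observation, and to the global set Ω(n) if it satisfies all of them. This is meant only to illustrate the geometry in a 2-D parameter space; the function names are illustrative, and it is neither the exact polytope machinery of [1, 10, 11, 12, 17, 21, 24] nor the ellipsoidal algorithm developed below.

```python
import numpy as np

def in_hyperstrip(theta, x, y, gamma):
    """Local set omega(i): gamma(i) * (y(i) - theta^T x(i))**2 < 1."""
    return gamma * (y - theta @ x) ** 2 < 1.0

def in_global_set(theta, X, Y, gammas):
    """Global set Omega(n): theta must lie in every hyperstrip."""
    return all(in_hyperstrip(theta, x, y, g) for x, y, g in zip(X, Y, gammas))

# 2-D example: the true parameters always pass; a displaced candidate is
# excluded once enough hyperstrips have been intersected.
rng = np.random.default_rng(2)
theta_true = np.array([1.0, -0.5])
X = rng.standard_normal((50, 2))
Y = X @ theta_true + rng.uniform(-0.2, 0.2, size=50)     # |v(i)| < 0.2
gammas = np.full(50, 1.0 / 0.2 ** 2)
print(in_global_set(theta_true, X, Y, gammas))           # True
print(in_global_set(theta_true + 0.3, X, Y, gammas))     # typically False
```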
However, neither 9(i) nor its center is clearly or conveniently related to the WRLS estimation process of interest. There is, however, a related (but potentially larger) global membership set associated with the WRLS process at time n. This is derived from (1.1) and (1.18) by noting that the constraint (1.1) on the input implies an “accumulated inequality” given by )5 1(1) [31(1) — (93,‘:.:(.')]2 < )2 3(7) (1.21) i=1 which holds as long as the set of “weights” used, {A(i)}, are non-negative (see [7, 11 Lemma 1]). This inequality leads to a global membership set, say C(12), to which 00 must belong. Since C(n) (from the discussion above) is the smallest known set, it must be true that fl(n) Q C(n). Two fundamental theorems underlie the basic SM-WRLS algorithm [5, 6, 7, 18]. The first indicates how the bounding ellipsoid is related to the conventional W RLS process and the second indicates how the optimal data weights are chosen. Proofs are found in [7] for the AR(p) case. The generalization to ARMAX(p,q) is straight- forward. Theorem 1 [7] Let 0(n) C_: ’R'" be the set of all parameter vectors which are com- patible with the data fori E [1,n] under constraint (1.1). Then there exists a superset of C(n), say C(n), a hyperellipsoid in ’R'”, which is closely associated with the WRLS estimation process: " “ C307“ 7 m e(n) = {a ([9 — 9(n)]T E(n)) [9 — 0(n)] <1} , o e n (1.22) where, n A . an) -- 6T(n>c.(n)éi 7 (hence) —-» - (/1(2)z"(2) -» 90: )(2)y"(2) —+ (2.3) ( Mann) —+ . , 1(n)y"(n) —» , denoted x”(n)€-)0 = Y”(n) . (2.9) The batch solution is given by (39] out) = [X(n)x"(n)]"]l X(n)Y”(n) (2.10) with the matrix in brackets called the (weighted) covariance matrix, i.e., c,(n) s [X(n)x"(n)] = Z Mi)a:(i)a:”(i) . (2.11) i=1 The remaining matrix product (in (2.10)) is cross-covariance matrix for the vector inputs and outputs, denoted Cn(n) and given by 0.,(n) s. [X(n)Y”(n)] = Z Mi):c(i)y”(i) . (2.12) i=1 The conventional recursions of the MIL-based WRLS solution, which we upgrade 22 here to the complex vector case, are given by where, P(n) a c;1(n) (2.15) C(n) é m”(n)P(n—1)a:(n) (2.16) 6,,_1(n) s. y(n)—O”(n—l)z(n). (2.17) As in the real scalar case, it is assumed that the complex vector error sequence, v(n), is pointwise “energy bounded,” i.e., 7(n) tr {v(n)v”(n)} < l (2.18) where the sequence 7(n) is known or can be estimated from the data, and tr {A} denotes the trace of the matrix A. Since v(n)v” (n) is a hermitian matrix with real diagonal elements, the sequence 7(n) is real numbers. This is useful in the proof of Lemma 1 below. The significance of the bounding sequence on tr {v(n)v”(n)} is that it implies pointwise “local membership sets” to which any reasonable estimate for 90 must belong. If the local membership set at time n is w(n), it follows immediately from (2.5) and (2.18) that w(n) = {a I 7(71) tr { [y(n) — @Hz(n)] [g(n) — @H2(n)]H} < 1} , 9 E kaxk (2.19) where ('9 is a general matrix replacing Go. The interpretation of this set becomes 23 clearer when considering a single output, y,(n), the i‘” element of y(n). The related subset is w,(n) = {0,- I 7(n) [y,(n) —— 0,”.1:(n)] [31,-(n) -— 9[{x(n)]fl <1} , 0,- E cm}: (2.20) in which 9,- is the i‘” column of the parameter matrix 9. Each w,(n) takes the form of a hyperstrip (or degenerate hyperellipsoid) in the parameter subspace 6””. If the data and the 7(i) values are given over a range [1,n], then it local sets w(i),i = 1,2, . . . ,n (one for each observation) can be generated. 
The parameter matrix 90 must simultaneously belong to each of these sets, and therefore, must belong to a “global membership set” given by 51(11): (33(1) . (2.21) (Mi) will be a monotonically non-increasing set with i, and it will be the minimal (most restrictive) membership set known under the conditions of the problem. Following the same arguments as in the real scalar case (see Section 1.2.3.1), a related (but potentially larger) global membership set associated with the WRLS process (at time n) can be derived. This is done by noting that the constraint (2.18) on v(n) implies that an “accumulated inequality” holds: Lemma 1 Condition (2.18) implies fl ZMi) tr {v(n)v”(n)} S iii—3 (2.22) i=1 for any non-negative (real) sequence {Mi)}. The equality can be removed for n 2 i0, where i0 is the minimum i for which Mi) aé 0. '24 Proof of Lemma 1: (Guided by Deller and Luk [7]). That the equality holds for n < i0 is obvious. At i0 1 r 1 "1 ”('0) 1(0)1{v(0)v (o)}< 7,0 which follows immediately from the positivity of Mia) and (2.18). But 1(1 011(1.,)11{ ”=(10)} 21(1) 1111(1){ v"(i)} (2.24) and = :Mi) . (2.25) SO, . . :1(1)1r{v(1)v”(1)} < gig—[3. (2.26) Also Adding (2.26) and (2.27) io+l io+l Z 2; 1(1) 11{1(1)1”(1)} < Z: %1‘j . (2,28) and so on, by induction. D It is assumed, for convenience, that M1) 75 0, and therefore, Lemma 1 becomes in i)t(i)r{ v(1)}<):j—E—. (2.29) i=1 i=1 7 By inserting y(i) — 9512(i) for v(i), inequality (2.29) becomes 2": 1(1) tr [ [v(i) - 93’ 21(1)] [9(1) — 932(1)]H} < :1: 59-}- . (2.30) i=1 7(2 25 This inequality is a fundamental result which leads to a global membership set, say 8:2(n), to which 90 must belong. Since fl(n) (from the discussion above) is the smallest known set, it must be true that (Mn) Q E(n). The main result is stated as a theorem. Theorem 3 Let “(12) Q 0““ be the set of all parameter matrices which are com- patible with the data fori E [1,n] under constraint (2.18). Then there exists a super- set of (Mn), say C(n), a hyperellipsoid in Cm“"”, which is closely associated with the WRLS estimation process: fi(n)={® | tr{[G—é(n )]”C’((n)[9- é( n)]}<1} , 966111111).- (2.31) where, 11(11): 11{o”(n)c n)e(n )}+§: ”(2 i=17 7(6) — tr {Cy(n )}. (2.32) in which 9 is a general matrix replacing Go. As before, the interpretation of the set (Mn) is simple when considering each scalar component of the output individually. The result is a corollary of the theorem. Corollary 1 Under the conditions of Theorem 3, all possible parameter vectors as- sociated with output yg, say 9,, are confined to a hyperellipsoidal membership set which is centered on its current estimate, 0,-(n), a.(n)=(9. | [9.— 19,-(11))"C(n)(") [91—12(11)] <1} , 0,-6ka (2.33) in which 6,-(n) is the WRLS estimate of column i of the parameter matrix @(11) at time n using error minimization weights {Mi)}. 26 This means simply that there is a hyperellipsoidal domain in the parameter sub- space which contains all possible parameter vectors and which is centered on the WRLS estimate. Note that the ellipsoid associated with each y,, i = 1,2, . . ..k, is identical to all others except for its center. Proofs of Theorem 3 and Corollary 1: (Guided by Deller [45]) It follows imme- diately from Lemma 1 that (see (2.30)) i: 1(1) 11 I [11(1) — @"x(i)} [11(1) — O”x(i)IH} < i % . 
(2.311) This constrains the possible parameter matrices to the set {(9 | 2: 1(1) 11 I [g(i) — O”x(i)} [11(1) — ®”x(i)IH} < 2:31-38} (2.35) Expanding the trace term, {9 l i Mi) tr {v(i)y"(i)- 9”z(i)v”(i) - v(i)z”(i)@ + ®”m(i)w”(i)®} < Z— (’ (I,} (2.36) i=17(z) Moving the summation across terms, W) {9 I tr {Cy(n) — @"CIy(n)— ny(n)9 + GHC (n)@} < 1:1 jig—(0} (2.37) where, C,(n) and C,y(n) are defined in (2.11) and (2.12), and Cy(n) is defined in the same way as Cx(n). From (2.10), Cut”) = Cx(n)é(n) or Cg(n) = O”(n)Cf(n)=OH(n)Cx(n). (2.38) This substitution in (2.37) and some simple manipulation yields {o I 11(9"c.(n)o — e"c,(n)o(n) - e"(n)c,(n)o} < Z 3%} — 111{C,(n))} i=1 (2.39) Completing the square on the left side yields {9 I tr {@"C,(n)@ - GHC;(n)(-:)(n) — é”(n)C,(n)€-) + é"(n)Cx(n)O(n)} < éié—Z} -— tr {C,,(n)} + tr {9"(n)C,(n)€-)(n)} é n(n)} (2.40) from which it follows that the set is described by {O I tr{I@ -- O(n)}H C,(n) [G — é(n)}} < Ic(n)} . (2.41) If the system is assumed to be stationary then C,(n) is positive definite, and the left side of this inequality must be a positive number. x(n), therefore, must also be positive. Dividing both sides by x(n) yields (2.31). D To prove Corollary 1, it is convenient to write . H .. " tr I [e — e(n)] c.(n) [o - 9(n)}} = 21,- (2.42) i=1 . H 1. where c,- indicates the 1‘” diagonal element of [G - é(n)I C;(n) I9 — C(11)}. Now it is clear that H . e,- = [9.- - 9101)] c,(n) [9,- — 9,-(n)} (2.43) for any i, where 9; and 6,(n) are the i‘” columns of G and C(n). It is also true that all the cj’s are positive since Cx(n) is a positive definite matrix. Therefore, I: q 7’1(n) . (2.72) 35 Lemma 2 Let A‘(n — 1) denote the optimal weight in the sense of 'l‘heorem 4 (whirl) might be zero) at time n — 1. Then _ . Mn) Mn) tr{e,,_1(n)6£,’_,(n)} . _. n(n,Mn))_n(n—1,)\ (n_1))+:y_(TII- l+Mn)G(n) . (2.73) Proof of Lemma 2: See (2.61). Proof of Theorem 5: The minimum of n(n, Mn)) with respect to Mn) can be found by differentiating (2.73) and setting the result equal to 0, 311(11, A(71)) 2 2 u _ W5)— : G (11)1 (11) + 20(11)1(11) + [1— 7(11) 11 {1.._,(11)1,,_,(11)}I = 0 . (2.74) This is a concave upward quadratic function with its minimum at X(n) = —G"1(n) < 0 . (2.75) Two real roots of (2.74) always exist, —1:t n tr 6”. n61 n 11101.01) = 1/7( ) { 1( ) 1( )} (2.76) C(n) the smaller corresponding to a maximum of n(n,Mn)), the larger to a minimum. Only the larger root can be positive since the lower root is bound to be less than /\'(n). Therefore, it is only possible for n(n,Mn)) to exhibit a minimum or to be increasing on positive Mn). It is easy to use (2.76) to verify that the larger root. is positive if condition (2.72) is met. Cl With these results, it can be argued that: If det Cx(n,Mn)) is increasing, but not changing significantly over reasonably small values of Mn), then it is sufficient to seek Mn) which minimizes n(n, Mn)). If n(n, Mn)) is monotonically increasing on 36 Mn) 2 0, this value is Mn) = 0 which corresponds to rejection of the equation at time n. It suffices, therefore to have a test for a minimum of n(n,Mn)) on positive Mn). As noted above; a simple test is embodied in condition (2.72). If this test is met, it is then cost effective to proceed with the standard optimization process centered on (2.47). Otherwise, the explicit construction and solution of (2.47) can be avoided. It is to be noted that even if (2.72) is met, it is possible that the optimization procedure will still reject the datum. 
Perhaps more importantly, it is also possible for (2.72) to reject data which would have been accepted by the usual process. These ideas will be explored in the simulation studies (Chapter 3). Finally, note that when the simplified test (2.72) accepts the new equation, there are tools to compute the weight which is “optimal” in the sense of minimizing n(n, Mn)). In particular, this would be the larger of the roots in (2.76). However, it clearly makes more sense to compute the optimal weight according to (2.47), since this computation is not much more expensive. The improvement in the computational complexity due to “suboptimal checking” is discussed in Chapter 4. 2.4 Adaptive SM-WRLS Algorithms While the theoretical developments of the previous two sections, in principle, provide the background for further SM-WRLS developments with general complex vector in- puts and outputs, it is upon the special case of real scalar data that most of the remaining work will focus. This special case was necessary in order to initiate a tractable study of the difficult issues of adaptation, algorithmic behavior, and archi- tecture development. The work on this simpler case which is to follow, however, will lay the groundwork for future studies on the more general cases. 37 2.4.1 General Formulation The adaptive algorithm presented here uses “back rotation” in order to partially or completely “forget” past information enabling it to track (potentially fast) time varying signals. Back rotation [40] is a Givens rotation-based technique that removes (or rotates out) a previously included equation from the system. In this section, the back rotation technique is modified such that a previous equation can be partially removed. This will permit a broader class of adaptive strategies. In SM terms, back rotation causes the ellipsoidal membership set to expand due to the removal of information. This expansion entices the algorithm to incorporate present data. The back rotation technique requires that all the weights with the corresponding equations (for weights other than zero) be stored for later use. Recalling Fig. 1.2, it is seen that at each step in the SM-WRLS algorithm, the upper triangular system of simultaneous equations T(n)9(n) = d1(n), is solved (when data are accepted) to obtain the optimal estimate [6, 7, 9]. Suppose in approaching time n that the past equation to be (partially) removed is at time 1'. Rotating this equation out of the system is accomplished by re-introducing it as though it were a new equation. A weight M, where u is the fraction of the equation to be removed from the system, is used, and some sign changes in the rotation equations are necessary [40]. The system of equations with r removed is referred to as the “downdated” system at time n — 1, and the related quantities are labeled with subscript d, i.e., T.(11 — 1)91(n — 1) = 91,.(11 — 1) . (2.77) The downdated ellipsoid matrix is Cd(n — 1)/nd(n — 1) where 01(11—1) = T§(11—1)Td(n—1), (2.78) 1102 — 1) = 1111.101 —1) 111+ 71102 — 1) . (2.79) 38 with INT) ( l Rd(n -— 1) i h(n — 1) — 7(7) - 7(1)y2(1)) (2.80) in which h(n — 1) represents the updated value of k which includes the equation at time n — 1. Equations (2.79) and (2.80) follow immediately from the definition of 11 found in (1.23). These relations can be used repeatedly regardless of the number of equations (partially or completely) removed prior to time 12. 
If more than one equation is removed prior to n, E(n — 1) in the right side of (2.80) is replaced by the h(n — 1) for all downdates after the first one. Following all necessary downdating just prior to time n, the algorithm uses the downdated system to compute the quantities Gd(n) and tin-1,1101) which are necessary to compute the optimal weight for the equation at n. To compute a downdated SM-WRLS estimate, therefore, it is only necessary to downdate the matrix T(n — l) and the vector d1(n — 1) and to solve for 94(n — 1) prior to Step 1 in Fig. 1.2, then replace all relevant quantities in Step 1 by their downdated versions, i.e., 0101) = Hera-‘01 -1)T1T(n -1)1(.1) = n 9.01) 1):. (2.81) and (n-1,d(n) = y(n) - 95(n — 1)x(n) . (2.82) n(n - 1) and h(n — 1) are downdated according to (2.79) and (2.80). Then A‘(n) is found in Step 2 using (1.26) with downdated quantities. Note that downdating is unnecessary if the equation 7' was rejected by SM-WRLS. In this case Td(n — 1) = T(n — 1) and 9.1(n —- 1) = 9(n - 1). Conversely, when the “new” equation at n is rejected, then T(n) = Td(n — 1) and 9(n) = 9.1(n - 1). A wide range of adaptation strategies is inherent in the general formulation de- scribed above. Three major subcases are identified in the following. In each of these subcases, the objective (in SM terms) is to expand the ellipsoidal region of possible 39 solutions in order to track fast time variations in the signal. 2.4.1.1 Windowing Windowed adaptation is a special subcase of the general formulation which uses a sliding window of fixed length, l (l 2 m + 1), so that the estimate at time 77. covers the range In — l + 1, n]. The windowing technique is made possible by the ability to completely (i.e., p = 1) and systematically remove equations at the trailing edge of the window. An illustration of this technique is shown in [5]. This technique implements the same procedure as that of the basic SM-WRLS algorithm for i = 1, 2, . . . , l, and thus, exhibits similar performance. The initialization process (for this strategy and all other GR-based SM-WRLS strategies) is also the same as that of the basic SM-WRLS algorithm; i.e., the working matrix W (or T) is filled with zeros, Mi) = 1 for i = 1,2,...,m + 1, and 76(0) = 0 (see Fig. 1.2). Windowed adaptation starts at time 1 +1 and works as follow: Prior to consideration of the equation at time n (n _>_ I + 1), it is simply necessary to remove the equation at time 1' = n —l from the system by complete “downdating” as described above. All the theoretical results of the general formulation are valid with the value kd(n — 1) given by 1101-1) = . 190—1101121110 i=n—l+l 7 Z) , Mn —I 2 = n(n — 1) — fl (1 — 7(n - ()3; (n — 1)) (2.83) in which the value n — l acts like a special 7' for the windowed adaptation. 2.4.1.2 Graceful Forgetting In this technique, only a fraction, )1 (chosen such that p“ is an integer), of all previ- ously included equations is removed at each n similarly to the exponential forgetting 40 factor conventionally used with WRLS”. This technique has a sliding window of fixed 1 length, )1” , with the effective weights decreasing linearly when moving toward the trailing edge of the window. Hence, the equation at the trailing edge of the window has an effective weight of ([pMn + 1 -— p") and the equation just rotated in has an effective weight of I/Mn). 
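Before turning to selective forgetting, it may help to picture windowing and graceful forgetting at the level of the weighted covariance recursions, even though the algorithms above operate on the triangular factor through Givens rotations. The sketch below is only a covariance-level illustration of rotating an equation in and (partially) rotating it back out, under the assumption that accepted equations and their weights have been stored; the names rotate_in and rotate_out and the regularized start in the demo are illustrative, not the Chapter 4 implementation.

```python
import numpy as np

def rotate_in(C, c_xy, x, y, lam):
    """Add an accepted equation with weight lam: C_x += lam*x*x^T and the
    cross term += lam*y*x.  The LS estimate is then solve(C, c_xy)."""
    return C + lam * np.outer(x, x), c_xy + lam * y * x

def rotate_out(C, c_xy, x, y, lam, mu=1.0):
    """Remove a fraction mu of a previously accepted equation.
    mu = 1     : complete removal (sliding window of length l)
    0 < mu < 1 : partial removal (graceful forgetting)"""
    return C - mu * lam * np.outer(x, x), c_xy - mu * lam * y * x

# Sliding window: before considering the equation at time n, fully remove
# the equation at the trailing edge (time n - l) if it was accepted, then
# process the new equation exactly as the basic algorithm would.
# Graceful forgetting: remove a fraction mu of every equation accepted in
# the last 1/mu recursions, so old equations fade linearly rather than abruptly.
m = 2
C, c_xy = 1e-6 * np.eye(m), np.zeros(m)       # small regularizer for the demo
x1, y1 = np.array([1.0, 0.5]), 2.0
C, c_xy = rotate_in(C, c_xy, x1, y1, lam=0.8)
C, c_xy = rotate_out(C, c_xy, x1, y1, lam=0.8, mu=0.5)   # half-forget it
```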
Note that although each equation must be partially rotated out )1" times, only those equations that were previously accepted (in the past 17" recursions) need to be considered by the algorithm. 2.4.1.3 Selective Forgetting This technique selectively chooses the equations to be (partially or completely) re- moved from the system based on certain user defined criteria in order to remove their influence from the system. The selection process can be, for example, to remove (or downweight) only the previously heavily weighted equations, to remove the equations that were accepted in regions of abrupt change in the signal dynamics, or to remove the equations starting from the first equation and proceeding sequentially. Whatever the criteria, a fundamental issue is to detect when adaptation is needed to improve the parameter estimates. This issue is further investigated in Chapter 3. Therefore, the adaptive SM-WRLS algorithm and its extensions (e.g., the special subcases described above) are potentially very useful techniques that can be applied to identify models with fast time varying signals. Due to the flexible nature of this algorithm, however, various subcases can also be defined and tested. In any subcase, the objective (as noted above) is always to expand the ellipsoidal region of possible solutions. This objective can also be achieved in an unconventional but intuitive method. For example, an ad hoc adaptation strategy may be to inflate the ellipsoid (i.e., multiply the matrix Cz(n)/rc(n)) by a scale factor, say a (0 < a < 1), when 3An exponentially forgetting factor can also be incorporated into SM-WRLS. Although the strat- egy is computationally very efficient, it has not been found to be effective for adaptation in prelim- inary experiments. See Section 2.4.2 for theoretical development and discussion. 41 there is a need for adaptation. It is important to note that the general adaptive formulation is amenable to the suboptimal strategy presented in Section 2.3. The performance of the adaptive, suboptimal, and adaptive suboptimal techniques are investigated in Chapter 3. 2.4.2 Exponential Forgetting Factor Adaptation In this section, it is shown that exponential forgetting factor“ adaptation can be incorporated into SM-WRLS. The theoretical development of this computationally efficient strategy is presented, followed by a discussion explaining why this method is not found to be effective for adaptation in simulations to date. It is possible to make the conventional WRLS solution by GR’s adaptive by mul- tiplying the existing system of equations of the form T(n)9(n) = d1(n), through by a “forgetting factor,” [3, where, 0 < B < 1, prior to rotation of the equation at time n + 1 into the system [40, 43, 44]. Since this process nominally inflates the ellipsoid size by removing information, the optimization at the next time must be performed with respect to this “intermediate” system, say 'r.(11)9(11) = 111,.(11) , (2.84) where Ta(n) = 9T(n) and d1,a(n) = fldl(n). Note that in order to use (1.26), “ada ted” versions of e, G, and is must be used. It is easy to see that P (w(n +1) = 6,,(n +1) (2.85) ‘This is a more conventional “forgetting factor” strategy than that suggested by Dasgupta and Huang in [14]. (since 9(11) is unchanged), and 11111 G.(n +1) = 940(1) +1) . (2.86) Recalling (1.23) and the alternative expression for 5(a) in (1.34), and noting that C1.a(n) = TZ(n)Ta(n) = 62TT(n)T(n) = B2Cx(n) , (2.87) the leading term of 76.,(n) is easily shown to be II d1,a(n) ”3 = 92 II d1(n) “3. 
To complete h(n), note the fact that, at time n + 1, this method effectively assigns weight fln+1“m to the equation at time i, where Mi) is the weight assigned to the equation when originally rotated into the system [44]. Therefore, using (1.34) and the discussion above, yields h(n) = 6’ ll d1(n) H3 + (“n(n) (2188) where, 71.01) = 2": WWW") (1 — 109201)) 1:, 7(1) = 921.(n-1)+”7If;)(1-11n)y’(n)). (2.89) To introduce exponential forgetting factor adaptation into the SM-WRLS algo- rithm, therefore, it is only necessary to downweight the matrix T(n — 1) and the vector d1(n — 1) by multiplying through by ,9 prior to Step 1 in Fig. 1.2, then to replace all relevant quantities in Step 1 by their downweighted versions. xa(n -- 1) and ka(n — 1) are updated according to (2.88) and (2.89). Then A‘(n) is found in Step 2 using (1.26) with downweighted quantities. In preliminary experiments, it has been found that this strategy is not effective 43 for adaptation. Actually, this strategy performs well when applied to slow time varying systems but fails to track fast time variations. To CXplain this behavior, it is important to understand the effect of the “downweighting” process. As noted earlier, multiplying the system of equations (at each recursion) by 9 (0 < 5 < 1) inflates the ellipsoid volume by a factor that is inversely proportional to 6. Therefore, a large value of 9 causes the ellipsoid volume to increase slightly which lessens the tendency to accept new equations and the ability to track fast time varying signals. On the other hand, a small value of 9 causes the ellipsoid volume to increase significantly, and therefore, a tendency to accept more equationss. However, every time an equation is accepted, the ellipsoid volume decreases accordingly. Hence, a small value of 9 may result in “bad” estimates due to the continuously expanding and shrinking ellipsoid and, more importantly, to the fact that the effective window length is very small. Therefore, it is desirable to choose 9 to be large but not to the extent of having insignificant effect on expanding the ellipsoid volume. To pursue this discussion further, consider the problem of estimating a signal characterized as having slow time variations everywhere except in a region around time n at which the signal exhibits fast time variations (see, e.g., Fig. 3.3, where n = 2000). The fact that the signal is changing very slowly prior to n induces the algorithm to accept some points which, in turn, causes the ellipsoid volume to decrease. An increase in the “confidence” of the estimate results. Near time n, the ellipsoid volume becomes very small. When the signal moves rapidly away from its current location, it eventually moves outside the ellipsoid which is therefore no longer a valid bounding ellipsoid. The situation can be described as follows: The signal parameters are wandering outside of a small ellipsoid, but the incoming equations are being rejected (because of the high confidence in the existing estimate) until some of 5Note that one of the main advantages of SM-WRLS is the use of a small fraction of the data. Therefore, from the SM point of view, small values of 9 are least desirable. 44 the earlier influential equations are downweighted (or suppressed). Remember that the “downweighting” process is recursively being implemented. 
Hence, if 5 is large, it takes a long time to suppress the influence of previously accepted equations, or, in SM terms, the ellipsoid volume increases at a slow rate which may not be sufficient to track the fast time varying signal. This explains why the algorithm ceases accepting new equations for a period of time (which is dependent on B), and therefore, fails to track the signal. The adaptive SM-WRLS algorithms (of Section 2.4.1) do not depend on a fixed factor, such as 9, to expand the ellipsoid volume in order to adapt to the changing dy- namics of the system. However, these algorithms expand the ellipsoid by (selectively) removing previously accepted influential equations from the system, either partially or completely, and therefore, relinquishing their influence from the current ellipsoid, thereby allowing it to expand and adapt to the changes in the signal dynamics. 2.5 A Survey of the Computational Complexities of Several Related Sequential Algorithms The purpose of this section is to compare the computational complexities of several related sequential algorithms that solve the LS problem. The computational complex- ities are shown in Table 2.1 in which the first column indicates the algorithm under study, the second column gives the complexity (in floating point operations (flops) per equation) required to check for a valid SM weight, where one flop is defined as one floating point multiplication plus one floating point addition. The third column gives the complexity required to update (or downdate in the case of back rotation) the covariance matrix and the LS solution, and the fourth column gives the total number of flops per equation for a typical example described below. There are two theoretically equivalent methods associated with conventional LS 45 Table 2.1: Computational complexities (in floating point operations (flops) per equa- tion) for the real scalar sequential algorithms. Covariance and Example Algorithm Checking Solution Update (flops) Batch Solution -- 0(m3) > 1000 MIL—based WRLS — 3m2 + 5m + 3 353 MIL-based SM-WRLS m2 + 2111 + 13 2m” + 3m + 7 180 Suboptimal MIL-based SM-WRLS (m + 1) + 453(m2 + m + 12) 2m2 + 3m + 7 47 GR-based SM-WRLS .5m2 + 2.5m + 13 2.5m2 + 10.5m + 5 160 Suboptimal GR-based SM-WRLS (m + 1) + s(.5m3 + 1.5m + 12) 2.5m2 + 10.5m + 5 55 solution; the “batch” solution [39] which requires 0(m3) flops per equation, and the MIL—based WRLS solution [34, 35] which requires 0(m2) flops per equation (see Section 1.2.1). The SM-WRLS algorithm can also solve the problem in 0(m2) flops per equation based on MIL (see Section 1.2.3.1), however, a GR-based solution (outlined in Section 1.2.3.2) is also considered because it is amenable to a systolic architecture implementation which reduces the complexity of the algorithm to 0(m). The latter algorithm is the subject of Chapter 4. Similar computational complexity expressions for the MIL- and GR-based SM- WRLS solutions can be derived for the suboptimal strategy of Section 2.3, and are shown in Table 2.1. Note that there is one square root operation associated with each of the SM-WRLS algorithms but has been dropped from the table since it does not have any significant effect on the comparison. A note about the computational complexity of the OBE algorithm [20] is in order. It was noted by Huang [20] that the complexity of the OBE algorithm is in the order of m2 multiplications (or flops) for the information evaluations (i.e., checking), while 46 updating the estimates requires 6m2 multiplications (or flops). 
However, a careful analysis of the OBE algorithm during the course of this work, taking into account the symmetric properties of some matrices, reveals that the complexity of this algorithm is m² + 2m + 3 flops for checking and 2m² + 3m + 16 flops for updating the estimates. A similar analysis of a "suboptimal" OBE algorithm [14] shows that the complexity of this algorithm is m + 1 flops for checking and 3m² + 4m + 15 flops for updating the estimates.

If the fraction of the data accepted by the SM-WRLS algorithm is denoted by r, the fraction of the data accepted by the suboptimal SM-WRLS algorithm by s (s < r), and the fraction of the data accepted by the SM-WRLS algorithm after passing the test (2.72) by t (t ≤ s), then the total computational complexities of the SM-WRLS algorithms shown in Table 2.1 can be defined as follows. For the MIL-based SM-WRLS algorithm, the total computational complexity is given by

(m² + 2m + 13) + r [2m² + 3m + 7]    (2.90)

flops per equation. For the suboptimal MIL-based SM-WRLS algorithm, it is given by

(m + 1) + s [m² + m + 12] + t [2m² + 3m + 7]    (2.91)

flops per equation. For the GR-based SM-WRLS algorithm, it is given by

(.5m² + 2.5m + 13) + r [2.5m² + 10.5m + 5]    (2.92)

flops per equation. Finally, for the suboptimal GR-based SM-WRLS algorithm, it is given by

(m + 1) + s [.5m² + 1.5m + 12] + t [2.5m² + 10.5m + 5]    (2.93)

flops per equation.

Let us consider a typical example to compare the complexities of the various algorithms. Assume that the model order m = 10, the fraction of the data accepted by the SM algorithms (both SM-WRLS and OBE), r, is 0.2, and the fraction of the data accepted by the suboptimal SM algorithms, s, is 0.1. To simplify the analysis, it is assumed that all the equations satisfying condition (2.72) for the suboptimal SM-WRLS algorithm are also accepted by the SM-WRLS algorithm (s = t). The total number of flops (per equation) is shown in the fourth column of Table 2.1. According to the computational complexities computed here, the OBE algorithm uses 173 flops and the "suboptimal" OBE algorithm uses 47 flops.

The SM algorithms typically use 45 to 50% of the number of flops required by the MIL-based WRLS algorithm. The computational savings are mainly due to the infrequent updating of the covariance matrix and the LS solution. A suboptimal test, which is presented in Section 2.3, can be incorporated in virtually any version of the SM-WRLS algorithms to further improve the computational efficiency. When the suboptimal strategy is applied to a given algorithm, it reduces the computational complexity of the algorithm by 60 to 70%.

The adaptive GR-based SM-WRLS algorithms of Section 2.4.1 have the same computational complexities as those of the "non-adaptive" GR-based SM-WRLS algorithms when performing back rotation to downdate the covariance matrix and the LS solution. The only exceptions are the quantity in (2.80), which requires one extra flop per equation when μ ≠ 1, and the fact that the LS solution needs to be computed (downdated) only after all necessary downdating of the covariance matrix and d1 at any given time (see Section 2.4.1). If the fractions of the data used by the adaptive GR-based SM-WRLS algorithms are denoted by the same symbols used above, and the fraction of the data removed from the system is denoted by u, then the total computational complexities of the adaptive SM-WRLS algorithms are as follows.
For the windowed and selective forgetting SM-WRLS algorithms, the total computational complexity is given by

(.5m² + 2.5m + 13) + r [2.5m² + 10.5m + 5] + u [2m² + 10m + 5]    (2.94)

flops per equation. For the suboptimal windowed and selective forgetting SM-WRLS algorithms, it is given by

(m + 1) + s [.5m² + 1.5m + 12] + t [2.5m² + 10.5m + 5] + u [2m² + 10m + 5]    (2.95)

flops per equation. For the graceful forgetting SM-WRLS algorithm, it is given by

(.5m² + 2.5m + 13) + r [2.5m² + 10.5m + 5] + μ⁻¹u [2m² + 10m + 6]    (2.96)

flops per equation. Finally, for the suboptimal graceful forgetting SM-WRLS algorithm, it is given by

(m + 1) + s [.5m² + 1.5m + 12] + t [2.5m² + 10.5m + 5] + μ⁻¹u [2m² + 10m + 6]    (2.97)

flops per equation.

Consider the same example with the new assumption that the fraction of the data removed from the system, u, is half that of the data accepted (i.e., u = 0.1 for the SM-WRLS algorithm and 0.05 for the suboptimal strategy). Both the windowed and the selective forgetting strategies have the same complexity (191 flops per equation), which is reduced to 67 flops per equation when the adaptive suboptimal strategy is employed. Using the same fractions of data and a μ value of 0.005, the graceful forgetting strategy uses a total of 6280 flops per equation, which is reduced to 2565 flops per equation for the adaptive suboptimal strategy. This larger number is due to the fact that each equation must be partially rotated out μ⁻¹ (or 200 in this case) times. Although this technique "forgets gracefully" and quickly adapts to the rapid changes in the signal dynamics, it is computationally expensive. However, the adaptive systolic architecture (of Section 4.3) reduces the computational complexity of the "sequential" adaptive GR-based algorithm by 60%.

In Chapter 4, the GR-based SM-WRLS algorithm is mapped into a systolic architecture for speed advantages. The (parallel) complexities of the parallel GR-based SM-WRLS algorithm and its suboptimal version are discussed in Chapter 4; however, it is worth noting that, compared to the complexity of the "sequential" GR-based algorithm, the systolic architecture implementation reduces the complexity of this algorithm by about 60%, and when the suboptimal strategy is employed, the complexity is reduced by 84%.

In order to find the computational complexity of the generalized SM-WRLS algorithm developed in Section 2.2, let us define a complex flop (cflop) as four real floating point multiplications plus four real floating point additions. The computational complexities of the MIL-based WRLS, SM-WRLS, and suboptimal SM-WRLS algorithms are shown in Table 2.2. The fourth column in this table shows the total number of cflops (per equation) when using the same example used for the real scalar case with k = 10.

Table 2.2: Computational complexities (in complex floating point operations (cflops) per equation) for the generalized sequential algorithms.

  MIL-based algorithm    Checking                           Covariance and solution update      Example (cflops)
  WRLS                   --                                 3(mk)² + (2k + 3)mk + k + 2         32312
  SM-WRLS                (mk)² + (k + 1)mk + k + 12         2(mk)² + (k + 2)mk + k + 6          15365
  Suboptimal SM-WRLS     (mk² + k) + s [(mk)² + mk + 12]    2(mk)² + (k + 2)mk + k + 6          7276

The SM-WRLS algorithm typically uses 45 to 50% of the number of cflops required by the MIL-based WRLS algorithm, which is consistent with the real scalar case (see Table 2.1). However, when the suboptimal strategy is applied, it reduces the computational complexity of the algorithm by 75 to 80%.
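The per-equation totals quoted in the running example are easy to check directly from (2.90) through (2.96). The short program below is only a verification sketch using the example values stated in the text (m = 10, r = 0.2, s = t = 0.1, u = 0.1, μ = 0.005); it is not part of the algorithms themselves.

```c
#include <stdio.h>

/* Evaluate the per-equation flop totals (2.90)-(2.94) and (2.96) for the
 * running example: m = 10, r = 0.2, s = t = 0.1, u = 0.1, mu = 0.005.     */
int main(void)
{
    double m = 10.0, r = 0.2, s = 0.1, t = 0.1, u = 0.1, mu = 0.005;

    double mil      = (m*m + 2*m + 13) + r*(2*m*m + 3*m + 7);              /* (2.90) */
    double mil_sub  = (m + 1) + s*(m*m + m + 12) + t*(2*m*m + 3*m + 7);    /* (2.91) */
    double gr       = (.5*m*m + 2.5*m + 13) + r*(2.5*m*m + 10.5*m + 5);    /* (2.92) */
    double gr_sub   = (m + 1) + s*(.5*m*m + 1.5*m + 12)
                              + t*(2.5*m*m + 10.5*m + 5);                  /* (2.93) */
    double windowed = gr + u*(2*m*m + 10*m + 5);                           /* (2.94) */
    double graceful = gr + (u/mu)*(2*m*m + 10*m + 6);                      /* (2.96) */

    printf("MIL-based SM-WRLS            : %7.1f flops/eq\n", mil);       /* ~180  */
    printf("Suboptimal MIL-based SM-WRLS : %7.1f flops/eq\n", mil_sub);   /* ~47   */
    printf("GR-based SM-WRLS             : %7.1f flops/eq\n", gr);        /* ~160  */
    printf("Suboptimal GR-based SM-WRLS  : %7.1f flops/eq\n", gr_sub);    /* ~55   */
    printf("Windowed/selective SM-WRLS   : %7.1f flops/eq\n", windowed);  /* ~191  */
    printf("Graceful forgetting SM-WRLS  : %7.1f flops/eq\n", graceful);  /* ~6280 */
    return 0;
}
```

The printed values round to the figures cited above and in the fourth column of Table 2.1.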
The computational savings for the complex vector case are greater than those reported for the real scalar case because the suboptimal strategy performs "scalar" checking compared with "vector" updating. However, the computational complexity of the generalized sequential GR-based SM-WRLS algorithm is expected to be worse than that of the MIL-based algorithm due to the fact that the LS solution matrix has to be solved for one vector at a time.

Chapter 3

Simulation Studies

3.1 Introduction

SM algorithms have the potential for application to many real digital signal processing problems such as speech recognition, image processing, beamforming, spectral estimation, and neural networks. This chapter is concerned with testing the behavior of the various suboptimal and adaptive SM-WRLS strategies presented in Chapter 2 for the real scalar case. This is done by conducting extensive simulation studies which illustrate important points about these strategies and about the SM-WRLS algorithm in general. The simulation studies are performed on models derived from real speech data. Section 3.2 discusses the results using a model of order two so that the results can be easily illustrated. The performance of a more realistic model of order 14 is analyzed in Section 3.3. These simulation studies are carried out on a 32-bit machine; however, it is important to research the behavior of the SM-WRLS algorithm at a smaller wordlength. The performance of this algorithm is tested using a 16-bit wordlength and is discussed in Section 3.4.

Figure 3.1: The acoustic waveform of the word "four".
Figure 3.2: The acoustic waveform of the word "six".

3.2 Simulation Results of two AR(2) models

In this section, the identification of two time varying AR(2) models of the form

y(n) = a1(n) y(n − 1) + a2(n) y(n − 2) + v(n)    (3.1)

is considered. Two sets of AR parameters were derived using linear prediction (LP) analysis of order two on utterances of the words "four" and "six" by an adult male speaker. The acoustic waveforms of these two words are shown in Figs. 3.1 and 3.2. While more meaningful analysis of speech would involve model orders of 10-14 (see, e.g., [47]), this small number of parameters is used here so that the results are easily illustrated. The adaptive LP algorithm used to compute these parameters is described in [44]. The data were sampled at 10 kHz after 4.7 kHz lowpass filtering, and the "forgetting factor" in the LP algorithm (see [44]) was 0.996. A 7000 point sequence, y(n), for each case ("four" and "six") was generated by driving the appropriate set of parameters with an uncorrelated sequence, v(n), which was uniformly distributed on [−1, 1]. v(n) in each case was generated using a random number generator based on a subtractive method [48]. In the simulations below, the conventional, adaptive, suboptimal, and adaptive suboptimal SM-WRLS algorithms are applied to the identification of the aᵢ parameters.

In the following, the simulation results are shown and discussed. In each figure, there are two different frames, one for each parameter. Each frame shows two curves, one for the true parameter, the other for the estimate obtained by the algorithm under study. Additionally, the true parameters for the words "four" and "six" are shown in Figs. 3.3 and 3.4, respectively. These are provided as a reference in cases where the true curves are difficult to discern.
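A minimal sketch of this data-generation step is given below. The true parameter trajectories a1(n) and a2(n) are whatever the LP analysis of [44] produces and are simply passed in as arrays here, and the standard library rand() is used in place of the subtractive generator of [48]; both are assumptions made only for the sketch.

```c
#include <stdlib.h>

/* Generate N samples of the time-varying AR(2) model (3.1):
 *     y(n) = a1(n) y(n-1) + a2(n) y(n-2) + v(n),
 * where v(n) is uniform on [-1, 1].  The trajectories a1[] and a2[] are
 * placeholders for the LP-derived parameters described in the text.        */
static double uniform_pm1(void)
{
    return 2.0 * rand() / (double)RAND_MAX - 1.0;
}

void generate_ar2(const double *a1, const double *a2, double *y, int N)
{
    double y1 = 0.0, y2 = 0.0;          /* y(n-1), y(n-2); zero initial state */
    for (int n = 0; n < N; ++n) {
        y[n] = a1[n] * y1 + a2[n] * y2 + uniform_pm1();
        y2 = y1;
        y1 = y[n];
    }
}
```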
Figure 3.3: The "true" parameters for the word "four". (a) Parameter a1 and (b) parameter a2.
Figure 3.4: The "true" parameters for the word "six". (a) Parameter a1 and (b) parameter a2.

3.2.1 Conventional RLS and SM-WRLS Algorithms

The power of the SM-WRLS algorithm is evident when compared with the conventional RLS [35] algorithm. As a basis for further discussion, we first show this comparison. Figures 3.5 and 3.6 show the simulation results for the word four using the RLS and the SM-WRLS algorithms, respectively, and Figs. 3.7 and 3.8 show the simulation results for the word six. It is evident that SM-WRLS outperforms RLS in terms of its tracking capability, and it is critical to note that this improved performance comes with improved computational efficiency. In this case SM-WRLS uses only 1.86% and 2.16% of the data for the words four and six, respectively, and yet yields better parameter estimates almost all the time. It is important to note that SM-WRLS tracks the time varying parameters faster than RLS. This is manifest in both examples, especially the word six (see Figs. 3.7 and 3.8).

Figure 3.5: Simulation results of the conventional RLS algorithm for the word four.
Figure 3.6: Simulation results of the SM-WRLS algorithm for the word four. 1.86% of the data is employed in the estimation process.
Figure 3.7: Simulation results of the conventional RLS algorithm for the word six.
Figure 3.8: Simulation results of the SM-WRLS algorithm for the word six. 2.16% of the data is employed in the estimation process.

While the main theme of Section 2.4 is the development of adaptive SM-WRLS methods, it is noted that the "unmodified" SM-WRLS algorithm apparently has adaptive capabilities in its own right. While SM-WRLS is developed under the assumption of stationary system dynamics, it is capable of behaving in an adaptive manner in certain circumstances because of the special weights used. Recall that these optimal data weights have the interpretation as parameters which minimize the ellipsoid volume. Note, however, that these weights multiply the corresponding equations, and therefore, different equations have different weights. The SM-WRLS algorithm determines the value of each weight depending on the "amount" of new information (from the SM point of view) contained in an equation. Hence, an equation with "no new information" is likely to be rejected (λ(n) = 0), whereas an equation with a significant amount of information (such as in the case of fast changing dynamics) is likely to be heavily weighted. This intuitively accounts for the inherent adaptation behavior of
the unmodified SM-WRLS algorithm. However, as will be seen below, it is not possible to depend upon SM-WRLS to reliably behave in this adaptive manner, particularly in cases of quickly varying system dynamics. Each time a new equation is accepted, the ellipsoid volume decreases and the "confidence" in the current estimate increases. In a situation in which the signal is varying rapidly and the parameters are moving away from their "current" locations, the algorithm accepts incoming equations to incorporate the new information into the estimate, and the ellipsoid volume decreases rapidly, eventually becoming very small. As the parameters continue to move rapidly away from their current locations, they eventually move outside the shrinking ellipsoid, which becomes an invalid bounding ellipsoid. This condition indicates that a violation of the theory has taken place, and therefore, the unmodified SM-WRLS algorithm is no longer guaranteed to work properly. However, it is noted (empirically) that the unmodified SM-WRLS algorithm performs well when applied to slowly time varying systems.

Next, the simulation results of the several variations on the general adaptive SM-WRLS algorithm are shown. It is worth reiterating that when it becomes difficult to distinguish the true parameters from their estimates, the reader can refer to Figs. 3.3 and 3.4 for clarification. Also note that the figure captions will not contain the description of the two frames, for it is implied that frame (a) shows parameter a1 and frame (b) shows parameter a2.

3.2.2 Adaptive SM-WRLS Algorithms

3.2.2.1 Windowing

Figures 3.9 and 3.10 show the simulation results of the windowed SM-WRLS algorithm for the words four and six, respectively, using a window of length 1000. This strategy uses only 5.69% and 5.44% of the data for the words four and six, respectively. Since each equation rotated in is eventually rotated out of the system, this strategy effectively uses about twice the number of equations rotated in. More data than with the unmodified SM-WRLS algorithm are used, but more accurate estimates result and the time varying parameters are tracked more quickly and accurately. This can easily be seen when the parameter dynamics change abruptly near the point 2000 for the word four (see Fig. 3.9) and near the points 2000 and 4500 for the word six (see Fig. 3.10).

Figure 3.9: Simulation results of the windowed SM-WRLS algorithm for the word four (l = 1000). 5.69% of the data is employed in the estimation process.
Figure 3.10: Simulation results of the windowed SM-WRLS algorithm for the word six (l = 1000). 5.44% of the data is employed in the estimation process.

3.2.2.2 Graceful Forgetting

Figures 3.11 and 3.12 show the simulation results of the graceful forgetting SM-WRLS algorithm when rotating out 0.1% of each of the equations (μ = 10⁻³) that was accepted in the past 1000 recursions. This strategy uses only 6.19% and 4.89% of the data for the words four and six, respectively. Note that this technique uses comparable percentages of the data to those used by the windowed strategy and yields smoother estimates.
Although the algorithm uses very small percentages of the data, the value of μ used here might not be practical because it means that the algorithm will rotate out each equation that was initially accepted 1000 times, which is clearly a computational burden. Therefore, this strategy effectively uses about 1000 (or μ⁻¹) times the number of equations rotated in. Depending on the nature of the problem, practical values of μ may range from 0.002 to 0.01, with an effective window of length 500 to 100.

Figure 3.11: Simulation results of the graceful forgetting SM-WRLS algorithm for the word four (μ = 10⁻³). 6.19% of the data is employed in the estimation process.
Figure 3.12: Simulation results of the graceful forgetting SM-WRLS algorithm for the word six (μ = 10⁻³). 4.89% of the data is employed in the estimation process.

3.2.2.3 Selective Forgetting

As noted in Subsection 2.4.1.3, the selective forgetting strategy selects the equations to be (partially or completely) removed from the system based on user defined criteria. The selection procedure used here is to remove the equations starting from the first accepted equation remaining in the estimate at a given time, and proceeding sequentially until some other condition is satisfied. The determination of when to apply the forgetting procedure and when to stop removing equations at a given time is discussed in the following.

When the true parameters of the word four are inspected (see Fig. 3.3), for example, it is noted that they can be characterized as having slow time variations everywhere except in the region from 2000 to 2300, where they have fast time variations. The fact that the parameters are changing very slowly in the first 2000 points induces the algorithm to accept some points which, in turn, causes the ellipsoid volume to decrease. An increase in the "confidence" of the estimate results. Near time 2000, the ellipsoid volume becomes very small. When the parameters move rapidly away from their current location, they eventually move outside the ellipsoid, which is therefore no longer a valid bounding ellipsoid. An "animation" subroutine has been developed which allows the user to see the locations of the true parameters and their estimates with respect to the ellipsoid in the 2-D parameter space. When this condition happens, it eventually leads to a negative value of κ(n) (see (1.23)). For a stationary system, κ(n) is always positive [7], so that this condition indicates that a violation of the theory (in particular, the violation of the assumption of stationary dynamics) has taken place.⁶ A similar condition was also reported by Dasgupta and Huang [14] while applying their OBE algorithm to nonstationary systems. In our simulation studies, it is found that a negative κ(n) is an effective indicator of the need for adaptation, and this criterion is used as the prompt to begin selective forgetting. Whenever accepting an equation causes κ(n) to become negative, the algorithm starts rotating out the equations, which are selected based on the selection procedure, until κ(n) becomes positive again.

6. Mathematically, κ(n) < 0 indicates an ellipsoid of negative dimensions.
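The trigger just described is easy to state in code. The sketch below assumes a hypothetical interface (kappa(), rotate_in(), rotate_out(), oldest_accepted(), any_accepted()) around the GR-based recursion; it illustrates only the control flow of the selection procedure, not the rotations themselves.

```c
#include <stdbool.h>

/* Hypothetical interface around the GR-based SM-WRLS recursion. */
typedef struct sm_system sm_system;
double kappa(const sm_system *s);                /* current kappa(n)          */
void   rotate_in(sm_system *s, int eq_index);    /* Givens-rotate equation in */
void   rotate_out(sm_system *s, int eq_index);   /* back-rotate equation out  */
int    oldest_accepted(const sm_system *s);      /* selection: oldest first   */
bool   any_accepted(const sm_system *s);

/* Selective forgetting: after accepting equation n, if kappa(n) has gone
 * negative, remove previously accepted equations (oldest first) until
 * kappa(n) is positive again.                                               */
void accept_with_selective_forgetting(sm_system *s, int n)
{
    rotate_in(s, n);
    while (kappa(s) < 0.0 && any_accepted(s))
        rotate_out(s, oldest_accepted(s));
}
```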
Figures 3.13 and 3.14 show the simulation results of the selective forgetting strategy described here. For the words four and six, respectively, this technique uses only 3.6% and 2.83% of the data, 18.7% and 56.1% of which are rotated out during the adaptation process; therefore, when counting the total number of equations rotated into and out of the system, this strategy effectively uses 4.27% and 4.41% of the data.

Figure 3.13: Simulation results of the selective forgetting SM-WRLS algorithm for the word four. 3.6% of the data is employed in the estimation process.
Figure 3.14: Simulation results of the selective forgetting SM-WRLS algorithm for the word six. 2.83% of the data is employed in the estimation process.

Compared to the windowed and graceful forgetting adaptive strategies, the simulation results show that the selective forgetting strategy yields smoother estimates using even fewer data. It is important to recall that the percentages given in the windowed and graceful forgetting adaptive strategies represent the total number of equations used by these techniques, and do not account for the number of equations rotated out of the system. Detailed complexity comparisons are made in Section 2.5.

It should be pointed out that κ(n) > 0 is only a necessary condition for the true parameters to be inside the current ellipsoid. The fact that κ goes negative at a particular time does not precisely determine the point at which the system dynamics began to change. In fact, κ(n) < 0 indicates a rather severe breakdown of the process, indicating that the "true" parameters have moved well outside of the current ellipsoid. However, it is precisely in cases of fast changing dynamics that this "breakdown" occurs rapidly, resulting in "κ(n) < 0" in fact being a good locator of changing dynamics which require "immediate" adaptation to preserve the integrity of the process. In cases of slowly changing dynamics where the theory can be violated without the appearance of negative κ, SM-WRLS seems to be sufficiently robust to these slow changes to make its own adjustments. The examples of Figs. 3.6 and 3.8 above illustrate this latter point.

3.2.3 Suboptimal SM-WRLS

Figures 3.15 and 3.16 show the simulation results of the unmodified SM-WRLS algorithm with suboptimal data selection. In this case, only 1.19% and 1.53% of the data are used for the words four and six, respectively. Compared to the SM-WRLS algorithm (see Figs. 3.6 and 3.8), the suboptimal technique uses slightly fewer data but produces comparable estimates. It is interesting to note that most of the equations (97.6% for the word four and 94.4% for the word six) that are accepted by the suboptimal technique are also accepted by the SM-WRLS algorithm.

Figure 3.15: Simulation results of the SM-WRLS algorithm with suboptimal data selection for the word four. 1.19% of the data is employed in the estimation process.
Figure 3.16: Simulation results of the SM-WRLS algorithm with suboptimal data selection for the word six. 1.53% of the data is employed in the estimation process.

It is also interesting to note that the equations that are accepted by the suboptimal technique but not by the SM-WRLS algorithm lie mostly in regions of fast changing dynamics.

3.2.4 Adaptive Suboptimal SM-WRLS

It is noted in Section 2.4.1 that the general formulation of the adaptive SM-WRLS algorithm is amenable to the suboptimal technique. The simulation results of the selective forgetting SM-WRLS technique with suboptimal data selection are shown in Figs. 3.17 and 3.18. This strategy uses only 1.89% and 1.86% of the data, 14.4% and 48.5% of which are rotated out during the adaptation process; therefore, when counting the total number of equations rotated into and out of the system, this strategy effectively uses 2.16% and 2.76% of the data.

Compared to the selective forgetting strategy (Figs. 3.13 and 3.14), the selective forgetting technique with suboptimal data selection uses fewer data but produces comparable estimates. On the other hand, when compared to the unmodified SM-WRLS with suboptimal data selection (Figs. 3.15 and 3.16), the selective forgetting suboptimal technique uses more data but produces better estimates.

Figure 3.17: Simulation results of the selective forgetting SM-WRLS algorithm with suboptimal data selection for the word four. 1.89% of the data is employed in the estimation process.
Figure 3.18: Simulation results of the selective forgetting SM-WRLS algorithm with suboptimal data selection for the word six. 1.86% of the data is employed in the estimation process.

3.3 Simulation Results of an AR(14) model

In this section, the identification of a time varying AR(14) model is considered. The same procedure of Section 3.2 is used to generate the "true" AR parameters from an utterance of the word "seven" by an adult male speaker. The acoustic waveform of this word is shown in Fig. 3.19. In the following, the simulation results are shown and discussed using the same format as that used in Section 3.2; however, only one true parameter (a4), which is the most challenging parameter for the algorithm to track, is used for illustration and is shown in Fig. 3.20.

Figure 3.19: The acoustic waveform of the word "seven".
Figure 3.20: The fourth "true" parameter (a4) for the word "seven".
Figure 3.21: Simulation results of the conventional RLS algorithm for the word seven.

3.3.1 Conventional RLS and SM-WRLS Algorithms

Figures 3.21 and 3.22 show the simulation results for the word seven using the RLS and the SM-WRLS algorithms, respectively. Although both algorithms yield bad estimates and do not track the time varying parameters, the SM-WRLS algorithm, which uses only 7.93% of the data, yields better parameter estimates than those of the RLS algorithm most of the time.
In contrast with the AR(2) results of Figs. 3.6 and 3.8, the "unmodified" SM-WRLS algorithm for the AR(14) example shown in Fig. 3.22 fails to track the time varying parameters. This is expected since the signal is varying rapidly (especially in the range [2000-6000]) and the algorithm is no longer guaranteed to work properly (recall the discussion of Section 3.2.1). By inspecting Fig. 3.22 carefully, it is clear that the SM-WRLS algorithm loses its ability to track the signal in the range where the signal is varying rapidly.

Figure 3.22: Simulation results of the SM-WRLS algorithm for the word seven. 7.93% of the data is employed in the estimation process.

3.3.2 Adaptive SM-WRLS Algorithms

3.3.2.1 Windowing

This subsection tests and compares the simulation results of the windowed SM-WRLS algorithm for the word seven using three different window lengths. Figures 3.23, 3.24, and 3.25 show the simulation results using windows of lengths 500, 1000, and 1500, and using 22.1%, 17.04%, and 14.34% of the data, respectively. In all three tests, more data than with the unmodified SM-WRLS algorithm are used, but much more accurate estimates result and the time varying parameters (one of which is shown) are tracked more quickly and accurately. When comparing the effect of the window length on the algorithm, it is noted that shorter window lengths use more data, but yield more accurate estimates (most of the time) and track the time varying parameters more quickly and accurately. However, if the window length becomes very short (< 300), the variance of the estimates becomes very large because these estimates involve a small window length which is effectively even smaller because of the small fraction of data accepted.

Figure 3.23: Simulation results of the windowed SM-WRLS algorithm for the word seven (l = 500). 22.1% of the data is employed in the estimation process.
Figure 3.24: Simulation results of the windowed SM-WRLS algorithm for the word seven (l = 1000). 17.04% of the data is employed in the estimation process.

3.3.2.2 Graceful Forgetting

This subsection tests and compares the simulation results of the graceful forgetting SM-WRLS algorithm for the word seven using three different μ values. Recall that μ represents the fraction of the equations removed from the system at each time. Figures 3.26, 3.27, and 3.28 show the simulation results using μ values of 10⁻³, 2 × 10⁻³, and 4 × 10⁻³ (or effective windows of lengths 1000, 500, and 250), and using 18.66%, 27.63%, and 35.59% of the data, respectively. Note that this strategy is initiated at time 100 to allow the estimation to stabilize.

It is clear that as the value of μ increases (or the effective window length decreases), the algorithm uses more data (as expected) and tracks the time varying parameters more quickly, but yields inaccurate estimates at certain points. The latter point is more evident in Fig. 3.28, where the effective window length is only 250 points. When comparing the results of the graceful forgetting technique with those of the corresponding windowed technique (for example, compare Figs. 3.24 and 3.26, in which the effective window length is 1000), it is noted that the graceful forgetting technique uses more data and yields "comparable" estimates.

3.3.2.3 Selective Forgetting

Using the same selection criterion as that used in Subsection 3.2.2.3, the selective
forgetting strategy yields the simulation result shown in Fig. 3.29. This technique uses only 12.89% of the data, 72.9% of which are rotated out during the adaptation process. Therefore, it effectively uses (or rotates into and out of the system) 22.29% of the data. It is noted that this technique tracks the rapidly varying parameters slowly and yields estimates that are not as accurate as those produced by the other adaptation strategies, but it uses fewer data. Recall that this technique produces highly accurate estimates in the AR(2) examples (see Figs. 3.13 and 3.14); however, the signal variations in those examples are not as severe as those in the AR(14) example.

Figure 3.25: Simulation results of the windowed SM-WRLS algorithm for the word seven (l = 1500). 14.34% of the data is employed in the estimation process.

3.3.3 Suboptimal SM-WRLS

Figure 3.30 shows the simulation results of the unmodified SM-WRLS algorithm with suboptimal data selection, which uses only 4.74% of the data. Compared to the SM-WRLS algorithm (see Fig. 3.22), the suboptimal technique uses fewer data (60% of the data used by SM-WRLS) but produces comparable estimates. It is noted that most (91.3%) of the equations that are accepted by the suboptimal technique are also accepted by the SM-WRLS algorithm. It is also noted that the equations that are accepted by the suboptimal technique but not by the SM-WRLS algorithm lie mostly in regions of fast changing dynamics.

Figure 3.26: Simulation results of the graceful forgetting SM-WRLS algorithm for the word seven (μ = 10⁻³). 18.66% of the data is employed in the estimation process.

3.3.4 Adaptive Suboptimal SM-WRLS

Two adaptive suboptimal techniques are tested in this section. The first technique is the windowed SM-WRLS algorithm with suboptimal data selection. Figures 3.31 and 3.32 show the simulation results of this technique, which uses only 10.93% and 8.71% of the data when using a window of length 500 and 1000, respectively. Compared to the windowed SM-WRLS algorithm (see Figs. 3.23 and 3.24), the corresponding adaptive suboptimal technique uses fewer data (50% of the data used by the windowed SM-WRLS algorithm); however, it produces estimates that are comparable to but not as smooth as those of the windowed SM-WRLS algorithm (see Figs. 3.31 and 3.32).

The second technique is the selective forgetting SM-WRLS algorithm with suboptimal data selection. The simulation result of this technique is shown in Fig. 3.33, which uses only 8.8% of the data, 63.3% of which are rotated out during the adaptation process; therefore, it effectively uses 14.37% of the data. This technique produces estimates comparable to those of the selective forgetting SM-WRLS algorithm shown in Fig. 3.29 using fewer data.

Figure 3.27: Simulation results of the graceful forgetting SM-WRLS algorithm for the word seven (μ = 2 × 10⁻³). 27.63% of the data is employed in the estimation process.
Figure 3.28: Simulation results of the graceful forgetting SM-WRLS algorithm for the word seven (μ = 4 × 10⁻³). 35.59% of the data is employed in the estimation process.

3.4 Roundoff Error Analysis

Recently, there has been an increasing interest in the performance of adaptive algorithms in small wordlength environments [8, 49, 50, 51].
Rao and Huang [8] have investigated the effect of small wordlengths on one of the OBE algorithms presented in [14]. Their simulation studies were performed in integer arithmetic using a fixed point implementation of the OBE algorithm. They have shown that the OBE algorithm yields consistently good estimates over a large range of wordlengths and performs better than the conventional RLS algorithm for small wordlengths [8].

Marshall and Jenkins [49] have presented a fast quasi-Newton (FQN) adaptive filtering algorithm which is quite robust with respect to small wordlength implementation and has comparable performance to that of RLS. However, this algorithm is developed based on the assumption that the input is real and wide-sense stationary, which yields a symmetric and Toeplitz autocorrelation matrix. The FQN algorithm appears to avoid the numerical problems reported for several fast RLS techniques [51, 52]. The numerical problems consist of numerical inaccuracy in the results (performance degradation) and numerical instability (overflows or underflows), which are caused by finite wordlength computations [50].

Figure 3.29: Simulation results of the selective forgetting SM-WRLS algorithm for the word seven. 12.89% of the data is employed in the estimation process.
Figure 3.30: Simulation results of the SM-WRLS algorithm with suboptimal data selection for the word seven. 4.74% of the data is employed in the estimation process.

All the simulations presented in the previous sections are performed using C with 32-bit, single precision, floating point arithmetic on a VAX 8600 mainframe running under a Unix operating system. It is the main purpose of this section to test the effect of smaller wordlength computations on the performance of the GR-based SM-WRLS algorithm. The simulations presented here are performed with 16-bit, single precision, unnormalized floating point arithmetic. Whenever an overflow (underflow) occurs, the algorithm sets the detected value to the maximum (minimum) possible value, which is determined by the wordlength used.

The roundoff error analyses are performed on the same AR(2) models used in Section 3.2. Figures 3.34 and 3.35 show the simulation results of the SM-WRLS algorithm for the words four and six using only 1.96% and 2.17% of the data, respectively. Compared to the results of the SM-WRLS algorithm for the word four obtained when using a wordlength of 32 bits (see Fig. 3.6), this algorithm uses slightly more data and yields slightly better estimates when a 16-bit wordlength is used. The simulation results for the word six when a 16-bit wordlength is used are almost identical to those obtained when using a wordlength of 32 bits (see Fig. 3.8).

Figure 3.31: Simulation results of the windowed SM-WRLS algorithm with suboptimal data selection for the word seven (l = 500). 10.93% of the data is employed in the estimation process.

Figure 3.36 shows the simulation results of the windowed algorithm for the word four using a window length of 1000 and a wordlength of 16 bits. This strategy uses slightly fewer data (5.4%) and produces comparable but slightly less accurate estimates compared with those of the corresponding results obtained when using a wordlength of 32 bits (see Fig. 3.9).
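The overflow/underflow handling described above amounts to a saturation step applied whenever a computed value leaves the representable range. The sketch below assumes representative magnitude limits for a 16-bit floating point format; the exact exponent/mantissa split used in the experiments is not specified here, so the constants are placeholders.

```c
#include <math.h>

/* Placeholder magnitude limits for a 16-bit floating point format; the   */
/* actual format used in the experiments would fix these values.          */
#define FP16_MAX 6.5504e4
#define FP16_MIN 6.1035e-5

/* Clamp a value the way the text describes: overflow saturates to the    */
/* largest representable magnitude, underflow to the smallest.            */
static double saturate16(double x)
{
    double mag = fabs(x);
    if (mag > FP16_MAX)               return copysign(FP16_MAX, x);
    if (mag != 0.0 && mag < FP16_MIN) return copysign(FP16_MIN, x);
    return x;
}
```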
Figure 3.37 shows the simulation results of the graceful forgetting SM-WRLS algorithm when rotating out 0.1% of each equation and using a 16-bit wordlength. This strategy uses about half the data (3.27%) but produces unacceptable estimates compared with those of the corresponding results obtained when using a wordlength of 32 bits (see Fig. 3.11).

Finally, Figure 3.38 shows the simulation results of the selective forgetting SM-WRLS algorithm using a 16-bit wordlength. This technique uses only 3.27% of the data, 10.5% of which are rotated out during the adaptation process. Therefore, it effectively uses (or rotates into and out of the system) 3.61% of the data. This strategy uses slightly fewer data and produces comparable but slightly less accurate estimates compared with those of the corresponding results obtained when using a wordlength of 32 bits (see Fig. 3.13).

Figure 3.32: Simulation results of the windowed SM-WRLS algorithm with suboptimal data selection for the word seven (l = 1000). 8.71% of the data is employed in the estimation process.
Figure 3.33: Simulation results of the selective forgetting SM-WRLS algorithm with suboptimal data selection for the word seven. 8.8% of the data is employed in the estimation process.

Except for the graceful forgetting strategy, the simulation results using a wordlength of 16 bits are very encouraging. This is due to the infrequent updating behavior inherent in the SM-WRLS algorithms, and hence, the slower accumulation of roundoff errors. Also, note that the orthogonal rotation used in the GR-based SM-WRLS algorithms is known to be a numerically stable operation under the assumption that the orthogonal matrices are produced in the absence of roundoff errors [39, 50]. However, the graceful forgetting strategy requires performing a large number of downdates for each previously included equation, which severely degrades the performance of the algorithm due to the accumulation of roundoff errors per iteration.

The most widely discussed fast identification algorithm is the Fast Transversal Filter (FTF) [52], which is a fast RLS algorithm requiring 8m + 15 flops per iteration (or O(m²) for small m) in its more stable form. It includes a "rescue" procedure to prevent divergence due to finite precision effects and to ensure numerical stability. The rescue procedure is a fast initialization procedure, requiring 3m + 10 flops per invocation, in which all the accumulated quantities are sacrificed.

It is true that the roundoff error experiments performed in this section are O(m²); however, it is clear from the simulation results presented in Sections 3.2 and 3.3 that the suboptimal strategy produces estimates which are not very different from those of the SM-WRLS algorithm and yet requires only m flops for checking, which represents a significant computational savings with respect to FTF.

Figure 3.34: Simulation results of the SM-WRLS algorithm for the word four using a 16-bit wordlength. 1.96% of the data is employed in the estimation process.
Figure 3.35: Simulation results of the SM-WRLS algorithm for the word six using a 16-bit wordlength. 2.17% of the data is employed in the estimation process.
Figure 3.36: Simulation results of the windowed SM-WRLS algorithm for the word four (l = 1000) using a 16-bit wordlength. 5.4% of the data is employed in the estimation process.
Figure 3.37: Simulation results of the graceful forgetting SM-WRLS algorithm for the word four (μ = 10⁻³) using a 16-bit wordlength. 3.27% of the data is employed in the estimation process.
Figure 3.38: Simulation results of the selective forgetting SM-WRLS algorithm for the word four using a 16-bit wordlength. 3.27% of the data is employed in the estimation process.

However, it is essential to note that the suboptimal strategy is used to select points and is not directly involved in the update procedure. Therefore, the results of the suboptimal strategy with respect to small wordlength effects are not expected to be significantly different from those of the SM-WRLS results presented in this section. In fact, since even fewer computations are generally used in the suboptimal strategy, even better finite precision characteristics could be expected.

Chapter 4

Architectures and Complexity Issues

4.1 Introduction

It is noted in Chapter 1 that one of the reasons why the GR-based SM-WRLS formulation is desirable is that it is amenable to contemporary computing architectures. This chapter is devoted to the development of parallel hardware implementations of the real scalar SM-WRLS algorithm and a discussion of their advantages, particularly with regard to their improved computational complexity, which improves their potential for real time applications. Section 4.2 presents a parallel architecture that implements the SM-WRLS algorithm. Section 4.3 develops an adaptive compact parallel architecture that implements virtually any version of the real scalar SM-WRLS algorithm. Finally, a detailed analysis of computational complexity issues is carried out in Section 4.4.

4.2 Parallel Architecture for SM-WRLS

The use of a parallel machine benefits the processing in terms of computational complexity by reducing the process to one requiring O(m) time, rather than the O(m²)
The need for implementing the SM-WRLS algorithm on a parallel architecture arises from the fact that portions of the algorithm are compute-bound, specifically, updating the matrix [T(n) | d1(n)] and computing the value C(11) and the parameter vector 8(a). The architecture that speeds up the computation of these quantities and satisfies the desirable characteristics of systolic arrays (SA’s) is shown in Fig. 4.1. Note that although this architecture is designed based on SA design methodologies, it is used here to process one equation at a time (more on this below), and therefore, is not used as a SA. This architecture provides an improvement over that described in [9] by replacing the global buses with local buses for communication between adjacent cells. For simplicity of notation, the figure shows a purely autoregressive case with p=3; AR(3). Once the processor is understood, it should be clear that the architecture is perfectly capable of handling the general ARMAX(p,q) case. In the discussion below, the vectors r(n) and 9(n) are used, however, the architecture of Fig. 4.1 uses the vectors y(n) and 5(a) instead to denote the special case AR(S), where y(n) = ly(n - llyfn - 2)y(n - 3)lT- The architecture is composed of two SA’s, several memory management units (i.e., 98 y(n) lldlllz MAU [ITDMX DMX F I F O 8‘ ' 82 ' g" ‘— MPX MPX G(n+1) 4——— MAU _— t...’ . 33 . 32 . a] Figure 4.1: Systolic array implementation of the Givens rotation-based SM-WRLS algorithm. For simplicity of notation but without loss of generality, the figure shows a purely autoregressive case with p=3; AR(3). 99 [first-in First-out (FIFO) and Last-in First—out (Lll’O) stacks7), multiply-add units (M AU’s), multiplexers (MPX’S), and demultiplexers (DMX’S). The first SA is a trian- gular array that performs orthogonal triangularization using GR’s [42, 43] which are particularly suitable for solving recursive linear LS problems. The diagonal (circular) cells perform the “Givens generation” (GG) operations and all other (square) cells in the triangular array perform the GR operations. There is a delay element at the lower right-hand corner of the triangular array that is used to synchronize the flow of the generated entries into the FIFO stacks and to simplify the control of these stacks once they are filled and ready to output their contents to the BS array. The operations performed by this array are shown in Fig. 4.2 [42, 43]. Therefore, the triangular array rotates the new equation into the upper triangular matrix [T(n) I d1(n)], where the t,,- cells update the matrix T(n) and the right-hand column (dlj) cells update the vector d1(n). The element t,, denotes the i j‘h element of the matrix T(n) and the element d1,- denotes the j‘“ element of the vector (h(n). The second array is a linear array that performs the BS operations shown in Fig. 4.3 [53]. Note that the same BS array is used to solve for the vectors g(n + 1) and B(n) with the data provided to the appropriate cells in the required order by the FIFO and LIFO stacks. The FIFO stacks feed the lower triangular matrix TT(n) to solve for the vector g(n + 1), and hence, the value G(n + 1). The LIFO stacks feed the upper triangular matrix T(n) to solve for the parameter vector g(n). The values C(n + 1) = [I g(n + 1) [[3 and I] d1(n) I]; are generated by the MAU’s shown in Fig. 4.4. The number of segments in each stack is equal to the number of elements the stack holds. Therefore, the leftmost stack consists of m segments, whereas the rightmost stack has only one segment. 
7Note that the architecture shown in Fig. 4.1 does not include any of the LIFO stacks that were used to hold the matrix T(n) in the architecture reported in [9]. This is achieved by slightly increasing the complexity of the cells used in the triangular array so that they can be used as storage elements as well. This is facilitated by the diagonal interconnections between adjacent cells which now constitute the LIFO stacks. 100 c = 1 x- s = 0 In t = t. tin } 01“ 1n 0 ( ) else{ c,s temp = [x2 + (xin)2]1/2 t c = x I temp t 01.1 S = Kin / temp (a) X = temp tout = x 1 tin “in If(xin=0&c=l&s=0) ] tout = tin else{ (c.s) —-u x e (cs) x = cx + sxin l XOUt = 'S X + C Kin t _ xout out tout - X l (b) [in tout = t1n tout (C) When the array is used as LIFO stacks: First cycle: tout = x (tij cells) xOUt = X (d1 06118) All following cycles: tout=tin (tij cells) xout = xin ((11 cells) Figure 4.2: The operations performed by the cells used in the triangular array of Fig. 4.1. (a) The Givens generation (GG) cells, (b) the Givens rotation (GR) cells, and (c) the delay element. lOl 1i y. xi,out x, l xi,out = (bi'yi)/ aii 1,0ut bi (a) aij yl out ‘— ‘__ y 1.1!! .. x, ____,. __+ )5 yi,out " yi,in+aijxj J ____. (b) Figure 4.3: The Operations performed by the back substitution array. (a) The left-end processor and (b) the multiply-add units. The initial y;,,-,, entering the rightmost cell is set to 0. a . d=a+b.c a G(n) = llg(n)l|2 - g3 82 81 XZU Figure 4.4: The multiply-add unit used in Fig. 4.1. The system shown in Fig. 4.1 works as follows. The first m+1 equations (with appropriate weights) enter the triangular array (from the top) in a skewed order, and the matrix [T(n) | d1(n)] is generated and stored inside the cells. A shift register with appropriate feedback connection and data sequencing can be used to hold and feed the equation to the array. The initial upper triangular matrix residing in the array, and corresponding to the first m +1 equations, is ready after 3m +1 GG time cycles. The CO time cycle is that of the triangular array performing the CG operations without square roots, which is the time required to perform five floating point operations (flops) [43, 55], where one flop is defined as one floating point multiplication plus one floating point addition. Note that in order to prevent data collision, the flow of data in the triangular array moves along a corresponding wavefront and is controlled by the slowest cells in the array, via, GG cells. Note that the data are fed to the array one (skewed) equation at a time, therefore, the contents of each cell remain constant 103 after the completion of the current recursion. After the new equation is rotated into the matrix [T(n) I d1(71)], the vectors g(n + 1) and B(n) are computed. All the t,,- cells in the triangular array load their contents on the to,“ lines (tau. «- ar), and then pass these elements across the diagonal lines (tau, +— t,,,) (see Figs. 4.1 and 4.2). This obviates the LIFO stacks. The FIFO stacks are still needed, however, to compute the vector g(n + 1). The FIFO stacks are filled with the elements of the lower triangular matrix TT(n) as they are generated. This is done by loading the ti, entry on the to... line (to... i- 2:) when it is generated. This entry propagates down the diagonal cells (with the function to,“ +— t,-,,) until it arrives at and fills the appropriate FIFO stack. 
For the cells in the right-hand column, which generate the vector d1(n), the operations are different because it is this column that constitutes the LIFO stack for the vector d1(n). Hence, after the new equation is rotated into the array, all the cells in the right-hand column load their contents on the 2:0... lines (mom «— as), and then they pass these elements down the column (am «- xgn) (see Figs. 4.1 and 4.2). Note that the output am leaving the bottom cell in this column passes through the delay element and is routed to both the MAU and the MPX feeding the (11,- elements to the BS array. Note that the elements d1", and tmm leave the triangular array at the same time because of this delay element. The timing diagram of the triangular array is shown in Table 4.1. In this table, the inputs refer to the elements fed to the cells in the top row. The circle (0) represents the CG cell and the square ([3) represents the GR cell (see Fig. 4.1). The outputs refer to the elements that are produced in the array cells and are written column wise; i.e., the first column in the table represents the first column in the array, and so on. The BS array is used to solve for the vectors g(n + 1) and 9(a). The vector g(n +1) is solved using (1.31) and the parameter vector 9(a) using (1.28). Therefore, the vector g(n + 1) is generated from the matrix TT(n), which is residing in the FIFO stacks, and the vector :e(n + 1) which is available. The entries are fed to the 104 Table 4.1: The timing diagram of the triangular array of Fig. 4.1. Inputs Outputs Time 0 a a D [T(n) l dd 0 y(n - 1) 1 y(n — 2) t1] 2 y(n — 3) tn 3 y(n) t2; t13 4 t23 du 5 tss €112 6 (113 BS array every other BS time cycle, where the BS time cycle is the time required to perform one flop. As the 9.- entries are output from the left-end processor of the BS array, they enter the MAU to generate the value C(n + 1) after 2m + 1 BS time cycles. Likewise, the parameter vector 9(n) is generated using the matrix T(n) and the vector (h(n) which are stored in the triangular array. Starting one BS time cycle after the initiation of the first BS Operation, the appropriate entries (of the second BS operation) are also fed to the BS array every other BS time cycle. The parameter vector 9(n) is output from the left-end processor of the BS array in reversed order and interleaved with the vector g(n +1) as shown in Fig. 4.1. The value [I d1(n) I]: is generated using a MAU one BS time cycle after the last (m‘h) element of the vector dl(n) is generated. The timing diagram of the BS array is shown in Table 4.2 in which the inputs refer to the elements fed to the shown cells, and the outputs refer to the elements produced by the left-end processor in the array. The values n(n) and c:(n+ 1) are then computed, and hence, the value An.“ which determines whether the new equation is to be accepted or not. If the new equation is 105 Table 4.2: The timing diagram of the back substitution: array of Fig. 4.1. Inputs Outputs Time 0 C! D O 0 t1], y(n "' 1) 1 ‘33. d13 t12 91 2 in. y (n " 2) t23 tn 33 3 t22. (112 in in 92 4 t3, y(fl — 3) in £22 5 in. du 93 6 . accepted, then the weighted new equation enters the triangular array and the same procedure described above takes place producing a new [T(n + 1) | d1(n + 1)] matrix after 2m + 1 CG time cycles, and therefore, an updated C(n + 2), 8(n + 1), and n(n + 1). 
On the other hand, if the new equation is rejected, then the triangular array preserves its contents (hold state), but the value G(n + 2) is updated to make the decision concerning the next equation. In the latter case, the same Tᵀ(n + 1) matrix is used as the previous Tᵀ(n) matrix, and hence the feedback on the FIFO stacks. This procedure is repeated for every new equation.

4.3 An Adaptive Compact Parallel Architecture

The basic idea behind the compact architecture is to map the triangular array of Fig. 4.1 into a linear array (called the GR array), that is, mapping all of the GG cells into one GG cell and all the GR cells that are on the same diagonal into one GR cell. This constitutes a permissible schedule because the projection vector, d̄, is parallel to the schedule vector, s̄, and all the dependency arcs flow in the same direction across the hyperplanes [36, Ch. 3]. In other words, this schedule satisfies the conditions

s̄ᵀd̄ > 0    (4.1)

and

s̄ᵀē ≥ 0, for any dependence arc ē.    (4.2)

The compact architecture implementation of the adaptive SM-WRLS algorithm is shown in Fig. 4.5. The operations performed by this architecture are similar to those of Fig. 4.1, with the exception that the GG and GR cells are now capable of performing back rotation (see Fig. 4.6) and are embedded in slightly more complicated modules needed for scheduling. These modules are called GG' and GR', and are shown in Fig. 4.7. This architecture uses O(m) cells (one GG' cell and m GR' cells) compared with O(m²) cells (m GG cells and m(m + 1)/2 GR cells) used in the architecture shown in Fig. 4.1, and yet has the same computational efficiency (per equation). Note, however, that the LIFO stacks that were embedded in the triangular array of Fig. 4.1 are now needed to hold the matrix T(n).

The system shown in Fig. 4.5 works as follows. Each equation (with its optimal weight) enters the GR array (from the top) in a skewed order, and the matrix [T(n) | d1(n)] is generated and stored in the appropriate memory units. Note that the GR array can operate in two modes, forward (δ = +1) and backward (δ = −1) rotation modes (see Fig. 4.6). In the backward rotation mode, the equation to be (partially or completely) removed is re-introduced to the GR array with the appropriate weight (see Section 2.4.1). At the end of each recursion, the FIFO stacks contain the lower triangular matrix Tᵀ(n) needed to solve for the vector g(n + 1), and hence, the value G(n + 1). The LIFO stacks contain the upper triangular matrix T(n) needed to solve for the parameter vector θ̂(n). The values G(n + 1) = ‖g(n + 1)‖₂² and ‖d1(n)‖₂² are generated by the MAU's.

Figure 4.5: A compact architecture implementation of the adaptive SM-WRLS algorithm.

Figure 4.6: The operations performed by (a) the GG and (b) the GR cells used in the modules of Fig. 4.7; δ = +1 (−1) for rotating the equation into (out of) the system. (a) GG cell: if xin = 0, then c = 1 and s = 0; otherwise x(n) = [x(n − 1)² + δ xin²]^(1/2), c = x(n − 1)/x(n), and s = xin/x(n). (b) GR cell: x(n) = c x(n − 1) + s xin δ and xout = −s x(n − 1) δ + c xin δ.
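The two-mode cell operations of Fig. 4.6 translate directly into code. The sketch below is only a transcription of those cell equations, with δ = +1 performing an ordinary Givens update and δ = −1 performing the back rotation (downdate); the structure and function names are assumptions for the sketch, and the δ = −1 case presumes x(n − 1)² ≥ xin², as required for a valid downdate.

```c
#include <math.h>

/* One step of the compact array's cells (Fig. 4.6).  delta = +1 rotates an
 * equation into the system; delta = -1 rotates it out (back rotation).
 * x points to the stored cell content x(n-1) and is updated to x(n).       */
typedef struct { double c, s; } rot_pair;

static rot_pair gg_cell(double *x, double xin, int delta)
{
    rot_pair r = { 1.0, 0.0 };
    if (xin != 0.0) {
        double xn = sqrt((*x) * (*x) + delta * xin * xin);   /* Fig. 4.6(a) */
        r.c = *x / xn;
        r.s = xin / xn;
        *x  = xn;
    }
    return r;
}

static double gr_cell(double *x, double xin, rot_pair r, int delta)
{
    double xold = *x;
    *x = r.c * xold + r.s * xin * delta;                     /* Fig. 4.6(b) */
    return (-r.s * xold + r.c * xin) * delta;                /* xout        */
}
```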
Figure 4.7: (a) The GG' module and (b) the GR' module used in the architecture of Fig. 4.5.

Note that the values which were propagating downward in the triangular array of Fig. 4.1 are now propagating leftward due to the new scheduling. Note also that the vector d1(n) is treated differently from the matrix T(n). When the element d1_i is computed, it is stored in an internal register in the GR' cell (see Fig. 4.7). After generating and storing the matrix [T(n) | d1(n)], the processor is ready to compute the vectors g(n+1) and θ̂(n) using the BS array. The vector d1(n) is downloaded into the latches, which serve as a LIFO stack used in conjunction with the other LIFO stacks (containing the matrix T(n)) to solve for the parameter vector θ̂(n). The timing diagram of the GR array is shown in Table 4.3, in which the input (output) columns show the elements that are input to (output from) the corresponding GG (o) or GR ([]) cells.

Table 4.3: The timing diagram of the Givens rotation (GR) array of Fig. 4.5 (m = 3).

  Time | Input   | Outputs
   0   | y(n-1)  |
   1   | y(n-2)  | t11
   2   | y(n-3)  | t12
   3   | y(n)    | t22, t13
   4   |         | t23, d11
   5   |         | t33, d12
   6   |         | d13

Compared to the triangular array of Fig. 4.1, it is noted that the cell utilization per update (or downdate) has increased by a factor of 2.25 for the case when m = 3, or by m(m+3)/(2(m+1)) in general. The operations and timing diagram of the BS array are described in detail in Section 4.2.

The architectures of Figs. 4.1 and 4.5 can also be used for the suboptimal SM-WRLS algorithm; however, they are not utilized to the same extent as they are in the SM-WRLS algorithm because the suboptimal SM-WRLS typically uses fewer data. The infrequent updating feature of this algorithm (and virtually all SM-WRLS algorithms) might provide processing advantages in the systolic (and other parallel processing) schemes by permitting the sharing of processing time and resources.

The complex scalar case can also be implemented using the same architectures of Figs. 4.1 and 4.5, which now perform complex GG and GR operations. These operations are well defined and are found in [56]. However, the generalized complex vector case is not readily mapped into similar architectures. The generalized architecture that efficiently implements this case requires further research.

4.4 Computational Complexities

The computational complexities (in flops per equation) for the scalar sequential GR-based SM-WRLS algorithm and for that implemented using the architecture of Fig. 4.1 are shown in Table 4.4. Note that the complexities of the parallel GR-based SM-WRLS algorithm shown in Table 4.4 are parallel complexities in the sense that they denote the effective number of operations per equation, though many processors can be performing this number of operations simultaneously. Accordingly, the parallel complexity indicates the time it takes the parallel architecture to process the data, regardless of the total number of operations performed by the individual cells. The GG and GR operations constitute the main computational load of the algorithm, as shown in Table 4.5. In this table, the number of flops associated with the GR's is multiplied by five to account for the GG cycle time (see Section 4.2). These operations are avoided when the equation is rejected, and thus a significant savings in computation time is realized.
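As a rough illustration of the distinction between parallel (critical-path) complexity and total work, the following toy tally assumes the cell counts stated earlier (m GG cells and m(m+1)/2 GR cells in the [T | d1] triangle) and the 2m + 1 cell-cycle pipeline depth visible in Tables 4.1 and 4.3; the function name is illustrative.

    def per_equation_counts(m):
        # Critical path for one equation: the last elements (t_mm and d1_m) emerge
        # 2m + 1 cell cycles after the first input enters, so the *parallel*
        # complexity grows only as O(m).
        parallel_cycles = 2 * m + 1
        # Total cell operations triggered by the same equation: one per cell of the
        # [T | d1] triangle, i.e., m GG cells + m(m+1)/2 GR cells = O(m^2) total work.
        total_cell_ops = m + m * (m + 1) // 2
        return parallel_cycles, total_cell_ops

    # per_equation_counts(3)  -> (7, 9);  per_equation_counts(14) -> (29, 119)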
Since this technique uses O(m) flops per equation when implemented using the parallel architecture, it is clearly advantageous with respect to the original O(m^2) sequential formulations [5, 6, 20].

Table 4.4: Computational complexities (in flops per equation) for the real scalar sequential and parallel GR-based SM-WRLS algorithms.

  SM-WRLS Algorithm              | Checking                     | Covariance and Solution Update | Example (flops)
  Sequential GR-based            | .5m^2 + 2.5m + 13            | 2.5m^2 + 10.5m + 5             | 160
  Sequential Suboptimal GR-based | (m+1) + s(.5m^2 + 1.5m + 12) | 2.5m^2 + 10.5m + 5             | 55
  Parallel GR-based              | 3m + 14                      | 11m + 10                       | 68
  Parallel Suboptimal GR-based   | (m+1) + s(2m + 13)           | 11m + 10                       | 26

Table 4.5: Parallel computational complexities (in flops per equation) for the various SM-WRLS algorithms using the implementations of Figs. 4.1 and 4.5.

  Element Computed                  | flops per equation
  ε²(n+1)                           | m + 1
  Coefficients of quadratic (1.26)  | 7
  λ*(n+1)                           | 5 + f
  G(n+1) and θ̂(n)                   | 2m + 1
  If the equation is accepted:      |
    Updating the equation           | m + 1 + f
    Givens rotations                | 5(2m + 1)
    κ(n)                            | 4

If the fraction of the data accepted by the SM-WRLS is r (r is typically less than 30% [7]), then the total parallel computational complexity is given by

    (3m + 14) + r[11m + 10]    (4.3)

flops per equation. The adaptive compact architecture of Fig. 4.5, which has slightly more complicated cells, is as efficient as the architecture of Fig. 4.1. The only difference is that the architecture can be used to rotate out (part of) an equation which was previously rotated in. Therefore, the parallel computational complexity (per equation) does not change. However, in the windowed technique, for example, there might be a need to go through updating the system twice for a single equation: first to rotate an equation out (if it was accepted) and then to rotate the new equation in (if it is accepted). The architecture of Fig. 4.5 can be visualized as operating in two modes: the first mode is when it is rotating an equation into the system (δ = +1), and the second mode is when it is rotating an equation out of the system (δ = -1). Note that the two modes have the same parallel computational complexity, with some addition operations in one mode replaced by subtraction operations in the other mode (see Fig. 4.6).

The total parallel computational complexities of the general adaptive SM-WRLS algorithms (see Section 2.4.1) depend on the adaptive strategy employed, the percentage of the data accepted, and the number of times the algorithm rotates out an equation from the system (whether partially or completely). Consider the windowed adaptation, for example, which effectively slides a window of fixed length through the data by appropriate sequencing of rotating particular equations into and out of the system. Suppose that the fraction of the data accepted (rotated in) by the windowed SM-WRLS algorithm is r and the fraction of the data removed (rotated out) from the system is u (u < r); then the total parallel computational complexity is given by

    (3m + 14) + (r + u)[11m + 10]    (4.4)

flops per equation. This expression also holds for the selective forgetting strategy; however, the graceful forgetting strategy uses the same expression with u replaced by ρ⁻¹u. It is important to note that the adaptive techniques typically use more data but produce better estimates.
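For concreteness, expressions (4.3) and (4.4) are easy to evaluate directly; the function name below is illustrative, and the example simply reproduces the "Parallel GR-based" entry in the example column of Table 4.4.

    def parallel_flops_per_equation(m, r, u=0.0):
        # (4.3)/(4.4): every equation pays the 3m + 14 checking cost; the 11m + 10
        # covariance-and-solution update is paid for the fraction r rotated in and,
        # in the adaptive case, for the fraction u rotated out (u = 0 gives (4.3)).
        return (3 * m + 14) + (r + u) * (11 * m + 10)

    print(parallel_flops_per_equation(10, 0.2))   # -> 68.0, as in Table 4.4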
To show the computational savings when using the "suboptimal" SM-WRLS algorithm, it is noted in Section 2.3 that at each recursion we only need to compute ε²_{n-1}(n) and check whether the following condition holds (written here for the real scalar case),

    ε²_{n-1}(n) > γ²(n),    (4.5)

whereas in the SM-WRLS algorithm, the coefficients of the quadratic (1.26), ε²_{n-1}(n), G(n), and λ(n) must be computed before making the decision. In the suboptimal case, these quantities are computed only if condition (4.5) is met, and then the new equation is accepted if the optimal weight is positive.

To calculate the total computational complexity for the suboptimal SM-WRLS algorithm, let us denote the fraction of the data satisfying the condition (4.5) by s (s < r) and the fraction of the data accepted by the SM-WRLS algorithm after passing the test (4.5) by t (t ≤ s). Then the total parallel computational complexity for the suboptimal algorithm is given by

    (m + 1) + s[2m + 13] + t[11m + 10]    (4.6)

flops per equation, which clearly shows significant improvement over that of the SM-WRLS algorithm (cf. (4.3)). The total parallel computational complexity for the suboptimal windowed and selective forgetting strategies is given by

    (m + 1) + s[2m + 13] + (t + u)[11m + 10]    (4.7)

flops per equation, with u replaced by ρ⁻¹u for the suboptimal graceful forgetting strategy.

The fourth column in Table 4.4 shows the total number of flops per equation for a typical example with m = 10, r = 0.2, and s = t = 0.1. It shows that the parallel architecture reduces the complexity of the algorithm by about 60%, and when the suboptimal strategy is employed, the complexity is reduced by 84%. The performance of the SM-WRLS algorithm in terms of its adaptive behavior and tracking capability, solution quality, and fraction of data used requires further research; however, it is important to note that the gain in the computational complexity of the GR-based algorithm, when implemented on a sequential machine, is only two to three times when compared to that of the MIL-based WRLS (see Table 2.1), and five to six times when implemented on a parallel machine (see Table 4.4). It is the suboptimal technique that gives an order of magnitude (13 to 14 times) gain in the computational complexity when implemented on a parallel machine, and a six to seven times gain when implemented on a sequential machine. Therefore, it makes more sense to use the suboptimal technique for speed advantages, since the estimates are not very different from those of the SM-WRLS algorithm.

The computational complexity of the generalized complex vector case of the SM-WRLS algorithm developed in Section 2.2, when computed on a sequential machine, is discussed in Section 2.5. However, the parallel computational complexity of the generalized parallel GR-based algorithm depends on the architectural implementation of this algorithm, which requires further research.

Chapter 5

Conclusions and Further Work

5.1 Algorithmic Developments

5.1.1 A Generalized SM-WRLS Algorithm

This research has been concerned with a class of SM algorithms for estimating the parameters of linear system or signal models in which the error sequence was pointwise "energy bounded." Specifically, it was focused on the SM-WRLS algorithm, which works with bounding hyperellipsoidal regions to describe the solution sets which are a consequence of the error bounds. SM-WRLS is based on the familiar WRLS solution with the SM considerations handled through a special weighting strategy.
However, the original theoretical development of this algorithm made it applicable to real scalar data. Due to the strong potential for using this powerful algorithm in virtually any signal processing problem involving parametric models, this algorithm has been extended to work with a wider range of problems. The theoretical development of a generalized SM-WRLS algorithm that can handle complex vector-input vector-output data streams has been presented. The original SM-WRLS algorithm is a special case of the generalized algorithm.

5.1.2 Suboptimal Tests for Innovation

A new strategy has been developed which can be applied to virtually any version of the SM-WRLS algorithm to improve the computational complexity. A significant reduction in computational complexity is achieved by employing a "suboptimal" test for information content in an incoming equation. The suboptimal check has been argued to be a useful determiner of the ability of incoming data to shrink the ellipsoid, but one which does not rigorously determine the existence of an optimal SM weight in the SM-WRLS sense. The main issue is to avoid the computations of an O(m^2) checking procedure required to check for the existence of a meaningful weight. Since most of the time these computations would result in the rejection of incoming data, a more efficient test significantly reduces the complexity of the algorithm.

5.1.3 Adaptive SM-WRLS Algorithms

It has been argued that the "unmodified" SM-WRLS algorithm has inherent adaptive capabilities in its own right. However, it is not possible to depend upon this algorithm to reliably behave in an adaptive manner, particularly in cases of quickly varying system dynamics. In this work, explicitly adaptive SM-WRLS algorithms have been developed. Adaptation was incorporated into SM-WRLS in a very general way by introducing a flexible mechanism by which the algorithm can forget the influence of past data. The general formulation permitted the extension of SM-WRLS to a wide range of adaptation strategies. Three different adaptation techniques have been presented and tested on models derived from real speech data. Windowing is a simple way to make the algorithm adaptive by effectively sliding a window of fixed length through the data by appropriate sequencing of "adding" or "removing" equations. The Graceful Forgetting technique removes only a fraction of all previous equations at each time. The Selective Forgetting technique chooses the equations to be (partially or completely) removed from the system based on certain user-defined criteria.

A survey of the computational complexities of several related sequential algorithms has been presented, which shows the computational savings obtained when using the SM-based algorithms compared with the conventional WRLS algorithms. The differences in the computational complexities among the various SM-based algorithms have been discussed.

5.2 Simulation Studies

The SM-WRLS algorithms have been tested on models derived from real speech data representing the words "four," "six," and "seven," using an AR(2) model for the first two words and an AR(14) model for the third. The simulation results presented illustrate important points about the various methods and show that the adaptive algorithms yield accurate estimates using very few of the data and quickly adapt to fast variations in the signal's dynamics. It is significant that in preliminary experiments, most of the SM-WRLS algorithms have been found to be robust in small (16-bit) wordlength environments.
5.3 Architectures and Complexity Issues

The "nonadaptive" SM-WRLS algorithm has been formulated to be run on a parallel architecture. An architecture has been developed that implements this algorithm in O(m) flops per equation. Then, this architecture (which uses O(m^2) cells) was mapped into a compact architecture in order to increase the cell utilization at the expense of using slightly more complicated cells (needed for scheduling). The compact architecture uses only O(m) cells, which is clearly advantageous with respect to the O(m^2)-cell architecture. The cells used by the compact architecture were upgraded to implement the adaptive strategies. It was also noted that the same architectures could be used to implement the "suboptimal" strategy.

Finally, a detailed analysis of the computational complexity issues was carried out which clearly shows the significant computational savings when implementing the SM-WRLS algorithm using the parallel architectures. The computational complexities of the adaptive SM-WRLS algorithms depend on the adaptive strategy employed, the percentage of the data accepted (which is typically more than that of the nonadaptive algorithms), and the number of times the algorithm rotates out an equation from the system. The analysis also shows that the suboptimal strategy (whether implemented sequentially or using the parallel architectures) provides significant improvements in the computational efficiencies of the various algorithms.

5.4 Further Work

This research has been concerned with problems in which the error sequence was pointwise energy bounded (see constraint (1.1)). It is of great interest to study other classes of SM algorithms that use different constraints. Other interesting variations involve stability constraints [30] and other noise parameter bounds [31, 32].

The optimization criterion used in this research was based on minimizing the "volume ratio" of the ellipsoids at n and n-1 (see (1.25)). It might be useful to use different optimization criteria in order to minimize the ellipsoid volume. For example, it may be possible to define a strategy that efficiently minimizes the longest axis of the ellipsoid.

The adaptive SM-WRLS algorithm presented in Section 2.4 works with a very flexible "forgetting" scheme. Three different techniques were presented and tested; however, it is possible to define (and test) various other adaptation strategies.

It is true that the SM-WRLS algorithm was tested on models derived from speech data (since they are available in the Speech Processing Laboratory); however, this algorithm has the potential for application to a wide range of problems. It is of great interest and significance to study the performance of this algorithm when applied to beamforming, neural networks, and other important applications.

Finally, the computational complexity of the GR-based SM-WRLS algorithm can be further improved by incorporating more parallelism (in hardware) within the cells of the architectures of Chapter 4. Also, a generalized architecture is needed to efficiently implement the complex vector case.

Souheil F. Odeh
East Lansing, Michigan
May, 1990

Appendix A

The Relationship between C_x(n) and λ(n)

The theoretical developments in this appendix are for the real scalar case. The generalization to the complex vector case is straightforward.
It was noted in Section 1.2.3.1 that the system of equations (1.8) (or (2.9) for the generalized case) can be reduced to an upper triangular system (1.27) by applying a sequence of orthogonal operators (GR's). Suppose that a new equation is accepted with an optimal weight λ(n); it can be rotated into the upper triangular system by inserting it in the (m+1)-st row, i.e.,

    \begin{bmatrix} T(n-1) & d_1(n-1) \\ \sqrt{\lambda(n)}\, x^T(n) & \sqrt{\lambda(n)}\, y(n) \end{bmatrix}    (A.1)

and applying another sequence of m GR's, leaving the system in the form

    \begin{bmatrix} T(n) & d_1(n) \\ 0_{1 \times m} & d_2(n) \end{bmatrix}    (A.2)

where the matrix T(n) is an m x m upper triangular Cholesky factor [39] of C_x(n) (see (1.30)). This sequence of GR's is given by

    Q(n) = J_m J_{m-1} \cdots J_2 J_1    (A.3)

where J_i denotes an (m+1) x (m+1) orthogonal matrix that annihilates element i of the (m+1)-st row of (A.1). It can be shown that Q(n) takes the form

    Q(n) = \begin{bmatrix}
    c_1 & 0 & 0 & \cdots & 0 & s_1 \\
    -s_2 s_1 & c_2 & 0 & \cdots & 0 & s_2 c_1 \\
    -s_3 c_2 s_1 & -s_3 s_2 & c_3 & \cdots & 0 & s_3 c_2 c_1 \\
    -s_4 c_3 c_2 s_1 & -s_4 c_3 s_2 & -s_4 s_3 & \cdots & 0 & s_4 c_3 c_2 c_1 \\
    \vdots & \vdots & \vdots & & \vdots & \vdots \\
    -s_m c_{m-1} \cdots c_2 s_1 & -s_m c_{m-1} \cdots c_3 s_2 & -s_m c_{m-1} \cdots c_4 s_3 & \cdots & c_m & s_m c_{m-1} \cdots c_1 \\
    -c_m c_{m-1} \cdots c_2 s_1 & -c_m c_{m-1} \cdots c_3 s_2 & -c_m c_{m-1} \cdots c_4 s_3 & \cdots & -s_m & c_m c_{m-1} \cdots c_1
    \end{bmatrix}

in which c_i (s_i) is the cosine (sine) term associated with the i-th rotation. This form of Q(n) simplifies the generation of the matrix T(n) from T(n-1) and is useful in finding det C_x(n) below. Since the matrix T(n) is upper triangular,

    \det T(n) = \prod_{i=1}^{m} t_{ii}(n)    (A.4)

and

    \det C_x(n) = [\det T(n)]^2 = \prod_{i=1}^{m} t_{ii}^2(n)    (A.5)

in which t_{ii} denotes the i-th diagonal element of the matrix T(n). After some (tedious) algebraic manipulation, it follows that:

Case m = 1:

    \det C_x(n) = t_{11}^2(n) = t_{11}^2(n-1) + \lambda(n)\, x_1^2(n)

Case m = 2:

    \det C_x(n) = t_{22}^2(n)\, t_{11}^2(n)
                = t_{22}^2(n-1)\, t_{11}^2(n) + \lambda(n)\, [\, t_{11}(n-1) x_2(n) - t_{12}(n-1) x_1(n) \,]^2

Case m = 3:

    \det C_x(n) = t_{33}^2(n)\, t_{22}^2(n)\, t_{11}^2(n)
                = t_{33}^2(n-1)\, t_{22}^2(n)\, t_{11}^2(n)
                  + \lambda(n)\, \{\, t_{22}(n-1) [\, t_{11}(n-1) x_3(n) - t_{13}(n-1) x_1(n) \,]
                  - t_{23}(n-1) [\, t_{11}(n-1) x_2(n) - t_{12}(n-1) x_1(n) \,] \,\}^2

Case m = 4:

    \det C_x(n) = t_{44}^2(n)\, t_{33}^2(n)\, t_{22}^2(n)\, t_{11}^2(n)
                = t_{44}^2(n-1)\, t_{33}^2(n)\, t_{22}^2(n)\, t_{11}^2(n)
                  + \lambda(n)\, \{\, t_{33}(n-1) \{\, t_{22}(n-1) [\, t_{11}(n-1) x_4(n) - t_{14}(n-1) x_1(n) \,]
                  - t_{24}(n-1) [\, t_{11}(n-1) x_2(n) - t_{12}(n-1) x_1(n) \,] \,\}
                  - t_{34}(n-1) \{\, t_{22}(n-1) [\, t_{11}(n-1) x_3(n) - t_{13}(n-1) x_1(n) \,]
                  - t_{23}(n-1) [\, t_{11}(n-1) x_2(n) - t_{12}(n-1) x_1(n) \,] \,\} \,\}^2

and so on. Therefore, det C_x(n) is a monotonically increasing function of λ(n). □

Bibliography

[1] S.H. Mo and J.P. Norton, "Fast and robust algorithm to compute exact polytope parameter bounds," Mathematics and Computers in Simulation, to appear in 1990.

[2] J.P. Norton and S.H. Mo, "Parameter bounding for time-varying systems," Mathematics and Computers in Simulation, to appear in 1990.

[3] A.K. Rao, Y.F. Huang, and S. Dasgupta, "ARMA parameter estimation using a novel recursive estimation algorithm with selective updating," IEEE Trans. Acoust., Speech, and Signal Process., vol. 38, pp. 447-457, March 1990.

[4] S.F. Odeh and J.R. Deller, Jr., "A systolic algorithm for adaptive set membership identification," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '90, Albuquerque, NM, vol. 5, pp. 2419-2422, April 1990.

[5] J.R. Deller, Jr., "A 'systolic array' formulation of the optimal bounding ellipsoid algorithm," IEEE Trans. Acoust., Speech, and Signal Process., vol. 37, pp. 1432-1436, Sept. 1989.

[6] J.R. Deller, Jr., "Set membership identification in digital signal processing," IEEE ASSP Magazine, vol. 6, no. 4, pp. 4-20, Oct. 1989.

[7] J.R. Deller, Jr. and T.C. Luk, "Linear prediction analysis of speech based on set-membership theory," Computer Speech and Language, vol. 3, no. 4, pp. 301-327, Oct. 1989.

[8] A.K. Rao and Y.F. Huang, "Analysis of finite precision effects on an OBE algorithm," Proc.
IEEE Int. Conf. Acoust., Speech, and Signal Process. '89, Glasgow, vol. 2, pp. 853-856, May 1989.

[9] J.R. Deller, Jr. and S.F. Odeh, "Implementing the optimal bounding ellipsoid algorithm on a fast processor," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '89, Glasgow, vol. 2, pp. 1067-1070, May 1989.

[10] E. Walter and H. Piet-Lahanier, "Estimation of parameter bounds from bounded-error data: A survey," Proc. 12th IMACS World Congress on Scientific Computation, Paris, pp. 467-472, July 1988.

[11] E. Walter and H. Piet-Lahanier, "OMNE versus least squares for estimating parameters of a biological model from short data records," Proc. 12th IMACS World Congress on Scientific Computation, Paris, pp. 85-87, July 1988.

[12] H. Piet-Lahanier and E. Walter, "Practical implementation of an exact and recursive algorithm for characterizing likelihood sets," Proc. 12th IMACS World Congress on Scientific Computation, Paris, July 1988.

[13] R. Tempo, "Robust estimation and filtering in the presence of bounded noise," IEEE Trans. Automatic Control, vol. AC-33, pp. 864-867, 1988.

[14] S. Dasgupta and Y.F. Huang, "Asymptotically convergent modified recursive least squares with data-dependent updating and forgetting factor for systems with bounded noise," IEEE Trans. Information Theory, vol. IT-33, pp. 383-392, 1987.

[15] J.P. Norton, "Identification of parameter bounds of ARMAX models from records with bounded noise," Int. J. Control, vol. 45, pp. 375-390, 1987.

[16] J.P. Norton, "Identification and application of bounded-parameter models," Automatica, vol. 23, pp. 497-507, 1987.

[17] E. Walter and H. Piet-Lahanier, "Exact and recursive description of the feasible parameter set for bounded error models," Proc. 26th IEEE Conf. Decision and Control, Los Angeles, pp. 1921-1922, 1987.

[18] J.R. Deller, Jr. and T.C. Luk, "Set-membership theory applied to linear prediction analysis of speech," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '87, Dallas, vol. 2, pp. 653-656, 1987.

[19] Y.F. Huang and A.K. Rao, "Application of a recursive estimation algorithm with information-dependent updating to ARMAX models and ARMA models with unknown inputs," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '87, Dallas, vol. 2, pp. 1007-1010, 1987.

[20] Y.F. Huang, "A recursive estimation algorithm using selective updating for spectral analysis and adaptive signal processing," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-34, pp. 1331-1334, 1986.

[21] V. Broman and M.J. Shensa, "Polytopes, a novel approach to tracking," Proc. 25th IEEE Conf. Decision and Control, Athens, pp. 1749-1752, 1986.

[22] M.L. Feron and J.R. Deller, Jr., "'Membership set' identification applied to the impulse train excited AR model" (abstract), Final Program: SIAM 1983 National Mtg., Denver, p. 30, 1983.

[23] E. Fogel and Y.F. Huang, "On the value of information in system identification - Bounded noise case," Automatica, vol. 18, pp. 229-238, 1982.

[24] M. Milanese and G. Belforte, "Estimation theory and uncertainty intervals evaluation in presence of unknown but bounded errors: Linear families of models and estimators," IEEE Trans. Automatic Control, vol. AC-27, pp. 408-414, 1982.

[25] E. Fogel, "System identification via membership set constraints with energy constrained noise," IEEE Trans. Automatic Control, vol. AC-24, pp. 752-758, 1979.

[26] B.R. Barmish and J. Sankaran, "The propagation of parametric uncertainty via polytopes," IEEE Trans. Automatic Control, vol.
AC-24, pp. 346-349, 1979.

[27] F.C. Schweppe, "Recursive state estimation: Unknown but bounded errors and system inputs," IEEE Trans. Automatic Control, vol. AC-13, pp. 22-28, 1968.

[28] H.S. Witsenhausen, "Sets of possible states of linear systems given perturbed observations," IEEE Trans. Automatic Control, vol. AC-13, pp. 556-568, 1968.

[29] D.P. Bertsekas and I.B. Rhodes, "Recursive state estimation for a set-membership description of uncertainty," IEEE Trans. Automatic Control, vol. AC-16, pp. 117-128, 1971.

[30] P.L. Combettes and H.J. Trussell, "Stability of the linear prediction filter: A set theoretic approach," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '88, New York, vol. 4, pp. 2288-2291, 1988.

[31] P.L. Combettes and H.J. Trussell, "General order moments in set-theoretic estimation," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '89, Glasgow, vol. 4, pp. 2101-2104, May 1989.

[32] P.L. Combettes, Set Theoretic Estimation in Digital Signal Processing (Ph.D. Dissertation), North Carolina State University, Raleigh, 1989.

[33] S.M. Veres and J.P. Norton, "Structure identification of parameter-bounding models by use of noise-structure bounds," Int. J. Control, vol. 50, pp. 639-649, 1989.

[34] D. Graupe, Time Series Analysis, Identification, and Adaptive Systems, Malabar, FL: Krieger, Ch. 5, 1989.

[35] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, Cambridge, MA: MIT Press, pp. 14-15 & Sect. 2.2.1, 1983.

[36] S.Y. Kung, VLSI Array Processors, Englewood Cliffs, NJ: Prentice Hall, 1988.

[37] B.D. Van Veen and K.M. Buckley, "Beamforming: A versatile approach to spatial filtering," IEEE ASSP Magazine, pp. 4-24, April 1988.

[38] T. Kohonen, Self-Organization and Associative Memory (2nd ed.), Berlin, Heidelberg: Springer-Verlag, Chs. 6 & 9, 1988.

[39] G.H. Golub and C.F. Van Loan, Matrix Computations, Baltimore, MD: Johns Hopkins Univ. Press, Chs. 5 & 6, 1983.

[40] J.R. Deller, Jr. and G.P. Picaché, "Advantages of a Givens rotation approach to temporally recursive linear prediction analysis of speech," IEEE Trans. Acoust., Speech, and Signal Process., vol. 37, pp. 429-431, March 1989.

[41] T.C. Luk and J.R. Deller, Jr., "A nonclassical WRLS algorithm," Proc. 23rd Ann. Allerton Conf., pp. 732-741, 1985.

[42] W.M. Gentleman and H.T. Kung, "Matrix triangularization by systolic arrays," Proc. Soc. Photo-Optical Instrumentation Engineers: Real-Time Signal Processing IV, vol. 298, pp. 19-26, 1981.

[43] J.G. McWhirter, "Recursive least squares minimization using a systolic array," Proc. Soc. Photo-Optical Instrumentation Engineers: Real-Time Signal Processing VI, vol. 431, pp. 105-112, 1983.

[44] J.R. Deller, Jr. and D. Hsu, "An alternative adaptive sequential regression algorithm and its application to the recognition of cerebral palsy speech," IEEE Trans. Circ. and Syst., vol. CAS-34, pp. 782-787, July 1987.

[45] J.R. Deller, Jr., Unpublished research notes, Michigan State University, East Lansing, 1990.

[46] M.L. Feron, "Membership set system identification with pulse train input," M.S. Thesis, Illinois Institute of Technology, Chicago, 1982.

[47] J. Makhoul, "Linear prediction: A tutorial review," Proc. of the IEEE, vol. 63, pp. 561-580, 1975.

[48] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge, MA: Cambridge Univ. Press, pp. 209-213, 1988.

[49] D.F. Marshall and W.K. Jenkins, "A fast quasi-Newton adaptive filtering algorithm," Proc.
IEEE Int. Conf. Acoust., Speech, and Signal Process. '88, New York, NY, vol. 3, pp. 1377-1380, 1988.

[50] J.M. Cioffi, "Limited-precision effects in adaptive filtering," IEEE Trans. Circ. and Syst., vol. CAS-34, pp. 821-833, July 1987.

[51] S. Ljung and L. Ljung, "Error propagation properties of recursive least squares adaptation algorithms," Automatica, vol. 21, pp. 157-167, 1985.

[52] J.M. Cioffi and T. Kailath, "Fast, recursive least squares transversal filters for adaptive filtering," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-32, pp. 304-337, April 1984.

[53] H.T. Kung and C. Leiserson, "Algorithms for VLSI processor arrays," Rpt. No. MU-CS-79-103, Dept. of Computer Sci., Carnegie-Mellon Univ., 1978 (reprinted in reference [54]).

[54] C. Mead and L. Conway, Introduction to VLSI Systems, Reading, MA: Addison-Wesley, Section 8.3, 1980.

[55] W.M. Gentleman, "Least squares computations by Givens transformations without square roots," J. Inst. of Math. and its Appl., vol. 12, pp. 329-336, 1973.

[56] S. Haykin, Adaptive Filter Theory, Englewood Cliffs, NJ: Prentice Hall, Ch. 10, 1986.