This is to certify that the dissertation entitled "Development and Analysis of an Optimal Bounding Ellipsoid Algorithm Using Stochastic Approximation," presented by Ming-Shou Liu, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering at Michigan State University.

DEVELOPMENT AND ANALYSIS OF AN OPTIMAL BOUNDING ELLIPSOID ALGORITHM USING STOCHASTIC APPROXIMATION

By
Ming-Shou Liu

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering
1993

ABSTRACT

DEVELOPMENT AND ANALYSIS OF AN OPTIMAL BOUNDING ELLIPSOID ALGORITHM USING STOCHASTIC APPROXIMATION

By Ming-Shou Liu

Set-membership-based (SM) techniques provide an alternative to conventional system identification algorithms. This research is concerned with a class of SM algorithms known as the optimal bounding ellipsoid (OBE) algorithms, which combine the weighted least square error minimization approach with a bounded error constraint. More specifically, OBE algorithms focus on finding an optimal hyper-ellipsoidal region for the feasible set when the additive model noise is bounded. Previous work has shown that OBE techniques have great potential in many applications involving parametric modelling. The published OBE algorithms, however, either do not include a formal convergence proof or do not have an interpretable geometrical meaning. In this research, a modified OBE algorithm, which is both interpretable and convergent, is introduced. This algorithm is referred to as the set-membership stochastic approximation (SM-SA) algorithm because of its resemblance to the stochastic approximation (SA) method developed by Robbins and Monro in 1951. Recently, a general class of OBE algorithms, including all methods published to date, was unified into a single framework called the unified OBE (UOBE) algorithm.
UOBE is based on generalized weighted recursive least squares in which very broad classes of "forgetting factors" and data weights may be employed. Different instances of UOBE are distinguished by their weighting policies and by the criteria used to determine optimal weight values. SM-SA is a special case of UOBE, and, through the analysis in this research and the paper by Deller, Nayeri, and Krunz, the convergence properties of SM-SA, which will be introduced later, can be applied to the UOBE framework.

In this dissertation, the updating criteria and optimal weights that shrink the feasible sets of SM-SA are first analyzed. Then the asymptotic properties and performance of SM-SA are discussed. Simulation results for different SM algorithms and comparisons to conventional recursive least squares are also examined. Finally, the UOBE formulation and its convergence properties are introduced.

Copyright by Ming-Shou Liu
August 3, 1993

To my Mother and my wife,
for their Love, Support, and Sacrifice

ACKNOWLEDGMENTS

A deep debt is owed to my advisor, Professor Majid Nayeri, for his encouragement and guidance, and for the many hours spent on behalf of my dissertation. His direction was very important for the development and analysis of this research. Thanks are also given to my co-advisor, Professor John R. Deller, Jr., for his time spent with me in helpful discussions. I would like to express my gratitude to the other members of my Ph.D. guidance committee, Professor Clifford E. Weil and Professor Hassan K. Khalil, for their help in reviewing my dissertation. I gratefully acknowledge the financial support provided by grants from the National Science Foundation and the Office of Naval Research.

Acronyms and Abbreviations

Auto-regressive with exogenous input (ARX)
Auto-regressive-moving-average with exogenous input (ARMAX)
Multiple-input-multiple-output (MIMO)
Optimal bounding ellipsoid (OBE)
Optimal volume ellipsoid (OVE)
Recursive least squares (RLS)
Set-membership (SM)
Set-membership stochastic approximation (SM-SA)
Set-membership weighted recursive least squares (SM-WRLS)
Single-input-single-output (SISO)
Stochastic approximation (SA)
Unified optimal bounding ellipsoid (UOBE)
Weighted recursive least squares (WRLS)

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction and Background
1.1 Real-Time Identification and the Recursive Least Square Estimator
1.2 Introduction to Stochastic Approximation
1.3 Introduction to Set-Membership Theory
1.4 Review of Previous Work on OBE Algorithms
1.5 Motivation for Developing the New Algorithm
2 The Development of the SM-SA Algorithm
2.1 The Set-Membership Stochastic Approximation Algorithm
2.2 Initial Values
2.3 The Optimal Weight
2.4 Data Selection
2.5 Relations Between SM-SA and SM-WRLS
3 Convergence Issues
3.1 Fundamental Convergence Properties of SM-SA
3.2 Convergence of Parameter Sets and Suboptimal Test
3.2.1 Convergence of Parameter Sets
3.2.2 A Suboptimal Test for Innovation in the SM-SA Algorithm
3.3 Convergence of the Parameter Set with White Noise or Error Disturbances
3.3.1 IID White Noise Case
3.3.2 General White Noise Case
3.4 Independence and the Mixing Conditions
4 Simulation Studies
4.1 Performance of SM-SA for Time-Invariant Systems
4.1.1 SM-SA with Pseudo-Random Excitation
4.1.2 AR Models with Binary Bernoulli Excitation
4.2 Performance of SM-SA for Time-Varying Systems
4.3 Suboptimal Test
4.3.1 AR Models with Pseudo-Random Excitation
4.3.2 AR Models with Binary Bernoulli Excitation
4.4 More Studies on Time-Varying Systems
5 The Convergence Properties of UOBE Algorithms
5.1 The UOBE Algorithm
5.2 Convergence Properties of UOBE
6 Conclusion
6.1 Concluding Remarks
6.1.1 Summary of Landmark Developments of SM-Based Techniques
6.1.2 Contributions of this Research
6.2 Future Work
A Infinite Product
BIBLIOGRAPHY

LIST OF TABLES

2.1 The data-selection and the performance of SM-WRLS and SM-SA for the AR model (2.50)
2.2 Estimates of SM-WRLS and SM-SA for the AR model (2.50) where a1 = 1.6 and a2 = -0.68
4.1 Noise processes used in the simulations in this chapter

LIST OF FIGURES

1.1 Relationship between Ψ(n) and Ω(n) for the two-dimensional case
1.2 The weights become very large when more data are used
2.1 Plots of V(λ_n) when c_n > 0 and G_n < 1
2.2 When G_n > 1 and c_n > 0, if there exists one extremum of V(λ_n) in (0,1), then there exist more than two extrema
2.3 When c_n = 0, the only two possibilities for V(λ_n) in (0,1) are: (a) there is a saddle point, or (b) no extrema and no saddle points exist
2.4 When c_n < 0, if more than one extremum appears in (0,1), then there must be at least three extrema
2.5 The relationship between logarithms of time and weight
3.1 (a) If the indicated quantity is positive and v_{m_n} lies in the shaded area, a contradiction results; (b) if the quantity is negative and v_{m_n} lies in the shaded area, a contradiction results
3.2 Sample correlation coefficients of {w_n} in the example on page 63 up to time lag 40
3.3 Sample correlation coefficients of {v_n} in the example on page 63 up to time lag 40
3.4 Estimates of the parameter a_3 when the system is driven by (a) an i.i.d. white noise, (b) a white noise which is not i.i.d.
3.5 Distribution of {w_n} and {v_n} at each amplitude interval
3.6 The logarithmic plot of volume versus the actual time n; the volume of the ellipsoid is much smaller when the noise is i.i.d.
3.7 Distribution of a white noise at each amplitude interval after passing {v_n} through a soft-limiter
3.8 Sample correlation coefficients of {v_n} after passing through a soft-limiter
3.9 Estimates of the parameter a_3 when v_n is passed through a soft-limiter
3.10 Distribution of a white noise at each amplitude interval after multiplying v_n by two and then passing it through a soft-limiter
3.11 Estimates of the parameter a_3 when v_n is multiplied by two and then passed through a soft-limiter
3.12 Sample correlation coefficients of {v_n} up to time lag 40
3.13 Estimates of a_3 by SM-SA and RLS when the noise is colored
Sample correlation coefficients of {vn} up to time lag 40 ............ Estimates of a3 by SM-SA and RLS when the noise is color .......... 27 29 29 30 34 56 62 62 64 65 66 66 67 67 68 68 69 70 3.14 3.15 3.16 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 4.15 4.16 4.17 4.18 4.19 4.20 Distribution of a color noise at each amplitude interval. ........... Sample correlation coefficients of a noise sequence obtained by passing {0”} through a soft-limiter. .............................. Estimates of the parameter a3 when the color noise {vn} is passed through a soft-limiter. ................................... The logarithmic plot of 11,, versus time n for thelinear time-invariant AR(3) model with pseudo-random excitation. ..................... The logarithmic plot of Isnl versus time n for the linear time-invariant AR(3) model with pseudo random excitation. ..................... First 1000 estimates of the parameters in the AR(3) model obtained using SM-SA and a pseudo-random excitation, where a1 = 2.1, a; = —1.48, and a3 = 0.34. ..................................... The relationship between logarithms of time and weight ............ The estimates of model parameters by RLS and SM-SA for biased noise sequence. ..................................... The logarithmic plot of pa versus time 11 when a binary Bernoulli sequence is used. ...................................... First 30,000 estimates obtained using SM-SA and RLS when a binary Bernoulli sequence is used ............................. First 30,000 estimates obtained using SM-SA and RLS when a biased binary Bernoulli sequence is used ............................. The acoustic waveform of the word “four” .................... The “true” parameters for the word “four”: (a) b1(n) (b) bg(n) ........ Estimates of b1(n) for the word “four” using (a) SM-SA (b) RLS ....... Estimates of b2(n) for the word “four” using (a) SM-SA (b) RLS ....... The logarithmic plot of 11,, versus time n for the AR(3) model when the suboptimal test is used. ............................. The logarithmic plot of Isnl versus time n for the AR(3) model ........ First 2000 estimates of the parameters using SM-SA, where a1 = 2.1, a; = —1.48, and a3 = 0.34. .............................. The relationship between logarithms of time and weight ............ Estimates of parameter using RLS and SM-SA for the suboptimal test. . . . First 30,000 estimates obtained by SM-SA and RLS when a zero mean binary Bernoulli sequence is used and suboptimal test is employed. ......... First 30,000 estimates obtained by SM-SA and RLS when a bias binary Bernoulli sequence is used and suboptimal test is employed .......... Estimates of (a) b1(n) (b) bg(n) by using SM-SA algorithm for the word “four” when noise bounds are over-estimated .................. '71 81 82 82 83 84 85 86 87 88 89 91 92 93 93 94 94 95 97 98 100 Chapter 1 Introduction and Background “Man creates models of his natural and man-made environments to understand and explain them better and as prelude to action” [8]. Indeed, from social science to engineering and natural science, modeling plays an important roll to help humankind describe functions of processes and to predict future actions. For example, in economics, modeling is used to study inflation, to predict the trends of stock markets, and to maximize profits on invest- ments. 
In engineering applications, modeling can be used to analyze the characteristics of a process, to optimize its performance, and to study its functions and dynamics before a large plant is actually built and large amounts of money and effort are invested. In many branches of science, modeling is used as an aid to describe and understand reality.

The term "system identification" generally refers to the determination of a mathematical model for a system or a process by observing its input-output relationships. Based on the model structures that are used, system identification methods are generally classified into two categories: nonparametric and parametric. A nonparametric model is described by a curve, function, or table. Impulse responses and frequency diagrams are two examples of nonparametric models. In this dissertation, we consider the parametric model only. Such models are characterized by a parameter vector. Typical parametric models are auto-regressive with exogenous input (ARX), auto-regressive-moving-average with exogenous input (ARMAX) [16], and state-space structures.

For modeling engineering systems and industrial processes in practice, some basic characteristics of the system, such as linearity and stability, are assumed to be known. Furthermore, in many cases we assume basic knowledge about the mathematical structure of the system, such as the order of the system, the type of the model, and so on, so that it is possible to derive a specific mathematical model of the system dynamics. Consequently, only a set of parameters in the model equation is left to be determined. In this research, we derive a new algorithm for identifying model parameters which has well-analyzed convergence properties and an intuitively meaningful optimization criterion. This algorithm is applicable when the system is corrupted by additive and bounded noise.

1.1 Real-Time Identification and the Recursive Least Square Estimator

In classical system design, modeling is accomplished once during the design stage. This is called off-line identification. In most adaptive systems, on-line or recursive identification is desirable to allow optimum adaptation in the face of environmental uncertainty and changing environmental conditions. From a computational point of view, the memory requirement for recursive estimators is quite modest since not all the data are stored. In off-line identification, however, all the previous data have to be memorized.

A variety of well-known recursive techniques such as least square error (LSE), maximum likelihood, instrumental variables [16], and stochastic approximation [1] have been used to identify system parameters in real time. Among these techniques, the LSE method is the most popular because of its simplicity and well-understood criteria. To review the LSE method, consider a simple linear regression model

    y(t) = xᵀ(t) θ

where x(t) is an n-vector of known quantities and θ is an n-vector of unknown parameters. The LSE estimate of θ at time n is obtained by minimizing the cost function

    V(θ̂) = Σ_{t=1}^{n} ( y(t) − xᵀ(t) θ̂ )².

Recursive least squares (RLS) [16] estimates are obtained through an on-line iterative computation of the off-line LSE estimates. Weighted recursive least squares (WRLS) is a variation of RLS which minimizes

    V_n(θ̂) = Σ_{t=1}^{n} w_t ( y(t) − xᵀ(t) θ̂ )²

where {w_n} is a weighting sequence.
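Before turning to stochastic approximation, the least-squares criteria above can be made concrete with a short sketch. The following is a minimal illustration, not part of the original text: it computes the batch LSE estimate and a weighted recursive counterpart for the regression y(t) = xᵀ(t)θ; the model size, the unit weights, and the synthetic data are illustrative assumptions only.

import numpy as np

def batch_lse(X, y):
    # Off-line LSE: minimize sum_t (y(t) - x(t)^T theta)^2.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def wrls(X, y, w, delta=1e6):
    # Weighted recursive least squares: minimize sum_t w_t (y(t) - x(t)^T theta)^2,
    # computed on-line with the standard rank-one covariance update.
    m = X.shape[1]
    theta = np.zeros(m)
    P = delta * np.eye(m)                 # large P(0) encodes low confidence in theta(0)
    for x_t, y_t, w_t in zip(X, y, w):
        Px = P @ x_t
        gain = w_t * Px / (1.0 + w_t * (x_t @ Px))
        theta = theta + gain * (y_t - x_t @ theta)
        P = P - np.outer(gain, Px)
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
theta_true = np.array([1.0, -0.5, 0.25])
y = X @ theta_true + 0.1 * rng.standard_normal(500)
print(batch_lse(X, y))
print(wrls(X, y, np.ones(500)))          # unit weights reduce WRLS to ordinary RLS

With unit weights the recursion reproduces ordinary RLS, which is the sense in which WRLS generalizes it.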
1.2 Introduction to Stochastic Approximation

Stochastic approximation is concerned with the problem of estimating the parameters of a linear system from noisy observations. Let f : ℝ^p → ℝ^p be an unknown function with a unique zero θ*, i.e.,

    f(θ*) = 0.    (1.1)

We assume that f(·) can be observed at any x ∈ ℝ^p. Stochastic approximation (SA) refers to a general class of recursive algorithms for estimating the solution of an equation of the form (1.1) when the observation of f(·) is corrupted by random noise.

The origins of SA can be traced back to a paper by Robbins and Monro [1] in 1951. That paper is concerned with the problem of estimating the root, say θ*, of an unknown regression function f(x), which is the mean value of an observable random variable Z(x). Robbins and Monro prove that θ* can be estimated recursively by

    θ̂_{n+1} = θ̂_n + l_n Z(θ̂_n)    (1.2)

where l_n is a sequence of positive scalars that converges to zero. The reason why {l_n} must converge to zero is that convergence of θ̂_n can take place only if the product l_n Z(θ̂_n) tends to zero. Since Z(θ̂_n) ≠ 0 with probability one, l_n must tend to zero. This ensures that the noise is rejected by paying less and less attention to the noisy observations. Robbins and Monro state the following sufficient conditions for convergence of the estimate θ̂_n to the true parameter in probability [1]:

    l_n ≥ 0,    Σ_{n=1}^{∞} l_n = ∞,    Σ_{n=1}^{∞} (l_n)² < ∞.    (1.3)

Any sequence {l_n}_{n=0}^{∞} which satisfies (1.3) is called a sequence of type 1/n. A common choice for the sequence {l_n} is l_n = 1/n. Ljung [12] further weakened the square-summability condition in (1.3) to

    Σ_{n=1}^{∞} (l_n)^p < ∞

for some p > 1. Blum [2] (1954) extended Robbins and Monro's theorem to the multidimensional case.

In the next two sections we will briefly review the development of "bounding ellipsoid" algorithms and investigate the problems associated with previous work on these methods. We will then demonstrate that these problems can be overcome by an SA formulation.

1.3 Introduction to Set-Membership Theory

Set-membership (SM) identification refers to a class of recursive techniques for estimating parameters of linear systems or signal models under a priori information that constrains the solutions to certain sets [25]. In contrast to conventional system identification schemes (e.g., maximum likelihood, LSE, etc.) which yield point estimates of the parameters, an SM algorithm yields sets of parameter estimates which are compatible with the noise constraints. These sets are generally called "membership sets." Based on certain set-theoretic checking criteria, SM algorithms select and use only the useful data to update the parameter estimates and refine the membership set. When data do not help refine these membership sets, the effort of updating the parameters can be avoided.

SM theory was pioneered by Krasovski [3], Schweppe [4], Witsenhausen [5], and Bertsekas and Rhodes [7] in the late 1960s, in the control and system science domain, for estimating the state of a linear dynamic system. The optimum estimate is the smallest time-varying ellipsoid which contains the unknown system states.

A major appeal of parameter set estimation is that it does not rely on the availability of a probabilistic description of the uncertainty affecting the model. It simply delineates the range of the parameter space yielding model-output values within specified ranges about the observed output samples. On the other hand, since the estimates are updated only when the input data contain sufficient information, some SM algorithms are more efficient in some applications.
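To make the membership-set idea concrete, here is a small sketch under assumed data (an illustration, not drawn from the cited references): with a known bound γ_n on v_n², each observation defines the strip {θ : (y_n − θᵀx_n)² ≤ γ_n}, and the exact feasible set is the intersection of all such strips; the function below simply tests whether a candidate parameter vector satisfies every bound collected so far.

import numpy as np

def in_membership_set(theta, X, y, gamma):
    # theta lies in the intersection of strips iff every bounded-error
    # constraint (y_n - theta^T x_n)^2 <= gamma_n is satisfied.
    residuals = y - X @ theta
    return bool(np.all(residuals**2 <= gamma))

rng = np.random.default_rng(1)
theta_true = np.array([0.7, -0.3])
X = rng.standard_normal((200, 2))
v = rng.uniform(-0.5, 0.5, size=200)      # bounded noise, |v_n| <= 0.5
y = X @ theta_true + v
gamma = np.full(200, 0.25)                # gamma_n = 0.5^2
print(in_membership_set(theta_true, X, y, gamma))        # True by construction
print(in_membership_set(theta_true + 1.0, X, y, gamma))  # typically False

Any estimator constrained to return a point inside this intersection is automatically consistent with every error bound, which is the property the OBE algorithms exploit.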
In recent years, SM-based signal processing has received considerable attention and is becoming increasingly popular around the world, as is evident from the extensive literature on this topic [14, 20, 23, 26, 29, 31, 36, 39]. This research is restricted to the class of algorithms known as optimal bounding ellipsoid (OBE) algorithms, which follow a bounded error constraint.

The original OBE algorithm is attributable to Fogel and Huang [14] in 1982. Let us refer to this algorithm as "F-H/OBE." Deller [25] and Deller and Luk [26] have reformulated the F-H/OBE algorithm so that the center of the hyper-ellipsoid is exactly the conventional WRLS solution. This algorithm is referred to as the set-membership weighted recursive least squares (SM-WRLS) algorithm in order to distinguish it from the F-H/OBE algorithm. The main deficiency of these algorithms is the lack of well-understood convergence properties. This problem led to the development of the modified OBE algorithm by Dasgupta and Huang in 1987. We will refer to this algorithm as "D-H/OBE." Recently, a general class of OBE algorithms, including all methods published to date as well as the algorithm presented in this work, has been unified into a single framework called the unified OBE (UOBE) algorithm [38, 39]. Historically, this research on the "SM-SA" algorithm (presented in this thesis), and that of Krunz on the "Dual-SM-WRLS" algorithm [33, 34], precedes and contributed to the development of UOBE. We shall later discuss the fact that SM-SA is an important special case of UOBE and show, in Chapter 5, how many of the results of this work apply to all algorithms in the broader UOBE formulation.

Another OBE-like technique, called the optimal volume ellipsoid (OVE) algorithm, was proposed more recently [31, 32] by Cheung, Yurkovich, and Passino to estimate parameter sets. It is proven that the algorithm is truly optimal in the sense that the volume of the ellipsoid is smallest at each recursive computation. Simulation studies in [31, 32] compare the performance of OVE and F-H/OBE. They show that OVE gives tighter parameter bounds. The estimates given by OVE, however, do not show significant improvement over the estimates given by F-H/OBE.

1.4 Review of Previous Work on OBE Algorithms

In principle, F-H/OBE and SM-WRLS are the same in spirit. However, SM-WRLS has a structure which is simpler and more elegant than that of F-H/OBE because of the choice of the weighting sequence (see [38, 39]). Thus we begin with a review of SM-WRLS*; similar results on F-H/OBE and D-H/OBE are found in [38, 39]. Although the theoretical development of SM-WRLS can handle a complex-vector, multiple-input-multiple-output (MIMO) model, we consider only real scalars here for simplicity [28]. Let

    y_n = Σ_{i=1}^{p} f_i y_{n−i} + Σ_{j=0}^{q} g_j u_{n−j} + v_n    (1.4)
        = θ*ᵀ x_n + v_n    (1.5)

where θ*ᵀ = [f_1, ..., f_p, g_0, ..., g_q], x_nᵀ = [y_{n−1}, ..., y_{n−p}, u_n, u_{n−1}, ..., u_{n−q}], y_n and u_n are real measurable outputs and inputs, respectively, and v_n is an unknown noise process. Let m = p + q + 1. We assume that m > 1 and, for each time n, v_n is known to be bounded according to

    v_n² ≤ γ_n    (1.6)

where {γ_n} is a known positive sequence.

Let ω(n) be a parameter set at time n such that all θ ∈ ω(n) are feasible parameter estimates consistent with the error bound in (1.6).

* Generalized results for the more general weighting policies of UOBE are found in [38, 39], to which the present research contributed.
In conjunction with the model of form (1.5), ω(n), which is a "hyper-strip" region, can be expressed as

    ω(n) = { θ ∈ ℝ^m : (y_n − θᵀ x_n)² ≤ γ_n }.

The intersection of these sets over a given time range t ∈ [1, n] forms a convex polytope of feasible parameters,

    Ω(n) = ∩_{i=1}^{n} ω(i).    (1.7)

In general, Ω(n) is an irregular convex set that is difficult to describe and track analytically. In conjunction with the WRLS processing, however, Ω(n) can be shown to be contained in a hyper-ellipsoidal superset Ψ(n) [25],

    Ψ(n) = { θ : (θ − θ_n)ᵀ C(n) (θ − θ_n) ≤ κ_n }

where C(n) is the weighted covariance matrix,

    C(n) = C(n−1) + λ_n x_n x_nᵀ,    (1.8)

κ_n is a scalar quantity,

    κ_n = Σ_{i=1}^{n} λ_i (γ_i − y_i²) + θ_nᵀ C(n) θ_n,

and θ_n, the center of Ψ(n), is the WRLS estimate at time n using the weights {λ_i}_{i=1}^{n}; it can be computed recursively using

    C(n) ≜ P⁻¹(n)
    G_n = x_nᵀ P(n−1) x_n
    ε_n = y_n − θ_{n−1}ᵀ x_n
    P(n) = P(n−1) − λ_n P(n−1) x_n x_nᵀ P(n−1) / (1 + λ_n G_n)
    θ_n = θ_{n−1} + λ_n P(n) x_n ε_n
    κ_n = κ_{n−1} + λ_n γ_n − λ_n ε_n² / (1 + λ_n G_n).

The relationship between Ψ(n) and Ω(n) for the two-dimensional case is shown in Fig. 1.1.

Figure 1.1. Relationship between Ψ(n) and Ω(n) for the two-dimensional case.

The SM-WRLS algorithm begins with a large ellipsoid, Ψ(0), which contains all admissible values of the model parameter vector. The objective is to employ the weights λ_n to sequentially minimize the volume of the ellipsoid, defined by

    μ{Ψ(n)} ≜ det{ κ_n C⁻¹(n) }.

It has been shown that the optimal weight λ_n* is the unique positive root of the quadratic equation [14, 38]

    a_n λ_n² + b_n λ_n + c_n = 0    (1.9)

where

    a_n = (m − 1) G_n² γ_n
    b_n = 2m G_n γ_n − κ_{n−1} G_n² − G_n γ_n + ε_n² G_n    (1.10)
    c_n = m γ_n − m ε_n² − κ_{n−1} G_n.

When the roots of equation (1.9) are both negative, the data do not contain useful information and hence updating can be avoided. Since a_n is always positive, if c_n is negative then there exists a unique positive root of (1.9). When c_n ≥ 0, b_n is nonnegative because, according to (1.10),

    b_n = (2m − 1) G_n γ_n + ε_n² G_n − κ_{n−1} G_n²
        ≥ m G_n γ_n + ε_n² G_n − κ_{n−1} G_n²
        = G_n ( m γ_n + ε_n² − κ_{n−1} G_n )
        ≥ G_n ( m γ_n − m ε_n² − κ_{n−1} G_n )
        = G_n c_n ≥ 0.

Therefore neither root is positive, so there is no optimal weight at this time and updating can be avoided. Thus the estimates are updated if and only if c_n < 0; the sign of c_n serves as the indication for updating. A similar result for trace minimization, in which the optimization is applied to

    μ_t{Ψ(n)} ≜ tr{ κ_n C⁻¹(n) },

can be found in [14, 38, 39].

1.5 Motivation for Developing the New Algorithm

Although the SM-WRLS algorithm and related algorithms have been applied in many areas, it is difficult to formally analyze their convergence properties. From simulation results, we have found that with F-H/OBE and SM-WRLS the optimal weight λ_n* might drift to infinity as more data are used. The reason for this anomaly is as follows: the elements of the weighted covariance matrix C(n) in (1.8) may become very large when more data are used. So, relatively speaking, the weight λ_n* in (1.8) must be large enough in order for x_n to make a difference and to effectively shrink the volume of the ellipsoid. To illustrate, we consider an AR(2) system given by

    y_n = 1.6 y_{n−1} − 0.68 y_{n−2} + v_n,    (1.11)

where v_n is a colored noise sequence generated by passing a white sequence through a Butterworth bandpass filter of order 14 with passband edge frequencies at ω = 1.75 rad and 2.75 rad. The passband attenuation is −1 dB and the stopband attenuation is −60 dB. This system is stable since the poles are located at 0.8 ± 0.2j in the z-plane.
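For readers who wish to reproduce the weight-growth behaviour discussed next, the following is a minimal sketch of data generation for the AR(2) example (1.11). It is an illustration only: the random seed is arbitrary, and the simple Butterworth design below approximates, but does not exactly enforce, the ripple and attenuation figures quoted in the text.

import numpy as np
from scipy.signal import butter, lfilter

# Colored excitation: order-14 Butterworth bandpass (butter with N=7 gives a
# 14th-order bandpass) with edges near 1.75 and 2.75 rad/sample.
b, a = butter(7, [1.75 / np.pi, 2.75 / np.pi], btype="bandpass")
rng = np.random.default_rng(2)
v = lfilter(b, a, rng.standard_normal(5000))

# AR(2) model (1.11): y_n = 1.6 y_{n-1} - 0.68 y_{n-2} + v_n
y = np.zeros(5000)
for n in range(2, 5000):
    y[n] = 1.6 * y[n - 1] - 0.68 * y[n - 2] + v[n]

# Pole magnitudes of z^2 - 1.6 z + 0.68: about 0.825 < 1, confirming stability.
print(np.abs(np.roots([1.0, -1.6, 0.68])))

Running any volume-minimizing SM-WRLS implementation on {y_n, x_n} generated this way is the kind of experiment that produces the growing optimal weights reported in Figure 1.2.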
Figure 1.2 shows the gradual growth of the weights as more data are used. In this situation, even if the estimate θ_n approaches the true parameter asymptotically, the large weights still make the algorithm difficult to implement on finite word-length machines. This is just one example in which the optimal weight may diverge.

In order to deal with the convergence issues, Dasgupta and Huang introduced the D-H/OBE algorithm [20] in 1987. In this algorithm, the updated covariance matrix is a convex combination of C(n−1) and the new data outer product,

    C(n) = (1 − λ_n*) C(n−1) + λ_n* x_n x_nᵀ.

The number (1 − λ_n*) is referred to as a "forgetting factor" by Dasgupta and Huang. In a sense, D-H/OBE represents significant progress in OBE theory because it involves a formal convergence proof for the weights λ_n*. However, the optimal weight λ_n* in D-H/OBE is obtained by minimizing κ_n at each n. Since the magnitude of κ_n is not clearly related to the size of the membership sets (see [30, 38, 39]), this makes the D-H/OBE algorithm difficult to interpret geometrically. Therefore, the objectives of this research are:

• To develop a new algorithm that remedies the trade-off between interpretability of the weighting strategy and convergence issues.
• To provide a convergence proof for the new algorithm.
• To perform simulations to study and compare the performance of the new algorithm with that of existing methods.

Figure 1.2. The weights become very large when more data are used.

In the next chapter, the new algorithm, "set-membership stochastic approximation" (SM-SA), is derived for an ARX model with bounded noise. SM-SA is developed in a systematic way to accommodate bounded weights and a bounded covariance matrix. The relationship of SM-SA to the stochastic approximation approach is investigated. If the weights of SM-SA are chosen to be 1/n, then SM-SA is exactly the RLS estimator.

In Chapter 3, the convergence properties of SM-SA are discussed. First we show that the sequence of optimal weights {λ_n*} converges to zero as n approaches infinity. The convergence of the weights to zero is important because it is a good indication that the estimates converge: we pay less attention to noisy observations when λ_n* is close to zero. The main point in that chapter is the convergence of the ellipsoid and of its center, which serves as the estimate of SM-SA. For RLS, the performance of the estimator is highly dependent on the statistics of the noise sequence. In SM-SA we will see that the dependency is much weaker than that of RLS.

Simulation studies are presented in Chapter 4. In light of the theory and analysis established in the previous two chapters, we examine the performance of SM-SA excited by different noise sequences. The UOBE framework is introduced in Chapter 5, where we show that the convergence properties of SM-SA can be applied to the more general weighting policies of UOBE. Chapter 6 lists the conclusions of this research and suggests some possible directions for future study.

Chapter 2
The Development of the SM-SA Algorithm

In the previous chapter, we observed that in SM-WRLS the weights λ_n* may become very large, particularly for large n, leading to huge numbers in computations and numerical instabilities.
In this work, we will show a feature that tends to promote convergence of the ellipsoid by preventing the "drift" of the covariance matrix toward infinity. In this chapter, we start deriving the SM-SA algorithm by modifying SM-WRLS so that the covariance matrix is normalized to the sum of the weights. We then develop the optimality test based on the minimum-volume criterion and examine the similarities between SM-SA and SM-WRLS.

2.1 The Set-Membership Stochastic Approximation Algorithm

We begin with the ARX model given in (1.5) and let the noise bound be the same as in (1.6). For convenience, we rewrite the system here:

    y_n = Σ_{i=1}^{p} f_i y_{n−i} + Σ_{j=0}^{q} g_j u_{n−j} + v_n = θ*ᵀ x_n + v_n.    (2.1)

In an unweighted RLS algorithm, the problem of numerically unstable quantities can be eliminated by normalizing the covariance matrix to the time n, that is, by replacing C(n) by (1/n) C(n). In principle, if the sequence {x_n} represents a stationary stochastic process with appropriate ergodicity properties, then (1/n) C(n) will tend to E[x_n x_nᵀ]. Clearly, however, this strategy may not work for weights determined by SM considerations, since C(n) may grow much faster than n. In the SM-SA approach, SM-WRLS is modified so that the covariance matrix is normalized to the sum of the weights. Let λ_n^{SM} be the optimal weight of SM-WRLS at time n. Define

    Σ_n ≜ Σ_{t=1}^{n} λ_t^{SM} = Σ_{n−1} + λ_n^{SM},    Σ_0 ≜ 0,

and

    R(n) ≜ (1/Σ_n) C(n) = (1/Σ_n) [ C(n−1) + λ_n^{SM} x_n x_nᵀ ]
         = (1/Σ_n) [ Σ_{n−1} R(n−1) + λ_n^{SM} x_n x_nᵀ ]
         = (Σ_{n−1}/Σ_n) R(n−1) + (λ_n^{SM}/Σ_n) x_n x_nᵀ.

If we define the normalized weight λ_n by

    λ_n ≜ λ_n^{SM} / Σ_n,    (2.2)

then

    Σ_{n−1}/Σ_n = 1 − λ_n^{SM}/Σ_n = 1 − λ_n.

Thus the normalized covariance matrix R(n) can be written as

    R(n) = (1 − λ_n) R(n−1) + λ_n x_n x_nᵀ    (2.3)

where λ_n ∈ [0, 1]. Remarkably, this approach, in which R(n) is a convex combination of the past normal matrix and the incoming outer product, results in the same "covariance" matrix and recursive formula as those in D-H/OBE. However, SM-SA has been developed systematically from SM-WRLS expressly to close the gap between the convergence issues and the geometrical meaning in previous work. This philosophy is quite different from the development of D-H/OBE, in which geometrical interpretability is sacrificed.

Substituting λ_n for λ_n^{SM} leads to the modification of the original SM-WRLS recursions. The description of the corresponding ellipsoid is given in the following theorem.

Theorem 2.1 The hyper-ellipsoidal region corresponding to the SM-SA algorithm is given by

    Γ(n) = { θ : (θ − θ_n)ᵀ (R(n)/κ_n) (θ − θ_n) ≤ 1 }    (2.4)

where κ_n is the quantity

    κ_n = Σ_{i=1}^{n} ( ∏_{j=i+1}^{n} (1 − λ_j) ) λ_i (γ_i − y_i²) + θ_nᵀ R(n) θ_n    (2.5)

and θ_n, which is the weighted LSE estimate at time n using the weights {λ_i}, can be computed recursively using the relations

    P(n) ≜ R⁻¹(n)    (2.6)
    G_n = x_nᵀ P(n−1) x_n    (2.7)
    ε_n = y_n − θ_{n−1}ᵀ x_n    (2.8)
    P(n) = (1/(1 − λ_n)) [ P(n−1) − λ_n P(n−1) x_n x_nᵀ P(n−1) / (1 − λ_n + λ_n G_n) ]    (2.9)
    θ_n = θ_{n−1} + λ_n P(n) x_n ε_n    (2.10)
    κ_n = (1 − λ_n) κ_{n−1} + λ_n γ_n − λ_n (1 − λ_n) ε_n² / (1 − λ_n + λ_n G_n).    (2.11)

Proof: From the bounding condition v_n² ≤ γ_n, the accumulated inequality

    Σ_{i=1}^{n} v_i² ≤ Σ_{i=1}^{n} γ_i

holds. That is,

    Σ_{i=1}^{n} (y_i − θ*ᵀ x_i)² ≤ Σ_{i=1}^{n} γ_i.

The relationship remains true if the individual constraints are first multiplied by any nonnegative quantity. Thus, we have

    Σ_{i=1}^{n} ( ∏_{j=i+1}^{n} (1 − λ_j) ) λ_i (y_i − θ*ᵀ x_i)² ≤ Σ_{i=1}^{n} ( ∏_{j=i+1}^{n} (1 − λ_j) ) λ_i γ_i    (2.12)

where ∏_{j=n+1}^{n} (·) ≜ 1. Therefore, given a sequence of weights λ_i, (2.12) represents a global membership set, say Γ(n), to which θ* must belong. Since Ω(n) in (1.7) is the smallest known set, it must be true that Ω(n) ⊂ Γ(n).
Let usfldeinote any general element of I‘(n) by 9. Upon expanding ,r/j ,... the square, we then have that z": ( fl (1 " A1)) Ail/i2 - 22”:( fi (1 - Afl) [\gygaTx, i=1 j=i+1 i=1 j=i+l + E“: ( fl (1 — A19) MOTXiXiro S i( ii (1 — A,)) Am. i=1 j=i+1 i=1 j=i+l The following quadratic inequality in 0 emerges after rearranging the terms : ( fl (1 - A5))":11'2- 291‘: ( f1 (1 - M) Aim-X.- i=1 '=i+1 i=1 j=i+1 i=1 j=i+1 i=1 j=i+1 +9T [2“: ( f1 (1 ’ A12) Amati] 9 S i ( fi (1 - A19) Am (2-13) For the WRLS solution, we have [21] a, = [i( fl (1 — m) 1,11,11,14 [f3( f1 (1 — A,)) Asa-w] . (2.14) i=1 j=i+1 i=1 j=i+l 17 Thus [2": ( f1 (1 - ’21)) Agxpcg] 9n = Z": ( n (1 - A») Aixiyi- I (2.15) i=1 j=i+1 i=1 j=i+l Substituting (2.15) into (2.13), we have f:( fi (1 - 1,)) 1,3,3 — 297' [i( 1"] (1 — A») A,x.-x,T] 0,, i=1 j=i+l i=1 j=i+1 n n n n + 9T 2 H (I — Aj) Agxgx? 9 S 2 H (1 - Aj) Ag‘yg. (2.16) i=1 j=i+1 i=1 j=i+1 For an off-line estimator, the covariance R(n) can be computed as i=1 j=i+l R(n) = i ( fi (1 - Afl) A;X;X,T. Thus i=1 j=i+1 i=1 '=i+1 in: ( fi (1 — A») My? — 29111009" + 9TR(")9 S i ( fi (1 - Ad) Am. Since R(n) is symmetric, this can be written as n 2”: ( fl (1 " Ail) A£313,449"9n)TR(n)(9"'97:)“911100911 S in: ( H (1 — *0) Am- i=1 j=i+1 i=1 j=i+1 (2.17) Since 14,, is given by Na = i( fl (1 - A1)) A{(71 — y?) + 95110091" i=1 j=i+1 (2.17) is reduced to (a — 9,)TR(n)(a — a,,) g x,,, and (2.4) and (2.5) are established. 18 The matrix inversion lemma [16]: (A + BCD)“ = A-1 — 11-13(0-1 + DA-lBrlDA"1 in which A and C are nonsingular matrices, and B and D are matrices with appropriate dimensions, can be used in (2.3) to obtain (1 -— A,,)“1P(n — 1)x,,x£(1 - A,,)‘1P(n — 1). P = 1—A,,‘1P —1— 2.18 (n) ( ) (n ) A;1 + x£(1 — A,,)‘1P(n — 1)x,, ( ) Replacing x£P(n — 1)x,, by C(n) in (2.18) yields 1 _ (1 — A,,)‘1P(n — 1)x,,xTP(n — 1) P = P - 1 — n (n) 1 — A,, _ (n ) A? + (1 - A,,)-1G,, r __ T _ = 1 P(n— 1) _ A,,P(n 1)x,,x,,P(n 1) . 1—A,,_ l—A,,+A,,G,, For the WRLS estimate G,,, we have a. = Pm [)3 ( H (1 — A») Ann-ye] i=1 j=i+1 _ 1 ' A,,P(n — 1)x,,x£P(n - 1)“ " ' n , _ 14” P(n—1)— 1—A..+A..G.. Z; _1'1(1—1,) A,x,y, . . i— J—i+1 _ 1 ' A,,P(n — 1)x,x£P(n — 1)“ " 1—.\,. _P("‘1)‘ 1—A,,+A,,G,, J n—1 n-1 ' (1 - A102 H (1 - *1) MHz/1+ Anxnyn i=1 j=i+1 A,,P(n - 1)x,,xT0,,_1 1 A2P(n — 1)x,,G,,y,, = G,,- - n — n - n n - n 1 1—A,,+A,,G,, +1—A,[’\ P(“ my 1—A,,+A,,G,, _ 0 _ A.P(n — 1)x.x£0.-1 + A.P(n - 1)x.y. (1 1.6.. ) ' "‘1 1—.\,,+.\,G,, 1—.\,. 1—A,,+A,,G,, __ 0 _ AnP(n - l)xnx£0n_1 A,,P(n — 1)x,,y,, ‘ "‘1 1 — A,, + A,,Gn 1 — A,, + 1,0,, = 911-1 + A,,P(n — 1)x,,£,, 1-A,,+A,,G,,' 19 Note that 1 A,,P(n — 1)x,,G,, P(n)x,, _ 1 __ 1,, [P(n— 1)x,, — 1 __ 1,, + (“an _ P(n — 1)x,, (1 _ A,,Gn ) — 1—.\,, 1-A,,+A,,G,, _ P(n — 1)x,, - " 1 - A,, + A,,Gn Therefore 0,, = 0....1 + A,,P(n)x,,e,,. To show (2.11), consider Kn Thus the proof is complete. n-1 n-1 (1 — An) 2 ( H (1 "' Ail) )‘i(7i ’ Ila?) + An(7n — 31721) i=1 '=i+1 +(93i—1 + AanP(n)6n)R(n)(0n_1 + AnP(n)Xn€n) n-1 n-1 (1 - A») Z ( H (1 - 15)) 11(71- 1!?) + Am. - 113.) J=i+1 +0£_1R(n)0n_1 + 2A,,x,7,' 0,,_1 5,, + Aix£P(n)x,,ef, n—1 n—1 (1 - An) 2 ( H (1 " A») A471“ 3112) + An(7n - 11:) + 31193—139167: i=1 j=i+l i=1 ' P in — 1 x,,6,2, +(1 — An)0£-1R(n - 1)0'1-1 + An0£_lxnx£0n-l + Aile —( A" +)’\nGn (1 — An)Kn—1 + 1.17. — 143.) + mien-1)” + 21.19? 92-15" + "i 1 — 13:53.0. (1 — A,,)nn_1 + Arm. — A. [112 - 2(XZOn-1)yn + (’91.' 
911-02] + A311 _ AffanGn (1 — An)"n—1 + A,,), - 1,5,2, + 13.1 _ Affinan (1 - Anya-1 + A1177; — 1A:(:n—+’\;:§n ' frf‘ 20 2.2 Initial Values The covariance matrix in SM-SA is i=1 j=i+1 Rm= :(fiu-n)ms In order to start the recursive algorithm, it is necessary to choose initial values for 0,, and P(n). If we incorporate the initial condition into the covariance matrix, then it will become R(n): 1210—1.) no+2nj(1'[ (1—1,-))1.T.-x,-x. (2.19) i=1 i=1 j=i+1 Note that [H?__.1(1 — A,-)] R0 will not be zero if R0 is chosen to be positive definitive. Generally speaking, P(O) = R61 reflects the confidence in the initial estimates 00. If E- .. P(O) is large, from (2.10) we see that the next estimates, 0,, will jump away from its initial value. Without any a prior information about 00, it is common to take 00 = 0 and P(O) = 0'1. where a is a “large” number, typically 106 so that the impact of the choices of initial values is small. From (2.19) we can also see that this choice will ensure the estimates to be close to the off-line LSE estimate. Note that given any initial condition P(O), if n “11:30 11(1 — 1,) = o (2.20) i=1 then the recursive estimates will approach to the off-line estimates asymptotically. Using material in Appendix A, (2.20) implies that iA;=oo n=1 From theory of stochastic approximation, as discussed in Chapter 1, this is an essential 21 condition for the convergence of estimates to the true parameters. 2.3 The Optimal Weight The optimal weight A; of SM-SA is found by using the minimization criterion 1 introduced in Chapter 1. That is, A; minimizes 1108101)} 3 det{x,,R'l(n)}. Let B(n) é x,,R'1(n). Then by definition, B(n) = x,,P(n) x,, A,,x,,x£B(n — 1) ' B(n ' 1)(1 — 1,.)1r;,,_l ’ n,,-1(l — 1.. + AnGn) ' Define h,, 2 1 — 1,, + 1,,G,,, (2.21) and A 1 m. An‘rn Ans“ dn = _ _ 1 + — —-"—.. 2.22 I — A,, x,,_1 (1 - A11)1“1ri-1 "n-l hn ( ) Then B(n): B(n -- 1) {d,, I — d fl[B(A"x--——h—:n — 1)x,,]T} ,,_ and d ,,A,,x,, det B(n) = det B(n — 1)det {d,,I — Kn ——[B(n - 1)anT} Since B(n — 1) is a known quantity at time n, it suffices to minimize the volume ratio V(A,,) given by A det B(n) det B(n— 1). V(1,,)= Thus d ,,A,,x,, V(A,,) = det {11,1 — n. —’:;[B(n — 1)x,,]T}. (2.23) 22 Using the matrix identity [26] det(cI + sz) = c'"“‘(c + yTz) (2.24) and letting d A x 4 c=d,,, =—-1—n—n-, z=B n—l x,,, y 81-1 h.. ( ) then (2.23) becomes V(’\n) = dill-l (dn Kn—lhn ( A,,xznfl-1P(n - 1)x,,) 1 _ Kn-lhn (Kn—1hr: - AnKn-IGn) Kn-lhn (11,, - A,,G,,) h d,,A,,x,7,'B(n -— 1)x,,) u 11”: ll :1”: a“; a”? To minimize the volume ratio with respect to A,,, we must differentiate (2.25) and set the result to zero: 0 _ 6 _, a 1. m.) — K [m1 - 1.1». ] dart-I = Fug—ha (2.26) where 0d,, P(An) = (1 — A,,)mhnai— - (1,107,. (2.27) Since 6d,, _ 1 x,, 1 0 x,, a—A: — (I — A,,)2 n,,_1 + (I - A") 6A,, n,,_1 (228) 1 1 7,. _ 53,[1 — 1,, — 1,11,] —d.. + —— -1 + , (1 '- An) (1 '- An.) Kit-I hi’cn—l 23 (2.27) becomes m1.) = mh,, [d,, — 1 + “:1 — 531“ 22;":1*"h")] — 1.0... (2.29) The Optimal weight A; is found by setting F (A,,) = 0. Thus mh,, [d,, — 1 + “:1 - €:(1—113:\:..:1A;hn)] = d,,Gn. (2.30) Expanding the h,, and d,,, we have a quadratic equation a,,13, + (1,1,, + c, = 0 (2.31) where an = mm. - "163, + mem. — 2mGn7n - 161—10.. + ten—1G,”, + Gn7n - 0.2.7.1 - 53,0, b,, = 2me; — 2m7,, + 2mG,,7,, + 2n,,_1G,, — n,,_lG; — G,,7,, + 6:0,, 2 a,, = m7,, — me; — x,,_1G,,. This SM technique is called set-membership stochastic approximation (SM-SA) because of the resemblance to the stochastic approximation method. 
The central difference between D-H/OBE and SM-SA is that the A; in their algorithm is obtained by minimizing x,,, while in SM-SA.it is obtained by minimizing the volume ratio. Indeed, there is an ellipsoid associated with Dasgupta-Huang’s OBE at each step, but the meaningful measures of its size are ignored in the optimization process — sacrificing interpretability. 2.4 Data Selection In this section, we will derive the criterion for updating. From (2.25) and the definitions of h,, and d,, in (2.21) and (2.22), it is easy to see that V(O) = 1 and V(A,,) > 0 for all 0 < A,, < 1. Note that d,, and h,, are unity when A; = 0. Thus from (2.26) and (2.29), we 24 have 6V(’\n) = 2.32 0A7; fl:0 F(O) ( ) 2 ___ m( “In _ En ) _ G,, (233) Kn-l 511—1 6,, ‘ = ~14 (2.34) where 1c,,_1 > 0 by definition (2.4) for A,,-1 6 [0, 1). Equation (2.34) reveals some interesting relationships between the sign of c,, and the volume ratio, because if a,, < 0, there exists some A,, such that the volume ratio is less than unity, which means that the volume shrinks. Before we develop the main theorem on this point, we need some preliminary lemmas: Lemma 2.1 If G,, < 1, then h,, > 0 for all A,, E (11%;) and lim h,, = 0+ A,,—9%;- IfG,, > 1, then h,, > 0 for all A,, 6 (1710.71 1) and lim h,, = 0" 1 + An" 1-G,, where h,, is defined in (2.21). Proof: By definition, h,, = 1—1,,+A,,G,, = 1 — A,,(l — G,,). /—\. If/an) 1, then 1/(1 — G,,) > 1. So the interval (1,1/(1 — G,,)) is not empty. For all AJE (1, 1/(1 — G,,)), we have 1 1 0 and lim h,, = 0+. .__1_- ”"T1—Gn If G,, > 1, then 1/(1 -— G,,) < 0. Therefore the interval (1/(1 — G,,),O) is not empty and for all A,, E (1/(1 — G,,),O), we have 0 when A,, 6 (1/(1 - G,,),O) and lim + h,, = 0"". I burr-:6: Lemma 2.2 lim V(A,,) = +00 Anfll- Proof: From (2.25), we have V(A,,) = “39%. Since lim1,,_.1- h,, 9t 0, it suffices to show that lim 11;"(1 — A,,) = +00. Angl- 26 From the definition of d,,, «1:10— A.) = (1),")... (5:1)...“ - 1.) _ 1 ( x,,, )m — (I — A,,)m-l Kn_1 . The positiveness of x,, can be immediately seen from (2.4). Thus at any time n, x,,/x,,_1 > em__ fifl._._—r—----__ __ _ _. A- m“! - -—--" I~ 0' SO ”(I — An) approavChes +m as An approaches 1-. (I, *’ (xi, ‘I fr; he)" "f‘if'.’ . ' [3 Theorem 2.2 If c,, 2 0, then there is no extremum of V(A,,) in the open interval (0, 1). Proof: From (2.34) we know that aV(/\n) — > 01,. 1..=o - 0 when c,, 2 0. First we consider the case when c,, > 0, for which there are three possible sub-cases: 1. Suppose G,, < 1. Then there are two asymptotes at A,, = 1 and A,, = '1-_IG',I > 1. Claim 2.1 If c,, > 0 and G,, < 1, then A,,—91+ lim dn = —m. 1 _ burr-7:: Proof of Claim 2.1: From (2.22), A 1 x,, A,,7,, A,,e2 d = — = 1 + _ n n 1 -' An Kn—l (1 - An)"n--1 Kn—lhn Note that since lim h,, > 0, 0"] 27 and since x,,_1 > 0 at time n, the second term in the right hand side will dominate in the limit. So lim d“ = -w. Avg-91+ Similar arguments also apply when A,, approaches 1 / (1 — G,,)', because in this con- dition, 1 — A,, d roach zero, but h,, approaches 0” from Chi/1:131. So the last term dominates and 4 ///.’/’ 5k, d,, = --00. lim 1,.—+,—_15;" Cl Note that for all A,, 6 (1, 1 / (1 — G,,)), the sign of 1 — A,, and h,, remain unchanged, so from (2.25) and Claim 2.1, the value of V(A,,) near the two asymptotes (A,, = 1 and A,, = 1/ (1 - G,,)) will both approach +00 or —oo, depending on whether m is even or odd. 
Note that in Lemma 2.2 we have shown that V(A,,) approaches +00 when A; approaches 1‘, so minima can not exist in (0, 1) because otherwise more than two extrema will exist. Therefore there are only two possibilities in this case, as illustrated in Fig. 2.1. V(An) V(An) ‘ 1 1/ \+ ‘| U + Figure 2.1. Plots of V(A,,) when c,, > 0 and G,, < 1. 28 2. Suppose G',, = 1. Then there is only one asymptote at A,, = 1. Note that in this condition, a,, = -me,2, — 6,2, < 0. So c,,/a,, < 0. This means that the extrema have different signs. If there exists one extremum in the region of (0, 1), then there must be another extremum in (0, 1) since lim,\n_,1_ V(A,,) = +00. This is contrary to the fact that the roots’ signs must be different. Therefore we conclude that there is no extremum in (0, 1). 3. Suppose G,, > 1. Then there are two asymptotes at A,, = 1 and A,, = ,—_1G—"- < 0. We claim that lim + V(A,,) = +00. (2.35) An‘-’ l—Gn For convenience, we rewrite (2.25) as follows V(A,,) = 172$. Note that lirnA , + h,, = 0+ from Lemma 2.1, 1 - A,, > 0, and fl_’"Tn 1— 2 d,, = 1 x,, = 1 + A,,7,, A,,(sfl 1 — ’\n Kn—l (1 " An)Kn-1 — Kn—lhn —) +00. (2.36) Equation (2.36) holds because h,, -> 0+ and A,, is negative. Now (2.35) is satisfied which implies that there must be one local minimum in the interval (1 / (1 - G,,), 0). If there is one extremum in the interval of (0, 1), then there must be another extremum since lim (v.1- V(A,,) = +00. This situation can be seen in Fig. 2.2, which shows the existence of more than two roots in (2.31)—contradiction. Thus we conclude that there is no extremum for A,, E (0, 1). Suppose c,, = 0. Then and am.) 0 0A,, “:0 . We.) 11/ W =0”- 29 V(A,,) 1 1/-\./ :15; I A" Figure 2. 2. When G,, > 1 and c,, > 0, if there exists one extremum of V(A,,) 1n (0,1), then there exist more than two extrema. . This means that V(A,,) is convex in the neighborhood of A,, = 0 and there is one minimum when A,, = 0. If there exists some extrema in (0,1), there are at least three extrema which violates (2. 31) since the number of extrema must beceyen.) So we ~conclude that there 18 no extremum in (0, 1), and if there exist some A,, in (0,1) suchlthht (2.31) holds, it must be a saddle point. The only possibilities with and without saddle points can be seen in Fig. 2.3. C] V(A,,) V(An) (I) (b) Figure 2.3. When c,, = 0, the the only two possibilities for V(A,,) in (0,1) are: (a) There is a saddle point, or (b) no extrema and saddle points exist. 30 Theorem 2.3 Suppose c,, < 0. Then V(A,,) has a unique minimum in the open interval (0, 1). Proof: When c,, < 0, we have 3V(/\n) 0. 6A” A,,:O < By construction of the algorithm, there are two A,,’s such that (ax/(1,.) ___ 6A,, 0. Also, from Lemma 2.2, lirn V(A,,) = +00, A,,-91‘ so there is at least one minimum in (0, 1). If there exist more than one extremum in (0, 1), then there must be at least three extrema as shown in Fig. 2.4, contrary to (2.31). Therefore, there must be exactly one minimum in the interval [0, 1). Cl V(An) A \ Figure 2.4. When c,, < 0, if more than one extremum appear in (0, 1), then there must be at least three extrema. 31 From Theorems 2.2 and 2.3, we conclude that there exists an optimal weight if and only if c,, < 0. Theorem 2.4 There are always two real roots of the quadratic equation (2.81). Further- more, if c,, < 0, then there is a unique root A; E (0, 1) with A" _ —b,, + Vb; — 4a,,c,, 2a,, (2.37) Theorem 2.4 implies that if c,, < 0, then the optimal weight A;, which belongs to (0, 1), can be obtained from (2.37). 
Proof: First we will show that the real roots of (2.37) exist. Let a,,, b,,, and c,, be coefficients of (2.31). For convenience, we rewrite them: a,, = m7,, — me; + mG;7,, - 2mG,,7,, - 53-10,, + x,,-lG; + G,,7,, - 0,2,7” — 5:0,, 0- a l 2m5; — 2m7,, + 2mGn7,, + 2n,,_1G,, — x,,_1G; — G,,7,, + 5;G,, _ 2 Define 5,, and b,, as: "D (m — 1)G;7,. (2-38) @I IID G,,(2m7,, — x,,-lGn — 7,, + 5;). (2.39) Then and 32 Note that the expression of 6,, and b,, are exactly the same as the coefficients of the quadratic equation in (1.9). Let the solutions of (2.31) be A; and A3,. Then A1 _ _bn + Vb; — 4a,,c,, ” 2a,, A2 —b,, — Mb“! — 4a,,c,, 2a,, Note that b; - 4a,,c,, = (—2c,, + b,,)2 — 4c,,(c,, — b,, + a,,) = 4c; — 4c,,b,, + '13,”, — 4c; + 4c,,b,, — 4c,,E,, = b,, - 4E,,c,, = Gib: + 463,1712711 — 2GnKn-1‘Yn - 253.711 + 03.03-; " 26iG'nKn-l '1' 5:) = G3.(G?.K3.-1 + 73. + 63. - 2G,,n,,_17,, + 253,1, - 2€§Gn~n-1) +46im’7nG3. - 403,535), = G;(G,,n,,_1 - 7,, — 5,2,)2 + 4£;G;m27,, — 46;G;7,, = G;(G,,K.,,_1 — 7,, — 5;)2 + 415,2,G27,,(m2 — 1) 2 0. n Thus two real roots always exist. To prove that A; = A;, consider the following four cases: 1. Suppose a,, < 0 and b,, < 0. Then «b,,; — 4a,,c,, < [b,,] since a,,c,, > 0. So —b,, :l: «b,,! — 4a,,c,, > 0. But since a,, < 0, both solutions are negative. This violates the existence of positive A; when c,, < 0. Therefore we can rule out this possibility. 2. Suppose a,, > 0 and b,, > 0. Then A; < 0. The only possibility to obtain a positive root is to choose A; = A3,. 3. Suppose a,, > 0 and b,, < 0. Then «b,,; — 4a,,c,, > |b,,| since a,,c,, < 0. So A; < 0 and A; > 0. To obtain a positive root, we must have A; = A3,. 33 4. Suppose a,, < 0 and b,, > 0. Then both roots are positive when c,, < 0. Since a,,c,, > 0, Mb; — 4a,,c,, < |b,,| and hence A; < A3,. To obtain the unique root in (0, 1), we must have A; = A;,. We conclude that A; = A; when c,, < 0. CI 2.5 Relations Between SM-SA and SM-WRLS As shown in Theorems 2.2 and 2.3, the ellipsoid will shrink if and only if c,, in equation (2.31) is negative. If we compare c,, in (2.31) and c,, in (1.9), we find that they are exactly the same quantity. This fact led us to explore whether relationships exist between SM-SA and SM-WRLS. The following simulation shows some apparent equivalencies between these two algo- rithms. Consider the system yn = 1-6yn—1 "’ 0°68yn—2 + ”n (2.40) where {v,,} is a sequence of pseudo-random numbers uniformly distributed in [—0.5,0.5]. A set of 120,000 data points is generated using the model. Table 2.1 and Table 2.2 are the results of the simulations for SM-SA and SM-WRLS using the system in (2.40). It can be seen that both algorithms update the parameters at the same time and result in the same “volume measure” and estimates. These empirical results, and others due to Krunz et al. [33, 34], led Deller et al. [38, 39] to the discovery that the volume measure defined by det B(n), the data points selected, and the parameter estimates 0(n) is independent of the choice of weighting sequences when UOBE is considered. Thus the relationship between SM-SA and F-H/OBE is expected to be the same as that between SM-SA and SM-WRLS. This implies that the present research on SM-SA has much broader implications for the entire class of UOBE algorithms. Details will be discussed in Chapter 5. On the other hand, SM-SA has some features which are not necessarily general. 
The 34 simulation results also reveal that the average weights satisfy Robbins-Monro’s require- ments in (1.3) under the condition that the noise input is a pseudo-random number with uniform distribution in [Vifl. Figure 2.5 shows the relationship between the time and the “averaged optimal weight” from six experiments. The vertical and horizontal axes are logarithms of weight and time, respectively. The two dotted lines are logarithms of 1/n and Mn“. The curve of the averaged weights is very close to the curve of 1 / n and hence the estimates could approach to the true parameter. Therefore, experimental results show that SM-SA has a great potential in theoretical development and its theoretical analysis can help explain some features in SM-WRLS which cannot be obtained directly from the original algorithm. 10‘ I W I V‘TV‘T I V I UU'FIU I I V I'UIU § a...~‘ ". ‘. P 't ~§ — .. ~~ .. \- ' Q '0. ‘.~ ._' .. . \‘. § lo-l - 102 - 103 ~ 105 '- --..,,_..'. q Lljlllll l 1 .Llllle l I lllll 106 . , 10° 101 102 103 Figure 2.5. The relationship between logarithms of time and weight 35 SM-WRLS SM-SA time volume measure 5,, time volume measure 5,, 1 1588030880 0.045102 1 6250000000 0.045102 2 61090735 0.325878 2 122037760 0.325878 3 75912.859375 -1.841745 3 93237.984375 -1.853359 4 67.021507 6.571213 4 66.320511 8.059097 5 19.407402 0.208971 5 19.290403 0.212660 6 17.094551 -0.261806 6 17.005600 -0.262197 7 6.657953 -0.219103 7 6.645070 -0.222424 8 2.566225 0.210988 8 2.566096 0.208843 9 0.861677 -0.235759 9 0.861815 -0.236259 10 0.399760 -0.076875 10 0.399806 -0.076872 11 0.191688 -0.458952 11 0.191711 -0.458865 12 0.191604 0.157472 12 0.191627 0.157516 13 0.169054 -0.062481 13 0.169072 -0.062429 17 0.168957 -0.355722 17 0.168976 -0.355732 18 0.130553 -0.420661 18 0.130561 -0.420695 19 0.102843 0.197666 19 0.102846 0.197650 20 0.089835 0.242533 20 0.089836 0.242511 21 0.051043 -0.607100 21 0.051041 -0.607123 23 0.044455 -0.433844 23 0.044454 -0.433845 41 0.044317 -0.433763 41 0.044316 -0.433759 55 0.043492 0.340501 55 0.043491 0.340501 58 0.043200 0.330139 58 0.043199 0.330139 115034 0.000072 -0.498830 115034 0.000072 -0.498830 115258 0.000072 -0.500641 115258 0.000072 -0.500641 115280 0.000072 -0.496111 115280 0.000072 -0.496111 115463 0.000072 -0.491712 115463 0.000072 -0.49l711 116302 0.000072 -0.483166 116302 0.000072 -0.483166 116350 0.000071 -0.499281 116350 0.000071 -0.499281 116643 0.000070 -0.503710 116643 0.000070 -0.503710 116738 0.000070 -0.501659 116738 0.000070 -0.501659 116934 0.000070 -0.500541 116934 0.000070 -0.500541 117060 0.000069 -0.499384 117060 0.000069 -0.499384 117354 0.000069 -0.491482 117354 0.000069 -0.491482 117706 0.000069 -0.502020 117706 0.000069 -0.502020 118516 0.000068 -0.503093 118516 0.000068 -0.503093 118709 0.000068 -0.498520 118709 0.000068 -0.498520 Table 2.1. The data-selection and the performance of SM-WRLS and SM-SA for the AR model (2.50). 36 SM-WRLS SM-SA time 01 “2 time 01 02 . 010-03101-1 115034 115258 115280 115463 116302 116350 116643 116738 116934 117060 117354 117706 118516 118709 0.000000 2.659994 2.659676 2.504596 2.485668 1.602177 1.600430 1.601612 1.601638 1-596280 1.594492 1.591593 1.602305 1.604294 1.602866 1.603187 '1.597315 1.600800 1.605809 0.000000 0.000000 2.490193 . 
-1.994164 -1.956294 -0.678578 -0.676362 -0.677525 -0.677555 -0.673320 -0.670988 -0.668343 -0.678133 —0.680194 -0.678490 -0.678821 -0.678648 -0.681055 -0.682360 .01000101-1 115034 115258 115280 115463 116302 116350 116643 116738 116934 117060 117354 117706 118516 118709 0.000000 2.662042 2.661741 2.505078 2.485804 1.602176 1.600430 1.601612 1.601639 1.596280 1.594492 1.591593 1.602305 1.604294 1.602866 1.603187 1.597315 1.600799 1.605808 0.000000 0.000000 -2.496096 -1.995054 - 1 .956493 -0.678578 -0.676362 -0.677525 -0.677555 -0.673320 -0.670988 -0.668343 -0.678133 -0.680194 -0.678490 -0.678821 -0.678648 -0.681054 -0.682360 Table 2.2. Estimates of SM-WRLS and SM-SA for the AR model (2.50) where a, = 1.6 and a; = —0.68. Chapter 3 Convergence Issues In the previous chapter, we have developed the SM-SA algorithm with a selective updating scheme and non-increasing volume. In this chapter, the convergence properties of this algorithm will be discussed. However, for all the estimation problems, the parameters in model (2.1) can not be determined unless some conditions are imposed on the input signal. A commonly assumed condition is persistency of excitation (PE)‘in which the input signal is sufficiently rich in frequency components. PE has been defined in a variety of ways [15, 17, 19, 20, 21, 22]. The following definition has been used extensively in adaptive control and system identification. Definition 3.1 A signal u(t) is said to be persistently exciting (PE) of order n if for all t there is an integer N such that t+N 911 < Z w(thTUC) < 021 (3-1) h=t+1 where 91, 92 > 0 and the vector w(t) is given by 90(1) = [u(1 - 1) - - °u(t - ”HT 'Please read “PE” as “persistency of excitation” or “persistently exciting” as appropriate. 37 38 The underlying meanings of the PE condition have been discussed extensively in the literature [6, 15, 17, 10]. From a frequency-domain point of view, a stationary signal u(t) is persistently exciting of order n if its frequency spectrum is non-zero at n points (or more). That is, its frequency components have to be rich enough so as to drive the estimates toward the true parameters. According to this definition, white noise is persistently exciting of all orders. Research in SM-theory, however, is concerned with PE of x,,, the regression vector. Anderson and Johnson [15], and Bitmead [17] proposed that the PE of the input signal implies the PE of x,, under some mild restrictions on the system function. The major result is given in the following theorem: Theorem 3.1 [15, 17] Consider a linear system with transfer function H (z) = B(z)/A(z) where A(z) = z” — alzr".1 — ~ - - — apz”'1’ and 8(2) = boz" + 512’"1 + - - . + qu”“1, which are co-prime, and with input sequence {uk} and bounded output sequence {yk}. Then with .f/ l“\~/ x; = [y,,_1, . . .,y,,_,,,u,,,u,,_1, . . .,u,,_q], PE of {a,,} implies PE of {x,,}. Non-coprimeness of A(z) and 8(2) usually renders the parameters non-identifiable. This discussion, however, is far beyond the scope of this research. For LSE estimates of the parameters of a finite impulse response model with n param- eters, the estimate is consistent and the variance of the estimate goes to zero if the input signal is persistently exciting of order n. 
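As an aside before returning to the D-H/OBE argument, the windowed excitation condition can be checked numerically. The sketch below is illustrative only (the window length, the thresholds, and the regressor construction are assumptions, not quantities from the text): it tests whether the eigenvalues of each windowed sum of regressor outer products stay between fixed bounds, in the spirit of (3.1).

import numpy as np

def persistently_exciting(phi, N, rho1, rho2):
    # phi: sequence of regressor vectors; test rho1*I < sum_{k=t+1}^{t+N} phi_k phi_k^T < rho2*I
    # for every start time t, via the extreme eigenvalues of the symmetric windowed sum.
    phi = np.asarray(phi)
    for t in range(len(phi) - N):
        S = sum(np.outer(p, p) for p in phi[t + 1:t + N + 1])
        eig = np.linalg.eigvalsh(S)
        if eig[0] <= rho1 or eig[-1] >= rho2:
            return False
    return True

rng = np.random.default_rng(3)
u = rng.standard_normal(400)                              # white input, PE of all orders
phi = np.stack([u[2:], u[1:-1], u[:-2]], axis=1)          # three lagged samples per regressor
print(persistently_exciting(phi, N=30, rho1=0.5, rho2=300.0))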
In order to deal with the convergence issues in D-H/OBE, Dasgupta and Huang claimed that if for all t, t+N 91I < Z xkxf < 921 (3.2) k=t+l for some N 6 N and 91, 92 > 0, then the covariance matrix C(n) = (1 - A;)C(n — 1) + 1:91.11; 39 in D-H/OBE, which has the same form as R(n) in SM-SA, is nonsingular. From Theorem 3.1, if the system function is co-prime, then the statement in (3.2) is validated. However, examining the proof of their argument, we find that the condition (3.2) is not sufficient because of the time index mismatch. In (3.2), It varies over all the natural numbers between (t + 1) to (t + N). In their proof, however, the time index varies only over those instances at which updates take place. Consequently, we require a more strict definition for PE which would accommodate non-uniform time indexing. Given that x,, is PE, the question of whether the subsequence of {x,,} in which the updates take place is PE or not is still an open issue and is difficult to prove formally. However, this problem can be simplified if we impose the mixing condition to the noise sequence. The mixing condition will be introduced in Section 3.4. For the moment, we assume that the subsequence of {x,,} used in updates is PE. For SM-SA with ARX model, we give the following theorem: Theorem 3.2 Given the update equation: R(n) = (1 — A,,)R(n — 1) + 1,,x,x£, 11(0) = no > 0 (3.3) the eigenvalues of R(n), for all n, will remain bounded if the signal energy is bounded, i.e., there exists an L 6 R such that xxx“ 5 L < 00 for all n, and 0 S A,, 5 1. Let {t,,} be a subsequence such that c,,, < 0 Vn. If there exists an N 6 N and 91, 92 > 0 such that for all ’12 I] n+N 911 < E x,,xg; < ml, (3.4) .1 k=n+1 / then R(n) is nonsingular. Proof: Use induction. Note that from the above recursion, R(n) Z 0 for all 71. So it suffices to show that trace(R(n)) < 00. v , ,J ~~ “111/ 40 l. n = 1 R(1)= (I — A1)R(0) + Alxnx; tr[R(1)] = (I - A1)tr[Ro] + A1t1‘[X1X¥‘] = (1 — A1)t1'[R.o] + A1X¥X1 S max[tr(Ro), L]. 2. Let tr[R(n)] S max[tr(Ro), L] < 00. We have to show that tr[R(n + 1)] S max[tr(Ro), L] < 00. Since R(n + 1) = (1 " A,,+1)R(n) + A,,+1x,,+1x;+l, tr[R(n + 1)] (1 — Art-H )trlR(n)l + An+lxz+lxn+l IA multr(R(n)),x£..x..0 IA InnaLXItr(R(")), L] IA max[max(tr(Ro), L), L] max[tr(Ro), L]. Therefore, the sum of all eigenvalues is bounded for all 71. So the eigenvalues have to remain bounded. The non-singularity of R(n) under the assumptions in (3.4) that the subsequence of {x,,} in which updates take place is PE has been proven in the paper by Dasgupta and Huang [20]. EJ 41 Theorem 3.2 ensures that the covariance matrix R(n) in (3.3) is always bounded and the boundedness property holds for all the noise processes. Although boundedness of R(n) is sufficient for proofs of some fundamental properties, positive definiteness of R(n) is essential in order to prove the convergence of parameter sets as discussed in the next few sections. 3.1 Fundamental Convergence Properties of SM-SA The most important feature of SM-SA, which differs from previous work, is the convergence of weighting sequence {A;}. In this section, the fundamental convergence properties of SM-SA are analyzed. [ N6] NHL-20") 'lC C(n/(761” hit 6] $5.10]; (1.; Theorem 3.3 Let (Rm, I] ' ll) be a normed space. Given the R(n) in (3.3) and the update equation (2.7)-(2.11). Let A; be defined as “bn+vb?;-4¢ncn 1.an < 0 A. 
20a ’ , 0, otherwise If infinitely many data points are selected, then the SM-SA algorithm developed in Theorem 2.1 converges in the sense that "11.120 A; = O (3.5) "11.113063, 6 [017nl- (3'6) Furthermore, if R(n) is positive definite, then "11.1130 “0,, — 0,,_.1|| = 0. (3.7) If there exists positive real numbers 91 and 92 and integer N1 such that for all k f. \ k+~1 glI < z x,,x; < ml (3.8) n=lc+l 42 then [31] a! _ t 2 4 n+N1 lim 0 — 0 < _ lim 1' 3.9 n—ooo ll nll2 — 91 "“00 £3.24 7 ( ) The proof of this theorem is very significant because, as a member of UOBE [39], SM-SA sets the stage for a formal proof of the converging behavior of this family. Before we prove this theorem, we need the following lemma: Lemma 3.1 Consider the coefi‘icients of the quadratic equation (2.31) and the conditions stated in Theorem 3.3. Let c; and c; be defined by c,,, i c < 0 c; = f " . (3.10) 0, otherwise 0: = cm ifcn > 0 0, otherwise Then lim,,_.°° c; = 0 if and only if lim,,_,oo A; = 0. Proof of Lemma 3.1: From Theorem 2.4 and the definition of A; in Theorem 3.3, if c,, < 0, then A" = —b,, + Vb; — 4a,,c,, n 2afl Let 'd',, and b,, be the same as in (2.38) and (2.39), then from the proof of theorem 2.4, we have —2 — b; — 4a,,c,, = b,, — 4a,,c,, If c,, < 0, then A; becomes ,_ 2c; - b,, + Vb; — 4(m — 1)G3,c,77 ‘ A = 2a C (3.11) Ti. Suppose lim,,_.°° c; = 0. If lim,,_.°o a,, 75 0, then from (3.11) we know that lim,,_.°o A; = 0. Iflim,,_.°° a,, = 0, then we claim that lim,,_.oo b,, 76 0 because from the definitions of 5,, and 43 b,,, we can write an+bn=_cn+6n and for c,, < 0, we have .1919 + 3,.) = .132. a» + .22. bn = lim 0,, n—ooo = lim (—a; +6“) 71-600 = lim 6,, fl—‘W = (m - 1)Gn7fl> That is, lim,,_.oo b,, > 0. Thus from (2.31), lim 1;; = — lim 91 = 0 n—ooo n-ooo bn Suppose lim,,_.oo A; = 0, it follows from (3.11) that when c,, < 0, then in the limit 2c; — b,, = —\/b; — 4(m — 1)G3,c;7n Thus / “11.00 'm;[4c — 4a,", b,, + b 3,] = "lim 0°[b; - 4(m— 1)G;c;'7,] and m[4c;" — 4c;b,, ] = “lim —4(m — 1)G,2,c,, 7] fl-Ougfi Suppose lim,,_.°° c; at 0. Then lim c,, =-nle [(m—1)G,, 7+b,,.] fl-‘m 44 The numerator of equation (3.11) becomes 2c; — b,, + «lb—,2, — 4(m — 1)G3,7flb,, + 4(m — 1)2G;7: = 2c; - b,, + \/(b,, - 2(m — 1)G,2,7,,)2 (3.12) Comparing (3.12) and (3.11), we have in the limit ,flb‘, — 2(m — 1)G§,7)2 = \/53, — 4(m — 00.2.9.7, (3.13) Note that in left hand side of (3.13), \/(b,, -— 2(m — 1)G3,7)2 < [b,,]; while -in right hand side of (3.13), \/b,2, — 4(m — 1)G,’,c,,7n > |b,,| since c,, < 0. This is a contradiction. So the only possibility when lim,,_..°,, A; = 0 is lim,,__.oo c,, = 0. 0 Proof of theorem 3.3: Suppose A; does not converge to zero. Then there exists a strictly increasing sequence (m,,);°=l such that A;,“ 2 a for some a > 0. From Lemma 3.1, we have lim sup,,_,°° c,, < 0. From (2.34), 6V(A,n,,) _ __ c,,,,, BAN“; Amn=0 — F(O) - nmn" . For the “volume measure” of P(m,,), we have det B(m,,) = det x,,,nP(m,,) = Km" det P(m,,) Since R(m,,) is always bounded, det P(m,,) = det R‘1(m,,) > 0. Also note that {det B(n)};g, is non-increasing, so 16,, is bounded. Since 1imsup,,_,°o c,, 314 0, there ex- /. \ ists infinitely many n such that 0V(A,,,,,) _ c,,," —- < —b 0A,"; Amnzo ”mu—1 where b > 0. So there are infinitely many n such that V(A;,n) 5 c for some 0 < c < 1. Let M,,,,, be the volume at time m,,, then Mm; = Vonle, ' - -V,,,,,. Thus lim,,_,°o M,,,,, = 0. 1 (.1' 45 Note that {Mn} is non-increasing. Thus we have 1im,,_.oo M,, = 0. By definition, M,, = det n,,P(n) = K? det P(n). wire] i/ I]. 
Since det P(n) is positive definite by Theorem 3.2, lim,,_.oo It? = 0. This leads to the conclusion that lim,,_.oo x,, = 0. Let 0,, = 0,, — 0", then 5n = yn - 07T—1xn = 031+ v,, — 0;_1x,, (3.14) ~T Since the volume of the ellipsoid converges to zero, the estimates converge to the true . - . ~T parameters. Thus 11m,,_.°° ||0,,-1|| = 0 and hence 11m,,_.°° 0,,_1x,, = 0. Thus, from (3.14) we have lim,,_.°o |e,, -— v,,] = 0. Since lim,,_.°° x,, = 0 and since c,, = 7717,, - me; - x,,_1G,,, we have .2261; = .liem<1-€i)-~n9n = lim r”(x-£3.) n-voo = .22.":(1- vi) 2 0 By the definition of c; in Lemma 3.1, lim,,_.oo c; = 0 and hence lim,,_..,,o A; = 0, contrary to the assumption. Thus we conclude that the sequence {A;} must converge to zero. It follows from (2.10) that lim,,_,°° ||0,, - 0,,_1|| = 0 when lim,,_.°o A; = 0. Thus (3.7) is satisfied. Consider the coefficient c,, in (2.31). At any time n, we can write 6,, = c; + 6,", where c; and c; were defined in Lemma 3.1. Since 1im,,_.°o c; = 0, we have liminf c,, = liminf(c;' + c,‘,’) 2 0. n-‘m n-‘w 46 Let s > 0. Then there exists an N E N such that for all n > N, m(7fl— 6;) — n,,_1G,, > —me. Let n > N. Then 7,1- 5; > -E + 3137:7031 > —e. t (3.15) Let s be arbitrarily small, then we have 7 - 63. Z 0- Thus a; 6 [0,7] and, hence, (3.6) is satisfied. -, Due to the slow convergence of the ellipsoid for large 71, one can argue that for sufficiently “K Km, —__,, large n, there exists a positive integer M such that M > N1 and [ya-H: — OZXnH] < \/7n+k, l S k S M In other words, no update takes place on {n + 1,. . .,n + M}. Substituting for y,,+), by 07x,,+), + v,,.H, and noting that |v,,+),| < ,/7,,+1,, we can write [(0'Y— 0,,)Tx,,+),] = [(0‘— 0,,)Tx,,+), + v,,+;, — v,,+),] (3.16) >4“ |(o"- 00’s.). +v.+1|+|v.+1| (3.17) 3 2w... 7 (3.18) for l _<_ k 5 N1. Therefore, squaring both side of (3.18) and, then, summing over 1:, 1 S k 3 N1, it is found that n+N1 J fl+N1 (0]- 0,,)T [ z: x,-X,T (J-On)<4 Z 7,- i=n+1 i=n+1 47 Hence, using (3.8), it follows that 1 4 "+N’- “9- 0..“2 < — 2‘ 7.- 91 i=n+1 Theorem 3.3 assumes that infinitely many data points are selected. The algorithm, however, may stop at some finite time N, that is, optimal weights no longer exist after N, when noise excitations are “too colored” or noise bounds are over-estimated. In this case the convergence properties (3.5) and (3.7) still hold since the volume of the ellipsoid and the estimates will remain constant, and the optimal weights A; become zero after time N. For c,,, recall that c,, = m(7,, — 5;) — x,,_1G,,. Since cn20forn> N, m(7,, — 53,) 2 Kn_1Gn > 0 (3.19) Equation (3.19) ensures that (3.6) still holds even if the algorithm stops at finite time. 3.2 Convergence of Parameter Sets and Suboptimal Test In this section, we will first investigate the condition under which the membership sets converge to a point. Then, a “suboptimal test” criterion is introduced. We will find that employing the suboptimal test not only improves computational complexity, but also results in excellent convergence properties. 3.2.1 Convergence of Parameter Sets Since the volume of the ellipsoid in SM-SA is non-increasing, it converges for all the sta- tionary signals and linear time-invariant models. The ideal performance for SM-SA is the convergence of membership sets to a point asymptotically. In this section, we will study the asymptotical properties of parameter sets. 48 Theorem 3.4 Let {t,,} be a subsequence such that 7,; - a?" < 0 Vn. 
If there exists an N E N and 91, 92 > 0 such that (3.4) holds, then the volume of the ellipsoid converges to zero if {t,,} is infinite. Proof: In Theorem 3.3 we have shown that lim,,_.°° A; = 0. Let c; be defined as in (3.10). Then from Lemma 3.1, lim,,_.°o c; = 0. That is, .- A ('0, £570 ,3 .41 ,1" _;_. .. H I’ll” c", ( H ’ 1 7 "lingo [m(7,, - 6;) — x,,_1G,,] = 0. . i (3.20) I/ 1 "l ,I - l \ Note that the ellipsoid at time n is defined by (f); L5 firm/r ’j 1/ (I 7/ P(n) = {0| (9 — 9,.)7’ R(n) (9 — 0,,) < x,,}. Thus it is obvious that x,, is positive if P(n) is non-empty. We claim that there exists a a > 0 such that x;B(n — 1)x,, V [1 0' < llxnllz det B(n _ 1). (3.21) To show (3.21), consider x;B(n - 1)x,, _ x£n,,_1P(n — 1)x,, (3 22) “x,,”2 det B(n — 1) — ||x,,||2n$_l det P(n - 1) ' T _ x,,P(n 1)x,, (3.23) 1c""'1||x,,||2 det P(n - 1) n-l Since the PE condition defined by (3.4) holds, P(n) is positive definite. Therefore 0 < A,,,,-,,||x,,||2 g x£P(n — 1)x,,, 49 where A,,,,-,, is the minimum eigenvalue of P(n — 1). Now A,,,,-,,||x,,||2 x;P(n — 1)x,, __ x;B(n - 1)x,, Isl-11 ||x,,||2 det P(n - 1) - 11:31 ||x,,||2 det P(n — 1) le,,||2 det B(n — 1)' That is, A,,,,-,, xZ'B(n - 1)x,, 52:11 det P(n - 1) - ”x,,“2 det B(n — 1). rn-l By the positive definite property of P(n), we have det P(n — l) < 00. Also x,,_1 < 00. Thus we conclude that there exists a o > 0 such that A,,,,-,, x;B(n — 1)x,, 162:1] det P(n — 1) - ”x,,“2 det B(n — 1)’ a< and, hence, (3.21) is satisfied. In the remaining part of this proof, we consider the subsequence {t,,} only. In order to simplify the notation, we replace the time index t,, by n for the rest of this proof. Since x,, > _. satisfies (3.4), there exists a 6 such that there are infinitely many 11 for which ||x,,||2 > 6. Let c > 0. According to (3.20), there exists an N, such that if n > N, then [m(7,, — 5,2,) — x,,Gn] < 660. (3.24) Let n > N1. Since 7,, - e; < 0, we have from (3.24) that m ]7,, - 5;] + x,,_1G,, < 660. Thus 0 < n,,_1G,, = n,,-1x;P(n — 1)x,, = x;B(n — 1)x,, < 660'. By (3.21), 0 < 0‘||x,,||2 det B(n — 1) < x;B(n — 1)x,, < 660. Thus 6 det B(n — 1 < ——e. ) "an? 50 Note that there exists an N > N1 such that “IQ/[[2 > 6. Therefore 6 detB N—l <——e N. (3.25) By definition, the volume of the ellipsoid converges to zero. Cl 4 Recall that 6,, ; —0,,_1x,, + v,,. Therefore 7,, — 5,2, < 0 if and only if lvn " éZ—lxnlz > 7n (3.26) If the volume of the ellipsoid keeps shrinking, the value 034x" will decrease to a small number as n becomes large.\When 011x" is small, from (3.26) we see that v,, must be sufficiently close to the 6:31:41: thaftj7n — 5,2, < 0 for infinitely many 71. This is very important for the convergenceitdf the parameter set. In Section 3.3.2 we will show that if either the noise bound is over-estimated or the realization of the noise process does not visit an arbitrary neighborhood of the bound infinitely often, then the ellipsoid can not converge to a point asymptotically. 3.2.2 A Suboptimal Test for Innovation in the SM-SA Algorithm Theorem 3.4 leads to the pursuit of a suboptimal test. Recall that one important feature for members of UOBE is “selective updating.” That is, no updating is taken when incoming data do not contain useful information. Sub optimal test is a small modification of the updating criterion associated with SM-SA. Instead of checking c,, = m(7n - 5;) — n,,_1G,,, the suboptimal test checks only 8,, g 7,, - 5; since the negativeness of 6,, implies the 51 - r . negativeness of c,,. 
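The two innovation tests can be summarized in a few lines of code. The sketch below assumes that the coefficients $a_n$ and $b_n$ of the quadratic (2.31) are computed elsewhere, following (2.38)-(2.39); only the decision logic and the closed-form root from Theorem 3.3 are shown, and the function names are mine.

```python
from math import sqrt

def innovation_check(m, gamma_n, eps_n, kappa_prev, G_n, suboptimal=True):
    """Decide whether sample n carries information.  The suboptimal test needs
    only the prediction error eps_n (O(m) work); the full test also needs
    G_n = x_n^T P(n-1) x_n.  A True return value means "update"."""
    if suboptimal:
        return gamma_n - eps_n**2 < 0            # negative implies c_n < 0 as well
    return m * (gamma_n - eps_n**2) - kappa_prev * G_n < 0

def optimal_weight(a_n, b_n, c_n):
    """Root of a_n*l**2 + b_n*l + c_n = 0 lying in (0, 1); the coefficients are
    those of (2.31) and must be supplied by the caller.  Returns 0 when
    c_n >= 0, i.e., when no update is warranted."""
    if c_n >= 0:
        return 0.0
    if a_n == 0.0:                               # limiting case noted in Lemma 3.1
        return -c_n / b_n
    return (-b_n + sqrt(b_n**2 - 4.0 * a_n * c_n)) / (2.0 * a_n)
```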
Note that the unique root A; on (0, 1) still exists for suboptimal test. Suboptimal testing has been studied in previous work [28]. The computational benefits of employing a suboptimal test can be substantial since the updating criterion is reduced to 0(m) computations where m is the model order. The weights obtained from suboptimal test, however, may not be optimal in the sense of volume ratio minimization. In order to obtain similar convergence properties as we do in SM-SA, define c,, ifE,,<0 n :1 || 0 otherwise Lemma 3.2 Given the coefl‘icients of the quadratic equation (2.31) and the conditions stated in Theorem 3.3, we have lim,,_.°° 5,, = 0 if and only if lim,,...°o A" - 0, where A; n— is obtained by solving the quadratic equation a,,A; + b,,A,, + c,, = 0 when 8,, < 0 and is set to be zero when 5,, Z 0. We do not intend to give formal proof for this lemma because 2,, < 0 implies t t c,, < 0 and hence the proof follows directly from that of Lemma 3.1. Only note that c,," proof of lemma 3.1 must change to é,,. The importance of the suboptimal test in theor cal analysis can be seen in the next theorem. Theorem 3.5 Let (33m, ll ’ ll) be a normed space. Given the R(n) in (3.3) and the update equations (2.7)-(2.11), if R(n) is positive definite and the suboptimal test is employed with infinitely many data points selected, then the SM-SA algorithm developed in Theorem 2.1 converges in the sense that "lingo 1:, = 0 (3.27) lim det B = 0 (3.28) n-ooo 52 Proof: If we change c,, in the proof of Theorem 3.3 to En, then by Lemma 3.2, the proof of (3.27) follows directly from the proof of Theorem 3.3. Since infinitely many data points are taken using the suboptimal test, there exists infinitely many points such that 7,, -E; < 0. By Theorem 3.4, the volume of the ellipsoid converges to zero and hence (3.28) is satisfied. 0 Again, if the algorithm stops at finite time N, then lim,,_.oo A; = 0 for n > N. Thus (3.27) still holds. In this case, however, the volume of the ellipsoid cannot converge to zero. Simulation results will be presented in the next chapter in support of the above analysis. 3.3 Convergence of the Parameter Set with White Noise Case or Errors Disturbances White noise has been studied frequently in the analysis of many estimation algorithms as well as model structures. A sequence {v,,} is defined to be white noise if 131),, = 0 (3.29) and 0 ° 0 Ev,v,-_,- = J 71 (3.30) a2 j = 0 where 02 is referred to as variance of the process. If (3.30) holds, {v,,} is said to be uncorrelated. It is common to assume that {v,,} is second moment ergodic so that the statistical properties in (3.29) and (3.30) can be realized by time averages, that is, O 1 n “1320 if ‘E— v,- — 0 (3.31) and 1 " 0 j 0 lim — E v,-v,-_,- = # (3.32) 53 For LSE algorithms, white properties have been shown to be sufficient for the convergence of the estimates for the structure in (2.1) [13]. Since the estimates in SM-SA belong to a class of LSE estimates, it is of interest to know whether the white property of noise excitation also applies to SM-SA. In this section, we will first consider a special class of white noise which is called i.i.d. white noise in which the random variables v,- and v,- are not only uncorrelated but also independent, and identically distributed for all i, j. Following the analysis of the i. i.d. case, the general white noise processes defined in (3.29)-(3.30) or (3.31)-(3.32) are considered. 3.3.1 IID White Noise Case Theorem 3.6 Suppose the noise process {v,,} is an i.i.d. 
white noise in which {v,,} is zero mean and uniformly distributed on [—,/7,,/7] for all n. If SM-SA is employed with noise bound given by v; < 7 for all n, then the volume of the ellipsoid converges to zero. Proof: LetFé{n€N:c,,<0}={n€N:c;#0}andletnEF. Since limn m c- = 0 >_ I, ;. f7 T n ’ / . . , ' 2 ._ ‘r f— .‘f ILG' 7‘» .w . ' [ nanolo m(7 - En) - ”71-1071 — 0' 'r My .117. ,. 1 (4 Note that '1 ' x,,_1G,, = n,,_1x;P(n — 1)x,, = x;B(n — 1)x,,. Thus 0 — — . 2 T — 11111 c,, — “lingo [m(7 — c,,) — x,,B(n - 1)x,,] — 0. (3.33) n—voo Equation (3.33) implies that in the limit _ 3:58.495. m . m :N 54 We may assume that ”x,,“ 76 0 since the probability of ”x,,“ = 0 is zero. Suppose the volume of the ellipsoid does not converge to zero, that is, lim,,_.0° det B(n — 1) ¢ 0. Since B(n — 1) is positive definite, x;B(n — 1)x,, > 0. Thus "1:11:07—5; >0forn6F. For other 71 g F, the inequality 7 — 5,2, > 0 still holds since m(7 - 5;) > n,,-1G,,. Thus we .' ] _. . 1‘\ have ”lingo 7 — 5,2, > 0 for all n, (3.34) and there exists an 5 > 0 and a subsequence {m,,};,"’=1 such that 7 — 53,, > 5 Vn. (3.35) . ~T Since 5,, = —0,,_1x,, + v,,, we have 7 - 63.... (J7 + lem..|)(\/“7 - lsmnl) = («7+ Ism.l)(./~7- Ivm. - 03..-.xm1) >6. Thus ~T 6 —v,,—0 _x,, >—— 6 2,/-7 , , )4 I =6. ”R Therefore (3.35) holds if and )0nly if T |v,,,n — 0,,,n_1x,,,,,| < ,/7 — 5' Vn. (3.36) 55 Expanding (3.36), we have — (,fi _ a) < v,,," _ 0;_,..,,, < fi— 5, and thus _fi-Jf- (5' + 0:,n_1x,,,,,) < v,,," <‘\/7 — (5' '— 0:n_lx,,,,,) 1. Suppose 0:n_1xm; > 0. If —\f7_ < v,,," < —\/7+ 5’ (see Fig. 3.1(a)), then \/7- 5’ < |v,,,,,| < ,/7 and thus 6T vmn — mn-lxmfl “T = lvmnl + [emu-Ixmn “T > J7—5’, violating (3.36). Consequently, v,,," cannot be in the shaded area in Fig. 3.1 (a). Since v,, is uniformly distributed, the probability that (3.36) holds is less than 11%;:1. 2. Suppose 0:n_1x,,,,, < 0. If \/7— 5’ < v,,," < ‘/7, then “T ”T lvmn _ emu-Ixmn = vmn + lama-1x731: “T > J? — 5’ + 0,,,n_lx,,,,, > \f7- 6’1 violating (3.36). The graph interpretation is in Fig. 3.1 (b). Since v,, is uniformly distributed, the probability that (3.36) holds is less than 2%: Let the probability of ~0:_1x,, > 0 be q,,. Then, the probability that (3.36) is satisfied is less than 2 —e’ 2 —€’ 2 —e’ L+(1_qm)_fl_-_\/7_<1 gm" 2J7 2J7 — 2\/7 (3.37) 56 flvn) *m l 1 v» —,/7: 5’ o f; (a) flvn) 1 l v .5; e 0 " Figure 3.1. (a) If gin-1x171» > 0 and v,,," in shaded area, then a contradiction results. (b) If dandxm; < 0 and v,,,,, in shaded area, then a contradiction results. Let A,, be the event that (3.36) holds at time m,,. Then from (3.37), PM.) _<_ 1% < 1. Now, given (3.36) holds at time n, we claim that . 2 _ a P(A..+1|A..) s i— < 1 277 , (3.38) where A,,.” is the event that (3.36) holds at m,,+1. To show (3.38), consider P (An+1lAn) = p (A,,, n {é;,,,-,xm,,, > 0} | A,,) + P (A,,, n {é;,,,_,xm,,, < 0} | A,,) = P (A,,, | {éfimdxmfi > 0} n A,,) P ({é;,,,_,xm,,, > 0} | A,,) +P (A,,+1 | {9;n,,_,xm,,, < 0} n A,,) P ({0:n+1_1x,,,,,+, < 0} | A,,) P (-,/»7 + 5’ < v,,,,, < fil {é;,,,-,xm,,, > 0} n A,,) P ({0;,,,_,xm,,, > 0} | A,,) ~T ~T +1) (-\/‘7 < vmrH-l < W- 6’ l {omn+1-Ixmn+1 < 0} n A“) P ({9m3+1-Ixmn+1 < 0} l A") ' IA 57 Since {0,,} is an i.i.d. 
white noise process, /’T I "T f 2 7 — 5’ P (—‘/7+ 6, < ”mu.“ < WI {9mn+1—1xm,,+1 > 0} m A") : __\/;__ 2,/7 and ~T - 2 7 - 5’ P (—\/T< ”mu-+1 < fi- 6’ l {0mn+1-lxmn+1 < 0} n An) = —\g:—/—:r—_- Thus 2 7 - 6' -T 2 7 - 5’ ~T P(An+l|An) S f—fl.—' P ({omn+1—lxmn+l} > 0 l An)+-\g-—\/7_ P ({omn+1—lxfl‘ln+1 < 0} l An) ° T mn-l-l Let q,,., = P ({9 _,x,,,,,,, > 0} | A,,). Then 2 —€’ 2 -€' 2 -€’ P(An+l l A,,) = an—‘é-if—‘Y- + (1 - 9n+1)—\/77‘/—,7— : Jé—i‘fi— Thus (3.38) is proved and _ g 2 P(A,,+1 n A,,) = P(A,,+1 | A,,)P(A,,) g (%.7—) A similar method can also be used to show that 2 7 — 5’ P(An+2lAn+1 n An) S _‘éZfi— and 2J7 — 5’ 3 P(A,,+2 n A,,“ n A,,) = P(A,,+2|A,,+, n A,,)P(A,,+1 n A,,) s (W) . By induction, we have .. g " n—ooo n—soo 2J7 Thus we conclude that the ellipsoid converges to zero. 0 58 3.3.2 General White Noise Case Set-membership-based algorithms are derived from noise bounds. In Theorem 3.4 we showed that if there exists infinitely many data points such that 7fl -_£,2, < 0, then the volume of the ellipsoid converges to zero. Since 6,, = v,, - 91:13:", the geometrical meaning of 7,. - 5,2, < 0 is that v,, must be very close to the boundaries if more data points are to be used. Before we study the general white noise case, consider the cases in which the noise bounds are over-estimated or when the noise processes are highly correlated such that 12,2, < 7,. — 6 Va 2 N (3.39) for some N and 6. Claim 3.1 Suppose there exists N and 6 such that (3.39) holds. Then the ellipsoid does not shrink to a point. Proof: We prove this claim by contradiction. Consider the N and 6 in (3.39). Suppose the ellipsoid converges to a. point. Let n 2 N. Then (\/7—n- Ivnl)(¢7;+ Ivnl) > 6. Therefore f ‘/7 — v > —— n lnl V7n+lvnl 6 «7.. + WT. 6 2,/7—,,' Since x,, is bounded, by the stability assumption on (1.5), and since the volume of the ellipsoid is assumed to converge to zero, there exists an N1 2 N such that 6 7 <3V7n, lea-1‘43 <—— where 7 = maxneN{7n}. Let n 2 N1, then 7n—En 59 (m4 |€n|)(\/'7?I- lenl) («m lenlxw—n- Iv. - fax.» mm lens/7;— lvnl 461ml) > > (W? lenl)(2‘fi_ - Iéf..xnl) > (m+le.l)(2jfi-3jfi) = (muesxsjfi) > 5 Now the convergence of the ellipsoid to a point implies that x,, converges to zero. So there exists an N; 2 N1 such that 6 Kn_1Gn < - Vn 2 N2. Let n > N2. Then it > N and Cn E 6 O. 6 m(7,, " 512:) - "n-l Ga 3 6 This result is contrary to the assumption since the algorithm stops taking points after time N2. Thus we conclude that the volume of the ellipsoid can not converge to zero. Cl Claim 3.1 points out the impact of the noise bounds on the volume of the ellipsoids and thus leads us to study the relationship between the white noise process and the noise bounds. Let {0,,} be a white noise process defined in (3.29)-(3.30). If {v,,} is i.i.d., then from theorem 3.6 the volume of the ellipsoid converges to zero. The key point in the proof is that the magnitude of the noise can be arbitrarily close to the boundaries. If we eliminate 60 the condition of independence, then the noise may not be arbitrarily close to the boundaries. Consider the following example. Let {v,,} be a noise process taking only discrete values at -\/7, —\/273, 0, m, and fl. Let A be an event defined in an appropriate probability space, and vo be the initial condition such that P(vo 6 A) = 2/5. If vo E A, then {v,,} is defined as \/2 73, w" 2 0 —\/27/3, 10,, < 0 Otherwise fir 1 Thus, (v,,,n _<:1) is uniformly distributed over {—‘/7', —\/2773,0, V2773, fl}, and hence we can write 03, g 7 for all n. 
The conditional expectations are given by E(v,,|vo E A) = ‘/27/3P(w,. 2 0) - ‘/27/3P(w,. < O) = O E(v,,|vo ¢ A): fiP(1 < 10,. S 3) — fiP(—3 3 mn S —1)= 0. Therefore, B(vn) = B(vnlvo E A) P(vo E A) + E(v,,|vo ¢ A) P(vo ¢ A) = O. 61 Also, we find that a}, if s = t E(v,vt I 120 6 A) = , 0, otherwise and 131, if s = t E(v,vt | v0 5? A) = . 0, otherwise Consequently, 231, if s =t E(U,Ut) = 0, otherwise As a result, {v,,} is a white noise with mean zero and variance 1}. Furthermore, if second moment conditional ergodicity is assumed, as defined in (3.31)-(3.32), given v0, then 12,, is also second moment ergodic. If RLS algorithm is used to identify the model parameters in (2.1) with this noise model, then the estimates converge to the true parameters since on is uncorrelated. This may not happen to SM-SA if the initial condition v0 6 A, simply because 12,, can not be arbitrarily close to the boundaries. By Claim 3.1, the ellipsoid cannot shrink to a point. In this presented example 12,. can not be close to the boundaries since the dependency between on and 00 can not be eliminated, even for large n. In addition to this noise process, there are other white noise processes in which {v,,} rarely hits the boundaries. In this case, the performance of SM-SA may not be favorable. ‘ . For the following example, we will assume {ton} to be a sequence of pseudgrandom numbers with uniform distribution on [-0.5,0.5]. First, let {v,,} be a white noise process generated by passing {10"} through a first-order all-pass filter 0.5 + 2’1 HM = 1 + 0.52-1' Figure 3.2 and 3.3 show the sample correlation coefficients of {mu} and {v,,}. ,3 Both sequences have identical sample mean and second moment properties. \‘Consider the systems yn = Glyn—1 + a2yn-2 + aayn—s + vn 12" 62 -02 - - -o.4 - « -O.6 . -o.e - - '10 5 1o 15 20 25 30 35 40 Figure 3.2. Sample correlation coefficients of {mu} in the example on page 63 up to time lag 40. -02 - l -o.4 - « oer ~ -o.a - - '10 5 1o 15 20 25 so 35 40 Figure 3.3. Sample correlation coefficients of {v,,} in the example on page 63 up to time lag 40. 63 and ya = Glyn—1 + awn-2 + “syn-3 + wn where 01 = 2.1, a; = —1.48, and 03 = 0.34. Comparing the estimates obtained by RLS and SM-SA in Fig. 3.4 (a) and (b), we find that the behavior of SM-SA for these two noise processes is completely different while the estimates obtained by RLS are almost the same because both noise processes {v,,} and {mu} are white. The difference in SM-SA performance can be explained by analyzing the visitation frequency of the noise bounds. Fig. 3.5 compares the relative frequency of {mu} and {v,,} at each amplitude interval. In these figures, the horizontal axis represents the amplitude and the vertical axis represents the number of times the noise amplitude falls in a specific interval. We see that the distribution of {mu} at each interval is close to be uniform on [—0.5,0.5], while the distribution of {v,,} appears Gaussian, although it is not, within the interval [—O.9,0.9]. Since {v,,} rarely stays in the neighborhood of the noise bounds, the ellipsoid cannot shrink enough to provide good estimates as shown in Fig. 3.6. However, if we pass {v,,} through a soft-limiter such that v,, is set to be 0.5 or —0.5 if v,, > 0.5 and v,, < —0.5, respectively, and otherwise left unchanged, then the frequency of visitation take a neighborhood of the bound, namely 7 = 0.55, increases as depicted in Fig. 3.7. This new {v,,} still seems to have white properties as $116411 in Fig. 3.8. However, from Fig. 
3.9, we find that the performance of SM-SA in this case is much better than before. If we multiply the original {v,,} by two and pass it through the same soft-limiter, then the rate of visiting the neighborhood of the noise bounds by {v,,} is much higher as shown in Fig. 3.10. Figure 3.11 shows that in this case the performance of SM-SA is extremely good. If we examine the proof of Theorem 3.6, we find that the convergence of the ellipsoid does not depend on the first and the second moment properties of the noise input. That is, even if the conditions defined in (3.29) - (3.32) are not satisfied, the estimates may still converge to the true parameters if the “close-to-boundaries” condition is satisfied. For example, let {v,,} be a noise process obtained by passing {wn} (i.i.d. noise, see page 61) 64 O . .. -O.2 - - True ---- -O.4 - ~ 200 400 600 800 1000 (13) § Figure 3.4. Estimates of the parameter (13 when the system is driven by (a) an i.i.d. white noise (b) a white noise but not i.i.d.. 65 $883 ,8 N O .5 O Nunber of time the maglitlfle falls in the given intenlal O ‘1 O 8 2 .j i ' 0. .L i. i don-CooouozoIo-o-oiooun oooooooooooo I a I I I o o I I o o i o o I I I I C o l ‘5 Q I a o I i 2 u u '5 d O l I l I Mmberoltimemsmagtitude fallshthegiven interval 3. 31 l l \ j i 1 l j J O Figure 3.5. Distribution of {ton} and {v,,} at each amplitude interval. 66 4 white noise . 4 i.i.d. noise ‘ ‘5 . 10 A A A l A A l A A A A 10° 101 1o2 103 n Figure 3.6. The logarithmic plot of volume versus the actual time n. We see that the volume of ellipsoid is much smaller when the noise is i.i.d. \l O E £60- ........ . ................... ................... ................... . ................... . ........ § 3 5 l : E F 350. ........ , ................... 3 ................... i: .............................................. _ 3 _ gwl ........ ................... ...... gag. ...................... _.l_l_r1 F. .i. T. s t— - ' 20,. .. 5 .. . '6 E g 10" :t *' 1 ‘ G -O:2 (I) Amplitude Figure 3.7. Distribution of a white noise at each amplitude interval after passing {0,,} through a soft-limiter. 67 Sample correlation coefficients in to time difference a: 40 1 I U I T I V I -0.2 r . -0.4 - . -O.6I- . -0.8 - - '10 5 1b 115 2b 25 ab 35 40 time difference -02 - - True --- -O.4 ' 1 200 400 600 800 1000 n Figure 3.9. Estimates of the parameter a3 when on is passed through a soft-limiter. 68 g I jg I Y I r— .5200... ........ g ................... g ................... 4 .............................................. q S. 3 £150., ................................................................................................ 1 so 3 §3100' ............................................................................................... .1 E 3 E 50. ............................................................................................... _ E c -014 -o:2 o ” 0:2 0.4 Amplitude Figure 3.10. Distribution of a white noise at each amplitude interval after multiplying v,, by two and then passing it through a soft-limiter. 0- . .02. . True --- -O.4~ - 200 400 600 800 1000 n Figure 3.11. Estimates of the parameter (1;, when on is multiplied by two and then passed through a soft-limiter. 69 through an all-pole filter described by 1 H(Z) : 1 + 0.5.?"1 The output is bounded on [—1, 1]. From its sample correlation coefficients in Fig. 3.12, we know that {v,,} is colored noise. Figure 3.13 shows that the performances of SM-SA and 1 . . . . 0.8 - - 0.6 . 0.4 - 4 0.2 — . 0 . .02 r- _. -0.4 - - -O.6l- . 
-0.s - - '10 5 1b 15 20 25 ab 35 40 time difference ‘ Figure 3.12. Sample correlation coefficients of {0,,} up to time lag 40. RLS are severely affected by the autocovariance of {0,,}. Again, the distribution of {v,,} is examined in Fig. 3.14 where we find that {v,,} rarely appear in the neighborhood of the noise bounds. If we multiply {v,,} by two and pass it through a soft-limiter, as we described before, then {0,,} stays in the neighborhood of the noise bounds, $0.5, most of the time. Note that in this case the new {0,,} is still a color noise as shown by its sample correlation coefficients in Fig. 3.15. However, the performance of SM-SA is excellent as shown in Fig. 3.16, while RLS is still affected by the autocovariance. 70 0.3 - 0.6 - 0'4 _____________________ _Irye. .. _ _ _ _ _ _I 02 < 0| .4 -02 RLS - -0.4 SM-SA . 200 400 600 300 1000 Figure 3.13. Estimates of 03 by SM-SA and RLS when the noise is color. falls in the given interval Number of time Figure 3.14. Distribution of a color noise at each amplitude interval. 71 Sample correlation coefficients Lp to time difference - 4O 0.8 . 0.6 ~ 0.4 ~ 02 - - o . -O.2 - 1 -0.4 - « -O.6I- - 03 ~ - '10 1 1b 15 20 25 so as 40 time difference Figure 3.15. Sample correlation coefficients of a noise sequence obtained by passing {0,,} through a soft-limiter. as Tare-— ' J 0.6 . 0.4 - “I 'l ‘ ' l — i - ' ' - i ; SM-SA 02 ' 4 ° RLS ‘ -02 - -0.4» . 200 400 600 800 1000 n Figure 3.16. Estimates of the parameter 03 when the color noise {v,,} is passed through a soft-limiter. ' 72 The above examples show that the whiteness of the noise processes is not important in the performance of SM-SA, although it is important to the conventional RLS. The most important factor for the convergence of SM-SA is the probability for which the noise visits a neighborhood of the bounds. Thus, while SM-SA employs similar recursions to RLS, the two algorithms have totally different convergence properties. The convergence properties in RLS arise from the second moment properties; while the ones in SM-SA come from the noise bounding assumption. 3.4 Independence and The Mixing Conditions To generalize the convergence of SM-SA to colored noise and give a more rigorous treatment, we first introduce the measure-preserving transformation T. Let (5,?) be a measurable space consisting of a set 5 and o-algebra ? of subsets of S. Let P be a probability measure defined on (5,?) so that (5, ? , P) is a probability space. In our case, if X is a real set bounded by «7, X = l‘fiH/fl’ then S = X°°, the collection of all real sequences (vo,v1, . . .,). Let T : (5,?) —> (5,?) be a measurable transformation on (5,?, P), then T is said to be measure-preserving if and only if P(T"1 A) = P(A) for all A 6 ? [10]. In order to simplify our discussion, we can take the measure-preserving transformation T as a one-sided shifter, T(‘vo, ‘01, . . .) = (1)1, 02, . . ..) Note that P(T"A) = P(A) implies that P(T‘kA) = P(A) for k = 1,2,.... Thus P{(uo,v1,...,u,,_1) 6 Ba} = P{(vk,vk+1,...,v,,+k_1) e Bu} for all n and k and n- dimensional Borel sets Bu. 73 Definition 3.2 Let T be measure-preserving on (5, ? , P), T is said to be independent if for all A,B E ?, P(A n T‘"B) = P(A)P(B) Vn e N. T is said to be mixing iffor all A,B E ?, lim P(A n T‘"B) = P(A)P(B). (3.40) 7%..” The mixing condition is a very reasonable assumption since it reflects the physical characteristics of many common random processes. 
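The mixing property (3.40) can also be illustrated empirically. The Monte Carlo sketch below compares $P(A \cap T^{-n}B)$ with $P(A)P(B)$ for a noise sequence obtained by passing i.i.d. uniform noise through the first-order all-pole filter $1/(1 + 0.5z^{-1})$ used in the preceding examples; the events, the threshold $0.25$, and the sample sizes are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def mixing_gap(n_lag, trials=20000, length=200):
    """Monte Carlo estimate of |P(A ∩ T^{-n}B) - P(A)P(B)| with
    A = B = {v > 0.25}, where v is i.i.d. U[-0.5, 0.5] noise filtered by
    1/(1 + 0.5 z^{-1}); the gap should shrink as the lag n grows."""
    w = rng.uniform(-0.5, 0.5, (trials, length))
    v = np.zeros_like(w)
    for k in range(1, length):
        v[:, k] = -0.5 * v[:, k - 1] + w[:, k]
    A = v[:, 50] > 0.25                       # event at a reference time
    B = v[:, 50 + n_lag] > 0.25               # the same event shifted by n_lag
    return abs(np.mean(A & B) - np.mean(A) * np.mean(B))

print([round(mixing_gap(n), 4) for n in (1, 2, 5, 10)])
```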
In this research, let {0,,} be the noise input of the model in (1.5), then {v,,} satisfies the mixing condition if the dependency between v,- and 0,- decreases to zero when |i — j| —5 00. For most system identification problems, we assume that the PE condition on {x,,} holds over all natural numbers. Since the SM-SA has a “selective updating” feature, we should confine the consideration of the PE condition to the subset of {x,,} in which updates take place. Since we do not know in advance which points will be used in the updates, this requires that all the infinite subsets of {x,,} are PE. However, this problem can be alleviated if the mixing condition is imposed on {0,,}. Consider the following corollary given by Bitmead [17]: Corollary 3.1 If x,, is such that E(x,,x,7,') is positive definite and either x,, is ergodic or mixing, then there exists a constant 6 > 0 and a finite integer N such that n+N Z x,x,~T 2 51 > o (3.41) i=n holds infinitely almost surely. 74 In the present work, the upper bound of (3.41) is guaranteed by the stability assumption. Note that all the subsequences of {x,,} are also mixing if {x,,} is mixing. Thus if B(xan) is bounded below at the points which updates take place, then (3.4) is satisfied. The next two corollaries serve as an extension of Theorem 3.6. First we consider the independent case in which the mean of the noise process does not have to be zero and the distribution does not have to be uniform. Corollary 3.2 Let {v,,} be a noise sequence and let T be independent on (5, ?, P) according to Definition 3.2. Given 6, let D¢+ = [fi— c,,/7] and D,. = [—\/7', —\/7'+ c]. If SM-SA is employed and there exists a 6 such that for all n P(vfl E De...) > 6, and P(vn E DC.) > 6, then the volume of the ellipsoid converges to zero. Proof: Consider the proof of Theorem 3.6. If the volume of the ellipsoid does not converge to zero, then (3.36) must hold. Given 6’, by assumption there exists a 6’ such that for all n P(vn 6 D¢I+) > 6’, and P(vn 6 Dy-) > 6’. 1. Suppose bin-13%.. > 0. Then the probability that (3.36) holds is less than 1 - 6’. 2. Suppose 9:..-1xm. < 0. Then the probability that (3.36) holds is also less than 1 —6’. If we change the quantity 235$ in the proof of Theorem 3.6 by 1 - 6’, then we have 0 e I n _ nlLlIgoP(AnnAn_1“An_gn...A1)Snango(1—6) -0 and thus the ellipsoid converges to zero. 0 75 The key point to the proof of the convergence of the parameter set is the independence between vfl and 6,7,1] x,,. In order to extend the convergence property from independence to mixing condition, SM-SA has to be modified. The reason is that for the original SM- SA with the ARX model, x,, contains yn_1, the output at time n - 1, thus we cannot separate the dependency between vn and 611x" unless {v,,} is independent. However, if we modify the SM-SA so that x,, is not changed unless an update takes place, provided that for large n the time difference between two updates is large, then we can apply the mixing condition to show convergence. This is possible since x,, depends only on the events up to the previous update. This modified version is identical to the SM-SA algorithm r except for the regression vector x,,. In SM-SA, x,, = [y,,_1, . . .,yn_,,u,,,u,,_1, . . .,u,,..q], . . . . T while m the modified algorithm, x,, = [y;,,,. . .,y¢,,_P,u,,,u¢,‘,. ..,u,,_g] where y", u,, are the same as in (1.5), and {t,} is a subsequence such that c,,. < 0 and t,, < n 5 “+1. , Note that in this situation, the algorithm is identical to RLS with a decimated input. 
The corresponding decimator has a time-varying sampling rate, say D“, which is controlled by the data selection process. Generally, if y,, is wideband, then D" is large and the data picked are usually sparse. ‘ All the convergence proofs developed previously for SM-SA are still applicable to this modified algorithm, since those convergence properties are not affected by this modification. 1”" The following corollary applies to this modified algorithm. Corollary 3.3 Let {v,,} be a noise sequence and let T be mixing on (5,?, P) according to Definition 3.2. Given 6 and N, let D9,. = [fi- 6,‘/‘—7-] and D,. = [—fi, -\/7+ 6]. If the modified SM-SA is employed and there exists a 6 such that for all n P(vn 6 DC...) > 6, and P(vn 6 DC-) > 6, and there exists an M > N such that there are infinite number of interval [n — 1,n - M] on which no update takes place, then the volume of the ellipsoid converges to zero. 76 Proof: Consider the proof of Theorem 3.6. Again let {m,,} be the subsequence such that 7 - 53,,” > 6 for some 6 > 0. If the volume of the ellipsoid does not converge to zero, then (3.36) must hold. For convenience, we rewrite (3.36) Iv..." — airlxmnl < y; — e' where e": 72,, (3.42) Given 6’, then, by assumption, there exists a 6’ such that for all n P(vfl 6 Deli.) > 6’, and P(vn 6 Der-) > 6’. 1. Suppose airlxmfl > 0, then the probability that (3.42) holds is less than 1 — 6’. "v 2. Suppose bin-13%.. < 0, then the probability that (3.42) holds is also less than 1 — 6’. Again let A,, be the event that (3. 42) holds at time m,,, then 7"? Alli/h/AEE at”) : fillV' / l F(A,,) < P(- -,/~7+e’0) P(0T0_1xm,,>0) +P (-\/’7 < v"... < f- 6’ l Emu-1x11». < 0) P (6711;.-1an < 0) . m" 'H-————-—-——-~ly-—~—/'< . k 77‘ ‘ .C l/l/l‘~{jl Note that {6,1,4an > 0} does not depend on the events which take place after the ill. previous update. Let the time difference betweenqi and the time index of the previous update be 1:. Define W,,_k = A{0,,,"_1xm > 0} and Xn- - A{- f+ 6’ < v,,," < J7}. Then WM)“ X" E ? and P(Xn) al— 6’. By the definition of mixing condition in (3.40), an; P(X, n W,,-;,) = P(Xn)P(W -,). (3.43) Suppose P(W _k) is not zero. Then (3.43) can be written as m P(Xn 0W —k) k—ooo P(Wn_k) =klim P(Xn | W,,_,,)- _. P(Xn ). Let 0 < f < 6’, then there exists an N such that lP(Xn I Wn-N)—P(Xn)l<£r 77 and, hence, P(Xn) - E < P(Xn | W -N) < P(Xn) + E. (3.44) . "dn’A/v’, IMJ‘I Note that (3.44) holds only if no update takes place on - N, n}. Due to the slow convergence of the ellipsoid for large 11, most incoming data are rejected. By assumption, there exists a positive integer M Z N such that no update takes place on {n — 1,n — 2, . . (,1:— M}. Thus (3.44) is satisfied and F(A,,) g P (-,/7+ 6 < v,,, < J? | birlxm” > o) P (614me > o) +P (—fi < v,,,” < fi- 6’ | 6:“,an < 0) P (6143:...“ < 0) (P(—,/~7+ c’ < v,,,” < «7) + 6) P (6141:...” > 0) + (P(-fi < v,,.” < fi— 6’) + 5) P (bin-1x77»: < 0) 1-€+£<1, IA Q ( i ‘\, IA and P(A,|A,._N) ” »' IA P (—,/~7+ a < v,,," < (m {39;qu > 0} n A,,-N) P ({éiflxm > 0} | 11..-”) P +P (-,/j< v,,." < fi- 6’ | {gin—139m. < 0} n A,,-N) ({éZn-lxmn < 0} | A,,-IV) / i 1. IA 1-€+£ Therefore, P(An n A,,-N) = P(AnlAn-N)P(A,,_N) g (1 - 6’ + ()2 < 1. Note that P(An n A...l n ---n A,,-N) g P(An n A,,-N) g (1 — 5' + ()2 < 1, 78 and klgrgo P(An-HcN n An-HcN—l “-0 An) IA #520 P(An-HcN n An+(k—1)N n -° ' ° 0 An) 3 klim(1—6'+£)"=O. That is, the probability that the ellipsoid converges to zero is one. 
Chapter 4 Simulation Studies Set-membership based algorithms have shown great potential in applications for many signal processing and control systems problems involving parametric modeling [25, 26, 28], such as speech processing, image coding, neural networks, and adaptive control. This chapter is concerned with simulation studies using the SM-SA algorithm with optimal and suboptimal strategies for the properties discussed in Chapter 2 and 3. The simulations performed in this chapter are designed for those noise processes which are commonly seen in engineering applications. These simulations not only serve as a test for theoretical analysis, but also provide heuristic descriptions for those phenomena which cannot ‘ be proven by rigorous mathematics even though they work for most, engineering applications. We will see that the SM-SA performs better than conventional RLS in many problems if the noise bound can be correctly estimated. In Section 4.1 we use an AR(3) time-invariant model to examine the general properties discussed in Chapter 3, including the behavior of the estimates, the volume of the ellipsoid, the estimation error, and the resemblance of SM-SA to stochastic approximation. Section 4.2 deals with a time-varying system derived from a speech signal. Comparisons to the conventional RLS are also made. Section 4.3 is concerned with the suboptimal test when the model is time-invariant. Finally, a more general case in which the noise bounds are over-estimated for the time-varying system is considered in Section 4.4. For all the simulations in this chapter, the models are driven by the noise process {Sn} which is the name representing all the noise processes summarized in Table 4.1. Note that 79 80 5,, Characteristics of the noise processes wfl IID white noise with distribution U [—0.5, 0.5] v,, Obtained by g-(wn + 0.1) 2,. Zero mean binary Bernoulli sequence obtained from {wn} t,, Biased binary Bernoulli sequence obtained from {wn} Table 4.1. Noise processes used in the simulations in this chapter. the sequence {2"} in Table 4.1 is obtained by 0.5 wfl Z 0 Zn: —0.5 112,, < 0 and {t,,} is obtained by 0.5 10,, 2 -0.081 t,,: -0.5 ton < —0.081 4.1 Performance of SM-SA for Time-Invariant Systems In this section, an AR(3) model given by yn = Glyn-1 + (Wyn—2 + “33111—3 + Sn (4.1) is considered. This model has three poles at 0.8 :l: 0.2j and 0.5 when a1 = 2.1,a2 = —1.48, and a3 = 0.34. For these simulations, a set of 50,000 data points is generated to identify the model parameters using {5...} with 5,3 5 0.25 (4.2) Note that Sfl should be replaced by the random sequences given in Table 4.1. 81 4.1.1 SM-SA with Pseudo-Random Excitation In this section, the model (4.1) is driven by a pseudo random sequence {wn} with mean fl = -8.9 x 10", which is obtained using a C-language version of the machine-independent random number generator URAND describe in [11]. Only 1.58% of data points are selected by SM-SA. Figure 4.1 shows the non-increasing volume measure 11,, on a logarithmic scale for the model in (4.1). Figure 4.2 shows that the estimation errors Isnl falls into the dead zone [0, 0.5]. The first 1000 SM-SA estimates can be seen in Fig. 4.3. We observe that the estimates are “almost” consistent and converge at a reasonable speed with only 1.58% of the total data points used. Figure 4.4 illustrates the weighting sequence {A;} averaged over six experiments. It shows that the averaged {A;} is type l/n sequence, which implies that it satisfies Robins- Monro’s conditions for convergence. 
1o1o 10-10 _ Figure 4.1. The logarithmic plot of 11,. versus time n for the linear time-invariant AR(3) model with pseudo-random excitation. For LSE algorithms, the estimates are consistent when the noise excitation is white. If 82 Figure 4.2. The logarithmic plot of lenl versus time n for the linear time-invariant AR(3) model with pseudo random excitation. 2 " "' I ' I ' 'I - 1s « 1 .. as _ __ - _ ____ - o . -os . .1 4 -15 -. --- T - - . ----.-- 200 400 n soo 800 1000 Figure 4.3. First 1000 estimates of the parameters in the AR(3) model obtained using SM-SA and a pseudo-random excitation, where a; = 2.1, a; = -1.48, and 03 = 0.34. 83 4 10 _r '3 1O6r 1 " J A L4 .aaal A A 4 A AH 1° [0 A A A AAAA 1 A 2 3 10 1O 10 10 Figure 4.4. The relationship between logarithms of time and weight. we let 1),. = %(wfl -- 0.1), then the noise bound given in (4.2) still holds for {v,,} but the sample mean is now —-113. Figure 4.5 compares the performance of RLS and SM-SA. It is seen that RLS is severely affected by the bias of the noise excitation. The bias in SM-SA, however, is very small. Note that only 0.89% of the data points are used by SM-SA. 4.1.2 AR Models with Binary Bernoulli Excitation SM-SA algorithm is derived from bounded noise constraints. The weight A; is optimal in the sense that the feasible set is minimized given that the previous information is known. We may view the size of the feasible sets as a “confidence interval.” If the optimal weight exists most of the time, it is expected that the performance of SM-SA is much better than RLS because of the optimal weights. From the analyses in the previous chapter, we know that optimal weight A; exists when the magnitude of the noise is close to the bound. In this simulation, a zero-mean Bernoulli process {2”} is employed to drive the system. Let 7 = 0.25, then {2"} always hits the bounds. 80.11% of the data points are used. This is not surprising because {2"} is always at the boundaries and thus updating is carried out 84 True value a1 -2. 1 2.4 a . v w 2.35 - ‘4 2.3 2.25 2.2 2.15 2.1 2.05 2 x10‘ True value e2--1 .48 ' 1 '1 .2 f r I f -1 .3 ‘ -1.4 - SM-SA -1 .5 -1 .8 -1 .7 .. 1 '1'8 2 3 4 5 " x 10‘ True value 33-0.34 0.6 Y r U ‘I 0.5 ~ « 0.5 - J I 0.45 I q . 1 RLS 0.4 I H _ v -1 035‘ 4 0.3 SM-SA ~ 0.25 - " x 1 0‘ Figure 4.5. The estimates of model parameters by RLS and SM-SA for biased noise se- quence. 85 most of the time. One of the advantages of this extensive updating is that the volume of the ellipsoid converges very fast to a very small number as indicated in Fig. 4.6. Non-increasing volume when binary Bemoulli sequence is used I v Vfivvvv' *7 Figure 4.6. The logarithmic plot of pa versus time n when a binary Bernoulli sequence is used. A comprehensive comparison of SM-SA and RLS is shown in Fig. 4.7. If we replace {2"} by a non-zero mean Bernoulli sequence {t,,}, then, according to Fig. 4.8, the performance of SM-SA with 78.16% of the data points selected is not affected by the non-zero mean {t,,}. However, the RLS estimates are clearly influenced. 4.2 Performance of SM-SA for Time-Varying System In this section, a speech model with time-varying coefficients ya = 61(n)yn-1 + b2(n)yn—2 + b3(n)yn-3 + wn (4-3) 86 True value a1 -2. 1 2.137 f l I r v 2.12 RLS ‘ 2.1 1 - SM-SA 2.1 ‘r ‘. ‘ 2.09 . 2.08 . 2'07 1 1 5 2 2.5 3 x 10‘ True value a2--1.48 '1 .42‘ t r r f I '1 .43 l "' a; A , I am RLS j 4 '52 0.5 1 1.5 2 2.5 ' 3 x 10‘ True value a3-0.34 0.36 f l I I r 0.355 I _ SM-SA .4 1.5 2 2.5 3 x 1 0‘ Figure 4.7. 
First 30,000 estimates obtained using SM-SA and RLS when a binary Bernoulli sequence is used. 87 True value a1 -2. 1 RLS 2'1 1 I 2.1 1: SM'SA 2'09 0 5 1 1 5 2 2 5 3 x 10‘ True value a2--1.48 '1 .46 I T I T I -1 48 J: #SILSAI I -1.5 I - ‘1 .52 I q -1 .54 I -1 RLS '1 .56 cl '1'“ 0.5 1 1 5 2 2 5 a ' x 10 True value 113-0 .34 0.4" I I I I I 0.39 RLS - 0.38 - 0.37 - .1 4 s22; l .I 1 1.5 2 2.5 3 x 1 0‘ Figure 4.8. First 30,000 estimates obtained using SM-SA and RLS when a biased binary Bernoulli sequence is used. 88 is considered. Simulations for speech signals have been performed by Odeh [28]. Here we use a similar experiment just to examine the performance of SM-SA. True parameters in this model are derived using short-term linear prediction analysis [37] of order two with a 256-point Hamming window and Levinson-Durbin recursion (see [37]) on an utterance of the word “four” by an adult male speaker. The acoustic waveform is shown in Fig. 4.9. The data were sampled at 10kHz after 4.7kHz low-pass filtering. While more meaningful The acoustic waveform of the word "lour‘ 1 000 I I r I I 800- 600' 400" 200'- 400'- -600' soc» - -1 000° 1 000 2000 3000 4000 5000 ' 6000 11 Figure 4.9. The acoustic waveform of the word “four”. analysis of speech would involve model orders of 10-14 [37], this small number of parameters is used here so that the results are easily illustrated [28]. The true parameters for the word “four” are shown in Fig. 4.10. A set of 5376 data points were generated by driving this model by 11:". In this simulation, only 2.15% of data points are used by SM-SA with good performance. Simulation results by using SM-SA and RLS algorithms can be seen from Figs. 4.11 to 4.12. Similar results by using SM-WRLS can be seen in [28] in which 1.86% of data points are used. In each figure there are two curves, one for the true parameters as reference, the other for the estimates obtained by using SM-SA or RLS. 89 -1 _ q -1.5 - « '2 1000 2000 3000 4000 5000 (a) 3 . . . . . 2.5 ~ - 2 _ . 1.5 - ' ~ 1 . - 0.5 - . 0 - _ -0.s - -1 . q -1.5 - - '2 1000 2000 @3000 4000 5000 Figure 4.10. The “true” parameters for the word “four”: (a) b1(n) (b) bg(n). 90 The power of the SM-SA algorithm is evident in this simulation. Upon comparing the estimates obtained by SM-SA and conventional RLS algorithms, we find that SM-SA shows better tracking capability than conventional RLS. For most algorithms, there is a trade- off between improved computational complexity and improved performance. For SM-SA, however, improved performance comes along with improved computational efficiency. More discussion about SM-SA in time-varying system is left in section 4.4. 4.3 Suboptimal Test More computational saving can be achieved by employing the suboptimal test. In Section 3.2.2 we have shown that many convergence properties of the suboptimal SM-SA are similar to the “optimal” SM-SA. In this section, simulations and comparisons to the conventional RLS are performed to verify the performance of SM-SA and its convergence properties. Only the time-invariant AR(3) model given in (4.1), will be examined. The reason we do not apply suboptimal test to speech signal is explained in Section 4.4. 4.3.1 AR Models with Pseudo-Random Excitation First the model (4.1) is driven by {wn}. Only 0.46% of data points are used by the subop- timal strategy. Compared with 1.58% in “optimal” SM-SA, this is a significant saving. But more importantly, the checking procedure is 0(m). Figure 4.13 shows the non-increasing volume measure pa in logarithmic scale. 
Fig. 4.14 shows that the estimation errors [5"] also falls into the dead zone [0,05]. The first 2000 estimates can be seen in Fig. 4.15. We see that the estimates are consistent and converge at a reasonable speed with only 0.46% of total data points used. Figure 4.16 illustrates that {A;} is a 1/n-type sequence, satisfying Robins and Monro’s conditions for convergence. Only 0.19% of data are used if {v,,} is the driving process, Note that the same ex- periment performed in Section 4.1.1 used 1.03% of the data points. Figure 4.17 compares the performance of RLS and SM-SA. We see that the effect of biased noise on suboptimal SM-SA is much less than it is on RLS. 91 CD estimates 0.5 1 0 d -0.5 q -1 I- .4 -1.5 P - '2 1000 2000 3000 4000 5000 (a) 3 I I I r I 2.5 - . estimate 1 000 2000 (b )3000 4000 5000 Figure 4.11. Estimates of bl(n) for the word “four” using (a) SM-SA (b) RLS. 92 2.5 - ‘ 2 r ' 1.5 r ‘ 1 r ‘ 0.5 ' ‘ 0 estimates - -0.5 -1 (- true " .15 . . '2 1600 2000 ao'oo 4000 so'oo (I) 25- I 2 r . 1.5 - . 1 '- . 0'5 ' estimate ‘ 0 q 05 - '1 P true I -1 5 ~ J .2 .03. 2000 (1)35“ .50. 5000 Figure 4.12. Estimates of b2(n) for the word “four” using (a) SM-SA (b) RLS. 93 Non-increasing volume using sub-optimal test ' ' ""'I V ' """I ' ' ‘ ""'I P' I """I "' ‘ ' p p p 5 10 r L p 0 10 P y p I I -6I 1O - p A_ A AAAAAAJ. A A AAAAAAI A LAAU‘A‘ A A A AAAAAl A A A O 1 2 3 4 10 10 10 10 10 n Figure 4.13. The logarithmic plot of 11,, versus time n for the AR(3) model when the suboptimal test is used. Estimation error for sub-optimal bet 1o . - - "m, - - - - - a - - Figure 4.14. The logarithmic plot of |sn| versus time n for the AR(3) model. 94 1 - -- true parameters , s -0.5 - “ -1 . .. -1.5 200 400 600 800 1000 1200 1400 1600 1800 2000 n Figure 4.15. First 2000 estimates of the parameters using SM-SA, where 01 = 2.1, 02 = -1.48, and 03 = 0.34. Figure 4.16. The relationship between logarithms of time and weight. 95 True value a1-2.1 2.4 I I I I 2.35 ‘ 2.3 .1 2.25 ‘ 2.2 ' ‘ 1 _ FlL§ 2.15 - 2-‘ SM-SA I 2.05 ‘ " 1 2 3 4 5 x 10‘ True value a2--1 .48 '1 .2 I 1 I I -1 .3 r ' '1 .4 .1 SM-SA . -1 .5 g . .. '1 .8 . . hmA WA RLS - -1.7 1 -1.3 . 1 ‘ ‘ ‘ ' 1 2 3 4 5 n x 1 0‘ True value a3-0.34 0.6 I I I I I 0.55 . 0.5 - 0.45 - II ‘ RLS 0.4 l ’ I - 0.35 - SM-SA 0.3 r 0.25 -1 0.2 l 1 1 1 1 2 3 4 5 n x 1 0‘ Figure 4.17. Estimates of parameter using RLS and SM-SA for the suboptimal test. 96 4.3.2 AR Models with Binary Bernoulli Excitation Let 7 = 0.25 and let the AR(3) model is driven by {2”}. In the simulation for suboptimal test, 49.86% of data points are used. The corresponding experiment for optimal SM-SA used 80.11% of data points. Figure 4.18 compares the performance between SM-SA and RLS. We find that SM-SA shows excellent performance in the convergence of estimates even for the suboptimal test. If we change the noise process to {t,,}, Fig. 4.19 shows that the performance of SM-SA is not affected by the bias of excitation, which is similar to the result for the “optimal” SM-SA. In this case, 48.91% of data points are used, while 78.16% data points are used in the corresponding optimal SM-SA. 4.4 More Studies on Time-varying Systems SM-SA algorithm is derived assuming a time-invariant system. In the previous simula- tion, we have seen that SM-SA has some tracking ability. The adaptability of SM-WRLS algorithm have been studied in previous work [28]. Although we do not intend to pur- sue theoretical analysis of SM-SA for adaptivity of SM-SA, we can make some heuristic descriptions. 
Consider the weighted covariance R(n) R(n) = (1 — A;)R(n — 1) + 1;):an The (1 — A;,) term in this equation can be viewed as a forgetting factor for the algorithm. The smaller the quantity A;, the lesser the impact of the current point. By controlling the magnitude of the weights, the algorithm can decide the importance between the new and old data. This is why SMoSA can trace some time-varying systems. In this situation, the ellipsoid cannot shrink too fast, and since the parameters are always changing, the ellipsoid cannot converge to zero either. A better way to estimate system parameters is to “over- estimate” the noise bound because we can prevent the ellipsoid from shrinking to a very small set. For the speech signal we discussed in this chapter, if we “over-estimate” the noise 97 Tme value a1 -21 2.137 I I I I I SM-SA ' 2'07 0.5 1 1 .5 2 2.5 3 " x10‘ True value a2--1 .48 '1 .42 I I I I I - A, 8M5 -1 .51 RLS ‘ '1 '52 ’ 0.5 1 1.5 2 2.5 3 " x 1 0‘ True value a3-0.34 0.36 I 1 f I If RLS 1- - 1 2 f SM-SA 4 0.315 ] - 0'31 0.5 1 1 .5 2 2.5 3 " x10‘ Figure 4.18. First 30,000 estimates obtained by SM-SA and RLS when a zero mean binary Bernoulli sequence is used and suboptimal test is employed. 98 True value a1 =21 I 2.09 ‘0 x10 True value a2a-1 .48 I I I I -1.46 -1.48 -1.5 -1 .52 r -1.54 -1.56 -1 .58 True value a3=034 T I I I 0.39 0.33 RLS 0.37 0.36 0.35 0.34 — S—uzM'SA .— h x10‘ Figure 4.19. First 30,000 estimates obtained by SM-SA and RLS when a bias binary Bernoulli sequence is used and suboptimal test is employed. 99 bound so that the bound be chosen as 3.0 I/\ O 0" but the real bound is 103, S 0.25 we will find that the algorithm has better tracking ability as shown in Fig. 4.20. In this case 4.09% of data points are used, which is more than before. Note that SM-SA is developed for time-invariant systems. If it works in a time-varying system, some assumptions may be violated, for example 5,, may become negative. Thus it is not appropriate to apply the suboptimal test to time-varying system since a negative 6;, may not imply a negative c,,. Some simulations show that if we apply the suboptimal test to time-varying systems, the results are often not encouraging. 100 C1) 1WD 2000 3000 4000 5000 (a) 2.5 r - 2 - - 1.5 ~ - 1 .. .- 0.5 r - o .- .1 -0 5 M true -1 - I 'L v ‘1'" HJ—i estimate -1.5 '- 1 Figure 4.20. Estimates of (a) b1(n) (b) 02(11) by using SM-SA algorithm for the word “four” when noise bounds are over-estimated. Chapter 5 The Convergence Properties of UOBE Algorithms UOBE algorithm, a general class of OBE algorithms, is based on generalized WRLS in which very broad classes of “forgetting factors” and data weights may be employed. Histor- ically, this research on the SM-SA algorithm, and that of Krunz on the “Dual-SM-WRLS” algorithm [33, 34], precedes and contributed to the development of UOBE. In this chapter, we will generalize the convergence properties of SM-SA to all UOBE algorithms and show the significant implication of this research to broader applications. A brief review of UOBE is given here. Details are found in [38, 39]. 5.1 The UOBE Algorithm Given the system in (1.5) and the noise bound in (1.6), the hyper-ellipsoid set C(n) associ- ated with UOBE is defined by 0(n) = {o no — 0.0T 07“)“: - a.) s 1} where 0,, is the WRLS estimates obtained by minimizing 1 fl V(0) = ; Z m,,,,(yt. — £9)? 7:1 101 102 in which a,,wn_1,, 1' S n — 1 “’71 r — fin 1' = n and x,, is a scalar quantity, 5,, = 010000,. + i "’n... 
The matrix C(n) itself is fundamentally defined as the sum of the weighted outer products,

$$C(n) = \alpha_n\,C(n-1) + \beta_n\,x(n)\,x^T(n).$$

Different algorithms in UOBE are distinguished by the weighting sequences $\{\alpha_n\}$ and $\{\beta_n\}$, in conjunction with the optimization criterion employed in selecting them. Three criteria have been applied to the optimization process:

Optimization Criterion 1. Minimize the determinant of the inverse ellipsoid matrix, $\mu_v\{\Omega(n)\} \triangleq \det\{\kappa_n C^{-1}(n)\}$.

Optimization Criterion 2. Minimize the trace of the inverse ellipsoid matrix, $\mu_t\{\Omega(n)\} \triangleq \operatorname{tr}\{\kappa_n C^{-1}(n)\}$.

Optimization Criterion 3. Minimize the parameter $\kappa_n$.

Criteria 1 and 2 were first suggested by Fogel and Huang [14], while Criterion 3 was used by Dasgupta and Huang [20]. In the single-output case, in which $\Omega(n)$ is clearly interpretable as an ellipsoid in $\mathbb{R}^m$, $\mu_v\{\Omega(n)\}$ is proportional to the square of the volume of the ellipsoid, while $\mu_t\{\Omega(n)\}$ is proportional to the sum of the squares of its semi-axes. Criterion 3 has been used in conjunction with a specific weighting strategy to achieve a rigorous convergence proof in a certain sense [20].

Let us adopt the policy of writing the weights $\alpha_n$ and $\beta_n$ as functions of a single parameter to be optimized at time n, say $\lambda_n$, so that

$$\alpha_n = \alpha_n(\lambda_n), \qquad \beta_n = \beta_n(\lambda_n), \qquad (5.1)$$

where $\{\alpha_n\}$ and $\{\beta_n\}$ should now be considered sequences of functions. If we choose $\alpha_n(\lambda_n) = 1$ and $\beta_n(\lambda_n) = \lambda_n$, then UOBE is exactly the SM-WRLS algorithm. SM-SA is obtained by choosing $\alpha_n(\lambda_n) = 1 - \lambda_n$ and $\beta_n(\lambda_n) = \lambda_n$ together with Optimization Criterion 1. The extension of the optimization process of SM-SA to Criterion 2 is left as future work.

5.2 Convergence Properties of UOBE

The most important theorem in the theoretical framework of UOBE is the following [38, 39]:

Theorem 5.1. Consider a UOBE algorithm in which volume or trace is to be minimized. Then the following are independent of the choices of the function sequences $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$:

1. the sequence of measures of optimality ($\{\mu_v(n)\}$ or $\{\mu_t(n)\}$);
2. the data points selected (the times at which $\lambda_n^* > 0$);
3. the parameter estimates $\theta_n$.

However, in a UOBE algorithm with $\kappa_n$ minimization, none of these items is independent of the sequences $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$.

Theorem 5.1 states that any UOBE algorithm with volume minimization, regardless of the weighting policy, will "behave" identically in these three senses. Using this theorem, we can extend the convergence properties from SM-SA to UOBE:

Corollary 5.1. Consider the UOBE algorithms in which volume is to be minimized. Let $\{t_n\}$ be the subsequence of times at which an update takes place. If there exist $N \in \mathbb{N}$ and $\rho_1, \rho_2 > 0$ such that for all n

$$\rho_1 I < \sum_{k=n+1}^{n+N} x_k x_k^T < \rho_2 I, \qquad (5.2)$$

then $\lim_{n\to\infty} \|\theta_{t_n} - \theta_{t_{n-1}}\| = 0$. If, in addition to (5.2), the noise process $\{v_n\}$ is independent and, for any given $\varepsilon > 0$, there exists a $\delta > 0$ such that for all n

$$P(v_n \in D_{\varepsilon^+}) > \delta \quad\text{and}\quad P(v_n \in D_{\varepsilon^-}) > \delta,$$

where $D_{\varepsilon^+} = [\sqrt{\gamma} - \varepsilon,\, \sqrt{\gamma}]$ and $D_{\varepsilon^-} = [-\sqrt{\gamma},\, -\sqrt{\gamma} + \varepsilon]$, then the volume of the ellipsoid converges to zero and the estimates obtained using any UOBE algorithm converge to the true parameters.

This corollary is very important because, through the research on SM-SA, the convergence properties of all UOBE algorithms are established.¹ This implies that SM-SA has much broader implications for the entire class of UOBE algorithms.

¹The original OBE paper by Fogel and Huang [14] is sometimes misunderstood to indicate convergence of the bounding ellipsoid to a point under ordinary conditions on the data (see [38]).
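To make the recursions of Section 5.1 concrete, the following Python sketch performs a single UOBE-style update for a given pair of weights $(\alpha_n, \beta_n)$. It is a minimal illustration under stated assumptions, not a complete algorithm: the function name and calling convention are invented for the sketch, the selection of the optimal $\lambda_n$ by one of the three criteria is omitted, and the $\kappa_n$ update uses the recursive form that follows algebraically from the summation definition given above.

```python
import numpy as np

def uobe_step(theta, C, kappa, x, y, gamma_n, alpha_n, beta_n):
    """One generalized-WRLS (UOBE-style) update for given weights alpha_n, beta_n.

    theta : previous estimate theta_{n-1}        C     : previous matrix C(n-1)
    kappa : previous scalar kappa_{n-1}          gamma_n : pointwise noise bound, v_n^2 <= gamma_n
    """
    eps = y - x @ theta                              # prediction error epsilon_n
    G = x @ np.linalg.solve(C, x)                    # G_n = x_n^T C^{-1}(n-1) x_n
    C_new = alpha_n * C + beta_n * np.outer(x, x)    # C(n) = alpha_n C(n-1) + beta_n x_n x_n^T
    theta_new = theta + beta_n * np.linalg.solve(C_new, x) * eps
    kappa_new = (alpha_n * kappa + beta_n * gamma_n
                 - alpha_n * beta_n * eps**2 / (alpha_n + beta_n * G))
    return theta_new, C_new, kappa_new

# Example: one update with gamma_n = 0.25 and an SM-SA-type weight lambda_n = 0.1,
# i.e. (alpha_n, beta_n) = (0.9, 0.1); all numbers are illustrative.
theta0, C0, kappa0 = np.zeros(2), np.eye(2), 1.0
x1, y1 = np.array([1.0, 0.5]), 0.3
print(uobe_step(theta0, C0, kappa0, x1, y1, 0.25, alpha_n=0.9, beta_n=0.1))
```

Choosing $(\alpha_n, \beta_n) = (1, \lambda_n)$ reproduces an SM-WRLS step, while $(1 - \lambda_n, \lambda_n)$ gives an SM-SA step; by Theorem 5.1, under volume minimization both policies select the same data points and produce the same estimates.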
With Corollary 5.1, the convergence properties established for SM-SA now carry over to the entire UOBE class.

Chapter 6

Conclusion

6.1 Concluding Remarks

6.1.1 Summary of Landmark Developments of SM-Based Techniques

Set-membership-based algorithms are developed to estimate the parameters of a linear system in which the noise disturbance is additive and bounded. After the first publication of the OBE algorithm by Fogel and Huang [14] in 1982, SM techniques attracted the attention of many research groups in the signal processing field. However, the lack of a convergence proof has been a serious weakness of the OBE-like algorithms, although they show good performance in many applications. In order to remedy the convergence problem, Dasgupta and Huang proposed a modified OBE [20] in 1987. Instead of minimizing the volume of the ellipsoid, they minimize a scalar upper bound on the set, the parameter $\kappa_n$. This algorithm is convergent in the sense that the optimal weight and the difference of successive estimates converge to zero. However, it has been criticized by some researchers [30, 38, 39] for its lack of geometrical meaning.

The Adaptive Signal Processing and Speech Processing Laboratories at Michigan State University have been pioneers in research on SM-based techniques. The tradeoff between interpretability and convergence led to the study of SM-SA. During the course of this research, it was found in parallel work that almost all OBE algorithms can be combined into a unified framework called UOBE, and SM-SA is a special case of UOBE obtained by choosing $\alpha_n(\lambda_n) = 1 - \lambda_n$, $\beta_n(\lambda_n) = \lambda_n$, and volume minimization. The name SM-SA comes from the structural resemblance to the stochastic approximation method.

6.1.2 Contributions of this Research

Selective updating, which reduces the computational load, is the feature common to all members of UOBE. In contrast to RLS estimation, in which the parameter estimates are updated at every iteration, UOBE updates the estimates only if the incoming datum contains sufficient information. In addition to this advantage, the major contribution of this work is the establishment of an algorithm which is both interpretable and convergent. The interpretability of SM-SA comes from the optimization process in which the optimal weight is found by minimizing the volume of the ellipsoid defined by

$$\mu_v\{\Omega(n)\} = \det\{\kappa_n R^{-1}(n)\}.$$

The convergence properties come from the boundedness of the covariance matrix. In this work we have shown that:

1. The optimal weight $\lambda_n^*$ converges to zero.
2. The squares of the estimation errors fall asymptotically into a dead zone given by $[0, \gamma_n]$.
3. If the noise sequence $\{v_n\}$ is independent and the probabilities are non-zero in the neighborhoods of the noise bounds, then the volume of the ellipsoid converges to zero.
4. If the noise process is mixing, then under a slight modification of SM-SA the volume of the ellipsoid converges to zero.

More importantly, through the analysis of SM-SA, the convergence of estimates presented in Chapter 3 can be extended to all UOBE algorithms.

The other contribution is the illustration of the central difference between SM-SA and RLS. While both algorithms share the same recursive formulation, they show completely different behaviors. The reason is that the weight of SM-SA at time n is a random variable which depends on all previous and present noise samples, while the weight of conventional RLS or WRLS is deterministic. We have shown this difference in Chapters 3 and 4 by theoretical analysis and simulation studies.
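The contrast described above can be sketched in a few lines of Python. The rls_step below is a standard weighted RLS update with a deterministic weight; selective_step is only a schematic stand-in for set-membership-style selective updating. The innovation check (eps**2 <= gamma) and the weight_rule argument are placeholders, not the SM-SA optimal-weight computation derived in Chapters 2 and 3.

```python
import numpy as np

def rls_step(theta, P, x, y, w=1.0):
    """Standard weighted RLS update; the weight w is deterministic."""
    eps = y - x @ theta
    Px = P @ x
    k = w * Px / (1.0 + w * (x @ Px))        # gain vector
    return theta + k * eps, P - np.outer(k, Px)

def selective_step(theta, P, x, y, gamma, weight_rule):
    """Schematic set-membership-style step: update only when the datum
    appears informative.  The test and weight_rule are placeholders, not
    the SM-SA innovation test or optimal weight."""
    eps = y - x @ theta
    if eps**2 <= gamma:                      # placeholder check: skip uninformative data
        return theta, P, False
    w = weight_rule(eps, x, P)               # data-dependent (hence random) weight
    theta, P = rls_step(theta, P, x, y, w)
    return theta, P, True
```

Even in this toy form the difference is visible: the weight passed to rls_step inside selective_step is a function of the observed error and is therefore a random variable, whereas plain RLS applies a deterministic weight, and an update, at every sample.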
6.2 Future Work

This research has been concerned with the theoretical analysis of SM-SA, a special case of UOBE. While we have seen in Chapter 4 that SM-SA performs well in tracking time-varying parameters, we should not rely on an algorithm developed in a time-invariant setting to perform well in a time-varying environment. It is of practical interest to develop an adaptive SM-SA, since most engineering applications deal with time-varying models. Secondly, we have shown the convergence of SM-SA only under volume minimization; while it is believed that the convergence properties under trace minimization are similar, a rigorous proof is still needed. Finally, it is often difficult to obtain an accurate estimate of the noise bounds. In order not to violate the bounded-error assumption, it is natural to choose conservative noise bounds; as we have discussed, this often leads to poor estimates. One way to solve this problem is to use soft noise bounds, under which the bounded-noise assumption may occasionally be violated. The author therefore suggests the study of a soft-bound SM-SA algorithm as further work.

In conclusion, the following topics are proposed for future research:

1. To develop an adaptive SM-SA algorithm.
2. To establish a convergence proof for SM-SA under trace minimization.
3. To extend the model from ARX to ARMAX.
4. To develop a soft-bound SM-SA algorithm.

APPENDICES

APPENDIX A

Infinite Product

Definition A.1. Given an infinite product $\prod_{n=1}^{\infty} u_n$, let $p_n = \prod_{k=1}^{n} u_k$. If no factor $u_n$ is zero, we say the product converges if there exists a number $p \neq 0$ such that $\{p_n\}$ converges to $p$. If $\{p_n\}$ converges to zero, we say the product diverges to zero.

The following theorem is given without proof [9]:

Theorem A.1. Assume that each $a_n \ge 0$. Then the product $\prod (1 - a_n)$ converges if, and only if, the series $\sum a_n$ converges.

BIBLIOGRAPHY

[1] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Stat., vol. 22, pp. 400-407, 1951.

[2] J. Blum, "Multidimensional stochastic approximation method," Ann. Math. Stat., vol. 25, pp. 737-744, 1954.

[3] N.N. Krasovskii, "On the theory of controllability and observability of linear dynamic systems," Prikl. Mat. Mekh., vol. 28, no. 1, 1964.

[4] F.C. Schweppe, "Recursive state estimation: Unknown but bounded errors and system inputs," IEEE Trans. Automatic Control, vol. AC-13, pp. 22-28, February 1968.

[5] H.S. Witsenhausen, "Sets of possible states of linear systems given perturbed observation," IEEE Trans. Automatic Control, vol. AC-13, pp. 556-568, October 1968.

[6] P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.

[7] D.P. Bertsekas and I.B. Rhodes, "Recursive state estimation for a set-membership description of uncertainty," IEEE Trans. Automatic Control, vol. AC-16, pp. 117-128, April 1971.

[8] J.M. Mendel, Discrete Techniques of Parameter Estimation: The Equation Error Formulation, Marcel Dekker, New York, 1973.

[9] T.M. Apostol, Mathematical Analysis, Addison-Wesley, 1975.

[10] R.B. Ash and M.F. Gardner, Topics in Stochastic Processes, Academic Press, 1975.

[11] G.E. Forsythe and M.A. Malcolm, Computer Methods for Mathematical Computations, Prentice-Hall, 1977.

[12] L. Ljung, "Analysis of recursive stochastic algorithms," IEEE Trans. Automatic Control, vol. AC-22, pp. 551-575, October 1977.

[13] E. Fogel, "System identification via membership set constraints with energy constrained noise," IEEE Trans. Automatic Control, vol. AC-24, pp. 752-758, October 1979.
[14] E. Fogel and Y.F. Huang, "On the value of information in system identification - bounded noise case," Automatica, vol. 18, pp. 229-238, 1982.

[15] B.D.O. Anderson and C.R. Johnson, Jr., "Exponential convergence of adaptive identification and control algorithms," Automatica, vol. 18, pp. 1-13, January 1982.

[16] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, Cambridge, MA: MIT Press, 1983.

[17] R.R. Bitmead, "Persistence of excitation conditions and the convergence of adaptive schemes," IEEE Trans. on Information Theory, vol. IT-30, no. 2, pp. 183-191, March 1984.

[18] A. Papoulis, Probability, Random Variables and Stochastic Processes, 2nd edition, McGraw-Hill, 1984.

[19] G.C. Goodwin and K.S. Sin, Adaptive Prediction, Filtering, and Control, Prentice-Hall, 1984.

[20] S. Dasgupta and Y.F. Huang, "Asymptotically convergent modified recursive least-squares with data-dependent updating and forgetting factor for systems with bounded noise," IEEE Trans. on Information Theory, vol. IT-33, pp. 383-392, May 1987.

[21] T. Söderström and P. Stoica, System Identification, Prentice-Hall, 1988.

[22] K.J. Åström and B. Wittenmark, Adaptive Control, Addison-Wesley, 1989.

[23] J.P. Norton, "Recursive computation of inner bounds for the parameters of a linear model," Int. J. Control, vol. 50, pp. 2423-2430, 1989.

[24] J.R. Deller, Jr. and S.F. Odeh, "Implementing the optimal bounding ellipsoid algorithm on a fast processor," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '89, Glasgow, vol. 2, pp. 1067-1070, May 1989.

[25] J.R. Deller, Jr., "Set-membership identification in digital signal processing," IEEE ASSP Magazine, vol. 6, pp. 4-22, October 1989.

[26] J.R. Deller, Jr. and T.C. Luk, "Linear prediction analysis of speech based on set-membership theory," Computer Speech and Language, vol. 3, pp. 301-327, October 1989.

[27] R.A. DeCarlo, Linear Systems, Prentice-Hall, 1989.

[28] S.F. Odeh, Algorithms and Architectures for Adaptive Set Membership-Based Signal Processing, Ph.D. dissertation, Michigan State University, 1990.

[29] A.K. Rao, Y.F. Huang, and S. Dasgupta, "ARMA parameter estimation using a novel recursive estimation algorithm with selective updating," IEEE Trans. on Acoust., Speech, and Signal Process., vol. 38, pp. 447-457, March 1990.

[30] J.P. Norton and S.H. Mo, "Parameter bounding for time-varying systems," Mathematics and Computers in Simulation, vol. 32, pp. 527-534, 1990.

[31] M.F. Cheung, On Optimal Algorithms for Parameter Set Estimation, Ph.D. dissertation, Ohio State University, 1991.

[32] M.F. Cheung, S. Yurkovich, and K.M. Passino, "An optimal volume ellipsoid algorithm for parameter set estimation," Proc. of the 30th Conference on Decision and Control, Brighton, pp. 969-974, December 1991.

[33] M. Krunz, Numerical Stability and Convergence Issues in the SM-WRLS Algorithm, M.S. Thesis, Michigan State University, 1992.

[34] M. Nayeri, J.R. Deller, Jr., and M.M. Krunz, "Convergence and colored noise issues in bounding ellipsoid identification," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '92, San Francisco, vol. 5, pp. 337-340, March 1992.

[35] M. Nayeri, J.R. Deller, Jr., and M.S. Liu, "Optimal bounding error algorithms with adaptive tracking capability," Proc. of the 26th Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, October 1992.

[36] M. Nayeri, M.S. Liu, and J.R. Deller, Jr., "An interpretable and converging set membership algorithm," Proc. IEEE Int. Conf.
Acoust., Speech, and Signal Process. '93, Minneapolis, April 1993.

[37] J.R. Deller, J.G. Proakis, and J.H.L. Hansen, Discrete-Time Processing of Speech Signals, New York: Macmillan, 1993.

[38] J.R. Deller, Jr., M. Nayeri, and S.F. Odeh, "Least squares identification with error bounds for real-time signal processing and control," Proceedings of the IEEE, to appear.

[39] J.R. Deller, Jr., M. Nayeri, and M.S. Liu, "Unifying the landmark developments in OBE processing," Int. Journal of Adaptive Control and Signal Process., to appear.