This is to certify that the dissertation entitled OPTIMAL BOUNDING ELLIPSOID ALGORITHMS WITH AUTOMATIC BOUND ESTIMATION presented by Tsung-Ming Lin has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering.

OPTIMAL BOUNDING ELLIPSOID ALGORITHMS WITH AUTOMATIC BOUND ESTIMATION

By Tsung-Ming Lin

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1996

ABSTRACT

OPTIMAL BOUNDING ELLIPSOID ALGORITHMS WITH AUTOMATIC BOUND ESTIMATION

By Tsung-Ming Lin

This research is concerned with new Optimal Bounding Ellipsoid algorithms with Automatic Bound Estimation (OBE-ABE) that can be applied to model parameter estimation, system identification, and related topics. To achieve convergence, conventional OBE algorithms require a priori knowledge of the noise bounds, which is unavailable in most real-world applications. It has been found that exact knowledge of the noise bounds is essential to the performance of OBE algorithms. The new OBE-ABE algorithm, its computationally efficient version, the Sub-OBE-ABE algorithm, and an adaptive version, the Adaptive Sub-OBE-ABE algorithm, introduced in this dissertation do not require this knowledge. With the help of the automatic bound estimation (ABE) procedure, the new algorithms have excellent performance with respect to convergence, speed of convergence, computational efficiency, and tracking capability.

Another distinctive feature of the new algorithms is convergence in colored-noise, or even non-stationary-noise, environments, which is theoretically impossible for other well-known algorithms, e.g., Recursive Least Square (RLS), Least Mean Square (LMS), and the Kalman-Bucy filter. Owing to these distinctive features, the new algorithms are expected to have wider applications than others.

Rigorous proofs of the almost sure convergence and convergence in probability of the new algorithms are provided for the cases of iid, mixing, ergodic, and non-stationary noises. Simulations in these noise cases are presented to support the proven convergence and to demonstrate other properties. The new algorithms are successfully applied to two real-world problems: the linear prediction analysis of speech and the blind-deconvolution problem.

Copyright by Tsung-Ming Lin 1996

To SUMA CHING HAI

Acknowledgments

I am indebted to my thesis advisor, Professor Majid Nayeri, for his encouragement, patient guidance, and the many hours he spent in discussion with me. Thanks are also due to my co-advisor, Professor John R. Deller, Jr., for his generous literature support, valuable comments, and help in reviewing my dissertation. I would also like to thank the other members of my Ph.D. guidance committee, Professor Clifford Weil and Professor Hassan K. Khalil, for their help and encouragement in reviewing my dissertation.
I would like to express my gratitude to my mother and my wife for their love, sacrifice, and financial support.

Contents

List of Tables
List of Figures
1 Introduction and Background
  1.1 Introduction
  1.2 Methods for Parameter Estimation
  1.3 Review of OBE Algorithms
  1.4 Motivation for New Algorithms
2 OBE Algorithms with Automatic Bound Estimation
  2.1 Introduction
  2.2 Formulation of OBE algorithms
  2.3 OBE-ABE Algorithm
  2.4 Sub-OBE-ABE Algorithm
  2.5 Adaptive Sub-OBE-ABE
3 Convergence Analysis
  3.1 Introduction
  3.2 Definitions
  3.3 Persistency of excitation condition
  3.4 Lemmas
  3.5 Almost Sure Convergence of OBE-ABE
  3.6 Convergence in Probability of OBE-ABE
  3.7 Almost Sure and Probability Convergence of Sub-OBE-ABE
  3.8 A Necessary Condition
4 Simulation Studies
  4.1 Introduction
  4.2 IID Noise Cases
    4.2.1 OBE-ABE vs. OBE
    4.2.2 Sub-OBE-ABE vs. OBE-ABE
  4.3 Colored Noise Cases
    4.3.1 OBE-ABE vs. OBE
    4.3.2 Sub-OBE-ABE vs. OBE-ABE
  4.4 Tracking Performance of Adaptive Sub-OBE-ABE
5 Application to Linear Prediction Analysis of Speech
  5.1 Introduction
  5.2 LP model of speech
  5.3 LP analysis using OBE-ABE
  5.4 Sub-OBE-ABE and computational complexity
6 Application to Blind Deconvolution
  6.1 Introduction
  6.2 IID input and IIR filter
  6.3 IID input and FIR filter
  6.4 Colored Input and IIR Filter
7 Conclusions
  7.1 Concluding Remarks
  7.2 Contributions and further work
Appendix A
Bibliography

List of Tables

2.1 The OBE-ABE algorithm.
2.2 The Sub-OBE-ABE algorithm.
2.3 The Adaptive Sub-OBE-ABE algorithm.

List of Figures

1.1 Block diagram of a model for a dynamic system.
1.2 OBE algorithm with γ_n = 0.8 (underestimated) on model (1.3). v_n ~ U(−1, 1).
1.3 OBE algorithm with γ_n = 1.2 (overestimated) on model (1.3). v_n ~ U(−1, 1).
1.4 Volumes of ellipsoids (after 100 runs) of SM-SA with γ_n = 1, 1.1, 1.5, respectively, and OBE-ABE (γ_0 = 1.5) on model (1.3). v_n ~ U(−1, 1).
4.1 Estimators of a_{1*} = 2. CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.2 CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.3 CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.4 Estimators of a_{1*} = 2. CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.5 CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.6 CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.7 Estimators of a_{1*} = 2. CASE 3: v_n ~ B(−0.5, 1), (4.3), for model (4.1).
4.8 CASE 3: v_n ~ B(−0.5, 1), (4.3), for model (4.1).
4.9 CASE 3: v_n ~ B(−0.5, 1), (4.3), for model (4.1).
4.10 CASE 4: The final ellipsoid and the trajectory of the estimator. v_n ~ B(−1, 1), (4.2), for model (4.4).
4.11 Estimators of a_{1*} = 2. CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.12 CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.13 Estimators of a_{1*} = 2. CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.14 CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.15 Estimators of a_{1*} = 2. CASE 5: v_n is colored in (4.5) for model (4.1).
4.16 CASE 5: v_n is colored in (4.5) for model (4.1).
4.17 CASE 5: v_n is colored in (4.5) for model (4.1).
4.18 Estimators of a_{1*} = −0.1. CASE 6: v_n is colored in (4.5) for model (4.6).
4.19 CASE 6: v_n is colored in (4.5) for model (4.6).
4.20 CASE 6: v_n is colored in (4.5) for model (4.6).
4.21 Estimators of a_{1*} = 2. CASE 7: Colored noise v_n ~ B(−0.5, 1), (4.7), for model (4.1).
4.22 CASE 7: Colored noise v_n ~ B(−0.5, 1), (4.7), for model (4.1).
4.23 CASE 7: Colored noise v_n ~ B(−0.5, 1), (4.7), for model (4.1).
4.24 CASE 8: The final ellipsoid and the trajectory of the estimator. v_n ~ B(−1, 1), (4.5), for model (4.4).
4.25 Estimators of a_{1*} = 2. CASE 5: v_n is colored in (4.5) for model (4.1).
4.26 CASE 5: v_n is colored in (4.5) for model (4.1).
4.27 Estimators of a_{1*} = −0.1. CASE 6: v_n is colored in (4.5) for model (4.6).
4.28 CASE 6: v_n is colored in (4.5) for model (4.6).
4.29 Non-adaptive OBE algorithm on the time-varying system (4.8). γ_n = 1.5.
4.30 Non-adaptive OBE (SM-SA) algorithm on the time-varying system (4.8). γ_n = 1.5.
4.31 Adaptive Sub-OBE-ABE algorithm on the time-varying system (4.8). p = 4.5%.
4.32 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 3%.
4.33 Adaptive Sub-OBE-ABE algorithm on the time-varying system (4.8). p = 5.5%.
4.34 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 3.5%.
4.35 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 7.5%.
4.36 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 5.5%.
4.37 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 10%.
4.38 Adaptive Sub-OBE-ABE on the time-varying system (4.8) (v_n ~ B(−1, 1)). p = 31%.
4.39 Adaptive Sub-OBE-ABE on the time-varying system (4.8) (v_n ~ B(−1, 1)). p = 33%.
5.40 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.41 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.42 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.43 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.44 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.45 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.46 Spectra of voiced /I/ phoneme. Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.47 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.48 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.49 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.50 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.51 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.52 Spectra of simulated voiced /u/ phoneme. Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation method (solid line).
5.53 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.54 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.55 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.56 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.57 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.58 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
6.1 Model of the received signal in the blind-deconvolution problem.
6.2 Block diagram of the blind-deconvolution problem.
6.3 Estimates of a_{1*} of OBE (γ_n = 1) and OBE-ABE (γ_0 = 10) for Fig. 6.2 (iid-IIR, S/N = 42 dB).
6.4 Volumes of ellipsoids of OBE (γ_n = 1) and OBE-ABE (γ_0 = 10) for Fig. 6.2 (iid-IIR, S/N = 42 dB).
6.5 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (iid-IIR, S/N = 42.5 dB).
6.6 Estimates of a_{1*} of OBE (γ_n = 1) and OBE-ABE (γ_0 = 10) for Fig. 6.2 (iid-IIR, S/N = 36.5 dB).
6.7 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (iid-IIR, S/N = 36.5 dB).
6.8 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (iid-FIR, S/N = 42.5 dB).
6.9 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (iid-FIR, S/N = 36.5 dB).
6.10 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (colored-IIR, S/N = 36.5 dB).
6.11 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (colored-IIR, S/N = 22.5 dB).
Chapter 1

Introduction and Background

1.1 Introduction

From engineering to social science, people use modeling to help understand and describe observed phenomena. For example, in economics, modeling is used to predict inflation and to study the trend of stock markets. In engineering fields, a dynamic system or signal is usually modeled before further analysis or processing. In general, a model can be described as in Fig. 1.1, in which the noise (or disturbance, or model-error) sequence {v_n} is unobservable.

Modeling is usually classified into two categories: nonparametric and parametric [42]. A nonparametric model is described by a curve, table, or function. Transient analysis and correlation analysis are two well-known methods for studying nonparametric models. A parametric model is usually described by difference equations (or differential equations in continuous time) with system parameters. Typical parametric models are the autoregressive model with exogenous input (ARX), the autoregressive moving-average model with exogenous input (ARMAX), and the state-space structure. Basic laws from physics or science are usually employed to construct a parametric model. However, in many applications, the dynamic systems are too complex to model using physical laws alone.

Figure 1.1: Block diagram of a model for a dynamic system (system block H(z)).

Hence, numerous system-identification or parameter-estimation techniques have been developed to estimate the unknown parameters in a model. Some well-known methods are briefly reviewed in the next section. The main objective of this research is to devise a new algorithm, the optimal bounding ellipsoid with automatic bound estimation (OBE-ABE) algorithm, and its variants for the estimation of model parameters. The new algorithms require less restrictive a priori knowledge and conditions than conventional methods for the convergence (or consistency) of the estimator. In addition, the performance of the new algorithms with respect to speed of convergence, computational efficiency, and tracking capability is superior to that of conventional methods.

1.2 Methods for Parameter Estimation

There is a variety of methods for estimating model parameters, among them minimum mean square error (MMSE), maximum likelihood (ML), and least square error (LSE) [22, 42]. They are all batch methods. Well-known recursive (or real-time) methods are the recursive least square (RLS) [22], least mean square (LMS) [22], instrumental variable (IV) [42], Kalman-Bucy filter [42], and optimal bounding ellipsoid (OBE) [8, 11, 14, 20, 35] algorithms. One of the most popular methods in engineering applications is the LSE, or its recursive counterpart, RLS, since they require only the whiteness of the noise sequence {v_n} for convergence and have a simple structure with well-understood convergence performance under regular conditions [i.e., wide-sense stationarity (WSS) and second-moment ergodicity (SME)]. RLS has been modified to a weighted RLS (WRLS) [22, 42] using a forgetting factor for tracking time-varying parameters.

Since its invention fifteen years ago, OBE has attracted attention from people in real-time signal processing [11, 15], system identification [3, 10], adaptive control [24], and other fields in which computational efficiency and speed of convergence are critical and the statistics of the inputs and noise are unknown. However, its advantages (over those of the popular RLS) with regard to computational efficiency and faster convergence in some noise cases (see Chap. 4) are offset by the requirement of known noise bounds, which are unavailable in most applications.
Without a priori knowledge of the exact noise bounds, the performance of OBE suffers (Chap. 4). This is why OBE has rarely been found in real-world applications.

1.3 Review of OBE Algorithms

In 1968, Schweppe introduced the first bounding ellipsoid (BE) algorithm [41] for estimating the states of a state-space model under the assumption of bounded inputs and bounded noises. With this assumption, the BE algorithm produces at each step a feasible set in state space instead of the single point normally produced by other well-known methods, e.g., the Kalman-Bucy filter. The BE algorithm has a set of recursive formulae similar in structure to the Kalman-Bucy filter. Without the a priori knowledge of the statistics of the inputs and the noise that is usually required by other methods, BE has potential for wider applications. However, as pointed out by Schweppe, the feasible set of the algorithm at each step is usually impractical to calculate. He did not provide a convergence analysis of the BE algorithm either. Thereafter, Witsenhausen [47] and Bertsekas and Rhodes [4] published set-membership (SM) algorithms based on the same bounded-input and bounded-noise assumptions. Although SM has a smaller feasible set (a convex polytope; see Chap. 2) at each step, the mathematical analysis of SM is more complicated than that of BE.

In 1979, Fogel published a BE algorithm [19] for the identification of the ARX model with the implicit assumption that the noise sequence {v_n} has an asymptotically known accumulative energy bound. With this knowledge and the assumptions of white noise {v_n}, the regular WSS, and SME, Fogel showed, in a deterministic way, the convergence of his BE algorithm. In other words, the sequence of ellipsoids of the BE algorithm is shown to shrink asymptotically to a point (the true model parameter vector). Although the recursive formula of BE is similar in structure to RLS, the computation is less efficient than that of RLS.

In 1982, Fogel and Huang published the first OBE algorithm (F/H OBE) [20], which featured selective updating to ignore "redundant" data (in the OBE sense). This innovative feature improves the computational efficiency of Fogel's BE algorithm. However, the F/H OBE algorithm still has O(m^2) computational complexity due to the check for selective updating at each step (Chap. 2). Selective updating is achieved by variably weighting the incoming data for an optimal (maximum reduction in size) ellipsoid at each step. The optimization is based on the assumption that the noise bounds are known pointwise. If an incoming data set does not help shrink the ellipsoid, its weighting is set to zero; i.e., the incoming data set is "redundant" and hence discarded. In contrast to OBE, Fogel's BE algorithm has constant data weights (= 1). A detailed formulation of OBE algorithms can be found in the next chapter.

Fogel and Huang also provided a proof of convergence of OBE based on the assumption that the noise sequence is white and its pointwise upper bounds are known, in addition to the regular WSS and SME. However, the correctness of the proof is arguable (also see [13, p. 1913]) because the authors use in the proof the questionable assertion that convergence of Fogel's BE implies convergence of OBE. In fact, the whiteness of the noise is neither sufficient nor necessary for the convergence of the OBE algorithm. Theorems 3.4 and 3.6 in Chap. 3 validate this assertion.
In 1987, Dasgupta and Huang published another version of OBE (D/H OBE) [8] and showed the "convergence" of its estimator to a non-infinitesimal region about the true parameter vector under the same assumptions. An interesting and practically important feature of the algorithm is that it has an efficient O(m) check for innovation instead of the O(m^2) check in F/H OBE. However, comments are found in the literature [14, 37] on the lack of interpretability of its optimization criterion [14]; i.e., it minimizes, at each step, a quantity which is not directly related to the "size" of the ellipsoids.

Thereafter, other versions of the OBE algorithm, e.g., set-membership weighted recursive least squares (SM-WRLS) [11, 15, 37] and set-membership stochastic approximation (SM-SA) [11, 27, 35], were developed by Nayeri, Deller, and their students. SM-WRLS is the first algorithm to clearly relate the OBE philosophy to conventional WRLS (see [11]). They also showed that all versions of the OBE algorithm based on the same optimization criterion of minimizing the ellipsoid at each step are basically the same algorithm [14]. Specifically, F/H OBE, SM-WRLS, and SM-SA, though different in their weighting policies and computational complexities, have identical estimators, ellipsoids, selected data, and convergence performance. A suboptimal check for innovation was introduced to SM-WRLS that results in a computationally efficient version of the "interpretable" OBE algorithm [13, 37]. Further, using SM-SA, the first proof of convergence in probability (p. convergence) [17] for OBE algorithms was accomplished under the assumptions of independent identically distributed (iid) noise and symmetric bounds.

The last OBE-like algorithm seeking to improve the performance of OBE in this review is the optimal volume ellipsoid (OVE) algorithm proposed by Cheung et al. [7] in 1991. The algorithm uses an affine transformation [21] to get the smallest ellipsoid at each step. The simulations in [7] show that the ellipsoid at each step is smaller than those of OBE. However, the estimator of OVE, at the expense of computational cost, does not show significant improvement over that of OBE (also see [27, p. 6]). A proof of "convergence" of OVE similar to that of D/H OBE was provided in [7].

In 1991, Veres and Norton published a paper [44] presenting the first p. convergence result (under very general conditions) for the exact polytope algorithm (EPA), which is an SM algorithm with a polytope feasible set at each step. Although they also provided a proof of almost sure (a.s.) convergence (or strong convergence in their notation, with an equivalent definition) for the EPA, the justification is arguable. What they have proved for a.s. convergence [on a probability space (Ω, F, P)] is:

P(⋃_{n=1}^{∞} {ω : ‖θ_n − θ_*‖ < ε}) = P( lim_{m→∞} ⋃_{n=1}^{m} {ω : ‖θ_n − θ_*‖ < ε}) = 1, ∀ ε > 0    (1.1)

or, equivalently, after suppressing ω for brevity,

P(⋂_{n=1}^{∞} (‖θ_n − θ_*‖ > ε)) = lim_{m→∞} P(⋂_{n=1}^{m} (‖θ_n − θ_*‖ > ε)) = 0, ∀ ε > 0    (1.2)

which is not sufficient to imply a.s. convergence. This becomes evident upon comparing (1.1) with (3.3), or (1.2) with (3.6).
Lack of proof of convergence for OBE in more realistic noise cases (than symmetric iid noise) is another (minor) rea- son. Since the noise sequence {v,,} is usually unobservable in practice, exact noise bounds are not available in most applications. Without the knowledge of exact noise bounds, the performance of convergence and the speed of convergence of OBE suffers. If one or more bounds are underestimated, i.e., v: < 7,, at one or more n, then the algorithm is no longer theoretically valid. Simulations reveal that the underestimation of the noise bounds results in an inconsistent [17, 42] or diverging estimator. A conservative sequence of bounds {7,,} (overestimated bounds) will assure a meaningful ellipsoid at each n. However, recent work of Nayeri et al. [27, 35] has demonstrated that the estimator may be very imprecise, even asymptotically, if the bounds are too “loose.” To illustrate the effects of incorrectly estimated error bounds, an AR(3) model for Fig. 1.1 is constructed as follows: yn = 03X" + ”n (1.3) in which x,, = [y,,_1 , y,,_2, y,,_3]T is a sequence of observable vectors, 0,. = [2, —1.48, 0.34]T is the unknown parameter vector to be identified, and the unobservable noise sequence {v,,} is uniformly distributed and iid with constant bound 7,. = 1. Figure 1.2 shows 2.1 1.95 1.9 0.5 4511111;1 . 1 10002000300040005000600070000000900010000 Figure 1.2: OBE algorithm with 7,, = 0.8 (underestimated) on model (1.3). v,, ~ U ( — 1 , 1). the simulation result of OBE (SM-SA) algorithm (see Chap. 2 for details of the formu- lation) identifying this AR(3) model with under-estimated error bound 7,, = 0.8 for all n. As seen in the figure, the volume of the ellipsoid becomes negative quickly and the estimator diverges. This is the result of violating the underlying noise-bounding assumption of the algorithm. On the other hand, the simulation result of the algo- rithm using over-estimated error bound 7,, = 1.2 Vn is shown in Fig. 1.3. As seen in the figure, the estimator does not converge well to the true parameter (a, = 2 in the figures) even after 10,000 iterations. The speed of convergence of OBE becomes slower if a looser overbounding se- quence is employed. This is observed in Fig. 1.4 which shows the simulation results of 100 runs of the OBE algorithm on model (1.3). The numbers on the figure beside each dashed line are the estimated noise bounds. In this dissertation, a new algorithm, OBE-ABE, and its variants are devised to improve the applicability of OBE algorithms while preserving or improving the supe- rior performance of OBE algorithms in real-world applications. Proofs of a.s. conver- 8 1.“ .1 1. L l l l 1 I l l l 9 1WD 2000 3000 4W 5W 6N0 7WD 0000 9000 10000 VoiumesoiEiihooids 10" r I 1 j I I l I I i : : 10'0E..._:.. i .. .1 f . 5 i 10° k... .1 owoozoooaooowoosoooeooomoosooosooowooo Figure 1.3: OBE algorithm with 7,, = 1.2 (overestimated) on model (1.3). v,, ~ U(—1,1). 102 r - 10° . 260 430 e130 aéoioboaziaoiiooiiooiioozooo Figure 1.4: Volumes of ellipsoids (after 100 runs) of SM-SA with 7,, = 1,1.1,1.5 respectively, and OBE-ABE (70 = 1.5) on model (1.3). v,, ~ U (—l, 1). gence and p. convergence of the new algorithms in the cases of iid, mixing, ergodic, and non-stationary noise with asymmetric bounds are provided. The new algorithms have efficient computation, excellent convergence performance and tracking capabil- ity. The details of the new algorithms are found in the subsequent chapters. A summary of the major contributions of this research is found in Chap. 7. 
Chapter 2

OBE Algorithms with Automatic Bound Estimation

2.1 Introduction

In this chapter, the new algorithms, OBE-ABE and Sub-OBE-ABE, and the adaptive versions of these two algorithms are introduced. The algorithmic steps are listed in Tables 2.1-2.3. More details of these algorithms are found in subsequent chapters.

2.2 Formulation of OBE algorithms

A dynamic system or signal can often be modeled by a linear-in-parameters model

y_n = θ_*^T x_n + v_n    (2.1)

in which θ_* = [a_{1*}, ..., a_{p*}, b_{0*}, b_{1*}, ..., b_{q*}]^T is the unknown parameter vector to be identified; {x_n} is a sequence of observable vectors of dimension m = p + q + 1; and {v_n} is an unobservable noise or model-error sequence. An important special case is the autoregressive with exogenous input (ARX) [42] model, in which x_n = [y_{n−1}, ..., y_{n−p}, u_n, ..., u_{n−q}]^T is the observed data set composed of samples of the observable input sequence {u_n} and output sequence {y_n}.

All versions of OBE algorithms [8, 11, 14, 20, 35] are based on the premise that v_n has a pointwise bound that is known a priori,

v_n^2 ≤ γ_n, ∀ n ∈ N.    (2.2)

Let A_i denote the parameter set at time i such that all elements in A_i are feasible parameter estimates consistent with (2.2). In conjunction with (2.1), it is clear that A_i is a (hyper-)strip region which can be expressed as

A_i = {θ : θ ∈ R^m, (y_i − θ^T x_i)^2 ≤ γ_i}.

As time i goes from 1 to n, the feasible set at time n, B_n, is the intersection of the A_i, ∀ i ∈ [1, n], i.e., B_n = ⋂_{i=1}^{n} A_i. In general, B_n is a convex (hyper-)polytope in R^m which is mathematically difficult to track. Some SM algorithms are based on minimizing B_n at each step, while OBE algorithms are based on minimizing a (hyper-)ellipsoid E_n [see (2.3)] which is a superset of B_n. It can be shown [14] that the ellipsoid associated with any OBE algorithm is given by

E_n ≝ {θ : (θ − θ_n)^T P_n^{−1} (θ − θ_n) ≤ κ_n}    (2.3)

where θ_n, κ_n, and the matrix P_n are computed recursively using

G_n = x_n^T P_{n−1} x_n    (2.4)

ε_n = y_n − θ_{n−1}^T x_n    (2.5)

P_n = (1/α_n) [ P_{n−1} − β_n P_{n−1} x_n x_n^T P_{n−1} / (α_n + β_n G_n) ]    (2.6)

θ_n = θ_{n−1} + β_n P_n x_n ε_n    (2.7)

κ_n = α_n κ_{n−1} + β_n γ_n − α_n β_n ε_n^2 / (α_n + β_n G_n).    (2.8)

Recursions (2.4)-(2.8) comprise a general OBE algorithm. The ellipsoid center θ_n can be used as an estimator of the parameters θ_* at each n. The matrix P_n^{−1} satisfies the following recursion:

P_n^{−1} = α_n P_{n−1}^{−1} + β_n x_n x_n^T.    (2.9)

The OBE algorithm is usually initialized with θ_0 = 0, κ_0 = ρ, and P_0 = (1/ρ) I, where ρ is a small number, typically 10^{−3}. In almost every OBE algorithm, the nonnegative data weights {α_n} and {β_n} are calculated according to an optimization criterion which minimizes the "size" of the set E_n at each iteration.¹ When such optimal weights do not exist, the updating need not take place. This prominent feature is called selective updating. OBE algorithms are distinguished from one another by the choices of weighting sequences {α_n} and {β_n} [14]. For example, the SM-SA algorithm has α_n = 1 − λ_n and β_n = λ_n, in which λ_n is given by

λ_n = (−b_n + √(b_n^2 − 4 a_n c_n)) / (2 a_n), if c_n < 0;  λ_n = 0, if c_n ≥ 0

where

a_n = m γ_n − m ε_n^2 + m G_n^2 γ_n − 2 m G_n γ_n − κ_{n−1} G_n + κ_{n−1} G_n^2 + G_n γ_n − G_n^2 γ_n − ε_n^2 G_n

b_n = 2 m ε_n^2 − 2 m γ_n + 2 m G_n γ_n + 2 κ_{n−1} G_n − κ_{n−1} G_n^2 − G_n γ_n + ε_n^2 G_n

c_n = m γ_n − m ε_n^2 − κ_{n−1} G_n.    (2.10)

¹The exception is the D/H OBE algorithm [8], in which κ_n is minimized at each iteration.
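The recursions above translate directly into code. The sketch below is a minimal illustration of one generic OBE update, ours rather than the author's implementation; the weight pair (α_n, β_n) must be supplied by a specific OBE variant such as SM-SA, and all identifiers are hypothetical.

```python
import numpy as np

def obe_step(theta, P, kappa, x, y, gamma, alpha, beta):
    """One generic OBE recursion, following (2.4)-(2.8)."""
    G = x @ P @ x                                   # (2.4)
    eps = y - theta @ x                             # (2.5) prediction error
    Px = P @ x
    P_new = (P - beta * np.outer(Px, Px) / (alpha + beta * G)) / alpha  # (2.6)
    theta_new = theta + beta * (P_new @ x) * eps    # (2.7)
    kappa_new = (alpha * kappa + beta * gamma
                 - alpha * beta * eps**2 / (alpha + beta * G))          # (2.8)
    return theta_new, P_new, kappa_new

# Typical initialization, as in the text: theta_0 = 0, kappa_0 = rho,
# P_0 = (1/rho) I with rho small, e.g., 1e-3.
m, rho = 3, 1e-3
theta0, P0, kappa0 = np.zeros(m), np.eye(m) / rho, rho
```

Note that, exactly as in (2.7), the center update uses the already-updated matrix P_n, not P_{n−1}; swapping the order is a common implementation bug.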
2.3 OBE-ABE Algorithm

While OBE algorithms require a priori knowledge of exact noise bounds to achieve convergence and a fast speed of convergence (see Fig. 1.4 and Theorem 3.6 in Chap. 3), the new algorithm, OBE-ABE, does not require this knowledge. OBE-ABE theoretically requires a lower bound on a tail probability but practically does not (see Remark 2 after Theorem 3.3). Initialized with any overestimated noise bound, OBE-ABE automatically (or blindly) updates its estimated noise bound γ_n toward the unknown true bound γ_*, while concurrently shrinking the ellipsoids to obtain a consistent estimate. The OBE-ABE algorithm is given in Table 2.1. As seen in the table, the ABE procedure (below "Otherwise") features selective updating and is computationally negligible, hence preserves the computational efficiency of OBE. Further, the ABE procedure guarantees the convergence of the estimator, increases the speed of convergence, and increases the efficiency of updating the estimator (see Chap. 4). The convergence analysis of the OBE-ABE algorithm is found in Chap. 3.

I. Initialization:
  1. θ_0 = 0, κ_0 = ρ, and P_0 = (1/ρ) I, where ρ is a small number, typically 10^{−3}.
  2. γ_0 = any overestimated bound.
  3. Choose ε (small positive number), M, and N (see Remark 2 following Theorem 3.3).
II. Recursion: For n = 1 : N
  If c_n < 0, execute the recursions
    G_n = x_n^T P_{n−1} x_n
    ε_n = y_n − θ_{n−1}^T x_n
    P_n = (1/α_n)[P_{n−1} − β_n P_{n−1} x_n x_n^T P_{n−1}/(α_n + β_n G_n)]
    θ_n = θ_{n−1} + β_n P_n x_n ε_n
    κ_n = α_n κ_{n−1} + β_n γ_n − α_n β_n ε_n^2/(α_n + β_n G_n).
  Otherwise, if a time interval I of length M over which c_n ≥ 0 is found,
    γ_n = γ_{n−1} − d_J, if d_J > 0;  γ_n = γ_{n−1}, otherwise,
  where d_J ≝ κ_{J−1} G_J / m − ε(2√γ_{n−1} − ε) and J = arg max_{n∈I} ε_n^2.

Table 2.1: The OBE-ABE algorithm.

2.4 Sub-OBE-ABE Algorithm

In optimal form, all versions of the OBE algorithm except D/H OBE have an O(m^2) check for innovation at each step. Hence, the computational complexity of OBE algorithms is similar to that of the popular RLS. To benefit fully from the selective-updating feature and achieve computational superiority over RLS, a modification of the check for innovation to an O(m) operation is necessary. One approach was developed by Deller et al. [11, 13]. The modified OBE algorithm (the Sub-OBE algorithm) in [11, 13] is of O(pm^2) complexity, with p indicating the fraction of data found to be innovative. Hence, for small m (< 10), the Sub-OBE algorithm is an O(m) algorithm, since p is typically near 0.1 for a uniformly distributed {v_n}.

However, the same O(m) checking formula of the Sub-OBE algorithm cannot be applied directly to the OBE-ABE algorithm, since the ABE procedure would fail [(3.61) would no longer hold]. Hence, the Sub-OBE-ABE algorithm is endowed with another O(m) checking formula for innovation:

c̄_n = m(γ_n − ε_n^2) − κ_{n−1} x_n^T x_n / d_{n−1}

where d_n has the recursion²

d_n = α_n d_{n−1} + β_n x_n^T x_n.

Justification of this check is found in Chap. 3. The steps of the Sub-OBE-ABE algorithm are shown in Table 2.2. The Sub-OBE-ABE algorithm is compared to the OBE-ABE algorithm with regard to speed of convergence and computational complexity in Chap. 4.

²A preliminary version of the Sub-OBE-ABE algorithm takes 1/d_n to be the minimum eigenvalue of P_n, which preserves selective updating; Nayeri modified it to this O(m) recursion.
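A sketch of this check is given below. It is ours, with hypothetical names, and the formula for c̄_n is as reconstructed above; it shows why the check costs only O(m): both x_n^T x_n and the scalar d_n avoid any product with the m×m matrix P_{n−1}.

```python
import numpy as np

def sub_obe_check(theta, kappa_prev, d_prev, x, y, gamma, m):
    """O(m) innovation check of Sub-OBE-ABE: returns (needs_update, cbar)."""
    eps = y - theta @ x
    cbar = m * (gamma - eps**2) - kappa_prev * (x @ x) / d_prev
    return cbar < 0.0, cbar

def d_step(d_prev, x, alpha, beta):
    # d_n = alpha_n d_{n-1} + beta_n x_n^T x_n; a scalar shadow of the
    # O(m^2) recursion (2.9) for P_n^{-1}.
    return alpha * d_prev + beta * (x @ x)
```

Since 1/d_n lower-bounds the minimum eigenvalue of P_n (footnote 2), x_n^T x_n / d_{n−1} never exceeds G_n, so c̄_n ≥ c_n and an update triggered by c̄_n < 0 is always a legitimate OBE update.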
I. Initialization:
  1. θ_0 = 0, κ_0 = ρ, d_0 = 1/ρ, and P_0 = (1/ρ) I, where ρ is a small number, typically 10^{−3}.
  2. γ_0 = any overestimated bound.
  3. Choose ε (small positive number), M, and N (see Remark 2 following Theorem 3.3).
II. Recursion: For n = 1 : N
  If c̄_n < 0, execute the recursions
    G_n = x_n^T P_{n−1} x_n
    ε_n = y_n − θ_{n−1}^T x_n
    P_n = (1/α_n)[P_{n−1} − β_n P_{n−1} x_n x_n^T P_{n−1}/(α_n + β_n G_n)]
    θ_n = θ_{n−1} + β_n P_n x_n ε_n
    κ_n = α_n κ_{n−1} + β_n γ_n − α_n β_n ε_n^2/(α_n + β_n G_n)
    d_n = α_n d_{n−1} + β_n x_n^T x_n.
  Otherwise, if a time interval I of length M over which c̄_n ≥ 0 is found,
    γ_n = γ_{n−1} − d_J, if d_J > 0;  γ_n = γ_{n−1}, otherwise,
  where d_J ≝ κ_{J−1} x_J^T x_J / (d_{J−1} m) − ε(2√γ_{n−1} − ε) and J = arg max_{n∈I} ε_n^2.

Table 2.2: The Sub-OBE-ABE algorithm.

2.5 Adaptive Sub-OBE-ABE

The tracking performance of adaptive OBE algorithms for time-varying parameters has been investigated in [11, 37, 38]. In those papers, adaptive OBE algorithms have been shown to have tracking capability superior to RLS, LMS, and their variants.

In [38], Rao and Huang modified the D/H OBE algorithm by resetting κ_{n−1}, whenever κ_n < 0, to a value C + K_1 or C + K_2, where K_1 and K_2 are two positive values obtained from the algorithm's parameters and data at time n, and C is a "safety factor" chosen to be 1. Equivalently, their adaptive OBE algorithm is obtained by modifying the D/H OBE algorithm to include the resetting of κ_{n−1} to a value greater than 1 whenever κ_n < 0. Resetting κ_{n−1} is equivalent to expanding the ellipsoid.

In [11, 37], Deller et al. proposed three methods, windowing, graceful forgetting, and selective forgetting, for modifying OBE algorithms to be adaptive to time-varying parameters. The basic spirit of those modifications is various manipulations of the forgetting factors (data weightings), similar to those of adaptive RLS (or WRLS). Among those methods, it is observed in [11, 37] that the best policy regarding tracking performance is the selective forgetting method, which back-rotates previously accepted data sets when κ_n < 0 until κ_n > 0. This is equivalent to expanding the ellipsoid according to a systematic criterion.

In this dissertation, an adaptive Sub-OBE-ABE algorithm is presented by modifying Sub-OBE-ABE to include the resetting of κ_n to 0.1 whenever κ_n < 0. Despite its simplicity and computational efficiency, the adaptive Sub-OBE-ABE algorithm has shown excellent tracking performance in both slow and fast time-varying parameter cases (see Chap. 4). The steps of the adaptive Sub-OBE-ABE algorithm are shown in Table 2.3.

I. Initialization:
  1. θ_0 = 0, κ_0 = ρ, d_0 = 1/ρ, and P_0 = (1/ρ) I, where ρ is a small number, typically 10^{−3}.
  2. γ_0 = any overestimated bound.
  3. Choose ε (small positive number), M, and N (see Remark 2 following Theorem 3.3).
II. Recursion: For n = 1 : N
  If c̄_n < 0, execute the same recursions as in Table 2.2, and then: if κ_n < 0, set κ_n = 0.1.
  Otherwise, if a time interval I of length M over which c̄_n ≥ 0 is found,
    γ_n = γ_{n−1} − d_J, if d_J > 0;  γ_n = γ_{n−1}, otherwise,
  where d_J ≝ κ_{J−1} x_J^T x_J / (d_{J−1} m) − ε(2√γ_{n−1} − ε) and J = arg max_{n∈I} ε_n^2.

Table 2.3: The Adaptive Sub-OBE-ABE algorithm.
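To close the chapter, the two ingredients added by Tables 2.1-2.3 can be summarized in code. The sketch below is ours (names hypothetical; eps_margin plays the role of the design constant ε): the ABE step that shrinks the bound estimate, and the adaptive reset of Table 2.3.

```python
import numpy as np

def abe_update(gamma_prev, kappa_Jm1, G_J, m, eps_margin):
    """ABE step: shrink the estimated noise bound after an interval I of
    length M with no estimator update; kappa_Jm1 is kappa_{J-1} and
    J = argmax_{n in I} eps_n^2."""
    d_J = kappa_Jm1 * G_J / m - eps_margin * (2.0 * np.sqrt(gamma_prev)
                                              - eps_margin)
    return gamma_prev - d_J if d_J > 0.0 else gamma_prev

def adaptive_kappa(kappa):
    # Table 2.3: if kappa_n < 0 (the ellipsoid has become empty, e.g.,
    # after a parameter jump), re-inflate it to 0.1 to resume tracking.
    return 0.1 if kappa < 0.0 else kappa
```

The design choice is deliberately asymmetric: γ_n only ever decreases (from an initial overestimate toward γ_*), while κ_n is only ever re-inflated, so bound estimation and tracking never work against each other.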
The results of this chapter are generalizable to cases in which 3],, and v,, are vectors (e.g., [11, 37]). For stochastic analysis, v,,, u,,, and y,,, etc., are modeled as random variables (r.v.) defined on a probability space (0,.7, P) where Q is a sample space, .7: a o-field, and 20 P a probability measure. Clearly, for the ARX model, if we let In déf a{vma um+1,m S n}, then x,, is fad-measurable. Note that the usual assumptions of WSS, SME, and whiteness of {v,,} which are essential for the convergence of RLS, LMS, IV, etc., are not needed for OBE-ABE and its variants. 3.2 Definitions For the concise statement and analysis of new theorems, the following definitions are introduced. Most definitions are cited from the literature. The first two below define the convergence of a random sequence to a random variable. These two types of convergence are treated in this dissertation. Definition 3.1 [17, 42]. An estimator 0,, is called p. consistent, or p. convergent (convergent in probability) if for all e > 0, £1310 P(||0,, — 0..” > e) = 0 (3.2) where I] - || denotes any valid norm. Note that p. convergence implies (is stronger than) weak convergence or, alternatively, convergence in distribution. Definition 3.2 [17, 42]. An estimator 0,, is called a.s. consistent, or a.s. convergent ( convergent almost surely) if P( lim 0,, = 0...) = 1. (3.3) ”#00 21 Note that (3.3) is equivalent to P(||6,, — 0...” > e i.o.) = 0, V e > 0 (3.4) in which 1.0. denotes infinitely often with respect to 11.. Comparing (3.2) with (3.4), it follows that a.s. convergence implies p. convergence. Equation (3.4) can be written as P(US.°=1 021m (Il9n - 9:” > 5)) = 0, V 6 > 0 (3-5) or equivalently, 19931330 nggmuw, — o.“ > 6)) = o v e > o. (3.6) Since the sets ng°=m(||9,, — 0...” > e) is increasing with m, (3.6) can be written as A1330 P(flf,°=m(||0,, — 0..” > 6)) = 0 V e > 0. (3.7) Although surely convergence [17] is theoretically stronger than a.s. convergence, the latter is regarded as the strongest type of convergence employed in practice (called strong convergence in some literature). Definition 3.3 A random sequence {v,,} is called asymptotically independent if 3112) |P((v,- E A)fl (v,+,, 6 B)) — P(v,~ E A) - P(v,-+,, E 3)] = 0, Vi andV A,B E 1:. For convenience, let us denote e-neighborhoods of noise bounds as D?=[fi-€,x/7:l 22 and D? = l‘fia-fi'l’fl- The following definition is seen in [44] which is a sufficient condition employed in their convergence analysis of the EPA algorithm. Definition 3.4 [44]. A random sequence {v,,} is called uniformly conditionally heav- ily tailed (UCHT) if there exist Cl > 02 > 0 and an infinite subsequence {t,-} C N, such that, with any sufliciently small 6 > 0, C, e S P(v,, E (D: U 0:) I}— -1) S C26 a.s. Vn 6 {t,}. (3.8) Definition 3.5 A random sequence {v,,} is called uniformly conditionally tailed (UCT) if given 6 > 0, there exist a 6 > 0 and an infinite subsequence {t,} C N, such that P(v,, E (D: U D?) [7,,4) > 6 a.s. Vn 6 {t,}. (3.9) Note that, throughout the dissertation, the assumption that {v,,} is asymmetri- cally bounded with the UCT condition means that either P(v,, E D: I 3,4) 2 6 a.s. and P(v,, 6 D: I f -1) = 0 a.s. 01‘ P(v,, E D: I .7: -1) = 0 a.s. and P(v,, E D: I .7: -1) Z 6 a.s. holds for all n. Also, the assumption that {v,,} is symmetrically bounded with UCT condition means that the following condition holds: P(v,, E D: I fad) Z 61 a.s. and P(v,, E D: I .7: -1) Z 62 a.s. 
Definition 3.6 A random sequence {v_n} is called uniformly tailed (UT) if, given ε > 0, there exist a δ > 0 and an infinite subsequence {t_i} ⊂ N such that

P(v_n ∈ (D_ε^+ ∪ D_ε^−)) ≥ δ ∀ n ∈ {t_i}.    (3.10)

Note that the UT and UCT conditions are less restrictive than UCHT. For example, a "triangle" or "bounded-Gaussian" distributed iid sequence satisfies the UCT and UT conditions but not UCHT.

The following definition of an almost periodic (a.p.) function, which is due to Bohr [6], and a lemma (Lemma A.3 in the Appendix) which relates the type of a random variable to an a.p. function, are introduced for the proof of the main theorems.

Definition 3.7 [1, 6]. A continuous function Φ(t) is almost periodic (a.p.) if, given ε > 0, there exists a length l(ε) such that every interval of length l(ε) contains at least one number τ with

|Φ(t + τ) − Φ(t)| < ε, ∀ t ∈ R.

The set of real numbers τ, {τ}, defined above is called relatively dense [6]. Note that l(ε) exists if and only if there do not exist arbitrarily large gaps among the numbers τ. Clearly, a continuous periodic function is also a.p.

For the following definitions and some lemmas in the Appendix, let K ≝ N or Z,

R^K ≝ {(..., x_1, x_2, ...) : x_i ∈ R, ∀ i ∈ K}, and B^K ≝ {(..., A_1, A_2, ...) : A_i ∈ B, ∀ i ∈ K}

where B is the Borel set.

Definition 3.8 [17]. {X_k : k ∈ K} is stationary if

(..., X_1, X_2, ...) =_d (..., X_k, X_{k+1}, ...), ∀ k ∈ K

where =_d denotes equality in distribution.

Definition 3.9 [17]. The measurable transformation T : Ω → Ω is called a measure-preserving transformation (m.p.t.) if P ∘ T^{−1} = P, i.e., P({ω : T(ω) ∈ A}) = P(A).

Definition 3.10 [17]. A set A ∈ F is called invariant (or T-invariant) if A = T^{−1}(A), i.e., A = {ω : T(ω) ∈ A}.

Definition 3.11 [17]. The invariant class I ≝ {A ∈ F : T^{−1}(A) = A} is called trivial if P(A) ∈ {0, 1} ∀ A ∈ I.

Definition 3.12 [17]. The m.p.t. T is called ergodic if I is trivial.

Definition 3.13 [17]. The m.p.t. T is called mixing if

lim_{n→∞} |P(A ∩ T^{−n}(B)) − P(A) · P(B)| = 0 ∀ A, B ∈ F.    (3.11)

Note that the analogous definitions of mixing and ergodicity for a stationary sequence can be inferred from Definition 3.12, Definition 3.13, and Lemma A.5. By Definition 3.13, a stationary sequence is mixing if and only if it is asymptotically independent. Also, by Lemma A.6, mixing implies ergodicity.

3.3 Persistency of excitation condition

Persistency of excitation (PE) (defined below) is a necessary condition [5, 31, 42] for the convergence (consistency) of recursive algorithms such as LMS, RLS, and their variants. For SM algorithms that assume bounded noise, PE is at least one of the sufficient conditions for convergence (see the theorems in this chapter and [44]). Simulations even show [32] that PE is necessary for the consistency of the estimator of OBE algorithms. The following definitions of PE, which appear in the literature, are rewritten and related. The first definition is the most general one; it is employed in the proofs of the new theorems.

Definition 3.14 [44]. A sequence of random vectors, {x_n}, is called persistently exciting (PE)³ or omni-directional if for any nonsingular cone

K = {x : x = a_1 e_1 + ... + a_m e_m, det(e_1, ..., e_m) ≠ 0, a_i > 0, ∀ i},

there exist ρ_1 and ρ_2 such that

liminf_{n→∞} P(x_n ∈ K | F_{n−2}) ≥ ρ_1 > 0 a.s.    (3.12)

and

E(‖x_n‖ | F_{n−2}) ≥ ρ_2 > 0 a.s. ∀ n.    (3.13)

³Please read "PE" as "persistency of excitation" or "persistently exciting," as appropriate in context.

The condition (3.12) means that the orientation of x_n is sufficiently varied in a conditional-probability sense, while the condition (3.13) means that the magnitude of x_n cannot be too small on average. These conditions imply the following conditions, which constitute another PE definition frequently used in the convergence analysis of conventional LSE recursive algorithms.

Definition 3.15 [31]. A sequence of random vectors, {x_n}, is called PE if there exist ρ_1 and ρ_2 such that

liminf_{n→∞} E(x_n x_n^T | F_{n−2}) ≥ ρ_1 I > 0 a.s.    (3.14)
These conditions imply the following conditions which constitute another PE definition frequently used in the convergence analysis of conventional LSE recursive algorithms. Definition 3.15 [31]. A sequence of random vectors, {x,,}, is called PE if there exist p1 and p2 such that liflgfiEknxz m4) 2 p11 > o a.s. (3.14) 3Please read “PE” as “persistency of excitation” or “persistently exciting” as appropriate in context. 26 and E(||x,,|| If -2) 2 p2 > 0 a.s. V n. (3.15) Violation of (3.14) is effectively a.s. restriction of x,, to a proper subspace of R”. This justifies the assertion that (3.12) implies (3.14). Also note that the assumption of stability imposes an upper bound on (3.14). The fact that 1”3(1*3(XnX3.~ Via-2)) = 130904.”), leads to an equivalent definition of PE: Definition 3.16 A sequence of random vectors, {x,,}, is called PE if there exist p1 and p2 such that liflgfmxnxf) 2 ml > 0 a.s. (3.16) and E(||x,,|| I fn_2) 2 p2 > 0 a.s. V n. (3.17) Condition (3.16) means that the autocorrelation matrix of x,, is positive definite asymptotically. For the stationary case, it becomes positive definite for all n as in condition (3.18) below. Deterministic approaches to convergence analysis assume the stationarity and er- godicity (or at least WSS and SME) of the signals. In this case, Definition 3.16 is equivalent to the following. Definition 3.17 [5]. A stationary (or at least W55) sequence of random vectors, {x,,}, is called PE if there exist p1 and p2 such that E(x,,x:) 2 p11 > 0 a.s. V n (3.18) 27 and E(||x,,|| I]: -2) 2 p2 > 0 a.s. Vn. (3.19) By the Ergodic Theorem [17], Definition 3.17 is equivalent to the following. Definition 3.18 [5, 42]. A stationary and ergodic (or at least WSS and SME) se- quence of random vectors, {x,,}, is called PE if there exists an N1 6 N and p1, p2 > 0 such that for all n n+N1 Z XkaT 2 p11 > 0, (3.20) k=n+l and ”x,,” 2 p2 > 0 Vn. (3.21) Note that the stability assumption imposes an upper bound on (3.20). Using matrix algebra, Definition 3.18 can be shown to be equivalent to the following. Definition 3.19 [5]. A stationary and ergodic (or at least WSS and SME) sequence of random vectors, {x,,}, is called PE if for any unit vector e E R'”, and for all n, there exist N1 6 N and p1, p2 > 0, all independent of e, such that n+N1 Z ||xfe|| 2 p, > 0, (3.22) k=n+1 and “x,,” 2 p, > o v n. (3.23) 28 3.4 Lemmas The following lemmas and those in the Appendix (cited from the literature) are essential for the proofs of the main theorems in this work. For convenience, define c,,, ifc,,<0 0 , otherwise. c,,, if c,,ZO =+ 0 , otherwise. where c,, is the check for innovation (2.10). Then, by Lemma A.2, lirn,,_.,,o c; = 0. However, since c,, is a random variable, lim,,_.co c; = 0 does not imply lim,,_..,.o P(c,, < 0) = 0. For example, let 1 , with probability 0.5 Cu: —1/n , with probability 0.5. Then, lim,,_.°° c; = 0 while lim,,_.oo P(c,, < 0) = 0.5. Hence, the following lemmas are needed for the proof of Lemma 3.3 which, in turn, is essential for the proofs of the main theorems. Lemma 3.1 Let X,Y be random variables. If X is continuously distributed, then X + Y is continuously distributed. If, in addition, Y 7’: 0, then X Y is continuously distributed. Proof: Let Z = X + Y, then, 13(2): °°fxy(x.z—x)dx= ”insane—mien. _m -00 29 where f denotes the various probability density functions. By assumption, f x(x) does not contain a Dirac delta function, so fz(z) does not. Hence, X + Y is continuous. 
3.4 Lemmas

The following lemmas, together with those in the Appendix (cited from the literature), are essential for the proofs of the main theorems in this work. For convenience, define

c_n^− = c_n, if c_n < 0;  c_n^− = 0, otherwise,

c_n^+ = c_n, if c_n ≥ 0;  c_n^+ = 0, otherwise,

where c_n is the check for innovation (2.10). Then, by Lemma A.2, lim_{n→∞} c_n^− = 0. However, since c_n is a random variable, lim_{n→∞} c_n^− = 0 does not imply lim_{n→∞} P(c_n < 0) = 0. For example, let

c_n = 1, with probability 0.5;  c_n = −1/n, with probability 0.5.

Then, lim_{n→∞} c_n^− = 0 while lim_{n→∞} P(c_n < 0) = 0.5. Hence, the following lemmas are needed for the proof of Lemma 3.3, which, in turn, is essential for the proofs of the main theorems.

Lemma 3.1 Let X, Y be random variables. If X is continuously distributed, then X + Y is continuously distributed. If, in addition, Y ≠ 0, then XY is continuously distributed.

Proof: Let Z = X + Y. Then

f_Z(z) = ∫_{−∞}^{∞} f_{XY}(x, z − x) dx = ∫_{−∞}^{∞} f_X(x) f_{Y|X}(z − x | x) dx,

where f denotes the various probability density functions. By assumption, f_X(x) does not contain a Dirac delta function, so f_Z(z) does not. Hence, X + Y is continuous. Let W = XY. Then

f_W(w) = ∫_{−∞}^{∞} (1/|y|) f_{XY}(w/y, y) dy = ∫_{−∞}^{∞} (1/|y|) f_X(w/y) f_{Y|X}(y | w/y) dy.

Since Y ≠ 0 and f_X(w/y) does not contain a Dirac delta function, it follows that f_W(w) does not contain a Dirac delta function. Hence, XY is continuous. □

Lemma 3.2 Assume that model (3.1) is a stable ARX model, and {u_n} and {v_n} are bounded. If both {u_n} and {v_n} are asymptotically independent sequences, and {u_n} is independent of {v_n}, then y_n converges to a continuous random variable as n → ∞.

Proof: First, rewrite (3.1) as

y_n − Σ_{j=1}^{p} a_j y_{n−j} = Σ_{j=0}^{q} b_j u_{n−j} + v_n.    (3.24)

Let

V_n ≝ Σ_{j=0}^{q} b_j u_{n−j} + v_n.

Note that {V_n} is an asymptotically independent random sequence. Let h_n be the discrete-time impulse response of the system (3.24), i.e., y_n = h_n if V_n is a Kronecker delta function. Then,

y_n = Σ_{j=0}^{∞} h_j V_{n−j}.    (3.25)

By the Lebesgue Decomposition Theorem [17], a random variable can be decomposed into a discrete part and a continuous part. However, by (3.25) and Lemma 3.1, y_n is continuous if V_n is continuous. Hence, V_n can be assumed discrete for the remainder of the proof. That is, assume

P(V_n = q_k) = p_k > 0, k = 1, 2, ..., K    (3.26)

where q_k is bounded since V_n is bounded by assumption. Let Φ_{V_n}(t) denote the characteristic function of V_n. Then, from (3.26),

Φ_{V_n}(t) = Σ_{k=1}^{K} p_k e^{i t q_k}, where i = √−1.

Hence,

Φ_{h_j V_{n−j}}(t) = Σ_{k=1}^{K} p_k e^{i t h_j q_k}.    (3.27)

Note that e^{i t h_j q_k} is a periodic function of t with primary period τ_{j,k} = 2π/(h_j q_k) for all j, k in which h_j ≠ 0 and q_k ≠ 0. Since h_j is a stable infinite impulse response, it follows that lim_{j→∞} h_j = 0. Hence, given ε_1 > 0, there exists an N such that for all j > N, |h_j| < ε_1, and there exists a j > N such that |h_j| > 0. Hence, τ_{j,k} = 2π/(h_j q_k) → ∞ as j → ∞. This means that, as n → ∞, the set {τ} of Φ_{h_j V_{n−j}}(t) is not relatively dense. That is, there does not exist an l(ε) for Φ_{h_j V_{n−j}}(t) to satisfy Definition 3.7 as n → ∞. Hence, Φ_{h_j V_{n−j}}(t) is a non-a.p. function as n → ∞. For finite n, Φ_{y_n} is a.p. by Lemma A.3, since y_n is discrete by (3.25). Hence, Φ_{y_n} is non-a.p. as n → ∞ by the asymptotic independence of {V_n}. Thus, by Lemma A.3, y_n is continuous as n → ∞. □

Note that, by Lemma 3.1, if {u_n} or {v_n} or both are continuously distributed, the assumptions that {u_n} and {v_n} are asymptotically independent sequences and that {u_n} is independent of {v_n} are not required in the above lemma.
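The decay of h_j that drives this argument can be checked numerically. The sketch below (ours) computes the impulse response of the stable AR(3) example (1.3); its geometric decay makes the primary periods 2π/(h_j q_k) grow without bound, exactly as used in the proof.

```python
import numpy as np

a = np.array([2.0, -1.48, 0.34])   # AR coefficients of the stable model (1.3)
h = np.zeros(60)
h[0] = 1.0                         # h_j = response to a Kronecker delta V_n
for j in range(1, 60):
    h[j] = sum(a[i] * h[j - 1 - i] for i in range(3) if j - 1 - i >= 0)
print(np.round(h[::10], 4))        # |h_j| -> 0, so 2*pi/(h_j q_k) -> infinity
```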
(3.32) From (3.30) and (3.32), it follows that P(c,, < 0) = P(m7,, < 45,, < m(7,, + 6)) V n > N. (3.33) This probability approaches zero as 6 —> 0, hence as n —+ 00. To prove this, first note that, as n --1 00, y,, in (3.1) is continuous by Lemma 3.2. From (2.4), P q _ T _ 2 2 an — xn Pn-lxn — Z Vn,iyn_,‘ + Z Vn,i+p+l a,,—i i=1 i=0 where V,,,,- > 0, Vi, are the eigenvalues of the positive-definite matrix P,,-l. Hence, by Lemma 3.1, G,, is continuous as n -—1 00. Also, from Lemma 3.7 (to follow), 16,, does not converge to zero since 7,, are overestimated V n > N by assumption. Thus, it follows from Lemma 3.1 that x,,-lGn is continuous as n -+ 00. Finally, from (3.29) and Lemma 3.1, 45,, is continuous as n —* 00. This proves that the probability in (3.33) approaches zero as n -> 00. Since an OBE algorithm updates its estimator only when c,, < 0, the probability of an update diminishes as n —+ 00. This implies that the expected updating interval approaches infinity as time increases indefinitely. Ci Note that, if {u,,} or {v,,} or both are continuously distributed, the assumptions that {u,,} and {v,,} are asymptotically independent sequences and the independence between {u,,} and {v,,} are not required in the above lemma. The following lemma asserts that the sum of two independent, ergodic, stationary sequences is stationary and ergodic. 33 Lemma 3.4 If both {X}, : k E K} and {Y}, : k E K} are ergodic, stationary sequences, and {X}, : k G K} is independent of {Y1c : k 6 K}, then {Z}, = X), + Y), : k E K} is also stationary and ergodic. Proof: Since {Xn} and {Y,,} are stationary, by Definition 3.8, there exist sets A, B, C, and D such that A = {L02(°°°,X_1,X0,X1,"°)EC} (3.34) = {w i ( ° ° 1Xn—11Xn1Xn+la' ' °) 6 C}, (3.35) and B : {W:('”1Y-11)/01Y11”')€D} (3.36) = {wz(---,Y.-1,Y.,Yn+1,---)eD}. (3.37) Hence, A and B are invariant sets by Definition 3.10. Also, by the ergodicity of {Xn} and {Y,,}, Definition 3.11, and Definition 3.12, P(A) 6 {0,1} and P(B) 6 {0,1}. By (3.34) and (3.36), let F be such that F=AnB={w:(°°°1X—17X01X11°°°)ecn('°'1Y-11Y()9Yla”')€D} Then, there exists an E such that F = {WIW'(,(X_1+Y_1),(Xo+I/o),(X1+Y1),')€E} {w:(---,Z_1,Zo,Zl,---)€ E}. (3.38) 34 Similarly, by (3.35) and (3.37), F = {w : ( . - ,Z,,_,,Z,,,Z,,+1, - - ) E E}. (3.39) Thus, by (3.38), (3.39), and Definition 3.10, F is shift-invariant with respective to {Zn}. Also, by the independence of {X,,} and {Y,,}, P(F) = P(A n B) = P(A) - P(B) 6 {0,1}. Hence, by Definition 3.11, F is trivial. Therefore, by Definition 3.12, {Zn} is station- ary and ergodic. D 3.5 Almost Sure Convergence of OBE-ABE In this section, the theorems of a.s. convergence of the OBE-ABE algorithm are introduced and proven in iid, mixing, and ergodic noise cases. For a.s. convergence, both the input sequence {u,,} and the noise sequence {v,,} are assumed stationary, and {u,,} is assumed independent of {v,,}. However, for p. convergence, the stationarity of {u,,} and {v,,} are not required. Further, if {u,,} or {v,,} or both are continuously distributed, the conditions for a.s. and p. convergence can be further relaxed. The following is the main theorem for a.s. convergence in the mixing case. Theorem 3.1 Assume that the stationary sequences {u,,} and {v,,} are independent. If PE holds, UCT holds, and {u,,} and {v,,} are mixing, then the estimator of the OBE-ABE algorithm is a.s. convergent. 
35 Proof: Since PE (3.12) and UCT (3.9) hold, there exists an infinite subsequence {t,~} E Z such that for all n 6 {t,-}, (3.9) and P(Xn E K If -2) 2 p1 > 0 a.s., (3.40) hold where K is defined in (3.12). Throughout the proof, all 11 considered are in {t,-}. For convenience, let {n} replace {t,} as the time coordinate in the proof. The meaning of time intervals, for example, should change accordingly. Let {v,, : It 6 Z} denote the noise sequence and {u}, : k E Z} the input sequence of model (3.1). Rewrite (3.1) as P 9 yn _ z ajyn-j = Z bjun—j + ”11- (3.41) j=1 j=0 Let (I f (1 w,, é Zb,u,,_,-, (3.42) i=0 and V,, déf w,, + v,,. Then, (3.41) becomes .21.: Z h..-,V,- (3.43) j=-oo where h,, is the infinite impulse response, y,,, if V,, is set to a Kronecker delta function. Note that, by Lemma A.7, {w,,} is stationary and ergodic. Also, by Lemma 3.4, {V,,} is stationary and ergodic. Let {1),} be a sequence of time intervals of length M over which the OBE—ABE algorithm does not update its estimator. Then, for each k = 1, 2, ' - - , K, «9 “-—°=‘ [9(1) 0(2) o1 36 is a constant vector for all n in each 1),. Hence, by (2.5), 5n = yn- 0,,—1x11 = yn-29(1)1)yn-1—— 29(1)!)3/11—1, (3-44) 1:] i=0 where 0(0)dé 1. From (3. 43) and (3.44), it follows that 11—1 =2 2 9( hn-e-jV 1:0 J—-OO By changing the order of summations, this equation becomes, for all n > m, ”-1 6,, = Z V Z0(i)h,,_,-_, +712? V Z0(i)h,,_,-_,-. j=n—m+1 i=0 j=—oo i=0 Letting p = n — j, 171-1 9 co m m—l oo = 2 v.-. mans 5: v.-. 20101.-.- dé‘ >2 v._,.,,+ 2 vam- (3.45) :0 P=0 i=0 1):"! i=0 p=m Note that both g1 and g; are independent of 11. Hence, (3.45) can be written as 5n = 9(Vn1 Vfl-l, Via—21 ° ° ) (346) where g : RK —1 R is measurable and independent of 71. Hence, by Lemma A.7, {a,,} is stationary and ergodic as M —> 00. Now, for each I: = 1,2, - - - ,K, it follows from (2.10) that 611 = m(7n —' 53,) — "11-1071 2 01 V". 6 LP (3'47) Also from (2.5) and letting 0,, déf 0,, — 0,,, e: = (y. - 031m)” = (... - oils.) . (3.48) Let A; be the event that v,, 6 D; and 011x” > 0. Let A; be the event that v,, 6 D2“ and 0: 13% < 0. Also, let A,, (ETC-f A; U A; and denote by Af, the complement of A,,. Also note that 054x“ is fn-1-measurable. Let 3;. denote the event that A,, occurs at least once in 1).. Then, for all n 6 [1.4.1, B). E 7,,_2. Hence, it follows from (3.40) that P(0Z_,x,, Z 0 IBI.) 2 p1 > p > 0 (3.49) and P(i;_,x,, < 0 |3,.) 2 p, > p > 0. (3.50) Since B]. E .7: -2 C 7""-1 and 0£_1xn is fn_1-measurable, it follows from UCT (3.9) that P(v,, e (3,+ u 3;) |5§_,x,, 2 0, 3,.) 2 6, > g > 0 (3.51) and ~ 5 P(v,, e (D: u 3;) |93,'_,x.. < 0,3,.) 2 a, > 3 > 0. (3.52) Since A; and A; are mutually exclusive for sufficiently small 6, it follows from (3.49), (3.50), (3.51), (3.52), and the notes below (3.9) that P(A. (3,.) = P(A; (3,.) + P(A; IB.) = P(v,, e 3301,11,, 2 0,3,.) - P(5§_,x, 2 0 |3,.) +P(v,, e Df|5§_,x,, < 0,3,.) . P(6§_,x, < 0 |3,.) > P(v,, e 3301,11,, 2 0, 3,.) - p + P(v,, e D:|5§_,x, < 0,3,.) . p > 6 > 0. (3.53) 38 Then, by (3.53) and the definition of A,,, it follows that P(lenl > ,/7—.— c |B,.) 2 P(An |B,.) > 6 > 0. (3.54) Thus, by the Poincaré Recurrence Theorem (Theorem A.1) and (3.54), P(|6,,| > 7.. — 61.0. |B;.) = l as M —-1 00. It follows that, for sufficiently large M and conditionally on Bk-“ the following inequality holds with probability 1: max 63, 2 (J7: — 6)2 = 7.. — 6(2\/:7:— 6) _>_ 7.. — 6(2\/7',: — 6). (3.55) 7161), Now, let J). = arg max;e 1,. 62 n. Then, by letting it = Jk, (3.47) can be written as (replacing J). 
$$\gamma_J - \epsilon_J^2 - \kappa_{J-1} G_J / m \ge 0. \quad (3.56)$$

By (3.55) and (3.56),

$$\gamma_J - \gamma_* \ge \kappa_{J-1} G_J / m - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon). \quad (3.57)$$

Let

$$d_J \triangleq \kappa_{J-1} G_J / m - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon). \quad (3.58)$$

Hence, by (3.57), $\gamma_n$, updated (whenever an interval $I_k$ is found at time $n$) by the recursion

$$\gamma_n = \begin{cases} \gamma_{n-1} - d_J, & \text{if } d_J > 0 \\ \gamma_{n-1}, & \text{otherwise,} \end{cases} \quad (3.59)$$

remains an upper bound of $\gamma_*$ a.s. for each $k$.

Note that PE and the mixing of $\{u_n\}$ and $\{v_n\}$ imply that, for overestimated $\gamma_n$, Lemma 3.2 and then Lemma 3.3 hold. Hence, intervals $I_k$ exist i.o. It follows that, with $\epsilon \to 0$, (3.59) and Lemma 3.7 imply that $\gamma_n$ continues to be updated by the OBE-ABE algorithm until $d_J \to 0$ and $\gamma_n \to \gamma_*$ as $n \to \infty$. Thus, $\kappa_J \to 0$ when $\epsilon \to 0$, since $G_n > 0$ due to PE and the stability assumption. Hence, $\kappa_n \to 0$ due to its non-increasing property. Therefore,

$$P\big(\omega : \lim_{n \to \infty} \theta_n = \theta_*\big) = 1.$$

This completes the proof of a.s. convergence. □

For p. convergence, the mixing assumptions on both $\{u_n\}$ and $\{v_n\}$ in the above theorem can be relaxed to the requirement that both $\{u_n\}$ and $\{v_n\}$ be asymptotically independent sequences (see Theorem 3.3). Further, if $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, then the mixing condition in the theorem above can be relaxed to an ergodic condition for a.s. convergence. The following theorem validates this assertion.

Theorem 3.2 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, PE holds, UCT holds, and $\{u_n\}$ and $\{v_n\}$ are ergodic, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Continuously-distributed $\{v_n\}$ or $\{u_n\}$ or both implies that Lemmas 3.2 and 3.3 hold without the assumption that $\{v_n\}$ and $\{u_n\}$ are asymptotically independent sequences. The theorem is then proved by following the same steps as in the proof of Theorem 3.1. □

For p. convergence, the assumptions that $\{u_n\}$ and $\{v_n\}$ are independent and stationary, and that $\{u_n\}$ and $\{v_n\}$ are ergodic, are not required (see Theorem 3.4).

The following corollaries treat iid noise cases for a.s. convergence. They are special cases of Theorems 3.1 and 3.2, respectively.

Corollary 3.1 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, UT holds, $\{u_n\}$ is mixing, and $\{v_n\}$ is iid, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Since UT and UCT imply each other under the iid assumption, and iid implies mixing, a.s. convergence follows immediately from Theorem 3.1. □

Corollary 3.2 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If $\{v_n\}$ or $\{u_n\}$ or both are continuously-distributed random sequences, PE holds, UT holds, $\{u_n\}$ is ergodic, and $\{v_n\}$ is iid, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Since UT and UCT imply each other under the iid assumption, and iid implies ergodicity, a.s. convergence follows immediately from Theorem 3.2. □

The following corollaries assert the a.s. convergence of the OBE-ABE algorithm when the probability distribution of the noise sequence $\{v_n\}$ is known.

Corollary 3.3 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, $\{u_n\}$ is ergodic, and $\{v_n\}$ is iid with uniform distribution, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: For a uniform distribution, $P(v_n \in (D_\epsilon^+ \cup D_\epsilon^-)) > 0$ for every $\epsilon > 0$, so UT holds. By Corollary 3.2, the estimator is a.s. convergent. □
Corollary 3.4 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, $\{u_n\}$ is mixing, and $\{v_n\}$ is iid with binary Bernoulli distribution, $P(v_n = \sqrt{\gamma_*}) > \delta$ or $P(v_n = -\sqrt{\gamma_*}) > \delta$, $\forall n$, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Since UT holds in this case, a.s. convergence follows from Corollary 3.1. □

Corollary 3.5 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, $\{u_n\}$ is ergodic and continuously distributed, and $\{v_n\}$ is iid with binary Bernoulli distribution, $P(v_n = \sqrt{\gamma_*}) > \delta$ or $P(v_n = -\sqrt{\gamma_*}) > \delta$, $\forall n$, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Since UT holds in this case, the a.s. convergence follows from Corollary 3.2. □

Note that all the theorems and corollaries in this section are also valid for conventional OBE algorithms (and the EPA algorithm) with a known exact noise bound $\sqrt{\gamma_*}$ or $-\sqrt{\gamma_*}$, because, for the OBE-ABE algorithm, $\gamma_n \to \gamma_*$ as $n \to \infty$, as seen in the proof of Theorem 3.1.

3.6 Convergence in Probability of OBE-ABE

Although a.s. convergence implies p. convergence, the sufficient conditions for a.s. convergence can be relaxed for p. convergence. Specifically, in addition to other relaxations in some cases, the stationarity of $\{u_n\}$ and $\{v_n\}$ is not required for p. convergence. The main theorem for p. convergence of the OBE-ABE algorithm follows.

Theorem 3.3 Assume that the sequences $\{v_n\}$ and $\{u_n\}$ are independent. If PE holds, UCT holds, and $\{u_n\}$ and $\{v_n\}$ are asymptotically independent sequences, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since PE (3.12) and UCT (3.9) hold, there exists an infinite subsequence $\{t_i\} \in \mathbb{Z}$ such that for all $n \in \{t_i\}$, (3.9) and

$$P(x_n \in K \mid \mathcal{F}_{n-2}) \ge p_1 > 0 \ \text{a.s.} \quad (3.60)$$

hold, where $K$ is defined in (3.12). Throughout the proof, all $n$ considered are in $\{t_i\}$. For convenience, let $\{n\}$ replace $\{t_i\}$ as the time coordinate in the proof. The meaning of time intervals, for example, should change accordingly.

Let $\{I_k\}$ be a sequence of time intervals of length $M \in \mathbb{N}$ over which the OBE-ABE algorithm does not update its estimator. Then, for each $k = 1, 2, \dots, K$, it follows from (2.10) that

$$c_n = m(\gamma_n - \epsilon_n^2) - \kappa_{n-1} G_n \ge 0, \quad \forall n \in I_k. \quad (3.61)$$

Also from (2.5),

$$\epsilon_n^2 = (y_n - \theta_{n-1}^T x_n)^2 = (v_n - \tilde{\theta}_{n-1}^T x_n)^2. \quad (3.62)$$

Let $A_n^-$ be the event that $v_n \in D_\epsilon^-$ and $\tilde{\theta}_{n-1}^T x_n \ge 0$, and $A_n^+$ the event that $v_n \in D_\epsilon^+$ and $\tilde{\theta}_{n-1}^T x_n < 0$. Also, let $A_n \triangleq A_n^+ \cup A_n^-$, and denote by $A_n^c$ the complement of $A_n$. Note that $\tilde{\theta}_{n-1}^T x_n$ is $\mathcal{F}_{n-1}$-measurable. Let $B_k$ denote the event that $A_n$ occurs at least once in $I_k$. Then, for all $n \in I_{k+1}$, $B_k \in \mathcal{F}_{n-2}$. Hence, it follows from (3.60) that

$$P(\tilde{\theta}_{n-1}^T x_n \ge 0 \mid B_k) \ge p_1 > p > 0 \quad (3.63)$$

and

$$P(\tilde{\theta}_{n-1}^T x_n < 0 \mid B_k) \ge p_2 > p > 0. \quad (3.64)$$

Since $B_k \in \mathcal{F}_{n-2} \subset \mathcal{F}_{n-1}$ and $\tilde{\theta}_{n-1}^T x_n$ is $\mathcal{F}_{n-1}$-measurable, it follows from UCT (3.9) that

$$P\big(v_n \in (D_\epsilon^+ \cup D_\epsilon^-) \mid \tilde{\theta}_{n-1}^T x_n \ge 0, B_k\big) \ge \delta_1 > 0 \quad (3.65)$$

and

$$P\big(v_n \in (D_\epsilon^+ \cup D_\epsilon^-) \mid \tilde{\theta}_{n-1}^T x_n < 0, B_k\big) \ge \delta_2 > 0. \quad (3.66)$$

Since $A_n^+$ and $A_n^-$ are mutually exclusive for sufficiently small $\epsilon$, it follows from (3.63), (3.64), (3.65), and (3.66) that

$$P(A_n \mid B_k) = P(A_n^- \mid B_k) + P(A_n^+ \mid B_k) = P(v_n \in D_\epsilon^- \mid \tilde{\theta}_{n-1}^T x_n \ge 0, B_k)\, P(\tilde{\theta}_{n-1}^T x_n \ge 0 \mid B_k) + P(v_n \in D_\epsilon^+ \mid \tilde{\theta}_{n-1}^T x_n < 0, B_k)\, P(\tilde{\theta}_{n-1}^T x_n < 0 \mid B_k) > P(v_n \in D_\epsilon^- \mid \tilde{\theta}_{n-1}^T x_n \ge 0, B_k)\, p + P(v_n \in D_\epsilon^+ \mid \tilde{\theta}_{n-1}^T x_n < 0, B_k)\, p > (\delta_1 + \delta_2)\, p \triangleq \delta > 0. \quad (3.67)$$

Hence, $P(A_n^c \mid B_k) = 1 - P(A_n \mid B_k) < 1 - \delta$.
Similarly, since $A_{n-2}^c \in \mathcal{F}_{n-2}$, it follows that

$$P(A_n^c \mid A_{n-2}^c, B_k) = 1 - P(A_n \mid A_{n-2}^c, B_k) = 1 - \big[P(A_n^- \mid A_{n-2}^c, B_k) + P(A_n^+ \mid A_{n-2}^c, B_k)\big] < 1 - \delta.$$

Thus,

$$P(A_n^c \cap A_{n-2}^c \mid B_k) = P(A_{n-2}^c \mid B_k)\, P(A_n^c \mid A_{n-2}^c, B_k) < (1-\delta)^2.$$

Similarly, $P(A_n^c \mid A_{n-2}^c, A_{n-4}^c, B_k) < 1 - \delta$. Hence,

$$P(A_n^c \cap A_{n-2}^c \cap A_{n-4}^c \mid B_k) = P(A_{n-2}^c \cap A_{n-4}^c \mid B_k)\, P(A_n^c \mid A_{n-2}^c, A_{n-4}^c, B_k) < (1-\delta)^2 (1-\delta) = (1-\delta)^3.$$

By induction (for notational simplicity, let $M$ be even),

$$P\Big(\bigcap_{i=1}^{M/2} A_{n-2i}^c \mid B_k\Big) < (1-\delta)^{M/2}.$$

Thus,

$$P(B_k \mid B_{k-1}) = P(A_n \ \text{occurs at least once in}\ I_k \mid B_{k-1}) = P\Big(\bigcup_{n \in I_k} A_n \mid B_{k-1}\Big) = 1 - P\Big(\bigcap_{n \in I_k} A_n^c \mid B_{k-1}\Big) \ge 1 - P\Big(\bigcap_{i=1}^{M/2} A_{n-2i}^c \mid B_{k-1}\Big) > 1 - (1-\delta)^{M/2}. \quad (3.68)$$

Note that, by (3.62), the occurrence of $A_n$ at least once in $I_k$ implies that $\max_{n \in I_k} \epsilon_n^2 \ge (\sqrt{\gamma_*} - \epsilon)^2$. Hence, from (3.62) and (3.68),

$$\max_{n \in I_k} \epsilon_n^2 \ge (\sqrt{\gamma_*} - \epsilon)^2 = \gamma_* - \epsilon(2\sqrt{\gamma_*} - \epsilon) \ge \gamma_* - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon) \quad (3.69)$$

with probability $1 - (1-\delta)^{M/2}$ ($> 0.99995$ if $\delta = 0.01$ and $M > 2000$).

In the case of iid noise, under the assumption that $\{v_n\}$ is independent of $\{u_n\}$, the conditional probabilities in the above derivation are not required and can be replaced with marginal probabilities. Hence, (3.69) holds with probability $1 - (1-\delta)^M$ in this case.

Let $J_k = \arg\max_{n \in I_k} \epsilon_n^2$. Then, setting $n = J_k$ in (3.61) (and writing $J$ for $J_k$ for simplicity),

$$\gamma_J - \epsilon_J^2 - \kappa_{J-1} G_J / m \ge 0. \quad (3.70)$$

By (3.69) and (3.70),

$$\gamma_J - \gamma_* \ge \kappa_{J-1} G_J / m - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon). \quad (3.71)$$

Let

$$d_J \triangleq \kappa_{J-1} G_J / m - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon). \quad (3.72)$$

Hence, by (3.71), $\gamma_n$, updated (whenever an interval $I_k$ is found at time $n$) by the recursion

$$\gamma_n = \begin{cases} \gamma_{n-1} - d_J, & \text{if } d_J > 0 \\ \gamma_{n-1}, & \text{otherwise,} \end{cases} \quad (3.73)$$

remains an upper bound of $\gamma_*$ with probability $1 - (1-\delta)^{M/2}$ for each $k$.

Since the $\sigma$-fields $\mathcal{F}_n$ increase with $n$, i.e., $\mathcal{F}_{n-1} \subset \mathcal{F}_n$, it follows from (3.68) that, for each $k$,

$$P(B_k \mid B_{k-1} B_{k-2} \cdots B_1) > 1 - (1-\delta)^{M/2}.$$

Hence,

$$P\Big(\bigcap_{k=1}^{K} B_k\Big) = P(B_K \mid B_{K-1} B_{K-2} \cdots B_1)\, P(B_{K-1} \mid B_{K-2} B_{K-3} \cdots B_1) \cdots P(B_2 \mid B_1)\, P(B_1) > \big[1 - (1-\delta)^{M/2}\big]^K. \quad (3.74)$$

The assumptions that PE holds, that $\{u_n\}$ and $\{v_n\}$ are independent, and that both $\{u_n\}$ and $\{v_n\}$ are asymptotically independent sequences, imply that, for overestimated $\gamma_n$, Lemma 3.2 and then Lemma 3.3 hold. Hence, intervals $I_k$ exist i.o. It follows that, with $\epsilon \to 0$, (3.73) and Lemma 3.7 imply that $\gamma_n$ continues to be updated by the OBE-ABE algorithm until $d_J \to 0$ and $\gamma_n \to \gamma_*$ as $n \to \infty$. Thus, $\kappa_J \to 0$ when $\epsilon \to 0$, since $G_n > 0$ due to PE and the stability assumption. Hence, $\kappa_n \to 0$ due to its non-increasing property.

Note that $K < N/M$, where $N$ is the total "time" (number of data sets). Let $M = \lfloor\sqrt{2N}\rfloor + 1$, where $\lfloor\cdot\rfloor$ denotes the integer part. Then, (3.74) becomes

$$P\Big(\bigcap_{k=1}^{K} B_k\Big) > \big[1 - (1-\delta)^{M/2}\big]^K \ge \big[1 - (1-\delta)^{\sqrt{N/2}}\big]^{\sqrt{N/2}} \to 1 \ \text{as}\ N \to \infty.$$

This is verified by applying L'Hopital's rule, as shown below, to find that $(1 - q^n)^n \to 1$ as $n \to \infty$, where $0 \le q < 1$:

$$\lim_{n\to\infty}(1-q^n)^n = \lim_{n\to\infty}\exp\big[n\ln(1-q^n)\big] = \exp\Big[\lim_{n\to\infty}\frac{\ln(1-q^n)}{1/n}\Big] = \exp\Big[\lim_{n\to\infty}\frac{-q^n\ln q/(1-q^n)}{-1/n^2}\Big] = \exp\Big[\lim_{n\to\infty}\frac{n^2\ln q}{q^{-n}-1}\Big] = \exp\Big[\lim_{n\to\infty}\frac{2n\ln q}{-q^{-n}\ln q}\Big] = \exp\Big[\lim_{n\to\infty}\frac{-2n}{q^{-n}}\Big] = \exp\Big[\lim_{n\to\infty}\frac{2}{q^{-n}\ln q}\Big] = \exp(0) = 1.$$

Thus, $\lim_{n\to\infty} P(\|\theta_n - \theta_*\| > \epsilon) = 0$. This completes the proof of p. convergence. □

Remark 1: Theoretically, instead of a priori knowledge of $\gamma_*$, to achieve a.s. or p. convergence the OBE-ABE algorithm requires only a lower bound, $\delta(\epsilon)$, on the $\epsilon$-tail probability of $v_n$ [see (3.10)].
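To make the bound-update mechanism concrete, the following minimal sketch (in Python) implements the recursion (3.70)-(3.73) in isolation. It is an illustration added here, not the dissertation's implementation: the argument kappaG_over_m stands for $\kappa_{J-1}G_J/m$ at the maximizing index $J$, and the numbers fed to it below are made-up stand-ins for values the full OBE-ABE recursions would supply.

import numpy as np

def abe_step(gamma_prev, kappaG_over_m, eps_tol):
    """One ABE bound update: d_J as in (3.72), recursion as in (3.73)."""
    d_J = kappaG_over_m - eps_tol * (2.0 * np.sqrt(gamma_prev) - eps_tol)
    # Shrink the bound estimate only when d_J > 0; gamma_n then stays an
    # upper bound of gamma_* and is non-increasing.
    return gamma_prev - d_J if d_J > 0.0 else gamma_prev

# Toy run: gamma_0 = 1.5 overestimates gamma_* = 1.0.
gamma = 1.5
for kG in [0.35, 0.2, 0.1, 0.04, 0.01]:   # hypothetical kappa*G/m values
    gamma = abe_step(gamma, kG, eps_tol=0.05)
    print(round(gamma, 4))

As $\epsilon \to 0$ and $d_J \to 0$, the recursion drives $\gamma_n$ down toward $\gamma_*$ without passing below it, which is exactly the behavior used in the proofs above.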
Remark 2: For the application of the OBE-ABE algorithm, since $N$ is finite, an $\epsilon$ is first chosen arbitrarily as a small positive number ($\delta(\epsilon)$ decreases with $\epsilon$). Given this $\epsilon$, a $\delta$ is estimated; any lower bound satisfying UT (3.10) (in the iid case, for example) will work. Then, a large enough $N$ and $M$ ($= \lfloor\sqrt{2N}\rfloor + 1$, for example) are chosen so that the probability $[1 - (1-\delta)^M]^{N/M}$ is close to one. For example, if $\delta = 0.1$, choose $M = 101$ and $N = 10{,}000$ to get the probability $[1 - (1-\delta)^M]^{N/M} > 0.997$; if $\delta = 0.02$, choose $M = 601$ and $N = 360{,}000$ to get the probability $[1 - (1-\delta)^M]^{N/M} > 0.996$. Since the probability $[1 - (1-\delta)^M]^{N/M}$ is the most conservative value, if $N$ is fixed in practice and is not too large ($N < 10{,}000$, for example), $M$ can simply be chosen large enough to make $1 - (1-\delta)^M$ close to one. If, in a real-world application, $\delta$ is still difficult to estimate after choosing $\epsilon$, then simply start the OBE-ABE algorithm with a small $M$ ($M = 20$, for example) and increase $M$ after each trial if necessary. Negativity of $\kappa_n$ can serve as an indication of a too-small $M$. The algorithm is guaranteed to be p. convergent (with probability close to 1) if $M$ is large enough, provided the sufficient conditions (PE, UCT) are satisfied.
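The arithmetic behind the choices in Remark 2 is easily checked. The following short script (an illustration added here, not part of the algorithm) evaluates the conservative probability $[1-(1-\delta)^M]^{N/M}$ from (3.74) for the two parameter sets quoted above.

def conservative_prob(delta, M, N):
    # [1 - (1 - delta)^M]^(N/M), with K ~ N/M intervals as in (3.74)
    return (1.0 - (1.0 - delta) ** M) ** (N / M)

print(conservative_prob(0.10, 101, 10_000))    # ~0.9976 (> 0.997)
print(conservative_prob(0.02, 601, 360_000))   # ~0.9968 (> 0.996)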
If $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, then the requirements that both $\{u_n\}$ and $\{v_n\}$ be asymptotically independent sequences, and that $\{u_n\}$ be independent of $\{v_n\}$, are not needed for p. convergence. The following theorem validates this assertion.

Theorem 3.4 If $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, PE holds, and UCT holds, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Continuously-distributed $v_n$ implies that Lemmas 3.2 and 3.3 hold without the assumptions that both $\{u_n\}$ and $\{v_n\}$ are asymptotically independent sequences and that $\{u_n\}$ is independent of $\{v_n\}$. The theorem is then proved by following the steps of the proof of Theorem 3.3. □

The following corollaries involve iid noise cases for p. convergence. They are special cases of Theorems 3.3 and 3.4, respectively.

Corollary 3.6 Assume that the sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, UT holds, $\{u_n\}$ is asymptotically independent, and $\{v_n\}$ is iid, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since UT and UCT imply each other under the iid assumption, and iid implies asymptotic independence, p. convergence follows immediately from Theorem 3.3. □

Corollary 3.7 If $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, PE holds, UT holds, and $\{v_n\}$ is iid, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since UT and UCT imply each other under the iid assumption, p. convergence follows immediately from Theorem 3.4. □

The following corollaries assert the p. convergence of the OBE-ABE algorithm when the probability distribution of the noise sequence $\{v_n\}$ is known.

Corollary 3.8 If PE holds and $\{v_n\}$ is iid with uniform distribution, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: For a uniform distribution, $P(v_n \in (D_\epsilon^+ \cup D_\epsilon^-)) > 0$ for every $\epsilon > 0$, so UT holds. By Corollary 3.7, the estimator is p. convergent. □

Corollary 3.9 Assume that the sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, $\{u_n\}$ is asymptotically independent, and $\{v_n\}$ is iid with binary Bernoulli distribution, $P(v_n = \sqrt{\gamma_*}) > \delta$ or $P(v_n = -\sqrt{\gamma_*}) > \delta$, $\forall n$, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since UT holds in this case, the p. convergence follows from Corollary 3.6. □

Corollary 3.10 If PE holds, $\{u_n\}$ is continuously distributed, and $\{v_n\}$ is iid with binary Bernoulli distribution, $P(v_n = \sqrt{\gamma_*}) > \delta$ or $P(v_n = -\sqrt{\gamma_*}) > \delta$, $\forall n$, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since UT holds in this case, the p. convergence follows from Corollary 3.7. □

Note that all the theorems and corollaries in this section are also valid for conventional OBE algorithms (and the EPA algorithm) with a known exact noise bound $\sqrt{\gamma_*}$ or $-\sqrt{\gamma_*}$, because, for the OBE-ABE algorithm, $\gamma_n \to \gamma_*$ as $n \to \infty$, as seen in the proof of Theorem 3.3.

3.7 Almost Sure and Probability Convergence of Sub-OBE-ABE

All the convergence theorems developed in the previous sections are valid for the Sub-OBE-ABE algorithm in the various noise cases, with the same respective sufficient conditions. A sketch of the proofs of all the theorems for Sub-OBE-ABE follows. The algorithmic steps of the algorithm are found in Table 2.2. For convenience, let us rewrite the optimal check coefficient $c_n$, the suboptimal check coefficient $\tilde{c}_n$, and the recursion formula for $d_n$ here:

$$c_n = m\gamma_n - m\epsilon_n^2 - \kappa_{n-1} G_n \quad (3.75)$$
$$\tilde{c}_n = m\gamma_n - m\epsilon_n^2 - \kappa_{n-1}\, x_n^T x_n / d_{n-1} \quad (3.76)$$
$$d_n = \alpha_n d_{n-1} + \beta_n\, x_n^T x_n. \quad (3.77)$$

Also, define

$$\tilde{c}_n^- = \begin{cases} c_n, & \text{if } \tilde{c}_n < 0 \\ 0, & \text{otherwise,} \end{cases} \quad (3.78)$$

and

$$\tilde{c}_n^+ = \begin{cases} c_n, & \text{if } \tilde{c}_n \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

Hence, $c_n = \tilde{c}_n^- + \tilde{c}_n^+$. The following lemmas are needed for the proof of the main theorem.

Lemma 3.5 $\tilde{c}_n < 0$ implies $c_n < 0$.

Proof: Since the matrix $P_n$ is real and symmetric,

$$G_n = x_n^T P_{n-1} x_n \ge \lambda_{\min}(P_{n-1})\, x_n^T x_n = \frac{1}{\lambda_{\max}(P_{n-1}^{-1})}\, x_n^T x_n, \quad (3.79)$$

where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimum and maximum eigenvalues, respectively. Also, as in (2.9),

$$P_n^{-1} = \alpha_n P_{n-1}^{-1} + \beta_n x_n x_n^T.$$

This implies

$$\operatorname{tr}(P_n^{-1}) = \alpha_n \operatorname{tr}(P_{n-1}^{-1}) + \beta_n\, x_n^T x_n. \quad (3.80)$$

Since $P_n^{-1}$ is positive definite, $\operatorname{tr}(P_n^{-1}) \ge \lambda_{\max}(P_n^{-1})$; that is,

$$\frac{1}{\operatorname{tr}(P_n^{-1})} \le \frac{1}{\lambda_{\max}(P_n^{-1})}. \quad (3.81)$$

Comparing (3.80) and (3.77), we see that $d_n = \operatorname{tr}(P_n^{-1})$. Also, by (3.79) and (3.81),

$$G_n \ge x_n^T x_n / d_{n-1}. \quad (3.82)$$

Hence, by (3.75), (3.76), and (3.82), $c_n \le \tilde{c}_n$, and the lemma is proved. □

Note that a version of Lemma A.2 is valid for Sub-OBE-ABE, as stated in Lemma 3.6 below. The proof is the same except that $c_n^-$ in the original proof [27] is replaced by $\tilde{c}_n^-$.

Lemma 3.6 For the Sub-OBE-ABE algorithm, $\lim_{n\to\infty} \tilde{c}_n^- = 0$, where $\tilde{c}_n^-$ is defined in (3.78).

Theorem 3.5 All the convergence theorems of OBE-ABE are valid for Sub-OBE-ABE with the same respective conditions.

Proof: That Lemma 3.3 is valid for Sub-OBE-ABE is shown by substituting $c_n^-$ and $c_n^+$ in its proof with $\tilde{c}_n^-$ and $\tilde{c}_n^+$, respectively. Also note that $P(\tilde{c}_n < 0) \le P(c_n < 0)$, which is implied by Lemma 3.5. Since the other lemmas (including Lemmas 3.5 and 3.6) necessary for the proof of the main theorems for the Sub-OBE-ABE algorithm are not affected by the suboptimal check $\tilde{c}_n$, it follows that the proofs of convergence for OBE-ABE are also valid for Sub-OBE-ABE, provided that $c_n$ and $G_n$ in the proofs are changed to $\tilde{c}_n$ and $x_n^T x_n / d_{n-1}$, respectively. □
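The computational content of Lemma 3.5 is that the suboptimal check (3.76) requires only the scalar $d_{n-1} = \operatorname{tr}(P_{n-1}^{-1})$ and the inner product $x_n^T x_n$, an $O(m)$ computation, whereas the optimal check (3.75) requires $G_n = x_n^T P_{n-1} x_n$, an $O(m^2)$ computation. A minimal sketch follows (an illustration under assumed inputs: the weights $\alpha_n$, $\beta_n$ and the quantities $\gamma_n$, $\epsilon_n$, $\kappa_{n-1}$ are supplied by the underlying OBE recursions, which are not reproduced here):

import numpy as np

def check_suboptimal(m, gamma_n, eps_n, kappa_prev, x_n, d_prev):
    """Suboptimal check coefficient c~_n of (3.76): O(m) work."""
    return m * gamma_n - m * eps_n**2 - kappa_prev * (x_n @ x_n) / d_prev

def check_optimal(m, gamma_n, eps_n, kappa_prev, x_n, P_prev):
    """Optimal check coefficient c_n of (3.75): O(m^2) work."""
    return m * gamma_n - m * eps_n**2 - kappa_prev * (x_n @ (P_prev @ x_n))

def d_next(d_prev, alpha_n, beta_n, x_n):
    """Trace recursion (3.77); by (3.80), this maintains d_n = tr(P_n^{-1})."""
    return alpha_n * d_prev + beta_n * (x_n @ x_n)

Since $c_n \le \tilde{c}_n$ by (3.82), triggering an update only when check_suboptimal returns a negative value can never cause an update that the optimal test would have rejected.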
3.8 A Necessary Condition

All convergence theorems in the previous sections have two common sufficient conditions: PE and UCT (or UT in the iid case). PE has been shown to be a necessary condition [5, 31, 42] for the convergence of any LSE algorithm. Another necessary condition for the convergence of those algorithms is the whiteness of the noise sequence. This is a logical result, since white noise carries the least power of all WSS noises. SM algorithms are based on minimizing the feasible set (polytope or ellipsoid) according to assumed known noise bounds. Although some SM algorithms (especially OBEs) have recursion formulae similar to RLS, an LSE algorithm, the convergence of SM algorithms does not rely on the whiteness of the noise sequence. Even in the white-noise case, the estimator of an OBE algorithm (i.e., the center of the ellipsoid at each step) may be biased away from the true parameter vector due to the optimal data-weighting process. This is reasonable, since whiteness is an "average" property, not a pointwise property (e.g., UCT and UT are pointwise properties). If UCT or UT does not hold, then the ellipsoid of SM algorithms will not shrink to a point, and hence convergence is not guaranteed. The following lemma, which is extended from Lemma A.1 by considering an almost sure (a.s.) $\omega$-set in its proof, validates this assertion.

Lemma 3.7 Assume that PE (3.12) holds. If there exist an $\epsilon > 0$ and $N \in \mathbb{N}$ such that $\gamma_n - v_n^2 > \epsilon$, $\forall n > N$, a.s., then the sequence of ellipsoids of OBE algorithms does not asymptotically shrink to a point a.s.

Theorem 3.6 Assume that PE holds. Then, UCT is a necessary condition for the sequence of ellipsoids of any OBE algorithm to shrink to a point a.s.

Proof: Suppose that UCT does not hold. Then, considering the notes below (3.9), there exist an $\epsilon > 0$ and an $N \in \mathbb{N}$ such that, for all $n > N$, $P(v_n \in D_\epsilon^+ \mid \mathcal{F}_{n-1}) = 0$ and $P(v_n \in D_\epsilon^- \mid \mathcal{F}_{n-1}) = 0$ a.s. Hence, $\gamma_n - v_n^2 > \epsilon$, $\forall n > N$, a.s. Thus, by Lemma 3.7, the sequence of ellipsoids does not shrink to a point a.s. □

Chapter 4

Simulation Studies

4.1 Introduction

In this chapter, the performance of each of the new algorithms is investigated through simulations. Although the algorithms were proven convergent in the iid, mixing, ergodic, and non-stationary noise cases in the previous chapter, simulations can reveal other characteristics that are not explicitly inferred from the theoretical analysis. For example, simulations can illustrate how the speed of convergence of the OBE-ABE algorithm is affected by the ABE procedure. Simulations also reveal that the suboptimal check in Sub-OBE-ABE does not slow down the speed of convergence. This is not surprising, since the noise bounds estimated by the ABE procedure are not affected by the suboptimal check. Simulations also reveal the effect of asymmetric noise bounds on convergence performance. The version of OBE algorithm used in this dissertation is the SM-SA algorithm, due to its interpretable optimization criterion and numerical stability. Colored noise cases, in addition to iid cases, are also demonstrated.

4.2 IID Noise Cases

In this section, the performance of OBE-ABE, Sub-OBE-ABE, and conventional OBE is compared in iid noise cases with symmetric and asymmetric bounds. Comparisons of the performance of OBE-type algorithms with RLS in the iid case are found in [27] and hence are omitted here. When compared to RLS, OBE algorithms have better speed of convergence in many noise cases. Throughout this section, an AR(3) model is simulated as follows:

$$y_n = a_{1*}\, y_{n-1} + a_{2*}\, y_{n-2} + a_{3*}\, y_{n-3} + v_n, \quad (4.1)$$

where $a_{1*} = 2$, $a_{2*} = -1.48$, $a_{3*} = 0.34$ are unknown parameters to be identified. Three types of noise sequences $\{v_n\}$ are employed to generate three observable output sequences $\{y_n\}$. Specifically:

CASE 1: AR(3) model as in (4.1). $v_n \sim B(-1,1)$ is iid and non-zero-mean with binary Bernoulli distribution:

$$v_n = \begin{cases} 1, & \text{with probability } 0.7 \\ -1, & \text{with probability } 0.3. \end{cases} \quad (4.2)$$

CASE 2: AR(3) model as in (4.1). $v_n \sim U(-1,1)$ is uniformly distributed on $[-1,1]$.
The following is a case of asymmetric noise bounds:

CASE 3: AR(3) model as in (4.1). $v_n \sim B(-0.5, 1)$ is iid, non-zero-mean, and asymmetrically bounded with binary Bernoulli distribution. Specifically,

$$v_n = \begin{cases} 1, & \text{with probability } 0.7 \\ -0.5, & \text{with probability } 0.3. \end{cases} \quad (4.3)$$

The following case is constructed for the illustration of the final ellipsoid of the algorithm and the trajectory of the estimator.

CASE 4: An AR(2) model is constructed as follows:

$$y_n = -0.1\, y_{n-1} + 0.56\, y_{n-2} + v_n, \quad (4.4)$$

in which $v_n$ is as in CASE 1, (4.2).

Note that the noise bound ($\gamma_* = 1$) and the non-zero mean of $v_n$ are assumed unknown in these cases.
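For reference, the four cases can be reproduced with a few lines of code. The following sketch (added here for illustration; the generator, seed, and sample size are assumptions, not the dissertation's simulation code) synthesizes the observation sequences:

import numpy as np

rng = np.random.default_rng(0)

def simulate_ar(a, v):
    """y_n = a[0] y_{n-1} + ... + a[p-1] y_{n-p} + v_n, zero initial conditions."""
    p, y = len(a), np.zeros(len(v))
    for n in range(len(v)):
        y[n] = sum(a[i] * y[n - 1 - i] for i in range(min(p, n))) + v[n]
    return y

N = 2000
a_true = [2.0, -1.48, 0.34]                    # model (4.1)
v1 = rng.choice([1.0, -1.0], N, p=[0.7, 0.3])  # CASE 1, (4.2)
v2 = rng.uniform(-1.0, 1.0, N)                 # CASE 2
v3 = rng.choice([1.0, -0.5], N, p=[0.7, 0.3])  # CASE 3, (4.3)
y1, y2, y3 = (simulate_ar(a_true, v) for v in (v1, v2, v3))
y4 = simulate_ar([-0.1, 0.56], v1)             # CASE 4, model (4.4)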
4.2.1 OBE-ABE vs. OBE

In this subsection, OBE-ABE and OBE are compared with regard to speed of convergence and computational efficiency using CASES 1-3. These results support the validity of the theorems in the previous chapter.

For CASE 1, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.1-4.3. As seen in the figures, neither estimator is affected by the non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). This supports the assertions of the related theorems in Chap. 3. However, with the help of automatic bound estimation, OBE-ABE converges faster than OBE (as seen in Fig. 4.1). This result is not unexpected in light of Lemma 3.7 and related theorems in Chap. 3.

For CASE 2, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0.05$, and $M = 50$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.4-4.6. The difference in convergence performance between OBE-ABE and OBE is similar to that of CASE 1. Further, a comparison between CASE 1 and CASE 2 shows that the speed of convergence of both algorithms is proportional to the $\epsilon$-tail probability, $\delta$, defined in UT (3.10) or UCT (3.9).

For CASE 3, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 70$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.7-4.9. As seen in the figures, the convergence of (overestimated) OBE is affected by the asymmetry of the noise bounds, while OBE-ABE converges consistently as the estimated bounds $\gamma_n$ automatically converge to the true bound ($= 1$).

For CASE 4, the final ellipsoid of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) and the trajectory of the estimator after 1000 steps are shown in Fig. 4.10. The trajectory of the estimator starts from the 17th step. The ellipsoid does not degenerate to a line segment. This conforms to the positiveness of $P_n$ and $\kappa_n$ under the PE and stability assumptions.

These simulations show that the number of updates of the estimator of the OBE-ABE algorithm is smaller than that of the (overestimated-bound) OBE algorithm when both algorithms terminate at the same ellipsoidal volume. This means that OBE-ABE shrinks the volumes of the ellipsoids more drastically than OBE each time an update occurs. The improved computational efficiency of OBE-ABE is not surprising, since the ABE procedure, which has trivial computational cost, as seen in CASES 1 through 3, blindly obtains more correct information (the noise bound) than OBE has.

Figure 4.1: Estimators of $a_{1*} = 2$. CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.2: CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.3: CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.4: Estimators of $a_{1*} = 2$. CASE 2: $v_n \sim U(-1,1)$ for model (4.1).
Figure 4.5: CASE 2: $v_n \sim U(-1,1)$ for model (4.1).

Figure 4.6: CASE 2: $v_n \sim U(-1,1)$ for model (4.1).

Figure 4.7: Estimators of $a_{1*} = 2$. CASE 3: $v_n \sim B(-0.5,1)$, (4.3), for model (4.1).
Figure 4.8: CASE 3: $v_n \sim B(-0.5,1)$, (4.3), for model (4.1).

Figure 4.9: CASE 3: $v_n \sim B(-0.5,1)$, (4.3), for model (4.1).

Figure 4.10: CASE 4: The final ellipsoid and the trajectory of the estimator. $v_n \sim B(-1,1)$, (4.2), for model (4.4).

4.2.2 Sub-OBE-ABE vs. OBE-ABE

In this subsection, Sub-OBE-ABE and OBE-ABE are compared with regard to speed of convergence and computational efficiency using CASES 1 and 2. These convergence results support the validity of the theorems in the previous chapter.

For CASE 1, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) and Sub-OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) are shown in Figs. 4.11 and 4.12. As seen in the figures, neither estimator is affected by the non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). In this case, Sub-OBE-ABE, which selects 18% of the data, has convergence performance similar to OBE-ABE, which selects 50% of the data. This reveals that the excellent performance of the ABE procedure is not affected by the suboptimal check.

For CASE 2, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0.05$, and $M = 50$) and Sub-OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0.05$, and $M = 30$) are shown in Figs. 4.13 and 4.14. As seen in the figures, neither estimator is affected by the non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). Sub-OBE-ABE, which selects very few (1.5%) of the data in this case, has convergence performance similar to OBE-ABE, which selects 5% of the data. However, in both CASES 1 and 2, the speed of convergence of Sub-OBE-ABE in the beginning (e.g., the first 150 steps) is slower than that of OBE-ABE. This is because, in that stage, Sub-OBE-ABE has big ellipsoids and is busy updating the estimator using the suboptimal check, without updating the estimated noise bound using the ABE procedure.

4.3 Colored Noise Cases

In this section, the performance of OBE-ABE, Sub-OBE-ABE, and conventional OBE is compared in colored noise cases with symmetric and asymmetric bounds.

Figure 4.11: Estimators of $a_{1*} = 2$. CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.12: CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.13: Estimators of $a_{1*} = 2$. CASE 2: $v_n \sim U(-1,1)$ for model (4.1).

Figure 4.14: CASE 2: $v_n \sim U(-1,1)$ for model (4.1).

Four colored noise cases are constructed as follows:

CASE 5: AR(3) model as in (4.1). $v_n$ is a colored, non-zero-mean noise sequence related to a colored sequence $\{w_n\}$ as follows:

$$v_n = \begin{cases} 1, & \text{if } w_n > -1 \\ -1, & \text{otherwise,} \end{cases} \quad (4.5)$$

in which the colored sequence $\{w_n\}$ is generated from a uniformly-distributed iid white noise sequence $z_n \sim U(-1,1)$ as follows: $w_n = -0.8\, w_{n-1} + z_n$.
CASE 6: $v_n$ is the same as in CASE 5, (4.5), while the order of the AR model increases to 12. Specifically, a stable AR(12) model is employed in this case:

$$y_n = a_{1*}\, y_{n-1} + a_{2*}\, y_{n-2} + \cdots + a_{12*}\, y_{n-12} + v_n, \quad (4.6)$$

where $a_{1*} = -0.1$, $a_{2*} = 0.9175$, $a_{3*} = -0.191$, $a_{4*} = -0.2253$, $a_{5*} = 0.2601$, $a_{6*} = 0.0046$, $a_{7*} = -0.0367$, $a_{8*} = -0.0209$, $a_{9*} = -0.0082$, $a_{10*} = 0.0095$, $a_{11*} = -0.0052$, and $a_{12*} = -0.0041$ are unknown parameters to be identified.

CASE 7: AR(3) model as in (4.1). $v_n$ is a colored, non-zero-mean noise sequence with asymmetric bounds generated from the colored sequence $\{w_n\}$, i.e.,

$$v_n = \begin{cases} 1, & \text{if } w_n > -1 \\ -0.5, & \text{otherwise.} \end{cases} \quad (4.7)$$

The following case is constructed for the illustration of the final ellipsoid of the algorithm and the trajectory of the estimator in the colored noise case.

CASE 8: AR(2) model as in CASE 4, (4.4). $v_n$ is as in CASE 5, (4.5).
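The colored noise of CASES 5-8 is equally simple to synthesize. A sketch follows (an added illustration; seed and length are assumptions); the models (4.1), (4.4), and (4.6) can then be driven by these sequences exactly as in the iid cases of Section 4.2.

import numpy as np

rng = np.random.default_rng(1)

N = 2000
z = rng.uniform(-1.0, 1.0, N)        # iid white input z_n ~ U(-1, 1)
w = np.zeros(N)
w[0] = z[0]
for n in range(1, N):                # colored sequence of CASE 5
    w[n] = -0.8 * w[n - 1] + z[n]

v5 = np.where(w > -1.0, 1.0, -1.0)   # CASE 5 (and CASES 6, 8) noise, (4.5)
v7 = np.where(w > -1.0, 1.0, -0.5)   # CASE 7 noise, (4.7)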
4.3.1 OBE-ABE vs. OBE

In this subsection, OBE-ABE and OBE are compared with regard to speed of convergence and computational efficiency using CASES 5-7. These results support the validity of the theorems in the previous chapter.

For CASE 5, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 40$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.15-4.17. As seen in the figures, neither estimator is affected by the color of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). This supports the assertions of the related theorems in Chap. 3.

For CASE 6, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 40$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.18-4.20. As seen in the figures, the slow speed of convergence of OBE, due to the overestimated noise bound, results in poor convergence, while OBE-ABE converges very well to the true parameter ($a_{1*} = -0.1$). This shows that OBE-ABE's excellent convergence performance is not affected by model order, while that of (overestimated-bound) OBE is.

For CASE 7, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 250$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.21-4.23. As seen in the figures, the convergence of (overestimated) OBE is affected by the asymmetry of the noise bounds, while OBE-ABE converges consistently as the estimated bounds $\gamma_n$ automatically converge to the true bound.

For CASE 8, the final ellipsoid of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 40$) and the trajectory of the estimator after 1000 steps are shown in Fig. 4.24. The trajectory of the estimator starts from the 17th step. The ellipsoid does not degenerate to a line segment. This conforms to the positiveness of $P_n$ and $\kappa_n$ under the PE and stability assumptions.

Figure 4.15: Estimators of $a_{1*} = 2$. CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

Figure 4.16: CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

Figure 4.17: CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

Figure 4.18: Estimators of $a_{1*} = -0.1$. CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.19: CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.20: CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.21: Estimators of $a_{1*} = 2$. CASE 7: colored noise $v_n \sim B(-0.5,1)$, (4.7), for model (4.1).

Figure 4.22: CASE 7: colored noise $v_n \sim B(-0.5,1)$, (4.7), for model (4.1).

Figure 4.23: CASE 7: colored noise $v_n \sim B(-0.5,1)$, (4.7), for model (4.1).

Figure 4.24: CASE 8: The final ellipsoid and the trajectory of the estimator. $v_n \sim B(-1,1)$, (4.5), for model (4.4).

4.3.2 Sub-OBE-ABE vs. OBE-ABE

In this subsection, Sub-OBE-ABE and OBE-ABE are compared with regard to speed of convergence and computational efficiency using CASES 5 and 6. These results support the validity of the theorems in the previous chapter.

For CASE 5, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) and Sub-OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) are shown in Figs. 4.25 and 4.26. As seen in the figures, neither estimator is affected by the color and non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). This supports the assertions of the related theorems in Chap. 3. In this case, Sub-OBE-ABE, which selects 15% of the data, has convergence performance similar to OBE-ABE, which selects 45% of the data. This reveals that the excellent performance of the ABE procedure is not affected by the suboptimal check.

For CASE 6, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 30$) and Sub-OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 30$) are shown in Figs. 4.27 and 4.28. As seen in the figures, neither estimator is affected by the non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = -0.1$). However, Sub-OBE-ABE, which selects very few (1.5%) of the data in this higher-order case, has a slower speed of convergence over 2000 steps than OBE-ABE, which selects 5% of the data.

Figure 4.25: Estimators of $a_{1*} = 2$. CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

Figure 4.26: CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

4.4 Tracking Performance of Adaptive Sub-OBE-ABE

In this section, the tracking performance of the Adaptive Sub-OBE-ABE algorithm is illustrated through the study of two examples of time-varying systems: one gradually changing, the other abruptly changing. Specifically, an AR(2) model with time-varying parameters is
constructed as follows:

$$y_n = a_{1t}\, y_{n-1} + a_{2t}\, y_{n-2} + v_n, \quad (4.8)$$

in which $a_{2t} = -0.68$ and $a_{1t}$ varies between $-1.6$ and $1.6$ (the same as in [11]). This is equivalent to varying the system's conjugate poles from $0.8 \pm j0.2$ to $-0.8 \pm j0.2$ and back. The variations of $a_{1t}$ (abrupt or gradual) are shown as dashed lines in Figs. 4.31-4.39. The noise sequence $\{v_n\}$ is iid and uniformly distributed on $[-1,1]$.

Although "non-adaptive" OBE algorithms are well known to have good tracking capability in slowly time-varying systems, they eventually lose tracking capability. This result is shown in Figs. 4.29 and 4.30 ($\gamma_n = 1.5$). On the other hand, the Adaptive Sub-OBE-ABE algorithm ($\gamma_0 = 1.5$, $\epsilon = 0.02$, $M = 70$), which selects 4.5% and 3% of the data ($p$ in each figure), respectively, keeps track of the varying parameters very well, as shown in Figs. 4.31 and 4.32.

To further investigate its tracking capability, faster-varying systems are simulated as in Figs. 4.33-4.36. As shown in the figures, the Adaptive Sub-OBE-ABE algorithm ($\gamma_0 = 1.5$, $\epsilon = 0.02$, $M = 70$), which selects no more than 7.5% of the data, keeps track of these faster time-varying parameters equally well. Comparing these diagrams with those in [11, 37, 38] shows that the Adaptive Sub-OBE-ABE algorithm has better tracking capability.

The tracking capability of the Sub-OBE-ABE (or any OBE) algorithm is proportional to the tail probability of the noise. This is a natural result, since the larger the tail probability, the more frequently Sub-OBE-ABE updates its estimator. To demonstrate this, a much faster time-varying system with the same uniformly distributed noise $v_n$ is simulated, as shown in Fig. 4.37. As shown in the figure, the tracking performance is not as good as in the previous cases. However, if $v_n$ is instead binary Bernoulli distributed with 0.5 tail probability, the tracking performance improves, as shown in Fig. 4.38. An even faster time-varying system with the same binary Bernoulli distributed noise is shown in Fig. 4.39. As shown in the figure, the tracking performance of the Adaptive Sub-OBE-ABE algorithm is still excellent in this case. In these cases, Sub-OBE-ABE selects, on average, 32% of the data.
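The time-varying experiments are straightforward to set up. The sketch below (an added illustration; the triangular sweep is an assumed stand-in for the gradual parameter profiles plotted as dashed lines in the figures, which are not reproduced exactly here) generates data from system (4.8):

import numpy as np

rng = np.random.default_rng(2)

def tv_ar2(N, period):
    """System (4.8): a_2t = -0.68 fixed; a_1t swept over [-1.6, 1.6],
    moving the conjugate poles between 0.8 +/- j0.2 and -0.8 -/+ j0.2."""
    phase = (np.arange(N) % period) / period
    a1 = 1.6 * (4.0 * np.abs(phase - 0.5) - 1.0)   # triangle wave in [-1.6, 1.6]
    v = rng.uniform(-1.0, 1.0, N)                  # iid U(-1,1) noise
    y = np.zeros(N)
    for n in range(2, N):
        y[n] = a1[n] * y[n - 1] - 0.68 * y[n - 2] + v[n]
    return y, a1

y, a1 = tv_ar2(10_000, period=5_000)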
Figure 4.27: Estimators of $a_{1*} = -0.1$. CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.28: CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.29: Non-adaptive OBE algorithm on the time-varying system (4.8). $\gamma_n = 1.5$.

Figure 4.30: Non-adaptive OBE (SM-SA) algorithm on the time-varying system (4.8). $\gamma_n = 1.5$.

Figure 4.31: Adaptive Sub-OBE-ABE algorithm on the time-varying system (4.8). $p = 4.5\%$.

Figure 4.32: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 3\%$.

Figure 4.33: Adaptive Sub-OBE-ABE algorithm on the time-varying system (4.8). $p = 5.5\%$.

Figure 4.34: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 3.5\%$.

Figure 4.35: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 7.5\%$.

Figure 4.36: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 5.5\%$.

Figure 4.37: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 10\%$.

Figure 4.38: Adaptive Sub-OBE-ABE on the time-varying system (4.8) ($v_n \sim B(-1,1)$). $p = 31\%$.

Figure 4.39: Adaptive Sub-OBE-ABE on the time-varying system (4.8) ($v_n \sim B(-1,1)$). $p = 33\%$.

Chapter 5

Application to Linear Prediction Analysis of Speech

5.1 Introduction

The OBE-ABE algorithm and its variants presented in the previous chapters are proven in Chap. 3 to be a.s. and p. convergent under various conditions, including colored and non-stationary noise cases. These algorithms have excellent performance in simulations, which are designed to satisfy the sufficient conditions for convergence. However, in real-world applications, where model structures or noise characteristics are not as ideal as in simulations, the performance of most algorithms suffers. In order to further investigate the performance of the new OBE-ABE algorithms in real-world applications, they are applied to the linear prediction (LP) analysis of speech [12, 40] in this chapter, and to the blind-deconvolution problem in the next chapter.

5.2 LP model of speech

Speech is a very dynamic signal. Even a short frame (e.g., 25.6 msec) of speech can be regarded as quasi-stationary at best. However, for analytical tractability, a short
85 5.3 LP analysis using OBE-ABE The first attempt to applying OBE algorithm to LP analysis of speech is performed by Deller and Luk [15]. This attempt reveals the fact that the convergence performance of OBE is seriously affected by the precision of the estimated noise bounds which are not available in most real-world applications. With the ABE procedure, the OBE-ABE algorithm makes possible the application of OBE to real-world applications including LP analysis of speech. Due to OBE- ABE’s fast speed of convergence, robustness to measurement noise (see Chap. 6) and non-stationarity, and blind-deconvolution capability in colored noise, the results shown below are comparable to those of RLS or the autocorrelation method. Theoretically, OBE-ABE can be initialized with any overestimated bound 70, small 6, and large enough M. Since each speech frame in the LP model is typically only 256 points (for 10 KHz sampling rate) and quasi-stationary, the initialization must be deliberately chosen to assure convergence. To this purpose, several speech signals which are drawn from TIMIT (Texas Instruments and Massachusetts Insti- tute of Technology) speech database [18] are employed in determining the optimal initialization. Also, each frame of speech is normalized to have maximum unity mag- nitude. However, no windowing (e.g., Hamming window) is needed. An acceptable initialization is 70 = 1.5, e = 0.25, and M = 12, for both voiced and unvoiced sounds. This makes the application of OBE-ABE straightforward, without requiring a priori knowledge of the voice/unvoiced status of the frame. This initialization produces acceptable results using only, on average, 12% of the data for voiced sounds and 8% for the unvoiced. By experimentation, an excellent initialization is found (70 = l, e = 0.01, and M = 20) for all the voiced and unvoiced. This initialization results in excellent spectral envelopes of LP coefficients which are comparable to those of the autocorrelation method and RLS. The data selected in this case are, on average, 25% 86 for voiced sounds and 60% for the unvoiced. As typical examples of LP analysis, the first set of spectra of LP coefficients for vowels /I/ (voiced phoneme, from utterance “six”), / E / (from “seven”), and /u/ (from “two”) are shown in Figs. 5.40 - 5.42, and for unvoiced plosive /t/ (from “two”), /f/ (from “four”), and /s/ (from “six”) are shown in Figs. 5.43 — 5.45. The upper portions of the figures show the spectra of the speech frames themselves based on a 512-point F FT. The lower portions show the spectra of LP coefficients (m = 14) obtained by autocorrelation method (solid line), and OBE—ABE (dashed line). As seen in the figures, OBE-ABE produces similar spectra to those of conventional batch method. The second set of spectra of LPG coefficients for vowels / I / (voiced phoneme, from utterance “six”), /E/ (from “seven”), and /u/ (from “two”) are shown in Figs. 5.46 — 5.48, and for unvoiced plosive /t/ (from “two”), /f/ (from “four”), and /s/ (from “six”) are shown in Figs. 5.49 - 5.51. Similar to the first set, the upper portions of the figures show the spectra of the speech frames themselves based on a 512-point F FT. The lower portions show the spectra of LP coefficients (m = 14) obtained by RLS (solid line), and OBE—ABE (dashed line). As seen in the figures, OBE—ABE produces similar spectra to those of RLS while selects, on average, no more than half as many data as RLS. 
To further investigate OBE-ABE’s capability, a simulated speech frame (256 points) of phoneme /u/ is generated by the LP model (5.1) using a pulse train (pulses separated by 80 points, amplitude 0.5, and first pulse located at the 10th point) as the excitation {v,,}. Both OBE-ABE and autocorrelation method are applied to this case for the identification of the LP coefficients. The results are shown in Fig. 5.52. As seen in the figure, OBEABE produces better result since its LP spectrum is more resemblant to that of the simulated speech itself. This conforms to the fact that OBE-ABE converges consistently in colored noise case while other algorithms based 87 1 Spearmint/II 10 I I I I U I I 1 Y 10 '310'2 1 J. L A L A 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 o 1 l l l I 4 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Mn W normalized b PI Figure 5.40: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). I Wot/El l J 1 l l l G l l l O 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon MW nonnallud b Pl Figure 5.41: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE—ABE (dashed line), and autocorrelation (solid line). 88 ‘ Mind/U 0 0.1 0.2 0.3 0.4 0.5 0.0 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.0 0.7 0.8 0.0 1 Man W m :5 PI Figure 5.42: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). o Sputum 0M! 10 fir I I f I I T 1 1o" -2 u ‘° . 1. 104i 1 10" 104 l I 1 1 J_ 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 L l l 1 A L 1 l 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon Mm norm-Izod to PI "o 01 Figure 5.43: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). 89 ‘ Spuwunouv 10“ ' 0 0.1 02 0.3 0.4 05 0.0 0.7 on 0.9 1 G I L 0 on 02 03 0A 05 05 0] 03 09 1 allmh'mnmwnaWUhndel Figure 5.44: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). a Spoctunothl ‘0 1' fl 1 r I r 10° 010*» 10"} 104 1 LL 9 l L 4 l l l 0 0.1 02 03 0.4 as 0.0 0.7 as 0.9 1 15 . s 0 OJ 02 03 0A 05 on 0] Hulmfluwunymmwflmfltflfl Figure 5.45: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). 90 2 Spoctunotlll ‘0 I I I I I I T I I 10 010* 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon Iroquoncy nonnalzed 10 PI P l L Figure 5.46: Spectra of voiced /I/ phoneme. Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line). 3 Wot/El ‘0 I fi I I I I T 10 10 ' 10' 15 I I I f r 10 _ \ J 8 I ,. 5 - — ’l G J 1 1 1 1 1 1 1 J ‘ 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon lroqmncy normalized 10 PI Figure 5.47: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line). 91 Spammdhv 102 I I I I 0 10“ . 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon M norm-Izod b Pl Figure 5.48: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line). 0 Spectrumoflv 10'“ 0 0.1 0.2 0.3 0.4 0.5 0.8 0.7 0.8 0.9 1 WW nomullzodb PI Figure 5.49: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line). 92 ‘ Spoummot/ll 10"; . 
Figure 5.48: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and RLS (solid line).

Figure 5.49: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and RLS (solid line).

Figure 5.50: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and RLS (solid line).

Figure 5.51: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and RLS (solid line).

5.4 Sub-OBE-ABE and computational complexity

The Sub-OBE-ABE algorithm was shown in Chap. 4 to have a speed of convergence similar to OBE-ABE while having more efficient O(pm^2) computation. However, within short records of 256 steps, Sub-OBE-ABE does not converge as well as OBE-ABE with the same initialization, due to the suboptimal check for innovation. This phenomenon was explained in Chap. 4 in connection with Figs. 4.12 and 4.14.

Through experiments, two sets of initializations of the Sub-OBE-ABE algorithm were found, for voiced and unvoiced sounds respectively. For voiced sounds, Sub-OBE-ABE initialized with $\gamma_0 = 0.25$, $\epsilon = 0.0001$, and $M = 12$ gives excellent results, as shown in Figs. 5.53-5.55; seven percent of the data is selected in the voiced case on average. For unvoiced sounds, the initialization is $\gamma_0 = 0.5$, $\epsilon = 0.0001$, and $M = 8$; the results are shown in Figs. 5.56-5.58, and thirteen percent of the data is selected in the unvoiced case on average. As seen in the figures, Sub-OBE-ABE achieves spectra similar to those of RLS or the autocorrelation method while selecting, on average, only 10% of the data. Since the number of LP coefficients, m, is on the order of 10 [12, 40] for most speech applications, Sub-OBE-ABE is virtually an O(m) recursive algorithm in this application.

Figure 5.52: Spectra of the simulated voiced /u/ phoneme. Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and the autocorrelation method (solid line).

Figure 5.53: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).

Figure 5.54: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).

Figure 5.55: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).

Figure 5.56: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).

Figure 5.57: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).
[Figure 5.52: Spectra of the simulated voiced /u/ phoneme. Upper graph: the direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and the autocorrelation method (solid line). Figure not reproduced.]

[Figures 5.53-5.58 not reproduced. Each shows: upper graph, the direct 512-point FFT spectrum; lower graph, spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).]

Chapter 6

Application to Blind Deconvolution

6.1 Introduction

The blind-deconvolution problem [23] has attracted intensive research in recent years in the fields of data communications, adaptive filtering, and signal processing. A typical block diagram for the blind-deconvolution problem is shown in Fig. 6.1. The input {v_n} and the noise {w_n} in the figure are assumed unobservable. In data communications, the unknown input data are recovered using the output data received from a noisy and distorted channel (e.g., satellite, cellular, or optical fiber). Another example of a blind-deconvolution problem is the LP analysis of a speech signal corrupted by unknown measurement or environmental noise.

Several well-known estimation methods have been applied to the blind-deconvolution problem. Bayesian estimation [23] (a batch method) and LMS (recursive) are two popular ones. However, both require that the input {v_n} and the noise {w_n} be white for convergence, since they are based on LSE optimization. This seriously restricts their scope of application, since the input data {v_n} in most engineering applications (e.g., the two examples above) are not always white.

[Figure 6.1: Model of the received signal in the blind-deconvolution problem (the input {v_n} driving the unknown channel H(z), with noise {w_n}, observed as {y_n}). Figure not reproduced.]

The new OBE-ABE algorithms presented in this dissertation are good candidates to overcome this difficulty, since they converge in colored-noise cases, as demonstrated in the previous chapters. Although the blind-deconvolution model shown in Fig. 6.1 is not an ARX model, the OBE-ABE algorithms show excellent results in this application owing to their robustness to the measurement noise.

The block diagram in Fig. 6.2 is one of the application structures for blind deconvolution and is the one employed in this chapter. The parameters of the inverse filter H⁻¹(z) are estimated by OBE-ABE using the output data {y_n}. The detector in the figure has known transfer functions for the recovery of the input data from {v̂_n}. The unknown H(z) is simulated as an IIR (infinite impulse response) and an FIR (finite impulse response) filter, respectively. The unknown input {v_n} is simulated as an iid or a colored digital data stream. The noise {w_n} is independent of {v_n} and is iid with a uniform distribution specified in the following sections.

6.2 IID input and IIR filter

In this section, the IIR filter in Fig. 6.2 is assumed to be

$$H(z) = \frac{1}{a_{1*}z^{-1} + a_{2*}z^{-2} + a_{3*}z^{-3}} \qquad (6.1)$$

where the unknown parameters are a1* = 2, a2* = -1.68, a3* = 0.34; the input {v_n} is iid, taking the discrete values {-1, -0.5, 0, 0.5, 1} with equal probability; and the noise {w_n} is iid and uniformly distributed on [-0.05, 0.05]. In this case, the signal-to-noise ratio (S/N), ignoring the gain of H(z), is 42.5 dB. The order of the inverse filter Ĥ⁻¹(z) in this case is three.
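The data-generation side of this experiment is fully specified above and can be sketched directly; only the seed and sample count below are arbitrary choices, and the estimation step itself (OBE-ABE, Chap. 2) is not reproduced here.

```python
# Simulation of the iid-input / IIR-channel setup of Section 6.2.
# Channel: H(z) = 1/(a1 z^-1 + a2 z^-2 + a3 z^-3); the lfilter call below
# realizes 1/(a1 + a2 z^-1 + a3 z^-2), i.e., H(z) up to a one-sample
# delay, which is immaterial for blind identification from {y_n}.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)          # seed is an arbitrary choice
N = 10_000
a_true = np.array([2.0, -1.68, 0.34])   # a1*, a2*, a3* from the text

v = rng.choice([-1.0, -0.5, 0.0, 0.5, 1.0], size=N)  # iid 5-level input
s = lfilter([1.0], a_true, v)                        # channel output
w = rng.uniform(-0.05, 0.05, size=N)                 # measurement noise
y = s + w                                            # observed data {y_n}

# OBE-ABE is then run on {y_n} alone to estimate the inverse-filter
# parameters (Chap. 2); the channel poles here (0.5 and 0.34) are inside
# the unit circle, so the simulated system is stable.
```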
The application results of the OBE-ABE and conventional OBE algorithms in this case are shown in Figs. 6.3 and 6.4, respectively. As seen in Fig. 6.4, the ellipsoidal volumes of the OBE algorithm (γn = 1) quickly become negative because of the added noise {w_n}, while the volume of the OBE-ABE algorithm (γ0 = 10, ε = 0.5, and M = 80) converges to zero. Figure 6.3 shows the estimates of the first parameter for both algorithms. As seen in the figure, the OBE estimate diverges owing to the negativity of the ellipsoidal volumes, while the OBE-ABE estimate converges to the theoretical values ([a1, a2, a3] = [1.9930, -1.4676, 0.3332]) calculated by the Wiener optimization method [23]. Simulations show that the OBE-ABE algorithm is robust to the noise {w_n} while OBE is not.

Simulations also show that, using OBE-ABE in this case, the output of the detector, {ṽ_n}, recovers the unknown input data {v_n} with a 99% success rate; a sketch of such a detector is given after the figure captions below. Figure 6.5 shows the first 100 samples of v_n, v̂_n, and ṽ_n. The 1% of erroneous data can easily be recovered using standard error-correction techniques from data communications.

Simulations also show that the performance of the OBE-ABE algorithm in the noise-corrupted case is not affected by the order of H(z). However, when the S/N decreases (e.g., as the bound of the noise {w_n} increases), the success rate decreases because of the bias of the estimator, which can be calculated theoretically by the Wiener optimization method. This can be seen in Figs. 6.6 and 6.7, which show simulation results for the model above except that now w_n ~ U(-0.1, 0.1) (S/N = 36.5 dB). As seen in the figures, the OBE-ABE estimator is biased (the theoretical values of the estimator in this case are [a1, a2, a3] = [1.9724, -1.4315, 0.3133]), and the success rate is 91%.

6.3 IID input and FIR filter

In this section, the FIR filter in Fig. 6.2 is assumed to be

$$H(z) = a_{1*}z^{-1} + a_{2*}z^{-2} + a_{3*}z^{-3}$$

where the unknown parameters are a1* = 0.7, a2* = 0.5, a3* = -0.336; the input {v_n} is iid, taking the discrete values {-1, -0.5, 0, 0.5, 1} with equal probability; and the noise {w_n} is iid and uniformly distributed on [-0.05, 0.05] (S/N = 42.5 dB) and on [-0.1, 0.1] (S/N = 36.5 dB), respectively. In this FIR case, the performance of the OBE-ABE algorithm remains excellent if the order of the inverse filter H⁻¹(z) is increased from 3 to 10. The results are shown in Figs. 6.8 and 6.9. As seen in the figures, OBE-ABE shows excellent performance, with a 99% success rate in the case of S/N = 42.5 dB (Fig. 6.8) and a 93% success rate in the case of S/N = 36.5 dB (Fig. 6.9).

6.4 Colored Input and IIR Filter

In this section, the IIR filter H(z) in Fig. 6.2 is the same as in (6.1), the input {v_n} is colored as in (4.5), and the noise {w_n} is iid and uniformly distributed on [-0.1, 0.1] (S/N = 36.5 dB) or [-0.5, 0.5] (S/N = 22.5 dB). The application results of OBE-ABE in these cases are shown in Figs. 6.10 and 6.11.

[Figure 6.2: Block diagram of the blind-deconvolution problem. Figure not reproduced; it shows the chain v_n → H(z) → (+ w_n) → y_n → H⁻¹(z) → v̂_n → Detector → ṽ_n.]

[Figure 6.3: Estimates of a1* by OBE (γn = 1) and OBE-ABE (γ0 = 10) for Fig. 6.2 (iid-IIR, S/N = 42.5 dB), plotted over 10000 time steps. Figure not reproduced.]

[Figure 6.4: Volumes of the ellipsoids of OBE (γn = 1) and OBE-ABE (γ0 = 10) for Fig. 6.2 (iid-IIR, S/N = 42.5 dB). Figure not reproduced.]
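The success rates quoted in Sections 6.2 and 6.3 can be measured with a simple symbol-by-symbol detector. The dissertation describes its detector only as having known transfer functions, so the nearest-symbol quantizer below is an assumed stand-in illustrating how a success rate over the 5-level alphabet can be computed.

```python
# Hypothetical nearest-symbol detector for the 5-level input alphabet,
# and a success-rate measurement of the kind quoted in Sections 6.2-6.3.
# The dissertation's actual detector is not specified in this text.
import numpy as np

SYMBOLS = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

def detect(v_hat):
    """Map each inverse-filter output sample to the nearest symbol."""
    idx = np.argmin(np.abs(v_hat[:, None] - SYMBOLS[None, :]), axis=1)
    return SYMBOLS[idx]

def success_rate(v_true, v_tilde):
    """Fraction of detected symbols matching the transmitted ones."""
    return np.mean(v_true == v_tilde)

# Example usage: v_tilde = detect(v_hat); rate = success_rate(v, v_tilde)
```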
Appendix A

Lemma A.1. If there exist $\varepsilon > 0$ and $N \in \mathbb{N}$ such that $\gamma_n - v_n^2 > \varepsilon$, $\forall n > N$, then the ellipsoids of OBE algorithms do not asymptotically shrink to a point.

Lemma A.2 (Proof: [27]). Let $c_n^-$ and $c_n^+$ be the negative and positive parts of $c_n$, respectively, where $c_n$ is defined in (2.10); that is, $c_n^- = c_n$ if $c_n < 0$ and $0$ otherwise, and $c_n^+ = c_n$ if $c_n \geq 0$ and $0$ otherwise. If PE holds, then $\lim_{n \to \infty} c_n^- = 0$.

Lemma A.3 [46, p. 120], [16, p. 35]. A discrete random variable has an almost periodic characteristic function, and the converse is also true.

Lemma A.4 [17]. If $\{X_k : k \in K\}$ is stationary and $g : \mathbb{R}^K \to \mathbb{R}$ is measurable, then $\{Y_k = g(X_k, X_{k-1}, \dots) : k \in K\}$ is also stationary.

Proof: Let $x \in \mathbb{R}^K$, $B \in \mathcal{B}^K$, and $g_k = g(x_k, x_{k-1}, \dots)$. Also, let $A = \{x : (g_0(x), g_{-1}(x), \dots) \in B\}$. Then, stationarity of $\{X_k\}$ implies
$$P(\omega : (Y_0, Y_{-1}, \dots) \in B) = P(\omega : (X_0, X_{-1}, \dots) \in A) = P(\omega : (X_k, X_{k-1}, \dots) \in A) = P(\omega : (Y_k, Y_{k-1}, \dots) \in B).$$
Hence, by Definition 3.8, $\{Y_k\}$ is stationary. $\Box$

Lemma A.5 [17]. Let $X_k \stackrel{\text{def}}{=} X \circ T^{k-1}$, $\forall k \in K$, where $X$ is a r.v. on $(\Omega, \mathcal{F}, P)$. If $T$ is an m.p.t., then $\{X_k : k \in K\}$ is a stationary sequence.

Proof: The lemma is proved by observing that, for $B \in \mathcal{B}^K$ and $A \stackrel{\text{def}}{=} \{\omega : (X_1, X_2, \dots) \in B\}$,
$$P((X_k, X_{k+1}, \dots) \in B) = P(T^{k-1}(\omega) \in A) = P(\omega \in A) = P((X_1, X_2, \dots) \in B). \qquad \Box$$

Lemma A.6 [17]. If an m.p.t. $T$ is mixing, then $T$ is ergodic.

Proof: Let $A \in \mathcal{I}$. Then $T^{-1}(A) = A, \dots, T^{-k}(A) = A$, $\forall k$. Hence, by Definition 3.13,
$$P(A) = P(A \cap A) = P(A \cap T^{-n}(A)) \to P(A) \cdot P(A) \quad \text{as } n \to \infty.$$
So $P(A) = [P(A)]^2$, and thus $P(A) \in \{0, 1\}$. Hence $\mathcal{I}$ is trivial, and therefore $T$ is ergodic. $\Box$

Lemma A.7 [17]. If $\{X_k : k \in K\}$ is an ergodic stationary sequence, and $g : \mathbb{R}^K \to \mathbb{R}$ is measurable, then $\{Y_k = g(X_k, X_{k-1}, \dots) : k \in K\}$ is also stationary and ergodic.

Proof: Let $X_0, X_{-1}, \dots$ be defined on the sequence space $\Omega = \mathbb{R}^K$ with $X_k(\omega) = \omega_k$, where $\omega = (\dots, \omega_{-1}, \omega_0, \omega_1, \dots) \in \Omega$. Since $\{Y_k\}$ is stationary by Lemma A.4, let $B \in \mathcal{B}^K$ be such that $\{\omega : (Y_0, Y_{-1}, \dots) \in B\} = \{\omega : (Y_k, Y_{k-1}, \dots) \in B\}$. Then, by Definition 3.10, $A \stackrel{\text{def}}{=} \{\omega : (Y_0, Y_{-1}, \dots) \in B\}$ is shift invariant. So, by the ergodicity of $\{X_k\}$, $P(A) = P(\{\omega : (X_0, X_{-1}, \dots) \in A\}) \in \{0, 1\}$. Therefore, $\{Y_k\}$ is ergodic by Definitions 3.11 and 3.12. $\Box$

Note that $\{X_k\}$ and $\{Y_k\}$ in the above lemma can be one-sided infinite sequences. Also, the mapping $g$, which is independent of the index $k$, can be any finite-length moving function.

Theorem A.1 (Poincaré Recurrence Theorem) [17]. Let $T : \Omega \to \Omega$ be an m.p.t. and $A \in \mathcal{F}$. Define $\tau_A = \inf\{n \geq 1 : T^n(\omega) \in A\}$. Then,
(i) $\tau_A < \infty$ a.s. on $A$, i.e., $P(\omega \in A : \tau_A(\omega) = \infty) = 0$;
(ii) $A \subset \{\omega : T^n(\omega) \in A \text{ i.o.}\}$;
(iii) if $T$ is ergodic and $P(A) > 0$, then $P(\omega : T^n(\omega) \in A \text{ i.o.}) = 1$.

Proof: (i) Let $B = \{\omega \in A : \tau_A(\omega) = \infty\}$. If $\omega \in T^{-1}(B)$, then $T(\omega) \in A$, $T^2(\omega) \notin A$, $T^3(\omega) \notin A, \dots$. If $\omega \in T^{-2}(B)$, then $T^2(\omega) \in A$, $T^3(\omega) \notin A$, $T^4(\omega) \notin A, \dots$. Hence $T^{-1}(B), T^{-2}(B), T^{-3}(B), \dots$ are pairwise disjoint. Since $T$ is an m.p.t., it follows that $P(B) = P(T^{-1}(B)) = P(T^{-2}(B)) = \cdots$. Thus $P(B) = 0$ (otherwise, $P(\Omega) \geq \sum_{k=0}^{\infty} P(T^{-k}(B)) = \infty$).

(ii) For any $k \in \mathbb{N}$, since $T^k$ is an m.p.t., it follows from (i) that
$$0 = P(\omega \in A : T^{kn}(\omega) \notin A, \ \forall n \geq 1) \geq P(\omega \in A : T^m(\omega) \notin A, \ \forall m \geq k).$$
This inequality implies that the last probability is 0 for all $k$. Hence $P(\omega \in A : T^m(\omega) \in A \text{ i.o.}) = 1$, and thus $A \subset \{\omega : T^n(\omega) \in A \text{ i.o.}\}$.

(iii) By Definition 3.10, the set $C \stackrel{\text{def}}{=} \{\omega : T^n(\omega) \in A \text{ i.o.}\}$ is an invariant set. Since $T$ is ergodic, it follows from Definition 3.12 that $P(C) \in \{0, 1\}$. Also, by (ii), $C \supset A$, so $P(C) \geq P(A) > 0$ by assumption. Therefore, $P(C) = 1$. $\Box$
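As a quick numerical illustration of Theorem A.1 (not part of the original appendix), the sketch below iterates an ergodic measure-preserving map, the irrational rotation $T(x) = (x + \alpha) \bmod 1$ on $[0, 1)$, and confirms that orbits return to a set $A$ of positive measure over and over; the map, the set, and all constants are illustrative choices.

```python
# Numerical illustration of Poincare recurrence: the irrational rotation
# T(x) = (x + alpha) mod 1 preserves Lebesgue measure on [0,1) and is
# ergodic, so orbits visit A = [0, 0.1) infinitely often (Theorem A.1(iii)).
import numpy as np

alpha = np.sqrt(2) - 1          # irrational rotation angle (illustrative)
x = 0.3                         # arbitrary starting point
visits = []
for n in range(1, 100_000):
    x = (x + alpha) % 1.0       # one application of the m.p.t. T
    if x < 0.1:                 # did the orbit land in A = [0, 0.1)?
        visits.append(n)

# The visits never stop accumulating, and their long-run frequency
# approaches the measure of A (here 0.1), consistent with ergodicity.
print(len(visits), len(visits) / 100_000)   # ~10,000 visits, ratio ~0.1
```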
Bibliography

[1] Amerio, L., and G. Prouse, Almost Periodic Functions and Functional Equations, Van Nostrand Reinhold, New York, 1971.
[2] Atal, B. S., and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. ICASSP, Paris, pp. 614-617, 1982.
[3] Banyasz, Cs., and L. Keviczky (editors), Proc. 9th IFAC Symp. on Identification and System Parameter Estimation (SYSID '91) (special sessions on bounded-error methods), vols. 1 & 2, Budapest, July 1991.
[4] Bertsekas, D. P., and I. B. Rhodes, "Recursive state estimation for a set-membership description of uncertainty," IEEE Trans. on Automatic Control, vol. AC-16, pp. 117-128, April 1971.
[5] Bitmead, R. R., "Persistence of excitation conditions and the convergence of adaptive schemes," IEEE Trans. on Information Theory, vol. IT-30, no. 2, pp. 183-191, March 1984.
[6] Bohr, H., Almost Periodic Functions, Chelsea Publishing, New York, 1951.
[7] Cheung, M. F., On Optimal Algorithms for Parameter Set Estimation (Ph.D. dissertation), Ohio State University, 1991.
[8] Dasgupta, S., and Y. F. Huang, "Asymptotically convergent modified recursive least squares with data dependent updating and forgetting factor for systems with bounded noise," IEEE Trans. Info. Theory, vol. 33, pp. 383-392, 1987.
[9] Deller Jr., J. R., T. M. Lin, and M. Nayeri, "Linear prediction using set-membership identification with blind error-bound estimation," Proc. 1997 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, (in press).
[10] Deller Jr., J. R., and M. Nayeri (session chairs), Proc. IEEE Intl. Symp. on Circuits and Systems (special session on set-membership-based identification for signal processing and control), vol. 1, Chicago, May 1993.
[11] Deller Jr., J. R., M. Nayeri, and S. F. Odeh, "Least-square identification with error bounds for real-time signal processing and control," Proc. IEEE, vol. 81, pp. 815-849, 1993.
[12] Deller Jr., J. R., J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, New York, 1993.
[13] Deller Jr., J. R., and S. F. Odeh, "Adaptive set-membership identification in O(m) time for linear-in-parameters models," IEEE Trans. on Signal Processing, vol. 41, no. 5, May 1993.
[14] Deller Jr., J. R., M. Nayeri, and M. S. Liu, "Unifying the landmark developments in optimal bounding ellipsoid identification," Int. J. of Adaptive Control and Signal Processing, vol. 8, pp. 43-60, 1994.
[15] Deller Jr., J. R., and T. C. Luk, "Linear prediction analysis of speech based on set-membership theory," Computer Speech and Language, vol. 3, pp. 301-327, Oct. 1989.
[16] Douglas, J. B., Analysis with Standard Contagious Distributions, vol. 4, International Co-operative Publishing House, Burtonsville, Maryland, 1980.
[17] Durrett, R., Probability: Theory and Examples, Wadsworth & Brooks/Cole, California, 1993.
[18] Fisher, W. M., G. R. Doddington, and K. M. Goudie-Marshall, "The DARPA speech recognition research database," Proc. DARPA Speech Recognition Workshop, pp. 93-99, 1986.
[19] Fogel, E., "System identification via membership set constraints with energy constrained noise," IEEE Trans. on Automatic Control, vol. 24, pp. 752-758, 1979.
[20] Fogel, E., and Y. F. Huang, "On the value of information in system identification - bounded noise case," Automatica, vol. 18, pp. 229-238, 1982.
[21] Goldberg, J. L., Matrix Theory with Applications, McGraw-Hill, 1991.
[22] Graupe, D., Time Series Analysis, Identification, and Adaptive Systems, Krieger, Malabar, Florida, 1989.
[23] Haykin, S. S., Adaptive Filter Theory, 3rd ed., Prentice Hall, Englewood Cliffs, New Jersey, 1996.
[24] Kosut, R. L. (editor), IEEE Trans. Automatic Control (special section on error bounding methods for robust control), vol. 37, July 1992.
[25] Lin, T. M., Nayeri, M., and J. R. Deller Jr., "Performance of blind OBE in a correlated-error environment," 30th Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2-5, 1996 (in press).
[26] Lin, T. M., Nayeri, M., and J. R. Deller Jr., "A consistently convergent OBE algorithm with automatic bound estimation," Int. J. of Adaptive Control and Signal Processing, (in review).
[27] Liu, M. S., Development and Analysis of an Optimal Bounding Ellipsoid Algorithm Using Stochastic Approximation (Ph.D. dissertation), Michigan State University, 1993.
[28] Liu, M. S., M. Nayeri, and J. R. Deller Jr., "Do interpretable optimal bounding ellipsoid algorithms converge? Part II - OBE vs. RLS: Clearing the smoke," Proc. 10th IFAC Symp. on System Identification (SYSID '94), vol. 3, pp. 395-400, Copenhagen, Denmark, July 1994.
[29] Milanese, M., H. Piet-Lahanier, J. Norton, and E. Walter (editors), Bounding Approaches to System Identification, Plenum, London, 1996.
[30] Milanese, M., and A. Vicino, "Estimation theory for dynamic systems with unknown but bounded uncertainty: an overview," Automatica, vol. 27, pp. 997-1009, 1991.
[31] Moore, J. B., "Persistence of excitation in extended least squares," IEEE Trans. on Automatic Control, vol. AC-28, no. 2, pp. 60-68, January 1983.
[32] Nayeri, M., J. R. Deller Jr., and M. M. Krunz, "Convergence and colored noise issues in bounding ellipsoid identification," Proc. IEEE ICASSP '92, San Francisco, vol. 5, pp. 337-340, March 1992.
[33] Nayeri, M., T. M. Lin, and J. R. Deller Jr., "Blind error-bound estimation for OBE identification," Proc. 34th Annual Allerton Conf. on Communications, Control and Computing, University of Illinois, Urbana-Champaign, Oct. 2-4, 1996 (in press).
[34] Nayeri, M., T. M. Lin, and J. R. Deller Jr., "Some novel variants of blind OBE algorithm," Proc. 1997 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, (in press).
[35] Nayeri, M., M. S. Liu, and J. R. Deller Jr., "Do interpretable optimal bounding ellipsoid algorithms converge? Part I - The long-awaited set-convergence proof," Proc. 10th IFAC Symp. on System Identification (SYSID '94), vol. 3, pp. 389-394, Copenhagen, Denmark, July 1994.
[36] Norton, J. P. (editor), Int. J. of Adaptive Control and Signal Processing (special issues on bounded-error estimation), Jan.-Feb., 1994-95.
[37] Odeh, S. F., Algorithms and Architectures for Adaptive Set-Membership-Based Signal Processing (Ph.D. dissertation), Michigan State University, 1990.
[38] Rao, A. K., and Y. F. Huang, "Tracking characteristics of an OBE parameter estimation algorithm," IEEE Trans. Signal Processing, vol. 41, pp. 1140-1148, 1993.
[39] Rao, A. K., Y. F. Huang, and S. Dasgupta, "ARMA parameter estimation using a novel recursive estimation algorithm with selective updating," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 38, pp. 447-457, 1990.
[40] Rabiner, L. R., and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
[41] Schweppe, F. C., "Recursive state estimation: Unknown but bounded errors and system inputs," IEEE Trans. on Automatic Control, vol. AC-13, pp. 22-28, Feb. 1968.
[42] Söderström, T., and P. Stoica, System Identification, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[43] Stark, H., and J. W. Woods, Probability, Random Processes, and Estimation Theory for Engineers, Prentice Hall, Englewood Cliffs, New Jersey, 1986.
[44] Veres, S. M., and J. P. Norton, "Structure selection for bounded-parameter models: Consistency conditions and selection criterion," IEEE Trans. on Automatic Control, vol. 36, pp. 474-481, 1991.
[45] Walter, E., and H. Piet-Lahanier, "Estimation of parameter bounds from bounded-error data: A survey," Mathematics and Computers in Simulation, vol. 32, pp. 449-468, 1990.
[46] Wiener, N., Generalized Harmonic Analysis and Tauberian Theorems, MIT Press, Cambridge, Massachusetts, 1964.
[47] Witsenhausen, H. S., "Sets of possible states of linear systems given perturbed observations," IEEE Trans. on Automatic Control, vol. AC-13, pp. 556-568, Oct. 1968.