This is to certify that the dissertation entitled OPTIMAL BOUNDING ELLIPSOID ALGORITHMS WITH AUTOMATIC BOUND ESTIMATION presented by Tsung-Ming Lin has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering.

OPTIMAL BOUNDING ELLIPSOID ALGORITHMS WITH AUTOMATIC BOUND ESTIMATION

By Tsung-Ming Lin

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1996

ABSTRACT

OPTIMAL BOUNDING ELLIPSOID ALGORITHMS WITH AUTOMATIC BOUND ESTIMATION

By Tsung-Ming Lin

This research is concerned with new Optimal Bounding Ellipsoid algorithms with Automatic Bound Estimation (OBE-ABE) that can be applied to model parameter estimation, system identification, and related topics. To achieve convergence, conventional OBE algorithms require a priori knowledge of the noise bounds, which is unavailable in most real-world applications. It has been found that exact knowledge of the noise bounds is essential to the performance of OBE algorithms. The new OBE-ABE algorithm, its computationally efficient version, the Sub-OBE-ABE algorithm, and an adaptive version, the Adaptive Sub-OBE-ABE algorithm, introduced in this dissertation do not require this knowledge. With the help of the automatic bound estimation (ABE) procedure, the new algorithms have excellent performance with respect to convergence, speed of convergence, computational efficiency, and tracking capability.

Another distinctive feature of the new algorithms is convergence in colored-noise, or even non-stationary-noise, environments, which is theoretically impossible for other well-known algorithms, e.g., Recursive Least Square (RLS), Least Mean Square (LMS), and the Kalman-Bucy filter. Owing to these distinctive features, the new algorithms are expected to have wider applications than others.

Rigorous proofs of the almost sure convergence and convergence in probability of the new algorithms are provided for the cases of iid, mixing, ergodic, and non-stationary noises. Simulations in these noise cases are presented to support the proven convergence and to demonstrate other properties. The new algorithms are successfully applied to two real-world problems: the linear prediction analysis of speech and the blind-deconvolution problem.

Copyright by Tsung-Ming Lin 1996

To SUMA CHING HAI

Acknowledgments

I am indebted to my thesis advisor, Professor Majid Nayeri, for his encouragement, patient guidance, and the many hours he spent in discussion with me. Thanks are also due to my co-advisor, Professor John R. Deller, Jr., for his generous literature support, valuable comments, and help in reviewing my dissertation. I would also like to thank the other members of my Ph.D. guidance committee, Professor Clifford Weil and Professor Hassan K. Khalil, for their help and encouragement in reviewing my dissertation.
I would like to express my gratitude to my mother and my wife for their love, sacrifice, and financial support.

Contents

List of Tables
List of Figures
1 Introduction and Background
  1.1 Introduction
  1.2 Methods for Parameter Estimation
  1.3 Review of OBE Algorithms
  1.4 Motivation for New Algorithms
2 OBE Algorithms with Automatic Bound Estimation
  2.1 Introduction
  2.2 Formulation of OBE algorithms
  2.3 OBE-ABE Algorithm
  2.4 Sub-OBE-ABE Algorithm
  2.5 Adaptive Sub-OBE-ABE
3 Convergence Analysis
  3.1 Introduction
  3.2 Definitions
  3.3 Persistency of excitation condition
  3.4 Lemmas
  3.5 Almost Sure Convergence of OBE-ABE
  3.6 Convergence in Probability of OBE-ABE
  3.7 Almost Sure and Probability Convergence of Sub-OBE-ABE
  3.8 A Necessary Condition
4 Simulation Studies
  4.1 Introduction
  4.2 IID Noise Cases
    4.2.1 OBE-ABE vs. OBE
    4.2.2 Sub-OBE-ABE vs. OBE-ABE
  4.3 Colored Noise Cases
    4.3.1 OBE-ABE vs. OBE
    4.3.2 Sub-OBE-ABE vs. OBE-ABE
  4.4 Tracking Performance of Adaptive Sub-OBE-ABE
5 Application to Linear Prediction Analysis of Speech
  5.1 Introduction
  5.2 LP model of speech
  5.3 LP analysis using OBE-ABE
  5.4 Sub-OBE-ABE and computational complexity
6 Application to Blind Deconvolution
  6.1 Introduction
  6.2 IID input and IIR filter
  6.3 IID input and FIR filter
  6.4 Colored Input and IIR Filter
7 Conclusions
  7.1 Concluding Remarks
  7.2 Contributions and further work
Appendix A
Bibliography

List of Tables

2.1 The OBE-ABE algorithm.
2.2 The Sub-OBE-ABE algorithm.
2.3 The Adaptive Sub-OBE-ABE algorithm.

List of Figures

1.1 Block diagram of a model for a dynamic system.
1.2 OBE algorithm with γ_n = 0.8 (underestimated) on model (1.3). v_n ~ U(−1, 1).
1.3 OBE algorithm with γ_n = 1.2 (overestimated) on model (1.3). v_n ~ U(−1, 1).
1.4 Volumes of ellipsoids (after 100 runs) of SM-SA with γ_n = 1, 1.1, 1.5, respectively, and OBE-ABE (γ_0 = 1.5) on model (1.3). v_n ~ U(−1, 1).
4.1 Estimators of a_{1*} = 2. CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.2 CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.3 CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.4 Estimators of a_{1*} = 2. CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.5 CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.6 CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.7 Estimators of a_{1*} = 2. CASE 3: v_n ~ B(−0.5, 1), (4.3), for model (4.1).
4.8 CASE 3: v_n ~ B(−0.5, 1), (4.3), for model (4.1).
4.9 CASE 3: v_n ~ B(−0.5, 1), (4.3), for model (4.1).
4.10 CASE 4: The final ellipsoid and the trajectory of the estimator. v_n ~ B(−1, 1), (4.2), for model (4.4).
4.11 Estimators of a_{1*} = 2. CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.12 CASE 1: v_n ~ B(−1, 1) is non-zero mean for model (4.1).
4.13 Estimators of a_{1*} = 2. CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.14 CASE 2: v_n ~ U(−1, 1) for model (4.1).
4.15 Estimators of a_{1*} = 2. CASE 5: v_n is colored in (4.5) for model (4.1).
4.16 CASE 5: v_n is colored in (4.5) for model (4.1).
4.17 CASE 5: v_n is colored in (4.5) for model (4.1).
4.18 Estimators of a_{1*} = −0.1. CASE 6: v_n is colored in (4.5) for model (4.6).
4.19 CASE 6: v_n is colored in (4.5) for model (4.6).
4.20 CASE 6: v_n is colored in (4.5) for model (4.6).
4.21 Estimators of a_{1*} = 2. CASE 7: Colored noise v_n ~ B(−0.5, 1), (4.7), for model (4.1).
4.22 CASE 7: Colored noise v_n ~ B(−0.5, 1), (4.7), for model (4.1).
4.23 CASE 7: Colored noise v_n ~ B(−0.5, 1), (4.7), for model (4.1).
4.24 CASE 8: The final ellipsoid and the trajectory of the estimator. v_n ~ B(−1, 1), (4.5), for model (4.4).
4.25 Estimators of a_{1*} = 2. CASE 5: v_n is colored in (4.5) for model (4.1).
4.26 CASE 5: v_n is colored in (4.5) for model (4.1).
4.27 Estimators of a_{1*} = −0.1. CASE 6: v_n is colored in (4.5) for model (4.6).
4.28 CASE 6: v_n is colored in (4.5) for model (4.6).
4.29 Non-adaptive OBE algorithm on the time-varying system (4.8). γ_n = 1.5.
4.30 Non-adaptive OBE (SM-SA) algorithm on the time-varying system (4.8). γ_n = 1.5.
4.31 Adaptive Sub-OBE-ABE algorithm on the time-varying system (4.8). p = 4.5%.
4.32 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 3%.
4.33 Adaptive Sub-OBE-ABE algorithm on the time-varying system (4.8). p = 5.5%.
4.34 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 3.5%.
4.35 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 7.5%.
4.36 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 5.5%.
4.37 Adaptive Sub-OBE-ABE on the time-varying system (4.8). p = 10%.
4.38 Adaptive Sub-OBE-ABE on the time-varying system (4.8) (v_n ~ B(−1, 1)). p = 31%.
4.39 Adaptive Sub-OBE-ABE on the time-varying system (4.8) (v_n ~ B(−1, 1)). p = 33%.
5.40 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.41 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.42 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.43 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.44 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.45 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line).
5.46 Spectra of voiced /I/ phoneme. Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.47 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.48 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.49 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.50 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.51 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line).
5.52 Spectra of simulated voiced /u/ phoneme. Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation method (solid line).
5.53 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.54 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.55 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.56 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.57 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
5.58 Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on Sub-OBE-ABE (dashed line), and autocorrelation (solid line).
6.1 Model of the received signal in the blind-deconvolution problem.
6.2 Block diagram of the blind-deconvolution problem.
6.3 Estimates of a_{1*} of OBE (γ_n = 1) and OBE-ABE (γ_0 = 10) for Fig. 6.2 (iid-IIR, S/N = 42 dB).
6.4 Volumes of ellipsoids of OBE (γ_n = 1) and OBE-ABE (γ_0 = 10) for Fig. 6.2 (iid-IIR, S/N = 42 dB).
6.5 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (iid-IIR, S/N = 42.5 dB).
6.6 Estimates of a_{1*} of OBE (γ_n = 1) and OBE-ABE (γ_0 = 10) for Fig. 6.2 (iid-IIR, S/N = 36.5 dB).
6.7 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (iid-IIR, S/N = 36.5 dB).
6.8 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (iid-FIR, S/N = 42.5 dB).
6.9 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (iid-FIR, S/N = 36.5 dB).
6.10 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (colored-IIR, S/N = 36.5 dB).
6.11 First 100 samples of ŝ_n (top), s_n (middle), and v_n (bottom) using OBE-ABE (γ_0 = 10) in Fig. 6.2 (colored-IIR, S/N = 22.5 dB).
Chapter 1

Introduction and Background

1.1 Introduction

From engineering to social science, people use modeling to help understand and describe observed phenomena. For example, in economics, modeling is used to predict inflation and to study the trend of stock markets. In engineering fields, a dynamic system or signal is usually modeled before further analysis or processing. In general, a model can be described as in Fig. 1.1, in which the noise (or disturbance, or model-error) sequence {v_n} is unobservable.

Modeling is usually classified into two categories: nonparametric and parametric [42]. A nonparametric model is described by a curve, table, or function. Transient analysis and correlation analysis are two well-known methods for studying nonparametric models. A parametric model is usually described by difference equations (or differential equations in continuous time) with system parameters. Typical parametric models are the autoregressive model with exogenous input (ARX), the autoregressive moving-average model with exogenous input (ARMAX), and the state-space structure. Basic laws from physics or science are usually employed to construct a parametric model. However, in many applications, the dynamic systems are too complex to model using physical laws alone.

Figure 1.1: Block diagram of a model for a dynamic system (system block H(z)).

Hence, numerous system-identification or parameter-estimation techniques have been developed to estimate the unknown parameters in a model. Some well-known methods are briefly reviewed in the next section. The main objective of this research is to devise a new algorithm, the optimal bounding ellipsoid with automatic bound estimation (OBE-ABE) algorithm, and its variants for the estimation of model parameters. The new algorithms require less restrictive a priori knowledge and conditions than conventional methods for the convergence (or consistency) of the estimator. In addition, the performance of the new algorithms with respect to speed of convergence, computational efficiency, and tracking capability is superior to that of conventional methods.

1.2 Methods for Parameter Estimation

There is a variety of methods for estimating model parameters, among them minimum mean square error (MMSE), maximum likelihood (ML), and least square error (LSE) [22, 42]. They are all batch methods. Well-known recursive (or real-time) methods are the recursive least square (RLS) [22], least mean square (LMS) [22], instrumental variable (IV) [42], Kalman-Bucy filter [42], and optimal bounding ellipsoid (OBE) [8, 11, 14, 20, 35] algorithms. One of the most popular methods in engineering applications is the LSE, or its recursive counterpart, RLS, since they require only the whiteness of the noise sequence {v_n} for convergence and have a simple structure with well-understood convergence performance under regular conditions [i.e., wide-sense stationarity (WSS) and second-moment ergodicity (SME)]. RLS has been modified to a weighted RLS (WRLS) [22, 42] using a forgetting factor for tracking time-varying parameters.

Since its invention fifteen years ago, OBE has attracted attention from people in real-time signal processing [11, 15], system identification [3, 10], adaptive control [24], and other fields in which computational efficiency and speed of convergence are critical and the statistics of the inputs and noise are unknown. However, its advantages (over those of the popular RLS) with regard to computational efficiency and faster convergence in some noise cases (see Chap. 4) are offset by the requirement of known noise bounds, which are unavailable in most applications.
Without a priori knowledge of the exact noise bounds, the performance of OBE suffers (Chap. 4). This is why OBE has rarely been found in real-world applications.

1.3 Review of OBE Algorithms

In 1968, Schweppe introduced the first bounding ellipsoid (BE) algorithm [41] for estimating the states of a state-space model under the assumption of bounded inputs and bounded noises. With this assumption, the BE algorithm produces at each step a feasible set in state space instead of the single point normally produced by other well-known methods, e.g., the Kalman-Bucy filter. The BE algorithm has a set of recursive formulae similar in structure to the Kalman-Bucy filter. Without the a priori knowledge of the statistics of the inputs and the noise that is usually required by other methods, BE has potential for wider applications. However, as pointed out by Schweppe, the feasible set of the algorithm at each step is usually impractical to calculate. He did not provide a convergence analysis of the BE algorithm either. Thereafter, Witsenhausen [47] and Bertsekas and Rhodes [4] published set-membership (SM) algorithms based on the same bounded-input and bounded-noise assumptions. Although SM has a smaller feasible set (a convex polytope; see Chap. 2) at each step, the mathematical analysis of SM is more complicated than that of BE.

In 1979, Fogel published a BE algorithm [19] for the identification of the ARX model with the implicit assumption that the noise sequence {v_n} has an asymptotically known accumulative energy bound. With this knowledge and the assumptions of white noise {v_n}, the regular WSS, and SME, Fogel showed, in a deterministic way, the convergence of his BE algorithm. In other words, the sequence of ellipsoids of the BE algorithm is shown to shrink asymptotically to a point (the true model parameter vector). Although the recursive formula of BE is similar in structure to RLS, the computation is less efficient than that of RLS.

In 1982, Fogel and Huang published the first OBE algorithm (F/H OBE) [20], which featured selective updating to ignore "redundant" data (in the OBE sense). This innovative feature improves the computational efficiency of Fogel's BE algorithm. However, the F/H OBE algorithm still has O(m^2) computational complexity due to the check for selective updating at each step (Chap. 2). Selective updating is achieved by variably weighting the incoming data for an optimal (maximum reduction in size) ellipsoid at each step. The optimization is based on the assumption that the noise bounds are known pointwise. If an incoming data set does not help shrink the ellipsoid, its weighting is set to zero; i.e., the incoming data set is "redundant" and hence discarded. In contrast to OBE, Fogel's BE algorithm has constant data weights (= 1). A detailed formulation of OBE algorithms can be found in the next chapter.

Fogel and Huang also provided a proof of convergence of OBE based on the assumption that the noise sequence is white and its pointwise upper bounds are known, in addition to the regular WSS and SME. However, the correctness of the proof is arguable (also see [13, p. 1913]) because the authors use in the proof the questionable assertion that convergence of Fogel's BE implies convergence of OBE. In fact, the whiteness of the noise is neither sufficient nor necessary for the convergence of the OBE algorithm. Theorems 3.4 and 3.6 in Chap. 3 validate this assertion.
In 1987, Dasgupta and Huang published another version of OBE (D/H OBE) [8] and showed the "convergence" of its estimator to a non-infinitesimal region about the true parameter vector under the same assumptions. An interesting and practically important feature of the algorithm is that it has an efficient O(m) check for innovation instead of the O(m^2) check in F/H OBE. However, comments are found in the literature [14, 37] on the lack of interpretability of its optimization criterion [14]; i.e., it minimizes, at each step, a quantity which is not directly related to the "size" of the ellipsoids.

Thereafter, other versions of the OBE algorithm, e.g., set-membership weighted recursive least squares (SM-WRLS) [11, 15, 37] and set-membership stochastic approximation (SM-SA) [11, 27, 35], were developed by Nayeri, Deller, and their students. SM-WRLS is the first algorithm to clearly relate the OBE philosophy to conventional WRLS (see [11]). They also showed that all versions of the OBE algorithm based on the same optimization criterion of minimizing the ellipsoid at each step are basically the same algorithm [14]. Specifically, F/H OBE, SM-WRLS, and SM-SA, though different in their weighting policies and computational complexities, have identical estimators, ellipsoids, selected data, and convergence performance. A suboptimal check for innovation was introduced to SM-WRLS that results in a computationally efficient version of the "interpretable" OBE algorithm [13, 37]. Further, using SM-SA, the first proof of convergence in probability (p. convergence) [17] for OBE algorithms was accomplished under the assumptions of independent identically distributed (iid) noise and symmetric bounds.

The last OBE-like algorithm seeking to improve the performance of OBE in this review is the optimal volume ellipsoid (OVE) algorithm proposed by Cheung et al. [7] in 1991. The algorithm uses an affine transformation [21] to get the smallest ellipsoid at each step. The simulations in [7] show that the ellipsoid at each step is smaller than those of OBE. However, the estimator of OVE, at the expense of computational cost, does not show significant improvement over that of OBE (also see [27, p. 6]). A proof of "convergence" of OVE similar to that of D/H OBE was provided in [7].

In 1991, Veres and Norton published a paper [44] presenting the first p. convergence result (under very general conditions) for the exact polytope algorithm (EPA), which is an SM algorithm with a polytope feasible set at each step. Although they also provided a proof of almost sure (a.s.) convergence (or strong convergence in their notation, with an equivalent definition) for the EPA, the justification is arguable. What they have proved for a.s. convergence [on a probability space (Ω, F, P)] is:

P(⋃_{n=1}^{∞} {ω : ‖θ_n − θ_*‖ < ε}) = P( lim_{m→∞} ⋃_{n=1}^{m} {ω : ‖θ_n − θ_*‖ < ε}) = 1, ∀ ε > 0    (1.1)

or, equivalently, after suppressing ω for brevity,

P(⋂_{n=1}^{∞} (‖θ_n − θ_*‖ > ε)) = lim_{m→∞} P(⋂_{n=1}^{m} (‖θ_n − θ_*‖ > ε)) = 0, ∀ ε > 0    (1.2)

which is not sufficient to imply a.s. convergence. This becomes evident upon comparing (1.1) with (3.3), or (1.2) with (3.6).
Lack of proof of convergence for OBE in more realistic noise cases (than symmetric iid noise) is another (minor) rea- son. Since the noise sequence {v,,} is usually unobservable in practice, exact noise bounds are not available in most applications. Without the knowledge of exact noise bounds, the performance of convergence and the speed of convergence of OBE suffers. If one or more bounds are underestimated, i.e., v: < 7,, at one or more n, then the algorithm is no longer theoretically valid. Simulations reveal that the underestimation of the noise bounds results in an inconsistent [17, 42] or diverging estimator. A conservative sequence of bounds {7,,} (overestimated bounds) will assure a meaningful ellipsoid at each n. However, recent work of Nayeri et al. [27, 35] has demonstrated that the estimator may be very imprecise, even asymptotically, if the bounds are too “loose.” To illustrate the effects of incorrectly estimated error bounds, an AR(3) model for Fig. 1.1 is constructed as follows: yn = 03X" + ”n (1.3) in which x,, = [y,,_1 , y,,_2, y,,_3]T is a sequence of observable vectors, 0,. = [2, —1.48, 0.34]T is the unknown parameter vector to be identified, and the unobservable noise sequence {v,,} is uniformly distributed and iid with constant bound 7,. = 1. Figure 1.2 shows 2.1 1.95 1.9 0.5 4511111;1 . 1 10002000300040005000600070000000900010000 Figure 1.2: OBE algorithm with 7,, = 0.8 (underestimated) on model (1.3). v,, ~ U ( — 1 , 1). the simulation result of OBE (SM-SA) algorithm (see Chap. 2 for details of the formu- lation) identifying this AR(3) model with under-estimated error bound 7,, = 0.8 for all n. As seen in the figure, the volume of the ellipsoid becomes negative quickly and the estimator diverges. This is the result of violating the underlying noise-bounding assumption of the algorithm. On the other hand, the simulation result of the algo- rithm using over-estimated error bound 7,, = 1.2 Vn is shown in Fig. 1.3. As seen in the figure, the estimator does not converge well to the true parameter (a, = 2 in the figures) even after 10,000 iterations. The speed of convergence of OBE becomes slower if a looser overbounding se- quence is employed. This is observed in Fig. 1.4 which shows the simulation results of 100 runs of the OBE algorithm on model (1.3). The numbers on the figure beside each dashed line are the estimated noise bounds. In this dissertation, a new algorithm, OBE-ABE, and its variants are devised to improve the applicability of OBE algorithms while preserving or improving the supe- rior performance of OBE algorithms in real-world applications. Proofs of a.s. conver- 8 1.“ .1 1. L l l l 1 I l l l 9 1WD 2000 3000 4W 5W 6N0 7WD 0000 9000 10000 VoiumesoiEiihooids 10" r I 1 j I I l I I i : : 10'0E..._:.. i .. .1 f . 5 i 10° k... .1 owoozoooaooowoosoooeooomoosooosooowooo Figure 1.3: OBE algorithm with 7,, = 1.2 (overestimated) on model (1.3). v,, ~ U(—1,1). 102 r - 10° . 260 430 e130 aéoioboaziaoiiooiiooiioozooo Figure 1.4: Volumes of ellipsoids (after 100 runs) of SM-SA with 7,, = 1,1.1,1.5 respectively, and OBE-ABE (70 = 1.5) on model (1.3). v,, ~ U (—l, 1). gence and p. convergence of the new algorithms in the cases of iid, mixing, ergodic, and non-stationary noise with asymmetric bounds are provided. The new algorithms have efficient computation, excellent convergence performance and tracking capabil- ity. The details of the new algorithms are found in the subsequent chapters. A summary of the major contributions of this research is found in Chap. 7. 
Chapter 2

OBE Algorithms with Automatic Bound Estimation

2.1 Introduction

In this chapter, the new algorithms, OBE-ABE and Sub-OBE-ABE, and the adaptive versions of these two algorithms are introduced. The algorithmic steps are listed in Tables 2.1-2.3. More details of these algorithms are found in subsequent chapters.

2.2 Formulation of OBE algorithms

A dynamic system or signal can often be modeled by a linear-in-parameters model

y_n = θ_*^T x_n + v_n    (2.1)

in which θ_* = [a_{1*}, ..., a_{p*}, b_{0*}, b_{1*}, ..., b_{q*}]^T is the unknown parameter vector to be identified; {x_n} is a sequence of observable vectors of dimension m = p + q + 1; and {v_n} is an unobservable noise or model-error sequence. An important special case is the autoregressive with exogenous input (ARX) [42] model, in which x_n = [y_{n−1}, ..., y_{n−p}, u_n, ..., u_{n−q}]^T is the observed data set composed of samples of the observable input sequence {u_n} and output sequence {y_n}.

All versions of OBE algorithms [8, 11, 14, 20, 35] are based on the premise that v_n has a pointwise bound that is known a priori,

v_n^2 ≤ γ_n, ∀ n ∈ N.    (2.2)

Let A_i denote the parameter set at time i such that all elements in A_i are feasible parameter estimates consistent with (2.2). In conjunction with (2.1), it is clear that A_i is a (hyper-)strip region which can be expressed as

A_i = {θ : θ ∈ R^m, (y_i − θ^T x_i)^2 ≤ γ_i}.

As time i goes from 1 to n, the feasible set at time n, B_n, is the intersection of the A_i, ∀ i ∈ [1, n], i.e., B_n = ⋂_{i=1}^{n} A_i. In general, B_n is a convex (hyper-)polytope in R^m which is mathematically difficult to track. Some SM algorithms are based on minimizing B_n at each step, while OBE algorithms are based on minimizing a (hyper-)ellipsoid E_n [see (2.3)] which is a superset of B_n. It can be shown [14] that the ellipsoid associated with any OBE algorithm is given by

E_n ≝ {θ : (θ − θ_n)^T P_n^{−1} (θ − θ_n) ≤ κ_n}    (2.3)

where θ_n, κ_n, and the matrix P_n are computed recursively using

G_n = x_n^T P_{n−1} x_n    (2.4)

ε_n = y_n − θ_{n−1}^T x_n    (2.5)

P_n = (1/α_n) [ P_{n−1} − β_n P_{n−1} x_n x_n^T P_{n−1} / (α_n + β_n G_n) ]    (2.6)

θ_n = θ_{n−1} + β_n P_n x_n ε_n    (2.7)

κ_n = α_n κ_{n−1} + β_n γ_n − α_n β_n ε_n^2 / (α_n + β_n G_n).    (2.8)

Recursions (2.4)-(2.8) comprise a general OBE algorithm. The ellipsoid center θ_n can be used as an estimator of the parameters θ_* at each n. The matrix P_n^{−1} satisfies the following recursion:

P_n^{−1} = α_n P_{n−1}^{−1} + β_n x_n x_n^T.    (2.9)

The OBE algorithm is usually initialized with θ_0 = 0, κ_0 = ρ, and P_0 = (1/ρ) I, where ρ is a small number, typically 10^{−3}. In almost every OBE algorithm, the nonnegative data weights {α_n} and {β_n} are calculated according to an optimization criterion which minimizes the "size" of the set E_n at each iteration.¹ When such optimal weights do not exist, the updating need not take place. This prominent feature is called selective updating. OBE algorithms are distinguished from one another by the choices of weighting sequences {α_n} and {β_n} [14]. For example, the SM-SA algorithm has α_n = 1 − λ_n and β_n = λ_n, in which λ_n is given by

λ_n = (−b_n + √(b_n^2 − 4 a_n c_n)) / (2 a_n), if c_n < 0;  λ_n = 0, if c_n ≥ 0

where

a_n = m γ_n − m ε_n^2 + m G_n^2 γ_n − 2 m G_n γ_n − κ_{n−1} G_n + κ_{n−1} G_n^2 + G_n γ_n − G_n^2 γ_n − ε_n^2 G_n

b_n = 2 m ε_n^2 − 2 m γ_n + 2 m G_n γ_n + 2 κ_{n−1} G_n − κ_{n−1} G_n^2 − G_n γ_n + ε_n^2 G_n

c_n = m γ_n − m ε_n^2 − κ_{n−1} G_n.    (2.10)

¹The exception is the D/H OBE algorithm [8], in which κ_n is minimized at each iteration.
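The recursions above translate directly into code. The sketch below is a minimal illustration of one generic OBE update, ours rather than the author's implementation; the weight pair (α_n, β_n) must be supplied by a specific OBE variant such as SM-SA, and all identifiers are hypothetical.

```python
import numpy as np

def obe_step(theta, P, kappa, x, y, gamma, alpha, beta):
    """One generic OBE recursion, following (2.4)-(2.8)."""
    G = x @ P @ x                                   # (2.4)
    eps = y - theta @ x                             # (2.5) prediction error
    Px = P @ x
    P_new = (P - beta * np.outer(Px, Px) / (alpha + beta * G)) / alpha  # (2.6)
    theta_new = theta + beta * (P_new @ x) * eps    # (2.7)
    kappa_new = (alpha * kappa + beta * gamma
                 - alpha * beta * eps**2 / (alpha + beta * G))          # (2.8)
    return theta_new, P_new, kappa_new

# Typical initialization, as in the text: theta_0 = 0, kappa_0 = rho,
# P_0 = (1/rho) I with rho small, e.g., 1e-3.
m, rho = 3, 1e-3
theta0, P0, kappa0 = np.zeros(m), np.eye(m) / rho, rho
```

Note that, exactly as in (2.7), the center update uses the already-updated matrix P_n, not P_{n−1}; swapping the order is a common implementation bug.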
2.3 OBE-ABE Algorithm

While OBE algorithms require a priori knowledge of exact noise bounds to achieve convergence and a fast speed of convergence (see Fig. 1.4 and Theorem 3.6 in Chap. 3), the new algorithm, OBE-ABE, does not require this knowledge. OBE-ABE theoretically requires a lower bound on a tail probability but practically does not (see Remark 2 after Theorem 3.3). Initialized with any overestimated noise bound, OBE-ABE automatically (or blindly) updates its estimated noise bound γ_n toward the unknown true bound γ_*, while concurrently shrinking the ellipsoids to obtain a consistent estimate. The OBE-ABE algorithm is given in Table 2.1. As seen in the table, the ABE procedure (below "Otherwise") features selective updating and is computationally negligible, hence preserves the computational efficiency of OBE. Further, the ABE procedure guarantees the convergence of the estimator, increases the speed of convergence, and increases the efficiency of updating the estimator (see Chap. 4). The convergence analysis of the OBE-ABE algorithm is found in Chap. 3.

I. Initialization:
  1. θ_0 = 0, κ_0 = ρ, and P_0 = (1/ρ) I, where ρ is a small number, typically 10^{−3}.
  2. γ_0 = any overestimated bound.
  3. Choose ε (small positive number), M, and N (see Remark 2 following Theorem 3.3).
II. Recursion: For n = 1 : N
  If c_n < 0, execute the recursions
    G_n = x_n^T P_{n−1} x_n
    ε_n = y_n − θ_{n−1}^T x_n
    P_n = (1/α_n)[P_{n−1} − β_n P_{n−1} x_n x_n^T P_{n−1}/(α_n + β_n G_n)]
    θ_n = θ_{n−1} + β_n P_n x_n ε_n
    κ_n = α_n κ_{n−1} + β_n γ_n − α_n β_n ε_n^2/(α_n + β_n G_n).
  Otherwise, if a time interval I of length M over which c_n ≥ 0 is found,
    γ_n = γ_{n−1} − d_J, if d_J > 0;  γ_n = γ_{n−1}, otherwise,
  where d_J ≝ κ_{J−1} G_J / m − ε(2√γ_{n−1} − ε) and J = arg max_{n∈I} ε_n^2.

Table 2.1: The OBE-ABE algorithm.

2.4 Sub-OBE-ABE Algorithm

In optimal form, all versions of the OBE algorithm except D/H OBE have an O(m^2) check for innovation at each step. Hence, the computational complexity of OBE algorithms is similar to that of the popular RLS. To benefit fully from the selective-updating feature and achieve computational superiority over RLS, a modification of the check for innovation to an O(m) operation is necessary. One approach was developed by Deller et al. [11, 13]. The modified OBE algorithm (the Sub-OBE algorithm) in [11, 13] is of O(pm^2) complexity, with p indicating the fraction of data found to be innovative. Hence, for small m (< 10), the Sub-OBE algorithm is an O(m) algorithm, since p is typically near 0.1 for a uniformly distributed {v_n}.

However, the same O(m) checking formula of the Sub-OBE algorithm cannot be applied directly to the OBE-ABE algorithm, since the ABE procedure would fail [(3.61) would no longer hold]. Hence, the Sub-OBE-ABE algorithm is endowed with another O(m) checking formula for innovation:

c̄_n = m(γ_n − ε_n^2) − κ_{n−1} x_n^T x_n / d_{n−1}

where d_n has the recursion²

d_n = α_n d_{n−1} + β_n x_n^T x_n.

Justification of this check is found in Chap. 3. The steps of the Sub-OBE-ABE algorithm are shown in Table 2.2. The Sub-OBE-ABE algorithm is compared to the OBE-ABE algorithm with regard to speed of convergence and computational complexity in Chap. 4.

²A preliminary version of the Sub-OBE-ABE algorithm takes 1/d_n to be the minimum eigenvalue of P_n, which preserves selective updating; Nayeri modified it to this O(m) recursion.
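A sketch of this check is given below. It is ours, with hypothetical names, and the formula for c̄_n is as reconstructed above; it shows why the check costs only O(m): both x_n^T x_n and the scalar d_n avoid any product with the m×m matrix P_{n−1}.

```python
import numpy as np

def sub_obe_check(theta, kappa_prev, d_prev, x, y, gamma, m):
    """O(m) innovation check of Sub-OBE-ABE: returns (needs_update, cbar)."""
    eps = y - theta @ x
    cbar = m * (gamma - eps**2) - kappa_prev * (x @ x) / d_prev
    return cbar < 0.0, cbar

def d_step(d_prev, x, alpha, beta):
    # d_n = alpha_n d_{n-1} + beta_n x_n^T x_n; a scalar shadow of the
    # O(m^2) recursion (2.9) for P_n^{-1}.
    return alpha * d_prev + beta * (x @ x)
```

Since 1/d_n lower-bounds the minimum eigenvalue of P_n (footnote 2), x_n^T x_n / d_{n−1} never exceeds G_n, so c̄_n ≥ c_n and an update triggered by c̄_n < 0 is always a legitimate OBE update.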
I. Initialization:
  1. θ_0 = 0, κ_0 = ρ, d_0 = 1/ρ, and P_0 = (1/ρ) I, where ρ is a small number, typically 10^{−3}.
  2. γ_0 = any overestimated bound.
  3. Choose ε (small positive number), M, and N (see Remark 2 following Theorem 3.3).
II. Recursion: For n = 1 : N
  If c̄_n < 0, execute the recursions
    G_n = x_n^T P_{n−1} x_n
    ε_n = y_n − θ_{n−1}^T x_n
    P_n = (1/α_n)[P_{n−1} − β_n P_{n−1} x_n x_n^T P_{n−1}/(α_n + β_n G_n)]
    θ_n = θ_{n−1} + β_n P_n x_n ε_n
    κ_n = α_n κ_{n−1} + β_n γ_n − α_n β_n ε_n^2/(α_n + β_n G_n)
    d_n = α_n d_{n−1} + β_n x_n^T x_n.
  Otherwise, if a time interval I of length M over which c̄_n ≥ 0 is found,
    γ_n = γ_{n−1} − d_J, if d_J > 0;  γ_n = γ_{n−1}, otherwise,
  where d_J ≝ κ_{J−1} x_J^T x_J / (d_{J−1} m) − ε(2√γ_{n−1} − ε) and J = arg max_{n∈I} ε_n^2.

Table 2.2: The Sub-OBE-ABE algorithm.

2.5 Adaptive Sub-OBE-ABE

The tracking performance of adaptive OBE algorithms for time-varying parameters has been investigated in [11, 37, 38]. In those papers, adaptive OBE algorithms have been shown to have tracking capability superior to RLS, LMS, and their variants.

In [38], Rao and Huang modified the D/H OBE algorithm by resetting κ_{n−1}, whenever κ_n < 0, to a value C + K_1 or C + K_2, where K_1 and K_2 are two positive values obtained from the algorithm's parameters and data at time n, and C is a "safety factor" chosen to be 1. Equivalently, their adaptive OBE algorithm is obtained by modifying the D/H OBE algorithm to include the resetting of κ_{n−1} to a value greater than 1 whenever κ_n < 0. Resetting κ_{n−1} is equivalent to expanding the ellipsoid.

In [11, 37], Deller et al. proposed three methods, windowing, graceful forgetting, and selective forgetting, for modifying OBE algorithms to be adaptive to time-varying parameters. The basic spirit of those modifications is various manipulations of the forgetting factors (data weightings), similar to those of adaptive RLS (or WRLS). Among those methods, it is observed in [11, 37] that the best policy regarding tracking performance is the selective forgetting method, which back-rotates previously accepted data sets when κ_n < 0 until κ_n > 0. This is equivalent to expanding the ellipsoid according to a systematic criterion.

In this dissertation, an adaptive Sub-OBE-ABE algorithm is presented by modifying Sub-OBE-ABE to include the resetting of κ_n to 0.1 whenever κ_n < 0. Despite its simplicity and computational efficiency, the adaptive Sub-OBE-ABE algorithm has shown excellent tracking performance in both slow and fast time-varying parameter cases (see Chap. 4). The steps of the adaptive Sub-OBE-ABE algorithm are shown in Table 2.3.

I. Initialization:
  1. θ_0 = 0, κ_0 = ρ, d_0 = 1/ρ, and P_0 = (1/ρ) I, where ρ is a small number, typically 10^{−3}.
  2. γ_0 = any overestimated bound.
  3. Choose ε (small positive number), M, and N (see Remark 2 following Theorem 3.3).
II. Recursion: For n = 1 : N
  If c̄_n < 0, execute the same recursions as in Table 2.2, and then: if κ_n < 0, set κ_n = 0.1.
  Otherwise, if a time interval I of length M over which c̄_n ≥ 0 is found,
    γ_n = γ_{n−1} − d_J, if d_J > 0;  γ_n = γ_{n−1}, otherwise,
  where d_J ≝ κ_{J−1} x_J^T x_J / (d_{J−1} m) − ε(2√γ_{n−1} − ε) and J = arg max_{n∈I} ε_n^2.

Table 2.3: The Adaptive Sub-OBE-ABE algorithm.
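To close the chapter, the two ingredients added by Tables 2.1-2.3 can be summarized in code. The sketch below is ours (names hypothetical; eps_margin plays the role of the design constant ε): the ABE step that shrinks the bound estimate, and the adaptive reset of Table 2.3.

```python
import numpy as np

def abe_update(gamma_prev, kappa_Jm1, G_J, m, eps_margin):
    """ABE step: shrink the estimated noise bound after an interval I of
    length M with no estimator update; kappa_Jm1 is kappa_{J-1} and
    J = argmax_{n in I} eps_n^2."""
    d_J = kappa_Jm1 * G_J / m - eps_margin * (2.0 * np.sqrt(gamma_prev)
                                              - eps_margin)
    return gamma_prev - d_J if d_J > 0.0 else gamma_prev

def adaptive_kappa(kappa):
    # Table 2.3: if kappa_n < 0 (the ellipsoid has become empty, e.g.,
    # after a parameter jump), re-inflate it to 0.1 to resume tracking.
    return 0.1 if kappa < 0.0 else kappa
```

The design choice is deliberately asymmetric: γ_n only ever decreases (from an initial overestimate toward γ_*), while κ_n is only ever re-inflated, so bound estimation and tracking never work against each other.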
The results of this chapter are generalizable to cases in which 3],, and v,, are vectors (e.g., [11, 37]). For stochastic analysis, v,,, u,,, and y,,, etc., are modeled as random variables (r.v.) defined on a probability space (0,.7, P) where Q is a sample space, .7: a o-field, and 20 P a probability measure. Clearly, for the ARX model, if we let In déf a{vma um+1,m S n}, then x,, is fad-measurable. Note that the usual assumptions of WSS, SME, and whiteness of {v,,} which are essential for the convergence of RLS, LMS, IV, etc., are not needed for OBE-ABE and its variants. 3.2 Definitions For the concise statement and analysis of new theorems, the following definitions are introduced. Most definitions are cited from the literature. The first two below define the convergence of a random sequence to a random variable. These two types of convergence are treated in this dissertation. Definition 3.1 [17, 42]. An estimator 0,, is called p. consistent, or p. convergent (convergent in probability) if for all e > 0, £1310 P(||0,, — 0..” > e) = 0 (3.2) where I] - || denotes any valid norm. Note that p. convergence implies (is stronger than) weak convergence or, alternatively, convergence in distribution. Definition 3.2 [17, 42]. An estimator 0,, is called a.s. consistent, or a.s. convergent ( convergent almost surely) if P( lim 0,, = 0...) = 1. (3.3) ”#00 21 Note that (3.3) is equivalent to P(||6,, — 0...” > e i.o.) = 0, V e > 0 (3.4) in which 1.0. denotes infinitely often with respect to 11.. Comparing (3.2) with (3.4), it follows that a.s. convergence implies p. convergence. Equation (3.4) can be written as P(US.°=1 021m (Il9n - 9:” > 5)) = 0, V 6 > 0 (3-5) or equivalently, 19931330 nggmuw, — o.“ > 6)) = o v e > o. (3.6) Since the sets ng°=m(||9,, — 0...” > e) is increasing with m, (3.6) can be written as A1330 P(flf,°=m(||0,, — 0..” > 6)) = 0 V e > 0. (3.7) Although surely convergence [17] is theoretically stronger than a.s. convergence, the latter is regarded as the strongest type of convergence employed in practice (called strong convergence in some literature). Definition 3.3 A random sequence {v,,} is called asymptotically independent if 3112) |P((v,- E A)fl (v,+,, 6 B)) — P(v,~ E A) - P(v,-+,, E 3)] = 0, Vi andV A,B E 1:. For convenience, let us denote e-neighborhoods of noise bounds as D?=[fi-€,x/7:l 22 and D? = l‘fia-fi'l’fl- The following definition is seen in [44] which is a sufficient condition employed in their convergence analysis of the EPA algorithm. Definition 3.4 [44]. A random sequence {v,,} is called uniformly conditionally heav- ily tailed (UCHT) if there exist Cl > 02 > 0 and an infinite subsequence {t,-} C N, such that, with any sufliciently small 6 > 0, C, e S P(v,, E (D: U 0:) I}— -1) S C26 a.s. Vn 6 {t,}. (3.8) Definition 3.5 A random sequence {v,,} is called uniformly conditionally tailed (UCT) if given 6 > 0, there exist a 6 > 0 and an infinite subsequence {t,} C N, such that P(v,, E (D: U D?) [7,,4) > 6 a.s. Vn 6 {t,}. (3.9) Note that, throughout the dissertation, the assumption that {v,,} is asymmetri- cally bounded with the UCT condition means that either P(v,, E D: I 3,4) 2 6 a.s. and P(v,, 6 D: I f -1) = 0 a.s. 01‘ P(v,, E D: I .7: -1) = 0 a.s. and P(v,, E D: I .7: -1) Z 6 a.s. holds for all n. Also, the assumption that {v,,} is symmetrically bounded with UCT condition means that the following condition holds: P(v,, E D: I fad) Z 61 a.s. and P(v,, E D: I .7: -1) Z 62 a.s. 
Definition 3.6 A random sequence {v_n} is called uniformly tailed (UT) if, given ε > 0, there exist a δ > 0 and an infinite subsequence {t_i} ⊂ N such that

P(v_n ∈ (D_ε^+ ∪ D_ε^−)) ≥ δ ∀ n ∈ {t_i}.    (3.10)

Note that the UT and UCT conditions are less restrictive than UCHT. For example, a "triangle" or "bounded-Gaussian" distributed iid sequence satisfies the UCT and UT conditions but not UCHT.

The following definition of an almost periodic (a.p.) function, which is due to Bohr [6], and a lemma (Lemma A.3 in the Appendix) which relates the type of a random variable to an a.p. function, are introduced for the proof of the main theorems.

Definition 3.7 [1, 6]. A continuous function Φ(t) is almost periodic (a.p.) if, given ε > 0, there exists a length l(ε) such that every interval of length l(ε) contains at least one number τ with

|Φ(t + τ) − Φ(t)| < ε, ∀ t ∈ R.

The set of real numbers τ, {τ}, defined above is called relatively dense [6]. Note that l(ε) exists if and only if there do not exist arbitrarily large gaps among the numbers τ. Clearly, a continuous periodic function is also a.p.

For the following definitions and some lemmas in the Appendix, let K ≝ N or Z,

R^K ≝ {(..., x_1, x_2, ...) : x_i ∈ R, ∀ i ∈ K}, and B^K ≝ {(..., A_1, A_2, ...) : A_i ∈ B, ∀ i ∈ K}

where B is the Borel set.

Definition 3.8 [17]. {X_k : k ∈ K} is stationary if

(..., X_1, X_2, ...) =_d (..., X_k, X_{k+1}, ...), ∀ k ∈ K

where =_d denotes equality in distribution.

Definition 3.9 [17]. The measurable transformation T : Ω → Ω is called a measure-preserving transformation (m.p.t.) if P ∘ T^{−1} = P, i.e., P({ω : T(ω) ∈ A}) = P(A).

Definition 3.10 [17]. A set A ∈ F is called invariant (or T-invariant) if A = T^{−1}(A), i.e., A = {ω : T(ω) ∈ A}.

Definition 3.11 [17]. The invariant class I ≝ {A ∈ F : T^{−1}(A) = A} is called trivial if P(A) ∈ {0, 1} ∀ A ∈ I.

Definition 3.12 [17]. The m.p.t. T is called ergodic if I is trivial.

Definition 3.13 [17]. The m.p.t. T is called mixing if

lim_{n→∞} |P(A ∩ T^{−n}(B)) − P(A) · P(B)| = 0 ∀ A, B ∈ F.    (3.11)

Note that the analogous definitions of mixing and ergodicity for a stationary sequence can be inferred from Definition 3.12, Definition 3.13, and Lemma A.5. By Definition 3.13, a stationary sequence is mixing if and only if it is asymptotically independent. Also, by Lemma A.6, mixing implies ergodicity.

3.3 Persistency of excitation condition

Persistency of excitation (PE) (defined below) is a necessary condition [5, 31, 42] for the convergence (consistency) of recursive algorithms such as LMS, RLS, and their variants. For SM algorithms that assume bounded noise, PE is at least one of the sufficient conditions for convergence (see the theorems in this chapter and [44]). Simulations even show [32] that PE is necessary for the consistency of the estimator of OBE algorithms. The following definitions of PE, which appear in the literature, are rewritten and related. The first definition is the most general one; it is employed in the proofs of the new theorems.

Definition 3.14 [44]. A sequence of random vectors, {x_n}, is called persistently exciting (PE)³ or omni-directional if for any nonsingular cone

K = {x : x = a_1 e_1 + ... + a_m e_m, det(e_1, ..., e_m) ≠ 0, a_i > 0, ∀ i},

there exist ρ_1 and ρ_2 such that

liminf_{n→∞} P(x_n ∈ K | F_{n−2}) ≥ ρ_1 > 0 a.s.    (3.12)

and

E(‖x_n‖ | F_{n−2}) ≥ ρ_2 > 0 a.s. ∀ n.    (3.13)

³Please read "PE" as "persistency of excitation" or "persistently exciting," as appropriate in context.

The condition (3.12) means that the orientation of x_n is sufficiently varied in a conditional-probability sense, while the condition (3.13) means that the magnitude of x_n cannot be too small on average. These conditions imply the following conditions, which constitute another PE definition frequently used in the convergence analysis of conventional LSE recursive algorithms.

Definition 3.15 [31]. A sequence of random vectors, {x_n}, is called PE if there exist ρ_1 and ρ_2 such that

liminf_{n→∞} E(x_n x_n^T | F_{n−2}) ≥ ρ_1 I > 0 a.s.    (3.14)
These conditions imply the following conditions which constitute another PE definition frequently used in the convergence analysis of conventional LSE recursive algorithms. Definition 3.15 [31]. A sequence of random vectors, {x,,}, is called PE if there exist p1 and p2 such that liflgfiEknxz m4) 2 p11 > o a.s. (3.14) 3Please read “PE” as “persistency of excitation” or “persistently exciting” as appropriate in context. 26 and E(||x,,|| If -2) 2 p2 > 0 a.s. V n. (3.15) Violation of (3.14) is effectively a.s. restriction of x,, to a proper subspace of R”. This justifies the assertion that (3.12) implies (3.14). Also note that the assumption of stability imposes an upper bound on (3.14). The fact that 1”3(1*3(XnX3.~ Via-2)) = 130904.”), leads to an equivalent definition of PE: Definition 3.16 A sequence of random vectors, {x,,}, is called PE if there exist p1 and p2 such that liflgfmxnxf) 2 ml > 0 a.s. (3.16) and E(||x,,|| I fn_2) 2 p2 > 0 a.s. V n. (3.17) Condition (3.16) means that the autocorrelation matrix of x,, is positive definite asymptotically. For the stationary case, it becomes positive definite for all n as in condition (3.18) below. Deterministic approaches to convergence analysis assume the stationarity and er- godicity (or at least WSS and SME) of the signals. In this case, Definition 3.16 is equivalent to the following. Definition 3.17 [5]. A stationary (or at least W55) sequence of random vectors, {x,,}, is called PE if there exist p1 and p2 such that E(x,,x:) 2 p11 > 0 a.s. V n (3.18) 27 and E(||x,,|| I]: -2) 2 p2 > 0 a.s. Vn. (3.19) By the Ergodic Theorem [17], Definition 3.17 is equivalent to the following. Definition 3.18 [5, 42]. A stationary and ergodic (or at least WSS and SME) se- quence of random vectors, {x,,}, is called PE if there exists an N1 6 N and p1, p2 > 0 such that for all n n+N1 Z XkaT 2 p11 > 0, (3.20) k=n+l and ”x,,” 2 p2 > 0 Vn. (3.21) Note that the stability assumption imposes an upper bound on (3.20). Using matrix algebra, Definition 3.18 can be shown to be equivalent to the following. Definition 3.19 [5]. A stationary and ergodic (or at least WSS and SME) sequence of random vectors, {x,,}, is called PE if for any unit vector e E R'”, and for all n, there exist N1 6 N and p1, p2 > 0, all independent of e, such that n+N1 Z ||xfe|| 2 p, > 0, (3.22) k=n+1 and “x,,” 2 p, > o v n. (3.23) 28 3.4 Lemmas The following lemmas and those in the Appendix (cited from the literature) are essential for the proofs of the main theorems in this work. For convenience, define c,,, ifc,,<0 0 , otherwise. c,,, if c,,ZO =+ 0 , otherwise. where c,, is the check for innovation (2.10). Then, by Lemma A.2, lirn,,_.,,o c; = 0. However, since c,, is a random variable, lim,,_.co c; = 0 does not imply lim,,_..,.o P(c,, < 0) = 0. For example, let 1 , with probability 0.5 Cu: —1/n , with probability 0.5. Then, lim,,_.°° c; = 0 while lim,,_.oo P(c,, < 0) = 0.5. Hence, the following lemmas are needed for the proof of Lemma 3.3 which, in turn, is essential for the proofs of the main theorems. Lemma 3.1 Let X,Y be random variables. If X is continuously distributed, then X + Y is continuously distributed. If, in addition, Y 7’: 0, then X Y is continuously distributed. Proof: Let Z = X + Y, then, 13(2): °°fxy(x.z—x)dx= ”insane—mien. _m -00 29 where f denotes the various probability density functions. By assumption, f x(x) does not contain a Dirac delta function, so fz(z) does not. Hence, X + Y is continuous. 
3.4 Lemmas

The following lemmas, together with those in the Appendix (cited from the literature), are essential for the proofs of the main theorems in this work. For convenience, define

c_n^− = c_n, if c_n < 0;  c_n^− = 0, otherwise,

c_n^+ = c_n, if c_n ≥ 0;  c_n^+ = 0, otherwise,

where c_n is the check for innovation (2.10). Then, by Lemma A.2, lim_{n→∞} c_n^− = 0. However, since c_n is a random variable, lim_{n→∞} c_n^− = 0 does not imply lim_{n→∞} P(c_n < 0) = 0. For example, let

c_n = 1, with probability 0.5;  c_n = −1/n, with probability 0.5.

Then, lim_{n→∞} c_n^− = 0 while lim_{n→∞} P(c_n < 0) = 0.5. Hence, the following lemmas are needed for the proof of Lemma 3.3, which, in turn, is essential for the proofs of the main theorems.

Lemma 3.1 Let X, Y be random variables. If X is continuously distributed, then X + Y is continuously distributed. If, in addition, Y ≠ 0, then XY is continuously distributed.

Proof: Let Z = X + Y. Then

f_Z(z) = ∫_{−∞}^{∞} f_{XY}(x, z − x) dx = ∫_{−∞}^{∞} f_X(x) f_{Y|X}(z − x | x) dx,

where f denotes the various probability density functions. By assumption, f_X(x) does not contain a Dirac delta function, so f_Z(z) does not. Hence, X + Y is continuous. Let W = XY. Then

f_W(w) = ∫_{−∞}^{∞} (1/|y|) f_{XY}(w/y, y) dy = ∫_{−∞}^{∞} (1/|y|) f_X(w/y) f_{Y|X}(y | w/y) dy.

Since Y ≠ 0 and f_X(w/y) does not contain a Dirac delta function, it follows that f_W(w) does not contain a Dirac delta function. Hence, XY is continuous. □

Lemma 3.2 Assume that model (3.1) is a stable ARX model, and {u_n} and {v_n} are bounded. If both {u_n} and {v_n} are asymptotically independent sequences, and {u_n} is independent of {v_n}, then y_n converges to a continuous random variable as n → ∞.

Proof: First, rewrite (3.1) as

y_n − Σ_{j=1}^{p} a_j y_{n−j} = Σ_{j=0}^{q} b_j u_{n−j} + v_n.    (3.24)

Let

V_n ≝ Σ_{j=0}^{q} b_j u_{n−j} + v_n.

Note that {V_n} is an asymptotically independent random sequence. Let h_n be the discrete-time impulse response of the system (3.24), i.e., y_n = h_n if V_n is a Kronecker delta function. Then,

y_n = Σ_{j=0}^{∞} h_j V_{n−j}.    (3.25)

By the Lebesgue Decomposition Theorem [17], a random variable can be decomposed into a discrete part and a continuous part. However, by (3.25) and Lemma 3.1, y_n is continuous if V_n is continuous. Hence, V_n can be assumed discrete for the remainder of the proof. That is, assume

P(V_n = q_k) = p_k > 0, k = 1, 2, ..., K    (3.26)

where q_k is bounded since V_n is bounded by assumption. Let Φ_{V_n}(t) denote the characteristic function of V_n. Then, from (3.26),

Φ_{V_n}(t) = Σ_{k=1}^{K} p_k e^{i t q_k}, where i = √−1.

Hence,

Φ_{h_j V_{n−j}}(t) = Σ_{k=1}^{K} p_k e^{i t h_j q_k}.    (3.27)

Note that e^{i t h_j q_k} is a periodic function of t with primary period τ_{j,k} = 2π/(h_j q_k) for all j, k in which h_j ≠ 0 and q_k ≠ 0. Since h_j is a stable infinite impulse response, it follows that lim_{j→∞} h_j = 0. Hence, given ε_1 > 0, there exists an N such that for all j > N, |h_j| < ε_1, and there exists a j > N such that |h_j| > 0. Hence, τ_{j,k} = 2π/(h_j q_k) → ∞ as j → ∞. This means that, as n → ∞, the set {τ} of Φ_{h_j V_{n−j}}(t) is not relatively dense. That is, there does not exist an l(ε) for Φ_{h_j V_{n−j}}(t) to satisfy Definition 3.7 as n → ∞. Hence, Φ_{h_j V_{n−j}}(t) is a non-a.p. function as n → ∞. For finite n, Φ_{y_n} is a.p. by Lemma A.3, since y_n is discrete by (3.25). Hence, Φ_{y_n} is non-a.p. as n → ∞ by the asymptotic independence of {V_n}. Thus, by Lemma A.3, y_n is continuous as n → ∞. □

Note that, by Lemma 3.1, if {u_n} or {v_n} or both are continuously distributed, the assumptions that {u_n} and {v_n} are asymptotically independent sequences and that {u_n} is independent of {v_n} are not required in the above lemma.
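The decay of h_j that drives this argument can be checked numerically. The sketch below (ours) computes the impulse response of the stable AR(3) example (1.3); its geometric decay makes the primary periods 2π/(h_j q_k) grow without bound, exactly as used in the proof.

```python
import numpy as np

a = np.array([2.0, -1.48, 0.34])   # AR coefficients of the stable model (1.3)
h = np.zeros(60)
h[0] = 1.0                         # h_j = response to a Kronecker delta V_n
for j in range(1, 60):
    h[j] = sum(a[i] * h[j - 1 - i] for i in range(3) if j - 1 - i >= 0)
print(np.round(h[::10], 4))        # |h_j| -> 0, so 2*pi/(h_j q_k) -> infinity
```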
(3.32) From (3.30) and (3.32), it follows that P(c,, < 0) = P(m7,, < 45,, < m(7,, + 6)) V n > N. (3.33) This probability approaches zero as 6 —> 0, hence as n —+ 00. To prove this, first note that, as n --1 00, y,, in (3.1) is continuous by Lemma 3.2. From (2.4), P q _ T _ 2 2 an — xn Pn-lxn — Z Vn,iyn_,‘ + Z Vn,i+p+l a,,—i i=1 i=0 where V,,,,- > 0, Vi, are the eigenvalues of the positive-definite matrix P,,-l. Hence, by Lemma 3.1, G,, is continuous as n -—1 00. Also, from Lemma 3.7 (to follow), 16,, does not converge to zero since 7,, are overestimated V n > N by assumption. Thus, it follows from Lemma 3.1 that x,,-lGn is continuous as n -+ 00. Finally, from (3.29) and Lemma 3.1, 45,, is continuous as n —* 00. This proves that the probability in (3.33) approaches zero as n -> 00. Since an OBE algorithm updates its estimator only when c,, < 0, the probability of an update diminishes as n —+ 00. This implies that the expected updating interval approaches infinity as time increases indefinitely. Ci Note that, if {u,,} or {v,,} or both are continuously distributed, the assumptions that {u,,} and {v,,} are asymptotically independent sequences and the independence between {u,,} and {v,,} are not required in the above lemma. The following lemma asserts that the sum of two independent, ergodic, stationary sequences is stationary and ergodic. 33 Lemma 3.4 If both {X}, : k E K} and {Y}, : k E K} are ergodic, stationary sequences, and {X}, : k G K} is independent of {Y1c : k 6 K}, then {Z}, = X), + Y), : k E K} is also stationary and ergodic. Proof: Since {Xn} and {Y,,} are stationary, by Definition 3.8, there exist sets A, B, C, and D such that A = {L02(°°°,X_1,X0,X1,"°)EC} (3.34) = {w i ( ° ° 1Xn—11Xn1Xn+la' ' °) 6 C}, (3.35) and B : {W:('”1Y-11)/01Y11”')€D} (3.36) = {wz(---,Y.-1,Y.,Yn+1,---)eD}. (3.37) Hence, A and B are invariant sets by Definition 3.10. Also, by the ergodicity of {Xn} and {Y,,}, Definition 3.11, and Definition 3.12, P(A) 6 {0,1} and P(B) 6 {0,1}. By (3.34) and (3.36), let F be such that F=AnB={w:(°°°1X—17X01X11°°°)ecn('°'1Y-11Y()9Yla”')€D} Then, there exists an E such that F = {WIW'(,(X_1+Y_1),(Xo+I/o),(X1+Y1),')€E} {w:(---,Z_1,Zo,Zl,---)€ E}. (3.38) 34 Similarly, by (3.35) and (3.37), F = {w : ( . - ,Z,,_,,Z,,,Z,,+1, - - ) E E}. (3.39) Thus, by (3.38), (3.39), and Definition 3.10, F is shift-invariant with respective to {Zn}. Also, by the independence of {X,,} and {Y,,}, P(F) = P(A n B) = P(A) - P(B) 6 {0,1}. Hence, by Definition 3.11, F is trivial. Therefore, by Definition 3.12, {Zn} is station- ary and ergodic. D 3.5 Almost Sure Convergence of OBE-ABE In this section, the theorems of a.s. convergence of the OBE-ABE algorithm are introduced and proven in iid, mixing, and ergodic noise cases. For a.s. convergence, both the input sequence {u,,} and the noise sequence {v,,} are assumed stationary, and {u,,} is assumed independent of {v,,}. However, for p. convergence, the stationarity of {u,,} and {v,,} are not required. Further, if {u,,} or {v,,} or both are continuously distributed, the conditions for a.s. and p. convergence can be further relaxed. The following is the main theorem for a.s. convergence in the mixing case. Theorem 3.1 Assume that the stationary sequences {u,,} and {v,,} are independent. If PE holds, UCT holds, and {u,,} and {v,,} are mixing, then the estimator of the OBE-ABE algorithm is a.s. convergent. 
35 Proof: Since PE (3.12) and UCT (3.9) hold, there exists an infinite subsequence {t,~} E Z such that for all n 6 {t,-}, (3.9) and P(Xn E K If -2) 2 p1 > 0 a.s., (3.40) hold where K is defined in (3.12). Throughout the proof, all 11 considered are in {t,-}. For convenience, let {n} replace {t,} as the time coordinate in the proof. The meaning of time intervals, for example, should change accordingly. Let {v,, : It 6 Z} denote the noise sequence and {u}, : k E Z} the input sequence of model (3.1). Rewrite (3.1) as P 9 yn _ z ajyn-j = Z bjun—j + ”11- (3.41) j=1 j=0 Let (I f (1 w,, é Zb,u,,_,-, (3.42) i=0 and V,, déf w,, + v,,. Then, (3.41) becomes .21.: Z h..-,V,- (3.43) j=-oo where h,, is the infinite impulse response, y,,, if V,, is set to a Kronecker delta function. Note that, by Lemma A.7, {w,,} is stationary and ergodic. Also, by Lemma 3.4, {V,,} is stationary and ergodic. Let {1),} be a sequence of time intervals of length M over which the OBE—ABE algorithm does not update its estimator. Then, for each k = 1, 2, ' - - , K, «9 “-—°=‘ [9(1) 0(2) o1 36 is a constant vector for all n in each 1),. Hence, by (2.5), 5n = yn- 0,,—1x11 = yn-29(1)1)yn-1—— 29(1)!)3/11—1, (3-44) 1:] i=0 where 0(0)dé 1. From (3. 43) and (3.44), it follows that 11—1 =2 2 9( hn-e-jV 1:0 J—-OO By changing the order of summations, this equation becomes, for all n > m, ”-1 6,, = Z V Z0(i)h,,_,-_, +712? V Z0(i)h,,_,-_,-. j=n—m+1 i=0 j=—oo i=0 Letting p = n — j, 171-1 9 co m m—l oo = 2 v.-. mans 5: v.-. 20101.-.- dé‘ >2 v._,.,,+ 2 vam- (3.45) :0 P=0 i=0 1):"! i=0 p=m Note that both g1 and g; are independent of 11. Hence, (3.45) can be written as 5n = 9(Vn1 Vfl-l, Via—21 ° ° ) (346) where g : RK —1 R is measurable and independent of 71. Hence, by Lemma A.7, {a,,} is stationary and ergodic as M —> 00. Now, for each I: = 1,2, - - - ,K, it follows from (2.10) that 611 = m(7n —' 53,) — "11-1071 2 01 V". 6 LP (3'47) Also from (2.5) and letting 0,, déf 0,, — 0,,, e: = (y. - 031m)” = (... - oils.) . (3.48) Let A; be the event that v,, 6 D; and 011x” > 0. Let A; be the event that v,, 6 D2“ and 0: 13% < 0. Also, let A,, (ETC-f A; U A; and denote by Af, the complement of A,,. Also note that 054x“ is fn-1-measurable. Let 3;. denote the event that A,, occurs at least once in 1).. Then, for all n 6 [1.4.1, B). E 7,,_2. Hence, it follows from (3.40) that P(0Z_,x,, Z 0 IBI.) 2 p1 > p > 0 (3.49) and P(i;_,x,, < 0 |3,.) 2 p, > p > 0. (3.50) Since B]. E .7: -2 C 7""-1 and 0£_1xn is fn_1-measurable, it follows from UCT (3.9) that P(v,, e (3,+ u 3;) |5§_,x,, 2 0, 3,.) 2 6, > g > 0 (3.51) and ~ 5 P(v,, e (D: u 3;) |93,'_,x.. < 0,3,.) 2 a, > 3 > 0. (3.52) Since A; and A; are mutually exclusive for sufficiently small 6, it follows from (3.49), (3.50), (3.51), (3.52), and the notes below (3.9) that P(A. (3,.) = P(A; (3,.) + P(A; IB.) = P(v,, e 3301,11,, 2 0,3,.) - P(5§_,x, 2 0 |3,.) +P(v,, e Df|5§_,x,, < 0,3,.) . P(6§_,x, < 0 |3,.) > P(v,, e 3301,11,, 2 0, 3,.) - p + P(v,, e D:|5§_,x, < 0,3,.) . p > 6 > 0. (3.53) 38 Then, by (3.53) and the definition of A,,, it follows that P(lenl > ,/7—.— c |B,.) 2 P(An |B,.) > 6 > 0. (3.54) Thus, by the Poincaré Recurrence Theorem (Theorem A.1) and (3.54), P(|6,,| > 7.. — 61.0. |B;.) = l as M —-1 00. It follows that, for sufficiently large M and conditionally on Bk-“ the following inequality holds with probability 1: max 63, 2 (J7: — 6)2 = 7.. — 6(2\/:7:— 6) _>_ 7.. — 6(2\/7',: — 6). (3.55) 7161), Now, let J). = arg max;e 1,. 62 n. Then, by letting it = Jk, (3.47) can be written as (replacing J). 
$$\gamma_J - \epsilon_J^2 - \kappa_{J-1} G_J / m \ge 0. \quad (3.56)$$

By (3.55) and (3.56),

$$\gamma_J - \gamma_* \ge \kappa_{J-1} G_J / m - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon). \quad (3.57)$$

Let

$$d_J \triangleq \kappa_{J-1} G_J / m - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon). \quad (3.58)$$

Hence, by (3.57), $\gamma_n$, updated (whenever an interval $I_k$ is found at time $n$) by the recursion

$$\gamma_n = \begin{cases} \gamma_{n-1} - d_J, & \text{if } d_J > 0 \\ \gamma_{n-1}, & \text{otherwise,} \end{cases} \quad (3.59)$$

remains an upper bound of $\gamma_*$ a.s. for each $k$.

Note that PE and the mixing of $\{u_n\}$ and $\{v_n\}$ imply that, for overestimated $\gamma_n$, Lemma 3.2 and then Lemma 3.3 hold. Hence, intervals $I_k$ exist i.o. It follows that, with $\epsilon \to 0$, (3.59) and Lemma 3.7 imply that $\gamma_n$ continues to be updated by the OBE-ABE algorithm until $d_J \to 0$ and $\gamma_n \to \gamma_*$ as $n \to \infty$. Thus, $\kappa_J \to 0$ when $\epsilon \to 0$, since $G_n > 0$ due to PE and the stability assumption. Hence, $\kappa_n \to 0$ due to its non-increasing property. Therefore,

$$P\big(\omega : \lim_{n \to \infty} \theta_n = \theta_*\big) = 1.$$

This completes the proof of a.s. convergence. □

For p. convergence, the mixing assumptions on both $\{u_n\}$ and $\{v_n\}$ in the above theorem can be relaxed to the requirement that both $\{u_n\}$ and $\{v_n\}$ be asymptotically independent sequences (see Theorem 3.3). Further, if $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, then the mixing condition in the theorem above can be relaxed to an ergodic condition for a.s. convergence. The following theorem validates this assertion.

Theorem 3.2 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, PE holds, UCT holds, and $\{u_n\}$ and $\{v_n\}$ are ergodic, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Continuously-distributed $\{v_n\}$ or $\{u_n\}$ or both implies that Lemmas 3.2 and 3.3 hold without the assumption that $\{v_n\}$ and $\{u_n\}$ are asymptotically independent sequences. The theorem is then proved by following the same steps as in the proof of Theorem 3.1. □

For p. convergence, the assumptions that $\{u_n\}$ and $\{v_n\}$ are independent and stationary, and that $\{u_n\}$ and $\{v_n\}$ are ergodic, are not required (see Theorem 3.4).

The following corollaries treat iid noise cases for a.s. convergence. They are special cases of Theorems 3.1 and 3.2, respectively.

Corollary 3.1 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, UT holds, $\{u_n\}$ is mixing, and $\{v_n\}$ is iid, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Since UT and UCT imply each other under the iid assumption, and iid implies mixing, a.s. convergence follows immediately from Theorem 3.1. □

Corollary 3.2 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If $\{v_n\}$ or $\{u_n\}$ or both are continuously-distributed random sequences, PE holds, UT holds, $\{u_n\}$ is ergodic, and $\{v_n\}$ is iid, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Since UT and UCT imply each other under the iid assumption, and iid implies ergodicity, a.s. convergence follows immediately from Theorem 3.2. □

The following corollaries assert the a.s. convergence of the OBE-ABE algorithm when the probability distribution of the noise sequence $\{v_n\}$ is known.

Corollary 3.3 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, $\{u_n\}$ is ergodic, and $\{v_n\}$ is iid with uniform distribution, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: For a uniform distribution, $P(v_n \in (D_\epsilon^+ \cup D_\epsilon^-)) > 0$ for every $\epsilon > 0$, so UT holds. By Corollary 3.2, the estimator is a.s. convergent. □
Corollary 3.4 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, $\{u_n\}$ is mixing, and $\{v_n\}$ is iid with binary Bernoulli distribution, $P(v_n = \sqrt{\gamma_*}) > \delta$ or $P(v_n = -\sqrt{\gamma_*}) > \delta$, $\forall n$, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Since UT holds in this case, a.s. convergence follows from Corollary 3.1. □

Corollary 3.5 Assume that the stationary sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, $\{u_n\}$ is ergodic and continuously distributed, and $\{v_n\}$ is iid with binary Bernoulli distribution, $P(v_n = \sqrt{\gamma_*}) > \delta$ or $P(v_n = -\sqrt{\gamma_*}) > \delta$, $\forall n$, then the estimator of the OBE-ABE algorithm is a.s. convergent.

Proof: Since UT holds in this case, the a.s. convergence follows from Corollary 3.2. □

Note that all the theorems and corollaries in this section are also valid for conventional OBE algorithms (and the EPA algorithm) with a known exact noise bound $\sqrt{\gamma_*}$ or $-\sqrt{\gamma_*}$, because, for the OBE-ABE algorithm, $\gamma_n \to \gamma_*$ as $n \to \infty$, as seen in the proof of Theorem 3.1.

3.6 Convergence in Probability of OBE-ABE

Although a.s. convergence implies p. convergence, the sufficient conditions for a.s. convergence can be relaxed for p. convergence. Specifically, in addition to other relaxations in some cases, the stationarity of $\{u_n\}$ and $\{v_n\}$ is not required for p. convergence. The main theorem for p. convergence of the OBE-ABE algorithm follows.

Theorem 3.3 Assume that the sequences $\{v_n\}$ and $\{u_n\}$ are independent. If PE holds, UCT holds, and $\{u_n\}$ and $\{v_n\}$ are asymptotically independent sequences, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since PE (3.12) and UCT (3.9) hold, there exists an infinite subsequence $\{t_i\} \in \mathbb{Z}$ such that for all $n \in \{t_i\}$, (3.9) and

$$P(x_n \in K \mid \mathcal{F}_{n-2}) \ge p_1 > 0 \ \text{a.s.} \quad (3.60)$$

hold, where $K$ is defined in (3.12). Throughout the proof, all $n$ considered are in $\{t_i\}$. For convenience, let $\{n\}$ replace $\{t_i\}$ as the time coordinate in the proof. The meaning of time intervals, for example, should change accordingly.

Let $\{I_k\}$ be a sequence of time intervals of length $M \in \mathbb{N}$ over which the OBE-ABE algorithm does not update its estimator. Then, for each $k = 1, 2, \dots, K$, it follows from (2.10) that

$$c_n = m(\gamma_n - \epsilon_n^2) - \kappa_{n-1} G_n \ge 0, \quad \forall n \in I_k. \quad (3.61)$$

Also from (2.5),

$$\epsilon_n^2 = (y_n - \theta_{n-1}^T x_n)^2 = (v_n - \tilde{\theta}_{n-1}^T x_n)^2. \quad (3.62)$$

Let $A_n^-$ be the event that $v_n \in D_\epsilon^-$ and $\tilde{\theta}_{n-1}^T x_n \ge 0$, and $A_n^+$ the event that $v_n \in D_\epsilon^+$ and $\tilde{\theta}_{n-1}^T x_n < 0$. Also, let $A_n \triangleq A_n^+ \cup A_n^-$, and denote by $A_n^c$ the complement of $A_n$. Note that $\tilde{\theta}_{n-1}^T x_n$ is $\mathcal{F}_{n-1}$-measurable. Let $B_k$ denote the event that $A_n$ occurs at least once in $I_k$. Then, for all $n \in I_{k+1}$, $B_k \in \mathcal{F}_{n-2}$. Hence, it follows from (3.60) that

$$P(\tilde{\theta}_{n-1}^T x_n \ge 0 \mid B_k) \ge p_1 > p > 0 \quad (3.63)$$

and

$$P(\tilde{\theta}_{n-1}^T x_n < 0 \mid B_k) \ge p_2 > p > 0. \quad (3.64)$$

Since $B_k \in \mathcal{F}_{n-2} \subset \mathcal{F}_{n-1}$ and $\tilde{\theta}_{n-1}^T x_n$ is $\mathcal{F}_{n-1}$-measurable, it follows from UCT (3.9) that

$$P\big(v_n \in (D_\epsilon^+ \cup D_\epsilon^-) \mid \tilde{\theta}_{n-1}^T x_n \ge 0, B_k\big) \ge \delta_1 > 0 \quad (3.65)$$

and

$$P\big(v_n \in (D_\epsilon^+ \cup D_\epsilon^-) \mid \tilde{\theta}_{n-1}^T x_n < 0, B_k\big) \ge \delta_2 > 0. \quad (3.66)$$

Since $A_n^+$ and $A_n^-$ are mutually exclusive for sufficiently small $\epsilon$, it follows from (3.63), (3.64), (3.65), and (3.66) that

$$P(A_n \mid B_k) = P(A_n^- \mid B_k) + P(A_n^+ \mid B_k) = P(v_n \in D_\epsilon^- \mid \tilde{\theta}_{n-1}^T x_n \ge 0, B_k)\, P(\tilde{\theta}_{n-1}^T x_n \ge 0 \mid B_k) + P(v_n \in D_\epsilon^+ \mid \tilde{\theta}_{n-1}^T x_n < 0, B_k)\, P(\tilde{\theta}_{n-1}^T x_n < 0 \mid B_k) > P(v_n \in D_\epsilon^- \mid \tilde{\theta}_{n-1}^T x_n \ge 0, B_k)\, p + P(v_n \in D_\epsilon^+ \mid \tilde{\theta}_{n-1}^T x_n < 0, B_k)\, p > (\delta_1 + \delta_2)\, p \triangleq \delta > 0. \quad (3.67)$$

Hence, $P(A_n^c \mid B_k) = 1 - P(A_n \mid B_k) < 1 - \delta$.
Similarly, since $A_{n-2}^c \in \mathcal{F}_{n-2}$, it follows that

$$P(A_n^c \mid A_{n-2}^c, B_k) = 1 - P(A_n \mid A_{n-2}^c, B_k) = 1 - \big[P(A_n^- \mid A_{n-2}^c, B_k) + P(A_n^+ \mid A_{n-2}^c, B_k)\big] < 1 - \delta.$$

Thus,

$$P(A_n^c \cap A_{n-2}^c \mid B_k) = P(A_{n-2}^c \mid B_k)\, P(A_n^c \mid A_{n-2}^c, B_k) < (1-\delta)^2.$$

Similarly, $P(A_n^c \mid A_{n-2}^c, A_{n-4}^c, B_k) < 1 - \delta$. Hence,

$$P(A_n^c \cap A_{n-2}^c \cap A_{n-4}^c \mid B_k) = P(A_{n-2}^c \cap A_{n-4}^c \mid B_k)\, P(A_n^c \mid A_{n-2}^c, A_{n-4}^c, B_k) < (1-\delta)^2 (1-\delta) = (1-\delta)^3.$$

By induction (for notational simplicity, let $M$ be even),

$$P\Big(\bigcap_{i=1}^{M/2} A_{n-2i}^c \mid B_k\Big) < (1-\delta)^{M/2}.$$

Thus,

$$P(B_k \mid B_{k-1}) = P(A_n \ \text{occurs at least once in}\ I_k \mid B_{k-1}) = P\Big(\bigcup_{n \in I_k} A_n \mid B_{k-1}\Big) = 1 - P\Big(\bigcap_{n \in I_k} A_n^c \mid B_{k-1}\Big) \ge 1 - P\Big(\bigcap_{i=1}^{M/2} A_{n-2i}^c \mid B_{k-1}\Big) > 1 - (1-\delta)^{M/2}. \quad (3.68)$$

Note that, by (3.62), the occurrence of $A_n$ at least once in $I_k$ implies that $\max_{n \in I_k} \epsilon_n^2 \ge (\sqrt{\gamma_*} - \epsilon)^2$. Hence, from (3.62) and (3.68),

$$\max_{n \in I_k} \epsilon_n^2 \ge (\sqrt{\gamma_*} - \epsilon)^2 = \gamma_* - \epsilon(2\sqrt{\gamma_*} - \epsilon) \ge \gamma_* - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon) \quad (3.69)$$

with probability $1 - (1-\delta)^{M/2}$ ($> 0.99995$ if $\delta = 0.01$ and $M > 2000$).

In the case of iid noise, under the assumption that $\{v_n\}$ is independent of $\{u_n\}$, the conditional probabilities in the above derivation are not required and can be replaced with marginal probabilities. Hence, (3.69) holds with probability $1 - (1-\delta)^M$ in this case.

Let $J_k = \arg\max_{n \in I_k} \epsilon_n^2$. Then, setting $n = J_k$ in (3.61) (and writing $J$ for $J_k$ for simplicity),

$$\gamma_J - \epsilon_J^2 - \kappa_{J-1} G_J / m \ge 0. \quad (3.70)$$

By (3.69) and (3.70),

$$\gamma_J - \gamma_* \ge \kappa_{J-1} G_J / m - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon). \quad (3.71)$$

Let

$$d_J \triangleq \kappa_{J-1} G_J / m - \epsilon(2\sqrt{\gamma_{n-1}} - \epsilon). \quad (3.72)$$

Hence, by (3.71), $\gamma_n$, updated (whenever an interval $I_k$ is found at time $n$) by the recursion

$$\gamma_n = \begin{cases} \gamma_{n-1} - d_J, & \text{if } d_J > 0 \\ \gamma_{n-1}, & \text{otherwise,} \end{cases} \quad (3.73)$$

remains an upper bound of $\gamma_*$ with probability $1 - (1-\delta)^{M/2}$ for each $k$.

Since the $\sigma$-fields $\mathcal{F}_n$ increase with $n$, i.e., $\mathcal{F}_{n-1} \subset \mathcal{F}_n$, it follows from (3.68) that, for each $k$,

$$P(B_k \mid B_{k-1} B_{k-2} \cdots B_1) > 1 - (1-\delta)^{M/2}.$$

Hence,

$$P\Big(\bigcap_{k=1}^{K} B_k\Big) = P(B_K \mid B_{K-1} B_{K-2} \cdots B_1)\, P(B_{K-1} \mid B_{K-2} B_{K-3} \cdots B_1) \cdots P(B_2 \mid B_1)\, P(B_1) > \big[1 - (1-\delta)^{M/2}\big]^K. \quad (3.74)$$

The assumptions that PE holds, that $\{u_n\}$ and $\{v_n\}$ are independent, and that both $\{u_n\}$ and $\{v_n\}$ are asymptotically independent sequences, imply that, for overestimated $\gamma_n$, Lemma 3.2 and then Lemma 3.3 hold. Hence, intervals $I_k$ exist i.o. It follows that, with $\epsilon \to 0$, (3.73) and Lemma 3.7 imply that $\gamma_n$ continues to be updated by the OBE-ABE algorithm until $d_J \to 0$ and $\gamma_n \to \gamma_*$ as $n \to \infty$. Thus, $\kappa_J \to 0$ when $\epsilon \to 0$, since $G_n > 0$ due to PE and the stability assumption. Hence, $\kappa_n \to 0$ due to its non-increasing property.

Note that $K < N/M$, where $N$ is the total "time" (number of data sets). Let $M = \lfloor\sqrt{2N}\rfloor + 1$, where $\lfloor\cdot\rfloor$ denotes the integer part. Then, (3.74) becomes

$$P\Big(\bigcap_{k=1}^{K} B_k\Big) > \big[1 - (1-\delta)^{M/2}\big]^K \ge \big[1 - (1-\delta)^{\sqrt{N/2}}\big]^{\sqrt{N/2}} \to 1 \ \text{as}\ N \to \infty.$$

This is verified by applying L'Hopital's rule, as shown below, to find that $(1 - q^n)^n \to 1$ as $n \to \infty$, where $0 \le q < 1$:

$$\lim_{n\to\infty}(1-q^n)^n = \lim_{n\to\infty}\exp\big[n\ln(1-q^n)\big] = \exp\Big[\lim_{n\to\infty}\frac{\ln(1-q^n)}{1/n}\Big] = \exp\Big[\lim_{n\to\infty}\frac{-q^n\ln q/(1-q^n)}{-1/n^2}\Big] = \exp\Big[\lim_{n\to\infty}\frac{n^2\ln q}{q^{-n}-1}\Big] = \exp\Big[\lim_{n\to\infty}\frac{2n\ln q}{-q^{-n}\ln q}\Big] = \exp\Big[\lim_{n\to\infty}\frac{-2n}{q^{-n}}\Big] = \exp\Big[\lim_{n\to\infty}\frac{2}{q^{-n}\ln q}\Big] = \exp(0) = 1.$$

Thus, $\lim_{n\to\infty} P(\|\theta_n - \theta_*\| > \epsilon) = 0$. This completes the proof of p. convergence. □

Remark 1: Theoretically, instead of a priori knowledge of $\gamma_*$, to achieve a.s. or p. convergence the OBE-ABE algorithm requires only a lower bound, $\delta(\epsilon)$, on the $\epsilon$-tail probability of $v_n$ [see (3.10)].
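To make the bound-update mechanism concrete, the following minimal sketch (in Python) implements the recursion (3.70)-(3.73) in isolation. It is an illustration added here, not the dissertation's implementation: the argument kappaG_over_m stands for $\kappa_{J-1}G_J/m$ at the maximizing index $J$, and the numbers fed to it below are made-up stand-ins for values the full OBE-ABE recursions would supply.

import numpy as np

def abe_step(gamma_prev, kappaG_over_m, eps_tol):
    """One ABE bound update: d_J as in (3.72), recursion as in (3.73)."""
    d_J = kappaG_over_m - eps_tol * (2.0 * np.sqrt(gamma_prev) - eps_tol)
    # Shrink the bound estimate only when d_J > 0; gamma_n then stays an
    # upper bound of gamma_* and is non-increasing.
    return gamma_prev - d_J if d_J > 0.0 else gamma_prev

# Toy run: gamma_0 = 1.5 overestimates gamma_* = 1.0.
gamma = 1.5
for kG in [0.35, 0.2, 0.1, 0.04, 0.01]:   # hypothetical kappa*G/m values
    gamma = abe_step(gamma, kG, eps_tol=0.05)
    print(round(gamma, 4))

As $\epsilon \to 0$ and $d_J \to 0$, the recursion drives $\gamma_n$ down toward $\gamma_*$ without passing below it, which is exactly the behavior used in the proofs above.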
Remark 2: For the application of the OBE-ABE algorithm, since $N$ is finite, an $\epsilon$ is first chosen arbitrarily as a small positive number ($\delta(\epsilon)$ decreases with $\epsilon$). Given this $\epsilon$, a $\delta$ is estimated; any lower bound satisfying UT (3.10) (in the iid case, for example) will work. Then, a large enough $N$ and $M$ ($= \lfloor\sqrt{2N}\rfloor + 1$, for example) are chosen so that the probability $[1 - (1-\delta)^M]^{N/M}$ is close to one. For example, if $\delta = 0.1$, choose $M = 101$ and $N = 10{,}000$ to get the probability $[1 - (1-\delta)^M]^{N/M} > 0.997$; if $\delta = 0.02$, choose $M = 601$ and $N = 360{,}000$ to get the probability $[1 - (1-\delta)^M]^{N/M} > 0.996$. Since the probability $[1 - (1-\delta)^M]^{N/M}$ is the most conservative value, if $N$ is fixed in practice and is not too large ($N < 10{,}000$, for example), $M$ can simply be chosen large enough to make $1 - (1-\delta)^M$ close to one. If, in a real-world application, $\delta$ is still difficult to estimate after choosing $\epsilon$, then simply start the OBE-ABE algorithm with a small $M$ ($M = 20$, for example) and increase $M$ after each trial if necessary. Negativity of $\kappa_n$ can serve as an indication of a too-small $M$. The algorithm is guaranteed to be p. convergent (with probability close to 1) if $M$ is large enough, provided the sufficient conditions (PE, UCT) are satisfied.
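The arithmetic behind the choices in Remark 2 is easily checked. The following short script (an illustration added here, not part of the algorithm) evaluates the conservative probability $[1-(1-\delta)^M]^{N/M}$ from (3.74) for the two parameter sets quoted above.

def conservative_prob(delta, M, N):
    # [1 - (1 - delta)^M]^(N/M), with K ~ N/M intervals as in (3.74)
    return (1.0 - (1.0 - delta) ** M) ** (N / M)

print(conservative_prob(0.10, 101, 10_000))    # ~0.9976 (> 0.997)
print(conservative_prob(0.02, 601, 360_000))   # ~0.9968 (> 0.996)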
If $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, then the requirements that both $\{u_n\}$ and $\{v_n\}$ be asymptotically independent sequences, and that $\{u_n\}$ be independent of $\{v_n\}$, are not needed for p. convergence. The following theorem validates this assertion.

Theorem 3.4 If $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, PE holds, and UCT holds, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Continuously-distributed $v_n$ implies that Lemmas 3.2 and 3.3 hold without the assumptions that both $\{u_n\}$ and $\{v_n\}$ are asymptotically independent sequences and that $\{u_n\}$ is independent of $\{v_n\}$. The theorem is then proved by following the steps of the proof of Theorem 3.3. □

The following corollaries involve iid noise cases for p. convergence. They are special cases of Theorems 3.3 and 3.4, respectively.

Corollary 3.6 Assume that the sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, UT holds, $\{u_n\}$ is asymptotically independent, and $\{v_n\}$ is iid, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since UT and UCT imply each other under the iid assumption, and iid implies asymptotic independence, p. convergence follows immediately from Theorem 3.3. □

Corollary 3.7 If $\{u_n\}$ or $\{v_n\}$ or both are continuously-distributed random sequences, PE holds, UT holds, and $\{v_n\}$ is iid, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since UT and UCT imply each other under the iid assumption, p. convergence follows immediately from Theorem 3.4. □

The following corollaries assert the p. convergence of the OBE-ABE algorithm when the probability distribution of the noise sequence $\{v_n\}$ is known.

Corollary 3.8 If PE holds and $\{v_n\}$ is iid with uniform distribution, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: For a uniform distribution, $P(v_n \in (D_\epsilon^+ \cup D_\epsilon^-)) > 0$ for every $\epsilon > 0$, so UT holds. By Corollary 3.7, the estimator is p. convergent. □

Corollary 3.9 Assume that the sequences $\{u_n\}$ and $\{v_n\}$ are independent. If PE holds, $\{u_n\}$ is asymptotically independent, and $\{v_n\}$ is iid with binary Bernoulli distribution, $P(v_n = \sqrt{\gamma_*}) > \delta$ or $P(v_n = -\sqrt{\gamma_*}) > \delta$, $\forall n$, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since UT holds in this case, the p. convergence follows from Corollary 3.6. □

Corollary 3.10 If PE holds, $\{u_n\}$ is continuously distributed, and $\{v_n\}$ is iid with binary Bernoulli distribution, $P(v_n = \sqrt{\gamma_*}) > \delta$ or $P(v_n = -\sqrt{\gamma_*}) > \delta$, $\forall n$, then the estimator of the OBE-ABE algorithm is p. convergent.

Proof: Since UT holds in this case, the p. convergence follows from Corollary 3.7. □

Note that all the theorems and corollaries in this section are also valid for conventional OBE algorithms (and the EPA algorithm) with a known exact noise bound $\sqrt{\gamma_*}$ or $-\sqrt{\gamma_*}$, because, for the OBE-ABE algorithm, $\gamma_n \to \gamma_*$ as $n \to \infty$, as seen in the proof of Theorem 3.3.

3.7 Almost Sure and Probability Convergence of Sub-OBE-ABE

All the convergence theorems developed in the previous sections are valid for the Sub-OBE-ABE algorithm in the various noise cases, with the same respective sufficient conditions. A sketch of the proofs of all the theorems for Sub-OBE-ABE follows. The algorithmic steps of the algorithm are found in Table 2.2. For convenience, let us rewrite the optimal check coefficient $c_n$, the suboptimal check coefficient $\tilde{c}_n$, and the recursion formula for $d_n$ here:

$$c_n = m\gamma_n - m\epsilon_n^2 - \kappa_{n-1} G_n \quad (3.75)$$
$$\tilde{c}_n = m\gamma_n - m\epsilon_n^2 - \kappa_{n-1}\, x_n^T x_n / d_{n-1} \quad (3.76)$$
$$d_n = \alpha_n d_{n-1} + \beta_n\, x_n^T x_n. \quad (3.77)$$

Also, define

$$\tilde{c}_n^- = \begin{cases} c_n, & \text{if } \tilde{c}_n < 0 \\ 0, & \text{otherwise,} \end{cases} \quad (3.78)$$

and

$$\tilde{c}_n^+ = \begin{cases} c_n, & \text{if } \tilde{c}_n \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

Hence, $c_n = \tilde{c}_n^- + \tilde{c}_n^+$. The following lemmas are needed for the proof of the main theorem.

Lemma 3.5 $\tilde{c}_n < 0$ implies $c_n < 0$.

Proof: Since the matrix $P_n$ is real and symmetric,

$$G_n = x_n^T P_{n-1} x_n \ge \lambda_{\min}(P_{n-1})\, x_n^T x_n = \frac{1}{\lambda_{\max}(P_{n-1}^{-1})}\, x_n^T x_n, \quad (3.79)$$

where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimum and maximum eigenvalues, respectively. Also, as in (2.9),

$$P_n^{-1} = \alpha_n P_{n-1}^{-1} + \beta_n x_n x_n^T.$$

This implies

$$\operatorname{tr}(P_n^{-1}) = \alpha_n \operatorname{tr}(P_{n-1}^{-1}) + \beta_n\, x_n^T x_n. \quad (3.80)$$

Since $P_n^{-1}$ is positive definite, $\operatorname{tr}(P_n^{-1}) \ge \lambda_{\max}(P_n^{-1})$; that is,

$$\frac{1}{\operatorname{tr}(P_n^{-1})} \le \frac{1}{\lambda_{\max}(P_n^{-1})}. \quad (3.81)$$

Comparing (3.80) and (3.77), we see that $d_n = \operatorname{tr}(P_n^{-1})$. Also, by (3.79) and (3.81),

$$G_n \ge x_n^T x_n / d_{n-1}. \quad (3.82)$$

Hence, by (3.75), (3.76), and (3.82), $c_n \le \tilde{c}_n$, and the lemma is proved. □

Note that a version of Lemma A.2 is valid for Sub-OBE-ABE, as stated in Lemma 3.6 below. The proof is the same except that $c_n^-$ in the original proof [27] is replaced by $\tilde{c}_n^-$.

Lemma 3.6 For the Sub-OBE-ABE algorithm, $\lim_{n\to\infty} \tilde{c}_n^- = 0$, where $\tilde{c}_n^-$ is defined in (3.78).

Theorem 3.5 All the convergence theorems of OBE-ABE are valid for Sub-OBE-ABE with the same respective conditions.

Proof: That Lemma 3.3 is valid for Sub-OBE-ABE is shown by substituting $c_n^-$ and $c_n^+$ in its proof with $\tilde{c}_n^-$ and $\tilde{c}_n^+$, respectively. Also note that $P(\tilde{c}_n < 0) \le P(c_n < 0)$, which is implied by Lemma 3.5. Since the other lemmas (including Lemmas 3.5 and 3.6) necessary for the proof of the main theorems for the Sub-OBE-ABE algorithm are not affected by the suboptimal check $\tilde{c}_n$, it follows that the proofs of convergence for OBE-ABE are also valid for Sub-OBE-ABE, provided that $c_n$ and $G_n$ in the proofs are changed to $\tilde{c}_n$ and $x_n^T x_n / d_{n-1}$, respectively. □
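The computational content of Lemma 3.5 is that the suboptimal check (3.76) requires only the scalar $d_{n-1} = \operatorname{tr}(P_{n-1}^{-1})$ and the inner product $x_n^T x_n$, an $O(m)$ computation, whereas the optimal check (3.75) requires $G_n = x_n^T P_{n-1} x_n$, an $O(m^2)$ computation. A minimal sketch follows (an illustration under assumed inputs: the weights $\alpha_n$, $\beta_n$ and the quantities $\gamma_n$, $\epsilon_n$, $\kappa_{n-1}$ are supplied by the underlying OBE recursions, which are not reproduced here):

import numpy as np

def check_suboptimal(m, gamma_n, eps_n, kappa_prev, x_n, d_prev):
    """Suboptimal check coefficient c~_n of (3.76): O(m) work."""
    return m * gamma_n - m * eps_n**2 - kappa_prev * (x_n @ x_n) / d_prev

def check_optimal(m, gamma_n, eps_n, kappa_prev, x_n, P_prev):
    """Optimal check coefficient c_n of (3.75): O(m^2) work."""
    return m * gamma_n - m * eps_n**2 - kappa_prev * (x_n @ (P_prev @ x_n))

def d_next(d_prev, alpha_n, beta_n, x_n):
    """Trace recursion (3.77); by (3.80), this maintains d_n = tr(P_n^{-1})."""
    return alpha_n * d_prev + beta_n * (x_n @ x_n)

Since $c_n \le \tilde{c}_n$ by (3.82), triggering an update only when check_suboptimal returns a negative value can never cause an update that the optimal test would have rejected.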
3.8 A Necessary Condition

All convergence theorems in the previous sections have two common sufficient conditions: PE and UCT (or UT in the iid case). PE has been shown to be a necessary condition [5, 31, 42] for the convergence of any LSE algorithm. Another necessary condition for the convergence of those algorithms is the whiteness of the noise sequence. This is a logical result, since white noise carries the least power of all WSS noises. SM algorithms are based on minimizing the feasible set (polytope or ellipsoid) according to assumed known noise bounds. Although some SM algorithms (especially OBEs) have recursion formulae similar to RLS, an LSE algorithm, the convergence of SM algorithms does not rely on the whiteness of the noise sequence. Even in the white-noise case, the estimator of an OBE algorithm (i.e., the center of the ellipsoid at each step) may be biased away from the true parameter vector due to the optimal data-weighting process. This is reasonable, since whiteness is an "average" property, not a pointwise property (e.g., UCT and UT are pointwise properties). If UCT or UT does not hold, then the ellipsoid of SM algorithms will not shrink to a point, and hence convergence is not guaranteed. The following lemma, which is extended from Lemma A.1 by considering an almost sure (a.s.) $\omega$-set in its proof, validates this assertion.

Lemma 3.7 Assume that PE (3.12) holds. If there exist an $\epsilon > 0$ and $N \in \mathbb{N}$ such that $\gamma_n - v_n^2 > \epsilon$, $\forall n > N$, a.s., then the sequence of ellipsoids of OBE algorithms does not asymptotically shrink to a point a.s.

Theorem 3.6 Assume that PE holds. Then, UCT is a necessary condition for the sequence of ellipsoids of any OBE algorithm to shrink to a point a.s.

Proof: Suppose that UCT does not hold. Then, considering the notes below (3.9), there exist an $\epsilon > 0$ and an $N \in \mathbb{N}$ such that, for all $n > N$, $P(v_n \in D_\epsilon^+ \mid \mathcal{F}_{n-1}) = 0$ and $P(v_n \in D_\epsilon^- \mid \mathcal{F}_{n-1}) = 0$ a.s. Hence, $\gamma_n - v_n^2 > \epsilon$, $\forall n > N$, a.s. Thus, by Lemma 3.7, the sequence of ellipsoids does not shrink to a point a.s. □

Chapter 4

Simulation Studies

4.1 Introduction

In this chapter, the performance of each of the new algorithms is investigated through simulations. Although the algorithms were proven convergent in the iid, mixing, ergodic, and non-stationary noise cases in the previous chapter, simulations can reveal other characteristics that are not explicitly inferred from the theoretical analysis. For example, simulations can illustrate how the speed of convergence of the OBE-ABE algorithm is affected by the ABE procedure. Simulations also reveal that the suboptimal check in Sub-OBE-ABE does not slow down the speed of convergence. This is not surprising, since the noise bounds estimated by the ABE procedure are not affected by the suboptimal check. Simulations also reveal the effect of asymmetric noise bounds on convergence performance. The version of OBE algorithm used in this dissertation is the SM-SA algorithm, due to its interpretable optimization criterion and numerical stability. Colored noise cases, in addition to iid cases, are also demonstrated.

4.2 IID Noise Cases

In this section, the performance of OBE-ABE, Sub-OBE-ABE, and conventional OBE is compared in iid noise cases with symmetric and asymmetric bounds. Comparisons of the performance of OBE-type algorithms with RLS in the iid case are found in [27] and hence are omitted here. When compared to RLS, OBE algorithms have better speed of convergence in many noise cases. Throughout this section, an AR(3) model is simulated as follows:

$$y_n = a_{1*}\, y_{n-1} + a_{2*}\, y_{n-2} + a_{3*}\, y_{n-3} + v_n, \quad (4.1)$$

where $a_{1*} = 2$, $a_{2*} = -1.48$, $a_{3*} = 0.34$ are unknown parameters to be identified. Three types of noise sequences $\{v_n\}$ are employed to generate three observable output sequences $\{y_n\}$. Specifically:

CASE 1: AR(3) model as in (4.1). $v_n \sim B(-1,1)$ is iid and non-zero-mean with binary Bernoulli distribution:

$$v_n = \begin{cases} 1, & \text{with probability } 0.7 \\ -1, & \text{with probability } 0.3. \end{cases} \quad (4.2)$$

CASE 2: AR(3) model as in (4.1). $v_n \sim U(-1,1)$ is uniformly distributed on $[-1,1]$.
The following is a case of asymmetric noise bounds:

CASE 3: AR(3) model as in (4.1). $v_n \sim B(-0.5, 1)$ is iid, non-zero-mean, and asymmetrically bounded with binary Bernoulli distribution. Specifically,

$$v_n = \begin{cases} 1, & \text{with probability } 0.7 \\ -0.5, & \text{with probability } 0.3. \end{cases} \quad (4.3)$$

The following case is constructed for the illustration of the final ellipsoid of the algorithm and the trajectory of the estimator.

CASE 4: An AR(2) model is constructed as follows:

$$y_n = -0.1\, y_{n-1} + 0.56\, y_{n-2} + v_n, \quad (4.4)$$

in which $v_n$ is as in CASE 1, (4.2).

Note that the noise bound ($\gamma_* = 1$) and the non-zero mean of $v_n$ are assumed unknown in these cases.
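For reference, the four cases can be reproduced with a few lines of code. The following sketch (added here for illustration; the generator, seed, and sample size are assumptions, not the dissertation's simulation code) synthesizes the observation sequences:

import numpy as np

rng = np.random.default_rng(0)

def simulate_ar(a, v):
    """y_n = a[0] y_{n-1} + ... + a[p-1] y_{n-p} + v_n, zero initial conditions."""
    p, y = len(a), np.zeros(len(v))
    for n in range(len(v)):
        y[n] = sum(a[i] * y[n - 1 - i] for i in range(min(p, n))) + v[n]
    return y

N = 2000
a_true = [2.0, -1.48, 0.34]                    # model (4.1)
v1 = rng.choice([1.0, -1.0], N, p=[0.7, 0.3])  # CASE 1, (4.2)
v2 = rng.uniform(-1.0, 1.0, N)                 # CASE 2
v3 = rng.choice([1.0, -0.5], N, p=[0.7, 0.3])  # CASE 3, (4.3)
y1, y2, y3 = (simulate_ar(a_true, v) for v in (v1, v2, v3))
y4 = simulate_ar([-0.1, 0.56], v1)             # CASE 4, model (4.4)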
4.2.1 OBE-ABE vs. OBE

In this subsection, OBE-ABE and OBE are compared with regard to speed of convergence and computational efficiency using CASES 1-3. These results support the validity of the theorems in the previous chapter.

For CASE 1, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.1-4.3. As seen in the figures, neither estimator is affected by the non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). This supports the assertions of the related theorems in Chap. 3. However, with the help of automatic bound estimation, OBE-ABE converges faster than OBE (as seen in Fig. 4.1). This result is not unexpected in light of Lemma 3.7 and related theorems in Chap. 3.

For CASE 2, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0.05$, and $M = 50$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.4-4.6. The difference in convergence performance between OBE-ABE and OBE is similar to that of CASE 1. Further, a comparison between CASE 1 and CASE 2 shows that the speed of convergence of both algorithms is proportional to the $\epsilon$-tail probability, $\delta$, defined in UT (3.10) or UCT (3.9).

For CASE 3, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 70$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.7-4.9. As seen in the figures, the convergence of (overestimated) OBE is affected by the asymmetry of the noise bounds, while OBE-ABE converges consistently as the estimated bounds $\gamma_n$ automatically converge to the true bound ($= 1$).

For CASE 4, the final ellipsoid of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) and the trajectory of the estimator after 1000 steps are shown in Fig. 4.10. The trajectory of the estimator starts from the 17th step. The ellipsoid does not degenerate to a line segment. This conforms to the positiveness of $P_n$ and $\kappa_n$ under the PE and stability assumptions.

These simulations show that the number of updates of the estimator of the OBE-ABE algorithm is smaller than that of the (overestimated-bound) OBE algorithm when both algorithms terminate at the same ellipsoidal volume. This means that OBE-ABE shrinks the volumes of the ellipsoids more drastically than OBE each time an update occurs. The improved computational efficiency of OBE-ABE is not surprising, since the ABE procedure, which has trivial computational cost, as seen in CASES 1 through 3, blindly obtains more correct information (the noise bound) than OBE has.

Figure 4.1: Estimators of $a_{1*} = 2$. CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.2: CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.3: CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.4: Estimators of $a_{1*} = 2$. CASE 2: $v_n \sim U(-1,1)$ for model (4.1).
Figure 4.5: CASE 2: $v_n \sim U(-1,1)$ for model (4.1).

Figure 4.6: CASE 2: $v_n \sim U(-1,1)$ for model (4.1).

Figure 4.7: Estimators of $a_{1*} = 2$. CASE 3: $v_n \sim B(-0.5,1)$, (4.3), for model (4.1).
Figure 4.8: CASE 3: $v_n \sim B(-0.5,1)$, (4.3), for model (4.1).

Figure 4.9: CASE 3: $v_n \sim B(-0.5,1)$, (4.3), for model (4.1).

Figure 4.10: CASE 4: The final ellipsoid and the trajectory of the estimator. $v_n \sim B(-1,1)$, (4.2), for model (4.4).

4.2.2 Sub-OBE-ABE vs. OBE-ABE

In this subsection, Sub-OBE-ABE and OBE-ABE are compared with regard to speed of convergence and computational efficiency using CASES 1 and 2. These convergence results support the validity of the theorems in the previous chapter.

For CASE 1, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) and Sub-OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) are shown in Figs. 4.11 and 4.12. As seen in the figures, neither estimator is affected by the non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). In this case, Sub-OBE-ABE, which selects 18% of the data, has convergence performance similar to OBE-ABE, which selects 50% of the data. This reveals that the excellent performance of the ABE procedure is not affected by the suboptimal check.

For CASE 2, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0.05$, and $M = 50$) and Sub-OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0.05$, and $M = 30$) are shown in Figs. 4.13 and 4.14. As seen in the figures, neither estimator is affected by the non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). Sub-OBE-ABE, which selects very few (1.5%) of the data in this case, has convergence performance similar to OBE-ABE, which selects 5% of the data. However, in both CASES 1 and 2, the speed of convergence of Sub-OBE-ABE in the beginning (e.g., the first 150 steps) is slower than that of OBE-ABE. This is because, in that stage, Sub-OBE-ABE has big ellipsoids and is busy updating the estimator using the suboptimal check, without updating the estimated noise bound using the ABE procedure.

4.3 Colored Noise Cases

In this section, the performance of OBE-ABE, Sub-OBE-ABE, and conventional OBE is compared in colored noise cases with symmetric and asymmetric bounds.

Figure 4.11: Estimators of $a_{1*} = 2$. CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.12: CASE 1: $v_n \sim B(-1,1)$ is non-zero-mean for model (4.1).

Figure 4.13: Estimators of $a_{1*} = 2$. CASE 2: $v_n \sim U(-1,1)$ for model (4.1).

Figure 4.14: CASE 2: $v_n \sim U(-1,1)$ for model (4.1).

Four colored noise cases are constructed as follows:

CASE 5: AR(3) model as in (4.1). $v_n$ is a colored, non-zero-mean noise sequence related to a colored sequence $\{w_n\}$ as follows:

$$v_n = \begin{cases} 1, & \text{if } w_n > -1 \\ -1, & \text{otherwise,} \end{cases} \quad (4.5)$$

in which the colored sequence $\{w_n\}$ is generated from a uniformly-distributed iid white noise sequence $z_n \sim U(-1,1)$ as follows: $w_n = -0.8\, w_{n-1} + z_n$.
CASE 6: $v_n$ is the same as in CASE 5, (4.5), while the order of the AR model increases to 12. Specifically, a stable AR(12) model is employed in this case:

$$y_n = a_{1*}\, y_{n-1} + a_{2*}\, y_{n-2} + \cdots + a_{12*}\, y_{n-12} + v_n, \quad (4.6)$$

where $a_{1*} = -0.1$, $a_{2*} = 0.9175$, $a_{3*} = -0.191$, $a_{4*} = -0.2253$, $a_{5*} = 0.2601$, $a_{6*} = 0.0046$, $a_{7*} = -0.0367$, $a_{8*} = -0.0209$, $a_{9*} = -0.0082$, $a_{10*} = 0.0095$, $a_{11*} = -0.0052$, and $a_{12*} = -0.0041$ are unknown parameters to be identified.

CASE 7: AR(3) model as in (4.1). $v_n$ is a colored, non-zero-mean noise sequence with asymmetric bounds generated from the colored sequence $\{w_n\}$, i.e.,

$$v_n = \begin{cases} 1, & \text{if } w_n > -1 \\ -0.5, & \text{otherwise.} \end{cases} \quad (4.7)$$

The following case is constructed for the illustration of the final ellipsoid of the algorithm and the trajectory of the estimator in the colored noise case.

CASE 8: AR(2) model as in CASE 4, (4.4). $v_n$ is as in CASE 5, (4.5).
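The colored noise of CASES 5-8 is equally simple to synthesize. A sketch follows (an added illustration; seed and length are assumptions); the models (4.1), (4.4), and (4.6) can then be driven by these sequences exactly as in the iid cases of Section 4.2.

import numpy as np

rng = np.random.default_rng(1)

N = 2000
z = rng.uniform(-1.0, 1.0, N)        # iid white input z_n ~ U(-1, 1)
w = np.zeros(N)
w[0] = z[0]
for n in range(1, N):                # colored sequence of CASE 5
    w[n] = -0.8 * w[n - 1] + z[n]

v5 = np.where(w > -1.0, 1.0, -1.0)   # CASE 5 (and CASES 6, 8) noise, (4.5)
v7 = np.where(w > -1.0, 1.0, -0.5)   # CASE 7 noise, (4.7)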
4.3.1 OBE-ABE vs. OBE

In this subsection, OBE-ABE and OBE are compared with regard to speed of convergence and computational efficiency using CASES 5-7. These results support the validity of the theorems in the previous chapter.

For CASE 5, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 40$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.15-4.17. As seen in the figures, neither estimator is affected by the color of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). This supports the assertions of the related theorems in Chap. 3.

For CASE 6, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 40$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.18-4.20. As seen in the figures, the slow speed of convergence of OBE, due to the overestimated noise bound, results in poor convergence, while OBE-ABE converges very well to the true parameter ($a_{1*} = -0.1$). This shows that OBE-ABE's excellent convergence performance is not affected by model order, while that of (overestimated-bound) OBE is.

For CASE 7, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 250$) and OBE ($\gamma_0 = 1.5$) are shown in Figs. 4.21-4.23. As seen in the figures, the convergence of (overestimated) OBE is affected by the asymmetry of the noise bounds, while OBE-ABE converges consistently as the estimated bounds $\gamma_n$ automatically converge to the true bound.

For CASE 8, the final ellipsoid of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 40$) and the trajectory of the estimator after 1000 steps are shown in Fig. 4.24. The trajectory of the estimator starts from the 17th step. The ellipsoid does not degenerate to a line segment. This conforms to the positiveness of $P_n$ and $\kappa_n$ under the PE and stability assumptions.

Figure 4.15: Estimators of $a_{1*} = 2$. CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

Figure 4.16: CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

Figure 4.17: CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

Figure 4.18: Estimators of $a_{1*} = -0.1$. CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.19: CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.20: CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.21: Estimators of $a_{1*} = 2$. CASE 7: colored noise $v_n \sim B(-0.5,1)$, (4.7), for model (4.1).

Figure 4.22: CASE 7: colored noise $v_n \sim B(-0.5,1)$, (4.7), for model (4.1).

Figure 4.23: CASE 7: colored noise $v_n \sim B(-0.5,1)$, (4.7), for model (4.1).

Figure 4.24: CASE 8: The final ellipsoid and the trajectory of the estimator. $v_n \sim B(-1,1)$, (4.5), for model (4.4).

4.3.2 Sub-OBE-ABE vs. OBE-ABE

In this subsection, Sub-OBE-ABE and OBE-ABE are compared with regard to speed of convergence and computational efficiency using CASES 5 and 6. These results support the validity of the theorems in the previous chapter.

For CASE 5, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) and Sub-OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 20$) are shown in Figs. 4.25 and 4.26. As seen in the figures, neither estimator is affected by the color and non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = 2$). This supports the assertions of the related theorems in Chap. 3. In this case, Sub-OBE-ABE, which selects 15% of the data, has convergence performance similar to OBE-ABE, which selects 45% of the data. This reveals that the excellent performance of the ABE procedure is not affected by the suboptimal check.

For CASE 6, the simulation results of OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 30$) and Sub-OBE-ABE ($\gamma_0 = 1.5$, $\epsilon = 0$, and $M = 30$) are shown in Figs. 4.27 and 4.28. As seen in the figures, neither estimator is affected by the non-zero mean of $v_n$, and both algorithms converge to the true parameter ($a_{1*} = -0.1$). However, Sub-OBE-ABE, which selects very few (1.5%) of the data in this higher-order case, has a slower speed of convergence over 2000 steps than OBE-ABE, which selects 5% of the data.

Figure 4.25: Estimators of $a_{1*} = 2$. CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

Figure 4.26: CASE 5: $v_n$ is colored as in (4.5) for model (4.1).

4.4 Tracking Performance of Adaptive Sub-OBE-ABE

In this section, the tracking performance of the Adaptive Sub-OBE-ABE algorithm is illustrated through the study of two examples of time-varying systems: one gradually changing, the other abruptly changing. Specifically, an AR(2) model with time-varying parameters is
constructed as follows:

$$y_n = a_{1t}\, y_{n-1} + a_{2t}\, y_{n-2} + v_n, \quad (4.8)$$

in which $a_{2t} = -0.68$ and $a_{1t}$ varies between $-1.6$ and $1.6$ (the same as in [11]). This is equivalent to varying the system's conjugate poles from $0.8 \pm j0.2$ to $-0.8 \pm j0.2$ and back. The variations of $a_{1t}$ (abrupt or gradual) are shown as dashed lines in Figs. 4.31-4.39. The noise sequence $\{v_n\}$ is iid and uniformly distributed on $[-1,1]$.

Although "non-adaptive" OBE algorithms are well known to have good tracking capability in slowly time-varying systems, they eventually lose tracking capability. This result is shown in Figs. 4.29 and 4.30 ($\gamma_n = 1.5$). On the other hand, the Adaptive Sub-OBE-ABE algorithm ($\gamma_0 = 1.5$, $\epsilon = 0.02$, $M = 70$), which selects 4.5% and 3% of the data ($p$ in each figure), respectively, keeps track of the varying parameters very well, as shown in Figs. 4.31 and 4.32.

To further investigate its tracking capability, faster-varying systems are simulated as in Figs. 4.33-4.36. As shown in the figures, the Adaptive Sub-OBE-ABE algorithm ($\gamma_0 = 1.5$, $\epsilon = 0.02$, $M = 70$), which selects no more than 7.5% of the data, keeps track of these faster time-varying parameters equally well. Comparing these diagrams with those in [11, 37, 38] shows that the Adaptive Sub-OBE-ABE algorithm has better tracking capability.

The tracking capability of the Sub-OBE-ABE (or any OBE) algorithm is proportional to the tail probability of the noise. This is a natural result, since the larger the tail probability, the more frequently Sub-OBE-ABE updates its estimator. To demonstrate this, a much faster time-varying system with the same uniformly distributed noise $v_n$ is simulated, as shown in Fig. 4.37. As shown in the figure, the tracking performance is not as good as in the previous cases. However, if $v_n$ is instead binary Bernoulli distributed with 0.5 tail probability, the tracking performance improves, as shown in Fig. 4.38. An even faster time-varying system with the same binary Bernoulli distributed noise is shown in Fig. 4.39. As shown in the figure, the tracking performance of the Adaptive Sub-OBE-ABE algorithm is still excellent in this case. In these cases, Sub-OBE-ABE selects, on average, 32% of the data.
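The time-varying experiments are straightforward to set up. The sketch below (an added illustration; the triangular sweep is an assumed stand-in for the gradual parameter profiles plotted as dashed lines in the figures, which are not reproduced exactly here) generates data from system (4.8):

import numpy as np

rng = np.random.default_rng(2)

def tv_ar2(N, period):
    """System (4.8): a_2t = -0.68 fixed; a_1t swept over [-1.6, 1.6],
    moving the conjugate poles between 0.8 +/- j0.2 and -0.8 -/+ j0.2."""
    phase = (np.arange(N) % period) / period
    a1 = 1.6 * (4.0 * np.abs(phase - 0.5) - 1.0)   # triangle wave in [-1.6, 1.6]
    v = rng.uniform(-1.0, 1.0, N)                  # iid U(-1,1) noise
    y = np.zeros(N)
    for n in range(2, N):
        y[n] = a1[n] * y[n - 1] - 0.68 * y[n - 2] + v[n]
    return y, a1

y, a1 = tv_ar2(10_000, period=5_000)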
Figure 4.27: Estimators of $a_{1*} = -0.1$. CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.28: CASE 6: $v_n$ is colored as in (4.5) for model (4.6).

Figure 4.29: Non-adaptive OBE algorithm on the time-varying system (4.8). $\gamma_n = 1.5$.

Figure 4.30: Non-adaptive OBE (SM-SA) algorithm on the time-varying system (4.8). $\gamma_n = 1.5$.

Figure 4.31: Adaptive Sub-OBE-ABE algorithm on the time-varying system (4.8). $p = 4.5\%$.

Figure 4.32: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 3\%$.

Figure 4.33: Adaptive Sub-OBE-ABE algorithm on the time-varying system (4.8). $p = 5.5\%$.

Figure 4.34: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 3.5\%$.

Figure 4.35: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 7.5\%$.

Figure 4.36: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 5.5\%$.

Figure 4.37: Adaptive Sub-OBE-ABE on the time-varying system (4.8). $p = 10\%$.

Figure 4.38: Adaptive Sub-OBE-ABE on the time-varying system (4.8) ($v_n \sim B(-1,1)$). $p = 31\%$.

Figure 4.39: Adaptive Sub-OBE-ABE on the time-varying system (4.8) ($v_n \sim B(-1,1)$). $p = 33\%$.

Chapter 5

Application to Linear Prediction Analysis of Speech

5.1 Introduction

The OBE-ABE algorithm and its variants presented in the previous chapters are proven in Chap. 3 to be a.s. and p. convergent under various conditions, including colored and non-stationary noise cases. These algorithms have excellent performance in simulations, which are designed to satisfy the sufficient conditions for convergence. However, in real-world applications, where model structures or noise characteristics are not as ideal as in simulations, the performance of most algorithms suffers. In order to further investigate the performance of the new OBE-ABE algorithms in real-world applications, they are applied to the linear prediction (LP) analysis of speech [12, 40] in this chapter, and to the blind-deconvolution problem in the next chapter.

5.2 LP model of speech

Speech is a very dynamic signal. Even a short frame (e.g., 25.6 msec) of speech can be regarded as quasi-stationary at best. However, for analytical tractability, a short
85 5.3 LP analysis using OBE-ABE The first attempt to applying OBE algorithm to LP analysis of speech is performed by Deller and Luk [15]. This attempt reveals the fact that the convergence performance of OBE is seriously affected by the precision of the estimated noise bounds which are not available in most real-world applications. With the ABE procedure, the OBE-ABE algorithm makes possible the application of OBE to real-world applications including LP analysis of speech. Due to OBE- ABE’s fast speed of convergence, robustness to measurement noise (see Chap. 6) and non-stationarity, and blind-deconvolution capability in colored noise, the results shown below are comparable to those of RLS or the autocorrelation method. Theoretically, OBE-ABE can be initialized with any overestimated bound 70, small 6, and large enough M. Since each speech frame in the LP model is typically only 256 points (for 10 KHz sampling rate) and quasi-stationary, the initialization must be deliberately chosen to assure convergence. To this purpose, several speech signals which are drawn from TIMIT (Texas Instruments and Massachusetts Insti- tute of Technology) speech database [18] are employed in determining the optimal initialization. Also, each frame of speech is normalized to have maximum unity mag- nitude. However, no windowing (e.g., Hamming window) is needed. An acceptable initialization is 70 = 1.5, e = 0.25, and M = 12, for both voiced and unvoiced sounds. This makes the application of OBE-ABE straightforward, without requiring a priori knowledge of the voice/unvoiced status of the frame. This initialization produces acceptable results using only, on average, 12% of the data for voiced sounds and 8% for the unvoiced. By experimentation, an excellent initialization is found (70 = l, e = 0.01, and M = 20) for all the voiced and unvoiced. This initialization results in excellent spectral envelopes of LP coefficients which are comparable to those of the autocorrelation method and RLS. The data selected in this case are, on average, 25% 86 for voiced sounds and 60% for the unvoiced. As typical examples of LP analysis, the first set of spectra of LP coefficients for vowels /I/ (voiced phoneme, from utterance “six”), / E / (from “seven”), and /u/ (from “two”) are shown in Figs. 5.40 - 5.42, and for unvoiced plosive /t/ (from “two”), /f/ (from “four”), and /s/ (from “six”) are shown in Figs. 5.43 — 5.45. The upper portions of the figures show the spectra of the speech frames themselves based on a 512-point F FT. The lower portions show the spectra of LP coefficients (m = 14) obtained by autocorrelation method (solid line), and OBE—ABE (dashed line). As seen in the figures, OBE-ABE produces similar spectra to those of conventional batch method. The second set of spectra of LPG coefficients for vowels / I / (voiced phoneme, from utterance “six”), /E/ (from “seven”), and /u/ (from “two”) are shown in Figs. 5.46 — 5.48, and for unvoiced plosive /t/ (from “two”), /f/ (from “four”), and /s/ (from “six”) are shown in Figs. 5.49 - 5.51. Similar to the first set, the upper portions of the figures show the spectra of the speech frames themselves based on a 512-point F FT. The lower portions show the spectra of LP coefficients (m = 14) obtained by RLS (solid line), and OBE—ABE (dashed line). As seen in the figures, OBE—ABE produces similar spectra to those of RLS while selects, on average, no more than half as many data as RLS. 
To further investigate OBE-ABE’s capability, a simulated speech frame (256 points) of phoneme /u/ is generated by the LP model (5.1) using a pulse train (pulses separated by 80 points, amplitude 0.5, and first pulse located at the 10th point) as the excitation {v,,}. Both OBE-ABE and autocorrelation method are applied to this case for the identification of the LP coefficients. The results are shown in Fig. 5.52. As seen in the figure, OBEABE produces better result since its LP spectrum is more resemblant to that of the simulated speech itself. This conforms to the fact that OBE-ABE converges consistently in colored noise case while other algorithms based 87 1 Spearmint/II 10 I I I I U I I 1 Y 10 '310'2 1 J. L A L A 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 o 1 l l l I 4 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Mn W normalized b PI Figure 5.40: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). I Wot/El l J 1 l l l G l l l O 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon MW nonnallud b Pl Figure 5.41: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE—ABE (dashed line), and autocorrelation (solid line). 88 ‘ Mind/U 0 0.1 0.2 0.3 0.4 0.5 0.0 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.0 0.7 0.8 0.0 1 Man W m :5 PI Figure 5.42: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). o Sputum 0M! 10 fir I I f I I T 1 1o" -2 u ‘° . 1. 104i 1 10" 104 l I 1 1 J_ 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 L l l 1 A L 1 l 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon Mm norm-Izod to PI "o 01 Figure 5.43: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). 89 ‘ Spuwunouv 10“ ' 0 0.1 02 0.3 0.4 05 0.0 0.7 on 0.9 1 G I L 0 on 02 03 0A 05 05 0] 03 09 1 allmh'mnmwnaWUhndel Figure 5.44: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). a Spoctunothl ‘0 1' fl 1 r I r 10° 010*» 10"} 104 1 LL 9 l L 4 l l l 0 0.1 02 03 0.4 as 0.0 0.7 as 0.9 1 15 . s 0 OJ 02 03 0A 05 on 0] Hulmfluwunymmwflmfltflfl Figure 5.45: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and autocorrelation (solid line). 90 2 Spoctunotlll ‘0 I I I I I I T I I 10 010* 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon Iroquoncy nonnalzed 10 PI P l L Figure 5.46: Spectra of voiced /I/ phoneme. Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line). 3 Wot/El ‘0 I fi I I I I T 10 10 ' 10' 15 I I I f r 10 _ \ J 8 I ,. 5 - — ’l G J 1 1 1 1 1 1 1 J ‘ 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon lroqmncy normalized 10 PI Figure 5.47: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line). 91 Spammdhv 102 I I I I 0 10“ . 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Radon M norm-Izod b Pl Figure 5.48: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line). 0 Spectrumoflv 10'“ 0 0.1 0.2 0.3 0.4 0.5 0.8 0.7 0.8 0.9 1 WW nomullzodb PI Figure 5.49: Upper graph: Direct 512-point FFT spectrum. Lower graph: Spectra based on OBE-ABE (dashed line), and RLS (solid line). 92 ‘ Spoummot/ll 10"; . 
Figure 5.48: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and RLS (solid line).

Figure 5.49: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and RLS (solid line).

Figure 5.50: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and RLS (solid line).

Figure 5.51: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and RLS (solid line).

5.4 Sub-OBE-ABE and computational complexity

The Sub-OBE-ABE algorithm was shown in Chap. 4 to have a speed of convergence similar to OBE-ABE while having more efficient O(pm^2) computation. However, within short records of 256 steps, Sub-OBE-ABE does not converge as well as OBE-ABE with the same initialization, due to the suboptimal check for innovation. This phenomenon was explained in Chap. 4 in connection with Figs. 4.12 and 4.14.

Through experiments, two sets of initializations of the Sub-OBE-ABE algorithm were found, for voiced and unvoiced sounds respectively. For voiced sounds, Sub-OBE-ABE initialized with $\gamma_0 = 0.25$, $\epsilon = 0.0001$, and $M = 12$ gives excellent results, as shown in Figs. 5.53-5.55; seven percent of the data is selected in the voiced case on average. For unvoiced sounds, the initialization is $\gamma_0 = 0.5$, $\epsilon = 0.0001$, and $M = 8$; the results are shown in Figs. 5.56-5.58, and thirteen percent of the data is selected in the unvoiced case on average. As seen in the figures, Sub-OBE-ABE achieves spectra similar to those of RLS or the autocorrelation method while selecting, on average, only 10% of the data. Since the number of LP coefficients, m, is on the order of 10 [12, 40] for most speech applications, Sub-OBE-ABE is virtually an O(m) recursive algorithm in this application.

Figure 5.52: Spectra of the simulated voiced /u/ phoneme. Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and the autocorrelation method (solid line).

Figure 5.53: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).

Figure 5.54: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).

Figure 5.55: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).

Figure 5.56: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).

Figure 5.57: Upper graph: direct 512-point FFT spectrum. Lower graph: spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).
[Figure 5.52: Spectra of the simulated voiced /u/ phoneme. Upper graph: the direct 512-point FFT spectrum. Lower graph: spectra based on OBE-ABE (dashed line) and the autocorrelation method (solid line). Figure not reproduced.]

[Figures 5.53-5.58 not reproduced. Each shows: upper graph, the direct 512-point FFT spectrum; lower graph, spectra based on Sub-OBE-ABE (dashed line) and autocorrelation (solid line).]

Chapter 6

Application to Blind Deconvolution

6.1 Introduction

The blind-deconvolution problem [23] has attracted intensive research in recent years in the fields of data communications, adaptive filtering, and signal processing. A typical block diagram for the blind-deconvolution problem is shown in Fig. 6.1. The input {v_n} and the noise {w_n} in the figure are assumed unobservable. In data communications, the unknown input data are recovered using the output data received from a noisy and distorted channel (e.g., satellite, cellular, or optical fiber). Another example of a blind-deconvolution problem is the LP analysis of a speech signal corrupted by unknown measurement or environmental noise.

Several well-known estimation methods have been applied to the blind-deconvolution problem. Bayesian estimation [23] (a batch method) and LMS (recursive) are two popular ones. However, both require that the input {v_n} and the noise {w_n} be white for convergence, since they are based on LSE optimization. This seriously restricts their scope of application, since the input data {v_n} in most engineering applications (e.g., the two examples above) are not always white.

[Figure 6.1: Model of the received signal in the blind-deconvolution problem (the input {v_n} driving the unknown channel H(z), with noise {w_n}, observed as {y_n}). Figure not reproduced.]

The new OBE-ABE algorithms presented in this dissertation are good candidates to overcome this difficulty, since they converge in colored-noise cases, as demonstrated in the previous chapters. Although the blind-deconvolution model shown in Fig. 6.1 is not an ARX model, the OBE-ABE algorithms show excellent results in this application owing to their robustness to the measurement noise.

The block diagram in Fig. 6.2 is one of the application structures for blind deconvolution and is the one employed in this chapter. The parameters of the inverse filter H⁻¹(z) are estimated by OBE-ABE using the output data {y_n}. The detector in the figure has known transfer functions for the recovery of the input data from {v̂_n}. The unknown H(z) is simulated as an IIR (infinite impulse response) and an FIR (finite impulse response) filter, respectively. The unknown input {v_n} is simulated as an iid or a colored digital data stream. The noise {w_n} is independent of {v_n} and is iid with a uniform distribution specified in the following sections.

6.2 IID input and IIR filter

In this section, the IIR filter in Fig. 6.2 is assumed to be

$$H(z) = \frac{1}{a_{1*}z^{-1} + a_{2*}z^{-2} + a_{3*}z^{-3}} \qquad (6.1)$$

where the unknown parameters are a1* = 2, a2* = -1.68, a3* = 0.34; the input {v_n} is iid, taking the discrete values {-1, -0.5, 0, 0.5, 1} with equal probability; and the noise {w_n} is iid and uniformly distributed on [-0.05, 0.05]. In this case, the signal-to-noise ratio (S/N), ignoring the gain of H(z), is 42.5 dB. The order of the inverse filter Ĥ⁻¹(z) in this case is three.
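The data-generation side of this experiment is fully specified above and can be sketched directly; only the seed and sample count below are arbitrary choices, and the estimation step itself (OBE-ABE, Chap. 2) is not reproduced here.

```python
# Simulation of the iid-input / IIR-channel setup of Section 6.2.
# Channel: H(z) = 1/(a1 z^-1 + a2 z^-2 + a3 z^-3); the lfilter call below
# realizes 1/(a1 + a2 z^-1 + a3 z^-2), i.e., H(z) up to a one-sample
# delay, which is immaterial for blind identification from {y_n}.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)          # seed is an arbitrary choice
N = 10_000
a_true = np.array([2.0, -1.68, 0.34])   # a1*, a2*, a3* from the text

v = rng.choice([-1.0, -0.5, 0.0, 0.5, 1.0], size=N)  # iid 5-level input
s = lfilter([1.0], a_true, v)                        # channel output
w = rng.uniform(-0.05, 0.05, size=N)                 # measurement noise
y = s + w                                            # observed data {y_n}

# OBE-ABE is then run on {y_n} alone to estimate the inverse-filter
# parameters (Chap. 2); the channel poles here (0.5 and 0.34) are inside
# the unit circle, so the simulated system is stable.
```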
The application results of the OBE-ABE and conventional OBE algorithms in this case are shown in Figs. 6.3 and 6.4, respectively. As seen in Fig. 6.4, the ellipsoidal volumes of the OBE algorithm (γn = 1) quickly become negative because of the added noise {w_n}, while the volume of the OBE-ABE algorithm (γ0 = 10, ε = 0.5, and M = 80) converges to zero. Figure 6.3 shows the estimates of the first parameter for both algorithms. As seen in the figure, the OBE estimate diverges owing to the negativity of the ellipsoidal volumes, while the OBE-ABE estimate converges to the theoretical values ([a1, a2, a3] = [1.9930, -1.4676, 0.3332]) calculated by the Wiener optimization method [23]. Simulations show that the OBE-ABE algorithm is robust to the noise {w_n} while OBE is not.

Simulations also show that, using OBE-ABE in this case, the output of the detector, {ṽ_n}, recovers the unknown input data {v_n} with a 99% success rate; a sketch of such a detector is given after the figure captions below. Figure 6.5 shows the first 100 samples of v_n, v̂_n, and ṽ_n. The 1% of erroneous data can easily be recovered using standard error-correction techniques from data communications.

Simulations also show that the performance of the OBE-ABE algorithm in the noise-corrupted case is not affected by the order of H(z). However, when the S/N decreases (e.g., as the bound of the noise {w_n} increases), the success rate decreases because of the bias of the estimator, which can be calculated theoretically by the Wiener optimization method. This can be seen in Figs. 6.6 and 6.7, which show simulation results for the model above except that now w_n ~ U(-0.1, 0.1) (S/N = 36.5 dB). As seen in the figures, the OBE-ABE estimator is biased (the theoretical values of the estimator in this case are [a1, a2, a3] = [1.9724, -1.4315, 0.3133]), and the success rate is 91%.

6.3 IID input and FIR filter

In this section, the FIR filter in Fig. 6.2 is assumed to be

$$H(z) = a_{1*}z^{-1} + a_{2*}z^{-2} + a_{3*}z^{-3}$$

where the unknown parameters are a1* = 0.7, a2* = 0.5, a3* = -0.336; the input {v_n} is iid, taking the discrete values {-1, -0.5, 0, 0.5, 1} with equal probability; and the noise {w_n} is iid and uniformly distributed on [-0.05, 0.05] (S/N = 42.5 dB) and on [-0.1, 0.1] (S/N = 36.5 dB), respectively. In this FIR case, the performance of the OBE-ABE algorithm remains excellent if the order of the inverse filter H⁻¹(z) is increased from 3 to 10. The results are shown in Figs. 6.8 and 6.9. As seen in the figures, OBE-ABE shows excellent performance, with a 99% success rate in the case of S/N = 42.5 dB (Fig. 6.8) and a 93% success rate in the case of S/N = 36.5 dB (Fig. 6.9).

6.4 Colored Input and IIR Filter

In this section, the IIR filter H(z) in Fig. 6.2 is the same as in (6.1), the input {v_n} is colored as in (4.5), and the noise {w_n} is iid and uniformly distributed on [-0.1, 0.1] (S/N = 36.5 dB) or [-0.5, 0.5] (S/N = 22.5 dB). The application results of OBE-ABE in these cases are shown in Figs. 6.10 and 6.11.

[Figure 6.2: Block diagram of the blind-deconvolution problem. Figure not reproduced; it shows the chain v_n → H(z) → (+ w_n) → y_n → H⁻¹(z) → v̂_n → Detector → ṽ_n.]

[Figure 6.3: Estimates of a1* by OBE (γn = 1) and OBE-ABE (γ0 = 10) for Fig. 6.2 (iid-IIR, S/N = 42.5 dB), plotted over 10000 time steps. Figure not reproduced.]

[Figure 6.4: Volumes of the ellipsoids of OBE (γn = 1) and OBE-ABE (γ0 = 10) for Fig. 6.2 (iid-IIR, S/N = 42.5 dB). Figure not reproduced.]
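The success rates quoted in Sections 6.2 and 6.3 can be measured with a simple symbol-by-symbol detector. The dissertation describes its detector only as having known transfer functions, so the nearest-symbol quantizer below is an assumed stand-in illustrating how a success rate over the 5-level alphabet can be computed.

```python
# Hypothetical nearest-symbol detector for the 5-level input alphabet,
# and a success-rate measurement of the kind quoted in Sections 6.2-6.3.
# The dissertation's actual detector is not specified in this text.
import numpy as np

SYMBOLS = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

def detect(v_hat):
    """Map each inverse-filter output sample to the nearest symbol."""
    idx = np.argmin(np.abs(v_hat[:, None] - SYMBOLS[None, :]), axis=1)
    return SYMBOLS[idx]

def success_rate(v_true, v_tilde):
    """Fraction of detected symbols matching the transmitted ones."""
    return np.mean(v_true == v_tilde)

# Example usage: v_tilde = detect(v_hat); rate = success_rate(v, v_tilde)
```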
Appendix A

Lemma A.1. If there exist $\varepsilon > 0$ and $N \in \mathbb{N}$ such that $\gamma_n - v_n^2 > \varepsilon$, $\forall n > N$, then the ellipsoids of OBE algorithms do not asymptotically shrink to a point.

Lemma A.2 (Proof: [27]). Let $c_n^-$ and $c_n^+$ be the negative and positive parts of $c_n$, respectively, where $c_n$ is defined in (2.10); that is, $c_n^- = c_n$ if $c_n < 0$ and $0$ otherwise, and $c_n^+ = c_n$ if $c_n \geq 0$ and $0$ otherwise. If PE holds, then $\lim_{n \to \infty} c_n^- = 0$.

Lemma A.3 [46, p. 120], [16, p. 35]. A discrete random variable has an almost periodic characteristic function, and the converse is also true.

Lemma A.4 [17]. If $\{X_k : k \in K\}$ is stationary and $g : \mathbb{R}^K \to \mathbb{R}$ is measurable, then $\{Y_k = g(X_k, X_{k-1}, \dots) : k \in K\}$ is also stationary.

Proof: Let $x \in \mathbb{R}^K$, $B \in \mathcal{B}^K$, and $g_k = g(x_k, x_{k-1}, \dots)$. Also, let $A = \{x : (g_0(x), g_{-1}(x), \dots) \in B\}$. Then, stationarity of $\{X_k\}$ implies
$$P(\omega : (Y_0, Y_{-1}, \dots) \in B) = P(\omega : (X_0, X_{-1}, \dots) \in A) = P(\omega : (X_k, X_{k-1}, \dots) \in A) = P(\omega : (Y_k, Y_{k-1}, \dots) \in B).$$
Hence, by Definition 3.8, $\{Y_k\}$ is stationary. $\Box$

Lemma A.5 [17]. Let $X_k \stackrel{\text{def}}{=} X \circ T^{k-1}$, $\forall k \in K$, where $X$ is a r.v. on $(\Omega, \mathcal{F}, P)$. If $T$ is an m.p.t., then $\{X_k : k \in K\}$ is a stationary sequence.

Proof: The lemma is proved by observing that, for $B \in \mathcal{B}^K$ and $A \stackrel{\text{def}}{=} \{\omega : (X_1, X_2, \dots) \in B\}$,
$$P((X_k, X_{k+1}, \dots) \in B) = P(T^{k-1}(\omega) \in A) = P(\omega \in A) = P((X_1, X_2, \dots) \in B). \qquad \Box$$

Lemma A.6 [17]. If an m.p.t. $T$ is mixing, then $T$ is ergodic.

Proof: Let $A \in \mathcal{I}$. Then $T^{-1}(A) = A, \dots, T^{-k}(A) = A$, $\forall k$. Hence, by Definition 3.13,
$$P(A) = P(A \cap A) = P(A \cap T^{-n}(A)) \to P(A) \cdot P(A) \quad \text{as } n \to \infty.$$
So $P(A) = [P(A)]^2$, and thus $P(A) \in \{0, 1\}$. Hence $\mathcal{I}$ is trivial, and therefore $T$ is ergodic. $\Box$

Lemma A.7 [17]. If $\{X_k : k \in K\}$ is an ergodic stationary sequence, and $g : \mathbb{R}^K \to \mathbb{R}$ is measurable, then $\{Y_k = g(X_k, X_{k-1}, \dots) : k \in K\}$ is also stationary and ergodic.

Proof: Let $X_0, X_{-1}, \dots$ be defined on the sequence space $\Omega = \mathbb{R}^K$ with $X_k(\omega) = \omega_k$, where $\omega = (\dots, \omega_{-1}, \omega_0, \omega_1, \dots) \in \Omega$. Since $\{Y_k\}$ is stationary by Lemma A.4, let $B \in \mathcal{B}^K$ be such that $\{\omega : (Y_0, Y_{-1}, \dots) \in B\} = \{\omega : (Y_k, Y_{k-1}, \dots) \in B\}$. Then, by Definition 3.10, $A \stackrel{\text{def}}{=} \{\omega : (Y_0, Y_{-1}, \dots) \in B\}$ is shift invariant. So, by the ergodicity of $\{X_k\}$, $P(A) = P(\{\omega : (X_0, X_{-1}, \dots) \in A\}) \in \{0, 1\}$. Therefore, $\{Y_k\}$ is ergodic by Definitions 3.11 and 3.12. $\Box$

Note that $\{X_k\}$ and $\{Y_k\}$ in the above lemma can be one-sided infinite sequences. Also, the mapping $g$, which is independent of the index $k$, can be any finite-length moving function.

Theorem A.1 (Poincaré Recurrence Theorem) [17]. Let $T : \Omega \to \Omega$ be an m.p.t. and $A \in \mathcal{F}$. Define $\tau_A = \inf\{n \geq 1 : T^n(\omega) \in A\}$. Then,
(i) $\tau_A < \infty$ a.s. on $A$, i.e., $P(\omega \in A : \tau_A(\omega) = \infty) = 0$;
(ii) $A \subset \{\omega : T^n(\omega) \in A \text{ i.o.}\}$;
(iii) if $T$ is ergodic and $P(A) > 0$, then $P(\omega : T^n(\omega) \in A \text{ i.o.}) = 1$.

Proof: (i) Let $B = \{\omega \in A : \tau_A(\omega) = \infty\}$. If $\omega \in T^{-1}(B)$, then $T(\omega) \in A$, $T^2(\omega) \notin A$, $T^3(\omega) \notin A, \dots$. If $\omega \in T^{-2}(B)$, then $T^2(\omega) \in A$, $T^3(\omega) \notin A$, $T^4(\omega) \notin A, \dots$. Hence $T^{-1}(B), T^{-2}(B), T^{-3}(B), \dots$ are pairwise disjoint. Since $T$ is an m.p.t., it follows that $P(B) = P(T^{-1}(B)) = P(T^{-2}(B)) = \cdots$. Thus $P(B) = 0$ (otherwise, $P(\Omega) \geq \sum_{k=0}^{\infty} P(T^{-k}(B)) = \infty$).

(ii) For any $k \in \mathbb{N}$, since $T^k$ is an m.p.t., it follows from (i) that
$$0 = P(\omega \in A : T^{kn}(\omega) \notin A, \ \forall n \geq 1) \geq P(\omega \in A : T^m(\omega) \notin A, \ \forall m \geq k).$$
This inequality implies that the last probability is 0 for all $k$. Hence $P(\omega \in A : T^m(\omega) \in A \text{ i.o.}) = 1$, and thus $A \subset \{\omega : T^n(\omega) \in A \text{ i.o.}\}$.

(iii) By Definition 3.10, the set $C \stackrel{\text{def}}{=} \{\omega : T^n(\omega) \in A \text{ i.o.}\}$ is an invariant set. Since $T$ is ergodic, it follows from Definition 3.12 that $P(C) \in \{0, 1\}$. Also, by (ii), $C \supset A$, so $P(C) \geq P(A) > 0$ by assumption. Therefore, $P(C) = 1$. $\Box$
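As a quick numerical illustration of Theorem A.1 (not part of the original appendix), the sketch below iterates an ergodic measure-preserving map, the irrational rotation $T(x) = (x + \alpha) \bmod 1$ on $[0, 1)$, and confirms that orbits return to a set $A$ of positive measure over and over; the map, the set, and all constants are illustrative choices.

```python
# Numerical illustration of Poincare recurrence: the irrational rotation
# T(x) = (x + alpha) mod 1 preserves Lebesgue measure on [0,1) and is
# ergodic, so orbits visit A = [0, 0.1) infinitely often (Theorem A.1(iii)).
import numpy as np

alpha = np.sqrt(2) - 1          # irrational rotation angle (illustrative)
x = 0.3                         # arbitrary starting point
visits = []
for n in range(1, 100_000):
    x = (x + alpha) % 1.0       # one application of the m.p.t. T
    if x < 0.1:                 # did the orbit land in A = [0, 0.1)?
        visits.append(n)

# The visits never stop accumulating, and their long-run frequency
# approaches the measure of A (here 0.1), consistent with ergodicity.
print(len(visits), len(visits) / 100_000)   # ~10,000 visits, ratio ~0.1
```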
Bibliography

[1] Amerio, L., and G. Prouse, Almost Periodic Functions and Functional Equations, Van Nostrand Reinhold, New York, 1971.
[2] Atal, B. S., and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. ICASSP, Paris, pp. 614-617, 1982.
[3] Banyasz, Cs., and L. Keviczky (editors), Proc. 9th IFAC Symp. on Identification and System Parameter Estimation (SYSID '91) (special sessions on bounded-error methods), vols. 1 & 2, Budapest, July 1991.
[4] Bertsekas, D. P., and I. B. Rhodes, "Recursive state estimation for a set-membership description of uncertainty," IEEE Trans. on Automatic Control, vol. AC-16, pp. 117-128, April 1971.
[5] Bitmead, R. R., "Persistence of excitation conditions and the convergence of adaptive schemes," IEEE Trans. on Information Theory, vol. IT-30, no. 2, pp. 183-191, March 1984.
[6] Bohr, H., Almost Periodic Functions, Chelsea Publishing, New York, 1951.
[7] Cheung, M. F., On Optimal Algorithms for Parameter Set Estimation (Ph.D. dissertation), Ohio State University, 1991.
[8] Dasgupta, S., and Y. F. Huang, "Asymptotically convergent modified recursive least squares with data dependent updating and forgetting factor for systems with bounded noise," IEEE Trans. Info. Theory, vol. 33, pp. 383-392, 1987.
[9] Deller Jr., J. R., T. M. Lin, and M. Nayeri, "Linear prediction using set-membership identification with blind error-bound estimation," Proc. 1997 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, (in press).
[10] Deller Jr., J. R., and M. Nayeri (session chairs), Proc. IEEE Intl. Symp. on Circuits and Systems (special session on set-membership-based identification for signal processing and control), vol. 1, Chicago, May 1993.
[11] Deller Jr., J. R., M. Nayeri, and S. F. Odeh, "Least-square identification with error bounds for real-time signal processing and control," Proc. IEEE, vol. 81, pp. 815-849, 1993.
[12] Deller Jr., J. R., J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, New York, 1993.
[13] Deller Jr., J. R., and S. F. Odeh, "Adaptive set-membership identification in O(m) time for linear-in-parameters models," IEEE Trans. on Signal Processing, vol. 41, no. 5, May 1993.
[14] Deller Jr., J. R., M. Nayeri, and M. S. Liu, "Unifying the landmark developments in optimal bounding ellipsoid identification," Int. J. of Adaptive Control and Signal Processing, vol. 8, pp. 43-60, 1994.
[15] Deller Jr., J. R., and T. C. Luk, "Linear prediction analysis of speech based on set-membership theory," Computer Speech and Language, vol. 3, pp. 301-327, Oct. 1989.
[16] Douglas, J. B., Analysis with Standard Contagious Distributions, vol. 4, International Co-operative Publishing House, Burtonsville, Maryland, 1980.
[17] Durrett, R., Probability: Theory and Examples, Wadsworth & Brooks/Cole, California, 1993.
[18] Fisher, W. M., G. R. Doddington, and K. M. Goudie-Marshall, "The DARPA speech recognition research database," Proc. DARPA Speech Recognition Workshop, pp. 93-99, 1986.
[19] Fogel, E., "System identification via membership set constraints with energy constrained noise," IEEE Trans. on Automatic Control, vol. 24, pp. 752-758, 1979.
[20] Fogel, E., and Y. F. Huang, "On the value of information in system identification - bounded noise case," Automatica, vol. 18, pp. 229-238, 1982.
[21] Goldberg, J. L., Matrix Theory with Applications, McGraw-Hill, 1991.
[22] Graupe, D., Time Series Analysis, Identification, and Adaptive Systems, Krieger, Malabar, Florida, 1989.
[23] Haykin, S. S., Adaptive Filter Theory, 3rd ed., Prentice Hall, Englewood Cliffs, New Jersey, 1996.
[24] Kosut, R. L. (editor), IEEE Trans. Automatic Control (special section on error bounding methods for robust control), vol. 37, July 1992.
[25] Lin, T. M., Nayeri, M., and J. R. Deller Jr., "Performance of blind OBE in a correlated-error environment," 30th Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2-5, 1996 (in press).
[26] Lin, T. M., Nayeri, M., and J. R. Deller Jr., "A consistently convergent OBE algorithm with automatic bound estimation," Int. J. of Adaptive Control and Signal Processing, (in review).
[27] Liu, M. S., Development and Analysis of an Optimal Bounding Ellipsoid Algorithm Using Stochastic Approximation (Ph.D. dissertation), Michigan State University, 1993.
[28] Liu, M. S., M. Nayeri, and J. R. Deller Jr., "Do interpretable optimal bounding ellipsoid algorithms converge? Part II - OBE vs. RLS: Clearing the smoke," Proc. 10th IFAC Symp. on System Identification (SYSID '94), vol. 3, pp. 395-400, Copenhagen, Denmark, July 1994.
[29] Milanese, M., H. Piet-Lahanier, J. Norton, and E. Walter (editors), Bounding Approaches to System Identification, Plenum, London, 1996.
[30] Milanese, M., and A. Vicino, "Estimation theory for dynamic systems with unknown but bounded uncertainty: an overview," Automatica, vol. 27, pp. 997-1009, 1991.
[31] Moore, J. B., "Persistence of excitation in extended least squares," IEEE Trans. on Automatic Control, vol. AC-28, no. 2, pp. 60-68, January 1983.
[32] Nayeri, M., J. R. Deller Jr., and M. M. Krunz, "Convergence and colored noise issues in bounding ellipsoid identification," Proc. IEEE ICASSP '92, San Francisco, vol. 5, pp. 337-340, March 1992.
[33] Nayeri, M., T. M. Lin, and J. R. Deller Jr., "Blind error-bound estimation for OBE identification," Proc. 34th Annual Allerton Conf. on Communications, Control and Computing, University of Illinois, Urbana-Champaign, Oct. 2-4, 1996 (in press).
[34] Nayeri, M., T. M. Lin, and J. R. Deller Jr., "Some novel variants of blind OBE algorithm," Proc. 1997 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, (in press).
[35] Nayeri, M., M. S. Liu, and J. R. Deller Jr., "Do interpretable optimal bounding ellipsoid algorithms converge? Part I - The long-awaited set-convergence proof," Proc. 10th IFAC Symp. on System Identification (SYSID '94), vol. 3, pp. 389-394, Copenhagen, Denmark, July 1994.
[36] Norton, J. P. (editor), Int. J. of Adaptive Control and Signal Processing (special issues on bounded-error estimation), Jan.-Feb., 1994-95.
[37] Odeh, S. F., Algorithms and Architectures for Adaptive Set-Membership-Based Signal Processing (Ph.D. dissertation), Michigan State University, 1990.
[38] Rao, A. K., and Y. F. Huang, "Tracking characteristics of an OBE parameter estimation algorithm," IEEE Trans. Signal Processing, vol. 41, pp. 1140-1148, 1993.
[39] Rao, A. K., Y. F. Huang, and S. Dasgupta, "ARMA parameter estimation using a novel recursive estimation algorithm with selective updating," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 38, pp. 447-457, 1990.
[40] Rabiner, L. R., and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
[41] Schweppe, F. C., "Recursive state estimation: Unknown but bounded errors and system inputs," IEEE Trans. on Automatic Control, vol. AC-13, pp. 22-28, Feb. 1968.
[42] Söderström, T., and P. Stoica, System Identification, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[43] Stark, H., and J. W. Woods, Probability, Random Processes, and Estimation Theory for Engineers, Prentice Hall, Englewood Cliffs, New Jersey, 1986.
[44] Veres, S. M., and J. P. Norton, "Structure selection for bounded-parameter models: Consistency conditions and selection criterion," IEEE Trans. on Automatic Control, vol. 36, pp. 474-481, 1991.
[45] Walter, E., and H. Piet-Lahanier, "Estimation of parameter bounds from bounded-error data: A survey," Mathematics and Computers in Simulation, vol. 32, pp. 449-468, 1990.
[46] Wiener, N., Generalized Harmonic Analysis and Tauberian Theorems, MIT Press, Cambridge, Massachusetts, 1964.
[47] Witsenhausen, H. S., "Sets of possible states of linear systems given perturbed observations," IEEE Trans. on Automatic Control, vol. AC-13, pp. 556-568, Oct. 1968.