This is to certify that the dissertation entitled "Development and Analysis of an Optimal Bounding Ellipsoid Algorithm Using Stochastic Approximation," presented by Ming-Shou Liu, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering at Michigan State University.

DEVELOPMENT AND ANALYSIS OF AN OPTIMAL BOUNDING ELLIPSOID ALGORITHM USING STOCHASTIC APPROXIMATION

By
Ming-Shou Liu

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering
1993

ABSTRACT

DEVELOPMENT AND ANALYSIS OF AN OPTIMAL BOUNDING ELLIPSOID ALGORITHM USING STOCHASTIC APPROXIMATION

By Ming-Shou Liu

Set-membership-based (SM) techniques provide an alternative to conventional system identification algorithms. This research is concerned with a class of SM algorithms known as the optimal bounding ellipsoid (OBE) algorithms, which combine the weighted least square error minimization approach with a bounded error constraint. More specifically, OBE algorithms focus on finding an optimal hyper-ellipsoidal region for the feasible set when the additive model noise is bounded. Previous work has shown that OBE techniques have great potential in many applications involving parametric modelling. The published OBE algorithms, however, either do not include a formal convergence proof or do not have an interpretable geometrical meaning. In this research, a modified OBE algorithm, which is both interpretable and convergent, is introduced. This algorithm is referred to as the set-membership stochastic approximation (SM-SA) algorithm because of its resemblance to the stochastic approximation (SA) method developed by Robbins and Monro in 1951. Recently, a general class of OBE algorithms, including all methods published to date, was unified into a single framework called the unified OBE (UOBE) algorithm.
UOBE is based on generalized weighted recursive least squares in which very broad classes of "forgetting factors" and data weights may be employed. Different instances of UOBE are distinguished by their weighting policies and by the criteria used to determine optimal weight values. SM-SA is a special case of UOBE, and, through the analysis in this research and the paper by Deller, Nayeri, and Krunz, the convergence properties of SM-SA, which will be introduced later, can be applied to the UOBE framework.

In this dissertation, the updating criteria and optimal weights that shrink the feasible sets of SM-SA are first analyzed. Then the asymptotic properties and performance of SM-SA are discussed. Simulation results for different SM algorithms and comparisons to conventional recursive least squares are also examined. Finally, the UOBE formulation and its convergence properties are introduced.

Copyright by Ming-Shou Liu
August 3, 1993

To my Mother and my wife,
for their Love, Support, and Sacrifice

ACKNOWLEDGMENTS

A deep debt is owed to my advisor, Professor Majid Nayeri, for his encouragement and guidance, and for the many hours spent on behalf of my dissertation. His direction was very important for the development and analysis of this research. Thanks are also given to my co-advisor, Professor John R. Deller, Jr., for his time spent with me in helpful discussions. I would like to express my gratitude to the other members of my Ph.D. guidance committee, Professor Clifford E. Weil and Professor Hassan K. Khalil, for their help in reviewing my dissertation. I gratefully acknowledge the financial support provided by grants from the National Science Foundation and the Office of Naval Research.

Acronyms and Abbreviations

Auto-regressive with exogenous input (ARX)
Auto-regressive-moving-average with exogenous input (ARMAX)
Multiple-input-multiple-output (MIMO)
Optimal bounding ellipsoid (OBE)
Optimal volume ellipsoid (OVE)
Recursive least squares (RLS)
Set-membership (SM)
Set-membership stochastic approximation (SM-SA)
Set-membership weighted recursive least squares (SM-WRLS)
Single-input-single-output (SISO)
Stochastic approximation (SA)
Unified optimal bounding ellipsoid (UOBE)
Weighted recursive least squares (WRLS)

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction and Background
1.1 Real-Time Identification and the Recursive Least Square Estimator
1.2 Introduction to Stochastic Approximation
1.3 Introduction to Set-Membership Theory
1.4 Review of Previous Work on OBE Algorithms
1.5 Motivation for Developing the New Algorithm
2 The Development of the SM-SA Algorithm
2.1 The Set-Membership Stochastic Approximation Algorithm
2.2 Initial Values
2.3 The Optimal Weight
2.4 Data Selection
2.5 Relations Between SM-SA and SM-WRLS
3 Convergence Issues
3.1 Fundamental Convergence Properties of SM-SA
3.2 Convergence of Parameter Sets and Suboptimal Test
3.2.1 Convergence of Parameter Sets
3.2.2 A Suboptimal Test for Innovation in the SM-SA Algorithm
3.3 Convergence of the Parameter Set with White Noise or Error Disturbances
3.3.1 IID White Noise Case
3.3.2 General White Noise Case
3.4 Independence and the Mixing Conditions
4 Simulation Studies
4.1 Performance of SM-SA for Time-Invariant Systems
4.1.1 SM-SA with Pseudo-Random Excitation
4.1.2 AR Models with Binary Bernoulli Excitation
4.2 Performance of SM-SA for Time-Varying Systems
4.3 Suboptimal Test
4.3.1 AR Models with Pseudo-Random Excitation
4.3.2 AR Models with Binary Bernoulli Excitation
4.4 More Studies on Time-Varying Systems
5 The Convergence Properties of UOBE Algorithms
5.1 The UOBE Algorithm
5.2 Convergence Properties of UOBE
6 Conclusion
6.1 Concluding Remarks
6.1.1 Summary of Landmark Developments of SM-Based Techniques
6.1.2 Contributions of this Research
6.2 Future Work
A Infinite Product
BIBLIOGRAPHY

LIST OF TABLES

2.1 The data-selection and the performance of SM-WRLS and SM-SA for the AR model (2.50)
2.2 Estimates of SM-WRLS and SM-SA for the AR model (2.50) where a1 = 1.6 and a2 = -0.68
4.1 Noise processes used in the simulations in this chapter

LIST OF FIGURES

1.1 Relationship between Ψ(n) and Ω(n) for the two-dimensional case
1.2 The weights become very large when more data are used
2.1 Plots of V(λ_n) when c_n > 0 and G_n < 1
2.2 When G_n > 1 and c_n > 0, if there exists one extremum of V(λ_n) in (0,1), then there exist more than two extrema
2.3 When c_n = 0, the only two possibilities for V(λ_n) in (0,1) are: (a) there is a saddle point, or (b) no extrema and no saddle points exist
2.4 When c_n < 0, if more than one extremum appears in (0,1), then there must be at least three extrema
2.5 The relationship between logarithms of time and weight
3.1 (a) If the indicated quantity is positive and v_{m_n} lies in the shaded area, a contradiction results; (b) if the quantity is negative and v_{m_n} lies in the shaded area, a contradiction results
3.2 Sample correlation coefficients of {w_n} in the example on page 63 up to time lag 40
3.3 Sample correlation coefficients of {v_n} in the example on page 63 up to time lag 40
3.4 Estimates of the parameter a_3 when the system is driven by (a) an i.i.d. white noise, (b) a white noise which is not i.i.d.
3.5 Distribution of {w_n} and {v_n} at each amplitude interval
3.6 The logarithmic plot of volume versus the actual time n; the volume of the ellipsoid is much smaller when the noise is i.i.d.
3.7 Distribution of a white noise at each amplitude interval after passing {v_n} through a soft-limiter
3.8 Sample correlation coefficients of {v_n} after passing through a soft-limiter
3.9 Estimates of the parameter a_3 when v_n is passed through a soft-limiter
3.10 Distribution of a white noise at each amplitude interval after multiplying v_n by two and then passing it through a soft-limiter
3.11 Estimates of the parameter a_3 when v_n is multiplied by two and then passed through a soft-limiter
3.12 Sample correlation coefficients of {v_n} up to time lag 40
3.13 Estimates of a_3 by SM-SA and RLS when the noise is colored
Sample correlation coefficients of {vn} up to time lag 40 ............ Estimates of a3 by SM-SA and RLS when the noise is color .......... 27 29 29 30 34 56 62 62 64 65 66 66 67 67 68 68 69 70 3.14 3.15 3.16 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 4.15 4.16 4.17 4.18 4.19 4.20 Distribution of a color noise at each amplitude interval. ........... Sample correlation coefficients of a noise sequence obtained by passing {0”} through a soft-limiter. .............................. Estimates of the parameter a3 when the color noise {vn} is passed through a soft-limiter. ................................... The logarithmic plot of 11,, versus time n for thelinear time-invariant AR(3) model with pseudo-random excitation. ..................... The logarithmic plot of Isnl versus time n for the linear time-invariant AR(3) model with pseudo random excitation. ..................... First 1000 estimates of the parameters in the AR(3) model obtained using SM-SA and a pseudo-random excitation, where a1 = 2.1, a; = —1.48, and a3 = 0.34. ..................................... The relationship between logarithms of time and weight ............ The estimates of model parameters by RLS and SM-SA for biased noise sequence. ..................................... The logarithmic plot of pa versus time 11 when a binary Bernoulli sequence is used. ...................................... First 30,000 estimates obtained using SM-SA and RLS when a binary Bernoulli sequence is used ............................. First 30,000 estimates obtained using SM-SA and RLS when a biased binary Bernoulli sequence is used ............................. The acoustic waveform of the word “four” .................... The “true” parameters for the word “four”: (a) b1(n) (b) bg(n) ........ Estimates of b1(n) for the word “four” using (a) SM-SA (b) RLS ....... Estimates of b2(n) for the word “four” using (a) SM-SA (b) RLS ....... The logarithmic plot of 11,, versus time n for the AR(3) model when the suboptimal test is used. ............................. The logarithmic plot of Isnl versus time n for the AR(3) model ........ First 2000 estimates of the parameters using SM-SA, where a1 = 2.1, a; = —1.48, and a3 = 0.34. .............................. The relationship between logarithms of time and weight ............ Estimates of parameter using RLS and SM-SA for the suboptimal test. . . . First 30,000 estimates obtained by SM-SA and RLS when a zero mean binary Bernoulli sequence is used and suboptimal test is employed. ......... First 30,000 estimates obtained by SM-SA and RLS when a bias binary Bernoulli sequence is used and suboptimal test is employed .......... Estimates of (a) b1(n) (b) bg(n) by using SM-SA algorithm for the word “four” when noise bounds are over-estimated .................. '71 81 82 82 83 84 85 86 87 88 89 91 92 93 93 94 94 95 97 98 100 Chapter 1 Introduction and Background “Man creates models of his natural and man-made environments to understand and explain them better and as prelude to action” [8]. Indeed, from social science to engineering and natural science, modeling plays an important roll to help humankind describe functions of processes and to predict future actions. For example, in economics, modeling is used to study inflation, to predict the trends of stock markets, and to maximize profits on invest- ments. 
In engineering applications, modeling can be used to analyze the characteristics of a process, to optimize its performance, and to study its functions and dynamics before a large plant is actually built and large amounts of money and effort are invested. In many branches of science, modeling is used as an aid to describe and understand reality.

The term "system identification" generally refers to the determination of a mathematical model for a system or a process by observing its input-output relationships. Based on the model structures that are used, system identification methods are generally classified into two categories: nonparametric and parametric. A nonparametric model is described by a curve, function, or table. Impulse responses and frequency diagrams are two examples of nonparametric models. In this dissertation, we consider the parametric model only. Such models are characterized by a parameter vector. Typical parametric models are auto-regressive with exogenous input (ARX), auto-regressive-moving-average with exogenous input (ARMAX) [16], and state-space structures.

For modeling engineering systems and industrial processes in practice, some basic characteristics of the system, such as linearity and stability, are assumed to be known. Furthermore, in many cases we assume basic knowledge about the mathematical structure of the system, such as the order of the system, the type of the model, and so on, so that it is possible to derive a specific mathematical model of the system dynamics. Consequently, only a set of parameters in the model equation is left to be determined. In this research, we derive a new algorithm for identifying model parameters which has well-analyzed convergence properties and an intuitively meaningful optimization criterion. This algorithm is applicable when the system is corrupted by additive and bounded noise.

1.1 Real-Time Identification and the Recursive Least Square Estimator

In classical system design, modeling is accomplished once during the design stage. This is called off-line identification. In most adaptive systems, on-line or recursive identification is desirable to allow optimum adaptation in the face of environmental uncertainty and changing environmental conditions. From a computational point of view, the memory requirement for recursive estimators is quite modest since not all the data are stored. In off-line identification, however, all the previous data have to be memorized.

A variety of well-known recursive techniques such as least square error (LSE), maximum likelihood, instrumental variables [16], and stochastic approximation [1] have been used to identify system parameters in real time. Among these techniques, the LSE method is the most popular because of its simplicity and well-understood criteria. To review the LSE method, consider a simple linear regression model

    y(t) = xᵀ(t) θ

where x(t) is an n-vector of known quantities and θ is an n-vector of unknown parameters. The LSE estimate of θ at time n is obtained by minimizing the cost function

    V(θ̂) = Σ_{t=1}^{n} ( y(t) − xᵀ(t) θ̂ )².

Recursive least squares (RLS) [16] estimates are obtained through an on-line iterative computation of the off-line LSE estimates. Weighted recursive least squares (WRLS) is a variation of RLS which minimizes

    V_n(θ̂) = Σ_{t=1}^{n} w_t ( y(t) − xᵀ(t) θ̂ )²

where {w_n} is a weighting sequence.
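Before turning to stochastic approximation, the least-squares criteria above can be made concrete with a short sketch. The following is a minimal illustration, not part of the original text: it computes the batch LSE estimate and a weighted recursive counterpart for the regression y(t) = xᵀ(t)θ; the model size, the unit weights, and the synthetic data are illustrative assumptions only.

import numpy as np

def batch_lse(X, y):
    # Off-line LSE: minimize sum_t (y(t) - x(t)^T theta)^2.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def wrls(X, y, w, delta=1e6):
    # Weighted recursive least squares: minimize sum_t w_t (y(t) - x(t)^T theta)^2,
    # computed on-line with the standard rank-one covariance update.
    m = X.shape[1]
    theta = np.zeros(m)
    P = delta * np.eye(m)                 # large P(0) encodes low confidence in theta(0)
    for x_t, y_t, w_t in zip(X, y, w):
        Px = P @ x_t
        gain = w_t * Px / (1.0 + w_t * (x_t @ Px))
        theta = theta + gain * (y_t - x_t @ theta)
        P = P - np.outer(gain, Px)
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
theta_true = np.array([1.0, -0.5, 0.25])
y = X @ theta_true + 0.1 * rng.standard_normal(500)
print(batch_lse(X, y))
print(wrls(X, y, np.ones(500)))          # unit weights reduce WRLS to ordinary RLS

With unit weights the recursion reproduces ordinary RLS, which is the sense in which WRLS generalizes it.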
1.2 Introduction to Stochastic Approximation

Stochastic approximation is concerned with the problem of estimating the parameters of a linear system from noisy observations. Let f : ℝ^p → ℝ^p be an unknown function with a unique zero θ*, i.e.,

    f(θ*) = 0.    (1.1)

We assume that f(·) can be observed at any x ∈ ℝ^p. Stochastic approximation (SA) refers to a general class of recursive algorithms for estimating the solution of an equation of the form (1.1) when the observation of f(·) is corrupted by random noise.

The origins of SA can be traced back to a paper by Robbins and Monro [1] in 1951. That paper is concerned with the problem of estimating the root, say θ*, of an unknown regression function f(x), which is the mean value of an observable random variable Z(x). Robbins and Monro prove that θ* can be estimated recursively by

    θ̂_{n+1} = θ̂_n + l_n Z(θ̂_n)    (1.2)

where l_n is a sequence of positive scalars that converges to zero. The reason why {l_n} must converge to zero is that convergence of θ̂_n can take place only if the product l_n Z(θ̂_n) tends to zero. Since Z(θ̂_n) ≠ 0 with probability one, l_n must tend to zero. This ensures that the noise is rejected by paying less and less attention to the noisy observations. Robbins and Monro state the following sufficient conditions for convergence of the estimate θ̂_n to the true parameter in probability [1]:

    l_n ≥ 0,    Σ_{n=1}^{∞} l_n = ∞,    Σ_{n=1}^{∞} (l_n)² < ∞.    (1.3)

Any sequence {l_n}_{n=0}^{∞} which satisfies (1.3) is called a sequence of type 1/n. A common choice for the sequence {l_n} is l_n = 1/n. Ljung [12] further weakened the square-summability condition in (1.3) to

    Σ_{n=1}^{∞} (l_n)^p < ∞

for some p > 1. Blum [2] (1954) extended Robbins and Monro's theorem to the multidimensional case.

In the next two sections we will briefly review the development of "bounding ellipsoid" algorithms and investigate the problems associated with previous work on these methods. We will then demonstrate that these problems can be overcome by an SA formulation.

1.3 Introduction to Set-Membership Theory

Set-membership (SM) identification refers to a class of recursive techniques for estimating parameters of linear systems or signal models under a priori information that constrains the solutions to certain sets [25]. In contrast to conventional system identification schemes (e.g., maximum likelihood, LSE, etc.) which yield point estimates of the parameters, an SM algorithm yields sets of parameter estimates which are compatible with the noise constraints. These sets are generally called "membership sets." Based on certain set-theoretic checking criteria, SM algorithms select and use only the useful data to update the parameter estimates and refine the membership set. When data do not help refine these membership sets, the effort of updating the parameters can be avoided.

SM theory was pioneered by Krasovski [3], Schweppe [4], Witsenhausen [5], and Bertsekas and Rhodes [7] in the late 1960s, in the control and system science domain, for estimating the state of a linear dynamic system. The optimum estimate is the smallest time-varying ellipsoid which contains the unknown system states.

A major appeal of parameter set estimation is that it does not rely on the availability of a probabilistic description of the uncertainty affecting the model. It simply delineates the range of the parameter space yielding model-output values within specified ranges about the observed output samples. On the other hand, since the estimates are updated only when the input data contain sufficient information, some SM algorithms are more efficient in some applications.
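To make the membership-set idea concrete, here is a small sketch under assumed data (an illustration, not drawn from the cited references): with a known bound γ_n on v_n², each observation defines the strip {θ : (y_n − θᵀx_n)² ≤ γ_n}, and the exact feasible set is the intersection of all such strips; the function below simply tests whether a candidate parameter vector satisfies every bound collected so far.

import numpy as np

def in_membership_set(theta, X, y, gamma):
    # theta lies in the intersection of strips iff every bounded-error
    # constraint (y_n - theta^T x_n)^2 <= gamma_n is satisfied.
    residuals = y - X @ theta
    return bool(np.all(residuals**2 <= gamma))

rng = np.random.default_rng(1)
theta_true = np.array([0.7, -0.3])
X = rng.standard_normal((200, 2))
v = rng.uniform(-0.5, 0.5, size=200)      # bounded noise, |v_n| <= 0.5
y = X @ theta_true + v
gamma = np.full(200, 0.25)                # gamma_n = 0.5^2
print(in_membership_set(theta_true, X, y, gamma))        # True by construction
print(in_membership_set(theta_true + 1.0, X, y, gamma))  # typically False

Any estimator constrained to return a point inside this intersection is automatically consistent with every error bound, which is the property the OBE algorithms exploit.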
In recent years, SM-based signal processing has received considerable attention and is becoming increasingly popular around the world, as is evident from the extensive literature on this topic [14, 20, 23, 26, 29, 31, 36, 39]. This research is restricted to the class of algorithms known as optimal bounding ellipsoid (OBE) algorithms, which follow a bounded error constraint.

The original OBE algorithm is attributable to Fogel and Huang [14] in 1982. Let us refer to this algorithm as "F-H/OBE." Deller [25] and Deller and Luk [26] have reformulated the F-H/OBE algorithm so that the center of the hyper-ellipsoid is exactly the conventional WRLS solution. This algorithm is referred to as the set-membership weighted recursive least squares (SM-WRLS) algorithm in order to distinguish it from the F-H/OBE algorithm. The main deficiency of these algorithms is the lack of well-understood convergence properties. This problem led to the development of the modified OBE algorithm by Dasgupta and Huang in 1987. We will refer to this algorithm as "D-H/OBE." Recently, a general class of OBE algorithms, including all methods published to date as well as the algorithm presented in this work, has been unified into a single framework called the unified OBE (UOBE) algorithm [38, 39]. Historically, this research on the "SM-SA" algorithm (presented in this thesis), and that of Krunz on the "Dual-SM-WRLS" algorithm [33, 34], precedes and contributed to the development of UOBE. We shall later discuss the fact that SM-SA is an important special case of UOBE and show, in Chapter 5, how many of the results of this work apply to all algorithms in the broader UOBE formulation.

Another OBE-like technique, called the optimal volume ellipsoid (OVE) algorithm, was proposed more recently [31, 32] by Cheung, Yurkovich, and Passino to estimate parameter sets. It is proven that the algorithm is truly optimal in the sense that the volume of the ellipsoid is smallest at each recursive computation. Simulation studies in [31, 32] compare the performance of OVE and F-H/OBE. They show that OVE gives tighter parameter bounds. The estimates given by OVE, however, do not show significant improvement over the estimates given by F-H/OBE.

1.4 Review of Previous Work on OBE Algorithms

In principle, F-H/OBE and SM-WRLS are the same in spirit. However, SM-WRLS has a structure which is simpler and more elegant than that of F-H/OBE because of the choice of the weighting sequence (see [38, 39]). Thus we begin with a review of SM-WRLS*; similar results on F-H/OBE and D-H/OBE are found in [38, 39]. Although the theoretical development of SM-WRLS can handle a complex-vector, multiple-input-multiple-output (MIMO) model, we consider only real scalars here for simplicity [28]. Let

    y_n = Σ_{i=1}^{p} f_i y_{n−i} + Σ_{j=0}^{q} g_j u_{n−j} + v_n    (1.4)
        = θ*ᵀ x_n + v_n    (1.5)

where θ*ᵀ = [f_1, ..., f_p, g_0, ..., g_q], x_nᵀ = [y_{n−1}, ..., y_{n−p}, u_n, u_{n−1}, ..., u_{n−q}], y_n and u_n are real measurable outputs and inputs, respectively, and v_n is an unknown noise process. Let m = p + q + 1. We assume that m > 1 and, for each time n, v_n is known to be bounded according to

    v_n² ≤ γ_n    (1.6)

where {γ_n} is a known positive sequence.

Let ω(n) be a parameter set at time n such that all θ ∈ ω(n) are feasible parameter estimates consistent with the error bound in (1.6).

* Generalized results for the more general weighting policies of UOBE are found in [38, 39], to which the present research contributed.
In conjunction with the model of form (1.5), ω(n), which is a "hyper-strip" region, can be expressed as

    ω(n) = { θ ∈ ℝ^m : (y_n − θᵀ x_n)² ≤ γ_n }.

The intersection of these sets over a given time range t ∈ [1, n] forms a convex polytope of feasible parameters,

    Ω(n) = ∩_{i=1}^{n} ω(i).    (1.7)

In general, Ω(n) is an irregular convex set that is difficult to describe and track analytically. In conjunction with the WRLS processing, however, Ω(n) can be shown to be contained in a hyper-ellipsoidal superset Ψ(n) [25],

    Ψ(n) = { θ : (θ − θ_n)ᵀ C(n) (θ − θ_n) ≤ κ_n }

where C(n) is the weighted covariance matrix,

    C(n) = C(n−1) + λ_n x_n x_nᵀ,    (1.8)

κ_n is a scalar quantity,

    κ_n = Σ_{i=1}^{n} λ_i (γ_i − y_i²) + θ_nᵀ C(n) θ_n,

and θ_n, the center of Ψ(n), is the WRLS estimate at time n using the weights {λ_i}_{i=1}^{n}; it can be computed recursively using

    C(n) ≜ P⁻¹(n)
    G_n = x_nᵀ P(n−1) x_n
    ε_n = y_n − θ_{n−1}ᵀ x_n
    P(n) = P(n−1) − λ_n P(n−1) x_n x_nᵀ P(n−1) / (1 + λ_n G_n)
    θ_n = θ_{n−1} + λ_n P(n) x_n ε_n
    κ_n = κ_{n−1} + λ_n γ_n − λ_n ε_n² / (1 + λ_n G_n).

The relationship between Ψ(n) and Ω(n) for the two-dimensional case is shown in Fig. 1.1.

Figure 1.1. Relationship between Ψ(n) and Ω(n) for the two-dimensional case.

The SM-WRLS algorithm begins with a large ellipsoid, Ψ(0), which contains all admissible values of the model parameter vector. The objective is to employ the weights λ_n to sequentially minimize the volume of the ellipsoid, defined by

    μ{Ψ(n)} ≜ det{ κ_n C⁻¹(n) }.

It has been shown that the optimal weight λ_n* is the unique positive root of the quadratic equation [14, 38]

    a_n λ_n² + b_n λ_n + c_n = 0    (1.9)

where

    a_n = (m − 1) G_n² γ_n
    b_n = 2m G_n γ_n − κ_{n−1} G_n² − G_n γ_n + ε_n² G_n    (1.10)
    c_n = m γ_n − m ε_n² − κ_{n−1} G_n.

When the roots of equation (1.9) are both negative, the data do not contain useful information and hence updating can be avoided. Since a_n is always positive, if c_n is negative then there exists a unique positive root of (1.9). When c_n ≥ 0, b_n is nonnegative because, according to (1.10),

    b_n = (2m − 1) G_n γ_n + ε_n² G_n − κ_{n−1} G_n²
        ≥ m G_n γ_n + ε_n² G_n − κ_{n−1} G_n²
        = G_n ( m γ_n + ε_n² − κ_{n−1} G_n )
        ≥ G_n ( m γ_n − m ε_n² − κ_{n−1} G_n )
        = G_n c_n ≥ 0.

Therefore neither root is positive, so there is no optimal weight at this time and updating can be avoided. Thus the estimates are updated if and only if c_n < 0; the sign of c_n serves as the indication for updating. A similar result for trace minimization, in which the optimization is applied to

    μ_t{Ψ(n)} ≜ tr{ κ_n C⁻¹(n) },

can be found in [14, 38, 39].

1.5 Motivation for Developing the New Algorithm

Although the SM-WRLS algorithm and related algorithms have been applied in many areas, it is difficult to formally analyze their convergence properties. From simulation results, we have found that with F-H/OBE and SM-WRLS the optimal weight λ_n* might drift to infinity as more data are used. The reason for this anomaly is as follows: the elements of the weighted covariance matrix C(n) in (1.8) may become very large when more data are used. So, relatively speaking, the weight λ_n* in (1.8) must be large enough in order for x_n to make a difference and to effectively shrink the volume of the ellipsoid. To illustrate, we consider an AR(2) system given by

    y_n = 1.6 y_{n−1} − 0.68 y_{n−2} + v_n,    (1.11)

where v_n is a colored noise sequence generated by passing a white sequence through a Butterworth bandpass filter of order 14 with passband edge frequencies at ω = 1.75 rad and 2.75 rad. The passband attenuation is −1 dB and the stopband attenuation is −60 dB. This system is stable since the poles are located at 0.8 ± 0.2j in the z-plane.
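For readers who wish to reproduce the weight-growth behaviour discussed next, the following is a minimal sketch of data generation for the AR(2) example (1.11). It is an illustration only: the random seed is arbitrary, and the simple Butterworth design below approximates, but does not exactly enforce, the ripple and attenuation figures quoted in the text.

import numpy as np
from scipy.signal import butter, lfilter

# Colored excitation: order-14 Butterworth bandpass (butter with N=7 gives a
# 14th-order bandpass) with edges near 1.75 and 2.75 rad/sample.
b, a = butter(7, [1.75 / np.pi, 2.75 / np.pi], btype="bandpass")
rng = np.random.default_rng(2)
v = lfilter(b, a, rng.standard_normal(5000))

# AR(2) model (1.11): y_n = 1.6 y_{n-1} - 0.68 y_{n-2} + v_n
y = np.zeros(5000)
for n in range(2, 5000):
    y[n] = 1.6 * y[n - 1] - 0.68 * y[n - 2] + v[n]

# Pole magnitudes of z^2 - 1.6 z + 0.68: about 0.825 < 1, confirming stability.
print(np.abs(np.roots([1.0, -1.6, 0.68])))

Running any volume-minimizing SM-WRLS implementation on {y_n, x_n} generated this way is the kind of experiment that produces the growing optimal weights reported in Figure 1.2.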
Figure 1.2 shows the gradual growth of the weights as more data are used. In this situation, even if the estimate θ_n approaches the true parameter asymptotically, the large weights still make the algorithm difficult to implement on finite word-length machines. This is just one example in which the optimal weight may diverge.

In order to deal with the convergence issues, Dasgupta and Huang introduced the D-H/OBE algorithm [20] in 1987. In this algorithm, the updated covariance matrix is a convex combination of C(n−1) and the new data outer product,

    C(n) = (1 − λ_n*) C(n−1) + λ_n* x_n x_nᵀ.

The number (1 − λ_n*) is referred to as a "forgetting factor" by Dasgupta and Huang. In a sense, D-H/OBE represents significant progress in OBE theory because it involves a formal convergence proof for the weights λ_n*. However, the optimal weight λ_n* in D-H/OBE is obtained by minimizing κ_n at each n. Since the magnitude of κ_n is not clearly related to the size of the membership sets (see [30, 38, 39]), this makes the D-H/OBE algorithm difficult to interpret geometrically. Therefore, the objectives of this research are:

• To develop a new algorithm that remedies the trade-off between interpretability of the weighting strategy and convergence issues.
• To provide a convergence proof for the new algorithm.
• To perform simulations to study and compare the performance of the new algorithm with that of existing methods.

Figure 1.2. The weights become very large when more data are used.

In the next chapter, the new algorithm, "set-membership stochastic approximation" (SM-SA), is derived for an ARX model with bounded noise. SM-SA is developed in a systematic way to accommodate bounded weights and a bounded covariance matrix. The relationship of SM-SA to the stochastic approximation approach is investigated. If the weights of SM-SA are chosen to be 1/n, then SM-SA is exactly the RLS estimator.

In Chapter 3, the convergence properties of SM-SA are discussed. First we show that the sequence of optimal weights {λ_n*} converges to zero as n approaches infinity. The convergence of the weights to zero is important because it is a good indication that the estimates converge: we pay less attention to noisy observations when λ_n* is close to zero. The main point in that chapter is the convergence of the ellipsoid and of its center, which serves as the estimate of SM-SA. For RLS, the performance of the estimator is highly dependent on the statistics of the noise sequence. In SM-SA we will see that the dependency is much weaker than that of RLS.

Simulation studies are presented in Chapter 4. In light of the theory and analysis established in the previous two chapters, we examine the performance of SM-SA excited by different noise sequences. The UOBE framework is introduced in Chapter 5, where we show that the convergence properties of SM-SA can be applied to the more general weighting policies of UOBE. Chapter 6 lists the conclusions of this research and suggests some possible directions for future study.

Chapter 2
The Development of the SM-SA Algorithm

In the previous chapter, we observed that in SM-WRLS the weights λ_n* may become very large, particularly for large n, leading to huge numbers in computations and numerical instabilities.
In this work, we will show a feature that tends to promote convergence of the ellipsoid by preventing the "drift" of the covariance matrix toward infinity. In this chapter, we start deriving the SM-SA algorithm by modifying SM-WRLS so that the covariance matrix is normalized to the sum of the weights. We then develop the optimality test based on the minimum-volume criterion and examine the similarities between SM-SA and SM-WRLS.

2.1 The Set-Membership Stochastic Approximation Algorithm

We begin with the ARX model given in (1.5) and let the noise bound be the same as in (1.6). For convenience, we rewrite the system here:

    y_n = Σ_{i=1}^{p} f_i y_{n−i} + Σ_{j=0}^{q} g_j u_{n−j} + v_n = θ*ᵀ x_n + v_n.    (2.1)

In an unweighted RLS algorithm, the problem of numerically unstable quantities can be eliminated by normalizing the covariance matrix to the time n, that is, by replacing C(n) by (1/n) C(n). In principle, if the sequence {x_n} represents a stationary stochastic process with appropriate ergodicity properties, then (1/n) C(n) will tend to E[x_n x_nᵀ]. Clearly, however, this strategy may not work for weights determined by SM considerations, since C(n) may grow much faster than n. In the SM-SA approach, SM-WRLS is modified so that the covariance matrix is normalized to the sum of the weights. Let λ_n^{SM} be the optimal weight of SM-WRLS at time n. Define

    Σ_n ≜ Σ_{t=1}^{n} λ_t^{SM} = Σ_{n−1} + λ_n^{SM},    Σ_0 ≜ 0,

and

    R(n) ≜ (1/Σ_n) C(n) = (1/Σ_n) [ C(n−1) + λ_n^{SM} x_n x_nᵀ ]
         = (1/Σ_n) [ Σ_{n−1} R(n−1) + λ_n^{SM} x_n x_nᵀ ]
         = (Σ_{n−1}/Σ_n) R(n−1) + (λ_n^{SM}/Σ_n) x_n x_nᵀ.

If we define the normalized weight λ_n by

    λ_n ≜ λ_n^{SM} / Σ_n,    (2.2)

then

    Σ_{n−1}/Σ_n = 1 − λ_n^{SM}/Σ_n = 1 − λ_n.

Thus the normalized covariance matrix R(n) can be written as

    R(n) = (1 − λ_n) R(n−1) + λ_n x_n x_nᵀ    (2.3)

where λ_n ∈ [0, 1]. Remarkably, this approach, in which R(n) is a convex combination of the past normal matrix and the incoming outer product, results in the same "covariance" matrix and recursive formula as those in D-H/OBE. However, SM-SA has been developed systematically from SM-WRLS expressly to close the gap between the convergence issues and the geometrical meaning in previous work. This philosophy is quite different from the development of D-H/OBE, in which geometrical interpretability is sacrificed.

Substituting λ_n for λ_n^{SM} leads to the modification of the original SM-WRLS recursions. The description of the corresponding ellipsoid is given in the following theorem.

Theorem 2.1 The hyper-ellipsoidal region corresponding to the SM-SA algorithm is given by

    Γ(n) = { θ : (θ − θ_n)ᵀ (R(n)/κ_n) (θ − θ_n) ≤ 1 }    (2.4)

where κ_n is the quantity

    κ_n = Σ_{i=1}^{n} ( ∏_{j=i+1}^{n} (1 − λ_j) ) λ_i (γ_i − y_i²) + θ_nᵀ R(n) θ_n    (2.5)

and θ_n, which is the weighted LSE estimate at time n using the weights {λ_i}, can be computed recursively using the relations

    P(n) ≜ R⁻¹(n)    (2.6)
    G_n = x_nᵀ P(n−1) x_n    (2.7)
    ε_n = y_n − θ_{n−1}ᵀ x_n    (2.8)
    P(n) = (1/(1 − λ_n)) [ P(n−1) − λ_n P(n−1) x_n x_nᵀ P(n−1) / (1 − λ_n + λ_n G_n) ]    (2.9)
    θ_n = θ_{n−1} + λ_n P(n) x_n ε_n    (2.10)
    κ_n = (1 − λ_n) κ_{n−1} + λ_n γ_n − λ_n (1 − λ_n) ε_n² / (1 − λ_n + λ_n G_n).    (2.11)

Proof: From the bounding condition v_n² ≤ γ_n, the accumulated inequality

    Σ_{i=1}^{n} v_i² ≤ Σ_{i=1}^{n} γ_i

holds. That is,

    Σ_{i=1}^{n} (y_i − θ*ᵀ x_i)² ≤ Σ_{i=1}^{n} γ_i.

The relationship remains true if the individual constraints are first multiplied by any nonnegative quantity. Thus, we have

    Σ_{i=1}^{n} ( ∏_{j=i+1}^{n} (1 − λ_j) ) λ_i (y_i − θ*ᵀ x_i)² ≤ Σ_{i=1}^{n} ( ∏_{j=i+1}^{n} (1 − λ_j) ) λ_i γ_i    (2.12)

where ∏_{j=n+1}^{n} (·) ≜ 1. Therefore, given a sequence of weights λ_i, (2.12) represents a global membership set, say Γ(n), to which θ* must belong. Since Ω(n) in (1.7) is the smallest known set, it must be true that Ω(n) ⊂ Γ(n).
Let usfldeinote any general element of I‘(n) by 9. Upon expanding ,r/j ,... the square, we then have that z": ( fl (1 " A1)) Ail/i2 - 22”:( fi (1 - Afl) [\gygaTx, i=1 j=i+1 i=1 j=i+l + E“: ( fl (1 — A19) MOTXiXiro S i( ii (1 — A,)) Am. i=1 j=i+1 i=1 j=i+l The following quadratic inequality in 0 emerges after rearranging the terms : ( fl (1 - A5))":11'2- 291‘: ( f1 (1 - M) Aim-X.- i=1 '=i+1 i=1 j=i+1 i=1 j=i+1 i=1 j=i+1 +9T [2“: ( f1 (1 ’ A12) Amati] 9 S i ( fi (1 - A19) Am (2-13) For the WRLS solution, we have [21] a, = [i( fl (1 — m) 1,11,11,14 [f3( f1 (1 — A,)) Asa-w] . (2.14) i=1 j=i+1 i=1 j=i+l 17 Thus [2": ( f1 (1 - ’21)) Agxpcg] 9n = Z": ( n (1 - A») Aixiyi- I (2.15) i=1 j=i+1 i=1 j=i+l Substituting (2.15) into (2.13), we have f:( fi (1 - 1,)) 1,3,3 — 297' [i( 1"] (1 — A») A,x.-x,T] 0,, i=1 j=i+l i=1 j=i+1 n n n n + 9T 2 H (I — Aj) Agxgx? 9 S 2 H (1 - Aj) Ag‘yg. (2.16) i=1 j=i+1 i=1 j=i+1 For an off-line estimator, the covariance R(n) can be computed as i=1 j=i+l R(n) = i ( fi (1 - Afl) A;X;X,T. Thus i=1 j=i+1 i=1 '=i+1 in: ( fi (1 — A») My? — 29111009" + 9TR(")9 S i ( fi (1 - Ad) Am. Since R(n) is symmetric, this can be written as n 2”: ( fl (1 " Ail) A£313,449"9n)TR(n)(9"'97:)“911100911 S in: ( H (1 — *0) Am- i=1 j=i+1 i=1 j=i+1 (2.17) Since 14,, is given by Na = i( fl (1 - A1)) A{(71 — y?) + 95110091" i=1 j=i+1 (2.17) is reduced to (a — 9,)TR(n)(a — a,,) g x,,, and (2.4) and (2.5) are established. 18 The matrix inversion lemma [16]: (A + BCD)“ = A-1 — 11-13(0-1 + DA-lBrlDA"1 in which A and C are nonsingular matrices, and B and D are matrices with appropriate dimensions, can be used in (2.3) to obtain (1 -— A,,)“1P(n — 1)x,,x£(1 - A,,)‘1P(n — 1). P = 1—A,,‘1P —1— 2.18 (n) ( ) (n ) A;1 + x£(1 — A,,)‘1P(n — 1)x,, ( ) Replacing x£P(n — 1)x,, by C(n) in (2.18) yields 1 _ (1 — A,,)‘1P(n — 1)x,,xTP(n — 1) P = P - 1 — n (n) 1 — A,, _ (n ) A? + (1 - A,,)-1G,, r __ T _ = 1 P(n— 1) _ A,,P(n 1)x,,x,,P(n 1) . 1—A,,_ l—A,,+A,,G,, For the WRLS estimate G,,, we have a. = Pm [)3 ( H (1 — A») Ann-ye] i=1 j=i+1 _ 1 ' A,,P(n — 1)x,,x£P(n - 1)“ " ' n , _ 14” P(n—1)— 1—A..+A..G.. Z; _1'1(1—1,) A,x,y, . . i— J—i+1 _ 1 ' A,,P(n — 1)x,x£P(n — 1)“ " 1—.\,. _P("‘1)‘ 1—A,,+A,,G,, J n—1 n-1 ' (1 - A102 H (1 - *1) MHz/1+ Anxnyn i=1 j=i+1 A,,P(n - 1)x,,xT0,,_1 1 A2P(n — 1)x,,G,,y,, = G,,- - n — n - n n - n 1 1—A,,+A,,G,, +1—A,[’\ P(“ my 1—A,,+A,,G,, _ 0 _ A.P(n — 1)x.x£0.-1 + A.P(n - 1)x.y. (1 1.6.. ) ' "‘1 1—.\,,+.\,G,, 1—.\,. 1—A,,+A,,G,, __ 0 _ AnP(n - l)xnx£0n_1 A,,P(n — 1)x,,y,, ‘ "‘1 1 — A,, + A,,Gn 1 — A,, + 1,0,, = 911-1 + A,,P(n — 1)x,,£,, 1-A,,+A,,G,,' 19 Note that 1 A,,P(n — 1)x,,G,, P(n)x,, _ 1 __ 1,, [P(n— 1)x,, — 1 __ 1,, + (“an _ P(n — 1)x,, (1 _ A,,Gn ) — 1—.\,, 1-A,,+A,,G,, _ P(n — 1)x,, - " 1 - A,, + A,,Gn Therefore 0,, = 0....1 + A,,P(n)x,,e,,. To show (2.11), consider Kn Thus the proof is complete. n-1 n-1 (1 — An) 2 ( H (1 "' Ail) )‘i(7i ’ Ila?) + An(7n — 31721) i=1 '=i+1 +(93i—1 + AanP(n)6n)R(n)(0n_1 + AnP(n)Xn€n) n-1 n-1 (1 - A») Z ( H (1 - 15)) 11(71- 1!?) + Am. - 113.) J=i+1 +0£_1R(n)0n_1 + 2A,,x,7,' 0,,_1 5,, + Aix£P(n)x,,ef, n—1 n—1 (1 - An) 2 ( H (1 " A») A471“ 3112) + An(7n - 11:) + 31193—139167: i=1 j=i+l i=1 ' P in — 1 x,,6,2, +(1 — An)0£-1R(n - 1)0'1-1 + An0£_lxnx£0n-l + Aile —( A" +)’\nGn (1 — An)Kn—1 + 1.17. — 143.) + mien-1)” + 21.19? 92-15" + "i 1 — 13:53.0. (1 — A,,)nn_1 + Arm. — A. [112 - 2(XZOn-1)yn + (’91.' 
911-02] + A311 _ AffanGn (1 — An)"n—1 + A,,), - 1,5,2, + 13.1 _ Affinan (1 - Anya-1 + A1177; — 1A:(:n—+’\;:§n ' frf‘ 20 2.2 Initial Values The covariance matrix in SM-SA is i=1 j=i+1 Rm= :(fiu-n)ms In order to start the recursive algorithm, it is necessary to choose initial values for 0,, and P(n). If we incorporate the initial condition into the covariance matrix, then it will become R(n): 1210—1.) no+2nj(1'[ (1—1,-))1.T.-x,-x. (2.19) i=1 i=1 j=i+1 Note that [H?__.1(1 — A,-)] R0 will not be zero if R0 is chosen to be positive definitive. Generally speaking, P(O) = R61 reflects the confidence in the initial estimates 00. If E- .. P(O) is large, from (2.10) we see that the next estimates, 0,, will jump away from its initial value. Without any a prior information about 00, it is common to take 00 = 0 and P(O) = 0'1. where a is a “large” number, typically 106 so that the impact of the choices of initial values is small. From (2.19) we can also see that this choice will ensure the estimates to be close to the off-line LSE estimate. Note that given any initial condition P(O), if n “11:30 11(1 — 1,) = o (2.20) i=1 then the recursive estimates will approach to the off-line estimates asymptotically. Using material in Appendix A, (2.20) implies that iA;=oo n=1 From theory of stochastic approximation, as discussed in Chapter 1, this is an essential 21 condition for the convergence of estimates to the true parameters. 2.3 The Optimal Weight The optimal weight A; of SM-SA is found by using the minimization criterion 1 introduced in Chapter 1. That is, A; minimizes 1108101)} 3 det{x,,R'l(n)}. Let B(n) é x,,R'1(n). Then by definition, B(n) = x,,P(n) x,, A,,x,,x£B(n — 1) ' B(n ' 1)(1 — 1,.)1r;,,_l ’ n,,-1(l — 1.. + AnGn) ' Define h,, 2 1 — 1,, + 1,,G,,, (2.21) and A 1 m. An‘rn Ans“ dn = _ _ 1 + — —-"—.. 2.22 I — A,, x,,_1 (1 - A11)1“1ri-1 "n-l hn ( ) Then B(n): B(n -- 1) {d,, I — d fl[B(A"x--——h—:n — 1)x,,]T} ,,_ and d ,,A,,x,, det B(n) = det B(n — 1)det {d,,I — Kn ——[B(n - 1)anT} Since B(n — 1) is a known quantity at time n, it suffices to minimize the volume ratio V(A,,) given by A det B(n) det B(n— 1). V(1,,)= Thus d ,,A,,x,, V(A,,) = det {11,1 — n. —’:;[B(n — 1)x,,]T}. (2.23) 22 Using the matrix identity [26] det(cI + sz) = c'"“‘(c + yTz) (2.24) and letting d A x 4 c=d,,, =—-1—n—n-, z=B n—l x,,, y 81-1 h.. ( ) then (2.23) becomes V(’\n) = dill-l (dn Kn—lhn ( A,,xznfl-1P(n - 1)x,,) 1 _ Kn-lhn (Kn—1hr: - AnKn-IGn) Kn-lhn (11,, - A,,G,,) h d,,A,,x,7,'B(n -— 1)x,,) u 11”: ll :1”: a“; a”? To minimize the volume ratio with respect to A,,, we must differentiate (2.25) and set the result to zero: 0 _ 6 _, a 1. m.) — K [m1 - 1.1». ] dart-I = Fug—ha (2.26) where 0d,, P(An) = (1 — A,,)mhnai— - (1,107,. (2.27) Since 6d,, _ 1 x,, 1 0 x,, a—A: — (I — A,,)2 n,,_1 + (I - A") 6A,, n,,_1 (228) 1 1 7,. _ 53,[1 — 1,, — 1,11,] —d.. + —— -1 + , (1 '- An) (1 '- An.) Kit-I hi’cn—l 23 (2.27) becomes m1.) = mh,, [d,, — 1 + “:1 — 531“ 22;":1*"h")] — 1.0... (2.29) The Optimal weight A; is found by setting F (A,,) = 0. Thus mh,, [d,, — 1 + “:1 - €:(1—113:\:..:1A;hn)] = d,,Gn. (2.30) Expanding the h,, and d,,, we have a quadratic equation a,,13, + (1,1,, + c, = 0 (2.31) where an = mm. - "163, + mem. — 2mGn7n - 161—10.. + ten—1G,”, + Gn7n - 0.2.7.1 - 53,0, b,, = 2me; — 2m7,, + 2mG,,7,, + 2n,,_1G,, — n,,_lG; — G,,7,, + 6:0,, 2 a,, = m7,, — me; — x,,_1G,,. This SM technique is called set-membership stochastic approximation (SM-SA) because of the resemblance to the stochastic approximation method. 
The central difference between D-H/OBE and SM-SA is that the A; in their algorithm is obtained by minimizing x,,, while in SM-SA.it is obtained by minimizing the volume ratio. Indeed, there is an ellipsoid associated with Dasgupta-Huang’s OBE at each step, but the meaningful measures of its size are ignored in the optimization process — sacrificing interpretability. 2.4 Data Selection In this section, we will derive the criterion for updating. From (2.25) and the definitions of h,, and d,, in (2.21) and (2.22), it is easy to see that V(O) = 1 and V(A,,) > 0 for all 0 < A,, < 1. Note that d,, and h,, are unity when A; = 0. Thus from (2.26) and (2.29), we 24 have 6V(’\n) = 2.32 0A7; fl:0 F(O) ( ) 2 ___ m( “In _ En ) _ G,, (233) Kn-l 511—1 6,, ‘ = ~14 (2.34) where 1c,,_1 > 0 by definition (2.4) for A,,-1 6 [0, 1). Equation (2.34) reveals some interesting relationships between the sign of c,, and the volume ratio, because if a,, < 0, there exists some A,, such that the volume ratio is less than unity, which means that the volume shrinks. Before we develop the main theorem on this point, we need some preliminary lemmas: Lemma 2.1 If G,, < 1, then h,, > 0 for all A,, E (11%;) and lim h,, = 0+ A,,—9%;- IfG,, > 1, then h,, > 0 for all A,, 6 (1710.71 1) and lim h,, = 0" 1 + An" 1-G,, where h,, is defined in (2.21). Proof: By definition, h,, = 1—1,,+A,,G,, = 1 — A,,(l — G,,). /—\. If/an) 1, then 1/(1 — G,,) > 1. So the interval (1,1/(1 — G,,)) is not empty. For all AJE (1, 1/(1 — G,,)), we have 1 1 0 and lim h,, = 0+. .__1_- ”"T1—Gn If G,, > 1, then 1/(1 -— G,,) < 0. Therefore the interval (1/(1 — G,,),O) is not empty and for all A,, E (1/(1 — G,,),O), we have 0 when A,, 6 (1/(1 - G,,),O) and lim + h,, = 0"". I burr-:6: Lemma 2.2 lim V(A,,) = +00 Anfll- Proof: From (2.25), we have V(A,,) = “39%. Since lim1,,_.1- h,, 9t 0, it suffices to show that lim 11;"(1 — A,,) = +00. Angl- 26 From the definition of d,,, «1:10— A.) = (1),")... (5:1)...“ - 1.) _ 1 ( x,,, )m — (I — A,,)m-l Kn_1 . The positiveness of x,, can be immediately seen from (2.4). Thus at any time n, x,,/x,,_1 > em__ fifl._._—r—----__ __ _ _. A- m“! - -—--" I~ 0' SO ”(I — An) approavChes +m as An approaches 1-. (I, *’ (xi, ‘I fr; he)" "f‘if'.’ . ' [3 Theorem 2.2 If c,, 2 0, then there is no extremum of V(A,,) in the open interval (0, 1). Proof: From (2.34) we know that aV(/\n) — > 01,. 1..=o - 0 when c,, 2 0. First we consider the case when c,, > 0, for which there are three possible sub-cases: 1. Suppose G,, < 1. Then there are two asymptotes at A,, = 1 and A,, = '1-_IG',I > 1. Claim 2.1 If c,, > 0 and G,, < 1, then A,,—91+ lim dn = —m. 1 _ burr-7:: Proof of Claim 2.1: From (2.22), A 1 x,, A,,7,, A,,e2 d = — = 1 + _ n n 1 -' An Kn—l (1 - An)"n--1 Kn—lhn Note that since lim h,, > 0, 0"] 27 and since x,,_1 > 0 at time n, the second term in the right hand side will dominate in the limit. So lim d“ = -w. Avg-91+ Similar arguments also apply when A,, approaches 1 / (1 — G,,)', because in this con- dition, 1 — A,, d roach zero, but h,, approaches 0” from Chi/1:131. So the last term dominates and 4 ///.’/’ 5k, d,, = --00. lim 1,.—+,—_15;" Cl Note that for all A,, 6 (1, 1 / (1 — G,,)), the sign of 1 — A,, and h,, remain unchanged, so from (2.25) and Claim 2.1, the value of V(A,,) near the two asymptotes (A,, = 1 and A,, = 1/ (1 - G,,)) will both approach +00 or —oo, depending on whether m is even or odd. 
Note that in Lemma 2.2 we have shown that V(A,,) approaches +00 when A; approaches 1‘, so minima can not exist in (0, 1) because otherwise more than two extrema will exist. Therefore there are only two possibilities in this case, as illustrated in Fig. 2.1. V(An) V(An) ‘ 1 1/ \+ ‘| U + Figure 2.1. Plots of V(A,,) when c,, > 0 and G,, < 1. 28 2. Suppose G',, = 1. Then there is only one asymptote at A,, = 1. Note that in this condition, a,, = -me,2, — 6,2, < 0. So c,,/a,, < 0. This means that the extrema have different signs. If there exists one extremum in the region of (0, 1), then there must be another extremum in (0, 1) since lim,\n_,1_ V(A,,) = +00. This is contrary to the fact that the roots’ signs must be different. Therefore we conclude that there is no extremum in (0, 1). 3. Suppose G,, > 1. Then there are two asymptotes at A,, = 1 and A,, = ,—_1G—"- < 0. We claim that lim + V(A,,) = +00. (2.35) An‘-’ l—Gn For convenience, we rewrite (2.25) as follows V(A,,) = 172$. Note that lirnA , + h,, = 0+ from Lemma 2.1, 1 - A,, > 0, and fl_’"Tn 1— 2 d,, = 1 x,, = 1 + A,,7,, A,,(sfl 1 — ’\n Kn—l (1 " An)Kn-1 — Kn—lhn —) +00. (2.36) Equation (2.36) holds because h,, -> 0+ and A,, is negative. Now (2.35) is satisfied which implies that there must be one local minimum in the interval (1 / (1 - G,,), 0). If there is one extremum in the interval of (0, 1), then there must be another extremum since lim (v.1- V(A,,) = +00. This situation can be seen in Fig. 2.2, which shows the existence of more than two roots in (2.31)—contradiction. Thus we conclude that there is no extremum for A,, E (0, 1). Suppose c,, = 0. Then and am.) 0 0A,, “:0 . We.) 11/ W =0”- 29 V(A,,) 1 1/-\./ :15; I A" Figure 2. 2. When G,, > 1 and c,, > 0, if there exists one extremum of V(A,,) 1n (0,1), then there exist more than two extrema. . This means that V(A,,) is convex in the neighborhood of A,, = 0 and there is one minimum when A,, = 0. If there exists some extrema in (0,1), there are at least three extrema which violates (2. 31) since the number of extrema must beceyen.) So we ~conclude that there 18 no extremum in (0, 1), and if there exist some A,, in (0,1) suchlthht (2.31) holds, it must be a saddle point. The only possibilities with and without saddle points can be seen in Fig. 2.3. C] V(A,,) V(An) (I) (b) Figure 2.3. When c,, = 0, the the only two possibilities for V(A,,) in (0,1) are: (a) There is a saddle point, or (b) no extrema and saddle points exist. 30 Theorem 2.3 Suppose c,, < 0. Then V(A,,) has a unique minimum in the open interval (0, 1). Proof: When c,, < 0, we have 3V(/\n) 0. 6A” A,,:O < By construction of the algorithm, there are two A,,’s such that (ax/(1,.) ___ 6A,, 0. Also, from Lemma 2.2, lirn V(A,,) = +00, A,,-91‘ so there is at least one minimum in (0, 1). If there exist more than one extremum in (0, 1), then there must be at least three extrema as shown in Fig. 2.4, contrary to (2.31). Therefore, there must be exactly one minimum in the interval [0, 1). Cl V(An) A \ Figure 2.4. When c,, < 0, if more than one extremum appear in (0, 1), then there must be at least three extrema. 31 From Theorems 2.2 and 2.3, we conclude that there exists an optimal weight if and only if c,, < 0. Theorem 2.4 There are always two real roots of the quadratic equation (2.81). Further- more, if c,, < 0, then there is a unique root A; E (0, 1) with A" _ —b,, + Vb; — 4a,,c,, 2a,, (2.37) Theorem 2.4 implies that if c,, < 0, then the optimal weight A;, which belongs to (0, 1), can be obtained from (2.37). 
Proof: First we will show that the real roots of (2.37) exist. Let a,,, b,,, and c,, be coefficients of (2.31). For convenience, we rewrite them: a,, = m7,, — me; + mG;7,, - 2mG,,7,, - 53-10,, + x,,-lG; + G,,7,, - 0,2,7” — 5:0,, 0- a l 2m5; — 2m7,, + 2mGn7,, + 2n,,_1G,, — x,,_1G; — G,,7,, + 5;G,, _ 2 Define 5,, and b,, as: "D (m — 1)G;7,. (2-38) @I IID G,,(2m7,, — x,,-lGn — 7,, + 5;). (2.39) Then and 32 Note that the expression of 6,, and b,, are exactly the same as the coefficients of the quadratic equation in (1.9). Let the solutions of (2.31) be A; and A3,. Then A1 _ _bn + Vb; — 4a,,c,, ” 2a,, A2 —b,, — Mb“! — 4a,,c,, 2a,, Note that b; - 4a,,c,, = (—2c,, + b,,)2 — 4c,,(c,, — b,, + a,,) = 4c; — 4c,,b,, + '13,”, — 4c; + 4c,,b,, — 4c,,E,, = b,, - 4E,,c,, = Gib: + 463,1712711 — 2GnKn-1‘Yn - 253.711 + 03.03-; " 26iG'nKn-l '1' 5:) = G3.(G?.K3.-1 + 73. + 63. - 2G,,n,,_17,, + 253,1, - 2€§Gn~n-1) +46im’7nG3. - 403,535), = G;(G,,n,,_1 - 7,, — 5,2,)2 + 4£;G;m27,, — 46;G;7,, = G;(G,,K.,,_1 — 7,, — 5;)2 + 415,2,G27,,(m2 — 1) 2 0. n Thus two real roots always exist. To prove that A; = A;, consider the following four cases: 1. Suppose a,, < 0 and b,, < 0. Then «b,,; — 4a,,c,, < [b,,] since a,,c,, > 0. So —b,, :l: «b,,! — 4a,,c,, > 0. But since a,, < 0, both solutions are negative. This violates the existence of positive A; when c,, < 0. Therefore we can rule out this possibility. 2. Suppose a,, > 0 and b,, > 0. Then A; < 0. The only possibility to obtain a positive root is to choose A; = A3,. 3. Suppose a,, > 0 and b,, < 0. Then «b,,; — 4a,,c,, > |b,,| since a,,c,, < 0. So A; < 0 and A; > 0. To obtain a positive root, we must have A; = A3,. 33 4. Suppose a,, < 0 and b,, > 0. Then both roots are positive when c,, < 0. Since a,,c,, > 0, Mb; — 4a,,c,, < |b,,| and hence A; < A3,. To obtain the unique root in (0, 1), we must have A; = A;,. We conclude that A; = A; when c,, < 0. CI 2.5 Relations Between SM-SA and SM-WRLS As shown in Theorems 2.2 and 2.3, the ellipsoid will shrink if and only if c,, in equation (2.31) is negative. If we compare c,, in (2.31) and c,, in (1.9), we find that they are exactly the same quantity. This fact led us to explore whether relationships exist between SM-SA and SM-WRLS. The following simulation shows some apparent equivalencies between these two algo- rithms. Consider the system yn = 1-6yn—1 "’ 0°68yn—2 + ”n (2.40) where {v,,} is a sequence of pseudo-random numbers uniformly distributed in [—0.5,0.5]. A set of 120,000 data points is generated using the model. Table 2.1 and Table 2.2 are the results of the simulations for SM-SA and SM-WRLS using the system in (2.40). It can be seen that both algorithms update the parameters at the same time and result in the same “volume measure” and estimates. These empirical results, and others due to Krunz et al. [33, 34], led Deller et al. [38, 39] to the discovery that the volume measure defined by det B(n), the data points selected, and the parameter estimates 0(n) is independent of the choice of weighting sequences when UOBE is considered. Thus the relationship between SM-SA and F-H/OBE is expected to be the same as that between SM-SA and SM-WRLS. This implies that the present research on SM-SA has much broader implications for the entire class of UOBE algorithms. Details will be discussed in Chapter 5. On the other hand, SM-SA has some features which are not necessarily general. 
The 34 simulation results also reveal that the average weights satisfy Robbins-Monro’s require- ments in (1.3) under the condition that the noise input is a pseudo-random number with uniform distribution in [Vifl. Figure 2.5 shows the relationship between the time and the “averaged optimal weight” from six experiments. The vertical and horizontal axes are logarithms of weight and time, respectively. The two dotted lines are logarithms of 1/n and Mn“. The curve of the averaged weights is very close to the curve of 1 / n and hence the estimates could approach to the true parameter. Therefore, experimental results show that SM-SA has a great potential in theoretical development and its theoretical analysis can help explain some features in SM-WRLS which cannot be obtained directly from the original algorithm. 10‘ I W I V‘TV‘T I V I UU'FIU I I V I'UIU § a...~‘ ". ‘. P 't ~§ — .. ~~ .. \- ' Q '0. ‘.~ ._' .. . \‘. § lo-l - 102 - 103 ~ 105 '- --..,,_..'. q Lljlllll l 1 .Llllle l I lllll 106 . , 10° 101 102 103 Figure 2.5. The relationship between logarithms of time and weight 35 SM-WRLS SM-SA time volume measure 5,, time volume measure 5,, 1 1588030880 0.045102 1 6250000000 0.045102 2 61090735 0.325878 2 122037760 0.325878 3 75912.859375 -1.841745 3 93237.984375 -1.853359 4 67.021507 6.571213 4 66.320511 8.059097 5 19.407402 0.208971 5 19.290403 0.212660 6 17.094551 -0.261806 6 17.005600 -0.262197 7 6.657953 -0.219103 7 6.645070 -0.222424 8 2.566225 0.210988 8 2.566096 0.208843 9 0.861677 -0.235759 9 0.861815 -0.236259 10 0.399760 -0.076875 10 0.399806 -0.076872 11 0.191688 -0.458952 11 0.191711 -0.458865 12 0.191604 0.157472 12 0.191627 0.157516 13 0.169054 -0.062481 13 0.169072 -0.062429 17 0.168957 -0.355722 17 0.168976 -0.355732 18 0.130553 -0.420661 18 0.130561 -0.420695 19 0.102843 0.197666 19 0.102846 0.197650 20 0.089835 0.242533 20 0.089836 0.242511 21 0.051043 -0.607100 21 0.051041 -0.607123 23 0.044455 -0.433844 23 0.044454 -0.433845 41 0.044317 -0.433763 41 0.044316 -0.433759 55 0.043492 0.340501 55 0.043491 0.340501 58 0.043200 0.330139 58 0.043199 0.330139 115034 0.000072 -0.498830 115034 0.000072 -0.498830 115258 0.000072 -0.500641 115258 0.000072 -0.500641 115280 0.000072 -0.496111 115280 0.000072 -0.496111 115463 0.000072 -0.491712 115463 0.000072 -0.49l711 116302 0.000072 -0.483166 116302 0.000072 -0.483166 116350 0.000071 -0.499281 116350 0.000071 -0.499281 116643 0.000070 -0.503710 116643 0.000070 -0.503710 116738 0.000070 -0.501659 116738 0.000070 -0.501659 116934 0.000070 -0.500541 116934 0.000070 -0.500541 117060 0.000069 -0.499384 117060 0.000069 -0.499384 117354 0.000069 -0.491482 117354 0.000069 -0.491482 117706 0.000069 -0.502020 117706 0.000069 -0.502020 118516 0.000068 -0.503093 118516 0.000068 -0.503093 118709 0.000068 -0.498520 118709 0.000068 -0.498520 Table 2.1. The data-selection and the performance of SM-WRLS and SM-SA for the AR model (2.50). 36 SM-WRLS SM-SA time 01 “2 time 01 02 . 010-03101-1 115034 115258 115280 115463 116302 116350 116643 116738 116934 117060 117354 117706 118516 118709 0.000000 2.659994 2.659676 2.504596 2.485668 1.602177 1.600430 1.601612 1.601638 1-596280 1.594492 1.591593 1.602305 1.604294 1.602866 1.603187 '1.597315 1.600800 1.605809 0.000000 0.000000 2.490193 . 
-1.994164 -1.956294 -0.678578 -0.676362 -0.677525 -0.677555 -0.673320 -0.670988 -0.668343 -0.678133 —0.680194 -0.678490 -0.678821 -0.678648 -0.681055 -0.682360 .01000101-1 115034 115258 115280 115463 116302 116350 116643 116738 116934 117060 117354 117706 118516 118709 0.000000 2.662042 2.661741 2.505078 2.485804 1.602176 1.600430 1.601612 1.601639 1.596280 1.594492 1.591593 1.602305 1.604294 1.602866 1.603187 1.597315 1.600799 1.605808 0.000000 0.000000 -2.496096 -1.995054 - 1 .956493 -0.678578 -0.676362 -0.677525 -0.677555 -0.673320 -0.670988 -0.668343 -0.678133 -0.680194 -0.678490 -0.678821 -0.678648 -0.681054 -0.682360 Table 2.2. Estimates of SM-WRLS and SM-SA for the AR model (2.50) where a, = 1.6 and a; = —0.68. Chapter 3 Convergence Issues In the previous chapter, we have developed the SM-SA algorithm with a selective updating scheme and non-increasing volume. In this chapter, the convergence properties of this algorithm will be discussed. However, for all the estimation problems, the parameters in model (2.1) can not be determined unless some conditions are imposed on the input signal. A commonly assumed condition is persistency of excitation (PE)‘in which the input signal is sufficiently rich in frequency components. PE has been defined in a variety of ways [15, 17, 19, 20, 21, 22]. The following definition has been used extensively in adaptive control and system identification. Definition 3.1 A signal u(t) is said to be persistently exciting (PE) of order n if for all t there is an integer N such that t+N 911 < Z w(thTUC) < 021 (3-1) h=t+1 where 91, 92 > 0 and the vector w(t) is given by 90(1) = [u(1 - 1) - - °u(t - ”HT 'Please read “PE” as “persistency of excitation” or “persistently exciting” as appropriate. 37 38 The underlying meanings of the PE condition have been discussed extensively in the literature [6, 15, 17, 10]. From a frequency-domain point of view, a stationary signal u(t) is persistently exciting of order n if its frequency spectrum is non-zero at n points (or more). That is, its frequency components have to be rich enough so as to drive the estimates toward the true parameters. According to this definition, white noise is persistently exciting of all orders. Research in SM-theory, however, is concerned with PE of x,,, the regression vector. Anderson and Johnson [15], and Bitmead [17] proposed that the PE of the input signal implies the PE of x,, under some mild restrictions on the system function. The major result is given in the following theorem: Theorem 3.1 [15, 17] Consider a linear system with transfer function H (z) = B(z)/A(z) where A(z) = z” — alzr".1 — ~ - - — apz”'1’ and 8(2) = boz" + 512’"1 + - - . + qu”“1, which are co-prime, and with input sequence {uk} and bounded output sequence {yk}. Then with .f/ l“\~/ x; = [y,,_1, . . .,y,,_,,,u,,,u,,_1, . . .,u,,_q], PE of {a,,} implies PE of {x,,}. Non-coprimeness of A(z) and 8(2) usually renders the parameters non-identifiable. This discussion, however, is far beyond the scope of this research. For LSE estimates of the parameters of a finite impulse response model with n param- eters, the estimate is consistent and the variance of the estimate goes to zero if the input signal is persistently exciting of order n. 
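As an aside before returning to the D-H/OBE argument, the windowed excitation condition can be checked numerically. The sketch below is illustrative only (the window length, the thresholds, and the regressor construction are assumptions, not quantities from the text): it tests whether the eigenvalues of each windowed sum of regressor outer products stay between fixed bounds, in the spirit of (3.1).

import numpy as np

def persistently_exciting(phi, N, rho1, rho2):
    # phi: sequence of regressor vectors; test rho1*I < sum_{k=t+1}^{t+N} phi_k phi_k^T < rho2*I
    # for every start time t, via the extreme eigenvalues of the symmetric windowed sum.
    phi = np.asarray(phi)
    for t in range(len(phi) - N):
        S = sum(np.outer(p, p) for p in phi[t + 1:t + N + 1])
        eig = np.linalg.eigvalsh(S)
        if eig[0] <= rho1 or eig[-1] >= rho2:
            return False
    return True

rng = np.random.default_rng(3)
u = rng.standard_normal(400)                              # white input, PE of all orders
phi = np.stack([u[2:], u[1:-1], u[:-2]], axis=1)          # three lagged samples per regressor
print(persistently_exciting(phi, N=30, rho1=0.5, rho2=300.0))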
In order to deal with the convergence issues in D-H/OBE, Dasgupta and Huang claimed that if for all t, t+N 91I < Z xkxf < 921 (3.2) k=t+l for some N 6 N and 91, 92 > 0, then the covariance matrix C(n) = (1 - A;)C(n — 1) + 1:91.11; 39 in D-H/OBE, which has the same form as R(n) in SM-SA, is nonsingular. From Theorem 3.1, if the system function is co-prime, then the statement in (3.2) is validated. However, examining the proof of their argument, we find that the condition (3.2) is not sufficient because of the time index mismatch. In (3.2), It varies over all the natural numbers between (t + 1) to (t + N). In their proof, however, the time index varies only over those instances at which updates take place. Consequently, we require a more strict definition for PE which would accommodate non-uniform time indexing. Given that x,, is PE, the question of whether the subsequence of {x,,} in which the updates take place is PE or not is still an open issue and is difficult to prove formally. However, this problem can be simplified if we impose the mixing condition to the noise sequence. The mixing condition will be introduced in Section 3.4. For the moment, we assume that the subsequence of {x,,} used in updates is PE. For SM-SA with ARX model, we give the following theorem: Theorem 3.2 Given the update equation: R(n) = (1 — A,,)R(n — 1) + 1,,x,x£, 11(0) = no > 0 (3.3) the eigenvalues of R(n), for all n, will remain bounded if the signal energy is bounded, i.e., there exists an L 6 R such that xxx“ 5 L < 00 for all n, and 0 S A,, 5 1. Let {t,,} be a subsequence such that c,,, < 0 Vn. If there exists an N 6 N and 91, 92 > 0 such that for all ’12 I] n+N 911 < E x,,xg; < ml, (3.4) .1 k=n+1 / then R(n) is nonsingular. Proof: Use induction. Note that from the above recursion, R(n) Z 0 for all 71. So it suffices to show that trace(R(n)) < 00. v , ,J ~~ “111/ 40 l. n = 1 R(1)= (I — A1)R(0) + Alxnx; tr[R(1)] = (I - A1)tr[Ro] + A1t1‘[X1X¥‘] = (1 — A1)t1'[R.o] + A1X¥X1 S max[tr(Ro), L]. 2. Let tr[R(n)] S max[tr(Ro), L] < 00. We have to show that tr[R(n + 1)] S max[tr(Ro), L] < 00. Since R(n + 1) = (1 " A,,+1)R(n) + A,,+1x,,+1x;+l, tr[R(n + 1)] (1 — Art-H )trlR(n)l + An+lxz+lxn+l IA multr(R(n)),x£..x..0 IA InnaLXItr(R(")), L] IA max[max(tr(Ro), L), L] max[tr(Ro), L]. Therefore, the sum of all eigenvalues is bounded for all 71. So the eigenvalues have to remain bounded. The non-singularity of R(n) under the assumptions in (3.4) that the subsequence of {x,,} in which updates take place is PE has been proven in the paper by Dasgupta and Huang [20]. EJ 41 Theorem 3.2 ensures that the covariance matrix R(n) in (3.3) is always bounded and the boundedness property holds for all the noise processes. Although boundedness of R(n) is sufficient for proofs of some fundamental properties, positive definiteness of R(n) is essential in order to prove the convergence of parameter sets as discussed in the next few sections. 3.1 Fundamental Convergence Properties of SM-SA The most important feature of SM-SA, which differs from previous work, is the convergence of weighting sequence {A;}. In this section, the fundamental convergence properties of SM-SA are analyzed. [ N6] NHL-20") 'lC C(n/(761” hit 6] $5.10]; (1.; Theorem 3.3 Let (Rm, I] ' ll) be a normed space. Given the R(n) in (3.3) and the update equation (2.7)-(2.11). Let A; be defined as “bn+vb?;-4¢ncn 1.an < 0 A. 
20a ’ , 0, otherwise If infinitely many data points are selected, then the SM-SA algorithm developed in Theorem 2.1 converges in the sense that "11.120 A; = O (3.5) "11.113063, 6 [017nl- (3'6) Furthermore, if R(n) is positive definite, then "11.1130 “0,, — 0,,_.1|| = 0. (3.7) If there exists positive real numbers 91 and 92 and integer N1 such that for all k f. \ k+~1 glI < z x,,x; < ml (3.8) n=lc+l 42 then [31] a! _ t 2 4 n+N1 lim 0 — 0 < _ lim 1' 3.9 n—ooo ll nll2 — 91 "“00 £3.24 7 ( ) The proof of this theorem is very significant because, as a member of UOBE [39], SM-SA sets the stage for a formal proof of the converging behavior of this family. Before we prove this theorem, we need the following lemma: Lemma 3.1 Consider the coefi‘icients of the quadratic equation (2.31) and the conditions stated in Theorem 3.3. Let c; and c; be defined by c,,, i c < 0 c; = f " . (3.10) 0, otherwise 0: = cm ifcn > 0 0, otherwise Then lim,,_.°° c; = 0 if and only if lim,,_,oo A; = 0. Proof of Lemma 3.1: From Theorem 2.4 and the definition of A; in Theorem 3.3, if c,, < 0, then A" = —b,, + Vb; — 4a,,c,, n 2afl Let 'd',, and b,, be the same as in (2.38) and (2.39), then from the proof of theorem 2.4, we have —2 — b; — 4a,,c,, = b,, — 4a,,c,, If c,, < 0, then A; becomes ,_ 2c; - b,, + Vb; — 4(m — 1)G3,c,77 ‘ A = 2a C (3.11) Ti. Suppose lim,,_.°° c; = 0. If lim,,_.°o a,, 75 0, then from (3.11) we know that lim,,_.°o A; = 0. Iflim,,_.°° a,, = 0, then we claim that lim,,_.oo b,, 76 0 because from the definitions of 5,, and 43 b,,, we can write an+bn=_cn+6n and for c,, < 0, we have .1919 + 3,.) = .132. a» + .22. bn = lim 0,, n—ooo = lim (—a; +6“) 71-600 = lim 6,, fl—‘W = (m - 1)Gn7fl> That is, lim,,_.oo b,, > 0. Thus from (2.31), lim 1;; = — lim 91 = 0 n—ooo n-ooo bn Suppose lim,,_.oo A; = 0, it follows from (3.11) that when c,, < 0, then in the limit 2c; — b,, = —\/b; — 4(m — 1)G3,c;7n Thus / “11.00 'm;[4c — 4a,", b,, + b 3,] = "lim 0°[b; - 4(m— 1)G;c;'7,] and m[4c;" — 4c;b,, ] = “lim —4(m — 1)G,2,c,, 7] fl-Ougfi Suppose lim,,_.°° c; at 0. Then lim c,, =-nle [(m—1)G,, 7+b,,.] fl-‘m 44 The numerator of equation (3.11) becomes 2c; — b,, + «lb—,2, — 4(m — 1)G3,7flb,, + 4(m — 1)2G;7: = 2c; - b,, + \/(b,, - 2(m — 1)G,2,7,,)2 (3.12) Comparing (3.12) and (3.11), we have in the limit ,flb‘, — 2(m — 1)G§,7)2 = \/53, — 4(m — 00.2.9.7, (3.13) Note that in left hand side of (3.13), \/(b,, -— 2(m — 1)G3,7)2 < [b,,]; while -in right hand side of (3.13), \/b,2, — 4(m — 1)G,’,c,,7n > |b,,| since c,, < 0. This is a contradiction. So the only possibility when lim,,_..°,, A; = 0 is lim,,__.oo c,, = 0. 0 Proof of theorem 3.3: Suppose A; does not converge to zero. Then there exists a strictly increasing sequence (m,,);°=l such that A;,“ 2 a for some a > 0. From Lemma 3.1, we have lim sup,,_,°° c,, < 0. From (2.34), 6V(A,n,,) _ __ c,,,,, BAN“; Amn=0 — F(O) - nmn" . For the “volume measure” of P(m,,), we have det B(m,,) = det x,,,nP(m,,) = Km" det P(m,,) Since R(m,,) is always bounded, det P(m,,) = det R‘1(m,,) > 0. Also note that {det B(n)};g, is non-increasing, so 16,, is bounded. Since 1imsup,,_,°o c,, 314 0, there ex- /. \ ists infinitely many n such that 0V(A,,,,,) _ c,,," —- < —b 0A,"; Amnzo ”mu—1 where b > 0. So there are infinitely many n such that V(A;,n) 5 c for some 0 < c < 1. Let M,,,,, be the volume at time m,,, then Mm; = Vonle, ' - -V,,,,,. Thus lim,,_,°o M,,,,, = 0. 1 (.1' 45 Note that {Mn} is non-increasing. Thus we have 1im,,_.oo M,, = 0. By definition, M,, = det n,,P(n) = K? det P(n). wire] i/ I]. 
Since det P(n) is positive definite by Theorem 3.2, lim,,_.oo It? = 0. This leads to the conclusion that lim,,_.oo x,, = 0. Let 0,, = 0,, — 0", then 5n = yn - 07T—1xn = 031+ v,, — 0;_1x,, (3.14) ~T Since the volume of the ellipsoid converges to zero, the estimates converge to the true . - . ~T parameters. Thus 11m,,_.°° ||0,,-1|| = 0 and hence 11m,,_.°° 0,,_1x,, = 0. Thus, from (3.14) we have lim,,_.°o |e,, -— v,,] = 0. Since lim,,_.°° x,, = 0 and since c,, = 7717,, - me; - x,,_1G,,, we have .2261; = .liem<1-€i)-~n9n = lim r”(x-£3.) n-voo = .22.":(1- vi) 2 0 By the definition of c; in Lemma 3.1, lim,,_.oo c; = 0 and hence lim,,_..,,o A; = 0, contrary to the assumption. Thus we conclude that the sequence {A;} must converge to zero. It follows from (2.10) that lim,,_,°° ||0,, - 0,,_1|| = 0 when lim,,_.°o A; = 0. Thus (3.7) is satisfied. Consider the coefficient c,, in (2.31). At any time n, we can write 6,, = c; + 6,", where c; and c; were defined in Lemma 3.1. Since 1im,,_.°o c; = 0, we have liminf c,, = liminf(c;' + c,‘,’) 2 0. n-‘m n-‘w 46 Let s > 0. Then there exists an N E N such that for all n > N, m(7fl— 6;) — n,,_1G,, > —me. Let n > N. Then 7,1- 5; > -E + 3137:7031 > —e. t (3.15) Let s be arbitrarily small, then we have 7 - 63. Z 0- Thus a; 6 [0,7] and, hence, (3.6) is satisfied. -, Due to the slow convergence of the ellipsoid for large 71, one can argue that for sufficiently “K Km, —__,, large n, there exists a positive integer M such that M > N1 and [ya-H: — OZXnH] < \/7n+k, l S k S M In other words, no update takes place on {n + 1,. . .,n + M}. Substituting for y,,+), by 07x,,+), + v,,.H, and noting that |v,,+),| < ,/7,,+1,, we can write [(0'Y— 0,,)Tx,,+),] = [(0‘— 0,,)Tx,,+), + v,,+;, — v,,+),] (3.16) >4“ |(o"- 00’s.). +v.+1|+|v.+1| (3.17) 3 2w... 7 (3.18) for l _<_ k 5 N1. Therefore, squaring both side of (3.18) and, then, summing over 1:, 1 S k 3 N1, it is found that n+N1 J fl+N1 (0]- 0,,)T [ z: x,-X,T (J-On)<4 Z 7,- i=n+1 i=n+1 47 Hence, using (3.8), it follows that 1 4 "+N’- “9- 0..“2 < — 2‘ 7.- 91 i=n+1 Theorem 3.3 assumes that infinitely many data points are selected. The algorithm, however, may stop at some finite time N, that is, optimal weights no longer exist after N, when noise excitations are “too colored” or noise bounds are over-estimated. In this case the convergence properties (3.5) and (3.7) still hold since the volume of the ellipsoid and the estimates will remain constant, and the optimal weights A; become zero after time N. For c,,, recall that c,, = m(7,, — 5;) — x,,_1G,,. Since cn20forn> N, m(7,, — 53,) 2 Kn_1Gn > 0 (3.19) Equation (3.19) ensures that (3.6) still holds even if the algorithm stops at finite time. 3.2 Convergence of Parameter Sets and Suboptimal Test In this section, we will first investigate the condition under which the membership sets converge to a point. Then, a “suboptimal test” criterion is introduced. We will find that employing the suboptimal test not only improves computational complexity, but also results in excellent convergence properties. 3.2.1 Convergence of Parameter Sets Since the volume of the ellipsoid in SM-SA is non-increasing, it converges for all the sta- tionary signals and linear time-invariant models. The ideal performance for SM-SA is the convergence of membership sets to a point asymptotically. In this section, we will study the asymptotical properties of parameter sets. 48 Theorem 3.4 Let {t,,} be a subsequence such that 7,; - a?" < 0 Vn. 
If there exists an N E N and 91, 92 > 0 such that (3.4) holds, then the volume of the ellipsoid converges to zero if {t,,} is infinite. Proof: In Theorem 3.3 we have shown that lim,,_.°° A; = 0. Let c; be defined as in (3.10). Then from Lemma 3.1, lim,,_.°o c; = 0. That is, .- A ('0, £570 ,3 .41 ,1" _;_. .. H I’ll” c", ( H ’ 1 7 "lingo [m(7,, - 6;) — x,,_1G,,] = 0. . i (3.20) I/ 1 "l ,I - l \ Note that the ellipsoid at time n is defined by (f); L5 firm/r ’j 1/ (I 7/ P(n) = {0| (9 — 9,.)7’ R(n) (9 — 0,,) < x,,}. Thus it is obvious that x,, is positive if P(n) is non-empty. We claim that there exists a a > 0 such that x;B(n — 1)x,, V [1 0' < llxnllz det B(n _ 1). (3.21) To show (3.21), consider x;B(n - 1)x,, _ x£n,,_1P(n — 1)x,, (3 22) “x,,”2 det B(n — 1) — ||x,,||2n$_l det P(n - 1) ' T _ x,,P(n 1)x,, (3.23) 1c""'1||x,,||2 det P(n - 1) n-l Since the PE condition defined by (3.4) holds, P(n) is positive definite. Therefore 0 < A,,,,-,,||x,,||2 g x£P(n — 1)x,,, 49 where A,,,,-,, is the minimum eigenvalue of P(n — 1). Now A,,,,-,,||x,,||2 x;P(n — 1)x,, __ x;B(n - 1)x,, Isl-11 ||x,,||2 det P(n - 1) - 11:31 ||x,,||2 det P(n — 1) le,,||2 det B(n — 1)' That is, A,,,,-,, xZ'B(n - 1)x,, 52:11 det P(n - 1) - ”x,,“2 det B(n — 1). rn-l By the positive definite property of P(n), we have det P(n — l) < 00. Also x,,_1 < 00. Thus we conclude that there exists a o > 0 such that A,,,,-,, x;B(n — 1)x,, 162:1] det P(n — 1) - ”x,,“2 det B(n — 1)’ a< and, hence, (3.21) is satisfied. In the remaining part of this proof, we consider the subsequence {t,,} only. In order to simplify the notation, we replace the time index t,, by n for the rest of this proof. Since x,, > _. satisfies (3.4), there exists a 6 such that there are infinitely many 11 for which ||x,,||2 > 6. Let c > 0. According to (3.20), there exists an N, such that if n > N, then [m(7,, — 5,2,) — x,,Gn] < 660. (3.24) Let n > N1. Since 7,, - e; < 0, we have from (3.24) that m ]7,, - 5;] + x,,_1G,, < 660. Thus 0 < n,,_1G,, = n,,-1x;P(n — 1)x,, = x;B(n — 1)x,, < 660'. By (3.21), 0 < 0‘||x,,||2 det B(n — 1) < x;B(n — 1)x,, < 660. Thus 6 det B(n — 1 < ——e. ) "an? 50 Note that there exists an N > N1 such that “IQ/[[2 > 6. Therefore 6 detB N—l <——e N. (3.25) By definition, the volume of the ellipsoid converges to zero. Cl 4 Recall that 6,, ; —0,,_1x,, + v,,. Therefore 7,, — 5,2, < 0 if and only if lvn " éZ—lxnlz > 7n (3.26) If the volume of the ellipsoid keeps shrinking, the value 034x" will decrease to a small number as n becomes large.\When 011x" is small, from (3.26) we see that v,, must be sufficiently close to the 6:31:41: thaftj7n — 5,2, < 0 for infinitely many 71. This is very important for the convergenceitdf the parameter set. In Section 3.3.2 we will show that if either the noise bound is over-estimated or the realization of the noise process does not visit an arbitrary neighborhood of the bound infinitely often, then the ellipsoid can not converge to a point asymptotically. 3.2.2 A Suboptimal Test for Innovation in the SM-SA Algorithm Theorem 3.4 leads to the pursuit of a suboptimal test. Recall that one important feature for members of UOBE is “selective updating.” That is, no updating is taken when incoming data do not contain useful information. Sub optimal test is a small modification of the updating criterion associated with SM-SA. Instead of checking c,, = m(7n - 5;) — n,,_1G,,, the suboptimal test checks only 8,, g 7,, - 5; since the negativeness of 6,, implies the 51 - r . negativeness of c,,. 
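The two innovation tests can be summarized in a few lines of code. The sketch below assumes that the coefficients $a_n$ and $b_n$ of the quadratic (2.31) are computed elsewhere, following (2.38)-(2.39); only the decision logic and the closed-form root from Theorem 3.3 are shown, and the function names are mine.

```python
from math import sqrt

def innovation_check(m, gamma_n, eps_n, kappa_prev, G_n, suboptimal=True):
    """Decide whether sample n carries information.  The suboptimal test needs
    only the prediction error eps_n (O(m) work); the full test also needs
    G_n = x_n^T P(n-1) x_n.  A True return value means "update"."""
    if suboptimal:
        return gamma_n - eps_n**2 < 0            # negative implies c_n < 0 as well
    return m * (gamma_n - eps_n**2) - kappa_prev * G_n < 0

def optimal_weight(a_n, b_n, c_n):
    """Root of a_n*l**2 + b_n*l + c_n = 0 lying in (0, 1); the coefficients are
    those of (2.31) and must be supplied by the caller.  Returns 0 when
    c_n >= 0, i.e., when no update is warranted."""
    if c_n >= 0:
        return 0.0
    if a_n == 0.0:                               # limiting case noted in Lemma 3.1
        return -c_n / b_n
    return (-b_n + sqrt(b_n**2 - 4.0 * a_n * c_n)) / (2.0 * a_n)
```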
Note that the unique root A; on (0, 1) still exists for suboptimal test. Suboptimal testing has been studied in previous work [28]. The computational benefits of employing a suboptimal test can be substantial since the updating criterion is reduced to 0(m) computations where m is the model order. The weights obtained from suboptimal test, however, may not be optimal in the sense of volume ratio minimization. In order to obtain similar convergence properties as we do in SM-SA, define c,, ifE,,<0 n :1 || 0 otherwise Lemma 3.2 Given the coefl‘icients of the quadratic equation (2.31) and the conditions stated in Theorem 3.3, we have lim,,_.°° 5,, = 0 if and only if lim,,...°o A" - 0, where A; n— is obtained by solving the quadratic equation a,,A; + b,,A,, + c,, = 0 when 8,, < 0 and is set to be zero when 5,, Z 0. We do not intend to give formal proof for this lemma because 2,, < 0 implies t t c,, < 0 and hence the proof follows directly from that of Lemma 3.1. Only note that c,," proof of lemma 3.1 must change to é,,. The importance of the suboptimal test in theor cal analysis can be seen in the next theorem. Theorem 3.5 Let (33m, ll ’ ll) be a normed space. Given the R(n) in (3.3) and the update equations (2.7)-(2.11), if R(n) is positive definite and the suboptimal test is employed with infinitely many data points selected, then the SM-SA algorithm developed in Theorem 2.1 converges in the sense that "lingo 1:, = 0 (3.27) lim det B = 0 (3.28) n-ooo 52 Proof: If we change c,, in the proof of Theorem 3.3 to En, then by Lemma 3.2, the proof of (3.27) follows directly from the proof of Theorem 3.3. Since infinitely many data points are taken using the suboptimal test, there exists infinitely many points such that 7,, -E; < 0. By Theorem 3.4, the volume of the ellipsoid converges to zero and hence (3.28) is satisfied. 0 Again, if the algorithm stops at finite time N, then lim,,_.oo A; = 0 for n > N. Thus (3.27) still holds. In this case, however, the volume of the ellipsoid cannot converge to zero. Simulation results will be presented in the next chapter in support of the above analysis. 3.3 Convergence of the Parameter Set with White Noise Case or Errors Disturbances White noise has been studied frequently in the analysis of many estimation algorithms as well as model structures. A sequence {v,,} is defined to be white noise if 131),, = 0 (3.29) and 0 ° 0 Ev,v,-_,- = J 71 (3.30) a2 j = 0 where 02 is referred to as variance of the process. If (3.30) holds, {v,,} is said to be uncorrelated. It is common to assume that {v,,} is second moment ergodic so that the statistical properties in (3.29) and (3.30) can be realized by time averages, that is, O 1 n “1320 if ‘E— v,- — 0 (3.31) and 1 " 0 j 0 lim — E v,-v,-_,- = # (3.32) 53 For LSE algorithms, white properties have been shown to be sufficient for the convergence of the estimates for the structure in (2.1) [13]. Since the estimates in SM-SA belong to a class of LSE estimates, it is of interest to know whether the white property of noise excitation also applies to SM-SA. In this section, we will first consider a special class of white noise which is called i.i.d. white noise in which the random variables v,- and v,- are not only uncorrelated but also independent, and identically distributed for all i, j. Following the analysis of the i. i.d. case, the general white noise processes defined in (3.29)-(3.30) or (3.31)-(3.32) are considered. 3.3.1 IID White Noise Case Theorem 3.6 Suppose the noise process {v,,} is an i.i.d. 
white noise in which {v,,} is zero mean and uniformly distributed on [—,/7,,/7] for all n. If SM-SA is employed with noise bound given by v; < 7 for all n, then the volume of the ellipsoid converges to zero. Proof: LetFé{n€N:c,,<0}={n€N:c;#0}andletnEF. Since limn m c- = 0 >_ I, ;. f7 T n ’ / . . , ' 2 ._ ‘r f— .‘f ILG' 7‘» .w . ' [ nanolo m(7 - En) - ”71-1071 — 0' 'r My .117. ,. 1 (4 Note that '1 ' x,,_1G,, = n,,_1x;P(n — 1)x,, = x;B(n — 1)x,,. Thus 0 — — . 2 T — 11111 c,, — “lingo [m(7 — c,,) — x,,B(n - 1)x,,] — 0. (3.33) n—voo Equation (3.33) implies that in the limit _ 3:58.495. m . m :N 54 We may assume that ”x,,“ 76 0 since the probability of ”x,,“ = 0 is zero. Suppose the volume of the ellipsoid does not converge to zero, that is, lim,,_.0° det B(n — 1) ¢ 0. Since B(n — 1) is positive definite, x;B(n — 1)x,, > 0. Thus "1:11:07—5; >0forn6F. For other 71 g F, the inequality 7 — 5,2, > 0 still holds since m(7 - 5;) > n,,-1G,,. Thus we .' ] _. . 1‘\ have ”lingo 7 — 5,2, > 0 for all n, (3.34) and there exists an 5 > 0 and a subsequence {m,,};,"’=1 such that 7 — 53,, > 5 Vn. (3.35) . ~T Since 5,, = —0,,_1x,, + v,,, we have 7 - 63.... (J7 + lem..|)(\/“7 - lsmnl) = («7+ Ism.l)(./~7- Ivm. - 03..-.xm1) >6. Thus ~T 6 —v,,—0 _x,, >—— 6 2,/-7 , , )4 I =6. ”R Therefore (3.35) holds if and )0nly if T |v,,,n — 0,,,n_1x,,,,,| < ,/7 — 5' Vn. (3.36) 55 Expanding (3.36), we have — (,fi _ a) < v,,," _ 0;_,..,,, < fi— 5, and thus _fi-Jf- (5' + 0:,n_1x,,,,,) < v,,," <‘\/7 — (5' '— 0:n_lx,,,,,) 1. Suppose 0:n_1xm; > 0. If —\f7_ < v,,," < —\/7+ 5’ (see Fig. 3.1(a)), then \/7- 5’ < |v,,,,,| < ,/7 and thus 6T vmn — mn-lxmfl “T = lvmnl + [emu-Ixmn “T > J7—5’, violating (3.36). Consequently, v,,," cannot be in the shaded area in Fig. 3.1 (a). Since v,, is uniformly distributed, the probability that (3.36) holds is less than 11%;:1. 2. Suppose 0:n_1x,,,,, < 0. If \/7— 5’ < v,,," < ‘/7, then “T ”T lvmn _ emu-Ixmn = vmn + lama-1x731: “T > J? — 5’ + 0,,,n_lx,,,,, > \f7- 6’1 violating (3.36). The graph interpretation is in Fig. 3.1 (b). Since v,, is uniformly distributed, the probability that (3.36) holds is less than 2%: Let the probability of ~0:_1x,, > 0 be q,,. Then, the probability that (3.36) is satisfied is less than 2 —e’ 2 —€’ 2 —e’ L+(1_qm)_fl_-_\/7_<1 gm" 2J7 2J7 — 2\/7 (3.37) 56 flvn) *m l 1 v» —,/7: 5’ o f; (a) flvn) 1 l v .5; e 0 " Figure 3.1. (a) If gin-1x171» > 0 and v,,," in shaded area, then a contradiction results. (b) If dandxm; < 0 and v,,,,, in shaded area, then a contradiction results. Let A,, be the event that (3.36) holds at time m,,. Then from (3.37), PM.) _<_ 1% < 1. Now, given (3.36) holds at time n, we claim that . 2 _ a P(A..+1|A..) s i— < 1 277 , (3.38) where A,,.” is the event that (3.36) holds at m,,+1. To show (3.38), consider P (An+1lAn) = p (A,,, n {é;,,,-,xm,,, > 0} | A,,) + P (A,,, n {é;,,,_,xm,,, < 0} | A,,) = P (A,,, | {éfimdxmfi > 0} n A,,) P ({é;,,,_,xm,,, > 0} | A,,) +P (A,,+1 | {9;n,,_,xm,,, < 0} n A,,) P ({0:n+1_1x,,,,,+, < 0} | A,,) P (-,/»7 + 5’ < v,,,,, < fil {é;,,,-,xm,,, > 0} n A,,) P ({0;,,,_,xm,,, > 0} | A,,) ~T ~T +1) (-\/‘7 < vmrH-l < W- 6’ l {omn+1-Ixmn+1 < 0} n A“) P ({9m3+1-Ixmn+1 < 0} l A") ' IA 57 Since {0,,} is an i.i.d. 
white noise process, /’T I "T f 2 7 — 5’ P (—‘/7+ 6, < ”mu.“ < WI {9mn+1—1xm,,+1 > 0} m A") : __\/;__ 2,/7 and ~T - 2 7 - 5’ P (—\/T< ”mu-+1 < fi- 6’ l {0mn+1-lxmn+1 < 0} n An) = —\g:—/—:r—_- Thus 2 7 - 6' -T 2 7 - 5’ ~T P(An+l|An) S f—fl.—' P ({omn+1—lxmn+l} > 0 l An)+-\g-—\/7_ P ({omn+1—lxfl‘ln+1 < 0} l An) ° T mn-l-l Let q,,., = P ({9 _,x,,,,,,, > 0} | A,,). Then 2 —€’ 2 -€' 2 -€’ P(An+l l A,,) = an—‘é-if—‘Y- + (1 - 9n+1)—\/77‘/—,7— : Jé—i‘fi— Thus (3.38) is proved and _ g 2 P(A,,+1 n A,,) = P(A,,+1 | A,,)P(A,,) g (%.7—) A similar method can also be used to show that 2 7 — 5’ P(An+2lAn+1 n An) S _‘éZfi— and 2J7 — 5’ 3 P(A,,+2 n A,,“ n A,,) = P(A,,+2|A,,+, n A,,)P(A,,+1 n A,,) s (W) . By induction, we have .. g " n—ooo n—soo 2J7 Thus we conclude that the ellipsoid converges to zero. 0 58 3.3.2 General White Noise Case Set-membership-based algorithms are derived from noise bounds. In Theorem 3.4 we showed that if there exists infinitely many data points such that 7fl -_£,2, < 0, then the volume of the ellipsoid converges to zero. Since 6,, = v,, - 91:13:", the geometrical meaning of 7,. - 5,2, < 0 is that v,, must be very close to the boundaries if more data points are to be used. Before we study the general white noise case, consider the cases in which the noise bounds are over-estimated or when the noise processes are highly correlated such that 12,2, < 7,. — 6 Va 2 N (3.39) for some N and 6. Claim 3.1 Suppose there exists N and 6 such that (3.39) holds. Then the ellipsoid does not shrink to a point. Proof: We prove this claim by contradiction. Consider the N and 6 in (3.39). Suppose the ellipsoid converges to a. point. Let n 2 N. Then (\/7—n- Ivnl)(¢7;+ Ivnl) > 6. Therefore f ‘/7 — v > —— n lnl V7n+lvnl 6 «7.. + WT. 6 2,/7—,,' Since x,, is bounded, by the stability assumption on (1.5), and since the volume of the ellipsoid is assumed to converge to zero, there exists an N1 2 N such that 6 7 <3V7n, lea-1‘43 <—— where 7 = maxneN{7n}. Let n 2 N1, then 7n—En 59 (m4 |€n|)(\/'7?I- lenl) («m lenlxw—n- Iv. - fax.» mm lens/7;— lvnl 461ml) > > (W? lenl)(2‘fi_ - Iéf..xnl) > (m+le.l)(2jfi-3jfi) = (muesxsjfi) > 5 Now the convergence of the ellipsoid to a point implies that x,, converges to zero. So there exists an N; 2 N1 such that 6 Kn_1Gn < - Vn 2 N2. Let n > N2. Then it > N and Cn E 6 O. 6 m(7,, " 512:) - "n-l Ga 3 6 This result is contrary to the assumption since the algorithm stops taking points after time N2. Thus we conclude that the volume of the ellipsoid can not converge to zero. Cl Claim 3.1 points out the impact of the noise bounds on the volume of the ellipsoids and thus leads us to study the relationship between the white noise process and the noise bounds. Let {0,,} be a white noise process defined in (3.29)-(3.30). If {v,,} is i.i.d., then from theorem 3.6 the volume of the ellipsoid converges to zero. The key point in the proof is that the magnitude of the noise can be arbitrarily close to the boundaries. If we eliminate 60 the condition of independence, then the noise may not be arbitrarily close to the boundaries. Consider the following example. Let {v,,} be a noise process taking only discrete values at -\/7, —\/273, 0, m, and fl. Let A be an event defined in an appropriate probability space, and vo be the initial condition such that P(vo 6 A) = 2/5. If vo E A, then {v,,} is defined as \/2 73, w" 2 0 —\/27/3, 10,, < 0 Otherwise fir 1 Thus, (v,,,n _<:1) is uniformly distributed over {—‘/7', —\/2773,0, V2773, fl}, and hence we can write 03, g 7 for all n. 
The conditional expectations are given by E(v,,|vo E A) = ‘/27/3P(w,. 2 0) - ‘/27/3P(w,. < O) = O E(v,,|vo ¢ A): fiP(1 < 10,. S 3) — fiP(—3 3 mn S —1)= 0. Therefore, B(vn) = B(vnlvo E A) P(vo E A) + E(v,,|vo ¢ A) P(vo ¢ A) = O. 61 Also, we find that a}, if s = t E(v,vt I 120 6 A) = , 0, otherwise and 131, if s = t E(v,vt | v0 5? A) = . 0, otherwise Consequently, 231, if s =t E(U,Ut) = 0, otherwise As a result, {v,,} is a white noise with mean zero and variance 1}. Furthermore, if second moment conditional ergodicity is assumed, as defined in (3.31)-(3.32), given v0, then 12,, is also second moment ergodic. If RLS algorithm is used to identify the model parameters in (2.1) with this noise model, then the estimates converge to the true parameters since on is uncorrelated. This may not happen to SM-SA if the initial condition v0 6 A, simply because 12,, can not be arbitrarily close to the boundaries. By Claim 3.1, the ellipsoid cannot shrink to a point. In this presented example 12,. can not be close to the boundaries since the dependency between on and 00 can not be eliminated, even for large n. In addition to this noise process, there are other white noise processes in which {v,,} rarely hits the boundaries. In this case, the performance of SM-SA may not be favorable. ‘ . For the following example, we will assume {ton} to be a sequence of pseudgrandom numbers with uniform distribution on [-0.5,0.5]. First, let {v,,} be a white noise process generated by passing {10"} through a first-order all-pass filter 0.5 + 2’1 HM = 1 + 0.52-1' Figure 3.2 and 3.3 show the sample correlation coefficients of {mu} and {v,,}. ,3 Both sequences have identical sample mean and second moment properties. \‘Consider the systems yn = Glyn—1 + a2yn-2 + aayn—s + vn 12" 62 -02 - - -o.4 - « -O.6 . -o.e - - '10 5 1o 15 20 25 30 35 40 Figure 3.2. Sample correlation coefficients of {mu} in the example on page 63 up to time lag 40. -02 - l -o.4 - « oer ~ -o.a - - '10 5 1o 15 20 25 so 35 40 Figure 3.3. Sample correlation coefficients of {v,,} in the example on page 63 up to time lag 40. 63 and ya = Glyn—1 + awn-2 + “syn-3 + wn where 01 = 2.1, a; = —1.48, and 03 = 0.34. Comparing the estimates obtained by RLS and SM-SA in Fig. 3.4 (a) and (b), we find that the behavior of SM-SA for these two noise processes is completely different while the estimates obtained by RLS are almost the same because both noise processes {v,,} and {mu} are white. The difference in SM-SA performance can be explained by analyzing the visitation frequency of the noise bounds. Fig. 3.5 compares the relative frequency of {mu} and {v,,} at each amplitude interval. In these figures, the horizontal axis represents the amplitude and the vertical axis represents the number of times the noise amplitude falls in a specific interval. We see that the distribution of {mu} at each interval is close to be uniform on [—0.5,0.5], while the distribution of {v,,} appears Gaussian, although it is not, within the interval [—O.9,0.9]. Since {v,,} rarely stays in the neighborhood of the noise bounds, the ellipsoid cannot shrink enough to provide good estimates as shown in Fig. 3.6. However, if we pass {v,,} through a soft-limiter such that v,, is set to be 0.5 or —0.5 if v,, > 0.5 and v,, < —0.5, respectively, and otherwise left unchanged, then the frequency of visitation take a neighborhood of the bound, namely 7 = 0.55, increases as depicted in Fig. 3.7. This new {v,,} still seems to have white properties as $116411 in Fig. 3.8. However, from Fig. 
3.9, we find that the performance of SM-SA in this case is much better than before. If we multiply the original {v,,} by two and pass it through the same soft-limiter, then the rate of visiting the neighborhood of the noise bounds by {v,,} is much higher as shown in Fig. 3.10. Figure 3.11 shows that in this case the performance of SM-SA is extremely good. If we examine the proof of Theorem 3.6, we find that the convergence of the ellipsoid does not depend on the first and the second moment properties of the noise input. That is, even if the conditions defined in (3.29) - (3.32) are not satisfied, the estimates may still converge to the true parameters if the “close-to-boundaries” condition is satisfied. For example, let {v,,} be a noise process obtained by passing {wn} (i.i.d. noise, see page 61) 64 O . .. -O.2 - - True ---- -O.4 - ~ 200 400 600 800 1000 (13) § Figure 3.4. Estimates of the parameter (13 when the system is driven by (a) an i.i.d. white noise (b) a white noise but not i.i.d.. 65 $883 ,8 N O .5 O Nunber of time the maglitlfle falls in the given intenlal O ‘1 O 8 2 .j i ' 0. .L i. i don-CooouozoIo-o-oiooun oooooooooooo I a I I I o o I I o o i o o I I I I C o l ‘5 Q I a o I i 2 u u '5 d O l I l I Mmberoltimemsmagtitude fallshthegiven interval 3. 31 l l \ j i 1 l j J O Figure 3.5. Distribution of {ton} and {v,,} at each amplitude interval. 66 4 white noise . 4 i.i.d. noise ‘ ‘5 . 10 A A A l A A l A A A A 10° 101 1o2 103 n Figure 3.6. The logarithmic plot of volume versus the actual time n. We see that the volume of ellipsoid is much smaller when the noise is i.i.d. \l O E £60- ........ . ................... ................... ................... . ................... . ........ § 3 5 l : E F 350. ........ , ................... 3 ................... i: .............................................. _ 3 _ gwl ........ ................... ...... gag. ...................... _.l_l_r1 F. .i. T. s t— - ' 20,. .. 5 .. . '6 E g 10" :t *' 1 ‘ G -O:2 (I) Amplitude Figure 3.7. Distribution of a white noise at each amplitude interval after passing {0,,} through a soft-limiter. 67 Sample correlation coefficients in to time difference a: 40 1 I U I T I V I -0.2 r . -0.4 - . -O.6I- . -0.8 - - '10 5 1b 115 2b 25 ab 35 40 time difference -02 - - True --- -O.4 ' 1 200 400 600 800 1000 n Figure 3.9. Estimates of the parameter a3 when on is passed through a soft-limiter. 68 g I jg I Y I r— .5200... ........ g ................... g ................... 4 .............................................. q S. 3 £150., ................................................................................................ 1 so 3 §3100' ............................................................................................... .1 E 3 E 50. ............................................................................................... _ E c -014 -o:2 o ” 0:2 0.4 Amplitude Figure 3.10. Distribution of a white noise at each amplitude interval after multiplying v,, by two and then passing it through a soft-limiter. 0- . .02. . True --- -O.4~ - 200 400 600 800 1000 n Figure 3.11. Estimates of the parameter (1;, when on is multiplied by two and then passed through a soft-limiter. 69 through an all-pole filter described by 1 H(Z) : 1 + 0.5.?"1 The output is bounded on [—1, 1]. From its sample correlation coefficients in Fig. 3.12, we know that {v,,} is colored noise. Figure 3.13 shows that the performances of SM-SA and 1 . . . . 0.8 - - 0.6 . 0.4 - 4 0.2 — . 0 . .02 r- _. -0.4 - - -O.6l- . 
-0.s - - '10 5 1b 15 20 25 ab 35 40 time difference ‘ Figure 3.12. Sample correlation coefficients of {0,,} up to time lag 40. RLS are severely affected by the autocovariance of {0,,}. Again, the distribution of {v,,} is examined in Fig. 3.14 where we find that {v,,} rarely appear in the neighborhood of the noise bounds. If we multiply {v,,} by two and pass it through a soft-limiter, as we described before, then {0,,} stays in the neighborhood of the noise bounds, $0.5, most of the time. Note that in this case the new {0,,} is still a color noise as shown by its sample correlation coefficients in Fig. 3.15. However, the performance of SM-SA is excellent as shown in Fig. 3.16, while RLS is still affected by the autocovariance. 70 0.3 - 0.6 - 0'4 _____________________ _Irye. .. _ _ _ _ _ _I 02 < 0| .4 -02 RLS - -0.4 SM-SA . 200 400 600 300 1000 Figure 3.13. Estimates of 03 by SM-SA and RLS when the noise is color. falls in the given interval Number of time Figure 3.14. Distribution of a color noise at each amplitude interval. 71 Sample correlation coefficients Lp to time difference - 4O 0.8 . 0.6 ~ 0.4 ~ 02 - - o . -O.2 - 1 -0.4 - « -O.6I- - 03 ~ - '10 1 1b 15 20 25 so as 40 time difference Figure 3.15. Sample correlation coefficients of a noise sequence obtained by passing {0,,} through a soft-limiter. as Tare-— ' J 0.6 . 0.4 - “I 'l ‘ ' l — i - ' ' - i ; SM-SA 02 ' 4 ° RLS ‘ -02 - -0.4» . 200 400 600 800 1000 n Figure 3.16. Estimates of the parameter 03 when the color noise {v,,} is passed through a soft-limiter. ' 72 The above examples show that the whiteness of the noise processes is not important in the performance of SM-SA, although it is important to the conventional RLS. The most important factor for the convergence of SM-SA is the probability for which the noise visits a neighborhood of the bounds. Thus, while SM-SA employs similar recursions to RLS, the two algorithms have totally different convergence properties. The convergence properties in RLS arise from the second moment properties; while the ones in SM-SA come from the noise bounding assumption. 3.4 Independence and The Mixing Conditions To generalize the convergence of SM-SA to colored noise and give a more rigorous treatment, we first introduce the measure-preserving transformation T. Let (5,?) be a measurable space consisting of a set 5 and o-algebra ? of subsets of S. Let P be a probability measure defined on (5,?) so that (5, ? , P) is a probability space. In our case, if X is a real set bounded by «7, X = l‘fiH/fl’ then S = X°°, the collection of all real sequences (vo,v1, . . .,). Let T : (5,?) —> (5,?) be a measurable transformation on (5,?, P), then T is said to be measure-preserving if and only if P(T"1 A) = P(A) for all A 6 ? [10]. In order to simplify our discussion, we can take the measure-preserving transformation T as a one-sided shifter, T(‘vo, ‘01, . . .) = (1)1, 02, . . ..) Note that P(T"A) = P(A) implies that P(T‘kA) = P(A) for k = 1,2,.... Thus P{(uo,v1,...,u,,_1) 6 Ba} = P{(vk,vk+1,...,v,,+k_1) e Bu} for all n and k and n- dimensional Borel sets Bu. 73 Definition 3.2 Let T be measure-preserving on (5, ? , P), T is said to be independent if for all A,B E ?, P(A n T‘"B) = P(A)P(B) Vn e N. T is said to be mixing iffor all A,B E ?, lim P(A n T‘"B) = P(A)P(B). (3.40) 7%..” The mixing condition is a very reasonable assumption since it reflects the physical characteristics of many common random processes. 
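The mixing property (3.40) can also be illustrated empirically. The Monte Carlo sketch below compares $P(A \cap T^{-n}B)$ with $P(A)P(B)$ for a noise sequence obtained by passing i.i.d. uniform noise through the first-order all-pole filter $1/(1 + 0.5z^{-1})$ used in the preceding examples; the events, the threshold $0.25$, and the sample sizes are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def mixing_gap(n_lag, trials=20000, length=200):
    """Monte Carlo estimate of |P(A ∩ T^{-n}B) - P(A)P(B)| with
    A = B = {v > 0.25}, where v is i.i.d. U[-0.5, 0.5] noise filtered by
    1/(1 + 0.5 z^{-1}); the gap should shrink as the lag n grows."""
    w = rng.uniform(-0.5, 0.5, (trials, length))
    v = np.zeros_like(w)
    for k in range(1, length):
        v[:, k] = -0.5 * v[:, k - 1] + w[:, k]
    A = v[:, 50] > 0.25                       # event at a reference time
    B = v[:, 50 + n_lag] > 0.25               # the same event shifted by n_lag
    return abs(np.mean(A & B) - np.mean(A) * np.mean(B))

print([round(mixing_gap(n), 4) for n in (1, 2, 5, 10)])
```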
In this research, let {0,,} be the noise input of the model in (1.5), then {v,,} satisfies the mixing condition if the dependency between v,- and 0,- decreases to zero when |i — j| —5 00. For most system identification problems, we assume that the PE condition on {x,,} holds over all natural numbers. Since the SM-SA has a “selective updating” feature, we should confine the consideration of the PE condition to the subset of {x,,} in which updates take place. Since we do not know in advance which points will be used in the updates, this requires that all the infinite subsets of {x,,} are PE. However, this problem can be alleviated if the mixing condition is imposed on {0,,}. Consider the following corollary given by Bitmead [17]: Corollary 3.1 If x,, is such that E(x,,x,7,') is positive definite and either x,, is ergodic or mixing, then there exists a constant 6 > 0 and a finite integer N such that n+N Z x,x,~T 2 51 > o (3.41) i=n holds infinitely almost surely. 74 In the present work, the upper bound of (3.41) is guaranteed by the stability assumption. Note that all the subsequences of {x,,} are also mixing if {x,,} is mixing. Thus if B(xan) is bounded below at the points which updates take place, then (3.4) is satisfied. The next two corollaries serve as an extension of Theorem 3.6. First we consider the independent case in which the mean of the noise process does not have to be zero and the distribution does not have to be uniform. Corollary 3.2 Let {v,,} be a noise sequence and let T be independent on (5, ?, P) according to Definition 3.2. Given 6, let D¢+ = [fi— c,,/7] and D,. = [—\/7', —\/7'+ c]. If SM-SA is employed and there exists a 6 such that for all n P(vfl E De...) > 6, and P(vn E DC.) > 6, then the volume of the ellipsoid converges to zero. Proof: Consider the proof of Theorem 3.6. If the volume of the ellipsoid does not converge to zero, then (3.36) must hold. Given 6’, by assumption there exists a 6’ such that for all n P(vn 6 D¢I+) > 6’, and P(vn 6 Dy-) > 6’. 1. Suppose bin-13%.. > 0. Then the probability that (3.36) holds is less than 1 - 6’. 2. Suppose 9:..-1xm. < 0. Then the probability that (3.36) holds is also less than 1 —6’. If we change the quantity 235$ in the proof of Theorem 3.6 by 1 - 6’, then we have 0 e I n _ nlLlIgoP(AnnAn_1“An_gn...A1)Snango(1—6) -0 and thus the ellipsoid converges to zero. 0 75 The key point to the proof of the convergence of the parameter set is the independence between vfl and 6,7,1] x,,. In order to extend the convergence property from independence to mixing condition, SM-SA has to be modified. The reason is that for the original SM- SA with the ARX model, x,, contains yn_1, the output at time n - 1, thus we cannot separate the dependency between vn and 611x" unless {v,,} is independent. However, if we modify the SM-SA so that x,, is not changed unless an update takes place, provided that for large n the time difference between two updates is large, then we can apply the mixing condition to show convergence. This is possible since x,, depends only on the events up to the previous update. This modified version is identical to the SM-SA algorithm r except for the regression vector x,,. In SM-SA, x,, = [y,,_1, . . .,yn_,,u,,,u,,_1, . . .,u,,..q], . . . . T while m the modified algorithm, x,, = [y;,,,. . .,y¢,,_P,u,,,u¢,‘,. ..,u,,_g] where y", u,, are the same as in (1.5), and {t,} is a subsequence such that c,,. < 0 and t,, < n 5 “+1. , Note that in this situation, the algorithm is identical to RLS with a decimated input. 
The corresponding decimator has a time-varying sampling rate, say D“, which is controlled by the data selection process. Generally, if y,, is wideband, then D" is large and the data picked are usually sparse. ‘ All the convergence proofs developed previously for SM-SA are still applicable to this modified algorithm, since those convergence properties are not affected by this modification. 1”" The following corollary applies to this modified algorithm. Corollary 3.3 Let {v,,} be a noise sequence and let T be mixing on (5,?, P) according to Definition 3.2. Given 6 and N, let D9,. = [fi- 6,‘/‘—7-] and D,. = [—fi, -\/7+ 6]. If the modified SM-SA is employed and there exists a 6 such that for all n P(vn 6 DC...) > 6, and P(vn 6 DC-) > 6, and there exists an M > N such that there are infinite number of interval [n — 1,n - M] on which no update takes place, then the volume of the ellipsoid converges to zero. 76 Proof: Consider the proof of Theorem 3.6. Again let {m,,} be the subsequence such that 7 - 53,,” > 6 for some 6 > 0. If the volume of the ellipsoid does not converge to zero, then (3.36) must hold. For convenience, we rewrite (3.36) Iv..." — airlxmnl < y; — e' where e": 72,, (3.42) Given 6’, then, by assumption, there exists a 6’ such that for all n P(vfl 6 Deli.) > 6’, and P(vn 6 Der-) > 6’. 1. Suppose airlxmfl > 0, then the probability that (3.42) holds is less than 1 — 6’. "v 2. Suppose bin-13%.. < 0, then the probability that (3.42) holds is also less than 1 — 6’. Again let A,, be the event that (3. 42) holds at time m,,, then 7"? Alli/h/AEE at”) : fillV' / l F(A,,) < P(- -,/~7+e’0) P(0T0_1xm,,>0) +P (-\/’7 < v"... < f- 6’ l Emu-1x11». < 0) P (6711;.-1an < 0) . m" 'H-————-—-——-~ly-—~—/'< . k 77‘ ‘ .C l/l/l‘~{jl Note that {6,1,4an > 0} does not depend on the events which take place after the ill. previous update. Let the time difference betweenqi and the time index of the previous update be 1:. Define W,,_k = A{0,,,"_1xm > 0} and Xn- - A{- f+ 6’ < v,,," < J7}. Then WM)“ X" E ? and P(Xn) al— 6’. By the definition of mixing condition in (3.40), an; P(X, n W,,-;,) = P(Xn)P(W -,). (3.43) Suppose P(W _k) is not zero. Then (3.43) can be written as m P(Xn 0W —k) k—ooo P(Wn_k) =klim P(Xn | W,,_,,)- _. P(Xn ). Let 0 < f < 6’, then there exists an N such that lP(Xn I Wn-N)—P(Xn)l<£r 77 and, hence, P(Xn) - E < P(Xn | W -N) < P(Xn) + E. (3.44) . "dn’A/v’, IMJ‘I Note that (3.44) holds only if no update takes place on - N, n}. Due to the slow convergence of the ellipsoid for large 11, most incoming data are rejected. By assumption, there exists a positive integer M Z N such that no update takes place on {n — 1,n — 2, . . (,1:— M}. Thus (3.44) is satisfied and F(A,,) g P (-,/7+ 6 < v,,, < J? | birlxm” > o) P (614me > o) +P (—fi < v,,,” < fi- 6’ | 6:“,an < 0) P (6143:...“ < 0) (P(—,/~7+ c’ < v,,,” < «7) + 6) P (6141:...” > 0) + (P(-fi < v,,.” < fi— 6’) + 5) P (bin-1x77»: < 0) 1-€+£<1, IA Q ( i ‘\, IA and P(A,|A,._N) ” »' IA P (—,/~7+ a < v,,," < (m {39;qu > 0} n A,,-N) P ({éiflxm > 0} | 11..-”) P +P (-,/j< v,,." < fi- 6’ | {gin—139m. < 0} n A,,-N) ({éZn-lxmn < 0} | A,,-IV) / i 1. IA 1-€+£ Therefore, P(An n A,,-N) = P(AnlAn-N)P(A,,_N) g (1 - 6’ + ()2 < 1. Note that P(An n A...l n ---n A,,-N) g P(An n A,,-N) g (1 — 5' + ()2 < 1, 78 and klgrgo P(An-HcN n An-HcN—l “-0 An) IA #520 P(An-HcN n An+(k—1)N n -° ' ° 0 An) 3 klim(1—6'+£)"=O. That is, the probability that the ellipsoid converges to zero is one. 
Chapter 4 Simulation Studies Set-membership based algorithms have shown great potential in applications for many signal processing and control systems problems involving parametric modeling [25, 26, 28], such as speech processing, image coding, neural networks, and adaptive control. This chapter is concerned with simulation studies using the SM-SA algorithm with optimal and suboptimal strategies for the properties discussed in Chapter 2 and 3. The simulations performed in this chapter are designed for those noise processes which are commonly seen in engineering applications. These simulations not only serve as a test for theoretical analysis, but also provide heuristic descriptions for those phenomena which cannot ‘ be proven by rigorous mathematics even though they work for most, engineering applications. We will see that the SM-SA performs better than conventional RLS in many problems if the noise bound can be correctly estimated. In Section 4.1 we use an AR(3) time-invariant model to examine the general properties discussed in Chapter 3, including the behavior of the estimates, the volume of the ellipsoid, the estimation error, and the resemblance of SM-SA to stochastic approximation. Section 4.2 deals with a time-varying system derived from a speech signal. Comparisons to the conventional RLS are also made. Section 4.3 is concerned with the suboptimal test when the model is time-invariant. Finally, a more general case in which the noise bounds are over-estimated for the time-varying system is considered in Section 4.4. For all the simulations in this chapter, the models are driven by the noise process {Sn} which is the name representing all the noise processes summarized in Table 4.1. Note that 79 80 5,, Characteristics of the noise processes wfl IID white noise with distribution U [—0.5, 0.5] v,, Obtained by g-(wn + 0.1) 2,. Zero mean binary Bernoulli sequence obtained from {wn} t,, Biased binary Bernoulli sequence obtained from {wn} Table 4.1. Noise processes used in the simulations in this chapter. the sequence {2"} in Table 4.1 is obtained by 0.5 wfl Z 0 Zn: —0.5 112,, < 0 and {t,,} is obtained by 0.5 10,, 2 -0.081 t,,: -0.5 ton < —0.081 4.1 Performance of SM-SA for Time-Invariant Systems In this section, an AR(3) model given by yn = Glyn-1 + (Wyn—2 + “33111—3 + Sn (4.1) is considered. This model has three poles at 0.8 :l: 0.2j and 0.5 when a1 = 2.1,a2 = —1.48, and a3 = 0.34. For these simulations, a set of 50,000 data points is generated to identify the model parameters using {5...} with 5,3 5 0.25 (4.2) Note that Sfl should be replaced by the random sequences given in Table 4.1. 81 4.1.1 SM-SA with Pseudo-Random Excitation In this section, the model (4.1) is driven by a pseudo random sequence {wn} with mean fl = -8.9 x 10", which is obtained using a C-language version of the machine-independent random number generator URAND describe in [11]. Only 1.58% of data points are selected by SM-SA. Figure 4.1 shows the non-increasing volume measure 11,, on a logarithmic scale for the model in (4.1). Figure 4.2 shows that the estimation errors Isnl falls into the dead zone [0, 0.5]. The first 1000 SM-SA estimates can be seen in Fig. 4.3. We observe that the estimates are “almost” consistent and converge at a reasonable speed with only 1.58% of the total data points used. Figure 4.4 illustrates the weighting sequence {A;} averaged over six experiments. It shows that the averaged {A;} is type l/n sequence, which implies that it satisfies Robins- Monro’s conditions for convergence. 
1o1o 10-10 _ Figure 4.1. The logarithmic plot of 11,. versus time n for the linear time-invariant AR(3) model with pseudo-random excitation. For LSE algorithms, the estimates are consistent when the noise excitation is white. If 82 Figure 4.2. The logarithmic plot of lenl versus time n for the linear time-invariant AR(3) model with pseudo random excitation. 2 " "' I ' I ' 'I - 1s « 1 .. as _ __ - _ ____ - o . -os . .1 4 -15 -. --- T - - . ----.-- 200 400 n soo 800 1000 Figure 4.3. First 1000 estimates of the parameters in the AR(3) model obtained using SM-SA and a pseudo-random excitation, where a; = 2.1, a; = -1.48, and 03 = 0.34. 83 4 10 _r '3 1O6r 1 " J A L4 .aaal A A 4 A AH 1° [0 A A A AAAA 1 A 2 3 10 1O 10 10 Figure 4.4. The relationship between logarithms of time and weight. we let 1),. = %(wfl -- 0.1), then the noise bound given in (4.2) still holds for {v,,} but the sample mean is now —-113. Figure 4.5 compares the performance of RLS and SM-SA. It is seen that RLS is severely affected by the bias of the noise excitation. The bias in SM-SA, however, is very small. Note that only 0.89% of the data points are used by SM-SA. 4.1.2 AR Models with Binary Bernoulli Excitation SM-SA algorithm is derived from bounded noise constraints. The weight A; is optimal in the sense that the feasible set is minimized given that the previous information is known. We may view the size of the feasible sets as a “confidence interval.” If the optimal weight exists most of the time, it is expected that the performance of SM-SA is much better than RLS because of the optimal weights. From the analyses in the previous chapter, we know that optimal weight A; exists when the magnitude of the noise is close to the bound. In this simulation, a zero-mean Bernoulli process {2”} is employed to drive the system. Let 7 = 0.25, then {2"} always hits the bounds. 80.11% of the data points are used. This is not surprising because {2"} is always at the boundaries and thus updating is carried out 84 True value a1 -2. 1 2.4 a . v w 2.35 - ‘4 2.3 2.25 2.2 2.15 2.1 2.05 2 x10‘ True value e2--1 .48 ' 1 '1 .2 f r I f -1 .3 ‘ -1.4 - SM-SA -1 .5 -1 .8 -1 .7 .. 1 '1'8 2 3 4 5 " x 10‘ True value 33-0.34 0.6 Y r U ‘I 0.5 ~ « 0.5 - J I 0.45 I q . 1 RLS 0.4 I H _ v -1 035‘ 4 0.3 SM-SA ~ 0.25 - " x 1 0‘ Figure 4.5. The estimates of model parameters by RLS and SM-SA for biased noise se- quence. 85 most of the time. One of the advantages of this extensive updating is that the volume of the ellipsoid converges very fast to a very small number as indicated in Fig. 4.6. Non-increasing volume when binary Bemoulli sequence is used I v Vfivvvv' *7 Figure 4.6. The logarithmic plot of pa versus time n when a binary Bernoulli sequence is used. A comprehensive comparison of SM-SA and RLS is shown in Fig. 4.7. If we replace {2"} by a non-zero mean Bernoulli sequence {t,,}, then, according to Fig. 4.8, the performance of SM-SA with 78.16% of the data points selected is not affected by the non-zero mean {t,,}. However, the RLS estimates are clearly influenced. 4.2 Performance of SM-SA for Time-Varying System In this section, a speech model with time-varying coefficients ya = 61(n)yn-1 + b2(n)yn—2 + b3(n)yn-3 + wn (4-3) 86 True value a1 -2. 1 2.137 f l I r v 2.12 RLS ‘ 2.1 1 - SM-SA 2.1 ‘r ‘. ‘ 2.09 . 2.08 . 2'07 1 1 5 2 2.5 3 x 10‘ True value a2--1.48 '1 .42‘ t r r f I '1 .43 l "' a; A , I am RLS j 4 '52 0.5 1 1.5 2 2.5 ' 3 x 10‘ True value a3-0.34 0.36 f l I I r 0.355 I _ SM-SA .4 1.5 2 2.5 3 x 1 0‘ Figure 4.7. 
First 30,000 estimates obtained using SM-SA and RLS when a binary Bernoulli sequence is used. 87 True value a1 -2. 1 RLS 2'1 1 I 2.1 1: SM'SA 2'09 0 5 1 1 5 2 2 5 3 x 10‘ True value a2--1.48 '1 .46 I T I T I -1 48 J: #SILSAI I -1.5 I - ‘1 .52 I q -1 .54 I -1 RLS '1 .56 cl '1'“ 0.5 1 1 5 2 2 5 a ' x 10 True value 113-0 .34 0.4" I I I I I 0.39 RLS - 0.38 - 0.37 - .1 4 s22; l .I 1 1.5 2 2.5 3 x 1 0‘ Figure 4.8. First 30,000 estimates obtained using SM-SA and RLS when a biased binary Bernoulli sequence is used. 88 is considered. Simulations for speech signals have been performed by Odeh [28]. Here we use a similar experiment just to examine the performance of SM-SA. True parameters in this model are derived using short-term linear prediction analysis [37] of order two with a 256-point Hamming window and Levinson-Durbin recursion (see [37]) on an utterance of the word “four” by an adult male speaker. The acoustic waveform is shown in Fig. 4.9. The data were sampled at 10kHz after 4.7kHz low-pass filtering. While more meaningful The acoustic waveform of the word "lour‘ 1 000 I I r I I 800- 600' 400" 200'- 400'- -600' soc» - -1 000° 1 000 2000 3000 4000 5000 ' 6000 11 Figure 4.9. The acoustic waveform of the word “four”. analysis of speech would involve model orders of 10-14 [37], this small number of parameters is used here so that the results are easily illustrated [28]. The true parameters for the word “four” are shown in Fig. 4.10. A set of 5376 data points were generated by driving this model by 11:". In this simulation, only 2.15% of data points are used by SM-SA with good performance. Simulation results by using SM-SA and RLS algorithms can be seen from Figs. 4.11 to 4.12. Similar results by using SM-WRLS can be seen in [28] in which 1.86% of data points are used. In each figure there are two curves, one for the true parameters as reference, the other for the estimates obtained by using SM-SA or RLS. 89 -1 _ q -1.5 - « '2 1000 2000 3000 4000 5000 (a) 3 . . . . . 2.5 ~ - 2 _ . 1.5 - ' ~ 1 . - 0.5 - . 0 - _ -0.s - -1 . q -1.5 - - '2 1000 2000 @3000 4000 5000 Figure 4.10. The “true” parameters for the word “four”: (a) b1(n) (b) bg(n). 90 The power of the SM-SA algorithm is evident in this simulation. Upon comparing the estimates obtained by SM-SA and conventional RLS algorithms, we find that SM-SA shows better tracking capability than conventional RLS. For most algorithms, there is a trade- off between improved computational complexity and improved performance. For SM-SA, however, improved performance comes along with improved computational efficiency. More discussion about SM-SA in time-varying system is left in section 4.4. 4.3 Suboptimal Test More computational saving can be achieved by employing the suboptimal test. In Section 3.2.2 we have shown that many convergence properties of the suboptimal SM-SA are similar to the “optimal” SM-SA. In this section, simulations and comparisons to the conventional RLS are performed to verify the performance of SM-SA and its convergence properties. Only the time-invariant AR(3) model given in (4.1), will be examined. The reason we do not apply suboptimal test to speech signal is explained in Section 4.4. 4.3.1 AR Models with Pseudo-Random Excitation First the model (4.1) is driven by {wn}. Only 0.46% of data points are used by the subop- timal strategy. Compared with 1.58% in “optimal” SM-SA, this is a significant saving. But more importantly, the checking procedure is 0(m). Figure 4.13 shows the non-increasing volume measure pa in logarithmic scale. 
Fig. 4.14 shows that the estimation errors [5"] also falls into the dead zone [0,05]. The first 2000 estimates can be seen in Fig. 4.15. We see that the estimates are consistent and converge at a reasonable speed with only 0.46% of total data points used. Figure 4.16 illustrates that {A;} is a 1/n-type sequence, satisfying Robins and Monro’s conditions for convergence. Only 0.19% of data are used if {v,,} is the driving process, Note that the same ex- periment performed in Section 4.1.1 used 1.03% of the data points. Figure 4.17 compares the performance of RLS and SM-SA. We see that the effect of biased noise on suboptimal SM-SA is much less than it is on RLS. 91 CD estimates 0.5 1 0 d -0.5 q -1 I- .4 -1.5 P - '2 1000 2000 3000 4000 5000 (a) 3 I I I r I 2.5 - . estimate 1 000 2000 (b )3000 4000 5000 Figure 4.11. Estimates of bl(n) for the word “four” using (a) SM-SA (b) RLS. 92 2.5 - ‘ 2 r ' 1.5 r ‘ 1 r ‘ 0.5 ' ‘ 0 estimates - -0.5 -1 (- true " .15 . . '2 1600 2000 ao'oo 4000 so'oo (I) 25- I 2 r . 1.5 - . 1 '- . 0'5 ' estimate ‘ 0 q 05 - '1 P true I -1 5 ~ J .2 .03. 2000 (1)35“ .50. 5000 Figure 4.12. Estimates of b2(n) for the word “four” using (a) SM-SA (b) RLS. 93 Non-increasing volume using sub-optimal test ' ' ""'I V ' """I ' ' ‘ ""'I P' I """I "' ‘ ' p p p 5 10 r L p 0 10 P y p I I -6I 1O - p A_ A AAAAAAJ. A A AAAAAAI A LAAU‘A‘ A A A AAAAAl A A A O 1 2 3 4 10 10 10 10 10 n Figure 4.13. The logarithmic plot of 11,, versus time n for the AR(3) model when the suboptimal test is used. Estimation error for sub-optimal bet 1o . - - "m, - - - - - a - - Figure 4.14. The logarithmic plot of |sn| versus time n for the AR(3) model. 94 1 - -- true parameters , s -0.5 - “ -1 . .. -1.5 200 400 600 800 1000 1200 1400 1600 1800 2000 n Figure 4.15. First 2000 estimates of the parameters using SM-SA, where 01 = 2.1, 02 = -1.48, and 03 = 0.34. Figure 4.16. The relationship between logarithms of time and weight. 95 True value a1-2.1 2.4 I I I I 2.35 ‘ 2.3 .1 2.25 ‘ 2.2 ' ‘ 1 _ FlL§ 2.15 - 2-‘ SM-SA I 2.05 ‘ " 1 2 3 4 5 x 10‘ True value a2--1 .48 '1 .2 I 1 I I -1 .3 r ' '1 .4 .1 SM-SA . -1 .5 g . .. '1 .8 . . hmA WA RLS - -1.7 1 -1.3 . 1 ‘ ‘ ‘ ' 1 2 3 4 5 n x 1 0‘ True value a3-0.34 0.6 I I I I I 0.55 . 0.5 - 0.45 - II ‘ RLS 0.4 l ’ I - 0.35 - SM-SA 0.3 r 0.25 -1 0.2 l 1 1 1 1 2 3 4 5 n x 1 0‘ Figure 4.17. Estimates of parameter using RLS and SM-SA for the suboptimal test. 96 4.3.2 AR Models with Binary Bernoulli Excitation Let 7 = 0.25 and let the AR(3) model is driven by {2”}. In the simulation for suboptimal test, 49.86% of data points are used. The corresponding experiment for optimal SM-SA used 80.11% of data points. Figure 4.18 compares the performance between SM-SA and RLS. We find that SM-SA shows excellent performance in the convergence of estimates even for the suboptimal test. If we change the noise process to {t,,}, Fig. 4.19 shows that the performance of SM-SA is not affected by the bias of excitation, which is similar to the result for the “optimal” SM-SA. In this case, 48.91% of data points are used, while 78.16% data points are used in the corresponding optimal SM-SA. 4.4 More Studies on Time-varying Systems SM-SA algorithm is derived assuming a time-invariant system. In the previous simula- tion, we have seen that SM-SA has some tracking ability. The adaptability of SM-WRLS algorithm have been studied in previous work [28]. Although we do not intend to pur- sue theoretical analysis of SM-SA for adaptivity of SM-SA, we can make some heuristic descriptions. 
Consider the weighted covariance R(n) R(n) = (1 — A;)R(n — 1) + 1;):an The (1 — A;,) term in this equation can be viewed as a forgetting factor for the algorithm. The smaller the quantity A;, the lesser the impact of the current point. By controlling the magnitude of the weights, the algorithm can decide the importance between the new and old data. This is why SMoSA can trace some time-varying systems. In this situation, the ellipsoid cannot shrink too fast, and since the parameters are always changing, the ellipsoid cannot converge to zero either. A better way to estimate system parameters is to “over- estimate” the noise bound because we can prevent the ellipsoid from shrinking to a very small set. For the speech signal we discussed in this chapter, if we “over-estimate” the noise 97 Tme value a1 -21 2.137 I I I I I SM-SA ' 2'07 0.5 1 1 .5 2 2.5 3 " x10‘ True value a2--1 .48 '1 .42 I I I I I - A, 8M5 -1 .51 RLS ‘ '1 '52 ’ 0.5 1 1.5 2 2.5 3 " x 1 0‘ True value a3-0.34 0.36 I 1 f I If RLS 1- - 1 2 f SM-SA 4 0.315 ] - 0'31 0.5 1 1 .5 2 2.5 3 " x10‘ Figure 4.18. First 30,000 estimates obtained by SM-SA and RLS when a zero mean binary Bernoulli sequence is used and suboptimal test is employed. 98 True value a1 =21 I 2.09 ‘0 x10 True value a2a-1 .48 I I I I -1.46 -1.48 -1.5 -1 .52 r -1.54 -1.56 -1 .58 True value a3=034 T I I I 0.39 0.33 RLS 0.37 0.36 0.35 0.34 — S—uzM'SA .— h x10‘ Figure 4.19. First 30,000 estimates obtained by SM-SA and RLS when a bias binary Bernoulli sequence is used and suboptimal test is employed. 99 bound so that the bound be chosen as 3.0 I/\ O 0" but the real bound is 103, S 0.25 we will find that the algorithm has better tracking ability as shown in Fig. 4.20. In this case 4.09% of data points are used, which is more than before. Note that SM-SA is developed for time-invariant systems. If it works in a time-varying system, some assumptions may be violated, for example 5,, may become negative. Thus it is not appropriate to apply the suboptimal test to time-varying system since a negative 6;, may not imply a negative c,,. Some simulations show that if we apply the suboptimal test to time-varying systems, the results are often not encouraging. 100 C1) 1WD 2000 3000 4000 5000 (a) 2.5 r - 2 - - 1.5 ~ - 1 .. .- 0.5 r - o .- .1 -0 5 M true -1 - I 'L v ‘1'" HJ—i estimate -1.5 '- 1 Figure 4.20. Estimates of (a) b1(n) (b) 02(11) by using SM-SA algorithm for the word “four” when noise bounds are over-estimated. Chapter 5 The Convergence Properties of UOBE Algorithms UOBE algorithm, a general class of OBE algorithms, is based on generalized WRLS in which very broad classes of “forgetting factors” and data weights may be employed. Histor- ically, this research on the SM-SA algorithm, and that of Krunz on the “Dual-SM-WRLS” algorithm [33, 34], precedes and contributed to the development of UOBE. In this chapter, we will generalize the convergence properties of SM-SA to all UOBE algorithms and show the significant implication of this research to broader applications. A brief review of UOBE is given here. Details are found in [38, 39]. 5.1 The UOBE Algorithm Given the system in (1.5) and the noise bound in (1.6), the hyper-ellipsoid set C(n) associ- ated with UOBE is defined by 0(n) = {o no — 0.0T 07“)“: - a.) s 1} where 0,, is the WRLS estimates obtained by minimizing 1 fl V(0) = ; Z m,,,,(yt. — £9)? 7:1 101 102 in which a,,wn_1,, 1' S n — 1 “’71 r — fin 1' = n and x,, is a scalar quantity, 5,, = 010000,. + i "’n... 
The matrix C(n) itself is fundamentally defined as the sum of the weighted outer products,

$$C(n) = \alpha_n\,C(n-1) + \beta_n\,x(n)\,x^T(n).$$

Different algorithms in UOBE are distinguished by the weighting sequences $\{\alpha_n\}$ and $\{\beta_n\}$, in conjunction with the optimization criterion employed in selecting them. Three criteria have been applied to the optimization process:

Optimization Criterion 1. Minimize the determinant of the inverse ellipsoid matrix, $\mu_v\{\Omega(n)\} \triangleq \det\{\kappa_n C^{-1}(n)\}$.

Optimization Criterion 2. Minimize the trace of the inverse ellipsoid matrix, $\mu_t\{\Omega(n)\} \triangleq \operatorname{tr}\{\kappa_n C^{-1}(n)\}$.

Optimization Criterion 3. Minimize the parameter $\kappa_n$.

Criteria 1 and 2 were first suggested by Fogel and Huang [14], while Criterion 3 was used by Dasgupta and Huang [20]. In the single-output case, in which $\Omega(n)$ is clearly interpretable as an ellipsoid in $\mathbb{R}^m$, $\mu_v\{\Omega(n)\}$ is proportional to the square of the volume of the ellipsoid, while $\mu_t\{\Omega(n)\}$ is proportional to the sum of the squares of its semi-axes. Criterion 3 has been used in conjunction with a specific weighting strategy to achieve a rigorous convergence proof in a certain sense [20].

Let us adopt the policy of writing the weights $\alpha_n$ and $\beta_n$ as functions of a single parameter to be optimized at time n, say $\lambda_n$, so that

$$\alpha_n = \alpha_n(\lambda_n), \qquad \beta_n = \beta_n(\lambda_n), \qquad (5.1)$$

where $\{\alpha_n\}$ and $\{\beta_n\}$ should now be considered sequences of functions. If we choose $\alpha_n(\lambda_n) = 1$ and $\beta_n(\lambda_n) = \lambda_n$, then UOBE is exactly the SM-WRLS algorithm. SM-SA is obtained by choosing $\alpha_n(\lambda_n) = 1 - \lambda_n$ and $\beta_n(\lambda_n) = \lambda_n$ together with Optimization Criterion 1. The extension of the optimization process of SM-SA to Criterion 2 is left as future work.

5.2 Convergence Properties of UOBE

The most important theorem in the theoretical framework of UOBE is the following [38, 39]:

Theorem 5.1. Consider a UOBE algorithm in which volume or trace is to be minimized. Then the following are independent of the choices of the function sequences $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$:

1. the sequence of measures of optimality ($\{\mu_v(n)\}$ or $\{\mu_t(n)\}$);
2. the data points selected (the times at which $\lambda_n^* > 0$);
3. the parameter estimates $\theta_n$.

However, in a UOBE algorithm with $\kappa_n$ minimization, none of these items is independent of the sequences $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$.

Theorem 5.1 states that any UOBE algorithm with volume minimization, regardless of the weighting policy, will "behave" identically in these three senses. Using this theorem, we can extend the convergence properties from SM-SA to UOBE:

Corollary 5.1. Consider the UOBE algorithms in which volume is to be minimized. Let $\{t_n\}$ be the subsequence of times at which an update takes place. If there exist $N \in \mathbb{N}$ and $\rho_1, \rho_2 > 0$ such that for all n

$$\rho_1 I < \sum_{k=n+1}^{n+N} x_k x_k^T < \rho_2 I, \qquad (5.2)$$

then $\lim_{n\to\infty} \|\theta_{t_n} - \theta_{t_{n-1}}\| = 0$. If, in addition to (5.2), the noise process $\{v_n\}$ is independent and, for any given $\varepsilon > 0$, there exists a $\delta > 0$ such that for all n

$$P(v_n \in D_{\varepsilon^+}) > \delta \quad\text{and}\quad P(v_n \in D_{\varepsilon^-}) > \delta,$$

where $D_{\varepsilon^+} = [\sqrt{\gamma} - \varepsilon,\, \sqrt{\gamma}]$ and $D_{\varepsilon^-} = [-\sqrt{\gamma},\, -\sqrt{\gamma} + \varepsilon]$, then the volume of the ellipsoid converges to zero and the estimates obtained using any UOBE algorithm converge to the true parameters.

This corollary is very important because, through the research on SM-SA, the convergence properties of all UOBE algorithms are established.¹ This implies that SM-SA has much broader implications for the entire class of UOBE algorithms.

¹The original OBE paper by Fogel and Huang [14] is sometimes misunderstood to indicate convergence of the bounding ellipsoid to a point under ordinary conditions on the data (see [38]).
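To make the recursions of Section 5.1 concrete, the following Python sketch performs a single UOBE-style update for a given pair of weights $(\alpha_n, \beta_n)$. It is a minimal illustration under stated assumptions, not a complete algorithm: the function name and calling convention are invented for the sketch, the selection of the optimal $\lambda_n$ by one of the three criteria is omitted, and the $\kappa_n$ update uses the recursive form that follows algebraically from the summation definition given above.

```python
import numpy as np

def uobe_step(theta, C, kappa, x, y, gamma_n, alpha_n, beta_n):
    """One generalized-WRLS (UOBE-style) update for given weights alpha_n, beta_n.

    theta : previous estimate theta_{n-1}        C     : previous matrix C(n-1)
    kappa : previous scalar kappa_{n-1}          gamma_n : pointwise noise bound, v_n^2 <= gamma_n
    """
    eps = y - x @ theta                              # prediction error epsilon_n
    G = x @ np.linalg.solve(C, x)                    # G_n = x_n^T C^{-1}(n-1) x_n
    C_new = alpha_n * C + beta_n * np.outer(x, x)    # C(n) = alpha_n C(n-1) + beta_n x_n x_n^T
    theta_new = theta + beta_n * np.linalg.solve(C_new, x) * eps
    kappa_new = (alpha_n * kappa + beta_n * gamma_n
                 - alpha_n * beta_n * eps**2 / (alpha_n + beta_n * G))
    return theta_new, C_new, kappa_new

# Example: one update with gamma_n = 0.25 and an SM-SA-type weight lambda_n = 0.1,
# i.e. (alpha_n, beta_n) = (0.9, 0.1); all numbers are illustrative.
theta0, C0, kappa0 = np.zeros(2), np.eye(2), 1.0
x1, y1 = np.array([1.0, 0.5]), 0.3
print(uobe_step(theta0, C0, kappa0, x1, y1, 0.25, alpha_n=0.9, beta_n=0.1))
```

Choosing $(\alpha_n, \beta_n) = (1, \lambda_n)$ reproduces an SM-WRLS step, while $(1 - \lambda_n, \lambda_n)$ gives an SM-SA step; by Theorem 5.1, under volume minimization both policies select the same data points and produce the same estimates.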
With Corollary 5.1, the convergence properties established for SM-SA now carry over to the entire UOBE class.

Chapter 6

Conclusion

6.1 Concluding Remarks

6.1.1 Summary of Landmark Developments of SM-Based Techniques

Set-membership-based algorithms are developed to estimate the parameters of a linear system in which the noise disturbance is additive and bounded. After the first publication of the OBE algorithm by Fogel and Huang [14] in 1982, SM techniques attracted the attention of many research groups in the signal processing field. However, the lack of a convergence proof has been a serious weakness of the OBE-like algorithms, although they show good performance in many applications. In order to remedy the convergence problem, Dasgupta and Huang proposed a modified OBE [20] in 1987. Instead of minimizing the volume of the ellipsoid, they minimize a scalar upper bound on the set, the parameter $\kappa_n$. This algorithm is convergent in the sense that the optimal weight and the difference of successive estimates converge to zero. However, it has been criticized by some researchers [30, 38, 39] for its lack of geometrical meaning.

The Adaptive Signal Processing and Speech Processing Laboratories at Michigan State University have been pioneers in research on SM-based techniques. The tradeoff between interpretability and convergence led to the study of SM-SA. During the course of this research, it was found in parallel work that almost all OBE algorithms can be combined into a unified framework called UOBE, and SM-SA is a special case of UOBE obtained by choosing $\alpha_n(\lambda_n) = 1 - \lambda_n$, $\beta_n(\lambda_n) = \lambda_n$, and volume minimization. The name SM-SA comes from the structural resemblance to the stochastic approximation method.

6.1.2 Contributions of this Research

Selective updating, which reduces the computational load, is the feature common to all members of UOBE. In contrast to RLS estimation, in which the parameter estimates are updated at every iteration, UOBE updates the estimates only if the incoming datum contains sufficient information. In addition to this advantage, the major contribution of this work is the establishment of an algorithm which is both interpretable and convergent. The interpretability of SM-SA comes from the optimization process in which the optimal weight is found by minimizing the volume of the ellipsoid defined by

$$\mu_v\{\Omega(n)\} = \det\{\kappa_n R^{-1}(n)\}.$$

The convergence properties come from the boundedness of the covariance matrix. In this work we have shown that:

1. The optimal weight $\lambda_n^*$ converges to zero.
2. The squares of the estimation errors fall asymptotically into a dead zone given by $[0, \gamma_n]$.
3. If the noise sequence $\{v_n\}$ is independent and the probabilities are non-zero in the neighborhoods of the noise bounds, then the volume of the ellipsoid converges to zero.
4. If the noise process is mixing, then under a slight modification of SM-SA the volume of the ellipsoid converges to zero.

More importantly, through the analysis of SM-SA, the convergence of estimates presented in Chapter 3 can be extended to all UOBE algorithms.

The other contribution is the illustration of the central difference between SM-SA and RLS. While both algorithms share the same recursive formulation, they show completely different behaviors. The reason is that the weight of SM-SA at time n is a random variable which depends on all previous and present noise samples, while the weight of conventional RLS or WRLS is deterministic. We have shown this difference in Chapters 3 and 4 by theoretical analysis and simulation studies.
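The contrast described above can be sketched in a few lines of Python. The rls_step below is a standard weighted RLS update with a deterministic weight; selective_step is only a schematic stand-in for set-membership-style selective updating. The innovation check (eps**2 <= gamma) and the weight_rule argument are placeholders, not the SM-SA optimal-weight computation derived in Chapters 2 and 3.

```python
import numpy as np

def rls_step(theta, P, x, y, w=1.0):
    """Standard weighted RLS update; the weight w is deterministic."""
    eps = y - x @ theta
    Px = P @ x
    k = w * Px / (1.0 + w * (x @ Px))        # gain vector
    return theta + k * eps, P - np.outer(k, Px)

def selective_step(theta, P, x, y, gamma, weight_rule):
    """Schematic set-membership-style step: update only when the datum
    appears informative.  The test and weight_rule are placeholders, not
    the SM-SA innovation test or optimal weight."""
    eps = y - x @ theta
    if eps**2 <= gamma:                      # placeholder check: skip uninformative data
        return theta, P, False
    w = weight_rule(eps, x, P)               # data-dependent (hence random) weight
    theta, P = rls_step(theta, P, x, y, w)
    return theta, P, True
```

Even in this toy form the difference is visible: the weight passed to rls_step inside selective_step is a function of the observed error and is therefore a random variable, whereas plain RLS applies a deterministic weight, and an update, at every sample.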
6.2 Future Work

This research has been concerned with the theoretical analysis of SM-SA, a special case of UOBE. While we have seen in Chapter 4 that SM-SA performs well in tracking time-varying parameters, we should not rely on an algorithm developed in a time-invariant setting to perform well in a time-varying environment. It is of practical interest to develop an adaptive SM-SA, since most engineering applications deal with time-varying models. Secondly, we have shown the convergence of SM-SA only under volume minimization; while it is believed that the convergence properties under trace minimization are similar, a rigorous proof is still needed. Finally, it is often difficult to obtain an accurate estimate of the noise bounds. In order not to violate the bounded-error assumption, it is natural to choose conservative noise bounds; as we have discussed, this often leads to poor estimates. One way to solve this problem is to use soft noise bounds, under which the bounded-noise assumption may occasionally be violated. The author therefore suggests the study of a soft-bound SM-SA algorithm as further work.

In conclusion, the following topics are proposed for future research:

1. To develop an adaptive SM-SA algorithm.
2. To establish a convergence proof for SM-SA under trace minimization.
3. To extend the model from ARX to ARMAX.
4. To develop a soft-bound SM-SA algorithm.

APPENDICES

APPENDIX A

Infinite Product

Definition A.1. Given an infinite product $\prod_{n=1}^{\infty} u_n$, let $p_n = \prod_{k=1}^{n} u_k$. If no factor $u_n$ is zero, we say the product converges if there exists a number $p \neq 0$ such that $\{p_n\}$ converges to $p$. If $\{p_n\}$ converges to zero, we say the product diverges to zero.

The following theorem is given without proof [9]:

Theorem A.1. Assume that each $a_n \ge 0$. Then the product $\prod (1 - a_n)$ converges if, and only if, the series $\sum a_n$ converges.

BIBLIOGRAPHY

[1] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Stat., vol. 22, pp. 400-407, 1951.

[2] J. Blum, "Multidimensional stochastic approximation method," Ann. Math. Stat., vol. 25, pp. 737-744, 1954.

[3] N.N. Krasovskii, "On the theory of controllability and observability of linear dynamic systems," Prikl. Mat. Mekh., vol. 28, no. 1, 1964.

[4] F.C. Schweppe, "Recursive state estimation: Unknown but bounded errors and system inputs," IEEE Trans. Automatic Control, vol. AC-13, pp. 22-28, February 1968.

[5] H.S. Witsenhausen, "Sets of possible states of linear systems given perturbed observation," IEEE Trans. Automatic Control, vol. AC-13, pp. 556-568, October 1968.

[6] P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.

[7] D.P. Bertsekas and I.B. Rhodes, "Recursive state estimation for a set-membership description of uncertainty," IEEE Trans. Automatic Control, vol. AC-16, pp. 117-128, April 1971.

[8] J.M. Mendel, Discrete Techniques of Parameter Estimation: The Equation Error Formulation, Marcel Dekker, New York, 1973.

[9] T.M. Apostol, Mathematical Analysis, Addison-Wesley, 1975.

[10] R.B. Ash and M.F. Gardner, Topics in Stochastic Processes, Academic Press, 1975.

[11] G.E. Forsythe and M.A. Malcolm, Computer Methods for Mathematical Computations, Prentice-Hall, 1977.

[12] L. Ljung, "Analysis of recursive stochastic algorithms," IEEE Trans. Automatic Control, vol. AC-22, pp. 551-575, October 1977.

[13] E. Fogel, "System identification via membership set constraints with energy constrained noise," IEEE Trans. Automatic Control, vol. AC-24, pp. 752-758, October 1979.
[14] E. Fogel and Y.F. Huang, "On the value of information in system identification - bounded noise case," Automatica, vol. 18, pp. 229-238, 1982.

[15] B.D.O. Anderson and C.R. Johnson, Jr., "Exponential convergence of adaptive identification and control algorithms," Automatica, vol. 18, pp. 1-13, January 1982.

[16] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, Cambridge, MA: MIT Press, 1983.

[17] R.R. Bitmead, "Persistence of excitation conditions and the convergence of adaptive schemes," IEEE Trans. on Information Theory, vol. IT-30, no. 2, pp. 183-191, March 1984.

[18] A. Papoulis, Probability, Random Variables and Stochastic Processes, 2nd edition, McGraw-Hill, 1984.

[19] G.C. Goodwin and K.S. Sin, Adaptive Prediction, Filtering, and Control, Prentice-Hall, 1984.

[20] S. Dasgupta and Y.F. Huang, "Asymptotically convergent modified recursive least-squares with data-dependent updating and forgetting factor for systems with bounded noise," IEEE Trans. on Information Theory, vol. IT-33, pp. 383-392, May 1987.

[21] T. Söderström and P. Stoica, System Identification, Prentice-Hall, 1988.

[22] K.J. Åström and B. Wittenmark, Adaptive Control, Addison-Wesley, 1989.

[23] J.P. Norton, "Recursive computation of inner bounds for the parameters of a linear model," Int. J. Control, vol. 50, pp. 2423-2430, 1989.

[24] J.R. Deller, Jr. and S.F. Odeh, "Implementing the optimal bounding ellipsoid algorithm on a fast processor," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '89, Glasgow, vol. 2, pp. 1067-1070, May 1989.

[25] J.R. Deller, Jr., "Set-membership identification in digital signal processing," IEEE ASSP Magazine, vol. 6, pp. 4-22, October 1989.

[26] J.R. Deller, Jr. and T.C. Luk, "Linear prediction analysis of speech based on set-membership theory," Computer Speech and Language, vol. 3, pp. 301-327, October 1989.

[27] R.A. DeCarlo, Linear Systems, Prentice-Hall, 1989.

[28] S.F. Odeh, Algorithms and Architectures for Adaptive Set Membership-Based Signal Processing, Ph.D. dissertation, Michigan State University, 1990.

[29] A.K. Rao, Y.F. Huang, and S. Dasgupta, "ARMA parameter estimation using a novel recursive estimation algorithm with selective updating," IEEE Trans. on Acoust., Speech, and Signal Process., vol. 38, pp. 447-457, March 1990.

[30] J.P. Norton and S.H. Mo, "Parameter bounding for time-varying systems," Mathematics and Computers in Simulation, vol. 32, pp. 527-534, 1990.

[31] M.F. Cheung, On Optimal Algorithms for Parameter Set Estimation, Ph.D. dissertation, Ohio State University, 1991.

[32] M.F. Cheung, S. Yurkovich, and K.M. Passino, "An optimal volume ellipsoid algorithm for parameter set estimation," Proc. of the 30th Conference on Decision and Control, Brighton, pp. 969-974, December 1991.

[33] M. Krunz, Numerical Stability and Convergence Issues in the SM-WRLS Algorithm, M.S. Thesis, Michigan State University, 1992.

[34] M. Nayeri, J.R. Deller, Jr., and M.M. Krunz, "Convergence and colored noise issues in bounding ellipsoid identification," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. '92, San Francisco, vol. 5, pp. 337-340, March 1992.

[35] M. Nayeri, J.R. Deller, Jr., and M.S. Liu, "Optimal bounding error algorithms with adaptive tracking capability," Proc. of the 26th Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, October 1992.

[36] M. Nayeri, M.S. Liu, and J.R. Deller, Jr., "An interpretable and converging set membership algorithm," Proc. IEEE Int. Conf.
Acoust., Speech, and Signal Process. '93, Minneapolis, April 1993.

[37] J.R. Deller, J.G. Proakis, and J.H.L. Hansen, Discrete-Time Processing of Speech Signals, New York: Macmillan, 1993.

[38] J.R. Deller, Jr., M. Nayeri, and S.F. Odeh, "Least squares identification with error bounds for real-time signal processing and control," Proceedings of the IEEE, to appear.

[39] J.R. Deller, Jr., M. Nayeri, and M.S. Liu, "Unifying the landmark developments in OBE processing," Int. Journal of Adaptive Control and Signal Process., to appear.