This is to certify that the dissertation entitled
"Development and Analysis of an Optimal Bounding Ellipsoid Algorithm Using Stochastic Approximation"
presented by Ming-Shou Liu
has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering.
Major professor
Date
DEVELOPMENT AND ANALYSIS OF AN
OPTIMAL BOUNDING ELLIPSOID ALGORITHM
USING
STOCHASTIC APPROXIMATION
By
Ming-Shou Liu
A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering
1993
ABSTRACT
DEVELOPMENT AND ANALYSIS
OF AN
OPTIMAL BOUNDING ELLIPSOID ALGORITHM
USING STOCHASTIC APPROXIMATION
By
Ming-Shou Liu
Set-membership-based (SM) techniques provide an alternative to conventional system iden-
tification algorithms. This research is concerned with a class of SM algorithms known as
the optimal bounding ellipsoid (OBE) algorithms, which combine weighted least square
error minimization with a bounded error constraint. More specifically, OBE al-
gorithms focus on finding an optimal hyper-ellipsoidal region for the feasible set when the
additive model noise is bounded. Previous work has shown that OBE techniques have great
potential in many applications involving parametric modelling.
The published OBE algorithms, however, either do not include a formal convergence
proof or do not show interpretable geometrical meaning. In this research, a modified OBE
algorithm, which is both interpretable and convergent, is introduced. This algorithm is
referred to as set-membership stochastic approximation (SM-SA) algorithm because of its
resemblance to the stochastic approximation (SA) method developed by Robbins and Monro
in 1951.
Recently, a general class of OBE algorithms, including all methods published to date, has been
unified into a single framework called the unified OBE (UOBE) algorithm. UOBE is based
on generalized weighted recursive least squares in which very broad classes of “forgetting
factors” and data weights may be employed. Different instances of UOBE are distinguished
by their weighting policies and the criteria for determining optimal weight values. SM-SA is
a special case of UOBE and, through the analysis of this research and the paper by Deller,
Nayeri, and Krunz, the convergence properties of SM-SA, which will be introduced later,
can be applied to the UOBE framework.
In this dissertation, the updating criteria and optimal weights for shrinking feasible
sets for SM-SA are first analyzed. Then asymptotic properties and performance of SM-SA
are discussed. Simulation results for different SM algorithms and comparisons to conven-
tional recursive least squares are also examined. Finally, the UOBE formulation and its
convergence properties are introduced.
Copyright by
Ming-Shou Liu
August 3, 1993
To my Mother and my wife
For their Love, Support, and Sacrifice
ACKNOWLEDGMENTS
A deep debt is owed to my advisor Professor Majid Nayeri for his encouragement and
guidance, and many hours spent on behalf of my dissertation. His direction is very important
for the development and analysis of this research. Thanks are also given to my co-advisor,
Professor John R. Deller, Jr., for his time spent with me in helpful discussions.
I would like to express my gratitude to other members of my Ph.D. guidance committee:
Professor Clifford E. Weil and Professor Hassan K. Khalil for their help in reviewing my
dissertation.
I gratefully acknowledge the financial support provided by a grant from the National
Science Foundation and Office of Naval Research.
Acronyms and Abbreviations
Auto-regressive with exogenous input (ARX)
Auto-regressive-moving-average with exogenous input (ARMAX)
Multiple-input-multiple-output (MIMO)
Optimal bounding ellipsoid (OBE)
Optimal volume ellipsoid (OVE)
Recursive least squares (RLS)
Set-membership (SM)
Set-membership stochastic approximation (SM-SA)
Set-membership weighted recursive least squares (SM-WRLS)
Single-input-single-output (SISO)
Stochastic approximation (SA)
Unified optimal bounding ellipsoid (UOBE)
Weighted recursive least squares (WRLS)
TABLE OF CONTENTS
LIST OF TABLES x
LIST OF FIGURES xi
1 Introduction and Background 1
1.1 Real-Time Identification and the Recursive Least Square Estimator 2
1.2 Introduction to Stochastic Approximation 3
1.3 Introduction to Set-Membership Theory 4
1.4 Review of Previous Work on OBE Algorithms 6
1.5 Motivation for Developing the New Algorithm 9
2 The Development of the SM-SA Algorithm 13
2.1 The Set-Membership Stochastic Approximation Algorithm 13
2.2 Initial Values 20
2.3 The Optimal Weight 21
2.4 Data Selection 23
2.5 Relations Between SM-SA and SM-WRLS 33
3 Convergence Issues 37
3.1 Fundamental Convergence Properties of SM-SA 41
3.2 Convergence of Parameter Sets and Suboptimal Test 47
3.2.1 Convergence of Parameter Sets 47
3.2.2 A Suboptimal Test for Innovation in the SM-SA Algorithm 50
3.3 Convergence of the Parameter Set with White Noise or Error Disturbances 52
3.3.1 IID White Noise Case 53
3.3.2 General White Noise Case 58
3.4 Independence and the Mixing Conditions 72
4 Simulation Studies 79
4.1 Performance of SM-SA for Time-Invariant Systems 80
4.1.1 SM-SA with Pseudo-Random Excitation 81
4.1.2 AR Models with Binary Bernoulli Excitation 83
4.2 Performance of SM-SA for Time-Varying Systems 85
4.3 Suboptimal Test 90
4.3.1 AR Models with Pseudo-Random Excitation
4.3.2 AR Models with Binary Bernoulli Excitation
4.4 More Studies on Time-Varying Systems
5 The Convergence Properties of UOBE Algorithms
5.1 The UOBE Algorithm
5.2 Convergence Properties of UOBE
6 Conclusion
6.1 Concluding Remarks
6.1.1 Summary of Landmark Developments of SM-Based Techniques
6.1.2 Contributions of this Research
6.2 Future Work
A Infinite Product
BIBLIOGRAPHY
LIST OF TABLES
2.1 The data selection and performance of SM-WRLS and SM-SA for the AR model (2.40). 35
2.2 Estimates of SM-WRLS and SM-SA for the AR model (2.40), where a_1 = 1.6 and a_2 = -0.68. 36
4.1 Noise processes used in the simulations in this chapter. 80
LIST OF FIGURES
1.1 Relationship between Π(n) and Ω(n) for the two-dimensional case.
1.2 The weights become very large when more data are used.
2.1 Plots of V(λ_n) when c_n > 0 and G_n < 1.
2.2 When G_n > 1 and c_n > 0, if there exists one extremum of V(λ_n) in (0,1), then there exist more than two extrema.
2.3 When c_n = 0, the only two possibilities for V(λ_n) in (0,1) are: (a) there is a saddle point, or (b) no extrema and no saddle points exist.
2.4 When c_n < 0, if more than one extremum appears in (0,1), then there must be at least three extrema.
2.5 The relationship between logarithms of time and weight.
3.1 Illustration of the contradiction argument: (a) if the indicated quantity is positive and v_n lies in the shaded area, a contradiction results; (b) if it is negative and v_n lies in the shaded area, a contradiction results.
3.2 Sample correlation coefficients of {w_n} in the example on page 63, up to time lag 40.
3.3 Sample correlation coefficients of {v_n} in the example on page 63, up to time lag 40.
3.4 Estimates of the parameter a_3 when the system is driven by (a) an i.i.d. white noise, (b) a white noise that is not i.i.d.
3.5 Distribution of {w_n} and {v_n} in each amplitude interval.
3.6 The logarithmic plot of volume versus the actual time n. The volume of the ellipsoid is much smaller when the noise is i.i.d.
3.7 Distribution of a white noise at each amplitude interval after passing {v_n} through a soft-limiter.
3.8 Sample correlation coefficients of {v_n} after passing through a soft-limiter.
3.9 Estimates of the parameter a_3 when v_n is passed through a soft-limiter.
3.10 Distribution of a white noise at each amplitude interval after multiplying v_n by two and then passing it through a soft-limiter.
3.11 Estimates of the parameter a_3 when v_n is multiplied by two and then passed through a soft-limiter.
3.12 Sample correlation coefficients of {v_n} up to time lag 40.
3.13 Estimates of a_3 by SM-SA and RLS when the noise is colored.
3.14 Distribution of a colored noise at each amplitude interval.
3.15 Sample correlation coefficients of a noise sequence obtained by passing {v_n} through a soft-limiter.
3.16 Estimates of the parameter a_3 when the colored noise {v_n} is passed through a soft-limiter.
4.1 The logarithmic plot of κ_n versus time n for the linear time-invariant AR(3) model with pseudo-random excitation.
4.2 The logarithmic plot of |ε_n| versus time n for the linear time-invariant AR(3) model with pseudo-random excitation.
4.3 First 1000 estimates of the parameters in the AR(3) model obtained using SM-SA and a pseudo-random excitation, where a_1 = 2.1, a_2 = -1.48, and a_3 = 0.34.
4.4 The relationship between logarithms of time and weight.
4.5 The estimates of model parameters by RLS and SM-SA for a biased noise sequence.
4.6 The logarithmic plot of κ_n versus time n when a binary Bernoulli sequence is used.
4.7 First 30,000 estimates obtained using SM-SA and RLS when a binary Bernoulli sequence is used.
4.8 First 30,000 estimates obtained using SM-SA and RLS when a biased binary Bernoulli sequence is used.
4.9 The acoustic waveform of the word "four".
4.10 The "true" parameters for the word "four": (a) b_1(n), (b) b_2(n).
4.11 Estimates of b_1(n) for the word "four" using (a) SM-SA, (b) RLS.
4.12 Estimates of b_2(n) for the word "four" using (a) SM-SA, (b) RLS.
4.13 The logarithmic plot of κ_n versus time n for the AR(3) model when the suboptimal test is used.
4.14 The logarithmic plot of |ε_n| versus time n for the AR(3) model.
4.15 First 2000 estimates of the parameters using SM-SA, where a_1 = 2.1, a_2 = -1.48, and a_3 = 0.34.
4.16 The relationship between logarithms of time and weight.
4.17 Estimates of parameters using RLS and SM-SA for the suboptimal test.
4.18 First 30,000 estimates obtained by SM-SA and RLS when a zero-mean binary Bernoulli sequence is used and the suboptimal test is employed.
4.19 First 30,000 estimates obtained by SM-SA and RLS when a biased binary Bernoulli sequence is used and the suboptimal test is employed.
4.20 Estimates of (a) b_1(n), (b) b_2(n) using the SM-SA algorithm for the word "four" when noise bounds are over-estimated.
Chapter 1
Introduction and Background
“Man creates models of his natural and man-made environments to understand and explain
them better and as prelude to action” [8]. Indeed, from social science to engineering and
natural science, modeling plays an important role in helping humankind describe the functions of
processes and to predict future actions. For example, in economics, modeling is used to
study inflation, to predict the trends of stock markets, and to maximize profits on invest-
ments. In engineering applications, modeling can be used to analyze the characteristics
of a process, optimize the performance, and study the functions and dynamics before a
large plant is actually built and large amounts of money and effort are invested. In many
branches of science, modeling is used as an aid to describe and understand reality.
The term “system identification” generally refers to the determination of a mathematical
model for a system or a process by observing its input-output relationships. Based on the
model structures that are used, system identification methods are generally classified into
two categories: Nonparametric and parametric. A nonparametric model is described by
a curve, function, or table. Impulse responses and frequency diagrams are two examples
of a nonparametric model. In this dissertation, we will consider the parametric model
only. Such models are characterized by a parametric vector. Typical parametric models are
auto-regressive with exogenous input (ARX), auto-regressive-moving-average with exogenous
input (ARMAX) [16], and state-space structures.
For modeling engineering systems and industrial processes in practice, some basic char-
acteristics of the system, such as linearity and stability, are assumed to be known. Further-
more, in many cases, we assume to have basic knowledge about the mathematical structure
of the system such as the order of the system, the type of the model, and so on, so that it
is possible to derive a specific mathematical model of the system dynamics. Consequently,
only a set of parameters in the model equation are left to be determined. In this research,
we derive a new algorithm for identifying model parameters which has well-analyzed convergence
properties and an intuitively meaningful optimization criterion. This algorithm is
applicable when the system is corrupted by additive and bounded noise.
1.1 Real-Time Identification and the Recursive Least
Square Estimator
In classical system design, modeling is accomplished once during the design stage. This is
called off-line identification. In most adaptive systems, on-line or recursive identification
is desirable to allow optimum adaptation in the face of environmental uncertainty and
changing environmental conditions.
From a computational point of view, the memory requirement for recursive estimators is
quite modest since not all the data are stored. In off-line identification, however, all the
previous data have to be stored.
A variety of well-known recursive techniques such as least square error (LSE), maximum
likelihood, instrumental variables [16], and stochastic approximation [1] have been used to
identify system parameters in real time. Among these techniques, the LSE method is most
popular because of its simplicity and well-understood criteria.
To review the LSE method, consider a simple linear regression model
$$ y(t) = \mathbf{x}^T(t)\,\boldsymbol{\theta} $$
where x(t) is an n-vector of known quantities and θ is an n-vector of unknown parameters.
The LSE estimate of θ at time n is obtained by minimizing the cost function
$$ V(\hat{\boldsymbol{\theta}}) = \sum_{t=1}^{n} \left( y(t) - \mathbf{x}^T(t)\,\hat{\boldsymbol{\theta}} \right)^2. $$
Recursive least squares (RLS) [16] estimates are obtained through an on-line iterative computation of the off-line LSE estimates. Weighted recursive least squares (WRLS) is a variation of RLS which minimizes
$$ V_w(\hat{\boldsymbol{\theta}}) = \sum_{t=1}^{n} w_t \left( y(t) - \mathbf{x}^T(t)\,\hat{\boldsymbol{\theta}} \right)^2 $$
where {w_t} is a weighting sequence.
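To make the recursion concrete, the following is a minimal sketch of one weighted RLS step in Python/NumPy; the function and variable names, and the initialization P(0) = σI, are illustrative assumptions rather than notation from this dissertation.

```python
import numpy as np

def wrls_update(theta, P, x, y, w=1.0):
    """One weighted RLS step for the model y = x^T theta + noise.

    theta : current parameter estimate, shape (m,)
    P     : current inverse normal matrix, shape (m, m)
    x     : regressor vector, shape (m,)
    y     : scalar observation
    w     : weight assigned to this sample
    """
    G = x @ P @ x                        # scalar x^T P x
    eps = y - theta @ x                  # prediction error
    gain = (w * (P @ x)) / (1.0 + w * G) # weighted gain vector
    theta = theta + gain * eps           # parameter update
    P = P - np.outer(gain, x) @ P        # rank-one update of P
    return theta, P

# usage sketch: start from theta = 0 and a large P = sigma * I
m = 3
theta, P = np.zeros(m), 1e6 * np.eye(m)
```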
1.2 Introduction to Stochastic Approximation
The stochastic approximation algorithm is concerned with the problem of estimating the
parameters of a linear system from noisy observations.
Let f(·) be an unknown vector-valued function with a unique zero θ, i.e.,
$$ f(\boldsymbol{\theta}) = 0. \qquad (1.1) $$
We assume that f(·) can be observed at any point x in its domain. Stochastic approximation (SA) refers
to a general class of recursive algorithms for solving an equation of the form (1.1) when
the observation of f(·) is corrupted by random noise.
The origins of SA can be traced back to a paper by Robbins and Monro [1] in 1951.
This paper is concerned with the problem of estimating the root, say θ, of an unknown
regression function f(x), which is the mean value of an observable random variable Z(x).
Robbins and Monro prove that θ can be estimated recursively by
$$ \hat{\boldsymbol{\theta}}_{n+1} = \hat{\boldsymbol{\theta}}_n + l_n\, Z(\hat{\boldsymbol{\theta}}_n) \qquad (1.2) $$
where {l_n} is a sequence of positive scalars that converges to zero. The reason {l_n}
must converge to zero is that convergence of θ̂_n can take place only if the product
l_n Z(θ̂_n) tends to zero. Since Z(θ̂_n) ≠ 0 with probability one, l_n must tend to zero. This ensures that the
noise is rejected by paying less and less attention to the noisy observations.
Robbins and Monro state the sufficient conditions for convergence of the estimate θ̂_n to
the true parameter in probability [1]:
$$ l_n \ge 0, \qquad \sum_{n=1}^{\infty} l_n = \infty, \qquad \sum_{n=1}^{\infty} l_n^2 < \infty. \qquad (1.3) $$
Any sequence {l_n} which satisfies (1.3) is called a sequence of type 1/n. A common
choice for the sequence {l_n} is l_n = 1/n. Ljung [12] further weakened the square-summability condition
in (1.3) to
$$ \sum_{n=1}^{\infty} (l_n)^p < \infty $$
for some p > 1. Blum [2] (1954) extended Robbins and Monro's theorem to the multidimensional case.
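As a concrete illustration, the sketch below runs the Robbins-Monro recursion (1.2) with the common choice l_n = 1/n on a simple scalar example; the example function, the noise level, and the seed are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def Z(x):
    # noisy observation of f(x) = 2 - x, whose unique zero is x = 2
    return (2.0 - x) + rng.normal(scale=0.5)

theta = 0.0                         # initial guess
for n in range(1, 10001):
    l_n = 1.0 / n                   # type-1/n sequence: sum l_n = inf, sum l_n^2 < inf
    theta = theta + l_n * Z(theta)  # Robbins-Monro recursion (1.2)

print(theta)                        # close to the true root 2.0
```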
In the next two sections we will briefly review the development of “bounding ellipsoid”
algorithms and investigate the problems associated with the previous work on these meth-
ods. We will then demonstrate that these problems can be overcome by an SA formulation.
1.3 Introduction to Set-Membership Theory
Set-membership (SM) identification refers to a class of recursive techniques for estimating
parameters of linear systems or signal models under a priori information that constrains
the solutions to certain sets [25]. In contrast to conventional system identification schemes
(e.g. maximum likelihood, LSE, etc.) which yield point estimates of the parameters, an SM
algorithm yields sets of parameter estimates which are compatible with noise constraints.
These sets are generally called “membership sets.” Based on certain set-theoretic checking
criteria, SM algorithms select and use only the useful data to update the parameter estimates
and refine the membership set. When data do not help refine these membership sets, the
effort of updating the parameters can be avoided.
SM theory was pioneered by Krasovski [3], Schweppe [4], Witsenhausen [5], and Bertsekas
and Rhodes [7] in the late 1960s, in the control and system science domain, for estimating
the state of a linear dynamic system. The optimum estimate is the smallest time-varying
ellipsoid which contains the unknown system states.
A major appeal of parameter set estimation is that it does not rely on the availability of
a probabilistic description of the uncertainty affecting the model. It simply delineates the
range of the parameter space yielding model-output values within specified ranges about
the observed output samples. On the other hand, since the estimates are updated only
when the input data contain sufficient information, some SM algorithms are more efficient
in some applications.
In recent years, SM-based signal processing is receiving considerable attention and is
becoming increasingly popular around the world as is evident from the extensive literature
on this topic [14, 20, 23, 26, 29, 31, 36, 39]. This research is restricted to the class of
algorithms known as optimal bounding ellipsoid (OBE) algorithms which follow a bounded
error constraint.
The original OBE algorithm is attributable to Fogel and Huang [14] in 1982. Let us refer
to this algorithm as “F-H/OBE.” Deller [25] and Deller and Luk [26] have reformulated the
F-H/OBE algorithm so that the center of the hyper-ellipsoid is exactly the conventional
WRLS solution. This algorithm is referred to as set-membership weighted recursive least
square (SM-WRLS) algorithm in order to distinguish it from the F-H/OBE algorithm. The
main deficiency of these algorithms is the lack of well-understood convergence properties.
This problem led to the development of the modified OBE algorithm by Dasgupta and
Huang in 1987. We will refer to this algorithm as "D-H/OBE."
Recently a general class of OBE algorithms, including all methods published to date
as well as the algorithm presented in this work, is unified into a single framework called
the unified OBE (UOBE) algorithm [38, 39]. Historically, this research on the “SM-SA”
(presented in this thesis) algorithm, and that of Krunz on the “Dual-SM-WRLS” algorithm
[33, 34], preceded and contributed to the development of UOBE. We shall later discuss the
fact that SM-SA is an important special case of UOBE and show how many of the
results of this work apply to all algorithms in the broader UOBE formulation in Chapter 5.
Another OBE-like technique, called the optimal volume ellipsoid (OVE) algorithm, was proposed
more recently [31, 32] by Cheung, Yurkovich, and Passino to estimate parameter sets.
It is proven that the algorithm is truly optimal in the sense that the volume of the ellipsoid is
smallest at each recursive computation. Simulation studies in [31, 32] compare the perfor-
mance of OVE and F-H/OBE. They show that OVE gives tighter parameter bounds. The
estimates given by OVE, however, do not show significant improvement over the estimates
given by F-H/OBE.
1.4 Review of Previous Work on OBE Algorithms
In principle, F-H/OBE and SM-WRLS are the same in spirit. However, SM-WRLS has a
structure which is simpler and more elegant than the F-H/OBE because of the choice of the
weighting sequence (see [38, 39]). Thus we begin with a review of SM-WRLS¹; similar results
on F-H/OBE and D-H/OBE are found in [38, 39]. Although the theoretical development
of SM-WRLS can handle a complex-vector, multiple-input-multiple-output (MIMO) model,
we consider only real scalars here for simplicity [28]. Let
$$ y_n = \sum_{i=1}^{p} f_i\, y_{n-i} + \sum_{j=0}^{q} g_j\, u_{n-j} + v_n \qquad (1.4) $$
$$ \phantom{y_n} = \boldsymbol{\theta}_*^T \mathbf{x}_n + v_n \qquad (1.5) $$
where θ_*^T = [f_1, ..., f_p, g_0, ..., g_q], x_n^T = [y_{n-1}, ..., y_{n-p}, u_n, u_{n-1}, ..., u_{n-q}], y_n and u_n are
real measurable outputs and inputs, respectively, and v_n is an unknown noise process. Let
m = p + q + 1. We assume that m > 1 and that, for each time n, v_n is known to be bounded
according to
$$ v_n^2 \le \gamma_n \qquad (1.6) $$
where {γ_n} is a known positive sequence.

¹Generalized results for the more general weighting policies of UOBE are found in [38, 39], to which the present research contributed.
Let ω(n) be a parameter set at time n such that all θ ∈ ω(n) are feasible parameter
estimates consistent with the error bounding in (1.6). In conjunction with the model of
form (1.5), ω(n), which is a "hyper-strip" region, can be expressed as
$$ \omega(n) = \left\{ \boldsymbol{\theta} \in \mathbb{R}^m : \left( y_n - \boldsymbol{\theta}^T\mathbf{x}_n \right)^2 \le \gamma_n \right\}. $$
The intersection of these sets over a given time range t ∈ [1, n] forms a convex polytope of feasible
parameters,
$$ \Omega(n) = \bigcap_{i=1}^{n} \omega(i). \qquad (1.7) $$
In general, Ω(n) is an irregular convex set that is difficult to describe and track analytically.
In conjunction with the WRLS processing, however, Ω(n) can be shown to be contained in
a hyper-ellipsoid superset Π(n) [25],
$$ \Pi(n) = \left\{ \boldsymbol{\theta} : (\boldsymbol{\theta} - \boldsymbol{\theta}_n)^T\, \frac{C(n)}{\kappa_n}\, (\boldsymbol{\theta} - \boldsymbol{\theta}_n) \le 1 \right\}, $$
where C(n) is the weighted covariance matrix,
$$ C(n) = C(n-1) + \lambda_n\, \mathbf{x}_n\mathbf{x}_n^T, \qquad (1.8) $$
κ_n is the scalar quantity
$$ \kappa_n = \boldsymbol{\theta}_n^T C(n)\, \boldsymbol{\theta}_n + \sum_{i=1}^{n} \lambda_i \left( \gamma_i - y_i^2 \right), $$
and θ_n, the center of Π(n), is the WRLS estimate at time n using the weights {λ_i}_{i=1}^{n}; it can be computed recursively using
$$ C(n) \triangleq P^{-1}(n) $$
$$ G_n = \mathbf{x}_n^T P(n-1)\,\mathbf{x}_n $$
$$ \varepsilon_n = y_n - \boldsymbol{\theta}_{n-1}^T \mathbf{x}_n $$
$$ P(n) = P(n-1) - \frac{\lambda_n P(n-1)\mathbf{x}_n\mathbf{x}_n^T P(n-1)}{1 + \lambda_n G_n} $$
$$ \boldsymbol{\theta}_n = \boldsymbol{\theta}_{n-1} + \lambda_n P(n)\,\mathbf{x}_n\,\varepsilon_n $$
$$ \kappa_n = \kappa_{n-1} + \lambda_n\gamma_n - \frac{\lambda_n \varepsilon_n^2}{1 + \lambda_n G_n}. $$
The relationship between Π(n) and Ω(n) for the two-dimensional case is shown in Fig. 1.1.

Figure 1.1. Relationship between Π(n) and Ω(n) for the two-dimensional case.
The SM-WRLS algorithm begins with a large ellipsoid, Π(0), which contains all admissible
values of the model parameter vector. The objective is to employ the weights λ_n to
sequentially minimize the volume of the ellipsoid, defined by
$$ \mu\{\Pi(n)\} \triangleq \det\{\kappa_n\, C^{-1}(n)\}. $$
It has been shown that the optimal weight λ_n^* is the unique positive root of the quadratic
equation [14, 38]
$$ a_n\lambda_n^2 + b_n\lambda_n + c_n = 0 \qquad (1.9) $$
where
$$ a_n = (m-1)\,G_n^2\,\gamma_n $$
$$ b_n = 2mG_n\gamma_n - \kappa_{n-1}G_n^2 - G_n\gamma_n + \varepsilon_n^2 G_n \qquad (1.10) $$
$$ c_n = m\gamma_n - m\varepsilon_n^2 - \kappa_{n-1}G_n. $$
When the roots of equation (1.9) are both negative, the data do not contain useful information
and hence updating can be avoided. Since a_n is always positive, if c_n is negative
then there exists a unique positive root of (1.9). When c_n ≥ 0, b_n is nonnegative because,
according to (1.10),
$$ b_n = (2m-1)G_n\gamma_n + \varepsilon_n^2 G_n - \kappa_{n-1}G_n^2
     \ge mG_n\gamma_n + \varepsilon_n^2 G_n - \kappa_{n-1}G_n^2
     = G_n\left( m\gamma_n + \varepsilon_n^2 - \kappa_{n-1}G_n \right)
     \ge G_n\left( m\gamma_n - m\varepsilon_n^2 - \kappa_{n-1}G_n \right)
     = G_n\, c_n \ge 0, $$
so both roots are negative. Hence there is no optimal weight at this time and updating
can be avoided. Thus the estimates are updated if and only if c_n < 0; the sign of c_n serves
as an indicator for updating. Similar results for trace minimization, in which the optimization
is applied to μ_t{Π(n)} ≜ tr{κ_n C^{-1}(n)}, can be found in [14, 38, 39].
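As an illustration of this data-selection logic, the sketch below (Python/NumPy) computes G_n, ε_n, and c_n, and carries out one SM-WRLS step only when c_n < 0. The routine and its variable names are illustrative assumptions, and m > 1 is assumed so that a_n > 0.

```python
import numpy as np

def sm_wrls_step(theta, P, kappa, x, y, gamma, m):
    """One SM-WRLS step: update only if the data are informative (c_n < 0)."""
    G = x @ P @ x
    eps = y - theta @ x
    c = m * gamma - m * eps**2 - kappa * G            # c_n from (1.10)
    if c >= 0.0:                                      # data carry no new information
        return theta, P, kappa, False
    a = (m - 1.0) * G**2 * gamma                      # a_n
    b = 2*m*G*gamma - kappa*G**2 - G*gamma + eps**2 * G  # b_n
    lam = (-b + np.sqrt(b**2 - 4*a*c)) / (2*a)        # unique positive root of (1.9)
    P = P - (lam * np.outer(P @ x, x) @ P) / (1.0 + lam * G)
    theta = theta + lam * (P @ x) * eps
    kappa = kappa + lam * gamma - lam * eps**2 / (1.0 + lam * G)
    return theta, P, kappa, True
```

When c_n ≥ 0 the routine simply returns the previous estimate, which is exactly the "no update" branch described above.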
1.5 Motivation for Developing the New Algorithm
Although the SM-WRLS algorithm and related algorithms have been applied in many areas,
it is difficult to formally analyze their convergence properties. From simulation results, we
have found that with F-H/OBE and SM-WRLS the optimal weight λ_n^* might drift to infinity
as more data are used. The reason for this anomaly is as follows: the elements of the
weighted covariance matrix C(n) in (1.8) may become very large when more data are used.
So, relatively speaking, the weight λ_n^* in (1.8) must be large enough in order for x_n to make
a difference and to effectively shrink the volume of the ellipsoid. To illustrate, we consider an
AR(2) system given by
$$ y_n = 1.6\,y_{n-1} - 0.68\,y_{n-2} + v_n. \qquad (1.11) $$
Here v_n is a colored noise sequence generated by passing a white sequence through a Butterworth
bandpass filter of order 14 with passband edge frequencies at ω = 1.75 rad and 2.75 rad.
The passband attenuation is -1 dB and the stopband attenuation is -60 dB. This system is stable
since the poles are located at 0.8 ± 0.2j in the z-plane. Figure 1.2 shows the gradual
growth of the weights as more data are used. In this situation, even if the estimate θ_n
approaches the true parameter asymptotically, the large weights still make this algorithm
difficult to implement on finite word-length machines. This is just one example in which the
optimal weight may diverge. In order to deal with the convergence issues, Dasgupta and
Huang introduced the D-H/OBE algorithm [20] in 1987. In this algorithm, the updated
covariance matrix is a convex combination of C(n-1) and the new data outer product,
$$ C(n) = (1 - \lambda_n^*)\,C(n-1) + \lambda_n^*\,\mathbf{x}_n\mathbf{x}_n^T. $$
The number (1 - λ_n^*) is referred to as a "forgetting factor" by Dasgupta and Huang. In a
sense, D-H/OBE represents significant progress in OBE theory because it involves a
formal convergence proof for the weights λ_n^*. However, the optimal weight λ_n^* in D-H/OBE
is obtained by minimizing κ_n at each n. Since the magnitude of κ_n is not clearly related
to the size of the membership sets (see [30, 38, 39]), the D-H/OBE algorithm is
difficult to interpret geometrically.
Therefore, the objectives of this research are:
• To develop a new algorithm that remedies the trade-off between interpretability of the weighting strategy and convergence issues.
• To provide a convergence proof for the new algorithm.
• To perform simulations to study and compare the performance of the new algorithm with that of existing methods.

Figure 1.2. The weights become very large when more data are used.
In the next chapter, the new algorithm “set-membership stochastic approximation” (SM-
SA) is derived for an ARX model with bounded noise. SM-SA is developed in a systematic
way to accommodate bounded weights and covariance matrix. The relationship of SM-SA to
the stochastic approximation approach is investigated. If the weights of SM-SA are chosen
to be 1 / n, then SM-SA is exactly the RLS estimator.
In Chapter 3, the convergence properties of SM-SA are discussed. First we show that
the sequence of optimal weights {λ_n^*} converges to zero as n approaches infinity. The
convergence of the weights to zero is important because it is a good indication that the
estimates converge: we pay less and less attention to noisy observations when λ_n^*
is close to zero. The main point of this chapter is the convergence of the ellipsoid and of its
center, which serves as the estimate of SM-SA. For RLS, the performance of the estimator
is highly dependent on the statistics of the noise sequence. In SM-SA we will see that this
dependency is much weaker than in RLS.
Simulation studies are presented in Chapter 4. In light of the theory and analysis
established in the previous two chapters, we will examine the performance of SM-SA excited
by different noise sequences.
The UOBE framework is introduced in Chapter 5. We will show that the convergence
properties of SM-SA can be applied to the more general weighting policies of UOBE.
Chapter 6 lists the conclusions of this research. We will also suggest some possible
directions for future study.
Chapter 2
The Development of the SM-SA
Algorithm
In the previous chapter, we observed that in SM-WRLS the weights λ_n^* may become
very large, particularly for large n, leading to huge numbers in computations and numerical
instabilities. In this work, we will show a feature that tends to promote convergence of the
ellipsoid by preventing the “drift” of the covariance matrix toward infinity. In this chapter,
we will start deriving the SM-SA algorithm by modifying SM-WRLS so that the covariance
matrix is normalized to the sum of the weights. We will develop the optimality test based
on the minimum volume criterion and find the similarities between SM-SA and SM-WRLS.
2.1 The Set-Membership Stochastic Approximation Algorithm
We begin with an ARX model given in (1.5) and let the noise bound be the same as in
(1.6). For convenience, we rewrite the system here:
$$ y_n = \sum_{i=1}^{p} f_i\, y_{n-i} + \sum_{j=0}^{q} g_j\, u_{n-j} + v_n = \boldsymbol{\theta}_*^T \mathbf{x}_n + v_n. \qquad (2.1) $$
In an unweighted RLS algorithm, the problem of numerically unstable quantities can be
eliminated by normalizing the covariance matrix to the time n, that is, by replacing C(n)
by (1/n)C(n). In principle, if the sequence {x_n} represents a stationary stochastic process
with appropriate ergodicity properties, then (1/n)C(n) will tend to E[x_n x_n^T]. Clearly,
however, this strategy may not work for weights determined by SM considerations, as C(n)
may grow much faster than n. In the SM-SA approach, SM-WRLS is modified so that the
covariance matrix is normalized to the sum of the weights.
Let λ_n^{SM} be the optimal weight of SM-WRLS at time n. Define
$$ \Sigma_n \triangleq \sum_{t=1}^{n} \lambda_t^{SM} = \Sigma_{n-1} + \lambda_n^{SM}, \qquad \Sigma_0 = 0, $$
and
$$ R(n) \triangleq \frac{1}{\Sigma_n}\, C(n)
        = \frac{1}{\Sigma_n}\left[ C(n-1) + \lambda_n^{SM}\mathbf{x}_n\mathbf{x}_n^T \right]
        = \frac{1}{\Sigma_n}\left[ \Sigma_{n-1} R(n-1) + \lambda_n^{SM}\mathbf{x}_n\mathbf{x}_n^T \right]. $$
If we define the normalized weight λ_n by
$$ \lambda_n = \frac{\lambda_n^{SM}}{\Sigma_n}, \qquad (2.2) $$
then
$$ \frac{\Sigma_{n-1}}{\Sigma_n} = 1 - \frac{\lambda_n^{SM}}{\Sigma_n} = 1 - \lambda_n. $$
Thus the normalized covariance matrix R(n) can be written as
$$ R(n) = (1 - \lambda_n)\, R(n-1) + \lambda_n\, \mathbf{x}_n\mathbf{x}_n^T \qquad (2.3) $$
where λ_n ∈ [0,1]. Remarkably, this approach, in which R(n) is a convex combination of the past
normal matrix and the incoming outer product, results in the same "covariance" matrix and
recursive formula as those in D-H/OBE. However, SM-SA has been developed systematically
from SM-WRLS expressly to bridge the gap between the convergence issues and the geometrical
meaning in previous work. This philosophy is quite different from the development of
D-H/OBE, in which geometrical interpretability is sacrificed.
Replacing λ_n^{SM} by λ_n leads to the modification of the original SM-WRLS recursions.
The description of the corresponding ellipsoid is given in the following theorem.
Theorem 2.1 The hyper-ellipsoid region corresponding to the SM-SA algorithm is given by
$$ \Gamma(n) = \left\{ \boldsymbol{\theta} : (\boldsymbol{\theta} - \boldsymbol{\theta}_n)^T\, \frac{R(n)}{\kappa_n}\, (\boldsymbol{\theta} - \boldsymbol{\theta}_n) \le 1 \right\} \qquad (2.4) $$
where κ_n is the quantity
$$ \kappa_n = \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \left( \gamma_i - y_i^2 \right) + \boldsymbol{\theta}_n^T R(n)\, \boldsymbol{\theta}_n, \qquad (2.5) $$
and θ_n, which is the weighted LSE estimate at time n using the weights {λ_i}, can be computed
recursively using the relations
$$ P(n) \triangleq R^{-1}(n) \qquad (2.6) $$
$$ G_n = \mathbf{x}_n^T P(n-1)\,\mathbf{x}_n \qquad (2.7) $$
$$ \varepsilon_n = y_n - \boldsymbol{\theta}_{n-1}^T\mathbf{x}_n \qquad (2.8) $$
$$ P(n) = \frac{1}{1-\lambda_n}\left[ P(n-1) - \frac{\lambda_n P(n-1)\mathbf{x}_n\mathbf{x}_n^T P(n-1)}{1-\lambda_n+\lambda_n G_n} \right] \qquad (2.9) $$
$$ \boldsymbol{\theta}_n = \boldsymbol{\theta}_{n-1} + \lambda_n P(n)\,\mathbf{x}_n\,\varepsilon_n \qquad (2.10) $$
$$ \kappa_n = (1-\lambda_n)\kappa_{n-1} + \lambda_n\gamma_n - \frac{\lambda_n(1-\lambda_n)\varepsilon_n^2}{1-\lambda_n+\lambda_n G_n}. \qquad (2.11) $$
Proof: From the bounding condition v_n^2 ≤ γ_n, the accumulated inequality holds:
$$ \sum_{i=1}^{n} v_i^2 \le \sum_{i=1}^{n} \gamma_i. $$
That is,
$$ \sum_{i=1}^{n} \left( y_i - \boldsymbol{\theta}_*^T\mathbf{x}_i \right)^2 \le \sum_{i=1}^{n} \gamma_i. $$
The relationship remains true if the individual constraints are first multiplied by any nonnegative
quantity. Thus, we have
$$ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \left( y_i - \boldsymbol{\theta}_*^T\mathbf{x}_i \right)^2 \le \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \gamma_i \qquad (2.12) $$
where
$$ \prod_{j=n+1}^{n} (\cdot) \triangleq 1. $$
Therefore, given a sequence of weights λ_i, (2.12) represents a global membership set, say
Γ(n), to which θ_* must belong. Since Ω(n) in (1.7) is the smallest known set, it must be
true that Ω(n) ⊂ Γ(n). Let us denote any general element of Γ(n) by θ. Upon expanding
the square and rearranging terms, the following quadratic inequality in θ emerges:
$$ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i y_i^2
 - 2\sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i y_i\, \boldsymbol{\theta}^T\mathbf{x}_i
 + \boldsymbol{\theta}^T \left[ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i\mathbf{x}_i^T \right] \boldsymbol{\theta}
 \le \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \gamma_i. \qquad (2.13) $$
For the WRLS solution, we have [21]
$$ \boldsymbol{\theta}_n = \left[ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i\mathbf{x}_i^T \right]^{-1} \left[ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i y_i \right]. \qquad (2.14) $$
Thus
$$ \left[ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i\mathbf{x}_i^T \right] \boldsymbol{\theta}_n = \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i y_i. \qquad (2.15) $$
Substituting (2.15) into (2.13), we have
$$ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i y_i^2
 - 2\boldsymbol{\theta}^T \left[ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i\mathbf{x}_i^T \right] \boldsymbol{\theta}_n
 + \boldsymbol{\theta}^T \left[ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i\mathbf{x}_i^T \right] \boldsymbol{\theta}
 \le \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \gamma_i. \qquad (2.16) $$
For an off-line estimator, the covariance R(n) can be computed as
$$ R(n) = \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i\mathbf{x}_i^T. $$
Thus
$$ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i y_i^2 - 2\boldsymbol{\theta}^T R(n)\boldsymbol{\theta}_n + \boldsymbol{\theta}^T R(n)\boldsymbol{\theta} \le \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \gamma_i. $$
Since R(n) is symmetric, this can be written as
$$ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i y_i^2 + (\boldsymbol{\theta}-\boldsymbol{\theta}_n)^T R(n)(\boldsymbol{\theta}-\boldsymbol{\theta}_n) - \boldsymbol{\theta}_n^T R(n)\boldsymbol{\theta}_n \le \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \gamma_i. \qquad (2.17) $$
Since κ_n is given by
$$ \kappa_n = \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \left( \gamma_i - y_i^2 \right) + \boldsymbol{\theta}_n^T R(n)\boldsymbol{\theta}_n, $$
(2.17) is reduced to
$$ (\boldsymbol{\theta} - \boldsymbol{\theta}_n)^T R(n)(\boldsymbol{\theta} - \boldsymbol{\theta}_n) \le \kappa_n, $$
and (2.4) and (2.5) are established.
The matrix inversion lemma [16],
$$ (A + BCD)^{-1} = A^{-1} - A^{-1}B\left( C^{-1} + DA^{-1}B \right)^{-1} DA^{-1}, $$
in which A and C are nonsingular matrices and B and D are matrices of appropriate
dimensions, can be used in (2.3) to obtain
$$ P(n) = (1-\lambda_n)^{-1}P(n-1) - \frac{(1-\lambda_n)^{-1}P(n-1)\mathbf{x}_n\mathbf{x}_n^T(1-\lambda_n)^{-1}P(n-1)}{\lambda_n^{-1} + \mathbf{x}_n^T(1-\lambda_n)^{-1}P(n-1)\mathbf{x}_n}. \qquad (2.18) $$
Replacing x_n^T P(n-1) x_n by G_n in (2.18) yields
$$ P(n) = \frac{1}{1-\lambda_n}\left[ P(n-1) - \frac{(1-\lambda_n)^{-1}P(n-1)\mathbf{x}_n\mathbf{x}_n^T P(n-1)}{\lambda_n^{-1} + (1-\lambda_n)^{-1}G_n} \right]
       = \frac{1}{1-\lambda_n}\left[ P(n-1) - \frac{\lambda_n P(n-1)\mathbf{x}_n\mathbf{x}_n^T P(n-1)}{1-\lambda_n+\lambda_n G_n} \right]. $$
For the WRLS estimate θ_n, we have
$$ \boldsymbol{\theta}_n = P(n)\left[ \sum_{i=1}^{n} \left( \prod_{j=i+1}^{n} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i y_i \right]
 = \frac{1}{1-\lambda_n}\left[ P(n-1) - \frac{\lambda_n P(n-1)\mathbf{x}_n\mathbf{x}_n^T P(n-1)}{1-\lambda_n+\lambda_n G_n} \right]
   \left[ (1-\lambda_n)\sum_{i=1}^{n-1} \left( \prod_{j=i+1}^{n-1} (1-\lambda_j) \right) \lambda_i \mathbf{x}_i y_i + \lambda_n \mathbf{x}_n y_n \right] $$
$$ \phantom{\boldsymbol{\theta}_n}
 = \boldsymbol{\theta}_{n-1} - \frac{\lambda_n P(n-1)\mathbf{x}_n\mathbf{x}_n^T\boldsymbol{\theta}_{n-1}}{1-\lambda_n+\lambda_n G_n}
   + \frac{1}{1-\lambda_n}\left[ \lambda_n P(n-1)\mathbf{x}_n y_n - \frac{\lambda_n^2 P(n-1)\mathbf{x}_n G_n y_n}{1-\lambda_n+\lambda_n G_n} \right] $$
$$ \phantom{\boldsymbol{\theta}_n}
 = \boldsymbol{\theta}_{n-1} - \frac{\lambda_n P(n-1)\mathbf{x}_n\mathbf{x}_n^T\boldsymbol{\theta}_{n-1}}{1-\lambda_n+\lambda_n G_n}
   + \frac{\lambda_n P(n-1)\mathbf{x}_n y_n}{1-\lambda_n}\left( 1 - \frac{\lambda_n G_n}{1-\lambda_n+\lambda_n G_n} \right)
 = \boldsymbol{\theta}_{n-1} + \frac{\lambda_n P(n-1)\mathbf{x}_n\,\varepsilon_n}{1-\lambda_n+\lambda_n G_n}. $$
Note that
$$ P(n)\mathbf{x}_n = \frac{1}{1-\lambda_n}\left[ P(n-1)\mathbf{x}_n - \frac{\lambda_n P(n-1)\mathbf{x}_n G_n}{1-\lambda_n+\lambda_n G_n} \right]
 = \frac{P(n-1)\mathbf{x}_n}{1-\lambda_n}\left( 1 - \frac{\lambda_n G_n}{1-\lambda_n+\lambda_n G_n} \right)
 = \frac{P(n-1)\mathbf{x}_n}{1-\lambda_n+\lambda_n G_n}. $$
Therefore
$$ \boldsymbol{\theta}_n = \boldsymbol{\theta}_{n-1} + \lambda_n P(n)\,\mathbf{x}_n\,\varepsilon_n. $$
To show (2.11), consider
$$ \kappa_n = (1-\lambda_n)\sum_{i=1}^{n-1}\left( \prod_{j=i+1}^{n-1}(1-\lambda_j) \right)\lambda_i(\gamma_i - y_i^2) + \lambda_n(\gamma_n - y_n^2)
   + \left( \boldsymbol{\theta}_{n-1} + \lambda_n P(n)\mathbf{x}_n\varepsilon_n \right)^T R(n) \left( \boldsymbol{\theta}_{n-1} + \lambda_n P(n)\mathbf{x}_n\varepsilon_n \right) $$
$$ \phantom{\kappa_n}
 = (1-\lambda_n)\sum_{i=1}^{n-1}\left( \prod_{j=i+1}^{n-1}(1-\lambda_j) \right)\lambda_i(\gamma_i - y_i^2) + \lambda_n(\gamma_n - y_n^2)
   + \boldsymbol{\theta}_{n-1}^T R(n)\boldsymbol{\theta}_{n-1} + 2\lambda_n\mathbf{x}_n^T\boldsymbol{\theta}_{n-1}\varepsilon_n + \lambda_n^2\mathbf{x}_n^T P(n)\mathbf{x}_n\,\varepsilon_n^2 $$
$$ \phantom{\kappa_n}
 = (1-\lambda_n)\kappa_{n-1} + \lambda_n\gamma_n - \lambda_n y_n^2 + \lambda_n(\mathbf{x}_n^T\boldsymbol{\theta}_{n-1})^2
   + 2\lambda_n(\mathbf{x}_n^T\boldsymbol{\theta}_{n-1})\varepsilon_n + \frac{\lambda_n^2 G_n\,\varepsilon_n^2}{1-\lambda_n+\lambda_n G_n} $$
$$ \phantom{\kappa_n}
 = (1-\lambda_n)\kappa_{n-1} + \lambda_n\gamma_n
   - \lambda_n\left[ y_n^2 - 2(\mathbf{x}_n^T\boldsymbol{\theta}_{n-1})y_n + (\mathbf{x}_n^T\boldsymbol{\theta}_{n-1})^2 \right]
   + \frac{\lambda_n^2 G_n\,\varepsilon_n^2}{1-\lambda_n+\lambda_n G_n} $$
$$ \phantom{\kappa_n}
 = (1-\lambda_n)\kappa_{n-1} + \lambda_n\gamma_n - \lambda_n\varepsilon_n^2 + \frac{\lambda_n^2 G_n\,\varepsilon_n^2}{1-\lambda_n+\lambda_n G_n}
 = (1-\lambda_n)\kappa_{n-1} + \lambda_n\gamma_n - \frac{\lambda_n(1-\lambda_n)\varepsilon_n^2}{1-\lambda_n+\lambda_n G_n}. $$
Thus the proof is complete. □
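For reference, the recursions (2.6)-(2.11) can be written compactly as in the following Python/NumPy sketch. The normalized weight λ_n is passed in as an argument here, since its optimal selection is the subject of Sections 2.3 and 2.4, and the function and variable names are illustrative assumptions.

```python
import numpy as np

def sm_sa_step(theta, P, kappa, x, y, gamma, lam):
    """One SM-SA recursion with normalized weight lam in [0, 1)."""
    G = x @ P @ x                        # G_n = x^T P(n-1) x            (2.7)
    eps = y - theta @ x                  # eps_n = y_n - theta_{n-1}^T x (2.8)
    h = 1.0 - lam + lam * G              # h_n of (2.21)
    P = (P - lam * np.outer(P @ x, x) @ P / h) / (1.0 - lam)             # (2.9)
    theta = theta + lam * (P @ x) * eps                                  # (2.10)
    kappa = (1.0 - lam) * kappa + lam * gamma \
            - lam * (1.0 - lam) * eps**2 / h                             # (2.11)
    return theta, P, kappa
```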
2.2 Initial Values
The covariance matrix in SM-SA is
$$ R(n) = \sum_{i=1}^{n}\left( \prod_{j=i+1}^{n}(1-\lambda_j) \right)\lambda_i\,\mathbf{x}_i\mathbf{x}_i^T. $$
In order to start the recursive algorithm, it is necessary to choose initial values for θ_n and
P(n). If we incorporate the initial condition into the covariance matrix, then it becomes
$$ R(n) = \left[ \prod_{i=1}^{n}(1-\lambda_i) \right] R_0 + \sum_{i=1}^{n}\left( \prod_{j=i+1}^{n}(1-\lambda_j) \right)\lambda_i\,\mathbf{x}_i\mathbf{x}_i^T. \qquad (2.19) $$
Note that [∏_{i=1}^{n}(1-λ_i)] R_0 will not be zero if R_0 is chosen to be positive definite.
Generally speaking, P(0) = R_0^{-1} reflects the confidence in the initial estimate θ_0. If
P(0) is large, from (2.10) we see that the next estimate θ_n will jump away from its initial
value. Without any a priori information about θ_0, it is common to take
$$ \boldsymbol{\theta}_0 = \mathbf{0} \quad\text{and}\quad P(0) = \sigma I, $$
where σ is a "large" number, typically 10^6, so that the impact of the choice of initial values
is small. From (2.19) we can also see that this choice will ensure that the estimates are close
to the off-line LSE estimate.
Note that, given any initial condition P(0), if
$$ \lim_{n\to\infty} \prod_{i=1}^{n}(1-\lambda_i) = 0, \qquad (2.20) $$
then the recursive estimates will approach the off-line estimates asymptotically. Using
material in Appendix A, (2.20) implies that
$$ \sum_{n=1}^{\infty} \lambda_n = \infty. $$
From the theory of stochastic approximation, as discussed in Chapter 1, this is an essential
condition for the convergence of the estimates to the true parameters.
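A minimal sketch of this standard initialization follows; the dimension m is an illustrative assumption, and σ = 10^6 is the typical "large" value mentioned above.

```python
import numpy as np

m = 3                    # number of model parameters, m = p + q + 1
sigma = 1e6              # "large" number: low confidence in the initial estimate
theta0 = np.zeros(m)     # theta_0 = 0
P0 = sigma * np.eye(m)   # P(0) = sigma * I, so R_0 = (1/sigma) I is positive definite
```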
2.3 The Optimal Weight
The optimal weight λ_n^* of SM-SA is found by using the minimization criterion introduced
in Chapter 1. That is, λ_n^* minimizes
$$ \mu\{\Gamma(n)\} \triangleq \det\{\kappa_n\, R^{-1}(n)\}. $$
Let B(n) ≜ κ_n R^{-1}(n). Then, by definition,
$$ B(n) = \kappa_n P(n)
        = B(n-1)\,\frac{\kappa_n}{(1-\lambda_n)\kappa_{n-1}}
        - \frac{\lambda_n\kappa_n\, P(n-1)\mathbf{x}_n\mathbf{x}_n^T B(n-1)}{(1-\lambda_n)\kappa_{n-1}\,(1-\lambda_n+\lambda_n G_n)}. $$
Define
$$ h_n \triangleq 1 - \lambda_n + \lambda_n G_n \qquad (2.21) $$
and
$$ d_n \triangleq \frac{1}{1-\lambda_n}\,\frac{\kappa_n}{\kappa_{n-1}}
        = 1 + \frac{\lambda_n\gamma_n}{(1-\lambda_n)\kappa_{n-1}} - \frac{\lambda_n\varepsilon_n^2}{\kappa_{n-1}h_n}. \qquad (2.22) $$
Then
$$ B(n) = B(n-1)\left\{ d_n I - \frac{d_n\lambda_n}{\kappa_{n-1}h_n}\,\mathbf{x}_n\,[B(n-1)\mathbf{x}_n]^T \right\} $$
and
$$ \det B(n) = \det B(n-1)\,\det\left\{ d_n I - \frac{d_n\lambda_n}{\kappa_{n-1}h_n}\,\mathbf{x}_n\,[B(n-1)\mathbf{x}_n]^T \right\}. $$
Since B(n-1) is a known quantity at time n, it suffices to minimize the volume ratio V(λ_n) given by
$$ V(\lambda_n) \triangleq \frac{\det B(n)}{\det B(n-1)}. $$
Thus
$$ V(\lambda_n) = \det\left\{ d_n I - \frac{d_n\lambda_n}{\kappa_{n-1}h_n}\,\mathbf{x}_n\,[B(n-1)\mathbf{x}_n]^T \right\}. \qquad (2.23) $$
Using the matrix identity [26]
$$ \det\left( cI + \mathbf{y}\mathbf{z}^T \right) = c^{m-1}\left( c + \mathbf{y}^T\mathbf{z} \right) \qquad (2.24) $$
and letting
$$ c = d_n, \qquad \mathbf{y} = -\frac{d_n\lambda_n\mathbf{x}_n}{\kappa_{n-1}h_n}, \qquad \mathbf{z} = B(n-1)\mathbf{x}_n, $$
(2.23) becomes
$$ V(\lambda_n) = d_n^{m-1}\left( d_n - \frac{d_n\lambda_n\,\mathbf{x}_n^T B(n-1)\mathbf{x}_n}{\kappa_{n-1}h_n} \right)
              = d_n^m\left( 1 - \frac{\lambda_n G_n}{h_n} \right)
              = \frac{d_n^m\,(h_n - \lambda_n G_n)}{h_n}. \qquad (2.25) $$
To minimize the volume ratio with respect to λ_n, we differentiate (2.25) and set the result to zero:
$$ 0 = \frac{\partial}{\partial\lambda_n} V(\lambda_n) = \frac{d_n^{m-1}}{h_n^2}\, F(\lambda_n) \qquad (2.26) $$
where
$$ F(\lambda_n) = (1-\lambda_n)\,m\,h_n\,\frac{\partial d_n}{\partial\lambda_n} - d_n G_n. \qquad (2.27) $$
Since
$$ \frac{\partial d_n}{\partial\lambda_n}
 = \frac{1}{(1-\lambda_n)^2}\frac{\kappa_n}{\kappa_{n-1}} + \frac{1}{1-\lambda_n}\frac{\partial}{\partial\lambda_n}\!\left(\frac{\kappa_n}{\kappa_{n-1}}\right)
 = \frac{d_n}{1-\lambda_n} + \frac{1}{1-\lambda_n}\left[ -1 + \frac{\gamma_n}{\kappa_{n-1}} - \frac{\varepsilon_n^2\,(1-\lambda_n-\lambda_n h_n)}{h_n^2\,\kappa_{n-1}} \right], \qquad (2.28) $$
(2.27) becomes
$$ F(\lambda_n) = m h_n\left[ d_n - 1 + \frac{\gamma_n}{\kappa_{n-1}} - \frac{\varepsilon_n^2(1-\lambda_n-\lambda_n h_n)}{h_n^2\kappa_{n-1}} \right] - d_n G_n. \qquad (2.29) $$
The optimal weight λ_n^* is found by setting F(λ_n) = 0. Thus
$$ m h_n\left[ d_n - 1 + \frac{\gamma_n}{\kappa_{n-1}} - \frac{\varepsilon_n^2(1-\lambda_n-\lambda_n h_n)}{h_n^2\kappa_{n-1}} \right] = d_n G_n. \qquad (2.30) $$
Expanding h_n and d_n, we obtain the quadratic equation
$$ a_n\lambda_n^2 + b_n\lambda_n + c_n = 0 \qquad (2.31) $$
where
$$ a_n = m\gamma_n - m\varepsilon_n^2 + mG_n^2\gamma_n - 2mG_n\gamma_n - \kappa_{n-1}G_n + \kappa_{n-1}G_n^2 + G_n\gamma_n - G_n^2\gamma_n - \varepsilon_n^2 G_n $$
$$ b_n = 2m\varepsilon_n^2 - 2m\gamma_n + 2mG_n\gamma_n + 2\kappa_{n-1}G_n - \kappa_{n-1}G_n^2 - G_n\gamma_n + \varepsilon_n^2 G_n $$
$$ c_n = m\gamma_n - m\varepsilon_n^2 - \kappa_{n-1}G_n. $$
This SM technique is called set-membership stochastic approximation (SM-SA) because of
its resemblance to the stochastic approximation method. The central difference between
D-H/OBE and SM-SA is that the optimal weight λ_n^* in D-H/OBE is obtained by minimizing κ_n,
while in SM-SA it is obtained by minimizing the volume ratio. Indeed, there is an ellipsoid
associated with the Dasgupta-Huang OBE at each step, but meaningful measures of its
size are ignored in the optimization process, sacrificing interpretability.
2.4 Data Selection
In this section, we derive the criterion for updating. From (2.25) and the definitions
of h_n and d_n in (2.21) and (2.22), it is easy to see that V(0) = 1 and V(λ_n) > 0 for all
0 < λ_n < 1. Note that d_n and h_n are unity when λ_n = 0. Thus from (2.26) and (2.29), we have
$$ \left.\frac{\partial V(\lambda_n)}{\partial\lambda_n}\right|_{\lambda_n=0} = F(0) \qquad (2.32) $$
$$ \phantom{\left.\frac{\partial V(\lambda_n)}{\partial\lambda_n}\right|_{\lambda_n=0}}
   = m\left( \frac{\gamma_n}{\kappa_{n-1}} - \frac{\varepsilon_n^2}{\kappa_{n-1}} \right) - G_n \qquad (2.33) $$
$$ \phantom{\left.\frac{\partial V(\lambda_n)}{\partial\lambda_n}\right|_{\lambda_n=0}}
   = \frac{c_n}{\kappa_{n-1}} \qquad (2.34) $$
where κ_{n-1} > 0 by definition (2.4) for λ_{n-1} ∈ [0,1). Equation (2.34) reveals an interesting
relationship between the sign of c_n and the volume ratio: if c_n < 0, there exists
some λ_n such that the volume ratio is less than unity, which means that the volume shrinks.
Before we develop the main theorem on this point, we need some preliminary lemmas:
Lemma 2.1 If G_n < 1, then h_n > 0 for all λ_n ∈ (1, 1/(1-G_n)) and
$$ \lim_{\lambda_n \to \left(\frac{1}{1-G_n}\right)^-} h_n = 0^+. $$
If G_n > 1, then h_n > 0 for all λ_n ∈ (1/(1-G_n), 0) and
$$ \lim_{\lambda_n \to \left(\frac{1}{1-G_n}\right)^+} h_n = 0^+, $$
where h_n is defined in (2.21).

Proof: By definition,
$$ h_n = 1 - \lambda_n + \lambda_n G_n = 1 - \lambda_n(1 - G_n). $$
If G_n < 1, then 1/(1-G_n) > 1, so the interval (1, 1/(1-G_n)) is not empty. For all
λ_n ∈ (1, 1/(1-G_n)) we have 0 < λ_n(1-G_n) < 1, so h_n > 0 and
$$ \lim_{\lambda_n \to \left(\frac{1}{1-G_n}\right)^-} h_n = 0^+. $$
If G_n > 1, then 1/(1-G_n) < 0. Therefore the interval (1/(1-G_n), 0) is not empty, and for
all λ_n ∈ (1/(1-G_n), 0) we have h_n > 0 and
$$ \lim_{\lambda_n \to \left(\frac{1}{1-G_n}\right)^+} h_n = 0^+. \qquad \square $$
Lemma 2.2
$$ \lim_{\lambda_n \to 1^-} V(\lambda_n) = +\infty. $$
Proof: From (2.25), we have
$$ V(\lambda_n) = \frac{d_n^m\,(h_n - \lambda_n G_n)}{h_n} = \frac{d_n^m\,(1-\lambda_n)}{h_n}. $$
Since lim_{λ_n→1^-} h_n ≠ 0, it suffices to show that
$$ \lim_{\lambda_n \to 1^-} d_n^m\,(1 - \lambda_n) = +\infty. $$
From the definition of d_n,
$$ d_n^m(1-\lambda_n) = \left( \frac{1}{1-\lambda_n} \right)^m \left( \frac{\kappa_n}{\kappa_{n-1}} \right)^m (1-\lambda_n)
 = \frac{1}{(1-\lambda_n)^{m-1}}\left( \frac{\kappa_n}{\kappa_{n-1}} \right)^m. $$
The positiveness of κ_n can be seen immediately from (2.4). Thus at any time n, κ_n/κ_{n-1} > 0,
so d_n^m(1-λ_n) approaches +∞ as λ_n approaches 1^- (recall that m > 1). □
Theorem 2.2 If c_n ≥ 0, then there is no extremum of V(λ_n) in the open interval (0,1).

Proof: From (2.34) we know that
$$ \left.\frac{\partial V(\lambda_n)}{\partial\lambda_n}\right|_{\lambda_n=0} \ge 0 $$
when c_n ≥ 0. First we consider the case c_n > 0, for which there are three possible sub-cases:

1. Suppose G_n < 1. Then there are two asymptotes, at λ_n = 1 and λ_n = 1/(1-G_n) > 1.

Claim 2.1 If c_n > 0 and G_n < 1, then
$$ \lim_{\lambda_n \to 1^+} d_n = -\infty \quad\text{and}\quad \lim_{\lambda_n \to \left(\frac{1}{1-G_n}\right)^-} d_n = -\infty. $$

Proof of Claim 2.1: From (2.22),
$$ d_n = \frac{1}{1-\lambda_n}\,\frac{\kappa_n}{\kappa_{n-1}}
       = 1 + \frac{\lambda_n\gamma_n}{(1-\lambda_n)\kappa_{n-1}} - \frac{\lambda_n\varepsilon_n^2}{\kappa_{n-1}h_n}. $$
Note that since lim_{λ_n→1^+} h_n > 0 and κ_{n-1} > 0 at time n, the second term on the right-hand
side dominates in the limit (and λ_n/(1-λ_n) → -∞ as λ_n → 1^+). So
$$ \lim_{\lambda_n \to 1^+} d_n = -\infty. $$
A similar argument applies when λ_n approaches (1/(1-G_n))^-: in this case 1-λ_n does not
approach zero, but h_n approaches 0^+ by Lemma 2.1, so the last term dominates and
$$ \lim_{\lambda_n \to \left(\frac{1}{1-G_n}\right)^-} d_n = -\infty. \qquad \square $$

Note that for all λ_n ∈ (1, 1/(1-G_n)), the signs of 1-λ_n and h_n remain unchanged, so
from (2.25) and Claim 2.1, the value of V(λ_n) near the two asymptotes (λ_n = 1 and
λ_n = 1/(1-G_n)) will approach either +∞ at both or -∞ at both, depending on whether m is even
or odd. In Lemma 2.2 we have shown that V(λ_n) approaches +∞ as λ_n approaches 1^-,
so a minimum cannot exist in (0,1), because otherwise more than two extrema would exist.
Therefore there are only two possibilities in this case, as illustrated in Fig. 2.1.

Figure 2.1. Plots of V(λ_n) when c_n > 0 and G_n < 1.
2. Suppose G_n = 1. Then there is only one asymptote, at λ_n = 1. Note that in this
condition a_n = -mε_n^2 - ε_n^2 < 0, so c_n/a_n < 0. This means that the extrema (the two roots
of (2.31)) have different signs. If there were one extremum in (0,1), then there would have
to be another extremum in (0,1), since lim_{λ_n→1^-} V(λ_n) = +∞. This is contrary to the
fact that the roots' signs must be different. Therefore we conclude that there is no
extremum in (0,1).

3. Suppose G_n > 1. Then there are two asymptotes, at λ_n = 1 and λ_n = 1/(1-G_n) < 0. We
claim that
$$ \lim_{\lambda_n \to \left(\frac{1}{1-G_n}\right)^+} V(\lambda_n) = +\infty. \qquad (2.35) $$
For convenience, we rewrite (2.25) as
$$ V(\lambda_n) = \frac{d_n^m\,(h_n - \lambda_n G_n)}{h_n}. $$
Note that lim_{λ_n→(1/(1-G_n))^+} h_n = 0^+ by Lemma 2.1, 1-λ_n > 0, and
$$ d_n = \frac{1}{1-\lambda_n}\,\frac{\kappa_n}{\kappa_{n-1}}
       = 1 + \frac{\lambda_n\gamma_n}{(1-\lambda_n)\kappa_{n-1}} - \frac{\lambda_n\varepsilon_n^2}{\kappa_{n-1}h_n} \;\to\; +\infty. \qquad (2.36) $$
Equation (2.36) holds because h_n → 0^+ and λ_n is negative. Now (2.35) is satisfied,
which implies that there must be one local minimum in the interval (1/(1-G_n), 0). If
there were one extremum in the interval (0,1), then there would have to be another extremum,
since lim_{λ_n→1^-} V(λ_n) = +∞. This situation can be seen in Fig. 2.2; it would imply
the existence of more than two roots of (2.31), a contradiction. Thus we conclude that
there is no extremum for λ_n ∈ (0,1).

Figure 2.2. When G_n > 1 and c_n > 0, if there exists one extremum of V(λ_n) in (0,1), then
there exist more than two extrema.

Now suppose c_n = 0. Then
$$ \left.\frac{\partial V(\lambda_n)}{\partial\lambda_n}\right|_{\lambda_n=0} = 0
   \quad\text{and}\quad
   \lim_{\lambda_n \to 0}\frac{\partial^2 V(\lambda_n)}{\partial\lambda_n^2} = 0^+. $$
This means that V(λ_n) is convex in the neighborhood of λ_n = 0 and there is a minimum
at λ_n = 0. If there exist extrema in (0,1), there are at least three extrema, which
violates (2.31), since the number of such extrema must be even. So we conclude that there is no
extremum in (0,1), and if there exists some λ_n in (0,1) such that (2.31) holds, it must be a
saddle point. The only possibilities, with and without a saddle point, can be seen in Fig. 2.3. □

Figure 2.3. When c_n = 0, the only two possibilities for V(λ_n) in (0,1) are: (a) there is
a saddle point, or (b) no extrema and no saddle points exist.
Theorem 2.3 Suppose c_n < 0. Then V(λ_n) has a unique minimum in the open interval (0,1).

Proof: When c_n < 0, we have
$$ \left.\frac{\partial V(\lambda_n)}{\partial\lambda_n}\right|_{\lambda_n=0} < 0. $$
By construction of the algorithm, there are two values of λ_n such that
$$ \frac{\partial V(\lambda_n)}{\partial\lambda_n} = 0. $$
Also, from Lemma 2.2,
$$ \lim_{\lambda_n \to 1^-} V(\lambda_n) = +\infty, $$
so there is at least one minimum in (0,1). If there existed more than one extremum in
(0,1), then there would be at least three extrema, as shown in Fig. 2.4, contrary to (2.31).
Therefore there is exactly one minimum in the interval (0,1). □

Figure 2.4. When c_n < 0, if more than one extremum appears in (0,1), then there must be
at least three extrema.

From Theorems 2.2 and 2.3, we conclude that there exists an optimal weight if and only
if c_n < 0.
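A simple numerical check of this conclusion: the sketch below evaluates the volume ratio V(λ_n) of (2.25) on a grid over (0,1) and reports whether it dips below unity, which happens precisely when c_n < 0. The numerical values of G_n, ε_n, γ_n, κ_{n-1}, and m are arbitrary illustrative assumptions.

```python
import numpy as np

def volume_ratio(lam, G, eps, gamma, kappa_prev, m):
    """V(lambda_n) = d_n^m (h_n - lambda_n G_n) / h_n, per (2.25)."""
    h = 1.0 - lam + lam * G                                              # (2.21)
    d = 1.0 + lam * gamma / ((1.0 - lam) * kappa_prev) \
            - lam * eps**2 / (kappa_prev * h)                            # (2.22)
    return d**m * (h - lam * G) / h

G, eps, gamma, kappa_prev, m = 0.8, 1.2, 0.5, 2.0, 3
c = m * gamma - m * eps**2 - kappa_prev * G                              # c_n of (2.31)
lams = np.linspace(1e-4, 1 - 1e-4, 1000)
V = volume_ratio(lams, G, eps, gamma, kappa_prev, m)
print(c < 0, V.min() < 1.0)   # both True here: the volume can shrink iff c_n < 0
```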
Theorem 2.4 The quadratic equation (2.31) always has two real roots. Furthermore,
if c_n < 0, then there is a unique root λ_n^* ∈ (0,1), given by
$$ \lambda_n^* = \frac{-b_n + \sqrt{b_n^2 - 4a_nc_n}}{2a_n}. \qquad (2.37) $$
Theorem 2.4 implies that if c_n < 0, then the optimal weight λ_n^*, which belongs to (0,1),
can be obtained from (2.37).

Proof: First we show that the real roots of (2.31) exist. Let a_n, b_n, and c_n be the
coefficients of (2.31). For convenience, we rewrite them:
$$ a_n = m\gamma_n - m\varepsilon_n^2 + mG_n^2\gamma_n - 2mG_n\gamma_n - \kappa_{n-1}G_n + \kappa_{n-1}G_n^2 + G_n\gamma_n - G_n^2\gamma_n - \varepsilon_n^2 G_n $$
$$ b_n = 2m\varepsilon_n^2 - 2m\gamma_n + 2mG_n\gamma_n + 2\kappa_{n-1}G_n - \kappa_{n-1}G_n^2 - G_n\gamma_n + \varepsilon_n^2 G_n $$
$$ c_n = m\gamma_n - m\varepsilon_n^2 - \kappa_{n-1}G_n. $$
Define ā_n and b̄_n as
$$ \bar{a}_n \triangleq (m-1)\,G_n^2\,\gamma_n \qquad (2.38) $$
$$ \bar{b}_n \triangleq G_n\left( 2m\gamma_n - \kappa_{n-1}G_n - \gamma_n + \varepsilon_n^2 \right). \qquad (2.39) $$
Then
$$ a_n = c_n - \bar{b}_n + \bar{a}_n \quad\text{and}\quad b_n = -2c_n + \bar{b}_n. $$
Note that the expressions for ā_n and b̄_n are exactly the same as the coefficients of the quadratic
equation in (1.9). Let the solutions of (2.31) be λ_n^1 and λ_n^2:
$$ \lambda_n^1 = \frac{-b_n + \sqrt{b_n^2 - 4a_nc_n}}{2a_n}, \qquad
   \lambda_n^2 = \frac{-b_n - \sqrt{b_n^2 - 4a_nc_n}}{2a_n}. $$
Note that
$$ b_n^2 - 4a_nc_n = (-2c_n + \bar{b}_n)^2 - 4c_n(c_n - \bar{b}_n + \bar{a}_n)
                   = \bar{b}_n^2 - 4\bar{a}_nc_n
                   = G_n^2\left( G_n\kappa_{n-1} - \gamma_n - \varepsilon_n^2 \right)^2 + 4\varepsilon_n^2 G_n^2 \gamma_n (m^2 - 1) \ge 0. $$
Thus two real roots always exist.

To prove that λ_n^* = λ_n^1, consider the following four cases:
1. Suppose a_n < 0 and b_n < 0. Then √(b_n^2 - 4a_nc_n) < |b_n| since a_nc_n > 0, so both
-b_n ± √(b_n^2 - 4a_nc_n) > 0. But since a_n < 0, both solutions are negative. This violates the
existence of a positive root when c_n < 0, so we can rule out this possibility.
2. Suppose a_n > 0 and b_n > 0. Then λ_n^2 < 0. The only possibility to obtain a positive
root is λ_n^* = λ_n^1.
3. Suppose a_n > 0 and b_n < 0. Then √(b_n^2 - 4a_nc_n) > |b_n| since a_nc_n < 0. So λ_n^2 < 0 and
λ_n^1 > 0. To obtain a positive root, we must have λ_n^* = λ_n^1.
4. Suppose a_n < 0 and b_n > 0. Then both roots are positive when c_n < 0. Since
a_nc_n > 0, √(b_n^2 - 4a_nc_n) < |b_n| and hence λ_n^1 < λ_n^2. To obtain the unique root in (0,1),
we must have λ_n^* = λ_n^1.
We conclude that λ_n^* = λ_n^1 when c_n < 0. □
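The following sketch wraps up the result of Theorems 2.2-2.4: it computes c_n and, only when c_n < 0, evaluates the coefficients of (2.31) and returns the optimal weight from (2.37). The helper is an illustrative assumption and presumes m > 1.

```python
import numpy as np

def sm_sa_optimal_weight(G, eps, gamma, kappa_prev, m):
    """Return lambda_n^* in (0, 1) if an update is warranted, else None."""
    c = m * gamma - m * eps**2 - kappa_prev * G
    if c >= 0.0:                     # Theorem 2.2: no extremum in (0, 1), skip the update
        return None
    a = (m*gamma - m*eps**2 + m*G**2*gamma - 2*m*G*gamma
         - kappa_prev*G + kappa_prev*G**2 + G*gamma - G**2*gamma - eps**2*G)
    b = (2*m*eps**2 - 2*m*gamma + 2*m*G*gamma + 2*kappa_prev*G
         - kappa_prev*G**2 - G*gamma + eps**2*G)
    return (-b + np.sqrt(b**2 - 4*a*c)) / (2*a)      # unique root in (0, 1), eq. (2.37)
```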
2.5 Relations Between SM-SA and SM-WRLS
As shown in Theorems 2.2 and 2.3, the ellipsoid will shrink if and only if c_n in equation
(2.31) is negative. If we compare c_n in (2.31) with c_n in (1.9), we find that they are exactly
the same quantity. This fact led us to explore whether relationships exist between SM-SA
and SM-WRLS.
The following simulation shows some apparent equivalencies between these two algorithms.
Consider the system
$$ y_n = 1.6\,y_{n-1} - 0.68\,y_{n-2} + v_n \qquad (2.40) $$
where {v_n} is a sequence of pseudo-random numbers uniformly distributed in [-0.5, 0.5].
A set of 120,000 data points is generated using this model. Table 2.1 and Table 2.2 show the
results of the simulations for SM-SA and SM-WRLS using the system in (2.40). It can be
seen that both algorithms update the parameters at the same times and result in the same
"volume measure" and estimates.
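The data for this experiment can be generated as in the sketch below, a direct simulation of (2.40) with uniformly distributed noise; the seed is an arbitrary assumption. Since |v_n| ≤ 0.5, γ_n = 0.25 is a valid noise bound for this model.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 120_000
v = rng.uniform(-0.5, 0.5, size=N)       # bounded noise: v_n^2 <= 0.25 = gamma_n
y = np.zeros(N)
for n in range(2, N):
    y[n] = 1.6 * y[n-1] - 0.68 * y[n-2] + v[n]   # AR(2) model (2.40)
```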
These empirical results, and others due to Krunz et al. [33, 34], led Deller et al. [38, 39]
to the discovery that the volume measure defined by det B(n), the data points selected,
and the parameter estimates θ(n) are independent of the choice of weighting sequence when
UOBE is considered. Thus the relationship between SM-SA and F-H/OBE is expected to
be the same as that between SM-SA and SM-WRLS. This implies that the present research
on SM-SA has much broader implications for the entire class of UOBE algorithms. Details
will be discussed in Chapter 5.
On the other hand, SM-SA has some features which are not necessarily general. The
simulation results also reveal that the average weights satisfy the Robbins-Monro requirements
in (1.3) under the condition that the noise input is a pseudo-random number with
uniform distribution in [-0.5, 0.5]. Figure 2.5 shows the relationship between time and
the "averaged optimal weight" from six experiments. The vertical and horizontal axes are
logarithms of weight and time, respectively. The two dotted lines are the logarithms of 1/n
and of a second reference sequence of type 1/n. The curve of the averaged weights is very close to the curve of 1/n, and hence
the estimates could approach the true parameters. Therefore, the experimental results show
that SM-SA has great potential for theoretical development, and its theoretical analysis
can help explain some features of SM-WRLS which cannot be obtained directly from the
original algorithm.
Figure 2.5. The relationship between logarithms of time and weight.
            SM-WRLS                                 SM-SA
 time    volume measure    ε_n          time    volume measure    ε_n
1 1588030880 0.045102 1 6250000000 0.045102
2 61090735 0.325878 2 122037760 0.325878
3 75912.859375 -1.841745 3 93237.984375 -1.853359
4 67.021507 6.571213 4 66.320511 8.059097
5 19.407402 0.208971 5 19.290403 0.212660
6 17.094551 -0.261806 6 17.005600 -0.262197
7 6.657953 -0.219103 7 6.645070 -0.222424
8 2.566225 0.210988 8 2.566096 0.208843
9 0.861677 -0.235759 9 0.861815 -0.236259
10 0.399760 -0.076875 10 0.399806 -0.076872
11 0.191688 -0.458952 11 0.191711 -0.458865
12 0.191604 0.157472 12 0.191627 0.157516
13 0.169054 -0.062481 13 0.169072 -0.062429
17 0.168957 -0.355722 17 0.168976 -0.355732
18 0.130553 -0.420661 18 0.130561 -0.420695
19 0.102843 0.197666 19 0.102846 0.197650
20 0.089835 0.242533 20 0.089836 0.242511
21 0.051043 -0.607100 21 0.051041 -0.607123
23 0.044455 -0.433844 23 0.044454 -0.433845
41 0.044317 -0.433763 41 0.044316 -0.433759
55 0.043492 0.340501 55 0.043491 0.340501
58 0.043200 0.330139 58 0.043199 0.330139
115034 0.000072 -0.498830 115034 0.000072 -0.498830
115258 0.000072 -0.500641 115258 0.000072 -0.500641
115280 0.000072 -0.496111 115280 0.000072 -0.496111
115463 0.000072 -0.491712 115463 0.000072 -0.491711
116302 0.000072 -0.483166 116302 0.000072 -0.483166
116350 0.000071 -0.499281 116350 0.000071 -0.499281
116643 0.000070 -0.503710 116643 0.000070 -0.503710
116738 0.000070 -0.501659 116738 0.000070 -0.501659
116934 0.000070 -0.500541 116934 0.000070 -0.500541
117060 0.000069 -0.499384 117060 0.000069 -0.499384
117354 0.000069 -0.491482 117354 0.000069 -0.491482
117706 0.000069 -0.502020 117706 0.000069 -0.502020
118516 0.000068 -0.503093 118516 0.000068 -0.503093
118709 0.000068 -0.498520 118709 0.000068 -0.498520
Table 2.1. The data selection and performance of SM-WRLS and SM-SA for the AR
model (2.40).
            SM-WRLS                                 SM-SA
 time       a1          a2              time       a1          a2
            0.000000     0.000000                  0.000000     0.000000
            2.659994     0.000000                  2.662042     0.000000
            2.659676    -2.490193                  2.661741    -2.496096
            2.504596    -1.994164                  2.505078    -1.995054
            2.485668    -1.956294                  2.485804    -1.956493
 115034     1.602177    -0.678578       115034     1.602176    -0.678578
 115258     1.600430    -0.676362       115258     1.600430    -0.676362
 115280     1.601612    -0.677525       115280     1.601612    -0.677525
 115463     1.601638    -0.677555       115463     1.601639    -0.677555
 116302     1.596280    -0.673320       116302     1.596280    -0.673320
 116350     1.594492    -0.670988       116350     1.594492    -0.670988
 116643     1.591593    -0.668343       116643     1.591593    -0.668343
 116738     1.602305    -0.678133       116738     1.602305    -0.678133
 116934     1.604294    -0.680194       116934     1.604294    -0.680194
 117060     1.602866    -0.678490       117060     1.602866    -0.678490
 117354     1.603187    -0.678821       117354     1.603187    -0.678821
 117706     1.597315    -0.678648       117706     1.597315    -0.678648
 118516     1.600800    -0.681055       118516     1.600799    -0.681054
 118709     1.605809    -0.682360       118709     1.605808    -0.682360

Table 2.2. Estimates of SM-WRLS and SM-SA for the AR model (2.40), where a1 = 1.6
and a2 = -0.68.
Chapter 3
Convergence Issues
In the previous chapter, we have developed the SM-SA algorithm with a selective updating
scheme and non-increasing volume. In this chapter, the convergence properties of this
algorithm will be discussed. However, for all the estimation problems, the parameters in
model (2.1) can not be determined unless some conditions are imposed on the input signal.
A commonly assumed condition is persistency of excitation (PE)¹, in which the input signal
is sufficiently rich in frequency components. PE has been defined in a variety of ways
[15, 17, 19, 20, 21, 22]. The following definition has been used extensively in adaptive
control and system identification.
Definition 3.1 A signal u(t) is said to be persistently exciting (PE) of order n if for all t
there is an integer N such that
$$ \varrho_1 I < \sum_{k=t+1}^{t+N} \varphi(k)\,\varphi^T(k) < \varrho_2 I \qquad (3.1) $$
where ρ_1, ρ_2 > 0 and the vector φ(t) is given by
$$ \varphi(t) = \left[ u(t-1)\;\cdots\;u(t-n) \right]^T. $$

¹Please read "PE" as "persistency of excitation" or "persistently exciting" as appropriate.
The underlying meanings of the PE condition have been discussed extensively in the
literature [6, 15, 17, 10]. From a frequency-domain point of view, a stationary signal u(t) is
persistently exciting of order n if its frequency spectrum is non-zero at n points (or more).
That is, its frequency components have to be rich enough so as to drive the estimates toward
the true parameters. According to this definition, white noise is persistently exciting of all
orders.
Research in SM-theory, however, is concerned with PE of x,,, the regression vector.
Anderson and Johnson [15], and Bitmead [17] proposed that the PE of the input signal
implies the PE of x,, under some mild restrictions on the system function. The major result
is given in the following theorem:
Theorem 3.1 [15, 17] Consider a linear system with transfer function H (z) = B(z)/A(z)
where A(z) = z” — alzr".1 — ~ - - — apz”'1’ and 8(2) = boz" + 512’"1 + - - . + qu”“1, which
are co-prime, and with input sequence {uk} and bounded output sequence {yk}. Then with
.f/ l“\~/
x; = [y,,_1, . . .,y,,_,,,u,,,u,,_1, . . .,u,,_q], PE of {a,,} implies PE of {x,,}.
Non-coprimeness of A(z) and 8(2) usually renders the parameters non-identifiable. This
discussion, however, is far beyond the scope of this research.
For LSE estimates of the parameters of a finite impulse response model with n param-
eters, the estimate is consistent and the variance of the estimate goes to zero if the input
signal is persistently exciting of order n. In order to deal with the convergence issues in
D-H/OBE, Dasgupta and Huang claimed that if for all t,
t+N
91I < Z xkxf < 921 (3.2)
k=t+l
for some N 6 N and 91, 92 > 0, then the covariance matrix
C(n) = (1 - A;)C(n — 1) + 1:91.11;
39
in D-H/OBE, which has the same form as R(n) in SM-SA, is nonsingular. From Theorem
3.1, if the system function is co-prime, then the statement in (3.2) is validated. However,
examining the proof of their argument, we find that the condition (3.2) is not sufficient
because of the time index mismatch. In (3.2), It varies over all the natural numbers between
(t + 1) to (t + N). In their proof, however, the time index varies only over those instances at
which updates take place. Consequently, we require a more strict definition for PE which
would accommodate non-uniform time indexing.
Given that x,, is PE, the question of whether the subsequence of {x,,} in which the
updates take place is PE or not is still an open issue and is difficult to prove formally.
However, this problem can be simplified if we impose the mixing condition to the noise
sequence. The mixing condition will be introduced in Section 3.4. For the moment, we
assume that the subsequence of {x,,} used in updates is PE. For SM-SA with ARX model,
we give the following theorem:
Theorem 3.2 Given the update equation:
R(n) = (1 — A,,)R(n — 1) + 1,,x,x£, 11(0) = no > 0 (3.3)
the eigenvalues of R(n), for all n, will remain bounded if the signal energy is bounded, i.e.,
there exists an L 6 R such that xxx“ 5 L < 00 for all n, and 0 S A,, 5 1.
Let {t,,} be a subsequence such that c,,, < 0 Vn. If there exists an N 6 N and 91, 92 > 0
such that for all ’12
I] n+N
911 < E x,,xg; < ml, (3.4)
.1 k=n+1 /
then R(n) is nonsingular.
Proof: Use induction. Note that from the above recursion, R(n) Z 0 for all 71. So it
suffices to show that trace(R(n)) < 00.
v ,
,J
~~ “111/
40
l. n = 1
R(1)= (I — A1)R(0) + Alxnx;
tr[R(1)] = (I - A1)tr[Ro] + A1t1‘[X1X¥‘]
= (1 — A1)t1'[R.o] + A1X¥X1
S max[tr(Ro), L].
2. Let
tr[R(n)] S max[tr(Ro), L] < 00.
We have to show that
tr[R(n + 1)] S max[tr(Ro), L] < 00.
Since
R(n + 1) = (1 " A,,+1)R(n) + A,,+1x,,+1x;+l,
tr[R(n + 1)] (1 — Art-H )trlR(n)l + An+lxz+lxn+l
IA
multr(R(n)),x£..x..0
IA
InnaLXItr(R(")), L]
IA
max[max(tr(Ro), L), L]
max[tr(Ro), L].
Therefore, the sum of all eigenvalues is bounded for all 71. So the eigenvalues have to remain
bounded.
The non-singularity of R(n) under the assumptions in (3.4) that the subsequence of
{x,,} in which updates take place is PE has been proven in the paper by Dasgupta and
Huang [20]. EJ
41
Theorem 3.2 ensures that the covariance matrix R(n) in (3.3) is always bounded and the
boundedness property holds for all the noise processes. Although boundedness of R(n) is
sufficient for proofs of some fundamental properties, positive definiteness of R(n) is essential
in order to prove the convergence of parameter sets as discussed in the next few sections.
3.1 Fundamental Convergence Properties of SM-SA
The most important feature of SM-SA, which differs from previous work, is the convergence
of weighting sequence {A;}. In this section, the fundamental convergence properties of
SM-SA are analyzed.
[ N6] NHL-20") 'lC C(n/(761” hit 6] $5.10]; (1.;
Theorem 3.3 Let (Rm, I] ' ll) be a normed space. Given the R(n) in (3.3) and the update
equation (2.7)-(2.11). Let A; be defined as
“bn+vb?;-4¢ncn 1.an < 0
A. 20a ’
, 0, otherwise
If infinitely many data points are selected, then the SM-SA algorithm developed in Theorem
2.1 converges in the sense that
"11.120 A; = O (3.5)
"11.113063, 6 [017nl- (3'6)
Furthermore, if R(n) is positive definite, then
"11.1130 “0,, — 0,,_.1|| = 0. (3.7)
If there exists positive real numbers 91 and 92 and integer N1 such that for all k
f.
\
k+~1
glI < z x,,x; < ml (3.8)
n=lc+l
42
then [31] a! _
t 2 4 n+N1
lim 0 — 0 < _ lim 1' 3.9
n—ooo ll nll2 — 91 "“00 £3.24 7 ( )
The proof of this theorem is very significant because, as a member of UOBE [39], SM-SA
sets the stage for a formal proof of the converging behavior of this family. Before we prove
this theorem, we need the following lemma:
Lemma 3.1 Consider the coefi‘icients of the quadratic equation (2.31) and the conditions
stated in Theorem 3.3. Let c; and c; be defined by
c,,, i c < 0
c; = f " . (3.10)
0, otherwise
0: = cm ifcn > 0
0, otherwise
Then lim,,_.°° c; = 0 if and only if lim,,_,oo A; = 0.
Proof of Lemma 3.1: From Theorem 2.4 and the definition of A; in Theorem 3.3, if
c,, < 0, then
A" = —b,, + Vb; — 4a,,c,,
n 2afl
Let 'd',, and b,, be the same as in (2.38) and (2.39), then from the proof of theorem 2.4, we
have
—2 —
b; — 4a,,c,, = b,, — 4a,,c,,
If c,, < 0, then A; becomes
,_ 2c; - b,, + Vb; — 4(m — 1)G3,c,77 ‘
A = 2a C (3.11)
Ti.
Suppose lim,,_.°° c; = 0. If lim,,_.°o a,, 75 0, then from (3.11) we know that lim,,_.°o A; = 0.
Iflim,,_.°° a,, = 0, then we claim that lim,,_.oo b,, 76 0 because from the definitions of 5,, and
43
b,,, we can write
an+bn=_cn+6n
and for c,, < 0, we have
.1919 + 3,.) = .132. a» + .22. bn
= lim 0,,
n—ooo
= lim (—a; +6“)
71-600
= lim 6,,
fl—‘W
= (m - 1)Gn7fl>
That is, lim,,_.oo b,, > 0. Thus from (2.31),
lim 1;; = — lim 91 = 0
n—ooo n-ooo bn
Suppose lim,,_.oo A; = 0, it follows from (3.11) that when c,, < 0, then in the limit
2c; — b,, = —\/b; — 4(m — 1)G3,c;7n
Thus /
“11.00 'm;[4c — 4a,", b,, + b 3,] = "lim 0°[b; - 4(m— 1)G;c;'7,]
and
m[4c;" — 4c;b,, ] = “lim —4(m — 1)G,2,c,, 7]
fl-Ougfi
Suppose lim,,_.°° c; at 0. Then
lim c,, =-nle [(m—1)G,, 7+b,,.]
fl-‘m
44
The numerator of equation (3.11) becomes
2c; — b,, + «lb—,2, — 4(m — 1)G3,7flb,, + 4(m — 1)2G;7: = 2c; - b,, + \/(b,, - 2(m — 1)G,2,7,,)2
(3.12)
Comparing (3.12) and (3.11), we have in the limit
,flb‘, — 2(m — 1)G§,7)2 = \/53, — 4(m — 00.2.9.7, (3.13)
Note that in left hand side of (3.13), \/(b,, -— 2(m — 1)G3,7)2 < [b,,]; while -in right hand side
of (3.13), \/b,2, — 4(m — 1)G,’,c,,7n > |b,,| since c,, < 0. This is a contradiction. So the only
possibility when lim,,_..°,, A; = 0 is lim,,__.oo c,, = 0. 0
Proof of theorem 3.3: Suppose A; does not converge to zero. Then there exists a
strictly increasing sequence (m,,);°=l such that A;,“ 2 a for some a > 0. From Lemma 3.1,
we have lim sup,,_,°° c,, < 0. From (2.34),
6V(A,n,,) _ __ c,,,,,
BAN“; Amn=0 — F(O) - nmn" .
For the “volume measure” of P(m,,), we have
det B(m,,) = det x,,,nP(m,,) = Km" det P(m,,)
Since R(m,,) is always bounded, det P(m,,) = det R‘1(m,,) > 0. Also note that
{det B(n)};g, is non-increasing, so 16,, is bounded. Since 1imsup,,_,°o c,, 314 0, there ex-
/.
\
ists infinitely many n such that
0V(A,,,,,) _ c,,,"
—- < —b
0A,"; Amnzo ”mu—1
where b > 0. So there are infinitely many n such that V(A;,n) 5 c for some 0 < c < 1. Let
M,,,,, be the volume at time m,,, then Mm; = Vonle, ' - -V,,,,,. Thus lim,,_,°o M,,,,, = 0.
1
(.1'
45
Note that {Mn} is non-increasing. Thus we have 1im,,_.oo M,, = 0. By definition,
M,, = det n,,P(n) = K? det P(n).
wire] i/ I].
Since det P(n) is positive definite by Theorem 3.2, lim,,_.oo It? = 0. This leads to the
conclusion that lim,,_.oo x,, = 0. Let 0,, = 0,, — 0", then
5n = yn - 07T—1xn
= 031+ v,, — 0;_1x,, (3.14)
~T
Since the volume of the ellipsoid converges to zero, the estimates converge to the true
. - . ~T
parameters. Thus 11m,,_.°° ||0,,-1|| = 0 and hence 11m,,_.°° 0,,_1x,, = 0. Thus, from (3.14)
we have lim,,_.°o |e,, -— v,,] = 0. Since lim,,_.°° x,, = 0 and since c,, = 7717,, - me; - x,,_1G,,,
we have
.2261; = .liem<1-€i)-~n9n
= lim r”(x-£3.)
n-voo
= .22.":(1- vi) 2 0
By the definition of c; in Lemma 3.1, lim,,_.oo c; = 0 and hence lim,,_..,,o A; = 0, contrary
to the assumption. Thus we conclude that the sequence {A;} must converge to zero.
It follows from (2.10) that lim,,_,°° ||0,, - 0,,_1|| = 0 when lim,,_.°o A; = 0. Thus (3.7)
is satisfied.
Consider the coefficient c,, in (2.31). At any time n, we can write
6,, = c; + 6,",
where c; and c; were defined in Lemma 3.1. Since 1im,,_.°o c; = 0, we have
liminf c,, = liminf(c;' + c,‘,’) 2 0.
n-‘m n-‘w
46
Let s > 0. Then there exists an N E N such that for all n > N,
m(7fl— 6;) — n,,_1G,, > —me.
Let n > N. Then
7,1- 5; > -E + 3137:7031 > —e. t (3.15)
Let s be arbitrarily small, then we have
7 - 63. Z 0-
Thus a; 6 [0,7] and, hence, (3.6) is satisfied.
-, Due to the slow convergence of the ellipsoid for large 71, one can argue that for sufficiently
“K Km,
—__,,
large n, there exists a positive integer M such that M > N1 and
[ya-H: — OZXnH] < \/7n+k, l S k S M
In other words, no update takes place on {n + 1,. . .,n + M}. Substituting for y,,+), by
07x,,+), + v,,.H, and noting that |v,,+),| < ,/7,,+1,, we can write
[(0'Y— 0,,)Tx,,+),] = [(0‘— 0,,)Tx,,+), + v,,+;, — v,,+),] (3.16)
>4“ |(o"- 00’s.). +v.+1|+|v.+1| (3.17)
3 2w... 7 (3.18)
for l _<_ k 5 N1. Therefore, squaring both side of (3.18) and, then, summing over 1:,
1 S k 3 N1, it is found that
n+N1 J fl+N1
(0]- 0,,)T [ z: x,-X,T (J-On)<4 Z 7,-
i=n+1 i=n+1
47
Hence, using (3.8), it follows that
1 4 "+N’-
“9- 0..“2 < — 2‘ 7.-
91 i=n+1
Theorem 3.3 assumes that infinitely many data points are selected. The algorithm,
however, may stop at some finite time N, that is, optimal weights no longer exist after N,
when noise excitations are “too colored” or noise bounds are over-estimated. In this case
the convergence properties (3.5) and (3.7) still hold since the volume of the ellipsoid and
the estimates will remain constant, and the optimal weights A; become zero after time N.
For c,,, recall that
c,, = m(7,, — 5;) — x,,_1G,,.
Since cn20forn> N,
m(7,, — 53,) 2 Kn_1Gn > 0 (3.19)
Equation (3.19) ensures that (3.6) still holds even if the algorithm stops at finite time.
3.2 Convergence of Parameter Sets and Suboptimal Test
In this section, we will first investigate the condition under which the membership sets
converge to a point. Then, a “suboptimal test” criterion is introduced. We will find that
employing the suboptimal test not only improves computational complexity, but also results
in excellent convergence properties.
3.2.1 Convergence of Parameter Sets
Since the volume of the ellipsoid in SM-SA is non-increasing, it converges for all the sta-
tionary signals and linear time-invariant models. The ideal performance for SM-SA is the
convergence of membership sets to a point asymptotically. In this section, we will study
the asymptotical properties of parameter sets.
48
Theorem 3.4 Let {t,,} be a subsequence such that 7,; - a?" < 0 Vn. If there exists an
N E N and 91, 92 > 0 such that (3.4) holds, then the volume of the ellipsoid converges to
zero if {t,,} is infinite.
Proof: In Theorem 3.3 we have shown that lim,,_.°° A; = 0. Let c; be defined as in
(3.10). Then from Lemma 3.1, lim,,_.°o c; = 0. That is,
.-
A
('0, £570 ,3 .41 ,1" _;_. .. H I’ll” c", ( H ’ 1 7
"lingo [m(7,, - 6;) — x,,_1G,,] = 0. . i (3.20)
I/ 1 "l ,I - l \
Note that the ellipsoid at time n is defined by (f); L5 firm/r ’j 1/ (I
7/
P(n) = {0| (9 — 9,.)7’ R(n) (9 — 0,,) < x,,}.
Thus it is obvious that x,, is positive if P(n) is non-empty.
We claim that there exists a a > 0 such that
x;B(n — 1)x,, V [1
0' < llxnllz det B(n _ 1). (3.21)
To show (3.21), consider
x;B(n - 1)x,, _ x£n,,_1P(n — 1)x,, (3 22)
“x,,”2 det B(n — 1) — ||x,,||2n$_l det P(n - 1) '
T _
x,,P(n 1)x,, (3.23)
1c""'1||x,,||2 det P(n - 1)
n-l
Since the PE condition defined by (3.4) holds, P(n) is positive definite. Therefore
0 < A,,,,-,,||x,,||2 g x£P(n — 1)x,,,
49
where A,,,,-,, is the minimum eigenvalue of P(n — 1). Now
A,,,,-,,||x,,||2 x;P(n — 1)x,, __ x;B(n - 1)x,,
Isl-11 ||x,,||2 det P(n - 1) - 11:31 ||x,,||2 det P(n — 1) le,,||2 det B(n — 1)'
That is,
A,,,,-,, xZ'B(n - 1)x,,
52:11 det P(n - 1) - ”x,,“2 det B(n — 1).
rn-l
By the positive definite property of P(n), we have det P(n — l) < 00. Also x,,_1 < 00.
Thus we conclude that there exists a o > 0 such that
A,,,,-,, x;B(n — 1)x,,
162:1] det P(n — 1) - ”x,,“2 det B(n — 1)’
a<
and, hence, (3.21) is satisfied.
In the remaining part of this proof, we consider the subsequence {t,,} only. In order to
simplify the notation, we replace the time index t,, by n for the rest of this proof. Since x,,
> _.
satisfies (3.4), there exists a 6 such that there are infinitely many 11 for which ||x,,||2 > 6.
Let c > 0. According to (3.20), there exists an N, such that if n > N, then
[m(7,, — 5,2,) — x,,Gn] < 660. (3.24)
Let n > N1. Since 7,, - e; < 0, we have from (3.24) that
m ]7,, - 5;] + x,,_1G,, < 660.
Thus
0 < n,,_1G,, = n,,-1x;P(n — 1)x,, = x;B(n — 1)x,, < 660'.
By (3.21),
0 < 0‘||x,,||2 det B(n — 1) < x;B(n — 1)x,, < 660.
Thus
6
det B(n — 1 < ——e.
) "an?
50
Note that there exists an N > N1 such that
“IQ/[[2 > 6.
Therefore
6
detB N—l <——e N. (3.25)
By definition, the volume of the ellipsoid converges to zero. Cl
4
Recall that 6,, ; —0,,_1x,, + v,,. Therefore 7,, — 5,2, < 0 if and only if
lvn " éZ—lxnlz > 7n (3.26)
If the volume of the ellipsoid keeps shrinking, the value 034x" will decrease to a small
number as n becomes large.\When 011x" is small, from (3.26) we see that v,, must be
sufficiently close to the 6:31:41: thaftj7n — 5,2, < 0 for infinitely many 71. This is very
important for the convergenceitdf the parameter set. In Section 3.3.2 we will show that if
either the noise bound is over-estimated or the realization of the noise process does not visit
an arbitrary neighborhood of the bound infinitely often, then the ellipsoid can not converge
to a point asymptotically.
3.2.2 A Suboptimal Test for Innovation in the SM-SA Algorithm
Theorem 3.4 leads to the pursuit of a suboptimal test. Recall that one important feature for
members of UOBE is “selective updating.” That is, no updating is taken when incoming
data do not contain useful information. Sub optimal test is a small modification of the
updating criterion associated with SM-SA. Instead of checking c,, = m(7n - 5;) — n,,_1G,,,
the suboptimal test checks only 8,, g 7,, - 5; since the negativeness of 6,, implies the
51
- r
.
negativeness of c,,. Note that the unique root A; on (0, 1) still exists for suboptimal test.
Suboptimal testing has been studied in previous work [28]. The computational benefits
of employing a suboptimal test can be substantial since the updating criterion is reduced
to 0(m) computations where m is the model order. The weights obtained from suboptimal
test, however, may not be optimal in the sense of volume ratio minimization. In order to
obtain similar convergence properties as we do in SM-SA, define
c,, ifE,,<0
n
:1
||
0 otherwise
Lemma 3.2 Given the coefl‘icients of the quadratic equation (2.31) and the conditions
stated in Theorem 3.3, we have lim,,_.°° 5,, = 0 if and only if lim,,...°o A" - 0, where A;
n—
is obtained by solving the quadratic equation
a,,A; + b,,A,, + c,, = 0
when 8,, < 0 and is set to be zero when 5,, Z 0.
We do not intend to give formal proof for this lemma because 2,, < 0 implies t t c,, < 0
and hence the proof follows directly from that of Lemma 3.1. Only note that c,," proof of
lemma 3.1 must change to é,,. The importance of the suboptimal test in theor cal analysis
can be seen in the next theorem.
Theorem 3.5 Let (33m, ll ’ ll) be a normed space. Given the R(n) in (3.3) and the update
equations (2.7)-(2.11), if R(n) is positive definite and the suboptimal test is employed with
infinitely many data points selected, then the SM-SA algorithm developed in Theorem 2.1
converges in the sense that
"lingo 1:, = 0 (3.27)
lim det B = 0 (3.28)
n-ooo
52
Proof: If we change c,, in the proof of Theorem 3.3 to En, then by Lemma 3.2, the
proof of (3.27) follows directly from the proof of Theorem 3.3. Since infinitely many data
points are taken using the suboptimal test, there exists infinitely many points such that
7,, -E; < 0. By Theorem 3.4, the volume of the ellipsoid converges to zero and hence (3.28)
is satisfied. 0
Again, if the algorithm stops at finite time N, then lim,,_.oo A; = 0 for n > N. Thus
(3.27) still holds. In this case, however, the volume of the ellipsoid cannot converge to zero.
Simulation results will be presented in the next chapter in support of the above analysis.
3.3 Convergence of the Parameter Set with White Noise
Case or Errors Disturbances
White noise has been studied frequently in the analysis of many estimation algorithms as
well as model structures. A sequence {v,,} is defined to be white noise if
131),, = 0 (3.29)
and
0 ° 0
Ev,v,-_,- = J 71 (3.30)
a2 j = 0
where 02 is referred to as variance of the process. If (3.30) holds, {v,,} is said to be
uncorrelated. It is common to assume that {v,,} is second moment ergodic so that the
statistical properties in (3.29) and (3.30) can be realized by time averages, that is,
O 1 n
“1320 if ‘E— v,- — 0 (3.31)
and
1 " 0 j 0
lim — E v,-v,-_,- = # (3.32)
53
For LSE algorithms, white properties have been shown to be sufficient for the convergence of
the estimates for the structure in (2.1) [13]. Since the estimates in SM-SA belong to a class
of LSE estimates, it is of interest to know whether the white property of noise excitation
also applies to SM-SA.
In this section, we will first consider a special class of white noise which is called i.i.d.
white noise in which the random variables v,- and v,- are not only uncorrelated but also
independent, and identically distributed for all i, j. Following the analysis of the i. i.d. case,
the general white noise processes defined in (3.29)-(3.30) or (3.31)-(3.32) are considered.
3.3.1 IID White Noise Case
Theorem 3.6 Suppose the noise process {v,,} is an i.i.d. white noise in which {v,,} is zero
mean and uniformly distributed on [—,/7,,/7] for all n. If SM-SA is employed with noise
bound given by
v; < 7 for all n,
then the volume of the ellipsoid converges to zero.
Proof: LetFé{n€N:c,,<0}={n€N:c;#0}andletnEF. Since
limn m c- = 0 >_ I, ;.
f7 T n ’ / .
. , ' 2 ._
‘r f— .‘f ILG' 7‘» .w . ' [ nanolo m(7 - En) - ”71-1071 — 0'
'r My .117. ,. 1 (4
Note that '1 '
x,,_1G,, = n,,_1x;P(n — 1)x,, = x;B(n — 1)x,,.
Thus
0 — — . 2 T —
11111 c,, — “lingo [m(7 — c,,) — x,,B(n - 1)x,,] — 0. (3.33)
n—voo
Equation (3.33) implies that in the limit
_ 3:58.495.
m .
m
:N
54
We may assume that ”x,,“ 76 0 since the probability of ”x,,“ = 0 is zero. Suppose the
volume of the ellipsoid does not converge to zero, that is, lim,,_.0° det B(n — 1) ¢ 0. Since
B(n — 1) is positive definite, x;B(n — 1)x,, > 0. Thus
"1:11:07—5; >0forn6F.
For other 71 g F, the inequality 7 — 5,2, > 0 still holds since m(7 - 5;) > n,,-1G,,. Thus we
.' ] _.
. 1‘\
have
”lingo 7 — 5,2, > 0 for all n, (3.34)
and there exists an 5 > 0 and a subsequence {m,,};,"’=1 such that
7 — 53,, > 5 Vn. (3.35)
. ~T
Since 5,, = —0,,_1x,, + v,,, we have
7 - 63.... (J7 + lem..|)(\/“7 - lsmnl)
= («7+ Ism.l)(./~7- Ivm. - 03..-.xm1)
>6.
Thus
~T 6
—v,,—0 _x,, >——
6
2,/-7 , , )4
I
=6.
”R
Therefore (3.35) holds if and )0nly if
T
|v,,,n — 0,,,n_1x,,,,,| < ,/7 — 5' Vn. (3.36)
55
Expanding (3.36), we have
— (,fi _ a) < v,,," _ 0;_,..,,, < fi— 5,
and thus
_fi-Jf- (5' + 0:,n_1x,,,,,) < v,,," <‘\/7 — (5' '— 0:n_lx,,,,,)
1. Suppose 0:n_1xm; > 0. If —\f7_ < v,,," < —\/7+ 5’ (see Fig. 3.1(a)), then \/7- 5’ <
|v,,,,,| < ,/7 and thus
6T
vmn — mn-lxmfl
“T
= lvmnl + [emu-Ixmn
“T
> J7—5’,
violating (3.36). Consequently, v,,," cannot be in the shaded area in Fig. 3.1 (a).
Since v,, is uniformly distributed, the probability that (3.36) holds is less than 11%;:1.
2. Suppose 0:n_1x,,,,, < 0. If \/7— 5’ < v,,," < ‘/7, then
“T ”T
lvmn _ emu-Ixmn = vmn + lama-1x731:
“T
> J? — 5’ + 0,,,n_lx,,,,,
> \f7- 6’1
violating (3.36). The graph interpretation is in Fig. 3.1 (b). Since v,, is uniformly
distributed, the probability that (3.36) holds is less than 2%:
Let the probability of ~0:_1x,, > 0 be q,,. Then, the probability that (3.36) is satisfied is
less than
2 —e’ 2 —€’ 2 —e’
L+(1_qm)_fl_-_\/7_<1
gm" 2J7 2J7 — 2\/7 (3.37)
56
flvn)
*m l 1 v»
—,/7: 5’ o f;
(a)
flvn)
1 l v
.5; e 0 "
Figure 3.1. (a) If gin-1x171» > 0 and v,,," in shaded area, then a contradiction results. (b)
If dandxm; < 0 and v,,,,, in shaded area, then a contradiction results.
Let A,, be the event that (3.36) holds at time m,,. Then from (3.37),
PM.) _<_ 1% < 1.
Now, given (3.36) holds at time n, we claim that
. 2 _ a
P(A..+1|A..) s i— < 1
277 , (3.38)
where A,,.” is the event that (3.36) holds at m,,+1. To show (3.38), consider
P (An+1lAn)
= p (A,,, n {é;,,,-,xm,,, > 0} | A,,) + P (A,,, n {é;,,,_,xm,,, < 0} | A,,)
= P (A,,, | {éfimdxmfi > 0} n A,,) P ({é;,,,_,xm,,, > 0} | A,,)
+P (A,,+1 | {9;n,,_,xm,,, < 0} n A,,) P ({0:n+1_1x,,,,,+, < 0} | A,,)
P (-,/»7 + 5’ < v,,,,, < fil {é;,,,-,xm,,, > 0} n A,,) P ({0;,,,_,xm,,, > 0} | A,,)
~T ~T
+1) (-\/‘7 < vmrH-l < W- 6’ l {omn+1-Ixmn+1 < 0} n A“) P ({9m3+1-Ixmn+1 < 0} l A") '
IA
57
Since {0,,} is an i.i.d. white noise process,
/’T
I "T f 2 7 — 5’
P (—‘/7+ 6, < ”mu.“ < WI {9mn+1—1xm,,+1 > 0} m A") : __\/;__
2,/7
and
~T - 2 7 - 5’
P (—\/T< ”mu-+1 < fi- 6’ l {0mn+1-lxmn+1 < 0} n An) = —\g:—/—:r—_-
Thus
2 7 - 6' -T 2 7 - 5’ ~T
P(An+l|An) S f—fl.—' P ({omn+1—lxmn+l} > 0 l An)+-\g-—\/7_ P ({omn+1—lxfl‘ln+1 < 0} l An) °
T
mn-l-l
Let q,,., = P ({9 _,x,,,,,,, > 0} | A,,). Then
2 —€’ 2 -€' 2 -€’
P(An+l l A,,) = an—‘é-if—‘Y- + (1 - 9n+1)—\/77‘/—,7— : Jé—i‘fi—
Thus (3.38) is proved and
_ g 2
P(A,,+1 n A,,) = P(A,,+1 | A,,)P(A,,) g (%.7—)
A similar method can also be used to show that
2 7 — 5’
P(An+2lAn+1 n An) S _‘éZfi—
and
2J7 — 5’ 3
P(A,,+2 n A,,“ n A,,) = P(A,,+2|A,,+, n A,,)P(A,,+1 n A,,) s (W) .
By induction, we have
.. g "
n—ooo n—soo 2J7
Thus we conclude that the ellipsoid converges to zero. 0
58
3.3.2 General White Noise Case
Set-membership-based algorithms are derived from noise bounds. In Theorem 3.4 we showed
that if there exists infinitely many data points such that 7fl -_£,2, < 0, then the volume of the
ellipsoid converges to zero. Since 6,, = v,, - 91:13:", the geometrical meaning of 7,. - 5,2, < 0
is that v,, must be very close to the boundaries if more data points are to be used. Before
we study the general white noise case, consider the cases in which the noise bounds are
over-estimated or when the noise processes are highly correlated such that
12,2, < 7,. — 6 Va 2 N (3.39)
for some N and 6.
Claim 3.1 Suppose there exists N and 6 such that (3.39) holds. Then the ellipsoid does
not shrink to a point.
Proof: We prove this claim by contradiction. Consider the N and 6 in (3.39). Suppose
the ellipsoid converges to a. point. Let n 2 N. Then
(\/7—n- Ivnl)(¢7;+ Ivnl) > 6.
Therefore
f
‘/7 — v > ——
n lnl V7n+lvnl
6
«7.. + WT.
6
2,/7—,,'
Since x,, is bounded, by the stability assumption on (1.5), and since the volume of the
ellipsoid is assumed to converge to zero, there exists an N1 2 N such that
6
7 <3V7n,
lea-1‘43 <——
where 7 = maxneN{7n}. Let n 2 N1, then
7n—En
59
(m4 |€n|)(\/'7?I- lenl)
(«m lenlxw—n- Iv. - fax.»
mm lens/7;— lvnl 461ml)
>
> (W? lenl)(2‘fi_ - Iéf..xnl)
> (m+le.l)(2jfi-3jfi)
= (muesxsjfi)
> 5
Now the convergence of the ellipsoid to a point implies that x,, converges to zero. So
there exists an N; 2 N1 such that
6
Kn_1Gn < - Vn 2 N2.
Let n > N2. Then it > N and
Cn
E
6
O.
6
m(7,, " 512:) - "n-l Ga
3
6
This result is contrary to the assumption since the algorithm stops taking points after time
N2. Thus we conclude that the volume of the ellipsoid can not converge to zero.
Cl
Claim 3.1 points out the impact of the noise bounds on the volume of the ellipsoids
and thus leads us to study the relationship between the white noise process and the noise
bounds. Let {0,,} be a white noise process defined in (3.29)-(3.30). If {v,,} is i.i.d., then
from theorem 3.6 the volume of the ellipsoid converges to zero. The key point in the proof
is that the magnitude of the noise can be arbitrarily close to the boundaries. If we eliminate
60
the condition of independence, then the noise may not be arbitrarily close to the boundaries.
Consider the following example.
Let {v,,} be a noise process taking only discrete values at -\/7, —\/273, 0, m, and
fl. Let A be an event defined in an appropriate probability space, and vo be the initial
condition such that P(vo 6 A) = 2/5.
If vo E A, then {v,,} is defined as
\/2 73, w" 2 0
—\/27/3, 10,, < 0
Otherwise
fir 1
Thus, (v,,,n _<:1) is uniformly distributed over {—‘/7', —\/2773,0, V2773, fl}, and hence
we can write 03, g 7 for all n. The conditional expectations are given by
E(v,,|vo E A) = ‘/27/3P(w,. 2 0) - ‘/27/3P(w,. < O) = O
E(v,,|vo ¢ A): fiP(1 < 10,. S 3) — fiP(—3 3 mn S —1)= 0.
Therefore,
B(vn) = B(vnlvo E A) P(vo E A) + E(v,,|vo ¢ A) P(vo ¢ A) = O.
61
Also, we find that
a}, if s = t
E(v,vt I 120 6 A) = ,
0, otherwise
and
131, if s = t
E(v,vt | v0 5? A) = .
0, otherwise
Consequently,
231, if s =t
E(U,Ut) =
0, otherwise
As a result, {v,,} is a white noise with mean zero and variance 1}. Furthermore, if second
moment conditional ergodicity is assumed, as defined in (3.31)-(3.32), given v0, then 12,, is
also second moment ergodic. If RLS algorithm is used to identify the model parameters
in (2.1) with this noise model, then the estimates converge to the true parameters since
on is uncorrelated. This may not happen to SM-SA if the initial condition v0 6 A, simply
because 12,, can not be arbitrarily close to the boundaries. By Claim 3.1, the ellipsoid cannot
shrink to a point.
In this presented example 12,. can not be close to the boundaries since the dependency
between on and 00 can not be eliminated, even for large n. In addition to this noise process,
there are other white noise processes in which {v,,} rarely hits the boundaries. In this case,
the performance of SM-SA may not be favorable. ‘ .
For the following example, we will assume {ton} to be a sequence of pseudgrandom
numbers with uniform distribution on [-0.5,0.5]. First, let {v,,} be a white noise process
generated by passing {10"} through a first-order all-pass filter
0.5 + 2’1
HM = 1 + 0.52-1'
Figure 3.2 and 3.3 show the sample correlation coefficients of {mu} and {v,,}. ,3
Both sequences have identical sample mean and second moment properties. \‘Consider
the systems
yn = Glyn—1 + a2yn-2 + aayn—s + vn
12"
62
-02 - -
-o.4 - «
-O.6 .
-o.e - -
'10 5 1o 15 20 25 30 35 40
Figure 3.2. Sample correlation coefficients of {mu} in the example on page 63 up to time
lag 40.
-02 - l
-o.4 - «
oer ~
-o.a - -
'10 5 1o 15 20 25 so 35 40
Figure 3.3. Sample correlation coefficients of {v,,} in the example on page 63 up to time
lag 40.
63
and
ya = Glyn—1 + awn-2 + “syn-3 + wn
where 01 = 2.1, a; = —1.48, and 03 = 0.34. Comparing the estimates obtained by RLS and
SM-SA in Fig. 3.4 (a) and (b), we find that the behavior of SM-SA for these two noise
processes is completely different while the estimates obtained by RLS are almost the same
because both noise processes {v,,} and {mu} are white.
The difference in SM-SA performance can be explained by analyzing the visitation
frequency of the noise bounds. Fig. 3.5 compares the relative frequency of {mu} and {v,,}
at each amplitude interval. In these figures, the horizontal axis represents the amplitude
and the vertical axis represents the number of times the noise amplitude falls in a specific
interval. We see that the distribution of {mu} at each interval is close to be uniform on
[—0.5,0.5], while the distribution of {v,,} appears Gaussian, although it is not, within the
interval [—O.9,0.9]. Since {v,,} rarely stays in the neighborhood of the noise bounds, the
ellipsoid cannot shrink enough to provide good estimates as shown in Fig. 3.6.
However, if we pass {v,,} through a soft-limiter such that v,, is set to be 0.5 or —0.5 if
v,, > 0.5 and v,, < —0.5, respectively, and otherwise left unchanged, then the frequency of
visitation take a neighborhood of the bound, namely 7 = 0.55, increases as depicted in Fig.
3.7. This new {v,,} still seems to have white properties as $116411 in Fig. 3.8. However, from
Fig. 3.9, we find that the performance of SM-SA in this case is much better than before.
If we multiply the original {v,,} by two and pass it through the same soft-limiter, then
the rate of visiting the neighborhood of the noise bounds by {v,,} is much higher as shown
in Fig. 3.10. Figure 3.11 shows that in this case the performance of SM-SA is extremely
good.
If we examine the proof of Theorem 3.6, we find that the convergence of the ellipsoid
does not depend on the first and the second moment properties of the noise input. That
is, even if the conditions defined in (3.29) - (3.32) are not satisfied, the estimates may
still converge to the true parameters if the “close-to-boundaries” condition is satisfied. For
example, let {v,,} be a noise process obtained by passing {wn} (i.i.d. noise, see page 61)
64
O . ..
-O.2 - -
True ----
-O.4 - ~
200 400 600 800 1000
(13)
§
Figure 3.4. Estimates of the parameter (13 when the system is driven by (a) an i.i.d. white
noise (b) a white noise but not i.i.d..
65
$883
,8
N
O
.5
O
Nunber of time the maglitlfle falls in the given intenlal
O
‘1
O
8
2
.j
i
'
0.
.L
i.
i
don-CooouozoIo-o-oiooun oooooooooooo
I
a
I
I
I
o
o
I
I
o
o
i
o
o
I
I
I
I
C
o
l
‘5
Q
I
a
o
I
i
2
u
u
'5
d
O
l
I
l
I
Mmberoltimemsmagtitude fallshthegiven interval
3.
31
l
l
\ j
i 1
l
j
J
O
Figure 3.5. Distribution of {ton} and {v,,} at each amplitude interval.
66
4
white noise .
4
i.i.d. noise ‘
‘5 .
10 A A A l A A l A A A A
10° 101 1o2 103
n
Figure 3.6. The logarithmic plot of volume versus the actual time n. We see that the volume
of ellipsoid is much smaller when the noise is i.i.d.
\l
O
E
£60- ........ . ................... ................... ................... . ................... . ........
§ 3
5 l : E F
350. ........ , ................... 3 ................... i: .............................................. _
3 _
gwl ........ ................... ......
gag. ...................... _.l_l_r1 F. .i. T.
s t— - '
20,. .. 5 .. .
'6 E
g 10" :t *' 1 ‘
G -O:2 (I)
Amplitude
Figure 3.7. Distribution of a white noise at each amplitude interval after passing {0,,}
through a soft-limiter.
67
Sample correlation coefficients in to time difference a: 40
1 I U I T I V I
-0.2 r .
-0.4 - .
-O.6I- .
-0.8 - -
'10 5 1b 115 2b 25 ab 35 40
time difference
-02 - -
True ---
-O.4 ' 1
200 400 600 800 1000
n
Figure 3.9. Estimates of the parameter a3 when on is passed through a soft-limiter.
68
g I jg I Y I r—
.5200... ........ g ................... g ................... 4 .............................................. q
S.
3
£150., ................................................................................................ 1
so
3
§3100' ............................................................................................... .1
E
3
E 50. ............................................................................................... _
E
c -014 -o:2 o ” 0:2 0.4
Amplitude
Figure 3.10. Distribution of a white noise at each amplitude interval after multiplying v,,
by two and then passing it through a soft-limiter.
0- .
.02. .
True ---
-O.4~ -
200 400 600 800 1000
n
Figure 3.11. Estimates of the parameter (1;, when on is multiplied by two and then passed
through a soft-limiter.
69
through an all-pole filter described by
1
H(Z) : 1 + 0.5.?"1
The output is bounded on [—1, 1]. From its sample correlation coefficients in Fig. 3.12, we
know that {v,,} is colored noise. Figure 3.13 shows that the performances of SM-SA and
1 . . . .
0.8 - -
0.6 .
0.4 - 4
0.2 — .
0 .
.02 r- _.
-0.4 - -
-O.6l- .
-0.s - -
'10 5 1b 15 20 25 ab 35 40
time difference ‘
Figure 3.12. Sample correlation coefficients of {0,,} up to time lag 40.
RLS are severely affected by the autocovariance of {0,,}.
Again, the distribution of {v,,} is examined in Fig. 3.14 where we find that {v,,} rarely
appear in the neighborhood of the noise bounds. If we multiply {v,,} by two and pass it
through a soft-limiter, as we described before, then {0,,} stays in the neighborhood of the
noise bounds, $0.5, most of the time. Note that in this case the new {0,,} is still a color noise
as shown by its sample correlation coefficients in Fig. 3.15. However, the performance of
SM-SA is excellent as shown in Fig. 3.16, while RLS is still affected by the autocovariance.
70
0.3 -
0.6 -
0'4 _____________________ _Irye. .. _ _ _ _ _ _I
02 <
0| .4
-02 RLS -
-0.4 SM-SA .
200 400 600 300 1000
Figure 3.13. Estimates of 03 by SM-SA and RLS when the noise is color.
falls in the given interval
Number of time
Figure 3.14. Distribution of a color noise at each amplitude interval.
71
Sample correlation coefficients Lp to time difference - 4O
0.8 .
0.6 ~
0.4 ~
02 - -
o .
-O.2 - 1
-0.4 - «
-O.6I- -
03 ~ -
'10 1 1b 15 20 25 so as 40
time difference
Figure 3.15. Sample correlation coefficients of a noise sequence obtained by passing {0,,}
through a soft-limiter.
as Tare-— ' J
0.6 .
0.4 -
“I 'l ‘ ' l — i - ' ' - i
; SM-SA
02 ' 4
° RLS ‘
-02 -
-0.4» .
200 400 600 800 1000
n
Figure 3.16. Estimates of the parameter 03 when the color noise {v,,} is passed through a
soft-limiter. '
72
The above examples show that the whiteness of the noise processes is not important in
the performance of SM-SA, although it is important to the conventional RLS. The most
important factor for the convergence of SM-SA is the probability for which the noise visits
a neighborhood of the bounds. Thus, while SM-SA employs similar recursions to RLS, the
two algorithms have totally different convergence properties. The convergence properties
in RLS arise from the second moment properties; while the ones in SM-SA come from the
noise bounding assumption.
3.4 Independence and The Mixing Conditions
To generalize the convergence of SM-SA to colored noise and give a more rigorous treatment,
we first introduce the measure-preserving transformation T. Let (5,?) be a measurable
space consisting of a set 5 and o-algebra ? of subsets of S. Let P be a probability measure
defined on (5,?) so that (5, ? , P) is a probability space. In our case, if X is a real set
bounded by «7,
X = l‘fiH/fl’
then S = X°°, the collection of all real sequences (vo,v1, . . .,). Let T : (5,?) —> (5,?) be
a measurable transformation on (5,?, P), then T is said to be measure-preserving if and
only if P(T"1 A) = P(A) for all A 6 ? [10]. In order to simplify our discussion, we can take
the measure-preserving transformation T as a one-sided shifter,
T(‘vo, ‘01, . . .) = (1)1, 02, . . ..)
Note that P(T"A) = P(A) implies that P(T‘kA) = P(A) for k = 1,2,.... Thus
P{(uo,v1,...,u,,_1) 6 Ba} = P{(vk,vk+1,...,v,,+k_1) e Bu} for all n and k and n-
dimensional Borel sets Bu.
73
Definition 3.2 Let T be measure-preserving on (5, ? , P), T is said to be independent if
for all A,B E ?,
P(A n T‘"B) = P(A)P(B) Vn e N.
T is said to be mixing iffor all A,B E ?,
lim P(A n T‘"B) = P(A)P(B). (3.40)
7%..”
The mixing condition is a very reasonable assumption since it reflects the physical
characteristics of many common random processes. In this research, let {0,,} be the noise
input of the model in (1.5), then {v,,} satisfies the mixing condition if the dependency
between v,- and 0,- decreases to zero when |i — j| —5 00.
For most system identification problems, we assume that the PE condition on {x,,}
holds over all natural numbers. Since the SM-SA has a “selective updating” feature, we
should confine the consideration of the PE condition to the subset of {x,,} in which updates
take place. Since we do not know in advance which points will be used in the updates,
this requires that all the infinite subsets of {x,,} are PE. However, this problem can be
alleviated if the mixing condition is imposed on {0,,}. Consider the following corollary
given by Bitmead [17]:
Corollary 3.1 If x,, is such that E(x,,x,7,') is positive definite and either x,, is ergodic or
mixing, then there exists a constant 6 > 0 and a finite integer N such that
n+N
Z x,x,~T 2 51 > o (3.41)
i=n
holds infinitely almost surely.
74
In the present work, the upper bound of (3.41) is guaranteed by the stability assumption.
Note that all the subsequences of {x,,} are also mixing if {x,,} is mixing. Thus if B(xan)
is bounded below at the points which updates take place, then (3.4) is satisfied.
The next two corollaries serve as an extension of Theorem 3.6. First we consider the
independent case in which the mean of the noise process does not have to be zero and the
distribution does not have to be uniform.
Corollary 3.2 Let {v,,} be a noise sequence and let T be independent on (5, ?, P) according
to Definition 3.2. Given 6, let D¢+ = [fi— c,,/7] and D,. = [—\/7', —\/7'+ c]. If SM-SA
is employed and there exists a 6 such that for all n
P(vfl E De...) > 6, and P(vn E DC.) > 6,
then the volume of the ellipsoid converges to zero.
Proof: Consider the proof of Theorem 3.6. If the volume of the ellipsoid does not
converge to zero, then (3.36) must hold. Given 6’, by assumption there exists a 6’ such that
for all n
P(vn 6 D¢I+) > 6’, and P(vn 6 Dy-) > 6’.
1. Suppose bin-13%.. > 0. Then the probability that (3.36) holds is less than 1 - 6’.
2. Suppose 9:..-1xm. < 0. Then the probability that (3.36) holds is also less than 1 —6’.
If we change the quantity 235$ in the proof of Theorem 3.6 by 1 - 6’, then we have
0 e I n _
nlLlIgoP(AnnAn_1“An_gn...A1)Snango(1—6) -0
and thus the ellipsoid converges to zero. 0
75
The key point to the proof of the convergence of the parameter set is the independence
between vfl and 6,7,1] x,,. In order to extend the convergence property from independence
to mixing condition, SM-SA has to be modified. The reason is that for the original SM-
SA with the ARX model, x,, contains yn_1, the output at time n - 1, thus we cannot
separate the dependency between vn and 611x" unless {v,,} is independent. However, if
we modify the SM-SA so that x,, is not changed unless an update takes place, provided
that for large n the time difference between two updates is large, then we can apply the
mixing condition to show convergence. This is possible since x,, depends only on the events
up to the previous update. This modified version is identical to the SM-SA algorithm
r
except for the regression vector x,,. In SM-SA, x,, = [y,,_1, . . .,yn_,,u,,,u,,_1, . . .,u,,..q],
. . . . T
while m the modified algorithm, x,, = [y;,,,. . .,y¢,,_P,u,,,u¢,‘,. ..,u,,_g] where y", u,, are
the same as in (1.5), and {t,} is a subsequence such that c,,. < 0 and t,, < n 5 “+1. ,
Note that in this situation, the algorithm is identical to RLS with a decimated input. The
corresponding decimator has a time-varying sampling rate, say D“, which is controlled by
the data selection process. Generally, if y,, is wideband, then D" is large and the data
picked are usually sparse.
‘ All the convergence proofs developed previously for SM-SA are still applicable to this
modified algorithm, since those convergence properties are not affected by this modification.
1”"
The following corollary applies to this modified algorithm.
Corollary 3.3 Let {v,,} be a noise sequence and let T be mixing on (5,?, P) according to
Definition 3.2. Given 6 and N, let D9,. = [fi- 6,‘/‘—7-] and D,. = [—fi, -\/7+ 6]. If the
modified SM-SA is employed and there exists a 6 such that for all n
P(vn 6 DC...) > 6, and P(vn 6 DC-) > 6,
and there exists an M > N such that there are infinite number of interval [n — 1,n - M]
on which no update takes place, then the volume of the ellipsoid converges to zero.
76
Proof: Consider the proof of Theorem 3.6. Again let {m,,} be the subsequence such that
7 - 53,,” > 6 for some 6 > 0. If the volume of the ellipsoid does not converge to zero, then
(3.36) must hold. For convenience, we rewrite (3.36)
Iv..." — airlxmnl < y; — e' where e": 72,, (3.42)
Given 6’, then, by assumption, there exists a 6’ such that for all n
P(vfl 6 Deli.) > 6’, and P(vn 6 Der-) > 6’.
1. Suppose airlxmfl > 0, then the probability that (3.42) holds is less than 1 — 6’.
"v
2. Suppose bin-13%.. < 0, then the probability that (3.42) holds is also less than 1 — 6’.
Again let A,, be the event that (3. 42) holds at time m,,, then
7"? Alli/h/AEE at”) : fillV' /
l
F(A,,) < P(- -,/~7+e’0) P(0T0_1xm,,>0)
+P (-\/’7 < v"... < f- 6’ l Emu-1x11». < 0) P (6711;.-1an < 0) . m"
'H-————-—-——-~ly-—~—/'<
. k 77‘
‘ .C
l/l/l‘~{jl
Note that {6,1,4an > 0} does not depend on the events which take place after the
ill.
previous update. Let the time difference betweenqi and the time index of the previous
update be 1:. Define W,,_k = A{0,,,"_1xm > 0} and Xn- - A{- f+ 6’ < v,,," < J7}. Then
WM)“ X" E ? and P(Xn) al— 6’. By the definition of mixing condition in (3.40),
an; P(X, n W,,-;,) = P(Xn)P(W -,). (3.43)
Suppose P(W _k) is not zero. Then (3.43) can be written as
m P(Xn 0W —k)
k—ooo P(Wn_k)
=klim P(Xn | W,,_,,)- _. P(Xn ).
Let 0 < f < 6’, then there exists an N such that
lP(Xn I Wn-N)—P(Xn)l<£r
77
and, hence,
P(Xn) - E < P(Xn | W -N) < P(Xn) + E. (3.44)
. "dn’A/v’, IMJ‘I
Note that (3.44) holds only if no update takes place on - N, n}.
Due to the slow convergence of the ellipsoid for large 11, most incoming data are rejected.
By assumption, there exists a positive integer M Z N such that no update takes place on
{n — 1,n — 2, . . (,1:— M}. Thus (3.44) is satisfied and
F(A,,) g P (-,/7+ 6 < v,,, < J? | birlxm” > o) P (614me > o)
+P (—fi < v,,,” < fi- 6’ | 6:“,an < 0) P (6143:...“ < 0)
(P(—,/~7+ c’ < v,,,” < «7) + 6) P (6141:...” > 0)
+ (P(-fi < v,,.” < fi— 6’) + 5) P (bin-1x77»: < 0)
1-€+£<1,
IA
Q ( i
‘\,
IA
and
P(A,|A,._N) ” »'
IA
P (—,/~7+ a < v,,," < (m {39;qu > 0} n A,,-N) P ({éiflxm > 0} | 11..-”)
P
+P (-,/j< v,,." < fi- 6’ | {gin—139m. < 0} n A,,-N) ({éZn-lxmn < 0} | A,,-IV)
/
i 1.
IA
1-€+£
Therefore,
P(An n A,,-N) = P(AnlAn-N)P(A,,_N) g (1 - 6’ + ()2 < 1.
Note that
P(An n A...l n ---n A,,-N) g P(An n A,,-N) g (1 — 5' + ()2 < 1,
78
and
klgrgo P(An-HcN n An-HcN—l “-0 An)
IA
#520 P(An-HcN n An+(k—1)N n -° ' ° 0 An)
3 klim(1—6'+£)"=O.
That is, the probability that the ellipsoid converges to zero is one.
Chapter 4
Simulation Studies
Set-membership based algorithms have shown great potential in applications for many signal
processing and control systems problems involving parametric modeling [25, 26, 28], such
as speech processing, image coding, neural networks, and adaptive control. This chapter is
concerned with simulation studies using the SM-SA algorithm with optimal and suboptimal
strategies for the properties discussed in Chapter 2 and 3.
The simulations performed in this chapter are designed for those noise processes which
are commonly seen in engineering applications. These simulations not only serve as a test
for theoretical analysis, but also provide heuristic descriptions for those phenomena which
cannot ‘ be proven by rigorous mathematics even though they work for most, engineering
applications. We will see that the SM-SA performs better than conventional RLS in many
problems if the noise bound can be correctly estimated.
In Section 4.1 we use an AR(3) time-invariant model to examine the general properties
discussed in Chapter 3, including the behavior of the estimates, the volume of the ellipsoid,
the estimation error, and the resemblance of SM-SA to stochastic approximation. Section
4.2 deals with a time-varying system derived from a speech signal. Comparisons to the
conventional RLS are also made. Section 4.3 is concerned with the suboptimal test when
the model is time-invariant. Finally, a more general case in which the noise bounds are
over-estimated for the time-varying system is considered in Section 4.4.
For all the simulations in this chapter, the models are driven by the noise process {Sn}
which is the name representing all the noise processes summarized in Table 4.1. Note that
79
80
5,, Characteristics of the noise processes
wfl IID white noise with distribution U [—0.5, 0.5]
v,, Obtained by g-(wn + 0.1)
2,. Zero mean binary Bernoulli sequence obtained from {wn}
t,, Biased binary Bernoulli sequence obtained from {wn}
Table 4.1. Noise processes used in the simulations in this chapter.
the sequence {2"} in Table 4.1 is obtained by
0.5 wfl Z 0
Zn:
—0.5 112,, < 0
and {t,,} is obtained by
0.5 10,, 2 -0.081
t,,:
-0.5 ton < —0.081
4.1 Performance of SM-SA for Time-Invariant Systems
In this section, an AR(3) model given by
yn = Glyn-1 + (Wyn—2 + “33111—3 + Sn (4.1)
is considered. This model has three poles at 0.8 :l: 0.2j and 0.5 when a1 = 2.1,a2 = —1.48,
and a3 = 0.34. For these simulations, a set of 50,000 data points is generated to identify
the model parameters using {5...} with
5,3 5 0.25 (4.2)
Note that Sfl should be replaced by the random sequences given in Table 4.1.
81
4.1.1 SM-SA with Pseudo-Random Excitation
In this section, the model (4.1) is driven by a pseudo random sequence {wn} with mean
fl = -8.9 x 10", which is obtained using a C-language version of the machine-independent
random number generator URAND describe in [11]. Only 1.58% of data points are selected
by SM-SA. Figure 4.1 shows the non-increasing volume measure 11,, on a logarithmic scale
for the model in (4.1). Figure 4.2 shows that the estimation errors Isnl falls into the dead
zone [0, 0.5]. The first 1000 SM-SA estimates can be seen in Fig. 4.3. We observe that the
estimates are “almost” consistent and converge at a reasonable speed with only 1.58% of
the total data points used.
Figure 4.4 illustrates the weighting sequence {A;} averaged over six experiments. It
shows that the averaged {A;} is type l/n sequence, which implies that it satisfies Robins-
Monro’s conditions for convergence.
1o1o
10-10 _
Figure 4.1. The logarithmic plot of 11,. versus time n for the linear time-invariant AR(3)
model with pseudo-random excitation.
For LSE algorithms, the estimates are consistent when the noise excitation is white. If
82
Figure 4.2. The logarithmic plot of lenl versus time n for the linear time-invariant AR(3)
model with pseudo random excitation.
2 " "' I ' I ' 'I -
1s «
1 ..
as _ __ - _ ____ -
o .
-os .
.1 4
-15 -. --- T - - . ----.--
200 400 n soo 800 1000
Figure 4.3. First 1000 estimates of the parameters in the AR(3) model obtained using
SM-SA and a pseudo-random excitation, where a; = 2.1, a; = -1.48, and 03 = 0.34.
83
4
10 _r '3
1O6r 1
" J A L4 .aaal A A 4 A AH
1° [0 A A A AAAA 1 A 2 3
10 1O 10 10
Figure 4.4. The relationship between logarithms of time and weight.
we let 1),. = %(wfl -- 0.1), then the noise bound given in (4.2) still holds for {v,,} but the
sample mean is now —-113. Figure 4.5 compares the performance of RLS and SM-SA. It is
seen that RLS is severely affected by the bias of the noise excitation. The bias in SM-SA,
however, is very small. Note that only 0.89% of the data points are used by SM-SA.
4.1.2 AR Models with Binary Bernoulli Excitation
SM-SA algorithm is derived from bounded noise constraints. The weight A; is optimal in
the sense that the feasible set is minimized given that the previous information is known.
We may view the size of the feasible sets as a “confidence interval.” If the optimal weight
exists most of the time, it is expected that the performance of SM-SA is much better than
RLS because of the optimal weights. From the analyses in the previous chapter, we know
that optimal weight A; exists when the magnitude of the noise is close to the bound. In
this simulation, a zero-mean Bernoulli process {2”} is employed to drive the system. Let
7 = 0.25, then {2"} always hits the bounds. 80.11% of the data points are used. This is
not surprising because {2"} is always at the boundaries and thus updating is carried out
84
True value a1 -2. 1
2.4 a . v w
2.35 - ‘4
2.3
2.25
2.2
2.15
2.1
2.05
2
x10‘
True value e2--1 .48
' 1
'1 .2 f r I f
-1 .3 ‘
-1.4 -
SM-SA
-1 .5
-1 .8
-1 .7 ..
1
'1'8 2 3 4 5
" x 10‘
True value 33-0.34
0.6 Y r U ‘I
0.5 ~ «
0.5 - J
I
0.45 I q
. 1 RLS
0.4 I H _ v -1
035‘ 4
0.3 SM-SA ~
0.25 -
" x 1 0‘
Figure 4.5. The estimates of model parameters by RLS and SM-SA for biased noise se-
quence.
85
most of the time. One of the advantages of this extensive updating is that the volume of
the ellipsoid converges very fast to a very small number as indicated in Fig. 4.6.
Non-increasing volume when binary Bemoulli sequence is used
I v Vfivvvv' *7
Figure 4.6. The logarithmic plot of pa versus time n when a binary Bernoulli sequence is
used.
A comprehensive comparison of SM-SA and RLS is shown in Fig. 4.7.
If we replace {2"} by a non-zero mean Bernoulli sequence {t,,}, then, according to Fig.
4.8, the performance of SM-SA with 78.16% of the data points selected is not affected by
the non-zero mean {t,,}. However, the RLS estimates are clearly influenced.
4.2 Performance of SM-SA for Time-Varying System
In this section, a speech model with time-varying coefficients
ya = 61(n)yn-1 + b2(n)yn—2 + b3(n)yn-3 + wn (4-3)
86
True value a1 -2. 1
2.137 f l I r v
2.12 RLS ‘
2.1 1 -
SM-SA
2.1 ‘r ‘. ‘
2.09 .
2.08 .
2'07 1 1 5 2 2.5 3
x 10‘
True value a2--1.48
'1 .42‘ t r r f I
'1 .43 l "'
a; A , I am
RLS j
4 '52 0.5 1 1.5 2 2.5 ' 3
x 10‘
True value a3-0.34
0.36 f l I I r
0.355 I _
SM-SA
.4
1.5 2 2.5 3
x 1 0‘
Figure 4.7. First 30,000 estimates obtained using SM-SA and RLS when a binary Bernoulli
sequence is used.
87
True value a1 -2. 1
RLS
2'1 1 I
2.1 1: SM'SA
2'09 0 5 1 1 5 2 2 5 3
x 10‘
True value a2--1.48
'1 .46 I T I T I
-1 48 J: #SILSAI
I
-1.5 I -
‘1 .52 I q
-1 .54 I -1
RLS
'1 .56 cl
'1'“ 0.5 1 1 5 2 2 5 a
' x 10
True value 113-0 .34
0.4" I I I I I
0.39 RLS -
0.38 -
0.37 -
.1
4 s22; l
.I
1 1.5 2 2.5 3
x 1 0‘
Figure 4.8. First 30,000 estimates obtained using SM-SA and RLS when a biased binary
Bernoulli sequence is used.
88
is considered. Simulations for speech signals have been performed by Odeh [28]. Here we
use a similar experiment just to examine the performance of SM-SA. True parameters in
this model are derived using short-term linear prediction analysis [37] of order two with a
256-point Hamming window and Levinson-Durbin recursion (see [37]) on an utterance of
the word “four” by an adult male speaker. The acoustic waveform is shown in Fig. 4.9.
The data were sampled at 10kHz after 4.7kHz low-pass filtering. While more meaningful
The acoustic waveform of the word "lour‘
1 000 I I r I I
800-
600'
400"
200'-
400'-
-600'
soc» -
-1 000° 1 000 2000 3000 4000 5000 ' 6000
11
Figure 4.9. The acoustic waveform of the word “four”.
analysis of speech would involve model orders of 10-14 [37], this small number of parameters
is used here so that the results are easily illustrated [28]. The true parameters for the word
“four” are shown in Fig. 4.10. A set of 5376 data points were generated by driving this
model by 11:". In this simulation, only 2.15% of data points are used by SM-SA with good
performance. Simulation results by using SM-SA and RLS algorithms can be seen from
Figs. 4.11 to 4.12. Similar results by using SM-WRLS can be seen in [28] in which 1.86%
of data points are used. In each figure there are two curves, one for the true parameters as
reference, the other for the estimates obtained by using SM-SA or RLS.
89
-1 _ q
-1.5 - «
'2 1000 2000 3000 4000 5000
(a)
3 . . . . .
2.5 ~ -
2 _ .
1.5 - ' ~
1 . -
0.5 - .
0 - _
-0.s -
-1 . q
-1.5 - -
'2 1000 2000 @3000 4000 5000
Figure 4.10. The “true” parameters for the word “four”: (a) b1(n) (b) bg(n).
90
The power of the SM-SA algorithm is evident in this simulation. Upon comparing the
estimates obtained by SM-SA and conventional RLS algorithms, we find that SM-SA shows
better tracking capability than conventional RLS. For most algorithms, there is a trade-
off between improved computational complexity and improved performance. For SM-SA,
however, improved performance comes along with improved computational efficiency. More
discussion about SM-SA in time-varying system is left in section 4.4.
4.3 Suboptimal Test
More computational saving can be achieved by employing the suboptimal test. In Section
3.2.2 we have shown that many convergence properties of the suboptimal SM-SA are similar
to the “optimal” SM-SA. In this section, simulations and comparisons to the conventional
RLS are performed to verify the performance of SM-SA and its convergence properties.
Only the time-invariant AR(3) model given in (4.1), will be examined. The reason we do
not apply suboptimal test to speech signal is explained in Section 4.4.
4.3.1 AR Models with Pseudo-Random Excitation
First the model (4.1) is driven by {wn}. Only 0.46% of data points are used by the subop-
timal strategy. Compared with 1.58% in “optimal” SM-SA, this is a significant saving. But
more importantly, the checking procedure is 0(m). Figure 4.13 shows the non-increasing
volume measure pa in logarithmic scale. Fig. 4.14 shows that the estimation errors [5"]
also falls into the dead zone [0,05]. The first 2000 estimates can be seen in Fig. 4.15. We
see that the estimates are consistent and converge at a reasonable speed with only 0.46%
of total data points used.
Figure 4.16 illustrates that {A;} is a 1/n-type sequence, satisfying Robins and Monro’s
conditions for convergence.
Only 0.19% of data are used if {v,,} is the driving process, Note that the same ex-
periment performed in Section 4.1.1 used 1.03% of the data points. Figure 4.17 compares
the performance of RLS and SM-SA. We see that the effect of biased noise on suboptimal
SM-SA is much less than it is on RLS.
91
CD
estimates
0.5 1
0 d
-0.5 q
-1 I- .4
-1.5 P -
'2 1000 2000 3000 4000 5000
(a)
3 I I I r I
2.5 - .
estimate
1 000 2000 (b )3000 4000 5000
Figure 4.11. Estimates of bl(n) for the word “four” using (a) SM-SA (b) RLS.
92
2.5 - ‘
2 r '
1.5 r ‘
1 r ‘
0.5 ' ‘
0 estimates -
-0.5
-1 (- true "
.15 . .
'2 1600 2000 ao'oo 4000 so'oo
(I)
25- I
2 r .
1.5 - .
1 '- .
0'5 ' estimate ‘
0 q
05 -
'1 P true I
-1 5 ~ J
.2 .03. 2000 (1)35“ .50. 5000
Figure 4.12. Estimates of b2(n) for the word “four” using (a) SM-SA (b) RLS.
93
Non-increasing volume using sub-optimal test
' ' ""'I V ' """I ' ' ‘ ""'I P' I """I "' ‘ '
p
p
p
5
10 r
L
p
0
10 P
y
p
I
I
-6I
1O -
p
A_ A AAAAAAJ. A A AAAAAAI A LAAU‘A‘ A A A AAAAAl A A A
O 1 2 3 4
10 10 10 10 10
n
Figure 4.13. The logarithmic plot of 11,, versus time n for the AR(3) model when the
suboptimal test is used.
Estimation error for sub-optimal bet
1o . - - "m, - - - - - a - -
Figure 4.14. The logarithmic plot of |sn| versus time n for the AR(3) model.
94
1 - -- true parameters , s
-0.5 - “
-1 . ..
-1.5
200 400 600 800 1000 1200 1400 1600 1800 2000
n
Figure 4.15. First 2000 estimates of the parameters using SM-SA, where 01 = 2.1, 02 =
-1.48, and 03 = 0.34.
Figure 4.16. The relationship between logarithms of time and weight.
95
True value a1-2.1
2.4 I I I I
2.35 ‘
2.3 .1
2.25 ‘
2.2 ' ‘
1 _ FlL§
2.15 -
2-‘ SM-SA I
2.05 ‘
" 1 2 3 4 5
x 10‘
True value a2--1 .48
'1 .2 I 1 I I
-1 .3 r '
'1 .4 .1
SM-SA .
-1 .5 g . ..
'1 .8 . . hmA WA RLS -
-1.7 1
-1.3 . 1 ‘ ‘ ‘ '
1 2 3 4 5
n x 1 0‘
True value a3-0.34
0.6 I I I I I
0.55 .
0.5 -
0.45 -
II ‘ RLS
0.4 l ’ I -
0.35 -
SM-SA
0.3 r
0.25 -1
0.2 l 1 1 1
1 2 3 4 5
n
x 1 0‘
Figure 4.17. Estimates of parameter using RLS and SM-SA for the suboptimal test.
96
4.3.2 AR Models with Binary Bernoulli Excitation
Let γ = 0.25 and let the AR(3) model be driven by {z_n}. In the simulation for the suboptimal
test, 49.86% of the data points are used. The corresponding experiment for the optimal SM-SA
used 80.11% of the data points.
Figure 4.18 compares the performance of SM-SA and RLS. We find that SM-SA shows excellent
convergence of the estimates even under the suboptimal test. If we change the noise process
to {t_n}, Fig. 4.19 shows that the performance of SM-SA is not affected by the bias of the
excitation, which is similar to the result for the “optimal” SM-SA. In this case, 48.91% of
the data points are used, while 78.16% are used in the corresponding optimal SM-SA.
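As an illustrative sketch of how such an experiment can be set up, the following Python fragment generates AR(3) data under a zero-mean binary Bernoulli excitation. The ±0.5 levels (chosen so that v_n² ≤ γ = 0.25), the model convention y_n = a1 y_{n-1} + a2 y_{n-2} + a3 y_{n-3} + v_n, and the variable names are assumptions made for illustration only, not the exact simulation setup used here.

    import numpy as np

    rng = np.random.default_rng(0)
    a = np.array([2.1, -1.48, 0.34])     # true parameters a1, a2, a3 (cf. Fig. 4.15)
    gamma = 0.25                         # assumed bound on v_n^2
    N = 30000

    v = rng.choice([-0.5, 0.5], size=N)  # zero-mean binary Bernoulli excitation
    # a biased excitation could be formed by shifting the two levels (illustrative only)
    y = np.zeros(N)
    for n in range(3, N):
        # assumed convention: y_n = a1*y_{n-1} + a2*y_{n-2} + a3*y_{n-3} + v_n
        y[n] = a @ y[n-3:n][::-1] + v[n]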
4.4 More Studies on Time-varying Systems
The SM-SA algorithm is derived under the assumption of a time-invariant system. Nevertheless,
the previous simulations show that SM-SA has some tracking ability. The adaptability of the
SM-WRLS algorithm has been studied in previous work [28]. Although we do not pursue a
theoretical analysis of the adaptivity of SM-SA here, we can offer some heuristic observations.
Consider the weighted covariance R(n),

    R(n) = (1 - λ_n*) R(n - 1) + λ_n* x_n x_n^T.

The factor (1 - λ_n*) in this equation can be viewed as a forgetting factor for the algorithm.
The smaller the weight λ_n*, the smaller the impact of the current data point. By controlling
the magnitude of the weights, the algorithm balances the importance of new data against old
data; this is why SM-SA can track some time-varying systems. In such a situation the ellipsoid
must not shrink too quickly, and since the parameters are always changing, its volume cannot
converge to zero either. A better strategy for estimating the parameters of such a system is
therefore to “over-estimate” the noise bound, because doing so prevents the ellipsoid from
shrinking to a very small set.
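To make the forgetting interpretation concrete, the following Python fragment sketches the weighted covariance recursion; the fixed weight value and the variable names are illustrative only (in SM-SA the weight λ_n* is chosen at each step by the volume-minimization test, not fixed in advance).

    import numpy as np

    def smsa_covariance_update(R_prev, x, lam):
        # R(n) = (1 - lambda_n*) R(n-1) + lambda_n* x_n x_n^T
        # (1 - lam) acts as a forgetting factor; a small lam de-emphasizes the new point
        return (1.0 - lam) * R_prev + lam * np.outer(x, x)

    # illustrative use with an arbitrary (non-optimized) weight
    R = np.eye(3)
    x = np.array([1.0, -0.3, 0.7])
    R = smsa_covariance_update(R, x, lam=0.05)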
Figure 4.18. First 30,000 estimates obtained by SM-SA and RLS when a zero-mean binary
Bernoulli sequence is used and the suboptimal test is employed.
Figure 4.19. First 30,000 estimates obtained by SM-SA and RLS when a biased binary
Bernoulli sequence is used and the suboptimal test is employed.
For the speech signal discussed in this chapter, if we “over-estimate” the noise bound by
choosing it as

    v_n² ≤ 3.0

while the real bound is

    v_n² ≤ 0.25,

we find that the algorithm has better tracking ability, as shown in Fig. 4.20. In this
case 4.09% of the data points are used, which is more than before.
Note that SM-SA is developed for time-invariant systems. If it is applied to a time-varying
system, some of its assumptions may be violated; for example, ε_n may become negative. Thus it
is not appropriate to apply the suboptimal test to time-varying systems, since a negative value
of the suboptimal check quantity need not imply a negative ε_n. Some simulations show that when
the suboptimal test is applied to time-varying systems, the results are often not encouraging.
Figure 4.20. Estimates of (a) b1(n) and (b) b2(n) using the SM-SA algorithm for the word “four”
when the noise bounds are over-estimated.
Chapter 5
The Convergence Properties of
UOBE Algorithms
The UOBE algorithm, a general class of OBE algorithms, is based on generalized WRLS in
which very broad classes of “forgetting factors” and data weights may be employed. Historically,
this research on the SM-SA algorithm, and that of Krunz on the “Dual-SM-WRLS”
algorithm [33, 34], precedes and contributed to the development of UOBE. In this chapter,
we generalize the convergence properties of SM-SA to all UOBE algorithms and show
the significant implications of this research for broader applications. A brief review of UOBE
is given here; details are found in [38, 39].
5.1 The UOBE Algorithm
Given the system in (1.5) and the noise bound in (1.6), the hyper-ellipsoidal set Ω(n) associated
with UOBE is defined by

    Ω(n) = {θ : (θ - θ_n)^T κ_n^{-1} C(n) (θ - θ_n) ≤ 1},

where θ_n is the WRLS estimate obtained by minimizing

    V(θ) = Σ_{τ=1}^{n} w_{n,τ} (y_τ - x_τ^T θ)²,

in which

    w_{n,τ} = α_n w_{n-1,τ} for τ ≤ n - 1,   and   w_{n,n} = β_n,

and κ_n is a scalar quantity,

    κ_n = θ_n^T C(n) θ_n + Σ_{τ=1}^{n} w_{n,τ} (γ_τ - y_τ²).

C(n) is fundamentally defined as the sum of the weighted outer products, accumulated recursively as

    C(n) = α_n C(n - 1) + β_n x(n) x^H(n).
Different algorithms in UOBE are distinguished by the weighting sequences {α_n} and
{β_n}, in conjunction with the optimization criterion employed in selecting them. Three
criteria have been applied to the optimization process:
Optimization Criterion 1  Minimize the determinant of the inverse ellipsoid matrix,
μ_v{Ω(n)} ≜ det{κ_n C^{-1}(n)}.
Optimization Criterion 2  Minimize the trace of the inverse ellipsoid matrix,
μ_t{Ω(n)} ≜ tr{κ_n C^{-1}(n)}.
Optimization Criterion 3  Minimize the parameter κ_n.
Criteria 1 and 2 were first suggested by Fogel and Huang [14], while Criterion 3 was used by
Dasgupta and Huang [20]. In the single-output case, in which Ω(n) is clearly interpretable
as an ellipsoid in ℝ^m, μ_v{Ω(n)} is proportional to the square of the volume of the ellipsoid,
while μ_t{Ω(n)} is proportional to the sum of the squares of its semi-axes. Criterion 3 has been
used in conjunction with a specific weighting strategy to achieve a rigorous convergence proof
in a certain sense [20].
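To make these two measures concrete, a minimal numerical sketch, assuming a real single-output setting, is given below; the NumPy usage and variable names are illustrative only.

    import numpy as np

    def volume_measure(C, kappa):
        # mu_v{Omega(n)} = det(kappa_n * C^{-1}(n)); proportional to the squared volume
        return np.linalg.det(kappa * np.linalg.inv(C))

    def trace_measure(C, kappa):
        # mu_t{Omega(n)} = tr(kappa_n * C^{-1}(n)); proportional to the sum of squared semi-axes
        return np.trace(kappa * np.linalg.inv(C))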
Let us adopt the policy of writing the weights α_n and β_n as functions of a single parameter
to be optimized at time n, say λ_n, so that

    α_n = α_n(λ_n),   β_n = β_n(λ_n),                                   (5.1)

where {α_n} and {β_n} should now be considered sequences of functions. If we choose
α_n(λ_n) = 1 and β_n(λ_n) = λ_n, then UOBE is exactly the SM-WRLS algorithm. SM-SA is
obtained by choosing α_n(λ_n) = 1 - λ_n and β_n(λ_n) = λ_n, together with optimization
Criterion 1. The extension of the optimization process of SM-SA to Criterion 2 is left as
future work.
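The following Python sketch illustrates how the weighting policy enters the C(n) recursion and how the SM-WRLS and SM-SA choices differ; the weight value shown is an arbitrary placeholder, not the optimized λ_n* of an actual OBE run.

    import numpy as np

    def uobe_C_update(C_prev, x, alpha_n, beta_n):
        # C(n) = alpha_n * C(n-1) + beta_n * x(n) x(n)^T  (real-data case)
        return alpha_n * C_prev + beta_n * np.outer(x, x)

    # SM-WRLS policy: alpha_n(lam) = 1,       beta_n(lam) = lam
    # SM-SA policy:   alpha_n(lam) = 1 - lam, beta_n(lam) = lam
    def sm_wrls_weights(lam):
        return 1.0, lam

    def sm_sa_weights(lam):
        return 1.0 - lam, lam

    # one update with an arbitrary (non-optimized) weight
    C = np.eye(3)
    x = np.array([0.5, -0.2, 1.0])
    C_wrls = uobe_C_update(C, x, *sm_wrls_weights(0.1))
    C_smsa = uobe_C_update(C, x, *sm_sa_weights(0.1))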
5.2 Convergence Properties of UOBE
The most important theorem for the theoretical framework of UOBE is given as follows
[38, 39]:
Theorem 5.1  Consider a UOBE algorithm in which volume or trace is to be minimized.
Then the following are independent of the choices of the function sequences {α_n(λ_n)} and
{β_n(λ_n)}:
1. the sequence of measures of optimality ({μ_v(n)} or {μ_t(n)});
2. the data points selected (times for which there exists λ_n* > 0);
3. the parameter estimate θ_n.
However, in a UOBE algorithm with κ_n minimization, none of these items is independent of
the sequences {α_n(λ_n)} and {β_n(λ_n)}.
Theorem 5.1 states that any UOBE algorithm with volume minimization, regardless of
the weighting policy, will “behave” identically in these three senses. Using this theorem, we can
extend the convergence properties from SM-SA to all of UOBE:
Corollary 5.1  Consider the UOBE algorithms in which volume is to be minimized. Let
{t_n} be the subsequence of times at which an update takes place. If there exist N ∈ ℕ and
g1, g2 > 0 such that for all n

    g1 I < Σ_{k=n+1}^{n+N} x_k x_k^T < g2 I,                            (5.2)

then

    lim_{n→∞} ||θ_n - θ_{n-1}|| = 0.

In addition to (5.2), if the noise process {v_n} is independent and, given ε, there exists a δ
such that for all n

    P(v_n ∈ D_{ε+}) > δ   and   P(v_n ∈ D_{ε-}) > δ,

where D_{ε+} = [√γ - ε, √γ] and D_{ε-} = [-√γ, -√γ + ε], then the volume of the ellipsoid
converges to zero and the estimates obtained using any UOBE algorithm converge to the
true parameters.
This corollary is very important since, through the research on SM-SA, the convergence
properties of all UOBE algorithms are established.* It implies that SM-SA has much
broader implications for the entire class of UOBE algorithms.
*The original OBE paper by Fogel and Huang [14] is sometimes misunderstood to indicate convergence
of the bounding ellipsoid to a point under ordinary conditions on the data (see [38]). The convergence
properties are now proved.
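As a side note, condition (5.2) of Corollary 5.1 can be checked numerically on a recorded data set. The following sketch, with function and variable names assumed here for illustration, tests whether every window of N regressors has its summed outer product bounded between g1 I and g2 I.

    import numpy as np

    def check_excitation(X, N, g1, g2):
        # Check condition (5.2): g1*I < sum_{k=n+1}^{n+N} x_k x_k^T < g2*I for every window.
        # X is an (n_samples, m) array whose rows are the regressor vectors x_k.
        for n in range(X.shape[0] - N):
            S = X[n + 1:n + 1 + N].T @ X[n + 1:n + 1 + N]   # windowed sum of outer products
            eigs = np.linalg.eigvalsh(S)
            if eigs.min() <= g1 or eigs.max() >= g2:
                return False
        return True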
Chapter 6
Conclusion
6.1 Concluding Remarks
6.1.1 Summary of Landmark Developments of SM-Based Techniques
Set-membership-based algorithms are developed to estimate the parameters of a linear
system in which the noise perturbation is additive and bounded.
After the first publication of the OBE algorithm by Fogel and Huang [14] in 1982,
SM techniques attracted the attention of many research groups in the signal processing
field. However, the lack of a convergence proof has been a serious weakness of the OBE-like
algorithms, although they show good performance in many applications.
In order to remedy the convergence problem, Dasgupta and Huang proposed a modified
OBE algorithm [20] in 1987. Instead of minimizing the volume of the ellipsoid, they minimize a
scalar upper bound on the size of the set. This algorithm is convergent in the sense that the
optimal weight and the difference between successive estimates converge to zero. However, it
has been criticized by some researchers [30, 38, 39] for its lack of geometrical meaning.
The Adaptive Signal Processing and Speech Processing Laboratories at Michigan State
University have been pioneers in research on SM-based techniques. The tradeoff between
interpretability and convergence led to the study of SM-SA. During the course of this research,
it was found in parallel work that almost all OBE algorithms can be combined into a unified
framework called UOBE, and that SM-SA is a special case of UOBE obtained by choosing
α_n(λ_n) = 1 - λ_n and β_n(λ_n) = λ_n, together with volume minimization.
The name SM-SA comes from the structural resemblance to the stochastic approximation
method.
6.1.2 Contributions of this Research
Selective updating to reduce the computational load is a feature common to all members of
the UOBE class. In contrast to RLS estimation, in which the parameter estimates are
updated at every iteration, UOBE updates the estimates only if the incoming data contain
sufficient information. In addition to this advantage, the major contribution of this work is
the establishment of an algorithm which is both interpretable and convergent.
The interpretability of SM-SA comes from the optimization process, in which the optimal
weight is found by minimizing the volume measure of the ellipsoid,

    μ_v{Ω(n)} ≜ det{κ_n R^{-1}(n)}.
The convergence properties come from the boundedness of the covariance matrix. In
this work, we have shown that:
1. The optimal weight λ_n* converges to zero.
2. The squares of the estimation errors fall asymptotically into a dead zone given by
[0, γ_n].
3. If the noise sequence {v_n} is independent and the probabilities are non-zero in the
neighborhoods of the noise bounds, then the volume of the ellipsoid converges to zero.
4. If the noise process is mixing, then under a slight modification of SM-SA, the volume
of the ellipsoid converges to zero.
More importantly, through the analysis of SM-SA, the convergence of estimates presented
in Chapter 3 can be extended to all UOBE algorithms.
The other contribution is the illustration of the central difference between SM-SA and
RLS. While both algorithms have the same recursive formulation, they show completely
different behaviors. The reason is that the weight of SM-SA at time n is a random
variable which depends on all the previous and present noise samples, while the weight of
conventional RLS or WRLS is deterministic. We have shown this difference in Chapters 3
and 4 through theoretical analysis and simulation studies.
6.2 Future Work
This research is concerned with the theoretical analysis of a special case of UOBE, SM-SA.
While we have seen in Chapter 4 that SM-SA performs well in tracking time-varying
parameters, we should not rely on an algorithm developed in a time-invariant setting to
perform well in a time-varying environment. It is of practical interest to develop an adaptive
SM-SA, since most engineering applications deal with time-varying models.
Secondly, we have shown the convergence of SM-SA only under volume minimization. While
it is believed that the convergence properties under trace minimization are similar, a rigorous
proof is still necessary.
Finally, it is often difficult to obtain an accurate estimate of the noise bounds. In order not
to violate the bounded-error assumption, it is natural to choose conservative noise bounds. As
we have discussed, this often leads to bad estimates. One way to solve this problem is to
use soft noise bounds, under which the bounded-noise assumption may occasionally be violated.
Thus the author suggests the study of a soft-bound SM-SA algorithm as further work.
In conclusion, the following topics are proposed for future research:
1. To develop an adaptive SM-SA algorithm.
2. To establish the convergence proof for SM-SA under trace minimization.
3. To extend the model from ARX to ARMAX.
4. To develop a soft-bound SM-SA algorithm.
APPENDICES
APPENDIX A
Infinite Product
Definition A.1 Given an infinite product [1:11 um let p, = [12:1 uh. If no factor a,, is
zero, we say the product converges if there exists a number p 96 0 such that {pa} converges
to p. If {pa} converges to zero, we say the product diverges to zero.
The following theorem is given without proof [9]:
Theorem A.l Assume that each a,, 2 0. Then the product [[(1 — a,,) converges if, and
only if, the series 2a,, converges.
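As a simple illustration, take a_n = 1/n² and a_n = 1/n:

    ∏_{n=2}^{N} (1 - 1/n²) = (N + 1)/(2N) → 1/2,   since Σ 1/n² converges,
    ∏_{n=2}^{N} (1 - 1/n) = 1/N → 0,               since Σ 1/n diverges,

so the first product converges while the second diverges to zero, in agreement with Theorem A.1.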
BIBLIOGRAPHY
[1] H. Robbins and S. Monro, “A stochastic approximation method,” Ann. Math. Stat., vol. 22, pp. 400-407, 1951.
[2] J. Blum, “Multidimensional stochastic approximation method,” Ann. Math. Stat., vol. 25, pp. 737-744, 1954.
[3] N.N. Krasovskii, “On the theory of controllability and observability of linear dynamic systems,” Prikl. Mat. Mech., vol. 28, no. 1, 1964.
[4] F.C. Schweppe, “Recursive state estimation: Unknown but bounded errors and system inputs,” IEEE Trans. Automatic Control, vol. AC-13, pp. 22-28, February 1968.
[5] H.S. Witsenhausen, “Sets of possible states of linear systems given perturbed observations,” IEEE Trans. Automatic Control, vol. AC-13, pp. 556-568, October 1968.
[6] P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.
[7] D.P. Bertsekas and I.B. Rhodes, “Recursive state estimation for a set-membership description of uncertainty,” IEEE Trans. Automatic Control, vol. AC-16, pp. 117-128, April 1971.
[8] J.M. Mendel, Discrete Techniques of Parameter Estimation: The Equation Error Formulation, Marcel Dekker, New York, 1973.
[9] T.M. Apostol, Mathematical Analysis, Addison-Wesley, 1975.
[10] R.B. Ash and M.F. Gardner, Topics in Stochastic Processes, Academic Press, 1975.
[11] G.E. Forsythe and M.A. Malcolm, Computer Methods for Mathematical Computations, Prentice-Hall, 1977.
[12] L. Ljung, “Analysis of recursive stochastic algorithms,” IEEE Trans. Automatic Control, vol. AC-22, pp. 551-575, October 1977.
[13] E. Fogel, “System identification via membership set constraints with energy constrained noise,” IEEE Trans. Automatic Control, vol. AC-24, pp. 752-758, October 1979.
[14] E. Fogel and Y.F. Huang, “On the value of information in system identification - bounded noise case,” Automatica, vol. 18, pp. 229-238, 1982.
[15] B.D.O. Anderson and C.R. Johnson, Jr., “Exponential convergence of adaptive identification and control algorithms,” Automatica, vol. 18, pp. 1-13, January 1982.
[16] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA, 1983.
[17] R.R. Bitmead, “Persistence of excitation conditions and the convergence of adaptive schemes,” IEEE Trans. on Information Theory, vol. IT-30, no. 2, pp. 183-191, March 1984.
[18] A. Papoulis, Probability, Random Variables and Stochastic Processes, 2nd edition, McGraw-Hill, 1984.
[19] G.C. Goodwin and K.S. Sin, Adaptive Filtering, Prediction and Control, Prentice-Hall, 1984.
[20] S. Dasgupta and Y.F. Huang, “Asymptotically convergent modified recursive least-squares with data-dependent updating and forgetting factor for systems with bounded noise,” IEEE Trans. on Information Theory, vol. IT-33, pp. 383-392, May 1987.
[21] T. Söderström and P. Stoica, System Identification, Prentice-Hall, 1988.
[22] K.J. Åström and B. Wittenmark, Adaptive Control, Addison-Wesley, 1989.
[23] J.P. Norton, “Recursive computation of inner bounds for the parameters of linear models,” Int. J. Control, vol. 50, pp. 2423-2430, 1989.
[24] J.R. Deller, Jr. and S.F. Odeh, “Implementing the optimal bounding ellipsoid algorithm on a fast processor,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. ’89, Glasgow, vol. 2, pp. 1067-1070, May 1989.
[25] J.R. Deller, Jr., “Set-membership identification in digital signal processing,” IEEE ASSP Magazine, vol. 6, pp. 4-22, October 1989.
[26] J.R. Deller, Jr. and T.C. Luk, “Linear prediction analysis of speech based on set-membership theory,” Computer Speech and Language, vol. 3, pp. 301-327, October 1989.
[27] R.A. DeCarlo, Linear Systems, Prentice-Hall, 1989.
[28] S.F. Odeh, Algorithms and Architectures for Adaptive Set Membership-Based Signal Processing, Ph.D. dissertation, Michigan State University, 1990.
[29] A.K. Rao, Y.F. Huang, and S. Dasgupta, “ARMA parameter estimation using a novel recursive estimation algorithm with selective updating,” IEEE Trans. on Acoust., Speech, and Signal Process., vol. 38, pp. 447-457, March 1990.
[30] J.P. Norton and S.H. Mo, “Parameter bounding for time-varying systems,” Mathematics and Computers in Simulation, vol. 32, pp. 527-534, 1990.
[31] M.F. Cheung, On Optimal Algorithms for Parameter Set Estimation, Ph.D. dissertation, Ohio State University, 1991.
[32] M.F. Cheung, S. Yurkovich, and K.M. Passino, “An optimal volume ellipsoid algorithm for parameter set estimation,” Proc. of the 30th Conference on Decision and Control, Brighton, pp. 969-974, December 1991.
[33] M. Krunz, Numerical Stability and Convergence Issues in the SM-WRLS Algorithm, M.S. thesis, Michigan State University, 1992.
[34] M. Nayeri, J.R. Deller, Jr., and M.M. Krunz, “Convergence and colored noise issues in bounding ellipsoid identification,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. ’92, San Francisco, vol. 5, pp. 337-340, March 1992.
[35] M. Nayeri, J.R. Deller, Jr., and M.S. Liu, “Optimal bounding error algorithms with adaptive tracking capability,” Proc. of the 26th Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, October 1992.
[36] M. Nayeri, M.S. Liu, and J.R. Deller, Jr., “An interpretable and converging set-membership algorithm,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. ’93, Minneapolis, April 1993.
[37] J.R. Deller, J.G. Proakis, and J.H.L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, New York, 1993.
[38] J.R. Deller, Jr., M. Nayeri, and S.F. Odeh, “Least-square identification with error bounds for real-time signal processing and control,” Proceedings of the IEEE, to appear.
[39] J.R. Deller, Jr., M. Nayeri, and M.S. Liu, “Unifying the landmark developments in OBE processing,” Int. J. Adaptive Control and Signal Processing, to appear.