PLACE IN RETURN BOX to romovo this chockout from your record.
TO AVOID FINES return on or baton dd. dun.

 

DATE DUE DATE DUE DATE DUE

 

‘5“) '2’ 7 "1"3

1. #3

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

L_-—__-_ J

!

 

 

 

 

 

 

MSU Is An Afﬁrmative Action/Equal Opportunity Institution

 

ALGORITHMS AND ARCHITECTURES
FOR ADAPTIVE SET MEMBERSHIP-BASED
SIGNAL PROCESSING

By
S ouheil Farah Odeh

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1990

ABSTRACT

ALGORITHMS AND ARCHITECTURES
FOR ADAPTIVE SET MEMBERSHIP-BASED
SIGNAL PROCESSING

By

S ouheil Farah Odeh

This research is concerned with a class of set membership (SM) algorithms for esti-
mating the parameters of linear system or signal models in which the error sequence is
pointwise “energy bounded.” Speciﬁcally, it is focused on the set membership weighted
recursive least squares {SM- WRLS’) algorithm which works with bounding hyperel-
lipsoidal regions to describe the solution sets which are a consequence of the error
bounds. SM-WRLS is based on the familiar WRLS algorithm with the SM considera-
tions handled through a special weighting strategy. The original version of SM-WRLS
is applicable to real scalar data. In this work, a generalized SM-WRLS algorithm that
can handle complex vector-input vector-output data streams is developed which ex-
tends the applicability of this algorithm to virtually any signal processing problem
involving parametric models. Further, a signiﬁcant reduction in computational com-
plexity can be achieved by employing a “suboptimal” test for information content in
an incoming equation. This new strategy can be applied to virtually any version of
the SM-WRLS algorithm to improve the computational complexity. The suboptimal
check is argued to be a useful determiner of the ability of incoming data to shrink
the ellipsoid.

The “unmodiﬁed” SM-WRLS algorithm has inherent adaptive capabilities in its
own right. However, it is not possible to depend upon this algorithm to reliably behave

in an adaptive manner, particularly in cases of quickly varying system dynamics. In

this work, explicitly adaptive SM-VVRLS algorithms are developed. Adaptation is in-
corporated into SM-WRLS in a very general way by introducing a. flexible mechanism
by which the algorithm can forget the influence of past data. The general formula! ion
permits the extension of SM-WRLS to a wide range of adaptation strategies.

The various SM-WRLS developments are tested on models derived from real
speech data. Simulation results are presented which illustrate important points about
the various methods and show that the adaptive algorithms yield accurate estimates
using very few of the data and quickly adapt to fast variations in the signals dy-
namics. It is also significant that in preliminary experiments, most of the SM-WRLS
algorithms are found to be robust in small (16-bit) wordlength environments.

Finally, a parallel architecture is developed that implements the various SM-
WRLS algorithms in 0(m) floating point operations per equation using 0(m) cells,
where m represents the number of parameters estimated. A detailed analysis of the

computational complexity issues is carried out.

To my parents
Farah and Georgette Odeh

for their love, support, and sacriﬁce

iv

Acknowledgments

I am gratefully indebted to my thesis advisor, Professor John R. Deller, Jr., for
his invaluable guidance and generous support throughout the course of this research.

I would like to express my gratitude to all the members in my Ph.D. guidance
committee, Professor Donnie K. Reinhard, Professor Lionel M. Ni, Professor Majid
Nayeri, and Professor Paul M. Parker, for their time and effort in reviewing and
discussing my dissertation. The late Professor Harriett B. Rigas was also an important
source of inspiration and guidance throughout my education, especially in difﬁcult
times.

I would also like to give a special thanks to everyone in my family for their con-
tinuous love, understanding, and encouragement.

I also gratefully acknowledge the partial support of this work by a grant from the

Whitaker Foundation.

Table of Contents

List of Tables
List of Figures

1 Introduction and Background

1.1 General Objectives and Scope ......................
1.2 Set Membership Theory .........................
1.2.1 The Identification Problem and Least Squares Estimation . . .

1.2.2 The OBE Algorithm .......................

1.2.3 The SM-WRLS Algorithm ....................

1.2.3.1 MIL-based SM-WRLS Algorithm ...........

1.2.3.2 GR-based SM—WRLS Algorithm ............

2 New Theoretical Results and Algorithms

2.1 Introduction ................................
2.2 A Generalized “Non-adaptive” SM-WRLS Algorithm .........
2.3 Suboptimal Tests for Innovation in SM-WRLS Algorithms ......
2.4 Adaptive SM-WRLS Algorithms .....................
2.4.1 General Formulation .......................
2.4.1.1 Windowing .......................

2.4.1.2 Graceful Forgetting ...................

2.4.1.3 Selective Forgetting ...................

2.4.2 Exponential Forgetting Factor Adaptation ...........

2.5 A Survey of the Computational Complexities of Several Related Se-
quential Algorithms ............................

3 Simulation Studies

3.1 Introduction ................................
3.2 Simulation Results of two AR(2) models ................
3.2.1 Conventional RLS and SM-WRLS Algorithms .........

3.2.2 Adaptive SM-WRLS Algorithms .................

3.2.2.1 Windowing .......................

3.2.2.2 Graceful Forgetting ...................

3.2.2.3 Selective Forgetting ...................

vi

viii

g.

WCOCDKIUIu-ﬁht—‘H

y...

18
18
19
34
37
38
40
40
41
42

45

52
52
53
57
62
62
65

65

3.2.3 Suboptimal SM-WKIS ......................

3.2.4 Adaptive Suboptimal SM~VVRLS

3.3 Simulation Results of an AR(14) model .................
3.3.1 Conventional RLS and SM-WRLS Algorithms .........
3.3.2 Adaptive SM—WRLS Algorithms .................

3.3.2.1 Windowing

3.3.2.2 Graceful Forgetting ...................
3.3.2.3 Selective Forgetting ...................
3.3.3 Suboptimal SM-WRLS ......................

3.3.4 Adaptive Suboptimal SM-WRLS

3.4 Roundoff Error Analysis .........................

4 Architectures and Complexity Issues

4.1 Introduction .

OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

4.2 Parallel Architecture for SM-WRLS ...................
4.3 An Adaptive Compact Parallel Architecture ..............
4.4 Computational Complexities .......................

5 Conclusions and Further Work
5.1 Algorithmic Developments ........................
5.1.1 A Generalized SM-WRLS Algorithm ..............

5.1.2 Suboptimal Tests for Innovation .................
5.1.3 Adaptive SM-WRLS Algorithms .................

5.2 Simulation Studies
5.3 Architectures and Complexity Issues

5.4 Further Work
Appendix A

Bibliography

vii

Ti
71
7-1
78
T9
79
80
81
82
83
85

97
97
97
106
112

117
117
117
118
118
119
119
120

122

126

2.1
2.2
4.1
4.2
4.3
4.4

4.5

List of Tables

Computational complexities (in ﬂoating point operations (ﬂops) per

equation) for the real scalar sequential algorithms ........... 46
Computational complexities (in complex ﬂoating point Operations (cﬂops)
per equation) for the generalized sequential algorithms ........ 50
The timing diagram of the triangular array of Fig. 4.1 ......... 105
The timing diagram of the back substitution array of Fig. 4.1 ..... 106
The timing diagram of the Givens rotation (GR) array of Fig. 4.5 . . 111
Computational complexities (in ﬂops per equation) for the real scalar
sequential and parallel GR-based SM-WRLS algorithms ........ 113

Parallel computational complexities (in flops per equation) for the var-
ious SM-WRLS algorithms using the implementations of Figs. 4.1 and 4.5113

viii

1.1
1.2
2.3
3.1
3.2
3.3
3.4

3.5
3.6

3.7
3.8

3.9

3.10

3.11

3.12

3.13

3.14

3.15

List of Figures

Local and global membership sets in 2-D ................ 10
The SM—WRLS algorithm based on Givens rotations ......... 16
(a) The real scalar linear model and (b) the general complex vector

linear model ................................ 21
The acoustic waveform of the word “four” ............... 53
The acoustic waveform of the word “six” ................ 54
The “true” parameters for the word “four”. (a) Parameter a1 and (b)

Parameter a2 ............................... 55
The “true” parameters for the word “six”. (a) Parameter a; and (b)

Parameter a; ............................... 56
Simulation results of the conventional RLS algorithm for the word four 58

Simulation results of the SM-WRLS algorithm for the word four. 1.86%
of the data is employed in the estimation process ........... 59
Simulation results of the conventional RLS algorithm for the word six 60
Simulation results of the SM—WRLS algorithm for the word six. 2.16%
of the data is employed in the estimation process ........... 61
Simulation results of the windowed SM-WRLS algorithm for the word
four (I = 1000). 5.69% of the data is employed in the estimation process 63
Simulation results of the windowed SM-WRLS algorithm for the word
six (I = 1000). 5.44% of the data is employed in the estimation process 64
Simulation results of the graceful forgetting SM-WRLS algorithm for
the word four ()1 = 10'3). 6.19% of the data is employed in the esti-
mation process .............................. 66
Simulation results of the graceful forgetting SM-WRLS algorithm for
the word six ()1 = 104). 4.89% of the data is employed in the estima-
tion process ................................ 67
Simulation results of the selective forgetting SM-WRLS algorithm for
the word four. 3.6% of the data is employed in the estimation process 69
Simulation results of the selective forgetting SM-WRLS algorithm for
the word six. 2.83% of the data is employed in the estimation process 70
Simulation results of the SM-WRLS algorithm with suboptimal data
selection for the word four. 1.19% of the data is employed in the
estimation process ............................ 72

ix

3.16

3.17

3.18

3.19
3.20
3.21
3.22

3.23

3.24

3.25

3.26

3.27

3.28

3.29

3.30

3.31

3.32

3.33

Simulation results of the SM-WRLS algorithm with suboptimal data
selection for the word six. 1.53% of the data is employed in the esti-
mation process .............................. T3
Simulation results of the selective forgetting SMoWRLS algorithm with
suboptimal data selection for the word four. 1.89% of the data is
employed in the estimation process ................... 75
Simulation results of the selective forgetting SM-WRLS algorithm with
suboptimal data selection for the word six. 1.86% of the data is em-

ployed in the estimation process ..................... 76
The acoustic waveform of the word “seven” ............... 77
The fourth “true” parameter (a.,) for the word “seven” ........ 77

Simulation results of the conventional RLS algorithm for the word seven 78
Simulation results of the SM-WRLS algorithm for the word seven.
7.93% of the data is employed in the estimation process ........ 79
Simulation results of the windowed SM-WRLS algorithm for the word
seven (I = 500). 22.1% of the data is employed in the estimation process 80
Simulation results of the windowed SM-WRLS algorithm for the word
seven (1 = 1000). 17.04% of the data is employed in the estimation
process ................................... 81
Simulation results of the windowed SM-WRLS algorithm for the word
seven (l = 1500). 14.34% of the data is employed in the estimation
process ................................... 8?.
Simulation results of the graceful forgetting 'SM-WRLS algorithm for
the word seven (p = 10‘3). 18.66% of the data is employed in the
estimation process ............................ 83
Simulation results of the graceful forgetting SM-WRLS algorithm for
the word seven ()1 = 2 x 10‘3). 27.63% of the data is employed in the
estimation process ............................ 84
Simulation results of the graceful forgetting SM-WRLS algorithm for
the word seven ()1 = 4 x 10’“). 35.59% of the data is employed in the
estimation process ............................ 85
Simulation results of the selective forgetting SM-WRLS algorithm for
the word seven. 12.89% of the data is employed in the estimation process 86
Simulation results of the SM-WRLS algorithm with suboptimal data
selection for the word seven. 4.74% of the data is employed in the
estimation process ............................ 87
Simulation results of the windowed SM-WRLS algorithm with subop-
timal data selection for the word seven ( = 500). 10.93% of the data
is employed in the estimation process .................. 88
Simulation results of the windowed SM-WRLS algorithm with subop-
timal data selection for the word seven (1 = 1000). 8.71% of the data
is employed in the estimation process .................. 89
Simulation results of the selective forgetting SM-WRLS algorithm with
suboptimal data selection for the word seven. 8.8% of the data is
employed in the estimation process ................... 90

3.34

3.35

3.36

3.37

3.38

4.1

4.2

4.3

4.4

4.5

4.6

4.7

Simulation results of the SM-WRLS algorithm for the word four using
a l6-bit wordlength. 1.96% of the data is employed in the estimation
process ...................................
Simulation results of the SM-WRLS algorithm for the word six using
a 16-bit wordlength. 2.17% of the data is employed in the estimation
process ...................................
Simulation results of the windowed SM-WRLS algorithm for the word
four (I = 1000) using a 16-bit wordlength. 5.4% of the data is employed
in the estimation process .........................
Simulation results of the graceful forgetting SM-WRLS algorithm for
the word four ()1 = 10‘3) using a 16-bit wordlength. 3.27% of the data
is employed in the estimation process ..................
Simulation results of the selective forgetting SM-WRLS algorithm for
the word four using a 16-bit wordlength. 3.27% of the data is employed
in the estimation process .........................
Systolic array implementation of the Givens rotation-based SM~WRLS
algorithm. For simplicity of notation but without loss of generality, the
figure shows a purely autoregressive case with p=3; AR(S) ......
The operations performed by the cells used in the triangular array of
Fig. 4.1. (a) The Givens generation (GG) cells, (b) the Givens rotation
(GR) cells, and (c) the delay element ..................
The operations performed by the back substitution array. (a) The left-
end processor and (b) the multiply-add units. The initial yin-,1 entering
the rightmost cell is set to 0 .......................
The multiply-add unit used in Fig. 4.1 .................
A compact architecture implementation of the adaptive SM-WRLS
algorithm .................................
The Operations performed by (a) the GG and (b) the GR cells used in
the modules of Fig. 4.7. 6 = +1 (-1) for rotating the equation into
(out of ) the system ............................
(a) The GG’ module and (b) the GR’ module used in the architecture
of Fig. 4.5 .................................

xi

9]

93

94

99

101

102

103

108

109

110

Chapter 1

Introduction and Background

 

1.1 General Objectives and Scope

This research is concerned with techniques for estimating the parameters of linear
system or signal models. Set membership (SM) identification refers to a class of
estimation techniques that uses certain a priori knowledge about a linear paramet-
ric model to constrain the solutions to certain sets. Based on certain set-theoretic
checking criteria, SM algorithms select and use only the useful data to update the
parameter estimates and reﬁne the “membership sets” to which the true parameters
must belong. When data do not help refine these membership sets, the effort of up-
dating the parameter estimates at those points can be avoided. The power of SM
algorithms becomes more apparent when implemented using parallel architectures
due to the signiﬁcant reduction in the computational complexity. Because of their
strong potential for application and theoretical development in virtually any signal
processing problem involving parametric models (such as speech recognition, image
processing, beamforming, spectral estimation, and neural networks), SM algorithms
have been the subject of intense research effort in recent years [1] - [26]. Much of the
recent work on parametric models is closely related to previous papers by Schweppe
[27], Witsenhausen [28], and Bertsekas and Rhodes [29] which study state space sys-

tems in the control and systems science domains.

1

The most widely studied class of SM algorithms involves the case in which the

error sequence, say v(n), is pointwise “energy bounded,”

7(n)v2(n) <1 (1.1)

where the sequence 7(n) is known or can be estimated from the data. It is this problem
with which this research is concerned. (Other interesting variations involve stability
constraints [30], and other noise parameter bounds [31, 32]. Veres and Norton [33]
have also investigated the effects of error bounding on model structure identification.)
In addition to the many advantages of the algorithms to be developed from this
information, the constraint (1.1) minimizes the necessary knowledge of the input. In
particular, it is not necessary to know the form of the density function for v(n).

Constraints of form (1.1), in conjunction with the model and data, imply pointwise
“hyperstrip” regions of possible parameter sets in the parameter space which, when
intersected over a given time range, usually form convex polytopes of permissible
solutions for the “true” parameters. While exact descriptions of these polytopes are
possible [1, 10, 11, 12, 17, 21, 24], algorithms of much lower complexity have been
developed which work with a bounding hyperellipsoid, a tight superset of the polytope
[5, 6, 7, 14, 20]. Such an “optimal bounding ellipsoid” (OBE) algorithm is the focus
of this research. Recently, Deller [5, 6] has reformulated an OBE algorithm of Huang
[3, 8, 14, 19, 20] so that it is exactly the familiar weighted recursive least squares
solution [34, 35] with the SM considerations handled through a special weighting
strategy. This algorithm is referred to as set membership weighted recursive least
squares (SM- WRLS) to distinguish it from the original OBE algorithm. A review of
SM theory and related topics is presented in the next section.

The main objectives of this research are:

1. To generalize the SM-WRLS algorithm to handle complex vector-input vector-

2

output data streams.
2. To reduce the computational complexity of the algorithm.
3. To make the estimator adaptive.

4. To perform simulation studies to research the performance of the SM-WRLS
algorithms.

5. To develop parallel architectures to implement the SM-WRLS algorithms.

The ﬁrst three objectives which are addressed in Chapter 2 are concerned with
algorithmic developments centered on SM-WRLS identification. Currently, the SM—
WRLS algorithm can be used in the noted application areas if the data are real
numbers. It will be advantageous if this powerful algorithm can be extended to work
with complex numbers. Complex-valued data are encountered in many digital signal
processing and image processing problems (e.g., see [36, Chs. 2 & 8]). The current
version of SM-WRLS cannot handle complex-valued data. Also, SM-WRLS has the
potential for application to a wide range of problems in which the data to be processed
(at each instant of time) may take the form of a vector quantity. For example, a very
important research area is beamforming [37], a spatial ﬁltering task, to which the
SM-WRLS algorithm may be applied. At every time interval, the array of sensors
of a narrowband beamformer provide vector outputs that need to be processed by
the beamformer using complex computations. There is also potential for applying
SM-WRLS to neural networks in which both the input and the output are (complex)
vectors [38]. The theoretical development of a generalized SM-WRLS algorithm that
can handle complex vector-input vector-output data (objective 1) is the subject of
Section 2.2.

The second main objective of this research is to develop a more efficient SM-
WRLS algorithm which uses a suboptimal checking criterion to reduce the computa-
tional complexity of SM-WRLS at the expense of using “suboptimal” weights. This

suboptimal strategy can be applied to any version, adaptive or “non-adaptive,” of

the SM-WltlS algorithm to improve the computational efﬁciency. The theoretical
development of the suboptimal SM-WRLS algorithm is presented in Section 2.3. Sec-
tion 2.4 demonstrates how to make the SM-WRLS algorithm explicitly adaptive (ob-
jective 3) by introducing a flexible mechanism by which it can “forget” the influence
of past data. It is to be noted, however, that the basic (unmodiﬁed) SM-WRLS algo-
rithm is inherently “adaptive” compared with the RLS algorithm. This adaptation
is inherent in the use of data weights which are “optimal” in the SM sense.

Chapter 3 is concerned with objective 4 which discusses simulation studies of
the various suboptimal and adaptive strategies presented in Chapter 2. These studies
illustrate important points about these strategies and about the SM-WRLS algorithm
in general.

One of the advantages of the SM-WRLS formulation (contrasted with Huang’s
OBE algorithm [20]) is that it immediately admits solution by contemporary paral-
lel architectures. This is critical because it reduces the complexity of the algorithm
from 0(m2) to 0(m), where m is the number of parameters to be estimated. The
signiﬁcant reduction of computational complexity and parallel hardware implementa-
tion of SM algorithms improve their potential for real time applications. Chapter 4 is
devoted to parallel hardware implementations (objective 5) of the SM-WRLS and dis-
cussion of their advantages, particularly with regard to their improved computational
complexity.

Finally, Chapter 5 summarizes the main conclusions and contributions of this

research and suggests possible directions for future research topics in the SM realm.

1.2 Set Membership Theory

A brief background on SM theory is presented in this section which is divided into

three major subsections. The ﬁrst contains a brief overview of the identiﬁcation

problem and least squares (LS) estimation. The second gives an overview of the ()lll'l
algorithm. The third presents the “scalar case” of the SM-WRLS algorithm and its
formulation based on Givens rotations (GR ’3). However, the computational com plex-

ities of these algorithms (and others) are presented and compared in Chapter 2.

1.2.1 The Identiﬁcation Problem and Least Squares Esti-

mation

A well known identiﬁcation problem is the estimation of the parameters of a general

autoregressive moving average with exogenous input (ARMAX(p,q)) [35] model of the

form

ye) = fawn — 2') + )3th - j) + v(n) (1.2)

i=1 j=0
in which y(n) is a scalar output of the model; w(n) is a measurable, uncorrelated,
input sequence; v(n) is an uncorrelated‘ driving (or error) sequence, known to be
bounded as in (1.1), which is independent of w(n); and a,’s and bj,3 are the parameters

to be identiﬁed. For convenience, the following vector notations are employed

3T(n) '-'- [y(n -1)y(n - 2) - - - v(n - P)w(n)w(n - 1) - ' - w(n - 9)] (1-3)

and

93; s [a,a,...a,bob,...b,] (1.4)

and hence,

y(n) -'_: 03::(12) + v(n) . (1.5)

 

1As will be noticed below, the SM-WRLS algorithm has no nominal constraint on v(n) other than
(1.1). However, if v(n) is correlated, one would expect to obtain biased estimates since the solution
is essentially based on the RLS method [35]. The issue of a correlated v(n) with SM processing
remains open for further research.

Note that this model has an important subcase of a purely order p autoregressive
(AR(p)) model which is prevalent in many important applications. The dimension of

00 is deﬁned as the integer m,

m=p+q+1 (1.6)

noting that m should be reduced to simply m = p for the pure AR case.
Consider the LS problem [39]: Given data (or a system of observations) on the
interval i 6 [1,11] (n 2 m), and some set of error minimization weights, say {A(i)},

form the overdetermined system of equations

q

i A(1)y(1) l

\/*(2.)y(2) (1.7)

i A(1)2:T(1) _.

VAC-037(2) -* 00

 

 

 

 

shown) -», _ Maw),

denoted

X0090 = v(n) , - (1-8)

A

and ﬁnd the LS estimate, say 0(n), for the vector 90.
There are well known methods to solve this problem. The ﬁrst is the “batch”

solution given by [39]

. —1

em = [xT(n)X(n)] xr(..),,(..) (1.9)
with the matrix in brackets playing the role of the weighted covariance matrix, i.e.,

C,(n) = [XT(n)X(n)] = ih(i)z(i)zT(i) . (1.10)

i=1

The second is the recursive matrix inversion lemma (MIL)-based WRLS solution

given by [34, 35]

P(n -1)a:(n)a:T(n)P(n —- 1)
1+ A(n)C(n)

 

P(n) = P(n-l)-/\(n) (1.11)

l A

9(n) = 9(n—1)+/\(n)P(n)a:(n)en_1(n) (1.12)
where,
P(n) é C;1(n) (1.13)
G(n) e 2T(n)P(n-1)a:(n) (1.14)
en_1(n) e y(n)-ST(n—l)a:(n) (1.15)

in which C,(n) is the weighted covariance matrix deﬁned above, P(n) is its inverse;
6(n) is the parameter vector estimate using n points of data; en_1(n) is the residual
(or error) at time 71 based on 9(n — 1); and A(n) is some error minimization weight.

These recursions are theoretically equivalent to the batch solution given in (1.9) at

each 12.

1.2.2 The OBE Algorithm

The OBE algorithm is an SM algorithm developed by Fogel, Huang, and colleagues
[3, 8, 14, 19, 20, 23, 25] which can be used to identify a general ARMAX(p, q) model.
An overview of this algorithm and its origin is presented in this section (paraphrased
from [5, 6, 7]).

Let us ﬁrst present the general idea behind SM algorithms. Consider the case of
a real m-dimensional parameter space; 11'". These algorithms bound the parameters
in a subset of ’R’" as follows: At any given time, the model, incoming datum, and
constraint (1.1) deﬁne a pointwise “hyperstrip” region of possible parameter sets in

72’”. Over a given time range, the intersection of these hyperstrips forms a convex

polytope of permissible solutions for the “true” parameters; 00. The description of
this polytope can be greatly simpliﬁed (mathematically) when approximated by a
bounding hyperellipsoid, a tight superset of the polytope [5, 6, 7, 14, 20].

By recognizing the relationship between the conventional WRLS and SM identiﬁ-
cation, Fogel [25] showed that there is a membership set (hyperellipsoid) centered on
the unweighted RLS estimate associated with the identiﬁcation of constant unknown
parameters of a linear system driven by uncorrelated noise, based on constraints of
the form (1.1). Fogel also studied the convergence of the membership set to a single
point. The subsequent papers by Fogel, Huang, and colleagues [3, 8, 14, 19, 20, ‘23]
discuss a weighted approach (i.e., the OBE algorithm) based on the same principle.
This algorithm uses energy constraints of form (1.1) to restrict the solutions of the

linear parameters to ellipsoidal domains. At time n, the estimator, 0(n), is the center

of an ellipsoidal region in R”, of the form
E(n) =°_- {o | [a - é(n)]T<I>-1(n) [o — é(n)] g 1} , o e nm (1.16)

which represents the smallest bounding ellipsoid of possible solutions, 9, to which the

true parameters must belong. <I>“(n) can be interpreted as a weighted covariance

matrix on the observations,

7!

<I>’l(n) = ZA(i)c(i)zT(i) . (1.17)

i=1

The algorithm, in effect, seeks the Mn) which minimizes the “volume ratio” of the
sequential ellipsoids, E(n) and E(n — 1), given the incoming datum, and subject to
scaling of the previous weights. For this reason, the technique is referred to as the
OBE algorithm. Frequently, no such weight exists, indicating that the new datum is
uninformative (in the SM sense) and the effort of updating the parameter estimates

can be avoided.

It is important to note that the OBE algorithm is derived from geometric consid-
erations and is centered on three recursions [20], two of which are remarkably similar
to the conventional MIL-based WRLS algorithm [34]. Careful analysis of OBE reveals
that it is “WRLS with time varying weights.” This is alluded to above. Whereas the
0813’s geometric approach solves a weighted LS problem on a point-by-point basis,
its recursions cannot be exactly interpreted as conventional WRLS because of the
fundamental difference in their development. Recently, Deller [5, 6] has reformulated
OBE into a more conventional WRLS technique referred to as SM-WRLS. The SM-
WRLS formulation, which incorporates SM considerations directly into the standard

WRLS recursions, is treated in the next section.

1.2.3 The SM-WRLS Algorithm
1.2.3.1 MIL-based SM-WRLS Algorithm

In this section, the theoretical development of the basic MIL-based SM-WRLS algo-
rithm is sketched. This algorithm, which is described in detail in [6, 7], accepts only
scalar, real-valued data which makes it applicable to a speciﬁc class of problems. A
brief overview of the SM theory and the main results of the basic SM-WRLS algorithm
are summarized below (paraphrased from [6, 7]) to serve as a foundation for general-
izing this algorithm to handle complex-valued vector-input vector-output data. The
theoretical development of the generalized algorithm is presented in Chapter 2.
Suppose that, at time i, y(i), 3(2), and the model of form (1.5) are given. If there
is no other information about the system, the parameter vector 00 can theoretically
take any real vector value. A constraint on v(i) like (1.1), however, restricts the

possible range of values of 90. From (1.1) and (1.5), it is clear that (at time i)

122(1) = [31(1) — 932(i)]’ < 3112—) . (1.18)

 

 

 

 

 

 

 

 

 

 

 

Figure 1.1: Local and global membership sets in 2-D.

 

It is assumed that the sequence of numbers, 7(i), is known (or can be accurately
estimated), and therefore, (1.18) restricts possible values of 90 to some range, say w(i),
that is called a “local membership set,” to which 00 must belong at time i. If the data
and the v(i) values are given over a range [1,n], then n local sets w(i), i = 1,2, . . . , n
(one for each observation) can be generated. Each of these takes the form of a
“hyperstrip” in ’R’”, the two-dimensional (2D) case shown in Fig. 1.1. The parameter

vector 90 must simultaneously belong to each of these sets, and therefore, must belong

10

to a global membership set given by

11(72): ﬂw(i) . (1.19)

Q(i) will be a monotonically non—increasing set with i, and it will be the minimal
(most restrictive) membership set known under the conditions of the problem. The
global membership set, 0(n), is the intersection of the individual strips w(i) (see
1.19), which takes the form of a convex polytope in ’R'", as illustrated in Fig. 1.1 for

a 2-D case. Note that the individual w(i) does not necessarily help to reﬁne Q(i), i.e.,

it might be true that
Q(i) = fl(i—1)ﬂw(i) = Q(i — l) (1.20)

and the corresponding data are considered “unuseful” in the SM sense. See, for
example, the case of (2(4) in Fig. 1.1.

The set 0(i), which requires high computational complexity algorithms [1, 10,
11, 12, 17, 21, 24] to describe and work with, provides a useful way of determining
which data points are informative and which are not. Note that the center of this set
provides a systematic estimate of the parameter vector 90 which would be expected to
improve as i increases. However, neither 9(i) nor its center is clearly or conveniently
related to the WRLS estimation process of interest. There is, however, a related (but
potentially larger) global membership set associated with the WRLS process at time
n. This is derived from (1.1) and (1.18) by noting that the constraint (1.1) on the
input implies an “accumulated inequality” given by

)5 1(1) [31(1) — (93,‘:.:(.')]2 < )2 3(7) (1.21)

i=1

which holds as long as the set of “weights” used, {A(i)}, are non-negative (see [7,

11

Lemma 1]). This inequality leads to a global membership set, say C(12), to which 00
must belong. Since C(n) (from the discussion above) is the smallest known set, it
must be true that fl(n) Q C(n).

Two fundamental theorems underlie the basic SM-WRLS algorithm [5, 6, 7, 18].
The ﬁrst indicates how the bounding ellipsoid is related to the conventional W RLS
process and the second indicates how the optimal data weights are chosen. Proofs

are found in [7] for the AR(p) case. The generalization to ARMAX(p,q) is straight-

forward.

 

Theorem 1 [7] Let 0(n) C_: ’R'" be the set of all parameter vectors which are com-
patible with the data fori E [1,n] under constraint (1.1). Then there exists a superset

of C(n), say C(n), a hyperellipsoid in ’R'”, which is closely associated with the WRLS
estimation process:

 

" “ C307“ 7 m
e(n) = {a ([9 — 9(n)]T E(n)) [9 — 0(n)] <1} , o e n (1.22)
where, n A .
an) -- 6T(n>c.(n)é<n) + 3%] (1 - was/2(2)) (1.23)

in which 6(n) is the conventional WRLS estimate of00 at time n using weights {)1(i)}.

 

This theorem, which is derived from (1.21) by replacing 00 with a general vector
9, simply means that there is a hyperellipsoidal domain in ’R’" (i.e., all 9 6 E(n))
which is guaranteed to contain 00, and which is centered on the WRLS estimate 0(n).

The “volume” of the ellipsoid fl(n) is inversely proportional to the determinant
of the matrix C,(n)/s(n), and is a function of only one unknown, z\(n). Therefore, a
logical strategy for the selection of weights is to choose A(n) to maximally shrink the
volume of C(72). If no such weight exists, the data at time n should be “rejected” (A(n)

effectively set to zero), as it does not serve to refine the estimate of the parameters,

12

thereby saving the computational CXpense otherwise necessary to incorporate it into

the estimate.

The ellipsoid volume is proportional to the quantity
det B(n) é det n(n)C;1(n) . (1.21)

A reasonable strategy is to ﬁnd an optimal weight, A‘(n), at each step that minimizes

the “volume ratio” of the ellipsoids at n and n -- 1:

V(A(n)) = (1:25?” .

 

 

Theorem 2 [7] The weight, A‘(n), which minimizes the volume ratio (1.25), is the
most positive root of the quadratic equation

F(A)=ag/\2+alh+ao=0 (1.26)
where,

a; = (m-l)Gz(n)
a. = {2m—1+7(n)c:-.(n)—~(n—1)~1(n)G<n)}G<n)
co = m [1 - v(n) e3.-.(n)] - ~(n - 11702160»)

in which all the quantities are deﬁned above.

 

1.2.3.2 GR-based SM-WRLS Algorithm

In this section, a solution of the SM-WRLS algorithm that is amenable to parallel

hardware implementation is presented, which integrates SM weights with a GR version

of WRLS (paraphrased from [5, 6]).

13

The solution is based on the orthogonal triangularization (by GR’s) of the X(n)
matrix of (1.8) [5, 6, 39, 40, 41, 42, 43, 44]. The procedure, in principle, involves the
application of a sequence of orthogonal operators (GR’s) to (1.8) which leaves the
system in the form _

T(n) die) l
90 = (1.27)

0(n—m)xm d2(n)

[- J l. .l

 

 

 

 

where the matrix T(n) is an m x m upper triangular Cholesky factor [39] of Cx(n)

(see (1.30) below), and 0,5,,- denotes the i x j zero matrix. The system

T(n)é(n) -_- d,(n) (1.28)

A

is easily solved using back substitution [39] to obtain the LS estimate, 0(n). This
formulation makes possible the solution (in terms of computation and implementa-
tion) of SM-WRLS on contemporary parallel architectures (developed in Chapter 4)
for great speed advantages.

Some computational details need to be examined for future purposes. In computo
ing A‘(n), it is noted that F (A) contains terms involving the inverse covariance matrix
C;1(n), which never occurs elsewhere in the GR—based algorithm. In particular, the

computation of the scalar (see (1.14))

C(n) é 2T(n)C;1(n - 1)a:(n) (1.29)

requires 0(m2) operations which comprise the main computational load in determin-

ing )t‘(n). This problem is easily resolved by noting that

c,(n) = xT(n)X(n) -_- TT(n)T(n) (1.30)

14

because T(n) represents an orthogonal transformation on X(n). Therefore,

C(n) = zT(n)T‘1(n—1)T'T(n—1)::(n)

gT(n)g(n) = II 901) n: (1.31)

in which [I 3 [I2 denotes the l2 norm on ’R’”. Since 3(n) = TT(n - 1)g(n), and the
matrix TT(n -— 1) is lower triangular, g(n) is easily found from the available quantities
at time n by back substitution.

A similarly inexpensive procedure for computing n(n) is available in the context

of (1.27). Equation (1.23) can be written

x(n) = éT(n)c,(n)é(n) + E(n) (1.32)
where,
E(n) = 213% (1 - 7(¢)y’(z))
2(n _ 1) + 578—; (1 - 7(n)y2(n)) (1.33)

with 12(0) 2: 0. Also, from (1.27) and (1.30), the ﬁrst term in (1.32) is easily shown

to be [I d1(n) [[3, and therefore, (1.32) can be written

E(n) = II dd") II; + 76(71) - (1-34)

Figure 1.2 summarizes this GR-based SM-WRLS algorithm.
Finally, the quantity det B(n) can be conveniently monitored within the system

(1.27), since, from (1.24) and (1.30),

log {det B(n)} = m log {s(n)} — 2 log {det T(n)] (1.35)

15

 

lNlTlALlZATlON: Fill (m +1) x (m + 1) working matrix, W, with zeros.
AU): 1, i: 1,2,...,m+l
12(0) = 0

RECURSION: For i = 1,2, . . . ,n,

STEP 1. (Skip° ifi S m +1) Update C(i), e;_1(i).
Solve TT(i - 1)g(i) = a:(i) for g(i) by back substitution.
6(1) -— u g(i) u: T
«4(1): 1(1) — é (1-1)..(1)
STEP 2. (Skip ifi S m +1) Compute optimal Mi), say )t'(i), by ﬁnding
most positive root of quadratic (1.26).
STEP 3. (Skip ifi _<_ m +1) If A'(i) S 0, set

31(1) = 31(1- 1)
9(i) = 0(i - l)
E(i = E(i — 1)

and go to STEP 7.
Otherwise, continue.

STEP 4. Update T(i).
Replace bottom row of W by (/z\‘(i) [mT(i) I g(i)].

Rotate this “new equation” into W using Givens rotations,

leaving the result [T(i) | d1(i)] in the upper m rows of W.

These rotations involve the scalar computations [40, 43]

W;k = ijO’ + “4,1,”,de and W:n+1'k = — p.76 + Wm+1’k0’6

for lc=j,j+1,...,m+1 and forj = 1,2,...,m;

where, a = Wjj/p, r = Wm+1,k/p, p = W};- + 6W3,+1,j, 6 is unity"

and ij (Wjik) is the j, k element of W pre- (post-) rotation.

STEP 5. (Skip ifi S m) Update 6(i), solving T(i)6(i) = d1(i) by
back substitution.
STEP 6. Update x(i) and E(i) according to
11(1) = 11(1- 1)+ 171710 - «are»
“(1') = ll 0'10) "3 + 79(1')
Compute and store only E(i) if i S m.
STEP 7. If i S n, increment i and return to Step 1.

 

 

“Generally T(i) does not become nonsingular until i = m + 1.The ﬁrst g(i) cannot be computed
until i = m + l and the ﬁrst A‘(i) at i = m + 2. We arbitrarily set Mi) = 1 on the initial range.
'’6 is set to -1 to rotate an equation out of the estimate [40].

Figure 1.2: The SM~WRLS algorithm based on Givens rotations.

 

16

d d tT(n) is easily computed as the product of the diagonal elements of T(n).
an e ,

 

17

Chapter 2

New Theoretical Results
and Algorithms

 

2.1 Introduction

This chapter is divided into three major sections. The ﬁrst is devoted to the theo-
retical development of a generalized SM-WRLS algorithm that can handle complex
vector-input vector-output data.

In the case when it is critical to obtain the solutions of certain problems as fast as
possible, it is essential to have a yet more efﬁcient SM-WRLS algorithm that produces
acceptable solutions faster. The theoretical development of this algorithm, which uses
a suboptimal checking criterion to reduce the computational complexity of SM-WRLS
at the expense of using “suboptimal” weights, is presented in Section 2.3.

Finally, Section 2.4 demonstrates how to make the SM-WRLS algorithm explicitly
adaptive with a very ﬂexible mechanism by which it can “forget” the inﬂuence of past
data. Three major subcases are identiﬁed and presented. This section also shows that
exponential forgetting factor adaptation can be incorporated into SM-WRLS. The
theoretical development of this algorithm and a discussion explaining why it has not

been found to be effective for adaptation in preliminary experiments are presented.

18

2.2 A Generalized “Non-adaptive” SM-WRLS Al-
gorithm

This section is devoted to the theoretical development of the generalized (for complex
vector data) SM-WRLS algorithm. The developments here are guided by the work
of Deller and Luk in [7] on the real scalar case and Deller in [45] on a special vector

(3856.

Consider the general linear model of the form (cf. (1.2))

q

y(n)ziAf’ym—i)+ZB§”w(n—j)+v(n) (2.1)

-

J=0

in which y(n) 6 C” is a complex vector output sequence of the model; w(n) E C” is
a measurable, uncorrelated, complex vector input sequence to the model; v(n) E C”
is an uncorrelated’ complex vector driving (or error) sequence, known to be bounded
(as in (2.18) below), which is independent of w(n); A; E Ckx", i = 1,2,. . . ,p, and
BJ- E Ckx", j = 0,1,. . . , q, are complex matrices of “true” parameters to be identiﬁed;
and superscript H is a standard notation for the conjugate transpose ( or hermitian).
For convenience in the following, the dimensions of the vectors g(n) and w(n) are
assumed to be equal (more on this below).

The key to generalizing the results of the real scalar case discussed in Chapter 1

is the reformulation of (2.1) as

I:
v(n) = 2963M) + v(n) ( -

l=1

[\D
(\3
V

in which 9; E me" is a complex matrix of the parameters to be identiﬁed which are

 

2Absence of both temporal correlation and component correlation are assumed here. This will
be useful in formulating the energy bounding assumption (2.18) below. Also, see Footnote 1 in
Chapter I.

19

associated with the vector 3,02) 6 Cm. For convenience, the following vector/matrix

notations are employed

37(11): fyzfn -1)y1(n - 2) ° ° ° y1(n - let(n)w1(n - 1) - - ' w1(n — q)] (2.3)

and
r .
01.11 01,12 "' 01.11:
02,11 02,12 ° ° ° 02,11:
an 012 an.
10. p. p.
91 = (2.4)
50,11 50.12 ' ° ° 50,11:

b1,11 b1,12 51,11:

 

 

[ban 5m b“),

.1

in which w1(n) is the 1‘” element of w(n); and a;,1k’s and bJ-Jk’s are the parameters
to be identiﬁed which are associated with the 1‘” vector, 31(72), and the k‘” output
element, yk(n). Figure 2.3 shows (a) the real scalar linear model of (1.2) and (b) the
general complex vector linear model of (2.1). It is to be noted that the matrix A.- in
(2.1) consists of the Us elements, 01.11:, and the matrix B, consists of the lie elements,
ijk (see (2.4) and Fig. 2.3). Note that if the dimensions of the vectors g(n) and w(n)
are not equal, then the corresponding elements in (2.2) are replaced by zeros.

The general model can be written
y(n) = 65,301) + v(n) (2.5)

where,

eg’ a [ofefnef] (2.6)

 

 

y(n)

 

 

 

01

02

 

 

X(n) y(n-I) ytn-Z)

 

 

 

 

 

 

 

 

 

 

 

 

ytn) yztn) yztn) - . - Mn)

 

 

 

 

 

x101) x201) . . . xk(n) _

 

 

 

 

 

 

 

 

 

(b)

Figure 2.3: (a) The real scalar linear model and (b) the general complex vector linear
model.

 

21

and

T(n) i [3T(n)a:'2r(n) - - - $1.01)] . (2-71

Given a vector 3(2) 6 Cm” and an output vector g(i) E C“ on the interval
i E [1,n] (n 2 m), and some set of error minimization weights, say {Mi)), the
LS estimate, say C(n), of the parameter matrix 90 6 kax" is the solution of the

overdetermined system of equations of the form

P WW1) —>i 7 (hence) —-» -

(/1(2)z"(2) -» 90: )(2)y"(2) —+

 

 

 

 

(2.3)
( Mann) —+ . , 1(n)y"(n) —» ,
denoted
x”(n)€-)0 = Y”(n) . (2.9)
The batch solution is given by (39]
out) = [X(n)x"(n)]"]l X(n)Y”(n) (2.10)

with the matrix in brackets called the (weighted) covariance matrix, i.e.,

c,(n) s [X(n)x"(n)] = Z Mi)a:(i)a:”(i) . (2.11)

i=1

The remaining matrix product (in (2.10)) is cross-covariance matrix for the vector

inputs and outputs, denoted Cn(n) and given by

0.,(n) s. [X(n)Y”(n)] = Z Mi):c(i)y”(i) . (2.12)

i=1

The conventional recursions of the MIL-based WRLS solution, which we upgrade

22

here to the complex vector case, are given by

 

where,

P(n) a c;1(n) (2.15)
C(n) é m”(n)P(n—1)a:(n) (2.16)
6,,_1(n) s. y(n)—O”(n—l)z(n). (2.17)

As in the real scalar case, it is assumed that the complex vector error sequence,

v(n), is pointwise “energy bounded,” i.e.,
7(n) tr {v(n)v”(n)} < l (2.18)

where the sequence 7(n) is known or can be estimated from the data, and tr {A}
denotes the trace of the matrix A. Since v(n)v” (n) is a hermitian matrix with real
diagonal elements, the sequence 7(n) is real numbers. This is useful in the proof of
Lemma 1 below.

The signiﬁcance of the bounding sequence on tr {v(n)v”(n)} is that it implies
pointwise “local membership sets” to which any reasonable estimate for 90 must

belong. If the local membership set at time n is w(n), it follows immediately from

(2.5) and (2.18) that

w(n) = {a I 7(71) tr { [y(n) — @Hz(n)] [g(n) — @H2(n)]H} < 1} , 9 E kaxk
(2.19)

where ('9 is a general matrix replacing Go. The interpretation of this set becomes

23

clearer when considering a single output, y,(n), the i‘” element of y(n). The related

subset is

w,(n) = {0,- I 7(n) [y,(n) —— 0,”.1:(n)] [31,-(n) -— 9[{x(n)]ﬂ <1} , 0,- E cm}: (2.20)

in which 9,- is the i‘” column of the parameter matrix 9. Each w,(n) takes the form
of a hyperstrip (or degenerate hyperellipsoid) in the parameter subspace 6””. If the
data and the 7(i) values are given over a range [1,n], then it local sets w(i),i =
1,2, . . . ,n (one for each observation) can be generated. The parameter matrix 90
must simultaneously belong to each of these sets, and therefore, must belong to a

“global membership set” given by

51(11): (33(1) . (2.21)

(Mi) will be a monotonically non-increasing set with i, and it will be the minimal
(most restrictive) membership set known under the conditions of the problem.
Following the same arguments as in the real scalar case (see Section 1.2.3.1), a
related (but potentially larger) global membership set associated with the WRLS
process (at time n) can be derived. This is done by noting that the constraint (2.18)

on v(n) implies that an “accumulated inequality” holds:

 

Lemma 1 Condition (2.18) implies

ﬂ

ZMi) tr {v(n)v”(n)} S iii—3 (2.22)

i=1

for any non-negative (real) sequence {Mi)}. The equality can be removed for n 2 i0,
where i0 is the minimum i for which Mi) aé 0.

 

'24

Proof of Lemma 1: (Guided by Deller and Luk [7]). That the equality holds for

n < i0 is obvious. At i0

 

1 r 1 "1 ”('0)
1(0)1{v(0)v (o)}< 7,0

which follows immediately from the positivity of Mia) and (2.18). But

1(1 011(1.,)11{ ”=(10)} 21(1) 1111(1){ v"(i)} (2.24)

and
= :Mi) . (2.25)

SO, . .

:1(1)1r{v(1)v”(1)} < gig—[3. (2.26)
Also
Adding (2.26) and (2.27)

io+l io+l Z

2; 1(1) 11{1(1)1”(1)} < Z: %1‘j . (2,28)
and so on, by induction. D

It is assumed, for convenience, that M1) 75 0, and therefore, Lemma 1 becomes

in i)t(i)r{ v(1)}<):j—E—. (2.29)

i=1 i=1 7

By inserting y(i) — 9512(i) for v(i), inequality (2.29) becomes

2": 1(1) tr [ [v(i) - 93’ 21(1)] [9(1) — 932(1)]H} < :1: 59-}- . (2.30)

i=1 7(2

25

This inequality is a fundamental result which leads to a global membership set, say
8:2(n), to which 90 must belong. Since ﬂ(n) (from the discussion above) is the

smallest known set, it must be true that (Mn) Q E(n). The main result is stated as

a theorem.

 

Theorem 3 Let “(12) Q 0““ be the set of all parameter matrices which are com-
patible with the data fori E [1,n] under constraint (2.18). Then there exists a super-
set of (Mn), say C(n), a hyperellipsoid in Cm“"”, which is closely associated with the
WRLS estimation process:

 

ﬁ(n)={® | tr{[G—é(n )]”C’((n)[9- é( n)]}<1} , 966111111).- (2.31)

where,

11(11): 11{o”(n)c n)e(n )}+§: ”(2

i=17 7(6)

— tr {Cy(n )}. (2.32)

in which 9 is a general matrix replacing Go.

 

As before, the interpretation of the set (Mn) is simple when considering each scalar

component of the output individually. The result is a corollary of the theorem.

 

Corollary 1 Under the conditions of Theorem 3, all possible parameter vectors as-
sociated with output yg, say 9,, are conﬁned to a hyperellipsoidal membership set
which is centered on its current estimate, 0,-(n),

 

a.(n)=(9. | [9.— 19,-(11))"C(n)(") [91—12(11)] <1} , 0,-6ka (2.33)

in which 6,-(n) is the WRLS estimate of column i of the parameter matrix @(11) at
time n using error minimization weights {Mi)}.

 

26

This means simply that there is a hyperellipsoidal domain in the parameter sub-
space which contains all possible parameter vectors and which is centered on the
WRLS estimate. Note that the ellipsoid associated with each y,, i = 1,2, . . ..k, is

identical to all others except for its center.

Proofs of Theorem 3 and Corollary 1: (Guided by Deller [45]) It follows imme-
diately from Lemma 1 that (see (2.30))

i: 1(1) 11 I [11(1) — @"x(i)} [11(1) — O”x(i)IH} < i % . (2.311)
This constrains the possible parameter matrices to the set
{(9 | 2: 1(1) 11 I [g(i) — O”x(i)} [11(1) — ®”x(i)IH} < 2:31-38} (2.35)

Expanding the trace term,

{9 l i Mi) tr {v(i)y"(i)- 9”z(i)v”(i) - v(i)z”(i)@ + ®”m(i)w”(i)®}
< Z— (’ (I,} (2.36)

i=17(z)

Moving the summation across terms,

W)
{9 I tr {Cy(n) — @"CIy(n)— ny(n)9 + GHC (n)@} < 1:1 jig—(0} (2.37)

where, C,(n) and C,y(n) are deﬁned in (2.11) and (2.12), and Cy(n) is deﬁned in

the same way as Cx(n). From (2.10),

Cut”) = Cx(n)é(n)

or Cg(n) = O”(n)Cf(n)=OH(n)Cx(n). (2.38)

This substitution in (2.37) and some simple manipulation yields

{o I 11(9"c.(n)o — e"c,(n)o(n) - e"(n)c,(n)o} < Z 3%} — 111{C,(n))}
i=1
(2.39)
Completing the square on the left side yields
{9 I tr {@"C,(n)@ - GHC;(n)(-:)(n) — é”(n)C,(n)€-) + é"(n)Cx(n)O(n)}

< éié—Z} -— tr {C,,(n)} + tr {9"(n)C,(n)€-)(n)} é n(n)} (2.40)

from which it follows that the set is described by

{O I tr{I@ -- O(n)}H C,(n) [G — é(n)}} < Ic(n)} . (2.41)

If the system is assumed to be stationary then C,(n) is positive deﬁnite, and the
left side of this inequality must be a positive number. x(n), therefore, must also be
positive. Dividing both sides by x(n) yields (2.31). D

To prove Corollary 1, it is convenient to write

. H .. "
tr I [e — e(n)] c.(n) [o - 9(n)}} = 21,- (2.42)
i=1
. H 1.
where c,- indicates the 1‘” diagonal element of [G - é(n)I C;(n) I9 — C(11)}. Now
it is clear that
H .
e,- = [9.- - 9101)] c,(n) [9,- — 9,-(n)} (2.43)
for any i, where 9; and 6,(n) are the i‘” columns of G and C(n). It is also true that
all the cj’s are positive since Cx(n) is a positive deﬁnite matrix. Therefore,

I:
q<Zc,-<x(n) foranyi=1,2,...,k. (2.44)

1:1

28

Dividing through by K(n) yields inequality (2.33). [:1

According to Corollary 1, all possible parameter vectors, 9,, associated with the
output y, are guaranteed to be in a hyperellipsoidal set which is centered on 9,02).
described by inequality (2.33). Further, the ellipsoids are identical for each i except
for the centers. It therefore is reasonable to use Mn) which maximally shrinks this
common ellipsoid if such can be found. Following the same arguments as in the real

scalar case (see Section 1.2.3.1), the quantity
det B(n) .-'= det x(n)C;1(n) (2.45)

is proportional to the volume of ellipsoid (2,-(11) of inequality (2.33). A reasonable
strategy is to ﬁnd A‘(n) at each step which minimizes the “volume ratio” of the

ellipsoids at n and n — 1:

_ detB(n)
— det B(n - 1) °

 

V(A(n)) (2°46)

 

Theorem 4 The weight, P(n), which minimizes the volume ratio (2.46), is the most
positive root of the quadratic equation

F01) = (12A? + 011)‘ + do = 0 (2.47)
where,

02 = (mlc — 1)G2(n)
a1 = I2mlc - 1 + 7(n) tr{en-1(n)ef_,(n)} — 16(n — l)7(n)G(n)I C(n)
90 = m: [1 — v(n) tr{e.-1(n)ef..(n)}] - n<n — 1)1(n)G(n) .

 

Proof of Theorem 4: (Guided by Deller [45]) Substituting the deﬁnition of B(n)

into (2.13) results in

B(n) _ B(n— 1) _ n
x(n) - x(n—l) M )

B(n — l):c(n)a:”(n)B(n -1)
Kzfn - 1) ll + 400001)]

 

 

 

Deﬁning h(n) i l + Mn)G(n) and r(n) i- K(n)/x(n — 1) yields

_ Wild")
B(n) _ B(n -1)r(n){I - n(n -1)h(n) [B(n —1):c(n)]H} . (2.49)

 

So,

 

_ rfnlbfnlzfn)
detB(n) _. detB(n — 1) det {r(n)I-— x(n -1)h(n) [B(n —1)x(n)]H} . (2.50)

Using the matrix identity [46] (for the complex case)
det(cI + yz”) = emf-Me + y”:) (2.51)

where y and z E Cm“ and c is a real number. The term inside the braces becomes

r n n x” n
rink-.101) {T(n) _ EC 23: 1))h(7(1))B(n -1):c(n)} ' (252)

 

Using (2.49), the term in (2.52) can be written as

rm”(n) {1 — W} = :11). (2.53)

M") h(n) '

Therefore, to minimize the volume ratio (2.46) with respect to Mn), (2.53) is differ-

entiated and the result is set to zero.

a (r”"‘(n))
0Mn) h(n)

 

0 (2.5.1)

30

 

C(71) E 0 . (2.57))

Since r"""1 (n) # 0,
0r(n)

mkh(n)a/\(n)

 

- r(n)G(n) == 0. (2.66)
The partial derivative of r(n) is further expanded as follows: Inserting the right side
of (2.14) into (2.32) for C(n) gives
* H“1 ” —1(
11(11) = tr{[®(n - 1) + 1(11 )P(n )a:(n)e,, (11)} P (n)

[é<n-1)+1(n )P (n) .- ’t.(n)]}+2 -—[-:.I - 1110161)) (2.57)

= 11 {6”(12 _ 1) [P“(n _ 1) + Mn)a:(n):cH(n)} o(n — 1) +

1(n)e.-1(n)x”<n)©(n - 1) + 1(n)é”(n - 1)c(n)ef.'..(n) +
P(n -1)x(n)x”(n)P(n — 1)

 

12999.- (n):c"(n) [P(n — 1) — an) , + A(n)G(n) 2191) 53.19)
+ :ﬁ— tr{C,(n )} (153)
Mn
: n(n—1)+:7-(-;1—}—(ri)()(ri)tr{yyﬂ(n)}+

n) " 511-1001” ‘1' 511-101) [31(9) - 5n—1(n)lH +

(
A2("kn-1("Riff—100301)I
1 + Mn)G(n)

Mn) tr {1961) — 9-19119

fy(n) - 611—162)] 6.7.161) +

 

 

 

(2.61)

31

or,

 

 

 

r n _ n(n) - )(n) _ Mn) tr [6,,_1(n)c,’,’_,(n)} . .,
( )— n(n—l) _1+ n(n-1)7(n) n(n—1)h(n) ° (2'0‘)

Differentiating this result with respect to Mn) yields

 

 

3r(n) _ 1 I 1 tr {en-1(n)€n- 40‘le (2.63)

aMn) _ x(n - 1) 7(n) — h2(n)

Putting this result in (2.56) and replacing n(n) with the right side of (2.61) yields

mkh(n) { 1 _ tr{€n-1(n)€f—1(n)}}_

 

 

 

 

n(n -1) v(n) 112(9)
1(1) Mn) 11 {e.-1(n)e.*.’_1(n)} __
{1+ n(n —1)7(n) — n(n -1)h(n) GUI) _ 0 (2.64)
mk [h2(n )— 7(n) tr {en_(n) (n)-}I
[11(11 — 1)7(11)h(11) 1. Mn)h(n) _ 1(11)~,(11) 11 {1,_,11() ,(11)G(11}I

When h(n) is replaced by its deﬁnition (see (2.49)), the following result is obtained

after some manipulation

2 n){(mk — 1)02( 11)} +
1(11) {2111/1 _ 1+ 7())11{;.,_,(11)1"1,,_,(11)}_ 11(11 _1)(1(1)}G

mkIl-7(n)()tr{en161(n)()7(n}I—xn—l)G(n=) 0 (2.66)

which is the quadratic (2.47).

Finally, it is noted that the 012 coefficient of the quadratic is always positive, so

32

that F(A) is concave upward. It follows immediately that the larger real root of
1"(21) = 0 (if it exists) will correspond to a local minimum of V(Mn)). This root must
be real and positive to be a valid weight (see Lemma 1). D

Theorems 1 and 2 can be considered as special cases of Theorems 3 and 4 in
which the data are real scalar quantities. Another special case can also be derived
when the data are complex scalar quantities and the fundamental results are stated
below.

Consider the model of the form

y(n) e 9511(11) + v(n) (2.67)
with the constraint
r(n) I v(n) I2 < 1 (2-68)
and an error term
1.-1(n) i an) — 9% -1)11(..) . (2.69)

All the quantities above are complex except ')(n) which is real.

The hyperellipsoidal membership set associated with this special case is easily
deﬁned by noting that Corollary 1 applies directly to this case with 9; and 9.- re-
placed by 9 and 9, and tr {Cy(n)} (in Theorem 3) by 23;, Mi) I g(i) I2. Similarly,
the quadratic equation (2.47) can be applied with tr {en_1(n)c,’,f_l(n)} replaced by

l 571-101) l2-

33

2.3 Suboptimal Tests for Innovation in SM-WRLS
Algorithms

A signiﬁcant reduction in computational complexity can be achieved by employing
a “suboptimal” test for information content in an incoming equation. The proposed
check is argued to be a useful determiner of the ability of incoming data to shrink the
ellipsoid, but it does not rigorously determine the existence of an optimal SM weight
in the sense described above. The main issue here is to avoid the computations of the
quantities necessary at each step to construct and solve the quadratic (2.47) in cases
in which the quadratic turns out only to be useful for the purpose of checking for the
existence of a meaningful weight. Since most of the time these computations result
in the rejection of incoming data, a more efﬁcient test could signiﬁcantly reduce the
complexity of the algorithm.

The estimation error matrix at time n can be denoted by

h

o(n) 1; e1, - 9(11) . (2.70)
The following inequality results immediately from (2.31),
é"(n)C,(n)é(n) < n(n) . (2.71)

Using a similar inequality (for the real scalar case), Dasgupta and Huang [14] have
noted that their n(n)-like quantity provides a bound on the error vector (or matrix
for the generalized case) sequence and have suggested minimizing this quantity with
respect to Mn) in an effort to decrease computational complexity. However, this
minimization does not, in general, imply an improvement in the estimate with respect
to previous times, since both sides of the inequality (2.71) are dependent upon Mn).

Further, the nonexistence of a minimum of n(n) with respect to Mn) is not very

34

informative in this sense. However, further arguments are presented here to provide
support for this process in the SM-WRLS context.

Consider the usual volume quantity to be minimized at time n, deﬁned in (2.15).
Let us temporarily write the two key quantities there as functions of Mn) : Cr(n, Mn))
and n(n, Mn)). It is assumed that enough equations have been included in the covari-
ance matrix at time n —- 1 so that its elements are large with respect to the data in the
incoming equation. Now the quantity det C,(n,Mn)) is readily shown to be mono-
tonically increasing with respect to Mn) on Mn) 6 [0,00) (see Appendix A), with
C,(n, 0) = C,(n - 1, X'(n —1)), where A‘(n — 1) indicates the optimal weight at time
n — 1. Under the assumption above, det C,(n,Mn)) will not increase signiﬁcantly
over reasonably small values of Mn). The attempt to maximize det C,(n,Mn)) in
(2.45) causes a tendency to increase Mn) in the usual optimization process. However,
the attempt to minimize n(n, Mn)) generally causes a tendency toward small values
of Mn), unless a minimum of n(n, Mn)) occurs at a “large” value of Mn). To pursue
this idea and further points of the argument, key results about n(n, Mn)) are noted

in the following.

 

Theorem 5 n(n,Mn)) has the following properties:

0 0n the domain Mn) 6 [0,00), n(n,Mn)) is either monotonically increasing or
it has a single minimum.

0 n(n, Mn)) has a minimum on Mn) 6 [0,00) iﬂ'

tr{£n_1(n)£f_1(n)} > 7’1(n) . (2.72)

 

35

 

Lemma 2 Let A‘(n — 1) denote the optimal weight in the sense of 'l‘heorem 4 (whirl)
might be zero) at time n — 1. Then

_ . Mn) Mn) tr{e,,_1(n)6£,’_,(n)} . _.
n(n,Mn))_n(n—1,)\ (n_1))+:y_(TII- l+Mn)G(n) . (2.73)

 

 

Proof of Lemma 2: See (2.61).
Proof of Theorem 5: The minimum of n(n, Mn)) with respect to Mn) can be found

by differentiating (2.73) and setting the result equal to 0,

 

311(11, A(71)) 2 2 u _
W5)— : G (11)1 (11) + 20(11)1(11) + [1— 7(11) 11 {1.._,(11)1,,_,(11)}I = 0 .
(2.74)
This is a concave upward quadratic function with its minimum at
X(n) = —G"1(n) < 0 . (2.75)
Two real roots of (2.74) always exist,
—1:t n tr 6”. n61 n
11101.01) = 1/7( ) { 1( ) 1( )} (2.76)

 

C(n)

the smaller corresponding to a maximum of n(n,Mn)), the larger to a minimum.
Only the larger root can be positive since the lower root is bound to be less than
/\'(n). Therefore, it is only possible for n(n,Mn)) to exhibit a minimum or to be
increasing on positive Mn). It is easy to use (2.76) to verify that the larger root. is
positive if condition (2.72) is met. Cl

With these results, it can be argued that: If det Cx(n,Mn)) is increasing, but
not changing signiﬁcantly over reasonably small values of Mn), then it is sufﬁcient

to seek Mn) which minimizes n(n, Mn)). If n(n, Mn)) is monotonically increasing on

36

Mn) 2 0, this value is Mn) = 0 which corresponds to rejection of the equation at time
n. It sufﬁces, therefore to have a test for a minimum of n(n,Mn)) on positive Mn).
As noted above; a simple test is embodied in condition (2.72). If this test is met, it
is then cost effective to proceed with the standard optimization process centered on
(2.47). Otherwise, the explicit construction and solution of (2.47) can be avoided.

It is to be noted that even if (2.72) is met, it is possible that the optimization
procedure will still reject the datum. Perhaps more importantly, it is also possible
for (2.72) to reject data which would have been accepted by the usual process. These
ideas will be explored in the simulation studies (Chapter 3).

Finally, note that when the simpliﬁed test (2.72) accepts the new equation, there
are tools to compute the weight which is “optimal” in the sense of minimizing
n(n, Mn)). In particular, this would be the larger of the roots in (2.76). However,
it clearly makes more sense to compute the optimal weight according to (2.47), since
this computation is not much more expensive. The improvement in the computational

complexity due to “suboptimal checking” is discussed in Chapter 4.

2.4 Adaptive SM-WRLS Algorithms

While the theoretical developments of the previous two sections, in principle, provide
the background for further SM-WRLS developments with general complex vector in-
puts and outputs, it is upon the special case of real scalar data that most of the
remaining work will focus. This special case was necessary in order to initiate a
tractable study of the difﬁcult issues of adaptation, algorithmic behavior, and archi-
tecture development. The work on this simpler case which is to follow, however, will

lay the groundwork for future studies on the more general cases.

37

2.4.1 General Formulation

The adaptive algorithm presented here uses “back rotation” in order to partially
or completely “forget” past information enabling it to track (potentially fast) time
varying signals. Back rotation [40] is a Givens rotation-based technique that removes
(or rotates out) a previously included equation from the system. In this section, the
back rotation technique is modiﬁed such that a previous equation can be partially
removed. This will permit a broader class of adaptive strategies. In SM terms,
back rotation causes the ellipsoidal membership set to expand due to the removal of
information. This expansion entices the algorithm to incorporate present data. The
back rotation technique requires that all the weights with the corresponding equations
(for weights other than zero) be stored for later use.

Recalling Fig. 1.2, it is seen that at each step in the SM-WRLS algorithm, the
upper triangular system of simultaneous equations T(n)9(n) = d1(n), is solved (when
data are accepted) to obtain the optimal estimate [6, 7, 9]. Suppose in approaching
time n that the past equation to be (partially) removed is at time 1'. Rotating this
equation out of the system is accomplished by re-introducing it as though it were a new
equation. A weight M, where u is the fraction of the equation to be removed from
the system, is used, and some sign changes in the rotation equations are necessary
[40]. The system of equations with r removed is referred to as the “downdated”

system at time n — 1, and the related quantities are labeled with subscript d, i.e.,
T.(11 — 1)91(n — 1) = 91,.(11 — 1) . (2.77)
The downdated ellipsoid matrix is Cd(n — 1)/nd(n — 1) where

01(11—1) = T§(11—1)Td(n—1), (2.78)

1102 — 1) = 1111.101 —1) 111+ 71102 — 1) . (2.79)

38

with
INT) (
l

Rd(n -— 1) i h(n — 1) — 7(7)

 

- 7(1)y2(1)) (2.80)

in which h(n — 1) represents the updated value of k which includes the equation at
time n — 1. Equations (2.79) and (2.80) follow immediately from the deﬁnition of 11
found in (1.23). These relations can be used repeatedly regardless of the number of
equations (partially or completely) removed prior to time 12. If more than one equation
is removed prior to n, E(n — 1) in the right side of (2.80) is replaced by the h(n — 1)
for all downdates after the ﬁrst one. Following all necessary downdating just prior to
time n, the algorithm uses the downdated system to compute the quantities Gd(n)
and tin-1,1101) which are necessary to compute the optimal weight for the equation at
n. To compute a downdated SM-WRLS estimate, therefore, it is only necessary to
downdate the matrix T(n — l) and the vector d1(n — 1) and to solve for 94(n — 1) prior

to Step 1 in Fig. 1.2, then replace all relevant quantities in Step 1 by their downdated

versions, i.e.,
0101) = Hera-‘01 -1)T1T(n -1)1(.1) = n 9.01) 1):. (2.81)

and

(n-1,d(n) = y(n) - 95(n — 1)x(n) . (2.82)

n(n - 1) and h(n — 1) are downdated according to (2.79) and (2.80). Then A‘(n)
is found in Step 2 using (1.26) with downdated quantities. Note that downdating is
unnecessary if the equation 7' was rejected by SM-WRLS. In this case Td(n — 1) =
T(n — 1) and 9.1(n —- 1) = 9(n - 1). Conversely, when the “new” equation at n is
rejected, then T(n) = Td(n — 1) and 9(n) = 9.1(n - 1).

A wide range of adaptation strategies is inherent in the general formulation de-
scribed above. Three major subcases are identiﬁed in the following. In each of these

subcases, the objective (in SM terms) is to expand the ellipsoidal region of possible

39

solutions in order to track fast time variations in the signal.

2.4.1.1 Windowing

Windowed adaptation is a special subcase of the general formulation which uses a
sliding window of ﬁxed length, l (l 2 m + 1), so that the estimate at time 77. covers
the range In — l + 1, n]. The windowing technique is made possible by the ability to
completely (i.e., p = 1) and systematically remove equations at the trailing edge of
the window. An illustration of this technique is shown in [5].

This technique implements the same procedure as that of the basic SM-WRLS
algorithm for i = 1, 2, . . . , l, and thus, exhibits similar performance. The initialization
process (for this strategy and all other GR-based SM-WRLS strategies) is also the
same as that of the basic SM-WRLS algorithm; i.e., the working matrix W (or T)
is filled with zeros, Mi) = 1 for i = 1,2,...,m + 1, and 76(0) = 0 (see Fig. 1.2).
Windowed adaptation starts at time 1 +1 and works as follow: Prior to consideration
of the equation at time n (n _>_ I + 1), it is simply necessary to remove the equation
at time 1' = n —l from the system by complete “downdating” as described above. All
the theoretical results of the general formulation are valid with the value kd(n — 1)

given by

1101-1) = . 190—1101121110

i=n—l+l 7 Z)

, Mn —I 2
= n(n — 1) — ﬂ (1 — 7(n - ()3; (n — 1)) (2.83)

in which the value n — l acts like a special 7' for the windowed adaptation.

2.4.1.2 Graceful Forgetting

In this technique, only a fraction, )1 (chosen such that p“ is an integer), of all previ-

ously included equations is removed at each n similarly to the exponential forgetting

40

factor conventionally used with WRLS”. This technique has a sliding window of ﬁxed

1

length, )1” , with the effective weights decreasing linearly when moving toward the

trailing edge of the window. Hence, the equation at the trailing edge of the window

 

has an effective weight of ([pMn + 1 -— p") and the equation just rotated in has an
effective weight of I/Mn). Note that although each equation must be partially rotated
out )1" times, only those equations that were previously accepted (in the past 17"

recursions) need to be considered by the algorithm.

2.4.1.3 Selective Forgetting

This technique selectively chooses the equations to be (partially or completely) re-
moved from the system based on certain user deﬁned criteria in order to remove their
inﬂuence from the system. The selection process can be, for example, to remove (or
downweight) only the previously heavily weighted equations, to remove the equations
that were accepted in regions of abrupt change in the signal dynamics, or to remove
the equations starting from the ﬁrst equation and proceeding sequentially. Whatever
the criteria, a fundamental issue is to detect when adaptation is needed to improve
the parameter estimates. This issue is further investigated in Chapter 3.

Therefore, the adaptive SM-WRLS algorithm and its extensions (e.g., the special
subcases described above) are potentially very useful techniques that can be applied
to identify models with fast time varying signals. Due to the ﬂexible nature of this
algorithm, however, various subcases can also be deﬁned and tested. In any subcase,
the objective (as noted above) is always to expand the ellipsoidal region of possible
solutions. This objective can also be achieved in an unconventional but intuitive
method. For example, an ad hoc adaptation strategy may be to inﬂate the ellipsoid

(i.e., multiply the matrix Cz(n)/rc(n)) by a scale factor, say a (0 < a < 1), when

 

3An exponentially forgetting factor can also be incorporated into SM-WRLS. Although the strat-
egy is computationally very efﬁcient, it has not been found to be effective for adaptation in prelim-
inary experiments. See Section 2.4.2 for theoretical development and discussion.

41

there is a need for adaptation.
It is important to note that the general adaptive formulation is amenable to the

suboptimal strategy presented in Section 2.3. The performance of the adaptive,

suboptimal, and adaptive suboptimal techniques are investigated in Chapter 3.

2.4.2 Exponential Forgetting Factor Adaptation

In this section, it is shown that exponential forgetting factor“ adaptation can be
incorporated into SM-WRLS. The theoretical development of this computationally
efﬁcient strategy is presented, followed by a discussion explaining why this method is
not found to be effective for adaptation in simulations to date.

It is possible to make the conventional WRLS solution by GR’s adaptive by mul-
tiplying the existing system of equations of the form T(n)9(n) = d1(n), through by
a “forgetting factor,” [3, where, 0 < B < 1, prior to rotation of the equation at time
n + 1 into the system [40, 43, 44]. Since this process nominally inﬂates the ellipsoid
size by removing information, the optimization at the next time must be performed

with respect to this “intermediate” system, say
'r.(11)9(11) = 111,.(11) , (2.84)

where Ta(n) = 9T(n) and d1,a(n) = ﬂdl(n). Note that in order to use (1.26),

“ada ted” versions of e, G, and is must be used. It is easy to see that
P

(w(n +1) = 6,,(n +1) (2.85)

 

‘This is a more conventional “forgetting factor” strategy than that suggested by Dasgupta and
Huang in [14].

(since 9(11) is unchanged), and 11111
G.(n +1) = 940(1) +1) . (2.86)
Recalling (1.23) and the alternative expression for 5(a) in (1.34), and noting that
C1.a(n) = TZ(n)Ta(n) = 62TT(n)T(n) = B2Cx(n) , (2.87)

the leading term of 76.,(n) is easily shown to be II d1,a(n) ”3 = 92 II d1(n) “3. To
complete h(n), note the fact that, at time n + 1, this method effectively assigns
weight ﬂn+1“m to the equation at time i, where Mi) is the weight assigned to
the equation when originally rotated into the system [44]. Therefore, using (1.34) and

the discussion above, yields

h(n) = 6’ ll d1(n) H3 + (“n(n) (2188)

where,

 

 

71.01) = 2": WWW") (1 — 109201))
1:, 7(1)
= 921.(n-1)+”7If;)(1-11n)y’(n)). (2.89)

To introduce exponential forgetting factor adaptation into the SM-WRLS algo-
rithm, therefore, it is only necessary to downweight the matrix T(n — 1) and the
vector d1(n — 1) by multiplying through by ,9 prior to Step 1 in Fig. 1.2, then to
replace all relevant quantities in Step 1 by their downweighted versions. xa(n -- 1)
and ka(n — 1) are updated according to (2.88) and (2.89). Then A‘(n) is found in
Step 2 using (1.26) with downweighted quantities.

In preliminary experiments, it has been found that this strategy is not effective

43

for adaptation. Actually, this strategy performs well when applied to slow time
varying systems but fails to track fast time variations. To CXplain this behavior, it is
important to understand the effect of the “downweighting” process. As noted earlier,
multiplying the system of equations (at each recursion) by 9 (0 < 5 < 1) inﬂates
the ellipsoid volume by a factor that is inversely proportional to 6. Therefore, a large
value of 9 causes the ellipsoid volume to increase slightly which lessens the tendency
to accept new equations and the ability to track fast time varying signals. On the
other hand, a small value of 9 causes the ellipsoid volume to increase signiﬁcantly, and
therefore, a tendency to accept more equationss. However, every time an equation is
accepted, the ellipsoid volume decreases accordingly. Hence, a small value of 9 may
result in “bad” estimates due to the continuously expanding and shrinking ellipsoid
and, more importantly, to the fact that the effective window length is very small.
Therefore, it is desirable to choose 9 to be large but not to the extent of having
insigniﬁcant effect on expanding the ellipsoid volume.

To pursue this discussion further, consider the problem of estimating a signal
characterized as having slow time variations everywhere except in a region around
time n at which the signal exhibits fast time variations (see, e.g., Fig. 3.3, where
n = 2000). The fact that the signal is changing very slowly prior to n induces
the algorithm to accept some points which, in turn, causes the ellipsoid volume to
decrease. An increase in the “conﬁdence” of the estimate results. Near time n, the
ellipsoid volume becomes very small. When the signal moves rapidly away from its
current location, it eventually moves outside the ellipsoid which is therefore no longer
a valid bounding ellipsoid. The situation can be described as follows: The signal
parameters are wandering outside of a small ellipsoid, but the incoming equations are

being rejected (because of the high conﬁdence in the existing estimate) until some of

 

5Note that one of the main advantages of SM-WRLS is the use of a small fraction of the data.
Therefore, from the SM point of view, small values of 9 are least desirable.

44

the earlier influential equations are downweighted (or suppressed). Remember that
the “downweighting” process is recursively being implemented. Hence, if 5 is large,
it takes a long time to suppress the inﬂuence of previously accepted equations, or, in
SM terms, the ellipsoid volume increases at a slow rate which may not be sufficient to
track the fast time varying signal. This explains why the algorithm ceases accepting
new equations for a period of time (which is dependent on B), and therefore, fails to
track the signal.

The adaptive SM-WRLS algorithms (of Section 2.4.1) do not depend on a ﬁxed
factor, such as 9, to expand the ellipsoid volume in order to adapt to the changing dy-
namics of the system. However, these algorithms expand the ellipsoid by (selectively)
removing previously accepted inﬂuential equations from the system, either partially
or completely, and therefore, relinquishing their inﬂuence from the current ellipsoid,

thereby allowing it to expand and adapt to the changes in the signal dynamics.

2.5 A Survey of the Computational Complexities
of Several Related Sequential Algorithms

The purpose of this section is to compare the computational complexities of several
related sequential algorithms that solve the LS problem. The computational complex-
ities are shown in Table 2.1 in which the ﬁrst column indicates the algorithm under
study, the second column gives the complexity (in ﬂoating point operations (ﬂops)
per equation) required to check for a valid SM weight, where one ﬂop is deﬁned as
one ﬂoating point multiplication plus one ﬂoating point addition. The third column
gives the complexity required to update (or downdate in the case of back rotation)
the covariance matrix and the LS solution, and the fourth column gives the total
number of ﬂops per equation for a typical example described below.

There are two theoretically equivalent methods associated with conventional LS

45

 

Table 2.1: Computational complexities (in ﬂoating point operations (flops) per equa-
tion) for the real scalar sequential algorithms.

 

 

 

 

Covariance and Example
Algorithm Checking Solution Update (ﬂops)

Batch Solution -- 0(m3) > 1000
MIL—based WRLS — 3m2 + 5m + 3 353
MIL-based SM-WRLS m2 + 2111 + 13 2m” + 3m + 7 180
Suboptimal MIL-based SM-WRLS (m + 1) + 453(m2 + m + 12) 2m2 + 3m + 7 47
GR-based SM-WRLS .5m2 + 2.5m + 13 2.5m2 + 10.5m + 5 160
Suboptimal GR-based SM-WRLS (m + 1) + s(.5m3 + 1.5m + 12) 2.5m2 + 10.5m + 5 55

 

 

 

 

 

 

solution; the “batch” solution [39] which requires 0(m3) ﬂops per equation, and
the MIL—based WRLS solution [34, 35] which requires 0(m2) ﬂops per equation
(see Section 1.2.1). The SM-WRLS algorithm can also solve the problem in 0(m2)
ﬂops per equation based on MIL (see Section 1.2.3.1), however, a GR-based solution
(outlined in Section 1.2.3.2) is also considered because it is amenable to a systolic
architecture implementation which reduces the complexity of the algorithm to 0(m).
The latter algorithm is the subject of Chapter 4.

Similar computational complexity expressions for the MIL- and GR-based SM-
WRLS solutions can be derived for the suboptimal strategy of Section 2.3, and are
shown in Table 2.1. Note that there is one square root operation associated with each
of the SM-WRLS algorithms but has been dropped from the table since it does not
have any signiﬁcant effect on the comparison.

A note about the computational complexity of the OBE algorithm [20] is in order.
It was noted by Huang [20] that the complexity of the OBE algorithm is in the order

of m2 multiplications (or ﬂops) for the information evaluations (i.e., checking), while

46

 

updating the estimates requires 6m2 multiplications (or ﬂops). However, a careful
analysis of the OBE algorithm during the course of this work, taking into account the
symmetric properties of some matrices, reveals that the complexity of this algorithm
is m2 + 2m + 3 ﬂops for checking and 2m2 + 3m +16 ﬂops for updating the estimates.
A similar analysis of a “suboptimal” OBE algorithm [14] shows that the complexity
of this algorithm is m + l ﬂops for checking and 3m2 + 4m + 15 ﬂops for updating
the estimates.

If the fraction of the data accepted by the SM-WRLS algorithm is denoted by
r, the fraction of the data accepted by the suboptimal SM-WRLS algorithm by s
(s < r), and the fraction of the data accepted by the SM-WRLS algorithm after
passing the test (2.72) by t (t S s), then the total computational complexities of
the SM-WRLS algorithms shown in Table 2.1 can be deﬁned as follows. For the

MIL-based SM-WRLS algorithm, the total computational complexity is given by
(m2 + 2m +13) + 1' [2m2 + 3m + 7] (2.90)

flops per equation. For the suboptimal MIL-based SM-WRLS algorithm, it is given
by

(m+1)+s[m2+m+12]+t[2m’+3m+7] (2.91)

ﬂops per equation. For the GR-based SM—WRLS algorithm, it is given by
(.5m2 + 2.5m + 13) + r [2.5m2 + 10.5m + 5] (2.92)

ﬂops per equation. Finally, for the suboptimal GR-based SM-WRLS algorithm, it is
given by

(m + 1) + .3 [.5m2 + 1.5m +12] +1 [2.5m2 +10.5m + 5] (2.93)

ﬂops per equation.

47

Let us consider a typical example to compare the complexities of the various
algorithms. Assume that the model order m = 10, the fraction of the data accepted
by the SM algorithms (both SM-WRLS and OBE), r, is 0.2, and the fraction of the
data accepted by the suboptimal SM algorithms, 3, is 0.1. To simplify the analysis, it
is assumed that all the equations satisfying condition (2.72) for the suboptimal SM-
WRLS algorithm are also accepted by the SM-WRLS algorithm (3 = t). The total
number of ﬂops (per equation) is shown in the fourth column of Table 2.1. According
to the computational complexities computed here, the OBE algorithm uses 173 ﬂops
and the “suboptimal” OBE algorithm uses 47 ﬂops. The SM algorithms typically
use 45 ~ 50% the number of ﬂops required by the MIL-based WRLS algorithm. The
computational savings are mainly due to the infrequent updating of the covariance
matrix and the LS solution. A suboptimal test which is presented in Section 2.3
can be incorporated in virtually any version of the SM-WRLS algorithms to further
improve the computational efﬁciency. When the suboptimal strategy is applied to a
given algorithm, it reduces the computational complexity of the algorithm by 60 ~
70%.

The adaptive GR-based SM-WRLS algorithms of Section 2.4.1 use the same com-
putational complexities as those of the “non-adaptive” GR-based SM-WRLS algo-
rithms when performing back substitution to downdate the covariance matrix and
the LS solution. The only exceptions being 7:4 (see (2.80)) which requires one extra
ﬂop per equation for cases when p ;é 1, and the fact that the LS solution needs to be
computed (downdated) only after all necessary downdating of the covariance matrix
and F: at any given time (see Section 2.4.1). If the fractions of the data used by
the adaptive GR-based SM-WRLS algorithms are denoted by the same symbols used
I above, and the fraction of the data removed from the system is denoted by u, then
the total computational complexities of the adaptive SM-WRLS algorithms are as

follows. For the windowed and selective forgetting SM-WRLS algorithms, the total

48

computational complexity is given by
(m2 + 2.5m + 13) + r [2.5m2 + 10.5m + 5] + u [2m2 + 10171 +5] (2.9.1)

ﬂops per equation. For the suboptimal windowed and selective forgetting SM-WRLS

algorithms, it is given by
(m + 1) + 3 [.5m2 + 1.5m +12] +t]2.5m2 +10.5m + 5] + u [2m2 + 10m + 5] (2.95)
ﬂops per equation. For the graceful forgetting SM-WRLS algorithm, it is given by
(.5m2 + 2.5m + 13) + r [2.5m2 + 10.5m + 5] + p-lu [2m2 + 10m + 6] (2.96)

ﬂops per equation. Finally, for the suboptimal graceful forgetting SM-WRLS algo-

rithm, it is given by

(m + 1) + 3 [.5m2 +1.5m +12] +1 [2.5m2 +10.5m + 5] + ,rlu [2m2 + 10m + 6]
(2.97)
ﬂops per equation.

Consider the same example with the new assumption that the fraction of the data
removed from the system, 11, is half that of the data accepted (i.e., u = 0.1 for the
SM-WRLS algorithm and 0.05 for the suboptimal strategy). Both the windowed and
the selective forgetting strategies have the same complexity (191 ﬂops per equation)
which is reduced to 67 ﬂops per equation when the adaptive suboptimal strategy is
employed.

Using the same fractions of data and a p value of 0.005, the graceful forgetting
strategy use a total of 6280 ﬂops per equation which is reduced to 2565 ﬂops per

equation for the adaptive suboptimal strategy. This larger number is due to the fact

49

 

Table 2.2: Computational complexities (in complex ﬂoating point operations (cﬂops)
per equation) for the generalized sequential algorithms.

 

 

 

 

j
Covariance and Example
MIL-based Algorithm Checking Solution Update (cﬂops)
WRLS — 3(mk)’ + (21: + 3)mk + k + 2 32312
SM-WRLS (mic)2 + (I: + 1)ml: + I: + 12 2(mk)’ + (I: + 2)mk + k + 6 15365
Suboptimal SM-WRLS (mi:2 + k) + a [(77%)2 + mk + 12] 2(mlc)2 + (k + 2)mk + k + 6 7276

 

 

 

 

 

 

that each equation must be partially rotated out p“ (or 200 in this case) times. Al-
though this technique “forgets gracefully” and quickly adapts to the rapid changes in
the signal dynamics, it is computationally expensive. However, the adaptive systolic
architecture (of Section 4.3) reduces the computational complexity of the “sequential”
adaptive GR-based algorithm by 60%.

In Chapter 4, the GR-based SM-WRLS algorithm is mapped into a systolic ar-
chitecture for speed advantages. The (parallel) complexities of the parallel GR—based
SM-WRLS algorithm and its suboptimal version are discussed in Chapter 4, however,
it is worth noting that, compared to the complexity of the “sequential” GR-based al-
gorithm, the systolic architecture implementation reduces the complexity of this algo-
rithm by about 60%, and when the suboptimal strategy is employed, the complexity
is reduced by 84%.

In order to ﬁnd the computational complexity of the generalized SM-WRLS al-
gorithm developed in Section 2.2, let us deﬁne a complex ﬂop (cﬂop) as four real
ﬂoating point multiplications plus four real ﬂoating point additions. The computa-
tional complexities of the MIL-based WRLS, SM-WRLS, and suboptimal SM-WRLS

algorithms are shown in Table 2.2. The fourth column in this table shows the total

50

number of cflops (per equation) when using the same example used for the real scalar
case with k = 10. The SM-WRLS algorithm typically uses 45 ~ 50% the number
of cﬂops required by the MIL—based WRLS algorithm which is consistent with the
real scalar case (see Table 2.1). However, when the suboptimal strategy is applied,
it reduces the computational complexity of the algorithm by 75 ~ 80%. The com-
putational savings for the complex vector case are more than those reported for the
real scalar case because of the fact that the suboptimal strategy performs “scalar”
checking compared with “vector” updating. However, the computational complexity
of the generalized sequential GR—based SM-WRLS algorithm is expected to be worse
than that of the MIL-based algorithm due to the fact that the LS solution matrix has

to be solved for one vector at a time.

 

51

Chapter 3

Simulation Studies

 

3. 1 Introduction

SM algorithms have the potential for application to many real digital signal process-
ing problems such as speech recognition, image processing, beamforming, spectral
estimation, and neural networks. This chapter is concerned with testing the behavior
of the various suboptimal and adaptive SM-WRLS strategies presented in Chapter 2
for the real scalar case. This is done by conducting extensive simulation studies which
illustrate important points about these strategies and about the SM-WRLS algorithm
in general. The simulation studies are performed on models derived from real speech
data. Section 3.2 discusses the results using a model of order two so that the results
can be easily illustrated. The performance of a more realistic model of order 14 is
analyzed in Section 3.3. These simulation studies are carried out on a 322-bit ma-
chine, however, it is important to research the behavior of the SM-WRLS algorithm
on a smaller wordlength. The performance of this algorithm is tested using a 16-bit

wordlength and is discussed in Section 3.4.

52

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

'2. ﬂ 1 v T f

,o 1 2. 3. to. s. 51. 7,
Sample. n (X103)

Figure 3.1: The acoustic waveform of the word “four”.

 

3.2 Simulation Results of two AR(2) models

In this section, the identiﬁcation of two time varying AR(2) models of the form
y(n) = 010090! - 1) + adult/(n - 2) + v(n) (3-1)

is considered. Two sets of AR parameters were derived using linear prediction (LP)
analysis of order two on utterances of the words “four” and “six” by an adult male
speaker. The acoustic waveforms of these two words are shown in Figs. 3.1 and 3.2.
While more meaningful analysis of speech would involve model orders of 10-14 (see,
e.g., [47]), this small number of parameters is used here so that the results are easily
illustrated. The adaptive LP algorithm used to compute these parameters is de-
scribed in [44]. The data were sampled at 10 kHz after 4.7 kHz loxvpass ﬁltering, and

53

 

 

 

 

 

 

 

 

 

 

 

-2. ﬁ ‘r 17 r T r

o 1 2. 3. s. s. s. 7.
Sample. n (X103)

Figure 3.2: The acoustic waveform of the word “six”.

 

the “forgetting factor” in the LP algorithm (see [44]) was 0” = 0.996. A 7000 point
sequence, y(n), for each case (“four” and “six”) was generated by driving the appro-
priate set of parameters with an uncorrelated sequence, v(n), which was uniformly
distributed on [—1,1]. v(n) in each case was generated using a random number gener-
ator based on a subtractive method [48]. In the simulations below, the conventional,
adaptive, suboptimal, and adaptive suboptimal SM-WRLS algorithms are applied to
the identification of the a.- parameters.

In the following, the simulation results are shown and discussed. In each ﬁgure,
there are two diﬂerent frames, one for each parameter. Each frame shows two curves,
one for the true parameter, the other for the estimate obtained by the algorithm
under study. Additionally, the true parameters for the words “four” and “six” are

shown in Figs. 3.3 and 3.4, respectively. These are provided as a reference in cases

54

 

 

 

-2 d

 

 

 

-3 w r r v v r
3. ‘9. 5 6 7
Sample. n (x103)

(a)

O
'-
N

 

L

-2

 

 

 

‘3 r r r T r r

3 ‘9 5 6 7
Sample. n (x103)

(b)

O
N

Figure 3.3: The “true” parameters for the word “four”. (a) Parameter a1 and (b)
Parameter a2.

 

55

 

 

 

 

 

 

 

’3 1 Y 1 Y I r
O l, 2 3 ‘0 5, 6. 7

Sample. n (X103)

(9)

 

 

‘3 I r r v r 1
O l 2 3. ‘t 5. 6 7

Sample. n (X103)

(1))

 

 

3

Figure 3.4: The “true” parameters for the word “six’.
Parameter 02.

(a) Parameter a1 and (b)

 

56

where the true curves are difﬁcult to discern.

3.2.1 Conventional RLS and SM-WRLS Algorithms

The power of the SM-WRLS algorithm is evident when compared with the con-
ventional RLS [35] algorithm. As a basis for further discussion, we ﬁrst show this
comparison. Figures 3.5 and 3.6 show the simulation results for the word four using
the RLS and the SM-WRLS algorithms, respectively, and Figs. 3.7 and 3.8 show the
simulation results for the word six. It is evident that SM-WRLS outperforms RLS in
terms of its tracking capability, and it is critical to note that this improved perfor-
mance comes with improved computational efﬁciency. In this case SM-WRLS uses
only 1.86% and 2.16% of the data for the words four and six, respectively, and yet
yields better parameters estimates almost all the time. It is important to note that
SM—WRLS tracks the time varying parameters faster than RLS. This is manifest in
both examples, especially the word six (see Figs. 3.7 and 3.8).

While the main theme of Section 2.4 is the development of adaptive SM-WRLS
methods, it is noted that the “unmodiﬁed” SM-WRLS algorithm apparently has
adaptive capabilities in its own right. While SM-WRLS is developed under the as-
sumption of stationary system dynamics, it is capable of behaving in this manner in
certain circumstances because of the special weights used. Recall that these optimal
data weights have the interpretation as parameters which minimize the ellipsoid vol-
ume. Note, however, that these weights multiply the corresponding equations, and
therefore, different equations have diﬂerent weights. The SM-WRLS algorithm deter-
mines the value of each weight depending on the “amount” of new information (from
the SM point of view) contained in an equation. Hence, an equation with “no new
information” is likely to be rejected ()(n) = 0) whereas an equation with a signiﬁcant
amount of information (such as in the case of fast changing dynamics) is likely to be

heavily weighted. This intuitively accounts for the inherent adaptation behavior of

57

 

 

 

 

 

 

 

 

 

 

 

3
hue
a l
a
. « \
estimate
0‘ -4
-1 -(
-2 d
-3 1' ﬁr T T 7* V
O l 2 3 '4 5. 6. 7,
Sample. n (X103)
(9)
3
2
1
estimate
0 ‘ /
-1 cl \
_a d 11118
’3 1% r T r
0 l 2 3 ‘4 5 6 7

Sample. n (X103)

(13)

Figure 3.5: Simulation results of the conventional RLS algorithm for the word four.

 

58

 

 

 

 

 

 

 

 

 

 

 

’3
Hue
/’
2 1
\ WW
1 O
esunuue
° l
-1 4
-2
-3 T T *7 Y r j
0 l 2. 3. ‘9 5. 6 7
Sample. n (X103)
(21)
3
2 i
1
o . estimate
*— __ W A
-1 \
-2 true
’3 r ‘1? v 1 v v
0 l 2 3. ‘6. 3. 6. 7.

Sample. n (X103)

(b)

Figure 3.6: Simulation results of the SM-WRLS algorithm for the word four. 1.86%
of the data is employed in the estimation process.

 

59

 

 

 

 

-, l estimate

 

 

 

-3 1 v T Y Y

Sample. n (X103)

(9)

 

estimate

 

 

‘3 T 7 r T T r
O l 2 3. '4 5 6 7

Sample. n (X103)

(1))

 

Figure 3.7: Simulation results of the conventional RLS algorithm for the word six.

 

60

 

 

 

   
 

 

 

 

 

3
a ’/true
1
o .
estimate
-1 -l
-2 <
‘3 Y F T r j v
0 l 2 3 ‘+ 5 6 . 7 .
Sample. n (X103)
(a)
3
a 1
1
estimate

. / M
\

hue

 

 

 

0 1r ZY 3r 9‘ 3V 6'. 7 .
Sample. n (x103)

(b)

Figure 3.8: Simulation results of the SM-WRLS algorithm for the word six. 2.16% of
the data is employed in the estimation process.

 

61

the unmodiﬁed SM-WRI.S algorithm.

However, as will be seen below, it is not possible to depend upon SM—Wltl..S to
reliably behave in this adaptive manner, particularly in cases of quickly varying system
dynamics. Each time a new equation is accepted, the ellipsoid volume decreases and
the “conﬁdence” in the current estimate increases. In a situation in which the signal
is varying rapidly and the parameters are moving away from their “current” locations.
then the algorithm accepts incoming equations to incorporate the new information
into the estimate, and the ellipsoid volume decreases rapidly, eventually becoming
very small. As the parameters continue to move rapidly away from their current
locations, they eventually move outside the shrinking ellipsoid which becomes an
invalid bounding ellipsoid. This condition indicates that a violation of the theory
has taken place, and therefore, the unmodiﬁed SM—WRLS algorithm is no longer
guaranteed to work properly. However, it is noted (empirically) that the unmodiﬁed
SM-WRLS algorithm performs well when applied to slow time varying systems.

Next, the simulation results of the several variations on the general adaptive SM-
WRLS algorithm are shown. It is worth reiterating that when it becomes difﬁcult
to distinguish the true parameters from their estimates, the reader can refer to Figs.
3.3 and 3.4 for clariﬁcation. Also note that the ﬁgure captions will not contain the

description for the two frames for it is implied that frame (a) shows parameter a, and

frame (b) shows parameter a2.

3.2.2 Adaptive SM-WRLS Algorithms
3.2.2.1 Windowing

Figures 3.9 and 3.10 show the simulation results of the windowed SM-WRLS algorithm
for the words four and six, respectively, using a window of length 1000. This strategy
uses only 5.69% and 5.44% of the data for the words four and six, respectively.

Since each equation rotated in is eventually rotated out of the system, this strategy

62

 

 

 

 

 

 

 

 

 

 

 

0
estimate
-1
-2
‘3 v r 1 Y T Y
0 l 2. 3 '4 5 6 7
Sample. n (X103)
(31)
3
2
estimate
1 i /
o .l
-l
nue
-2
-3 1 r r Y T 7
0 l a 3 *1 S 6. 7.

Sample. n (X103)

0))

Figure 3.9: Simulation results of the windowed SM-WRLS algorithm for the word
four (I = 1000). 5.69% of the data is employed in the estimation process.

 

63

 

 

estimate

 

 

 

 

ﬁne
-2
-3 TV Y r f Y T
0 t 2 3 H 5 6 7
Sample. n (X103)
(a)
3
2 J
estimate

hue

 

 

 

‘3 Y Y Y f ‘7 j'w

o l a 3 9 5 s 7
Sample. n (X103)

(1))

Figure 3.10: Simulation results of the windowed SM- WRLS algorithm for the word
six (I = 1000). 5. 44% of the data 18 employed 1n the estimation process.

 

64

cﬂectivcly uses about twice the number of equations rotated in. More data than with
the unmodified SM-WRLS algorithm are used, but more accurate estimates result
and the time varying parameters are tracked more quickly and accurately. This can
easily be seen when the parameter dynamics change abruptly near the point 2000 for
the word four (see Fig. 3.9) and near the points 2000 and 4500 for the word six (see
Fig. 3.10).

3.2.2.2 Graceful Forgetting

Figures 3.11 and 3.12 show the simulation results of the graceful forgetting SM-
WRLS algorithm when rotating out 0.1% of each of the equations (p = 10'3) that
was accepted in the past 1000 recursions. This strategy uses only 6.19% and 4.89%
of the data for the words four and six, respectively. Note that this technique uses
comparable percentages of the data to those used by the windowed strategy and yields
smoother estimates. Although the algorithm uses very small percentages of the data,
the value of p used here might not be practical because it means that the algorithm
will rotate out each equation that was initially accepted 1000 times, which is clearly a
computational burden. Therefore, this strategy effectively uses about 1000 (or [1")
times the number of equations rotated in. Depending on the nature of the problem,
practical values of 11 may range from 0.002 to 0.01 with an effective window of length

500 to 100.

3.2.2.3 Selective Forgetting

As noted in Subsection 2.4.1.3, the selective forgetting strategy selects the equations
to be (partially or completely) removed from the system based on user deﬁned crite-
ria. The selection procedure used here is to remove the equations starting from the
ﬁrst accepted equation remaining in the estimate at a given time, and proceeding
sequentially until some other condition is satisﬁed. The determination of when to

apply the forgetting procedure and when to stop removing equations at a given time

65

 

 

Hue

o J \

\

—1 4 estimate

 

 

 

'3 r r r 1 I T

O
.-
N
to)
£
u
(D
‘1

Sample. n (X103)

(9)

 

estimate

1
p.-
1

true

 

 

 

31. ~.' 5’ E 7 ,
Sample. n (X103)

(b)

O
p.
N

Figure 3.11: Simulation results of the graceful forgetting SM-WRLS algorithm for
the word four (p = 10'3). 6.19% of the data is employed in the estimation process.

 

66

 

 

 

 

 

 

 

 

 

3
Hue

2

1 ct

., \

8811111316
"1
-3 1
‘3 Y t T v r r
0 1 a 3 ‘1, 5. 6 7
Sample. n (X103)
(*1)
3
3.
estimate

1 4

o .l
-1 -] /

UB6
‘2
-3 T 7’ ﬁt 7 T Y
0 l 2. 3 "f 5, 6 . 7

Sample. n (X103)

0))

Figure 3.12: Simulation results of the graceful forgetting SM-WRLS algorithm for
the word six (11 = 10‘3). 4.89% of the data is employed in the estimation process.

 

67

is discussed ill the following.

When the true parameters of the word four are inspected (see Fig. 3.3), for ex-
ample, it is noted that they can be characterized as having slow time variations
everywhere except in the region from 2000 to 2300 where they have fast time varia-
tions. The fact that the parameters are changing very slowly in the ﬁrst 2000 points,
induces the algorithm to accept some points which, in turn, causes the ellipsoid vol~
ume to decrease. An increase in the “conﬁdence” of the estimate results. Near time
2000, the ellipsoid volume becomes very small. When the parameters move rapidly
away from their current location, they eventually move outside the ellipsoid which
is therefore no longer a valid bounding ellipsoid. An “animation” subroutine has
been developed which allows the user to see the locations of the true parameters and
their estimates with respect to the ellipsoid in the 2-D parameter space. When this
condition happens, it eventually leads to a negative value of n(n), (see (1.23)). For
a stationary system, n(n) is always positive [7], so that this condition indicates that
a violation of the theory (in particular, the violation of the assumption of stationary
dynamics) has taken place“. A similar condition was also reported by Dasgupta and
Huang [14] while applying their OBE algorithm to nonstationary systems. In our
simulation studies, it is found that a negative n(n) is an effective indicator of need
for adaptation, and this criterion is used as the prompt to begin selective forgetting.
Whenever accepting an equation causes n(n) to become negative, the algorithm starts
rotating out the equations which are selected based on the selection procedure until
n(n) becomes positive again.

Figures 3.13 and 3.14 show the simulation results of the selective forgetting strat-
egy described here. For the words four and six, respectively, this technique uses only
3.6% and 2.83% of the data, 18.7% and 56.1% of which are rotated out during the

adaptation process, and therefore, when counting the total number of equations r0-

 

6Mathematically, n(n) < 0 indicates an ellipsoid of negative dimensions.

68

 

 

 

 

 

 

a ‘ Hue
l < “\\\\
estimate
0
.. 1 4
-a
'3 r r 1 Y r v
o l a 3 w 5 s a
Sample. n (X103)
(«9)
3

 

 

 

 

o F a 3 9' S s, 7
Sample. n (X103)

(b)

Figure 3.13: Simulation results of the selective forgetting SM-WRLS algorithm for
the word four. 3.6% of the data is employed in the estimation process.

 

69

 

 

INC

 

 

 

 

 

 

 

 

 

o 4
estimate
-1 d
-2 an
-3 4 . . 1 T .
0 l 2 3 '4 5 6. 7
Sample. n (X103)
(21)
3
2
1 estimate
-1. d
nue
-2. 1
-3 , , r j r .
O l 2 3 9 S, 6 7

Sample. n (X103)

0))

Figure 3.14: Simulation results of the selective forgetting SM-WRLS algorithm for
the word six. 2.83% of the data is employed in the estimation process.

 

70

tated into and out of the system, this strategy effectively uses 4.27% and 4.41% of the
data. Compared to the windowed and graceful forgetting adaptive strategies, the sim-
ulation results show that the selective forgetting strategy yields smoother estimates
using even fewer data. It is important to recall that the percentages given in the
windowed and graceful forgetting adaptive strategies represent the total number of
equations used by these techniques, and do not account for the number of equations
rotated out of the system. Detailed complexity comparisons are made in Section 2. 5.

It should be pointed out that n(n) > 0 is only a necessary condition for the
true parameters to be inside the current ellipsoid. The fact that K. goes negative at
a particular time does not precisely determine the point at which system dynam-
ics began to change. In fact, n(n) < 0 indicates a rather severe breakdown of the
process indicating that the “true” parameters have moved well outside of the cur-
rent ellipsoid. However, it is precisely in cases of fast changing dynamics that this
“breakdown” occurs rapidly resulting in “n(n) < 0” in fact being a good locator of
changing dynamics which require “immediate” adaptation to preserve the integrity of
the process. In cases of slowly changing dynamics where the theory can be violated
without the appearance of negative 1:, SM-WRLS seems to be sufficiently robust to
these slow changes to make its own adjustments. The examples of Figs. 3.6 and 3.8

above illustrate this later point.

3.2.3 Suboptimal SM-WRLS

Figures 3.15 and 3.16 show the simulation results of the unmodiﬁed SM-WRLS al-
gorithm with suboptimal data selection. In this case, only 1.19% and 1.53% of the
data are used for the words four and six, respectively. Compared to the SM-WRLS
algorithm (see Figs. 3.6 and 3.8), the suboptimal technique uses slightly fewer data
but produces comparable estimates. It is interesting to note that most of the equa-

tions (97.6% for the word four and 94.4% for the word six) that are accepted by

71

 

 

estimate

 

 
    

 

 

 

 

 

 

 

 

-2 1
-3 . , , , f

0 l 2 3 “t 3. 7

Sample. n (x103)
(a)
3
2
1
estimate
0 .l /
A _W A

-x \
-2. 1 true
-3 V v 1 r r

O l 2 3. ‘9 5 7.

Sample. n (X103)

(b)

Figure 3.15: Simulation results of the SM-WRLS algorithm with suboptimal data
selection for the word four. 1.19% of the data is employed in the estimation process.

 

72

 

 

estimate

 

 

 

 

 

 

 

 

 

 

0 1' 2' 3r “1' 5'. 6i 7
Sample. n (X103)
(9)
3
2
1 .
8811111316
0 / HIE “7W
—1 _
uue
-2 4
~3. . r . . a r
O l 2 . 3 ‘+. 5 6 7

Sample. n (X103)

0))

Figure 3.16: Simulation results of the SM-WRLS algorithm with subOptimal data
selection for the word six. 1.53% of the data is employed in the estimation process.

 

73

the suboptimal technique are also accepted by the SM-WRLS algorithm. It is also
interesting to note that the equations that are accepted by the suboptimal technique

but not by the SM-WRLS algorithm lie mostly in regions of fast changing dynamics.

3.2.4 Adaptive Suboptimal SM-WRLS

It is noted in Section 2.4.1 that the general formulation of the adaptive SM-WRLS
algorithm is amenable to the suboptimal technique. The simulation results of the
selective forgetting SM-WRLS technique with suboptimal data selection are shown
in Figs. 3.17 and 3.18. This strategy uses only 1.89% and 1.86% of the data, 14.4%
and 48.5% of which are rotated out during the adaptation process, and therefore,
when counting the total number of equations rotated into and out of the system, this
strategy effectively uses 2.16% and 2.76% of the data.

Compared to the selective forgetting strategy (Figs. 3.13 and 3.14), the selective
forgetting technique with suboptimal data selection uses fewer data but produces
comparable estimates. On the other hand, when compared to unmodiﬁed SM-WRLS
with suboptimal data selection (Figs. 3.15 and 3.16), the selective forgetting subop-

timal technique uses more data but produces better estimates.

3.3 Simulation Results of an AR(14) model

In this section, the identiﬁcation of a time varying AR(14) model is considered. The
same procedure of Section 3.2 is used to generate the “true” AR parameters from an
utterance of the word “seven” by an adult male speaker. The acoustic waveform of
this word is shown in Fig. 3.19. In the following, the simulation results are shown and
discussed using the same format as that used in Section 3.2, however, only one true
parameter (04), which is the most challenging parameter for the algorithm to track,

is used for illustration and is shown in Fig. 3.20.

74

 

 

 

 

 

 

 

 

 

 

 

 

-1 l estimate
-a
‘3 1* 1 ﬁ Y r f
o 1 a. 3. a 5. s. 7
Sample. n (X103)
(a)
3.
a 4
1 4 estimate
0. -l
-1 q + ‘W
ﬁne
-2 ..
-3 1 v __r v v v
o 1 a 3. s. s. a 7
Sample. n (X103)
(1))

Figure 3.17: Simulation results of the selective forgetting SM-WRLS algorithm with
suboptimal data selection for the word four. 1.89% of the data is employed in the
estimation process.

 

 

 

Hue

.. l
\

 

 

 

 

 

 

 

estimate

_!‘
-2. .4
-3 r r v T v r

O t 2 3 ‘4 5, 6 7.

Sample. n (X103)
(80
3
a .J
‘ ‘ estimate
0
_‘ 4
true

.a~ d
-3 1 . , , r r

0 l 2 3 ‘t. 5. 6. 7

Sample. n (X103)

(b)

Figure 3.18: Simulation results of the selective forgetting SM-WRLS algorithm with
suboptimal data selection for the word six. 1.86% of the data is employed in the
estimation process.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

'2 T ﬂ 1 f 1 1*

o l a. 3 a. s. s. 7.
Sample. n (X103)

Figure 3.19: The acoustic waveform of the word “seven”.

 

 

 

 

 

3
2 4
1 a
O. 4
-l 1
-3 a
-3 . v v v v 1
0 l 2 3. H S. 6 7

Sample. n (x103)

Figure 3.20: The fourth “true” parameter (a) for the word “seven”.

 

77

 

 

, 4 estimate

 

-Z,

 

 

 

‘3 T w v v v r
O t 2 3 '9. S. 6. 7
Sample. n (X103)

 

Figure 3.21: Simulation results of the conventional RLS algorithm for the word seven.

 

3.3.1 Conventional RLS and SM-WRLS Algorithms

Figures 3.21 and 3.22 show the simulation results for the word seven using the RLS
and the SM—WRLS algorithms, respectively. Although both algorithms yield bad
estimates and do not track the time varying parameters, the SM-WRLS algorithm
which uses only 7.93% of the data yields better parameter estimates than those of
the RLS algorithm most of the time.

In contrast with the AR(2) results of Figs. 3.6 and 3.8, the “unmodiﬁed” SM-
WRLS algorithm for the AR(14) example shown in Fig. 3.22 fails to track the time
varying parameters. This is expected since the signal is varying rapidly (especially in
the range [2000—6000]) and the algorithm is no longer guaranteed to work properly
(recall the discussion of Section 3.2.1). By inspecting Fig. 3.22 carefully, it is clear
that the SM-WRLS algorithm loses its ability to track the signal in the range when

78

 

 

estimate

 

 

 

 

 

Sample. n (X103)

Figure 3.22: Simulation results of the SM-WRLS algorithm for the word seven. 7.93%

of the data is employed in the estimation process.

 

the signal is varying rapidly.

3.3.2 Adaptive SM-WRLS Algorithms
3.3.2.1 Windowing

This subsection tests and compares the simulation results of the windowed SM-WRLS
algorithm for the word seven using three different window lengths. Figures 3.23, 3.24,
and 3.25 show the simulation results using windows of lengths 500, 1000, and 1500,
and using 22.1%, 17.04%, and 14.34% of the data, respectively. In all three tests,
more data than with the unmodiﬁed SM-WRLS algorithm are used, but much more
accurate estimates result and the time varying parameters (one of which is shown) are

tracked more quickly and accurately. When comparing the effect of the window length

79

 

 

 

 

 

-3 Y Y Y T T r
0 l 2 3 ‘t S 6 7

 

Sample. n (X103)

Figure 3.23: Simulation results of the windowed SM-WRLS algorithm for the word
seven (1 = 500). 22.1% of the data is employed in the estimation process.

 

on the algorithm, it is noted that shorter window lengths use more data, but yield
more accurate estimates (most of the time) and track the time varying parameters
more quickly and accurately. However, if the window length becomes very short
(< 300), the variance of the estimates becomes very large because these estimates
involve a small window length which is effectively even smaller because of the small

fraction of data accepted.

3.3.2.2 Graceful Forgetting

This subsection tests and compares the simulation results of the graceful forgetting
SM-WRLS algorithm for the word seven using three different p values. Recall that
p represents the fraction of the equations removed from the system at each time.
Figures 3.26, 3.27, and 3.28 show the simulation results using p values of 10’3, 2x10‘3,

and 4 x 10‘3 (or effective windows of lengths 1000, 500, and 250), and using 18.66%,

80

 

 

-2

 

 

 

 

o 1' a 3. J 5' s . 7
Sample. n (X103)

Figure 3.24: Simulation results of the windowed SM-WRLS algorithm for the word
seven (1 = 1000). 17.04% of the data is employed in the estimation process.

 

27.63%, and 35.59% of the data, respectively. Note that this strategy is initiated at
time 100 to allow for the estimation to stabilize.

It is clear that as the value of )1 increases (or the effective window length decreases),
the algorithm uses more data (as expected), tracks the time varying parameters more
quickly, but yields inaccurate estimates at certain points. The latter point is more
evident in Fig. 3.28 where the effective window length is only 250 points. When com-
paring the results of the graceful forgetting technique with those of the corresponding
windowed technique (for example, compare Figs. 3.24 and 3.26 in which the effective
window length is 1000), it is noted that the graceful forgetting technique uses more

data and yields “comparable” estimates.

3.3.2.3 Selective Forgetting
Using the same selection criterion as that used in Subsection 3.2.2.3, the selective

81

 

 

to

 

estimate

 

 

 

 

Sample. n (X103)

Figure 3.25: Simulation results of the windowed SM-WRLS algorithm for the word
seven (1 = 1500). 14.34% of the data is employed in the estimation process.

 

forgetting strategy yields the simulation result shown in Fig. 3.29. This technique uses
only 12.89% of the data, 72.9% of which are rotated out during the adaptation process.
Therefore, it effectively uses (or rotates into and out of the system) 22.29% of the data.
It is noted that this technique slowly tracks the rapidly varying parameters and yields
estimates that are not as accurate as those produced by the other adaptation strategies
but uses fewer data. Recall that this technique produces highly accurate estimates in
the AR(2) examples (see Figs. 3.13 and 3.14), however, the signal variations in these

examples are not as severe as those in the AR(14) example.

3.3.3 Suboptimal SM-WRLSV

Figure 3.30 shows the simulation results of the unmodiﬁed SM-WRLS algorithm with

suboptimal data selection which uses only 4.74% of the data. Compared to the SM-

(1)
(\D

 

 

estimate

 

 

 

'3 1 v v 1—

o l a , 3 . w 5'. 5' 7
Sample. n (x103)

Figure 3.26: Simulation results of the graceful forgetting SM-WRLS algorithm for
the word seven (p = 10“”). 18.66% of the data is employed in the estimation process.

 

WRLS algorithm (see Fig. 3.22), the suboptimal technique uses fewer data (60% of
the data used by SM-WRLS) but produces comparable estimates. It is noted that
most (91.3%) of the equations that are accepted by the suboptimal technique are also
accepted by the SM-WRLS algorithm. It is also noted that the equations that are
accepted by the suboptimal technique but not by the SM-WRLS algorithm lie mostly

in regions of fast changing dynamics.

3.3.4 Adaptive Suboptimal SM-WRLS

Two adaptive suboptimal techniques are tested in this section. The first technique
is the windowed SM-WRLS algorithm with suboptimal data selection. Figures 3.31
and 3.32 show the simulation results of this technique which uses only 10.93% and

8.71% of the data when using a window of length 500 and 1000, respectively. Compared

83

 

 

-1 1 ii
true , J! ‘

-. . M ..

 

 

 

Sample. n (X103)

Figure 3.27: Simulation results of the graceful forgetting SM-WRLS algorithm for
the word seven ()4 = 2 x 10‘3). 27.63% of the data is employed in the estimation
process.

 

to the windowed SM-WRLS algorithm (see Figs. 3.23 and 3.24), the corresponding
adaptive suboptimal technique uses fewer data (50% of the data used by the win-
dowed SM-WRLS algorithm), however, it produces estimates that are comparable
to but not as smooth as those of the windowed SM-WRLS algorithm (see Figs. 3.31
and 3.32).

The second technique is the selective forgetting SM-WRLS algorithm with sub-
optimal data selection. The simulation result of this technique is shown in Fig. 3.33
which uses only 8.8% of the data, 63.3% of which are rotated out during the adapta-
tion process, therefore, it effectively uses 14.37% of the data. This technique produces
comparable estimates to those of the selective forgetting SM-WRLS algorithm shown

in Fig. 3.29 using fewer data.

84

 

 

 

 

 

 

“'3 I f r v 1* T

Sample. n (x103)

Figure 3.28: Simulation results of the graceful forgetting SM-WRLS algorithm for
the word seven ([1 = 4 x 10‘3). 35.59% of the data is employed in the estimation
process.

 

3.4 Roundoff Error Analysis

Recently, there has been an increasing interest in the performance of adaptive algo-
rithms in small wordlength environments [8, 49, 50, 51]. Rao and Huang [8] have
investigated the effect of small wordlengths on one of the OBE algorithms presented
in [14]. Their simulation studies were performed in integer arithmetic using a ﬁxed
point implementation of the OBE algorithm. They have shown that the OBE algo-
rithm yields consistently good estimates over a large range of wordlength and performs
better than the conventional RLS algorithm for small wordlengths [8].

Marshall and Jenkins [49] have presented a fast quasi-Newton (FQN) adaptive

ﬁltering algorithm which is quite robust with respect to small wordlength implemen-

85

 

 

 

 

 

 

Sample. n (x103)

Figure 3.29: Simulation results of the selective forgetting SM-WRLS algorithm for
the word seven. 12.89% of the data is employed in the estimation process.

 

tation and has comparable performance to that of RLS. However, this algorithm is
developed based on the assumption that the input is real and wide-sense stationary
which yields a symmetric and Toeplitz autocorrelation matrix. The F QN algorithm
appears to avoid the numerical problems reported for several fast RLS techniques
[51, 52]. The numerical problems consist of numerical inaccuracy in the results (per-
formance degradation) and numerical instability (overﬂows or underflows) which are
caused by ﬁnite wordlength computations [50].

All the simulations presented in the previous sections are performed by using C
with 32-bit, single precision, ﬂoating point arithmetic on a VAX 8600 mainframe
running under a Unix operating system. It is the main purpose of this section to
test the effect of smaller wordlength computations on the performance of the GR-

based SM-WRLS algorithm. The simulations presented here are performed with 16-

86

 

 

estimate

-2

 

 

 

 

‘3 ﬁr 1 17 T T

Sample. n (x103)

Figure 3.30: Simulation results of the SM-WRLS algorithm with suboptimal data
selection for the word seven. 4.74% of the data is employed in the estimation process.

 

bit, single precision, unnormalized ﬂoating point arithmetic. Whenever an overﬂow
(underﬂow) occurs, the algorithm sets the detected value to the maximum (minimum)
possible value which is determined by the wordlength used.

The roundoff error analyses are performed on the same AR(2) models used in
Section 3.2. Figures 3.34 and 3.35 show the simulation results of the SM-WRLS algo-
rithm for the words four and six using only 1.96% and 2.17% of the data, respectively.
Compared to the results of the SM-WRLS algorithm for the word four obtained when
using a wordlength of 32-bit (see Fig. 3.6), this algorithm uses slightly more data and
yields slightly better estimates when a 16-bit wordlength is used. The simulation
results for the word six when a 16bit wordlength is used are almost identical to those
obtained when using a wordlength of 32—bit (see Fig. 3.8).

Figure 3.36 shows the simulation results of the windowed algorithm for the word

87

 

 

 

.2 c1

 

 

 

Sample. n (x103)

Figure 3.31: Simulation results of the windowed SM-WRLS algorithm with subopti-

mal data selection for the word seven (1 = 500). 10.93% of the data is employed in
the estimation process.

 

four using a window length of 1000 and a wordlength of 16-bit. This strategy uses
slightly fewer data (5.4%) and produces comparable but slightly less accurate es-
timates compared with those of the corresponding results obtained when using a
wordlength of 32-bit (see Fig. 3.9).

Figure 3.37 shows the simulation results of the graceful forgetting SM-WRLS
algorithm when rotating out 0.1%, and using a wordlength of 16-bit. This strategy
uses about half the data (3.27%) but produces unacceptable estimates compared with
those of the corresponding results obtained when using a wordlength of 32-bit (see
Fig. 3.11).

Finally, Figure 3.38 shows the simulation results of the selective forgetting SM-

WRLS algorithm using a wordlength of 16-bit. This technique uses only 3.27% of

88

 

 

 

 

'3

 

Sample. n (X103)

Figure 3.32: Simulation results of the windowed SM-WRLS algorithm with subopti-

mal data selection for the word seven (1 = 1000). 8.71% of the data is employed in
the estimation process.

 

the data, 10.5% of which are rotated out during the adaptation process. Therefore,
it effectively uses (or rotates into and out of the system) 3.61% of the data. This
strategy uses slightly fewer data and produces comparable but slightly less accurate
estimates compared with those of the corresponding results obtained when using a
wordlength of 32-bit (see Fig. 3.13).

Except for the graceful forgetting strategy, the simulation results using a wordlength
of 16-bit are very encouraging. This is due to the infrequent updating behavior inher-
ent in the SM-WRLS algorithms, and hence, slower accumulations of roundoff errors.
Also, note that the orthogonal rotation used in the GR-based SM-WRLS algorithms
is known to be a numerically stable operation under the assumption that the orthog-

onal matrices are produced in the absence of roundoff errors [39, 50]. However, the

89

 

 

 

estimate

 

 

 

 

-3
Sample. n (X103)
Figure 3.33: Simulation results of the selective forgetting SM-WRLS algorithm with

suboptimal data selection for the word seven. 8.8% of the data is employed in the
estimation process.

 

graceful forgetting strategy requires performing a large number of downdates for each
previously included equation, which has a severe effect in degrading the performance
of the algorithm due to the accumulation of roundoff errors per iteration.

The most widely discussed fast identiﬁcation algorithm is the Fast Transversal
Filter (F TF) [52] which is a fast RLS algorithm, requiring 8m + 15 ﬂops per iteration
(or 0(m2) for small m) in its more stable form. It includes a “rescue” procedure to
prevent divergence due to ﬁnite precision effects, and to ensure numerical stability.
The rescue procedure is a fast initialization procedure, requiring 3m + 10 ﬂops per
invocation, in which all the accumulated quantities are sacriﬁced.

It is true that roundoff error experiments performed in this section are 0(m2),

however, it is clear from the simulation results presented in Sections 3.2 and 3.3

90

 

 

 

 

 

 

 

 

 

 

 

3
true
a /
W
1 -4 \
estimate
0.
-1‘
-2, q
’3 1 I v v
0 t 2 3. 9. 5 6. 7
Sample. n (x103)
(a)
a
2 4
1
o 4 esﬁnuue
/ _._. 7f _J’?!’
-l. '1 \
‘3 ‘ ttue
'3. v f v v r r
O t. 2. 3. '4. S. 6. 7.

Sample. n (X103)

0))

Figure 3.34: Simulation results of the SM-WRLS algorithm for the word four using
a 16-bit wordlength. 1.96% of the data is employed in the estimation process.

 

 

nue

‘ /

 

 

 

 

 

-1_ 4
-a
‘3 r v r r r T
_o l a. 3. s. 5. s. 7.
Sample. n (X103)
(a)
3
2.
estimate

L. / Jew...

 

 

 

true
-2
'3 T m r 1 r Y
0 l 2 3 ‘0. 5. 6. 7.
Sample. n (x103)

(b)

Figure 3.35: Simulation results of the SM-WRLS algorithm for the word six using a
16-bit wordlength. 2.17% of the data is employed in the estimation process.

 

 

 

 

 

 

 

 

 

 

 

 

 

3
2 1
true
1
0, ‘ \
-2 a
-3 Y 1 T F
0 l 2 3 ‘6. 5 6 7
Sample. n (x103)
(a)
3,
2.
estimate
1 " /
O.
.1 _A at}
-1.1 V'"
Hue
-2. q
-3 v r 1 T r I
.0 l 2. 3. ‘9. 5. 6. 7.

Sample. n (X103)

(1))

Figure 3.36: Simulation results of the windowed SM-WRLS algorithm for the word
four (I = 1000) using a 16-bit wordlength. 5.4% of the data is employed in the
estimation process.

 

93

 

 

 

 

 

 

-2.
-3, r v r v r r
O l. 2. 3. ‘9. 5. 6. 7.
Sample. n (X103)
(a)

3.

a J

1.

esnnune

 

 

 

 

-3 T 7 v v r 7

Sample. n (x103)

(b)

Figure 3.37: Simulation results of the graceful forgetting SM-WRLS algorithm for
the word four (a = 10‘3) using a l6-bit wordlength. 3.27% of the data is employed
in the estimation process.

 

 

 

true

 

 

 

 

 

 

 

 

 

estimate
-1, 4
—a m
‘3. v t v—i r I T
O t 2 3 W. 3 6 7
Sample. n (x103)
(a)
3
2
estimate
1
o It
_— ._ j
-l
ﬂue
-314
-3, y f j I r 1
O t 2. 3. H. S. 6. 7.

Sample. n (X103)

(b)

Figure 3.38: Simulation results of the selective forgetting SM-WRLS algorithm for the
word four using a l6-bit wordlength. 3.27% of the data is employed in the estimation
process.

 

95

that the suboptimal strategy produces estimates which are not very different from
those of the SM-WRLS algorithm and yet requires only m flops per checking, which
represents a signiﬁcant computational savings with respect to FTF. Ilowever, it is
essential to note that the suboptimal strategy is used to select points and is not
directly involved in the update procedure. Therefore, the results of the suboptimal
strategy with respect to small wordlength effects are not expected to be signiﬁcantly
different from those of the SM-WRLS presented in this section. In fact, since even

fewer computations are generally used in the suboptimal strategy, even better ﬁnite

precision characteristics could be expected.

 

96

Chapter 4

Architectures and
Complexity Issues

 

4.1 Introduction

It is noted in Chapter 1 that one of the reasons why the GR-based SM-WRLS for-
mulation is desirable is that it is amenable to contemporary computing architectures.
This chapter is devoted to the development of parallel hardware implementations
of the real scalar SM-WRLS algorithm and discussion of their advantages, particu-
larly with regard to their improved computational complexity which improve their
potential for real time applications. Section 4.2 presents a parallel architecture that
implements the SM-WRLS algorithm. Section 4.3 develops an adaptive compact par-
allel architecture that implements virtually any version of the real scalar SM-WRLS
algorithm. Finally, a detailed analysis of computational complexity issues is carried

out in Section 4.4.

4.2 Parallel Architecture for SM-WRLS

The use of a parallel machine beneﬁts the processing in terms of computational com-

plexity by reducing the process to one requiring 0(m) time, rather than 0(m2)

97

 

required for the conventional version of the algorithm [5, 6], where m is the number
of parameters in the system model. This speed-up is due to the parallel processing
inherent in the design.

The main computational requirements of the GR-based SM-WRLS formulation
are a GR processor (to effectively execute orthogonal triangularization) to update the
matrix [T(n) I d1(n)] at each step, and a back substitution (BS) processor to solve
for the scalar C(72) and also for the estimate S(n) at each n. Systolic processors for
these operations, based on the original work of Gentleman and Kung [42] and Kung
and Leiserson [53], are well known. It is the purpose of this section to manifest this
algorithm as a parallel architecture based on these processors.

The SM-WRLS algorithm of Fig. 1.2 is mapped into a parallel architecture. The
need for implementing the SM-WRLS algorithm on a parallel architecture arises from
the fact that portions of the algorithm are compute-bound, speciﬁcally, updating the
matrix [T(n) | d1(n)] and computing the value C(11) and the parameter vector 8(a).
The architecture that speeds up the computation of these quantities and satisﬁes
the desirable characteristics of systolic arrays (SA’s) is shown in Fig. 4.1. Note that
although this architecture is designed based on SA design methodologies, it is used
here to process one equation at a time (more on this below), and therefore, is not
used as a SA. This architecture provides an improvement over that described in [9]
by replacing the global buses with local buses for communication between adjacent
cells. For simplicity of notation, the ﬁgure shows a purely autoregressive case with
p=3; AR(3). Once the processor is understood, it should be clear that the architecture
is perfectly capable of handling the general ARMAX(p,q) case. In the discussion
below, the vectors r(n) and 9(n) are used, however, the architecture of Fig. 4.1
uses the vectors y(n) and 5(a) instead to denote the special case AR(S), where
y(n) = ly(n - llyfn - 2)y(n - 3)lT-

The architecture is composed of two SA’s, several memory management units (i.e.,

98

 

y(n)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

lldlllz MAU [ITDMX DMX

F

I

F

O

8‘ ' 82 ' g" ‘— MPX MPX
G(n+1) 4——— MAU _—
t...’

. 33 . 32 . a]

 

 

 

 

 

 

 

 

 

Figure 4.1: Systolic array implementation of the Givens rotation-based SM-WRLS
algorithm. For simplicity of notation but without loss of generality, the ﬁgure shows
a purely autoregressive case with p=3; AR(3).

 

99

[first-in First-out (FIFO) and Last-in First—out (Lll’O) stacks7), multiply-add units
(M AU’s), multiplexers (MPX’S), and demultiplexers (DMX’S). The ﬁrst SA is a trian-
gular array that performs orthogonal triangularization using GR’s [42, 43] which are
particularly suitable for solving recursive linear LS problems. The diagonal (circular)
cells perform the “Givens generation” (GG) operations and all other (square) cells in
the triangular array perform the GR operations. There is a delay element at the lower
right-hand corner of the triangular array that is used to synchronize the ﬂow of the
generated entries into the FIFO stacks and to simplify the control of these stacks once
they are ﬁlled and ready to output their contents to the BS array. The operations
performed by this array are shown in Fig. 4.2 [42, 43]. Therefore, the triangular array
rotates the new equation into the upper triangular matrix [T(n) I d1(n)], where the
t,,- cells update the matrix T(n) and the right-hand column (dlj) cells update the
vector d1(n). The element t,, denotes the i j‘h element of the matrix T(n) and the
element d1,- denotes the j‘“ element of the vector (h(n).

The second array is a linear array that performs the BS operations shown in
Fig. 4.3 [53]. Note that the same BS array is used to solve for the vectors g(n + 1)
and B(n) with the data provided to the appropriate cells in the required order by the
FIFO and LIFO stacks. The FIFO stacks feed the lower triangular matrix TT(n) to
solve for the vector g(n + 1), and hence, the value G(n + 1). The LIFO stacks feed
the upper triangular matrix T(n) to solve for the parameter vector g(n). The values
C(n + 1) = [I g(n + 1) [[3 and I] d1(n) I]; are generated by the MAU’s shown in
Fig. 4.4. The number of segments in each stack is equal to the number of elements
the stack holds. Therefore, the leftmost stack consists of m segments, whereas the

rightmost stack has only one segment.

 

7Note that the architecture shown in Fig. 4.1 does not include any of the LIFO stacks that
were used to hold the matrix T(n) in the architecture reported in [9]. This is achieved by slightly
increasing the complexity of the cells used in the triangular array so that they can be used as storage
elements as well. This is facilitated by the diagonal interconnections between adjacent cells which
now constitute the LIFO stacks.

100

 

 

 

 

 

 

c = 1
x- s = 0
In t = t.
tin } 01“ 1n
0 ( ) else{
c,s
temp = [x2 + (xin)2]1/2
t c = x I temp
t
01.1 S = Kin / temp
(a) X = temp
tout = x
1
tin “in If(xin=0&c=l&s=0)
] tout = tin
else{
(c.s) —-u x e (cs) x = cx + sxin
l XOUt = 'S X + C Kin
t _
xout out tout - X
l
(b)
[in
tout = t1n
tout
(C)
When the array is used as LIFO stacks: First cycle: tout = x (tij cells)
xOUt = X (d1 06118)

All following cycles: tout=tin (tij cells)

xout = xin ((11 cells)

Figure 4.2: The operations performed by the cells used in the triangular array of

Fig. 4.1. (a) The Givens generation (GG) cells, (b) the Givens rotation (GR) cells,
and (c) the delay element.

 

lOl

 

1i

 

 

y.
xi,out x, l xi,out = (bi'yi)/ aii
1,0ut
bi
(a)
aij
yl out ‘— ‘__ y 1.1!! ..
x, ____,. __+ )5 yi,out " yi,in+aijxj
J ____.
(b)

Figure 4.3: The Operations performed by the back substitution array. (a) The left-end
processor and (b) the multiply-add units. The initial y;,,-,, entering the rightmost cell
is set to 0.

 

 

a . d=a+b.c

 

 

a G(n) = llg(n)l|2

 

- g3 82 81

XZU

 

 

 

 

 

 

 

Figure 4.4: The multiply-add unit used in Fig. 4.1.

 

The system shown in Fig. 4.1 works as follows. The ﬁrst m+1 equations (with
appropriate weights) enter the triangular array (from the top) in a skewed order, and
the matrix [T(n) | d1(n)] is generated and stored inside the cells. A shift register with
appropriate feedback connection and data sequencing can be used to hold and feed the
equation to the array. The initial upper triangular matrix residing in the array, and
corresponding to the ﬁrst m +1 equations, is ready after 3m +1 GG time cycles. The
CO time cycle is that of the triangular array performing the CG operations without
square roots, which is the time required to perform ﬁve ﬂoating point operations
(ﬂops) [43, 55], where one flop is deﬁned as one ﬂoating point multiplication plus one
ﬂoating point addition. Note that in order to prevent data collision, the ﬂow of data
in the triangular array moves along a corresponding wavefront and is controlled by
the slowest cells in the array, via, GG cells. Note that the data are fed to the array

one (skewed) equation at a time, therefore, the contents of each cell remain constant

103

after the completion of the current recursion. After the new equation is rotated into
the matrix [T(n) I d1(71)], the vectors g(n + 1) and B(n) are computed. All the t,,-
cells in the triangular array load their contents on the to,“ lines (tau. «- ar), and then
pass these elements across the diagonal lines (tau, +— t,,,) (see Figs. 4.1 and 4.2). This
obviates the LIFO stacks. The FIFO stacks are still needed, however, to compute the
vector g(n + 1). The FIFO stacks are ﬁlled with the elements of the lower triangular
matrix TT(n) as they are generated. This is done by loading the ti, entry on the
to... line (to... i- 2:) when it is generated. This entry propagates down the diagonal
cells (with the function to,“ +— t,-,,) until it arrives at and ﬁlls the appropriate FIFO
stack. For the cells in the right-hand column, which generate the vector d1(n), the
operations are different because it is this column that constitutes the LIFO stack for
the vector d1(n). Hence, after the new equation is rotated into the array, all the cells
in the right-hand column load their contents on the 2:0... lines (mom «— as), and then
they pass these elements down the column (am «- xgn) (see Figs. 4.1 and 4.2). Note
that the output am leaving the bottom cell in this column passes through the delay
element and is routed to both the MAU and the MPX feeding the (11,- elements to
the BS array. Note that the elements d1", and tmm leave the triangular array at the
same time because of this delay element. The timing diagram of the triangular array
is shown in Table 4.1. In this table, the inputs refer to the elements fed to the cells
in the top row. The circle (0) represents the CG cell and the square ([3) represents
the GR cell (see Fig. 4.1). The outputs refer to the elements that are produced in the
array cells and are written column wise; i.e., the ﬁrst column in the table represents
the ﬁrst column in the array, and so on.

The BS array is used to solve for the vectors g(n + 1) and 9(a). The vector
g(n +1) is solved using (1.31) and the parameter vector 9(a) using (1.28). Therefore,
the vector g(n + 1) is generated from the matrix TT(n), which is residing in the

FIFO stacks, and the vector :e(n + 1) which is available. The entries are fed to the

104

 

Table 4.1: The timing diagram of the triangular array of Fig. 4.1.

 

 

Inputs Outputs
Time 0 a a D [T(n) l dd
0 y(n - 1)
1 y(n — 2) t1]
2 y(n — 3) tn
3 y(n) t2; t13
4 t23 du
5 tss €112
6 (113

 

 

 

 

 

 

BS array every other BS time cycle, where the BS time cycle is the time required
to perform one ﬂop. As the 9.- entries are output from the left-end processor of the
BS array, they enter the MAU to generate the value C(n + 1) after 2m + 1 BS time
cycles. Likewise, the parameter vector 9(n) is generated using the matrix T(n) and
the vector (h(n) which are stored in the triangular array. Starting one BS time cycle
after the initiation of the ﬁrst BS Operation, the appropriate entries (of the second
BS operation) are also fed to the BS array every other BS time cycle. The parameter
vector 9(n) is output from the left-end processor of the BS array in reversed order
and interleaved with the vector g(n +1) as shown in Fig. 4.1. The value [I d1(n) I]: is
generated using a MAU one BS time cycle after the last (m‘h) element of the vector
dl(n) is generated. The timing diagram of the BS array is shown in Table 4.2 in
which the inputs refer to the elements fed to the shown cells, and the outputs refer
to the elements produced by the left-end processor in the array.

The values n(n) and c:(n+ 1) are then computed, and hence, the value An.“ which

determines whether the new equation is to be accepted or not. If the new equation is

105

 

Table 4.2: The timing diagram of the back substitution: array of Fig. 4.1.

 

Inputs Outputs

Time 0 C! D O

 

0 t1], y(n "' 1)

1 ‘33. d13 t12 91
2 in. y (n " 2) t23 tn 33
3 t22. (112 in in 92
4 t3, y(ﬂ — 3) in £22
5 in. du 93
6 .

 

 

 

 

 

 

accepted, then the weighted new equation enters the triangular array and the same
procedure described above takes place producing a new [T(n + 1) | d1(n + 1)] matrix
after 2m + 1 CG time cycles, and therefore, an updated C(n + 2), 8(n + 1), and
n(n + 1). On the other hand, if the new equation is rejected, then the triangular
array preserves its contents (hold state), but the value C(n + 2) is updated to make
the decision concerning the next equation. In the latter case, the same TT(n + 1)
matrix is used as the previous TT(n) matrix, and hence, the feedback on the FIFO

stacks. This procedure is repeated for every new equation.

4.3 An Adaptive Compact Parallel Architecture

The basic idea behind the compact architecture is to map the triangular array of
Fig. 4.1 into a linear array (called the GR array), that is, mapping all of the CG cells
into one GG cell and all the GR cells that are on the same diagonal into one GR cell.

This constitutes a permissible schedule because the projection vector, cf, is parallel to

106

the schedule vector, .3, and all the dependency arcs flow in the same direction across

the hyperplanes [36, Ch. 3]. In other words, this schedule satisfies the conditions

5%? > 0 (4.1)

and 3‘76 2 0, for any dependence arc E. (4.2)

The compact architecture implementation of the adaptive SM-WRLS algorithm is
shown in Fig. 4.5. The operations performed by this architecture are similar to those
of Fig. 4.1 with the exception that the CG and GR cells are now capable of performing
back rotation (see Fig. 4.6) and are embedded in a slightly more complicated modules
needed for scheduling. These modules are called GG’ and GR’, and are shown in
Fig. 4.7.

This architecture uses 0(m) cells (one GG’ cell and m GR’ cells) compared with
0(m2) cells (m GG cells and ”—2331 CR cells) used in the architecture shown in
Fig. 4.1, and yet has the same computational efﬁciency (per equation). Note however
that the LIFO stacks that were embedded in the triangular array of Fig. 4.1 are now
needed to hold the matrix T(n).

The system shown in Fig. 4.5 works as follows. Each equation (with its opti-
mal weight) enters the GR array (from the top) in a skewed order, and the matrix
[T(n) I d1(n)] is generated and stored in the appropriate memory units. Note that
the GR array can operate in two modes, forward (6 = +1) and backward (6 = —-l)
rotation modes (see Fig. 4.6). In the backward rotation mode, the equation to be
(partially or completely) removed is re-introduced to the GR array with the appropri-
ate weight (see Section 2.4.1). At the end of each recursion, the FIFO stacks contain
the lower triangular matrix TT(n) needed to solve for the vector g(n +1), and hence,
the value G(n+1). The LIFO stacks contain the upper triangular matrix T(n) needed

to solve for the parameter vector g(n). The values C(n + 1) = II g(n + 1) H?2 and

107

 

y(n)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

y(n-3)
y(n-Z) -
NH)
4 l
66 ‘GR' GR GR
‘ 4,
E2“ [2ij Mad 1
F I.7 F L F L
1 1 I 1 I 1*
F r F F F F
o o o o o o
81- 82- 83 W-
m m I11!!!
G(n+1) e—i MAU
.a, . a, . a, —'l___. H
m.
lldll2
Y(n'l) F d MAU —’ I
. , l u
y(n-Z) F '
. o d12
Y(n'3) T

Figure 4.5: A compact architecture implementation of the adaptive SM-WRLS algo-
rithm.

 

108

 

If (Kin = O){

xm C=l
s = O
1
else{
x(n-l) ® ((3,8) x(n) = [x(n-1)2 + 5(xin)2] 1/2
c = x(n-1)/x(n)
s = xin / x(n)
X(n) }
(a)
xout xin

(C’s) _, F, (c,s) x(n): cx(n-l) + sxin5

X(n-1) ——q

T x0“, = -s x(n-l)5 + c xin5

x(n)

 

 

(b)

Figure 4.6: The operations performed by (a) the CG and (b) the GR cells used in the
modules of Fig. 4.7. 6 = +1 (—1) for rotating the equation into (out of) the system.

 

109

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

“XII-I) CG 7 (C,S)
Q
(a) n(n) "
y(n-k)
Gilli!) < £113“)
or diiln) I MPX | or ﬁlm)
(c.8) > (0.8) V
tg(n-1) ‘ GR
le-(n) <——— dIJ-lm)

 

 

 

 

 

(b)

 

33(1)) v

Figure 4.7: (a) The GG’ module and (b) the GR’ module used in the architecture of

Fig. 4.5.

 

110

 

Table 4.3: The timing diagram of the Givens rotation (GR) array of Fig. 4.5.

 

 

Inputs Outputs

Time O U C] D O D Cl Cl

0 y(n — 1)

1 y(n " 2) tn

2 y(n " 3) t1:

3 y(n) ‘22 tl3

4 in 6111

6 d13

 

 

 

 

 

 

|| d1(n) “3 are generated by the MAU’s. Note that the values which were propagating
downward in the triangular array of Fig. 4.1 are now propagating leftward due to
the new scheduling. Note also that the vector d1(n) is treated differently from the
matrix T(n). When the element d1,- is computed, it is stored in an internal register
in the GR’ cell (see Fig. 4.7). After generating and storing the matrix [T(n) I d1(n)],
the processor is ready to compute the vectors g(n + 1) and é(n) using the BS array.
The vector (h(n) is downloaded into the latches which serve as a LIFO stack used in
conjunction with the other LIFO stacks (containing the matrix T(n)) to solve for the
parameter vector é(n). The timing diagram of the GR array is shown in Table 4.3
in which the input (output) columns show the elements that are input (output) to
(from) the corresponding CG (0) or GR (Cl) cells. Compared to the triangular array
of Fig. 4.1, it is noted that the cell utilization per update (or downdate) has increased
by a factor of 2.25 for the case when m = 3, or by W in general. The operations
and timing diagram of the BS array is described in detail in Section 4.2.

The architectures of Figs. 4.1 and 4.5 can also be used for the suboptimal SM-

111

WRLS algorithm, however, they are not utilized to the same extent as they are in
the SM-WRLS algorithm because the suboptimal SM-WRLS typically uses fewer
data. The infrequent updating feature of this algorithm (and virtually all SM-WRLS
algorithms) might provide processing advantages in the systolic (and other parallel
processing) schemes by permitting the sharing of processing time and resources.
The complex scalar case can also be implemented using the same architectures of
Figs. 4.1 and 4.5 which now perform complex GG and GR Operations. These opera-
tions are well-deﬁned and are found in [56]. However, the generalized complex vector
case is not readily mapped into similar architectures. The generalized architecture

that efficiently implements this case requires further research.

4.4 Computational Complexities

The computational complexities (in flops per equation) for the scalar sequential GR-
based SM-WRLS algorithm and for that implemented using the architecture of Fig.
4.1 are shown in Table 4.4. Note that the complexities of the parallel GR-based
SM-WRLS algorithm shown in Table 4.4 are parallel complexities in the sense that
they denote the effective number of operations per equation, though many processors
can be performing this number of operations simultaneously. Accordingly the parallel
complexity indicates the time it takes the parallel architecture to process the data
regardless of the total number of operations performed by the individual cells. The
GG and GR operations constitute the main computational load of the algorithm as
shown in Table 4.5. In this table, the number of flops associated with the GR’s is
multiplied by five to account for the CG cycle time (see Section 4.2). These oper-
ations are avoided when the equation is rejected, and thus, a significant savings in
computation time. Since this technique uses 0(m) ﬂops per equation when imple-

mented using the parallel architecture, it is clearly advantageous with respect to the

112

 

Table 4.4: Computational complexities (in flops per equation) for the real scalar

sequential and parallel GR-based SM-WRLS algorithms.

 

 

 

 

Covariance and Example

SM-WRLS Algorithm Checking Solution Update (flops)
Sequential GR-based .5m2 + 2.5m + 13 2.5m2 + 10.5m + 5 160
Sequential Suboptimal GR-based (m + 1) + 8(.5m2 +1.5m + 12) 2.5m2 + 10.5m + 5 55
Parallel GR-based 3m + 14 11m + 10 68
Parallel Suboptimal GR-based (m + l) + 5(2m + 13) 11m + 10 26

 

 

 

 

 

 

 

 

Table 4.5: Parallel computational complexities (in ﬂops per equation) for the various
SM-WRLS algorithms using the implementations of Figs. 4.1 and 4.5.

 

 

Element Computed ﬂops per equation
63,02 + 1) m + l
Coefﬁcients of quadratic (1.26) 7
A‘(n + 1) 5 + f
C(n +1) and B(n) 2m + 1

 

If the equation is accepted:

Updating the equation m + 1 + f
Givens rotations 5(2m + 1)
x(n) 4

 

 

 

 

 

113

 

original 0(m’) sequential formulations [5, 6, 20].
If the fraction of the data accepted by the SM-WRLS is r (r is typically less than

30% [7]), then the total parallel computational complexity is given by

(3m + 14) + r[11m +10] (4.3)

flops per equation.

The adaptive compact architecture of Fig. 4.5, which has slightly more complicated
cells, is as efﬁcient as the architecture of Fig. 4.1. The only difference is that the
architecture can be used to rotate out (part of) an equation which was previously
rotated in. Therefore, the parallel computational complexity (per equation) does not
change. However, in the windowed technique, for example, there might be a need to
go through updating the system twice for a single equation; ﬁrst to rotate an equation
out (if it was accepted) and then to rotate the new equation in (if it is accepted). The
architecture (of Fig. 4.5) can be visualized as operating in two modes, the first mode
is when it is rotating an equation into the system (6 = +1) and the second mode is
when it is rotating an equation out of the system (6 = —1). Note that the two modes
have the same parallel computational complexity with some addition operations in
one mode replaced by subtraction operations in the other mode (see Fig. 4.6).

The total parallel computational complexities of the general adaptive SM-WRLS
algorithms (see Section 2.4.1) depend on the adaptive strategy employed, the per-
centage of the data accepted, and the number of times the algorithm rotates out an
equation from the system (whether partially or completely). Consider the windowed
adaptation, for example, which effectively slides a window of ﬁxed length through the
data by appropriate sequencing of rotating particular equations into and out of the
system. Suppose that the fraction of the data accepted (rotated in) by the windowed

SM-WRLS algorithm is r and the fraction of the data removed (rotated out) from

114

the system is u (u < r), then the total parallel computational complexity is given by
(3m+ 14)+(r+u)[11m+ 10] (4.4)

flops per equation. This expression also holds for the selective forgetting strategy,
however, the graceful forgetting strategy uses the same expression with it replaced by

I

if u. It is important to note that the adaptive techniques typically use more data

but produce better estimates.
To show the computational savings when using the “suboptimal” SM-WRLS al-
gorithm, it is noted in Section 2.3 that at each recursion, we only need to compute

c3,_1(n) and check if following condition holds (written here for the real scalar case)

cf._1(n) > i“(n) , (45)

whereas in the SM-WRLS algorithm, the coefficients of the quadratic (1.26), cf,_1(n),
C(n), and )(n) must be computed before making the decision. In the suboptimal
case, these quantities are computed only if condition (4.5) is met, and then the new
equation is accepted if the optimal weight is positive.

To calculate the total computational complexity for the suboptimal SM—WRLS
algorithm, let us denote the fraction of the data satisfying the condition (4.5) by
s (s < r) and the fraction of the data accepted by the SM-WRLS algorithm after
passing the test (4.5) by t (t S 3). Then the total parallel computational complexity

for the suboptimal algorithm is given by
(m + 1) +s[2m+ 13] +t[11m +10] (4.6)

flops per equation, which clearly shows signiﬁcant improvement over that of the SM-

WRLS algorithm (cf. (4.3)). The total parallel computational complexity for the

115

suboptimal windowed and selective forgetting strategies is given by

(m+1)+s[2m+l3]+(t+u)[llm+10] (4.7)

flops per equation, with u replaced by p’lu for the suboptimal graceful forgetting
strategy.

The fourth column in Table 4.4 shows the total number of flops per equation for
a typical example with m = 10, r = 0.2, and s = t = 0.1. It shows that the parallel
architecture reduces the complexity of the algorithm by about 60%, and when the
suboptimal strategy is employed, the complexity is reduced by 84%.

The performance of the SM-WRLS algorithm in terms of its adaptive behavior
and tracking capability, solution quality, and fraction of data used requires further
research; however, it is important to note that the gain in the computational complex-
ity of the GR-based algorithm, when implemented on a sequential machine, is only
two to three times when compared to that of the MIL-based WRLS (see Table 2.1),
and ﬁve to six times when implemented on a parallel machine (see Table 4.4). It is
the suboptimal technique that gives an order of magnitude (13 to 14 times) gain in
the computational complexity when implemented on a parallel machine, and gives six
to seven times gain when implemented on a sequential machine. Therefore, it makes
more sense to use the suboptimal technique for speed advantages since the estimates
are not very different from those of the SM-WRLS algorithm.

The computational complexity of the generalized complex vector case of the SM-
WRLS algorithm developed in Section 2.2 when computed on a sequential machine
is discussed in Section 2.5. However, the parallel computational complexity of the
generalized parallel GR—based algorithm depends on the architectural implementation

of this algorithm which requires further research.

 

116

Chapter 5

Conclusions
and Further Work

 

5.1 Algorithmic Developments

5.1.1 A Generalized SM-WRLS Algorithm

This research has been concerned with a class of SM algorithms for estimating the
parameters of linear system or signal models in which the error sequence was pointwise
“energy bounded.” Speciﬁcally, it was focused on the SM-WRLS algorithm which
works with bounding hyperellipsoidal regions to describe the solution sets which are
a consequence of the error bounds. SM-WRLS is based on the familiar WRLS solution
with the SM considerations handled through a special weighting strategy. However,
the original theoretical development of this algorithm made it applicable to real scalar
data. Due to the strong potential for using this powerful algorithm in virtually
any signal processing problem involving parametric models, this algorithm has been
extended to work with a wider range of problems. The theoretical development of a
generalized SM-WRLS algorithm that can handle complex vector-input vector-output
data streams has been presented. The original SM-WRLS algorithm is a special case

of the generalized algorithm.

117

5.1.2 Suboptimal Tests for Innovation

A new strategy has been developed which can be applied to virtually any version
of the SM-WRLS algorithm to improve the computational complexity. A signiﬁcant
reduction in computational complexity is achieved by employing a “suboptimal” test
for information content in an incoming equation. The suboptimal check has been
argued to be a useful determiner of the ability of incoming data to shrink the ellipsoid,
but one which does not rigorously determine the existence of an optimal SM weight
in the SM-WRLS sense. The main issue is to avoid the computations of an 0(m2)
checking procedure required to check for the existence of a meaningful weight. Since
most of the time these computations would result in the rejection of incoming data,

a more efﬁcient test signiﬁcantly reduces the complexity of the algorithm.

5.1.3 Adaptive SM-WRLS Algorithms

It has been argued that the “unmodiﬁed” SM-WRLS algorithm has inherent adaptive
capabilities in its own right. However, it is not possible to depend upon this algorithm
to reliably behave in an adaptive manner, particularly in cases of quickly varying
system dynamics. In this work, explicitly adaptive SM-WRLS algorithms have been
developed. Adaptation was incorporated into SM-WRLS in a very general way by
introducing a flexible mechanism by which the algorithm can forget the inﬂuence
of past data. The general formulation permitted the extension of SM-WRLS to a
wide range of adaptation strategies. Three different adaptation techniques have been
presented and tested on models derived from real speech data. Windowing is a simple
way to make the algorithm adaptive by effectively sliding a window of ﬁxed length
through the data by appropriate sequencing of “adding” or “removing” equations.
The Graceful Forgetting technique removes only a fraction of all previous equations
at each time. The Selective Forgetting technique chooses the equations to be (partially

or completely) removed from the system based on certain user defined criteria.

118

A survey of the computational complexities of several related sequential algorithms
has been presented which shows the computational savings obtained when using the
SM-based algorithms compared with the conventional WRLS algorithms. The dif-
ferences in the computational complexities among the various SM-based algorithms

have been discussed.

5.2 Simulation Studies

The SM-WRLS algorithms have been tested on models derived from real speech data
representing the words “four,” “six,” and “seven” using an AR(2) model for the ﬁrst
two words and AR(14) model for the third. The simulation results presented illustrate
important points about the various methods and show that the adaptive algorithms
yield accurate estimates using very few of the data and quickly adapt to fast variations
in the signals dynamics. It is signiﬁcant that in preliminary experiments, most of the
SM-WRLS algorithms have been found to be robust in small (16-bit) wordlength

environments.

5.3 Architectures and Complexity Issues

The “nonadaptive” SM-WRLS algorithm has been formulated to be run on a parallel
architecture. An architecture has been developed that implements this algorithm
in 0(m) flops per equation. Then, this architecture (which uses 0(m2) cells) was
mapped into a compact architecture in order to increase the cell utilization at the
expense of using slightly more complicated cells (needed for scheduling). The compact
architecture uses only 0(m) cells which is clearly advantageous with respect to the
0(m2) cells architecture. The cells used by the compact architecture were upgraded
to implement the adaptive strategies. It was also noted that the same architectures

could be used to implement the “suboptimal” strategy.

119

Finally, a detailed analysis of the computational complexity issues was carried out
which clearly shows the signiﬁcant computational savings when implementing the SM-
WRLS algorithm using the parallel architectures. The computational complexities of
the adaptive SM-WRLS algorithms depend on the adaptive strategy employed, the
percentage of the data accepted (which is typically more than that of the nonadaptive
algorithms), and the number of times the algorithm rotates out an equation from the
system. The analysis also shows that the suboptimal strategy (whether implemented
sequentially or using the parallel architectures) provides signiﬁcant improvements in

the computational efﬁciencies of the various algorithms.

5.4 Further Work

This research has been concerned with problems in which the error sequence was
pointwise energy bounded (see constraint (1.1)). It is of great interest to study other
classes of SM algorithms that use different constraints. Other interesting variations
involve stability constraints [30], and other noise parameter bounds [31, 32].

The optimization criterion used in this research was based on minimizing the
“volume ratio” of the ellipsoids at n and n — 1 (see (1.25)). It might be useful to use
different optimization criteria in order to minimize the ellipsoid volume. For example,
it may be possible to deﬁne a strategy that efﬁciently minimizes the longest axis of
the ellipsoid.

The adaptive SM-WRLS algorithm presented in Section 2.4 works with a very
flexible “forgetting” scheme. Three different techniques were presented and tested,
however, it is possible to deﬁne (and test) various other adaptation strategies.

It is true that the SM-WRLS algorithm was tested on models derived from speech
data (since they are available in the Speech Processing Laboratory), however, this

algorithm has the potential for application to a wide range of problems. It is of great

interest and significance to study the performance of this algorithm when applied to
beamforming, neural networks, and other important applications.

Finally, the computational complexity of the GR-based SM—WRLS algorithm can
be further improved by incorporating more parallelism (in hardware) within the cells
of the architectures of Chapter 4. Also, a generalized architecture is needed to efﬁ-

ciently implement the complex vector case.

Souheil F. Ode/z
East Lansing, Michigan

May, 1990

 

Appendix

Appendix A

The Relationship between C$(n) and )(n)

 

The theoretical developments in this appendix are for the real scalar case. The
generalization to the complex vector case is straightforward.

It was noted in Section 1.2.3.1 that the system of equations (1.8) (or (2.9) for the
generalized case) can be reduced to an upper triangular system (1.27) by applying a
sequence of orthogonal operators (GR’s). Suppose that a new equation is accepted
with an optimal weight )(n), it can be rotated into the upper triangular system by

inserting it in the m + 1" row, i.e.,

T(n-1) l d1(n—1)

 

 

 

 

 

_,/1(n).r(n.) —g . Amigo)

and applying another sequence of m GR’s leaving the system in the form

_ T(n) dim)

oixm l. dzfn)

 

 

 

 

b

where the matrix T(n) is an m x m upper triangular Cholesky factor [39] of Cr(n)

(sec (1.30)). This sequence of GR’S is given by
anl=Jme—1°”JzJi (A3)

where J,- denotes an (m + 1) x (m + 1) orthogonal matrix that annihilates element i
of the m + 1” row of (A.l).

It can be shown that Q(n) takes the form

i C1 0 0

 

 

0 81
-3231 c2 0 0 32c]
—s3c251 —8382 c3 ~-- 0 S3C2C1
-s4c3c231 —S4C332 ~3433 0 s4c3c2c1
-3mCm-1 ' ' ' €231 -8mCm—1"'0332 —3mCm—1 ° ' ' 0483 ° " Cm SmCm—1°'°Ci
L "'CmCm—l ' ' '6231 _Cmcm—1”'C332 -CmCm—1 ' ° ' C483 ' ' ° 3m CmCm—1"°Cl .

in which c,- (s,) is the cosine (sine) term associated with the i‘” rotation. This form
of Q(n) simpliﬁes the generation of the matrix T(n) from T(n - 1) and is useful in

ﬁnding det C1.(n) below. Since the matrix T(n) is upper triangular, then

det T(n) = ﬁt,,-(n) (A.4)
and
det C;(n) = [det T(n)]2 = ﬁt?,~(n) (A5)

in which t,,- denotes the i‘“ diagonal element of the matrix T(n).

After some (tedious) algebraic manipulation, it follows that:

Case m =1:

det C,(n) = t¥,(n)

Case m = 2:

detC,(n) = t22(n)tfl(n)

= 13,0. _1)z';‘,(n)+ )(n) [1,,(n -1)$2(n)-t12(n —1)xi(n)l2
Case m = 3:

detsznl = t§3(n)t;2(n)tf1(n)
= t§3(n " llt§2(n)tfl(n) +
)(n) {t22(n — 1) [t11(n — 1):r3(n) — t13(n —1):c1(n)]—

t23(n —1)[tn(n — 1);rg(n) — t12(n —1):r1(n)]}2

Case m = 4:

detcxn) = t:.(n)t§.(n)t§.(n)t:.(n)
= t:.(n - 1)t§.(n)t§.(nit:.(n) +
Mn) {t33(n - 1) {t22(n — 1) [t11(n — l):r4(n)-t14(n -1):r1(n)]—
w(n —1)[tu(n -1)h(n)-t12(n—1)r1(n)l}-
t34(n — 1) {t22(n — 1) [tn(n -— 1):c3(n) — t13(n -1):r1(n)] —

t23(n — 1) [t11(n -- 1)h(n) - t12(n - 1):1:1(n)]}}2

124

and so on. Therefore, det C,(n) is a monotonically increasing function of Mn). D

 

125

Bibliography

Bibliography

[1] SH. Mo and J.P. Norton, “Fast and robust algorithm to compute exact polytope
parameter bounds,” Mathematics and Computers in Simulation, to appear in
1990.

[2] J.P. Norton and SH. Mo, “Parameter bounding for time-varying systems,” Math-
ematics and Computers in Simulation, to appear in 1990.

[3] A.K. Rao, Y.F. Huang, and S. Dasgupta, “ARMA parameter estimation using
a novel recursive estimation algorithm with selective updating,” IEEE' Trans.
Acoust., Speech, and Signal Process, vol. 38, pp. 447-457, March 1990.

[4] SF. Odeh and J.R. Deller, Jr., “A systolic algorithm for adaptive set membership
identiﬁcation,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. ’90,
Albuquerque, NM, vol. 5, pp. 2419-2422, April 1990.

[5] .I.R. Deller, Jr., “A ‘systolic array’ formulation of the optimal bounding ellipsoid
algorithm,” IEEE Trans. Acoust, Speech, and Signal Process, vol. 37, pp. 1432-
1436, Sept. 1989.

[6] JR. Deller, Jr., “Set membership identiﬁcation in digital signal processing,”
IEEE ASSP Magazine, vol. 6, no. 4, pp. 4-20, Oct. 1989.

[7] JR. Deller, Jr. and T.C. Luk, “Linear prediction analysis of speech based on set-
membership theory,” Computer Speech and Language, vol. 3, no. 4, pp. 301-327,
Oct. 1989.

[8] A.K. Rao and Y.F. Huang, “Analysis of ﬁnite precision effects on an OBE algo-
rithm,” Proc. IEEE Int. Conf. AcousL, Speech, and Signal Process. ’89, Glasgow,
vol. 2, pp. 853-856, May 1989.

[9] JR. Deller, Jr. and SF. Odeh, “Implementing the optimal bounding ellipsoid
algorithm on a fast processor,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal
Process. ’89, Glasgow, vol. 2, pp. 1067-1070, May 1989.

126

[10] E. W'alter and II. I’iet-Lahanier, “Estimation of parameter bounds from bounded-
crror data: A survey,” Proc. 12‘" [MACS World Congress on Scientiﬁc Compu-
tation, Paris, pp. 467-472, July 1988.

[11] E. Walter and H. Piet-Lahanier, “OMNE versus least squares for estimating
parameters of a biological model from short data records,” Proc. 12‘” [MA CS
World Congress on Scientiﬁc Computation, Paris, pp. 85-87, July 1988.

[12] H. Piet-Lahanier and E. Walter, “Practical implementation of an exact and re-
cursive algorithm for characterizing likelihood sets,” Proc. 12’“ [MA CS World
Congress on Scientiﬁc Computation, Paris, July 1988.

[13] R. Tempo, “Robust estimation and ﬁltering in the presence of bounded noise,”
IEEE Trans. Automatic Control, vol. AC-33, pp. 864-867, 1988.

[14] S. Dasgupta and Y.F. Huang, “Asymptotically convergent modiﬁed recursive
least squares with data-dependent updating and forgetting factor for systems
with bounded noise,” IEEE Trans. Information Theory, vol. IT-33, pp. 383-392,
1987.

[15] J .P. Norton, “Identiﬁcation of parameter bounds of ARMAX models from records
with bounded noise,” Int. J. Control, vol. 45, pp. 375-390, 1987.

[16] J .P. Norton, “Identiﬁcation and application of bounded-parameter models,” Au-
tomatica, vol. 23, pp. 497-507, 1987.

[17] E. Walter and H. Piet-Lahanier, “Exact and recursive description of the feasible

parameter set for bounded error models,” Proc. 26"" IEEE Conf. Decision and
Control, Los Angeles, pp. 1921-1922, 1987.

[18] J.R. Deller, Jr. and T.C. Luk, “Set-membership theory applied to linear pre-
diction analysis of speech,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal
Process. ’87, Dallas, vol. 2, pp. 653-656, 1987.

[19] Y.F. Huang and A.K. Rao, “Application of a recursive estimation algorithm with
information-dependent updating to ARMAX models and ARMA models with
unknown inputs,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process.
’87, Dallas, vol. 2, pp. 1007-1010, 1987.

[20] Y.F. Huang, “A recursive estimation algorithm using selective updating for spec-
tral analysis and adaptive signal processing,” IEEE Trans. Acoust., Speech, and
Signal Processing, vol. ASSP-34, pp. 1331-1334, 1986.

[21] V. Broman and M.J. Shensa, “Polytopes, a novel approach to tracking,” Proc.
25’“ IEEE Conf. Decision and Control, Athens, pp. 1749-1752, 1986.

[22] M.L. Feron and J.R. Deller, Jr., “‘Membership set’ identiﬁcation applied to
the impulse train excited AR model” (abstract), Final Program: SIAM 1983
National Mtg., Denver, p. 30, 1983.

127

[23] E. Fogel and Y.F. Huang, “On the value of information in system identification
--— Bounded noise case,” Automatica, vol. 18, pp. 229—238, 1982.

[24] M. Milanese and G. Belforte, “Estimation theory and uncertainty intervals eval-
uation in presence of unknown but bounded errors: Linear families of models and
estimators,” IEEE Trans. Automatic Control, vol. AC-27, pp. 408-414, 1982.

[25] E. Fogel, “System identiﬁcation via membership set constraints with energy con-
strained noise,” IEEE Trans. Automatic Control, vol. AC-24, pp. 752-758, 1979.

[26] B.R. Barmish and J. Sankaran, “The propagation of parametric uncertainty via
polytopes,” IEEE Trans. Automatic Control, vol. AC-24, pp. 346-349, 1979.

[27] EC. Schweppe, “Recursive state estimation: Unknown but bounded errors and
system inputs,” IEEE Trans. Automatic Control, vol. AC-13, pp. 22-28, 1968.

[28] HS. Witsenhausen, “Sets of possible states of linear systems given perturbed
observations,” IEEE Trans. Automatic Control, vol. AC-l3, pp. 556-568, 1968.

[29] DR Bertsekas and LB. Rhodes, “Recursive state estimation for a set-
membership description of uncertainty,” IEEE Trans. Automatic Control, vol.
AC-16, pp. 117-128, 1971.

[30] P.L. Combettes and H.J. Trussell, “Stability of the linear prediction ﬁlter: A set
theoretic approach,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process.
’88, New York, vol. 4, pp. 2288-2291, 1988.

[31] P.L. Combettes and H.J. Trussell, “General order moments in set-theoretic esti-
mation,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. ’89, Glas-
gow, vol. 4, pp. 2101-2104, May 1989.

[32] P.L. Combettes, Set Theoretic Estimation in Digital Signal Processing (Ph.D.
Dissertation), North Carolina State University, Raleigh, 1989.

[33] SM. Veres and J.P. Norton, “Structure identiﬁcation of parameter-bounding
models by use of noise-structure bounds,” Int. J. Control, vol. 50, pp. 639-649,
1989.

[34] D. Graupe, Time Series Analysis, Identiﬁcation, and Adaptive Systems, Malabar,
FL: Krieger, Ch. 5, 1989.

[35] L. Ljung and T. Séderstrém, Theory and Practice of Recursive Identiﬁcation,
Cambridge, MA: MIT Press, pp. 14-15 & Sect. 2.2.1., 1983.

[36] S.Y. Kung, VLSI Array Processors, Englewood Cliffs, NJ: Prentice Hall, 1988.

[37] ED. Van Veen and K.M. Buckley, “Beamforming: A versatile approach to spatial
ﬁltering,” IEEE ASSP Magazine, pp. 4-24, April 1988.

128

[38] T. Kohonen, Self-Organization and Associative Memory (2”" ed.), Berlin, llei-
delberg: Springer-Verlag, Chs. 6 8". 9, 1988.

[39] CH. Golub and GE. Van Loan, Matrix Computations, Baltimore, MD: .lolms-
Hopkins Univ. Press, Chs. 5 & 6, 1983.

[40] J.R. Deller, Jr. and GP. Picaché, “Advantages of a Givens rotation approach to
temporally recursive linear prediction analysis of speech,” IEEE Trans. Acoust.,
Speech, and Signal Process, vol. 37, pp. 429-431, March 1989.

[41] T.C. Luk and J.R. Deller, Jr., “A nonclassical WRLS algorithm,” Proc. 23rd
Ann. Allerton Conf., pp. 732-741, 1985.

[42] W.M. Gentleman and HT. Kung, “Matrix triangularization by systolic arrays,”
Proc. Soc. Photo-Optical Instrumentation Engineers: Real-Time Signal Process-

ing IV, vol. 298, pp. 19-26, 1981.

[43] J.G. McWhirter, “Recursive least squares minimization using a systolic array,”
Proc. Soc. Photo-Optical Instrumentation Engineers: Real- Time Signal Process-

ing VI, vol. 431, pp. 105-112, 1983.

[44] J.R. Deller, Jr. and D. Hsu, “An alternative adaptive sequential regression al-
gorithm and its application to the recognition of cerebral palsy speech,” IEEE
Trans. Circ. and Syst., vol. GAS-34, pp. 782-787, July 1987.

[45] J .R. Deller, Jr., Unpublished research notes, Michigan State University, East
Lansing, 1990.

[46] M.L. Feron, “Membership set system identiﬁcation with pulse train input,” M.S.
Thesis, Illinois Institute of Technology, Chicago, 1982.

[47] J. Makhoul, “Linear prediction: A tutorial review,” Proc. of the IEEE, vol. 63,
pp. 561-580, 1975.

[48] W.H. Press, B.P. Flannery, S.A. Teukolsky, and WT. Vetterling, Numerical
Recipes in C: The Art of Scientiﬁc Computing, Cambridge, MA: Cambridge
Univ. Press, pp. 209-213, 1988.

[49] D.F. Marshall and W.K. Jenkins, “A fast quasi-Newton adaptive ﬁltering algo-
rithm,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. ’88, New
York, NY, vol. 3, pp. 1377-1380, 1988.

[50] J.M. Ciofﬁ, “Limited-precision effects in adaptive ﬁltering,” IEEE Trans. Circ.
and Syst., vol. GAS-34, pp. 821-833, July 1987.

[51] S. Ljung and L. Ljung, “Error propagation properties of recursive least squares
adaptation algorithms,” Automatica, vol. 21, pp. 157-167, 1985.

[52] J.M. Ciofﬁ and T. Kailath, “Fast, recursive least squares transversal ﬁlters for
adaptive ﬁltering,” IEEE Trans. Acoust., Speech, and Signal Processing, vol.

ASSP-32, pp. 304-337, April 1984.

[53] HT. Kung and C. Leiserson, “Algorithms for VLSI processor arrays,” Rpt. No.
MU-CS-79-103, Dept. of Computer Sci., Carnegie-Mellon Univ., 1978 (Reprinted

in reference [54]).

[54] C. Mead and L. Conway, Introduction to VLSI Systems, Reading, MA: Addison-
Wesley, Section 8.3, 1980.

[55 W .M. Gentleman, “Least squares computations by Givens transformations with-

out square roots,” J. Inst. of Math. and its Appl., Vol. 12, pp. 329-336, 1973.

[56] S. Haykin, Adaptive Filter Theory, Englewood Cliffs, NJ: Prentice Hall, Ch. 10,
1986.

 

130

"iiiiiiiiiiiiiir