SYSTEM IDENTIF!CAT!0N BY BAYESIAN LEARNING .

Thesis for the Degree of Ph. D.’
MICHIGAN STATE UNWERSITY
PATRICK J. DONOGHUE
1968

b .,, .t
1‘1 .

'I'hcblb

0-169

IIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIL

31293 00995 8731

This is to certify that the

thesis entitled

System Identification by Bayesian Learning
presented by
Patrick J. Donoghue

has been accepted towards fulfillment
of the requirements for

2% degree in —EE__

.. /) ' " J _.‘

l 'I x r . ’3!

..//4 "1:74;”! gun/l “1/"-
V

Major professor

 

‘ ,l‘ ’1’“

-- 1’1 "x l _ 1‘ / '- '
Date -' " g/U,/.-’,/< ”9

 

I" N

ABSTRACT
SYSTEM IDENTIFICATION BY BAYESIAN LEARNING

by Patrick J. Donoghue

The problem of system identification is of fundamental
importance both from a practical and system-theoretic point of
View. As such it has received wide attention in the literature.
However, much of the work in this area suffers from the lack of
a unifying structure. This is especially true of those identifica-
tion techniques that treat problems concerned with noise-obscured
measurements and unobservable random disturbances.

In this thesis the approach taken to system identification
is Bayesian learning. The systems considered are described by
a finite set of difference equations relating the system states and
inputs, and by a finite set of algebraic equations relating the sys-
tem states, inputs, and observations. The measurement of all. ,
states and all inputs is assumed to be obscured by additive noise.
Further, the system states are assumed to be influenced by un-
observable additive random disturbances. The object of the
identification is to determine the matrices or constants that

specify the system.

PATRICK J. DONOGHUE

Using Bayesian learning for identification allows the deriva-
tion of a general identification algorithm which is practical, includes
many previous results as special cases, and provides a framework
for solving new problems. The identification algorithm is iterative
and has the form of a general stochastic approximation algorithm.
Thus the algorithm operates on the data as it becomes available and
produces a sequence of estimates for the parameters which specify
the system. These estimates are Bayes-optimal in some cases con-
sidered and are sub-optimal in others. The specific identification
algorithms for each class of systems considered have one simple
structure and are computationally feasible.

The introduction of the concepts of identifiability and strong
identifiability provides a workable basis from which convergence of
the general identification algorithm can be established. Three
theorems on the convergence of the identification algorithm to the
true system parameters as the number of observations becomes
infinite are proved using these concepts. Two of these theorems
are new results in identification. A new proof for the third result
follows from a strengthening of the hypotheses of the other two
theorems.

The general identification algorithm is used to derive

Specific algorithms for important classes of systems. Algorithms

PATRICK J. DONOGHUE

for both stationary and time-varying linear systems are obtained,
and the derivation of algorithms for nonlinear systems is indicated.
Other formulations are given, including systems with state distur-
bances having an unknown mean and systems with generated inputs.
To demonstrate that implementation of the proposed algorithm
is not difficult and to demonstrate that the algorithm does indeed con-
verge, a fourth-order digital control system with eight unknown
parameters is identified. Computer-sirnulated results show the
behavior of the algorithm under different initial estimates, noise

conditions, and a-priori uncertainty.

SYSTEM IDENTIFICATION BY BAYESIAN LEARNING
BY

Patrick J. Donoghue

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1968

5‘\

ACKNOWLEDGEMENTS

The author wishes to thank his thesis advisor, Dr. R.
C. Dubes, for his continued assistance and inspiration during the
preparation of this work and for his invaluable guidance as a
friend and educator.

Thanks are also due to Dr. H. E. Koenig and Dr. G. L.
Park for their assistance as academic advisors and to Dr. E. A.
Nordhaus and Dr. R. B. Zemach for their interest in this work.

Finally the author owes a great debt of love and gratitude
to his wife, Sharon, for her unselfish patience and understanding

during his graduate studies and the preparation of this thesis.

ii

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENTS ............ . . . . . .

LISTOFFIGURES.....................

LIST OF APPENDICES .................. .
I INTRODUCTION .......... . . . . ......
Modeling and Induction .......... . . . .

II

III

System Theory and the Identification Problem .
Applications of System Identification . . . . . .
General Identification Techniques ..... . . .
ObjectoftheThesis. . . . . . . . . . . . . . .

D—ll—lD—ID—‘l—l
£11900an

FORMULATION OF THE IDENTIFICATION
PROBLEM AS BAYESIAN LEARNING. . . . . . . . .

2.1 Identification by Estimation and Learning. . . .
2. 2 Previous Results in Identification by
Estimation and Learning. . . . . . . . . . . . .
2. 3 Basic Equations and. the A-posteriori
Distribution....................
2.4 Alternate Forms for the A-posteriori
Distribution .............. . . . . . .
2.5 Summary ....... ......

SOLUTION OF THE BAYESIAN IDENTIFICATION
PROBLEM ....... ...... .

Derivation of the Identification Algorithm. . . .
Convergence Theorems and Assumptions. . . .
Derivation of Previous Results as Special Cases
Summary ..... . ...............

wwww
o. O
Jimmi—

iii

Page

ii

vi

00¢me

10
10
l4
17
19
24
2.6
26
32

36
39

IV SPECIAL FORMS OF THE ALGORITHM .

parse-94:4:
O‘U‘II-hUONH

Summary

V AN EXAMPLE

UTU1UI

Summary

VI CONCLUSION . . . ...... .

. l
. 2 Computer Simulation Results
. 3

Linear Stationary Systems . . . . .
Linear Time-Varying Systems . ..... . . . .
Stationary Nonlinear Systems. . . .
Time-Varying Nonlinear Systems . . .
Miscellaneous. . . . . .

Problem Statement and Formulation

6.1 Results of the Thesis ........
6.2 Possible Extensions . . .

BIB LIOGRAPHY . .

iv

40

41
44
50
51
53
56

57
57
63
68
7O

70
72

LIST OF FIGURES

Figure Page
3.1.1 General system configuration . . ...... 27
5. 1. 1 Digital control system with noisy

transmission and feedback paths . ...... 59
5. 2.1 Behavior of a typical estimator ....... 66
5. 2. 2 Effect of noise power on convergence . . . . 66
5. 2. 3 Effect of initial estimates on convergence . . 67
5. 2. 4 Effect of uncertainty on convergence ..... 67

LIST OF APPENDIC ES

Page
APPENDIX
I CONVERGENCE PROOFS FOR THE
IDENTIFICATION ALGORITHM. . . . . . . 74
II THE REPRODUCING NORMAL AND ITS

QUADRATIC FORM ............. 88

vi

I INTRODUCTION

This chapter provides a foundation for the results in this
thesis. Section 1. 1 indicates in a general way how induction and
modeling are related. The role of identification in the general
structure of system theory is discussed in section 1. 2, and the wide
range of practical applications for identification techniques is indi-
cated in section 1. 3. Various general identification techniques are
listed. in section 1. 4 to show how Bayesian learning is related to other
methods. Section 1. 5 discusses the basic results that will be derived

in the thesis.

1. 1 Modeling and Induction

The fundamental goal of virtually all scientific investigation
is to acquire knowledge about the real world (except perhaps for the
mathematical formalist's investigation [SCH] ). This real world may
take on diverse appearances depending upon a particular investiga—
tor's background and individual goals.

Information about the real world. usually appears in the form

of causality relations, and one medium through which these causality

relations are di8p1ayed is mathematics. In fact, mathematics is
practically the only universally accepted tool for handling the rela-
tionships which arise in scientific studies. Mathematics provides
a systematic procedure for predicting and analyzing the observable
characteristics of an investigator's environment. This is parti-
cularly true in those systems that are of practical interest. This
prediction and analysis is realized by development of a mathematical
model, i. e. , a finite set of mathematical relationships which inter-
relate certain observable or measurable quantities of the system
under consideration. The derivation or synthesis of the relation-
ships forming the model is accomplished in terms of the structural
features of the system. These features are the characteristics of
the individual members of the aggregate and their manner of inter-
action or interconnection. The individual characteristics are
obtained by assumption, a-priori information, or sets of funda-
mental measurements. The test of the mathematical model's
validity rests upon the connection between the variables of the
mathematical structure and their physical counterpart. Hopefully,
the relation between these entities is isomorphic [AHL] . That is,
the values assumed by the variables in the mathematical model are
in a one-to-one pr0perty correSpondence with the values that are
measured.

An alternative way of arriving at mathematical models of
physical systems is to employ inductive inferential relationships

[CAR][AY] . That is, a partially specified mathematical model

is assumed to be an accurate counterpart of the physical system and
appropriate measurements are taken to confirm or deny the hypo-
thesis. If all variations of the original model fail to accurately pre-
dict or account for the observations, then the model is changed or
discarded. If, however, suitable manipulation or further specifica-
tion of the model results in accurate prediction of the observations
for some finite time, then the model is retained. Since the phenomena
are only observed for a finite time it is impossible to verify the
hypothesis completely and only confirmational support for it is ob-
tained [HE] [ POP] . This is the nature of inductive logic. The
model is retained only as long as it is able to account for all the
observations. Even if no contradictory evidence occurs for a very
long time, the original hypothesis is not necessarily verified. The
approach taken in this thesis, called identification by Bayesian

learning,is based on inductive inference.

1. 2 System Theory and the Identification Problem

In the foregoing discussion the nature of the problem to be
considered. here was indicated only in a very general way. In this
section the problem will be defined in the context of contemporary
system theory.

In one of his treatments of general system theory, Zadeh
[Z-l] has given a comprehensive listing of the problems that are

included in this theoretic structure. Among these problems are

system analysis, synthesis, control, stability, reliability, learning,
and signal theory. In addition three distinct aSpects of the theory are
singled out for Special consideration because of their fundamental
importance. These are the problems of system characterization,
classification and identification.

The first of these problems deals with the representation of
input-output relations. These representations may be expressed as
solutions of differential equations, state functions, transfer functions,
integral operators or in any other convenient form and may change
according to whether the system is time-continuous or discrete, sto-
chastic, causal or finite state among other considerations. The
second problem, classification, is concerned with determining class
membership when a system is assumed to belong to one of a family
of classes. The classification relies on observations of system
input and output. Two problems that are in this category are:

(1) determination of the order of a system's differential or difference
equation, and. (2) determination of whether a time-varying system is
linear or nonlinear. Lastly, and most important for this discussion
and often for practical situations, is the problem of system identi-
fication. Generally, the identification problem considers means

for determining the Specific characteristics of a system through
observations of the input and output.

A precise definition of identification is now stated [Z-l]:

Given a class of systems S where each member of the class is

completely Specified, the identification of a system A consists in
finding a system s 6 S that is input-output equivalent to A. It is
important to note that the definition requires input-output equivalence
and does not require 3 c S and A to be identical. For a given input-
output relation, there is generally no unique system representation
[K-l][ARN][Z-2] . In this thesis systems are assumed to be com-
pletely Specified within a parameter set and the purpose of identifi-
cation is to determine these parameters.

There are three major complications which normally enter
into any real identification problem. The first of these is the absence
of knowledge concerning the system's initial condition;-_ the second
is the presence of random noises obscuring the observations of
inputs and outputs; last is the difficulty in establishing a meaningful
and convenient method for estimating the system parameters as a

function of the observations .

l. 3 Applications of System Identification

Probably the most common use of identification techniques
is found in the area of automatic control system design. Here a
device such as a servomechanism is identified using sinusoidal or
step response data. The identification consists of determining poles
and zeros or time constants. Knowledge of these system character-
istics aids in the design of devices which may improve total system

performance.

In process control situations, such as found in oil refineries
or batch production industries, the input-output relations may be
very complicated or essentially unknown. Only by identifying the
characteristics of the process from operating records can any kind
of satisfactory control policy be developed.

Many other systems of interest are describable by sets of
equations which have slowly varying parameters. Through continuous
monitoring, the system can be approximately identified at each instant
in time and effectively controlled. Controllers which utilize con-
tinuous on-line identification are called (parameter) adaptive con-
trols [AO-1][ SK] . Still other systems may vary in a random manner.
By using a system identifier in a learning loop, effective learning
control systems can be developed [SK].

Identification techniques can also be applied to communica-
tion problems. For example a communication channel may some-
times be characterized by a slowly-varying linear system. Con-
tinuous identification of the channel can make more efficient
communication pos sible [ KAI] [ HAN] [ DAL] .

Two more areas where identification techniques are of
interest are pattern recognition and reliability. The object in
pattern recognition problems is to distinguish one probability
distribution from another. If each distribution is assumed to be

generated. by a different system, then an identification scheme

may be effective in discriminating among these patterns. In reli-
ability problems a system reliability index may depend upon the
system parameters which can vary. By identifying the system
parameters, a check on the reliability of the system can be main-

tained .

1.4 General Identification Techniques

A wide variety of techniques have been developed to solve
some aSpect of the identification problem. Many of these techniques
were developed to study specific processes of a practical interest
[IFAC] . Others have been deveIOped to attack the problem in a
very general way [BAL- 1, Z] [ HE] .

Most identification techniques may be classified either as
statistical or analytic, depending upon whether or not the problem
formulation accounts for random effects. These general areas

may be subdivided as follows:

A. Statistical Methods [EY]
1. Parameter Estimation

a. Maximum Liklihood
i) conditional [ SMI] [ RAU]
ii) unconditional [EY]

b. Regression Analysis and Least Squares
1) linear [ALB] [ STI]
ii) nonlinear [ALB]

c. Stochastic Approximation [ HO- 1] [ LEE]
[SAR] [ SAK— 1, 2]

d. Bayesian Estimation [FK—3] [ MA][AO-l]

2. Learning Model Techniques [SK][ TSY] [ FK-l]
[FU-l]

3. Spectral Analysis (Wiener-Kolmogorov Theory)
[DA] [ LAN][ LEV] [ AND] [ WE-l, 2][ LE- 1, 2]

B. Analytic Methods
1. Functional State Representations [HE] [ BAL-l, 2]
2. Gradient Techniques [ BAL-Z] [ MG]

3. Numerical Methods and Approximations [ BX][ CU-Z]
[BEL-1, 2] [ 30-2]

4. Frequency Domain Techniques [SG-l] [ PU] [ BEL-1]
[CH] [ AH] .

Most identification techniques in the literature apply only to
stationary linear systems. Some work has been done on nonlinear
problems, but it is a relatively untouched area even in those pro-
blems that are of important practical interest. Relatively few
studies are based upon mathematical statistics and probability
theory. However, work in this area is significant and treats a

wide variety of problems.

1. 5 Object of the Thesis

In this thesis system identification, as defined in section 1. 2,
is approached by a method called Bayesian learning. The systems
considered are represented by a finite set of difference equations
relating the system states and inputs, and by a finite set of alge-
braic equations relating the system states, inputs and observations.

Identification of such systems when the states are subjected to

unobservable additive random disturbances and when observations are
corrupted by additive noise will be examined in this study.

Most of the work in this area lacks a unifying approach. The
purpose of this study is to derive a general identification algorithm
which is convenient and practical, includes many previous results,
extends the applicability of some of them, and allows the derivation
of new results. The identification algorithm is iterative. That is,
it operates on the data as it becomes available and produce a sequence
of estimates for the parameters which Specify the system. These
estimates are approximations to estimates which are optimal in a
sense to be defined, by virtue of an approximation to Bayes' rule.
The specific algorithms for each class of systems have one simple
structure and are computationally feasible. Furthermore, the
algorithms converge (in various senses) to the true parameter
values as the number of observations becomes infinite.

The main contribution of the thesis consists in the formu-
lation and solution of a general identification problem in the structure
of Bayesian learning. This formulation allows the derivation of
algorithms for important classes of systems; namely linear statio-
nary, linear time-varying and nonlinear systems. Three important
convergence theorems are proved, two of which are new results in
identification. A detailed examination of the properties of the
identification algorithm for a Specific system is carried. out by

digital computer simulation.

II FORMULATION OF THE IDENTIFICATION PROBLEM
AS BAYESIAN LEARNING

Identification by Bayesian learning is a Special method
that comes under the general heading of statistical estimation.
After the structure of estimators that are useful in identification
is indicated in section 2.1, the scope of previous results in this
area is given in section 2. 2. The basic relations used to derive
algorithms are developed in section 2. 3. lrnportant implications of

using these particular relations are discussed in section 2.4.

2.1 Identification by Estimation and Learning

System identification by methods that are based. on statis-
tical estimation have f*und wide application. These methods can
account for the effects of random observation noises as well as
unknown random inputs or disturbances. Further, they may be
applied to nonlinear as well as linear systems and are applicable
to both on-line and off-line problems.

One way of formulating the identification problem of

section 1. 2 follows. Let the system state equations be

10

x :

f 2. .
k+l k( ) 1 1

E
Xk' uk, “km

for k = O, 1, . . . , where x is an n-vector called the system state,

k

uk is an r-vector of inputs, ék is a Sample from an n-dimensional

>:<

random process , fk(.) is a known vector function of its arguments

at each time tk and a iS a p—vector of fixed but unknown parameters.

The system observation equations are

1
Yk : gklxk! “R: B) 2010 2
- + Z 2 l 3
Vk ’ ”k ”k ' '

where yk is an m-vector of observations, vk is an r-vector of

2.
input observations, n1 and nk are samples from m and r-dimen-

k
Sional independent random processes, gk(.) is a known function

of its arguments for each k and {3 is a q-vector of fixed but unknown
parameters. Identification of the system represented by 2.. l. 1,

Z. 1. Z, 2. 1. 3, consists in the determination of the parameters

a and [3.

Since random disturbances enter into the system equations,

\

o. and B will in general b— imp

Q

ssihle to find exactl . Rather, suc-
cessive estimates of these parameters that converge (in some
sense) to the true values of a and [3 will be sought. In this context

an identification algorithm will learn the true parameter values.

 

pln this thesis a random variable and the values it takes on will be
denoted by the same letter. The meaning will be clear from the
context.

12

Knowledge of f k(. ) ar: gk (. i is required 8 ane identificati ion and not
classification or characterization is being considered.

Success ve estimates of a and B will be derived in the frame-
work of Bayesian estimation. Ln particular, a sequence of esti-
mators {LIDk} of the comp sLte vector LlJ = (a, .3) is sought. This se-
quence of estimators is to depend upon the observations yk and vk,

k = 0,1,. . . . The loss function m3? or) represents the loss incurred

k,
when the estimator $k based on the first k observations is used and
q; is the true parameter value. This loss function is assumed to
be non-negative and to have a relative minimum for $k = LP. De-
. k . .., k
noting by y the set oi observatiens (yk, yk 1, . . . , yo), and by v
the set of observations (vk. . . v0}, the “'rstrnnar risk or expected
loss in choosing 13k as an estimatr ref L11 5 defined by the conditional
expectation,
A . ,A , k k k k
E{L(¢ Alt/w} = Law .thay .V /¢)d(y .V)
k yk Vk k

where p(yk, vk/qJ) re ptresen s the grim, pr'bab rLty densi y of the

J;

observations conditisned an L9. Letting p(LL') denote the a-priori
. . . . . A
den81ty of the parameters, the ex pested risk of choosrng up 18
k

defined by

 

>:<

For notational convenience, in all of teh fo -13wing development p(. )
will be used to dencte the density functi 3n .rf a random variable and
the argument of the den nsity will Speeif the random variable.

l3

R($)

E{E{L($k.¢)/¢ }}

S 5 k k “$18 ‘l’lpwk: Yk/¢)p(¢)d(yk, vk, 4;)
4’ Y . V

II

That function L’Dk which minimize ROTJk) is called the minimum risk

or Bayes estimator of 41 for each k. Since

p(vkmk/wpm = p(t/vk,yk>p(vk.yk)

then
Rm?) = C p(vk,yk) L<¢p¢)p(¢/vk.yk>d(yk.vkuv)
k Uyk vk ¢

where p(LlJ/yk, vk) is the a-posteriori density of the parameters

evaluated at KIJ conditioned on yk, vk and
k k k k
p(v .y > = Sptv .y /¢>p<¢> at,
LP

Since p(vk, yk) Z 0, R($k) can be minimized at each k by minimizing
. . k k . . .

the inner integral for each y , v . This is often done by solv1ng

the gradient equation associated with R($k).

To form recursive estimates of LIJ as the observations (vk, yk)

become available, the a-priori density p(qJ) is replaced by p(LlJ/ vk-l,

- .. k-
yk 1) and a relation between p(LlJ/ vk, yk) and p(Lp/vk 1, y l) is es-
tablished by Bayes' rule. Furthermore, it is possible to show
under conditions of convex loss functions and symmetric density

functions that this minimum risk is realized by taking {bk to be the

conditional expectation [ DE] ,

l4

tk = ElLP/vk.yk}.

These conditions hold for the very important case of gaussian
random variables and squared error loss functions. This particular
case is used extensively in the development of the system identifica-

tion algorithms .

Z. 2 Previous Results in Identification by Estimation and Learning
A large number of publications deal with system identification
when random inputs and/ or observation noises are present. Only a
relative few of these utilize estimation theory and learning proce-
dures to derive recursive estimators for the system parameters.
Some recent results in this area are discussed in this section.
The basic system structure and conditions for convergence of the
identification scheme are considered. to be the important factors of
the individual investigations. Only time domain techniques for identi-
fication of discrete-time systems are considered.
A relatively simple problem in identification is studied by

Ho and Whalen [HO-1] . It is given that the system equations are
x : Ax k=0,1,... 2.2.1

2 222
yk Xk+nk "

which represent the state and observation equations, where xk is

an n-vector, A is a constant nxn matrix to be found and 11k is a

15

gaussian random vector with zero mean and covariance

T
E : : °

0 otherwise.

The system is stationary, linear, homogenous and all states are
observed with additive white noise. The authors give a recursive

algorithm for finding the A matrix that converges if

detlxr: xk+1"°" xk+n-l] %’ 0

and there exist constants X1, X0 > 0 such that

x > ll[xk,..., x THZ

>>\ >0.
1 o

k+n-1][xk’°°°'xk+n-l]
The identification algorithm converges to the true value of A with
probability one (WPl or A. E. , [PAP] ). The proof follows from an
application of Dvoretzky's theorem on stochastic approximation
[DV][ WOL].

Another type of identification problem can be formulated
as a least squares filtering problem. The solution can be derived
either as a least squares or Bayesian estimate. The system equa-

tions are

2. 2.6

where 6k is a random n-vector with zero mean and known covariance,

and A is an unknown matrix.

16

As shown by Mayne [MA] the problem of forming unbiased
least-squares estimates of A can be solved using known results
[K-Z][K-3] . Specifically, the problem can be formulated, as a linear
state estimation problem and solved using Kalman filtering techniques.
Mayne has also shown how estimates of a randomly or deterministi-
cally varying A matrix can be found, and can account for an unknown
input-gain matrix B. A serious restriction to this formulation is
that all states and any control inputs must be observed without noise.
No proof of convergence is given by Mayne.

Another method of solution to this problem is given by Fukao
using a Bayesian approach [FK-Z, 3]. In his original formulation all
states are observed without noise and the input noise 5k has an un-
known mean. Fukao‘s proof that his algorithm converges WPl relies
on results from stochastic approximation theory. An alternate proof
of his result with appropriate assumptions is given in Appendix I.

By intuitively extending his algorithm to account for observation
noises, Fukao has also shown how to handle this important case
under stated assumptions. In fact he is able to apply the algorithm
to some nonlinear systems if the assumptions can be shown to hold.

For identification of nonlinear systems with all states ob-
scured by noise, but having no input noise process, Kirvaitis [KI]
[FU-Z] has indicated how stochastic approximation may be used to

develop convergent algorithms .

l7

2.3 Basic Equations and the A-Posteriori Distribution

For the results described in section 2. Z, a different approach
is taken for each problem. Although each of the algorithms has a
similar form, their derivations, where any are given, are directed.
solely to the problem at hand. By adopting a Bayesian approach to
system identification many of the above results may be combined into
a single structure and furthermore, new results can be derived.

Consider the system of equations

: 2
Xk+1 fk(xk’ uk’ gk’ ‘1) ' 3'1
and the observation equations
- (x 1 B‘ Z 3 Z
yk _ gk k, “R, } o o
_ u + 7- 2 3 3
Vk ‘ k nk ' °

where each of the variables is defined in section 2.1. Identification
of the system consists of finding a sequence of estimates for o. and B

that converge to the true values based on observations of v and yk.

k
As indicated in section 2. l, the conditional eXpectation
k k ._ . . .

ENJ/y , v ), where 41 : (o, B), 18 the optimal Bayes estimator for LP
based on k observations. Thus, if it is possible to obtain a recursive
relation, either approximate or exact, for this estimator as a func-
tion of observations and previous estimates, then a solution to the
identification problem will have been found. Of course if an approxi-

mation is used then convergence of the estimates to the true value

must be e stablished .

18

To obtain the desired conditional mean, the a-posteriori den-
sity of the unknown parameter vector Ll» is needed. This can be found
as follows.

k+1 k+1
p(¢.Y .v )

(' k k v )
pw,y ’V’yk+1’ k+1

k k k k k k
= p(yk+1.vk+l/¢.y .v )PNJ/Y .v )p(y .V)

 

 

Also
k+1 k+1 k+1 k+1 k+1 k+1
p(¢,y :V ) : p(qJ/Y 3V )p(y ,V )
Thus
k k k k k k
(W k+1 k+1) _ p(yk+1.vk+1/¢.Y .v )PNJ/Y .v )P(Y .V)
p V ’v ' k+1 k+1
p(y .v )
Since
k+1 k+1 k k k k
p(y .v )- p(kaNkH/y .v Mr .V)
then
k k k k
PW/Y .v )= k k 2.3.4
£363“vathr .v )
where

k k k k k k
p(ka’VkH/Y .V ) = S¢P(¢/Y .v )p(Yk+1.vk+l/LP.Y ,v )dqa.

Equation 2.. 3.4 is a statement of Bayes rule and is a recursive

k+1 k+1
, v

relation between the probability densities p(¢[y ) and

p(LIJ/Yk, vk). The form of this relation indicates that as observations
yk+l' vk+1 become available the conditional density for up can be up-

dated. Since the true value of up is a constant, if certain conditions

19

on the densities hold,then [SP] it is possible to show that the condi-
tional density p(¢/yk, vk) approaches an impulse function centered
at the value L11.

The density p(y v /4J yk vk) is required in the relation

k+1’ k+1 ’ ’
2. 3.4. This is obtained from the state and observation equations
2

2.. 3.1, 2. 3. Z and 2. 3. 3 and from the distributions of 11]: and 11k.
However, this is generally a very difficult task except in Special

cases.

2. 4 Alternate Forms for the A-Posteriori Distribution

There are two alternatives to the direct use of equation
2. 3. 4. These alternatives will be developed here and shown to be
of a less tenable form than equation 2. 3.4.

One iterative expression for the a-posteriori density of 4;

can be derived as follows [AO-l] , for a simpler problem having

k k, k

k

k k
= p(¢.xk/Y )p(Yk+1/¢.xk.y .xk+1)p(xk+1/¢.xk.y ).

Also

k k+1 k
p(¢.xk+1.Yk+I/Y )=p(¢.xk+l/Y )P(Yk+1/Y )

k
I
- d
- S p(w.xk.xk+l.yk+l/Y ) xk

k

20

Combining these relations gives

k k k
S p<y1<+1/4"Xk’Y '"k+1)p(xk+1/‘IJ’xk’y )pW’Xk/Y )dxk

 

k+1
PN‘: X /y ) _
k+1 k
P(Yk+1/Y )

20 4o l
where p(yk+l/yk) equals the integral of the numerator over (4:, xk+1).
Since

k+1 k+1
: 2. .2
p(i/y ) 5 p(¢.xk+l/y >dxk+1 4
X
k+1

the relations 2.4.1, 2.4. 2 give a recursive relation for the a-posteriori
distribution of 4;.
The indicated density functions in 2.4.1 simplify because,

from 2. 3.1 and. 2. 3. 2 assuming Ek's and ni's are independent,

k

p(vk+1/¢.xk.y .xk+1) = P(Yk+1/f3: ka)

and from 2. 3.1

k
P(Xk+l/LP, xk. Y )= p(xk+l/a. xk) .

DeSpite the apparent usefulness of 2. 4.1 there is a serious
drawback to its application. This is the difficulty of performing
the required integration. Even with simple gaussian random variables
and a linear system it is not clear that the integration can be carried
out in closed form. Even if it were possible to do so, there is no
reason to susPect that a recursive relation between conditional
means for up, which are the optimal Bayes estimators, would be

obtained .

21

Another form for the a-po steriori distribution of LIJ can be ob-

tained using 2. 3. 4, and noting that (again assuming 11k = vk)

k k k
p(YkH/tbm )- 5X p(yk+1/Lb,xk+1.y )p(xk+l/Ll4.y )dxk+1 2.4.3.
k+1

Since
PW /¢ x Yk) = p(y /Bx )
k+1 ’ k+1’ k+1 ’ k+1

this would appear to be a useful relation. However, establishing the
density p(ka/kp, yk) presents in effect an optimal nonlinear filter-
ing problem. Even in the linear case with gaussian random variables
the problem would. be very difficult since it requires the solution of
an optimal (Kalman) filtering problem for the mean Qk+1 as a function
of up and all the past observations yk. The covariance matrix is given
by the solution to a Riccati equation.

To establish that the exact form of p(ka/xp, Xk+l' yk) is

indeed difficult to work with and furthermore to establish a basis

for approximation, the foliowing linear example is considered. Let

X
n

k+1 “ xk + gk

Yk : Xk + ”k
where all variables are scalars, while gk and nk are zero-mean,
white, gaussian processes. To utiiize 2. 4. 3, the density
/¢, yk) is required. From well known results [K- 2, 3],

1’ (xk+1

this density is normal with mean 32 and variance P , where
k+1 k+1

22

P
A z A _ _ 9 2
Xk+1 ‘1 Xk + r (Yk k) “4'4
and 2
1Dk
: Z — ——--— 2.
Pk+l a Pk+ q r . 4. 5

The terms q and. r are the variances of Ek and 71k, respectively,

(assuming stationarity). The observation equation yields

k , *
p(vk+1/¢,xk+l.y )NN(xk+1 . r).

Using these relations in equation 2. 4. 3 it can be shown that the

exponent of p(ka/tp, yk) equals (except for the -l/ 2 factor)

—— { — 9 r"
r + P yk+1 k+1 °
k+1

Therefore

. k A .
p(YkH/W’Y ) N("1<+1° r + Pk+1)

. . A ,. k .
U81ng 2. 4.4 to write x as a funC';;cn of y gives

k+1

A _ _ \( _ p _ A
xk+1 — (a Pk/rho. Pk_l/r)....(a PO/r)xO

P
x, . _ _2
+(a-Pk/r)(a-Pk_1/I)....(a Pl/r) r Yo

Pl
+(a - Pk/r)....(a - 132/1“)? Y1

P I
k-l
+ (a - Pk/r) —— yk_1
.i
r yk

 

*
The symbols ~N(a : b) are used to denote the normal distribution
with first argument as mean, and second argument as covariance.

23

where the Pj's satisfy 2.4. 5. This is a rather involved expression
for ka, and is inconvenient for two reasons. First, to obtain
a recursive relation for a from 2. 3. 4 would require storing all
observations, which is undesirable for practical reasons. Second,
the expression in the numerator of 2. 3.4 must be maximized as a
function of a to find the conditional mean. This would be a difficult
task because of the form of Qk+l°

A simple approximation to the density p(yk+1/¢, yk) can be

obtained from 2. 4. 4. By taking 32 2. 4. 4 becomes

kzyk!

A .—
Xk+1 - ”k 2.4.6

and the variance may be approximated as

Z
P +r=q+r+ZaP-Pk/rf_q+r+s}:=s

k+1 k k

for some 31:) O. This is eSpecially convenient because if p(a/yk)

. , _ k k
has a normal form in 2. 3.4 then the product p(a/y )p(yk+1/u, y )
. . . . ,. , k+1
1s ea31ly max1mized With reSpect to a and p(u/y ) has a normal
form also. That is, the density p(a/yk) reproduces [SP] and only
one observation needs to be retained.

The relation 2.4.6 is sometimes exact. If there is no system

observation noise (r 5 0) then 2.4.6 is exact since yk : xk and the

variance for p(yk+l/a,yk) is s = q. This is the case studied by

Mayne [MA] and one of the cases considered by Fukao [FK-Z, 3].

With these results in mind it would appear that using 2. 3.4
directly is a more logical way to proceed than either of the indirect
ways indicated. Equation 2. 3.4 gives a very fundamental relationship
between the density functions. More detailed information about the
density functions can be supplied by the form of the system being con-
sidered. As was seen in the simple scalar example, some approxi-
mation may be necessary in the utilization of 2. 3.4 but the effect of
these approximations is more clearly seen in this relationship than
in the others described. Furthermore, the algorithms which result
from this relation are quite simple and can be realized by conventional

computing te chnique s .

2. 5 Summary

Under the general structure of Bayes estimation it has been
shown that a general class of identification problems can be formu-
lated. In examining previous results it was seen that there is little
uniformity of approach. Ho and Whalen's result for the linear
stationary autonomous system is derived ad—hoc and the convergence
follows from stochastic approximation. Mayne's work is a refor-
mulation of linear filtering and does not allow any observation
noises. Fukao's results apply only to stationary systems with no
input observation noise. Further, no derivation of results is given
or motivated and only a single restrictive convergence theorem is

given. A single point of reference in attacking these problems is

25

provided by Bayes' rule, which also allows computationally feasible
algorithms to be developed. Because of the difficulty in forming the
exact density functions required by Bayes' rule, an approximation

was seen to be desirable.

III SOLUTION OF THE BAYESIAN
IDENTIFICATION PROBLEM

The general form of the problem to be solved has been given
in Chap. II. In Sec. 3. 1 an iterative equation will be derived that
relates the estimates at time k-l to the kth estimate and observa-
tions. The conditions under which this algorithm converges to the
true value of the parameters and the sense of convergence are given
as convergence theorems I, II and III in Sec. 3. 2. This result is

related to those given by other authors in Sec. 3. 3.

3. l Derivation of the identificaticn Algorithm
In this section, the identification algorithm for a class of
systems is derived using 2. 3.4. In particular, for the system of

Figure 3.1.1 the equations are now assumed to be of the form

: ' \ : .l ‘l \
xk+1 fk(xk, uk, gk, (1, Dk‘xk' akja + ék 3 . 1.1
- x + 1 3 1 2
YR ’ k “k ' ‘
- u + Z 3 1 3
Vk _ k nk . 0
where Dk(.) is an n x p matrix, a is a fixed but unknown p-vector

26

Z7

 

 

 

 

 

 

 

 

 

 

 

 

and
E(§)-E(1)-O E(Z)-O 314
k - nk " ! nk " . . a
me, tT)=Q k2) 3.1.4b
kj k’
:0 , kij 3.1.4C
1
’ék m
.<
u k+1 - x
k Delay k I
l l > Dk(.)a A ‘ a“ 3k
2
“k

 

 

 

 

k
Figure 3.1.1 General System Configuration
1 Z T .
E(nk[nj] ):O for all k,J 3.1.4d
E(i[ i]T)-Ri fork-"i-IZ 3l4e
nk nj _ k — J 9 _ l 0 .
= O forkyij‘,i:1,2 3.1.4f
i T . .
and E(§k[nj] )2 O for all R, J', 121, 2 3.1.4g

Since I3 is known in 3.1. 2, 2. 3.4 becomes

k k k k
H vk+1 p(a/Y ’V )p(yk+1’vk+l/a’y ’V )

k
p(G/Y 9 ) Z k k 3.1.5
(V v /Y v )
p k+1’ k+1 '

 

28

In 3.1.1 - 3.1. 4 assuming v are independent and that

d
k+1 3“ yk+1

vk+1 is independent of a, vk and yk gives

k k k k
p(yk+1"’k+1/“’y 'V ) ‘ p(ykn/a’y 'V )p(vk+l) ’

Thus p(ka) can be canceled in the numerator and denominator of

3.1. 5. This gives the recurrence relation

k k k k
k+1 k+1 Phi/y .v )P(Yk+1/a,y ,v )
P(G/Y ,V )2 k k 3.1.6

 

where the denominator equals the integral of the numerator over a.
The vector u is assumed to have a gaussian a—priori d is-

tribution, with den 3 ity

-1 A
- % (a -80)TPO (Ct-(10)
o) e

O V(Zn)pdet(PO)

where det(.) denotes determinant.

 

p(u/YO. V

 

O O
i, ’A o
p.a/Y .V )~N\a0. P0)

The form of the density function po(a) and the mean and co-
variance matrices are assumed to be given and reflect the a-priori
uncertainty concerning the parameter vector u. If th e a-priori in-
formation concerning a is small, then the Po matrix is large in
norm (c.f. Sec. 3.2). If the true value of o. is known with some

certainty then the matrix P is small in norm. The initial mean
0

7-9

value 60 is taken to be the best available estimate of the tr ue

value of a.

The density function p(yk+1/a, yk, vk) is now needed . From

3.1.1, 3.1.2, 3.1.3

1
yk+1 ‘ XL<+1Jr T'1<+1
- D (x u)u+§ + 1
' k k’ k k “k+1
1 Z 1
’ Dk(yk- T‘k’ Vk‘ T‘k)‘1 +§k +quk+1'

This equaticn gives a basis for approximating the function

p(yk+1/u,yk, vk) as discussed in Sec. 2.4. Taking

k k
E(Yk+1/Q:Y 9V ) — Dk(yka Vk). 0-

_ k k
and cov {yk+l/a,y ,v } = Sk

where Sk is larger (in norm) than Qk + Rk’ the density is approxi-

mated as normal. Thus

k k
‘ ~ I . . 3 o 0

At the kth step suppose

k -
p(a/Y ,Vk)~N(ak', Pk) . 3.1.8

o t q - u o k
Since p (a) 15 normal, if a recurswe relation between p(a/y , v )
o

and p(a/yk+l, vk+l) is obtained having the a-posteriori as normal then

a complete algorithm will be established. The a-posteriori mean

and covariance will define the algorithm, with the mean being

taken as the estimator of a.

30
Using 3.1.7 and 3.1.8 in 3.1.6 gives

k+1 k+1
p(G/Y s V ) :

1 T -1
kexp-EUa-ﬁk) Pk (a—Gk)+(yk+l-Dk(y k,vk )(1) Tskl (Yk'l'l- D k(Yk. Vk)a)}

 

C [numerator] do
“a

3 . l. 9
wh ere k is a constant. Since it is known that the exponential family
reproduces [SP], the density p(a/yk+1, Vk+l) is normal. Thus it
is only necessary to find the maximum value of the numerator to

find the conditional mean. The exponent of the numerator (except

for the constant - 1/2) equals

-1 T -1
Pk + Dk (yk,vk)Sk Dk(yk,vk)}a

T -l A T -1
.. 2 o o
0. {Pk Gk + Dk (yk, Vk)Sk yk+1} 3 l 10

T —1 -1
A
+ {Gk Pk“ k +yk+lsk yk+1}'

To maximize the numerator cf 3.1. 9, the quadratic form 3 .1. 10
must be minimized. This is accomplished by taking (1, and thus

the a-posteriori mean Gk“, to be

T
A —
a _ {PkA k+13k (yk,v )s } 3.1.11

k+1 P+l< k k yk+1

where

The matrix Pk+1 is the a-posteriori covariance matrix ass ociated

with p(a/yk+l, vk+1). The details of the argument are found in

Appendix II.

31

Equation 3.1.11 and 3.1.12 can be written in a mor e con-

venient form

A "lA T -l
= P + D , s
C1k+1 pk+1 k “k pk+1 ka Vk)1< yk+1
A ‘1 '1 A T '1
= +P p - +p D , s
“k k+1( k Pk+1 )ak k+1 k (yk Vk) k yk+1
From 3.1.12
13'1 P ’1 — D ( )s"1D( v)
‘ k+1 " " k Yk’Vk k kyk’ k
so that
G :6 +13 {-DT( v)S-1D ( v we +p DT( v )s’1
k+1 k k+1 k Yk’ k k kyk' k k k+1 k yk’ k kyk+1
=6+P DT(y VIS-l[y -D(y v)8] 3113
k k+1k k'k’k k+1 kk’ kk "
Using a matrix inversion lemma [AO-l][HOU] , pk+l can be written

as, from 3.1.12

)P

T - T -1

k+1 k k k
3 .1. 14
Equations 3.1. 3 and 3. 1.14 represent the identification
algorithm for the system 3.1.1, 3.1. 2, 3.1. 3. This algorithm
gives a Bayes-optimal sequence of estimates for the vector (1 when~
ever the system is linear and no observaticn noises are present.
Otherwise, the estimates are sub-optimal. The form of 3.1.13
is of interest and has intuitive appeal. From 3. 1. 1, the te rm

D , v )u can be interpreted as a prediction of the next obser-

ka k k

vation yk+l' based on the present observations yk and vk and on

32

. . A .
the present parameter escimate (1k. Therefore the differe nce

D (

(ka' k )Gk) represents an error term, and the present

yk! Vk
estimate 3k is updated by an amount proportional to this er ror.

The algorithm 3.1.13, 3.1.14 is actually a generalized stochastic
approximation algorithm [DV] [ BL] [ KIE] [ CHU] [ROB] with a gain

sequence that depends upon the observations rather than being

specified a-priori.

3.2 Convergence Theorems and Assumptions

Convergence of the general algorithm 3.1.13, 3.1.14 will
be considered here. Conditions under which this algorithm con-
verges to the true value a, and in what senses will be given. The
proofs for the three theorems to be stated are found in App endix 1.

Equation 3.1.13 can be written in error form by defining

_ A _
6k — uk (1
and writing
A A T -l T -1 A
_ = .. + D S - P D
“k+1 ‘1 “k “ Pk+1 k k yk+1 k+1 kSk Dkuk
01'
T «1 T -1 T -1
z D ‘ - p D c — p D
Ek+1 €15" pk+1 ksk yk+1 k+1 ksk Dk‘k k+1 k Sk Dk“
T ..1 T -1
= - D ‘ + D - D
Ek+1 (I Pk+l ksk Dk)Ek Pk+1 ksk(yk+1 k“)
where Dk is assumed to represent Dk(yk, vk) for convenience. From

3.1.12

 

33

-1 T -l
: P D o
Pk+1k +Pk+1kSk Dk
Substituting this into the equation for €k+1
e — P P’16+P DTS'1( D0) 3’21
k+1—k+lkkk+1kkyk+1k° "

Equation 3. 2.1 relates the estimation error at the (k+1)St step to the
previous error and the true value of the parameter (1.

Conditions under which 6k approaches zero and in what sense
are now given. The concepts of system identifiability and strong
identifiability are introduced. These concepts are fundamental to
the convergence theorems, and indicate whether sufficient observa-
tional information about the system is available. Other assumptions
required for convergence are also stated.

The norm of a vector X and a square matrix A as used in

this discussion are defined by

2
H XHZ = 2 'x I
i
and
>:<
H All = x (A A)
max
where (’1‘) denotes conjugate transpose and X ax(.) denotes the
m

largest eigenvalue of the matrix argument.

Definition 3.2.1: The system 3.1.1, 3.1.2, 3.1.3 is said to be

identifiable if there exists a finite positive integer q such that the

 

matrix sum

34

k+q
2 D.T 371D.
jzk 1 J J

is positive definite with probability one (WPl) for any k.

Assumption 3.2.A1 . ”30- all < 00 and O < HPOH < oo.

Assumption 3.2.A2 : E{(y, _ Dja)/’yJ, v3} = 0

‘J+1
k k
. T -1 T
Assumptlon 3.2.A3: HE{ E D, S, (y,+l- Dja) Z (y.+l- D.a)
j=k-q '1 J J j=k-—q J J

SZIHI _<_ M < oo
.5

Convergnce Theorem I: If the system 3.1.1, 3.1. 2, 3.1. 3 is

 

identifiable and assumptions 3. 2.Al, 3. 2. A2, 3. 2. A3 hold then
the algorithm 3.1.13, 3.1.14 converges to the true value of the

parameter vector u in the sense that

Hcov(8k- a)“ : Hcov EkH»O as k -+oo .

i. e., the norm of the error covariance matrix converges to zero.
Assumption 3. 2.A4:

k
-1
E P ' +P >3 D. . .
”Pk“ k-q e1<.q k+1 k S; (Y

-1 2
5 EHPk+1Pk_q ek_qll + EHP

Assumption 3. 2. A5:

k 2
Ell 2 D. (y. Djaﬂl 5 M< oo .
:k-

J

35

Convergence Theorem 11: If the system 3.1.1, 3.1. 2, 3.1. 3 is

 

identifiable and assumptions 3. 2.A1, 3. 2. A4, 3. 2.A5 hold then the
algorithm 3.1.13, 3.1.14 converges to the true value of the para—

meter vector u in the sense that
2 A 2
EUIEkH } = EHlak- all }-> 0 as k -> 00.
i.e., the error norm converges to zero in mean square [PAP].

Definition 3. 2. 2: The system 3.1.1, 3.1. 2, 3.1. 3 is said to be
k

 

strongly identifiable if the minimum eigenvalue of E D.TS-.lD.
satisfies
k
x . {2 D.Ts.'1D.} > ckp
min ij J J J —

with probability one (WPl) where 1 < p < 00 and O < c < 00 are

constants .

k
Assumption 3.2.A62 H E D.T5:l(y.

. , +1
J=k~q ‘ J
probability one (WPl‘j, where M is a constant.

- 13:0)” < M < 00 with
J __

Convergence Theorem 111: If the system 3.1.1, 3.1. 2, 3.1. 3 is

 

strongly identifiable and if assumptions 3. 2.A1, 3. 2.A6 hold then
the algorithm 3.1.13, 3.1.14 converges to the true value of the

parameter vector (1 in the sense that
”6k” 5 ”316 uH-e O as k —» co

with probability one (WPl).

36

This theorem is equivalent to one given by Fu kao [FK-3], but
the proof given in Appendix I is distinct.

Assumption 3. 2.A1 is common to the three converg ence
theorems and is very reasonable. Assumptions 3. 2.A3, 3. 2. A5 and
3. 2.A6 are constraints on the magnitude of the random variables and
their moments, and are easily satisfied. Assumption 3. 2. A6 cbes
not hold for gaussian random variables. This is not a stringent con -
dition because in practical situations all variables are bounded.

Assumption 3. 2.A2, as required by Theorem I, is true for
linear systems, but is not true in general for nonlinear systems.
Theorem II is similar to Theorem I. In the former, assumption
3. 2. A4 replaces 3. 2.A2 and allows convergence in mean square of
the error norm as opposed to convergence of the error covariance
norm. Theorem 11 applies to linear systems, but in general will
apply to nonlinear systems only if 3. 2. A4 can be verified.

The condition of identifiability is required by both Theorem I
and Theorem II. Conditions under which a system is identifiable ar e
discussed in Chapter V. The strong identifiability of the system as
required by Theorem III is sufficient for identifiability, as shown

"‘

in Appendix 1. Theorem III may be applied to nonlinear systems.

3. 3 Derivation of Previous Results as Special Cases
The form of the system equations 3.1.1, 3.1. 2, 3.1. 3 are
sufficiently general to permit derivation of known results as Spe-

cial cases.

37

The problem studied by Mayne [MA] is considered first.

Equation 2. 2. 5 may be rewritten as

T
Xk+1 ‘ Xka+§k
and
yk : Xk
where
r- T — ‘1
xk 0 al
T T
Xk : Xk , O. — ] 3.2 3 3.1
o I o
0 “J g
k T n

 

 

 

and aiT is the ith row of the matrix A. In the algorithm 3. 1.13,

3.1.14, the D .) matrix is taken as

k(
Dk(y'k, Vk) = Dk(yk) = D(Xk) = X 3 . 3. Z

k
Since there are no observation noises, p(yk+1/a, yk, v ) need not

be approximated since

k k k
p(yk+l/a3y ,V )- p(yk+1/a9x)

and.
y :x =Ax +§ =XTQ+§
k+1 k+1 k k k k
Therefore w1th 5k: Qk
k T ,
p(yk+1/a.x )’VN(Xk a , Qk)

38

where 0k is the covariance matrix of the gaussian random variable

gk. The definitions for a and Dk in 3. 3.1 and 3. 3. 2 are then used
with the algorithm 3.1.13, 3.1.14 to solve the problem. If the
matrix A varies with k then Dk is changed at each step accord-

ingly. If inputs are also applied then D (.) must include these noise-

k
free observations.

The results of Fukao [FK—Z, 3], for a linear system with no
observation noise, are the same as those given by Mayne and can

be derived accordingly. When observation noises are present at

the output, then uk: v and Dk is taken to be

 

 

k
T T I
o
Yk O “k
Dk(yk, Vk) : D(Yk» uk) : o . . .
T T

0 yk 0 uk

L .1

The nonlinear problems studied by Fukao are also included in the
formulation 3.1. 1, 3.1. 2, 3.1. 3. For his formulation there is no

observation noise on the inputs so that v = u Also, time-vary-

k k.

ing systems are not considered so that the D .) matrix remains

k(
fixed,
Dk(yk, vk) = D(yk, uk) .
The identification of the system studied by Ho and Whalen

[HO-1] is easily handled. There is no input noise, so Qk’ the

covariance of ék, is identically zero. There are no inputs so

39

uk 2 vk = 0, and the matrix A is fixed. All states are observed.

Therefore in 3.1.13, 3.1.14

Dkwk. vk) = D(Vk) = '.

 

 

and this type of system can be identiﬁed accordingly.

Kirvaitis [KI] [ FU-Z] considers identification of nonlinear
systems with all states obscured by noise and with no state dis-
turbances. If the system is taken to be time-discrete then
3.1. 1, 3.1. 2, 3.1. 3 are sufficiently general to include this case,

taking gk = 0, and. vk 5 uk.

3. 4 Summary

An identification algorithm 3.1. 13, 3.1.14 has been derived
for the general system 3.1.1, 3.1.2, 3.1. 3. This was accom-
plished using a reproducing gaussian distribution for the para-
meters which specify the system. The convergence of this
algorithm to the true parameter values is established by convergence
Theorems 1, II and III in Sec. 3. 2. Theorems 1 and II are new
results in identification, and the algorithm 3. 1. 13, 3. 1. 14 is
more general than any given previously. The algorithm can be
Specialized to give results of other authors as indicated in Sec.

3.3.

IV SPECIAL FORMS OF THE ALGORITHM

System equations of the form

xk+1 : Dk(xk’uk)° +§k

1
Yk ‘ Xk+nk

Vk : uk+nk

have been considered in Sec. 3.1. The identification algor ithm
for these systems has the form 3.1.13, 3.1.14. For a given
system structure, implementation of the equations requires

that the Dk(yk, vk) matrix be found. Special forms of Dk(.) for
some important classes of systems are given in this chapter.

The convergence theorems of Sec. 3. 2 are discussed in relation
to each system. The formulations for linear, stationary, and
time-varying systems are given in Sec. 4.1 and 4. 2, respectively.
Nonlinear stationary, and time-varying system formulations are

given in Sec. 4. 3 and 4. 4. Section 4. 5 discusses the formulation

40

41

for systems with generated inputs, for systems with state distur-

bances having unknown mean, and systems with indirectly ob-

served states.

4.1 Linear Stationary Systems

When all states are observed the equations for a linear,

stationary system have the form

xk+1= Axk + Buk + gk
-x + 1
Yk‘ k “k
Vk‘uk T'k

where A is an n x n matrix with rows aiT and B is an n x r matrix
with rows biT. The entries of the matrices A and B are to be

identified. The state equations may be written as

xk+l : [A B] xk + gk
“k
_ .1 _
T T F
al b1 xk
= : + ék
a T b T u
n n k
_ _J ._ _.

 

 

 

 

Rearranging into the form 3.1.1

42

 

 

 

 

I
T T
xk 0 : uk 0 a1
T T a
Z 2 o
xk+1 xk : uk + gk 4 1 l
O I . O
O x TI 0 u T an
k k b
_ I a l
b
n
T T
xk+1 ‘ [xk Uk 1°+§k ' D(xk’uk)a+gk 4'1'2

Thus for the linear, stationary system, the Dk(.) matrix is defined
by 4. l. 2.

Conditions under which such a system is identifiable can
q

now be given. For 2: D,Ts.'1D. to be positive definite (WPl)
j=1 J J J
xT(‘Z D.TS.'1D.)x 2' 23 xTD.TS.-1D.x = 2(D.X)TS.-1(D.X) > 0
j J J J j J J J J. J J J

for all x 7! 0, where Dj denotes Dj(yk, vj). Since Si"1 is positive

definite, no x 7! O can exist for which

29
N
n
>4
II
o

 

 

where K) has nq rows and n(n+r) columns. This equation has only

43

the trivial solution x = 0 if the rank of E is equal to n(n+r) so that

q 2 n+r is required. The 9 matrix has the form

I I l
0 O O
Y1y I Y2 I I yq
l I I I
T _ O ’ l O ‘ | l 0
3 Y1 | Y2 I ' ' 'I Y<21
v1 . 0 : v2 0 : : vq 0
0 v1 | 0 v2 I | O vq
L I I I _

 

 

so that the rows of 9 (columns of QT) are linearly independent if
any (n+r) of the pairs [yiTvi ], i = 1, 2, . , ,q, are linearly inde-
pendent. This is satisfied (WPl) if both yi and vi contain additive
noise components. The noise process gk guarantees (WPl) the
linear independence of the yi vectors, which is sufficient for the
linear independence.

To apply convergence Theorem I of Sec. 3. Z to linear
stationary systems, assumptions 3. 2.A1, 3. 2.A2, 3. 2. A3 must
also be satisfied. Assumptions Al and A3 are very reasonable.

Assumption A2 is satisfied since, with all random variables

having zero mean

E(v.

-D.a) = E(. -A .- Bv.)
J+1 J y yJ J

3+1

: E(x,

1 2
- An. - Bn.)
J+1 J J

1
- Ax.- Bu.) + E(n,
J J J+1

1 1 2
= Eé.)+E . -A .-B.
(J (TIJ+1 TIJ TIJ)

= O

44

Therefore Convergence Theorem I applies to the linear stationary
case and the algorithm 3. 1.13, 3. 1. l4 converges accordingly. Con-
vergence Theorem II may also be applied if assumptions 3. 2.A4

and 3. 2.A5 hold.

4.2 Linear, Time-Varying Systems
When all states are observed the equations for a linear, time-

varying system have the form

N
II

k+1 Akxk + Bkuk + gk

 

 

 

 

vk = uk +nk
Using
7 T _ F T — _ '—
al (k) Io1 (k) al(k)
a (k)
T T 2
Ak= a2 (k) . Bk = b2 (k) . wk).
a.(k)
T T n
an (k) bn (k) b (k)
__ _ _ .J l
bn(k)

 

 

where aiT (k) and biT(k) are the rows of Ak and ER respectively,

the variation of 7(k), and thus the variation of Ak and Bk’ is

assumed to be governed by

45

7(k+1) = Auk“) 4.2.1
Z(k+l) = GZ(k) 4.2.2

The matrix LA is unknown, Z(k) is a q-vector, and 7(k) is an n(n+r)-
vector. The matrix G and the initial value ZO are known.

From 4.2.1 and 4.2.2

7(k+l) = 942(k+1) = cAG Z(k)=cAGk+1Zo

 

 

Then
._ I .—
T T
xk O : uk 0
Akxk+ Bkuk = .. l .. 700
TI ° T
O xk I 0 uk
T T k
_ [xk Uk 194020,
k .
The vectorc/iG Z0 can be written
LA sz : (A h : H T a.
o k k
where
T T T T
a' = [a' ..o.' ],,_A = [(1.02 0'] 4.2.3
and
hkT o
T T k k-l
: : Z . 020
Hk hk , hk G Z0 G G Z0 4 4
0 hkT
L J

 

 

46

Therefore

_ T T T ,_
Axk+Buk— [Xk Uk ]Hk a — Dk(xk’uk a 4.2.5

where a' generally contains both known components and unknown
components a which are to be determined- Equations 4. 2. 3, 4. 2. 4,

o 20 . g o
4 5 define Dk(xk uk)

Example 4. 2.1. The use of 4. 2. 3, 4. 2.4, and 4. 2. 5, is studied. in

 

this example of a time-varying linear system of second order.

- qr _ — — ‘P — —l — -

 

 

 

 

 

 

 

 

 

 

xk+1 aLo+al°k a2 xk b1 g1k
: + u +
2 k
yk+1 a3k a4k yk b2k ng
_ .. .__ -— — _I. L d. b _I

The a1 and bi are unknown parameters. The vector '7 is

then given by

2
1(, a k,13, b

4 1 2k]

k, a2, a

3

T
7 (k) = [ao+ad

01'

7T(k) : [ao+a k”), a , a k(2)

1 Z +ak

3 3 (l) a karb

(1)
.4 k1

, b

1 2

where the superscript (. ) denotes a factorial polynomial an d is
defined by [HAM]

(n)

x : x(x-l)(x-2)...(x-n+l), n_>_1

(0)

and x =1. The terms in the second. expression for 7T(k) are

obtained from Sterling numbers of the second kind.

47

Equation 4. 2.1 is written as

 

 

0 a1 a0
0 0 a2 'k(2)—
wk) = vimk) = a3 a3 0 k“)
0 a4 0 km)
0 0 bl '— "
LO b2 0

 

 

and 4. 2. 2 is written as

Z(k+1)

II
0
N

E

— q _ -—

(k+l)(z) 1 2 o k
(k+l)(l) = o 1 1 k(

(k+l)(0) o o 1 k

— — I-— _u~

and Z0T = [O 0 l] . These relations are used with 4.2. 3, 4. 2.4,

 

 

 

 

 

 

,u).

and 4. 2. 5 to determine D xk k

k(

From the cA matrix, the (1' vector to be found has the

form

T
l ._
a — [Oa1a000a2a3a300a4000b1ObZO] .

Since some of the components of a' are known, further reduction of

the problem is possible. For some k,

Then the product H

To ', where H

k

 

written as, eliminating the zeros of a'

Re-ordering the column vector and eliminating the redundant a

gives

Multiplying H

 

 

k

 

row

'0!”

0 0‘0.-

 

 

r

(02a)

 

3

 

 

w

O‘Cl"!u
NHIFUND-‘O

 

 

L _

T on the left by [XkTUkT] gives

T is defined by 4. 2.4can be

3

 

 

 

 

49

[XTUT]HTQ,_ Xkyk O O “k O (”302 0 8'0
k k k - o 0 o ‘03 8L1
Xkyk “k a
_ ((01+w2) a2
002 a3
w 4
o 3
b1

 

 

 

 

so that, performing the indicated multiplication,

ka3 kaz ykw3 O O u (1) O a
D (x ,u )-o. =
k k k 0 0 O ( + )

Xk “’1 “2 3'sz k 2

 

 

L. .J

When the vector 0. is found, 7(k) can be calculated, and from this,
AR and Bk.

Assumptions 3. 2.A1 and 3. 2.A3 as required by convergence
Theorem I of Sec. 3. 2 are easily satisfied. Identifiability will be
satisfied generally. An indication of why this is true is given by
the example of this section. Assumption A2 is also satisfied so that

the algorithm will converge. Assumptions A4 and A5 may als 0 hold

so that the algorithm would also converge in the sense of Theorem 11.

50

4. 3 Stationary Nonlinear Systems

The system equations for a stationary nonlinear system with

all states observed are

.. f(

xk+1 ’ Xk’ ”16“) + gk

1

Yk " ka‘k
2

k k k

v :u+'r]

where f(.) is a vector valued function. Its components satisfy

fi(xka uk: 0.) = [fl]. (xk’ uk), {12(xk, uk): 0 o - : flip (xks “kn ° 3

for i : 1, 2, . . .n. Each of the n- p functions fij(' ) is assumed

known. Therefore, the required Dk(°) matrix is

 

 

r _
f11(xk,uk) . . . f1p(xk’ uk)
Dk(xk’uk) = I Z 4°3'1
fn1(xk’uk) ° ° ° fnp(xk'uk)J

Example 4. 3.1 : To illustrate, the third order nonlinear sta-

tionary system

 

 

_ _ F g _ l _ _,
Xk+1 ao+ a‘1"15r azxkyk O glk
2
yk+1 ' a33‘1: + a‘4.”4. + ID1 “k + ng
zk+1 assm Zk b2 g3k

 

 

 

 

 

 

is consideredwhere all variables are scalars.

 

51

This may be written as

 

 

 

 

 

 

 

_ _ _,,_ i _ _
ka+1 1 xk xkyk 0 O O O 0 a0 glk
— o o o x2 0 u 0 a1 + g
yk+l ‘ k yk k 2k
aL2
zk+1 o o o 0 o Slnzk 0 uk a §3k
L _ L _ 3 _ _
a4
a5
b
1
sz
01' -
xk+1
yk+1 : D(Xk’yk’ zk’ “13° ‘1 + gk
zk+1

 

 

Convergence Theorem I is not generally applicable to non-
linear systems since assumption A2 may not hold. If assumption
A4 can be justified, then Theorem II can be applied. Generally,
however, Theorem III will be required for nonlinear systems.
Assumptions Al and. A6 hold easily. Identifiability generally
holds, as can be conjectured from the form of Dk(° ) in the pre-
ceding example. Assuming strong identifability gives convergence

in the sense of Theorem III.

4. 4 Time-Varying Nonlinear Systems
The system equations for time-varying nonlinear systems

with all states observed have the general form

52

= f
Xk+1 k(xk’ uk’a) + ié‘k
- x + l
yk ‘ k "k
Vk : uk+nk -

By analogy with the stationary nonlinear case, the Dk(°) matrix

has the form

— —

fll(k,x u) . . . f1p(k,xk,uk)

k’k

Dk(xk’ uk) 2 . .

 

fn1(k. Xk’ 11k) . . . fnp(k.xk. 11k:

 

where all the fij (k, x , uk) are known.

k

Example 4.4.1 2 To illustrate, the second order system

x a kx 3 + a y

k+1 : o k 1 k + 1k
, 2

yk+1 azyk51n k §2k

where all variables are scalars is considered. This may be writ-

ten as
kx 3 0 F a _ g
xk+1 k yk 0 1k
: Z a +
ka o o yksmk 1 §2k
aL2

 

 

53

As in the stationary case, if assumptions A1, A6 are satis-

fied then convergence Theorem III can be applied.

4. 5 Miscellaneous

Simple generalizations of the above results are considered.
in this section. Sec. 4. 5a considers systems with generated in-
puts. Sec. 4. 5b considers systems having state disturbances
with unknown means, while 4. 5c considers systems with indirectly

observed states.

4. 5a. If the input process u is generated by the equation

k

ULk+1 : cuk

where C is an unknown matrix the function uk can be identified by
the matrix C. The observations of uk are obscured by additiv e

noise.

Vk : uk+nk

Then, for example, if the system is linear

Xk+1 _ A B xk + gk
uk+1 O C uk 0
and
x 1
Yk _ k + "k
‘ 2
V u

54

To identify this system, having unknown matrices A, B and
C the results of Sec. 4. 1 can be applied and the input generating
mechanism can be identified. Convergence Theorem I applies to
this case.

If the input mechanism is a random process generated by

2
uk+1_ C “k +gk

and. the observations are

2
Vk = uk+nk

. 2 . .
where C 18 unknown and ék is a zero mean white process, then

with a linear system

ﬁ+1_ABxkg

uk+1 O C “k g

The Dk(.) matrix is formed as in Sec. 4.1 and the algorithm
identifies the input noise process by its matrix C, as well as
A and B.

By adjoining the input generating mechanism to the state
equations, similar generalizations are possible for the other
cases considered in this chapter. The corresponding conver-
gence theorems will still apply.

4. 5b. If the state disturbance process ék has a non-

zero unknown mean m, then the process

55

£11: ék-m

has zero mean. For a linear system

xk+1 = Axk-tBuk+€.k
_ I
—Axk+Buk+m+§k
_ '- ‘ i
— [A B m] xk + gk
“k
l

 

 

Th e corresponding observation equation is

— — — 1 — -

 

 

 

 

 

 

Vk xk "k
Vk ‘ “k TIk
l J l 0

This formulation allows the unknown mean m to be considered as
a system parameter and estimated accordingly. In a similar
manner, nonlinear or time-varying systems having unknown

mean input processes §k can be identified.
4. 5c. If the observation equations have the form
Yk z ka +111:

where H is unknown and H"1 exists, then for a linear system, a

new basis in the state Space may be found. Letting

56

gives

-1
k+1 HAH Zk+HBuk+H§k

N
II

and.
Yk : Zk+ "k
Thus the system characterized by the matrices HAH”1 and HB

may be identified, even though the covariance matrix of the pro-

cess Hgk is unknown.

4. 6 Summary

The generality of the algorithm 3. 1.13, 3.1.14 has been
clearly established by the formulations of this chapter. Equa-
tions 4. 1. 1 and 4. l. 2 define the algorithm for the stationary
linear case, while 4. 2. 3, 4. 2. 4, and 4. 2. 5 complete the time -
varying case. The relations between the convergence theorems
of Sec. 3. 2 and these systems is also established. For non-
linear systems, 4. 3. l and 4.4. 1 define the algorithm and are used.
to establish the applicability of the convergence theorems. Sec.

4. 5 considers other simple generalizations.

V AN EXAMP LE

In order to show how a specific identification problem
may be formulated, and to show that implementation of the pro-
posed algorithm is not always difficult, the identification of a fourth-
order digital control system with eight unknown parameters is
considered in this chapter. Formulation of the problem is given
in Sec. 5.1. Computer simulation results for the behavior of
the algorithm under different initial estimates, noise conditions,

and. a-priori uncertainty are shown in Sec. 5. Z.

5. 1 Problem Statement and Formulation

An interesting example of a situation where identifica-
tion techniques can be useful is depicted in Fig. 5.1. 1. The
system consists of a digital controller C2. and an unknown plant:
Q. The plant is assumed to be describable by a second-order,
linear differential equation with constant coefficients. The digital
controller, which is completely Specified, is described by a pair
of first-order linear difference equations with constant coef-

ficients. The analog-digital (A/ D) converter transforms the

57

58

(t) into digital inputs for the con-

continuous signals e1(t) and eZ

troller. The D/A converter accepts the controller's output
and transforms it to a zero-order-hold (ZOH) signal. The

and controller outputs x and x are obser ved

inputs 111 and u 3 4

2
without noise. The plant 63 receives inputs from the digital con-
troller c that are corrupted by additive noise €(t). Observations
of the plant outputs are corrupted by the additive noise n(t) and.
are used as feedback signals. This situation may arise when the
plant is located at a remote site. For example, the plant may be
a moon probe while the controller is on the earth. The identifi-
cation of this system consists in determining the coefficients of
the plant's differential equation.

The identifier I operates on the observations ul, uz, yl, YZ’
x3, and x4 and calculates the plant parameters. Knowledge of
these plant parameters may then be us ed to alter the digital con-
troller so that system performance is improved. For example,
the controller may be altered so that the overall system is

asymptotically stable .

The digital controller C, is described by 5.1. 1.
d k
x3[ (k+1)T] c11 C12 x3(kT) + (111 12 e1( T)

x4[(k+1)T] c2.1 c22 x4(kT) d21 d22 e2(kT)

5.1.1

where cij and dij are known for i,j : 1, 2. The constant T repre-

sents the length of time between samples.

59

soﬁmmﬂhmcmnh >302 5M3 Swamiwm 33:00 ﬁmﬁwﬂm

9.3mm xomnpmmh pad

4.1m enema

 

 

 

 

 

 

 

 

Hanna

6

 

 

 

 

3.9 en

G
mg“
g.

4X
5?

 

 

«\Q

 

2:, me 4

H

 

 

 

Hmﬁouﬂuoo
Seaman

U

 

 

 

 

 

 

3

__
.—

 

 

$235.3

 

 

 

H

 

 

 

 

 

 

 

 

 

 

60
The plant 63 is described by 5.1. 2 ,

d xl(t) a a x (t) b b v (t)
dt

II
+
U1

.l.2

x2“) a21 a22 x2“) b21 bzz V

where aij and. bij are unknown for i, j = 1, 2, and all variables are

scalars .

The gaussian processes €(t) and n(t) are white, have zero
mean and are independent. The processes €(t) is assumed to
behave as a ZOH signal.

From the plant equations and. system configuration

x (t) X (t) X (t)
d 1 T 3 + T3 €(t)

x.Z (t) x2 (t) x4(t)

where A and B have entries :51], and bij. Since x3(t), x4(t) and

€(t) are assumed to be ZOH signals, at the sampling times

xl(k+l) X k) X3(k)
= A + B + B§(k) 5.1.3
x2(k+l) x2(k) x4(k)

where A and B have entries a? and bij' T : 1 is used for con-

venience and [KOE]
A = eAT, B = S eA(T"’)1‘3d7 5.1.4
0

If A and B can be found from the identification algorithm then

identification of the continuous system is possible using

 

 

 

 

 

 

 

 

 

 

 

 

X : ~l-T In A 5 .1
T
_ -A - ..
and B = [ S 6 7dr] 1 A 1B 5 .l
0
Writing the direct sum of 5.1.1 and 5.1. 3
_ T L ._
x1(k+1) x3(k)
x2(k+l) A O 0 x (k) B €(k)
4
: Xk+l : Xk + +
x3(k+1) 0 C D e1(k) O
x4(k+1) e2(k)
_ _ _ J
Using the relations
Fx3(k)_ r- o 0 1 0 o _x1(k)— P o 1
x (k) 0 o o 1 0 X2(k) + o
4
= x3(k)
e (k) -1 o o 0 1 1100
1 x4(k) _ n
e2(k) O -1 O O O 111 (k)
L d - 112(k)
the above becomes
A B 0 B gk
X = X + u(k) + 5 .1
k+1 -D C k D _an

where C and D have entries Cij and d., as in 5.1.1.

The observation equations are

_

 

L

Y1 (k)

Y2(k)
Y3(k)

v4(k)

 

—

 

b

62

x1(k)
x2(k)
x3(k)

X4(k)

 

 

‘

11(k)

 

.1

Equations 5.1. 7 and 5.1. 8 are of the general form discussed in

Sec. 4.1, and the corresponding algorithm can be applied for

identification.

and x

and

indicate more clearly how the algorithm can be applied.

then

4

x1(k+1)

x2(k+1)

Defining

Y1(k)

Yz(k)

[a11a12a21a22

are observed, the equations

xl (k)

xzﬂd

x1(k)

szk)

bb
11

12

Since the matrices C and D are known, and x

x (k)

+ B 3
X4(k)

+ TI(1<)

b

x3(k) x4(k)

O

21‘022:I

O

3

+ B §(k)

O

x3(k) 25,00

0

O.

63
From Sec. 4.1,

lek) Y2(k) 0 0 x3(k) x4(k) o o

D (Y ,X )
k k k 0 0 lek) yz(k) o o x3(l<) 35,00

The identification algorithm takes the form

T 1 y1(k+1)
A _ A " .A
ak+1 — ak + Pk+1Dk Sk - Dk 0k} 5.1.9
Y2(k+1)
and P =P-PDT(S+DPDT)-1DP . 5110
k+1 k k k k k k k k k ' °
The Sk matrix may be taken as
_ x
Sk _ Qk +Rk +Sk

where Skis some positive definite symmetric matrix and

Qk

Rk cov[ n(k)]

cov[§(k)]

5. 2 Computer Simulation Results
The particular example considered in this section uses the

matrices

64

-2 0 40/9 0

The plant is unstable since A has eigenvalues of l and 2.
Also, the closed 100p system in the absence of a digital controller

is unstable since

(A-B) =

has an eigenvalue greater the unity. However, the complete sys-

tem with matrix

—D C

is stable, with eigenvalues of o, o, +1/«f2 , -1/'\/2. It is also

completely controllable since

is nonsingular.

The example was programmed on the Control Data 3600
digital computer. Ordinary matrix routines were used to simu-
late the system. The gaussian noises were simulated using the

sum of nine samples from a uniform distribution [SC] .

65

Figure 5. 2.1 shows the simulation results for a typical para-

A
meter. The average value of b over ten runs is plotted against the

22
number of samples. In this figure, P0 = 51, S = 31, Q = R = I and
the initial estimate of all parameters was taken as zero. It can be
seen that the initial response of the estimator is good, while as the
number of estimates increases, the convergence becomes slow.
This is typical of stochastic approximation algorithms.

The effect of different noise levels is depicted. in Fig. 5. 2. 2.
Here, the normalized error HEkHZ/HEOHZ averaged over ten runs is
plotted against the number of samples. In both cases indicated,
P0 = 51 and 60 = O. In the lower curve, Q = R = I and S = 31, while
for the upper curve Q : R = 251 and S = 501. With the increased
noise level, Q = R = 251, it can be seen that the convergence rate is
slowed considerably. The final value for this case is 1. 44 x 10"2
as opposed to 4. 34 x 10'”4 for the lower noise level.

Figure 5. 2. 3 shows the effect of different initial estimates.
Here, P0 = 51, Q : R : I and S : 31. The error norm HEkHZ aver-
aged over ten runs is plotted against the number of samples. For
the curve having A points, [IEOHZ = l, 300, while for the 0-point
curve, HEOHZ = l. 00. As the number of estimates increases, the
effect of initial errors disappears and the two estimators have
similar asymptotic properties. This indicates that the algorithm's
convergence is unaffected by initial errors.

The effect of au-priori uncertainty is depicted. in Fig. 5. 2. 4.

In both cases indicated, Q = R = I, S : 31, and 30 = O. A ten run

66

 

 

 

 

 

7

’13
22
/\
1. O »- A 0 —0-———0-——0
ten run average
Q = R = I
P0 = 51
0. 5 _ S = 31
N0. of samples/2
‘j\,._l_ l 1 l l l l l l l g
1 2 4 8 16 32 64 128 256 512
Figure 5. 2.1. Behavior of a Typical Estimator
2 2
116k” /||€OH
1 0T 0
° ”"= “: ten run average
P 2 SI
. 0
1 eo==0.o
10' _ .
‘ .
O
O . . .
lo‘zt ‘
10‘3. .
A A ‘
0: Q=R=251, 82501
—4 A: Q = R = I, S : 31 N0. of samples/2
10 W—l l 1 l l 1 1 1 l \
8

16 32 64 128 256 512

Figure 5. 2. 2. Effect of Noise Power on Convergence

67

 
  
      

2
3'1. Hekll 2
10 [ ‘- A; HE H = 1,300
o
2
102- 0: Neon =1.0
ten run average
1 Q = R = I
10 ' S = 31
P0 = 51
10 dy’
10'1_
-2
10 -
'3 1 I l , No. of samples/2

 

 

I

10 l m J A
_AV_3 2 4 8 16 32 64 128 255 512

Figure 5. 2. 3. Effect of Initial Estimates on Convergence

   
  
   

 

 

2 2
HekH /|I€OH
’t
l 0;: 0 P0 = 0.1 I
A: P = 101
0
1 ten run average
10‘ _ (2::R:=1
S
a
-2
10 -
10’3.
-4 N0. of samples/2
10 _/\/ J l ' I 1 1 1 1+ 1 l >
1 2 4 8 16 32 64 128 256 512

Figure 5. 2.4. Effect of Uncertainty on Convergence

68

average of the normalized error HEkHZ/HEOH Z is plotted against
the number of estimates. The uncertainty associated with initial
estimates is reflected in the a-priori covariance matrix Po. If
the initial estimates are thought to be close to the true values, then
PO will be small (in norm). If initial uncertainty is large, the PC
will be large (in norm). In the curve with points A, P0 = 101 and,
for the 0-point curve P0 = 0. 11. In both cases the initial error
norm is small, having a value of 2.61. As indicated by Fig. 5. 2.4
the effect of initial uncertainty tends to produce large changes in
the initial estimates which are close to the true values. This
results in errors that are large relative to those produced by

the same initial estimate with a smalier Po. As more observa-
tions become available, the effects of the a—priori uncertainty
become small and both estimates have similar asymptotic pro-

perties.

5. 3 Summary

The identification of a fourth-order digital control system
with eight unknown parameters has been studied in this chapter.
The formulation of the identification algorithm in Sec. 5. 1 was
shown to be a simple application of the results of Sec. 4. 1.
Various properties of the algorithm were demonstrated by a com-

puter simulation in Sec. 5. 2. It was shown that a-priori uncertainty

69

and initial estimate errors do not affect the asymptotic properties
of the algorithm. It was also shown that increased noise power

slows the algorithm's rate of convergence considerably.

VI C ONCLUSION

The major results of the thesis are listed in Sec. 6. 1 and

possible extensions of this work are discussed in Sec. 6. 2.

6.1 Results of the Thesis

The formulation of a general system identification problem
as a problem in Bayesian learning is carried. out in Chap. 11. This
formulation is significant because it provides a unifying structure
through which a wide variety‘of system identification problems
can be solved. In particular, Sec. 2. 3 shows the basic relations
which exist for the general system 2. 3.1, 2. 3. 2. , 2. 3. 3.

Chapter 111 gives the solution of the Bayesian identification
problem for the general system 3.1.1, 3.1. 2, 3.1. 3. The general
algorithm 3.1.13, 3.1.14 is derived under gaussian assumptions
with an approximation to the optimal Bayesian estimator, and is
optimal for linear systems having no observation noises. This
algorithm is more general than any previous results, and can

be Specialized to many important cases as shown in Sec. 3. 3.

7O

71

The proofs of the three convergence theorems of Sec. 3. 2 are new,
and are found in Appendix 1. Convergence Theorems 1 and II are
new and can be more readily applied than those given by other
authors. These theorems are important to the thesis because they
establish that the algorithm does indeed yield the true system
parameters.

Chapter IV shows how the algorithm 3.1.13, 3.1.14 can
be specialized to important classes of systems. New results or
generalizations are obtained for linear and nonlinear systems in
each section. The formulation for stationary linear systems is
new because of the presence of input observation noises. The
time-varying linear formulation is significant because observa-
tion noises have not been considered previously in this case. The
nonlinear formulations are new because of the input observation
noises and non-stationarity of the system. The generalizations
of Sec. 4. 5 also contribute to the resuits of this thesis.

The feasibility of the identification algorithm is demon -
strated by an example of a fourth-order system with eight un-
known parameters in Chap. V. A computer simulation shows
that increased noise power slows the algorithm’s rate of con-
vergence and that the effect of initial estimates becomes negli-
gible as the number of observations becomes large. Also, when
a—priori uncertainty is large, the effect of initial observations

is weighted heavily, resulting in initial transients in the algorithm.

72

6. 2 Possible Extensions

The class of systems to which the identification algorithm
is applicable is very general. However, the formulation in this
thesis applies only to discrete-time systems. A very natural
extension of the results given in Chaps. II and III is to continuous-
time systems. Conceptually, this is not a difficult problem, al-
though the details of the limiting arguments may be difficult. In

the continuous case, an equation analgous to 3. l. 13 would be

£0”) = P(t)DT(t)S(t)'l{§E y(t) - D(t)0(t)} 6-2-1

and for 3.1. 14

% pm = .. meTm {Sm-1+ D(t)P(t)DT(t)}'1D(t)P(t) 6. 2.2a
or
:19; p'lu) = DTIt)S(t)'1D(t) 6-Z-Zb

Implementation of these equations for analog computer simulations
would be difficult, although hybrid techniques would reduce these
difficulties considerably.

Another useful extension of the results studied here would
be the elimination of 3. 1. 14. A deterministic sequence of matrices

Pk would reduce the computational requirements. Furthermore,

the convergence proofs would be simpler, although the sequence
of estimates would be less Optimal. For example, (l/k) I might

replace P Other more general forms for P might be con-

k° k

sidered and the rate of convergence optimized over a class of
I

n I-

73

The identification algorithm could also be used to study
adaptive controller systems. As indicated in Chap. V, the
identifier could be us ed to track system parameters and the
digital controller could be modified accordingly. Other adap-
tive learning loops could be studied. This same philosophy can
be applied to communication problems. By utilizing a channel
identifier, an adaptive detector could be designed.

The convergence proofs of Appendix 1 could be extended.
Some of the techniques used in these proofs may be useful in
establishing multi-dimensional stochastic approximation algorithms .

Finally, the study of canonical forms would be very

useful when all system states are not observable.

APPENDIX I

CONVERGENCE PROOFS FOR THE
IDENTIFICATION ALGORITHM

The three convergence theorems for the identification algor-
ithm 3.1.13, 3.1.14 were stated with assumptions in Sec. 3. 2. In
this appendix these theorems will be proved.

For convenience of reference the algorithm equations are re-

stated here.

A A I -l A
= P D _ D .
u k+1 ak+ k+1 k (yk' Vk)Sk [ yk+1 k(yk' vk)a k] A1 1

T T -1
P k+1 ‘ Pk‘ Pka (yk’ vk)[sk+ Dkwk’ vk)PkD k (3’16ka Dk(yk'vk)Pk°

A1.2

With 6k = Gk

-1 T -1
: - D O
Ek+1 Pk+lpk €13 Pk+1Dk (Yk’ vk)Sk ”k+1 k(yk’ Vk) “1
A1. 3

In the following discussion D will denote Dk(yk, vk) as opposed to

k
Dk(xk, uk). Since Pi is a covariance matrix it is real, positive

definite and symmetric .

From the definition of Sec. 3. 2,

 

 

(P12) = x (P.)

||P.|l =./x (P.*P.) = /x
1 max 1 1 max 1

max

74

75

Also,

 

xmaxuai) : -1
Lemma A1. 1: The product of two positive definite symmetric
matrices has positive eigenvalues.
Proof: For any square matrix F and any nonsingular G, (FG)TFG
and FTF GGT have the same eigenvalues since by the similarity

transform GT [ROS]

GT[FTFGGT] (GT)-l = GTFTFG .

Also, for any nonsingular H, HTH is positive definite sincer 7! 0

xT(HTH)x : (Hx)T(Hx) > 0 since Hx 7! 0.

1
For two positive definite symmetric matrices P and Q, P‘2 and
i
Q2 are positive definite and symmetric. Thus
l .1. l l l T l l l T
P0 = PZPZQZQZ = (P2) P2Q2(Qz) ,
1 T _1_ _1_
has the same eigenvalues as (PZQZ) (PZQZ). But (PZQZ) is non-

_1_
singular since P and Q are nonsingular, so that (PZQ‘2 )T(PZ Q2) is

positive definite and has positive eigenvalues. Therefore the pr‘cr-r

duct PQ has positive eigenvalues.

Lemma A1. 22 If the system is identifiable then the norm of

-1 . .
Pk+1Pk-q satisfies

HPk+1Pk:<11H < 1

76

and the norm of P _1P satisfies
k-q k+1
-1
< 1
Hpk-quHH

with probability one (WPl).

Proof: Only the first inequality will be proved. The second follows

from a similar development. By lemma A1.1, Pk+lPk q has posi-

tive eigenvalues since Pj is positive definite and symmetric for
k
any j. By identifiability, Z DjI‘Slej is positive definite and
:k-q

J
symmetric (WPl ), so

k
Pk+1 2 015711).
j:k_q J J J

has positive eigenvalues (WPl).
From A1. 2

P '1 = P '1 + 2: DTS-ID”
k+1 k-q .
J=k-q

Therefore, pre-multiplying by Pk“

k
-1 T -l
: - D .
1Dk+lpk-q I Pk+l j—E q j Sj Dj

Both product terms have positive eigenvalues (WPl) so that the

Jordan form

with eigenvalues on the diagonal shows that the eigenvalues of J

or those of Pk+1Pkncl1 are strictly less than one (WPl). Thus the

maximum eigenvalue is less than one, and

77

-1
<
1|Pk+lpk_q ll 1 (WPl) .
An interesting and important property of the matrix P will

k+1

now be derived through a sequence of lemmas.
Lemma A1. 3. The minimum eigenvalue of a symmetric matrix F
satisfies

xTFx

A . (F) = inf 0
min x7’0 xTx

 

Proof: It is first necessary to show that

xTxk . (F) < xTFx
min =
or that
xT[F-x (F)-I]x> 0
min =
1.e., [F - kmin(F). 1] is pOSitive semi-definite. Letting JF denote
the Jordan form of F, and J the Jordan form of the difference gives

J = JF- Amin(F)' I .

Since all eigenvalues of F are on the diagonal of J and Amin(F) E

F
MF) ’0'- X(F) where M.) denotes an eigenvalue, then the eigenvalues
of J satisfy MJ) 2 0. Therefore F - Xmin(F)- I is positive semi-
definite. Thus

xe

T
xx

 

k in(F) 5-

m VX7-(0.

Furthermore, since xminu?) is an eigenvalue there exists x' 3! 0
such that

Fx' = x . (F)x'
min

78

so x'TFx' = x'Tk . (F)x’
1n

x'TFx' < xTFx

 

and X . (F) =
min

_ VX;(O.
x'x' xx

Therefore by definition of infimum [ROY]
T
x ,n(F) = inf x TF" .
m1 xfO x x

 

Lemma A1. 4. A11 eigenvalues of Pk+l

approach infinity as k -. 00,
WPl, and do so at a rate greater or equal to 0° k, for some con-

stant c > 0.

Proof: Iterating equation A1. 2 gives

_1 1 k/q q.i-1 T 1
P :13” + 2: 2: D.S. D.,k=q,2q,...
k+1 o . . . J
121 j:q(1-1)

By identifiability, 2 DjTSj-IDj is positive definite (WPl) and has all
J'
positive eigenvalues. Therefore there exists a constant c' > 0 such

that for all eigenvalues k

x>c’>0

Using this, Lemma A1. 3, and the fact that for all realizations of {21}

ianziZE infzi
i i

it follows that

k -'-
T -1 k/q q 1 1 T -1
xmin[ 2 D. s. D.] = "min[ 2 2 D. s. D.]

79

xTz (2D.Ts.‘1D.)x

i . J J J
. J
: inf T
xf’O x x

T 1

xT(2 D. 5.- D.)x
j J J J
E inf T
i x710 x x

 

l V

> z x .(2D.Ts.'ID.)
- mm. J J J

i J
z E C' : €11.18 C”>0.
i
Therefore
1 1 k T 1
- > .. -
min[ Pk+1:l — Xmin[ 1Do ] + >\min[j230]:)j Sj Dj]
> >
_ co+ clk, co, c1 0

and all eigenvalues of P approach infinity at least as fast as k

k+1
(WPl).
Lemma Al. 5. The norm of Pk+l satisfies
<
H PkHH _ cZ/(k + c.) .

(WPl) where 02, c3 > O are constants.

-1
Pr°°f' HPk+1H ' xmax(Pk+1) ‘ l/xmin(Pk+l)

_<_ 1/(co+ clk) 5 cz/(k+c3)

by lemma A1. 4.

From lemmas A1. 2 and A1. 5

80

||Pk+1Pk:(11 H < 1 (WPl)

and

HP 5 c2/ (k + c3) (WPl).

k+1H
Only system identifiability was used to establish these properties.
These results will be useful in establishing convergence of the identi-

fication algorithm.

Equation A1. 3 may be rewritten as

k
-1 -1
= + P - D
Ek+1 Pk+lpk-q El:<.q k+1j—E qu Sj (yj+1 j“)

by recursive substitution. From this equation the product

6 E T is written as

k+1 k+1
T 1 T 1 k T 1
E = - - +2 D -
6k+1 k+1 Pk+1Pk- -q k q k qu qu+1 Pk+1j_123 q j Sj ‘(Yj+1
T 1 k T 1
_ E ' D ' ..
Dj“) k-q P-kqpk+l 11+1?" '5' (yj+l Dj“)
J=k-q
k
T -1
. 2: (y. -D.<1) s. P A1.5
. +1 k+1
J=k_q J J J

Utilizing assumption 3. 2.A2, the expectation of the cross product

.term in A1. 5 vanishes since

T -1 TP -1
2
E{ Pk+l? Dj Sj (Yj+l Dj-MEk q k qu+1}

T -1

_ 2E{Pk+lE[§:Dj sj (Hjl-Djk-Ta)e:qk/Y’VR]P1<+1}
= 2E{P 2E{DTS'1E[(y -Da)/yJ v316P'1}P }
k+1 j j j+1 k-q P--kq k+1

5
=0.

81

Therefore
T _1 T "l
E{GkHEkH} ' E{Pk+1Pk-qu-qk-qu-qpk+1}
T -1 T ‘1
- D -
+ E{Pk+l :2 DJ' SJ (yj+1 5a)?(yj+1 Din) $5 PHI}

Taking the norm of both sides of this equation and using the triangle

inequality

T -l T -l
”E(Ek+1€ k+1)H E HE{Pk+1Pk-q Ek-qu-qu-qpki-l}H

T -1 T -1
HE{Pk+l ?Dj Sj (Yj+1'Dj°)§3(yj+1'Dj°) Sj Pk+l}”

Applying this result, Lemma Al. 2 and Lemma Al. 5,

2
C2
T
T>H+ —HE{ZD 51(r
(k+c3z) j j j J“

T
HE<ER+1E'<‘1<+1)H12(6HE6-k-qE-k q

T -1
-D. E . -D. S. Al.6
Jct) j (YJ+1 Ja) J H

where0<c1<1 and 0<cZ, c <00.

3

Lemma Al.6. For the scalars Xi'ai' B, if x.

< u.x.+ . and
1 l- 1 1 B1
00

00

Il oi: 0 with all partial products uniformly bounded, 2) [Si < 00,
1:0 0

x130, x0<00, [3 1_>_0thenxi->Oasi->00.

Proof: Recursively substituting into the inequality for xi +1 gives

82

X. < a.x.+ .
1+1— 1 1 B1

<
— aiai-lxi-l+ “1614+ pi

x. < (1.11. ...ax+a.a. ...a +...
1+1 - 11-1 0 0 11-1 lﬂo

+ “1‘11..1‘31.2+ “1‘31 1+ ‘3'

1

1 i-l i
x, < 11 a.x0+ E H (1 [3“? 131
j=0 J j:0 k=j+1
00
Since 11 (ii: 0 and all partial products are bounded, for any m > 0

m-l i m-l i

E H akﬁj _<_ ﬂj max H Gk

j=0 k=j+1 j=0 OﬁjEm-l k=j+1
and

i i i i

E I'[ akﬂ. E Z [3. max 11 o.

j=m k=j+1 J j=m J m:j_<_i k:j+l

Given any 6 > 0 there exists m sufficiently large so that

since 2 [3. < 00. Thus for m sufficiently large

j

i i m-l i

E II ﬁnk < 6/3 + E 5. max l'I ok-l-ﬁ.
j=.0 k:j+l J j=0 J O:j_<_m-1 k=j+l

For fixed m,

m-l i
Z [3. max 11 (1k < 6/3
j=0 J o_<_j_<_m-1 k=j+l

83

for i sufficiently large, since Haj = 0. Also Bi < 6/3 since 23 [31 <.oo.

Therefore

1 1-1 1

n «1.x + 2: n o. p.+p. < E/3+E/3+E/3=Efor largei
J o . . RJ 1

:0 3:0 k=3+1

1
implies x. -> O as i ->oo since x. > O. [DV][WOL].
1 1-

Assumption 3. 2.A3 and the previous results give the first

conifer gence theor em.

Convergence Theorem I: If the system is identifiable and assump-
tions 3. 2.A1, 3. 2.A2, 3. 2.A3 hold, then the algorithm A1. 1, A1. 2

converges to the true value of the parameter vector u in the sense

that

Hcov(€k)H = Hcov(&k- a” -> O as k -> 00.
P f.F A16 d3ZA3HEE eTH< 2HEE 6TH
r00 - mm ' an ° ° (k+1 k+1) - C1 (k-q k-q)

2
+c'2/(k+c3) where0<c1<l,0<c <oo, and0<c'2<oo. The

3
following associations are made in lemma Al.6
T
xi ,_. HE<ek_qek_q)H

T
x1+1 H HE<EI<+1€1<+Nll

Z
ch

i 1
Z

and, noting that

' <
l 0 Since c1 1

0:1
0
II

84

and
0° 2
E c /(k+c ) < oo
k=0 2 3
gives
IIE<ek+1 €k+1)II— - IIcovek 1II = IIcov<8k+l- a)II»0 as k»oo .

Convergence theorem II is proved from assumptions 3. 2.A1,
3. 2.A3, 3. 2.A4. Rewriting A1. 3 and taking the expected value of the

norm squared gives

-1 T -1 z
E{II6 II Z}=E{HP e +P ZDj Sj (YjH-Dja)” }

k+1P-k-q k-q k+1j

From 3. 2. A4

T -1
+le }< _E{IIP IIZk}+E{IIP +1213]. SJ. (Yj

Ewe - DjaIIIZ}

k+1Pk-qu-q +1

:E{IIP ijflIIZII§_qIIZ}+E{IIP Ilzllsansj'lw.
J

2
- Dja)H }

k+1 k+1 1+1

Applying Lemmas Al. 2 and A1. 5

2 2 C.2 T -1
II }_<_ c:1 E{Il-Ek _qZII }+————— (k+c3 ———-—)—2 EM? Dj SJ. (yj

Ewe - Djamz}

k+1 +1

A1. 7
Convergence Theorem 11: If the system is identifiable and as sump-
tions 3. 2.Al, 3. 2.A3, 3. 2.A4 hold then the algorithm Al. 1, A1. 2
converges to the true value of the parameter vector u in the sense
that

EHIqIIZ} = E{IIGk-aIIZ}-»0ask-»oo

85

i.e., the error norm converges to zero in mean square.

Proof: From Al. 7 and assumption 3. 2. A5

C

2 Z 2 Z T -1 Z
E{II6 II } <c E{IIE II }+ —— E{I|2 D s (y - DaIII}
_ - 2 . . . . c
k+1 1 k q (k+c3) j J J 3+1 J
2 Z 2 2.
E{IIEk+1H }:c1E{l|6k_qII }+ c4/(k+c3) , o < cl <1, 0 < c4< oo.
°° 2 0° 2
Since 11 c1 = O, and 2‘, c4/(k+c3) < 00, Lemma Al.6 implies
j=0 o
E{HE |I2}-»o as k->oo
k+1 ’

A V 2 —> ->
E{Hak+l-aH } 0 ask oo .

Theorem III is proved using assumptions 3. 2.A1, 3. 2.A6, and
strong identifiability. This latter condition is stronger than identi-

fiability and follows from lemma Al. 4, which states that the minimum

k
eigenvalue of 2‘, D.TS.-1D. approaches infinity at a rate greater than

i=0
or equal to c-k. Strong identifiability requires that this rate be

strictly greater than c- k, or at least equal to c- kp, p > 1. Consider-
ing the inequalities used in the proof of lemma A1. 4 and the random

nature of the system variables, this is not a stringent requirement.
Lemma A1. 7: If the system is strongly identifiable,

IIP II 5 c4/(kp+c3>, (W131)

k+1
where

0<c4, c <oo,1<p<oo.

3

86

-1
Proof. Hpk+lH )‘max(Pk+1) ' 1/ >‘min(pk+l)
1 k T 1
<1/[1, EV y+1 .(2:D.&'DJ]
— mm 0 mln j=0 J J J

_<_1/[c1+c2kp] E c4/(kp+c3),

1 < p < oo , O < c , c < 00 with probability one, by assumption.

3 4
00
° < < < : .

Lemma Al. 8 . If xk+1_ akxk+ [3k and 0 _ x0 00, II Gk 0 With
00 k=0

all partial products uniformly bounded, 2 pk < 00, 13k _>_ 0, all
k=0

with probability one, then xk -> O as k -> oo WPl.

Proof: This proof follows that of lemma Al. 6 except that statements
hold with probability one (WPl ). This lemma is stated. by Fukao

[FK- 3] without proof.

Convergence Theorem 111: If the system is strongly identifiable and
if assumptions 3. 2.A1, 3. 2.A6 hold then the algorithm Al.1, Al. 2
converges to the true value of the parameter vector u in the sense

that

”E -aH—>O as k—»oo (WPl).

k+1H : Hek+1

Proof: Using the triangle inequality for norms in A1. 3

-1 T

II:IIP P II-II6k_qII+IIP

H Ek+1 k+1 k-q

k
k+1II -II 2 DJ.

-1
S. (y. - Du)”
. +1
J:k_q J J

87
Lemma A1. 2, strong identifiability, and 3. 2.A6 give

IIEkHII 5 clllemII + cs/(kp+ c3)

where0<c1<1,1<p<oo,0<c <oo (WPl).

5
Since
00 oo
2‘, c5/(k+c3) < 00 and II c1 = O (WPl) ,
k=0 k=0

applying Lemma A1. 8 shows that

He - aH->0 (WPl) as k-> oo .

k+1H : HGk+l

This theorem is equivalent to one given by Fukao [FK-3]

but the proof is distinct.

APPENDIX 11

THE REPRODUCING NORMAL AND ITS QUADRATIC FORM

In S ec. 3.1, the a—posteriori distribution of the para-
meter vector u wa 3 given as gaussian with mean value and co-
variance given by 3.1 .11 and 3.1 .12, reSpectively, and followed
from the reproducing property of the gaussian distribution and
minimization of a quadratic form. The details of this derivation
are given in this appendix.

From 3.1. 9 and 3.1 .10 the exponent of the numerator

equals (within a- l/ 2 factor)

T -1 T -1 T -1 T -1
_ 2 A
0' {Pk +Dk Sk D13“ ‘1 {Pk+1ak+ Dk Sk yk+1}
{A T "IA T -1
4. A2.
“‘kPk “U yk+1Sk yk+l} 1

Since the integration in the denominator 3.1. 9 is with reSpect to
o, the third term in this quadratic form cancels. Completing the

square of the remaining part of AZ.1 gives

88

89

T -1 T -1 T -1. T -1
- Z :
‘1 {Pk +Dk Sk Dk}“ ‘1 {Pk+1“k+ Dk Sk Yk+1}
-1 T —1 -1 -1,\ T —1 T -1 T -1
((1 -[Pk +Dk Sk Dk] [Pk ak+Dksk yk+l]) [Pk +Dksk Dk]
-1 T -1 1,. T -1

-1 -
x(o. - [Pk +Dk Sk Dk] [Pk (1k+Dk Sk yk+1]

-1 T -l -l -1 '1/\ T -1
-[P (1 +13k Sk Dk] [Pk +DkSk Dk] [Pk c1k+Dk Sk yk+l]

A2. 2
which is the exponent of the numerator and the exponent under the
integral sign of 3.1. 9. Integrating the perfect square in (1 leaves
the second term of A2. 2, which cancels with an equal term in the
numerator. The resulting exponent of the a—posteriori distribution

is the quadratic form

-1 T -l --1 '1/\ T -1
(a - [Pk +Dk Sk Dk] [Pk c1k+Dk Sk yk+1

-l T -1 -1 -1A T -1
x (a - [Pk +Dk Sk Dk] [Pk uk+ Dk Sk yk+l])

and is a perfect square. Therefore the a-posterior distribution is

gaus sian with mean

A -1 T -1 -1 1A T 1
: , Z
“k+1 [Pk + 13k Sk Dk] [P k+ Pk Sk yk 1] A . 3
and covariance
-l 1

- T -1
= 2
P Pk+DkSkD A.4

90

Equation AZ. 3 is the same as 3.1.11 and can be written in the form
3.1.13. Using a matrix inversion lemma [HOU][AO-1] , A2.4 can
be written in the form 3.1.14.

The mean 3k“ can also be obtained directly from A2. 1 by
minimization of that quadratic form with respect to o, once the

above gaussian reproducing property is established.

[AH]

[AHL]

[ AND]

[AO-l]

[AC-2]

[ALB]

[ARN]

[AT]

[AY]

BIBLIOGRAPHY

Ahmed, N. and S. Karni, "On Obtaining Transfer
Functions from Gain Function Derivatives, " IEEE
Trans. on Automatic Control, AC-12, No. 2, p. 229,
April 1967.

 

Ahlfors, L. V., Complex Analysis, " 2nd Ed.,
McGraw-Hill Book Co. , Inc. , New York, 196 6.

 

Anderson, G. W., R. N. Buland, and G. R. Cooper,
"Use of Cross Correlation in an Adaptive Contro 1
System, ”Proc. Nat'l. Elec. Conf., Vol. 15, Oct.
1959.

 

Aoki, M., Citimization of Stochastic Systems,
Academic Press, New York, 1967.

 

Aoki, M. , "On Some Convergence Questions in
Bayesian Optimization Problems, ” IEEE Trans.
on Automatic Control, Vol. 10, No. 2, pp. 180-

 

 

182, April 1965.

Albert, A. , ”Nonlinear Regression and Stochastic
Approximation, " IEEE Conv. Rec'd, 1967.

 

Arnold, C. R. and K. Narendra, "The Characteri-
zation and Identification of Systems, " Tech. Rept.
No. 471, Cruft Lab., Harvard Univ., Cambridge,
Mass., June 18, 1965.

Athans, M. and P. Falb, OJLtimal Control, McGraw-
Hill Book Co., Inc., New York, 1966.

 

Ayer, Alfred J. , "The A PRIORI, " in Philosophy of
Mathematics; Selected Readings, p. 289-301, Eds.

 

 

P. Benecerraf and H. Putman, Prentice Hall, Inc.,
Englewood Cliffs, New Jersey, 1964.

91

[BAL-H

[BAIPZ]

[BEL-l]

[BEL-2]

[ BLU]

[BX]

[CAR]

[ CH]

[ CHU]

[cu-u

[CU-Z]

92

Balakrishnan, A. V., "Identification of Control Systems
from Input-Output Data, " Proc. IFAC Conf. on Identi-
fication in Automatic Control Systems, Prague, Czecho-
slovakia, June 1967. Also in SIAM J. Control, Vol. 5,
No. 3, Aug. 1967.

 

Balakrishnan, A. V. , "Determination of Nonlinear Sys-
tems from Input-Output Data, " Proc. Princeton Conf.

on Identification Problems in Communication and Control,
Mar. 1963.

 

Bellman, R., B. Gluss, and R. Roth, "Identification of
Differential Systems with Time Varying Coefficients, "
RAND Corp., RM-4288-PR, Nov. 1964.

Bellman, R., R. Kalaba, and R. Sridhar, "Adaptive
Control via Quasilinearization and Differential Approxi-
mation, " RAND Corp. , RM-39Z8-PR, Nov. 1963.

Blum, J. R. , "Approximation Methods which Converge
with Probability One, " Annals. Math. Stat. , Vol. 25,
pp. 382-386, 1954.

 

Boxer, R. and S. Thaler, "A Simplified Method of
Solving Linear and Nonlinear Systems, " Proc. IRE,
Jan. 1956.

 

Carnap, R., Logical Foundations of Probability, Chi-
cago, Univ. of Chicago Press, 1950.

Chen, C. F. and B. L. Philips, "Accurate Determi-
nation of Complex Root Transfer Functions from
Frequency Response Data, " IEEE Trans. on Automatic
Control, AC-lO, No. 3, p. 356, July 1965.

 

Chung, K. L. , "On a Stochastic Approximation Method, ”
Annals of Math. Stat., Vol. 25, pp. 463-483, 1954.

 

Cuenod, M. and A. P. Sage, HComparison of Some
Methods Used for Process Identification in Automatic
Control Systems, Proc. IFAC Symposium on Identi-
fication in Automatic Control Systems, Prague,

June 1967.

 

 

Cuenod, M. and A. E. Durling, An Introduction to
hnpulse Analysis, Academic Press, New York 1967.

 

[DA]

[DAL]

[DE]

[DU]

[DV]

[EY]

[FU-l]

[FU-Z]

[FK-l]

[FK-Z]

[PK-3]

[FR]

[HANU

93

Davenport, W. B. and W. L. Root, An Introduction
to the Theory of Random Signals and Noise, McGraw-
Hill Book Co. , Inc. , New York, 1958.

 

 

Daly, R. F. , "The Adaptive Binary Detection Problem
on the Real Line, " Stanford Electron Lab. Rept. , TR
2003-3, Feb. 1962.

Deutsch, R., Estimation Theory, Prentice-Hall, Engle-
wood Cliffs, New Jersey, 1965.

 

Dubes, R. C., Theory of Applied Probability, Prentice-
Hall, Englewood Cliffs, New Jersey, 1968.

 

Dvoretzky, A. , "On Stochastic Approximation, " Proc.
Third Berkeley Symposium on Mathematical Statistics
and Probability I, pp. 39-55, Univ. of Cal. Press.,
Berkeley, Cal. , 1956.

 

 

Eykhoff, P. , ”Process Parameter and State Estimation, "
Proc. IFAC Symmsium on Identification in Automatic
Control Systems, Prague, 1967.

 

 

Fu, K. S. , "Learning Control Systems, " Computer and
Information Sciences, pp. 318-343, J. T. Tou and R. H.
Wilcox, eds., Spartan Books, Washington, D. C., 1964.

 

 

Fu, K. S. and K. Kirvaitis, "Identification of Nonlinear
Systems by Stochastic Approximation, " Proc. 1966
JACC, pp. 255-264.

 

Fukao, T. , "Some Fundamental Properties of Adaptive
Control Processes (1), "Bull. of Electrotech. Lab.,
Vol. 28, No. 1, pp. 1-19, Jan. 1964.

 

Fukao, T. , "System Identification by Bayesian Learning, "
I, 11, Bull. Electrotech. Lab. 29, No. 5, p. 364, Tokyo,
1965.

 

Fukao, T. , "System Identification by Bayesian Learning, "
Proc. Tokyo IFAC Symposium, pp. 137-146, 1965.

 

Frame, J. S. , "Matrix Functions and Applications, "
IEEE Spectrum, vol. 1, March-July 1964.

Hamming, R. W., Numerical Methods for Scientists
and Engineers, McGraw-Hill Book Company, Inc., New
York, 1962.

 

 

[ HAN]

[HE]

[ HS]

[110-1]

[110-2]

[110-3]

[Ho—4]

[HOU]

[ IFAC]

[K-1]

[K-Z]

94

Hancock, J. C. and P. A. Wintz, Signal Detection Theory,
McGraw-Hill Book Co. , Inc. , New York, 1966.

 

Hempel, C. G. , Fundamentals of Concept Formation in
Empirical Science, Chicago: Univ. of Chicago Press,
1952.

 

 

Hsieh, H. C. , "Least Squares Estimation of Linear and
Nonlinear System Weighting Function Matrices, ”J. Info.
and Control, Vol. 7, March 1964. Also in Computing
Methods in Optimal Control, A. Balakrishnan, and L.
Neustadt, eds., Academic Press, New York, 1966.

 

 

Ho, Y. C. and B. H. Whalen, "An Approach to the
Identification and Control of Linear Dynamic Systems
with Unknown Parameters, " IEEE Trans. on Automatic
Control, Vol. 8, No. 3, pp. 225-256, July 1963.

 

Ho, Y. C. and R. C. K.Lee, "Identification of Linear
Dynamic Systems, " Jrnl. of Information and Control,
Vol. 8, pp. 93-110, Feb. 1965.

 

Ho, Y. C. and R. C. K. Lee, "Identification of Linear
Dynamic Systems, " Proc. N.E.C. , 1964.

 

Ho, Y. C. , "On Optimal Filtering and Stochastic Approxi-
mation, " J. Math. Analysis and Applications, Vol. 6,
No. 1, pp. 152-154, 1963.

 

Householder, A. S. , The Theory of Matrices in Nurneri-
cal Analysis, Blaisdell Publishing Co. , New York, 1964.

 

 

Proc. of Prague IFAC Conference on Identification in
Automatic Control Systems, Prague, Czechoslovakia,
June 1967.

 

 

Kalman, R. , "Mathematical Description of Linear
Dynamical Systems," S.I.A.M. Control, Vol. 1, No. 2,
19630

 

Kalman, R . , "A New Approach to Linear Filtering and
Prediction Problems, " Trans. ASME, ser. D, J. Basic

En ., 82, pp. 35-45, 1960.
__8___

 

[K-31

[ KAI]

[KIE]

[KIR]

[ LAN]

[LE-1]

[LE—2]

[LEE]

[LEV]

[MA]

[MG]

95

Kalman, R. and R. S. Bucy, ”New Results in Linear
Filtering and Prediction Theory, ” Trans. ASME ser. D,
J. Basic Eng. 83, pp. 95-108, Mar. 1961.

 

Kailath, T., ”Correlation Detection of Signals Perturbed
by a Random Channel, IRE Trans., Vol. IT-6, No. 3,
pp. 361-366, June 1960.

 

Kiefer, J. and J. Wolfowitz, "Stochastic Estimation of
the Maximum of a Regression Function, " Annals. Math.
Stat. , Vol. 23, pp. 462-466, 1952.

 

Kirvaitis, K. , “Identification of Nonlinear Systems by
Stochastic Approximation, " Ph. D. Thesis, Purdue Univ. ,
Aug. 196 5.

Lanning, J. H. and R. H. Battin, Random Processes in
Automatic Control, McGraw-Hill Book Co., Inc., New
York, 1956.

 

 

Lee, Y. W. and M. Schetzen, ”Measurement of the
Kernels of a Nonlinear System by Cross Correlation, "
Int. Jrnl. of Control, 1965.

 

Lee, Y. W. and M. Schetzen, "Some Aspects of the
Wiener Theory of Nonlinear Systems, " Proc.N. E. C. ,
1965.

 

Lee, R. C. K., Optimal Estimation, Identification and
Control, Research Monograph, No. 28, M. I. T. Press,
Cambridge, Mass., 1964.

 

Levin, M. J. , ”Estimation of a System Pulse Transfer
Function in the Presence of Noise, " IEEE Trans.“ on
Automatic Control, AC-9, No. 3, July 1964.

 

 

Mayne, D. Q. , ”Optimal Non-Stationary Estimation of
the Parameters of a Linear System with Gaussian
Inputs, " J. Electr. Control, pp. 101-112, Jan. 1963.

 

McGhee, R. B. , "Identification of Non-Linear Dynamic
Systems by Regression Analysis Methods, " Ph.D.
Thesis, Univ. of So. Cal., June 1963.

[ POP]

[ PAP]

[PU]

[Q]

[RAU]

[ROB]

[ROS]

[SAK-l]

[SAK-Z]

[SAR]

[SC]

96

Popper, K. R. , The Loac of Scientific Discovery,
Routeledge and Kegan Paul, London, 1951.

 

Papoulis, A. , Probability, Random Variables and Sto-
chastic Processes, McGraw-Hill Book Co. , Inc. , New

 

 

York, 1965.

Puri, N. W. and C. N. Weygandt, "Transfer Function
Tracking of a Linear Time-Varying System by Means of
Auxiliary Simple Lag Networks, " IEEE Trans. on Appl.

 

and Ind., vol. 83, pp. 70-72, Jan. 1964.

Quine, W. V. , "On What There Is, " Philosophy of

 

 

Mathematics: Selected Readings, pp. 183-196, P.
Benecerraf and H. Putnam eds. , Prentice-Hall,
Englewood Cliffs, New Jersey, 1964.

Rauch, Tung, and Striebel, "On Maximum Likelihood
Estimation of Linear Systems, " Lockheed Missile and
Space Co. Tech. Rpt., June 5, 1963.

Robbins, H. and S. Monro, "A Stochastic Approxima-
tion Method, " Annals. of Math. Stat., Vol. 22, pp. 400-
407, 1951.

 

Rosen, J. B. , "Stability and Bounds for Nonlinear
Systems of Difference and Differential Equations, "
Jrnl. Math. Anal. and App1., Vol. 2, pp. 370-393,

 

1961.

Sakrison, D. J. , ”The Use of Stochastic Approximation
to Solve the System Identification Problem, " IEEE
Trans. on Automatic Control, AC-12, No. 5, pp. 563-

 

 

567, Oct. 1967.

Sakrison, D. J. , "Stochastic Approximations: A
Recursive Method for Solving Regression Problems, "
Advances in Communication Systems, Vol. 2, 1966. ,

 

Academic Press, New York, 1966.

Saridis, G. N. and G. Stein, "Stochastic Approxima-
tion Algorithms for Linear Discrete-Time System
Identification, " Proc. NEC, 1967.

 

Schreider, Methods of Statistical Testing, Elsevier
Publishing Co. , New York, 1964.

 

[SCH]

[SG-l]

[so—z]

[SK]

[3%U

[SP]

[STU

[TSY]

[WE—1]

[WE-2]

PW]

[WOL]

97

Scheffler, I. , The Anatomy of Inquiry: Philosophical
Studies in the Theory of Science, A. Knopf Publ. Co.,
New York, 1963.

 

 

Sage, A. B. and W. C. Choate, "Minimum Time
Identification of Non-Stationary Dynamic Processes, "
Proc. N.E.C., 1965.

 

Sage, A. P. and B. R. Eisenberg, "Experiments in
Nonlinear and Non-Stationary System Identification via
Quasilinearization and Differential Approximation, "
Proc. J.A.C.C., 1966.

 

Sklansky, J. , "Learning Systems for Automatic Control, "
IEEE Trans. on Automatic Control, Vol. AC-11, No. 1,
Jan. 1966.

 

Smith. B. W. , "Parameter Estimation in the Presence
of Measurement Noise, " Int. Jrnl. of Control, Vol. 3,
NO. 4, pp. 297-312, 19660

 

Spragins, J. D. , "A Note on the Iterative Application of
Bayes' Rule, " LEEE Trans. on Info. Theory, IT-ll,
No. 4, pp. 544-549, Oct. 1965.

 

Steiglitz, and Mcbridge, "A Technique for Identification
of Linear Systems, " IEEE Trans. Automatic Control,
AC-lO, Oct. 1965.

 

Tsypkin, Y. Z., "Adaptation, Learning and Selflearning
in Control Systems, " Third IFAC Conjress, Buttenworths,
London, June 1966.

 

Wiener, N. , The Extrapolation, Interpolation and Smooth-
i_r1g of Time Series, John Wiley and Sons, New York,
1949.

 

 

Wiener, N. , Nonlinear Problems in Random Theory,
John Wiley and Sons, New York, 1958.

 

Wilde, D. J., Optimum Seeking Methods, Prentice-
Hall, Englewood Cliffs, New Jersey, 1964.

 

Wolverton, C. T., and J. T. Rawgen, " A Counter-
example to Dvortetzky's Stochastic Approximation
Theoreml' I.E.E.E. Trans., IT-14, No. 1, pp. 157-
158, Jan. 1968.

 

98

[Z-l] Zadeh, L. A. , ”From Circuit Theory to System Theory, "
Proc. IRE, 50, pp. 856-865, May 1962.

 

[Z-Z] Zadeh, L. A. and C. A. Desoer, Linear System Theory:
The State Space Approach, McGraw-Hill Book Co. , Inc. ,
New York, 1963.

 

nICHIan STATE UNIV. LIBRARIES
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
31293009958731