ASYMPTOTIC CONVERGENCE 0F
NON-LINEAR, CONTINUOUS-TIME FILTERS»

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
MICHAEL WESLEY BIRD
1969

 

‘ __
—_.‘_

  

LIBRA R Y ‘
Michigan Start-

University

 

This is to certify that the
thesis entitled

ASYMPTOTIC CONVERGENCE OF
NON-LINEAR, CONTINUOUS-TIME FILTERS

presented by

Michael Wesley Bird

has been accepted towards fulfillment
of the requirements for

O \ ,/’-\
M degree mi;

xii ) \
/.

 

Major professor

0-169

// ‘ I (r/ 1/5,- -’, - I
, .z ___;’/_‘,.t‘// ‘1 ’ ./ ’

 

aunx mom ma

   

 

BINDING BY

HMS & SONS'I‘ ‘

LIBRARY BINDERS

 
 

 

 

ABSTRACT

ASYMPTOTIC CONVERGENCE 0F
NON-LINEAR, CONTINUOUS-TIME FIETERS

by

Michael Wesley Bird

Modern non-linear, continuous-time filtering theory, char-
acterized by the use of state variable techniques, is potentially
applicable to many real-world problems. However, the filters
suggested by the theory as solutions to the time-domain filtering
problem have not been sufficiently analyzed to provide confidence
in their behavior. One aspect of non-linear filter performance
which has received no attention is asymptotic behavior. This thesis
suggests a method for demonstrating the asymptotic performance of
non-linear filters and applies this method to a class of continuous-
time, non-linear filters in the scalar case.

For this investigation, the time-domain filtering problem
consists of a message process, represented as the solution to a
first-order differential equation with unknown initial condition,
and an observation process, modeled as a signal containing the
message to which white noise is added. Filters considered in this
thesis are sequential, being represented by first-order differential
equations which are identical to the model of the message process
plus a correction term. The correction term consists of a gain
function, which depends on the past of the filter output, multiplied

by the difference between an estimate of the observation and the

Michael Wesley Bird

observation itself. This structure is similar to the filter made
popular by Kalman and Bucy for a related linear filtering problem.
The non-linear, continuous-time filters considered here are not
postulated to have any optimal properties; however, when the message
process has a probability distribution, the gain function can be
selected to make the filter approximately-optimal, providing nearly-
minimum-variance estimates of the message.

In the discrete-time case, the stochastic approximation methods
developed for parameter estimation led to asymptotic convergence
theorems for sequential filters. These theorems show that the
difference between the output of certain filters and the message pro-
cess converges to zero, with probability one and in the mean square
sense, as time increases. This thesis applies the same strategy to
the continuous-time problem.

Stochastic approximation methods are developed for the con-
tinuous-time parameter estimation problem. The procedure developed
uses sequential algorithms for estimating a parameter of a signal
when the signal is observed in the presence of white Gaussian noise.
Two theorems are proved which show that, when the signal models and
the gain functions in theestimators satisfy certain conditions, the
estimators converge asymptotically to the true parameter value. The
proofs of these theorems rely on the properties of Ito calculus and
super-martingales.

Relying on the concepts developed for continuous-time para-
meter estimation, two asymptotic convergence theorems are proved
for a solution to the filtering problem. In these theorems, con-

ditions are placed on the message process, observation process, and

Michael Wesley Bird

gain functions in the filters that guarantee asymptotic convergence.
For purposes of evaluation, these results are compared to a theorem
which specifies the asymptotic behavior of the Kalman-Bucy filter
which is a solution to a linear filtering problem.

The convergence theorems developed for filtering show the
asymptotic convergence of a particular sequential filter which is an
approximately-optimal filter. A filter is thus displayed whose out-
put, under certain conditions, provides nearly-minimum-variance
estimates of the message throughout the observation time interval
and converges to the message process as time increases. Computer-
simulated results show the behavior of this filter when applied to

a non-linear filtering example.

ASYMPTOTI C CONVERGEN CE 0F
NON-LINEAR, CONTINUOUS-TIME FILTERS

By

Michael Wesley Bird

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Electrical Engineering

1969

ACKNOWIEDGEMENTS

For consistent confidence in the preliminary work and guidance
in the actual writing of this thesis, the author wishes to thank
Professor Richard Dubes, his teacher and advisor. His counsel and
understanding are greatly appreciated.

For their consideration of this thesis, the author wishes
to acknowledge the other members of his doctoral committee: H. G.

Hedges, C. L. Park, R. B. Zemach, and H. Salehi.

ii

TABLE OF CONTENTS

Page
ABSTRACT
ACKNOWLEDGEMENTS ii
LIST OF FIGURES v
LIST OF APPENDICES vi
Chapter
1. INTRODUCTION ....................................... 1
1.1 Performance Criteria for Solutions of the
Modern Filtering Problem ........ .. .......... 2
1.2 Literature Review ........................... 7
1.2.1 Linear Filtering Theory - the Kalman-
Bucy Filter ............... . .......... 8
1.2.2 The Optimal Non-Linear Filter
Solution .... .................. . ...... 9
1.2.3 Sub-Optimal Sequential Filters with
Known Asymptotic Behavior ............ 11
1.2.4 The Discrete-Time Filtering Problem .. 11
1.2.5 Optimum Discrete-Time Filtering
Theory .............................. 13
1.2.6 The Method of Stochastic Approximation
in Parameter Estimation .............. 14
1.2.7 Recursive Filters .................... 15
1.2.8 Continuous-Time Stochastic Approx-
imation Methods ...................... 16
1.3 Thesis Objectives ........................... 18
1.4 Thesis Outline ........ . ..................... 20
2. A FILTERING PROBLEM AND A CLASS OF SEQUENTIAL
SOLUTIONS ......... . ............. . ...... . ........... 22
2.1 The Non-Linear Filtering Formulation ........ 22
2.2 An Approximately-Optimal Filter ............ . 24
2.3 Sub-Optimal, Sequential Filters ............. 28
2.4 Summary .......... . ..... . ............... ..... 29

iii

CONVERGENT ALGORITHMS FOR PARAMETER ESTIMATION .....
3.1 Preliminary Considerations ..................
3.2 The Main Convergence Theorem ................
3.3 An Alternate Convergence Proof ..............
3.4 Particular Gain Functions ...................
3.5 Summary .....................................

CONVERGENCE INVESTIGATIONS FOR THE FILTERING
PROB IEM OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

4.1 Assumptions and Preliminary Derivations .....
4.2 The Convergence Theorems ....................
4.3 A Convergence Theorem for the Linear Kalman-
Bucy Filter .......... . ......................
4.4 A Class of Gain Functions Guaranteeing
Convergence .................................
4.5 Summary .....................................
AN EXAMPLE OF A FILTERING PROBLEM ..................
5.1 The Example and Computer Simulation .........
5.2 Summary .....................................
CONCLUSIONS ........................................
6.1 Conclusions and Results .....................
6.2 Extensions ..................................
BIBLIOGRAPHY .......................................

iv

31
33
37
41
44
46
47

48
51

55

61
64

67

67
7O

73

73
75

77

LIST OF FIGURES

Figure Page
1. Summary of Literature Review ..................... 19
2. Filter Behavior for Two Runs ..................... 7O
3. Statistical Analysis of Filter Performance ....... 71

LIST OF APPENDICES

Append ix Page

A STOCHASTIC DIFFERENTIAL EQUATIONS AND THE ITO
CALCULUS ......................................... 81

B a-ALGEBRAS, ITO CALCULUS, AND MARTINGALES ........ 87

C THE INFINITE DIMENSIONAL REPRESENTATION FOR THE
OPTIMAL FILTER ................................... 91

D IMPORTANT PROPERTIES OF STOCHASTIC DIFFERENTIAL
EQUATIONS SIMILAR TO EQUATION 2.15 ............... 93

vi

CHAPTER 1

INTRODUCTION

In the process of transmitting or gathering information, the
signal that contains the information frequently becomes distorted.
It is usually necessary to modify this signal in order to remove
the distortion and recover the original message. This act of
modifying the signal is called filtering. In other words, the
essential purpose of a filter is to remove the distortion or noise
from an observed signal.

The design of a filter requires mathematical models that
represent the signal and observation processes. Wiener [1] was
the first to treat these processes as random phenomena and
described them in statistical terms. His treatment allows filters
to be designed from criteria based on the statistics of these pro-
cesses; he points out that his method of developing filters com-
bines the techniques of random time-series and conventional elec-
trical filter theory. More recently, Kushner [2], Bucy [3],
Kalman [4], and Deutsch [5], still using the random descriptions,
have developed filters based on fundamental statistical methods
other than those of Wiener. The approaches of Wiener and Kalman
are the most popular methods for designing filters.

The investigations in this thesis are based on the filtering
formulation proposed by Kalman [4] and Kushner [2]. Section 1.1
outlines this formulation and discusses extensively criteria for

designing filters. A survey of many solutions to the filtering

1

problem is given in Sec. 1.2 pointing out some limitations of the
filtering results presently available and indicating the need for
studies into the asymptotic behavior of statistical filters. In
Sec. 1.3, the objectives of the thesis are discussed and a brief

summary of the main results of the thesis is given in Sec. 1.4.

1.1 Performance Criteria for Solutions of the Modern Filtering

Emblm

During the last eight years, a special form of the statistical
filtering problem has received a vast amount of attention in engi-
eering literature. This form of the problem uses modern modeling
techniques and is distinguished from the traditional by its for-
mulation, the message process being described by a stochastic
differential equation. With E. denoting the n-dimensional message

process, or signal process, the message representation is:

|x°

= _f.(t,z<_) + 9(t)g(t> ; 50:0) = _c_ (1.1)

where .£(t,x) is a set of n known functions of the n components
of g_ and t, RAP) is a known n X n matrix, gﬂt) is an n-dimen-
sional, zero-mean, Gaussian, white noise process,1 and g. is an
unknown constant vector. Equation (1.1) represents a Markov process
and completely determines the probability distribution of x(t).
This distribution determines all moments and correlation prOperties

of the process.

 

1 An r-dimensional white noise process, say wﬂt), is a process
with the prOperty that Ewi(t)wj(s) = 5(t-s)6ij for l s i,j, s r,

where is the Kronecker delta and 5(t-s) is the generalized

bij
delta function.

The remainder of the filtering formulation defines the m-
dimensional observation process, y(t), which is assumed to have the

conventional model:

ﬁt) = mayo) + raffle) (1.2)

where hﬁt,§) is a set of m known functions of the n vector .5
and t, Rﬂt)% is the square root of the positive definite m X m
matrix ‘R(t), and gﬁt) is an m-dimensional, zero-mean, Gaussian,
white-noise process.

The problem is to utilize this formulation to develop useful
filters. At this time, no restrictions will be placed on the form
of the filter, other than that its input must be the finite past of
the observation process and its output must approximate the message
process x(t). The type of approximation provided by the output
depends on the performance criteria used in selecting the filtering
solution. The performance criteria or, stated in different words,
the requirements for a good solution,are selected according to the
requirements of each particular filtering application. A few such
applications and standards for judging the quality of a filter will
be discussed to enumerate features of useful filters.

The communication model, for example, has recently been placed
into the framework of the modern filtering formulation described
above [16]. The solutions to the filtering problem are used as
receivers at the output of noisy channels and are intended to
demodulate the message which has been contaminated by channel noise.
This modern approach to filtering is applicable to the communication

problem because the resulting filters can give reasonably accurate

z.

estimates during the transient phase of the filter's response. The
transient reSponse has long been a difficulty with classical Wiener
filters since they are designed from steady-state error criteria
[1]. However, even though the modern filter has this advantage,
its performance as a communication receiver is still judged on the
quality of its steady-state behavior.

A second example, the aerospace guidance problem, has been
one of the most studied applications of filtering theory [7]. The
time-varying parameters describing the position of a space vehicle
are designated as the message. Noisy measurements of these para-
meters are filtered to give estimates of the parameters from which
the vehicle's location can be established. The filter must be
analyzed in terms of three design criteria: 1) the asymptotic
behavior of the error variance; 2) the amount of time required to
converge into the asymptotic state; 3) the computational feasibility
of the algorithm used in synthesizing the filter.

Since guidance systems are Operated on a real-time basis for
extended periods, the third criteria becomes very important in the
implementation of the filter. Intuitively, it appears that the
first two criteria can be satisfied if the filter is designed to
give an accurate estimate of the message during the transient period,
in which case both the asymptotic error variance and the length of
the transient period should be reduced. However, excessively
stringent accuracy requirements increase the complexity of the
filter, so a trade-off may be necessary to satisfy all three criteria.

A final example, system identification, may be placed in the

filtering formulation [8], [9]. The typical identification problem

demands a filter which produces estimates of system parameters from
noisy measurements of the system's input and output. The unknown
parameters are either constants or varyslowly with time. For any
reasonable identification procedure, the filter's output must con-
verge in some probabilistic sense to the true parameter as time
increases. A secondary, but still important, consideration is the
amount of time required for convergence. The convergence time may
be minimized if the estimates are as accurate as possible during
the entire time interval. The identification scheme is frequently
a small segment of a very large and complicated system so constraints
must be placed on the complexity of the algorithm. Consequently, a
trade-off between complexity and accuracy may be necessary.

These three applications demonstrate the type of real-world
problems which may be formulated using the time-domain representations
defined in (1.1) and (1.2). Since the filter characteristics cited
in these examples are typical of real problems, the following three
requirements for a good and useful filter are proposed.

R.l The output of a good filter should accurately estimate the
message throughout the observation interval. For the filtering
problem in this thesis, there are a few statistically-based
methods which provide a filter satisfying this requirement. The
most commonly used method selects the filter which minimizes the
mean-square difference between the message process and filter
output for each time instant in the observation interval. The
filter minimizing this performance index is represented by the
conditional mean functional. This procedure is widely used and,

since the filter is a solution to an optimization problem, it is

commonly called the optimal filter.

R.2 Analytical results should be available that describe the
asymptotic behavior of a good filter as t a m. Examining the
solution to time-domain filtering problems can bring out the type
of analysis needed. The earlier discussion indicates that the
filter has two phases of operation, the transient-response phase
and a type of asymptotic phase. For a wide variety of real
problems, this second phase will determine the filter's useful-
ness. For these problems, a good transient response is, of course,
ideal, but not at the expense of asymptotic behavior. Any filter-
ing algorithm that guarantees the filter's large-time behavior
will place restrictions on the message and observation models.
These restrictions assure that tfmzfluctuations of the estimate

of the message will be confined by a calculable bound.

R.3 A useful filter must not be too complex, so a sequential
algorithm is required. An algorithm is sequential when the
filter's output is defined by the solution to a finite-dimen-
sional system of first order differential equations, where the
derivatives are functions of the present value of the observation
process, present value of the system solution and, possibly, a
specified functional of the past solution. This might seem to

be an unreasonable limitation for an optimal filter. However,

the time-domain approach to filtering yields Optimal or nearly-
optimal solutions which are sequential. In fact, many present-
day problems are being formulated in terms of this modern approach
in order to obtain sequential algorithms with nearly-optimal

properties. Optimal sequential filters, plus the computer

technology for efficiently implementing them, are providing use-
ful solutions to many previously-unmanageable problems.

Any filter should attempt, in some sense, to satisfy these
three criteria. However, the approaches to filtering in the
literature are concentrated on deriving approximately-optimal
filters, with little or no study of their asymptotic behavior.
Analytical investigations of asymptotic behavior are very difficult
and appear to be a major barrier in the utilization of useful filters.
Considering the difficulties inherent in the analysis of the optimal
filter, it is logical to study other filters, without stringent
specifications on transient reSponse, but which are more amenable to
mathematical treatment than the optimal algorithm. These algorithms
will be sequential and, along with well-analyzed asymptotic behavior,
attempt to satisfy the requirements enumerated in R.2 and R.3. This
thesis will be primarily concerned with initiating an investigation

into this type of filter.

1.2 Literature Review

To better understand the state of filtering theory and to
further reinforce the previous discussions, a summary of existing
results is presented in this section. This synopsis will not only
include available solutions to the filtering problem discussed in
Sec. 1.1, but will also summarize the publications on the discrete-
time filtering problem.

At this point, it is necessary to make the distinction be-
tween continuous-time and discrete-time processes. Equations (1.1)
and (1.2) define waveforms specified on the entire time axis, called

continuous-time processes. In contrast are the discrete-time

processes which are sequences of variables defined on only a countable
number of time instants.

The following continuous-time investigations are surveyed in
Secs. 1.2.1, 1.2.2, and 1.2.3: the well-established linear theory,
the optimal approach to non-linear filtering, and the completely un-
explored realm of asymptotically stable, sequential filters. A review
of the optimal and sub-optimal approaches to discrete-time filtering
are contained in Secs. 1.2.4 through 1.2.7. All these discussions
place a heavy emphasis on reporting studies, or the need for studies,
into the asymptotic behavior of filters. The discrete-time survey
will suggest exploring the method of stochastic approximation as an
approach to investigating the large-time performance of continuous-
time filtering solutions, so this section will conclude with an out-
line of the few publications on continuous-time stochastic approx-

imation.
1.2.1 Linear Filtering Theory - the Kalman-Bucy Filter

Assuming .£(t:§) = Fﬂt)x. and hﬂt,x) = Hﬂt)§, the filtering
formulation defined in (1.1) and (1.2) becomes linear. It is also
assumed that g_ is Gaussian with zero mean and variance 2, Kalman
and Bucy orginated this formulation in 1961, and developed a very
effective and useful solution to the filtering problem [4]. They
derived an optimal filter which produced the conditional mean, the
minimum-variance estimator. Their solution has been shown to have
the following properties:

A. The filter's output is determined from the solution to two
differential equations. Denoting xﬁt) as the n-dimensional

output at time t, the equations are written as:

9

lx’.
II

at): + 2(t)§(t)T[z(t) - sum : goo) = o (1.3)
i: = at): + no? - m<t>Tg<cflg<t>g + ggT : 20:0) = L (1.4)

This optimal algorithm is sequential.

B. The solution to (1.4), commonly called a Riccati equation, is
the error variance, since .£(t) = E(§(t) - xﬁt))(gﬁt) - xﬂt))T.
Equation (1.4) does not depend on observations and can be solved
prior to actual implementation of the filter.

C. Kalman and Bucy formulated observability and controllability
conditions which insure adequate asymptotic behavior of this
optimal filter [10].

The optimal solution (1.3), commonly called the Kalman-Bucy
filter, satisfies R.1, R.2, and R.3, the postulates for useful
filters. This optimal, sequential, asymptotically stable filter
found immediate use in the fields of guidance and control. More
recently, its popularity has Spread to the fields of system iden-
tification, pattern recognition, and differential encoding of

television signals [11].
1.2.2 The Optimal Non-linear Filter Solution

Significant work on optimal non-linear filtering solutions
began in 1964, with important contributions from Kushner [2], Bucy
[3], Stratonovich [12], and Kallianpur [13]. Their primary aim was
establishing the conditional mean functional and, because of the non-
linear nature of the problem, their derivations required a rigorous
treatment of detail. Initially, the mathematical model for the
Gaussian white noise disturbance was interpreted as the formal

derivative of an independent-increment process, Brownian motion,

10

which requires an understanding Of Ito's stochastic differential
equations [3]. Next, the message process, observation process, and
conditional mean process were represented as elements in an abstract
functional Space. Finally, a stochastic differential equation was
derived for the conditional mean using Ito's special differentiation
rule.

The stochastic differential equation representing the solution
for the conditional mean appears initially to be neat and concise.
However, careful examination shows that the optimal filter is de-
scribed by an infinite-dimensional system of first-order stochastic
differential equations, each with the observation process as a
driving term. Furthermore, the entire system requires a simultaneous
solution. These two facts make the optimal filter impossible to
realize.

The present emphasis in optimal, non-linear filtering research
is centered upon deriving algorithms that approximate the optimal
filter. The standard approach is to represent the non-linear
functions in (1.1) and (1.2) as series expansions and to retain the
first few terms. The infinite system of differential equations de-
scribing the optimal filter then reduces tO a finite set and a
realizable algorithm is possible. The approximations made on the
non-linear functions restrict the applications of the filter so a
filtering solution must be developed for each class of problems.
Kushner [14] discusses the shortcomings of a number of such approx-
imation schemes.

Almost all approximately-optimal filters are sequential. An
examination of the system of stochastic differential equations de-

fining the Optimal filter shows that most finite approximations will

ll

be sequential. The demand for a filter with a sequential structure
instigated the now-prodigious amount Of research on the time-domain
approach.

While many investigators have been successful in deriving
approximately-optimal filters, analytical studies of the error char-
acteristics, or asymptotic behavior, for such filters are not avail-
able. All work devoted to the properties of such filters involve
lengthy computer simulations for very specific problems. A recent
survey paper has pointed out both this void in the literature and

the reluctance of users to apply the present non-linear filter

theory [15].
1.2.3 Sub-Optimal Sequential Filters with Known Asymptotic Behavior

Without general analysis of the behavior Of Optimal, non-linear
filters, the real-world user does not have confidence in their be-
havior. However, the demand for sequential, non-linear filters
exhibits a need for sub-optimal schemes whose performances are under-
stood and analyzed. As Sec. 1.1 indicated, there are numerous
applications where large strings of data must be processed using a
minimal amount Of equipment; these applications require filters with
little storage and good large-time behavior. The present state Of
optimal filtering theory cannot supply algorithms with satisfactory

properties for such tasks.
1.2.4 The Discrete-Time Filtering Problem

Presently, there are no published investigations Of the
asymptotic performance of continuous-time filters. Insight into
this area may be gained from a review of the discrete-time filter—

ing results. A brief outline of the discrete-time filtering

12

approaches is presented next, emphasizing investigations into
asymptotic behavior.

The structure of the discrete-time filtering problem is similar
tO that of the continuous-time problem in Sec. 1.1. The n-dimensional

message process is:

k = o,1,2,... (1.5)

X

—k+l = £15331.) + Qkﬂk’

where {2k} isan.r-dimensiona1, zero-mean noise sequence with
ij ..
E u u = ,, 1 s 1 s r f is a known n-vector Of fu '
k m 6km613’ ,J , _k(xk) nct1ons,

and 9n is an n X n matrix. The m-dimensional observation equation

is

1k = 91511.) + Rink (1.6)
where {3k} is an m-dimensional, zero-mean noise sequence,
E viVj = 6 6 1 s i j S m E viuj = 0 and R25 is the s uare root
1, k Lk ij’ ’ ’ 4, k ’ —k q

of a positive definite matrix, Bk'

The filtering problem is to determine a function of the past
Observations which will furnish a good estimate of the present message
value. The function transforming the Observed data into a sequence
of estimates represents a filter.

The discrete-time filtering problem has received a great deal
more attention than its continuous-time counterpart. This may be
attributed to the availability Of results in classical statistics
which concern random phenomenon described by sequences of random
variables and which have been applied directly to the discrete-time

problem. Investigations into the filtering problem are Split into

two groups. When the noises in (1.5) and (1.6) are Gaussian, the

13

Bayes-Optimal approach concentrates on finding a function representing
a conditional mean. The other group of investigations are studies of
asymptotically-stable, recursive, but non-optimal algorithms. An
algorithm is recursive when its output is the solution to a difference
equation with the (n+l)st output depending on the nth output and nth

or (n+l)st Observation.
1.2.5 Optimum Discrete—Time Filtering Theory

Many investigators have suggested a representation for the
conditional mean of the message, given the observations, as a solution
to the discrete-time filtering problem [16], [17], [18], [19]. This
Optimal solution is only employed when the noises {2k} and {2k}
are Gaussian. As could be expected from the non-linear nature of the
problem, a closed-form solution for the general optimal filter does
not exist. The research effort has been channeled into derivations
of approximate algorithms. Aoki [16] discussed the following recursive

algorithm for an approximately-Optimal filter solution:

3n+1 = £main) + 5n+1[1n+1 ' hn(£n(ﬁn))] (1‘7)

A

where g“ is the filter's output and [Kn], the gain sequence, is
determined from a set of recursive equations. This filter has had
numerous applications and is included here for reference.

The development of discrete-time, non-linear filters is
hindered by the lack of asymptotic (large n) results. Albert and
Gardner [20] and Pearson [21] both emphasize the need for work in
this area and indicate the necessity Of studying convergent, recur-
sive filtering schemes. These authors rely heavily on the methods

of stochastic approximation as means for generating sub-Optimum

l4
filtering algorithms as outlined in the next section.
1.2.6 The Method of Stochastic Approximation in Parameter Estimation

Stochastic approximation algorithms are used for estimating
parameters of Signals which are observed in the presence of noise.
Albert and Gardner [20] have applied the method of stochastic

approximation to problems where the Observations have the form
= h + .
yn “(9) vn (1 8)

where e is an unknown parameter, {vn} is a zero-mean process with

E v v = 5

k n and hn(e) is a known function of n and e. The

nk’
noise distribution need not be known. If e is considered a message,
the problem of estimating 9 can be considered a special case of
the filtering problem. The scalar case is treated here to simplify
the notation.

In parameter estimation, as in filtering theory, the problem
is to process the Observation sequence and estimate the value of the
unknown constant. The method of stochastic approximation attempts

to find a zero Of a regression function and results in the following

recursive algorithm which solve the estimation problem.

xn+1 = xn + an[yn - hn(xn)] : x0 15 arbitrary (1.9)

where xn is the nth estimate Of e.

The next step is the formulation Of conditions which guarantee
that ;g - e converges to zero as n a m with probability one or in
the mean-square sense. These conditions are illustrated in the state—
ment Of the following stochastic approximation convergence theorem

[20]. The hypotheses are:

15

T1. hn(x) is assumed monotone in x and differentiable;
T2. For each n, the sign of an is equal to
the sign Of h (x) where h (x) = 2'"h (x)'
n n 3x n ’

m .
T3. E. b a ==m with bn = inf \hn(x)I;

n 1 n n x
m 2
T4. 2 a < an.
n
n=1

Then xn A e as n a m with probability one and in mean-square.
Stochastic approximation algorithms are traditionally associated
with parameter estimation problems where noise models are unknown, large

amounts of data are processed, and simple computations are required.

1.2.7 Recursive Filters

Generalizing on the method of stochastic approximation, Pearson

[21] and Albert and Gardner have suggested the recursive algorithms

§h+1 = fn(;h) + an[yn - hn(;h)] (1.10)

with ;g the filter output as solutions to the general filtering
problem defined in Sec. 1.2.4 with m = n = r = 1. Equation (1.10)
is similar to the approximately-optimal filter in (1.7). These
authors reported on the reSults of initial investigations into the
application of stochastic approximation methods to the determination
of recursive filters with desirable asymptotic characteristics.
Pearson considered a Special case of the problem, assuming

un = 0 and hn(x) = hx. Then, with the hypotheses

0° co
2
[f(x) ' f(Y)‘ 5 ‘x ‘ Y‘a 2 an = ”a E a < w, a Stochastic
n=1 n=1 “
approximation convergence theorem was applied showing that
E(;] - xn)2 a O as n a m. The last two hypotheses are identical

with T3 and T4 of Sec. 1.2.6. The Lipschitz condition on f(x)

16

was a sufficient restriction on the message dynamics to allow use of
the convergence theorem from stochastic approximation. Recently
Wolverton [22], using the same hypotheses, proved convergence with
probability one.

Albert and Gardner [20] considered the general filtering pro-
blem. Their hypotheses are lengthy and provide no motivation for
this discussion. The important point is the complete reliance of
their convergence proofs on techniques developed in stochastic
approximation convergence proofs. Their main conclusion was:

lim sup E(;$ - Xn)2 3 known bound.

Two recent theses, one in pattern recognition, the other in
system identification, illustrate the possibilities of using stochastic
approximation theory in developing approximately-optimal convergent
algorithms. Blaydon [23] is concerned with a pattern recognition
problem in which a probability density function, specified in terms
of unknown parameters,is to be learned. He derives a recursive
estimation algorithm via a minimum—mean-Square criteria and shows
convergence to the true parameter value with a stochastic approx-
imation theorem. In system identification, Donoghue [24] attempts
to formulate an algorithm for learning unknown system parameters
from noisy observations Of the input and Output. He approaches the
estimation problem from a Bayes-Optimal point of view and eventually
derives an approximate solution. A stochastic approximation result

is applied to show convergence Of the algorithm.
1.2.8 Continuous-Time Stochastic Approximation Methods

The stochastic approximation method has proved useful in

studying the asymptotic behavior of discrete~time filters. The

17

extensions of the method to continuous-time processes available in
the literature are examined in this section. Only a few investi-
gations have been made into convergent, sequential algorithms for
continuous-time parameter estimation problems. The major difference
between the formulation of the observation process in Sec. 1.1 and
the models available in the literature is that Sec. 1.1 places more’
stringent restrictions on the correlation properties of the obser-
vation noise than the models in the literature. Driml and Nedoma
[25] and others [26] only assume the noise is stationary and ergodic,
while Sakrison [27] adds a complicated condition on the spread of
the correlation function. In contrast is the white noise assumption
in (1.2).

Initially, the investigations in the literature appear appli-
cable tO the parameter estimation study needed for the continuous-
time filtering problem. However, detailed analyses of the theorems
indicate that the proposed algorithms and methods Of proof are very
specialized and not applicable to the general filtering problem.
Sakrison [27] and others [26] postulate a structure for the Obser-
vation process which does not resemble (1.2) and their entire de-
velopment requires this special structure. Only Driml and Nedoma
have a formulation compatible with (1.2). They assume very re-
strictive estimation algorithms and these Special forms fit neatly
into a convergence proof relying on the Law of Large Numbers. This
method of proof cannot be extended to the algorithms needed for
filtering.

All the continuous-time, stochastic approximation papers

mentioned above rely principally on the Law of Large Numbers for

18

their convergence proofs. This appears to be the only method avail-
able for treating general, non-white, observation noise; to extend
this approach to more meaningful estimation algorithms may require
generalizations or extensions of ergodic theory.

Figure 1 summarizes approaches to discrete-time and con-

tinuous-time filtering.

1.3 Thesis Objectives

The emphasis of the research reported in this thesis is on
formulating an approach to the non-linear filtering problem of (1.2)
by which asymptotic convergence of filtering algorithms can be
exhibited and on demonstrating useful analysis techniques. In a
sense, the discrete-time approach to asymptotically-stable, recur-
sive algorithms will be generalized to the continuous-time filtering
problem.

Investigators Of discrete-time problems relied upon stochastic
approximation methods to prove their convergence results but sto-
chastic approximation methods have not been developed for the con-
tinuous-time filtering problem. The first major objective Of this
thesis will be to state and prove convergence theorems analogous to
theorems from the discrete-time stochastic approximation literature.

After developing an approach to continuous-time stochastic
approximation, the second objective will be to examine a class Of
sequential solutions for the filtering problem defined in Sec. 1.1.
Since the general problem is very difficult to study, this thesis
will concentrate on the scalar, noiseless-message case; i.e.

3(t) = O, m = n = l in (1.1) and (1.2). Under these assumptions,

the message may be considered to be a time-varying parameter

Bow>om manumuouﬁq mo kumessm .H ouswwm

 

 

l9

 

 

 

 

 
   

 

 

 

maoaaoua
wcﬁuouaﬁm Homoeow
ocoz oeoz ocoz ocoz you muouaww unowuo>eoo
Hoawua01>aoumawxoumm<
mu .em umwom coHumEﬁumo amuoswuwm pow
manaawm>o mssuwuowﬁo ucowuo>aoo
ocoz ocoz ocoz mowvsum sow < HwawumOuhaoumswxoumm<
maoanoum
Hm .om "mmmm wawumuawm kuocom now
.oanwaﬂm>m manmowaaam ammo mowuuomoua oquumexmm
ocoz ocoz memosum 30w < ammo: unoocoamoaH nuﬂs manuwuowa<
om
.mm .mu Homom AcoHumBonuaaa owumOLOOOmv
.oaanwm>a mm .om "mama oHnoowaaam ammo cowuweﬁumo Houosaumm now
nuanmou 36w < mcoz oowvsum HHo3 ammo: uOOOOOQOOOH manuﬁuowam ucowum>cou
ma .om .m~ SH .mH
«coz .SH .NH .m "mama ma Mama .GH .NH "mama mumuﬁaa
mOHSmou xcmz .>w:um Ono .muasmou has: I HoEHum0uhaoumaonuaa<
cwwmmsoo cmﬂmmsau cwwmmsww chHumwwumo>cH
ouﬁnzuaoz use ouwna ucoc .unovcoaoocH can ucoocoaoocH
"oowoz "omwoz "omﬁoz "omwoz Boanoum
Osweumsoaaﬁucoo oaHHuououumﬁn waaumuaam

 

 

20

specified by the solution to a deterministic differential equation
excited by an unknown initial condition. The ideas developed for
the stochastic approximation section will be utilized to suggest

filtering algorithms whose error variances converge to zero.

1.4 Thesis Outline

The discrete-time results for recursive filters with known
asymptotic prOperties cited in Secs. 1.2.6 and 1.2.7 were de—
veloped for problems where the noise distributions were unknown.
The analogous situation in the continuous-time filtering problem
is to assume 2211 white noise in (1.1) and (1.2). However, the
more general non-Gaussian case Of filtering cannot be treated
because the mathematical techniques available for handling non-
linear stochastic differential equations are restricted to pro-
blems involving Gaussian, white noise. This restriction has
not been effectively explained in engineering literature, so
Appendix A is devoted to representations of white noise and their
relation to stochastic differential equations.

The non-linear filtering problem treated in this thesis is
defined rigorously in Chapter 2. To supplement later discussions
in Chapters 3 and 4, an approximately-optimal filter is derived
based on a representation for an Optimum filter. The class of
sequential filters, whose asymptotic convergence will be in-
vestigated in later chapters, is also defined and discussed.

The original contributions of this thesis begin with the
statement and proof of two theorems in Chapter 3 similar to theorems
from discrete-time Stochastic approximation. The filtering problem

is attacked in Chapter 4 and the techniques develOped in Chapter 3

21

are utilized to demonstrate convergence for the class of sequential
filters defined in Chapter 2. This thesis marks the first time that
stochastic approximation ideas have been applied to the problem of
estimating a time-varying parameter described by a differential
equation. Accordingly, every aspect of this work is original.

The last section of Chapter 4 specifies conditions on (1.1) and
(1.2) which guarantee the asymptotic convergence of the approx-
imately-optimal filter defined in Chapter 2. These conditions
indicate the existence, under certain conditions, of nearly-
optimal filters satisfying the requirements of good and useful
filters.

Chapter 5 examines an example Of a non-linear filtering
problem. The performance of the approximately-Optimal filter
documented in Chapter 2 is displayed by means of error pro-
files generated from a digital computer simulation.

The results of the thesis are reviewed in Chapter 6 and
conclusions are drawn concerning the application of this approach

in demonstrating asymptotic convergence. A number of extensions

of these investigations are proposed.

CHAPTER 2
A FILTERING PROBLEM AND

A CLASS OF SEQUENTIAL SOLUTIONS

Chapter 1 outlined the general time-domain filtering problem.
This chapter is devoted to rigorously defining the particular pro-
blem of concern in this thesis. In subsequent chapters, the emphasis
will be on investigations into the asymptotic convergence of a class
of sub-optimal, sequential filters; this class is defined and dis-
cussed in Sec. 2.3. Chapters 3 and 4 will indicate that this class
of sub-optimal filters satisfies two of the three goodness criteria
discussed in Chapter 1. In hopes of displaying filters meeting all
three performance criteria, Sec. 2.2 derives a nearly-optimal filter

to which the convergence results will be applicable.

2.1 The Non-linear Filtering Formulation

This thesis examines solutions for a particular case of the
general filtering problem defined in Sec. 1.1. Although the signal
and Observation processes were defined using conventional engineering
models, the type of analysis performed in later chapters necessitates
a rigorous mathematical representation for these processes. The
reason for the special care being taken to define the models of these
processes is to avoid the technical controversies which have plagued

investigations of Optimal, continuous-time, non-linear filters [31],

[32].

22

23

The following definition of the scalar "noiseless-message-
filtering" problem is a specialization of the general definition
given in Sec. 1.1. The signal process is assumed to be the solution

to a first-order differential equation:

95351 = f(t,x<c>>

dt x(to) = b, an unknown constant (2.1)

and f(-,~) is a specified function. The absence Of a noise driver
in this equation explains the "noiseless" adjective. The function
f(t,-) is assumed to satisfy all sufficient conditions which guar-
antee the existence, uniqueness, and continuity of the solution to
(2.1). A complete discussion of these conditions is given in
Coddington and Levinson [33].

The observed process y(t) retains the structure of Sec. 1.1;
y(t) = h(t.X(t)) +V(t) (2.2)

where h(°,°) is a known function and v(t) is a zero-mean,Gaussian,
white noise process with correlation function Ev(t)v(s) = 5(t-s).
The structure of (2.2) is general enough to make y(t) a model of
many physical processes. Kailath [34] discusses when (2.2) is a
reasonable representation for real-world processes.

Appendix A discusses the mathematical complications of de-
fining a white noise representation which will provide valid solu-
tions for non-linear stochastic differential equations involving
this white noise. The appendix indicates the necessity for re-
stating the observed process, (2.2), in terms of an independent-
increment, Brownian motion process B(t). Applying Ito's stochastic
integral when manipulating equations having the form (2.2) eliminates

the use of delta functions; Ito calculus provides consistent meanings

24

for all equations. Specifically, z(t) is defined by (2.3) and (2.2)

is taken as equivalent to (2.4).
z(t) = f:0y(s)ds (2.3)

dz(t) = h(t,x(t))dt + dB(t) (2.4)

Equation (2.4) is equivalent to (2.2) if (2.4) is divided
formally by dt and ggiEl is interpreted as the Gaussian white
noise, v(t). Equation (2.4) is interpreted as being equivalent to

(2.5), a random integral equation.
I: t
z(t) - z(to) +~jtoh(s,x(s))ds + [todB(s) (2.5)

The first integral in (2.5) is the ordinary Riemann integral,
while the second is Ito's stochastic integral. In addition to Appendix
A, a recent paper by Wonham [35] briefly surveys the properties of (2.4).

Equations (2.1) and (2.4) define models of the signal and
observation processes. The formulation of the noiseless message pro-
blem may be completed by summarizing the discussion of filter solu-

tions in Sec. 1.1. The observed waveform is denoted by

{mpg—(il- .0....}=.

ds , A filter may be defined as a

t t'
0,

functional mapping of yt t into a real variable for each t 2 to
3
o

The filtering problem is considered solved when a particular mapping
has characteristics which satisfy the three goodness criteria of

Optimality, asymptotic convergence, and sequential structure.

2.2 An Approximately-Optimal Filter

 

This section develops a filter which is approximately-Optimal

and sequential. A proper evaluation Of any approximation requires

25

the optimal solution to the filtering problem. It is important to
note that, when considering an optimal solution to the filtering
problem, a probability distribution will be assigned to the initial
condition on the message process, x(to) = b. Since optimal filters
are concerned, at least in part, with accurate estimates of the
message during the transient portion of the filteré' time responses,
it is mandatory that all moments of the initial condition be known.
A precise development of an optimal filter is contained in
the papers of Kushner [2] and Kallianpur [13]. The stochastic
differential equation for the expectation of any function, g(t,x(t)),
of the message, x(t), and the time, t, conditioned on the Observed
waveform z is developed in these papers; conditioning on 2

t,t t,t
0

o
is equivalent to conditioning on y by (2.3). Letting

t,t
0

L8(t.X(t)) = f(t.X(t))é(t,X(t)), where g(t.X) = g;’8(t.X). this

stochastic differential equation is

d§<t> €§<t>dc + [éh<c) - g(t)h<t>]tdz<t> - ﬁtt>dt1 (2.6)

where

s(t) EIg(t.x(t))\zt ,t]

0

{g(t) = E[f(t,X(t))é(t,X(t))\zt ,t]

O

éi<c> = Etgtt.x<c>)h<t.x<t>>|zt ,.1
O

B(t) Ethcc,x<t>>lzt ,.T
0

ﬁ<t> E[x(t)|zt ’t].

0

When g(t,x) = x, (2.6) becomes

da(t) = f(t)dt + [§h(c) - R(t)h(t)][dz(t) - h(t)dt] (2.7)

26

with 2(t) the conditional mean. This thesis will follow the con-
vention of calling the differential equation (2.7) the representation
of the optimal filter, even though there are other legitimate choices
for an "optimal" filter.

In general, the optimal filter (2.7) can never be realized
since an infinite-dimensional system of Stochastic differential
equations must be solved to determine the random functions f(t),
4h(t), and h(t). Appendix C displays this infinite-dimensional
system.

Approximations must be made to produce a finite set of equa-
tions. An approximation scheme is now described. The structure of
the resulting filter is very similar to that of the linear Kalman-
Bucy filter (Sec. 1.2.1). The conditional mean R(t) is assumed to

be in the neighborhood of x(t) so that

f(t.x(t)) f(t.%(t)) + f(t,%(t))[x(t) - s<t>1 (2 8)

h(t.x(t)) was» +ﬁtt,s(t>>tx(t> - stun . (2.9)

It is also assumed that the third conditional moment,
E([R(t) - x(t)]3]zt ,t) é 0. (2.10)
0
An examination of these assumptions should provide insight
into how well the approximately-optimal filter described in (2.13)
and (2.14) will satisfy the goodness criteria R.1, optimality. This
approximate filter, to be studied in Chapter 4, is now derived using
(2.6) - (2.10). Equation (2.8) shows that f(t) é f(t,s(t)) (2.11)

so that (2.7) becomes

27

dx(t) é f(t,x(t))dt + P(t)h(t,%(t))[dz(t) - h(t,x(t))dt] (2.12)

where P(t) =1: (t) - x(t)2.
The function P(t) can be approximated from (2.6) and Ito's Lemma

(Appendix B). Letting g(t,x) = x2 and using (2.6) along with (2.8) -
(2.10),

/3~ .
dx (t) E 2 R(t)f(t,x(t))dt + 2 f(t,ﬁ(t))P(t)dt
. . /§ ’72 .
+ h(t,X(t))[X (t) - X (t)X(t)][dZ(t) - h(t,ﬁ(t))dt].
Applying Ito's lemma, Appendix B, to (2.12) gives
dx(t)2 e 2 x(t)f(t,x(t))dt + 2 x(t)h(t,ﬁ(t))P(t)[dz(t) - h(t,x(t))dt]
+ [h(t,x(t))P(t)]2dt.

The differential of P(t) is now approximated by

dP(t) = dE:>(t) - s(t)21 s 2 f(t.%(t))P(t>dt - [h(t,R(t))P(t)]2dt
A o

+ h(t,§(t))[x3(t) - x (t)x(t) - 2 R(t)P(t)][dz(t)-h(t,§(t))dt
The condition (2.10) can be rewritten as

x - 3 x a + 2 a = x - x a - 2 a? 5 0.

The simultaneous integration of the following stochastic differential

equations provides an approximately-optimal filter; x(t) is the

filter output.

dﬁ(t) = f(t,s(c>>dt + P<t>ﬁ<t,s<c>>tdz<t> - h(t.s<t>>dt1; 5Ht.) = E<b>
(2.13)
dpgt) =

dt 2 f(t,x(t))P(t) - [h(t,x(t))P(t)]2 ; P(to) = Var (b) (2.14)

V‘

28

Recalling the equivalence of (2.2) and (2.4) and dividing by

dt permits (2.13) to be interpreted as
$2451 = mam) + P<c>ﬁ<c,s<t>>[y<t> - h(tamn

Retaining (2.13) in its special differential form will emphasis
the necessity of using Ito calculus for its solution. Equation (2.14)
does not contain any white noise terms and may be treated as an
ordinary differential equation. This development is a simplified
version of work performed by Bass, Norum, and Schwartz [29].

The approximately-optimal filter (2.13) - (2.14) was derived
from a solution which minimized the error variance at each instant
in the observation interval. Bellman obtained the same equations by
using a least-squares criteria and the theory of invariant imbedding
[30]. Friedland and Bernstein also derived this filter as an approx-
imate solution to the filtering problem analyzed from the maximum
likelihood approach [18].

This filter, besides being a common academic filtering solution,
has been applied to numerous real-world problems [19], [36]. Its
performance was analyzed in a recent paper by Stear and Schwartz;
their computer simulation compared several approximately-optimal
filters and showed no distinction between this approximation scheme
and other, more nearly optimum filters [37]. It may thus be concluded
that this approximate filter is a realistic solution to the optimal

filtering problem.

2.3 Sub-Optimal, Sequential Filters

The following sequential filters are suggested as solutions

to the filtering problem defined in Sec. 2.1:

WI‘J

 

is

V8

vi

NO

9):

 

th

in

th

St

ma

a1

29
d§(c) = f(t,;(t))dt + a(c,§t t)[dz(t) - h(t,x(t))dt] (2.15)
0,

where x(t) is the filter output with an arbitrary initial value

x(to). The gain term, a(t,;' ), is a time-varying function of the

t ,t
o

t t = [x(s): to s s s t]. The structure of (2.15)
O,

is very similar to that of the approximately-optimal filter in (2.13) -

output waveform x

(2.14). In fact, if a(t,;£ ,t) = P(t)h(t,;(t)) and the initial con-
ditions are set equal, the filters are identical.

The filtering solutions proposed in (2.15) provide filters
which will fulfill two of the requirements for a useful filter; R.2,
asymptotic convergence, and R.3, sequential structure. These require-
ments can be met even if x(to) is not specified so no prior knowledge
is required Of the initial condition x(to) in (2.1).

The outstanding feature of (2.15) is the random gain term,
a(t,-). This gain term must satisfy conditions guaranteeing con-
vergence but, otherwise, has an arbitrary form. It may be possible,
within these convergence limitations, to Specify a gain function which

would allow the filter to fulfill the optimality requirement R.l. An

example of this idea would be to set a(t,;£ t) = P(t)h(t,;(t)). On
0’

the other hand, the discussions in Secs. 1.1 and 1.2.3 indicate that
in many applications one may prefer that a(t,-) be deterministic,
thus providing the filter with a simple, asymptotically convergent

structure .

2.4 Summary

Equation (2.15) introduces a class of sequential filters which
may include an approximately-Optimal filter. The convergence of

algorithms from this class will be the main concern of the remaining

30

chapters.

This chapter also introduces the concepts of Ito calculus and
stochastic differential equations into the non-linear filtering pro-
blem. An understanding of these concepts has enabled investigators
to develop an acceptable optimal filter theory. This knowledge and
a familiarity with the discrete—time stochastic approximation methods

lead to asymptotic convergence studies.

CHAPTER 3

CONVERGENT ALGORITHMS FOR PARAMETER ESTIMATION

Many of the statistical methods which have been developed for
the classic problem of estimating unknown parameters have provided
approaches to filtering problems. The literature review in Chapter
1 showed that stochastic approximation schemes, developed for dis-
crete-time parameter estimation, suggest ideas for analyzing the
asymptotic behavior of the continuous-time filtering algorithms
prOposed in Chapter 2. This chapter concentrates on the problem
of continuous-time parameter estimation by developing two con-
vergence theorems analogous to theorems from discrete-time stochastic
approximation.

The filtering problem discussed in Sec. 1.1 may be con-
sidered equivalent to a problem in estimation theory where the
filter's output provides an estimate of the present message value.
For this reason, the problem of determining, or estimating, para-
meters is a special case of the filtering problem with f(t,x) = 0
in (2.1). Optimum solutions for both filtering and parameter
estimation problems are based on similar theorems and procedures
[12]. There is, however, a conceptual difference between the two
problems. A filter's output attempts to follow the time variations
of the message process; when the message is random, the output can
never follow the message exactly. The parameter estimation problem,
where the message is time-invariant, requires an estimation algorithm

31

32

which eventually determines the SEES value of the unknown constant.
It is for this reason that any study of a practical parameter
estimation scheme must include an analysis of asymptotic behavior.

This chapter analyzes the large-time behavior of the estimator
represented by (2.15) with f(t,x) = O. This sub-Optimal estimator
is shown to converge asymptotically to the true value of the para-
meter. The chapter is broken into three parts. Section 3.1 contains
a list of assumptions regulating the behavior of the observations,
(2.4), and the gain function in (2.15). Also, two lemmas are de-
veloped for later use. In Sec. 3.2 and 3.3, the assumptions are
used as hypotheses for two convergence theorems. These theorems
show that certain estimators converge to the true parameter with
probability one and in the mean-square sense. The theorems are a
major result and indicate that the sequential estimators defined
in (2.15) behave asymptotically as practical estimators.

The convergence theorems do not require any structure for
the gain functions, so in the last part of this chapter a specific
class of gain functions is prOposed. Corollary 3.3 shows that this
class of functions satisfies the assumptions of the main conver-
gence theorems and that the approximately-optimal filter derived
in Chapter 2 and specialized in this chapter has a gain function
which is a member of this class of functions. SO, under certain

assumptions on the observation process, an approximately-optimal
algorithm is shown to converge to the true parameter value. This
is a major result since it implies an estimation scheme which

satisfies the three requirements of a useful filter.

33

All aspects of this chapter are new and original. Convergence
of the estimator algorithm (2.15), in which the information supplied

by the past observations is contained in a(t,xt t), has never been

0,

proved for a continuous-time problem. The assumptions given in Sec.

3.1 are directly analogous to the discrete-time assumptions of Albert

and Gardner [20] but the proofs of the convergence theorems (NT not
,,

follow in a straight-forward manner.

3.1 Preliminary Considerations

This section provides the groundwork necessary for deriving
the convergence theorems in Sec. 3.2. This basic material includes
the formulation of the parameter estimation problem, a description
of the estimation schemes being investigated in this chapter, two
useful lemmas, and the hypotheses for the convergence theorems.

The parameter estimation problem and possible solutions con-
sidered in this chapter are defined using equations equivalent to
those in Chapter 2. The parameter to be estimates is e and if
x(t) = x(to) = e, (2.4) represents noisy observations of this un-

known parameter. This observed process is modeled by
dz(t) = h(t,e)dt +~dB(t) (2.4)

with giSEl = y(t) the observed process and B(t) a Brownian motion
process. In the first three sections of this chapter, 9 is not
assigned a prior distribution. The class of sub-optimal estimation
algorithms proposed in (2.15) with f(t,x) = 0 are considered as

the possible estimators for 9:

d;(t) = a(r,§{t t)[dz(t) - h(t,x(t))dt] : £(ro) arbitrary (3.1)
O,

34

where x(t) is the estimate of e at time t.

The estimator error is denoted as e(t) = x(t) - e and,
according to (2.4) and (3.1), has the following differential re-
presentation.

de(t) = a(r,§£t ,t)[h(t,e) - h(r,§(r))]dr + a(t,;t ,t)dB(t) (3.2)
o 0

An examination of (3.2) brings out many of the difficulties
inherent in a study of theasymptotic convergence of e(t). First
of all, (3.2) is a stochastic differential equation containing the
differential of Brownian motion. Strict attention must be paid to
the properties of Ito calculus in all further derivations. Since
(3.2) is non-linear and since a(t,;;o,t), the gain function, is
unspecified, an analytical solution for e(t) is impossible. Some
description of the behavior of the gain function is mandatory and
assumptions must be placed on h(t,x) before any inferences can be
made about e(t) and/or e(t)2.

The development of convergence theorems for equations such
as (3.2) is not obvious. In a survey of the discrete-time stochastic
approximation literature, Martingale theory was often used to obtain
convergence with probability one [38], [39], [40]. Martingale theory
and Ito calculus seem very compatible and both are used in later
proofs. A survey of stochastic approximation methods does suggest
certain assumptions about h(t,x) and a(t,;[ ,t) which can be
used effectively to Specify the asymptotic behavior of the esti-
mation error. This background of discrete-time investigations pro-
vides a fruitful source of ideas for analyzing (3.2).

Before listing the assumptions sufficient for convergence,

two lemmas are stated and proved. These lemmas are applied in later

35

proofs both in this chapter and in Chapter 4.

Lemma 3.1 Let w(t) be a stochastic process satisfying
t
w(t) s w(r) +'Irg(s)dB(s) for t,r 2 to _(3.3)

where B(s) is a Browian motion process. Assume that At is a
c-algebra, that w(t) is measurable on At’ and that

At(: F{[to,t], 3(5)}, the c-algebra generated by B(s), tO s s s t.
Also assume that g(t), a random function, is measurable on
F[[to,t], 3(3)]. Then w(t) is a positive super-martingale.2

Proof. Apply the expectation relative to Ar to each side of (3.3).

E{w(t)|Ar] s w(r) + E[I:g(s)dB(s)‘Ar] (3.4)

For w(t) to be a super-martingale the second term on the right side
of the inequality must be zero. A property of conditional expecta-

tions3 shows that
Eif:g(s)dB(s)|Ar] = r(r(§:g(s)do(s)|r([ro,r], B<u>11IArI

Since g(s) is measurable on F{[to,s], B(u)] the conditional

expectation satisfies property B.2; i.e.
t .
Etfrg<s>dn<s>IFttto.r1, B(u)}] = o (3.4a)

This completes Lemma 3.1.
t
Lemma 3.2 If m(t) >'0 and M(t) = c + [t m(s)ds with c < m

arbitrary and if M(t) « m as t a m then

It m(s) ds diverges if a s 1
t0 M(s)d

converges if a > 1

 

Consult Appendix B for discussion of c-algebras and martingales.

Doob [41], p. 37.

36

Proof. Assume a s 1; there exists a T0 > tO such that for t > T
M(t) z 1. Now for a 2 TO and b 2 O and because of the monotone

behavior of M(t),

Ia+b m(s)ds 2 l ra+b

l a+b
a m(s)” M(a-I-b)a a m(swssza m(sws

The right hand side can be integrated to obtain

a+b m(s)ds 2 M(a-I-b) - M(a) = 1 _ M(a)
fa M(s)d M(a+b) M(a+b)

The monotone property of M(t) shows that for every 3 2 To, there

exists a T 2 T such that for a+b > T , 1 - EXEZ--—'> %
o M(a+b)

This contradicts the necessary Cauchy condition [42] for the con-

vergence of

I: m(s)ds

O M(S)oz

Assume a > 1, let a = l + 5 where 6 > O, examine the integral

J’EO 53—8- [M(s)]-6ds = M(t)-6 - M(tO)-6: - M(ro)’5 < co

Evaluate the derivative in the integral.

d -6 d 1
ds d8 [[c +-I: m(u)du]g]
o

. 6[c +]‘: m(o)do]5'lm(s)
O

 

 

[c +-I: m(u)du]26
O

 

_ 5 m(s) = _ 6 m(s)
[c +]‘: minnow)”6 M(s)a
O

O

3

37

So
'6
f: ﬂ§-qus~§£§3— as t-oao.
o M(s)
Lemma 3.2 is complete.

The following assumptions are made for (2.4) and the gain
function in (3.1). These assumptions are the hypotheses for the
main convergence theorems proved in Secs. 3.2 and 3.3 and are analogous
to the assumptions Albert and Gardner [20] formulated for their dis-
crete-time observation equation and gain sequences.

Assumption 1: For each value of t, h(t,x) is monotone and differ-

entiable with respect to x.

Assumption 2: The function a(t,x; t) is measurable on F[[to,t], ;(s)],
o,

the a-algebra generated by x(s), where to S s s t. The space of all

real continuous functions defined on the time interval [to,t] is

denoted by CE; h(t,x) and a(t,x[ t) are assumed to satisfy all

0 0’ _
conditions necessary to guarantee the continuity of x(t). (Appendix
D).

t
Assumption 3: For each t, the Sign of a(t,X) for all X 6 Ct is

 

 

o
constant and equal to the sign of h(t,x) = §;'h(t,x).
Assumption 4: I: b(t)inf|a(t)]dt = m where b(t) = inf]h(t,x)\
o x
and iniIa(r)\ = inf \a(r,X)I
t
XEC
to
Assumption 5: I: SUp]a(t)]2dt < m where sup‘a(t)| = supt ]a(t,X)‘
o XGCt
o

3.2 The Main Convergence Theorem

The theorem to be Stated and proved in this section shows that
the output of the estimation algorithm, x(t) in (3.1), converges, as

t increases, to the value of the unknown parameter. A basic

38

martingale theorem provides both convergence with probability one and
in the mean-square sense.

Before the actual statement of Theorem 3.1, an equation is
developed for the square of the estimation error using some of the
assumptions listed in Sec. 3.1. The following simplified notation

will be used for the remainder of this chapter: a(t,;; t) = a(t)
0’

Equation (3.2) exhibits the estimation error. Assumption 2
and the development in Appendix D show that a(t) is measurable on
F[[to,t], B(s)], so, if Z(t,e) = e2, Ito's lemma (Theorem B.l) can

be applied to (3.2) producing a differential equation for e(t)2.
de(t)2 = 2 e(t)a(t)[h(t,9) - h(t,;(t))]dt + a(t)2dt + 2 e(t)a(t)dB(t)
This equation is equivalent to (3.5).
e(t)2 = e(r)2 +1]: 2 e(s)a(s)[h(s,e) - h(s,§(s))jds +-]:a(s)2ds
+‘f: 2 e(s)a(s)dB(S) ; t,r 2 t0 (3.5)
The mean value theorem [42] gives:
h(s,§'<s>) = We) +hts,o(s>)t§(s> - e]

e s ¢(s) s §(s) if §(s) 2 e

where _ _
x(s) s ¢(s) s e if x(s) s e

Assumption 3 implies that:
a(s)h<s,o<s>) = Ia<s>IIﬁ(s.o<s>>I >. o

Combining these results with (3.5) gives

39
e(t)2 s e(r)2 - 2 j:\a(s)I|h(x,¢(s))|e(s)2ds an]: sup|a(s)|2ds
+ 2 f:e(s)a(s)d3(s) (3.6)

Equation (3.6) will be a basic equation both in the following
proof and in the proof of Theorem 3.2 in Sec. 3.3.
Theorem 3.1 Under Assumptions 1-5, the process x(t) defined by
(3.1) converges to 9 with probability one and in the mean square

sense:

lim ;(t) = e w.P.l. and 1im E(x(t) - e)2 = O.

t-co t—uan
Proof. Equation (3.6) provides an inequality for the squared error.
The second term on the right hand side of (3.6) is negative, thus

establishing a simple and very convenient inequality for e(t)2.
e(t)2 s e(r)2 +j‘t sop|a(s)|2os + 2 jte(s)a(s)d3(s) : r t 2 t (3.7)
r r ’ 0

Defining w(t) = e(t)2 +-f: sup‘a(s)]2ds (3.8)

and substituting into (3.7),
w(t) s w(r) + 2 f:e(s)a(s)d3(s) (3.9)

A review of the proof up to this point emphasizes the importance
of Assumptions 1 through 3 in obtaining a transformation of e(t)2
which satisfies an equation of the form (3.9). It has already been
pointed out that a(t) is measurable on FI[to,t], B(s)] and Since
;(t) is measurable on this o-algebra, e(t)a(t) is also. The error
squared is measurable on F{[to,t], E(s)} and it follows directly
that F{[to,t], e(s)2]<: F[[to,t], B(s)]. SO when At is defined

equal to F{[to,t], e(s)2], Lemma 3.1 indicates that w(t) is a

40

positive super-martingale.
With the process w(t) being a super-martingale, the function
Ew(t) is a non-increasing function of t and since it is bounded
below it follows that4
lim Ew(t) = Eg = K.< m (3.10)
64:
A combination of (3.8), Assumption 5, and (3.10) shows that
, 2
11m Ee(t) = EC (3-11)
t—m
Returning to (3.6) and noting that inf|a(s)]b(s) s ‘a(s)‘lh(s,¢))|

allows another weakening of the inequality on e(t)2.

e(t)2 s e(r)2 - 2 I: inf‘a(s)\b(s)e(s)2ds + [: sup|a(s)|2ds

+ 2 [Ee(s)a(s)ds(s) (3.12)

The expectation of each side of (3.12) is taken and Fubini's theorem

is applied to the second term on the right hand side.
Ee(t)2 s Ee(r)2 - 2 I: inf]a(s)]b(s)Ee(s)2ds +-f: sup‘a(s)‘2ds
+ 2 s[f:e(s)a(s)d3(s)] (3.13)

The proof Of Lemma 3.1, specifically the part deriving (3.4a),
t
can be reapplied to Show E[Ire(s)a(s)dB(s)] = 0- (3-14)

Applying (3.14), (3.13) becomes
2 2 t . 2 t 2
Ee(t) s Ee(r) - 2 fr inr|a(s)\h(s)Ee(s) ds +-jr sop\a(s)\ ds (3.15)

The application Of (3.11) and Assumption 5 to (3.15) indicates the

necessity of (3.16).

 

Doob [41], chapter 7.

41

2 I: inf‘a(s)]b(s)Ee(s)2ds < m for t 2 to (3.16)

If (3.16), (3.11), and Assumption 4 are all to be satisfied, it is

necessary that

2
lim Ee(t) = 0, so the mean-square convergence is assured.
t—m

The super-martingale inequality (Theorem B.3) shows conver-

gence with probability one. The inequality gives

Prob{ sup w(u) > e] < [Ee(r)2 +‘r: sup|a(s)‘2ds]%' for each t 2 r.

t2u2r (3. 17)
The definition in (3.8) provides the following relationship
Prob[ sup e(u)2 > e} < Probf sup w(u) > e]. (3.18)

t2u2r t2u2r
Mean-square convergence and Assumption 5 are now combined with (3.17)
and (3.18). Given an e > O and 5 > 0, there exists a R > 0 such

that for every t > R

Prob[e(u)2 > e : t 2 u 2 R] < 6

thus

lim e(t) = 0 W.P.l..

t—am

This completes the proof of Theorem 3.1.

3.3 An Alternate Convergence Proof

The proof of Theorem 3.1 was structured with convergence with
probability one in mind; mean-square convergence was a by-product.
The mean-square result was unexpected since, in all known stochastic
approximation theories, demonstrations of mean-square and of prob-

ability-one convergence have required separate theorems. The theorem

42

in this section demonstrates another way to prove the mean-square pro-
perty. The techniques used in its proof will be applicable to part

of the filtering investigation in Chapter 4.

Theorem 3.2 Under Assumptions 1-5 the process x(t) defined in (3.1)
converges in the mean-square sense to 6; lim E(;(t) - 9)2 = 0.

.nggf, The inequality in (3.6) is weakeneth: applying the fact that

infla(s)‘b(s) s |a(s)“h(s,¢(s))‘. The expectation of each side is

taken to give
2 2 t . 2 t 2
Be (t) g Ee (r) - 2 fr 1nf‘a(s)\b(s)Ee (s)ds + fr sup‘a(s)] ds. (3.19)

The expectation of the last term in (3.6) is zero for the
reasons given in Theorem 3.1. The remainder of this proof relys on
2 .
the construction of a function which dominates Ee (t). The funct1on

e(t) is defined as the solution to

.%§$£l = - 2 b(t)iof|a(t)|e(t) + sop|a(t)|2 : C(to) = Ee2(to) (3.20)

Equation (3.20) is a linear equation having the following solution:
e(t) = D(t t )Ee2(t ) +-It D(t s)sup|a(s)]2ds (3.21)
’ o O to ’

In (3.21), D(t,s) = exp[- 2 f:b(u)int\a(o)]doj (3.22)

Equation (3.20) is now written as an integral equation.

t t 2
= - ’ d + d ° .
e(t) c(r) 2 [rb(s)1nf|a(s)]c(s) s Irsup‘a(s)] s, t,r 2 to (3 23)

Substracting (3.23) from (3.19),

[Ee2(t) - e(t)] + 2 f:b(s)infla(s)I[Eez(s) - c(s)]ds s [Ee2(r) - C<r>1.
(3.24)

43

2
Equation (3.24) indicates that c(t) 2 Ee (t). To verify this, suppose
it is not true. Then there exists a time T1 such that

Ee2(T1) - c(T1) = 0. If r = T1 and t = T1 + 5 where 6 is the

amount of time Ee2(t) - e(t) is positive, then 5 > 0 since Ee2(t)

and e(t) are continuous and (3.24) provides a contradiction.
Equation (3.21) shows that

Ee2(t) s D(t,to)Ee2(to) +-I:0D(t,s)sup]a(s)\2ds (3.25)

The estimator defined in (3.1) converges to 9 in the mean-
square sense when the two terms on the right hand side of (3.25) con-
verge to zero. The first term is easily handled by an application of

Assumption 4 to D(t,to) so that
D(t,to) a O as t a m. (3.26)

Assumption 5 and (3.26) must both be used on the second term.
Arbitrarily selecting e > 0, a T > 0 can be found so that

t
IT sup‘a(s)‘2ds < 3/2 and D(t,to) s 1 for t > T. Writing the

integral term under investigation as a sum Shows that

I: D(t,S)suP‘a(S)‘2ds s fésupia(s)|2ds +-IEOD(t,s)sup‘a(s)‘2ds for t > T-

 

o
. d . d = e
The constant 32 1s ef1ne as 32 ZIT su ‘a(s)‘2ds
top
There exists a T2 > T such that D(t,s) < 32 for t > T2 and s < T.
Therefore, for every e > 0 there exists a T2 > 0 such that for
t > T2 ,

e I: sup‘a(s)|2ds
t 2 a 0 ._
D(t,s)sup a(s) ds 5 + - ,
Ito ' ' 2 2 I: Sup‘a(s)‘2ds e
o

 

44

which shows that lim Ee2(t) = O.
t-m
This completes the proof.

3.4 Particular Gain Functions

The hypotheses in Sec. 3.1 guarantee the convergence to e of
the estimator defined in (3.1). However, they are very general.
This section lists a Specific class of gain functions which satisfy
the five assumptions of Sec. 3.1. Before discussing the gains, a
few assumptions about h(t,x) are listed which are a combination of
one of the assumptions in Sec. 3.1 and two new ones. These new
assumptions can replace those of Sec. 3.1 when the gains of this
section are used in (3.1).

Assumption 81: The function h(t,x) is monotone and differentiable

 

in x for each t.

 

 

 

 

t O
Assumption 82: Sup ‘§%E% S Kt< m where B(t) = sup]h(tsx)i
Cost/w X
, , 2 , 2 t 2
Assumpt1on S3: 11m B(t) =tm Wlth B(t) = It b(s) ds
b4: 0
Qggollary 3.3 With Assumptions Sl - S3, gain functions a(t,;L t)
0’
having the form of (3.27) satisfy Assumptions 1-5 of Sec. 3.1.
_. 2 - _.
a(t,Xt ,t) sgn h(t,x(m
-' o
a(t,X ) = 3 (3'27)
t ,t - t ‘- 2
o o‘(t,xt ’t)[c +'ft y(s,xt ,5) ds]
0 o o
where the three functions g(t,xto,t), a(t,xt0,t), and y(t,xto,t)

are less than g(t), are greater than, b(t)' and are measurable

on F[[to,t], x(s)].
Proof. Assumptions 1-3 are satisfied by inspection. The integrand

of Assumption 4 is now examined.

45

3 3
b(t)inf‘a(t)‘ 2 b(tlt 2 2 3 b(t) t 2
g(t)[c + ft g(s) ds] K h(t)[o'+j‘t b(s) ds]
0 O

This shows that

 

2
I: b(s)inf\a(s)|ds 2 15-]: 13(8) 8 2 ds.
0 K o [c +-It b(u) du]
0

Lemma 3.2 and Assumption S3 indicate that Assumption 4 is satisfied.

 

 

 

Similarily,
2 g(t)2 2 K4b(t)4
sup‘a(t)‘ S t 2 2 t 2 2
b(t)[c + ft b(s) ds] b(t) [c +J‘t b(s) ds]
0 o
which shows that
t 2 4 t b(s)2
Itosup‘a(s)‘ ds 5 K I ds .

to [c + f: b(u)2du]2
O

Assumption 5 also follows from Assumption 83 and Lemma 3.2.

One gain function which satisfies (3.27) is of unusual inter-

est. This function is

m; )= h(t,x(tn

, _ (3.28)
to" W? +j': h(s.x<s>)2ds1
0

where P = var e. For the discussion of this gain and the correspond-
ing estimator, e, the unknown constant, is assigned a prior distribu-
tion and the requirement x(to) = E(e) is made for (3.1). To better

understand the significance of (3.28) P(t) is introduced.

1

P(t) = _ . _
[P 1 +-J: h(s,x(s))2ds]
o

Differentiating,

46

dP(t) _ g_ 1
dt dt [Pu1 +-I: h(s,;(s))2ds]
o

b (mignz

- - 2

_ . _. = ' [h(t,x(t))P(t) - (3-29)

[P 1 +-I: h(s,x(s))2ds]2 ]
0

Now, (3.29) is the same as (2.14) when f(t,x) = 0; i.e.,when para-
meter estimation is considered as a filtering problem. The algorithm
specified when the gain function (3.28) is placed in (3.1) defines an
approximately-Optimal estimator of 9. Theorem 3.1 and Corollary 3.3
show that this estimator asymptotically converges to 9 with prob-
ability one. Thus, the algorithm provides an estimator that qualifies
as a useful algorithm by satisfying in some sense, all three perfor-

mance criteria in Chapter 1.

3.5 Summary

The continuous-time, parameter estimation problem has been
considered in this chapter. The main results, Theorems 3.1 and 3.2,
show that the sub-optimal estimators defined by (3.1) converge to the
true value Of the unknown parameter with probability one and in the
mean-square sense.

A secondary, but still Significant, result was obtained for
the class of gain functions in (3.27). Corollary 3.3 demonstrates
that this class of gains satisfies the hypotheses of Theorems 3.1
and 3.2. In an interesting result, this class is shown to contain
function (3.28). When algorithm (3.1) employs the gain in (3.28),
the algorithm is identical to the approximately-optimal scheme in
(2.13)-(2.14). In other words, the approximately-optimal filter
(2.13)-(2.14), if used for parameter estimation, qualifies as a

useful algorithm satisfying requirements R.l, R.2, and R.3.

CHAPTER 4
CONVERGENCE INVESTIGATIONS FOR THE

FILTERING PROBLEM

In Chapter 3, convergence theorems were developed for a class
of sub-optimal estimators used for solving the continuous-time para-
meter estimation problem. The techniques used to prove these theorems
can be applied in analyzing the behavior of certain sequential, sub-
optimal, continuous-time filters. The main objective of this chapter
is to formulate convergence theorems for the class of sub-optimal
filters represented mathematically by (2.15).

Section 4.1 contains a list of the assumptions which are
sufficient for convergence; basically, they allow the analytical
tools and ideas described in Chapter 3 to be applied to the task of
demonstrating the asymptotic convergence of time-domain filters.

The two convergence theorems are found in Sec. 4.2 and show that the
output of the sub-optimal filter, (2.15), converges to the message

as time becomes infinite. The convergence is both in the mean-Square
sense and with probability one.

Section 4.3 shows that, except for the most general assumptions
on the message and observation models, the convergence theorems de-

veloped for the non-linear filtering problem, can be applied to the

.1—

During the remainder of the thesis the phrases "asymptotic con-
vergence of filters," "filter is shown to converge", etc. means that
the difference between the filter output and message process converges
to zero, in some probabilistic sense, as time increases indefinitely.

47

48

linear filtering problem and that the Kalman-Bucy filter is asymp-
totically convergent. This chapter concludes with Sec. 4.4 where
properties of the message and observations models which guarantee

the convergence of the approximately-optimal filtering solution
developed in Chapter 2 are listed. The convergence ideas presented

in this thesis have demonstrated that a non-linear filter presently
being applied to real-world problems [30], [37], [7] is asymptotically
convergent. This filter meets the three requirements enumerated in
Chapter 1.

The demonstration of a convergent, approximately-optimal
filter illustrates only one type of result which can be realized by
using the mathematical techniques considered in this chapter. The
convergence theorems exhibit the applicability of Ito calculus and
martingale theory to the error analysis of sequential filtering
algorithms. The approach, theoretical proofs, and conclusions have

originated with this thesis investigation.

4.1 Assumptions and Preliminagy Derivations

This section lists the assumptions for the convergence theorems
to be proved in Sec. 4.2 and makes preliminary calculations on the
error equation.

Section 3.1 dealt with the parameter estimation problem.
Assumptions 1 through 5 concerned the behavior of the non-linear
function in the observation process, h(t,x) in (2.4), and the
behavior of the gain function in the sub-optimal estimator, (3.1).
Assumptions of this type are retained in the filtering solutions
suggested in this chapter. In addition, conditions are placed on

the non-linear function defining the message process, f(t,x) in

49

(2.1). Assumptions are also made involving a combination of the

gain function, (2.15), and both h(t,x) and f(t,x). All assumptions
used in Theorems 4.1 and 4.2 are listed below. The first two
assumptions are the same as 1, 2 and 3 in Chapter 3.

Assumption B.1: h(t,x) is monotone and differentiable in x for

 

every t.

Assumption B.2; a(t,x; ) has the same sign as h(t,x) for every

,t
O
t and is measurable on F{[to,t], x(s)}, the o-algebra generated by
t;(s). tO s s s t].

The remainder of the assumptions are given in terms of the

following Simplified notation.

C(t) = SUp f(t,x)
X

b(t) = in£\h(t,x)|
X

¢(t,to) = exptﬁoc<s>ds1

a(t) = a(t,;; )

,t
o
inf|a(t)] = inft |a(t,X)‘
XEC
to
sup‘a(t)] = supt |a(t,X)]
XECt
o

t . . .
where C 13 the space of cont1nuous functions on [to,t].

o
Assumption B.3: f(t,x) is differentiable in x for every t.

2
Assumption 3.4: a) lim sup ¢ (t,to) = K.< m
Tam tZT

b) I: b(S)inf‘a(s)‘ds = m
o

c) I: 52(to,t)sup‘a(t)\2dt < a
o

50

¢(t t )2
Assumption B.4': a) 1im ‘ 9

t
t—m 8Xp[2 It

 

= 0
b(s)inf|a(s)‘ds]
o

b) f:osup‘a(t)]2dt < a

Before the two convergence theorems are stated, a basic equa-
tion involving the filter error is developed from (2.1) and (2.4).
The error at time t is denoted by e(t) = x(t) - x(t). The follow-

ing differential equation can be written from (2.1), (2.4), and (2.15).

de(t) = [f(t,;(t)) - f(t.x<t)>]dc + a(c>th(t.x(t)) - h<t§<c>>1dc

+ a(t)dB(t)

2
If Z(t,e) = e , then Ito's lemma (Appendix B) gives an equa-

tion for e(t)2:

de(t)2 = 2 e(t)[f(t,;(t)) - f(t,x(t))]dt + 2 e(t)a(t)[h(t,x(t)) - h(t,;(t))]dt

+ a(t)2dt + 2 e(t)a(t)dB(t)
The mean-value theorem gives
f(t,xm) = f(c.§<t)> + f(t,a(t))[x(t) - §<t>3

h(t.§(t>> + h(t,e<c>>[x<t> - Em)

h(t,X(t))

where a(t) and e(t) are both in between x(t) and x(t). The
equation for the error Squared becomes:
2 2 ' . 2
de(t) = 2 e(t) [f(t,g(t)) - a(t)h(t,9(t))]dt + a(t) + 2e(t)a(t)dB(t)
(4.1)
Introducing a new variable y(t) = ¢(to,t)2e(t)2, (4.1) leads

to the following differential equation.

51

dy(t) e(t:o.t)2de(t:)2 + e(t)2 g; e(to.t)2dt

2 e<t>2o<co.t>2[f(c.o<t>) - e(t)]dc - 2 o<t0.c>2a<t>ﬁ<t.e<t>)e<t>2dt

+ o<t0.t>2a<c>2dt + 2 otto.c>2e(t>a<t>dB<t)

The definition of C(t) and Assumptions 8.1 and B.2 lead to

the following inequality for y(t).

y(t) s y(r) - I:Z]a(s)‘|h(s,9(s))‘y(s)ds +-f:¢(to,s)25up|a(s)‘2ds

+f:d(tois)28(8)a(8)dB(S) ; t,r 2 t0 (4.2)

Equation (4.2) is the basic inequality used in the proofs of

Theorems 4.1 and 4.2.

4.2 The Convergence Theorems,

In the previous section, (4.2) was derived under Assumptions
B.1-B.3. This equation is significant because it displays certain
Characteristics of the errors produced by the filters defined in
(2.15) and because it is in a form.which allows application of the
convergence techniques developed in Chapter 3. The way in which the
ideas of Secs. 3.2 and 3.3 provide two convergence theorems for the
filters in (2.15) is shown in this section.

The two convergence theorems indicate the potential of using
stochastic approximation ideas for analyzing continuous-time sequential
filters. Theorem 4.1 demonstrates the asymptotic convergence of the
filters in (2.15) with probability one and in the mean-square sense
when (2.1), (2.4), and the gain functions in (2.15) satisfy B.l, B.2,
B.3, and B.4. Theorem 4.2 provides mean-square convergence under

assumptions B.l, B.2, B.3, and B.4'.

52

d
Theorem 4.1 Let y(t) = afiEl- be the observed process defined by

 

(2.4) and let x(t) be the message process defined by (2.1). Then,
if Assumptions B.1 through B.3 and B.4 are satisfied, the output x(t)

of the filter represented by

d;(t) = f(t,;(t))dt + a(t,§[ t)[dz(t) - h(t,x(t))dt]; E(t ) arbitrary
°’ 0 (2 15)

converges to x(t) with probability one and in the mean-square sense.
Proof. The inequality (4.2) can be weakened by noting that the second

term on the Right Hand Side is always negative.
t 2 2 t 2
y(t) s y(r) +Iro<co.s> supIa<s>I ds +fro<to.s> e<s>a(s>dB(s>
Letting w(t) = y(t) +-f:¢(to,8)zsup‘a(s)]2ds gives
t 2
w(t) s w(r) + frz ¢(to,s) e(s)a(s)dB(s) (4.3)

Equation (4.3) satisfies all of the hypotheses of Lemma 3.1,
so w(t) is a positive super-martingale. The previous chapter
stated that the expected value of a super-martingale is non-increas-

ing so that

lim Ew(t) = E; = Kl< m
t-cco

and lim Ey(t) = Eg
t—m

Returning to (4.2) and using the inequality
inf]a(t)‘b(t) s ‘a(t)“h(t,9(t))‘ provides another inequality for
y(t)-
y(t) s y(r) - 2 f:inf]a(s)|b(s)y(s)ds +-I:¢(to,s)zsup‘a(s)|2ds

+-I:¢(to,s)2e(s)a(s)dB(s) (4.4)

53

The expectation operator is applied to (4.4) and the fact,
derived in Lemma 3.1, that E I:¢(to,s)2e(s)a(s)dB(s) = 0 is employed

to obtain an equation for Ey(t).
Ey(t) s Ey(r) - 2 Itinf|a(s)‘b(s)Ey(s)ds +.[t¢(t S>23up‘a(s)‘2ds
r r o’
Combining Assumption B.4(c) and the convergence of Ey(t),
fjin£|a(s)|b(s)sy(s)ds < o for every r 2 to. (4.5)
If (4.5) is to be consistent with Assumption B.4(b),
. . 2 2
11m Ey(t) = 11m ¢(t ,t) e(t) = O.
t-a t—aco o
Since,from Assumption B.4(a), lim sup ¢(t,to)2 = K.< m, the function
2
¢(to,t)2 = 1/¢(t,to) must be non-zero for large t; thus
, 2
11m E e(t) = 0.
t—uoo
The basic super-martingale inequality (Appendix B) demonstrates
probability-one convergence. The process w(t) has been shown to be
a positive super-martingale so
Prob[ sup w(u) > e}‘< EEX£2
r2u2t e
The process w(t) depends on y(t) in such a way that
Prob{ sup y(u) > e] < Prob[ sup w(u) > e]
r2u2t r2u2t
Combining Assumption B.4(c) and the mean-square convergence
result,

lim y(t) = O w.P.l.

t-too

54

It has already been pointed out that ¢(to,t)2 > 0 for large
t so that
lim e(t) = 0 w.P.l.
t-Ieo
The proof of Theorem 4.1 relied on the reasonable assumption
that lim sup ¢(t,to)2 = Kw< a. In Theorem 4.2, ¢(t,to)2 is allowed
to grow large as t approaches a. This theorem differs from Theorem
4.1 both in its list of assumptions (B.4 is replaced by B.4') and
in its conclusions since only mean-square convergence is demonstrated.

Theorem 4;; Under Assumption B.l through B.3 and B.4', the output

 

x(t) of the filter represented by (2.15) converges in the mean-square

sense to x(t):

1im E(x(t) - x(t))2 = o.

t-a
Proof. The inequality inf|a(s)|b(s) s Ia(s)||h(s,e(s))\ and the

expectation of (4.2) show that

Ey(t) s Ey(r) - [:2 iana<s>Ib<s>Ey<s>ds +J§o<to,s)zsup|a(s)|2as;

r,t 2 to.
In the proof of Theorem 3.2, a function which dominates Ey(t) was
shown to exist. Following that argument, the bound for Ey(t)

becomes

Ey(t) s D(t,tO)Ey(t8)+-IEOD(t,s)¢(to,s)zsup]a(s)]zds

where

D(t,to) = exp[- j: inr|a(s)]b(s)ds].
0

55

Now, from the formulation of y(t),

2 .— ._
Ee(t) s ¢(t,to)2D(t,to)Ee(to)2 + ¢(t,t0)2f: D(t,s)¢(to,s)zsup|a(s)]st.
o

 

(4.6)
Assumption B.4'(a) guarantees that
2
2_ Mute)
¢(t,to) D(t,t0) = t _. 0 as t .. as .(4.7)
exp[2 It inf|a(s)|b(s)ds]
o

Letting D(t,to) = ¢(t,tO)ZD(t,tO), (4.6) can be written as
Ee(t)2 D(t t )Ee(t )2 + t D(t s)su ]a(s)]2ds (4 8)
S ’ o o Ito ’ p ' °
Equation (4.8) is the same as (3.25). Equation (4.7) shows
that D(t,s) a 0 as t a a. Assumption B.4'(b) and the same argument
as that used on (3.25) can be applied to (4.8), giving

lim Ee(t)2 = O.

t-aoo
4.3 A Convergence Theorem for the Linear Kalman-Bucy Filter

The two convergence theorems of Sec. 4.2 are self-contained
and developed independently of any other studies into convergence of
filtering solutions, except those of discrete-time stochastic
approximation. To properly evaluate these two new theorems, other
known filtering convergence theorems should be consulted. The only
published convergence results in the area of continuous-time,
sequential filters are for the optimal solution to the linear filter-
ing problem, the problem defined in Sec. 1.2.1. For this Optimal
solution, the Kalman-Bucy filter, it is well known that the filter's
error converges to zero in the mean square sense when the signal

and observation models satisfy certain conditions.

56

One way to use the established linear convergence theorem for
checking the quality of the non-linear theories of Sec. 4.2 is to see
if the linear result can be obtained from the non-linear theorems when
the non-linear functions are reduced to linear functions. This section
proceeds on this basis, checking to see if the assumptions about the
message and observation processes along with the Kalman-Bucy filter's
gain function satisfy the hypotheses of Theorems 4.1 and 4.2.

The linear filtering problem uses the following definition for

the signal and observed processes:

dxgt)

dt F(t)x(t) ; x(to) = b is an unknown constant. (4.9)

dx(t) H(t)x(t)dt + dB(t) . (4.10)

Equations (4.9) and (4.10) are the linear cases of (2.1) and
(2.4), reSpectively. When b is assumed to be Gaussian with a mean
of zero and a variance of P, the optimal, minimum-variance, filter-

ing solution, which is the Kalman-Bucy filter, is represented by

dx(t) = F(t)&(t) + P(t)H(t)[dx(t) - H(t)ﬁ(t)] ; a(to) = O (4.11)
41:: t = 2 F(t)P(t) - [H(t)P(t)]2 ; P00) = P - (“'12)

In (4.11) and (4.12), ﬁ(t) is the filter output at time t and
P(t) = E(R(t) - x(t))z.

Theorem 4.3 describes the asymptotic behavior of the Kalman-
Bucy filter.
Theorem 4.3 Let gz'¢(t,to) = F(t)¢(t,to) with w(to’to) = 1, that

is,

y(t,to) = exp[f:0F(s)ds].

57

 

If
“t,t )2
lim t O 2 2 = 0
tam Ito¢(s,to) H(s) ds
then lim E(2(t) - x(t))2 = O.

tam

Proof. The solution to (4.12) is

 

 

 

_ "(t’to)2
P(t) - P-1 +-f:o¢(s,to)2H(s)2ds
Since
1 l
P“1 +-I: y(s,to)2H(s)2ds S I: y(s,t0)2H(s)2ds for t > to
o o

it follows from the hypotheses that

2
Mme)

2
I:0¢(s,to)2H(s) ds

 

OsP(t)s —.0 as t-ooo-
This theorem is analogous to the multi-dimensional discrete-time
linear filtering result by Aoki [l6] and Sorenson [43].

The object of this section is to see if the non-linear con-
vergence Theorems 4.1 and 4.2 imply the convergence of the Kalman-
Bucy filter from the hypotheses of Theorem 4.3. The function
¢(t,to)2 = (y(t,to)2 (since C(t) = P(t)) may become unbounded so
Theorem 4.2 is checked to see if the gain of the Kalman-Bucy filter
satisfies its hypotheses.

Assumption 8.1 through B.3 are satisfied by inspection, so
only Assumption B.4' need be examined. The gain function for the

Kalman-Bucy filter is

58
o(t.to)2H(t)

[P-1 +-I:O¢(s,to)2H(s)2ds]

 

a(t) = P(t)H(t) =

Define the function

2
e(t.to)

 

tn(t) = t
exp[2 ft b(s)inf|a(s)|ds]
o

where

H<t>2o<c,to>2

 

b(t)inf a(t) = _ .
[P 1 +‘[: ¢(s,to)2H(s)2ds]
O

Letting the denominator of m(t) be

Hts>2o<s,to>2
ds

 

1 t
u(t) = - exp 2 I
P t "1 S 2 2
O[P +‘ft ¢(u,to) H(u) du]
0
produces the differential equation

duct) _ 2 o<c.to)2H<t)2u<t>
dt [P-1 +-I:o¢(s,to)2H(s)2ds]

 

. -1.
. u<to> P

The function w(t) = [P-1 +‘[: ¢(S,to)2H(s)2ds]2 satisfies the
o

differential equation,

dw(t) = 2 ¢(tato)2H(t)2w(t)

dt [P-1 +.[: ¢(s,to)2H(s)2ds]
o

 

 

"Olr—I

; W(to) =

Since w(t) and u(t) satisfy the same linear differential
equation with identical initial conditions, they are equal; i.e.,
w(t) = u(t). Now, to examine Assumption B.4', this equality is used
in m(t).

2 2
o(t.to) ¢(t.to)

m(t) = = _.
exp[2 f: b(s)inf|a(s)‘ds] P[P 1 +‘[: ¢(s,to)2H(S)2ds]2
o o

 

 

59

The hypothesis in Theorem 4.3 is sufficient for

o<t.to)2
lim t = O .
tam exp[2 Itob(s)inf]a(s)|ds]

 

The assumption made for (4.9) and (4.10) in Theorem 4.3 suffices
for making the gain in the Kalman-Bucy filter satisfy Assumption B.4'

(a). Since

¢(s,to)4H(s)2ds

- 2
o [P 1 +-f: ¢(u,to)2H(u)2du]
o

 

[tosup]a(s)‘2ds = I:

the hypotheses of Theorem 4.3 do not guarantee that a(t) satisfy

Assumption B.4'(b). Thus, Theorem 4.2 cannot be used to show that

the Kalman-Bucy filter converges since the gain function may not be
square-integrable.

Conclusions about convergence cannot be drawn from the non-
linear theorems if ¢(t,to)2 is unbounded for large t, so the
hypotheses of Theorem 4.3 will be broken into three separate cases
with one case containing the unbounded ¢(t,to)2 condition.

Case 1: a) lim sup os(t,to)2 = K~< m

b) lim [: ¢(s,to)2H(s)2ds = m
o

tda

Qase 2: a) 1im ¢(t,to)2 = 0

tan

b) limft ¢(s,t )2H(s)2ds = K. < m
t o l
tau 0

Case 3: a) lim sup ¢(t,t0)2 = m
2
o<t.to)

2
I:0¢(s,to) H(s)2ds

 

b) *0 as t-om

60

If f(t,x) and h(t,x) satisfy case 1, case 2, or case 3,
then the functions satisfy the hypotheses of Theorem 4.3. Conversely,
if the non-linear functions agree with the hypotheses of Theorem 4.3,
they fit into one of the above cases. Thus, the three cases are
equivalent to the hypotheses.

Since Theorem 4.1 and 4.2 cannot handle case 3, only the other
two cases will be considered. Case 2 does satisfy the conditions of
Theorem 4.2. Assumption B.4'(a) was examined above and now Assumption

B.4'(b) is obeyed since

dt

 

ott t >4H<t>2

2 ’ O

sup]a(t)‘ dt =. _

Ito J":0 [P 1 +.[: ¢(s,to)2H(s)2ds]2
o

l 4 2
s _2 I: ¢(t,to) H(t) dt
P o
, 2
and w1th ¢(t,to) < l for t > some T > t0

2 1 2 2 1 T 4 2
I: sup\a(t)| dt s gfg‘ﬁ;¢(t,to) H(t) dt + ;:2«fto¢(t’to) H(t) dt

0

l T 4 2
+—-——.
5 K1 P_2 Ito¢(t,to) H(t) dt.

Theorem 4.1 handles case 1. First, Assumption B.4(c) requires
I: ¢(to,s)2sup‘a(s)|2ds be integrable. This follows directly when
pazt (b) of case 1 and Lemma 3.2 are applied to the integral,
o(t.to)2H(t)2

dt.
-1 t 2 2 2
o [P +«fto¢(s’to) H(s) ds]

 

co 2 2
[to¢(to,t) sup\a(t)\ dt = f:

Furthermore, for Assumption B.4(b), part b of case 1 and Lemma 3.2
show that

2 2
m(8.t0) H(S)

o [2'1 +‘f: ¢(u,to)2H(u)2ds]
O

G)

f:Ob(s)inf|a(s)\ds =‘ft

dS =mo

 

61

Consequently, Theorem 4.1 and 4.2 Show that the optimal,
Kalman-Bucy solution to the linear filtering problem is asymptotically
convergent in the mean-square sense if (4.9) and (4.10) satisfy the
hypotheses of Theorem 4.3 and lim sup ¢(t,to)2 < m. This is the first
time that restrictions have been placed on (4.9) and (4.10) which
guarantee the convergence of the optimal linear filter with prob-

ability one.

4.4 A Class of Gain Functions Guaranteeing Convergence

 

This section returns to the filters represented by (2.15)
which were proposed as solutions to the non-linear filtering problem.
In Sec. 4.2, these sub-optimal filters were shown to converge when
the models of the message and observation processes, (2.1) and (2.4),
and the gain function, a(t,; ,t), satisfied Assumptions 13.1 - B.3
and B.4 or B.4'. The gain fugction does not have any fixed Structure
and only has to fulfill conditions such as: I: b(t)inf]a(t)‘dt = m
and I: ¢(to,t)zsup|a(t)]2dt < m. In this section, a class of gain
functiogs is formulated and more assumptions are made about (2.1)
and (2.4). The assumptions and Theorem 4.1 imply that the sub-optimal
filters converge with probability one and in the mean-square sense.

The following assumptions are made about (2.1) and (2.4):
Assumption B.l, B.2, B.3, and

Assumption C.l: sup §%%%-< m where g(t) = sup]h(t,x)‘

t x

 

Assumption C.2: Let C(t) 2 f(t,x) 2 D(t) for t 2 t0 and for

every x and define

¢(t,t0) exp[I:0G(s)ds]

w(t,to) exp[f:0D(s)ds];

‘3’!

_ '2 “1‘qu k. ' ".

 

62

2
o(t.t0)
then sup

._______§.< m and lim sup ¢(t,t )2 = K.< m .
t N(t,to) O

Assumption C.3: I: ¢(s,to)2b(s)2ds = m
0

Corollary 4.4 If (2.1) and (2.4) satisfy Assumptions B.l - B.3 and

C.1 - C.3 and if the gain function a(t,x; t) has the form

0,

gi(t’;ro,r>°1<t’§<t>>

 

]a(t,;£ ,t)] - (4.13)

_ t ._ '—
o [a +Ift g2(s,xt ,S)c2(s,x(s))ds]
o o
where

_ 2 .
,t), g2<t,xt ,t) s oct.to> _)
O O -2-

2 ._
I(t.to) s g1(t.xt

b(t) s .1(r,;(t)), cztc,§(c>> s g(t)

then a(t,;£ t), (2.1) and (2.4) satisfy Assumption B.4.

3
Proof. Assumption B.4(a) is satisfied by inspection. Examining the
integrand of the integral in Assumption B.4(b),

y(c.to>2b<t>2

[a +‘[: ¢(s,to)2g(s)2ds]
O

 

b(t)inf'a(t)] 2

By Assumption 0.2, there exists a K and K such that for

1 2
every t
¢(tt)2
’ o t
ls—jskl and lsmb(t)sx2
I<t»to>

These inequalities weaken the lower bound on the integrand to

e(t.to)2b(t)2

 

b(t)iana(t)| 2 2 t 2 2
K1K2[a' +‘ft0¢(8.to) b(S) d8]

Using Assumption C.3 and Lemma 3.2,

f:0b(t)inf|a(t)]dt = a .

63

Assumption B.4(c) is now investigated.

¢(to,t)2¢(t,to)4g(t)2

r t 2 2 2

a + ft ¢(s,to) b(s) ds]
0

 

¢(t,to)28up‘a(t)]2 s

K4 K: ¢(t, t o) 22b(t)

[a" +Ijt ¢(s,to) 2b(s)2ds]2
o

 

Again, Assumption C.3 and Lemma 3.2 Show that

I:O¢(to,t)zsup]a(t)‘2dt < m,

completing the proof. w
Except for the discussions of the linear Kalman-Bucy filter in
Sec. 4.3, this chapter has concentrated on demonstrating the asymptotic
convergence of general sub-optimal, sequential filters. One reason
this investigation analyzed sub-optimal filters such as (2.15) was
to obtain theorems which provide analytical properties for Optimal,
non-linear filters. With this motivation, the ideas of stochastic
approximation exhibited by Theorems 4.1 and 4.2 were specialized to
produce the corollary in this section, Corollary 4.4. This corollary
is now used to show that the approximately-optimal filter represented
by (2.13)-(2.14) is asymptotically convergent.

Define the gain term in (2.15) as
a(t,§t t) = P(t)h(t,§(t)) (4.14)
O,

with

§(xt ,t)
P(t)= °

[v—l—_ar(b) +:j‘ grotto S)h(s,;(s))2ds]

64

and 6(xt ,t
o

> = exp[2 I: f(s,;(s))dsj .
O

The function P(t) satisfies the differential equation (2.14)

since

#1:: t = 2 i(t.§(t))P(t) - [h(t,;(t))P(t)]2 ; 150:0) = varw).

The gain term a(t,;; t) is the gain term in (2.13) and also satisfies

3

o
(4.13) in Corollary 4.4. Therefore, Theorem 4.1 combined with Corollary

4.4 Show that the approximately-optimal filter represented by (2.13)

and (2.14) converges asymptotically with probability one and in the

mean-square sense if (2.1) and (2.4) satisfy Assumptions B.l, B.3, C.1, ,J
C.2, and C.3 and if the initial condition on (2.1), x(to) = b, has a

prior distribution. The convergence theorems of this chapter have

provided the means for displaying, under certain conditions, a filter-

ing solution which meets the three requirements of a useful filter;

optimality, convergence, and sequential structure.

4.5 Summary

The first major contribution of this chapter is the statement
and proof, in Sec. 4.2, of two theorems demonstrating the asymptotic
convergence of the non-linear filters defined by (2.15). Theorem 4.1
shows that the filtering error goes to zero both with probability one
and in the mean-square sense, with limitations placed on (2.1) and
(2.4) and on the gains a(t,;£ ,t)' One prime restriction in the
hypotheses of Theorem 4.1 is that the message process is bounded;
i.e., lim sup ¢(t,to)2 < m. Theorem 4.2 places other conditions on

the gain functions and allows the message process to become unbounded

yet shows that the output of the sub-Optimal filter converges in the

65

mean-square sense to the message.

To compare Theorems 4.1 and 4.2 to known convergence results,
a convergence theorem for the optimal, Kalman-Bucy solution to the
linear filtering problem is given in Sec. 4.3 as a special case of
these theorems. Theorem 4.1 and 4.2 prove the convergence of the
Kalman-Bucy filter when the message process is bounded and when the «as
functions used in the mathematical models of the message and
observation processes, (2.1) and (2.4), guarantee the divergence of

a certain integral. Thus, Theorems 4.1 and 4.2 imply convergence of

the optimal linear solution but under restricted hypotheses. Section
4.3 produces a minor original result in the linear theory; Theorem

4.1 demonstrates convergence of the Kalman-Bucy filter with probability
one, a type of convergence that has not been proved in the literature.

In Sec. 4.4, additional assumptions were made about (2.1) and
(2.4). Corollary 4.4 showed that Assumptions B.l - B.3 and C.1 - C.3,
when combined with any of the gain functions specified by (4.13),
satisfy the hypotheses of Theorem 4.1. If any of the gain functions
given in (4.13) is used in (2.15) and if (2.1) and (2.4) fulfill the
corollary's assumptions, the filter defined by (2.15) converges, as
time increases, to the message process.

The second major contribution of this chapter concerns the
convergence of an approximately-optimal non-linear filter. The gain
function given in (4.14) satisfies the conditions in Corollary 4.4 and,
when placed in (2.15), defines a filter identical to the approx-
imately-optimal filter derived in Chapter 2. The combination of
Assumptions B.1 - B.3 and C.1 - C.3, Theorem 4.1, and Corollary 4.4
shows that an approximately-Optimal filter is asymptotically con-

vergent. Under the Specified restrictions, this filter, given by

66

(2.13)-(2.14), provides nearly-minimum—variance estimates of the
message throughout the period of observation and, furthermore, is
guaranteed to converge to the message after lengthy processing of
the observations. The filter satisfies all three requirements of
a useful filtering solution as discussed in Chapter 1.

This thesis provides the first study into the asymptotic
convergence of filters for the continuous-time, non-linear filter-
ing problem. The style of analysis and its associated theoretical
tools used to prove Theorems 4.1 and 4.2 appear to provide logical
means for probing deeper into the behavior of non-linear, sequential

filters.

CHAPTER 5

AN EXAMPLE OF A FILTERING PROBLEM

This chapter considers an example of the filtering problem
defined in Sec. 2.1. The filter uses the approximately-optimal
algorithm discussed in Sec. 4.4 and is shown to satisfy the hypotheses
of Corollary 4.4 so that the filter is asymptotically convergent.

This convergence is verified when the message process, observation

process, and filter are simulated on a digital computer.

5.1 The Example and Computer Simulation

The filtering problem discussed in this chapter is defined as:

 

 

x(t) = f(t,x) ; x(0) = b; to = 0 (2.1)
y(t) = h(t.X) + V(t) (2»2)
where
f(t,x) = :33??? '9 x + 3.0.: 2 (5.1)
[1 + {——-—-21 1
(t + l)

h(t,x) = 0.5 x + 5 tan-1x (5.2)

and b is a Gaussian random variable with a mean = -l.5 and a

variance = 1.0.

The approximately-optimal filter derived in Sec. 2.2 and re-
presented by (2.l3)-(2.l4) is used to process y(t). This filter is
asymptotically convergent if (5.1) and (5.2) satisfy Assumptions B.l,

B.3, C.1, C.2, and C.3. These assumptions are now examined. With

67

68

h(t,x) being monotone in x, Assumptions 3.1 and B.3 are verified

directly. Assumption C.1 is satisfied as shown below.

Also

f(t.X) =

which implies that

and ¢(t,to) =
Mute) =
Therefore
e(t.
sup
t v(t.

 

 

 

h(t,x) = 0.5 + 5 2
l + x
g(t) = sup‘h(t,x)| = 5.5 and
x
b(t) = inf|h(t,x)| = 0.5
x
sin(1000 t)__ 6.0 x
100 n t (t+l)4[l + { x 2}2]2

(t+1)

 

sin(1000 t) + 3.0

 

G(t) =
100 n t (t+1)2
_ sin(1000 t) _ 3.0
D(t) ' 100 n t 2
(t+1)

t sin(1000 8) ds + t 3.0 ds
explio 100 11- s O (S+1)2 :1

 

exp[ft sin(1000 3) d3 _ I; __§,g_ ds]

0 100 n S (8+1)2

, 2
11m ¢(t,to) =K<cn ,
t-m

2
t )
O 2 = sup 6xp[2 If; 3.02] < co ,
to) t (8+1)

and f:o(s,to)2b(s)2ds = (0.5)2f:¢(s,to)2ds = m .

69

Thus the non-linearities (5.1) and (5.2) also satisfy Assumption 0.2
and C.3, demonstrating the asymptotic convergence of the filter.

A digital computer has been used to simulate the performance
of this filter. Equations (2.1), (2.13), and (2.14) were approximated

on the computer by the following difference equations.

x(ti+1) = x(ti) + f(ti,x(ti))At ; x(0) = b
ft(ti+1) = a(ti) + f(ti,5‘<(ti))At + P(ti)h(ti,ft(ti))[h(ti,x(ti))-h(tiR(ti))]At
+ P(ti)h(ti,2(ti))ABi ; 2(0) = -1.5
P(tiH) = P(ti) + 2 P(ti)f(ti,x(ti))At
- [P(ti)h(ti.ﬁ(ti))]2At ; P(O) = 1.0
Here, At = t1 - ti_1 for i 2 1 and {A31} are Gaussian random

variables with mean zero and variance At for each i 2 0.

The solution to these difference equations has been shown to
converge to the actual solutions of (2.1), (2.13), and (2.14) in the
mean-Square sense as At a 0 [44].

Using At = 10-3 sec., 200 runs were performed on the computer.
For each run, the initial condition for (2.1), b, was sampled from a
Gaussian distribution and a new sequence of observation noises {A81}
was selected. Using these 200 runs, the sample mean and variance
of the error e(t) = 2(t) - x(t) were calculated. In this simulation

the filter does converge as time increases; Figures 2 and 3 display

the results.

 

A;

x(t)

and

Values of x(t)

70

    

 

 

Message processes

Filter outputs

 

A A
U I

0 0.5 1.0 1.5

Time (seconds)

2.0

Figure 2 Filter Behavior for Two Runs

2.5

51.7,...

-0.

-1.0
-l.l

.2m

.4.»

.6.

71

1 .‘0

Time (seconds)”’,

 

Figure 3

-—-- Sample mean of

e(t)

"""‘Sample mean 1- sample

standard deviation of

Statistical Analysis of Filter Performance

In——.1_.

e(t)

72

5.2 Summary

An approximately-optimal filter has been applied to a non-
linear filtering problem which satisfies the conditions for asymptotic
convergence given in Sec. 4.4. A computer simulation has verified

the convergence.

CHAPTER 6

CONCLUSIONS

Section 6.1 outlines the major results of the thesis. Possible
extensions of the investigations in this thesis are outlined in Sec.

6.2.

6.1 Conclusions and Results

This thesis has proved the existence, under certain conditions,
of an approximately-optimal, asymptotically convergent, sequential,
non-linear filter. This culminates an effort devoted to developing
useful and practical filtering algorithms. The primary results are
theorems which demonstrate the asymptotic convergence of sequential,
non-linear algorithms both for filtering and parameter estimation.

After briefly describing the stochastic filtering problem and
the characteristics of useful solutions, Chapter 1 discusses the need
for analyzing the asymptotic behavior of non-linear, continuous-time
filters. A literature review suggests that the method of stochastic
approximation, a well-developed technique of classical, discrete-
time statistics, could be applied in an investigation of the asymptotic
properties of sequential filters.

In Chapter 2, a scalar version of the filtering problem is
formulated in terms of non-linear stochastic differential equations.
Modeling the problem in this manner allows a calculus developed by
Ito to be applied to the performance analysis of filters. Chapter

2 also prOposed the algorithms in (2.15) as sub-optimal solutions

73

74

to the filtering problem. Besides possessing the general features
of presently-documented time-domain filters, these algorithms have
unstructured gain terms. This thesis marks the first time that
continuous-time filters with arbitrary gain terms have been in-
vestigated.
Before pursuing an analysis of the asymptotic behavior of if
these filters, the method of stochastic approximation is extended
to the continuous-time, parameter estimation problem in Chapter 3.
The large-time behavior of a Special version of the algorithms in
(2.15) is investigated. A parameter of a signal is estimated where .u
the signal is observed in the presence of additive, white noise.
Theorems 3.1 and 3.2 are the most important results in this thesis.
Conditions are imposed on the signal and the gains of estimators
which guarantee convergence to the true parameter as time increases.
The sequence of estimates converges with probability one and in the
mean-square sense. These theorems are original and illustrate the
use of Ito calculus and martingale theory in asymptotic convergence
studies. Section 3.4 illustrates a way of effectively exploiting
the arbitrary gain structure permitted in the theorems. A particular
gain function is selected that leads to a nearly-optimal (in a
Bayesian sense), convergent algorithm.
In Chapter 4, the filters (2.15) are considered. Two key
theorems, 4.1 and 4.2, were developed from the ideas expounded in
Chapter 3. These theorems Show that when the message process, the
observed process, and gain function in (2.15) satisfy certain
hypotheses, the outputs of the filters in ( 2.15) converge to the
message process as time grows large. Both probability-one and mean-

square convergence are demonstrated. Section 4.3 examines the

75

conclusions of Theorems 4.1 and 4.2 for the special case when the
message and observation processes are generated by appropriate
linear stochastic differential equations. Conditions are specified
which guarantee that the Kalman-Bucy filter is asymptotically con-
vergent with probability one.

In Sec. 4.4, a basic goal of this study is accomplished when
Theorem 4.1 is used to demonstrate that an approximately-optimal,
non-linear filter converges asymptotically, given that the message
and observation processes satisfy certain conditions. That is, a
filter is specified which provides nearly-minimum-variance estimates
of the message process throughout the observation time interval and
which converges surely to this message as time increases. This
thesis contains the first investigation of the asymptotic convergence
of non-linear, continuous-time, sequential filters and indicates a
way in which stochastic approximation can be applied to the filter-
ing problem.

An example of a filtering problem is given in Chapter 5. The
approximately-optimal filter of Sec. 4.4 is applied as a filter and

convergence is demonstrated by use of a digital computer simulation.

6.2 Extensions

There are two different directions in which the results of
this thesis can be extended. Either the hypotheses of the conver-
gence theorems can be weakened or these theorems can be extended
to include the multi-dimensional filtering problem.

The convergence theorems in both Chapters 3 and 4 are for one-
dimensional problems. These theorems may be extended to the multi-

dimensional case, thus enabling them to be applicable to large scale

76

engineering problems. Insight into this type of extension may be
gained by consulting Albert and Gardner [20] for an application of
the method of stochastic approximation to estimating, via discrete-
time observations, a vector of parameters.

The analyses of Chapters 3 and 4 may be closely examined to
see if less restrictive hypotheses can be placed on the convergence
theorems. Albert and Gardner's analysis of constrained, discrete-
time estimators offers a way of reducing the restrictions on the
observed processes. Another possibility is to interconnect the con-
cepts of Liapunov stability and the method of stochastic approximation
in continuous-time. Introducing a Liapunov function for the differ-
ential equation representing the message process into the convergence
ideas of this thesis may lead to asymptotic convergence results more
effective than those in Chapter 4.

The approach exhibited in this thesis may also be applied to
the general time-domain filtering problem defined in Sec. 1.1, i.e.
where u(t) # 0 in (1.1). An investigation into the asymptotic
behavior of filters for this type of problem should begin with an
examination of the discrete-time results of Albert and Gardner

discussed earlier in Sec. 1.2.7.

BIB LIOGRAPHY

10.

11.

12.

BIBLIOGRAPHY

Wiener, N., Extrapolation, Interpolation, and Smoothing of
Stationagy Time §eries, John Wiley & Sons, Inc., New York, 1949.

Kushner, H.J., "Dynamical Equations for Optimal Nonlinear

Filtering," Journal of Differential Equations, Vol. 3, pp.
179-190, 1967.

Bucy, R.S., "Nonlinear Filtering Theory," IEEE Trans Automatic
angggl, Vol. 10, pp. 198, 1965.

Kalman, R. and R.S. Bucy, "New Results in Linear Filtering and
Prediction Theory," Trans. ASME ser. D, J. Basic Eng. 83, pp.
95-108, March 1961.

Wars.

Deutsch, R., Estimation Theory, Prentice-Hall, Inc., Englewood
Cliffs, New Jersey, 1965.

Synder, D.L., "The State-Variable Approach to Continuous
Estimation," Ph.D. Thesis, Massachusetts Institute of Technology,
Cambridge, Mass., 1966.

Richman, J. and F. Thau, "Elements of the Kalman Filtering

Technique," Technical News Bulletin, General Precision, Inc.,
Vol. 9, No. 4, 1966.

 

Fisher, J.R., "Conditional Probability Density Functions and
Optimal Non-Linear Estimation," Ph.D. Thesis, University of
California, Los Angeles, 1966.

Ho, Y.C. and R.C.K. Lee, ”Identification of Linear Dynamic

Systems," ErasaJLJLJL. 1964-

Kalman, R.E., "Fundamental Study of Adaptive Control Systems,"
Tech. Rept. ACD-TR-6l, Vol. 1, App. 1, 1961.

Irwin, J.D. and J.B. O'Nea1,.Jr., "The Design of Optimum DPCM
(Differential Pulse Code Modulation) Encoding Systems Via the
Kalman Predictor," Proc. 1968 JACC.

Stratonovich, R.L., Conditional Markov Processes and Their

Application to the Theory of Optimal Control, American Elsevier
Publishing Co., Inc., New York, 1968.

77

13.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.

25.

78

Kallianpur, G. and C. Striebel, Stochastic Differential ﬁguatiogs

' i e s ima ion of ontinuous Parameter Stochastic
Processes, Technical Report No. 103, University of Minnesota,
Minneapolis, Minn., September 1967.

Kushner, H.J., "Approximations to Optimal Nonlinear Filters,"
IEEE Trans. Automatic Control, Vol. AC-12, pp. 546-556, October
1967.

Bucy, R.S., "Recent Results in Linear and Non-Linear Filtering,"
Stochastic Problems in Control, 1968 Joint Automatic Control
Conference, University of Michigan, Ann Arbor, Mich., ASME,

pp. 87-106, June 1968.

 

Aoki, M., Optimization of Stochastic Systems, Academic Press,
New York, 1967.

Smith, G.L., ”Multivariable Linear Filter Theory Applied to .
Space Vehicle Guidance,” J. SIAM Control, Ser. A, Vol. 2, pp. 2
19-32, 1964. J

Friedland, B. and I. Bernstein, "Estimation of the State of a
Non-Linear Process in the Presence of Non-Gaussian Noise and

Distrubances," J. of the Franklin Institute, Vol. 281, No. 6,
pp. 455-480, June 1966.

McLean, J.D., S.F. Schmidt and L.A. McGree, Optimal Filtering
and Linear Prediction Applied to a Space Navigation System for
the Circumlunar Mission, NASA TND-1208, March 1962.

Albert, A.E. and L.A. Gardner, Jr., Stochastic Approximation
and Nonlinear Regression, Research Monograph NO. 42, the M.I.T.
Press, Cambridge, Mass., 1967.

Pearson, J.B., "A Note on Nonlinear Filtering," IEEE Trans.
Automatic Control, AC-13, No. 1, pp. 101-105, February 1968.

Wolverton, C.T., "A Note on the Application of Dvoretzky's
Theorem of Non-linear Filtering," IEEE Trans. Automatic Control,
AC-l4, NO. 1, pp. 117, February 1969.

Blaydon, C.C., "Recursive Algorithms for Pattern Classification,"
Ph.D. Thesis, Harvard University, Cambridge, Mass., 1967.

Donoghue, P.J., ”System Identification by Bayesian Learning,"
Ph.D. Thesis, Michigan State University, East Lansing, Mich., 1968.

Driml, M. and J. Nedoma, "Stochastic Approximations for Con-
tinuous Random Processes," Trans of 2nd Prague Conf. on Infornn
Theory, Statistical Decision Functions, and Rgpdom Processes,
Prague, Czechoslovakia, pp. 145-158, June 1959.

26.

27.

28.

29.

30.

31.

32.

33.

34.

35.

36.

37.

38.

39.

79

Driml, M. and O. Hans, "Continuous Stochastic Approximations,"
Trans. of 2nd Prague Conf. on Inform. Th., Statist. Decision

Functions, and Random Processes, Prague, Czechoslovakia, pp.
113-122, June 1959.

Sakrison, D.J., "A Continuous Kiefer-Worlfowitz Procedure for
Random Processes," Ann. Math. Statist., Vol. 35, pp. 590-599,
1964.

Sakrison, D.J., "Stochastic Approximation: A Recursive Method

for Solving Regression Problems," Advances in Communicatign
Systems, Vol. 2, pp. 51-106, 1966.

Bass, RJW., V.D. Norum, and L. Schwartz, "Optimal Multichannel

Nonlinear Filtering," J. of Math. Analysis apd Appliggtign,
Vol. 16, pp. 152-164, 1966.

Bellman, R.E., H.H. Kagiwada, R.E. Kalaba, and R. Sridhar,
Invariant Imbedding and Nonlinear Filtering Theory, Rand
Corporation Memo., RM-4374-PR, December 1964.

Kushner, H.J., "On the Differential Equations Satisfied by
Conditional Probability Densities of Markov Processes with
Applications," J. SIAM Control, Ser. A, Vol. 2, No. 1, pp.
106-119, 1964.

 

Stratonvich, R.L., "A New Representation for Stochastic
Integrals and Equations," J. SIAM Control, Vol. 4, No. 2, 1966.

 

Coddington, E.A. and N. Levinson, Theor of Ordinar Differentia
Eguations, McGraw-Hill, New York, 1955.

Kailath, T. and P. Frost, ”Mathematical Modeling of Stochastic
Processes," Stochastic Problems in Control, 1968 Joint Auto-

matic Control Conference, University of Michigan, Ann Arbor,
Mich., ASME, pp. 1-38, June 1968.

Wonham, W.M., "Some Applications of Stochastic Differential
Equations to Optimal Nonlinear Filtering," J. SIAM Control,
Ser. A, Vol. 2, No. 3, pp. 347-369, 1965.

Perfitt, T.E., Adaptive and Non-Adaptive Digital Phase Lock
Loops, Lear Siegler, Inc. Engineering Memorandum Report No.
204, Grand Rapids, Mich., 1968.

Stear, E.B. and L. Schwartz, "A Computational Comparison of
Several Nonlinear Filters,” IEEE Trans. Automatic Control,
AC-13, No. 1, pp. 83-86, February 1968.

 

Morozan, T., "Sur L'Approximation Stochastic," C.3, Acad.
331,, Vol. 264, Paris, pp. 633-635, 1967.

Kushner, H.J., "A Note on the Maximum Sample Excursions of
Stochastic Approximation Processes," Ann. Math. Statist.,
Vol. 37, pp. 513-516, 1966.

 

40.

41.

42.

43.

44.

45.

46.

47.

80

Gladyshev, E.G., "On Stochastic Approximation,” Theory Prob.
Applications, Vol. 10, No. 2, pp. 275-278, 1965.

Doob, J.L., Stochastic Processe§4 John Wiley and Sons, Inc.,
New York, 1953.

Fulks, W., Adggnged galculus, John Wiley and Sons, Inc., New
York, 1961.

Sorenson, H.W., "On the Error Behavior in Linear Minimum
Variance Estimation Problems," IEEE Trans. Automatic Control,
AC-12, No. 5, pp. 557-562, October 1967.

Maruyama, G., "Continuous Markov Processes and Stochastic
Equations," Circolo Matematico Di Palermo, Series 2, Vol. 4,
pp. 48-90, 1955.

Skorokhod, A.V., Studies in the Theory of Random Processes,
Addison-Wesley Publishing Co., Inc., Reading, Mass., 1965.

Gel'fand, I.M. and N. Ya.Vilenkin, Generalized Functions,
Vol. 4, Academic Press, New York, 1964.

Loeve, M., Probability Theory, D. Van Nostrand Co., Inc.,
Princeton, New Jersey, 1963.

 

APPENDICES

APPENDIX A

STOCHASTIC DIFFERENTIAL EQUATIONS AND THE ITO CALCULUS

Stochastic differential equations have only recently been used
in engineering applications. The equations were initially misunder-
stood, creating a confusion about the proper analytical procedures
for handling them. The purpose of this appendix is to paraphrase
recent publications and briefly explain the available facts about
stochastic differential equations.

The following equation defines the n vector g(t):

j—ﬁ‘t—l = g(t) + amen) (M)

where p(t) is a zero-mean, white noise vector process, i.e.,
E ui(t)uj(s) = 5(t-s)5ij, and g(t) and D(t) are, respectively,
random vector and matrix functions of t, which may depend on the
present and past values of u(t). Equation (A.1) is a general
engineering definition of a stochastic differential equation. «The
filtering error equations in this thesis are of this type.

To display the need for the strict attention necessary when
dealing with stochastic differential equations, an example from

Kailath [34] will be discussed. Consider the following system of

equations involving Gaussian white noise u(t).

B(t) ;u(t) ; B(O) = 0 (A.2)

x(t) = B(t)u(t) ; x(0) = 0 (A.3)

81

82

The solution for x(t) is:

x(t) = jgs(s)u(s)ds (A.4)
where B(t) = Igu(s)ds (A.5)

Equation (A.4) can be written as:

x(t) = j‘SB(s)dB(s) (A.6)

and formal integration gives:
B_(e)_2 t 2
x(t) = 2 |0 =B(t) /2. (A.7)

As indicated, the integration of (A.6) was formal, following
normal calculus rules. Equation (A.6) will now be examined by apply-
ing the fundamental formula for evaluating integrals:

x(t) = lim 2 B(pi)[B(ti+1) - B(ti)] (A.8)

iti-I-l-ti "'0 ti

where [0, t1,t2,...,tn 1,t} is a partition of [0,t] and pi is
an oint in the interval t, t, .

yp [1. 1+1]
Before continuing with this discussion, certain properties

of the random process B(t), which has the following characteristics,

are discussed.

EB(t) =0

E B(t)B(s) = min(t,s)

The Gaussian process with these properties, called a Brownian motion
process, has been studied extensively and has the following features:
independent increments, Markov and martingale behavior, and continuity

but non-differentiability for almost all sample functions. Related to

 

83

this last feature, and very important in the analysis of the integral

(A.6), is the Levy oscillation property:6 If {0 = to,t1,...,tn = t}

is a partition of [0,t] then

lim 2 [B(t ) - B(ti)]2 = t (A.9)

'+1
lt:i+1'tii"0 ti 1

where the limit exists w.P.l and in the mean square sense.

To analyze (A.6), let pi = ti' Then (A.8) becomes, from

 

 

(A.9):
x(t) = lim 2 B(ti)[B(ti+1) - B(ti)]
2 2 2
= mm mm) - B(ti) _ z [B(tm) - B(ti)]
2 2
Bftzz t
= - _. (A.10)
2 2
The solution (A.10) disagrees with (A.7). If pi = t1+1 in

(A.8), the result also disagrees with (A.7).

 

 

x(t) = lim 2 B(ti+1)[B(ti+1) - B(ti)]
2 2 2
B(t. > - B(t.) [B(t )- B(t)]
= lim{2 1+1 2 1 +2 i+12 1

2

These conflicts indicate the need for carefully formulating
stochastic differential equations such as (A.l) and for strictly
defining the integration procedure to be utilized in the solution
of such equations.

The formulation presently being applied in the literature
and being relied upon in this thesis has a number of important

theoretical advantages (Kailath). This formulation, which applies

The Levy oscillation property is shown in Doob [41], p. 395.

84

d
when g(t) is Gaussian, formally makes [g(t) = agi£l , where B(t)
is a vector Brownian motion process, and states that (A.1) is equi-

valent to (A.11).
_ t t
g(t) - _}_t_(to) + It £(S)ds + ft p(s)dp(s) (11.11)
0 o
The solution to (A.11) is symbolically represented as:
d)_(_(t) = £(t)dt + p(t)dp(t), (A.12)

The first integral in (A.11) is treated as a Reimann integral,
while the second is defined as the Ito integral, first defined by Ito
[45]. Since the Ito calculus is used throughout the thesis, its pro-
perties are listed in Appendix B. One important property of this
calculus is the chain-rule for differentiation (Theorem 3.1, Ito's
Lemma), which is used when determining the differential equation for
a function of g(t).

To illustrate Ito's lemma and to Show the difference between
Ito and non-Ito calculus, the stochastic differential equation in

(A.13), is considered.
dz(t) = % z(t)dt + z(t)dB(t) ; z(O) = l (A.13)
Theorem B.l shows that the solution in the Ito sense is:

z(t) = exp[B(t)]. (A.14)

Using the notation of Theorem B.l, let a(t) = O and b(t) = 1.

Then, x(t) = B(t). Now, let Z(t,x) = exp[x]. Theorem 8.1 says that

2
dz(t) % a-—g-exp[x(t)]dt +-§;-exp[x(t)]dB(t)

BX

35 z(t)dt + z(t)dB(t).

85

Thus, when Ito calculus is used to analyze (A.13), (A.14) is
the correct solution. The solution to (A.13) when applying the rules

of ordinary calculus to (A.13) divided by dt is:

z(t) = exp[Btt) + t/2J.

Other ways of treating stochastic differential equations have
been documented. All assume (A.1) is equivalent to (A.11) and specify
different definitions for the Stieltjes-type integral in (A.11).

Stratonovich [32] has provided the most popular alternative to Ito's

t + t
1+1 i

2

solutions to (A.11) which would be obtained by formal integration, at

definition; his method selects pi = in (A.8) and provides
least in the scalar case. Unfortunately, his integral doesn't possess
analytical properties as convenient as those listed in Appendix B for
the Ito integral.

This discussion has concentrated on stochastic differential
equations containing Gaussian white noise. The remainder of this
appendix describes the available literature concerning definitions
for stochastic integrals, such as the one in (A.8), when the noise
in (A.1) is non-Gaussian. Portions of the following comments are also
found in Kailath [34] and Fisher [8].

The correct way of representing white noise is to define it as
a generalized random process, such as discussed in Gel'fand and
Vilenkin [46]. Part of Gel'fand's develOpment shows that the gen-
eralized derivative of Brownian motion has a delta function for its
correlation function. More generally, the random variables comprising
the formal derivative of any process with independent increments are

shown to be independent. In other words, the formal derivative of

 

an independent increment process is white. This fact has been used

86

in formulating stochastic differential equations Since both the Ito

and the Stratonovich integral have been defined when B(t) is replaced
by any independent increment process [13], [45]. However, this more
general representation of white noise cannot be applied when analyzing
observation processes such as (1.2) and (2.2). Independent increment
processes can be shown to be either impulsive (Poisson) processes or
Brownian motion processes or a combination of the two. Kailath and
Fisher indicate that it is not possible to define a filtering problem
which has an observation process containing impulsive noise.

In summary, if the white noise in the models of the observation
processes used for the filtering problem is considered equivalent to
the formal derivative of an independent increment process, the white
noise must be Gaussian.

There is at least one other way to interpret white noise.
Assume it is equivalent to the formal derivative of any process with

orthogonal increments. Results on the derivative of independent in-

 

crement processes indicate that the random variables comprising these
processes are independent when all that is needed is orthogonality.
Thus, the derivatives of orthogonal-increment processes do seem
logical as representations for white noise. There does not seem to
be any published work on the formulation of stochastic differential
equations using this interpretation of white noise. Because of these
limitations in the theory of stochastic differential equations, this

thesis treats only Gaussian, white noise in the observed process (2.2).

APPENDIX B

O-ALGEBRAS, ITO CALCULUS, AND MARTINGALES

This Appendix defines and discusses three important concepts
in stochastic processes.
3.1 a-Algebras Generated by Stochastic Processes

Let m(t,w) be a stochastic process defined on the probability
space {Q,H,P] where w E Q and t E I, the index set; 0 is a sample
space, H is a a-algebra of measurable sets and P is a measure
assigned to the sets in H. For the remainder of this appendix, and
for the entire thesis, m(t,w) is denoted as m(t).

The G-algebra generated by the random process [m(s), t0 5 s s t]
is defined as the minimal a-algebra of events that contains all events
of the form: {w : w E Q and m(u,w) E A] where u 6 [to,t] and

A E R is any Borel set. This O-algebra is denoted by F{[to,t],m(u)].

l
Theorems from measure theory [41], [47] Show that P[[to,t],m(u)}<: H,
that P[[to,s],m(u)] c F[[to,t],m(u)] for t 2 s, and that m(t) is
measurable on F{[to,t],m(u)}.7
b.2 Properties of the Ito Integral

Let B(t) be a Brownian motion process defined on I = [tO,T].
This section lists the properties of the Ito calculus that follow
from Ito's definition of the integral I: g(s)dB(s). This integral

0

cannot be defined in the ordinary Riemann-Stietjes sense Since B(t)

is of unbounded variation with probability one.

 

For a discussion of O-algebras (O-fields) consult Loeve [47] (Chap.
1). Doob [41] (pp. 599-602) has a discussion of a-algebras generated
by random processes.

87

88

Ito calculus is thoroughly discussed in Skorokhod [45]. The
o-algebra generated by B(s), where tO S s s t, is F{[to,t],B(s)].
Let M2 (F t) be the class of functions g(t) which are measurable on
F{[t0 ,t] ,B(s)] and satisfy Prob[IT oIg(s)]2 dt < a} = l. The Ito
integral I: g(s)dB(s) is defined for all g(t) 6 M2 (Ft ) and has
the following properties.

Property 3.1: E I: g(s)dB(s) = 0 t E I
0

Property B.2: E[I:g(u)dB(u)]F{[to,s],B(r)]] = O for t 2 s and
t,s 6 18

Property B.3: E]f: og(s)dB(s)]2 =Iig E|g(s)|2ds for t E I

Property B.4: The process w(t) =-IE g(s)dB(s) is a martingale.

Property B.5: The process w(t) is t: continuous process with prob-
ability one.

Appendix A indicated that the Ito integral has been extremely
useful in the formulation and analysis of stochastic differential
equations. Special care must be taken when applying the chain-rule
for differentiation to these equations. Theorem B.l provides the

proper differentiation formula.

Theorem 3.1 (Ito's Lemma) Let x(t) be a process satisfying
dx(t) = a(t)dt + b(t)dB(t) w.P.1 for t e [tO,T]

where B(t) is Brownian motion and a(t), b(t) and b(t)2 belong
to M2 (F t). If Z(t, x) is continuous and has continuous derivatives
%—-Z(t, x), g—-Z(t, x), and §——'Z(t, x) for t E [to,T], then the pro-

3x2
cess z(t) = Z(t,x(t)) satisfies the relation

 

For a discussion of conditional expectations refer to Doob_[4l], p. 37.

9 All integrals not of the Ito type are treated as Riemann integrals.

89

2
dz(t) = [3; Z(t,x(t)) + a(t) 51X- Z(t,x(t)) + 3sb<t>2 L5 Z(t,x(t))Jdt
3x

+ b(t) g; Z(t,x(t))dB(t).

The notation gE'Z(t,x(t)) has been substituted for g; Z(s,x(t))‘s=t.
B.3 Martingale Theory

A random process w(t) defined on a time interval I is called
a martingale if for every t E I there corresponds a O-algebra At
relative to which the random variables w(s) are measurable for s s t,
s E I, and which possesses the prOperty that for t1 S t2 where
t1,t2 E I

E[w(t2)‘At1] = w(tl) w.P.1. (3.1)

If the equality in (B.l) is replaced by S (2), a super-martingale
(sub-martingale) is defined.

Some of the martingale theorems given later require that w(t)
be a separable process. A process w(t) is separable on I if there
is a denumerable, everywhere dense subset D of I such that for

any a < b in I

sup w(t) = sup w(t) and inf w(t) = inf w(t) w.P.1.
tE(a,b) t6(a,b)ﬂD tE(a,b) t€(a,b)nD

The separability condition is satisfied if the process w(t) is right-
continuous in t (w.P.1) on the interval I. In this thesis, all
processes are continuous (Appendix D), so separability is automatically
satisfied and need not be mentioned.

The following three theorems on martingales are used in this
thesis.

Theorem B.2: If w(t) is a non-negative, super-martingale,

1im w(t) = W
a
tam

90

exists with probability one and is finite.
Theorem B.3: If w(t) is a (separable) non-negative super martingale
on any interval 1, then for any t E I and any constant c,
Prob{ sup w(u) 2 c} S EEXEl .
u2t,u€I
Theorem B.4: If w(t) is a (separable) sub-martingale on some
interval I, then for any t E I and any constant c,
Prob{ sup w(u) 2 c} s ELEXELL -
uSt,u€I
The proofs of these properties can be found in Loeve [47] (sub-

sections 29.3 and 36.1) and Doob [41] (chapter 7).

APPENDIX C

THE INFINITE DIMENSIONAL REPRESENTATION FOR THE
OPTIMAL FIETER

This short exposition develops the system of random differ-

 

ential equations which must be solved to provide the minimum-variance E
estimate of the message process x(t) in (2.1). This system re-
presents the filter which is an optimal solution to the filtering
problem discussed in Sec. 2.2.
In Sec. 2.2, the minimum-variance (conditional mean) estimator t

is represented by the following stochastic differential equation.

d$c(t) = Ema: +5310» - x(t)h(t)][dz(t) - Emu} (2.7)

In (2.7), x(t) is the conditional mean at time t. The other
functions are defined in Sec. 2.2. To solve (2.7), where the observed
process dz(t) is an input driver, the functions f(t), xh(t), h(t)
must be found.

The function h(t) is examined first. Equation (2.6) re-
presents h(t), the conditional expectation of h(t,x(t)), when

g(t,x) = h(t,x). Then

dB(t) = {Elma + [g(t) - h(t)2][;dz(t) - h(t)dt]

/\
/\
where h1 = 5X h(t,x). Now, fh1(t) and h2(t) need to be deter-
mined so (2.6) must be consulted.
/\
A /\ .. ..
dfh1(t) = @1(t)dt + [f2h2(t)dt + fhh1(t) - phl(t)h(t)][dz(t) - b(t)dt]

/\
dh (c) = 2 ﬁnd)“ + [h3(t) -?(t)ﬁ(t)][dz(t) - b(t)dtl

91

92

A AA
To continue this procedure,‘ff:h1(t), f2h2(t), fhh1(t), h3(t)

must be determined. In fact, an endless string of differential
equations need to be solved to determine all needed conditional
expectations. The same situation occurs when determining f(t)
and 4km.

This endless string of differential equations, which must be
solved simultaneously with (2.7), provides the infinite dimensional

representation for the optimal filter.

APPENDIX D

IMPORTANT PROPERTIES OF STOCHASTIC DIFFERENTIAL
EQUATIONS SIMILIAR TO EQUATION 2.15

The purpose of this appendix is to indicate conditions on
h(t,x) and a(t,;; ,t) which are sufficient for providing x(t)
in (2.15) with two important properties: (1) Continuity of x(t)
w.P.1 and (2) F{[to,t],;(s)} C:F{[t0,t],B(s)}. The following
derivations are based on Doob [41] (chapter 6) and Skorokhod [45]
(chapters 2 and 3). This appendix does not deal directly with (2.15)
but with an equation containing the same features as (2.15) and

fewer terms.

Let w(t) be the solution to

dw(t) = g(t,w(t))dt + a(t,wt t)dB(t) for t e [to,T] (C.1)

9

where B(t) is a Brownian motion process and wt t = {w(s) : t s s S t}.
s O

The following conditions on g(t,w(t)) and a(t,wt t) are postulated

9

for the remainder of this appendix. 0

H1: The function g(t,w) is a continuous function in the pair (t,w)
and a(t,wt ,t) is measurable on F{£to,t],w(s)}.

H : The fugction g(t,w) satisfies the uniform Lipschitz condition

~‘g(t’x) " g(t:Y)| 5 K‘x - y‘

for x,y s R and t 6 [tO,T]; K is a fixed constant.

1

The function a(-:-) satisfies the following function-space condition

~‘a(t’xto,t) - a(t,yto,t)\ 5 KIIE6lX(s) - y(s)|2ds]%

93

94

where xt ,t and yt ,t E C: , the space of continuous functions on
the time interval [t:,t]. The constant K may be selected equal to
the Lipschitz constant for g(-,°) with no loss of generality.

The derivations in the remainder of this appendix are con-
centrated on demonstrating the following properties of the solution
to (C.1).

P1: The function w(t) is continuous in t with probability one.
P2: F{[to,t],w(s)} C:F{[to,t],B(s)} for every t E [tO,T].

These properties are demonstrated by a proof constructed similar to
Doob_[4l] (pp. 277-281). A by-product of this proof is the existence
of the solution to (C.1). The derivation begins with the following

lemma.

Lemma: If a process w(s) has properties P and P and if g(-,~)

1 2
and a(-,-) have properties H1 and H2 then any process y(t) defined
by
y(t) = ft g(s w(s))ds +ft a(s w )dB(s) (c.2)

t ’ t ’ t ,s

o o 0
has property P2 and the Ito interpretation of the second integral
in (0.2) provides y(t) with property P1.
Proof. According to H1, H2, and P1, the first integrand in (0.2) is

a bounded, continuous function of s for almost all sample functions.
The first integral is a continuous function of t with probability

indicates that a(t,w ) is

one. Condition H1, comb1ned With P2, to’t

measurable on F{[to,t],B(s)} so the second integral may be defined
as an Ito integral (Appendix B). This integral is a martingale and,
in addition, is continuous in t with probability one. Continuity

for almost all sample functions y(t) follows. It is obvious

95

that y(t) is measurable on F{[to,t],B(s)} and, since for all s S t,
y(s) is measurable on F{[t0,s],B(u)}<: F{[to,t],B(u)}, the process
[y(s) : t0 S s S t] is measurable on F{[to,t],B(u)}. The definition
of F{[to,t],y(u)} shows that F{[to,t],y(u)}<: F{[to,t],B(u)}. This
finishes the proof of the lemma.

A solution can be found for (C.1) by successive approximations

that have the properties P1 and P2 shown above. Let wo(t) be any

process having properties P1 and P2. According to the lemma, it is

now possible to define wn(t) in such a way that every wn(t) has

properties P1 and P2.

wn(t) = I:0g(s,wn_1(s))ds +-f:oa(s,wt )dB(s) (C.3)

,s,n-l
o

with w (u) : tO S u S s}. The following derivation

t ,S,n"1 = {Wm-1
0

will show that

lim,wn(t) = w(t) w.P.1 (C.4)
n—G
uniformly in t, thus defining a process w(t) with properties P1
and P2. Also, it will be shown that
. t t
11m I g(s,w (s))ds = I g(s,w(s))ds w.P.1
t n t
nah o o
(0.5)

9

t
1im It a (8 ,wt
nHub o

S,n)dB(S) = f:0a(s,wto,s)dB(s) w P.1

uniformly in t. The process w(t) will be the solution to (C.1).

In proving these facts, the following notation will be con-

venient.

(t)

AnW(t) = Wn(t) - Wn_1

Ang(t) = g(t,wn(t)) - g(t,wn_1(t))

96

Ana(t) = a(t’wto,t,n) - a(t’wto,t,n-1)

From H2,
lang(t)| s K\Anw(t)l
‘AnS(t)‘2 S K.f:o‘Anw(s)‘2ds.
Then, from (C.3) and a property of the Ito integral (Appendix B),
2 t 2 t 2
E{[Anw(t)] } S 2 E{‘ft An_1g(s)ds‘ } + 2 E{|It An_1a(s)dB(s)‘ }
o o

S 2 K2(T-to) I: EEAn_IW(s)]2dS +-2 f: E‘An_la(s)‘2ds
0 0

S 2 K2(T - to) I: E[An_lw(s)]2ds +
o

t

2 t 2
+ 2 K ft Is EEAn_1w(u)] duds
0 O

s 2 K2(T - to) f:0E[An_lw(s)]2ds +
+ 2 K2 I:O(t-S)E[An-1w(s)]2ds
s 4 K2(T - to) j:oE[An_1w(s)]2ds.
Hence, for some constant c,

EEAnw(t)2] S [4 K2(T - to)]n“1 I: (£52)? E[A1w(s)]2ds
o

n
c

S “' for t S t S T.
n! 0

Using this inequality and Chebyshev's inequality gives
t -n T -n
Prob{ sup”,t Ang(s)ds| 2 2 } S Prob[K rt ‘Adw(s)‘ds 2 2 }
toStST o o
S 4n E([K.IT ‘A w(s)\ds]2}
to n ‘

n 2 cn
S 4 K (T-to) ET .

97

Since the last term is the general term of a convergent series, (C.6)
holds for sufficiently large n with probability one from the Borel-
Cantelli lemma (Doob [41], p. 104).
sup [ft A g(s)dsl S 2-n (C.6)
t n
t StST o
o
The process I: Ana(s)dB(s) is a martingale for the reasons
0
given in the lemma. The square of this process is a sub-martingale
and the basic martingale inequality for sub-martingales (Theorem B.4)

shows that

Pr°b{ SUP 1f: A a(s)dB<S>| 2 2'“) s 4“ Eilff A a(s)dB<s)|2}
toStST o n O n

:1

s 4 I: E‘Ana(s)|2ds
O

S 4 K.IE I: EEAnW(u)]2duds
o 0

K I: (T-s)E[Anw(s)'| 2ds
0

n
K(T - s)2-- .

c
n!
Again, the last term is the general term of a convergent series so
(C.7) holds for sufficiently large n, w.P.1.
T -n
sup [ft Ana(s)dB(s>| < 2 (C.7)
t StST O
0
According to (C.6) and (C.7), the integrals on the right hand
side of (C.3) converge uniformly in t when n a a, w.P.1. Hence,
the limit in (C.4) exists uniformly in t, w.P.l and w(t) has pro-
perties P1 and P2. The validity of (C.5) follows from two facts:

(1) the integrands converge uniformly w.P.l, in the first limit

equation; (2) the sub-martingale inequality applied as above, shows

98

that

Prob{ sup If: [a(s,wt S n) - a(s,wt S)1dB(s)l 2 ll
toStST O O, ) O, n

2 2 T 2
s n K It (I - s)E[wn<s) - w(s)] ds
2 - ° 3
n2K (T-to)22n 232—”—
_ on J'
where the fact that ECwn(t) - w(t)]2 S 2 n 2 ngl— for t0 S t S T

i=1 J!
was used. This last inequality is proved in Doob [41] (p. 281).

According to the Borel-Cantelli lemma

t 1
Sup |ft [a(s,wt ,S n) - a(s,wt ,S)]dB(s)‘_< n
tOStST o o 0

thus proving the second limit in (C.5).

S

m

R

A

R

m

L

V.

.h

s

R

E

N

N

U

E

T
All
TIIIZ

mummy