WIDE-SENSE MARTINGALE APPROACH TO LINEAR DISCRETE-TIME OPTIMAL ESTIMATION

Thesis for the Degree of Ph.D.
Michigan State University
Halit Kara
1971

This is to certify that the thesis entitled "Wide-Sense Martingale Approach to Linear Discrete-Time Optimal Estimation" presented by Halit Kara has been accepted towards fulfillment of the requirements for the Ph.D. degree in Systems Science.

Major professor
Date: May 19, 1971

WIDE-SENSE MARTINGALE APPROACH TO LINEAR DISCRETE-TIME OPTIMAL ESTIMATION

By
Halit Kara

AN ABSTRACT OF A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering and System Science
1971

ABSTRACT

WIDE-SENSE MARTINGALE APPROACH TO LINEAR DISCRETE-TIME OPTIMAL ESTIMATION

By Halit Kara

The dissertation considers the minimum mean-square error estimation of the signal x_k = Φ(k)u_k, where Φ(k) is an n × n matrix and u_k is a wide-sense martingale process. The optimal estimation equations are derived for prediction, filtering and smoothing based on noisy observations.

Along with the statement of the problem, the historical and mathematical background upon which the derivations of the optimal estimation equations are based is presented. The general formulas for the optimal estimation equations for a second-order discrete-time stochastic process are derived under the assumption that the observation process has full rank. Then the recursive and algebraic estimation equations are derived for the signal when the observations are corrupted by additive white, cross-correlated white, and cross-correlated colored noises. The recursive nature of these equations follows easily from the wide-sense martingale property of u_k.

The thesis gives a purely orthogonal-projection approach to solving the prediction, filtering and smoothing problems of Kalman, and their extensions, for a more general class of signals. The main object is to remove unnecessary analytic complications introduced by stochastic difference equations and to use a conceptually simpler geometric approach which provides a unified attack in all three cases (uncorrelated white, cross-correlated white, and cross-correlated colored observation noises). At the same time, analytic solutions which can be studied numerically are obtained.

WIDE-SENSE MARTINGALE APPROACH TO LINEAR DISCRETE-TIME OPTIMAL ESTIMATION

By
Halit Kara

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering and System Science
1971

To my professors M. N. Ozdas and T. Ozker, whose encouragement and insistence made it all possible.

ACKNOWLEDGEMENTS

The constant help, encouragement, and guidance given by Drs. R. O. Barr and G. L. Park have made this work possible. Besides the help extended on the actual writing of the thesis, Dr. Park is responsible for procuring the financial aid without which nothing could have been achieved. The excellent academic background, whether through teaching or advising, provided by Dr. Barr, and his friendly and encouraging disposition, have also had a great influence on the initiation and fruition of this work.

The problem was initially brought to the author's attention by Dr. V. Mandrekar, whose suggestions and assistance throughout the writing of the dissertation deserve special mention. Thanks are also due to Drs. I. Kern and H.
Salehi for their critical reviews and helpful suggestions. The author would also like to take this opportunity to acknowledge the unselfish interest and encouragement extended by Dr. M. Z. Akcasu and his wife Melahat of the University of Michigan, and to Dr. P. K. Wong for providing him with mathematical background.

Support was granted by Consumer's Power Company Cooperative Research at Michigan State University and the Turkish Sugar Corporation. This support is gratefully acknowledged.

Last, but definitely not least, the author wishes to give special mention and thanks for the love, patience, and understanding put forth by his wife, Ayten, and his children, Tarik and Aysen.

TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
GENERAL NOTATION
LIST OF SYMBOLS

Chapter 1. INTRODUCTION
  1.1 Historical Background and Literature Survey
  1.2 Statement of the Problem
  1.3 Outline of the Thesis

Chapter 2. MATHEMATICAL BACKGROUND AND BASIC RESULTS
  2.1 Hilbert Space of Random Vectors
  2.2 Wide-Sense Martingale and Markov Processes
  2.3 A Solution of the General Minimum Mean-Square Estimation Problem

Chapter 3. BASIC PROBLEM (BP)
  3.1 Optimal Prediction for BP
  3.2 Optimal Filtering for BP
  3.3 Optimal Smoothing for BP
  3.4 An Example

Chapter 4. CROSS-CORRELATED NOISE PROBLEM (CCP)
  4.1 Optimal Prediction for CCP
  4.2 Optimal Filtering for CCP
  4.3 Optimal Smoothing for CCP

Chapter 5. COLORED NOISE PROBLEM (CNP)
  5.1 Reformulation of the Problem
  5.2 Optimal Prediction for CNP
  5.3 Optimal Filtering for CNP
  5.4 Optimal Smoothing for CNP

Chapter 6. CONCLUSIONS
  6.1 Conclusions and Results
  6.2 Extensions

REFERENCES
APPENDIX A
APPENDIX B

LIST OF FIGURES

1.1 Block diagram for linear estimation problem (1.2)
1.2 Block diagram for the output noise
3.1 Block diagram for single-stage predictor
3.2 Block diagram of filter
4.1 Optimal single-stage predictor for CCP defined by (4.19)
4.2 Block diagram of optimal filter for CCP
5.1 Block diagram for single-stage predictor for CNP

GENERAL NOTATION

Discrete time is denoted by i, j, k, L, ..., s. Vectors are denoted by small letters, such as u, v, w, x, y and z. The transpose of a vector is denoted by the superscript T; for example, x^T denotes the transpose of the vector x. Matrices are denoted by capital letters, such as D, F, G, H, ..., Φ. The transpose and trace of a matrix are denoted by the superscript T and by tr, respectively. The symbol 0 denotes the scalar zero, the null vector, or the null matrix, depending on the context. The proof of a theorem will be introduced by the word PROOF and terminated by the abbreviation QED. If the proof is omitted, the statement of the theorem will be terminated by the symbol ∎.

LIST OF SYMBOLS

≜ : by definition equal to (Ch. 1)
k ∈ Z : k is an element of Z
k ∉ Z : k is not an element of Z
A ⊂ B : A is a subset of B
∋ : such that
∀ : for each (for all)
x̂_{k|L} : optimal estimate of x_k based on the observation record Y(L) (Ch. 1)
x̃_{k|L} ≜ x_k − x̂_{k|L} : estimation error
{x | ...} : all x such that ...
|a| : absolute value of a
|x| : Euclidean norm of the vector x
<x, y>_L : inner product of x, y in L_2
<x, y> : inner product of x, y in L_2^n (Ch. 2)
[x, y] : Gramian matrix of x, y (Ch. 2)
||x|| : norm of the vector x induced by <·,·>
⇒ : implies
f : X → Y : function on X to Y
ω ↦ x(ω) : function assigning x(ω) to ω
A ⊕ B : direct sum of A, B (Ch. 2)
A ⊗ B : direct product of A, B (Ch. 2)
x ⊥ y : x, y are orthogonal (Ch. 2)
x ⊥ M : x is orthogonal to the subspace M (Ch. 2)
A^⊥ : orthogonal complement of A (Ch. 2)
P^+ : generalized inverse of P (Ch. 2)
(x|M) : orthogonal projection of x onto M (Ch. 2)
i ∧ j : i ∧ j ≜ min{i, j} (Ch. 2)
x > y : y is less than x (Ch. 1)
P > 0 : P is positive definite (Ch. 1)
x ≥ y : y is less than or equal to x (Ch. 1)
P ≥ 0 : P is positive semidefinite (Ch. 1)

Abbreviations

BP : basic problem (Ch. 3)
CCP : cross-correlated noise problem (Ch. 4)
CNP : colored-noise problem (Ch. 5)
iff : if and only if (Ch. 2)
OPE : orthogonal projection estimator (Ch. 2)
H-space : Hilbert space (Ch. 2)
L_2 (L_2(Ω, 𝒜, P)) : Hilbert space of square-integrable random variables (on the probability space (Ω, 𝒜, P)) (Ch. 2)
L_2^n (L_2^n(Ω, 𝒜, P)) : Hilbert space of square-integrable random n-vectors (on the probability space (Ω, 𝒜, P)) (Ch. 2)
(mod P) : with probability one (Ch. 2)
(·|M) : orthogonal projection operator onto the subspace M
r.v. : random variable
Z : Z ≜ {..., −1, 0, 1, ...}, the set of all integers
δ_kj : Kronecker delta
I_n : n × n identity matrix
R^n : n-dimensional Euclidean space over the reals
Y(L) : observation record, Y(L) ≜ {y_1, y_2, ..., y_L}

CHAPTER 1

INTRODUCTION

The optimal estimation problem is encountered in different forms in many branches of science as well as in a variety of engineering disciplines. The discrete-time linear estimation problem, which is an important special case of the general problem, is the topic of this study. It can be described quite generally in simple terms with reference to the block diagram in Figure 1.1.

In this block diagram, x_k and z_k denote, respectively, the input and output signals of a memoryless, non-random linear transformation H(k), so that

(1.1)   z_k = H(k) x_k .

The output is observed in a noisy environment; the disturbance is assumed to be an additive random signal v_k, called the output noise (or measurement noise, or observation noise). Thus the (actually) observed signal y_k can be represented as

        y_k = z_k + v_k ,

where the subscript refers to discrete time, i.e., k ∈ Z = {..., −1, 0, 1, 2, ...}, the set of all integers.

It is assumed that the observations are available over a set of integers {k_1, k_1 + 1, ..., L}, where k_1 is an arbitrary starting time (for the sake of simplicity k_1 is chosen to be unity) and L moves along in discrete time as additional data are recorded. The problem can now be stated as follows:

(1.2) OPTIMAL DISCRETE-TIME LINEAR ESTIMATION PROBLEM.

Given:
(a) the relationship between x_k and y_k, k ∈ Z, i.e., H(k), k ∈ Z, and (1.1);
(b) the means and covariance matrices of the stochastic processes {x_k, k ∈ Z} and {v_k, k ∈ Z}.

Problem: Given an observation record (data) Y(L) ≜ {y_1, y_2, ..., y_L}, find an optimal realizable estimate x̂_{k|L} of the signal x_k which is a linear function of the data y_1, y_2, ..., y_L, i.e.,

        x̂_{k|L} ≜ Σ_{i=1}^{L} A(i) y_i

for k ≥ 0, where A(i), i = 1, 2, ..., L, are matrices of appropriate dimension. For k > L the problem is called prediction, for k = L filtering, and for k < L smoothing.

The terms optimal and realizable that occur in the description of the problem are defined as follows:

OPTIMAL. The estimate x̂_{k|L} of x_k is optimal if it satisfies some specified criterion of optimality. The criterion used in this dissertation is the minimum mean-square error, i.e., the minimization of the mean-square error risk function (or performance measure)

(1.3)   J[x̃_{k|L}] = E{(x_k − x̂_{k|L})^T (x_k − x̂_{k|L})} = E{x̃_{k|L}^T x̃_{k|L}} .

REALIZABLE. The estimate x̂_{k|L} is realizable if it is formed only from the observation record Y(L) that is actually available. For k > L the estimator is called a predictor, for k = L a filter, and for k < L a smoother.
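To make the criterion (1.3) concrete, the batch form of Problem (1.2) can be sketched numerically: when the joint second-order statistics of the signal and of the stacked data are known, the optimal weighting matrices A(i) follow from the normal equations E{x y^T} = A E{y y^T}, i.e., from the requirement that the error be orthogonal to every linear function of the data. The NumPy sketch below is only an illustration of this batch solution under those assumptions (the function name batch_linear_mmse and the toy scalar example are illustrative, not taken from the text); the recursive solutions developed in the later chapters avoid forming and inverting the full data Gramian.

```python
import numpy as np

def batch_linear_mmse(Sigma_xy, Sigma_yy, y_stack):
    """Batch linear minimum mean-square estimate x_hat = A @ y_stack.

    Sigma_xy : E{x y^T} for the stacked, zero-mean data vector y  (n x mL)
    Sigma_yy : E{y y^T} for the stacked data vector               (mL x mL)
    y_stack  : one realization of the stacked data                (mL,)

    The weighting matrix A solves A Sigma_yy = Sigma_xy, which expresses
    the orthogonality of the error x - A y to the data.
    """
    A = Sigma_xy @ np.linalg.pinv(Sigma_yy)   # pseudo-inverse covers rank-deficient data
    return A @ y_stack

# Tiny numerical check: a scalar signal observed twice in unit-variance white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                  # zero-mean, unit-variance signal
y = np.stack([x + rng.standard_normal(5000),   # y1 = x + v1
              x + rng.standard_normal(5000)])  # y2 = x + v2
Sigma_xy = np.array([[1.0, 1.0]])              # E{x yi} = 1
Sigma_yy = np.array([[2.0, 1.0],
                     [1.0, 2.0]])              # E{yi yj}
x_hat = np.array([batch_linear_mmse(Sigma_xy, Sigma_yy, y[:, t])
                  for t in range(5000)]).ravel()
print(np.mean((x - x_hat) ** 2))               # close to 1/3, the theoretical MMSE
```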
The purpose of this research is to study Problem (1.2) where the signal is an n-variate, discrete-time, second-order stochastic process {x_k = Φ(k)u_k, k = 0, 1, 2, ...}, where, for each k = 0, 1, 2, ..., Φ(k) is a non-random n × n matrix and u_k, k = 0, 1, 2, ..., is a wide-sense martingale process.1)

1) The idea of approaching the optimal linear estimation problem through wide-sense martingales was first suggested to the author by V. Mandrekar.

The major contribution of the present work lies in the characterization of the signal x_k as a linear transformation of a wide-sense martingale rather than as a solution of a given difference equation. This characterization leads directly to recursive estimates and also allows direct alternative derivations and extensions of earlier well-known results in discrete-time linear estimation (cf. [B-5], [B-6], [C-1], [K-3], [K-4]).

1.1 HISTORICAL BACKGROUND AND LITERATURE SURVEY

The problem of estimating a stochastic signal from a noisy observation record has been intensively studied since the appearance, during the early 1940's, of the classical work of A. N. Kolmogorov [K-9] and N. Wiener [W-1]. Kolmogorov studied only discrete-time stationary processes and solved the problem of linear estimation of such processes using a technique based on the time-domain recursive orthogonalization of the observed data. This technique was suggested by Wold [W-6] in his doctoral dissertation in 1938, and hence is known as the Wold decomposition method. Kolmogorov's theory was extended to vector-valued random elements by Wiener and Masani [W-3], [W-4] in 1957-58.

On the other hand, Wiener [W-1] studied the linear estimation problem for continuous-time processes and reduced it to the problem of solving a certain integral equation, the so-called "Wiener-Hopf equation". This equation had already been studied by Wiener and Hopf [W-3] in 1931. It can be solved explicitly only for certain special cases of the general estimation problem. The solution involves formidable mathematics, which was beyond the reach of most engineers at that time, even though Wiener undertook this work in response to an engineering problem. Because of its complexity, Wiener's theory did not receive proper attention and recognition for many years. In 1950, H. Bode and C. Shannon [B-3] gave a different derivation of Wiener's results based on ideas in a report by Blackman, Bode, and Shannon [B-2]. The work of Bode and Shannon was instrumental in popularizing Wiener's theory. The same approach was independently discovered by L. Zadeh and R. Ragazzini [Z-3].

In the 1950's the idea of generating linear estimates recursively was introduced. Such algorithms were used by Gauss in 1809 [G-1] in his numerical calculations of the orbit of the asteroid Ceres, but the modern interest in recursive estimation was stimulated by the increased usage of digital computers. The first modern work on this subject was done by R. Kalman [K-3] in 1960. The practicality of the Kalman approach to the estimation problem has made it immensely popular among engineers.

The paper by Kalman in 1960 introduced a different approach to the linear estimation problem of Kolmogorov and Wiener for a special class of discrete-time stochastic processes. In 1961, R. Kalman and R. S. Bucy [K-6] generalized Kalman's results to continuous-time processes.
The novelty of their formulation was the representation of the stochastic processes by state equations driven by additive white input noises, rather than by correlation functions. By restricting their attention to Gauss-Markov processes given by difference or differential equations, in particular, they derived difference (for discrete-time) and differential (for continuous-time) equations for the filters, which are called the Kalman filter and the Kalman-Bucy filter, respectively. These equations can be used to construct a linear filter that is, of course, identical to the one specified by the Wiener-Hopf equation. However, there is a definite practical advantage in having a differential (or difference) equation for the estimate instead of an integral equation for the estimator. Specifically, it is much easier to solve a differential equation by analog or digital techniques than to solve an integral equation and then perform a convolution. This computational advantage of the Kalman approach to the linear estimation problem has stimulated a great number of papers providing alternative derivations, extensions, and relationships to classical parameter estimation techniques. It may be an overstatement to suggest that there are as many derivations of the Kalman filter equations as there are workers in the field. Most of these works are now in standard texts [A-1], [A-2], [B-7], [B-9], [D-1], [J-1], [L-1], [L-2], [N-1], [S-1], [S-4].

1.2 STATEMENT OF THE PROBLEM

A precise statement of the problem considered in this study will now be presented. We first note that, in the last decade, all the work on Kalman filtering theory has assumed that the signal is generated by a given linear stochastic difference equation driven by a white-noise process. It is known that such a stochastic difference equation generates a wide-sense Markov process, and that the signal can always be written as x_k = Φ(k)u_k, where Φ(k) is an invertible matrix and u_k is a wide-sense martingale process (cf. [M-1], [M-2]; see also Section 2 of Chapter 2). This remark motivates the following problem, which is characterized by the assumptions on the signal x_k, k = 0, 1, 2, ..., and the output noise v_k, k = 0, 1, 2, ..., in the general problem (1.2) described in Figure 1.1.

(1.4) SIGNAL. The signal x_k, k = 0, 1, 2, ..., is assumed to be an n-variate, second-order stochastic process in discrete time, given by

        x_k = Φ(k) u_k ,   k = 0, 1, 2, ... ,

where, for each k = 0, 1, 2, ..., Φ(k) is an n × n matrix of known functions of discrete time, and u_k, k = 0, 1, 2, ..., is a wide-sense martingale (see Section 2 of Chapter 2) with zero mean and known n × n positive semi-definite covariance matrix sequence {P_u(k), k = 0, 1, 2, ...}, where (cf. Section 2 of Chapter 2)

        P_u(k) ≜ E{u_k u_i^T}   for k ≤ i,  k, i = 0, 1, 2, ... .

As a notational convenience, for any integer k, the definition

        Q(k) ≜ E{(u_{k+1} − u_k)(u_{k+1} − u_k)^T} = P_u(k+1) − P_u(k)

is made. In addition, without loss of generality, it is assumed that the initial value u_0 is orthogonal (cf. Section 1 of Chapter 2) to the output noise. Notice that if it is not, one may define ū_k ≜ u_k − u_0; then ū_0 = 0 and therefore it is orthogonal to any output noise. So this assumption is not a restriction; it is just a notational convenience.

(1.5) OBSERVATIONS. The observations are corrupted by additive noise such that

        y_k = H(k) x_k + v_k = M(k) u_k + v_k   for k = 1, 2, ...

(thus the starting time is k = 1), where y_k ∈ R^m (m ≤ n) is the observation vector and v_k ∈ R^m is the output noise vector.
H(k) is an m × n matrix of known functions of discrete time and M(k) ≜ H(k)Φ(k), k = 0, 1, 2, ... . Note that, for k = 0, M(0) = H(0) = 0, since at the time k = 0 there is no observation.

The following assumptions are made on the output noise v_k, k = 0, 1, 2, ..., each of which leads to a problem in estimation theory:

(A.1.1) BASIC PROBLEM. The output noise v_k, k = 0, 1, 2, ..., is a zero-mean white noise (cf. Section 2 of Chapter 2) such that

        E{v_k v_j^T} = P_v(k) δ_kj   for k, j = 0, 1, 2, ... ,

where P_v(k) is an m × m matrix of known functions of discrete time, and it is assumed that P_v(k) > 0 1) for all k = 0, 1, 2, ... . Here δ_kj is the Kronecker delta: δ_kj = 1 if k = j and δ_kj = 0 if k ≠ j.

1) If P is a symmetric matrix, P > 0 (P ≥ 0) means that P is positive (semi-) definite.

In addition, it is assumed that the processes {v_k, k = 0, 1, 2, ...} and {u_{k+1} − u_k, k = 0, 1, ...} are orthogonal, i.e.,

        E{v_k (u_{j+1} − u_j)^T} = 0   for all j, k = 0, 1, 2, ... .

(A.1.2) CROSS-CORRELATED NOISE PROBLEM. The process v_k, k = 0, 1, 2, ..., is a zero-mean white-noise process as in (A.1.1), except that it is correlated with the process {u_{k+1} − u_k, k = 0, 1, ...} such that

        E{(u_{k+1} − u_k) v_j^T} = C(k) δ_kj   for k, j = 0, 1, 2, ... ,

where C(k) is an n × m matrix of known functions of discrete time.

(A.1.3) COLORED NOISE PROBLEM. The output noise process is the output of a known linear system with a white-noise input (see Figure 1.2):

        v_{k+1} = Ψ(k+1, k) v_k + n_k   for k = 0, 1, 2, ...

with the initial condition v_0, where v_k ∈ R^m is the output noise, n_k ∈ R^m is a white noise, and Ψ(k+1, k) is the m × m transition matrix of known functions of discrete time. It is assumed that:

1. The initial condition v_0 is a square-integrable random vector with zero mean and known m × m covariance matrix

        P_v(0) ≜ E{v_0 v_0^T} ,

and it is orthogonal to the processes u_k, k = 0, 1, 2, ..., and n_k, k = 0, 1, 2, ... .

Figure 1.2. Block diagram for the output noise.

2. The process n_k, k = 0, 1, 2, ..., is a white-noise process with zero mean such that

        E{n_k n_j^T} = P_n(k) δ_kj ,

where P_n(k) is an m × m matrix of known functions of discrete time. In addition, the processes {u_{k+1} − u_k, k = 0, 1, 2, ...} and {n_k, k = 0, 1, 2, ...} are correlated such that

        E{(u_{k+1} − u_k) n_j^T} = C(k) δ_kj   for k, j = 0, 1, 2, ... ,

where C(k) is an n × m matrix of known functions of discrete time.

3. M(k+1) C(k) + C^T(k) M^T(k+1) + P_n(k) is positive definite for all k = 0, 1, 2, ... .

The problem is now, under one of the assumptions (A.1.1), (A.1.2) or (A.1.3), to find the minimum mean-square estimate of the signal x_k = Φ(k)u_k defined by (1.4) and observed via (1.5). Our main interest is to compute the optimal estimate x̂_{k|L} based on the observation record Y(L) and the covariance matrix of the corresponding estimation error x̃_{k|L} ≜ x_k − x̂_{k|L}:

        P_x̃(k|L) ≜ E{x̃_{k|L} x̃_{k|L}^T} .

So, by the solution of the optimal estimation problem we mean a set of equations which allow us to compute the pair (x̂_{k|L}, P_x̃(k|L)). We shall refer to this pair as prediction if k > L, filtering if k = L, and smoothing if k < L.

1.3 OUTLINE OF THE THESIS

The outline of the dissertation is as follows. Chapter 2 contains the mathematical background upon which the derivations of the estimation equations are based, together with the basic results. In Chapter 3, the optimal estimation equations for the signal x_k = Φ(k)u_k are derived under Assumption (A.1.1). Optimal estimation equations for the signal x_k = Φ(k)u_k under Assumptions (A.1.2) and (A.1.3) are derived in Chapters 4 and 5, respectively. The results of the thesis are reviewed in Chapter 6 and conclusions are drawn concerning the application of this approach.
A number of extensions of the present research are proposed. CHAPTER 2 MATHEMATICAL BACKGROUND AND BASIC RESULTS This chapter is devoted to the basic mathematical notions which are used throughout the dissertation and a new solution of the general discrete-time optimal estimation problem (cf. Section 3). The material presented in Section 1 and Section 2 is based on the works of Wiener and Masani [W-B], and Mandrekar [M-Z], respectively. The proofs of the new results and the known results whose proof belong to the author are presented. 2.1 HIlBERT SPACE OF RANDOM‘VECTORS Let (0,4,P) be a probability space; that is, n is a set of points w, d is a a-algebra of subsets of n, and P is a probability measure on 0. A certain property is said to hold P-almost-everywhere on n (or with probability one) if the proba- bility of the set of points w at which this property does not hold equals zero. We indicate this property with the expression (mod P). Let (XyB) be a measurable space (cf. [R93], p. 217). A function x : 0 ... x 1) is called a random elemnt with range in X, if it is d-measurable; i.e. VB 6 B : {wlx(w) E B] 6 d. A random element with range in a finite n-dimensional linear Space 1) That is w-L x(w) e x. 12 13 is called a random n-vector. In the case in which X is the real line R and B is the o-algebra of Borel subsets of R, the func- tion x is called a random variable (abbreviation: r.v.). In this study, the range space x is chosen to be the Euclidean n-space R“, and o-algebra B is chosen to be the a-algebra of Borel sub- sets of Ru, so that a random element x with range in Rn is (1) a random n-vector (column) with r.v. components x , i - 1,2,...,n. We denote by L2(n,d,P) (or 1.2) the set of all r.v. x defined on (0,4,P) which are square integrable: 6i|Xlzl 9- lluwnzmw) < .. The set Lzm,d,P) is a Hilbert space (abbreviation: H-space) with usual Operations and inner product (cf. [G-Z], Theorem 7, Section 5 of Chapter II). Now, let 1.261.513) (or L?) be the set of all random n-vectors x on 0, with components x(i) E Lzm,d,P), 1 - 1,2,...,n. Thus x 6 113mm,» iffl) x“), 1 - 1,2,...,n are random variables and x is square integrable; i.e. d dllxlzl - £lx(w)l2P(dw) < .. , where l-l denotes the Euclidean norm: n lnzéfx-z u“M€ iIl The space 1361.43) is a direct-product H-space (cf. [P-l], p. 75) with usual operations and the inner product 1) "iff" is shorthand for "if, and only if". 14 " (1) (1) - 2 0‘ .y >L i=1 n (2.1) = 2: xmwnmwwmw) iIl T I dlx y} . This inner product generates the norm n as M - <>" - <2 lawn?” iIl - wash)“ . which in turn induces a topology in L; : a sequence {xk} in n n L2 is said (i) to converge to a vector x E 1.2 iff llxk - xll .. O as k _. an, and (ii) to be a Cauchy sequence iff llhk - hmll -0 0 as k,m -0 on. The inner product (2.1) does not play any significant role in the stochastic theory, although the corresponding norm (2.2) and topology it induces do. Rather than inner product we often use rectangular Gramian matrices. (2.3) DEFINITION. The n x m matrix [XaY] 2 La“): y(j)>L] " [IE x(1)("OF”)(V)1’(dW)] i I 1,2,...,n; j I 1,2,...,m that is defined for x E L3, y 6 I; is called the Gramian of st°1) 1) In what follows we assume implicitly that the random vectors x and y are defined on the same probability space. This assump- tion is made throughout of this work. 15 In the next two definitions we introduce the concepts of orthogonality and subspace [W-3]. 
These definitions differ from the usual ones in that Gramians replace inner products, and matrix coefficients replace scalar coefficients in linear combinations. (2.4) DEFINITION. We say that: (a) two vectors x,y in L; are orthogonal, written as x 1.y, iff [x,y] I 0 (b) two sets M,N contained in L; are orthogonal, written as M 1.N, iff each vector in. M is orthogonal to each vector in N. (2.5) REMARK. (a) Note that this concept of orthogonality is stronger than the usual one. For x 1.y, it is not sufficient that I 0. (b) From (2.3) we see that [Ax, By] I A[x,y]BT, Vh X p and m X q real matrices A,B; and vecotrs x 6 Lg, y E E3. Hence, if x 1y, then Ax iBy. (2.6) DEFINITION. A non-empty subset ‘M of L; is said to be: (a) a linear manifold if x,y E M a Ax + By 6 M, Vn X n real matrices A,B. n (b) a subspace of L 2 if it is a linear manifold, which is closed in the topology of the norm (2.2). The subspace spanned by the family of randomrvectors {xk, k 6 A} where A is an index set will be denoted by {{xk, k 6 A}. The basic facts governing the - notions just introduced which are referred in this study are given in the next two lemmas. These lemmas are quoted here from the work of N. Wiener and P. Masani (CW-3], lemmas 5.8 and 5.9). 16 (2.7) ORTHOGONAL PROJECTION LEMiA (Wiener-Masani, 1957). (a) M' is a subspace of 1.; iff there is a subspace M of L 3 M I M“, O 2 o where M: denotes the direct-product Mo ®...® Mo with n factors. M0 is the set of all components Of all vectors in M. n (b) If M is asubspace of L2 and x 6 L2, then H a unique (mod P) 15, 6 M 9 “x - KN“ ' :2; “X - y“ - A vector x E M satisfies this equality iff it satisfies the following equivalent conditions: x-xulM or [x,leEJtusy] VYEM. (c) If M,N are subspaces of L; and MCN, then 3 aunique subspace M' C N 9 N I M 9 M', M .LM' where 9 denotes a direct sum of vector spaces (cf. [P-Z], p. 38). In particular, M c: L; is a subspace =9 a a unique M-L, called the orthogonal complement of M such that 1.; - M 9 Mi, :4 .L M‘- . (2.8) DEFINITION. The unique vector KM of (2.7b) is called the orthogonal projection of x onto M, and denoted by (le). The operator PM on L; defined by PM(x) I (le) Vx E L; is called the orthogonal projection Operator. (2.9) LEMMA (Wiener-Masani, 1957). (a) If M,N are orthogonal n subspaces of L3, then M @N is a subspace of L2 and for any xeL; Mn @N) - (le) + (xlu) (mod P) (b) If M,N are subspaces 9 MC: N, then for any x e L; 17 “(14”)“ 5 “(KW)“ - In the study of the properties of orthogonal projection, a basic role is played by (2.7b). For example, from (2.7b) we may deduce the following: (2.10) COROLLARY. Let M,N be subspaces of L;1 a MC: N. Then the following holds (mod P): (a) (Ax + By‘M) I A(x‘M) + B(y\M), Vn X n real matrices A,B and x,y E L; (b) ((x|M)|N) - <3aui-gu»a1-gunfi = [xi - mx(i), x-1 - mx(j)] . If mx(i) - 0 V i E z, then the process is said to have zero mean. For a zero mean process, it is clear that Px(i,j) I Cx(i,j) Vi,j e 2. If Cx(i,j) - mx(i)m:(j) that is, if qu'” - o vi,j 6 2 then the process is said to be uncorrelated. From now on we assume, without loss of generality, that all the processes have zero mean unless otherwise stated. (2.11) DEFINITION. A second-order mlti-variate process {xk’ k E Z} is said to be: (a) a white noise (or orthogonal) process iff [xvxj] I PxUMiJ v1.1 6 z. where P;(J) 9 P;(J.J). (b) a process with orthogonal increments iff the process {xlc'I-l - xk, k E Z} is orthogonal. Let (ark, k E Z} be a stochastic process in 1;. 
We shall denote by L(x;{,) and L(xL) the spaces {{x i 51,} and 1’ {fit} respectively (cf. Definition 2.6b). These are called the past-present and the present of the process {xv k 6 Z} respec- tively. Obviously L(xL) c: L(x;{,) C L(x;.(,+l). 19 (2.12) DEFINITION. A discrete-time process {xk, k e z} in L; is said to be: (a) a wide-sense martingale process, iff VRJ, E 2 with L S k, (xk|L(x;L)) I XL ; (b) a wide-sense Markov process, iff V k,L 6 2, with L s k (xk|L(x;c>> - (xkluxbn . We see from (2.7b), (2.128) that a process {xk, k.€ Z} in L; is a wide-sense martingale process iff vk,{, E 2 with L s k, xk - XL 1. x1 vi 5 L. Hence for a wide-sense martingale process Px(i,j) I P;(i A j), where i A j 2 min{i,j]. A necessary and sufficient condition for a process n {xk, k.€ Z} in L2 to be a wide-sense martingale is given in the following lemma. This lemma extends to n-variate case the result Of Doob ([D-Z], p. 166). The proof of the lemma is similar to the one that is given by Doob, so it is omitted. (2.13) IEMMA. A discrete-time process {xk, 1:62} in L; is wide-sense martingale iff it satisfies the first-order linear vector difference equation I x +sw x k+1 k 10"]. «a -¢n where (wk, k 6 Z} is a white-noise process. (2.14) REMARK. (a) In continuous-time case the "only if" part of Lemma (2.13) does not hold, i.e. in continuous-time a wide-sense martingale process cannot be generated by a linear differential equation unless it is differentiable in the sense [M-3]. When that is so, the approach of the thesis can be applied to continuous time. 20 (b) We see from (2.5b) that if {wk, k.e Z} is a white noise process in I; then {r(k)wk, k 6 2}, where F(k) is an n x m matrix function, is a white-noise process in 1;. Hence the solution of x = xk + I"(k+1)wk+1 x I w k > -es m -o k+l is a wide-sense martingale, by virtue of (2.13). One important wide-sense martingale process for our purpose is the orthogonal projection of a process. (2.15) IEMMA. Let y E L; be fixed and {xk, k E 2} be a process In L2. Define uk - (y|L(x;k)). k e z . The process {uk’ k 6 Z} is a wide-sense martingale. PROOF. Since y 6 L2, so is uk, k e 2 by virtue of (2.91:). Also from (2.10b) (uk|L(X;L)) = ( = S k . ED UL 178 Q 21 Next we study discrete-time wide-sense Markov processes. The concept of wide-sense Markov processes first was introduced by Doob [D-Z]. Let {xk, k 6 2} be a wide-sense Markov process in L2. The definition Of wide-sense Markov process implies that (xk‘L(x;L)) - A(k,L)xL for L s k, where A(k,L) is an n x n matrix. Beutler [B-l] proved that Anus) = ‘3. (morgue) where P;(k,L) a [xk,xL] and P+ denotes the pseudo-inverse (cf. A.3) of the matrix P. He also showed that a multivariate second-order process is a wide-sense Markov process iff A(kst) ‘ A(k.J)A(JsL) for L s j s k. The function A(k,L) is called a transition matrix. Mandrekar and Salehi ([M-3], Theorems 2.11 and 2.12) recently gave a representation to a wide-sense Markov process, extending to a singular case the work of Mandrekar [M-l], [M92] which shows the connection between the signal process (1.4) of the problem that is considered in this study, and wide-sense Markov processes. (2.16) THEOREM (Mandrekar-Salehi, 1971). Let {xk, k e 2} be a discrete-time stochastic process in L3. Then the process {xk' k E Z} is a wide-sense Markov process with the transition function A(kpc) I P(k,L)P+(L,L) such that Px(k,L)Pfi(L,L) is one-one on R(Px(k,L)) onto R(Px(L,L)1) for k $.L iff 1) Here R(§) denotes the range space of the linear O rator Q (see Appendix A). 
P0 22 x1‘ I Quk. where {uk’ k E z} is a wide-sense martingale and §(k) is an n X n matrix function such that uk 6 R(§T(k)), R(Q(k)) is independent of k 6 Z. In either case A(k,L) I §(k)§+(L). This representation is unique in the sense that if xk I fi(k.)vk with v E R(¢T(k)) then there exists an n x n matrix X such that Vk 6 Z we have uk I K‘vk and 6(k) I f(k)K+, (2.17) mm: The Kahan filtering theory (cf. [It-3], [K—4]) assumes that the signal process {xk, k I 0,1,2,...} is generated by a given linear difference equation Of the form (2.17a) xk+1 = F(k)xk +'r(k)wk, k I 0,1,2,... where F(k) and r(k) are matrices of functions of discrete-time and wk is a white noise process. In addition P(k) is invertib1e for all k I 0,1,... and wk is orthogonal to a given any initial condition. Given an initial condition x0, then the unique solution of (2.17a) is given by the expression (cf. [K—3]) k (2.17b) x I §(k,o)x +- z §(k,i)F(i-l)w k I 0,1,2,... k O 1_1 i-l where §(-,.) is the fundamental solution of §(k+1.L) B F(k)9(ksL) a QCst) ‘ I Define d k uk = x0 +-1§1 Q(O,i)I"(i-1)wi_1 = uk_1 + §(o,k)I‘(k-l)wk-1 for k 2 1 no. x for k I 0 . 23 Then (2.17b) can be written as xk I §(k,o)uk. It follows from the definition uk and (2.14b) that the stochastic process {“k’ k I 0,1,2,...} is a zero-mean wide-sense martingale whose covariance matrix k T T (2.17c) Ph(k) = Pi(0) +-i§1 9(o.1)F(1-1)P;(1-1)F (1-1)9 (0.1) - Pu(k-l) + Q(o,k)F(k-l)Pfi(k-1)FT(k-I)QT(o,k) . Thus, by (2.16), the process {xk’ k I 0,1,2,...}, which is defined by (2.17a) with the initial condition x0 is a wide-sense Markov process. We shall refer to this signal as a Kalman signal. Observe that the class of Kalman signals is a special class of wide-sense Markov processes. It is obviously contained in the class of signal defined by (1.4). Thus the optimal estima- tion equations of a Kalman signal may be obtained from the optimal estimation equations for a signal defined by (1.4), but not con- versely. 2.3 A SOUJTION OF THE GENERAL MINIMUM MEAN-SQUARE ESTIMATION 15103151 The problem we will discuss in this section can be formu- lated as follows: Consider two related stochastic processes {xk, k - 0,1,2,...} and {yk, k - 1,2,...} In L; and I; (m s n) respectively. We will refer to {xk, k I 0,1,2,...} as signal and to {yk, k I 1,2,...} as the observation process. The problem is to find the minimum mean-square estimate of the signal xk based on the Observation record Y(L) 2 {y1,...,yL}; that is, L to find that vector 2 of the form 2 A(i)y , where A(i), kl; i-l i i I 1,2,...,L are n X m real matrices which minimizes the mean- 24 square error risk function (1.3): ~ T JExk‘L] . 6[(Xkdaklb) (xk - fik‘L)} 2 (2.18) = “xk - $144,“ . The solution of this problem is as follows: lat n d L2(Y3L) - £{K(1)Yis 1 - 132:00'3‘6} where x(i), i I 1,2,...,L are n X m real matrices. Note that L;(y;L) is a subspace of L; tion 2.6b). So that the problem is reduced to finding that vector and LEON.) I‘ L(y;L) (cf. Defini- fiklc in L;(y;L) such that (2.18) is minimmm. From the orthogonal projection lemma (cf. 2.7b), we know that *k‘ is given by L ) A kuL ‘ (xk|L;(YSL))1 (2.19) - P x 0 L30 :1.) k L based on the observation record Y(L), is the orthogonal That is the linear minimum mean-square estimate *k‘ of the signal xk, projection xk onto the subspace L;(y;L) generated by the vectors K(i)y1, i I 1,2,...,L. We will refer to the operator (-|L;(y;L) I P n defined by (2.19) as the orthogonal projec- 170;!) tion estimator abbreviation: OPE). 
1) We recall that given processes have zero-means, otherwise at - a + (x IL“(y'L)) 144, 1c|o k 2 ’ d where file optimal estimate xk given no observation 6{xk]. 25 From now on the notation 5114‘ will denote the minimum mean-square estimate of the signal xk based on the observation record Y(L) which is given by (2.19), and the expression "optimal estimate" will mean this estimate. The pair (iktL,PL(k[L)) will denote the optimal estimate and the correSponding error x covariance matrix of the estimation problem. ‘We refer to the pair , k I 0,1,2,... k based on the observation record Y(L), L I 1,2,... if k,L are (RR‘L,P (le)) as optimal estimation of the signal x x not specified, as gptimal prediction if k >.L, as Optimal filtering if k I.L, and as optimal smoothipg if k~< L. Now let {xk, k I 0,1,2,...} and {uk, k I 0,1,2,...} be two signal processes such that (2.20) xk I Q(k)uk k I 0,1,2,..., where §(k) is a linear transformation which may not be invertible and xk,uk€ L'z‘ Vk -o,1,2,... . 1e: {yk. k - 1,2,...} in L; (m s n) be the observation process for the signal {xk, k I 0,1,2,...}. Then the optimal estimates of the signals x and u based on the Observation record Y(L) are k k 914;, - (xk‘L;(y;L)) and “up, - (uk|L;()';L)) respectively. Since xk I §(k)uk we have n *le - «(wuklhzmm (2.2m - sac) «Rages» '3 Q(k)fik‘{’ 0 Thus the OPE commute with any linear deformation of the signal 26 process. Furthermore if uk 6 R(§T(k)) V k I 0,1,2,..., then §+(k)xk - fans (1c)uk - p u , by (1.3) auras) 1‘ .11 k since uk E x(QT(k)) V k I 0,1,2,... . Hence, it follows from (2.20) that» (2.21b) a - fans: kll. kit ‘ From the definitions of estimation error and covariance matrix of error, and (2.21a,b) similar results can easily be derived for the estimation error and its covariance matrix. These results are summarized in the following lemma. (2.22) LEMMA. Let {xk, k I 0,1,2,...} and {“k’ k I 0,1,2,...} be two signal processes as above. The optimal esthmations (s‘ck‘t, P~(k‘.(,)) and (a , P~(k|{,)), kg, - 0,1,2,... based on x U k|L the observation record Y(L) of the processes xk, k I 0,1,... and u k I 0,1,... are related to each other via the following fl‘_ k’ --.. re lat ions : "“H (a) The optimal estimates: 2 I §(k)fi k,L I 0,1,2,... . 14L klc The estimation errors: - §(k)fik| k’l, - 0,1,2,... 0 *le l. The error covariance matrices: Pia“) - §(k)P~(k|t)§T(k) 1m. - 0.1.2.... - u 27 (b) If in addition u E R(§T(k)) V k I 0,1,2,... then for k k,L I 0,1,2,... we have u = §+(k)fi klt kl; ’ a - fags: kl; kl; ’ T 504:.) -= s+ j [7k+1|k' an“) .. [mus- yj+1] " WHl‘k’ 9J+IHJ ' WHI‘R’ yj+l] " [fin-aw 93+1| 5] a O ‘ Since §k+1‘k 1.L(y,j+1) for j+l S k. Similarly we find that Wk+1|18 yJ+1|j1 " Pyo‘flnkj ' 1) Note that this definition of innovation process is different than one that was recently given by Railath (cf. [K-IJ). Our definition is the one that was given by Cramér [C-3]. (Also see [W‘3]) s 29 where d .. Rios-+1) - [ac-”‘1‘. yk-I-l‘k] 2 o . QED It is Obvious that the stochastic process {yk’ k I 0,1,...} is purely non-deterministic if rank P~(k+l) 2 l for all k I 0,1,... . Let rank P~(k) I r(k) {2 1. We shall refer to r(k) as the rank Of the stochasZic process {yk, k I 0,1,2,...}. It is clear that r(k) s m, Vk I 0,1,2,..., if r(k) I m, Vk then we say that the stochastic process has full rank. We note that P~(k+l) is invertible iff r I m, that is, the stochastic process {:k, k I 0,1,2,...} has full rank (cf. [C-B], [W-3]). 
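As an illustration of Definition (2.24) and Lemma (2.25), the innovation sequence can be computed from a batch of observations by successively projecting each observation onto the span of the earlier ones, with Gramians playing the role of inner products. The NumPy sketch below is a minimal illustration only: it estimates the Gramians by sample averages over N independent realizations (an assumption made here for the example, whereas in the text the Gramians are taken as known), and the function name innovations is illustrative rather than notation from the thesis.

```python
import numpy as np

def innovations(Y, m):
    """Form the innovation sequence ytilde_{k+1|k} = y_{k+1} - (y_{k+1}|L(y;k))
    for stacked realizations of m-dimensional, zero-mean observations.

    Y : array of shape (N, L*m), each row one realization of [y_1, ..., y_L];
        N is assumed large enough that sample Gramians approximate [y_i, y_j].
    Returns an array of the same shape holding [ytilde_1|0, ..., ytilde_L|L-1].
    """
    N, Lm = Y.shape
    L = Lm // m
    Ytil = Y.copy()                              # ytilde_1|0 = y_1 is left untouched
    for k in range(1, L):
        past = Y[:, : k * m]                     # y_1, ..., y_k (spanning L(y;k))
        cur = Y[:, k * m : (k + 1) * m]          # y_{k+1}
        G_pp = past.T @ past / N                 # sample Gramian of the past
        G_cp = cur.T @ past / N                  # sample cross Gramian [y_{k+1}, past]
        A = G_cp @ np.linalg.pinv(G_pp)          # projection coefficients
        Ytil[:, k * m : (k + 1) * m] = cur - past @ A.T
    return Ytil

# Example: a scalar wide-sense martingale observed in white noise.  The sample
# Gramian of the innovations is approximately diagonal, as Lemma (2.25) asserts.
rng = np.random.default_rng(1)
N, L, m = 20000, 4, 1
u = np.cumsum(rng.standard_normal((N, L)), axis=1)   # martingale: u_k = w_1 + ... + w_k
Y = u + rng.standard_normal((N, L))                  # y_k = u_k + v_k
Yt = innovations(Y, m)
print(np.round(Yt.T @ Yt / N, 2))                    # off-diagonal entries near zero
```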
Related to the innovation process associated with a signal process {yk, k I 0,1,2,...} we have the following result (see also [C-3], [W-3]). (2.26) LDIMA. Let {ykfl‘w k I 0,1,...] be the innovation process associated with a stochastic process {yk, k I 0,1,2,...}. Then for L < k, L,k I 0,1,2,..., and L(y;k) . LOW.) 69 LGL-tl‘t) 69 Lfiti-ZIL-I-l) e...c+> Lfik‘kd). PROOF. In view of (2.25) the subspace Last-HR.) , L(YL+2‘L+1), . . . , Lgk‘k-rl) are orthogonal to each other. So that is a subspace by virtue of (2.9a). Since by (2.24) 3O yL-I'llL’ yL+21L+1"°°’ yk‘k-l J. L(Y;L)s L(Y;L+1)s-°'a L(y;k'1) respectively, and L(y,L) is contained in all these subspaces, it follows that and therefore by (2.9a) Mm) 69 L6 ) ems) LG 4+1“ k‘k- l) is a subSpace. Now it remains to show that this subspace is equal to L(y;k). Since L(y,k-1)C L(y,k) and yk‘k_1€ L(y;k), we have L(Ysk) D L(Ysk'1) @ Llek-l) 0 On the other hand, by (2.24) (*> yk . and + (ykIL(y;k-1)) E LGk‘k_1) (+3 L(y.k-1)» and for L < k, yL 6 L(y;k-1) C LGk‘k-l) @ L(y;k-1). It follows that (...) L c L.L I 1,2,..., Optimal prediction: (2.28) fik‘L - fikjL-l + C(k‘L)§L‘L-1 , Optimal predicted estimate, _ ~ ~ -1 (2.29) C(k‘L) [xle_l, yL‘L'1][yL‘L'1’ iL‘L‘l] , predictor gain matrix 2.30 k -Pk-1- k .5? ( ) PRHU id; ) mung";1 14:,- error covar iance matrix . 1], prediction For I, - o, 2140 - (${xk} - o and pump) - rum . (b) If k I L I 1,2,..., optimal filtering: d (2.31) 2 I R + C(k)? , optimal filtered estimate, I k|k k " StIc|1c-I 14m (2.32) 600 I [ik‘k-l’ S’k‘k_1][§k‘k_1. $410434, filter gain matrix, (2.33) P~(k‘k) 9- P~(k) - P~(k|k11) - C(k)[§k‘k_1. 1,4191]. x X x filtering error covariance matrix. For k I O, 20 I 6{xo} I 0 and P%(0) I P;(0). (c) If k.< L, optimal smoothing: L (2.34) It IR + r, c(k,I|I-I)y , optimal smoothed estimate, 14:, k iIk+1 I|I-1 32 (2-35) G(k,i|i-l) ‘ [ik‘i-l’ 21‘1_1][yi‘1_1a 21‘1_1]-1, smoother gain matrix, L (2.36) 1:04;) = P~(k) - z C(k, i‘i- -l)[y x x iIk+l i‘i-l’ Sikh-11’ smoothing error covariance matrix. For k - L, 9141‘ - *1. and p (k|k) - p (k). i i PROOF. (a) Since by (2.26) n . - n . - n n . _ n 1) we have gm .. (kuL2(Y;L- 1) s L“ 26“?» 1 I (xleZO'm-ID + (xkiLZG‘L‘LdD . by (2-28) G(k|L)y L-1,2,... k2; ' itlick-1+ L-|L 1 where G(k|L) is the n X m gain matrix to be determined. To determine the gain matrix G(k|L), notice that from (2.7b) xk - G(k‘L)Sr‘L‘L_1 .L yL‘L'l . SO that [1:18 yL‘L_1] - G L I 1,2,... . For L I 0, Obviously 1 .. -1 ~ “,-1,b virteof 2.5b ) YL‘L_1.LL(}',L )gyth-IH'ZG!’ ) y u ( ) ~ n - a yL‘LIl I gk|L-l , since Rk|L-1 E 1.2(y,L l). 34 fik‘o I optimal estimate xk’ given no Observation I dka} I 0 and d P%(k‘o) . [xk - fik‘o’ xk.- gk‘o] " [xk’xk] d I Px (k) . (b) Since the above equations hold for k 2 L, by letting k I L I 1,2,... we obtain expression for filtering (2.31-2.33). For k I L I 0, 20 I 0, since 20 I Optimal estimate xo given no Observation I I O . Thus P~(0) 2 [£0,20] I [xo,xo] S Pk(0). x (c) Since by (2.26) 130:2) - 1.3030 s gem”) owe L26 1). s > k L‘LI \ 5“ and the subspaces on the right of this equality are orthogonal to each other, the optimal smoothed estimate is £14 L I (xkl 1‘2"“) - (x I “(y-k) s “6' Name IN? >) k L2 ' L2 k+l‘k 2 ”(,1 L (2.34) -R + I: G(k,i I-Iw 1‘ iIk+l ' 1‘1'1 where G(k,i‘i-l), i I k+l,...,L are the n X‘m gain matrices to be determined. 
If the steps lead to (2.29) repeated here, their 35 results ~ ~ -1 (2.35) C(k.i\i-l) I [xk‘1-l: 71‘1-13[VI‘I-I' yi‘i-l] for i g k+1’ooo,{,s TO obtain an expression for the smoothing error covariance matrix we note that k|L k le L e xk - ij§+lc(k'1‘1'1)71|I-1 for L > k I 0,1,2,... . Hence L L P (km = [i’ - 2: cam 14):; . g - a Mk 1 14>? - J i k iIk+l ‘ 1‘1'1 k iIk+l ’ | 1‘1 1 L ~ ~ ~ N T ' s +' Z G k,i 1-1 , G k,i i-l - [xk. xx] 1_k+1 ( ‘ )[yi‘i-l yI|I-1] ( 1 ) L - Z C(k 1 1-1) y ,f iIk+l ' ‘ L 1‘1'1 k] L T - i ’ G k,i 1-1 £51 [ k 7I‘I-1] ( 1 ) where we made use of (2.25). Substituting (2.35) into this expression and noting that [§k,9 ] we obtain i‘i-l] ' [xk‘i-l’yi‘i-l | L | t (2.36) Hum-Pao- E cacti-1)? -.5? -1 i g 1dk+1 ' I|I 1 k|I 1 for L > k I 0,1,2,... . For k I.L, it is obvious that Rk‘k . ik and P;(k‘k) - P;(k). QED X x This theorem is basic in the study of discrete-time linear estimation. To the best of the author's knowledge, the results of this theorem.do not exist in the literature. At this point, we 36 make the following remarks related to this theorem. (2.37) REMARK. The only assumption related to the signal process is that the signal process is second order. The Observation process is assumed to be second order and have full rank. The full-rank assumption can be dropped by using the generalized inverse (cf. Appendix A) instead of the inverse. If the process has constant rank r, then by using a suitable invertible transformation on the Observation process one may Obtain a full rank equivalent Observation process. (2.38) REMARK. To determine explicitly the estimation equations in Theorem 2.27 we need only deterndnethe following matrices: Lik‘i-l’ yi‘i-l] and [yi‘i-l’ yi‘i-l] ' Since the process {yk|k-l’ k I 1,2,...} is a white-noise process, the computation of the second matrix above is easy, actually it is usually given as part of the problem. The determination of the first matrix is rather involved and usually is possible by making further assumptions on the signal process, such as being generated by a given linear difference equation driven by a white-noise pro- cess (Kalman filtering theory) or as in (1.4) (cf. Section 2 Of Chapter 1). CHAPTER 3 BASIC PROBIm (BP) This chapter is devoted to the derivation of optimal estimation equations for the basic problem (BP). In this problem the signal and observation equation are described by (1.4) and (1.5), which are repeated here for convenience signal: xk I Q(k)uk, (uh, k I 0,1,2,...} is a wide-sense martingale Observation: k I 1,2,... yk I M(k)uk + v ks The assumptions on the initial signal xo (or no), output noise are the same as those stated in (1.4) and (A.1.1). The matrices {(k) and M(k) have been described in (1.4) and (1.5). The problem is simply to find the minim mean-square estimation equations for the signal x1‘ I Q(k)uk based on the observation record Y(L), for k,L I 0,1,2,... . In view of Section 3 of Glapter 2, we need to derive the equations for the stochastic process {uk, k I 0,1,2,...], then use (2.22s) to get the equations for the process {xk I 900%. k I 0,1,2,...}. The estimation equations are derived in the following three sections. Section 1 is devoted to the prediction problem. Where three distinct classes of prediction are defined and a recursive algorithm developed for the single-stage prediction. Sections 2 and 3, the filtering and smoothing problem are 37 38 considered, and recursive algorithms are derived for the filter and smother. 
A simple example is given in Section 4 to illustrate the application of the results of the earlier sections. We need to note here that the results of the present Chapter can be Obtained from the results of the following chapter. The primary reason, however, for its separate treatment is to pro- vide a full exposition of the new approach, with an explicit state- ment of the terminology, followed by the derivations of major est imat ion equat ions . 3.1 OPTIMAL PREDICTIm FOR BP We recall from Chapter 1 that in the prediction problem, we wish to obtain the optimal estimate fl‘k‘L of the signal “k’ based on the observation record Y(L) I {y1.y2,...,y‘c}, where k > L. In other words, we wish to obtain the estimate of the signal at a time in future in terms of the existing data at the present time. Our primary interest here is to obtain data prediction algorithm for the signal uk. In particular, we wish to develop \ algorithm which are recursive in time, thereby permitting us to “aeor perform predict ion efficiently with a digital computer. Before attempting this, however, it will prove expedient first to classify predicted estimates according to the possible relationship between the two time indices, k and ’L. The need for this classification arises because both indices are variables or one may be fixed and the other may be allowed to vary. Depending on how the time indices vary, three distinct classes of prediction can be defined: 39 (3.1) DEFINITICN. (a) Fixed-interval prediction: 6k”, L I L I fixed positive integer, k I L+l, L+2,... (b) Fixed-point prediction: fik‘L L I 0,1,2,...,N-1. (c) Fixed-lead prediction: 0k”, L I 0,1,2,..., k I L + L where L is a fixed positive integer. Having introduced three distinct classes Of prediction we now derive a formla for the general prediction problem, and then proceed to Obtain algorithm for computing the Optimal fixed- interval, fixed-point, and fixed-lead predictions. From the orthogonal projection lama, we know that the predicted estimate of the signal u based on the observation k record Y(L) which is optimal for the mean-square error risk function is a - (u | “(rm up, I: I'2 ’ for k > L, k,L I 0,1,2,... . In addition, since the orthogonal projection is unique, all the predicted estimates are unique. Using the fact that uk - UL .L 1.;(y3L) V k 2 L (cf. (1.4) and (A.1.1)) we have I _ n . “le (uk “c“ch'z‘m” - (uk - uL\L‘2‘ L, based on Y(L) is accomplished via (3.3) and (3.6) with the initial condition (0L, P;(L)). If we are u given (0L, P (L)) then the fixed-interval predicted estimate 11 is obtained without calculation, in fact 6 I 6L, and uk‘L k|L the fixed-interval prediction error covariance matrix is computed through simple algebraic operations. But, the only value of L for which (0L, P~(L)) is known without processing the filter U algorithm is L I 0. On the other hand for L I 0, d 6 I G I optimal estimate u given no Observation 0‘0 0 o I 6(uo} I o and d ~ ~ P (0) I [u , no] G I Ph(0) , since u I uo - no I uo . ...g\ 80 that Nah. I O a d P I . ak‘o n G(k\o) Ph(k) This result, though,trivial, shows that the future estimates with- out Observation is zero, and the covariance matrix of the estimate is the same as that of the signal process. Thus expressions (3.3) and (3.6) have limited practical use as far as performing the fixed-interval prediction is concerned. 43 FIXED-POINT PREDICTION. In the fixed-point prediction problem, we wish to Obtain the optimal estimate of the signal at a given time in the future as function of the current time. 
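As a numerical illustration, the fixed-interval prediction step described by (3.3) and (3.6) amounts to carrying the filtered estimate forward unchanged while adding the increment covariances Q(i) to the filtered error covariance. The sketch below assumes NumPy and assumes that the filtered pair and the matrices Q(i) are supplied; the function and variable names are illustrative, not notation from the text.

```python
import numpy as np

def fixed_interval_predict(u_hat_L, P_L, Q_list):
    """Fixed-interval prediction for the wide-sense-martingale signal u_k,
    following (3.3) and (3.6).

    u_hat_L : filtered estimate at the last observation time L        (n,)
    P_L     : filtering error covariance at time L                    (n, n)
    Q_list  : [Q(L), Q(L+1), ..., Q(k-1)], the increment covariances
    Returns the predicted pair (u_hat_{k|L}, P(k|L)).
    """
    u_hat_kL = u_hat_L.copy()        # (3.3): the predicted estimate does not move
    P_kL = P_L + sum(Q_list)         # (3.6): the covariance accumulates the Q(i)
    return u_hat_kL, P_kL

# For the signal x_k = Phi(k) u_k the corresponding pair follows as in (2.22a):
#   x_hat_{k|L} = Phi_k @ u_hat_kL,   P_x(k|L) = Phi_k @ P_kL @ Phi_k.T
```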
Here the form of (3.2) that is of interest is (3.12) aN‘L I 6L L < N I fixed positive integer. In this case we continue to process the filter algorithm as the data arrives as opposed to the fixed-interval prediction, where we stop processing the filter algorithm once all the data are received. The equations of the error and the error covariance matrix Of the fixed point prediction are easily obtained from their definitions as: (3.13) fiNlé =E +u - u {mm = {(2) + rum) - Pun) u U N-1 (3.14) I P~(:,) + 2 (2(1) u iIL for L I 0,1,2,...,N-1 with the boundary conditions 5 MN - {IN and 11mm) - P~(N). u u Note that the fixed-point prediction error covariance matrix eventually is equal to the filtering error covariance matrix, since N is fixed and L eventually is equal to N. FIXED-LEAD PREDICTION. This is probably the most used case as it has application to control systems for "lead" correction action, etc. [L-lg. Here, we wish to predict the value of the signal a fixed amount of discrete-time L in the future from the current 44 time L, i.e., we wish to predict the signal with lead L. Hence the form of (3.2) that is of interest is (3'15) A L - 091329000 GL'HJL ' “I. SO that as in the fixed-point prediction, we must continue to pro- cess the filter algorithm as the data arrives. The error and error covariance matrix of the fixed-lead prediction, respectively, are given by the following equations: (3.16) EHL‘L I UL + uL+L - “‘6 , L I 0,1,2,... P~(L+L‘L) . 11(1) + Puma - Pam u u L-hL-l (3-17) I P (L) + 2 0(1) . L I 0.1.2.”- E iIL with the initial conditions GL‘O . 0 and P (L‘o) - Pu(L). 8 From (3.16) we see that the magnitude of the fixed-lead prediction error depends upon the amount of lead I” When L I 1, the magnitude has its smallest value. For that reason, this special case has merit attention, and is called the single-stage prediction. \ We shall obtain a recursive algorithm.for computing the single-stage ““* prediction 0 I 0,1,2,...,. In develOping the algorithm k+1‘k’ 1‘ for the single-stage prediction, we assume, only the initial estimate fil‘o I 0 and the corresponding covariance matrix of the error P;(l‘o) I Pb(l) are given. The algorithm‘will depend only on the pgevious estimate lek-l and the new observation yk. The result is given in the following theorem. To prove the theorem, we need the following technical lemma, which will be referred to 45 as the innovation lemma. (3.18) INNOVATION LEMMA. The innovation process {§k+1‘k’ k I 0,1,2,...] associated with the observation process {yr k I 1.2,...) of the BP defined by (3.19) - M(k)uk + v k - 1,2,... 3'1. k (see (A.1.l) for the meaning of the symbols) is generated by (3.20) k - 0,1,2,... s"1c+1|1c " yk+1 ‘ "(k)fi1c+1|1c The process {yk’ k I 1,2,...] has full rank, i.e., Dune huh] > 0 I - 0,1,2,... - PROOF. By definition yk+l‘k " y1c+1 ' (yk+1‘L(y‘k)) ' yk+1 ' (M(k+l)uk+1 + VR+I|L("")) . yk+1 - M(k+1)0k+1‘k - (vk+l|L(y;k)) for k I 0,1,2,... . Since vk, k I 0,1,2,... is a white-noise process, and by (A.1.l) vk Iuj vk,j I 0,1,2,..., we have Vk+1 J. L(y,k). So that (vk+1‘L(y;k)) - 0 by virtue of (2.10c) and therefore - M(k+l)0 k I 0,1,2,... . 31t+1| k ' yk+1 1t+1|1c Note that for k I 0, yl‘o I yl, since 61 I 6{u1} I 0 (cf. |o (3.11)). “w-OI l! \ 46 Substituting (3.19) into (3.20) and rearranging the terms we get - 11(1.+1)0 S"1c+1|1c ' y1c+1 1t+1|1c "-' M(k+1)fi k 3 0,1,2,... 1c+1\1c + v1c+1 Thus, for k I 0,1,2,... 
I [M(k+l)fi' + VH1, M(k+1)fi Wk+l|k’ yk-i-l‘k] 1c+1|1t k+1|k + vk-l-l] - M(k+l)[fi' MT(k+1) + [v 1c+1|1t’ t"1c+1| k] 1c+1’ 3+1] ’ since v k I 0,1,2,... . Noting that k+1 * fik+l|k V .. .. d [uk-l-l‘k’ uk+l‘k] ' Pfiml‘k) 2 ° and [VHP vk+1] I Pv(k+l) > O , by (A.1.l) , we Obtain ~ _ T (3.21) [yHI‘kfikfl‘k] M(k-l-1)Pfi(k+l‘k)M (k‘l'l) + Pv(k+l) -_\\ for k I 0,1,2,... . Since V k I 0,1,2,..., Pv(k+l) >'0 and ..M‘, P (k+l‘k) 2 0, by 08.2) the inverse of [y exists u k-l-l‘k’ yk-I-l‘k] for all k I 0,1,2,... . Thus the Observation process has full rank. QED (3.22) THEOREM. The single-stage prediction (8 P (k+1‘k)) E 1c+1|1t’ k I 0,1,2,... for the process {uk, k I 0,1,2,...} is accomplished via the following equations: (a) The stochastic process {6 k I 0,1,2,...], which is 1c+1|1c’ defined by the single-stage prediction estimate, is a zero-mean 47 wide-sense martingale, and is generated by the recursive equation (3.23) a + c(1<+1|1c)[yk - M(k)‘u 1c+1|1c e oklk-l k‘k-ll for k I 1,2,... with the initial condition a I 0, where llo G(k+l‘k) is the n X m gain matrix and is given by the follow- ing expression: (3.24) G(k+l|k) a P~(k‘k-1)MT(k)[M(k)P (klk-1)MT(1<) + Pv(k)‘.l'1 u 6 k B 1,2,see (b) The stochastic process I 0,1,2,...}, which is {fik+1\k’ 1‘ defined by the single-stage prediction error E is the 1t+1|k’ solution of the following stochastic linear difference equation (3.25) 11 a (In - c(k+1‘k)M(k))fi k+l - k - (:(1c+1|1c)vk + u a1c+111t k‘k-l for k I 1,2,... with the initial condition 6 I u . This 1\o 1 process is a zero mean wide-sense Markov process whose covariance matrix is given by the recursive equation (3.26) p (k+1|k) - P (k‘k-l) - P (k‘k-1)MT(k)[M(k)P (k‘k-1)MT(k) + a u a G + Pv(k)]-1M(k) P~(k| k-l) + Q(k) u for k - 1,2,... with the initial condition P (1|o) - Pu(l). a {Gene k is a zero mean wide-sense martingale. Obviously, the process has PROOF. (a) We first prove that the process I 0,1,2,...} zero-mean. To prove that it is a wide-sense martingale, we must show that (I) 0 e L; Vk - 0,1,2,..., and (II) lei-1‘ k 48 |L(u. L+1|L))" L s k . mk-l-II k fi’L-i-I‘L Since uk 6 LE, so that 6 by virtue Of (3.9b). It remains Hl‘k to verify the requirement (ii). The verification of (ii) is as follows: For k 2 L (fim‘klmaalm - ((uk+1|L:(y;k))|L(3.L+1|L)) (ammo ..wlm by (2.1%), since L(fi,L+l‘L) c L'z‘(y;k) for L s k <. same reason- ing as above (€1\L(U.L+1IL) . by (3 2) L+1|L , since 0 1, +1| L e L(0,L+I|L) . ' aL+1|L Since the observation process has full rank, by (3.18), the single-stage predicted estimate is given by G1c+1|1c a °k+1|1c-1 + G - [a 3 107 3' 1‘1 1c+1\1c-1’ k‘k-l 14km k‘k-l From (3.21) we know that (3.21) [Mk-15141.4] - M(k)Pfi(k‘k-1)MT(k) + Pv(k) . On the other hand, [fik-i-l‘k-l’yklk-l] " [gut-1 + “H1 ' “k' M(k)fik|k-l"+ ”13' by (3.19) (3.27) I P~ (kl k-1)MT(1<) U k Substituting (3.21) and (3.27) into the expression for G(k+l|k) since uk+1 - uk’ uk|k-l and v are orthogonal. above we Obtain C(k-i’l‘k) I P (k‘k-1)MT(k)[M(k)P (k‘k-1)MT(k) + Pv(k)]-1 . u 5 (b) The expression for the single-stage prediction error is Obtained as follows: IO. “1c+1|k uk‘l‘l ‘ °k+1|k in“ * k+1 k k u b 3 23 “A . uk+1 - uk‘k'l - G( ‘ )[yk - M( ) k‘k'l] 9 y ( ' ) . fiklk-l - G(k+l|k)[M(k)uk|k-l + vk] + uk+1 - uk , by letting u1c+1 I “kid - u +'u and substituting (3.20) k k (3.25) (In - G(k+1‘k)M(k))fi'k‘k_1 - G(k+1‘k)vk + uk+1 - uk for k I 1,2,... . For k I 0, it is Obvious that fillo I u', since “1‘0 I O. I! 
\ I 50 TO show that the process I 0,1,2,...], which [qu‘k’ k is generated by (3.25), is a zero-mean wide-sense Markov process, define F(k) g In - c(1c+1\1c)h(1c), r(k) ‘3- [-c(k+1\k). In]. and then (3.25) can be written as (3.28) uk+l|k I F(k)i'ik‘k_1 +'I‘(k)rk , k I 1,2,... From the definitions F(k) and rk it is clear that F(k) V k I 1,2,... is invertible and the stochastic process {rk, k I 1,2,...] is a zero-mean white-noise process. I 111 and therefore u1|o 1 rk, k I 0,1,...] is defined by Furthermore, since a 1'0 k - 1,2,..., the process {GHI‘R, (3.28) is of the same form and is subject to the same conditions as the process defined by (2.17). Hence, it is a zero-mean wide- sense Markov process. It now remains to determine the single-stage prediction error covariance matrix P~(k+1‘k). To obtain an equation for P~(k+1|k), we may multiplyu(3.25) with its transpose and take u mathematical expectations, or let k I L in (2.30) and then sub- stitute (3.21) and (3.27) into it to get 51 p (k-l-l‘k) - P (Relic-1) - P (k1k-1)MT(k>1M(k>Pen-Indus) 5 fi 5 + Pv(k)]-]M(k)P~(k‘k-l) U for k 1,2,... . Since, by (3.7) P~(k+1‘k-1) . P (k‘k-l) +Q(k), u G we have (3.26) P~(k+1\k> - {Mk-1) - P~(k|k-l)MT(k)[M(k)P(k|k-1)MT(k) + U U U + Pv(k)]'lM(k)P~(k|k-l) +000 11 for k I 1,2,... . For k I 0, obviously P (1‘0) I Ph(l), 5 since a I u QED 1‘0 1 ' This theorem gives a recursive algorithm for computing the single-stage prediction. The recursive algorithms are given in Theorem 3.22 are extremely useful in processing Observations to obtain the predicted estimate utilizing a digital computer. The Observations can be processed as they occur, and there is no need to store any observation data. In fact, so far as storage of the observations and the signal is concerned, only flk‘k-l need to be stored in proceding from time k to time knl. An additional feature is that the error covariance P~(k+l‘k) is ‘II"' computed as a direct part of the estimator, and ma; be used to judge the accuracy of the estimation procedure as in the Kalman filtering theory. This is based on the assumption that the observation models and the means and covariances of related pro- cesses are correctly known. A block diagram of the single-stage predictor is shown in Figure 3.1. The information flow in the predictor can be explained very simply by considering this block diagram, which 17": 52 is a representation of (3.23). From this figure we see that single- stage predictor consists of a model discrete-time linear dynanical system fip I (In’ G(°‘°).In) (cf. [Ii-5]): 0H1” - nk‘k-l + G(k+l\k)yk‘k_1 , state equation M» 2k I 8H1” , output equation in which the gain-times-innovation term is applied to the model as a forcing function. Observe that the predictor operates in a predict-correct fashion. That is, the correction term caet1|1t)yk+1‘k is added to the predicted estimate d to k|k-l determine the current predicted estimate. The correction term involves a weighting of the innovation associated with the obser- vation progess by the gain matrix C(k'I-l' k). Input 0; 2p 5 Dynamical System 8p Output 0f fip Observation n X m Corr'Lcti-O Current estimate vatiofi I M(k) I : klk-fll guitl - e- 1 n X m : Previous a ...,N estimate " I-a-O' k-1,2’eee : 01‘0.0 Figure 3.1. Block diagram of single-stage predictor. The estimation equations derived above, can be used to estimate the signal {xk I Q(k)uk, k I 0,1,2,...} as pointed out in Section 3 of Chapter 2. 
If we wish to have these equations involve only estimates of the signal {x_k = Φ(k)u_k, k = 0,1,2,...}, then, as shown there, we need the assumption that

u_k ∈ R(Φ^T(k))    for all k = 0,1,2,... .

Assuming that this is so, we state the result in the following corollary for the single-stage predictor. The proof of this corollary follows easily from (2.22) and (3.22).

(3.29) COROLLARY. The single-stage prediction (x̂_{k+1|k}, P_x̃(k+1|k)), k = 0,1,2,..., for the process {x_k = Φ(k)u_k, k = 0,1,2,...} (cf. (1.4)) such that u_k ∈ R(Φ^T(k)) for all k = 0,1,2,..., is accomplished via the following equations:

(a) The stochastic process {x̂_{k+1|k}, k = 0,1,2,...}, which is defined by the single-stage prediction estimate, is a zero-mean wide-sense Markov process, and is generated by the recursive equation

(3.30)    x̂_{k+1|k} = Φ(k+1)Φ^+(k)x̂_{k|k-1} + K(k+1|k)[y_k - H(k)x̂_{k|k-1}]

for k = 1,2,... with the initial condition x̂_{1|0} = 0, where K(k+1|k) is the n x m gain matrix and is given by

(3.31)    K(k+1|k) = Φ(k+1)Φ^+(k)P_x̃(k|k-1)H^T(k)[H(k)P_x̃(k|k-1)H^T(k) + P_v(k)]^{-1} .

(b) The stochastic process {x̃_{k+1|k}, k = 0,1,2,...}, which is defined by the single-stage prediction error x̃_{k+1|k}, is the solution of the linear stochastic difference equation

(3.32)    x̃_{k+1|k} = (Φ(k+1)Φ^+(k) - K(k+1|k)H(k))x̃_{k|k-1} - K(k+1|k)v_k + Φ(k+1)(u_{k+1} - u_k)

for k = 1,2,..., with the initial condition x̃_{1|0} = Φ(1)u_1. This process is a zero-mean wide-sense Markov process whose covariance matrix is given by the recursive equation

(3.33)    P_x̃(k+1|k) = Φ(k+1)Φ^+(k)P_x̃(k|k-1)Φ^{+T}(k)Φ^T(k+1)
              - Φ(k+1)Φ^+(k)P_x̃(k|k-1)H^T(k)[H(k)P_x̃(k|k-1)H^T(k) + P_v(k)]^{-1}H(k)P_x̃(k|k-1)Φ^{+T}(k)Φ^T(k+1)
              + Φ(k+1)Q(k)Φ^T(k+1)

for k = 1,2,..., with the initial condition P_x̃(1|0) = Φ(1)P_u(1)Φ^T(1).

(3.34) REMARK. It is shown in (2.17) that the signals of the Kalman filtering theory can be written in the form x_k = Φ(k)u_k for k = 1,2,... and x_0 = u_0, where Φ(k) = Φ(k,0) is a transition matrix. By letting Φ(k) = Φ(k,0) and Φ^+(k) = Φ^{-1}(k,0) = Φ(0,k) in (3.29), and noting that Φ(k,j)Φ(j,k) = I_n, one obtains the results of the Kalman filtering theory (cf. [K-3], [K-4]) for the BP.

(3.35) REMARK. The algorithms given in Corollary 3.29 are not available in the current literature to the best of my knowledge, and cannot be obtained directly from the Kalman filtering theory.

(3.36) REMARK. A comparison of (3.22) and (3.29) shows that, in the estimation of the signal x_k = Φ(k)u_k, k = 0,1,2,..., computation time is saved if the algorithms given in (3.22) are used first, followed by (2.22a), to obtain the desired results.

3.2 OPTIMAL FILTERING FOR BP

We now examine the problem of obtaining an algorithm for computing the basic problem of interest, namely, the filtering problem. In developing the algorithm for optimal filtering of the signal {u_k, k = 0,1,2,...}, and therefore of the signal {x_k = Φ(k)u_k, k = 0,1,2,...}, we assume that only the initial estimate û_0 = 0 and the filtering error covariance matrix at the initial time, P_ũ(0) = P_u(0), are given. From (3.2), we observe that prediction and filtering are interdependent in terms of the determination of the predicted estimate given the filtered estimate and vice versa. In fact,

û_{k+1|k} = û_k ,    k = 0,1,2,... .

Hence, the single-stage predicted estimate algorithm (3.23) can be used to compute the filtered estimate. Of course, the filtering error covariance matrix will not be the same one that is given by (3.26) for the single-stage predictor error.
It must be computed to judge the accuracy of the estimation procedure. With these preliminaries completed, we now state and prove the basic theorem of optimal filtering for the signal (nu, k I 0,1,2,...}. (3.37) THEOREM. The filtering (0k, PL(k)), k I 0,1,2,... for the stochastic process {uk’ k I 0,1,2,?..} is accomplished via the following equations: V (a) The stochastic process {fik’ k I 0,1,2,...}, which is defined by the filtered estimate, is a zero-mean wide-sense martingale. It is generated by the recursive equation (3.38) 11k - 01", + c(k)[yk - M(k)fik-1] for k I 1,2,... with the initial condition 60 I 0, where C(k) is the n X m gain matrix and is given by 56 T -1 (3.39) C(k) I P (k)M (k)Pv (k) . u (b) The stochastic process {fik’ k I 0,1,2,...}, which is defined by the filtering error fik that satisfies the stochastic linear difference equation (3.40) 61‘ - (IIn - G(k)M(k))fik_1 + [In - c;(1c)h(1c)](uk - uk_1)IG(k)vk for k I 1,2,... with the initial condition Go I no, is a zero- mean wide-sense Markov process. Its covariance matrix is given by the following recursive equation: (3.41) P~(k) I P~(k|k-1) - P~(k)MT(k)P;1(k)M(k)P~(k‘k-l) u u u u for k I 1,2,... with the initial condition P~(0) I Ph(0). u PROOF. (a) Since, by (3.2) 0 I Oh, it follows from (3.22s) 1t+1|k that the process (Gk, k I 0,1,2,...} is a zero-mean wide-sense martingale and 0 - a _1 + c(1t+1|1t)[yk - Rana k k k-l] for k I 1,2,... . For k I 0, obviously 00 I 0 (cf. 3.11). The filter gain matrix C(k) is Obviously equal to the "" single-stage predictor gain matrix G(k+l|k). Thus, from.(3.24) C(k) I C(k-i-llk) I P (k‘k-1)MT(k)[M(k)P (k‘k-1)MT(k) + P (k)].1 . ~ ~ v u u To obtain an expression in terms of the filtering error covariance matrix for the gain matrix C(k), note that since P (k‘k-l) 2 0 u and Pv(k) >10, from (B.2) and (B.1) 57 p (k‘k-1)MT(k)[M(k)P (1t|1c-1)NT(R) + Pv(k)]-1 ‘u a - P~(k‘k-1)MT(k)R-1(k) - P (k‘k-1)MT(k)[M(k)P (k‘k-1)MT(k) + 11 fi' '11 + Pv (kn-114(k) P” (kl k-1)MT (k) P;1(k) . u It is shown in part (b) of this theoremtthat P~(k) - P (k‘k-l) - P (k‘k-1)MT(k)[M(k)P (1t|1t-1)NT(1c) + Pv(k)]'1 u G a E x M(k)P (k‘k-l) . a Hence C(k) - P (klk-1>M?(k)tu(k)r (t|t-1)hT(k) +1300)”1 u a -gemhmghm. u (b) The filtering error is by definition 6k B Uk ' Gk, k - 0,1,2,eee e Substituting (3.38) for Gk, and rearranging the terms we get uk I uk - ilk-1 - G(k)[yk - M(k)0k+1] I 310']. I G(k)[M(k)fik-l + M(k) (“k ' uk-l) + vk] +Iuk - uk-l (3.40) - [In - G(k)M(k)]fik_1 + [In - <3(1c)h(1c)](uk - uk_l) - c:(lt)yk for k I 1,2,... . For k I 0, it is clear that fio I uo, since a - O. O A procedure similar to that used in (3.22b), shows that the stochastic process {fik’ k I 0,1,2,...}, which is generated by (3.40), is a zero-mean wide-sense‘Markov process. I! \ .Z. 58 To obtain an expression for the filtering error covariance matrix P (k), we note from (3.7) that G P~(k) - P~(k+1‘k) - Q(k) . u 11 Substituting (3.26) for P~(k+1‘k) and noting that u C(k) - c(1t+1|1c) - P~(k‘k-1)MT(k)[M(k)P~(k|k-1)MT(k) + Pv(1t)]"1 u u . P~ (k)MT (1t) P;1(k) u we obtain P_,(k) - P (k‘k-l) - C(k)M(k)P (k‘k-l) “ a ti . P~(k|k-l) - Pu(k)MT(k)P;1(k)M(k) P~(k|k-1) u u for k I 1,2,... . For k I O, P (0) I Ph(0), since 60 I uo. QED {I A block diagram of the filter is shown in Figure 3.2 which is a representation of (3.38). 
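In code the filter takes only a few lines. The sketch below is again illustrative only (Python with NumPy; the scalar model and all numerical values are invented). It performs one step of (3.38)-(3.41), using the fact established in the proof that the filter gain C(k) coincides with the single-stage predictor gain, and it propagates the covariances through P_ũ(k+1|k) = P_ũ(k) + Q(k).

import numpy as np

def filter_step(u_prev, P_pred, y, M, Pv, Q):
    """One step of the optimal filter (3.38)-(3.41).
    Inputs:  u_hat_{k-1}, P_u~(k|k-1) and the observation y_k.
    Returns: u_hat_k, P_u~(k) and P_u~(k+1|k) for the next step."""
    S = M @ P_pred @ M.T + Pv                    # innovation covariance
    C = P_pred @ M.T @ np.linalg.inv(S)          # C(k) = G(k+1|k)
    u_filt = u_prev + C @ (y - M @ u_prev)       # (3.38)
    P_filt = P_pred - C @ M @ P_pred             # (3.41), in the form used in the proof
    P_pred_next = P_filt + Q                     # P_u~(k+1|k) = P_u~(k) + Q(k)
    return u_filt, P_filt, P_pred_next

# Scalar illustration with invented numbers.
rng = np.random.default_rng(1)
M, Pv, Q = np.array([[1.0]]), np.array([[0.25]]), np.array([[0.05]])
u = rng.normal(0.0, 1.0, size=1)                 # u_1, with P_u(1) = 1
u_hat, P_pred = np.zeros(1), np.eye(1)           # u_hat_0 = 0, P_u~(1|0) = P_u(1)
for k in range(1, 15):
    y = M @ u + rng.normal(0.0, np.sqrt(Pv[0, 0]), size=1)     # y_k
    u_hat, P_filt, P_pred = filter_step(u_hat, P_pred, y, M, Pv, Q)
    u = u + rng.normal(0.0, np.sqrt(Q[0, 0]), size=1)          # martingale step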
By comparing Figures 3.1 and 3.2, we see that the single-stage predictor and filter for the stochastic process {“k’ k I 0,1,2,...} based on the same observation record have exactly the same structure: III” fiP I (In, G(°|°), In) , the single-stage predictor 2F - (In. c(-). In) , the filter . The computation, of filtering differs from the computation of single-stage prediction, in the determination of the filtering error covariance matrix. To compute the filtering error co- variance matrix P (k), one may use (3.41) or the following 8 59 equality (cf. (3 .7)): p (k) = P (k+1|k) -Q(1t) 1t - 0,1,2,... . ~ {I A u A Input of 2p :Dynamical System 21" Output of 8F E yk k‘k- C(k):k k_ 3 (1k 2k . Gk ‘E’ ‘n X m 2 1.. Observation Inno—. _Corr ction Current estimate vation I 140:) W I in“ Unit L . de- k x l M( kIl n m [Previous a estimate I kIlflvu, Mo-o Figure 3.2. Block diagram of filter. We now state without proof the result of Theorem (3.37) for the signal [xk I 900%. k ' oslszs-n) (Cf- (1J0). assuming that u 6 R(QT(k)), k I 0,1,2,... . k (3.42) COROLLARY. The filtering (2“, Pica), k I 0,1,2,... for the signal {xk I Q(k)uk, k I 0,1,2,...} is accomplished via the following equations: (a) The stochastic process {ik’ k I 0,1,2,...}, which is defined by the single-stage prediction estimate, is a zero-mean wide-sense Markov process, and is generated by the recursive equation (3.43) R - Q(k)§+(k-1)fik_1 + KOOIPk - H(k)9(k)9+(k-l)fi k k-1] for k I 1,2,... . The initial condition is *0 I 0, where K(k) is the n X m gain matrix and is given by (3.44) R(k) - P. (1011" (k) P;1(k) x 60 (b) The stochastic process {fik’ k I 0,1,2,...], which is defined by the filtering error ik that satisfies the linear stochastic difference equation (3.45) gk = [o¢*rk-1) - xn¢+kk-l)]§k_1 + [Q(k) - K(k)H(k)9(k)](uk - u - K(k)vk k-l) for k I 1,2,... with the initial condition i0 I x0 I §(0)uo, is a zero-mean wide-sense Markov process whose covariance matrix is given by the following equation: T (3.46) P;(k) = 4¢+P (k-m+ 9T x i T - + - P;(k)HT(k)Bvl(k)H(k)P (k‘k-l)§ (k-l)§T(k) x i for k I 1,2,... with the initial condition P§(O) I P;(O) I 4(0)ph(0)4T(0). 3.3 OPTIMAL SMOOTHING FOR BP. Recall from Chapter 1 that the optimal smoothing problem deals with estimate of the signal at a time k based on the observation record Y(L) I {y1,y2,...,yL}, where k‘<.L: that is, the time at which it is desired to estimate the signal precedes the time of the last observation yL. Just as in the case of prediction, the smoothed estimate of a signal is classified according to possible relationships between the two time indices, L and k. Depending upon how the time indices L and k vary, analogous to the prediction, three classes of smoothing can be defined (cf. [M-6], [M-7]): a.“ 61 (3.47) DEFINITION. (a) Fixed-interval.smoothing: G , L I L I H; fixed-positive integer, k I 0,1,2,...,L-1. (b) Fixed-point smoothing: fik‘L’ k I‘N I fixed-positive integer, 4, - n+1, n+2,... . (c) Fixed-lag smoothing: fi , L I k+L, L I fixed-positive kl; integer, k I 0,1,2,... . Before examining each of these classes separately in terms of developing algorithms, we seek a general formula to the optimal smoothing. First we recall from (3.18) that the observation process has full rank, so, by (2.27c) the optimal smoothing (flkJL, P%(klc)), k.< L is accomplished via the following equations with the initial condition (fik, f~(k)): u L 614; - 0k + “glam, i‘i-l)§1‘1_1 , ,, -1 “(1" 1‘1'1) " [“141-1' S7:1|1-1]w1|1-1’ 9'1|1-1] ' L P (klu - P (k) - z 00:. 1|1-1)[§ . 
a 3 a ti 1-k+1 1| 1'1 kl 1'1 where, by (3.18), ini-l ' yi ' M(imqiul and ... ~ '1‘ a 1 .. . [’ili-l’ yi‘H] M( )Pfiuli 1|)M (0+ Pvu) Notice that in these equations, which are valid for all the classes of smoothing, the only unknown is the n x m ‘matrix [uk‘i-l’ yi‘i-l] i I k+1,k+2,...,{,. This matrix is determined as follows: 62 vk, k I 0,1,2,... is a white-noise process and vi l'uj’ i,j = 0,1,2,... 0 3V1 J-fik‘i-l i=1,2’ooo, k-0,1’2,ooo a [“k|1-1’ yi‘i-l] [uk‘i-l’ M91‘1-1'“1‘1-1 ' G<1+1‘1)§1|1-1'+ + ui-l-l - “1] . [gunman-1] - c(k.i|1-1)[91‘1_1.fi1‘1-1] - [ak‘1_1,§1‘1_1]cr(1+1| 1) + + G(k,i| 1-1)[§1‘1_1,91‘ 1_1]cT(1+1‘ 1) 63 for i = k+1,k&2,... . In obtaining the last equality we used the fact that u - u for ki< 1 1+1 1 * fik‘i-l’ y1|1-1 which is an easy consequence of (1.4) and (A.1.l). Substituting (2.34) for C(k, i‘i-l) and (2.29) for G(i+l|i) into this last equation, and performing the necessary operations we obtain P60" 1+1“) = [filth-1’ 61‘1-1] ' [fik‘1-1’ 3"1‘1-1] [mi-1' y1‘1-11-1W1h-1’ “111-11 for i I k+l,k+2,... . Since ~ ~ d [uk‘1_1, ui|i_1] - P&(k, 1|1-1), [y1‘1_1,91‘1_1] . M(i)P%(i|i-1)M?(i) + 2v(1) by (3.18), [fik‘i-l’yili-l] = P;(k, 1|1-1)uF(1) by (3.48), u u we can express P (k, i‘i-l) as -.\\ fi ..¢~ P;(k, 1+1|1) - P;(k, 1|1-1) - P;(k, 1|1-13u3(1)tu(1)p (1|1-1)u?(1) + u u u E + Pv(i)]-1M(1)P~(i|i-l) u for 1 - k,k+1,... with p (k, 141.4) - r (k‘k-I) as the initial G a condition. This completes our derivation of a general formula for Optimal smoothing, and we summarize our results below. 64 (3.49) THEOREM. Optimal smoothing (fiklt’ P;(k‘L)), k4< L for u the signal u k I 0,1,2,... is accomplished via the following k9 equations with (fik, P;(k)) as the initial condition: u (a) Optimal smoothed estimate is given by the algebraic equation L (3.50) a - o + z C(k, 1 1-1)[y - M(i)fi 1‘“ L 1=1t+1 ‘ i ”(’11 where the n X m matrix C(k, i‘i-l) is (3.51) C(k, 1|1-1) - P (k, 1|1-1)hT(1)[u(1)P (1|1-1)uT(1) + Pv(1)]‘1. a h (b) The n x n cross-covariance matrix P (k, i‘i-l) satisfies a the recursion (3.52) P (k,i+1| 1) - P (k,i‘i-l) - P (k,iIi-l)MT(i)[M(i)P (i|i-l)MT(i) + ti ti 6 u + Pv(1)]")1(1)P (1|1-1) a for i I k,k+l,1,... with the initial condition P (k, k‘k-l) I G P (k‘k-l). E (c) Optimal smoothing error covariance matrix is given by the algebraic equation L (3-53) P (144) - P (k) - z: 60:, 1|1-1)u(1)P (1, 141-1) . a G 1-1 G This theorem provides a solution to general optimal smoothing problem for the signal u It is an easy’task to show k. that the optimal smoothing (at P (140), for the signal i k|(,’ xk I Q(k)uk such that uk 6 R(§T(k)), is accomplished via Theorem 3.49 where u is replaced by x and M1 is replaced by 8. Thus the optimal smoothing equations for the Kahan signal (cf. 65 (2.17)) are exactly those given in (3.49) with x replaced by u and H is replaced by M. This solution, for discrete-time, Kalman signals, is not known at the present time to the best of the author's knowledge. (For previous works in this area see e.g. [H-6], [K-Z], [M—S]. [K-l]. [W-5]-) The Optimal estimation equations given in Theorem 3.49 are obviously valid for all the classes of optimal smoothing. They may be written in different forms for each class as in the case of optimal prediction. In the following we shall give the forms of these equations for the single-stage smoothing (fixed- lag smoothing with lag 1) and fixed-point smoothing. SINGLE-STAGE SMOOTHING. Here we wish to obtain the optimal estimate 6 for k I 0,1,2,... . 
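Before specializing (3.49) to these cases, it may help to see the recursion (3.50)-(3.53) organized as a single routine. The sketch below is illustrative only (Python with NumPy); the argument names, the time-invariant M and P_v, and the convention that the filtered and single-stage predicted quantities have already been stored by the algorithms of Sections 3.1 and 3.2 are mine, not the thesis's.

import numpy as np

def smooth(k, ell, u_filt, P_filt, P_pred, u_pred, y, M, Pv):
    """Optimal smoothing (3.50)-(3.53) of u_k from the data Y(ell), k < ell.
    Assumed inputs, indexed by i:
      u_filt[i] = u_hat_i,        P_filt[i] = P_u~(i),
      u_pred[i] = u_hat_{i|i-1},  P_pred[i] = P_u~(i|i-1),
      y[i]      = observation y_i,
      M, Pv     = M(i) and P_v(i), taken time-invariant for simplicity.
    Returns (u_hat_{k|ell}, P_u~(k|ell))."""
    u_s, P_s = u_filt[k].copy(), P_filt[k].copy()
    # Running (3.52) once from its initial condition P_u~(k,k|k-1) = P_u~(k|k-1)
    # gives P_u~(k,k+1|k) = P_u~(k), so the running cross-covariance P_u~(k,i|i-1)
    # is started from the filtering covariance.
    P_cross = P_filt[k].copy()
    for i in range(k + 1, ell + 1):
        S_inv = np.linalg.inv(M @ P_pred[i] @ M.T + Pv)
        G = P_cross @ M.T @ S_inv                                  # gain (3.51)
        u_s = u_s + G @ (y[i] - M @ u_pred[i])                     # estimate (3.50)
        P_s = P_s - G @ M @ P_cross.T                              # covariance (3.53)
        P_cross = P_cross - P_cross @ M.T @ S_inv @ M @ P_pred[i]  # cross-covariance (3.52)
    return u_s, P_s

Taking ell = k+1 reproduces the single-stage smoother written out next, and holding k fixed while ell increases gives the fixed-point form.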
By letting L I k+l 14 H1 in (3.49) and noting (3.2), (3.37) we get the results: ak‘ H1 - 11k + C(k, k+1|1t)[yk+1 - M(k+l)€lk], C(k. k+1|k) - P~(k)MT(k-I-l)[M(k+l)P (k+1|k)uT(k+1) + Pvat-a-ln'1 u 5 P (It, k+1|k) a P (k) , s s P (k|k+1) = P (k) - C(k, k+l‘k)M(k+l)P (k) a G a for k I 0,1,2,... with the initial conditions Go‘o I 00 I 0, P~(o|o) - Pu(0). U FIXED-POINT SMOOTHING. Recall from (3.47b) that the optimal fixed- point smoothing problem deals with the estimate fiNIL where N I positive-integer and L I N+l,N+2,... . Let k I N be fixed in (3.49), then from (3.50) it is seen that 66 L-l “up, . “h - 1.3“ cm, 1|1-1)[y1 - “(1)°1|1-1] 4” Gm. L‘L-DUL ' “(UGL‘LJJ (3.54) = “Nu-1 + GCN. LlL-DEyL - M(L)uL|L_1] where, by (3.51), (3.55) G(N,L|L-1) - P~(N.L|L-1)MT(L)[M(L)P~(LIL-1)MT(L) + PVmJ"1 U u where P (N, L‘L-l) is given by (3.52). E To complete the derivation of optimal fixed-point smothing equations we need to find an expression for the smoothing error- covariance matrix P GN‘L). From (3.53) it follows that the u covariance matrix satisfies the recursion (3.56) P~(NIL) = P~(N|L-1) - cm, LIL-1)M(L)P~(L. le-l) u U U for 4, - N+l,N+2,... with P (N‘N) - P (N) as the initial con- fl 6 dition. Thus we have found that the optimal fixed-point smoothing k accomplished via the recursive equations (3.54), (3.55), (3.52) (ft-Nu: P (MD), L I N+l,N+2,... for the signal u is a and (3.56), with the initial condition (8“, P (N)). When equa- u tions are written for a Kalmsn signal,i.e. u is replaced by x (cf. (2.17)) and M is replaced by a, one obtains a new procedure in smoothing of Kalmsn signals. 67 3.4 AN EXAMPEE Let us consider a scalar stochastic process {xk, k I 0,1,2,...}, which has zero mean and whose covariance function is Pk(k,j) I e'(k+i), where o I constant > 0. Suppose that we observe this 0 process in the presence of a zero mean white noise process {vk, k I 1,2,...} for which 6{vk vj] I r(k)6kj and 6{vkxj}I0 Vk,j, where OSr(k) O . Since [uk - ui, ui] I a - a I 0 V k,i I 0,1,2,..., the stochastic process {uk’ k I 0,1,2,...] is a wide-sense martingale with zero mean and constant covariance function 0. Hence, I! _. \~. 68 xk I e'kuk, and it is a wide-sense Markov process. Now, we are ready to compute the Optimal estimation equa- tions for the process {xk, k I 0,1,2,...} which is observed by (3.57). We first note that since Ph(k) I a I constant, Q(k) fl Pu(k+1) - Ph(k) I 0 (cf. (1.4)). Therefore it follows from (3.17) that (3.53) P (k+1|1t) . P (k) W: - 0,1,2,... . a 6 Hence, the optimal single-stage prediction and filtering for the process {uk’ k I 0,1,2,...] are accomplished via the same set of equations (cf. (3.22), (3.37)). Note that for the process {xk, k I 0,1,2,...} we have (cf. (2.22), (3.58)) P~(k+1‘k) - .e'2(k+l)e"'2k P (k) x i (3.59) - e’2P (k) W - 0,1,2,... i We summarize the results below. The computations here are exceedingly simple hence the derivations of these results will not be demonstrated in detail. OPTIMAL REDICHON. In the following, N and L denote fixed- " positive integers (a) Fixed-interval prediction (cf. (3.3), (3.6)): For k - L+1’L+2’.OO . _ _ -(k-L) “k‘ L 0L ’ fik‘ L e gL -2 (k-L) P~(k| L) - P~(L). P~(k‘L) - e P~(L) u u 3! X 69 (b) Fixed-point prediction (cf. (3.13), (3.14)): a - a ’ - '01-!) “up, L ’ML 8 it p cm) - P (L) . P (ML) - 8-2m-L)P (t) 5 {I § § (c) Fixed-lead prediction (cf. (3.15), (3.17)): For L I 0,1,2,... fi 6 , \ 3 8-1%; , Hm, ' z, . girl-ML P__(L+1|L) - {(1).} P (mm - em”? (1.) . u u 3': 5E OPTIMAL SINCE-STAGE PREDICTION AND FILTERING. 
(a) Optimal estimate: From (3.38) (or (3.23)) we have Gk - a!“ + c;(1<)[yk - e'knk_1] =9 5%,, - e'kkd + x(h)[yk - e'lakd] for k - 1,2,... with ac - 00 (b) Gain matrix C(k): From (3.39) (or (3.38)) -k P..(1t)e Pi. (1t) G<*>"’*'?a?)' = ‘0‘) 'as' (c) Error covariance matrix: From (3.41) and (3.58) P~(k+1|k) a P~(k) u U P.,(k-1)e'2k - P (k-l) - 4:71? P (k-l) a e __P (It-1)+t(k) a a t (k) PG (k- 1) e-ZkP~(k- 1)+r(k) u (3 . 60) I for k I 1,2,... with P (0) I a. It is easily shown that (3.60) E with the initial condition P (0) I a has the unique solution {1' 70 1 (3.61) P~(k) I k _21 k I 0,1,2,... . u 2 e +.l i=1 r(i) 0 Hence, from (2.22) we have e.2k l (3.62)P§(k)I k e'2(i'k) em‘ I k e_2(1_k) eZk k I 0,1,2,... . + ._______. ___ if, r(i) a 151 1(1) *' a Notice that C(k) I 1 k e-(EI-k) ek ’ r(k) (131 W + r) (b), (3.61) and (3.62) l (k) ' k e-2(1-k) e2k ° r(k)( Z W + 7) iIl OPTIMAL SPOOTHING. We shall only derive the optimal estimation equations for general smoothing (cf. (3.49)). ‘We see from the equations given (3.49) that the only quantity that needs to be determined is P (k, j|j-l), the other quantities in these equa- u tions are computed above for the filtering. From (3.52) and (3.58) we have e-szfi(j-1) ..r4 P (k.1+1|1> - P (k.J|J-1)[1 - -:11 1 fi fi e Pg(J-1)+r(1) r(i) e'23P3(1-1)+r(1) ' 2~(ksj‘j'l) U for j I k,k+1,..., with the initial condition P (k,k|k-l) I u P (k‘k-l). It is shown that the tlnnique solution of this equation is a i- P~(k-1) n r(i) ‘1 iIl P (k,j|j-l) - 1-1 2 j 2 k+l . " - 1 “ n [e P3<1-1)+r<1>] iIk ‘b h 71 For the stochastic process {xk, k I 0,1,2,...} this equation becomes j-l e.(j+l)P_.(k-l)ig1 r(i) {awn-1) = j—_1———L j 2 1+1 . x n [P (i-l)+f(1)] iIk x This completes our discussion on the example. We note here that the above solutions for this simple example can be obtained by other techniques as well. One will obtain the same result. as»; CHAPTER 4 CROSS-CORREIATED NOISE HOBLEM (CCP) In this chapter, we derive the optimal estimation equations for the cross-correlated noise problem.(A.l.2), that is, the estimation equations for the process {xk I Q(k)uk, k I 0,1,...] where the process {uk+l - uk, k I 0,1,2,...} and output noise {vk, k I 0,1,2,...} are correlated (cf. Assumption A.1.2). As we did in the preceeding chapter, we shall derive the equations for the process {uk, k I 0,1,2,...} then use (2.22s) to obtain equations for the process {xk I Q(k)uk, k I 0,1,...}. 4.1 OPTIMAL PREDICTION FOR CCP We first derive a general formula of the prediction for the problem of interest then seek the form of this formla for the three distinct classes of the prediction problems introduced in Chapter 3. The general formula is derived easily by using the 5,; orthogonal projection lemma as follows: The linear minimum mean-square predicted estimate of the signal uk based on the observation record Y(L) is n Ok‘, (uk\L2(y.(,)) k > L where k,L . 0,1,2,... . By adding and subtracting the term “4+1 to uk and using the linearity of the orthogonal projection we get 72 73 . - _ n . n . “RH (uk u “III-2W .m + amazes» k > L. From the assumptions made in (1.4), (A.1.2) we have 1, and u “I. L+1 k ' uL+1 * Vt ’ n for all k > L. Therefore uk - “0+1 1 L2(y,L) and 6,4, - «Wham» ((4.1) I GL'H-‘L for k > L. From this result we observe that the general predicted estimate is equal to the single-stage predicted estimate. SO we need to develop an algorithm for the single-stage predicted estimate. From this algorithm‘we may obtain the three classes of prediction by using (6.1) as shown below. 
We now state the predictor (4.1) and the corresponding covariance matrices of the estimation errors for the classes of prediction, that is, the equations for prediction (“kit’ P;(k‘L)) k >.L are developed for the three distinct classes. The dzrivation of these equations is straightforward and will not be demonstrated here. FIXED-INTERVAL PREDICTION. Fixed-interval prediction (ak‘L’ P (k‘L)) k > L, L I fixed positive integer, is accomplished 6 via the following equations with the initial condition (a . P (L+1|L)): fi IH-l‘L (4'2) fik‘L ' GL+1‘L ’ (j 74 (4.3) P~(k|L) - P~(L+1‘L) + Pu(k) - Pu(L+1), 1t 2 L+1 u u k-I a P (L+1|L) + 2 (2(1)- 6 1-L+1 FIXED-POINT PREDICTION. Fixed-point prediction (111,”, P (M0) E L I 0,1,...,N-1, N I fixed positive integer, is accomplished via the following equations with the boundary condition (%’ P (N)): u 4.4 t. - a , ( ) N‘L H111. (4.5) P~(N‘L) - P~(L+1|L) + Pom) - Pawn . U U FIXED-LEAD PREDICTICN. Fixed-lead prediction (at +1., L, P (L+L|L)) E L I 0,1,2,... with lead L I fixed positive integer, is accomplished via the following equations with the initial condition (oL‘o, P~(L|o)) - (o, Pu(L)) (cf. Remark (3.10) of Chapter 2): u (4.6) a “1+th ' all; ’ (4.7) PE(L+L‘L) - P~(L+1‘L) + Pu(L+L) - Pu(L+l) . u Note that in order to compute the above three classes of prediction we need to know the fixed-lead prediction with lead one, i.e. the single-stage prediction. In the fixed-interval pre- diction, the only value of the single-stage estimate we must know 13 fl , where L is the end-point of the observation interval L+1| L and it is fixed. For the fixed-point and fixed-lead prediction we need to process the single-stage prediction algorithm as the data arrives. 75 In the following,wwe develOp a recursive algorithm for single- stage prediction based on the previous estimate and the new observa- tions To do this we need the innovation lemma for the observation process defined by (1.5) and (A.1.2). (4.8) INNOVATION LEMMA. The observation process {yk’ k I 1,2,...} defined by (1.5) and (A.1.2) has full rank, i.e., Wk+1|h5h+1|k3 . M(k'l-l) Pfi(k+l|k)MT (PM) + Pv(k+l) > o for all k - 0,1,2,... . PROOF. By definition d y1t+1|lt ' yk+l ' (yk+1‘1'(y‘k))’ Since yk I M(k)uk + vk, we have - M(ki1)8 k I 0,1,2,... yea-int ' y1t+1 1t+1|1t I °k+1|k ' Recall from (1.4) and (A.1.2) that v , j I 0,1,2,... is a white- .1 noise process such that [uk+1 - uk, VJ] I C(k)6kj and uo .Lvj 1,1 for all k,j I 0,1,2,... . It follows that (4.9) [uk, v1] I O V'k s j, k,j I 0,1,2,... . Thus VH1 1 L(y;k) and therefore (le|L(y;k)) - 0. Then the innovation vector is (4.10) yk-i-l‘k I yk+l - M(k-l-1)(ik+1‘k I M(k'Il-l)il'k+1‘k + vb”, k I 0,1,2,... 76 Using the expression (4.10) for the innovation vector we Obtain Wk-tl‘k’yk-u‘k] B [M(k+1)uk+l|k + le'ua‘flmmqk "' VH1] - M(k+l)P (k+1‘k)MT(k+1) + Pv(k+l) s for k - 0,1,2,... . s1hee P (k-I-l‘k) 2 o and by (A.1.2), a Pv(k+l) > o the matrix ” 0 V k I O l 2 ... [yk-i-l‘k’ 9k+l‘k] > ’ ’ ’ as desired. QED (4.11) THEOREM. The single-stage prediction (0 , P (1t+1|1t)) . 1t+1‘1t u k I 0,1,2,... of the signal {uk, k I 0,1,2,...] is accomplished as follows: (a) The stochastic process {0k+l‘k’ k I 0,1,2,...}, which is defined by the single-stage predicted estimate °k+1‘k’ (4.12) fik+1|k - °k\k+1 + G(k+1‘k)[yk - M(k)Ok‘k_1] for k I 1,2,... with fil‘o I O as the initial condition, is a zero-mean wide-sense martingale. The predictor gain matrix C(k-i-l‘k) is (“'13) ““1“" " [P (Rik-D1300 + C(k)][ump (k|k-1)MT(k) + oncn'l. 
a a (b) The stochastic process {fik+l|k’ k I 0,1,2,...}, which is defined by the single-stage prediction error fik+l‘k satisfying the linear stochastic difference equation (4.14) 6H1“, - [In - G(k+l‘k)M(k)]t"i - c:(1t+1|1t)yk + u - u k‘k-l k-l k L] 77 for k I 1,2,... with the initial condition 3 I ul, is a wide- l‘o sense Markov process. The covariance matrix for this process is determined by the recursive equation (4.15) P (h+1|k) - P (k‘k-l) - [P (k‘k-1)Mi(k) + C(k)][M(k)P (k‘k-1)M1(k) + a a a a R(k)]-1[M(k)P;(k‘k~1)MI(k) + CT(k)] +-Q(k) u for k I 1,2,... with P (1‘0) I Ph(l) as the initial condition. 5 PROOF. (a) The proof of the fact that the process {fik+1‘k’ k I 0,1,2,...} is a zero-mean wide-sense martingale is similar to the one that is given in (3.22s) and hence omitted. From (2.278), (4.1) and (4.10) it is seen that, o + G(l¢+1‘k)[yk - M(k)0 k+1‘k ' fik‘k-l k‘k-l] for k - 1,2,..., and b - o for k - o, a1hee by (4.8) the l‘o observation process has full rank. Now, it remains to find an expression for the predictor gain matrix G(k+l‘k). From (2.29) and (4.3), we have C(h+1|k) - [ 51 ~ -1 “k+1\k-1’ yk‘k-l][yk‘k-l’ yk‘k-IJ e [fik+l\k-l’ 9k‘k_1][M(k)P%(k‘k-1)Mr(k) +-Pv(k)]'1. To complete the determination of the gain matrix we must compute the matrix [uk+1‘k-l’ yk‘k-l]° This is done as follows: 78 [Gk-tl'k-l’ yhut-1] " ["kn ' °k+1|k-1’ 9141M] ' [“k+1’ yapt-11' ”in“ yk‘k-l ‘ Ok‘l-l‘k-l e L;(y‘k'1) . [uk+1, 11006141“) + [uk+1, vk], by (4.10) 2 [uk, fik‘k_1]MT(k) + (“kn - uk, vk], by (4.9) (4.16) = P~(k‘k-1)MT(k) + C(k), a1hee u U k ' fik‘k-l I “1410-1 and ak‘k-l .L nk‘k-l . Substitution of this result into the expression for the predictor gain matrix, yields G(k+1,k) I [P~(k‘k-1)MT(k) + C(k)]EM(k)P (k‘k-1)MT(k) +'Pv(k)]-l. u 5 (b) By definition .. g _. _ uk+1‘k uk+1 uk‘i’l‘k’ k 0,1,2,... Substitute (4.12) into this equation, and rearrange the terms to Obtain fik+1‘k ' “H1 ' Ok‘k-l ' C(Hl‘k)[yk ' Mok|kIIJ 1,1 - fik‘k_1 - G(k+1‘k)[M(k)fik‘k_l + vk] + ulc+1 - uk (4.14) - [In - G(k+1|k)M(k)]fi' - c;(h+1|1t)yk + u - u k‘k-l 1t+1 k for k I 1,2,... . For k I O, obviously O I ul, since l‘o A procedure analogous to the one that is used in (3.22b) shows that the stochastic process {fik+l‘k’ k I 0,1,2,...} is a zero-mean wide-sense Markov process. The covariance matrix of this process is given by (2.30): I! ‘5 ‘1‘ 79 P~(k+1‘k) - P~(k+l‘k-l) - c°1|1~1 and If the steps which lead to (3.48) and (3.52) are repeated for the present case, their results [uk|i-l’yi|i-1]- Pfi (k, i|i-1)MT (i), by noting v11.uk‘1_1 and P~(k,i+l‘i)IP (k,1|1-1)-P (k,i‘i-1)Mx(i)EM(i)P (1|1-1)M?(1)+Pv(1)]'bu(1)P (1|1-1) u E E E E for 1 - 1t, 1t+1,... with the initial condition P (1t,1t|1t-1) - 11 P (k‘k-l). a (1 Thus we have found the following Optimal estimation equa- t1ohs for the smoothing (0144' P (km) k < 4, with the 1h1t1a1 a condition (0k, P (k)): O L (4.27) 8k|LIfik+ 1: cat, 1|1- l)[yi- M(i)0= 1-k+1 1‘1'1] (4.2s) c(1t,1|1-1) - P~(k,i‘i-1)MT(1)[M(1)P (i‘i-1)MT(i) + Pv(1)]"1 u 6 (4.29) P (k,i+1‘ i)IPfi(k,i|i-l)-P (k,i‘i-1)MT(i)[M(i)P (i‘i-1)MT(i) a s s + Pv(1)]'1M(i)Pfi(i‘i-l), P (1t,1t\1t-1) - P (Mk-I): a a ll .... ‘\.s 86 1. (4.30) P (km I P (k) - z G(k,i|i-l)M(i)P (1t,1|1-1). a 11 1'10”. 6 Note that these equations have exactly the same form of those obtained for the BP (cf. (3.50)-(3.53)). The cross-correla- tion effecusthese estimation equations only through the single- stage prediction-error covariance matrix P (i‘i-l). We expect 6 1 is correlated with 111 if j 2 1+1. 
The optimal smoothing equations (4.27)-(4.29) for the signal u_k, which are valid for all the classes of smoothing, hold for the signal x_k = Φ(k)u_k with u_k ∈ R(Φ^T(k)) and hence for Kalman signals (cf. (2.17)). The form of these equations for single-stage smoothing and fixed-point smoothing is easily derived by repeating the steps leading to the analogous results given in Chapter 3. The equations derived here for Kalman signals extend well-known results (cf. [M-4]) to the cross-correlated noise case, which was not previously solved to the author's knowledge.

SINGLE-STAGE SMOOTHING. Optimal single-stage smoothing (û_{k|k+1}, P_ũ(k|k+1)), k = 0,1,2,..., for the signal u_k is accomplished via the following equations with (û_0, P_ũ(0)) = (0, P_u(0)) as the initial condition:

û_{k|k+1} = û_k + G(k,k+1|k)[y_{k+1} - M(k+1)û_{k+1|k}] ,
G(k,k+1|k) = P_ũ(k)M^T(k+1)[M(k+1)P_ũ(k+1|k)M^T(k+1) + P_v(k+1)]^{-1} ,
P_ũ(k,k+1|k) = P_ũ(k) ,
P_ũ(k|k+1) = P_ũ(k) - G(k,k+1|k)M(k+1)P_ũ(k) .

FIXED-POINT SMOOTHING. Optimal fixed-point smoothing (û_{N|L}, P_ũ(N|L)), L = N+1,N+2,..., for the signal u_k is accomplished via the following equations with (û_N, P_ũ(N)) as the initial condition:

û_{N|L} = û_{N|L-1} + G(N,L|L-1)[y_L - M(L)û_{L|L-1}] ,
G(N,L|L-1) = P_ũ(N,L|L-1)M^T(L)[M(L)P_ũ(L|L-1)M^T(L) + P_v(L)]^{-1} ,
P_ũ(N,L+1|L) = P_ũ(N,L|L-1) - P_ũ(N,L|L-1)M^T(L)[M(L)P_ũ(L|L-1)M^T(L) + P_v(L)]^{-1} M(L)P_ũ(L|L-1) ,    P_ũ(N,N|N-1) = P_ũ(N|N-1) ,
P_ũ(N|L) = P_ũ(N|L-1) - G(N,L|L-1)M(L)P_ũ(L,N|L-1) .

CHAPTER 5
COLORED NOISE PROBLEM (CNP)

Having treated the problem of optimal estimation for uncorrelated and cross-correlated observation noises (cf. (A.1.1), (A.1.2)) in the preceding two chapters, we turn now to the estimation problem for cross-correlated colored observation noise. This is the most general problem studied in this dissertation.

The colored noise problem in Kalman filtering theory was first discussed by Cox [C-1], Bryson and Johansen [B-5], and Bucy [B-9], and by Bryson and Henrikson [B-6], Stear and Stubberud [S-4] and others [M-7], [Z-5], [F-2]. Bryson and Johansen's work was based on (i) the "augmented state" procedure suggested by Kalman [K-4], and (ii) the assumption that the colored noise is generated by a given linear difference (or differential) equation forced by white noise. The augmented state procedure has not been widely used because it leads to ill-conditioned computations in constructing the filter. Assumption (ii) has been used in all investigations published to date.

Here we solve the colored noise problem by reducing it to a wide-sense martingale noise problem and then applying the technique developed in the preceding chapters. This is a more direct method than the previous work in this area; it gives a different perspective and yields new results. An advantage of this approach is that the colored noise, which is a wide-sense Markov process by assumption, need not necessarily be given by a linear difference equation with white noise input. In addition, the "martingale" approach provides optimal estimation algorithms for observations corrupted by additive wide-sense martingale noise. This case has not been considered in the existing literature as far as the author knows. We shall begin our study by reformulating the problem in the following section.

5.1 REFORMULATION OF THE PROBLEM

Recall from Section 2 of Chapter 1 that the signal and observation are described by, respectively,

(5.1)    x_k = Φ(k)u_k ,    k = 0,1,2,...
(5.2)    y_k = M(k)u_k + v_k

where

(5.3)    v_{k+1} = ψ(k+1,k)v_k + n_k ,    k = 0,1,2,... .
The assumptions on the initial conditions x_0 (or u_0), v_0 and the output noise are the same as those stated in (1.4) and (A.1.3). The matrices Φ(k) and M(k) have been defined in (1.4) and (1.5) respectively. Note that the unique solution of (5.3), with the initial condition v_0, is

(5.4)    v_k = ψ(k,0)v_0 + Σ_{i=1}^{k} ψ(k,i)n_{i-1} = ψ(k,0)[v_0 + Σ_{i=1}^{k} ψ(0,i)n_{i-1}] ,    since ψ(k,i) = ψ(k,0)ψ(0,i) ,

for k = 1,2,... . Now, define

(5.5)    m_k ≜ v_0 + Σ_{i=1}^{k} ψ(0,i)n_{i-1}  if k = 1,2,... ,    and    m_0 ≜ v_0 .

Then it follows that m_k satisfies the linear stochastic difference equation

(5.6)    m_{k+1} = m_k + ψ(0,k+1)n_k

with m_0 = v_0 as the initial condition. By Assumption A.1.3, v_0 ⊥ n_k for all k = 0,1,2,...; hence, in view of (2.14), the stochastic process {m_k, k = 0,1,2,...} is a wide-sense martingale process with zero mean, and also

(5.7)    [m_{k+1} - m_k, m_{k+1} - m_k] = ψ(0,k+1)P_n(k)ψ^T(0,k+1) ,    k = 0,1,2,...

where P_n(k) ≜ [n_k, n_k]. In addition,

(5.8)    [u_{k+1} - u_k, m_{j+1} - m_j] = C(k)ψ^T(0,k+1)δ_{kj} ,    k,j = 0,1,2,... ,

since, by (A.1.3), [u_{k+1} - u_k, n_j] = C(k)δ_{kj}. Finally we note from (1.4) that [u_0, n_k] = 0 for all k = 0,1,2,..., which implies

(5.9)    [u_0, m_{k+1} - m_k] = 0 ,    k = 0,1,2,... .

From (A.1.3) and (5.5) we conclude the following useful results:

(5.10)    u_{k+1} - u_k ⊥ m_j    and    m_{k+1} - m_k ⊥ u_j    for all j ≤ k, k,j = 0,1,2,... .

Observe from (5.4) and (5.5) that v_k = ψ(k,0)m_k, k = 0,1,2,... . Hence, (5.2) and (5.3) can be combined in one equation as

(5.11)    y_k = M(k)u_k + ψ(k,0)m_k ,    k = 1,2,... .

The problem is thus to find the minimum mean-square estimate of the signal x_k = Φ(k)u_k, or, equivalently, in view of Section 3 of Chapter 2, of the signal u_k, from the data Y(L) = {y_1, y_2, ..., y_L} when u_k is related to the data Y(L) by (5.11). The solution to this problem is given in the following three sections.

5.2 OPTIMAL PREDICTION FOR CNP

As in the preceding two chapters, we first derive a formula for the general predicted estimate û_{k|L}, k > L. To do so, we proceed as follows:

û_{k|L} ≜ (u_k | L^n_2(y;L)) ,    by the orthogonal projection lemma,
        = (u_k - u_L + u_L | L^n_2(y;L)) = û_L + (u_k - u_L | L^n_2(y;L)) ,    k ≥ L, k,L = 0,1,2,... .

From (1.4) and (5.10), we see that u_k - u_L ⊥ L^n_2(y;L). Therefore

(5.12)    û_{k|L} = û_L ,    k ≥ L, k,L = 0,1,2,... ,

where û_L is the optimal filtered estimate at the present observation time L. It is assumed that the filtered estimate û_L is obtained by using the filtering algorithms which will be derived in Section 3 of the present chapter.
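As a quick numerical check of the reformulation of Section 5.1 (the check is mine, not part of the thesis), the sketch below simulates a time-invariant version of (5.3) with invented matrices Ψ, M and P_n, and verifies that m_k = ψ(0,k)v_k obeys the increment relation (5.6) and that the rewritten observation (5.11) agrees with (5.2). Here ψ(k,j) = Ψ^{k-j}, and Ψ is taken invertible, as a transition matrix must be.

import numpy as np

rng = np.random.default_rng(2)
m = 2
Psi = np.array([[0.9, 0.1], [0.0, 0.8]])           # psi(k+1,k), constant in this toy model
Psi_inv = np.linalg.inv(Psi)
M = np.eye(m)
Pn = 0.1 * np.eye(m)                               # P_n(k)

def psi(k, j):
    """Transition matrix psi(k,j) of (5.3) in the time-invariant case."""
    p = k - j
    return np.linalg.matrix_power(Psi, p) if p >= 0 else np.linalg.matrix_power(Psi_inv, -p)

K = 10
noise = rng.multivariate_normal(np.zeros(m), Pn, size=K)    # n_0, ..., n_{K-1}
v = np.zeros((K + 1, m))
v[0] = rng.normal(size=m)                          # v_0
for k in range(K):
    v[k + 1] = Psi @ v[k] + noise[k]               # (5.3)

# m_k = psi(0,k) v_k satisfies m_{k+1} = m_k + psi(0,k+1) n_k, i.e. (5.6).
mart = np.array([psi(0, k) @ v[k] for k in range(K + 1)])
for k in range(K):
    assert np.allclose(mart[k + 1] - mart[k], psi(0, k + 1) @ noise[k])

# The observation (5.2) can be rewritten as (5.11): y_k = M u_k + psi(k,0) m_k.
u_k = rng.normal(size=m)                           # an arbitrary signal value u_k
k = 5
assert np.allclose(M @ u_k + v[k], M @ u_k + psi(k, 0) @ mart[k])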
Optimal fixed-lead prediction (fiIHL‘L’ P (L+L‘L)), L I 0,1,2,..., L I fixed-positive integer, is given E by the following equation with (0140, P (L|o)) - (o, Pu(L)) as E l! ~I“ \ ‘wd 93 the initial condition: 8 (5°15) uLfl‘L '3 L s (5.16) Puar'tm - {(16) + Puma.) - Pun) . 11 Consider the special case L I l, i.e., the single-stage prediction (0L+11L’ P&(L+1‘L)): “um ' 6:, ’ P3141111.) - P51.) +90.) . u u As in the preceding two chapters, we now develop the recursive algorithms for this case based on the last observation and previous predicted estimate. To do so, we need the following lemma which was called the innovation lemma in the previous chapters. (5.17) INNOVATION IEMMA. The observation process {yk’ k I 1,2,...] which is defined by (5.11) has full rank, i.e. [yk+1|1t’ yk+l|k1 > 0 ° PROOF. We first derive an expression for the innovation vector yk+l‘k' Doing so, we note by definition PM” - VH1 - swims» . Since, by (5.11), yk-I-l I M(k+l)uk+1 + fi(k+l,o)mk+1 we have (5'18) S"1t+1|1t " y1t+1 " “(humans ' “Hlmmk-tqk for k I 0,1,2,... . Note this expression involves the optimal I! I \m‘ 94 single-stage predicted estimate of the observation noise mk. This estimate is computed as follows: a‘1t+1\1t a (mk+1‘L(y‘k)) (mkfl - Ink + mk|L(y;k)) (mk|I-(y;k)). since, by (5.10), mk+1 - “‘1t 1 L(y;k) No.10 (1(k.o)mk\L(y;k)). by 1(o.k)1(k.o) - IIn and (2.2191) I H0.1<)(yk - M(k)uk\L(y;k)). by (5-11) (5.19) I y(o,k)[yk - M(k)fik] , since yk E L(y;k) . Substituting (5.19) into (5.18) and noting that 8k I and °k+1|1t y(k+l,o)y(o,k) I y(k+l,k) we get the result - M(k+l)8 - '(k+1’k)[yk.- M(k)8 91+” k ' y1t+1 1t+1\1t 1t+1|1t] (5.20) - yk+1 - V(k-i-l,k)yk - [M(Hl) - '(k+l,k)M(k)]Ok+l‘k for k I 0,1,2,... . Note that to process the innovation vector 9k+l‘k we need the observation vectors yk*1, yk (i.e. the last and preceding observation vectors) and the single-stage predicted estimate based on the observation record Y(k). In order to prove that the observation process has full rank, in view of Section 3 of Chapter 2, we must show that 8"le 71.1111 > 0. v1: - 0,1,2,... . From (5.20) and (5.11), it follows that ~'~ \ “nu 95 I (M(k-l-l) - s(k+l,k)M(k))fi' + §(k+l,k)M(k) (u S"la-I‘lt k‘l'l‘k 1t+1 ' uk) + t(k+l,o)(mk+1 - mk) (5.21) - mansion” + 1(1t+1,1t)11(1t)(uk+1 - uk) + 1(1t+1,o)(mk+1 - mk) for k I 0,1,2,..., where the definition (5.22) iota) - M(k+l) - '(HIJOMOC) is made as a notational convenience. Using (5.21) we may write wk+111t51t+1|RJIEmkak-tuk 4" t1T . by (5.7) we obtain the result l! ~--§. \ ‘wd 96 [yk+1‘k,yk Mk] - M(k+l)Pfi(k+l\k)MT(k+l) + §(k+1)Q(k)uT(1t)yT(k+1,k) + fi(k+1)c(1t) + 1(k+1,k)M(1t)Q (k)MT (1t+1) + 1 (k+l,k)M(k)Q (k)uT(k)1T(1t+1,k) + y(k+1,k)M(1t)C(1t) + CT (1t)fiT (1t+1) + cT(k)MT(1t)¢T(1t+1,1t) + Pn(k) (5.23) - Roan) P~(k+1‘k)fiT (1t+1) + M(k+l)Q (k)MT(k)¢T(1t+1,1t) u + 1(1t+1,k)M(k)Q(1t)MT(1t+1) + M(k+l)C(k) + CT(k)MT(k+1) + Pn(k) - 1n(k><2 (k)MT 1T(k+1.k> for k I 0,1,2,... . From (5.16) we have P (k+l‘k) I P (k) +-Q(k). fi “11' Substituting this into (5.23) and simplifying the result using (5.22), we get (5.24) [§Hl‘k’yk+l‘k] - fi(1.+1) P~(k)fiT (1t+1) + u (1t+1)Q(1t)uT(1t+1) U + M(Hl)C(k) + CT(k)MT(k+1) + Punt) for k I 0,1,2,... . It is clear that this m X‘m matrix is in- vertible if M(k+l)c(k) + CT(k)MT(k+1) + Pn(k) > o . By assumption A.l.3 this is so, therefore WHI‘P’ yk‘i‘l‘k] > o as desired. For notational convenience, we shall denote this matrix by P~ (1t+1| 11) . QED y ll. ~.~ “wad 97 (5.25) THEOREM. Optimal single-stage prediction (11 P (1t+1\1t)) a 1t+1|1t' k I 0,1,2,... 
for the signal u is accomplished as follows: k (a) The stochastic process {Bk+l|k’ k I 0,1,2,...}, which is defined by the single-stage prediction estimate, is a zero mean wide-sense martingale, and is generated by the recursion (5.26) 6W,k - 11M,“1 + c<1<+1|1<>tyk - 11ua<-1>. -c1. 163 and r- - “1t “1911 w 9. - 16 “'k “fit-1 Lqu - “it; Then (5.26) can be written as fik‘l‘l‘k - “10614191 + rmwk 1t - 1,2,... From the definitions it is clear that F(k) is invertible and the stochastic process {w , k I 1,2,...] is a white-noise process with zero mean and that T r(we-1) cot-1)) (O,k) o T [wwwj] . ,(o,k)cT(P-1) 1(o,k)Pn(1t-1)¢T(1t,o) o 51.5 . L. o 0 0(li I 100 Furthermore, since M(O) I O and C(l‘o) I O, fil‘o I u1 l.w1 and therefore fil‘o .Lwk V k I 1,2,... . Hence, the stochastic process {fik+1\k’ k I 0,1,2,...] is a zerowmean wide-sense'Markov process, by virtue of (2.17). Now it remains to find an expression for the single-stage prediction error covariance matrix P (k+l‘k). From (2.30) we have 5 P~(k+1‘k) . P~(k+1‘k-1) - c(lt+1|1t)[yk‘k_1, u u ti1416-1] for k I 1,2,... with P;(l‘o) I Ph(l) as the initial condition. Substituting (5.27) for :(k+l‘k), (5.30) for [yk‘k_1,sk‘k_1]T, and noting from (5.26) that P~(k+1‘k-l) - P~(k‘k-l) +Q(k) U U we obtain U P (1t+1|k) - P (k‘k-l) - [P (k‘k-1)fir(k) + cT(1t-1)_]P'1(1t|1t-1)[f1(1t)P (k‘k-l) .. s s 9 a + CT(k-1)] +Q(k) for k I 1,2,... . QED This theorem gives the recursive algorithm for computing w the single-stage prediction. The information flow in the predictor is shown in the Figure (5.1) which is a representation of (5.26). We observe that the predictor (5.26) requires the storage of one observation. This is the only difference between the computational procedure of the predictor given in Figure 5.1 and the preceding two given in Chapters 3 and 4. We now write the optimal single-stage prediction equations for the signal. xk_I {(k)uk’ by assuming that “k 6 R(§T(k)) .56 now acuouooua owwuauoawsao .uOu Sumac seq—n {Wm enough - a x a 3.2.5 o m A3! 0 0' —HQ 0 OOOONOH.3 nauseous» Onion oeuOuoo 333$ - mwo osoueoum sxfl — 533 3.5 cs: 3.5 segues» . a is 83:83 - s x s , x .23 i cocoon.— a3 sateen—o ‘1‘ Dion 6 - x 3.5 2.1.5 . 6 4 Monaco“:— amounted we.“ a Hue.“ sec 0 It. ‘0“. \ ‘sml 102 v'k I 0,1,2,... . The derivations of these equations are straight- forward and are omitted. Optimal estimate: (5.31) 2H1“, = (1t-1)1+(k) . Error covariance matrix: T (5.34) P~(k+l|k) a 6(k+1)6+(k) P~(k‘k-1)§+ (1t)1T(1t+1) x x ~ +T '1' T - K(k+1|k)[H(k)P~(k‘k-l)§ (k) + c (k-l)]§ (1t+1) x + 1(1t+1)Q(1t)6T(1t+1) for k I 1,2,... with P (l‘o) I Px(l) as the initial condition. 51 We remark.here that the proposed single-stage predictor (5.31)-(5.34) for the signal xk.- 9(k)uk is of dimension n instead of (n+m) as in the augmented state predictor given by Bryson and Johanson [8-5] who considered a smaller class of signals (Kalman signals). The results of this chapter also extend Bryson and Henrikson's [B-6] work to cross-correlated colored noise in the larger signal class. The wide-sense martingale approach we develop is conceptually and computationally simpler. 5.3 OPTIMAL FIIJ‘ERING FOR (NP We continue our study by an examination of the optimal filtering problem. We wish to develop an algorithm for optimal filtering of the signal' u In doing so, we assume that only k. 103 the initial estimate 6010 I 00, and the filtering error covariance matrix at the initial time, P~(o\o) I P (0), are known. 
As in Chapter 3, fromr(5.12) weuobserve that optimal pre- diction and filtering are interdependent in terms of the determina- tion of the filtered estimate given the predicted estimate and vice-versa. In fact Thus the single-stage predicted estimate algorithm (5.26) can be used to process the optimal filtered estimate. So, in order to solve the filtering problem, it remains to find a recursive expression for the filtering error covariance matrix. This is done in the following theorem. (5.35) THEOREM. Optimal filtering (0k, P~(k)) k I 0,1,2,... u for the signal u is accomplished as follows: k (a) The stochastic process {flk’ k I 0,1,2,...}, which is defined by the filtered estimate, is a zero-mean wide-sense martingale, and is generated by the recursion (5.36) fik I fik-l +‘C(k)[yk ' f(k,k-l)yk_1 "M(k)°k-1] for k I 1,2,... with 00 I 0 as the initial condition. The n X m matrix C(k) is given by (5.37) out) - [P~(k\k-1)MT(k) + c(k-1)]P'1(1t|1t-1) u I where P (k‘k-l) is given by (5.24) . y (b) The stochastic process {fik’ k I 0,1,2,...], which is defined by the filtering error Gk, given by 104 (5.38) 11“, I [In-C(k)fi(k)]fik-I+[In-G(k)M(k)](uk-uk-l) + 6001090) (Ink - mk_1) for k I 1,2,... with the initial condition 60 I no, is a zero- mean wide-sense Markov process. The covariance matrix of this process is given by the recursion (5.39) P~(k) I P~(k‘k-l) - G(k)[M(k)PL(k|k-l) +-CT(k)] u u U for k I 1,2,... with P;(0) I Pu(0) as the initial condition. u PROOF. (a) It follows from (5.12) and (5.25s). (b) By definition 3 I u - G k k k, k'0,1,2,... 0 Substitute (5.32) for 0k to obtain C! I k uk - (1k_1 - cat)[yk - 1(k.k-l)yk_1 - fi(k)iik_,1] . fik- l-l'uk-uk_ 1-c(1t)[u(1t)uk-1 (k,1t-1)M(1t- 1) uk_ 1i(k)8k_ 1 + '(ksoflnk ' *(k307mk_1] ak_1-cnk_,-M (“1‘“).-1) + §(k,o) (Ink - mk_1)] + uk - uk_1 (In-C(k)fi1fik_ 1+1 In-c (mum (uk-uk_1>+c(k)1 (km) (Ink-m1“ 1) for k I 1,2,... . Obviously uo I no - 80 I no, since 00 I O. Utilizing the same procedure as that in (5.32b), we prove that the stochastic process {fik’ k I 0,1,2,...} is defined by (5.34) is a zero-mean wide-sense Markov process. ~,~‘ \ "\.-l 105 To complete the proof, notice from (5.16) that P (k) = P (1t+1|1t) - Q(k) k - 0,1,2,... . fi 6 Substituting (5.29) for P (k+l‘k) and noting that G(k+l‘k) I C(k), ll we obtain P~(k) - P~(k‘k-l) - C(k)[ii(1t)P~(k‘1t-1) + CT(k-l)] U U u 01' P~(k) = P~(k-1) - G(k)[M(k)P~(k-l) + fi(k)q (16-1) + cT(1t-1)] +Q(k-l) u u u for k I 1,2,..... For k I 0, it is clear that P;(O) I Pb(l) by (2.27b). u QED We see from the theorem that, as we expected from the results of Chapter 3, the proposed filter has exactly the same dynamical structure as the single-stage predictor discussed in the preceding section. Thus, the same dynamics can be used to generate the filtered and predicted estimates. The error covariance matrices of these estimates can be computed by the recursive equations (5.29) and (5.39) or after one of them is computed by one of these equations then the other follows from the relation (cf. (5.16)): P~(k+l‘k) - P~(k) +Q(1t) k - 01,2..... . u u The optimal filtering equations for the signal xk I §(k)uk, such that uk 6 R(QT(k)), is obtained from (5.35) by making use of (2.22) as follows: Optimal estimate: (3.40) 11, - C(k)§+(k-1)Rk_1 + wow, - 1(k.k-1)yk_1 - fi(k)fik_1] 106 for k I 1,2,... with the initial condition so I 0, where (3.41) we =1P~ +P]. P (k,k+l\k) - P (k). u .. u L (5.47) P (147,) - P (k) - z-w c(1t,1|1-1)fi(1)P (1,1t|1-1) . 
a a lIk+1 11 These optimal estimation equations for the signal uk, which are valid for all the classes of smoothing hold for the signal xk I §(k)uk such that uk 6 R(§+(k)) as in the preceding two chapters when u is replaced by x and M1 is replaced by H which is defined by (5.33). For the class of Ralman signals, the proposed smoother, i.e., (5.44)-(5.47) is apparently new. The forms of the optimal estimation equations (5.44)- (5.47) for the single-stage and fixed-point smoothing can easily be obtained by repeating the steps leading to analogous results in Chapter 3. We state only the results. SINGlE-STAGE SMOOTHING. Optimal.single-stage smoothing (1114“,. Pfi(k|k+1)) 1t - 0,1,2,... for the Signal uk is accomplished via the following equations with (00, P (0)) I (0,Ph(0)) u as the initial condition: fik‘k‘l-l I 0k + G(k,k+1‘k)[yk+1 ‘ 1(k+1.k)yk ' fi(k+1)ok+1‘k] C(k,1t+1|1t) - P (k)fiT(1t+1)P'1(1t+1|1t) fi 9 P (k,k+l‘k) - P (k) a s P (k‘k-i-l) - P (1:) - c(1t,1t+1\1t)fi(1t+1)P (k) a a u where all the terms are defined as before. 110 FIXED-POINT SMOOTHING. Optimal fixed-point smoothing (“W7 P (u|L)) U L I N+l,N+2,... NII fixed positive integer, is accomplished via the following equations with the initial condition (“11’ P 01)): ti “ as ‘1' '1 " a ’1 ~ 6k” (ML-1 601,111. ml, ((1.1. )YL_1M(L)0L‘L_11 comm-1) - PGCN.L‘L-1)§T(L)P:1(L‘L-1) y P (N.L+1\L) I P (Nah-1) _ C(N.LIL-1)[i(L>P (1116-1) + era-1)] s s a P (N,N+1‘N) I P (N) s a P 0111.) I P (ML-1) - G o a (1 + 111311'51)’1 - I - IMT(MPMT + 11)")1 . PROOF. Since (I + st'lm (I - memT + 10'1“) - I + MIR-1M - 311.0131: + 10-111 - mra'lumrmmr + 10411 - I + mTR'l‘M - me'lm «1 11111501111"3 + 11) "111 I I, and similarly (I - memT + a)'h1) (I + st'lu) - I, by the definition inverse of a matrix (I + mTR'lhfl - I - Putnam! + 10'111 . QED (3.2) mm. P 2 0 and R > 0 a (I + mTR-]M)-1mTR-l :- memI + R)'1. PROOF. Multiply (3.1) on the right by MIR-1; Obtain 121 122 (I + mTR'1u)'1mTR’1 - 1:11:11"1 - 11301114T + a)')}1st'1 -1 - -1 -1 - mTR - 111101111.r + 11) 111111311 - lMT(MPMI + R) - FMT(M1MT + 11.)']1111'1 = PMT(MIMT + 11)’1 . QED
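A numerical spot-check of the two identities of this appendix, read as (I + PM^T R^{-1} M)^{-1} = I - PM^T(MPM^T + R)^{-1}M and (I + PM^T R^{-1} M)^{-1} PM^T R^{-1} = PM^T(MPM^T + R)^{-1}, which is the form in which they are used in the filtering proofs, is sketched below in Python with NumPy. The matrices P ≥ 0, R > 0 and M are generated at random and are not from the thesis.

import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
A = rng.normal(size=(n, n))
P = A @ A.T                                # P >= 0
B = rng.normal(size=(m, m))
R = B @ B.T + np.eye(m)                    # R > 0
M = rng.normal(size=(m, n))

inv = np.linalg.inv
S = M @ P @ M.T + R

# (B.1): (I + P M^T R^{-1} M)^{-1} = I - P M^T (M P M^T + R)^{-1} M
lhs1 = inv(np.eye(n) + P @ M.T @ inv(R) @ M)
rhs1 = np.eye(n) - P @ M.T @ inv(S) @ M
assert np.allclose(lhs1, rhs1)

# (B.2): (I + P M^T R^{-1} M)^{-1} P M^T R^{-1} = P M^T (M P M^T + R)^{-1}
lhs2 = lhs1 @ P @ M.T @ inv(R)
rhs2 = P @ M.T @ inv(S)
assert np.allclose(lhs2, rhs2)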