A STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM

Dissertation for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
JAMES W. CLARY
1975

This is to certify that the thesis entitled A STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM presented by James W. Clary has been accepted towards fulfillment of the requirements for the Ph.D. degree in Systems Science.

ABSTRACT

A STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM

By James W. Clary

The goal of this work is to exploit the concept of a predictor space in the continuous-time realization problem for systems generating a stationary or analytic impulse response matrix. The predictor space provides a natural representation of the state space, and a minimal realization corresponds to selecting a basis of the predictor space.

A representation problem is considered first. Given two second-order processes y(t) and z(t), with y(t) measurable and z(t) continuous in quadratic mean and purely nondeterministic, the predictor space for y(t) is formed. Under rather general conditions, if a finite dimensional basis for the predictor space exists then y(t) can be represented as the output of a linear finite dimensional dynamical system driven by the innovation process of z(t).

Next the realization problem for stationary systems is considered. Given a stationary impulse response matrix, a stochastic system is defined whose input-output covariance equals it. The output of this system generates a finite dimensional predictor space. The previously developed representation theory then supplies a solution to the realization problem. Two algorithms are discussed in the context of the predictor space approach.

The realization problem for systems generating a time-varying analytic impulse response matrix is considered next. The development is similar to that for the stationary case, with the exception of some material on the linear independence of analytic functions. An algorithm by Silverman for time-varying systems is discussed in the context of the theory developed.

A STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM

By James W. Clary

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering and Systems Science
1975

ACKNOWLEDGMENTS

The author would like to express his gratitude to his major professor, Dr. Kwang Y. Lee, for introducing him to this topic and for his constant advice and encouragement during the preparation of this thesis. Gratitude is also expressed to Dr. Robert O. Barr for introducing the author to the subject of Systems Science through his excellent teaching and for providing valuable assistance during his years of graduate study. Thanks are also due to Dr. Robert A. Schlueter for many helpful discussions and much encouragement. The valuable advice of Dr. V.S. Mandrekar is also gratefully acknowledged. Finally, thanks are due to the author's wife Kay for her love, patience, and understanding during the time spent on this work.
TABLE OF CONTENTS

Chapter
1  INTRODUCTION
2  MATHEMATICAL BACKGROUND
   2.1  The Realization Problem
   2.2  Hilbert Spaces of Random Vectors
3  MARKOVIAN REPRESENTATION OF CONTINUOUS-TIME STOCHASTIC SYSTEMS
   3.1  General Theory
   3.2  Special Cases
4  STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM IN THE STATIONARY CASE
   4.1  General Theory
   4.2  Silverman's Algorithm
   4.3  The Ho-Kalman Algorithm
   4.4  Example
5  STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM IN THE ANALYTIC CASE
   5.1  General Theory
   5.2  The Silverman Algorithm
   5.3  Example
6  CONCLUSION
   6.1  Summary of Results
   6.2  Problems for Future Research
BIBLIOGRAPHY
APPENDIX: GENERALIZED RANDOM PROCESSES

CHAPTER ONE
INTRODUCTION

Consider an input-output description of an unknown system in the form of an impulse response or weighting pattern matrix. Such a description may arise either through the performance specifications in the case when one designs a new system, or from experimental output obtained by inputting delta functions in the case when one is attempting to identify an unknown system. The problem of obtaining a state space description of the unknown system from the impulse response or weighting pattern matrix is known as the realization problem.

Beginning primarily with the papers of Gilbert [G-2] and Kalman [K-3] in 1963, the realization problem has been the subject of considerable research. Gilbert [G-2] gave a solution to the realization problem for the stationary case with the restriction that each element of the transfer function matrix has distinct poles. The paper by Kalman [K-3] showed that the impulse response matrix determines the controllable and observable portion of a system and gives a solution to the realization problem for stationary systems. His results make clear the importance of minimal realizations, that is, state space representations having the smallest possible dimension for the state vector. The solution to the realization problem by Kalman was presented in the form of an algorithm which reduced the state space of a nonminimal realization until it became minimal.

A paper by Ho and Kalman in 1965 presented a different approach to the realization problem for stationary systems. They showed that the minimal realization problem is equivalent to a certain kind of representation problem involving an infinite series of real matrices Y_k, k = 1,2,..., derived from the impulse response matrix and known as the Markov parameters. Their algorithm, while elegant, is difficult to motivate and explain in the context in which it was originally presented.

A series of papers by Silverman [S-1] and Silverman and Meadows [S-2, S-3] published between 1966 and 1971 presented significant new results on the realization problem and the closely related problem of equivalent system representations. Silverman obtained a new algorithm for the stationary realization problem which generalized to a large class of time-varying linear systems, which he called "constant rank systems". This class of systems includes, but is larger than, the class of analytic systems.

The algebraic approach to system theory as presented in Kalman, Falb, and Arbib [K-4] or Padulo and Arbib [P-1] has obtained results relating to the realization problem for stationary systems and analytic time-varying systems, but at present does not offer an algorithm for time-varying systems of the generality of Silverman's algorithm.
A new approach to the realization problem, called the predictor space approach, has recently been presented by Akaike [A-l] for stationary discrete time systems. Akaike's approach uses a stochastic system driven by white noise to obtain a solution to a given realization problem by the following procedure. He first shows that given two zero-mean real valued second-order stationary Gaussian stochastic processes y(n) and z(n) that are mutually * stationarily correlated w1th covariance V(2) = E{y(n + 2)zT(n)} y(n) can be obtained as the output of a linear stochastic system driven by the innovation process of z(n) of the form v(n+1) = Fv(n) + Gw(n+l) (1-1) y(n) = Hv(n) + u(n) (1-2) w(n+1) = z(n+l) - z(n+l|n) . (1-3) The symbol z(n+l|n) represents the linear least squares prediction of z(n+l) given z(k), k = n, n-1, n-2,... It also is shown that * . E{-} denotes the expected value of the quantity within the brackets. The symbol (-)T indicates the transpose Operator. 4 wm=Hflmmemn,2=mLzu.. um) Then given a stationary impulse response matrix W(n), Akaike defines a stochastic system driven by a Gaussian white noise process z(n) with unit covariance by y(n) = Z W(m)z(n-m), n = O,1,2,... . (1-5) m=O It is then shown that since the input is white noise V(2) = w (1-6) so that using the previously derived representation (1-1) - (1-3) w = HFQG, 2 = O,1,2,... . (1-7) Thus the linear system x(n+l) Fx(n) + Gu(n+1) (1-8) Hx(n) (1-9) y(n) is a solution to the realization problem for the given impulse response matrix W(2). The basic concept in Akaike's approach is that of a predictor space, the space spanned by the projections of y(k), k = n, n+1, n+2,... onto the space spanned by z(k), k = n, n-l, n72,... . It is the structure of this space that makes possible his representation (l-l) - (1-3). Using the predictor space approach, Akaike is able to provide new insight into the realization problem and the structure of the various algorithms for stationary systems. The Silverman and Ho-Kalman algorithms for example are easily analyzed in this context. The pre- dictor space approach provides a powerful tool, as well as a unifying overview, for the realization problem. It is the purpose of this dissertation to gen- eralize Akaike's predictor space approach to continuous time systems. The outline of the presentation is as follows. Chapter 2 provides an introduction to the real- ization problem with some special results relating to the stationary case. A section reviewing the necessary back- ground material on Hilbert spaces of random variables and the Hida-Cramer representation for purely nondeterministic stochastic processes is also presented. In chapter 3 a representation problem for con- tinuous time stochastic processes is considered. It is assumed that two continuous time second-order zero mean stochastic processes y(t) and z(t), with y(t) mea— surable and z(t) purely nondeterministic and continuous in quadratic mean are given. The predictor space, the space spanned by the linear least-squares estimates of y(t) given {2(1): 1 i t}, is formed. It is then shown that by assuming a finite dimensional basis for the pre- dictor space and with appropriate continuity and separa- bility hypotheses that y(t) can be represented as the output of a continuous time linear system driven by the innovation process of z(t). The theory is easily generalized to the case when z(t) is white noise, and the chapter concludes with some special developments for the stationary case. 
Chapter 4 uses the representation theory developed in chapter 3 to solve the realization problem for continuous time stationary systems. Given a stationary impulse response matrix a stochastic system is defined whose input-output cross-covariance is equal to it. It is then shown that the predictor space contains a finite basis and that the repre- sentation theory developed in chapter 3 can be applied. It is shown that this representation provides a minimal realization for the given impulse response matrix. Two algorithms, the Ho-Kalman and the Silverman algorithm, are analyzed for the continuous-time stationary case. Although Akaike has already done this for the dis- crete-time case, and as shown in chapter 2 there is essentially no difference between the realization problem for the discrete-time and continuous time stationary cases, the justification is considerably more difficult in the context of continuous-time systems. The results demonstrate the power of the predictor space approach. An example illustrating the Silverman and Ho-Kalman algorithms and the predictor space approach to them is also presented. In chapter 5 the representation theory of chapter 3 is applied to the realization problem for systems gen- erating an analytic impulse response matrix. The procedure is found to closely parallel the case for stationary systems with the addition of some material on the linear independence of analytic functions. Silverman's algorithm is considered for this case and an example is given. In chapter 6 a summary, conclusions, and sugges- tions for further research are presented. CHAPTER Two MATHEMATICAL BACKGROUND This chapter is devoted to basic background material that will be used throughout the thesis. The first section discusses the realization problem for linear systems and some basic results associated with it. Much of the material is based on Silverman [8-1] and Ho and Kalman [H—2]. Relevant discussions of the realization problem may also be found in Kalman [K-3] and Kalman, Falb, and Arbib [K-4]. The second section is devoted to developing the concepts associated with Hilbert spaces of random variables and the Hida-Cramer representation for purely nondeter- ministic stochastic processes. Basic references here are Wiener and Masani [W-l] and Cramer [C-2]. Material from Lee and Mandrekar [L-l] is also used. 1. The Realization Problem Since the thesis is concerned only with continuous- time systems, the discussion will be primarily in terms of continuous-time systems. Consider the linear continuous- time system representation 9§§El = Fx + G(t)u (2-1) y(t) = H(t)x(t) (2-2) where x(t) is an n X 1 vector, y(t) an m X 1 vector, u(t) an r X l deterministic vector, and F(t), G(t) and H(t) are n X n, n X r, and m X n matrices respectively. To avoid distracting mathematical complications it will be assumed here that the components of the matrices F(t), G(t), and H(t) are continuous functions of time t. The system representation given by equations (2-1) and (2—2) will also be denoted by the subscripted triple = xx'1 (2-4) and X(t), the fundamental matrix for the matrix F(t) satisfies 9§§£l = Fx . (2-5) From equations (2-2) and (2-3) 10 t y(t) = H(t)(t,t0)xO + H(t)[ ¢(t,T)G(T)u(T)dT. (2-6) t o By defining K(t,T) = H(t)®(t,T)G(T) (2-7) equation (2-6) can be written I: y(t) = H(t)<1>(t,to)xO + I K(t,T)u(T)dT. (2-8) I: o The matrix K(t,T) is called the weighting pattern matrix for the system (2-1) and (2-2). 
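To make the weighting pattern matrix concrete, the following numerical sketch (in Python, using numpy and scipy) evaluates the superposition integral of equation (2-8) for a small constant-coefficient example, in which the fundamental matrix is Phi(t,tau) = exp(F(t-tau)) and hence K(t,tau) = H exp(F(t-tau)) G. The matrices F, G, H and the input u(t) = sin t are illustrative assumptions, not taken from the text; the output obtained from the integral is compared with a direct forward-Euler integration of equations (2-1) and (2-2) started from x(t0) = 0, and the two should agree to within the discretization error.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative constant-coefficient system (an arbitrary example, not from the text).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

def K(t, tau):
    """Weighting pattern K(t, tau) = H Phi(t, tau) G with Phi(t, tau) = expm(F (t - tau))."""
    return H @ expm(F * (t - tau)) @ G

u = lambda t: np.array([[np.sin(t)]])        # scalar deterministic input, r = 1

t0, t, N = 0.0, 5.0, 2000
taus = np.linspace(t0, t, N, endpoint=False)
dt = (t - t0) / N

# Output via the superposition integral (2-8) with x(t0) = 0.
y_integral = sum(K(t, tau) @ u(tau) for tau in taus) * dt

# Output via direct forward-Euler integration of (2-1) and (2-2).
x = np.zeros((2, 1))
for tau in taus:
    x = x + dt * (F @ x + G @ u(tau))
y_euler = H @ x

print(y_integral.ravel(), y_euler.ravel())   # the two agree to O(dt)
```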
For zero initial con- ditions, i.e., x0 = 0, equation (2-8) shows that the weighting pattern matrix K(t,r) contains all the informa- tion needed to describe the input-output relations of the system (F(t), G(t), H(t))n. For a causal system, a system characterized by the property that the behavior of the system at any time t is completely determined by events at times t i E 9 the impulse response matrix W(t,T) is defined by (t)®(t,T)G(T) t Z T W(t,T) = . (2-9) The impulse response matrix completely characterizes the input-output properties of (F(t), G(t), H(t)}n, for zero initial conditions, viewed as a causal system. For the case when K(t,T) is analytic, as when F, G, and H are constant matrices, the impulse response matrix and 11 weighting pattern matrix are equivalent representations since W(t,T) will have a unique analytic extension equal to K(t,T) on the region r > t. For the general case, of course, W(t,T) may have many different possible exten- sions and K(t,r) is not an equivalent representation to W(t,r). The problem of computing a state space representa- tion (F(t), G(t), H(t))n from an impulse response or weighting pattern matrix is known as the realization problem.* A more formal and precise definition is given below after reviewing the concept of equivalent system representation. In general there is not a unique solution to a realization problem. Different realizations of the same impulse response or weighting pattern matrix may have quite different characteristics. The concept of equi- valent representations is thus necessary to fully under- stand the realization problem. A system (F(t), G(t), H(t))n is algebraically equivalent to a system (F(t), G(t), H(t))n if and only if there exists a continuously differentiable matrix T(t) with det T(t) # O for all t e R such that E(t) = T(t)F(t)T-l(t) + égéil T'1(c) , (2-10) G(t) = T(t)G(t) (2-11) H(t) = H(t)T-1(t) . (2-12) * In the stationary case one could also start with a transfer function matrix. These will play no role in the thesis except in a tangential manner in chapter 4, and will not be discussed here. 12 The system (F(t), G(t), H(t))n is said to be strictly equivalent to (F(t), G(t), H(t))n if (2-10) - (2-12) hold for a constant nonsingular matrix T. In general algebraic equivalence does not preserve the stability pro- perties of a system. For this it is necessary to have topological equivalence (Kalman [K—31): algebraic equi- valence plus the conditions ”T(t)” /\ 0 (2-13) HT'li| A O N (2-14) for fixed constants Cl and C2, where H-H is the Euclidean norm. It is clear that strict equivalence implies topological equivalence. It may easily be verified that algebraically equivalent systems have the same impulse response or weighting pattern matrix. How- ever systems that are not algebraically equivalent systems can also have the same impulse response or weighting pattern matrix. In general, systems having the same impulse response, or weighting pattern matrix, are called w-equivalent, or K-equivalent, respectively. A formal definition of the realization problem can now be given. A system representation (F(t), G(t), H(t))n (is said to be an nth-order impulse response realization if W(t,T) can be expressed in the form of equation (2-9). If for some finite n such a system representation exists W(t,T) is said to be a realizable impulse response matrix. 13 The representation (F(t), G(t), H(t))n is said to be a minimal impulse response matrix realization if there exists no W-equivalent system (F(t), G(t), H(t))H' with H < n. 
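The invariance of the impulse response under algebraic equivalence is easy to check numerically in the strict (constant T) case. The Python sketch below applies equations (2-10) through (2-12) with dT/dt = 0 to an arbitrary illustrative triple (F, G, H); the particular matrices, including T, are assumptions made only for this example. The transformed system is confirmed to have the same impulse response H Phi(t,tau) G at several lags, which is the W-equivalence just described.

```python
import numpy as np
from scipy.linalg import expm

# Arbitrary illustrative constant system (not taken from the text).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

# A constant nonsingular change of state coordinates: strict equivalence.
T  = np.array([[1.0, 1.0], [0.0, 2.0]])
Ti = np.linalg.inv(T)
Fb, Gb, Hb = T @ F @ Ti, T @ G, H @ Ti       # equations (2-10)-(2-12) with dT/dt = 0

# Equivalent systems share the impulse response W(t, tau) = H Phi(t, tau) G.
for lag in (0.0, 0.5, 1.0, 2.0):
    W_original    = H  @ expm(F  * lag) @ G
    W_transformed = Hb @ expm(Fb * lag) @ Gb
    assert np.allclose(W_original, W_transformed)
print("Impulse responses agree at all tested lags.")
```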
Weighting pattern realizability is defined analogously. There is, in general, a difference between the weighting pattern and impulse response realization problems in that even if W(t,T) and K(t,T) agree on the domain t 3 T, they may not give rise to equivalent realizations. Indeed, the respective minimal realizations may even be of different dimension. An example of this case is presented in a paper by Desoer and Varaiya [D-l]. However as already mentioned when the weighting pattern matrix .K(t,r) is analytic then the impulse response matrix W(t,T) has a unique analytic extension to the region t < T equal to K(t,T). It follows that for this case it makes no dif- ference whether the impulse response or weighting pattern matrix is used, the same class of realizations will be obtained. This dissertation will be concerned solely with systems generating analytic impulse response or weighting pattern matrices. The impulse response matrix will be used primarily but the results would be unchanged if the weighting pattern matrix were chosen. The following result due to Kalman [K-3] is basic in realization theory. Theorem 2-1: W(t,T) is a realizable impulse response matrix 14 if and only if it is separable in the form W(t.T) = y(t)6(T). t 1 T (2-15) where u(t) and 6(1) are continuous matrices of finite size. 22299 From equations (2-4) and (2-9) the impulse response matrix W(t,T) of the system (F(t), G(t), H(t))n can be expressed, for t Z T, as W(t.r> = 1Hx11xilc1 (2-16) showing the necessity of equation (2-14). The sufficiency of equation (2-15) follows from the obvious realization (0.6(t). W(t)). (2-17) Q.E.D. Although theorem 2-1 is of theoretical importance, it may not work well for practical applications. There is first the problem of determining if a function of two variables is separable. Also, even if W(t,T) is time invariant or periodic the realization given by equation (2-17) will not be. Furthermore the representation given by (2-17) will never be asymptotically stable. Silverman [S-l] gives a nice example to illustrate these points. Suppose e-(t-T) t Z T W(t.T) = . (2-18) 15 The realization given by equation (2-17) is then (0, et, e‘t) (2-19) which is time variable, has an unbounded coefficient, and is not asymptotically stable. The realization ('1, 1, 1) (2-20) is much nicer. Some material from Ho and Kalman [H—2] pertaining to stationary systems will be presented next. Consider the corresponding discrete time and continuous time stationary systems x(t+l) = Fx(t) + Gu(t) (2-21) y(t) = Hx(t), t an integer (2—22) and §§é51,= FX(t) + Gu(t) ‘ (2-23) y(t) = Hx(t), t a real number, (2-24) where x(t) is n x l, y(t) is m x 1, and u(t) is an r x 1 vector. The stationary impulse response matrix for the system in (2-21) and (2-22) will be denoted by Yk and is given by Y = HF G, k = O,1,2,... . (2-25) For the system given by (2-23) and (2-24) the stationary impulse response matrix W(t) is given by l6 W(t) = He G (2-26) m k kt = H 2 F _T C m k k t k=0 K . 00 ER -0 Hence, in the case of either a stationary discrete time system or stationary continuous time system the realization problem is equivalent to finding a constant system (F, G, H)n with the property that Y = HF G, k = 0,1,2, .. . (2-28) The Yk are called the Markov parameters of the system. The N1 by N2 block matrix built out of the Markov parameters , , Y0 Y1 YNz-l Y1 Y2 YNZ SN1N2 = E E 3 (2-29) _YN1-1 YNl ' - - YN1+N2-ZA is called the generalized Hankel matrix. Note that each Yk is an m X r matrix so SN1N2 is an N1 - m x N2 - r matrix. 
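The role of the generalized Hankel matrix can be illustrated numerically. In the Python sketch below the Markov parameters Y_k = H F^k G of an arbitrary two-dimensional example system (an assumption made purely for illustration) are assembled into S_{N1,N2} as in equation (2-29). Its rank equals the dimension of a minimal realization, and one common way of factoring it, an SVD-based version of the Ho-Kalman step taken up in chapter 4, recovers a triple with the same Markov parameters.

```python
import numpy as np

# Illustrative minimal system with 2 states, m = r = 1 (not taken from the text).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
m, r = H.shape[0], G.shape[1]

def markov(k):
    """Markov parameter Y_k = H F^k G, equation (2-25)."""
    return H @ np.linalg.matrix_power(F, k) @ G

N1 = N2 = 4
S       = np.block([[markov(i + j)     for j in range(N2)] for i in range(N1)])
S_shift = np.block([[markov(i + j + 1) for j in range(N2)] for i in range(N1)])

print(np.linalg.matrix_rank(S))              # 2: the dimension of a minimal realization

# Factor S = O C by an SVD (in the spirit of the Ho-Kalman step of chapter 4)
# and use the shifted Hankel matrix to extract F.
U, sv, Vt = np.linalg.svd(S)
nr = int(np.sum(sv > 1e-9))                  # numerical rank
O  = U[:, :nr] * np.sqrt(sv[:nr])            # observability-like factor
C  = np.sqrt(sv[:nr])[:, None] * Vt[:nr]     # controllability-like factor
Fh = np.linalg.pinv(O) @ S_shift @ np.linalg.pinv(C)
Gh = C[:, :r]                                # first block column of C
Hh = O[:m, :]                                # first block row of O

# The recovered triple reproduces every tested Markov parameter.
for k in range(8):
    assert np.allclose(markov(k), Hh @ np.linalg.matrix_power(Fh, k) @ Gh)
print("(Fh, Gh, Hh) matches the Markov parameters of the original system.")
```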
The generalized Hankel matrix will play an important role in chapter 4. 17 Note, finally, that from equation (2—26) a stationary impulse response matrix W(t) can have a realization of the form (2-23) and (2-24) only if it is analytic. 2. Hilbert Spaces of Random Vectors Let (Q, A, P) be a probability space. The set Q is a set of possible outcomes, A is a o—algebra de- fined on Q, and P is a probability measure defined on A. The set Q is called the basic space and its elements are called points. Members of A are called events. Let G be a basic space and F a o-algebra of subsets of G. Then the pair (O, F) is called a measur- able Space. If G is n-dimensional Euclidean space, denoted R3, and F the smallest o-algebra of subsets of Rn containing all rectangles, denoted B“, then Bn is called the n-dimensional Borel o-algebra and the sets in Bn are called Borel sets. A measurable function from a probability space (Q, A, P) into (Rn, B“). is called a random variable. For n = 1 such a random variable is called a real random variable. This will be the only case considered in this work and henceforth the term random variable should be interpreted to mean a real random variable. A second-order random variable X is one which satisfies E{X2} = j |X(w)|2P(dw) < m . <2-30> S2 18 A second-order stochastic process {X(t), t e T} is a one- parameter family of second—order random variables. The set of all second-order random variables defined on (Q, A, P) is denoted by L2(Q, A, P). The set L2(Q, A, P) is a Hilbert space (Gikhman and Skorokhod [G-3]) with the inner product L = Eixy} <2-31) 2 and norm HxHLz = /T§T§7 . <2-32) Let L3(Q, A, P) be the set of all random n vectors x with components xi 6 L2(Q, A, P), 1 = l,...,n . Hence x e L3(Q, A, P) if and only if J9|x(w)|2P(dw) < w (2-33) where |-| denotes the Euclidean norm n |x|2 = xTx = 2 IX I2 . (2-34) . 1 i=1 L3(Q, A, P) is a Hilbert space under the usual operations and inner product ll "P15 (X.y) n (x. y.) . 1’ 1 L L2 1 l 2 ll \—fi .‘0 Head lxi(w)yi(w)P(dm). (2-35) 19 This inner product generates the norm n 1 Hxll n = Wm H = (.2 11x1”: )2 (2-36) L2 L2 1:]. 2. which in turn induces a topology on L3(Q, A, P). A sequence {xk} in L3(Q, A, P) is said to converge to a vector x e L3(Q, A, P) if and only if ka - ann + o <2-37) 2 as n + m and to be a Cauchy sequence if and only if I Axk - me n + 0 (2-38) L2 as k,m + m. Two elements x and y are regarded as identical if the norm of their difference is zero, that is,I E{[x-y]T[x—y]} = O, and thus P(x = y) = 1. Convergence in norm in the Hilbert space L3(Q, A, P) is identical with convergence in quadratic mean in probabilistic terminology. I A linear manifold in L3(Q, A, P) is a nonempty subset M of L3(Q, A, P) such that if x,y e M then Ax + By 5 M for all n x n real matrices A and B. A subspace of L2(Q, A, P) is a linear manifold closed under the topology induced by the norm H-u n' A lemma due to Wiener and Masani [W-l] is now given :Eich proves the existence of the projection of an element x and gives its structure. The space L3(Q, A, P) is abbreviated L3. 20 Orthogonal Projection Lemma (Wiener-Masani): a) M is a subspace of L? if and only if there is a subspace M of L2 such that M = Mp, where Mn denotes the Cartesian product M X...X M with n factors. M is the set of all components of all functions in M. b) If M is a subspace of L3 and x 6 L3, then there exists a unique 9 such that y e M; Hx-yfl < Hx—yH n’ for all y e M. 
(2-39) n _ L2 L2 An element 9 6 L3 satisfies the preceeding condition if A and only if E{[y-y]xT} = O for all x 6 L3, i.e., y-y is orthogonal to L3, denoted y-y L L3. The unique element 9 of the lemma is called the orthogonal projec- tion of y on L3 and will be denoted by (yILg). For this 9, 91 = (yile), the orthogonal projection of yi on to L2. Consider now the second-order n-variate stochastic vector x(t) e L3(fl, A, P). The present of the process x(t) is defined as L3(x(t)), where L2(x(t)) is the sub- space of L2(Q, A, P) generated by {Xi(t)’ i = 1,2,...,n}. Similarly the past of the process up to t is defined by L3(x; t) where L2(x;t) is the subspace of 'L2(O, A, P) generated by {xi(s), s i t, i = 1,2,...,n}. To conclude this section some aspects of the Hida- Cramer theory on multiplicity and representation will be reviewed (see Cramer [C-2], [C-3], and Lee and Mandrekar 21 [L-ll). A second-order n-variate process {x(t), a i t i b} is called purely nondeterministic if n L (x;t) contains only the zero element of L (x;t), 2 2 agtib i.e., the random variable that is almost everywhere equal to zero. Denote by Pt the projection operator from L2(x;b) onto L2(x;t) and assume that x(t) is zero mean, purely nondeterministic, and continuous in quadratic mean. For any set of mutually orthogonal random variables {zl,...,zM} c L2(x;b) the M-variate process z(t), con- sidered as a column vector with components zi(t) defined by zi(t) Ptzi = (zilL2(x;t)) , (2-40) will be an orthogonal increment process in that two in- crements Azi(t) and Azj(s) are always orthogonal if either i f j or corresponding time intervals are dis- joint. The set of random variables {zl,...,zM} can be chosen [K-S] so that for all t 6 [a,b] L2(X;t) = L2(z;t). (2-41) The process z(t) = (zl(t),...,zM(t))T is called an innovation process for the x(t) process. The number M may be infinite. Define the M X M matrix time- spectral function F(t) by F(t) = E{z(t)zT(t)}. (2-42) 22 For a mean square continuous process x(t), with orthogonal increments z(t), F(t) is continuous and nondecreasing which implies that F(t). is differentiable almost every- where. Let L2(F) denote the Hilbert space cf all row- vector functions g(t) satisfying b T Ja g(t)dF(c>g < m <2-43) with inner product b F = [ gdFhT. <2-44) 8. Then there occurs the Hida—Cramer representation t x(t) = [a G(t,s)dz(s) (2—45) where G(t,s) is an n X M matrix function with each row gi(t,s) e L2(F), i = l,...,n. The innovation z(t) is not uniquely determined but it's dimensionality M is unique and is the smallest number for which (2-45) hOIds. The number M is called the multiplicity of x(t). Finally it should be noted that for a zero mean second-order quadratic mean process x(t) the space L2(x;t) is separable. A proof of this may be found in Cramer ([C-3], P. 253). CHAPTER THREE MARKOVIAN REPRESENTATION OR CONTINUOUS-TIME STOCHASTIC SYSTEMS In this chapter the following problem is considered. Given two real vector valued continuous-time zero mean second-order stochastic processes y(t) and z(t), with y(t) measurable and z(t) continuous in quadratic mean and purely nondeterministic, find a representation of y(t) as the output of a linear continuous time system driven by the innovations process of z(t). The concept of the predictor space, the Hilbert space spanned by the linear least square predictions of y(t) given z(r), to i r i t, will be used to derive a solution to this prob- lem under appropriate assumptions. 
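The predictors used below are exactly such orthogonal projections, that is, linear least-squares estimates, and a small Monte Carlo sketch may make the mechanics concrete before the general development begins. In the Python fragment below, expectations are replaced by sample averages over many independent draws, and the joint distribution of the scalar y and the 2-vector z is an arbitrary illustrative choice. The projection of y onto the span of the components of z is computed from the normal equations, and the residual is checked to be (sample-)orthogonal to z, which is the defining property in the projection lemma above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000                               # Monte Carlo sample size

# Arbitrary illustrative zero-mean variables: z is 2-variate, y is a scalar
# correlated with z plus independent noise (the mixing below is an assumption).
z = rng.standard_normal((N, 2))
y = 0.7 * z[:, 0] - 1.3 * z[:, 1] + 0.5 * rng.standard_normal(N)

# Orthogonal projection of y onto the span of the components of z:
# solve the normal equations  E{z z^T} a = E{z y}  (sample versions).
Czz = z.T @ z / N
Czy = z.T @ y / N
a = np.linalg.solve(Czz, Czy)
y_hat = z @ a                             # the linear least-squares prediction (y | L2(z))

# The residual is orthogonal to z, i.e. E{(y - y_hat) z^T} = 0, and the
# mean-square error is minimal (here it approaches the noise variance 0.25).
print((y - y_hat) @ z / N)                # ~ [0, 0] up to sampling error
print(np.mean((y - y_hat) ** 2))          # ~ 0.25
```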
Essentially the basic idea is that if the predictor space has a finite basis, this basis can be used as the state vector of a linear system generating y(t). After developing a general theory two special situations will be considered. The first, called the white noise case, will look at the representation when z(t) is white noise. A minor extension to the general theory will be necessary since white noise is neither second-order nor continuous in quadratic mean. The white noise case will be useful for applications. 23 24 The second special case of interest occurs when y(t) and z(t) are stationary processes with stationary cross covariance. The development of the properties of the representation for this case will conclude the chapter. 1. General Theory Consider two zero—mean real valued second-order 72' stochastic processes y(t) and z(t) with covariance Vyz(t,s) = E{y(t)zT(s)} (3‘1) The vectors y and z are m x l and r X 1 respectively. It is also required that y(t) be measurable and z(t) be continuous in quadratic mean and purely nondeterministic. Denote by L2(z;s) the Hilbert space obtained by taking the closure with respect to the mean square norm of the linear manifold spanned by the components of z(r), r i s. The predictors, denoted y(tls), are defined for t s to be the vector formed by projecting the components IV of y(t) onto L2(z;s). Letting yi(t|s) be a component of y(tls) with yi(t) the corresponding component of y(t) 91(tls) = (yi(t)|L2(z;s)), i = l,...,m, s i t. (3-2) *The covariance actually plays no role in this sec- tion. It will, however, be useful in section 2 and crucial for the applications in later chapters. E{°} indicates the expected value of the quantity in brackets while (-)T indicates the transpose. 25 Equivalently 9(tls) = IL§, s g t. <3-3) The predictors are specified by the relationships E{[y(t|s) — y(t)]zT(r)} = o, r i s : c,a (3-4) or E{y(t|s)zT(r)} = Vyz(t,r). (3-5) The predictor space L2(y;tls) is defined as the Hilbert space obtained by taking the closure with respect to the mean square norm of the linear manifold spanned by the components oftfluepredictors y(tls) for t Z s. In forming this space s is to be regarded as fixed while t varies over all values greater than or equal to s. It should be noted that L2(y;tls) is a subspace of L2(z;s). Following the example of Akaike [A-l] the assump- tion is made that the predictor space is of finite dimension. Since L2(y;t|s) is formed by projecting components of y(t) onto L2(z;s) with 5 fixed while allowing t to vary over all values greater than s, a finite dimensional basis, consisting of n random variables, can be regarded A as a function of s. The basis, of dimension n, will be A The basis could also be regarded as being of the form {x1(t1),x2(t2),... ,xn(tn)} where xi(ti) is an element of L2(y;tls), ti 1 s. The point is, for fixed 3 the basis can be fixed and changes only as s varies. denoted by the vector x(s) = (xl(s),x2(s),...,xn(s))T. (3-6) Letting L2(x(s)) be the Hilbert space generated by taking the closure with respect to the mean square norm of the linear manifold spanned by the components of x(s) it is obvious that L2<9;tls) = L2> . <3-7) Theorem 3-1: Let s and t be fixed with s i t. Then y(tls) = A(t,s)x(s) (3-8) for some m x n matrix A(t,s). Proof: Since x(s) is a finite dimensional basis for L2(9;t|s) then 9(CIS) e L2(9;t|8) (3-9) implies that 9(CIS) = AX(S) (3-10) for some matrix A. Since 9 varies with t' and s then A must also, so equation (3-8) is justified. Q.E.D. 
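A discrete-time analogue, offered purely as an illustration and not as the continuous-time construction itself, shows why such a finite basis can exist. If y is generated from an n-dimensional state driven by the increments of z, then every predictor of y at a future time is a fixed linear combination of the same n quantities, so the predictor space has dimension at most n. In the Python sketch below the matrices A, B, H, the horizon, and the time s are arbitrary illustrative choices; the coefficients expressing each predictor in terms of the past increments are stacked into a matrix whose rank never exceeds n = 2.

```python
import numpy as np

# Discrete-time analogue: x_k = A x_{k-1} + B w_k, y_k = H x_k, x_0 = 0, where the
# w_k are orthonormal increments of z.  A, B, H are arbitrary illustrative choices.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.5]])
H = np.array([[1.0, -1.0]])

s = 10                        # "present" time; the past consists of w_1, ..., w_s
horizon = 8                   # predict y at times s, s+1, ..., s+horizon-1

# The predictor of y_{s+j} given the past has coefficient H A^{s+j-k} B on w_k.
rows = []
for j in range(horizon):
    coeffs = [H @ np.linalg.matrix_power(A, s + j - k) @ B for k in range(1, s + 1)]
    rows.append(np.hstack(coeffs).ravel())
P = np.vstack(rows)           # one row per predictor, expressed in terms of the past

# However many future times are included, the predictors span a space of dimension
# at most n = 2; a basis of that space can therefore serve as the state.
print(P.shape, np.linalg.matrix_rank(P))    # (8, 10) 2
```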
Corollary: Let s and t be fixed with t i s. Then 27 y(t) = A(t,s)x(s) + u(t) (3-11) where u(t) is an m X 1 vector orthogonal to L§(z;s). Proof: Set y(t) - §. <3—12) u(t) Then for r i s i t E{u(t)zT(r)} 311y - 9(CIS)]ZT(r)} = 0 (3-13) by equation (3-4). Hence u(t) is orthogonal to Lg(z;s) and combining equations (3-8) and (3-12) equation (3-11) is obtained. Q.E.D. The next goal will be to develop the dynamic be- havior of x(t). The following results are directed to this end. Lemma 3—1: Let {Xn} be a sequence of scalar random variables converging in quadratic mean to the process X. Then lim E{Xn} = E{X}. (3-14) n+oo Proof: Recall the Schwarz inequality |E{XY}|2 3 E{X2}E{Y2}. (3-15) 28 Let Y = 1. Then |E{[Xn — X]Y}|2 : E{[Xn - X]z}-E{Y2}. (3-16) That is |E{X - x112 < E{[X - x12}. (3-17) 11 — n But by hypothesis lim E{[X - x12} = 0 (3-18) n+oo n Hence lim |E{xn - X}|2 = 0 (3-19) n+oo which implies lim [E{X } - E{x}]2 = 0. (3-20) n+(X) n Equation (3-20) is equivalent to equation (3-14) so the lemma is proved. Corollary: x(t) is a zero mean random variable. Q.E.D. Proof: By definition each component xi(s) of the n X 1 vector process x(s) is an element of L2(z;s). Hence xi(t) is the limit in quadratic mean of a sequence of finite linear combinations of components of 2. But 2 is a zero mean stochastic process so finite linear combina- tions of components of 2 have zero mean. Hence xi(s) is a limit in quadratic mean of a sequence of zero mean 29 processes. Thus by lemma 3-1, xi(s) and hence x(s) have zero means. Q.E.D. It is appropriate at this point to make a few remarks concerning measurability since that will be the next property of x(t) to be considered. Let T be a Lebesgue measurable parameter set on which is defined a o-algebra of Lebesgue measurable sets L. Let Q be a probability space on which is defined a o-algebra of events A. The symbol L e A denotes the smallest o-algebra defined* on T X 9 including all sets of the form L X A such that L e L and A e A. Then a process {X(t), t e T} is said to be a measurable process** if X(t,w) is a (t,w) function measurable with respect to L e A. The assumption at the beginning of the chapter that y(t) is measurable is in the sense that it's components satisfy this definition. Lemma 3-2: x(t) is a measurable second-order random vector. Proof: Since a process continuous in quadratic mean has a measurable modification, z(t) can be taken to be a measurable process. Then L2(z;t) is a closed subspace of L2(M), the space of square integrable measurable A A X B denotes the Cartesian product of the sets A and B. AA The product measure defined on L e A is, of course, itself a measure (Rudin [R-l], p. 140). 30 functions defined on M = [to,t] X 9, where Q is the probability space on which {z(r): t0 3 r i t} is defined. It is known that L2(M) is complete (Rudin [R-l], p. 66), and since a closed subspace of a complete space is complete (Simmons [8-4], p. 73), L2(z;t) is complete. Hence every convergent sequence of measurable square integrable func- tions in L2(z;t) converges to a measurable square in- tegrable function in L2(z;t). Since Xi(t) is the limit in the mean square norm of a sequence of such functions, xi(t) is measurable and square integrable. Hence x(t) is measurable and square integrable. Q E D It is next necessary to review the concept of weak convergence which will play a key role in the development. 
A sequence of random vectors xn in a separable Hilbert space H is said to converge weakly to the vector x in H, denoted xn E x, if for every y in H lim E{yTx } = E{yTx}. (3—21) n+m n A key lemma for determining weak convergence is the follow- ing (Parzen [P-3], p. 272). Lemma 3-3: A sequence of random vectors xn defined as members of some separable Hilbert space of random vectors converges weakly to a vector x if and only if the following two conditions hold: (i) For some constant M, 31 E{xTx } < M, n = 1,2,... . (3-22) nn ‘— (ii) For some family of vectors {k(t), t e T} which spans the Hilbert space spanned by {xn, n = 1,2,...} . . T 11m E{xnk(t)} (3-23) n+oo exists for every t e T. The following theorem and corollary due to Halyo and McAlpine [H-l] are useful in developing the dynamic behavior of x(t). The proof of the theorem as given by Halyo and McAlpine is presented since it will be necessary in further developments to refer to it. Except for changes in notation, the following theorem, proof, and corollary are directly from [H-l]. The separability of the spaces L2(z;t) and it's subspaces L2(x;t) and L2(x(t)), although not explicitly mentioned, is necessary to allow the use of weak convergence. Theorem 3—2: Let {x(t): to i t} and {z(t): t0 1 t} be second order measurable vector random processes with zero mean such that L2(x;t) is a subspace of L2(z;t) for each avxz(t,s) '3' at is continuous on {(t,s):tO i s 3 t}, and {E|([xi(t) - xi(s)]|L2(z;s))|2}35 : BIt-sl, s i t (3-24) then x(t) can be written as t x(t) = x(to) + J W(S)ds + U(t), t g t (3-25) t O 32 where U(t) is a process with orthogonal increments such that U(t) - U(s) is uncorrelated to z(r) whenever r i s i t, and U(to) vanishes; w(t) and U(t) belong to L2(z;t) and BVXZ(t,s) sz(t,s) = at , s i t . (3-26) Proof: Let t be the smaller of t1 and t2 and s i t; then IL3 zT (3-27) x(tl)-X(t2) T E{[ tl-tz :12 (3)} E x(t1)-x(t2) t1‘t2 sz(tl’s) - sz(t2’s) = . (3-28) t1 ' t2 The right side of (3-28) converges as t2 tends to t1; x(tl) - x(t2) tl-tz hence by lemma 3-3,( |L3(z;t) converges weakly to, say, w(t1) in L2(z;tl) since its norm is bounded by B. Define the process U(t) as t U(t) = x(t) - x(t ) - I w(s)ds, c < t. (3-29) 0 t o — 0 First note that, using the definition of z(t), (3-26) follows from (3-28). Now, t T T J E{w(r)z (t')}dr = E{[x(t)-x(s)]z (t')}, t' i s i t. (3-30) 3 . Interchanging integration and expectation, and using the properties of projections, it can be shown that 33 t (I wdr|L§> = <1x-x1|L3>. s 3 t (3-31) 3 <1u - U1IL3 = o, s g t. (3-32) Since U(t) belongs to L2(z;t), U(t) — U(s) is orthogonal to L2(z;s); and hence to U(t') - U(s'), s' i t' i 3. Therefore U(t) has orthogonal (i.e., un- correlated) increments. Also note that since w(s) be- longs to L2(z;s), U(t) - U(s) is uncorrelated to L2(w;s), s i t, and x(t ). 0 Q.E.D. Corollary: Theorem 3—2 remains valid if (3-24) is replaced by the condition that E{Ixi(t)|2} is bounded on bounded intervals and vxz = p(t)q(8) <3-33) where p(t) is an n X 2 matrix function of t alone and q(s) is an A X r matrix function of 3 alone. Equation (3-25) will be the basis of the dynamic representation of x(t). Note that the continuity of 3sz(t,s) 3t Theorem 3—2 are new assumptions on x(t) and 2(8). The 8Vx2(t,s) at the desired representation to exist. 
Equation (3-24) and equation (3—24) among the hypotheses of continuity of is a restriction necessary for can be replaced by the separability of sz(t,s) and the boundedness on bounded intervals of E{Ixi(t)I2}, and 34 these conditions will be satisfied automatically in the applications to be encountered later. The next task, which extends the work of Halyo and McAlpine in this context and plays the crucial role in the develOpment of the representation theory, will be to show that w(t) is, in fact, an element of L3(x(t)). For this the following lemma is necessary. Lemma 3-4: Let {x(t):tO : t} and {z(t):to : t} be as in theorem 3-2. Then x(t )-X(t ) ( l 2 |L3(z;t1)) e L3|L3> = (jil cj9 <3-37) _ g C (( m . m . — . y(T.)|L2(z,t2))lL2(z,tl)) j=l J J g m = jil Cj(Y(Tj)IL2(ZEtl))- (3‘38) The last equality follows from the fact that L2(Z;tl) C L2(z;t2) for tl i t2. Continuing, rap n (x(t2)|L2(Z;tl)) Cj9(let1) j 1 ll lib/12° C.A .,t t 3-39 j 1 J (TJ 1)X( l) ( ) and the last sum is an element in L3(x(t1)).. This suffices to prove the lemma. Q.E.D. Theorem 3-3: Let w(tl) be as defined in theorem 3-2. Then w(tl) e L3(x(cl)). 36 Proof: Note that in the proof of theorem 3-2 w(tl) is x(t1)"‘(t2)an(z.t ) tl-t2 2 ’ l in L3(z;tl) as t2 + tl by lemma 3-3 since x(t )-x(t ) E{( 'lzl'tZ 2 ILr2'(z;tl)) zT(s)} exists for < < < ‘ tO _ s _ tl _ t2, and Since T ‘x(t )-x(t ) 3(Z;t1)) ( il'tZ 2 |L3(z;tl)) is defined as the weak limit of ( IL E x(tl)-x(t2) ( t1't2 bounded by virtue of equation (3-24). That i s 2 t1 1 t x(t )-x(t ) E ( l 2 |L3(z;t1))zT(s) exists for t tl-t2 o 2 x(tl)-x(t2) n T implies that E t -t |L2(z;tl) v (5) exists for l 2 any v(s) e L3(z;s), and in particular for v(s) e L3(x(s)). 5 x(t )-x(t ) . _ l 2 n _ T . Sett1ng s — t1, E ( tl’tz |L2(z,tl))v (t1) ex1sts for all v(tl) e L3(x(t1)). Since by lemma 3-4 both factors in the expected value are elements of L3(x(tl)) the weak limit must exist as an element of L3(x(tl)). . n That is, w(tl) e L2(x(t1)). Q.E.D. Corollary: Let x(t) and w(t) be as defined above. Then w(t) = F(t)x(t), t g t. (3-40) 0 Proof: Since w(t) and x(t) are both elements of 37 L2(X(t)). W(t) = l.i.m. an(t) (3-41) 11+“) for some sequence of n‘X n matrices Fn' Since the limit exists, the sequence of real valued matrices con— verges to a real valued matrix F(t) and equation (3-40) is justified. Q E D Combining equations (3-25) and (3-40) yields t x(t) = x(to) + I F(s)x(s)ds + U(t), t0 3 t. (3-42) to The next item to be considered is how to relate U(t) to z(t). By the assumption that z(t) is a quadratic mean continuous purely nondeterministic process, it can be expressed as (Cramer [C-2]) t z(t) = I K(t,s)dv(s) (3-43) t o , where the k X 1 vector“ v(s) is the innovation pro- cess for z(t). The innovation process has the important property that (Kallianpur and Mandrekar [K-5]) L2(z;t) = L2(v;t). (3-44) Hence U(t) e L3(z;t) implies U(t) e L3(v;t). Thus U(t) can be expressed in terms of the innovation process z(t) by (Ivkovich and Rozanov [I-l]) A . . v(s) may be of countably infinite dimen31on. 38 t U(t) = I G(t,s)dv(s). (3-45) I: o The orthogonal increment property of U(t) implies it is a wide-sense Martingale process (Doob [D-2], Lee and Mandrekar [L-l]). Since U(t) is wide-sense Martingale by theorem 3.3 of Lee and Mandrekar [L-l] it is seen that G(t,s) is independent of t. Hence t U(t) = I G(s)dv(s). (3-46) t 0 Thus, combining equations (3-42) and (3-46) t t x(t) = x(to) + J F(s)x(s)ds + J G(s)dv(s) (3-47) to to where v(s) is the innovation process for z(t). 
Equa- tion (3-47) can be written in the form dx(t) = F(t)x(t)dt + G(t)dv(t). (3-48) Equations (3-47) and (3—48) are regarded as equivalent representations and have the solution (Mandrekar and Salehi [M-l]) t x(t) = ¢(t.co>xg <3-49) t o where ¢(t,r) is the transition matrix defined by F(t). Equations (3-11) and (3-48) now yield the repre- sentation dx(t) F(t)x(t)dt + G(t)dv(t) (3-50) Y(t) A(t.S)X(S) + U(t) (3-51) 39 Setting H(t) = A(t,t) (3-52) and letting s = t in equation (3-51), the desired re- presentation dx(t) F(t)x(t)dt + G(t)dv(t) (3-53) Y(t) H(t)X(t) + U(t) (3-54) is achieved. The development to this point is summarized by the following theorem. Theorem 3-4: Let y(t) and z(t) be zero—mean real valued second-order m X l and r X l variate stochastic pro- cesses reSpectively and let y(t) be measurable and z(t) be continuous in quadratic mean and purely non- deterministic. Form the predictor space L2(y;t|s) be projecting the components of y(t) onto L2(z;s) for t0 1 s i t. Assume that L2(y;t|s) has an n dimensional BVXZ(t,s) 8t tO < s i t, and either sz(t,s) is separable and basis x(s). Then if is continuous for E{|xi(t)|2} is bounded on bounded intervals, or iv E{|(xi(t) - xi(S)|L2(Z;s))|2}2 representation given by equation (3—53) and (3-54) can be < BIt-sl for 's i t, the obtained. The next point that should be mentioned is the minimality of the dimension of the representation. The 4O representation is minimal in the sense that since x(t) is a basis of the predictor space, there does not exist a vector of smaller dimension than x(t) that contains all the information in the predictor space. To close this section it seems appropriate to comment briefly on some of the assumptions used and their roles in the derivations. The requirements that z(t) be continuous in quadratic mean and purely nondeterministic are to ensure the existence of it's innovation process. Continuity in quadratic mean may be replaced by the weaker condition that for all t the limit z(t+o) exists (Cramer [C-2]). The requirement that sz(t,s) be separable, equation (3-25), and the necessity of 3sz(t,s) at structure to be derived will exist. In the applications being continuous are necessary that the system to the realization problem Vyz(t,s) will be seen to be equal to a deterministic impulse response matrix. It is well known (Kalman [K~3]) that an impulse response matrix has a system realization if and only if it is separable. Since Vyz(t,s) = H(t)VXz(t,s) (3-53) the requirement that sz(t,s) be separable is a natural one. Perhaps the most critical assumption was that the predictor space have a finite dimensional basis of constant 41 «dimension n. Both the finite dimensionality and constant dimensionality are worthy of comment. The finite dimen- sionality of L2(y;t|s) played no essential role in deriving equation (3-11). The derivation will go through ‘with. x(t) of countably infinite dimension. After that, ‘mathematical intricacies begin to arise if x(t) is allowed to have infinite dimension, and since the applica— tions will be for finite dimensional systems only the restrictions to the finite case was felt to be justified. The question of constant dimension is a tricky one. The eventual concern of the thesis is with the realization problem and, as observed by Kalman (Kalman, Falb, and Arbib [K-4]), to develop a really comprehensive theory of realization the dimension of the state space should be allowed to vary with time. 
However for the types of systems to be considered in this thesis considerations of this nature need not arise. For more general time-varying systems than the analytic systems considered in this paper, however, the question of constant dimensionality is a significant one. Finally it should be mentioned that the repre- sentation theory developed here is somewhat more general than will be necessary for applications. To be specific z(t) will always be a Wiener process. The derivation of the representation is not simplified significantly by making this assumption on z(t) however. 42 2. Special Cases In using the representation deveIOped in part 1 to solve the realizationproblem the input process will be an r X 1 vector zero-mean stationary white noise process. Since white noise is neither measurable nor second-order the theory developed in part 1 is not immediately applic— able. The first consideration here will be to resolve this problem. Then some results pertaining to stationary pro- cesses will be developed. Given a zero mean white Gaussian process n(t) set t z(t) = I n(s)ds. (3-56) I: 0 Then (3-56) is actually the innovation representation (Kailath [K-l]) for the Wiener process z(t) which is clearly a second—order purely nondeterministic process continuous in quadratic mean. Hence the theory of part 1 applies. The vector dv(s) (see equation (3-43)) can be replaced by n(s)ds with the interpretation dz(s) = n(s)ds. (3—57) Equations (3-53) and (3-54) become dx(t) F(t)x(t)dt + G(t)n(t)dt (3-58) H(t)x(t) + u(t) (3-59) y(t) and equation (3-49) becomes 43 t x(t) = ¢(t,to)x(to) + [ ¢(t,s)G(s)n(s)ds. (3-60) t o In this manner the theory of part 1 is extended to what will be called the white noise case. Next consider the situation wherein y(t) and z(t) are zero mean real valued second order stationary processes with stationary cross-covariance Vyz(t—s) = E{y(t)zT(s)}, (3-61) where y(t) and z(t) are, as before, m X l and r X l vectors respectively. Furthermore, z(t) is a Wiener process generated by stationary white noise n(t) with unit covariance and y(t) is, as usual, measurable. It will be shown that the representation corresponding to this situation takes the form of a stationary system dx(t) Fx(t)dt + Gn(t)dt (3-62) y(t) Hx(t) + u(t) 1 (3-63) where F, G, and H are constant matrices. Equations (3-58) and (3-59) are still valid and the derivation starts from them. Let t 3 s. Then t Vyz(t-s) = E [H(t)®(t,to)x(to) + [t W(t,r)n(r)dr]zT(s) 0 where (3-64) W(C,T) = H(t)¢(t,T)G(T). (3-65) The usual assumption that x(to) is independent of z(s) is made and since both are zero mean processes (3-64) 44 reduces to t V (t-s) = I W(t,T)E{n(T)zT(S)}dT. (3-66) yz t By assumption z(s) = J n(r)dr . (3—67) Hence to to s = { W(t,r)dr. (3-68) it 0 Now from (3-68) 8V (t-s) YZ _ ' as _ -Vyz(t-S) = -W(t,s). (3-69) Hence W(t,s) = W(t-s). (3-70) Recall also the assumption that Vyz(t,s) is separable, that is Vyz(t-s) = a(t)8(s). ‘ (3-71) This implies W(t-s) is also separable, W(t-s) = -a(t)B(s). (3-72) 45 Since W(t-s) is a realizable stationary impulse response matrix it is analytic in both t and s for t Z 5. Hence both u(t) and 8(5) must be analytic functions. Since u(t) and 8(t) are continuously differentiable then by lemma 5 of Youla [Y-l] W(t-s) has a constant coefficient realization. That is, W(t-s) is of the form W(t-s) = HeF(t'S)G . Thus the representation dx(t) Fx(t)dt + Gn(t)dt (3-74) y(t) Hx(t) + u(t) (3-75) is achieved. This concludes the discussion of the representation problem. 
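The stationary representation just obtained can be checked by simulation. The Python sketch below integrates dx = Fx dt + G dz by the Euler-Maruyama method for an arbitrary illustrative triple (F, G, H); the matrices, step size, and horizon are assumptions made only for this example. It then estimates the cross-covariance between y at the final time and earlier increments of z; dividing the sample average by dt should reproduce W(u) = H exp(Fu) G up to discretization and sampling error, which is exactly the property exploited in the next chapter.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

# Arbitrary illustrative stationary representation dx = F x dt + G dz, y = H x,
# x(t0) = 0 (none of these matrices are taken from the text).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

dt, steps, paths = 0.01, 300, 100_000
check = {50: None, 150: None, 250: None}     # steps at which increments of z are saved

x = np.zeros((paths, 2))
for k in range(steps):
    dW = np.sqrt(dt) * rng.standard_normal((paths, 1))  # increment of z over [t_k, t_{k+1}]
    x = x + dt * x @ F.T + dW @ G.T                      # Euler-Maruyama step
    if k in check:
        check[k] = dW
y_T = x @ H.T                                            # y at the final time T = steps * dt

# Cross-covariance check: E{ y(T) dz(s)^T } = W(T - s) ds with W(u) = H exp(Fu) G,
# so the sample average divided by dt should approximate the impulse response.
for k, dWk in check.items():
    est   = float((y_T * dWk).mean() / dt)
    exact = (H @ expm(F * dt * (steps - k)) @ G).item()
    print(f"W({dt * (steps - k):.2f}): estimated {est:.3f}, exact {exact:.3f}")
```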
CHAPTER FOUR STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM IN THE STATIONARY CASE This chapter will be concerned with the realiza- tion problem for stationary systems. The stationary realization problem may be stated as follows. There is given a stationary impulse response or weighting pattern matrix W(t-s) and the objective is to find matrices F, G, and H such that W(t-s) = HeF(t‘S)C (4-1) where F is of minimal dimension. The idea used in the predictor space approach is to define a stochastic system having input-output covariance Vyz(t-s) equal to W(t-s). Then, if a basis of the predictor space can be found, the results of the previous chapter can be used to write down a Markovian representation determining a minimal realiza- tion for the impulse response matrix W(t-s). The theory developed in the first part of the chapter will be used in the rest of the chapter to examine two important realization algorithms. The power of the pre- dictor space approach is demonstrated by the ease with which these algorithms are explained in this context. 46 47 I. General Theory Consider the m X r matrix W(t,T) on which are made the following assumptions: A-l) W(t,r) is stationary in the sense that W(t+o,r+o) = W(t,T) for all t,T, and o. A-2) W(t,T) is an impulse response matrix, implying that W(t,r) is separable and equals 0 for t < T. t A-3) I W(C,T)TW(C,T)dT < m, t0 3 t < m. t O The usual abuse of notation will prevail and W(t,T) will be written as W(t-T) with t i T. Let u(t) be an r—dimensional stationary Gaussian white noise with E{n(t)} = o , E{n(t)nT(S)} = I-6(t-s). <4-1) Denote by y(t) the m-dimensional output of the stochastic system defined by t y(t) = J W(t-T)n(r)dr (4-2) to when the stochastic integral exists. Define the Wiener process z(t) by t z(t) = I n(s)ds. (4-3) to Then q(t) is the innovation process for z(t) and equa- tion (4-2) can be written 48 t y(t) = I W(t-T)dZ(T). (4-4) I: O The goal here is to justify using the representa- tion theory deveIOped in chapter 3. To do that it will be necessary to verify that the hypotheses of theorem 3-4 are satisfied. That y(t) and z(t) are both zero mean and second order is clear. Furthermore, as noted in chapter 3, z(t) is continuous in quadratic mean as well as purely nondeterministic. Since W(t-T) satisfies assumption A—3 the stochastic integral in equation (4-2), or equivalently equation (4-4), exists and y(t) is quadratic mean con- tinuous and measurable (Wong [W-3], pp. 145-146). First form the predictors y(tls) by 9(tIS) (y(t)|L?(Z;S)) t (J wdz> t O s I W(t-T)dz(T). (4‘5) to The last equality follows from the fact that for T > s > = o <4-6) by virtue of the orthogonal increment property of z(t). Forming the predictor space L2(y;tls) suppose that it has a finite dimensional basis x(s). Then from the corollary to theorem 3-1 49 y(t) = A(t,s)x(s) + u(t). (4-7) Since by equations (4-4) and (4-5) T(tlt) = y(t) (4—8) the term u(t) defined by y(tlt) - y(t) is not present when t = s and equation (4—7) becomes y(t) = H(t)X(t). (4-9) Equation (4-9) will be useful in showing that the cross covariance sz(t,s) must be separable. First it is necessary to compute Vyz(t,s). Using equation (4—4) and the properties of orthogonal increment processes (Wong [W-3], p. 144) for t i s i t PTO f Vyz(t,s) E{J W(t-T)dz(T)ZT(S)} t O n s dzT(T)} E{J W(t-r)dz(r)J t t O O S J W(t-T)dT. (440) to By hypothesis W(t-T) is separable in the form W(t-T) = P(t)q(I) (4-11) for some m.x 2 matrix function p(t) and A X r matrix function q(r). 
Hence S Vyz(t,s) = p(t)[ q(T)dT (4'12) to 50 so Vyz(t,s) is separable. Since Vyz(t,s) = H(t)sz(t,s) (4-13) it is clear that sz(t,s) must also be separable. BVXZ(t,s) It is next necessary to show that at must be continuous for t0 1 s i t. Combining equations (4-10) and (4-13) 3 H(t)V (t,s) = I W(t-T)dT (4-14) xz t o and the right hand side of the equation is analytic for t i T. From the discussion of the stationary case at the end of chapter 3 the matrix output coefficient matrix H(t) is time invariant and can be denoted by simply H. Thus 3 HVXZ(t,s) = I W(t-T)dT (4-15) t 0 implies sz(t,s) is analytic and hence possesses con- tinuous derivatives of any order for t0 2 s i t. Finally to show that E{xi(t)2} for each component xi(t) of x(t) is bounded on bounded intervals the fact that in the case of a stationary system a minimal realiza- tion is observable is used (Silverman [S-l]). Hence the matrix H must have full rank which implies that the digaonal elements of HTH cannot contain a zero. Since E{xT(t)HTHx(t)} = E{yT(t)y(t)} < w (4-16) 51 it follows that for each i, i = 1,2,...,n, 2 E{xi(t)} < m. (4-17) Hence the desired property that E{lxi(t)l} is bounded on bounded intervals must hold. Thus the necessary hypotheses of theorem 3-4 will all be satisfied if a finite basis x(s) of the predictor space L2(y;tls) can be found. The search for such a basis is now begun. Set t - T = A. Since W(A) is analytic it has a Taylor series expansion w k _ A ” W(A) — E Yk ET’ A Z 0. (4-18) k—O The coefficients Yk’ k = O,1,2,... are called the Markov parameters (Ho and Kalman [H-2]) and are given by = de(A) —-—E—— , k = O,1,2,... . (4-19) dA A=O Using equation (4-18) the predictors can be expressed as s y(tls) = [t (k: Yk $E§%l—>dz. to g s g t <4-20> 0 Lemma 4-1: m s y(tls) = Z ka (t-T) dz(T), to i s i t. (4-21) k=o Jt ' 0 Proof: The equality in equation (4-21) is in the sense of mean square equality. Define 52 s n (t-T)k S = I [ 2 YR ——ET——]dz(T). (4-22) tO k=o ° From a criterion by Doob ([D-2], p. 429)timasequence of stochastic integrals Sn converges in the mean to y(tls) if and only if the integrands converge in the mean to W(t-T). This criterion is clearly satisfied in this case and since the lemma is proved. Theorem 4-1: The pth quadratic mean derivative of y(tls) with respect to t exists and is given by A m Y S EEX§E%§2 = kg (E:EIT [ (t-T)k-de(T) (4-23) t =p ' to for t0 1 s i t. Proof: . . apy(t|s) . . The PER der1vat1ve ex15ts 1n a mean at 2 8 pVy(t1IS,t2|S) square sense if exists at the point P P (t,t) (Papoulis [P-2], p. 317) where V9(t1|s, tzls) = E{y(tl|s)y(t2|s)T}. (4-24) Using the representation given by equation (4-5) and the orthogonal increment property of z(t) 53 s s VA(t1|s,t2|s) = E{[[ W(tl-T)dZ(T)][J W(tz-T)dZ(T)]T} (4-25) V t t o o S T = J W(tl-T)W (t2‘T)dT, to < S < t. (4‘26) t _ _ 0 Since W(A) is analytic it is clear from (4—26) that mixed partials of V9(tl|s,tzls) of any order exist for all tl Thus QEELELEZ 2' atp all positive integers p. Since the only portion of and t exists in a mean square sense for s I W(t-T)dz(T) that depends on t is the deterministic to portion of the integrand W(t-T), it is clear that for < tO _ s i t pA s §_y(tls) z [ a _W fir Y k=o y . = 0 (4-35) m k since k2 Vgg’k)(t|s,t|s) %T is just the Taylor series :0 ° for Véo’£)(t+1|s,tls). m kA k Thus y(t+T|s) - Z §_Z£El§l %T is orthogonal to k=o 3t ' 2 a (tgs) for all 2 and so it is orthogonal to 3t 00 k k X §_2£El§l T . Therefore k‘ kl —0 at 57 00 k k 00 jA j E [9 k1 i . 
3 y(tls) symbol k 18 the mith component of k Bti Bti The Hilbert space spanned by F(tls), denoted L2(F(t|s)), is the closure of S{F(t|s)} with respect to the mean ' square norm. The variables t and s are regarded as fixed when considering L2(F(t|s)). Theorem 4-3: L2(F(t|s)) = L2(y;t|s), t i s i t . (4-41) 0 Proof: Recall that when considering the space L2(y;t|s), s is regarded as fixed while t varies over all values greater than or equal to s. This is in contrast to the space L2(F(t|s)) where both t and s are regarded as fixed. Thus the s in L2(T(t|s)) is the same as that in L2(y;t|s), but the use of the symbol t must be correctly interpreted, That L2(F(t|s)) c L2(y;tls) is trivial since for any stochastic process a(t) e L2(y;t|s) it follows automatically that figét) e L2(y;tls) whenever the derivative exists in the mean square sense. (Wong [W-3], p. 79). Hence S{F(t|s)} c L2(y;tls). (4—42) Since L2(y;t|s) is closed, taking the closure of the left hand side yields L2(F(t|s)) c L2(y;t|s). (4-43) Now let a e L2(y;t|s). Then by definition a = l.i.m. a (4-44) p+oo p where, for each p, ap is a linear combination of com- ponents of y(t ls) with t > s. This can be Pi Pi expressed as N P a = z c 9 (t ls) , - P 1:1 91 pji pi (4 45) where C ,...,C are real constants, t 3 s for all pl pn i pi, and ypji(tpi|s) is the pjith component of 9(tp [8). By the corollary to theorem 4-2, for s < t i and s < t , i m aky (tls) (t - wk & (t Is) = 2 pi“ pi <4-46) pji Pi k=o Btk 1‘! Hence N sky k sN = z Jkk_ 1k, (4-48) k=o 8t ' Since SN is a linear combination of elements of F(tls), SN 6 L2(F(t|s)). Since L2(F(t|s)) is closed, lim SN 6 L2(F(t|s)). Therefore, N—mo m 3k? . (tIS) (tp -t) p . Jk 1 2 e L (F(tls)). (4-49) k=o atk RT 2 k Thus ap e L2(F(t|s)) since ap is just a finite linear combination of such elements. Invoking once again the fact that L2(F(t|s)) is closed, l.i.m. ap is an element p+oo of L2(F(tls)). Thus L2<9;tls> c L2). , <4-so> Corollary: F(tls) Spans L2(y;t|s) for t i s i t. Proof: The corollary is immediate from the equality 61 L2(y;t|s) = L2(F(tls)). (4-51) Q.E.D. For convenient reference equation (4-28) is re- produced again here: " w r- S (t-T)O g F 7‘! Y0 Y1 Y2 . . . [t —~C—)—T— dZ(T) y(tlS) o s l (t-T) _ 3 (t S) _ Y1 Y2 Y3 [t "1! dz(T) - ~X§EL—— — F(tls). o s 2 2 (t-T) a (t S) o L. . . J B . J L. . J It has been established by the corollary to theorem 4—3 that F(tls) contains among its components a basis for the space L2(y;t|s). The basis, though, is as yet unidentified. However, if a finite dimensional basis for the linear space spanned by the rows of the Hankel matrix is found, define the vector x(tls) as the corresponding elements of the predictor space where the correspondence is realized by taking the inner product of the row vectors) of the Hankel matrix forming the basis and the vector Z(t,s) defined by S O S Z(t,s) = [ £E%}l— dzT(T), J £E%}l— dZT(T),... . (4-52) t . O O * The row vectors must be distinguished from the block row vectors such as [Yk’Yk+l’Yk+2""]' 62 Then x(tls) is the basis for L2(y;tls). Essentially the idea is that if the predictor space has a finite dimensional basis, then it can be identified by examining the Hankel matrix. Theorem 4-4: Suppose the linear Space spanned by the rows of the Hankel matrix has a basis consisting of a finite number of (scalar) row vectors of the matrix. Define the vector x(tls) as that vector whose components are given by taking the inner product of the rows of the Hankel matrix forming the basis and the vector Z(t,s). 
Then x(tls) is a basis for the predictor space. By equation (4-28) a set of components of F(tls) is linearly independent if and only if corresponding rows of the Hankel matrix are. Thus by the hypothesis on the Hankel matrix the components of the vector x(tls) are a maximal linearly independent subset of the components of F(tls) and hence span L2(F(tls)). Then by the corollary to theorem 4-3 the components of x(tls) are a linearly independent set of random variables spanning L2(y;t|s). Therefore, by definition, x(tls) is a basis of the predictor space. Q.E.D. The next thing to show is that a finite dimen- sional basis of the linear space spanned by the rows of the Soo can be found. It has been shown by Silverman 63 [8-1] that for a stationary system the rank of the Hankel matrix is equal to the dimension of a minimal realization of the system if the transfer function of the impulse response matrix of the system is a strictly proper rational matrix.* He also showed that for a stationary system, this is a necessary and sufficient condition for a realization to exist. Since the assumption of separability made at the beginning of the section is also sufficient to guarantee a realization for the stationary impulse response matrix W(t-s), this implies that the Hankel matrix has finite rank. Hence a basis for the linear space spanned by the row vectors can be found and the x(tls) so defined is of the dimension of a minimal realization of W(t-s). These results are summarized by the following theorem. Theorem 4-5: Let W(t-s) be a stationary separable impulse response matrix satisfying equation (4-2). Then the Hankel matrix defined by W(t—s) has the prOperty that a finite dimensional basis of the space spanned by its row vectors can be found. Furthermore, the vector x(tls) defined by taking the inner product of these row vectors and Z(t,s) is the same dimension as a minimal realization of W(t-s). * . . G(s) is a strictly proper rational matrix 1f the numerator of every element of G(s) is of lower degree than its denominator. 64 Hence, a basis for the predictor space L2(y;tls) can be found by examining the rows of the Hankel matrix SN for some N. The basis, however, is a function of two variables 3 and_’t in apparent contrast to chapter 3 where the basis of L2(y;tls) was considered as a function of 3 alone. The vector x(tls), moreover,is a basis of L2(y;t|s) for any value of t greater than or equal to s. In this sense x(tls) is independent of t in its role as a basis. The problem arises though that the derivatives in the representations of chapter 3, for instance in the representation given by equations (3-74) and (3-75), are total derivatives. That is, it is nec- essary to consider QW=§¥J§1+MH§1£E . (4-53) S S at ds This total derivative is not defined because the term 3% is not defined. In general there is no relationShip between t and s. If, however, consideration is explicitly restricted to the case t = s then —— = 1 (a—sw and the total derivative is defined. Furthermore by restricting consideration to the case t = s a basis depending only on s has been found. Define x(s) = x(sls) = x(tls)|t=S . (4—55) 65 As discussed at the beginning of the section the criteria of theorem 3-4 are satisfied since now it has been shown that a finite basis x(s). of the predictor space L2(y;t|s) can be found. The representation given by equations (3-74) and (3-75) is justified, dx(s) Fx(s)ds + Gdz(s) (4-56) y(s) Hx(s). 
(4-57) No noise term u(t) appears in equation (4-57) since 9(SIS) = y(s) (4-58) and u(t) was defined by equation (3-12) as u(t) = y(t) - y(tls). The solution to equation (4-56) is S x(s) = J eF(S'r)cdz(r) _ (4-59) to and s y(s) = H I eF(S-r)Gdz(r). (4-60) to Theorem 4-6: The matrices F, G, and H occurring in equa- tions (4—53) and (4-54) determine a minimal realization for the impulse response matrix W(t-s). Let tO < T < t. Then using equation (4-57) and the representation 66 n(t)dt = dz(t), (4-61) it follows that t Vyn(t,1) = HE J eF(t—r)Gn(r)drnT(T) (4—62) to t = H I eF(t-r)G6(r-T)dr t O = HeF(t-T)G. (4-63) '4 7\ Furthermore, for a symmetric delta function, by equation <4-2) t E J W(t—r)n(r)drnT(T) to Vyn(t,T) t T I W(t-r)E{n(r)n (1)}dr o t = I W(t-r)6(r-I)dr to Efigill-,'r= tO or T = t = W(t-T), to < T < t . (4-64) 0 , T i [to,t] Hence, for tO < T < t W(t-T) = HeF(t‘T)G. . (4-65) * The symmetric delta function is defined by b %f(s-o), s = b J f(t)6(t-s)dt = %f(s+°): S = 0 . a $[f(s-o) + f(s+o)], a < sI< b O, s f [a,b] 67 That equation (4-65) also holds at T = t0 and T = t follows from the analyticity of the functions involved. Since analytic functions defined on an interval of the real line have a unique analytic extension to the real line, and analytic functions equal on an interval of the real line are equal on their whole domain of analyticity, (Olmstead [0-1], p. 268), equation (4-65) can be extended to hold for all t and T. In particular it holds for T = t and T = to. The minimality of the realization ‘follows from theorem 4-5. Q.E.D. Thus, the Markovian representation developed in chapter 3 yields a solution to the deterministic realiza~ tion problem for the stationary case. The following theorem is interesting in that it shows explicitly that the com- ponents of x(s) generate the predictor space. Theorem 4-7: F(t-s) y(tls) = He x(s), t i s i t. (4-66) 0 Proof: Using equations (4-5) and (4-59), and theorem 4-6, s y(tls) J W(t-r)dz(r) to H [8 eF(t-r)Gdz(r) t o = HeF(t'S)JS eF(S-r)Gdz(r) to = HeF(t‘S)x(s). Q.E.D. 68 This completes the results for this section. It is to be emphasized that the material developed here should not be regarded as an algorithm, but as a theoretical setting which is useful in analyzing and understanding the realization problem and associated algorithms, as will be demonstrated in the next several sections. 2. Silverman's Algorithm In this section the predictor space approach will be used to analyze Silverman's realization algorithm for stationary systems (Silverman [S-2]). Akaike [A-l] has previously done this for the discrete-time case. The analysis for continuous time systems is considerably more complex. The analysis starts from the assumption that the dimension of the system is somehow known. Theorem 4-8: Suppose it is known that the impulse response matrix W(t-T) can be realized by a system of dimension n. Then the search for a basis of the predictor space can be limited to within the space spanned by'the components of " atn:l n-l T [9(cls>T,—$——|—3 g; ”T... 3 Wm”? . Proof: Since W(t-T) can be realized by a system of dimension n, theorem 4-6 can be used to represent the predictors as 69 s y(tls) = J HeF(t'r)Gdz(r) (4-67) t 0 where F is an n x n matrix. Then p S §_2i£l§l = I HFPeF(t-r>cdz(r). (4-68) atp tO If p = n, then the characteristic equation of the n x n matrix F, F“ = z A.FJ (4-69) can be used to write n n-l s . 3—2££L§l = Z AjI HFJeF(t-r)Gdz(r) atn j=o tO n-l j = z A. E—Yiiiil . (4-70) j=o J atJ Then 3n+1y(t|s) = “£1 1. 
31+13 T ay- d”T d(t), .< 1>k'1 d Q_{t) ) dt 83'1ygtit)n J— LL at J J p 7 Y0 Y1 Yk-l Y1 Y2 Yk = l . . . . 2 . . . (4-81) LYj_1 Yj . . . Yj+k-2 Proof: The corollary is a direct consequence of theorem 4-9. Q.E.D. 73 At this point it is appropriate to present Silver- man's algorithm for stationary systems (Silverman [S-ll). The algorithm assumes that integers j and k can be found such that rank Sjk = rank S = n, R = 0,1,2, .. . (4-82) j+l k+2 The integers j and k are usually taken to be the smallest values such that equation (4-82) holds, but this is not necessary for the algorithm. The integer n de- fined by equation (4-82) is the dimension of the minimal realization of the impulse response matrix generating the Hankel matrix. Given Sjk’ let M denote the n X k - r sub- matrix formed from the first n independent rows of Sjk' Define the n x k - r submatrix M of Sj+l,k in the following way. Denote the scalar rows of the Hankel matrix by Ya , a = l,2,3,... . If Y2 is one of the rows selected for the matrix M, then the row YR (m is +m dimension of output) is selected to be the corresponding row in the matrix M. The following four submatrices of Sj+l k are then defined. A -- The nonsingular n x n matrix formed from the first n independent columns of M. K -— The n x n matrix occupying the same column positions in M as does A in M. Al -- The m x n matrix occupying the same column positions in Slk as does A in M. A2 -- The n x r matrix occupying the first r (dimension of input) columns of H. 74 Then setting F = XA'l, G = A2, H = AZA'l (4-83) yields a minimal realization of the impulse response matrix W(t-s) = ueF(t‘S)G. (4-84) After a preliminary lemma, equation (4-81) will be employed in examining the Silverman algorithm in the context of predictor space theory. Lemma 4-4: If equation (4-82) is satisfied by the Hankel matrix S. for some j,k, and n, then the matrix S Jk nn satisfies equation (4-82). Proof: Since the dimension of the realization is n, from the proof of theorem 4-8 and the discussion following its corollary it is clear that the n + lst block row of Sn+1 n is but a linear combination of the n block rows of Snn' Similarly the n+l,...,n+2 block columns of Sn+l,n+l are but linear combinations of the n block columns of Sn+1,n' Hence rank Snn = rank Sn+l,n = rank Sn+l,n+2’£ = O,1,2,... (4-85) Q.E.D. The only purpose of this lemma is to justify working with the specific matrix Snn in the following analysis. This 75 is desirable since, as it is known, the realization being sought is of dimension n or less. Theorem 4-8 guarantees the existence of a finite dimensional basis * within the vector [y(tlt)T, —2££iEl—,. .., M2(t£t>m at“ Now from equation (4-81) picking the matrix M of the first n linearly independent rows of Snn is clearly equivalent to picking the first n linearly T an— 12 independent elements of [y(tIt)T,3 g: t) ,..., %(§IC)T]T - Btn which amounts to picking a basis x(tlt) = x(t) for the predictor space. This is formalized in the following theorem. Theorem 4-10: If a Hankel matrix Snn satisfies equation (4-82) with rank Snn = n, then the corresponding vector [y(tlt), T—2££:£l—,. .. n-Ey(tlt)T ] . contains exactly at“ n linearly independent elements and these form a basis for the predictor space L2(y;tlt). Since Snn satisfies equation (4-82) the dimen- sion of the realization is n, theorem 4-8 applies, and the vector [y(tlt)T, —2£ELEl—,. V My(tL§)T]T . 
contains at“ * V The point is that although Silverman's algorithm may work with the Hankel matrix S'k’ j < n, k < n, and the analysis to be carried out here can still be done, it would be unjustified from the viewpoint of the theory developed in this work. 76 among its components a basis of the predictor space L2(9;tlt). Hence a maximal set of linearly independent components from this vector will constitute a basis. From equation (4T 81) the vector n—l T T [>(tlt)T, —2££LEL—, ...,3 gfflt) must have exactly at n linearly independent elements when the rank of S nn is n. Hence the n linearly independent elements from [%(tlt)T,—2£ELEl—,...,19(tlt)H] form a maximal linear Btn independent set and thus a basis for the predictor space. Q.E.D. Denoting the basis formed picking the first n linearly independent elements of . n-l [9(tlt)T, MEL“ . 3 y(tlt)1]T by x(tlt) = x(t), " at“ it is obvious from equation (4-81) that T n- 1 T = 2E{x(t)[n'r(t), _93_tlt:;,...,<-1)n-1 d dZnSEYD (4-86) If xi(t) is a component of x(t), then the element m rows beneath it in the vector ax (tlt) [9(tlt)T, —2£ELEl—,. .. ,W:Z(tLt)T]T is just* -—1§E———. Thus the rows of the matrix H correspond to the vector M and 30 3t *Note that once again the partial is understood as —§$£l£—- -§$£l§—48_t. A careful distinction must be made between —§$ELE— and 9§é%L§l . r4 = 23{§§§§—L31[nT (t), :1” Lit),...,<-1)n'1 d ExamEle: (4-76) explicitly for the case 234 r 91mm 92(t|t) 9m(tlt) 391(tlt) a‘t 39$(tlt) 3 t1 77 j = 73:40: | t) 9(tlt) / 8x(t|t) / at [LQLWZQLH,,nr(c)1,'d"%m ‘36 y11 Y12 y21 y22 yml ym2 ym.+1,1 ym+1,2 ' ym+2,1 ym+2,2 ° yn-m 1 yn-m,2 11— l dnT(t)] tn- 1] }. (4-87) Writing out the matrices and vectors of equation k = n yields ..... .3” LTfi] J ' W yl,(n-l)-r y2,(n-l)~r . Y0, . ,Yn-l Vm,(n-1);r Ym+1,(n-1).r ym+2,(n—l)or yn-m,(n-l)-I‘ .J 78 The are the actual elements of the Hankel matrix, yij not the Markov parameters. Suppose the two starred rows of the Hankel matrix are the rows of M. Then the corre- sponding starred elements of the vector of 9's are the elements of the basis of L2(y;tlt), i.e., x(tlt). The two checkmarked rows of the Hankel matrix are then the rows of H and the two corresponding checkmarked elements in the column of 9's are just §§%%LEL. Picking the n x n matrix A corresponds to selecting the first n independent T n-l T elements from the vector [nT(t),r—g-g—t-fl—EQ-H..,(~l)n-1 d n iti]. dtn‘ The resulting vector will be called v(t). Thus A = 2E{x(tlt)vT(t)}. (4-88) Similarly the n x n matrix K is just ’A‘ = 23{9—’%§it—) vT}. <4-89) The m x n matrix Al is given by A1 = 2E{y(tlt)vT(t)}. (4-90) Finally, the n x r matrix A2 is given by T A2 = 2E{x(tlt)n (t)}. (4-91) Now using equation (4-54) and the fact that y(t|t) = y(t) 79 AlA'l = 2E{y(t)vT(t)}[2E{x(t|t)vT)t)}]-1 (4-92) = HE{x(t|t)vT(t)}[E{x(t|t)vT(t)}]-l (4-93) = H . ‘ (4-94> Next, using equation (4-56) A2 2E{x(tlt)nT(t)} t = 2E{J eF(t-r)Gn(r)drnT(t)} (4-95) to t = 2 J eF(t-r)G6(r-t)dr to = 2-% eF(t-t)G = G . (4—96) Before computing the final product, K A-l, note that K involves the term axgi t). It will be necessary to interpret, and find an expression for, this term. Let xi(t|t) be a component of x(tlt). Then from equation (4-81), for some integer p i n < I > 3‘33“”) <4 97) x. t s = - 1 atp where 9j(t|s) is a component of y(tls). Since from equation (4-27) px s p _ 3 Z(CIS) = [ 3 W(t r) dz(r), (4-98) atp t atp o denoting the jth row of _————?—— by J at atp 80 s apw.(t-r) xi(t|s) = I J dz(r) (4-99) P to. at The following theorem can now be proven. 
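Before the theorem is stated, a small computational sketch of the algorithm of equations (4-82) and (4-83) may be helpful. The sketch below assumes the Markov parameters Y_k are available, stacks them into the block Hankel matrix of equation (4-81), and forms the matrices M, M̄, A, Ā, A1, A2 and the realization F = ĀA⁻¹, G = A2, H = A1A⁻¹. The greedy selection of independent rows and columns, the numerical rank tolerance, and the function names are implementation choices not specified in the text; the final check uses the two-input, two-output system of the example in section 4.4 (equations (4-137) and (4-138)).

```python
import numpy as np
from scipy.linalg import expm

def block_hankel(Y, rows, cols):
    """Stack Markov parameters Y[0], Y[1], ... into the block Hankel matrix
    whose (i, j) block is Y[i + j] (top-left block Y_0, as in (4-81))."""
    m, r = Y[0].shape
    S = np.zeros((rows * m, cols * r))
    for i in range(rows):
        for j in range(cols):
            S[i*m:(i+1)*m, j*r:(j+1)*r] = Y[i + j]
    return S

def first_independent(M, n, axis=0, tol=1e-9):
    """Indices of the first n linearly independent rows (axis=0) or columns (axis=1)."""
    A = M if axis == 0 else M.T
    idx, basis = [], np.zeros((0, A.shape[1]))
    for i in range(A.shape[0]):
        cand = np.vstack([basis, A[i]])
        if np.linalg.matrix_rank(cand, tol=tol) > basis.shape[0]:
            idx.append(i)
            basis = cand
        if len(idx) == n:
            break
    return idx

def silverman_stationary(Y, m, r, tol=1e-9):
    """Sketch of the stationary algorithm (4-82)-(4-83) from Markov parameters Y_k."""
    N = (len(Y) - 1) // 2
    S = block_hankel(Y, N + 1, N)              # one extra block row for the shift
    Sjk = S[:N * m, :]
    n = np.linalg.matrix_rank(Sjk, tol=tol)    # dimension of a minimal realization
    rows = first_independent(Sjk, n, axis=0, tol=tol)
    M_mat = S[rows, :]                         # first n independent rows of S_jk
    M_bar = S[[i + m for i in rows], :]        # same rows shifted one block down
    cols = first_independent(M_mat, n, axis=1, tol=tol)
    A  = M_mat[:, cols]                        # nonsingular n x n matrix
    Ab = M_bar[:, cols]                        # same column positions in M_bar
    A1 = S[:m, cols]                           # same column positions in the first block row
    A2 = M_mat[:, :r]                          # first r columns of M
    Ainv = np.linalg.inv(A)
    return Ab @ Ainv, A2, A1 @ Ainv            # F, G, H

# Check against the system of section 4.4: x' = F0 x + G0 u, y = H0 x.
F0 = np.array([[2., 1.], [0., 1.]])
G0 = np.array([[1., 0.], [0., 2.]])
H0 = np.array([[1., 1.], [0., 1.]])
Y = [H0 @ np.linalg.matrix_power(F0, k) @ G0 for k in range(7)]
F, G, H = silverman_stationary(Y, m=2, r=2)
for delta in (0.3, 1.0):
    assert np.allclose(H @ expm(F * delta) @ G, H0 @ expm(F0 * delta) @ G0)
```

The recovered triple (F, G, H) is equivalent to, but not identical with, the original one; only the impulse response H e^{FΔ} G is required to match.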
Theorem 4-11: ELL—g: *3 = Fx(tlt) (4400) where §§§%L£l is interpreted as iiéEL§l|s=t' Proof: From equation (4-99) 3 apwj(t-r) xi(t|s) = I dz(r). 3 t0 3t Then the vector x(tls) can be represented as S x(tls) = J M(t-r)dz(r) (4-101) t 0 where M(t-r) is an n x r matrix whose ith row is of BPW.(t-r) the form J p for some p i n and some j between at l and r. Hence, 3 3§$§L§l = J éflfiilil dz(r), (4-102) at t at o and 3x(t|t) = ax(t|s) 3t at It ”- to s=t gE-r) dz(r). (4-103) 81 But from equation (4-59) t x(tlt) = [ eF(t-r)G dz(r). )t 0 Then identifying the integrand in equation (4-59) with that in equation (4—101) t F(t-r) gig—E-Lgl = { W— G dz(r) (4'104) 1t -o t F(t-r) = F I e G dz(r) t o = F x(tlt) Q.E.D. Now computing the final product A— A.1 = 2E{§1§—E—Lt—) VT(t)} [2E{x(tlt)vT(t))]-l (4-105‘) E{Fx(t| t)vT(t)} [E{x(t| t)vT(t)}] '1 FE{x(t|t)vT(tl}[F{x(t|t)vT(tij]'l = F. Thus, in the context of the predictor space approach to realization, Silverman's algorithm turns out to be a rather straightforward result. The same machinery that was used here will next be used to consider the Ho-Kalman algorithm. 3. The Ho-Kalman Algorithm In this section the predictor space approach is used to analyze the Ho-Kalman algorithm (Ho and Kalman [H-21). A summary of the algorithm is provided first. A neat and concise summary has been provided by Tether [T-l]. As with the Silverman algorithm it is necessary that equation (4-82) be satisfied. to be the case. by Let 0K Yi+j-2 This will be assumed Then the algorithm proceeds as follows 1) equation (4—82). 2) such that PS where I n Pick N1 0 O and N stant defined by equation (4-82). fin are given by such that The matrices 8 Find nonsingular square matrices is an n x n unit matrix and n NlNZ denote the shift operator defined YK+j-l K+j YK+i+j-2 (4-106) satisfies and Q (4-107) is the con- Un and 83 I _ n N _ , _ Un - 0 (Nlm x n), U — [In 0] (n x N2 r) (4 108) 3) Let E be the block matrix In EIn = [IIn Om ... Om] (m x le) (4-109) and Er be Er = [Ir 0r ... Or] (r x Nzr) (4-110) 4) Then F = fi [JTP o s Q JTJU (4-111) n NINZ n _ ~ T T G - Un[J P sN N Er] (4-112) 1 2 H = E [s Q JTJUn (4-113) m NINZ yields a minimal realization of the impulse response matrix generating the Hankel matrix SN N l 2 For the analysis the specific Hankel matrix Snn' will be used. As shown in the previous section Snn satisfies equation (4-82). Using equation (4-81), equation (4-107) from step 2 can be written ff y(tlt) ’ W 32(t|t) 3t 2P E< : n T(t) ~3____ )-n l d dtn-](-t) Q an’19(tlt) = . (4-114) 84 Partition the matrices P and Q into the form 31 r r P11 P12 Pln Q11 Q12 an P21 P22 P25 Q21 Q22 an P = . . Z , Q = . . P P P Q L n1 n2 nanmxnm Lin n2 anJnrxnr (4-115) where P.. is an m x m matrix and Q.. is an r x r 13 13 matrix. Then (4-114) becomes rr-n - I‘W 1-1 7 F’n . i-l 1 5 P11 3 yifit) Z <-1>1'1QT d V (t) i=1 at 1:1 11 dtl-I n 1'1 n . i-l 2 P21 3 yifit) Z (~1)1'10? d ? (t) 1‘1 3t i=1 ‘12 cit-1"I f 7 I O - n 2E< . ? = ° 0 O 2 P ' 9* 1 z (-l)1'lQT d 3—(t) i=1 n1 atl- i=1 1n dtl-l lKL A L .J J (4-116) Equation (4-116) will be shown to imply that the first n F . 1 g P . 31-1?(tLt) i=1 ll atl' n ai'l9(tlt) 3 P21 i-l i=1 3t elements of the vector 2 are linearly n P ai’ly(tjt) Z ni ati-l i=1 L 4 85 independent and hence form a basis for the predictor space. Theorem 4-12: Equation (4-116) holds only if the first n ele- 2 Plia —1 i=1 at1 n 31-19(t]t) 2 P21 i- i=l 8t ments of the vector I are linearly Z Pni i-l lel at J independent. , n . 
1 Proof: Denote the scalar components of 2 Pl' y§t%t) i=1 1 iatl' by ui, i = l,...,n.m. n X P19(tLt) i=1 2i lati'1 ? P . 9(ELF) Li=1 “1 BtL- J Also denote the scalar components of Z ('1) 1-1 Q11'°'°' Z (‘1) i-I Qin i=1 dt i=1 dt by vi, i = 1,. .,nr. Then by equation (4-112) ij; j, k= l,...,n - 2E u jvk . (4-117) = n+l,...,nm Suppose the first n components of the vector [u1,u2,...,unm]T are not linearly independent. Then there 86 exists al,...,an, not all equal to zero, such that aiu = 0. (4-118) i HMS i 1 Suppose a. is nonzero. Then J n Oti U. = - Z a“ Ui . (4‘119) 3 i=1 j ia‘j Thus for k = 1,2, ..,n 2E{ujvk} = -2131 a; E{uivk} i#j n 0.1 = Z $51k i=1 3 i#j O. = -2 61: , (4-120) j Hence, for equation (4-117) to hold Gk must be zero for k = 1,2,...,nj,k # j. Then by equation (4-118) . . = 0. 4-121 aJuJ ( ) Since E . . = 1 4-122 {anJ} ( ) it must be that aj equals zero, contradicting the assump- tion of linear dependence. Thus the theorem is proved. Q.E.D. Corollary: The first n components of the vector 87 form a basis for the predictor space. From theorem 4-10 the vector T -l T [F(tlt)T. §2§%i£l~,...,any(tlt%] contains exactly n atn—l linearly independent elements which form a basis of the predictor space. Since the first n elements of the vector F -_ 1 n 81 19(tlt) 5 P11 1-1 1’1 at n i-l 2 P . 3 g(tLtl . 21 1-1 1=l at are linearly independent they are '-l § P . 31 ?(tlt) i=1 “1 atl' L ..1 simply a nonsingular linear transformation of the first basis and hence another basis. Q.E.D. The basis defined by the above corollary will, as usual, be denoted by x(tlt). * Next consider the quantity P(oSnn)Q. From * The symbol 0 is the shift operator defined above. 88 equation (4-106) Y1 Y2 Yn Y2 Y3 Yn+1 GS = : Z : nn Yn Yn+1 Y2n-l L. .J From equation (4-81) V r- 1 W ay"(t|t) 8t 32" t t) 3t T n—l T ° T -d - d 2E2 n (or—3&3)“ .,<-1>n 1 “,3th . dt an t t) atn LL H J Y1 Y2 Yn 1 Y2 Y3 Yn uYn Yn+l an-lJ Again partitioning the matrices P and Q (4—115) (4-123) (4-124) as in equation F VII 1 3 d 2 P11 (tit) Z ( 1)i 10:1 i=1 3t i=1 2 p oalygtlt) §_i-1QT d i=1 21 atl i=1 12 PoSan = 2B 4 n ix .2 Pnia vgtit) Z (-l)i-1QEndi L1=1 at .J Ll=l k Since r.n i 7 rhn z: p . LEE—IL) 2 p ,‘19(tJlt> 1=l 11 3t1 i=1 11 Btl = E. at ‘21P . a1 (t t) 2P dz(tlt) i=1 n1 at1 i=1 ni i-l L J L it is clear that the first: n elements of the vector 1 in 3i t t) =1 Pli atl g P . 31A t t) .= n1 1 L1 1 at J are simply §§%%i£l = §§§%i§l s=t hand side of equation (4-111). T‘I (4-125) (4-126) Now consider the right ~ T T _ UnEJ PoSanJ JUn — 1.? P1. 31A(t tf1 i=1 1 at1 [I 0] In 0 2E 2 i 1 d1 12T(t) n ( 1) Qfl O O . 1=l dtl g P . 31A t.t) K.Li=1 “1 atl' j n . i— l T I - d ., Z (_1)1 l 2 £t)Q1n n i=1 dt 0 r W F .‘ 3x(t[t) I = [In 0] 2E< [VT (t) 0]} “ (4-127) 0 - 0 k7 J J L .1 T . . where v (t) con51sts of the f1rst n scalar components of the l X r . n vector i- l nT n . i- “In T ”if§> Qil,..., z (-1)1'l d ”_ 7 33(tlt) 3t . o J 2E ' z (-1)1 1 d m ° i=1 dt (tit) L Btu-1 '11 o I i- 1 (ii-173T“)Q n n dtl'1 1“ L9 0 o 2E{9(tlt)vT(t)} (4-134) (4-135) 92 where vT(t) is again the first n components of the vector defined in (4-127). Continuing, T T Em[San J Jun H - 2E{x(t|t)v (t)} H-I (4-136) from equation (4-110). Thus the Ho-Kalman algorithm follows by a straight- forward analysis using the predictor space approach. As noted by Akaike [A-l] the basis of the algorithm is that it provides a linear transformation of the vector T n-IA T [y(tlt)T.§2§%LEl-,.. 
3 yftLt)] such that the first ., n at 1 n components of the transformed vector are a basis of the predictor space. 4. Example An example is presented here to explicitly demonstrate the Silverman algorithm, the Ho-Kalman algorithm, and several aspects of the predictor space approach. The example starts with the system x1(t) 2 1 x1(t) 1 0 u1(t) = + ; x2(t) O 1 x2(t) O 2 u2(t) (t 1 1 t yl ) = Xl( ) (4_137) y2(t) 0 .1 X2(t) Let W(t-T) = Also, the Markov parameters = K-— YK HPG— 2 O k 2 2 0 eZ(t-T) 282(t-T) 2e Y are given by K 2k+l t-T (4-138) .(4-139) (4-l40) Part I: Silverman's algorithm. The Hankel matrix for the system (4-137) is N 2 4 8 8 16 16 32 4 32 2 o 2 o 2 o 2 o 3 2 iN I O b O 8 16 l6 32 32 64 64 O 2 O 2 O 2 O A 16 * 32 32 64 64 128 2 o 2 o 2 o 2 o 32—_32 64—128 256 2 o 2 o 2 o 2‘ o 16 32 64 64 236 512 o 2 o 2 o 2 o 2 o 2 o 32 64 64 128 256 256 512 l6 128 'ca c~l<3 thCD ca re a: b0 .6 I. .1 (4-141) ..a C" ,...: O" 64 128 156 000 N O -128 256 g 128 b.) N 512 128 512 1024 1024 o o . ’ l I ‘ J where the dashed lines partition the matrix into its Markov parameters It can be determined by inspec- Yk' tion that 94 rank 81 1 = rank 82,l+2 = 2, R = 0,1,2, .. . (4-142) , Hence, by equation (4-82) it can be determined that the dimension of a minimal realization of this system is two. The matrix M is just 811’ that is l 2 M = (4-143) 0 2 and _ 2 4 M = (4-144) 0 2 Clearly A, A1 and A2 all equal M and A equals M. Then, using equation (4-83), r 2 = , (4-145) 0 . L. 1 2 G = A2 = , (4-146) 0 2 and H = AlA 1 = 1. (4-147) The nonsingular transformation 1 l O 1 defines a strict equivalence between the system (F, G, H)2 and the system given by equations (4-137) and (4-138), (F, C, H)2. 95 Part II: The Ho—Kalman algorithm. Once again the algorithm starts with I ' 2 S - q 11 o 2 since 811 satisfies equation (4-82). It is first necessary to find square matrices P and Q such that PSIIQ = 12. (4-150) It is easily seen that l -l l O Q = , P = (4—151) 0 1/2 0 l satisfy this condition. In this particular example Un’ Un’ J, Em, and Er are all 12’ From equat1ons (4-111) - (4-113) F = PoSllQ 1 o 2 4 1 -1 0 1 o 2‘ 0 1/2 2 o = , (4-152) 0 1 G = P811 1 2 = (4-153) 0 2 and 96 CDI'-‘CDl---| This is the same realization as was arrived at via the Silverman algorithm. 0 l/2 In general these algorithms will yield equivalent, but not the same, realizations. Part III: The predictor space approach. (4-154) This part of the example illustrates the Silverman algorithm in the context of the predictor space approach. Equation (4-28) becomes for this example 1 2 I 2 4 I 4 8 8 I6 16 o 2 o 2 o 2 o 2 o .... ....I .... ._.I. _ ...JL- 2 4 4 8 8 16 16 32 32 o 2 I o 2 o 2 o 2 o 4 8 I 8 6 16 32 32 64 64 0 2J o 2 o 2 o 2 o 8 16 I16 32 32 64 64 128 128 o 2 I o 2 o 2 o 2 o L ‘ I I ' fyi ’1 22(tIS) 3Y1dr fi'1721—‘”1dT O f8 tO 01(T)dT n2(T)dT (t’T)n1(T)dT (t-T)2 . (5-1) The output of the stochastic system is defined by t y(t) = I W(t,T)n(T)dT <5-2) to 101 102 when the integral exists. Define the Wiener process z(t) by t z(t) =‘I n(S)ds (s-3> 0 so that n(t) is the innovation process for z(t) and (5-2) can be written I: y(t) = I W(t,T)dZ(T). (5-4) t O The goal here again is to justify using the re— presentation theory developed in chapter 3. To do this it is again necessary to verify that the hypotheses of theorem 3-4 are satisfied. That y(t) and z(t) are both zero mean real valued second-order processes is clear. 
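As an aside, a discretized sketch of the system defined by equations (5-2) through (5-4) is given below. It approximates the stochastic integral by a sum over Wiener increments and compares a Monte Carlo estimate of E{y(t)z(s)} with the integral of W(t,τ) over [t0, s], the input-output covariance used repeatedly below. The scalar weighting pattern, the step size, and the number of sample paths are illustrative choices only, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar analytic weighting pattern W(t, tau); illustrative only.
def W(t, tau):
    return np.exp(t - tau) * (1.0 + 0.5 * t * tau)

t0, s, t, dt = 0.0, 0.8, 1.5, 2e-3
tau = np.arange(t0, t, dt)            # integration grid on [t0, t)
in_s = tau < s                        # the part of the grid lying in [t0, s)

n_paths = 5000
dz = rng.normal(0.0, np.sqrt(dt), size=(n_paths, tau.size))   # Wiener increments
y_t = dz @ W(t, tau)                  # y(t) = int_{t0}^{t} W(t, r) dz(r), one value per path
z_s = dz[:, in_s].sum(axis=1)         # z(s) = int_{t0}^{s} dz(r)

lhs = np.mean(y_t * z_s)              # Monte Carlo estimate of E{y(t) z(s)}
rhs = np.sum(W(t, tau[in_s])) * dt    # int_{t0}^{s} W(t, r) dr
print(lhs, rhs)                       # the two values agree to within Monte Carlo error
```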
Since z(t) is quadratic mean continuous and hence measurable and since W(t,r) satisfies A-3 the integral in equation (5-2) exists and y(t) is also quadratic mean continuous and measurable (Wong [W-3], pp. 145-146). Next form the predictors y(tls) by y(tls) lL§ t I W(t,r>> t O s I W(t,T)dz(T), (5-5) to where the last equality follows from the fact that 103 IL§> = o. T > s (s-e) by virtue of the orthogonal increment property of z(t). Forming the predictor space L2(y;t|s) suppose that a finite dimensional basis x(s) can be found for it. Then from theorem 3—1 ?(tIS) = A(t,S)X(S). (5-7) From equations (5-4), (5-5), and 2(tlt) = H(t)x(t) (5-8) it is the case that y(t) = H(t)X(t). (5-9) Equation (5-9) will be useful in showing that sz(t,s) must be separable. First it is necessary to compute Vyz(t,s). Using equation (5-4) and the orthogonal incre- ment property of z(t) for t0 i s i t t HII W(t,T)dz(T)zT(sI} t O t S EII W(t,T)dz(T) I dZT(T{} t t O O Vyz(t,s) S I W(t,T)dT. - <5-10) to By hypothesis A-2 and theorem 2-1 W(t,T) is separable in the form W(t,1) = F(t)Q(T)- (5-11) 104 Hence 8 vyz = p(t) I qdr <5-12) t 0 so Vyz(t,s) is separable. Since Vyz(t,s) = H(t)sz(t,s) (5-13) it is clear that V (t,s) is also separable. xz 3V (t,s) It is next necessary to show that th must be continuous for t i s i t. Combining equations (5-10) 0 and (5-13) 8 H(t)VXZ(t,s) = It W(t,T)dT (5-14) 0 and the right hand side of the equation is analytic for t Z s 1 T 3 to. Hence H(t) and sz(t,s) must be analytic also in this region and thus possess continuous derivatives of any order in this region. Finally it is necessary to show that the components of E{x(t)Tx(t)} must be bounded on bounded intervals. Since the components of H(t) are continuous and y(t) is a second-order process, from equation (5-9) x(t) must also be second-order, which automatically yields the desired property. All the prerequisites of theorem 3-4 have been established except for the existence of a finite basis for the predictor space. The search for that basis for the most part parallels the stationary case. Since W(C,T) 105 is analytic it has the Taylor series expansion in T, for t i T, co k - - k mm) = z MEL—Tl _ £151. to _<_ ¥ 5. t. (5-15) k=o 8T T=T ’ There is no problem about the derivatives of W(t,T) existing at T = t since W(t,T) has a unique analytic extension to the region t < T and this extension can be used to ensure that the derivative is well defined. Using equation (5—15) the predictors can be expressed as (T-T)k ——ET——)dZ(T). (5’16) S (X) y(tls)-I (zfiwigin _ tO k=o 81 T=T Lemma 5-1: m k, S ’ k y(tls) = Z §_E£EL12 - .I £Ifi%l—'dZ(T), (5-17) k=o 31 T=T CO to i s i t f i t Proof: The proof is by the same procedure as for lemma 4‘1° Q.E.D. Theorem 5-1: The pth quadratic mean derivative of 19(tIs) with respect to t exists for tO i s i t and is given by ap2(t|s) = : 8p+kW(t,T) atp k=o BtPBTET ....- 106 Proof: The proof of this theorem is virtually identical to the proof of theorem 4-1. ‘ Q.E.D. 8p+kW(t,T) atpark T=f noted by WP k(t,¥). Stating the result of the theorem in For ease of notation will be de- a different way, for t i s i t and f i t o r _ _ _ S _- 0 i W0’0(t,1) W0,l(t,r) w0,2(t,1) “r'jt (T T) dz(r) w ' w ' w ’ {80(T'¥)l d 1,0(C,T) 1,1(t,T) 1,2(t,T) Jt Z(T) - _ _ 80(T-T)2 W2,O(C,T) W2,l(t,r) W2,2(C,T) ... It __7T__ dz(r) o L a L J F ?(tIS) 1 8 (t s) at (5-19) = 822(t!s) ' at L ' J Equation (5-19) will be abbreviated S(t.¥)Z(¥.S) = F(tIS) with the corresponding definitions for S(t,¥),Z(¥,s), and F(tls). 
Lemma 5-2: Let Vy(t1|s,t2Is) be defined by 107 Vy(tl|s,tzls) = E{y(tlls)yT(t2Is)}. (5—20) Then V?(tl|s,t2|s) is analytic in t1 and t2 for tl,t2 1 s 1 to and k+p k p T a V (t s,t ) a (t s) a (t s) 9 1| zls _ E —? 1| 9 2| . (5-21) k p k p atlatz Btl 3t2 Proof: From equation (5-5) s s T E [It W(t1,1)dz(1)]|:ft W(t2,T)dz(T)] o 0 (5-22) V?(t1|s,t2|s) s I W(tl,T)WT(t2,T)dT, t i s i t. to 0 Hence the analyticity of Vy(t1|s,t2|s) with respect to Cl and t2 follows from the analyticity of W(t,1). Equation (5-21) is a standard result (see Papoulis [P-Z], p. 317). Q.E.D. Theorem 5-2: For t0 1 s i t and t0 1 s i t + T m k k y(t + Tls) = z é—Zifiifil-fiT . (5-23) k=o 3t Proof: The proof is the same as for theorem 4-2. Q.E.D. Forming the space L2(F(t|s)) as in chapter 4 the following analogue to theorem 4—3 occurs. 108 Theorem 5-3: L2(F(t|s)) = L2(y;t|s), c i s 3 c. (5-24) 0 m: The proof is identical to that for theorem 4-3. Q.E.D. As in chapter 4 it has been established that F(tls) contains among its components a basis for the pre- dictor space L2(y;t|s). The basis, though, is as yet un- identified. However if a finite dimensional basis for the linear space spanned by the scalar rows of the matrix S(t,f) is found*, define the vector x(tls) as the corresponding elements of the predictor space where the correspondence is realized by taking the inner product of the row vectors of S(t,?) forming the basis and the vector Z(t,s). Then, as in chapter 4, x(tls) is a basis for the predictor space. Theorem 5-4: Suppose that the linear space spanned by the rows of S(t,¥) has a basis consisting of a finite number of (scalar) row vectors of the matrix. Define the vector x(tls) as that vector whose components are given by taking the inner product of the rows of S(t,¥) forming the basis and the vector Z(t,s). Then x(tls) is a finite dimen- sional basis for the predictor space. * Recall that when considering the space L2(T(t|s)), both_ 3 and t are regarded as fixed. Hence t in S(t,T) is regarded as fixed also, 109 Proof: The proof is completely analogous to that of theorem 4-4. Q.E.D. The next thing to show is that a finite basis of the linear space spanned by the rows of S(t,¥) can be found. By assumption A-2 and theorem 2-1 it follows that W(t,I) is separable in the form W(t,T) = p(t)q(r). (5-25) Since W(t,T) is analytic in both variables so are p(t) and q(t) (Kalman [K-BJ). Furthermore the columns of p(t) and rows of q(t) may be taken to be linearly in- dependent and the number of linearly independent columns in p(t) and rows in q(t) is an integer n uniquely determined by W(t,T) (Youla [Y-ll).* Consider the matrix T -|T -IT‘-| - [Ho,p(t,r): wl,p(t’T): W2,p(t,r):...] — dp T(;) pT(t)E de(t)E dZPT(t) E... . (5-26) dTp dt at? By a theorem by Chen [C-l] from the fact that pT(t) is an n X m analytic matrix with linearly independent rows * In this case the decomposition (5-25) is said to be globally reduced. It can be shown (Youla Y-l ) that n is the dimension of a minimal realization of W(t,T). llO T rank[pT (t>' dpdét)' d2 Pét) i...] = n. (5-27) I d t n ' This immediately implies that k+1pT rankQ T(t): d (t): ... < n, k = O,1,2,..., (5-28) kdtk . dtk+1 : — since the matrix in (5-28) is a submatrix of the one in (5-27). Equation (5-27) implies that rankaT(t):__fl__(_:__d T(t :dpq2T(T) dZPTét):... < drp ' dt ' — (5-29) That is, |/\ :3 A U1 I L2) 0 v )p rank w: p(t, ¥)' WT p(t,f) Ew§,p(t,¥)i... Next define mp(t) = rankEflT (t,¥): WT (t,¥):--% W (t,¥{], (5-31) q 0.? . 1 u and m3 = 33p m2(t) . 
(5-32) Then by another theorem by Chen [C-l] if a is the least integer such that m3 = m§+l = v, then m5 = m5 = v, 8 = a+l,a+2,..., (5'33) and a i n - mg + l, a i v 3 a-m . (5-34) Since 1 i mg i n, by equation (5-34) a i n. Hence it has lll been shown that r W0,0(t,f) w0,1(t,%) W1,O(C,T) W1,1(t’T) rank « Fw0,0(t,¥) w0’1(t,¥) wl O(tfi) wl,l(t,¥) = rank . i n. (5—35) LWn-1,0(t:T) wn-1,1(t'T) ' .J Thus a basis for the linear space spanned by the rows of S(t,?) can be found among the rows of the matrix r — - - a w0,0(t,T) WO’1(C,T) w0,2(t:T) ’ W1’0(t,¥) Wl,l(t,¥) wl 2(tfl) Sn m(t,¥) = Lyn-1,0(t’T) wn-l,l(t’T) wn-l,2(t’T) . . “J (5-36) The following theorem is now justified. Theorem 5-5: Let W(t,T) be an analytic realizable impulse response matrix. Then the matrix S(t, f) defined by W(t,T) has the property that a finite dimensional basis of the space Spanned by its row vectors can be found. Furthermore the vector x(tIs) defined by taking the inner product of 112 these row vectors and Z(t,s) is a basis for the predictor space L2(y;tls). Hence a finite dimensional basis for the predictor space L2(y;tls) can be found. As in chapter 4 it is necessary to restrict the basis to the case x(sIs) = dx(tls) . T be defined. All Of x(tls)|t=S in order that the criteria of theorem 3-4 have now been shown to be satisfied and the representation dx(sls) F(s)x(sls) + G(s)dz(s) (5-37) y(s) H(S)X(SIS) (5-38) is justified. The solution to (5—37) and (5-38) can be represented as s x(sIs) = J ¢(s,T)G(T)dz(T) (5-39) to S a y(s) = H(s) J @(s,T)G(T)dz(T) (5-40) t o where @(S,T) is the transition matrix for equation (5-37) satisfying the property EESELLL = F(s)¢(s,r) as Theorem 5-6: The matrices F(s), 6(3), and @(s,1) occurring in equations (5-39) and (5-40) determine a minimal realiza- tion for the impulse response matrix W(t,T). 113 Proof: Let tO < T < t. Using equation (5-40) with dz(r) replaced by n(T)dT t v (t,w) H. G(t) = A2. H(t) = Al(t,t)A-l(t,t) (5-47) yields a minimal realization of the impulse response matrix W(C,T). It should be noted that theorem 5-7 gives a suf- ficient condition, but not a necessary condition for a realization to exist. Furthermore the minimality of the realization is equivalent to controllability and observa- bility ([S-l], Theorem 24). The properties of uniform controllability and uniform observability are much stronger than controllability and observability however [S-h]. The necessary machinery for the analysis of the algorithm is developed next. Theorem 5-8: Suppose it is known that the analytic impulse response matrix W(t,r) can be realized by a system of dimension n. Then the search for a basis of the pre- dictor space can be limited to the components of the vector Pn(t|s) defined by T n-l T Fn(tls) ==[y(t|s)T» §2£%%§l—,...,a ayéfis) ‘J. (5-48) t 117 Proof: By the corollary to theorem 5-6 rank Sn,®(t’¥) = n where n is the dimension of a minimal realization of W(t,1). Furthermore, by equation (5-35) rank S(t,?) i n Hence by equation (5-19) F(tls) can have no more than n linearly independent elements. By the corollary to theorem 5-6 and equation (5-19) Fn(t|s) must contain n linearly independent components. Hence Pn(tls) contains a maximal linearly independent subset of the components of F(tls) and thus by theorem 5-3 a basis for the predictor space. Q.E.D. That the search for a basis of the linear space spanned by the row vectors of S(t, f) may be restricted to the first n block rows is already established by equation (5-35) and the corollary to theorem 5-6. 
It can also be shown that it is only necessary to consider the first n block columns. By an argument entirely analogous to that preceeding equation (5—35) it can be shown that 118 r - - - j wO O(t,T) w0’1(t,1) WO’2(t,T) . . . Wl’0(t,T) ”1,1(t'T) W1,2(C,T) rank W2’0(t,f) W2,l(t,;) W2,2(t,f) L . . . J P a W0,0(t,T) W0,l(t,T) . . . wO,n-1(t’T) W1,O(t,T) Wl,l(t,T) . . . wl,n-1(t’T) = rank W2,O(t,?) W2,1(t,¥) . . . Wz’n_l(t,?) : n (5'49) u . . . J Equation (5-49) implies that in searching for the basis of the linear space spanned by Sn oo(th) that it is not necessary to consider elements in the columns occurring after the first n block columns in testing for linear independence. The linear independence will show up in the first n block columns and attention may be restricted to Snn(t,¥). An analogue to equation (4-76) will now be developed. Theorem 5-9: kA g wk,j(t’T)’ T = t0 or T = t E 3 y(tlt) (j) - T = . (5-50) k n (T) ... - 3t Wk j(t,T), to < T <.t Proof: From equation (A-lS) of the appendix for j,k = O,1,2,.... 119 k . t k+j E i—XifiiEl n(3)<%> = E I 3 ”(t'T) dzT,n1A‘1 2E{HvT(t>}[2E{xvT}J’1. H(t). (5-61) Next, using equation (5-39) 122 A2(t.t> 2E{x(t|t)nT(t)} 2E [t T ¢(t,T)G(T)n(T)dT n (t) to. p:. 2 I @(t,T)G(T)6(T‘t)dT to = 2-%¢(t,t)G(t) G(t). (5-62) By a proof virtually identical to that for theorem 4-11 it can be shown that Egg—ti = F(t)x(t|t). (5-63) Hence K(t,t)A'l(t,t) 2E{F(t)x(t|t)vT(t)}[2E{x(t|t)vT(t)}]-1 F(t) . . (5-64) Thus the verification of Silverman's algorithm for the case of an analytic impulse response matrix turns out to be nearly identical to that for the stationary case. As noted in the discussion after theorem 5-7 the realiza- tion must be observable and controllable. The properties of uniform observability and controllability will not be discussed here since consideration of them would involve developing the properties of generalized controllability and observability matrices, (see Silverman [S-2] or Silver- man and Meadows [S-3]) a topic outside of the scope of this dissertation. 123 the concepts used here. 3. Example Consider the linear time-varying system (t) t 9;. x1 = dt x2(t) o y1(t) - l y2(t) 0 t2 xl(t) 1 o + t x2(t) O 2 O x1(t) 1 x2(t) u1(t) u2(t) A simple example is given next to illustrate (5-65) (5-66) The transition matrix for this system,¢(t,r), is given by ¢(t,t) = is r 2 2 2 2 fl t -T t -T 3 3 2 2 (t -T e e “_—§_—— £42. 0 e L J The impulse response matrix W(t,T) F 2 2 2 2 t "T E—‘T 3 3 e 2 2e 2 £E_%l_l 2 2 E_:l_ 0 2e 2 u (5-67) (5-68) (5-69) 124 The matrix has been partitioned into the submatrices wj,k(t’T)' For t = T F1 0 l-t -2t2 [CZ-1 4t(t2-1) q o 2 :0 -2c '0 2(t2-1) 533(c,c) .. c 2:2 l-cz «:3 lt(t2-1) 6t2(t2-1) (5-70) 0 2: lo -2t2 lo 2t(t2-l) [2+1 4t(t2+l)l-t(t2+l) -6t2(t2+1)|(t2-1)(t2+1) 8t(t2+1)(t2-1) Lo 2(c2+1) :0 -2c(c2+1) !o 2t2(t2-1) J The conditions of theorem 5-7 are satisfied by Sll(t,T) and 822(t,r). It can easily be verified that rank 311(t,T) = 2 and rank 322(t,r) = 2 for all T 3 t. The fixed n x n submatrix A(t,T) of S£k(t,T) = Sll(t,r) mentioned in theorem 5-7 is 811(C,T). That is r q tz-TZ tz-TZ e 2 2/3e 2 (t3-T3) A(t,I) = t2-T2 ° (5-71) 0 2e 2 L J A realization is now constructed from S33(t,t) by using Silverman's algorithm. The matrix M(t,1) is just A(t,T). The matrix H(t,r) is given by F t2 2 2 2 . 1 -‘t t -'r te 26 [t2 + %(t3-T3)t] H(t,T) = tz-TZ . (5-72> 0 2te 2 L. .J 125 The matrix A(t,T) is already defined and A(t,T) is H(t,r). The matrices A1(t,T) and A2(t,r) both equal A(t,T). 
Thus A2(t,t) = A(t,t) = (5-73) and _1 1 o A1(t.t)A (t,t) =- (5-74) 0 1 yield the input and output coefficient matrices as expected. Also A(t,t)A-1(t,t) = = (5-75) as expected. Next the predictor space approach is illustrated. Denote the two components of y(tls) by 91(tls) and 92(tls). Also denote the aBth component of the 2 x 2 matrix ij(t,t) by W§E(t,t). Then by equation (5-19) m s - k 91(tls) = Z Wéi(t,¥) I Slfi%l— n1(1)dr k=o tO oo . S - k + X Wli(t.¥) I £l%%l— n2(T)dT (5-76) k=o 0 t0 and A m 22 - S r-tr)k y2(t|s) = z Wok(t,1) J n2(T)dT. (5-77) k=o t O '25 I28 126 Also 3? (tls) w _ s _— k ———=m1mJ—s>—W< k-o t ° _ , (3 00 S "k + z W1i(t'¥) I $lfi%l— n2(T)dT (5-78) k=o tO ' and 3? (tlS) m _ S _- k Zat k2 wifi(c,T) [ $1E%l— nz(1)dr. (5-79) =o t ' o The basis of the predictor space x(tIt) is just the vector y(tlt). Consider equation (5-57). It is assumed that the white noise input vector to the system, q(t) = [nl(t),n2(t)jT, consists of independent white noises nl(t) and n2(t) each with unit covariance. Hence v(t) = n(t) and T A(t,t) = 2E{x(t|t)v (t)} T = 2E{Y(t|t)n (t)} F: Vita») J 1'73}: n1(:)dr + Wins) It ("wk n2(1)d1 .0 0 Co k [11(1).n2(r)] O t - L Rio w§i(c.c) {to 11E§1_ n2(1)dr E w11j r = (5-81) 127 Similarly A(t,t) 2E {22% vT(t)} 2E {fig—Elli nT(t)} ' 11 12 '1 W10(t,t) W10(t,t) = 22 0 w t,t u 10( )J r-t: 2t:2 = . (5-82) 0 2t L. The matrices A1(t,t) and A2(t,t) both equal A(t,t) and so all four matrices necessary for the implementation of the Silverman algorithm have been computed from the pre- dictor space approach. With this example the discussion of the realiza- tion problem for analytic impulse response matrices is con- cluded. CHAPTER SIX ' CONCLUSION In this thesis a new approach to,the realization problem, the predictor space approach, has been extended to continuous time finite dimensional systems having analytic impulse response matrices. This approach provides a unifying overview to the algorithms for the solution to the realization problem. 1. Summary of Results A representation problem which provides the foundation of the predictor space approach is considered first. There is given two second order zero mean stochastic processes y(t) and z(t), with y(t) measurable and z(t) purely nondeterministic and continuous in quadratic mean. The predictor space is defined as the Hilbert space of random variables spanned by the projections of y(t) onto the space spanned by z(s) for t 3 s. A finite basis for the predictor space is assumed and then it is shown that y(t) can be expressed in terms of this basis and an error term. The predictor space forms the state space for the stochastic system representation. The dynamic behavior of 128 129 the basis is developed using a theorem by Halyo and McAlpine, the theory of weak confergence in Hilbert spaces, and the Hida-Cramer representation for purely nondeter- ministic processes that are continuous in quadratic mean. To conclude the chapter two special cases are considered. It is shown that if the given purely nondeter- ministic process is white noise, then by considering it to be the innovation process for a measurable Wiener process the theory remains valid. Finally it is established that if the two given processes are stationary with stationary cross covariance, the system representation obtained has stationary coefficient matrices. Next the realization problem for stationary systems is considered. 
Given an impulse response matrix known to be realizable, a stochastic system driven by white noise is defined whose input-output covariance equals the given impulse response matrix. It is shown that the input and' output of the stochastic system satisfy the prerequisites for the application of the previously developed representa- tion theory with the exception of the existence of a finite basis for the predictor space of the system output. The proof that such a basis can be found is then carried out. Having established that the representation theory developed in chapter 3 applies, it is shown that the coefficient matrices of the stochastic representation provide a solution to the realization problem for the given impulse response matrix. 130 Two of the key algorithms in realization theory, an algorithm due to Silverman and the Ho-Kalman algorithm are analyzed. It is shown that both can be interpreted in terms of various manipulations of a basis of the predictor space. An example is given which illustrates many of the concepts used. The realization problem for time-varying analytic impulse response matrices is considered next. The develop- ment parallels that for the stationary case for the most part, although some theorems on the rank of analytic matrices are needed. The Silverman algorithm is analyzed for this case. Another example is also given to illustrate. the theory. 2. Problems for Future Research The most pressing problem for future research is to consider developing the predictor space approach for more general time varying systems. If this could be done it would furnish insight into an area where algorithms at present are sorely lacking. The algorithm by Silverman can be used for "constant rank systems", a somewhat more general class of systems than considered here. It would be of interest to extend the predictor Space approach to at least this class of systems. It might also be of interest to attempt an exten- sion of the predictor space approach to some classes of non- linear systems. This is an area with few results at present and any insight shed by a new approach would be of interest. BIBLIOGRAPHY [A-l] [C-1] [C-2] [C-3] [D-1] [D-2] [G-l] [G-2] [G-3] [H-1] [H-2] BIBLIOGRAPHY Akaike, H., "Stochastic theory of minimal repre- sentation," IEEE Trans. Automat. Contr., AC-19 (1974). PP. 667-674. Chen, C.T , "Linear Independence of Analytic Functions," SIAM J. Appl. Math., 15 (1967), pp. 1272-1274. Cramér, H., "Stochastic processes as curves in Hilbert space,” Theory Prob. Appl., 9 (1964), pp. 169-179. Cramer, H., "On the structure of purely non- deterministic stochastic processes," Arkiv Mat., 3 (1961), pp. 249-266. Desoer, C.A. and Varaiya, P., "The minimal realiza- tion of a nonanticipative impulse response matrix,” SIAM J. Appl. Math., 15 (1967), pp. 754-764. Doob, J.L., Stochastic Processes, John Wiley, New York, 1953. Gel'fand, I.M. and Vilenkin, N. Ya., Generalized Functions, Vol. 4, Applications of Harmonic I Analysis, Academic Press, New York, 1964. Gilbert, E.G., "Controllability and observability in multivariable control systems,” SIAM J. Control, 1 (1963), PP. 128-151. Gikhman, I.J. and Skorokhod, A.V., Introduction to the Theory of Random Processes, W.B. Saunders Co., Philadelphia (1969). Halyo, N. and McAlpine, G., "On the spectral factorization of nonstationary vector random pro- cesses," IEEE Trans. Automat. Contr., AC-l9 (1974) pp. 674-679. Ho, B.L. and Kalman, R.E., "Effective construction of linear state-variable models from input/output functions," in Proc. 
3rd Allerton Conf. Circuits and Systems Theory (1965), pp. 449-459. 131 [I-l] [K-l] [K-Z] [K-3] [K-4] [K-SJ [L-l] [M-l] [0-1] [P-1] [P-2] [P-3] [P-4] 132 Ivkovich, Z. and Rozanov, Y.A., ”0n the canonical Hida-Cramér representation for random processes," Theory Prob. Appl. 16 (1971), pp. 351-356. Kailath, T., ”The innovations approach to detection and estimation theory," Proc. IEEE, 5B (1970), pp. 680-695. ‘ Kailath, T., "RKHS approach to detection and estimation problems - part I: deterministic signals in Gaussian noise," IEEE Trans. Inform. Theory, IT-l7 (1971), pp. 530-549. Kalman, R.E., "Mathematical description of linear dynamical systems,” SIAM J. Control, 1 (1963), pp. 152-192. Kalman, R.E., Falb, P.L., and Arbib, M.A , Topics in Mathematical System Theory, McGraw-Hill, New York (1969). Kallianpur, G. and Mandrekar, V., ”Multiplicity and representation theory of purely non-determin- istic stochastic processes," Theory Prob. Appl., 10 (1965), pp. 553-581. Lee, K.Y. and Mandrekar, V., "Multiplicity and martingale approach to linear state estimation,” Proc. Princeton Conference on Information Sciences and Systems, 1973. Mandrekar, V. and Salehi, H., "Operator valued wide- sense Markov processes and solutions of infinite- dimensional linear differential systems driven by white noise," Mathematical System Theory, 4 (1970), pp. 340-356. Olmstead, J.M.H., Real Variables, An Introduction to the Theory of Functions, Appleton-Century-Crofts, New York, 1959. Padulo, L. and Arbib, M.A., System Theory, W.B. Saunders Co., Philadelphia, 1974. Papoulis, A., Probability, Random Variables, and Stochastic Processes, MCGraw-Hill, New York, 1965. Parzen, E., Time Series Analysis Papers, Holden Day, San Francisco, 1967. Prohorov, Yu, V. and Rozanov, Yu. A., Probability Theory, Springer-Verlag, New York, 1969. [R-l] [S-l] [S-2] [S-3J [S-4] [S-5] [T-l] [W-l] [W-2] [W-3] [Y-l] 133 Rudin, W., Real and Complex Analysis, McGraw- Hill, New York, 1966. Silverman, L.M., "Realization of linear dynamical systems," IEEE Trans. Automat. Contr., AC-l6 (1971), pp. 554-567. ‘ Silverman, L.M., "Representation and realization of time-variable linear systems," Dept. Elec. Eng., Columbia Univ., New York, N.Y., Tech. Report 94, June 1966. Silverman, L.M. and Meadows, H.E., ”Equivalent realizations of linear systems,” SIAM J. Appl. Math., _1_7_ (1969), pp. 393-408. Silverman, L.M. and Meadows, H.E., "Controllability and observability in time-variable linear systems," SIAM J. Control, 5 (1968), pp. 64-73. Simmons, G.F., Introduction to Topology and Modern Analysis, McGraw-Hill, New YorRT’1963. Tether, A., "Construction of minimal linear state- variable models from finite input-output data," IEEE Trans. Automat. Contr., AC-15 (1970), pp.427- 436. Wiener, N. and Masani, P., "The prediction theory of multivariate stochastic processes,” Acta. Math., 9B (1957), pp. 111—150. Weiss, L. and Kalman, R.E., "Contributions to linear systems theory,” Int. J. Eng. Sci., 3 (1965), pp. 141-171. Wong, E., Stochastic Processes in Information and Dynamical Systems,iMcGraw-Hill, New York, 1971. Youla, D.C., "The synthesis of linear dynamical systems from prescribed weighting patterns,” J. SIAM Appl. Math., 14 (1966). PP. 527-549. APPENDIX APPENDIX GENERALIZED RANDOM PROCESSES Let (Q, A, P) be a probability space and consider the linear space L2(Q, A, P) of all zero-mean finite variance random variables with norm equal to the variance. 
A second order random process, say {x(t,ω): t ∈ T}, is then a mapping associating with every t ∈ T a random variable x(t,ω) in L2(Ω, A, P). White noise is not such a second-order random process. To obtain a rigorous theory for white noise it is necessary to go to the theory of generalized random processes. An important reference in this area is Gel'fand and Vilenkin [G-1]. The book by Prohorov and Rozanov [P-4] is also helpful and an intuitive discussion may be found in Kailath [K-2]. Let U be a real or complex linear space of elements u. A function x = x(u) of the parameter u ∈ U whose values x(u) are real or complex random variables on a probability space (Ω, A, P) is called a random functional. The random functional x(u) is called linear if for any u, v ∈ U and numbers α, β

x(αu + βv) = αx(u) + βx(v)     (A-1)

with probability 1. The finite dimensional probability distribution functions of a generalized process are defined, in an analogous manner to the usual case, by

F(a1, a2, ..., an) = P{ω: x(u1,ω) ≤ a1, x(u2,ω) ≤ a2, ..., x(un,ω) ≤ an}     (A-2)

for all finite collections {u1, ..., un} of elements of U. The linear space U is usually taken to be the set D of all real or complex functions φ defined on the real line which are infinitely differentiable and have compact support. The concept of convergence used to define a topology on D is that φn converges to φ as n goes to infinity, denoted φn → φ, if all functions φn(t) vanish outside a fixed finite interval which is the same for each function and d^k φn(t)/dt^k converges uniformly to d^k φ(t)/dt^k as n goes to infinity, for each derivative of order k = 0,1,2,... . The space D equipped with this topology is called the space of testing functions. The random functional x(φ) defined on D is called continuous if the convergence in D of the functions φkj(t) to φj(t), j = 1,...,n, implies

lim (x(φk1), ..., x(φkn)) = (x(φ1), ..., x(φn)).     (A-3)

A continuous linear random functional on D is called a generalized random function. In the case that D consists of functions of one variable, the corresponding random function will be called a generalized random process. This is the case of interest here and for the rest of this appendix D will mean the space of testing functions restricted to functions of one variable. A number of operations are defined on generalized random processes in a manner analogous to that by which they are defined for generalized functions. By a linear combination αx1 + βx2 of the generalized random processes x1 and x2 is meant the random process x associating with every function φ(t) ∈ D the random variable αx1(φ) + βx2(φ). The product f(t)x of an infinitely differentiable function f(t) and a generalized random process x is defined as the process for which there corresponds to the function φ(t) ∈ D the random variable x(fφ). The derivative of a generalized random process x(φ) for φ ∈ D is defined by

x'(φ) = -x(φ').     (A-4)

It is important to note that the derivative of a generalized random process always exists and is a generalized random process. This is in contrast to ordinary random processes which may not have derivatives that are ordinary random processes. In particular, the derivative of the Wiener process does not exist as an ordinary random process, but it does exist as a generalized random process. The material necessary to establish lemma 4-3 and theorem 5-9 is now developed. It will first be shown that
(ms n Tm} (21-5) at tO at 3T where ak (t t) ;'aky(t|s) at at s=t t kw = 3 (t,T) dz( ) (A-6) 4CO at T and n(t) is a stationary Gaussian white noise interpreted via equation (4-3) as the derivative of the Wiener process z(t). Define the generalized Wiener process z(w) by 2(6) = [ 6dz jn<¢(j)>. j+1A£:§E§ffi*ll-v(j+l)Tdr . m k+j (_1)23+l[ 3 .W(E,T) BtJB J m'(T)TdT ¢'(T)TdT. (A-14) [m 8k+jW(t,T) ’ -m atRaTJ Hence E{3k2(tlt) “(j)(CP)T}= -[°° 3k+j¥(tj.r) (P'(T)Td1 3t at 31 co k+j co T = _ 3 W(t,‘l‘) v E{I-m atka-rj dz(T)[:[-mcp (T)dZ(T)] } "m k+j _ _ 3 W(t,T) . T} — -E ~ . dz( )z( ) {J-m BthTJ T T t k+j = E{[ 3 g(tiT) dz(T)n(cp)T} (Ac-15) tO 3t 813 Next suppose that W(t,T) is stationary so that W(t,T) = W(t - T). (A-lG) Then akJ’JWr-Jl = ak+Jw<;--r_>_ (A47) atkaTJ BtkaTJ = (-1)jak+3w atk+J Substituting this into equation (A-lS) yields 140 k . -. t k+j a (tlt) ( ) _ a W(t- ) T E{ it? n J RN} _ (-1)JEUt 3tR+j T dZ(T)n(CP) } O . . k+j _<.-1>JE{3 if?” MN}.