A STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM

Dissertation for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
JAMES W. CLARY
1975

This is to certify that the thesis entitled A STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM presented by James W. Clary has been accepted towards fulfillment of the requirements for the Ph.D. degree in Systems Science.

ABSTRACT

A STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM

By James W. Clary

The goal of this work is to exploit the concept of a predictor space in the continuous-time realization problem for systems generating a stationary or analytic impulse response matrix. The predictor space provides a natural representation of the state space, and a minimal realization corresponds to selecting a basis of the predictor space.

A representation problem is considered first. Given two second-order processes y(t) and z(t), with y(t) measurable and z(t) continuous in quadratic mean and purely nondeterministic, the predictor space for y(t) is formed. Under rather general conditions, if a finite dimensional basis for the predictor space exists then y(t) can be represented as the output of a linear finite dimensional dynamical system driven by the innovation process of z(t).

Next the realization problem for stationary systems is considered. Given a stationary impulse response matrix, a stochastic system is defined whose input-output covariance equals it. The output of this system generates a finite dimensional predictor space. The previously developed representation theory then supplies a solution to the realization problem. Two algorithms are discussed in the context of the predictor space approach.

The realization problem for systems generating a time-varying analytic impulse response matrix is considered next. The development is similar to that for the stationary case, with the exception of some material on the linear independence of analytic functions. An algorithm by Silverman for time-varying systems is discussed in the context of the theory developed.

A STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM

By James W. Clary

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering and Systems Science
1975

ACKNOWLEDGMENTS

The author would like to express his gratitude to his major professor, Dr. Kwang Y. Lee, for introducing him to this topic and for his constant advice and encouragement during the preparation of this thesis. Gratitude is also expressed to Dr. Robert O. Barr for introducing the author to the subject of Systems Science through his excellent teaching and for providing valuable assistance during his years of graduate study. Thanks are also due to Dr. Robert A. Schlueter for many helpful discussions and much encouragement. The valuable advice of Dr. V.S. Mandrekar is also gratefully acknowledged. Finally, thanks are due to the author's wife Kay for her love, patience, and understanding during the time spent on this work.
TABLE OF CONTENTS

Chapter
1  INTRODUCTION
2  MATHEMATICAL BACKGROUND
   2.1  The Realization Problem
   2.2  Hilbert Spaces of Random Vectors
3  MARKOVIAN REPRESENTATION OF CONTINUOUS-TIME STOCHASTIC SYSTEMS
   3.1  General Theory
   3.2  Special Cases
4  STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM IN THE STATIONARY CASE
   4.1  General Theory
   4.2  Silverman's Algorithm
   4.3  The Ho-Kalman Algorithm
   4.4  Example
5  STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM IN THE ANALYTIC CASE
   5.1  General Theory
   5.2  The Silverman Algorithm
   5.3  Example
6  CONCLUSION
   6.1  Summary of Results
   6.2  Problems for Future Research
BIBLIOGRAPHY
APPENDIX: GENERALIZED RANDOM PROCESSES

CHAPTER ONE
INTRODUCTION

Consider an input-output description of an unknown system in the form of an impulse response or weighting pattern matrix. Such a description may arise either through the performance specifications in the case when one designs a new system, or from experimental output obtained by inputting delta functions in the case when one is attempting to identify an unknown system. The problem of obtaining a state space description of the unknown system from the impulse response or weighting pattern matrix is known as the realization problem.

Beginning primarily with the papers of Gilbert [G-2] and Kalman [K-3] in 1963, the realization problem has been the subject of considerable research. Gilbert [G-2] gave a solution to the realization problem for the stationary case with the restriction that each element of the transfer function matrix has distinct poles. The paper by Kalman [K-3] showed that the impulse response matrix determines the controllable and observable portion of a system and gives a solution to the realization problem for stationary systems. His results make clear the importance of minimal realizations, that is, state space representations having the smallest possible dimension for the state vector. The solution to the realization problem by Kalman was presented in the form of an algorithm which reduced the state space of a nonminimal realization until it became minimal.

A paper by Ho and Kalman in 1965 presented a different approach to the realization problem for stationary systems. They showed that the minimal realization problem is equivalent to a certain kind of representation problem involving an infinite series of real matrices Y_k, k = 1,2,..., derived from the impulse response matrix and known as the Markov parameters. Their algorithm, while elegant, is difficult to motivate and explain in the context in which it was originally presented.

A series of papers by Silverman [S-1] and Silverman and Meadows [S-2, S-3] published between 1966 and 1971 presented significant new results on the realization problem and the closely related problem of equivalent system representations. Silverman obtained a new algorithm for the stationary realization problem which generalized to a large class of time-varying linear systems, which he called "constant rank systems". This class of systems includes, but is larger than, the class of analytic systems.

The algebraic approach to system theory as presented in Kalman, Falb, and Arbib [K-4] or Padulo and Arbib [P-1] has obtained results relating to the realization problem for stationary systems and analytic time-varying systems, but at present does not offer an algorithm for time-varying systems of the generality of Silverman's algorithm.
A new approach to the realization problem, called the predictor space approach, has recently been presented by Akaike [A-l] for stationary discrete time systems. Akaike's approach uses a stochastic system driven by white noise to obtain a solution to a given realization problem by the following procedure. He first shows that given two zero-mean real valued second-order stationary Gaussian stochastic processes y(n) and z(n) that are mutually * stationarily correlated w1th covariance V(2) = E{y(n + 2)zT(n)} y(n) can be obtained as the output of a linear stochastic system driven by the innovation process of z(n) of the form v(n+1) = Fv(n) + Gw(n+l) (1-1) y(n) = Hv(n) + u(n) (1-2) w(n+1) = z(n+l) - z(n+l|n) . (1-3) The symbol z(n+l|n) represents the linear least squares prediction of z(n+l) given z(k), k = n, n-1, n-2,... It also is shown that * . E{-} denotes the expected value of the quantity within the brackets. The symbol (-)T indicates the transpose Operator. 4 wm=Hflmmemn,2=mLzu.. um) Then given a stationary impulse response matrix W(n), Akaike defines a stochastic system driven by a Gaussian white noise process z(n) with unit covariance by y(n) = Z W(m)z(n-m), n = O,1,2,... . (1-5) m=O It is then shown that since the input is white noise V(2) = w (1-6) so that using the previously derived representation (1-1) - (1-3) w = HFQG, 2 = O,1,2,... . (1-7) Thus the linear system x(n+l) Fx(n) + Gu(n+1) (1-8) Hx(n) (1-9) y(n) is a solution to the realization problem for the given impulse response matrix W(2). The basic concept in Akaike's approach is that of a predictor space, the space spanned by the projections of y(k), k = n, n+1, n+2,... onto the space spanned by z(k), k = n, n-l, n72,... . It is the structure of this space that makes possible his representation (l-l) - (1-3). Using the predictor space approach, Akaike is able to provide new insight into the realization problem and the structure of the various algorithms for stationary systems. The Silverman and Ho-Kalman algorithms for example are easily analyzed in this context. The pre- dictor space approach provides a powerful tool, as well as a unifying overview, for the realization problem. It is the purpose of this dissertation to gen- eralize Akaike's predictor space approach to continuous time systems. The outline of the presentation is as follows. Chapter 2 provides an introduction to the real- ization problem with some special results relating to the stationary case. A section reviewing the necessary back- ground material on Hilbert spaces of random variables and the Hida-Cramer representation for purely nondeterministic stochastic processes is also presented. In chapter 3 a representation problem for con- tinuous time stochastic processes is considered. It is assumed that two continuous time second-order zero mean stochastic processes y(t) and z(t), with y(t) mea— surable and z(t) purely nondeterministic and continuous in quadratic mean are given. The predictor space, the space spanned by the linear least-squares estimates of y(t) given {2(1): 1 i t}, is formed. It is then shown that by assuming a finite dimensional basis for the pre- dictor space and with appropriate continuity and separa- bility hypotheses that y(t) can be represented as the output of a continuous time linear system driven by the innovation process of z(t). The theory is easily generalized to the case when z(t) is white noise, and the chapter concludes with some special developments for the stationary case. 
Chapter 4 uses the representation theory developed in chapter 3 to solve the realization problem for continuous time stationary systems. Given a stationary impulse response matrix a stochastic system is defined whose input-output cross-covariance is equal to it. It is then shown that the predictor space contains a finite basis and that the repre- sentation theory developed in chapter 3 can be applied. It is shown that this representation provides a minimal realization for the given impulse response matrix. Two algorithms, the Ho-Kalman and the Silverman algorithm, are analyzed for the continuous-time stationary case. Although Akaike has already done this for the dis- crete-time case, and as shown in chapter 2 there is essentially no difference between the realization problem for the discrete-time and continuous time stationary cases, the justification is considerably more difficult in the context of continuous-time systems. The results demonstrate the power of the predictor space approach. An example illustrating the Silverman and Ho-Kalman algorithms and the predictor space approach to them is also presented. In chapter 5 the representation theory of chapter 3 is applied to the realization problem for systems gen- erating an analytic impulse response matrix. The procedure is found to closely parallel the case for stationary systems with the addition of some material on the linear independence of analytic functions. Silverman's algorithm is considered for this case and an example is given. In chapter 6 a summary, conclusions, and sugges- tions for further research are presented. CHAPTER Two MATHEMATICAL BACKGROUND This chapter is devoted to basic background material that will be used throughout the thesis. The first section discusses the realization problem for linear systems and some basic results associated with it. Much of the material is based on Silverman [8-1] and Ho and Kalman [H—2]. Relevant discussions of the realization problem may also be found in Kalman [K-3] and Kalman, Falb, and Arbib [K-4]. The second section is devoted to developing the concepts associated with Hilbert spaces of random variables and the Hida-Cramer representation for purely nondeter- ministic stochastic processes. Basic references here are Wiener and Masani [W-l] and Cramer [C-2]. Material from Lee and Mandrekar [L-l] is also used. 1. The Realization Problem Since the thesis is concerned only with continuous- time systems, the discussion will be primarily in terms of continuous-time systems. Consider the linear continuous- time system representation 9§§El = Fx + G(t)u (2-1) y(t) = H(t)x(t) (2-2) where x(t) is an n X 1 vector, y(t) an m X 1 vector, u(t) an r X l deterministic vector, and F(t), G(t) and H(t) are n X n, n X r, and m X n matrices respectively. To avoid distracting mathematical complications it will be assumed here that the components of the matrices F(t), G(t), and H(t) are continuous functions of time t. The system representation given by equations (2-1) and (2—2) will also be denoted by the subscripted triple = xx'1 (2-4) and X(t), the fundamental matrix for the matrix F(t) satisfies 9§§£l = Fx . (2-5) From equations (2-2) and (2-3) 10 t y(t) = H(t)(t,t0)xO + H(t)[ ¢(t,T)G(T)u(T)dT. (2-6) t o By defining K(t,T) = H(t)®(t,T)G(T) (2-7) equation (2-6) can be written I: y(t) = H(t)<1>(t,to)xO + I K(t,T)u(T)dT. (2-8) I: o The matrix K(t,T) is called the weighting pattern matrix for the system (2-1) and (2-2). 
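To make the weighting pattern matrix concrete, the following numerical sketch (in Python, using numpy and scipy) evaluates the superposition integral of equation (2-8) for a small constant-coefficient example, in which the fundamental matrix is Phi(t,tau) = exp(F(t-tau)) and hence K(t,tau) = H exp(F(t-tau)) G. The matrices F, G, H and the input u(t) = sin t are illustrative assumptions, not taken from the text; the output obtained from the integral is compared with a direct forward-Euler integration of equations (2-1) and (2-2) started from x(t0) = 0, and the two should agree to within the discretization error.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative constant-coefficient system (an arbitrary example, not from the text).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

def K(t, tau):
    """Weighting pattern K(t, tau) = H Phi(t, tau) G with Phi(t, tau) = expm(F (t - tau))."""
    return H @ expm(F * (t - tau)) @ G

u = lambda t: np.array([[np.sin(t)]])        # scalar deterministic input, r = 1

t0, t, N = 0.0, 5.0, 2000
taus = np.linspace(t0, t, N, endpoint=False)
dt = (t - t0) / N

# Output via the superposition integral (2-8) with x(t0) = 0.
y_integral = sum(K(t, tau) @ u(tau) for tau in taus) * dt

# Output via direct forward-Euler integration of (2-1) and (2-2).
x = np.zeros((2, 1))
for tau in taus:
    x = x + dt * (F @ x + G @ u(tau))
y_euler = H @ x

print(y_integral.ravel(), y_euler.ravel())   # the two agree to O(dt)
```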
For zero initial con- ditions, i.e., x0 = 0, equation (2-8) shows that the weighting pattern matrix K(t,r) contains all the informa- tion needed to describe the input-output relations of the system (F(t), G(t), H(t))n. For a causal system, a system characterized by the property that the behavior of the system at any time t is completely determined by events at times t i E 9 the impulse response matrix W(t,T) is defined by (t)®(t,T)G(T) t Z T W(t,T) = . (2-9) The impulse response matrix completely characterizes the input-output properties of (F(t), G(t), H(t)}n, for zero initial conditions, viewed as a causal system. For the case when K(t,T) is analytic, as when F, G, and H are constant matrices, the impulse response matrix and 11 weighting pattern matrix are equivalent representations since W(t,T) will have a unique analytic extension equal to K(t,T) on the region r > t. For the general case, of course, W(t,T) may have many different possible exten- sions and K(t,r) is not an equivalent representation to W(t,r). The problem of computing a state space representa- tion (F(t), G(t), H(t))n from an impulse response or weighting pattern matrix is known as the realization problem.* A more formal and precise definition is given below after reviewing the concept of equivalent system representation. In general there is not a unique solution to a realization problem. Different realizations of the same impulse response or weighting pattern matrix may have quite different characteristics. The concept of equi- valent representations is thus necessary to fully under- stand the realization problem. A system (F(t), G(t), H(t))n is algebraically equivalent to a system (F(t), G(t), H(t))n if and only if there exists a continuously differentiable matrix T(t) with det T(t) # O for all t e R such that E(t) = T(t)F(t)T-l(t) + égéil T'1(c) , (2-10) G(t) = T(t)G(t) (2-11) H(t) = H(t)T-1(t) . (2-12) * In the stationary case one could also start with a transfer function matrix. These will play no role in the thesis except in a tangential manner in chapter 4, and will not be discussed here. 12 The system (F(t), G(t), H(t))n is said to be strictly equivalent to (F(t), G(t), H(t))n if (2-10) - (2-12) hold for a constant nonsingular matrix T. In general algebraic equivalence does not preserve the stability pro- perties of a system. For this it is necessary to have topological equivalence (Kalman [K—31): algebraic equi- valence plus the conditions ”T(t)” /\ 0 (2-13) HT'li| A O N (2-14) for fixed constants Cl and C2, where H-H is the Euclidean norm. It is clear that strict equivalence implies topological equivalence. It may easily be verified that algebraically equivalent systems have the same impulse response or weighting pattern matrix. How- ever systems that are not algebraically equivalent systems can also have the same impulse response or weighting pattern matrix. In general, systems having the same impulse response, or weighting pattern matrix, are called w-equivalent, or K-equivalent, respectively. A formal definition of the realization problem can now be given. A system representation (F(t), G(t), H(t))n (is said to be an nth-order impulse response realization if W(t,T) can be expressed in the form of equation (2-9). If for some finite n such a system representation exists W(t,T) is said to be a realizable impulse response matrix. 13 The representation (F(t), G(t), H(t))n is said to be a minimal impulse response matrix realization if there exists no W-equivalent system (F(t), G(t), H(t))H' with H < n. 
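The invariance of the impulse response under algebraic equivalence is easy to check numerically in the strict (constant T) case. The Python sketch below applies equations (2-10) through (2-12) with dT/dt = 0 to an arbitrary illustrative triple (F, G, H); the particular matrices, including T, are assumptions made only for this example. The transformed system is confirmed to have the same impulse response H Phi(t,tau) G at several lags, which is the W-equivalence just described.

```python
import numpy as np
from scipy.linalg import expm

# Arbitrary illustrative constant system (not taken from the text).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

# A constant nonsingular change of state coordinates: strict equivalence.
T  = np.array([[1.0, 1.0], [0.0, 2.0]])
Ti = np.linalg.inv(T)
Fb, Gb, Hb = T @ F @ Ti, T @ G, H @ Ti       # equations (2-10)-(2-12) with dT/dt = 0

# Equivalent systems share the impulse response W(t, tau) = H Phi(t, tau) G.
for lag in (0.0, 0.5, 1.0, 2.0):
    W_original    = H  @ expm(F  * lag) @ G
    W_transformed = Hb @ expm(Fb * lag) @ Gb
    assert np.allclose(W_original, W_transformed)
print("Impulse responses agree at all tested lags.")
```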
Weighting pattern realizability is defined analogously. There is, in general, a difference between the weighting pattern and impulse response realization problems in that even if W(t,T) and K(t,T) agree on the domain t 3 T, they may not give rise to equivalent realizations. Indeed, the respective minimal realizations may even be of different dimension. An example of this case is presented in a paper by Desoer and Varaiya [D-l]. However as already mentioned when the weighting pattern matrix .K(t,r) is analytic then the impulse response matrix W(t,T) has a unique analytic extension to the region t < T equal to K(t,T). It follows that for this case it makes no dif- ference whether the impulse response or weighting pattern matrix is used, the same class of realizations will be obtained. This dissertation will be concerned solely with systems generating analytic impulse response or weighting pattern matrices. The impulse response matrix will be used primarily but the results would be unchanged if the weighting pattern matrix were chosen. The following result due to Kalman [K-3] is basic in realization theory. Theorem 2-1: W(t,T) is a realizable impulse response matrix 14 if and only if it is separable in the form W(t.T) = y(t)6(T). t 1 T (2-15) where u(t) and 6(1) are continuous matrices of finite size. 22299 From equations (2-4) and (2-9) the impulse response matrix W(t,T) of the system (F(t), G(t), H(t))n can be expressed, for t Z T, as W(t.r> = 1Hx11xilc1 (2-16) showing the necessity of equation (2-14). The sufficiency of equation (2-15) follows from the obvious realization (0.6(t). W(t)). (2-17) Q.E.D. Although theorem 2-1 is of theoretical importance, it may not work well for practical applications. There is first the problem of determining if a function of two variables is separable. Also, even if W(t,T) is time invariant or periodic the realization given by equation (2-17) will not be. Furthermore the representation given by (2-17) will never be asymptotically stable. Silverman [S-l] gives a nice example to illustrate these points. Suppose e-(t-T) t Z T W(t.T) = . (2-18) 15 The realization given by equation (2-17) is then (0, et, e‘t) (2-19) which is time variable, has an unbounded coefficient, and is not asymptotically stable. The realization ('1, 1, 1) (2-20) is much nicer. Some material from Ho and Kalman [H—2] pertaining to stationary systems will be presented next. Consider the corresponding discrete time and continuous time stationary systems x(t+l) = Fx(t) + Gu(t) (2-21) y(t) = Hx(t), t an integer (2—22) and §§é51,= FX(t) + Gu(t) ‘ (2-23) y(t) = Hx(t), t a real number, (2-24) where x(t) is n x l, y(t) is m x 1, and u(t) is an r x 1 vector. The stationary impulse response matrix for the system in (2-21) and (2-22) will be denoted by Yk and is given by Y = HF G, k = O,1,2,... . (2-25) For the system given by (2-23) and (2-24) the stationary impulse response matrix W(t) is given by l6 W(t) = He G (2-26) m k kt = H 2 F _T C m k k t k=0 K . 00 ER -0 Hence, in the case of either a stationary discrete time system or stationary continuous time system the realization problem is equivalent to finding a constant system (F, G, H)n with the property that Y = HF G, k = 0,1,2, .. . (2-28) The Yk are called the Markov parameters of the system. The N1 by N2 block matrix built out of the Markov parameters , , Y0 Y1 YNz-l Y1 Y2 YNZ SN1N2 = E E 3 (2-29) _YN1-1 YNl ' - - YN1+N2-ZA is called the generalized Hankel matrix. Note that each Yk is an m X r matrix so SN1N2 is an N1 - m x N2 - r matrix. 
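The role of the generalized Hankel matrix can be illustrated numerically. In the Python sketch below the Markov parameters Y_k = H F^k G of an arbitrary two-dimensional example system (an assumption made purely for illustration) are assembled into S_{N1,N2} as in equation (2-29). Its rank equals the dimension of a minimal realization, and one common way of factoring it, an SVD-based version of the Ho-Kalman step taken up in chapter 4, recovers a triple with the same Markov parameters.

```python
import numpy as np

# Illustrative minimal system with 2 states, m = r = 1 (not taken from the text).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
m, r = H.shape[0], G.shape[1]

def markov(k):
    """Markov parameter Y_k = H F^k G, equation (2-25)."""
    return H @ np.linalg.matrix_power(F, k) @ G

N1 = N2 = 4
S       = np.block([[markov(i + j)     for j in range(N2)] for i in range(N1)])
S_shift = np.block([[markov(i + j + 1) for j in range(N2)] for i in range(N1)])

print(np.linalg.matrix_rank(S))              # 2: the dimension of a minimal realization

# Factor S = O C by an SVD (in the spirit of the Ho-Kalman step of chapter 4)
# and use the shifted Hankel matrix to extract F.
U, sv, Vt = np.linalg.svd(S)
nr = int(np.sum(sv > 1e-9))                  # numerical rank
O  = U[:, :nr] * np.sqrt(sv[:nr])            # observability-like factor
C  = np.sqrt(sv[:nr])[:, None] * Vt[:nr]     # controllability-like factor
Fh = np.linalg.pinv(O) @ S_shift @ np.linalg.pinv(C)
Gh = C[:, :r]                                # first block column of C
Hh = O[:m, :]                                # first block row of O

# The recovered triple reproduces every tested Markov parameter.
for k in range(8):
    assert np.allclose(markov(k), Hh @ np.linalg.matrix_power(Fh, k) @ Gh)
print("(Fh, Gh, Hh) matches the Markov parameters of the original system.")
```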
The generalized Hankel matrix will play an important role in chapter 4. 17 Note, finally, that from equation (2—26) a stationary impulse response matrix W(t) can have a realization of the form (2-23) and (2-24) only if it is analytic. 2. Hilbert Spaces of Random Vectors Let (Q, A, P) be a probability space. The set Q is a set of possible outcomes, A is a o—algebra de- fined on Q, and P is a probability measure defined on A. The set Q is called the basic space and its elements are called points. Members of A are called events. Let G be a basic space and F a o-algebra of subsets of G. Then the pair (O, F) is called a measur- able Space. If G is n-dimensional Euclidean space, denoted R3, and F the smallest o-algebra of subsets of Rn containing all rectangles, denoted B“, then Bn is called the n-dimensional Borel o-algebra and the sets in Bn are called Borel sets. A measurable function from a probability space (Q, A, P) into (Rn, B“). is called a random variable. For n = 1 such a random variable is called a real random variable. This will be the only case considered in this work and henceforth the term random variable should be interpreted to mean a real random variable. A second-order random variable X is one which satisfies E{X2} = j |X(w)|2P(dw) < m . <2-30> S2 18 A second-order stochastic process {X(t), t e T} is a one- parameter family of second—order random variables. The set of all second-order random variables defined on (Q, A, P) is denoted by L2(Q, A, P). The set L2(Q, A, P) is a Hilbert space (Gikhman and Skorokhod [G-3]) with the inner product L = Eixy} <2-31) 2 and norm HxHLz = /T§T§7 . <2-32) Let L3(Q, A, P) be the set of all random n vectors x with components xi 6 L2(Q, A, P), 1 = l,...,n . Hence x e L3(Q, A, P) if and only if J9|x(w)|2P(dw) < w (2-33) where |-| denotes the Euclidean norm n |x|2 = xTx = 2 IX I2 . (2-34) . 1 i=1 L3(Q, A, P) is a Hilbert space under the usual operations and inner product ll "P15 (X.y) n (x. y.) . 1’ 1 L L2 1 l 2 ll \—fi .‘0 Head lxi(w)yi(w)P(dm). (2-35) 19 This inner product generates the norm n 1 Hxll n = Wm H = (.2 11x1”: )2 (2-36) L2 L2 1:]. 2. which in turn induces a topology on L3(Q, A, P). A sequence {xk} in L3(Q, A, P) is said to converge to a vector x e L3(Q, A, P) if and only if ka - ann + o <2-37) 2 as n + m and to be a Cauchy sequence if and only if I Axk - me n + 0 (2-38) L2 as k,m + m. Two elements x and y are regarded as identical if the norm of their difference is zero, that is,I E{[x-y]T[x—y]} = O, and thus P(x = y) = 1. Convergence in norm in the Hilbert space L3(Q, A, P) is identical with convergence in quadratic mean in probabilistic terminology. I A linear manifold in L3(Q, A, P) is a nonempty subset M of L3(Q, A, P) such that if x,y e M then Ax + By 5 M for all n x n real matrices A and B. A subspace of L2(Q, A, P) is a linear manifold closed under the topology induced by the norm H-u n' A lemma due to Wiener and Masani [W-l] is now given :Eich proves the existence of the projection of an element x and gives its structure. The space L3(Q, A, P) is abbreviated L3. 20 Orthogonal Projection Lemma (Wiener-Masani): a) M is a subspace of L? if and only if there is a subspace M of L2 such that M = Mp, where Mn denotes the Cartesian product M X...X M with n factors. M is the set of all components of all functions in M. b) If M is a subspace of L3 and x 6 L3, then there exists a unique 9 such that y e M; Hx-yfl < Hx—yH n’ for all y e M. 
(2-39) n _ L2 L2 An element 9 6 L3 satisfies the preceeding condition if A and only if E{[y-y]xT} = O for all x 6 L3, i.e., y-y is orthogonal to L3, denoted y-y L L3. The unique element 9 of the lemma is called the orthogonal projec- tion of y on L3 and will be denoted by (yILg). For this 9, 91 = (yile), the orthogonal projection of yi on to L2. Consider now the second-order n-variate stochastic vector x(t) e L3(fl, A, P). The present of the process x(t) is defined as L3(x(t)), where L2(x(t)) is the sub- space of L2(Q, A, P) generated by {Xi(t)’ i = 1,2,...,n}. Similarly the past of the process up to t is defined by L3(x; t) where L2(x;t) is the subspace of 'L2(O, A, P) generated by {xi(s), s i t, i = 1,2,...,n}. To conclude this section some aspects of the Hida- Cramer theory on multiplicity and representation will be reviewed (see Cramer [C-2], [C-3], and Lee and Mandrekar 21 [L-ll). A second-order n-variate process {x(t), a i t i b} is called purely nondeterministic if n L (x;t) contains only the zero element of L (x;t), 2 2 agtib i.e., the random variable that is almost everywhere equal to zero. Denote by Pt the projection operator from L2(x;b) onto L2(x;t) and assume that x(t) is zero mean, purely nondeterministic, and continuous in quadratic mean. For any set of mutually orthogonal random variables {zl,...,zM} c L2(x;b) the M-variate process z(t), con- sidered as a column vector with components zi(t) defined by zi(t) Ptzi = (zilL2(x;t)) , (2-40) will be an orthogonal increment process in that two in- crements Azi(t) and Azj(s) are always orthogonal if either i f j or corresponding time intervals are dis- joint. The set of random variables {zl,...,zM} can be chosen [K-S] so that for all t 6 [a,b] L2(X;t) = L2(z;t). (2-41) The process z(t) = (zl(t),...,zM(t))T is called an innovation process for the x(t) process. The number M may be infinite. Define the M X M matrix time- spectral function F(t) by F(t) = E{z(t)zT(t)}. (2-42) 22 For a mean square continuous process x(t), with orthogonal increments z(t), F(t) is continuous and nondecreasing which implies that F(t). is differentiable almost every- where. Let L2(F) denote the Hilbert space cf all row- vector functions g(t) satisfying b T Ja g(t)dF(c>g < m <2-43) with inner product b F = [ gdFhT. <2-44) 8. Then there occurs the Hida—Cramer representation t x(t) = [a G(t,s)dz(s) (2—45) where G(t,s) is an n X M matrix function with each row gi(t,s) e L2(F), i = l,...,n. The innovation z(t) is not uniquely determined but it's dimensionality M is unique and is the smallest number for which (2-45) hOIds. The number M is called the multiplicity of x(t). Finally it should be noted that for a zero mean second-order quadratic mean process x(t) the space L2(x;t) is separable. A proof of this may be found in Cramer ([C-3], P. 253). CHAPTER THREE MARKOVIAN REPRESENTATION OR CONTINUOUS-TIME STOCHASTIC SYSTEMS In this chapter the following problem is considered. Given two real vector valued continuous-time zero mean second-order stochastic processes y(t) and z(t), with y(t) measurable and z(t) continuous in quadratic mean and purely nondeterministic, find a representation of y(t) as the output of a linear continuous time system driven by the innovations process of z(t). The concept of the predictor space, the Hilbert space spanned by the linear least square predictions of y(t) given z(r), to i r i t, will be used to derive a solution to this prob- lem under appropriate assumptions. 
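The predictors used below are exactly such orthogonal projections, that is, linear least-squares estimates, and a small Monte Carlo sketch may make the mechanics concrete before the general development begins. In the Python fragment below, expectations are replaced by sample averages over many independent draws, and the joint distribution of the scalar y and the 2-vector z is an arbitrary illustrative choice. The projection of y onto the span of the components of z is computed from the normal equations, and the residual is checked to be (sample-)orthogonal to z, which is the defining property in the projection lemma above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000                               # Monte Carlo sample size

# Arbitrary illustrative zero-mean variables: z is 2-variate, y is a scalar
# correlated with z plus independent noise (the mixing below is an assumption).
z = rng.standard_normal((N, 2))
y = 0.7 * z[:, 0] - 1.3 * z[:, 1] + 0.5 * rng.standard_normal(N)

# Orthogonal projection of y onto the span of the components of z:
# solve the normal equations  E{z z^T} a = E{z y}  (sample versions).
Czz = z.T @ z / N
Czy = z.T @ y / N
a = np.linalg.solve(Czz, Czy)
y_hat = z @ a                             # the linear least-squares prediction (y | L2(z))

# The residual is orthogonal to z, i.e. E{(y - y_hat) z^T} = 0, and the
# mean-square error is minimal (here it approaches the noise variance 0.25).
print((y - y_hat) @ z / N)                # ~ [0, 0] up to sampling error
print(np.mean((y - y_hat) ** 2))          # ~ 0.25
```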
Essentially the basic idea is that if the predictor space has a finite basis, this basis can be used as the state vector of a linear system generating y(t). After developing a general theory two special situations will be considered. The first, called the white noise case, will look at the representation when z(t) is white noise. A minor extension to the general theory will be necessary since white noise is neither second-order nor continuous in quadratic mean. The white noise case will be useful for applications. 23 24 The second special case of interest occurs when y(t) and z(t) are stationary processes with stationary cross covariance. The development of the properties of the representation for this case will conclude the chapter. 1. General Theory Consider two zero—mean real valued second-order 72' stochastic processes y(t) and z(t) with covariance Vyz(t,s) = E{y(t)zT(s)} (3‘1) The vectors y and z are m x l and r X 1 respectively. It is also required that y(t) be measurable and z(t) be continuous in quadratic mean and purely nondeterministic. Denote by L2(z;s) the Hilbert space obtained by taking the closure with respect to the mean square norm of the linear manifold spanned by the components of z(r), r i s. The predictors, denoted y(tls), are defined for t s to be the vector formed by projecting the components IV of y(t) onto L2(z;s). Letting yi(t|s) be a component of y(tls) with yi(t) the corresponding component of y(t) 91(tls) = (yi(t)|L2(z;s)), i = l,...,m, s i t. (3-2) *The covariance actually plays no role in this sec- tion. It will, however, be useful in section 2 and crucial for the applications in later chapters. E{°} indicates the expected value of the quantity in brackets while (-)T indicates the transpose. 25 Equivalently 9(tls) = IL§, s g t. <3-3) The predictors are specified by the relationships E{[y(t|s) — y(t)]zT(r)} = o, r i s : c,a (3-4) or E{y(t|s)zT(r)} = Vyz(t,r). (3-5) The predictor space L2(y;tls) is defined as the Hilbert space obtained by taking the closure with respect to the mean square norm of the linear manifold spanned by the components oftfluepredictors y(tls) for t Z s. In forming this space s is to be regarded as fixed while t varies over all values greater than or equal to s. It should be noted that L2(y;tls) is a subspace of L2(z;s). Following the example of Akaike [A-l] the assump- tion is made that the predictor space is of finite dimension. Since L2(y;t|s) is formed by projecting components of y(t) onto L2(z;s) with 5 fixed while allowing t to vary over all values greater than s, a finite dimensional basis, consisting of n random variables, can be regarded A as a function of s. The basis, of dimension n, will be A The basis could also be regarded as being of the form {x1(t1),x2(t2),... ,xn(tn)} where xi(ti) is an element of L2(y;tls), ti 1 s. The point is, for fixed 3 the basis can be fixed and changes only as s varies. denoted by the vector x(s) = (xl(s),x2(s),...,xn(s))T. (3-6) Letting L2(x(s)) be the Hilbert space generated by taking the closure with respect to the mean square norm of the linear manifold spanned by the components of x(s) it is obvious that L2<9;tls) = L2> . <3-7) Theorem 3-1: Let s and t be fixed with s i t. Then y(tls) = A(t,s)x(s) (3-8) for some m x n matrix A(t,s). Proof: Since x(s) is a finite dimensional basis for L2(9;t|s) then 9(CIS) e L2(9;t|8) (3-9) implies that 9(CIS) = AX(S) (3-10) for some matrix A. Since 9 varies with t' and s then A must also, so equation (3-8) is justified. Q.E.D. 
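A discrete-time analogue, offered purely as an illustration and not as the continuous-time construction itself, shows why such a finite basis can exist. If y is generated from an n-dimensional state driven by the increments of z, then every predictor of y at a future time is a fixed linear combination of the same n quantities, so the predictor space has dimension at most n. In the Python sketch below the matrices A, B, H, the horizon, and the time s are arbitrary illustrative choices; the coefficients expressing each predictor in terms of the past increments are stacked into a matrix whose rank never exceeds n = 2.

```python
import numpy as np

# Discrete-time analogue: x_k = A x_{k-1} + B w_k, y_k = H x_k, x_0 = 0, where the
# w_k are orthonormal increments of z.  A, B, H are arbitrary illustrative choices.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.5]])
H = np.array([[1.0, -1.0]])

s = 10                        # "present" time; the past consists of w_1, ..., w_s
horizon = 8                   # predict y at times s, s+1, ..., s+horizon-1

# The predictor of y_{s+j} given the past has coefficient H A^{s+j-k} B on w_k.
rows = []
for j in range(horizon):
    coeffs = [H @ np.linalg.matrix_power(A, s + j - k) @ B for k in range(1, s + 1)]
    rows.append(np.hstack(coeffs).ravel())
P = np.vstack(rows)           # one row per predictor, expressed in terms of the past

# However many future times are included, the predictors span a space of dimension
# at most n = 2; a basis of that space can therefore serve as the state.
print(P.shape, np.linalg.matrix_rank(P))    # (8, 10) 2
```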
Corollary: Let s and t be fixed with t i s. Then 27 y(t) = A(t,s)x(s) + u(t) (3-11) where u(t) is an m X 1 vector orthogonal to L§(z;s). Proof: Set y(t) - §. <3—12) u(t) Then for r i s i t E{u(t)zT(r)} 311y - 9(CIS)]ZT(r)} = 0 (3-13) by equation (3-4). Hence u(t) is orthogonal to Lg(z;s) and combining equations (3-8) and (3-12) equation (3-11) is obtained. Q.E.D. The next goal will be to develop the dynamic be- havior of x(t). The following results are directed to this end. Lemma 3—1: Let {Xn} be a sequence of scalar random variables converging in quadratic mean to the process X. Then lim E{Xn} = E{X}. (3-14) n+oo Proof: Recall the Schwarz inequality |E{XY}|2 3 E{X2}E{Y2}. (3-15) 28 Let Y = 1. Then |E{[Xn — X]Y}|2 : E{[Xn - X]z}-E{Y2}. (3-16) That is |E{X - x112 < E{[X - x12}. (3-17) 11 — n But by hypothesis lim E{[X - x12} = 0 (3-18) n+oo n Hence lim |E{xn - X}|2 = 0 (3-19) n+oo which implies lim [E{X } - E{x}]2 = 0. (3-20) n+(X) n Equation (3-20) is equivalent to equation (3-14) so the lemma is proved. Corollary: x(t) is a zero mean random variable. Q.E.D. Proof: By definition each component xi(s) of the n X 1 vector process x(s) is an element of L2(z;s). Hence xi(t) is the limit in quadratic mean of a sequence of finite linear combinations of components of 2. But 2 is a zero mean stochastic process so finite linear combina- tions of components of 2 have zero mean. Hence xi(s) is a limit in quadratic mean of a sequence of zero mean 29 processes. Thus by lemma 3-1, xi(s) and hence x(s) have zero means. Q.E.D. It is appropriate at this point to make a few remarks concerning measurability since that will be the next property of x(t) to be considered. Let T be a Lebesgue measurable parameter set on which is defined a o-algebra of Lebesgue measurable sets L. Let Q be a probability space on which is defined a o-algebra of events A. The symbol L e A denotes the smallest o-algebra defined* on T X 9 including all sets of the form L X A such that L e L and A e A. Then a process {X(t), t e T} is said to be a measurable process** if X(t,w) is a (t,w) function measurable with respect to L e A. The assumption at the beginning of the chapter that y(t) is measurable is in the sense that it's components satisfy this definition. Lemma 3-2: x(t) is a measurable second-order random vector. Proof: Since a process continuous in quadratic mean has a measurable modification, z(t) can be taken to be a measurable process. Then L2(z;t) is a closed subspace of L2(M), the space of square integrable measurable A A X B denotes the Cartesian product of the sets A and B. AA The product measure defined on L e A is, of course, itself a measure (Rudin [R-l], p. 140). 30 functions defined on M = [to,t] X 9, where Q is the probability space on which {z(r): t0 3 r i t} is defined. It is known that L2(M) is complete (Rudin [R-l], p. 66), and since a closed subspace of a complete space is complete (Simmons [8-4], p. 73), L2(z;t) is complete. Hence every convergent sequence of measurable square integrable func- tions in L2(z;t) converges to a measurable square in- tegrable function in L2(z;t). Since Xi(t) is the limit in the mean square norm of a sequence of such functions, xi(t) is measurable and square integrable. Hence x(t) is measurable and square integrable. Q E D It is next necessary to review the concept of weak convergence which will play a key role in the development. 
A sequence of random vectors xn in a separable Hilbert space H is said to converge weakly to the vector x in H, denoted xn E x, if for every y in H lim E{yTx } = E{yTx}. (3—21) n+m n A key lemma for determining weak convergence is the follow- ing (Parzen [P-3], p. 272). Lemma 3-3: A sequence of random vectors xn defined as members of some separable Hilbert space of random vectors converges weakly to a vector x if and only if the following two conditions hold: (i) For some constant M, 31 E{xTx } < M, n = 1,2,... . (3-22) nn ‘— (ii) For some family of vectors {k(t), t e T} which spans the Hilbert space spanned by {xn, n = 1,2,...} . . T 11m E{xnk(t)} (3-23) n+oo exists for every t e T. The following theorem and corollary due to Halyo and McAlpine [H-l] are useful in developing the dynamic behavior of x(t). The proof of the theorem as given by Halyo and McAlpine is presented since it will be necessary in further developments to refer to it. Except for changes in notation, the following theorem, proof, and corollary are directly from [H-l]. The separability of the spaces L2(z;t) and it's subspaces L2(x;t) and L2(x(t)), although not explicitly mentioned, is necessary to allow the use of weak convergence. Theorem 3—2: Let {x(t): to i t} and {z(t): t0 1 t} be second order measurable vector random processes with zero mean such that L2(x;t) is a subspace of L2(z;t) for each avxz(t,s) '3' at is continuous on {(t,s):tO i s 3 t}, and {E|([xi(t) - xi(s)]|L2(z;s))|2}35 : BIt-sl, s i t (3-24) then x(t) can be written as t x(t) = x(to) + J W(S)ds + U(t), t g t (3-25) t O 32 where U(t) is a process with orthogonal increments such that U(t) - U(s) is uncorrelated to z(r) whenever r i s i t, and U(to) vanishes; w(t) and U(t) belong to L2(z;t) and BVXZ(t,s) sz(t,s) = at , s i t . (3-26) Proof: Let t be the smaller of t1 and t2 and s i t; then IL3 zT (3-27) x(tl)-X(t2) T E{[ tl-tz :12 (3)} E x(t1)-x(t2) t1‘t2 sz(tl’s) - sz(t2’s) = . (3-28) t1 ' t2 The right side of (3-28) converges as t2 tends to t1; x(tl) - x(t2) tl-tz hence by lemma 3-3,( |L3(z;t) converges weakly to, say, w(t1) in L2(z;tl) since its norm is bounded by B. Define the process U(t) as t U(t) = x(t) - x(t ) - I w(s)ds, c < t. (3-29) 0 t o — 0 First note that, using the definition of z(t), (3-26) follows from (3-28). Now, t T T J E{w(r)z (t')}dr = E{[x(t)-x(s)]z (t')}, t' i s i t. (3-30) 3 . Interchanging integration and expectation, and using the properties of projections, it can be shown that 33 t (I wdr|L§> = <1x-x1|L3>. s 3 t (3-31) 3 <1u - U1IL3 = o, s g t. (3-32) Since U(t) belongs to L2(z;t), U(t) — U(s) is orthogonal to L2(z;s); and hence to U(t') - U(s'), s' i t' i 3. Therefore U(t) has orthogonal (i.e., un- correlated) increments. Also note that since w(s) be- longs to L2(z;s), U(t) - U(s) is uncorrelated to L2(w;s), s i t, and x(t ). 0 Q.E.D. Corollary: Theorem 3—2 remains valid if (3-24) is replaced by the condition that E{Ixi(t)|2} is bounded on bounded intervals and vxz = p(t)q(8) <3-33) where p(t) is an n X 2 matrix function of t alone and q(s) is an A X r matrix function of 3 alone. Equation (3-25) will be the basis of the dynamic representation of x(t). Note that the continuity of 3sz(t,s) 3t Theorem 3—2 are new assumptions on x(t) and 2(8). The 8Vx2(t,s) at the desired representation to exist. 
Equation (3-24) and equation (3—24) among the hypotheses of continuity of is a restriction necessary for can be replaced by the separability of sz(t,s) and the boundedness on bounded intervals of E{Ixi(t)I2}, and 34 these conditions will be satisfied automatically in the applications to be encountered later. The next task, which extends the work of Halyo and McAlpine in this context and plays the crucial role in the develOpment of the representation theory, will be to show that w(t) is, in fact, an element of L3(x(t)). For this the following lemma is necessary. Lemma 3-4: Let {x(t):tO : t} and {z(t):to : t} be as in theorem 3-2. Then x(t )-X(t ) ( l 2 |L3(z;t1)) e L3|L3> = (jil cj9 <3-37) _ g C (( m . m . — . y(T.)|L2(z,t2))lL2(z,tl)) j=l J J g m = jil Cj(Y(Tj)IL2(ZEtl))- (3‘38) The last equality follows from the fact that L2(Z;tl) C L2(z;t2) for tl i t2. Continuing, rap n (x(t2)|L2(Z;tl)) Cj9(let1) j 1 ll lib/12° C.A .,t t 3-39 j 1 J (TJ 1)X( l) ( ) and the last sum is an element in L3(x(t1)).. This suffices to prove the lemma. Q.E.D. Theorem 3-3: Let w(tl) be as defined in theorem 3-2. Then w(tl) e L3(x(cl)). 36 Proof: Note that in the proof of theorem 3-2 w(tl) is x(t1)"‘(t2)an(z.t ) tl-t2 2 ’ l in L3(z;tl) as t2 + tl by lemma 3-3 since x(t )-x(t ) E{( 'lzl'tZ 2 ILr2'(z;tl)) zT(s)} exists for < < < ‘ tO _ s _ tl _ t2, and Since T ‘x(t )-x(t ) 3(Z;t1)) ( il'tZ 2 |L3(z;tl)) is defined as the weak limit of ( IL E x(tl)-x(t2) ( t1't2 bounded by virtue of equation (3-24). That i s 2 t1 1 t x(t )-x(t ) E ( l 2 |L3(z;t1))zT(s) exists for t tl-t2 o 2 x(tl)-x(t2) n T implies that E t -t |L2(z;tl) v (5) exists for l 2 any v(s) e L3(z;s), and in particular for v(s) e L3(x(s)). 5 x(t )-x(t ) . _ l 2 n _ T . Sett1ng s — t1, E ( tl’tz |L2(z,tl))v (t1) ex1sts for all v(tl) e L3(x(t1)). Since by lemma 3-4 both factors in the expected value are elements of L3(x(tl)) the weak limit must exist as an element of L3(x(tl)). . n That is, w(tl) e L2(x(t1)). Q.E.D. Corollary: Let x(t) and w(t) be as defined above. Then w(t) = F(t)x(t), t g t. (3-40) 0 Proof: Since w(t) and x(t) are both elements of 37 L2(X(t)). W(t) = l.i.m. an(t) (3-41) 11+“) for some sequence of n‘X n matrices Fn' Since the limit exists, the sequence of real valued matrices con— verges to a real valued matrix F(t) and equation (3-40) is justified. Q E D Combining equations (3-25) and (3-40) yields t x(t) = x(to) + I F(s)x(s)ds + U(t), t0 3 t. (3-42) to The next item to be considered is how to relate U(t) to z(t). By the assumption that z(t) is a quadratic mean continuous purely nondeterministic process, it can be expressed as (Cramer [C-2]) t z(t) = I K(t,s)dv(s) (3-43) t o , where the k X 1 vector“ v(s) is the innovation pro- cess for z(t). The innovation process has the important property that (Kallianpur and Mandrekar [K-5]) L2(z;t) = L2(v;t). (3-44) Hence U(t) e L3(z;t) implies U(t) e L3(v;t). Thus U(t) can be expressed in terms of the innovation process z(t) by (Ivkovich and Rozanov [I-l]) A . . v(s) may be of countably infinite dimen31on. 38 t U(t) = I G(t,s)dv(s). (3-45) I: o The orthogonal increment property of U(t) implies it is a wide-sense Martingale process (Doob [D-2], Lee and Mandrekar [L-l]). Since U(t) is wide-sense Martingale by theorem 3.3 of Lee and Mandrekar [L-l] it is seen that G(t,s) is independent of t. Hence t U(t) = I G(s)dv(s). (3-46) t 0 Thus, combining equations (3-42) and (3-46) t t x(t) = x(to) + J F(s)x(s)ds + J G(s)dv(s) (3-47) to to where v(s) is the innovation process for z(t). 
Equa- tion (3-47) can be written in the form dx(t) = F(t)x(t)dt + G(t)dv(t). (3-48) Equations (3-47) and (3—48) are regarded as equivalent representations and have the solution (Mandrekar and Salehi [M-l]) t x(t) = ¢(t.co>xg <3-49) t o where ¢(t,r) is the transition matrix defined by F(t). Equations (3-11) and (3-48) now yield the repre- sentation dx(t) F(t)x(t)dt + G(t)dv(t) (3-50) Y(t) A(t.S)X(S) + U(t) (3-51) 39 Setting H(t) = A(t,t) (3-52) and letting s = t in equation (3-51), the desired re- presentation dx(t) F(t)x(t)dt + G(t)dv(t) (3-53) Y(t) H(t)X(t) + U(t) (3-54) is achieved. The development to this point is summarized by the following theorem. Theorem 3-4: Let y(t) and z(t) be zero—mean real valued second-order m X l and r X l variate stochastic pro- cesses reSpectively and let y(t) be measurable and z(t) be continuous in quadratic mean and purely non- deterministic. Form the predictor space L2(y;t|s) be projecting the components of y(t) onto L2(z;s) for t0 1 s i t. Assume that L2(y;t|s) has an n dimensional BVXZ(t,s) 8t tO < s i t, and either sz(t,s) is separable and basis x(s). Then if is continuous for E{|xi(t)|2} is bounded on bounded intervals, or iv E{|(xi(t) - xi(S)|L2(Z;s))|2}2 representation given by equation (3—53) and (3-54) can be < BIt-sl for 's i t, the obtained. The next point that should be mentioned is the minimality of the dimension of the representation. The 4O representation is minimal in the sense that since x(t) is a basis of the predictor space, there does not exist a vector of smaller dimension than x(t) that contains all the information in the predictor space. To close this section it seems appropriate to comment briefly on some of the assumptions used and their roles in the derivations. The requirements that z(t) be continuous in quadratic mean and purely nondeterministic are to ensure the existence of it's innovation process. Continuity in quadratic mean may be replaced by the weaker condition that for all t the limit z(t+o) exists (Cramer [C-2]). The requirement that sz(t,s) be separable, equation (3-25), and the necessity of 3sz(t,s) at structure to be derived will exist. In the applications being continuous are necessary that the system to the realization problem Vyz(t,s) will be seen to be equal to a deterministic impulse response matrix. It is well known (Kalman [K~3]) that an impulse response matrix has a system realization if and only if it is separable. Since Vyz(t,s) = H(t)VXz(t,s) (3-53) the requirement that sz(t,s) be separable is a natural one. Perhaps the most critical assumption was that the predictor space have a finite dimensional basis of constant 41 «dimension n. Both the finite dimensionality and constant dimensionality are worthy of comment. The finite dimen- sionality of L2(y;t|s) played no essential role in deriving equation (3-11). The derivation will go through ‘with. x(t) of countably infinite dimension. After that, ‘mathematical intricacies begin to arise if x(t) is allowed to have infinite dimension, and since the applica— tions will be for finite dimensional systems only the restrictions to the finite case was felt to be justified. The question of constant dimension is a tricky one. The eventual concern of the thesis is with the realization problem and, as observed by Kalman (Kalman, Falb, and Arbib [K-4]), to develop a really comprehensive theory of realization the dimension of the state space should be allowed to vary with time. 
However for the types of systems to be considered in this thesis considerations of this nature need not arise. For more general time-varying systems than the analytic systems considered in this paper, however, the question of constant dimensionality is a significant one. Finally it should be mentioned that the repre- sentation theory developed here is somewhat more general than will be necessary for applications. To be specific z(t) will always be a Wiener process. The derivation of the representation is not simplified significantly by making this assumption on z(t) however. 42 2. Special Cases In using the representation deveIOped in part 1 to solve the realizationproblem the input process will be an r X 1 vector zero-mean stationary white noise process. Since white noise is neither measurable nor second-order the theory developed in part 1 is not immediately applic— able. The first consideration here will be to resolve this problem. Then some results pertaining to stationary pro- cesses will be developed. Given a zero mean white Gaussian process n(t) set t z(t) = I n(s)ds. (3-56) I: 0 Then (3-56) is actually the innovation representation (Kailath [K-l]) for the Wiener process z(t) which is clearly a second—order purely nondeterministic process continuous in quadratic mean. Hence the theory of part 1 applies. The vector dv(s) (see equation (3-43)) can be replaced by n(s)ds with the interpretation dz(s) = n(s)ds. (3—57) Equations (3-53) and (3-54) become dx(t) F(t)x(t)dt + G(t)n(t)dt (3-58) H(t)x(t) + u(t) (3-59) y(t) and equation (3-49) becomes 43 t x(t) = ¢(t,to)x(to) + [ ¢(t,s)G(s)n(s)ds. (3-60) t o In this manner the theory of part 1 is extended to what will be called the white noise case. Next consider the situation wherein y(t) and z(t) are zero mean real valued second order stationary processes with stationary cross-covariance Vyz(t—s) = E{y(t)zT(s)}, (3-61) where y(t) and z(t) are, as before, m X l and r X l vectors respectively. Furthermore, z(t) is a Wiener process generated by stationary white noise n(t) with unit covariance and y(t) is, as usual, measurable. It will be shown that the representation corresponding to this situation takes the form of a stationary system dx(t) Fx(t)dt + Gn(t)dt (3-62) y(t) Hx(t) + u(t) 1 (3-63) where F, G, and H are constant matrices. Equations (3-58) and (3-59) are still valid and the derivation starts from them. Let t 3 s. Then t Vyz(t-s) = E [H(t)®(t,to)x(to) + [t W(t,r)n(r)dr]zT(s) 0 where (3-64) W(C,T) = H(t)¢(t,T)G(T). (3-65) The usual assumption that x(to) is independent of z(s) is made and since both are zero mean processes (3-64) 44 reduces to t V (t-s) = I W(t,T)E{n(T)zT(S)}dT. (3-66) yz t By assumption z(s) = J n(r)dr . (3—67) Hence to to s = { W(t,r)dr. (3-68) it 0 Now from (3-68) 8V (t-s) YZ _ ' as _ -Vyz(t-S) = -W(t,s). (3-69) Hence W(t,s) = W(t-s). (3-70) Recall also the assumption that Vyz(t,s) is separable, that is Vyz(t-s) = a(t)8(s). ‘ (3-71) This implies W(t-s) is also separable, W(t-s) = -a(t)B(s). (3-72) 45 Since W(t-s) is a realizable stationary impulse response matrix it is analytic in both t and s for t Z 5. Hence both u(t) and 8(5) must be analytic functions. Since u(t) and 8(t) are continuously differentiable then by lemma 5 of Youla [Y-l] W(t-s) has a constant coefficient realization. That is, W(t-s) is of the form W(t-s) = HeF(t'S)G . Thus the representation dx(t) Fx(t)dt + Gn(t)dt (3-74) y(t) Hx(t) + u(t) (3-75) is achieved. This concludes the discussion of the representation problem. 
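The stationary representation just obtained can be checked by simulation. The Python sketch below integrates dx = Fx dt + G dz by the Euler-Maruyama method for an arbitrary illustrative triple (F, G, H); the matrices, step size, and horizon are assumptions made only for this example. It then estimates the cross-covariance between y at the final time and earlier increments of z; dividing the sample average by dt should reproduce W(u) = H exp(Fu) G up to discretization and sampling error, which is exactly the property exploited in the next chapter.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

# Arbitrary illustrative stationary representation dx = F x dt + G dz, y = H x,
# x(t0) = 0 (none of these matrices are taken from the text).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

dt, steps, paths = 0.01, 300, 100_000
check = {50: None, 150: None, 250: None}     # steps at which increments of z are saved

x = np.zeros((paths, 2))
for k in range(steps):
    dW = np.sqrt(dt) * rng.standard_normal((paths, 1))  # increment of z over [t_k, t_{k+1}]
    x = x + dt * x @ F.T + dW @ G.T                      # Euler-Maruyama step
    if k in check:
        check[k] = dW
y_T = x @ H.T                                            # y at the final time T = steps * dt

# Cross-covariance check: E{ y(T) dz(s)^T } = W(T - s) ds with W(u) = H exp(Fu) G,
# so the sample average divided by dt should approximate the impulse response.
for k, dWk in check.items():
    est   = float((y_T * dWk).mean() / dt)
    exact = (H @ expm(F * dt * (steps - k)) @ G).item()
    print(f"W({dt * (steps - k):.2f}): estimated {est:.3f}, exact {exact:.3f}")
```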
CHAPTER FOUR STOCHASTIC APPROACH TO THE MINIMAL REALIZATION PROBLEM IN THE STATIONARY CASE This chapter will be concerned with the realiza- tion problem for stationary systems. The stationary realization problem may be stated as follows. There is given a stationary impulse response or weighting pattern matrix W(t-s) and the objective is to find matrices F, G, and H such that W(t-s) = HeF(t‘S)C (4-1) where F is of minimal dimension. The idea used in the predictor space approach is to define a stochastic system having input-output covariance Vyz(t-s) equal to W(t-s). Then, if a basis of the predictor space can be found, the results of the previous chapter can be used to write down a Markovian representation determining a minimal realiza- tion for the impulse response matrix W(t-s). The theory developed in the first part of the chapter will be used in the rest of the chapter to examine two important realization algorithms. The power of the pre- dictor space approach is demonstrated by the ease with which these algorithms are explained in this context. 46 47 I. General Theory Consider the m X r matrix W(t,T) on which are made the following assumptions: A-l) W(t,r) is stationary in the sense that W(t+o,r+o) = W(t,T) for all t,T, and o. A-2) W(t,T) is an impulse response matrix, implying that W(t,r) is separable and equals 0 for t < T. t A-3) I W(C,T)TW(C,T)dT < m, t0 3 t < m. t O The usual abuse of notation will prevail and W(t,T) will be written as W(t-T) with t i T. Let u(t) be an r—dimensional stationary Gaussian white noise with E{n(t)} = o , E{n(t)nT(S)} = I-6(t-s). <4-1) Denote by y(t) the m-dimensional output of the stochastic system defined by t y(t) = J W(t-T)n(r)dr (4-2) to when the stochastic integral exists. Define the Wiener process z(t) by t z(t) = I n(s)ds. (4-3) to Then q(t) is the innovation process for z(t) and equa- tion (4-2) can be written 48 t y(t) = I W(t-T)dZ(T). (4-4) I: O The goal here is to justify using the representa- tion theory deveIOped in chapter 3. To do that it will be necessary to verify that the hypotheses of theorem 3-4 are satisfied. That y(t) and z(t) are both zero mean and second order is clear. Furthermore, as noted in chapter 3, z(t) is continuous in quadratic mean as well as purely nondeterministic. Since W(t-T) satisfies assumption A—3 the stochastic integral in equation (4-2), or equivalently equation (4-4), exists and y(t) is quadratic mean con- tinuous and measurable (Wong [W-3], pp. 145-146). First form the predictors y(tls) by 9(tIS) (y(t)|L?(Z;S)) t (J wdz> t O s I W(t-T)dz(T). (4‘5) to The last equality follows from the fact that for T > s > = o <4-6) by virtue of the orthogonal increment property of z(t). Forming the predictor space L2(y;tls) suppose that it has a finite dimensional basis x(s). Then from the corollary to theorem 3-1 49 y(t) = A(t,s)x(s) + u(t). (4-7) Since by equations (4-4) and (4-5) T(tlt) = y(t) (4—8) the term u(t) defined by y(tlt) - y(t) is not present when t = s and equation (4—7) becomes y(t) = H(t)X(t). (4-9) Equation (4-9) will be useful in showing that the cross covariance sz(t,s) must be separable. First it is necessary to compute Vyz(t,s). Using equation (4—4) and the properties of orthogonal increment processes (Wong [W-3], p. 144) for t i s i t PTO f Vyz(t,s) E{J W(t-T)dz(T)ZT(S)} t O n s dzT(T)} E{J W(t-r)dz(r)J t t O O S J W(t-T)dT. (440) to By hypothesis W(t-T) is separable in the form W(t-T) = P(t)q(I) (4-11) for some m.x 2 matrix function p(t) and A X r matrix function q(r). 
Hence S Vyz(t,s) = p(t)[ q(T)dT (4'12) to 50 so Vyz(t,s) is separable. Since Vyz(t,s) = H(t)sz(t,s) (4-13) it is clear that sz(t,s) must also be separable. BVXZ(t,s) It is next necessary to show that at must be continuous for t0 1 s i t. Combining equations (4-10) and (4-13) 3 H(t)V (t,s) = I W(t-T)dT (4-14) xz t o and the right hand side of the equation is analytic for t i T. From the discussion of the stationary case at the end of chapter 3 the matrix output coefficient matrix H(t) is time invariant and can be denoted by simply H. Thus 3 HVXZ(t,s) = I W(t-T)dT (4-15) t 0 implies sz(t,s) is analytic and hence possesses con- tinuous derivatives of any order for t0 2 s i t. Finally to show that E{xi(t)2} for each component xi(t) of x(t) is bounded on bounded intervals the fact that in the case of a stationary system a minimal realiza- tion is observable is used (Silverman [S-l]). Hence the matrix H must have full rank which implies that the digaonal elements of HTH cannot contain a zero. Since E{xT(t)HTHx(t)} = E{yT(t)y(t)} < w (4-16) 51 it follows that for each i, i = 1,2,...,n, 2 E{xi(t)} < m. (4-17) Hence the desired property that E{lxi(t)l} is bounded on bounded intervals must hold. Thus the necessary hypotheses of theorem 3-4 will all be satisfied if a finite basis x(s) of the predictor space L2(y;tls) can be found. The search for such a basis is now begun. Set t - T = A. Since W(A) is analytic it has a Taylor series expansion w k _ A ” W(A) — E Yk ET’ A Z 0. (4-18) k—O The coefficients Yk’ k = O,1,2,... are called the Markov parameters (Ho and Kalman [H-2]) and are given by = de(A) —-—E—— , k = O,1,2,... . (4-19) dA A=O Using equation (4-18) the predictors can be expressed as s y(tls) = [t (k: Yk $E§%l—>dz. to g s g t <4-20> 0 Lemma 4-1: m s y(tls) = Z ka (t-T) dz(T), to i s i t. (4-21) k=o Jt ' 0 Proof: The equality in equation (4-21) is in the sense of mean square equality. Define 52 s n (t-T)k S = I [ 2 YR ——ET——]dz(T). (4-22) tO k=o ° From a criterion by Doob ([D-2], p. 429)timasequence of stochastic integrals Sn converges in the mean to y(tls) if and only if the integrands converge in the mean to W(t-T). This criterion is clearly satisfied in this case and since the lemma is proved. Theorem 4-1: The pth quadratic mean derivative of y(tls) with respect to t exists and is given by A m Y S EEX§E%§2 = kg (E:EIT [ (t-T)k-de(T) (4-23) t =p ' to for t0 1 s i t. Proof: . . apy(t|s) . . The PER der1vat1ve ex15ts 1n a mean at 2 8 pVy(t1IS,t2|S) square sense if exists at the point P P (t,t) (Papoulis [P-2], p. 317) where V9(t1|s, tzls) = E{y(tl|s)y(t2|s)T}. (4-24) Using the representation given by equation (4-5) and the orthogonal increment property of z(t) 53 s s VA(t1|s,t2|s) = E{[[ W(tl-T)dZ(T)][J W(tz-T)dZ(T)]T} (4-25) V t t o o S T = J W(tl-T)W (t2‘T)dT, to < S < t. (4‘26) t _ _ 0 Since W(A) is analytic it is clear from (4—26) that mixed partials of V9(tl|s,tzls) of any order exist for all tl Thus QEELELEZ 2' atp all positive integers p. Since the only portion of and t exists in a mean square sense for s I W(t-T)dz(T) that depends on t is the deterministic to portion of the integrand W(t-T), it is clear that for < tO _ s i t pA s §_y(tls) z [ a _W fir Y k=o y . = 0 (4-35) m k since k2 Vgg’k)(t|s,t|s) %T is just the Taylor series :0 ° for Véo’£)(t+1|s,tls). m kA k Thus y(t+T|s) - Z §_Z£El§l %T is orthogonal to k=o 3t ' 2 a (tgs) for all 2 and so it is orthogonal to 3t 00 k k X §_2£El§l T . Therefore k‘ kl —0 at 57 00 k k 00 jA j E [9 k1 i . 
3 y(tls) symbol k 18 the mith component of k Bti Bti The Hilbert space spanned by F(tls), denoted L2(F(t|s)), is the closure of S{F(t|s)} with respect to the mean ' square norm. The variables t and s are regarded as fixed when considering L2(F(t|s)). Theorem 4-3: L2(F(t|s)) = L2(y;t|s), t i s i t . (4-41) 0 Proof: Recall that when considering the space L2(y;t|s), s is regarded as fixed while t varies over all values greater than or equal to s. This is in contrast to the space L2(F(t|s)) where both t and s are regarded as fixed. Thus the s in L2(T(t|s)) is the same as that in L2(y;t|s), but the use of the symbol t must be correctly interpreted, That L2(F(t|s)) c L2(y;tls) is trivial since for any stochastic process a(t) e L2(y;t|s) it follows automatically that figét) e L2(y;tls) whenever the derivative exists in the mean square sense. (Wong [W-3], p. 79). Hence S{F(t|s)} c L2(y;tls). (4—42) Since L2(y;t|s) is closed, taking the closure of the left hand side yields L2(F(t|s)) c L2(y;t|s). (4-43) Now let a e L2(y;t|s). Then by definition a = l.i.m. a (4-44) p+oo p where, for each p, ap is a linear combination of com- ponents of y(t ls) with t > s. This can be Pi Pi expressed as N P a = z c 9 (t ls) , - P 1:1 91 pji pi (4 45) where C ,...,C are real constants, t 3 s for all pl pn i pi, and ypji(tpi|s) is the pjith component of 9(tp [8). By the corollary to theorem 4-2, for s < t i and s < t , i m aky (tls) (t - wk & (t Is) = 2 pi“ pi <4-46) pji Pi k=o Btk 1‘! Hence N sky k sN = z Jkk_ 1k, (4-48) k=o 8t ' Since SN is a linear combination of elements of F(tls), SN 6 L2(F(t|s)). Since L2(F(t|s)) is closed, lim SN 6 L2(F(t|s)). Therefore, N—mo m 3k? . (tIS) (tp -t) p . Jk 1 2 e L (F(tls)). (4-49) k=o atk RT 2 k Thus ap e L2(F(t|s)) since ap is just a finite linear combination of such elements. Invoking once again the fact that L2(F(t|s)) is closed, l.i.m. ap is an element p+oo of L2(F(tls)). Thus L2<9;tls> c L2). , <4-so> Corollary: F(tls) Spans L2(y;t|s) for t i s i t. Proof: The corollary is immediate from the equality 61 L2(y;t|s) = L2(F(tls)). (4-51) Q.E.D. For convenient reference equation (4-28) is re- produced again here: " w r- S (t-T)O g F 7‘! Y0 Y1 Y2 . . . [t —~C—)—T— dZ(T) y(tlS) o s l (t-T) _ 3 (t S) _ Y1 Y2 Y3 [t "1! dz(T) - ~X§EL—— — F(tls). o s 2 2 (t-T) a (t S) o L. . . J B . J L. . J It has been established by the corollary to theorem 4—3 that F(tls) contains among its components a basis for the space L2(y;t|s). The basis, though, is as yet unidentified. However, if a finite dimensional basis for the linear space spanned by the rows of the Hankel matrix is found, define the vector x(tls) as the corresponding elements of the predictor space where the correspondence is realized by taking the inner product of the row vectors) of the Hankel matrix forming the basis and the vector Z(t,s) defined by S O S Z(t,s) = [ £E%}l— dzT(T), J £E%}l— dZT(T),... . (4-52) t . O O * The row vectors must be distinguished from the block row vectors such as [Yk’Yk+l’Yk+2""]' 62 Then x(tls) is the basis for L2(y;tls). Essentially the idea is that if the predictor space has a finite dimensional basis, then it can be identified by examining the Hankel matrix. Theorem 4-4: Suppose the linear Space spanned by the rows of the Hankel matrix has a basis consisting of a finite number of (scalar) row vectors of the matrix. Define the vector x(tls) as that vector whose components are given by taking the inner product of the rows of the Hankel matrix forming the basis and the vector Z(t,s). 
Then x(tls) is a basis for the predictor space. By equation (4-28) a set of components of F(tls) is linearly independent if and only if corresponding rows of the Hankel matrix are. Thus by the hypothesis on the Hankel matrix the components of the vector x(tls) are a maximal linearly independent subset of the components of F(tls) and hence span L2(F(tls)). Then by the corollary to theorem 4-3 the components of x(tls) are a linearly independent set of random variables spanning L2(y;t|s). Therefore, by definition, x(tls) is a basis of the predictor space. Q.E.D. The next thing to show is that a finite dimen- sional basis of the linear space spanned by the rows of the Soo can be found. It has been shown by Silverman 63 [8-1] that for a stationary system the rank of the Hankel matrix is equal to the dimension of a minimal realization of the system if the transfer function of the impulse response matrix of the system is a strictly proper rational matrix.* He also showed that for a stationary system, this is a necessary and sufficient condition for a realization to exist. Since the assumption of separability made at the beginning of the section is also sufficient to guarantee a realization for the stationary impulse response matrix W(t-s), this implies that the Hankel matrix has finite rank. Hence a basis for the linear space spanned by the row vectors can be found and the x(tls) so defined is of the dimension of a minimal realization of W(t-s). These results are summarized by the following theorem. Theorem 4-5: Let W(t-s) be a stationary separable impulse response matrix satisfying equation (4-2). Then the Hankel matrix defined by W(t—s) has the prOperty that a finite dimensional basis of the space spanned by its row vectors can be found. Furthermore, the vector x(tls) defined by taking the inner product of these row vectors and Z(t,s) is the same dimension as a minimal realization of W(t-s). * . . G(s) is a strictly proper rational matrix 1f the numerator of every element of G(s) is of lower degree than its denominator. 64 Hence, a basis for the predictor space L2(y;tls) can be found by examining the rows of the Hankel matrix SN for some N. The basis, however, is a function of two variables 3 and_’t in apparent contrast to chapter 3 where the basis of L2(y;tls) was considered as a function of 3 alone. The vector x(tls), moreover,is a basis of L2(y;t|s) for any value of t greater than or equal to s. In this sense x(tls) is independent of t in its role as a basis. The problem arises though that the derivatives in the representations of chapter 3, for instance in the representation given by equations (3-74) and (3-75), are total derivatives. That is, it is nec- essary to consider QW=§¥J§1+MH§1£E . (4-53) S S at ds This total derivative is not defined because the term 3% is not defined. In general there is no relationShip between t and s. If, however, consideration is explicitly restricted to the case t = s then —— = 1 (a—sw and the total derivative is defined. Furthermore by restricting consideration to the case t = s a basis depending only on s has been found. Define x(s) = x(sls) = x(tls)|t=S . (4—55) 65 As discussed at the beginning of the section the criteria of theorem 3-4 are satisfied since now it has been shown that a finite basis x(s). of the predictor space L2(y;t|s) can be found. The representation given by equations (3-74) and (3-75) is justified, dx(s) Fx(s)ds + Gdz(s) (4-56) y(s) Hx(s). 
(4-57) No noise term u(t) appears in equation (4-57) since 9(SIS) = y(s) (4-58) and u(t) was defined by equation (3-12) as u(t) = y(t) - y(tls). The solution to equation (4-56) is S x(s) = J eF(S'r)cdz(r) _ (4-59) to and s y(s) = H I eF(S-r)Gdz(r). (4-60) to Theorem 4-6: The matrices F, G, and H occurring in equa- tions (4—53) and (4-54) determine a minimal realization for the impulse response matrix W(t-s). Let tO < T < t. Then using equation (4-57) and the representation 66 n(t)dt = dz(t), (4-61) it follows that t Vyn(t,1) = HE J eF(t—r)Gn(r)drnT(T) (4—62) to t = H I eF(t-r)G6(r-T)dr t O = HeF(t-T)G. (4-63) '4 7\ Furthermore, for a symmetric delta function, by equation <4-2) t E J W(t—r)n(r)drnT(T) to Vyn(t,T) t T I W(t-r)E{n(r)n (1)}dr o t = I W(t-r)6(r-I)dr to Efigill-,'r= tO or T = t = W(t-T), to < T < t . (4-64) 0 , T i [to,t] Hence, for tO < T < t W(t-T) = HeF(t‘T)G. . (4-65) * The symmetric delta function is defined by b %f(s-o), s = b J f(t)6(t-s)dt = %f(s+°): S = 0 . a $[f(s-o) + f(s+o)], a < sI< b O, s f [a,b] 67 That equation (4-65) also holds at T = t0 and T = t follows from the analyticity of the functions involved. Since analytic functions defined on an interval of the real line have a unique analytic extension to the real line, and analytic functions equal on an interval of the real line are equal on their whole domain of analyticity, (Olmstead [0-1], p. 268), equation (4-65) can be extended to hold for all t and T. In particular it holds for T = t and T = to. The minimality of the realization ‘follows from theorem 4-5. Q.E.D. Thus, the Markovian representation developed in chapter 3 yields a solution to the deterministic realiza~ tion problem for the stationary case. The following theorem is interesting in that it shows explicitly that the com- ponents of x(s) generate the predictor space. Theorem 4-7: F(t-s) y(tls) = He x(s), t i s i t. (4-66) 0 Proof: Using equations (4-5) and (4-59), and theorem 4-6, s y(tls) J W(t-r)dz(r) to H [8 eF(t-r)Gdz(r) t o = HeF(t'S)JS eF(S-r)Gdz(r) to = HeF(t‘S)x(s). Q.E.D. 68 This completes the results for this section. It is to be emphasized that the material developed here should not be regarded as an algorithm, but as a theoretical setting which is useful in analyzing and understanding the realization problem and associated algorithms, as will be demonstrated in the next several sections. 2. Silverman's Algorithm In this section the predictor space approach will be used to analyze Silverman's realization algorithm for stationary systems (Silverman [S-2]). Akaike [A-l] has previously done this for the discrete-time case. The analysis for continuous time systems is considerably more complex. The analysis starts from the assumption that the dimension of the system is somehow known. Theorem 4-8: Suppose it is known that the impulse response matrix W(t-T) can be realized by a system of dimension n. Then the search for a basis of the predictor space can be limited to within the space spanned by'the components of " atn:l n-l T [9(cls>T,—$——|—3 g; ”T... 3 Wm”? . Proof: Since W(t-T) can be realized by a system of dimension n, theorem 4-6 can be used to represent the predictors as 69 s y(tls) = J HeF(t'r)Gdz(r) (4-67) t 0 where F is an n x n matrix. Then p S §_2i£l§l = I HFPeF(t-r>cdz(r). (4-68) atp tO If p = n, then the characteristic equation of the n x n matrix F, F“ = z A.FJ (4-69) can be used to write n n-l s . 3—2££L§l = Z AjI HFJeF(t-r)Gdz(r) atn j=o tO n-l j = z A. E—Yiiiil . (4-70) j=o J atJ Then 3n+1y(t|s) = “£1 1. 
31+13 T ay- d”T d(t), .< 1>k'1 d Q_{t) ) dt 83'1ygtit)n J— LL at J J p 7 Y0 Y1 Yk-l Y1 Y2 Yk = l . . . . 2 . . . (4-81) LYj_1 Yj . . . Yj+k-2 Proof: The corollary is a direct consequence of theorem 4-9. Q.E.D. 73 At this point it is appropriate to present Silver- man's algorithm for stationary systems (Silverman [S-ll). The algorithm assumes that integers j and k can be found such that rank Sjk = rank S = n, R = 0,1,2, .. . (4-82) j+l k+2 The integers j and k are usually taken to be the smallest values such that equation (4-82) holds, but this is not necessary for the algorithm. The integer n de- fined by equation (4-82) is the dimension of the minimal realization of the impulse response matrix generating the Hankel matrix. Given Sjk’ let M denote the n X k - r sub- matrix formed from the first n independent rows of Sjk' Define the n x k - r submatrix M of Sj+l,k in the following way. Denote the scalar rows of the Hankel matrix by Ya , a = l,2,3,... . If Y2 is one of the rows selected for the matrix M, then the row YR (m is +m dimension of output) is selected to be the corresponding row in the matrix M. The following four submatrices of Sj+l k are then defined. A -- The nonsingular n x n matrix formed from the first n independent columns of M. K -— The n x n matrix occupying the same column positions in M as does A in M. Al -- The m x n matrix occupying the same column positions in Slk as does A in M. A2 -- The n x r matrix occupying the first r (dimension of input) columns of H. 74 Then setting F = XA'l, G = A2, H = AZA'l (4-83) yields a minimal realization of the impulse response matrix W(t-s) = ueF(t‘S)G. (4-84) After a preliminary lemma, equation (4-81) will be employed in examining the Silverman algorithm in the context of predictor space theory. Lemma 4-4: If equation (4-82) is satisfied by the Hankel matrix S. for some j,k, and n, then the matrix S Jk nn satisfies equation (4-82). Proof: Since the dimension of the realization is n, from the proof of theorem 4-8 and the discussion following its corollary it is clear that the n + lst block row of Sn+1 n is but a linear combination of the n block rows of Snn' Similarly the n+l,...,n+2 block columns of Sn+l,n+l are but linear combinations of the n block columns of Sn+1,n' Hence rank Snn = rank Sn+l,n = rank Sn+l,n+2’£ = O,1,2,... (4-85) Q.E.D. The only purpose of this lemma is to justify working with the specific matrix Snn in the following analysis. This 75 is desirable since, as it is known, the realization being sought is of dimension n or less. Theorem 4-8 guarantees the existence of a finite dimensional basis * within the vector [y(tlt)T, —2££iEl—,. .., M2(t£t>m at“ Now from equation (4-81) picking the matrix M of the first n linearly independent rows of Snn is clearly equivalent to picking the first n linearly T an— 12 independent elements of [y(tIt)T,3 g: t) ,..., %(§IC)T]T - Btn which amounts to picking a basis x(tlt) = x(t) for the predictor space. This is formalized in the following theorem. Theorem 4-10: If a Hankel matrix Snn satisfies equation (4-82) with rank Snn = n, then the corresponding vector [y(tlt), T—2££:£l—,. .. n-Ey(tlt)T ] . contains exactly at“ n linearly independent elements and these form a basis for the predictor space L2(y;tlt). Since Snn satisfies equation (4-82) the dimen- sion of the realization is n, theorem 4-8 applies, and the vector [y(tlt)T, —2£ELEl—,. V My(tL§)T]T . 
contains at“ * V The point is that although Silverman's algorithm may work with the Hankel matrix S'k’ j < n, k < n, and the analysis to be carried out here can still be done, it would be unjustified from the viewpoint of the theory developed in this work. 76 among its components a basis of the predictor space L2(9;tlt). Hence a maximal set of linearly independent components from this vector will constitute a basis. From equation (4T 81) the vector n—l T T [>(tlt)T, —2££LEL—, ...,3 gfflt) must have exactly at n linearly independent elements when the rank of S nn is n. Hence the n linearly independent elements from [%(tlt)T,—2£ELEl—,...,19(tlt)H] form a maximal linear Btn independent set and thus a basis for the predictor space. Q.E.D. Denoting the basis formed picking the first n linearly independent elements of . n-l [9(tlt)T, MEL“ . 3 y(tlt)1]T by x(tlt) = x(t), " at“ it is obvious from equation (4-81) that T n- 1 T = 2E{x(t)[n'r(t), _93_tlt:;,...,<-1)n-1 d dZnSEYD (4-86) If xi(t) is a component of x(t), then the element m rows beneath it in the vector ax (tlt) [9(tlt)T, —2£ELEl—,. .. ,W:Z(tLt)T]T is just* -—1§E———. Thus the rows of the matrix H correspond to the vector M and 30 3t *Note that once again the partial is understood as —§$£l£—- -§$£l§—48_t. A careful distinction must be made between —§$ELE— and 9§é%L§l . r4 = 23{§§§§—L31[nT (t), :1” Lit),...,<-1)n'1 d ExamEle: (4-76) explicitly for the case 234 r 91mm 92(t|t) 9m(tlt) 391(tlt) a‘t 39$(tlt) 3 t1 77 j = 73:40: | t) 9(tlt) / 8x(t|t) / at [LQLWZQLH,,nr(c)1,'d"%m ‘36 y11 Y12 y21 y22 yml ym2 ym.+1,1 ym+1,2 ' ym+2,1 ym+2,2 ° yn-m 1 yn-m,2 11— l dnT(t)] tn- 1] }. (4-87) Writing out the matrices and vectors of equation k = n yields ..... .3” LTfi] J ' W yl,(n-l)-r y2,(n-l)~r . Y0, . ,Yn-l Vm,(n-1);r Ym+1,(n-1).r ym+2,(n—l)or yn-m,(n-l)-I‘ .J 78 The are the actual elements of the Hankel matrix, yij not the Markov parameters. Suppose the two starred rows of the Hankel matrix are the rows of M. Then the corre- sponding starred elements of the vector of 9's are the elements of the basis of L2(y;tlt), i.e., x(tlt). The two checkmarked rows of the Hankel matrix are then the rows of H and the two corresponding checkmarked elements in the column of 9's are just §§%%LEL. Picking the n x n matrix A corresponds to selecting the first n independent T n-l T elements from the vector [nT(t),r—g-g—t-fl—EQ-H..,(~l)n-1 d n iti]. dtn‘ The resulting vector will be called v(t). Thus A = 2E{x(tlt)vT(t)}. (4-88) Similarly the n x n matrix K is just ’A‘ = 23{9—’%§it—) vT}. <4-89) The m x n matrix Al is given by A1 = 2E{y(tlt)vT(t)}. (4-90) Finally, the n x r matrix A2 is given by T A2 = 2E{x(tlt)n (t)}. (4-91) Now using equation (4-54) and the fact that y(t|t) = y(t) 79 AlA'l = 2E{y(t)vT(t)}[2E{x(t|t)vT)t)}]-1 (4-92) = HE{x(t|t)vT(t)}[E{x(t|t)vT(t)}]-l (4-93) = H . ‘ (4-94> Next, using equation (4-56) A2 2E{x(tlt)nT(t)} t = 2E{J eF(t-r)Gn(r)drnT(t)} (4-95) to t = 2 J eF(t-r)G6(r-t)dr to = 2-% eF(t-t)G = G . (4—96) Before computing the final product, K A-l, note that K involves the term axgi t). It will be necessary to interpret, and find an expression for, this term. Let xi(t|t) be a component of x(tlt). Then from equation (4-81), for some integer p i n < I > 3‘33“”) <4 97) x. t s = - 1 atp where 9j(t|s) is a component of y(tls). Since from equation (4-27) px s p _ 3 Z(CIS) = [ 3 W(t r) dz(r), (4-98) atp t atp o denoting the jth row of _————?—— by J at atp 80 s apw.(t-r) xi(t|s) = I J dz(r) (4-99) P to. at The following theorem can now be proven. 
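Before the theorem is stated, a small computational sketch of the algorithm of equations (4-82) and (4-83) may be helpful. The sketch below assumes the Markov parameters Y_k are available, stacks them into the block Hankel matrix of equation (4-81), and forms the matrices M, M̄, A, Ā, A1, A2 and the realization F = ĀA⁻¹, G = A2, H = A1A⁻¹. The greedy selection of independent rows and columns, the numerical rank tolerance, and the function names are implementation choices not specified in the text; the final check uses the two-input, two-output system of the example in section 4.4 (equations (4-137) and (4-138)).

```python
import numpy as np
from scipy.linalg import expm

def block_hankel(Y, rows, cols):
    """Stack Markov parameters Y[0], Y[1], ... into the block Hankel matrix
    whose (i, j) block is Y[i + j] (top-left block Y_0, as in (4-81))."""
    m, r = Y[0].shape
    S = np.zeros((rows * m, cols * r))
    for i in range(rows):
        for j in range(cols):
            S[i*m:(i+1)*m, j*r:(j+1)*r] = Y[i + j]
    return S

def first_independent(M, n, axis=0, tol=1e-9):
    """Indices of the first n linearly independent rows (axis=0) or columns (axis=1)."""
    A = M if axis == 0 else M.T
    idx, basis = [], np.zeros((0, A.shape[1]))
    for i in range(A.shape[0]):
        cand = np.vstack([basis, A[i]])
        if np.linalg.matrix_rank(cand, tol=tol) > basis.shape[0]:
            idx.append(i)
            basis = cand
        if len(idx) == n:
            break
    return idx

def silverman_stationary(Y, m, r, tol=1e-9):
    """Sketch of the stationary algorithm (4-82)-(4-83) from Markov parameters Y_k."""
    N = (len(Y) - 1) // 2
    S = block_hankel(Y, N + 1, N)              # one extra block row for the shift
    Sjk = S[:N * m, :]
    n = np.linalg.matrix_rank(Sjk, tol=tol)    # dimension of a minimal realization
    rows = first_independent(Sjk, n, axis=0, tol=tol)
    M_mat = S[rows, :]                         # first n independent rows of S_jk
    M_bar = S[[i + m for i in rows], :]        # same rows shifted one block down
    cols = first_independent(M_mat, n, axis=1, tol=tol)
    A  = M_mat[:, cols]                        # nonsingular n x n matrix
    Ab = M_bar[:, cols]                        # same column positions in M_bar
    A1 = S[:m, cols]                           # same column positions in the first block row
    A2 = M_mat[:, :r]                          # first r columns of M
    Ainv = np.linalg.inv(A)
    return Ab @ Ainv, A2, A1 @ Ainv            # F, G, H

# Check against the system of section 4.4: x' = F0 x + G0 u, y = H0 x.
F0 = np.array([[2., 1.], [0., 1.]])
G0 = np.array([[1., 0.], [0., 2.]])
H0 = np.array([[1., 1.], [0., 1.]])
Y = [H0 @ np.linalg.matrix_power(F0, k) @ G0 for k in range(7)]
F, G, H = silverman_stationary(Y, m=2, r=2)
for delta in (0.3, 1.0):
    assert np.allclose(H @ expm(F * delta) @ G, H0 @ expm(F0 * delta) @ G0)
```

The recovered triple (F, G, H) is equivalent to, but not identical with, the original one; only the impulse response H e^{FΔ} G is required to match.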
Theorem 4-11: ELL—g: *3 = Fx(tlt) (4400) where §§§%L£l is interpreted as iiéEL§l|s=t' Proof: From equation (4-99) 3 apwj(t-r) xi(t|s) = I dz(r). 3 t0 3t Then the vector x(tls) can be represented as S x(tls) = J M(t-r)dz(r) (4-101) t 0 where M(t-r) is an n x r matrix whose ith row is of BPW.(t-r) the form J p for some p i n and some j between at l and r. Hence, 3 3§$§L§l = J éflfiilil dz(r), (4-102) at t at o and 3x(t|t) = ax(t|s) 3t at It ”- to s=t gE-r) dz(r). (4-103) 81 But from equation (4-59) t x(tlt) = [ eF(t-r)G dz(r). )t 0 Then identifying the integrand in equation (4-59) with that in equation (4—101) t F(t-r) gig—E-Lgl = { W— G dz(r) (4'104) 1t -o t F(t-r) = F I e G dz(r) t o = F x(tlt) Q.E.D. Now computing the final product A— A.1 = 2E{§1§—E—Lt—) VT(t)} [2E{x(tlt)vT(t))]-l (4-105‘) E{Fx(t| t)vT(t)} [E{x(t| t)vT(t)}] '1 FE{x(t|t)vT(tl}[F{x(t|t)vT(tij]'l = F. Thus, in the context of the predictor space approach to realization, Silverman's algorithm turns out to be a rather straightforward result. The same machinery that was used here will next be used to consider the Ho-Kalman algorithm. 3. The Ho-Kalman Algorithm In this section the predictor space approach is used to analyze the Ho-Kalman algorithm (Ho and Kalman [H-21). A summary of the algorithm is provided first. A neat and concise summary has been provided by Tether [T-l]. As with the Silverman algorithm it is necessary that equation (4-82) be satisfied. to be the case. by Let 0K Yi+j-2 This will be assumed Then the algorithm proceeds as follows 1) equation (4—82). 2) such that PS where I n Pick N1 0 O and N stant defined by equation (4-82). fin are given by such that The matrices 8 Find nonsingular square matrices is an n x n unit matrix and n NlNZ denote the shift operator defined YK+j-l K+j YK+i+j-2 (4-106) satisfies and Q (4-107) is the con- Un and 83 I _ n N _ , _ Un - 0 (Nlm x n), U — [In 0] (n x N2 r) (4 108) 3) Let E be the block matrix In EIn = [IIn Om ... Om] (m x le) (4-109) and Er be Er = [Ir 0r ... Or] (r x Nzr) (4-110) 4) Then F = fi [JTP o s Q JTJU (4-111) n NINZ n _ ~ T T G - Un[J P sN N Er] (4-112) 1 2 H = E [s Q JTJUn (4-113) m NINZ yields a minimal realization of the impulse response matrix generating the Hankel matrix SN N l 2 For the analysis the specific Hankel matrix Snn' will be used. As shown in the previous section Snn satisfies equation (4-82). Using equation (4-81), equation (4-107) from step 2 can be written ff y(tlt) ’ W 32(t|t) 3t 2P E< : n T(t) ~3____ )-n l d dtn-](-t) Q an’19(tlt) = . (4-114) 84 Partition the matrices P and Q into the form 31 r r P11 P12 Pln Q11 Q12 an P21 P22 P25 Q21 Q22 an P = . . Z , Q = . . P P P Q L n1 n2 nanmxnm Lin n2 anJnrxnr (4-115) where P.. is an m x m matrix and Q.. is an r x r 13 13 matrix. Then (4-114) becomes rr-n - I‘W 1-1 7 F’n . i-l 1 5 P11 3 yifit) Z <-1>1'1QT d V (t) i=1 at 1:1 11 dtl-I n 1'1 n . i-l 2 P21 3 yifit) Z (~1)1'10? d ? (t) 1‘1 3t i=1 ‘12 cit-1"I f 7 I O - n 2E< . ? = ° 0 O 2 P ' 9* 1 z (-l)1'lQT d 3—(t) i=1 n1 atl- i=1 1n dtl-l lKL A L .J J (4-116) Equation (4-116) will be shown to imply that the first n F . 1 g P . 31-1?(tLt) i=1 ll atl' n ai'l9(tlt) 3 P21 i-l i=1 3t elements of the vector 2 are linearly n P ai’ly(tjt) Z ni ati-l i=1 L 4 85 independent and hence form a basis for the predictor space. Theorem 4-12: Equation (4-116) holds only if the first n ele- 2 Plia —1 i=1 at1 n 31-19(t]t) 2 P21 i- i=l 8t ments of the vector I are linearly Z Pni i-l lel at J independent. , n . 
1 Proof: Denote the scalar components of 2 Pl' y§t%t) i=1 1 iatl' by ui, i = l,...,n.m. n X P19(tLt) i=1 2i lati'1 ? P . 9(ELF) Li=1 “1 BtL- J Also denote the scalar components of Z ('1) 1-1 Q11'°'°' Z (‘1) i-I Qin i=1 dt i=1 dt by vi, i = 1,. .,nr. Then by equation (4-112) ij; j, k= l,...,n - 2E u jvk . (4-117) = n+l,...,nm Suppose the first n components of the vector [u1,u2,...,unm]T are not linearly independent. Then there 86 exists al,...,an, not all equal to zero, such that aiu = 0. (4-118) i HMS i 1 Suppose a. is nonzero. Then J n Oti U. = - Z a“ Ui . (4‘119) 3 i=1 j ia‘j Thus for k = 1,2, ..,n 2E{ujvk} = -2131 a; E{uivk} i#j n 0.1 = Z $51k i=1 3 i#j O. = -2 61: , (4-120) j Hence, for equation (4-117) to hold Gk must be zero for k = 1,2,...,nj,k # j. Then by equation (4-118) . . = 0. 4-121 aJuJ ( ) Since E . . = 1 4-122 {anJ} ( ) it must be that aj equals zero, contradicting the assump- tion of linear dependence. Thus the theorem is proved. Q.E.D. Corollary: The first n components of the vector 87 form a basis for the predictor space. From theorem 4-10 the vector T -l T [F(tlt)T. §2§%i£l~,...,any(tlt%] contains exactly n atn—l linearly independent elements which form a basis of the predictor space. Since the first n elements of the vector F -_ 1 n 81 19(tlt) 5 P11 1-1 1’1 at n i-l 2 P . 3 g(tLtl . 21 1-1 1=l at are linearly independent they are '-l § P . 31 ?(tlt) i=1 “1 atl' L ..1 simply a nonsingular linear transformation of the first basis and hence another basis. Q.E.D. The basis defined by the above corollary will, as usual, be denoted by x(tlt). * Next consider the quantity P(oSnn)Q. From * The symbol 0 is the shift operator defined above. 88 equation (4-106) Y1 Y2 Yn Y2 Y3 Yn+1 GS = : Z : nn Yn Yn+1 Y2n-l L. .J From equation (4-81) V r- 1 W ay"(t|t) 8t 32" t t) 3t T n—l T ° T -d - d 2E2 n (or—3&3)“ .,<-1>n 1 “,3th . dt an t t) atn LL H J Y1 Y2 Yn 1 Y2 Y3 Yn uYn Yn+l an-lJ Again partitioning the matrices P and Q (4—115) (4-123) (4-124) as in equation F VII 1 3 d 2 P11 (tit) Z ( 1)i 10:1 i=1 3t i=1 2 p oalygtlt) §_i-1QT d i=1 21 atl i=1 12 PoSan = 2B 4 n ix .2 Pnia vgtit) Z (-l)i-1QEndi L1=1 at .J Ll=l k Since r.n i 7 rhn z: p . LEE—IL) 2 p ,‘19(tJlt> 1=l 11 3t1 i=1 11 Btl = E. at ‘21P . a1 (t t) 2P dz(tlt) i=1 n1 at1 i=1 ni i-l L J L it is clear that the first: n elements of the vector 1 in 3i t t) =1 Pli atl g P . 31A t t) .= n1 1 L1 1 at J are simply §§%%i£l = §§§%i§l s=t hand side of equation (4-111). T‘I (4-125) (4-126) Now consider the right ~ T T _ UnEJ PoSanJ JUn — 1.? P1. 31A(t tf1 i=1 1 at1 [I 0] In 0 2E 2 i 1 d1 12T(t) n ( 1) Qfl O O . 1=l dtl g P . 31A t.t) K.Li=1 “1 atl' j n . i— l T I - d ., Z (_1)1 l 2 £t)Q1n n i=1 dt 0 r W F .‘ 3x(t[t) I = [In 0] 2E< [VT (t) 0]} “ (4-127) 0 - 0 k7 J J L .1 T . . where v (t) con51sts of the f1rst n scalar components of the l X r . n vector i- l nT n . i- “In T ”if§> Qil,..., z (-1)1'l d ”_ 7 33(tlt) 3t . o J 2E ' z (-1)1 1 d m ° i=1 dt (tit) L Btu-1 '11 o I i- 1 (ii-173T“)Q n n dtl'1 1“ L9 0 o 2E{9(tlt)vT(t)} (4-134) (4-135) 92 where vT(t) is again the first n components of the vector defined in (4-127). Continuing, T T Em[San J Jun H - 2E{x(t|t)v (t)} H-I (4-136) from equation (4-110). Thus the Ho-Kalman algorithm follows by a straight- forward analysis using the predictor space approach. As noted by Akaike [A-l] the basis of the algorithm is that it provides a linear transformation of the vector T n-IA T [y(tlt)T.§2§%LEl-,.. 
3 yftLt)] such that the first ., n at 1 n components of the transformed vector are a basis of the predictor space. 4. Example An example is presented here to explicitly demonstrate the Silverman algorithm, the Ho-Kalman algorithm, and several aspects of the predictor space approach. The example starts with the system x1(t) 2 1 x1(t) 1 0 u1(t) = + ; x2(t) O 1 x2(t) O 2 u2(t) (t 1 1 t yl ) = Xl( ) (4_137) y2(t) 0 .1 X2(t) Let W(t-T) = Also, the Markov parameters = K-— YK HPG— 2 O k 2 2 0 eZ(t-T) 282(t-T) 2e Y are given by K 2k+l t-T (4-138) .(4-139) (4-l40) Part I: Silverman's algorithm. The Hankel matrix for the system (4-137) is N 2 4 8 8 16 16 32 4 32 2 o 2 o 2 o 2 o 3 2 iN I O b O 8 16 l6 32 32 64 64 O 2 O 2 O 2 O A 16 * 32 32 64 64 128 2 o 2 o 2 o 2 o 32—_32 64—128 256 2 o 2 o 2 o 2‘ o 16 32 64 64 236 512 o 2 o 2 o 2 o 2 o 2 o 32 64 64 128 256 256 512 l6 128 'ca c~l<3 thCD ca re a: b0 .6 I. .1 (4-141) ..a C" ,...: O" 64 128 156 000 N O -128 256 g 128 b.) N 512 128 512 1024 1024 o o . ’ l I ‘ J where the dashed lines partition the matrix into its Markov parameters It can be determined by inspec- Yk' tion that 94 rank 81 1 = rank 82,l+2 = 2, R = 0,1,2, .. . (4-142) , Hence, by equation (4-82) it can be determined that the dimension of a minimal realization of this system is two. The matrix M is just 811’ that is l 2 M = (4-143) 0 2 and _ 2 4 M = (4-144) 0 2 Clearly A, A1 and A2 all equal M and A equals M. Then, using equation (4-83), r 2 = , (4-145) 0 . L. 1 2 G = A2 = , (4-146) 0 2 and H = AlA 1 = 1. (4-147) The nonsingular transformation 1 l O 1 defines a strict equivalence between the system (F, G, H)2 and the system given by equations (4-137) and (4-138), (F, C, H)2. 95 Part II: The Ho—Kalman algorithm. Once again the algorithm starts with I ' 2 S - q 11 o 2 since 811 satisfies equation (4-82). It is first necessary to find square matrices P and Q such that PSIIQ = 12. (4-150) It is easily seen that l -l l O Q = , P = (4—151) 0 1/2 0 l satisfy this condition. In this particular example Un’ Un’ J, Em, and Er are all 12’ From equat1ons (4-111) - (4-113) F = PoSllQ 1 o 2 4 1 -1 0 1 o 2‘ 0 1/2 2 o = , (4-152) 0 1 G = P811 1 2 = (4-153) 0 2 and 96 CDI'-‘CDl---| This is the same realization as was arrived at via the Silverman algorithm. 0 l/2 In general these algorithms will yield equivalent, but not the same, realizations. Part III: The predictor space approach. (4-154) This part of the example illustrates the Silverman algorithm in the context of the predictor space approach. Equation (4-28) becomes for this example 1 2 I 2 4 I 4 8 8 I6 16 o 2 o 2 o 2 o 2 o .... ....I .... ._.I. _ ...JL- 2 4 4 8 8 16 16 32 32 o 2 I o 2 o 2 o 2 o 4 8 I 8 6 16 32 32 64 64 0 2J o 2 o 2 o 2 o 8 16 I16 32 32 64 64 128 128 o 2 I o 2 o 2 o 2 o L ‘ I I ' fyi ’1 22(tIS) 3Y1dr fi'1721—‘”1dT O f8 tO 01(T)dT n2(T)dT (t’T)n1(T)dT (t-T)2 . (5-1) The output of the stochastic system is defined by t y(t) = I W(t,T)n(T)dT <5-2) to 101 102 when the integral exists. Define the Wiener process z(t) by t z(t) =‘I n(S)ds (s-3> 0 so that n(t) is the innovation process for z(t) and (5-2) can be written I: y(t) = I W(t,T)dZ(T). (5-4) t O The goal here again is to justify using the re— presentation theory developed in chapter 3. To do this it is again necessary to verify that the hypotheses of theorem 3-4 are satisfied. That y(t) and z(t) are both zero mean real valued second-order processes is clear. 
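As an aside, a discretized sketch of the system defined by equations (5-2) through (5-4) is given below. It approximates the stochastic integral by a sum over Wiener increments and compares a Monte Carlo estimate of E{y(t)z(s)} with the integral of W(t,τ) over [t0, s], the input-output covariance used repeatedly below. The scalar weighting pattern, the step size, and the number of sample paths are illustrative choices only, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar analytic weighting pattern W(t, tau); illustrative only.
def W(t, tau):
    return np.exp(t - tau) * (1.0 + 0.5 * t * tau)

t0, s, t, dt = 0.0, 0.8, 1.5, 2e-3
tau = np.arange(t0, t, dt)            # integration grid on [t0, t)
in_s = tau < s                        # the part of the grid lying in [t0, s)

n_paths = 5000
dz = rng.normal(0.0, np.sqrt(dt), size=(n_paths, tau.size))   # Wiener increments
y_t = dz @ W(t, tau)                  # y(t) = int_{t0}^{t} W(t, r) dz(r), one value per path
z_s = dz[:, in_s].sum(axis=1)         # z(s) = int_{t0}^{s} dz(r)

lhs = np.mean(y_t * z_s)              # Monte Carlo estimate of E{y(t) z(s)}
rhs = np.sum(W(t, tau[in_s])) * dt    # int_{t0}^{s} W(t, r) dr
print(lhs, rhs)                       # the two values agree to within Monte Carlo error
```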
Since z(t) is quadratic mean continuous and hence measurable and since W(t,r) satisfies A-3 the integral in equation (5-2) exists and y(t) is also quadratic mean continuous and measurable (Wong [W-3], pp. 145-146). Next form the predictors y(tls) by y(tls) lL§ t I W(t,r>> t O s I W(t,T)dz(T), (5-5) to where the last equality follows from the fact that 103 IL§> = o. T > s (s-e) by virtue of the orthogonal increment property of z(t). Forming the predictor space L2(y;t|s) suppose that a finite dimensional basis x(s) can be found for it. Then from theorem 3—1 ?(tIS) = A(t,S)X(S). (5-7) From equations (5-4), (5-5), and 2(tlt) = H(t)x(t) (5-8) it is the case that y(t) = H(t)X(t). (5-9) Equation (5-9) will be useful in showing that sz(t,s) must be separable. First it is necessary to compute Vyz(t,s). Using equation (5-4) and the orthogonal incre- ment property of z(t) for t0 i s i t t HII W(t,T)dz(T)zT(sI} t O t S EII W(t,T)dz(T) I dZT(T{} t t O O Vyz(t,s) S I W(t,T)dT. - <5-10) to By hypothesis A-2 and theorem 2-1 W(t,T) is separable in the form W(t,1) = F(t)Q(T)- (5-11) 104 Hence 8 vyz = p(t) I qdr <5-12) t 0 so Vyz(t,s) is separable. Since Vyz(t,s) = H(t)sz(t,s) (5-13) it is clear that V (t,s) is also separable. xz 3V (t,s) It is next necessary to show that th must be continuous for t i s i t. Combining equations (5-10) 0 and (5-13) 8 H(t)VXZ(t,s) = It W(t,T)dT (5-14) 0 and the right hand side of the equation is analytic for t Z s 1 T 3 to. Hence H(t) and sz(t,s) must be analytic also in this region and thus possess continuous derivatives of any order in this region. Finally it is necessary to show that the components of E{x(t)Tx(t)} must be bounded on bounded intervals. Since the components of H(t) are continuous and y(t) is a second-order process, from equation (5-9) x(t) must also be second-order, which automatically yields the desired property. All the prerequisites of theorem 3-4 have been established except for the existence of a finite basis for the predictor space. The search for that basis for the most part parallels the stationary case. Since W(C,T) 105 is analytic it has the Taylor series expansion in T, for t i T, co k - - k mm) = z MEL—Tl _ £151. to _<_ ¥ 5. t. (5-15) k=o 8T T=T ’ There is no problem about the derivatives of W(t,T) existing at T = t since W(t,T) has a unique analytic extension to the region t < T and this extension can be used to ensure that the derivative is well defined. Using equation (5—15) the predictors can be expressed as (T-T)k ——ET——)dZ(T). (5’16) S (X) y(tls)-I (zfiwigin _ tO k=o 81 T=T Lemma 5-1: m k, S ’ k y(tls) = Z §_E£EL12 - .I £Ifi%l—'dZ(T), (5-17) k=o 31 T=T CO to i s i t f i t Proof: The proof is by the same procedure as for lemma 4‘1° Q.E.D. Theorem 5-1: The pth quadratic mean derivative of 19(tIs) with respect to t exists for tO i s i t and is given by ap2(t|s) = : 8p+kW(t,T) atp k=o BtPBTET ....- 106 Proof: The proof of this theorem is virtually identical to the proof of theorem 4-1. ‘ Q.E.D. 8p+kW(t,T) atpark T=f noted by WP k(t,¥). Stating the result of the theorem in For ease of notation will be de- a different way, for t i s i t and f i t o r _ _ _ S _- 0 i W0’0(t,1) W0,l(t,r) w0,2(t,1) “r'jt (T T) dz(r) w ' w ' w ’ {80(T'¥)l d 1,0(C,T) 1,1(t,T) 1,2(t,T) Jt Z(T) - _ _ 80(T-T)2 W2,O(C,T) W2,l(t,r) W2,2(C,T) ... It __7T__ dz(r) o L a L J F ?(tIS) 1 8 (t s) at (5-19) = 822(t!s) ' at L ' J Equation (5-19) will be abbreviated S(t.¥)Z(¥.S) = F(tIS) with the corresponding definitions for S(t,¥),Z(¥,s), and F(tls). 
Lemma 5-2: Let Vy(t1|s,t2Is) be defined by 107 Vy(tl|s,tzls) = E{y(tlls)yT(t2Is)}. (5—20) Then V?(tl|s,t2|s) is analytic in t1 and t2 for tl,t2 1 s 1 to and k+p k p T a V (t s,t ) a (t s) a (t s) 9 1| zls _ E —? 1| 9 2| . (5-21) k p k p atlatz Btl 3t2 Proof: From equation (5-5) s s T E [It W(t1,1)dz(1)]|:ft W(t2,T)dz(T)] o 0 (5-22) V?(t1|s,t2|s) s I W(tl,T)WT(t2,T)dT, t i s i t. to 0 Hence the analyticity of Vy(t1|s,t2|s) with respect to Cl and t2 follows from the analyticity of W(t,1). Equation (5-21) is a standard result (see Papoulis [P-Z], p. 317). Q.E.D. Theorem 5-2: For t0 1 s i t and t0 1 s i t + T m k k y(t + Tls) = z é—Zifiifil-fiT . (5-23) k=o 3t Proof: The proof is the same as for theorem 4-2. Q.E.D. Forming the space L2(F(t|s)) as in chapter 4 the following analogue to theorem 4—3 occurs. 108 Theorem 5-3: L2(F(t|s)) = L2(y;t|s), c i s 3 c. (5-24) 0 m: The proof is identical to that for theorem 4-3. Q.E.D. As in chapter 4 it has been established that F(tls) contains among its components a basis for the pre- dictor space L2(y;t|s). The basis, though, is as yet un- identified. However if a finite dimensional basis for the linear space spanned by the scalar rows of the matrix S(t,f) is found*, define the vector x(tls) as the corresponding elements of the predictor space where the correspondence is realized by taking the inner product of the row vectors of S(t,?) forming the basis and the vector Z(t,s). Then, as in chapter 4, x(tls) is a basis for the predictor space. Theorem 5-4: Suppose that the linear space spanned by the rows of S(t,¥) has a basis consisting of a finite number of (scalar) row vectors of the matrix. Define the vector x(tls) as that vector whose components are given by taking the inner product of the rows of S(t,¥) forming the basis and the vector Z(t,s). Then x(tls) is a finite dimen- sional basis for the predictor space. * Recall that when considering the space L2(T(t|s)), both_ 3 and t are regarded as fixed. Hence t in S(t,T) is regarded as fixed also, 109 Proof: The proof is completely analogous to that of theorem 4-4. Q.E.D. The next thing to show is that a finite basis of the linear space spanned by the rows of S(t,¥) can be found. By assumption A-2 and theorem 2-1 it follows that W(t,I) is separable in the form W(t,T) = p(t)q(r). (5-25) Since W(t,T) is analytic in both variables so are p(t) and q(t) (Kalman [K-BJ). Furthermore the columns of p(t) and rows of q(t) may be taken to be linearly in- dependent and the number of linearly independent columns in p(t) and rows in q(t) is an integer n uniquely determined by W(t,T) (Youla [Y-ll).* Consider the matrix T -|T -IT‘-| - [Ho,p(t,r): wl,p(t’T): W2,p(t,r):...] — dp T(;) pT(t)E de(t)E dZPT(t) E... . (5-26) dTp dt at? By a theorem by Chen [C-l] from the fact that pT(t) is an n X m analytic matrix with linearly independent rows * In this case the decomposition (5-25) is said to be globally reduced. It can be shown (Youla Y-l ) that n is the dimension of a minimal realization of W(t,T). llO T rank[pT (t>' dpdét)' d2 Pét) i...] = n. (5-27) I d t n ' This immediately implies that k+1pT rankQ T(t): d (t): ... < n, k = O,1,2,..., (5-28) kdtk . dtk+1 : — since the matrix in (5-28) is a submatrix of the one in (5-27). Equation (5-27) implies that rankaT(t):__fl__(_:__d T(t :dpq2T(T) dZPTét):... < drp ' dt ' — (5-29) That is, |/\ :3 A U1 I L2) 0 v )p rank w: p(t, ¥)' WT p(t,f) Ew§,p(t,¥)i... Next define mp(t) = rankEflT (t,¥): WT (t,¥):--% W (t,¥{], (5-31) q 0.? . 1 u and m3 = 33p m2(t) . 
(5-32) Then by another theorem by Chen [C-l] if a is the least integer such that m3 = m§+l = v, then m5 = m5 = v, 8 = a+l,a+2,..., (5'33) and a i n - mg + l, a i v 3 a-m . (5-34) Since 1 i mg i n, by equation (5-34) a i n. Hence it has lll been shown that r W0,0(t,f) w0,1(t,%) W1,O(C,T) W1,1(t’T) rank « Fw0,0(t,¥) w0’1(t,¥) wl O(tfi) wl,l(t,¥) = rank . i n. (5—35) LWn-1,0(t:T) wn-1,1(t'T) ' .J Thus a basis for the linear space spanned by the rows of S(t,?) can be found among the rows of the matrix r — - - a w0,0(t,T) WO’1(C,T) w0,2(t:T) ’ W1’0(t,¥) Wl,l(t,¥) wl 2(tfl) Sn m(t,¥) = Lyn-1,0(t’T) wn-l,l(t’T) wn-l,2(t’T) . . “J (5-36) The following theorem is now justified. Theorem 5-5: Let W(t,T) be an analytic realizable impulse response matrix. Then the matrix S(t, f) defined by W(t,T) has the property that a finite dimensional basis of the space Spanned by its row vectors can be found. Furthermore the vector x(tIs) defined by taking the inner product of 112 these row vectors and Z(t,s) is a basis for the predictor space L2(y;tls). Hence a finite dimensional basis for the predictor space L2(y;tls) can be found. As in chapter 4 it is necessary to restrict the basis to the case x(sIs) = dx(tls) . T be defined. All Of x(tls)|t=S in order that the criteria of theorem 3-4 have now been shown to be satisfied and the representation dx(sls) F(s)x(sls) + G(s)dz(s) (5-37) y(s) H(S)X(SIS) (5-38) is justified. The solution to (5—37) and (5-38) can be represented as s x(sIs) = J ¢(s,T)G(T)dz(T) (5-39) to S a y(s) = H(s) J @(s,T)G(T)dz(T) (5-40) t o where @(S,T) is the transition matrix for equation (5-37) satisfying the property EESELLL = F(s)¢(s,r) as Theorem 5-6: The matrices F(s), 6(3), and @(s,1) occurring in equations (5-39) and (5-40) determine a minimal realiza- tion for the impulse response matrix W(t,T). 113 Proof: Let tO < T < t. Using equation (5-40) with dz(r) replaced by n(T)dT t v (t,w) H. G(t) = A2. H(t) = Al(t,t)A-l(t,t) (5-47) yields a minimal realization of the impulse response matrix W(C,T). It should be noted that theorem 5-7 gives a suf- ficient condition, but not a necessary condition for a realization to exist. Furthermore the minimality of the realization is equivalent to controllability and observa- bility ([S-l], Theorem 24). The properties of uniform controllability and uniform observability are much stronger than controllability and observability however [S-h]. The necessary machinery for the analysis of the algorithm is developed next. Theorem 5-8: Suppose it is known that the analytic impulse response matrix W(t,r) can be realized by a system of dimension n. Then the search for a basis of the pre- dictor space can be limited to the components of the vector Pn(t|s) defined by T n-l T Fn(tls) ==[y(t|s)T» §2£%%§l—,...,a ayéfis) ‘J. (5-48) t 117 Proof: By the corollary to theorem 5-6 rank Sn,®(t’¥) = n where n is the dimension of a minimal realization of W(t,1). Furthermore, by equation (5-35) rank S(t,?) i n Hence by equation (5-19) F(tls) can have no more than n linearly independent elements. By the corollary to theorem 5-6 and equation (5-19) Fn(t|s) must contain n linearly independent components. Hence Pn(tls) contains a maximal linearly independent subset of the components of F(tls) and thus by theorem 5-3 a basis for the predictor space. Q.E.D. That the search for a basis of the linear space spanned by the row vectors of S(t, f) may be restricted to the first n block rows is already established by equation (5-35) and the corollary to theorem 5-6. 
It can also be shown that it is only necessary to consider the first n block columns. By an argument entirely analogous to that preceeding equation (5—35) it can be shown that 118 r - - - j wO O(t,T) w0’1(t,1) WO’2(t,T) . . . Wl’0(t,T) ”1,1(t'T) W1,2(C,T) rank W2’0(t,f) W2,l(t,;) W2,2(t,f) L . . . J P a W0,0(t,T) W0,l(t,T) . . . wO,n-1(t’T) W1,O(t,T) Wl,l(t,T) . . . wl,n-1(t’T) = rank W2,O(t,?) W2,1(t,¥) . . . Wz’n_l(t,?) : n (5'49) u . . . J Equation (5-49) implies that in searching for the basis of the linear space spanned by Sn oo(th) that it is not necessary to consider elements in the columns occurring after the first n block columns in testing for linear independence. The linear independence will show up in the first n block columns and attention may be restricted to Snn(t,¥). An analogue to equation (4-76) will now be developed. Theorem 5-9: kA g wk,j(t’T)’ T = t0 or T = t E 3 y(tlt) (j) - T = . (5-50) k n (T) ... - 3t Wk j(t,T), to < T <.t Proof: From equation (A-lS) of the appendix for j,k = O,1,2,.... 119 k . t k+j E i—XifiiEl n(3)<%> = E I 3 ”(t'T) dzT,n1A‘1 2E{HvT(t>}[2E{xvT}J’1. H(t). (5-61) Next, using equation (5-39) 122 A2(t.t> 2E{x(t|t)nT(t)} 2E [t T ¢(t,T)G(T)n(T)dT n (t) to. p:. 2 I @(t,T)G(T)6(T‘t)dT to = 2-%¢(t,t)G(t) G(t). (5-62) By a proof virtually identical to that for theorem 4-11 it can be shown that Egg—ti = F(t)x(t|t). (5-63) Hence K(t,t)A'l(t,t) 2E{F(t)x(t|t)vT(t)}[2E{x(t|t)vT(t)}]-1 F(t) . . (5-64) Thus the verification of Silverman's algorithm for the case of an analytic impulse response matrix turns out to be nearly identical to that for the stationary case. As noted in the discussion after theorem 5-7 the realiza- tion must be observable and controllable. The properties of uniform observability and controllability will not be discussed here since consideration of them would involve developing the properties of generalized controllability and observability matrices, (see Silverman [S-2] or Silver- man and Meadows [S-3]) a topic outside of the scope of this dissertation. 123 the concepts used here. 3. Example Consider the linear time-varying system (t) t 9;. x1 = dt x2(t) o y1(t) - l y2(t) 0 t2 xl(t) 1 o + t x2(t) O 2 O x1(t) 1 x2(t) u1(t) u2(t) A simple example is given next to illustrate (5-65) (5-66) The transition matrix for this system,¢(t,r), is given by ¢(t,t) = is r 2 2 2 2 fl t -T t -T 3 3 2 2 (t -T e e “_—§_—— £42. 0 e L J The impulse response matrix W(t,T) F 2 2 2 2 t "T E—‘T 3 3 e 2 2e 2 £E_%l_l 2 2 E_:l_ 0 2e 2 u (5-67) (5-68) (5-69) 124 The matrix has been partitioned into the submatrices wj,k(t’T)' For t = T F1 0 l-t -2t2 [CZ-1 4t(t2-1) q o 2 :0 -2c '0 2(t2-1) 533(c,c) .. c 2:2 l-cz «:3 lt(t2-1) 6t2(t2-1) (5-70) 0 2: lo -2t2 lo 2t(t2-l) [2+1 4t(t2+l)l-t(t2+l) -6t2(t2+1)|(t2-1)(t2+1) 8t(t2+1)(t2-1) Lo 2(c2+1) :0 -2c(c2+1) !o 2t2(t2-1) J The conditions of theorem 5-7 are satisfied by Sll(t,T) and 822(t,r). It can easily be verified that rank 311(t,T) = 2 and rank 322(t,r) = 2 for all T 3 t. The fixed n x n submatrix A(t,T) of S£k(t,T) = Sll(t,r) mentioned in theorem 5-7 is 811(C,T). That is r q tz-TZ tz-TZ e 2 2/3e 2 (t3-T3) A(t,I) = t2-T2 ° (5-71) 0 2e 2 L J A realization is now constructed from S33(t,t) by using Silverman's algorithm. The matrix M(t,1) is just A(t,T). The matrix H(t,r) is given by F t2 2 2 2 . 1 -‘t t -'r te 26 [t2 + %(t3-T3)t] H(t,T) = tz-TZ . (5-72> 0 2te 2 L. .J 125 The matrix A(t,T) is already defined and A(t,T) is H(t,r). The matrices A1(t,T) and A2(t,r) both equal A(t,T). 
Thus A2(t,t) = A(t,t) = (5-73) and _1 1 o A1(t.t)A (t,t) =- (5-74) 0 1 yield the input and output coefficient matrices as expected. Also A(t,t)A-1(t,t) = = (5-75) as expected. Next the predictor space approach is illustrated. Denote the two components of y(tls) by 91(tls) and 92(tls). Also denote the aBth component of the 2 x 2 matrix ij(t,t) by W§E(t,t). Then by equation (5-19) m s - k 91(tls) = Z Wéi(t,¥) I Slfi%l— n1(1)dr k=o tO oo . S - k + X Wli(t.¥) I £l%%l— n2(T)dT (5-76) k=o 0 t0 and A m 22 - S r-tr)k y2(t|s) = z Wok(t,1) J n2(T)dT. (5-77) k=o t O '25 I28 126 Also 3? (tls) w _ s _— k ———=m1mJ—s>—W< k-o t ° _ , (3 00 S "k + z W1i(t'¥) I $lfi%l— n2(T)dT (5-78) k=o tO ' and 3? (tlS) m _ S _- k Zat k2 wifi(c,T) [ $1E%l— nz(1)dr. (5-79) =o t ' o The basis of the predictor space x(tIt) is just the vector y(tlt). Consider equation (5-57). It is assumed that the white noise input vector to the system, q(t) = [nl(t),n2(t)jT, consists of independent white noises nl(t) and n2(t) each with unit covariance. Hence v(t) = n(t) and T A(t,t) = 2E{x(t|t)v (t)} T = 2E{Y(t|t)n (t)} F: Vita») J 1'73}: n1(:)dr + Wins) It ("wk n2(1)d1 .0 0 Co k [11(1).n2(r)] O t - L Rio w§i(c.c) {to 11E§1_ n2(1)dr E w11j r = (5-81) 127 Similarly A(t,t) 2E {22% vT(t)} 2E {fig—Elli nT(t)} ' 11 12 '1 W10(t,t) W10(t,t) = 22 0 w t,t u 10( )J r-t: 2t:2 = . (5-82) 0 2t L. The matrices A1(t,t) and A2(t,t) both equal A(t,t) and so all four matrices necessary for the implementation of the Silverman algorithm have been computed from the pre- dictor space approach. With this example the discussion of the realiza- tion problem for analytic impulse response matrices is con- cluded. CHAPTER SIX ' CONCLUSION In this thesis a new approach to,the realization problem, the predictor space approach, has been extended to continuous time finite dimensional systems having analytic impulse response matrices. This approach provides a unifying overview to the algorithms for the solution to the realization problem. 1. Summary of Results A representation problem which provides the foundation of the predictor space approach is considered first. There is given two second order zero mean stochastic processes y(t) and z(t), with y(t) measurable and z(t) purely nondeterministic and continuous in quadratic mean. The predictor space is defined as the Hilbert space of random variables spanned by the projections of y(t) onto the space spanned by z(s) for t 3 s. A finite basis for the predictor space is assumed and then it is shown that y(t) can be expressed in terms of this basis and an error term. The predictor space forms the state space for the stochastic system representation. The dynamic behavior of 128 129 the basis is developed using a theorem by Halyo and McAlpine, the theory of weak confergence in Hilbert spaces, and the Hida-Cramer representation for purely nondeter- ministic processes that are continuous in quadratic mean. To conclude the chapter two special cases are considered. It is shown that if the given purely nondeter- ministic process is white noise, then by considering it to be the innovation process for a measurable Wiener process the theory remains valid. Finally it is established that if the two given processes are stationary with stationary cross covariance, the system representation obtained has stationary coefficient matrices. Next the realization problem for stationary systems is considered. 
Given an impulse response matrix known to be realizable, a stochastic system driven by white noise is defined whose input-output covariance equals the given impulse response matrix. It is shown that the input and' output of the stochastic system satisfy the prerequisites for the application of the previously developed representa- tion theory with the exception of the existence of a finite basis for the predictor space of the system output. The proof that such a basis can be found is then carried out. Having established that the representation theory developed in chapter 3 applies, it is shown that the coefficient matrices of the stochastic representation provide a solution to the realization problem for the given impulse response matrix. 130 Two of the key algorithms in realization theory, an algorithm due to Silverman and the Ho-Kalman algorithm are analyzed. It is shown that both can be interpreted in terms of various manipulations of a basis of the predictor space. An example is given which illustrates many of the concepts used. The realization problem for time-varying analytic impulse response matrices is considered next. The develop- ment parallels that for the stationary case for the most part, although some theorems on the rank of analytic matrices are needed. The Silverman algorithm is analyzed for this case. Another example is also given to illustrate. the theory. 2. Problems for Future Research The most pressing problem for future research is to consider developing the predictor space approach for more general time varying systems. If this could be done it would furnish insight into an area where algorithms at present are sorely lacking. The algorithm by Silverman can be used for "constant rank systems", a somewhat more general class of systems than considered here. It would be of interest to extend the predictor Space approach to at least this class of systems. It might also be of interest to attempt an exten- sion of the predictor space approach to some classes of non- linear systems. This is an area with few results at present and any insight shed by a new approach would be of interest. BIBLIOGRAPHY [A-l] [C-1] [C-2] [C-3] [D-1] [D-2] [G-l] [G-2] [G-3] [H-1] [H-2] BIBLIOGRAPHY Akaike, H., "Stochastic theory of minimal repre- sentation," IEEE Trans. Automat. Contr., AC-19 (1974). PP. 667-674. Chen, C.T , "Linear Independence of Analytic Functions," SIAM J. Appl. Math., 15 (1967), pp. 1272-1274. Cramér, H., "Stochastic processes as curves in Hilbert space,” Theory Prob. Appl., 9 (1964), pp. 169-179. Cramer, H., "On the structure of purely non- deterministic stochastic processes," Arkiv Mat., 3 (1961), pp. 249-266. Desoer, C.A. and Varaiya, P., "The minimal realiza- tion of a nonanticipative impulse response matrix,” SIAM J. Appl. Math., 15 (1967), pp. 754-764. Doob, J.L., Stochastic Processes, John Wiley, New York, 1953. Gel'fand, I.M. and Vilenkin, N. Ya., Generalized Functions, Vol. 4, Applications of Harmonic I Analysis, Academic Press, New York, 1964. Gilbert, E.G., "Controllability and observability in multivariable control systems,” SIAM J. Control, 1 (1963), PP. 128-151. Gikhman, I.J. and Skorokhod, A.V., Introduction to the Theory of Random Processes, W.B. Saunders Co., Philadelphia (1969). Halyo, N. and McAlpine, G., "On the spectral factorization of nonstationary vector random pro- cesses," IEEE Trans. Automat. Contr., AC-l9 (1974) pp. 674-679. Ho, B.L. and Kalman, R.E., "Effective construction of linear state-variable models from input/output functions," in Proc. 
3rd Allerton Conf. Circuits and Systems Theory (1965), pp. 449-459. 131 [I-l] [K-l] [K-Z] [K-3] [K-4] [K-SJ [L-l] [M-l] [0-1] [P-1] [P-2] [P-3] [P-4] 132 Ivkovich, Z. and Rozanov, Y.A., ”0n the canonical Hida-Cramér representation for random processes," Theory Prob. Appl. 16 (1971), pp. 351-356. Kailath, T., ”The innovations approach to detection and estimation theory," Proc. IEEE, 5B (1970), pp. 680-695. ‘ Kailath, T., "RKHS approach to detection and estimation problems - part I: deterministic signals in Gaussian noise," IEEE Trans. Inform. Theory, IT-l7 (1971), pp. 530-549. Kalman, R.E., "Mathematical description of linear dynamical systems,” SIAM J. Control, 1 (1963), pp. 152-192. Kalman, R.E., Falb, P.L., and Arbib, M.A , Topics in Mathematical System Theory, McGraw-Hill, New York (1969). Kallianpur, G. and Mandrekar, V., ”Multiplicity and representation theory of purely non-determin- istic stochastic processes," Theory Prob. Appl., 10 (1965), pp. 553-581. Lee, K.Y. and Mandrekar, V., "Multiplicity and martingale approach to linear state estimation,” Proc. Princeton Conference on Information Sciences and Systems, 1973. Mandrekar, V. and Salehi, H., "Operator valued wide- sense Markov processes and solutions of infinite- dimensional linear differential systems driven by white noise," Mathematical System Theory, 4 (1970), pp. 340-356. Olmstead, J.M.H., Real Variables, An Introduction to the Theory of Functions, Appleton-Century-Crofts, New York, 1959. Padulo, L. and Arbib, M.A., System Theory, W.B. Saunders Co., Philadelphia, 1974. Papoulis, A., Probability, Random Variables, and Stochastic Processes, MCGraw-Hill, New York, 1965. Parzen, E., Time Series Analysis Papers, Holden Day, San Francisco, 1967. Prohorov, Yu, V. and Rozanov, Yu. A., Probability Theory, Springer-Verlag, New York, 1969. [R-l] [S-l] [S-2] [S-3J [S-4] [S-5] [T-l] [W-l] [W-2] [W-3] [Y-l] 133 Rudin, W., Real and Complex Analysis, McGraw- Hill, New York, 1966. Silverman, L.M., "Realization of linear dynamical systems," IEEE Trans. Automat. Contr., AC-l6 (1971), pp. 554-567. ‘ Silverman, L.M., "Representation and realization of time-variable linear systems," Dept. Elec. Eng., Columbia Univ., New York, N.Y., Tech. Report 94, June 1966. Silverman, L.M. and Meadows, H.E., ”Equivalent realizations of linear systems,” SIAM J. Appl. Math., _1_7_ (1969), pp. 393-408. Silverman, L.M. and Meadows, H.E., "Controllability and observability in time-variable linear systems," SIAM J. Control, 5 (1968), pp. 64-73. Simmons, G.F., Introduction to Topology and Modern Analysis, McGraw-Hill, New YorRT’1963. Tether, A., "Construction of minimal linear state- variable models from finite input-output data," IEEE Trans. Automat. Contr., AC-15 (1970), pp.427- 436. Wiener, N. and Masani, P., "The prediction theory of multivariate stochastic processes,” Acta. Math., 9B (1957), pp. 111—150. Weiss, L. and Kalman, R.E., "Contributions to linear systems theory,” Int. J. Eng. Sci., 3 (1965), pp. 141-171. Wong, E., Stochastic Processes in Information and Dynamical Systems,iMcGraw-Hill, New York, 1971. Youla, D.C., "The synthesis of linear dynamical systems from prescribed weighting patterns,” J. SIAM Appl. Math., 14 (1966). PP. 527-549. APPENDIX APPENDIX GENERALIZED RANDOM PROCESSES Let (Q, A, P) be a probability space and consider the linear space L2(Q, A, P) of all zero-mean finite variance random variables with norm equal to the variance. 
A second order random process, say {x(t,ω): t ∈ T}, is then a mapping associating with every t ∈ T a random variable x(t,ω) in L2(Ω, A, P). White noise is not such a second-order random process. To obtain a rigorous theory for white noise it is necessary to go to the theory of generalized random processes. An important reference in this area is Gel'fand and Vilenkin [G-1]. The book by Prohorov and Rozanov [P-4] is also helpful and an intuitive discussion may be found in Kailath [K-2]. Let U be a real or complex linear space of elements u. A function x = x(u) of the parameter u ∈ U whose values x(u) are real or complex random variables on a probability space (Ω, A, P) is called a random functional. The random functional x(u) is called linear if for any u, v ∈ U and numbers α, β

x(αu + βv) = αx(u) + βx(v)     (A-1)

with probability 1. The finite dimensional probability distribution functions of a generalized process are defined, in an analogous manner to the usual case, by

F(a1, a2, ..., an) = P{ω: x(u1,ω) ≤ a1, x(u2,ω) ≤ a2, ..., x(un,ω) ≤ an}     (A-2)

for all finite collections {u1, ..., un} of elements of U. The linear space U is usually taken to be the set D of all real or complex functions φ defined on the real line which are infinitely differentiable and have compact support. The concept of convergence used to define a topology on D is that φn converges to φ as n goes to infinity, denoted φn → φ, if all functions φn(t) vanish outside a fixed finite interval which is the same for each function and d^k φn(t)/dt^k converges uniformly to d^k φ(t)/dt^k as n goes to infinity, for each derivative of order k = 0,1,2,... . The space D equipped with this topology is called the space of testing functions. The random functional x(φ) defined on D is called continuous if the convergence in D of the functions φkj(t) to φj(t), j = 1,...,n, implies

lim (x(φk1), ..., x(φkn)) = (x(φ1), ..., x(φn)).     (A-3)

A continuous linear random functional on D is called a generalized random function. In the case that D consists of functions of one variable, the corresponding random function will be called a generalized random process. This is the case of interest here and for the rest of this appendix D will mean the space of testing functions restricted to functions of one variable. A number of operations are defined on generalized random processes in a manner analogous to that by which they are defined for generalized functions. By a linear combination αx1 + βx2 of the generalized random processes x1 and x2 is meant the random process x associating with every function φ(t) ∈ D the random variable αx1(φ) + βx2(φ). The product f(t)x of an infinitely differentiable function f(t) and a generalized random process x is defined as the process for which there corresponds to the function φ(t) ∈ D the random variable x(fφ). The derivative of a generalized random process x(φ) for φ ∈ D is defined by

x'(φ) = -x(φ').     (A-4)

It is important to note that the derivative of a generalized random process always exists and is a generalized random process. This is in contrast to ordinary random processes which may not have derivatives that are ordinary random processes. In particular, the derivative of the Wiener process does not exist as an ordinary random process, but it does exist as a generalized random process. The material necessary to establish lemma 4-3 and theorem 5-9 is now developed. It will first be shown that
(ms n Tm} (21-5) at tO at 3T where ak (t t) ;'aky(t|s) at at s=t t kw = 3 (t,T) dz( ) (A-6) 4CO at T and n(t) is a stationary Gaussian white noise interpreted via equation (4-3) as the derivative of the Wiener process z(t). Define the generalized Wiener process z(w) by 2(6) = [ 6dz jn<¢(j)>. j+1A£:§E§ffi*ll-v(j+l)Tdr . m k+j (_1)23+l[ 3 .W(E,T) BtJB J m'(T)TdT ¢'(T)TdT. (A-14) [m 8k+jW(t,T) ’ -m atRaTJ Hence E{3k2(tlt) “(j)(CP)T}= -[°° 3k+j¥(tj.r) (P'(T)Td1 3t at 31 co k+j co T = _ 3 W(t,‘l‘) v E{I-m atka-rj dz(T)[:[-mcp (T)dZ(T)] } "m k+j _ _ 3 W(t,T) . T} — -E ~ . dz( )z( ) {J-m BthTJ T T t k+j = E{[ 3 g(tiT) dz(T)n(cp)T} (Ac-15) tO 3t 813 Next suppose that W(t,T) is stationary so that W(t,T) = W(t - T). (A-lG) Then akJ’JWr-Jl = ak+Jw<;--r_>_ (A47) atkaTJ BtkaTJ = (-1)jak+3w atk+J Substituting this into equation (A-lS) yields 140 k . -. t k+j a (tlt) ( ) _ a W(t- ) T E{ it? n J RN} _ (-1)JEUt 3tR+j T dZ(T)n(CP) } O . . k+j _<.-1>JE{3 if?” MN}.