WIDE-SENSE MARTINGALE APPROACH TO LINEAR DISCRETE-TIME OPTIMAL ESTIMATION

Thesis for the Degree of Ph.D.
Michigan State University
Halit Kara
1971

This is to certify that the thesis entitled "Wide-Sense Martingale Approach to Linear Discrete-Time Optimal Estimation" presented by Halit Kara has been accepted towards fulfillment of the requirements for the Ph.D. degree in Systems Science.

Major professor
Date: May 19, 1971

WIDE-SENSE MARTINGALE APPROACH TO LINEAR DISCRETE-TIME OPTIMAL ESTIMATION

By
Halit Kara

AN ABSTRACT OF A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering and System Science
1971

ABSTRACT

WIDE-SENSE MARTINGALE APPROACH TO LINEAR DISCRETE-TIME OPTIMAL ESTIMATION

By Halit Kara

The dissertation considers the minimum mean-square error estimation of the signal x_k = Φ(k)u_k, where Φ(k) is an n × n matrix and u_k is a wide-sense martingale process. The optimal estimation equations are derived for prediction, filtering and smoothing based on noisy observations.

Along with the statement of the problem, the historical and mathematical background upon which the derivations of the optimal estimation equations are based is presented. The general formulas for the optimal estimation equations for a second-order discrete-time stochastic process are derived under the assumption that the observation process has full rank. Then the recursive and algebraic estimation equations are derived for the signal when the observations are corrupted by additive white, cross-correlated white, and cross-correlated colored noises. The recursive nature of these equations follows easily from the wide-sense martingale property of u_k.

The thesis gives a purely orthogonal-projection approach to solving the prediction, filtering and smoothing problems of Kalman, and their extensions, for a more general class of signals. The main object is to remove unnecessary analytic complications introduced by stochastic difference equations and to use a conceptually simpler geometric approach which provides a unified attack in all three cases (uncorrelated white, cross-correlated white, and cross-correlated colored observation noises). At the same time, analytic solutions which can be studied numerically are obtained.

WIDE-SENSE MARTINGALE APPROACH TO LINEAR DISCRETE-TIME OPTIMAL ESTIMATION

By
Halit Kara

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Electrical Engineering and System Science
1971

To my professors M. N. Ozdas and T. Ozker, whose encouragement and insistence made it all possible.

ACKNOWLEDGEMENTS

The constant help, encouragement, and guidance given by Drs. R. O. Barr and G. L. Park have made this work possible. Besides the help extended on the actual writing of the thesis, Dr. Park is responsible for procuring the financial aid without which nothing could have been achieved. The excellent academic background, whether through teaching or advising, provided by Dr. Barr, and his friendly and encouraging disposition, have also had a great influence on the initiation and fruition of this work.

The problem was initially brought to the author's attention by Dr. V. Mandrekar, whose suggestions and assistance throughout the writing of the dissertation deserve special mention. Thanks are also due to Drs. I. Kern and H.
Salehi for their critical reviews and helpful suggestions. The author would also like to take this opportunity to acknowledge the unselfish interest and encouragement extended by Dr. M. Z. Akcasu and his wife Melahat of the University of Michigan, and to Dr. P. K. Wong for providing him with mathematical background.

Support was granted by Consumer's Power Company Cooperative Research at Michigan State University and the Turkish Sugar Corporation. This support is gratefully acknowledged.

Last, but definitely not least, the author wishes to give special mention and thanks for the love, patience, and understanding put forth by his wife, Ayten, and his children, Tarik and Aysen.

TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
GENERAL NOTATION
LIST OF SYMBOLS

Chapter 1. INTRODUCTION
  1.1 Historical Background and Literature Survey
  1.2 Statement of the Problem
  1.3 Outline of the Thesis

Chapter 2. MATHEMATICAL BACKGROUND AND BASIC RESULTS
  2.1 Hilbert Space of Random Vectors
  2.2 Wide-Sense Martingale and Markov Processes
  2.3 A Solution of the General Minimum Mean-Square Estimation Problem

Chapter 3. BASIC PROBLEM (BP)
  3.1 Optimal Prediction for BP
  3.2 Optimal Filtering for BP
  3.3 Optimal Smoothing for BP
  3.4 An Example

Chapter 4. CROSS-CORRELATED NOISE PROBLEM (CCP)
  4.1 Optimal Prediction for CCP
  4.2 Optimal Filtering for CCP
  4.3 Optimal Smoothing for CCP

Chapter 5. COLORED NOISE PROBLEM (CNP)
  5.1 Reformulation of the Problem
  5.2 Optimal Prediction for CNP
  5.3 Optimal Filtering for CNP
  5.4 Optimal Smoothing for CNP

Chapter 6. CONCLUSIONS
  6.1 Conclusions and Results
  6.2 Extensions

REFERENCES
APPENDIX A
APPENDIX B

LIST OF FIGURES

1.1 Block diagram for linear estimation problem (1.2)
1.2 Block diagram for the output noise
3.1 Block diagram for single-stage predictor
3.2 Block diagram of filter
4.1 Optimal single-stage predictor for CCP defined by (4.19)
4.2 Block diagram of optimal filter for CCP
5.1 Block diagram for single-stage predictor for CNP

GENERAL NOTATION

Discrete time is denoted by i, j, k, L, ..., s. Vectors are denoted by small letters, such as u, v, w, x, y and z. The transpose of a vector is denoted by the superscript T; for example, x^T denotes the transpose of the vector x. Matrices are denoted by capital letters, such as D, F, G, H, ..., Φ. The transpose and trace of a matrix are denoted by the superscript T and by tr, respectively. The symbol 0 denotes the scalar zero, the null vector, or the null matrix, depending on the context. The proof of a theorem will be introduced by the word PROOF and terminated by the abbreviation QED. If the proof is omitted, the statement of the theorem will be terminated by the symbol ∎.

LIST OF SYMBOLS

≜ : by definition equal to (Ch. 1)
k ∈ Z : k is an element of Z
k ∉ Z : k is not an element of Z
A ⊂ B : A is a subset of B
∋ : such that
∀ : for each (for all)
x̂_{k|L} : optimal estimate of x_k based on the observation record Y(L) (Ch. 1)
x̃_{k|L} ≜ x_k − x̂_{k|L} : estimation error
{x | ...} : all x such that ...
|a| : absolute value of a
|x| : Euclidean norm of the vector x
<x, y>_L : inner product of x, y in L_2
<x, y> : inner product of x, y in L_2^n (Ch. 2)
[x, y] : Gramian matrix of x, y (Ch. 2)
||x|| : norm of the vector x induced by <·,·>
⇒ : implies
f : X → Y : function on X to Y
ω ↦ x(ω) : function assigning x(ω) to ω
A ⊕ B : direct sum of A, B (Ch. 2)
A ⊗ B : direct product of A, B (Ch. 2)
x ⊥ y : x, y are orthogonal (Ch. 2)
x ⊥ M : x is orthogonal to the subspace M (Ch. 2)
A^⊥ : orthogonal complement of A (Ch. 2)
P^+ : generalized inverse of P (Ch. 2)
(x|M) : orthogonal projection of x onto M (Ch. 2)
i ∧ j : i ∧ j ≜ min{i, j} (Ch. 2)
x > y : y is less than x (Ch. 1)
P > 0 : P is positive definite (Ch. 1)
x ≥ y : y is less than or equal to x (Ch. 1)
P ≥ 0 : P is positive semidefinite (Ch. 1)

Abbreviations

BP : basic problem (Ch. 3)
CCP : cross-correlated noise problem (Ch. 4)
CNP : colored-noise problem (Ch. 5)
iff : if and only if (Ch. 2)
OPE : orthogonal projection estimator (Ch. 2)
H-space : Hilbert space (Ch. 2)
L_2 (L_2(Ω, 𝒜, P)) : Hilbert space of square-integrable random variables (on the probability space (Ω, 𝒜, P)) (Ch. 2)
L_2^n (L_2^n(Ω, 𝒜, P)) : Hilbert space of square-integrable random n-vectors (on the probability space (Ω, 𝒜, P)) (Ch. 2)
(mod P) : with probability one (Ch. 2)
(·|M) : orthogonal projection operator onto the subspace M
r.v. : random variable
Z : Z ≜ {..., −1, 0, 1, ...}, the set of all integers
δ_kj : Kronecker delta
I_n : n × n identity matrix
R^n : n-dimensional Euclidean space over the reals
Y(L) : observation record, Y(L) ≜ {y_1, y_2, ..., y_L}

CHAPTER 1

INTRODUCTION

The optimal estimation problem is encountered in different forms in many branches of science as well as in a variety of engineering disciplines. The discrete-time linear estimation problem, which is an important special case of the general problem, is the topic of this study. It can be described quite generally in simple terms with reference to the block diagram in Figure 1.1.

In this block diagram, x_k and z_k denote, respectively, the input and output signals of a memoryless, non-random linear transformation H(k), so that

(1.1)   z_k = H(k) x_k .

The output is observed in a noisy environment; the disturbance is assumed to be an additive random signal v_k, called the output noise (or measurement noise, or observation noise). Thus the (actually) observed signal y_k can be represented as

        y_k = z_k + v_k ,

where the subscript refers to discrete time, i.e., k ∈ Z = {..., −1, 0, 1, 2, ...}, the set of all integers.

It is assumed that the observations are available over a set of integers {k_1, k_1 + 1, ..., L}, where k_1 is an arbitrary starting time (for the sake of simplicity k_1 is chosen to be unity) and L moves along in discrete time as additional data are recorded. The problem can now be stated as follows:

(1.2) OPTIMAL DISCRETE-TIME LINEAR ESTIMATION PROBLEM.

Given:
(a) the relationship between x_k and y_k, k ∈ Z, i.e., H(k), k ∈ Z, and (1.1);
(b) the means and covariance matrices of the stochastic processes {x_k, k ∈ Z} and {v_k, k ∈ Z}.

Problem: Given an observation record (data) Y(L) ≜ {y_1, y_2, ..., y_L}, find an optimal realizable estimate x̂_{k|L} of the signal x_k which is a linear function of the data y_1, y_2, ..., y_L, i.e.,

        x̂_{k|L} ≜ Σ_{i=1}^{L} A(i) y_i

for k ≥ 0, where A(i), i = 1, 2, ..., L, are matrices of appropriate dimension. For k > L the problem is called prediction, for k = L filtering, and for k < L smoothing.

The terms optimal and realizable that occur in the description of the problem are defined as follows:

OPTIMAL. The estimate x̂_{k|L} of x_k is optimal if it satisfies some specified criterion of optimality. The criterion used in this dissertation is the minimum mean-square error, i.e., the minimization of the mean-square error risk function (or performance measure)

(1.3)   J[x̃_{k|L}] = E{(x_k − x̂_{k|L})^T (x_k − x̂_{k|L})} = E{x̃_{k|L}^T x̃_{k|L}} .

REALIZABLE. The estimate x̂_{k|L} is realizable if it is formed only from the observation record Y(L) that is actually available. For k > L the estimator is called a predictor, for k = L a filter, and for k < L a smoother.
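To make the criterion (1.3) concrete, the batch form of Problem (1.2) can be sketched numerically: when the joint second-order statistics of the signal and of the stacked data are known, the optimal weighting matrices A(i) follow from the normal equations E{x y^T} = A E{y y^T}, i.e., from the requirement that the error be orthogonal to every linear function of the data. The NumPy sketch below is only an illustration of this batch solution under those assumptions (the function name batch_linear_mmse and the toy scalar example are illustrative, not taken from the text); the recursive solutions developed in the later chapters avoid forming and inverting the full data Gramian.

```python
import numpy as np

def batch_linear_mmse(Sigma_xy, Sigma_yy, y_stack):
    """Batch linear minimum mean-square estimate x_hat = A @ y_stack.

    Sigma_xy : E{x y^T} for the stacked, zero-mean data vector y  (n x mL)
    Sigma_yy : E{y y^T} for the stacked data vector               (mL x mL)
    y_stack  : one realization of the stacked data                (mL,)

    The weighting matrix A solves A Sigma_yy = Sigma_xy, which expresses
    the orthogonality of the error x - A y to the data.
    """
    A = Sigma_xy @ np.linalg.pinv(Sigma_yy)   # pseudo-inverse covers rank-deficient data
    return A @ y_stack

# Tiny numerical check: a scalar signal observed twice in unit-variance white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                  # zero-mean, unit-variance signal
y = np.stack([x + rng.standard_normal(5000),   # y1 = x + v1
              x + rng.standard_normal(5000)])  # y2 = x + v2
Sigma_xy = np.array([[1.0, 1.0]])              # E{x yi} = 1
Sigma_yy = np.array([[2.0, 1.0],
                     [1.0, 2.0]])              # E{yi yj}
x_hat = np.array([batch_linear_mmse(Sigma_xy, Sigma_yy, y[:, t])
                  for t in range(5000)]).ravel()
print(np.mean((x - x_hat) ** 2))               # close to 1/3, the theoretical MMSE
```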
The purpose of this research is to study Problem (1.2) where the signal is an n-variate, discrete-time, second-order stochastic process {x_k = Φ(k)u_k, k = 0, 1, 2, ...}, where, for each k = 0, 1, 2, ..., Φ(k) is a non-random n × n matrix and u_k, k = 0, 1, 2, ..., is a wide-sense martingale process.1)

1) The idea of approaching the optimal linear estimation problem through wide-sense martingales was first suggested to the author by V. Mandrekar.

The major contribution of the present work lies in the characterization of the signal x_k as a linear transformation of a wide-sense martingale rather than as a solution of a given difference equation. This characterization leads directly to recursive estimates and also allows direct alternative derivations and extensions of earlier well-known results in discrete-time linear estimation (cf. [B-5], [B-6], [C-1], [K-3], [K-4]).

1.1 HISTORICAL BACKGROUND AND LITERATURE SURVEY

The problem of estimating a stochastic signal from a noisy observation record has been intensively studied since the appearance, during the early 1940's, of the classical work of A. N. Kolmogorov [K-9] and N. Wiener [W-1]. Kolmogorov studied only discrete-time stationary processes and solved the problem of linear estimation of such processes using a technique based on the time-domain recursive orthogonalization of the observed data. This technique was suggested by Wold [W-6] in his doctoral dissertation in 1938, and hence is known as the Wold decomposition method. Kolmogorov's theory was extended to vector-valued random elements by Wiener and Masani [W-3], [W-4] in 1957-58.

On the other hand, Wiener [W-1] studied the linear estimation problem for continuous-time processes and reduced it to the problem of solving a certain integral equation, the so-called "Wiener-Hopf equation". This equation had already been studied by Wiener and Hopf [W-3] in 1931. It can be solved explicitly only for certain special cases of the general estimation problem. The solution involves formidable mathematics, which was beyond the reach of most engineers at that time, even though Wiener undertook this work in response to an engineering problem. Because of its complexity, Wiener's theory did not receive proper attention and recognition for many years. In 1950, H. Bode and C. Shannon [B-3] gave a different derivation of Wiener's results based on ideas in a report by Blackman, Bode, and Shannon [B-2]. The work of Bode and Shannon was instrumental in popularizing Wiener's theory. The same approach was independently discovered by L. Zadeh and R. Ragazzini [Z-3].

In the 1950's the idea of generating linear estimates recursively was introduced. Such algorithms were used by Gauss in 1809 [G-1] in his numerical calculations of the orbit of the asteroid Ceres, but the modern interest in recursive estimation was stimulated by the increased usage of digital computers. The first modern work on this subject was done by R. Kalman [K-3] in 1960. The practicality of the Kalman approach to the estimation problem has made it immensely popular among engineers.

The paper by Kalman in 1960 introduced a different approach to the linear estimation problem of Kolmogorov and Wiener for a special class of discrete-time stochastic processes. In 1961, R. Kalman and R. S. Bucy [K-6] generalized Kalman's results to continuous-time processes.
The novelty of their formulation was the representation of the stochastic processes by state equations driven by additive white input noises, rather than by correlation functions. By restricting their attention to Gauss-Markov processes given by difference or differential equations, in particular, they derived difference (for discrete-time) and differential (for continuous-time) equations for the filters, which are called the Kalman filter and the Kalman-Bucy filter, respectively. These equations can be used to construct a linear filter that is, of course, identical to the one specified by the Wiener-Hopf equation. However, there is a definite practical advantage in having a differential (or difference) equation for the estimate instead of an integral equation for the estimator. Specifically, it is much easier to solve a differential equation by analog or digital techniques than to solve an integral equation and then perform a convolution. This computational advantage of the Kalman approach to the linear estimation problem has stimulated a great number of papers providing alternative derivations, extensions, and relationships to classical parameter estimation techniques. It may be an overstatement to suggest that there are as many derivations of the Kalman filter equations as there are workers in the field. Most of these works are now in standard texts [A-1], [A-2], [B-7], [B-9], [D-1], [J-1], [L-1], [L-2], [N-1], [S-1], [S-4].

1.2 STATEMENT OF THE PROBLEM

A precise statement of the problem considered in this study will now be presented. We first note that, in the last decade, all the work on Kalman filtering theory has assumed that the signal is generated by a given linear stochastic difference equation driven by a white-noise process. It is known that such a stochastic difference equation generates a wide-sense Markov process, and that the signal can always be written as x_k = Φ(k)u_k, where Φ(k) is an invertible matrix and u_k is a wide-sense martingale process (cf. [M-1], [M-2]; see also Section 2 of Chapter 2). This remark motivates the following problem, which is characterized by the assumptions on the signal x_k, k = 0, 1, 2, ..., and the output noise v_k, k = 0, 1, 2, ..., in the general problem (1.2) described in Figure 1.1.

(1.4) SIGNAL. The signal x_k, k = 0, 1, 2, ..., is assumed to be an n-variate, second-order stochastic process in discrete time, given by

        x_k = Φ(k) u_k ,   k = 0, 1, 2, ... ,

where, for each k = 0, 1, 2, ..., Φ(k) is an n × n matrix of known functions of discrete time, and u_k, k = 0, 1, 2, ..., is a wide-sense martingale (see Section 2 of Chapter 2) with zero mean and known n × n positive semi-definite covariance matrix sequence {P_u(k), k = 0, 1, 2, ...}, where (cf. Section 2 of Chapter 2)

        P_u(k) ≜ E{u_k u_i^T}   for k ≤ i,  k, i = 0, 1, 2, ... .

As a notational convenience, for any integer k, the definition

        Q(k) ≜ E{(u_{k+1} − u_k)(u_{k+1} − u_k)^T} = P_u(k+1) − P_u(k)

is made. In addition, without loss of generality, it is assumed that the initial value u_0 is orthogonal (cf. Section 1 of Chapter 2) to the output noise. Notice that if it is not, one may define ū_k ≜ u_k − u_0; then ū_0 = 0 and therefore it is orthogonal to any output noise. So this assumption is not a restriction; it is just a notational convenience.

(1.5) OBSERVATIONS. The observations are corrupted by additive noise such that

        y_k = H(k) x_k + v_k = M(k) u_k + v_k   for k = 1, 2, ...

(thus the starting time is k = 1), where y_k ∈ R^m (m ≤ n) is the observation vector and v_k ∈ R^m is the output noise vector.
H(k) is an m × n matrix of known functions of discrete time and M(k) ≜ H(k)Φ(k), k = 0, 1, 2, ... . Note that, for k = 0, M(0) = H(0) = 0, since at the time k = 0 there is no observation.

The following assumptions are made on the output noise v_k, k = 0, 1, 2, ..., each of which leads to a problem in estimation theory:

(A.1.1) BASIC PROBLEM. The output noise v_k, k = 0, 1, 2, ..., is a zero-mean white noise (cf. Section 2 of Chapter 2) such that

        E{v_k v_j^T} = P_v(k) δ_kj   for k, j = 0, 1, 2, ... ,

where P_v(k) is an m × m matrix of known functions of discrete time, and it is assumed that P_v(k) > 0 1) for all k = 0, 1, 2, ... . Here δ_kj is the Kronecker delta: δ_kj = 1 if k = j and δ_kj = 0 if k ≠ j.

1) If P is a symmetric matrix, P > 0 (P ≥ 0) means that P is positive (semi-) definite.

In addition, it is assumed that the processes {v_k, k = 0, 1, 2, ...} and {u_{k+1} − u_k, k = 0, 1, ...} are orthogonal, i.e.,

        E{v_k (u_{j+1} − u_j)^T} = 0   for all j, k = 0, 1, 2, ... .

(A.1.2) CROSS-CORRELATED NOISE PROBLEM. The process v_k, k = 0, 1, 2, ..., is a zero-mean white-noise process as in (A.1.1), except that it is correlated with the process {u_{k+1} − u_k, k = 0, 1, ...} such that

        E{(u_{k+1} − u_k) v_j^T} = C(k) δ_kj   for k, j = 0, 1, 2, ... ,

where C(k) is an n × m matrix of known functions of discrete time.

(A.1.3) COLORED NOISE PROBLEM. The output noise process is the output of a known linear system with a white-noise input (see Figure 1.2):

        v_{k+1} = Ψ(k+1, k) v_k + n_k   for k = 0, 1, 2, ...

with the initial condition v_0, where v_k ∈ R^m is the output noise, n_k ∈ R^m is a white noise, and Ψ(k+1, k) is the m × m transition matrix of known functions of discrete time. It is assumed that:

1. The initial condition v_0 is a square-integrable random vector with zero mean and known m × m covariance matrix

        P_v(0) ≜ E{v_0 v_0^T} ,

and it is orthogonal to the processes u_k, k = 0, 1, 2, ..., and n_k, k = 0, 1, 2, ... .

Figure 1.2. Block diagram for the output noise.

2. The process n_k, k = 0, 1, 2, ..., is a white-noise process with zero mean such that

        E{n_k n_j^T} = P_n(k) δ_kj ,

where P_n(k) is an m × m matrix of known functions of discrete time. In addition, the processes {u_{k+1} − u_k, k = 0, 1, 2, ...} and {n_k, k = 0, 1, 2, ...} are correlated such that

        E{(u_{k+1} − u_k) n_j^T} = C(k) δ_kj   for k, j = 0, 1, 2, ... ,

where C(k) is an n × m matrix of known functions of discrete time.

3. M(k+1) C(k) + C^T(k) M^T(k+1) + P_n(k) is positive definite for all k = 0, 1, 2, ... .

The problem is now, under one of the assumptions (A.1.1), (A.1.2) or (A.1.3), to find the minimum mean-square estimate of the signal x_k = Φ(k)u_k defined by (1.4) and observed via (1.5). Our main interest is to compute the optimal estimate x̂_{k|L} based on the observation record Y(L) and the covariance matrix of the corresponding estimation error x̃_{k|L} ≜ x_k − x̂_{k|L}:

        P_x̃(k|L) ≜ E{x̃_{k|L} x̃_{k|L}^T} .

So, by the solution of the optimal estimation problem we mean a set of equations which allow us to compute the pair (x̂_{k|L}, P_x̃(k|L)). We shall refer to this pair as prediction if k > L, filtering if k = L, and smoothing if k < L.

1.3 OUTLINE OF THE THESIS

The outline of the dissertation is as follows. Chapter 2 contains the mathematical background upon which the derivations of the estimation equations are based, together with the basic results. In Chapter 3, the optimal estimation equations for the signal x_k = Φ(k)u_k are derived under Assumption (A.1.1). Optimal estimation equations for the signal x_k = Φ(k)u_k under Assumptions (A.1.2) and (A.1.3) are derived in Chapters 4 and 5, respectively. The results of the thesis are reviewed in Chapter 6 and conclusions are drawn concerning the application of this approach.
A number of extensions of the present research are proposed. CHAPTER 2 MATHEMATICAL BACKGROUND AND BASIC RESULTS This chapter is devoted to the basic mathematical notions which are used throughout the dissertation and a new solution of the general discrete-time optimal estimation problem (cf. Section 3). The material presented in Section 1 and Section 2 is based on the works of Wiener and Masani [W-B], and Mandrekar [M-Z], respectively. The proofs of the new results and the known results whose proof belong to the author are presented. 2.1 HIlBERT SPACE OF RANDOM‘VECTORS Let (0,4,P) be a probability space; that is, n is a set of points w, d is a a-algebra of subsets of n, and P is a probability measure on 0. A certain property is said to hold P-almost-everywhere on n (or with probability one) if the proba- bility of the set of points w at which this property does not hold equals zero. We indicate this property with the expression (mod P). Let (XyB) be a measurable space (cf. [R93], p. 217). A function x : 0 ... x 1) is called a random elemnt with range in X, if it is d-measurable; i.e. VB 6 B : {wlx(w) E B] 6 d. A random element with range in a finite n-dimensional linear Space 1) That is w-L x(w) e x. 12 13 is called a random n-vector. In the case in which X is the real line R and B is the o-algebra of Borel subsets of R, the func- tion x is called a random variable (abbreviation: r.v.). In this study, the range space x is chosen to be the Euclidean n-space R“, and o-algebra B is chosen to be the a-algebra of Borel sub- sets of Ru, so that a random element x with range in Rn is (1) a random n-vector (column) with r.v. components x , i - 1,2,...,n. We denote by L2(n,d,P) (or 1.2) the set of all r.v. x defined on (0,4,P) which are square integrable: 6i|Xlzl 9- lluwnzmw) < .. The set Lzm,d,P) is a Hilbert space (abbreviation: H-space) with usual Operations and inner product (cf. [G-Z], Theorem 7, Section 5 of Chapter II). Now, let 1.261.513) (or L?) be the set of all random n-vectors x on 0, with components x(i) E Lzm,d,P), 1 - 1,2,...,n. Thus x 6 113mm,» iffl) x“), 1 - 1,2,...,n are random variables and x is square integrable; i.e. d dllxlzl - £lx(w)l2P(dw) < .. , where l-l denotes the Euclidean norm: n lnzéfx-z u“M€ iIl The space 1361.43) is a direct-product H-space (cf. [P-l], p. 75) with usual operations and the inner product 1) "iff" is shorthand for "if, and only if". 14 " (1) (1) - 2 0‘ .y >L i=1 n (2.1) = 2: xmwnmwwmw) iIl T I dlx y} . This inner product generates the norm n as M - <>" - <2 lawn?” iIl - wash)“ . which in turn induces a topology in L; : a sequence {xk} in n n L2 is said (i) to converge to a vector x E 1.2 iff llxk - xll .. O as k _. an, and (ii) to be a Cauchy sequence iff llhk - hmll -0 0 as k,m -0 on. The inner product (2.1) does not play any significant role in the stochastic theory, although the corresponding norm (2.2) and topology it induces do. Rather than inner product we often use rectangular Gramian matrices. (2.3) DEFINITION. The n x m matrix [XaY] 2 La“): y(j)>L] " [IE x(1)("OF”)(V)1’(dW)] i I 1,2,...,n; j I 1,2,...,m that is defined for x E L3, y 6 I; is called the Gramian of st°1) 1) In what follows we assume implicitly that the random vectors x and y are defined on the same probability space. This assump- tion is made throughout of this work. 15 In the next two definitions we introduce the concepts of orthogonality and subspace [W-3]. 
These definitions differ from the usual ones in that Gramians replace inner products, and matrix coefficients replace scalar coefficients in linear combinations. (2.4) DEFINITION. We say that: (a) two vectors x,y in L; are orthogonal, written as x 1.y, iff [x,y] I 0 (b) two sets M,N contained in L; are orthogonal, written as M 1.N, iff each vector in. M is orthogonal to each vector in N. (2.5) REMARK. (a) Note that this concept of orthogonality is stronger than the usual one. For x 1.y, it is not sufficient that I 0. (b) From (2.3) we see that [Ax, By] I A[x,y]BT, Vh X p and m X q real matrices A,B; and vecotrs x 6 Lg, y E E3. Hence, if x 1y, then Ax iBy. (2.6) DEFINITION. A non-empty subset ‘M of L; is said to be: (a) a linear manifold if x,y E M a Ax + By 6 M, Vn X n real matrices A,B. n (b) a subspace of L 2 if it is a linear manifold, which is closed in the topology of the norm (2.2). The subspace spanned by the family of randomrvectors {xk, k 6 A} where A is an index set will be denoted by {{xk, k 6 A}. The basic facts governing the - notions just introduced which are referred in this study are given in the next two lemmas. These lemmas are quoted here from the work of N. Wiener and P. Masani (CW-3], lemmas 5.8 and 5.9). 16 (2.7) ORTHOGONAL PROJECTION LEMiA (Wiener-Masani, 1957). (a) M' is a subspace of 1.; iff there is a subspace M of L 3 M I M“, O 2 o where M: denotes the direct-product Mo ®...® Mo with n factors. M0 is the set of all components Of all vectors in M. n (b) If M is asubspace of L2 and x 6 L2, then H a unique (mod P) 15, 6 M 9 “x - KN“ ' :2; “X - y“ - A vector x E M satisfies this equality iff it satisfies the following equivalent conditions: x-xulM or [x,leEJtusy] VYEM. (c) If M,N are subspaces of L; and MCN, then 3 aunique subspace M' C N 9 N I M 9 M', M .LM' where 9 denotes a direct sum of vector spaces (cf. [P-Z], p. 38). In particular, M c: L; is a subspace =9 a a unique M-L, called the orthogonal complement of M such that 1.; - M 9 Mi, :4 .L M‘- . (2.8) DEFINITION. The unique vector KM of (2.7b) is called the orthogonal projection of x onto M, and denoted by (le). The operator PM on L; defined by PM(x) I (le) Vx E L; is called the orthogonal projection Operator. (2.9) LEMMA (Wiener-Masani, 1957). (a) If M,N are orthogonal n subspaces of L3, then M @N is a subspace of L2 and for any xeL; Mn @N) - (le) + (xlu) (mod P) (b) If M,N are subspaces 9 MC: N, then for any x e L; 17 “(14”)“ 5 “(KW)“ - In the study of the properties of orthogonal projection, a basic role is played by (2.7b). For example, from (2.7b) we may deduce the following: (2.10) COROLLARY. Let M,N be subspaces of L;1 a MC: N. Then the following holds (mod P): (a) (Ax + By‘M) I A(x‘M) + B(y\M), Vn X n real matrices A,B and x,y E L; (b) ((x|M)|N) - <3aui-gu»a1-gunfi = [xi - mx(i), x-1 - mx(j)] . If mx(i) - 0 V i E z, then the process is said to have zero mean. For a zero mean process, it is clear that Px(i,j) I Cx(i,j) Vi,j e 2. If Cx(i,j) - mx(i)m:(j) that is, if qu'” - o vi,j 6 2 then the process is said to be uncorrelated. From now on we assume, without loss of generality, that all the processes have zero mean unless otherwise stated. (2.11) DEFINITION. A second-order mlti-variate process {xk’ k E Z} is said to be: (a) a white noise (or orthogonal) process iff [xvxj] I PxUMiJ v1.1 6 z. where P;(J) 9 P;(J.J). (b) a process with orthogonal increments iff the process {xlc'I-l - xk, k E Z} is orthogonal. Let (ark, k E Z} be a stochastic process in 1;. 
We shall denote by L(x;{,) and L(xL) the spaces {{x i 51,} and 1’ {fit} respectively (cf. Definition 2.6b). These are called the past-present and the present of the process {xv k 6 Z} respec- tively. Obviously L(xL) c: L(x;{,) C L(x;.(,+l). 19 (2.12) DEFINITION. A discrete-time process {xk, k e z} in L; is said to be: (a) a wide-sense martingale process, iff VRJ, E 2 with L S k, (xk|L(x;L)) I XL ; (b) a wide-sense Markov process, iff V k,L 6 2, with L s k (xk|L(x;c>> - (xkluxbn . We see from (2.7b), (2.128) that a process {xk, k.€ Z} in L; is a wide-sense martingale process iff vk,{, E 2 with L s k, xk - XL 1. x1 vi 5 L. Hence for a wide-sense martingale process Px(i,j) I P;(i A j), where i A j 2 min{i,j]. A necessary and sufficient condition for a process n {xk, k.€ Z} in L2 to be a wide-sense martingale is given in the following lemma. This lemma extends to n-variate case the result Of Doob ([D-Z], p. 166). The proof of the lemma is similar to the one that is given by Doob, so it is omitted. (2.13) IEMMA. A discrete-time process {xk, 1:62} in L; is wide-sense martingale iff it satisfies the first-order linear vector difference equation I x +sw x k+1 k 10"]. «a -¢n where (wk, k 6 Z} is a white-noise process. (2.14) REMARK. (a) In continuous-time case the "only if" part of Lemma (2.13) does not hold, i.e. in continuous-time a wide-sense martingale process cannot be generated by a linear differential equation unless it is differentiable in the sense [M-3]. When that is so, the approach of the thesis can be applied to continuous time. 20 (b) We see from (2.5b) that if {wk, k.e Z} is a white noise process in I; then {r(k)wk, k 6 2}, where F(k) is an n x m matrix function, is a white-noise process in 1;. Hence the solution of x = xk + I"(k+1)wk+1 x I w k > -es m -o k+l is a wide-sense martingale, by virtue of (2.13). One important wide-sense martingale process for our purpose is the orthogonal projection of a process. (2.15) IEMMA. Let y E L; be fixed and {xk, k E 2} be a process In L2. Define uk - (y|L(x;k)). k e z . The process {uk’ k 6 Z} is a wide-sense martingale. PROOF. Since y 6 L2, so is uk, k e 2 by virtue of (2.91:). Also from (2.10b) (uk|L(X;L)) = ( = S k . ED UL 178 Q 21 Next we study discrete-time wide-sense Markov processes. The concept of wide-sense Markov processes first was introduced by Doob [D-Z]. Let {xk, k 6 2} be a wide-sense Markov process in L2. The definition Of wide-sense Markov process implies that (xk‘L(x;L)) - A(k,L)xL for L s k, where A(k,L) is an n x n matrix. Beutler [B-l] proved that Anus) = ‘3. (morgue) where P;(k,L) a [xk,xL] and P+ denotes the pseudo-inverse (cf. A.3) of the matrix P. He also showed that a multivariate second-order process is a wide-sense Markov process iff A(kst) ‘ A(k.J)A(JsL) for L s j s k. The function A(k,L) is called a transition matrix. Mandrekar and Salehi ([M-3], Theorems 2.11 and 2.12) recently gave a representation to a wide-sense Markov process, extending to a singular case the work of Mandrekar [M-l], [M92] which shows the connection between the signal process (1.4) of the problem that is considered in this study, and wide-sense Markov processes. (2.16) THEOREM (Mandrekar-Salehi, 1971). Let {xk, k e 2} be a discrete-time stochastic process in L3. Then the process {xk' k E Z} is a wide-sense Markov process with the transition function A(kpc) I P(k,L)P+(L,L) such that Px(k,L)Pfi(L,L) is one-one on R(Px(k,L)) onto R(Px(L,L)1) for k $.L iff 1) Here R(§) denotes the range space of the linear O rator Q (see Appendix A). 
P0 22 x1‘ I Quk. where {uk’ k E z} is a wide-sense martingale and §(k) is an n X n matrix function such that uk 6 R(§T(k)), R(Q(k)) is independent of k 6 Z. In either case A(k,L) I §(k)§+(L). This representation is unique in the sense that if xk I fi(k.)vk with v E R(¢T(k)) then there exists an n x n matrix X such that Vk 6 Z we have uk I K‘vk and 6(k) I f(k)K+, (2.17) mm: The Kahan filtering theory (cf. [It-3], [K—4]) assumes that the signal process {xk, k I 0,1,2,...} is generated by a given linear difference equation Of the form (2.17a) xk+1 = F(k)xk +'r(k)wk, k I 0,1,2,... where F(k) and r(k) are matrices of functions of discrete-time and wk is a white noise process. In addition P(k) is invertib1e for all k I 0,1,... and wk is orthogonal to a given any initial condition. Given an initial condition x0, then the unique solution of (2.17a) is given by the expression (cf. [K—3]) k (2.17b) x I §(k,o)x +- z §(k,i)F(i-l)w k I 0,1,2,... k O 1_1 i-l where §(-,.) is the fundamental solution of §(k+1.L) B F(k)9(ksL) a QCst) ‘ I Define d k uk = x0 +-1§1 Q(O,i)I"(i-1)wi_1 = uk_1 + §(o,k)I‘(k-l)wk-1 for k 2 1 no. x for k I 0 . 23 Then (2.17b) can be written as xk I §(k,o)uk. It follows from the definition uk and (2.14b) that the stochastic process {“k’ k I 0,1,2,...} is a zero-mean wide-sense martingale whose covariance matrix k T T (2.17c) Ph(k) = Pi(0) +-i§1 9(o.1)F(1-1)P;(1-1)F (1-1)9 (0.1) - Pu(k-l) + Q(o,k)F(k-l)Pfi(k-1)FT(k-I)QT(o,k) . Thus, by (2.16), the process {xk’ k I 0,1,2,...}, which is defined by (2.17a) with the initial condition x0 is a wide-sense Markov process. We shall refer to this signal as a Kalman signal. Observe that the class of Kalman signals is a special class of wide-sense Markov processes. It is obviously contained in the class of signal defined by (1.4). Thus the optimal estima- tion equations of a Kalman signal may be obtained from the optimal estimation equations for a signal defined by (1.4), but not con- versely. 2.3 A SOUJTION OF THE GENERAL MINIMUM MEAN-SQUARE ESTIMATION 15103151 The problem we will discuss in this section can be formu- lated as follows: Consider two related stochastic processes {xk, k - 0,1,2,...} and {yk, k - 1,2,...} In L; and I; (m s n) respectively. We will refer to {xk, k I 0,1,2,...} as signal and to {yk, k I 1,2,...} as the observation process. The problem is to find the minimum mean-square estimate of the signal xk based on the Observation record Y(L) 2 {y1,...,yL}; that is, L to find that vector 2 of the form 2 A(i)y , where A(i), kl; i-l i i I 1,2,...,L are n X m real matrices which minimizes the mean- 24 square error risk function (1.3): ~ T JExk‘L] . 6[(Xkdaklb) (xk - fik‘L)} 2 (2.18) = “xk - $144,“ . The solution of this problem is as follows: lat n d L2(Y3L) - £{K(1)Yis 1 - 132:00'3‘6} where x(i), i I 1,2,...,L are n X m real matrices. Note that L;(y;L) is a subspace of L; tion 2.6b). So that the problem is reduced to finding that vector and LEON.) I‘ L(y;L) (cf. Defini- fiklc in L;(y;L) such that (2.18) is minimmm. From the orthogonal projection lemma (cf. 2.7b), we know that *k‘ is given by L ) A kuL ‘ (xk|L;(YSL))1 (2.19) - P x 0 L30 :1.) k L based on the observation record Y(L), is the orthogonal That is the linear minimum mean-square estimate *k‘ of the signal xk, projection xk onto the subspace L;(y;L) generated by the vectors K(i)y1, i I 1,2,...,L. We will refer to the operator (-|L;(y;L) I P n defined by (2.19) as the orthogonal projec- 170;!) tion estimator abbreviation: OPE). 
1) We recall that given processes have zero-means, otherwise at - a + (x IL“(y'L)) 144, 1c|o k 2 ’ d where file optimal estimate xk given no observation 6{xk]. 25 From now on the notation 5114‘ will denote the minimum mean-square estimate of the signal xk based on the observation record Y(L) which is given by (2.19), and the expression "optimal estimate" will mean this estimate. The pair (iktL,PL(k[L)) will denote the optimal estimate and the correSponding error x covariance matrix of the estimation problem. ‘We refer to the pair , k I 0,1,2,... k based on the observation record Y(L), L I 1,2,... if k,L are (RR‘L,P (le)) as optimal estimation of the signal x x not specified, as gptimal prediction if k >.L, as Optimal filtering if k I.L, and as optimal smoothipg if k~< L. Now let {xk, k I 0,1,2,...} and {uk, k I 0,1,2,...} be two signal processes such that (2.20) xk I Q(k)uk k I 0,1,2,..., where §(k) is a linear transformation which may not be invertible and xk,uk€ L'z‘ Vk -o,1,2,... . 1e: {yk. k - 1,2,...} in L; (m s n) be the observation process for the signal {xk, k I 0,1,2,...}. Then the optimal estimates of the signals x and u based on the Observation record Y(L) are k k 914;, - (xk‘L;(y;L)) and “up, - (uk|L;()';L)) respectively. Since xk I §(k)uk we have n *le - «(wuklhzmm (2.2m - sac) «Rages» '3 Q(k)fik‘{’ 0 Thus the OPE commute with any linear deformation of the signal 26 process. Furthermore if uk 6 R(§T(k)) V k I 0,1,2,..., then §+(k)xk - fans (1c)uk - p u , by (1.3) auras) 1‘ .11 k since uk E x(QT(k)) V k I 0,1,2,... . Hence, it follows from (2.20) that» (2.21b) a - fans: kll. kit ‘ From the definitions of estimation error and covariance matrix of error, and (2.21a,b) similar results can easily be derived for the estimation error and its covariance matrix. These results are summarized in the following lemma. (2.22) LEMMA. Let {xk, k I 0,1,2,...} and {“k’ k I 0,1,2,...} be two signal processes as above. The optimal esthmations (s‘ck‘t, P~(k‘.(,)) and (a , P~(k|{,)), kg, - 0,1,2,... based on x U k|L the observation record Y(L) of the processes xk, k I 0,1,... and u k I 0,1,... are related to each other via the following fl‘_ k’ --.. re lat ions : "“H (a) The optimal estimates: 2 I §(k)fi k,L I 0,1,2,... . 14L klc The estimation errors: - §(k)fik| k’l, - 0,1,2,... 0 *le l. The error covariance matrices: Pia“) - §(k)P~(k|t)§T(k) 1m. - 0.1.2.... - u 27 (b) If in addition u E R(§T(k)) V k I 0,1,2,... then for k k,L I 0,1,2,... we have u = §+(k)fi klt kl; ’ a - fags: kl; kl; ’ T 504:.) -= s+ j [7k+1|k' an“) .. [mus- yj+1] " WHl‘k’ 9J+IHJ ' WHI‘R’ yj+l] " [fin-aw 93+1| 5] a O ‘ Since §k+1‘k 1.L(y,j+1) for j+l S k. Similarly we find that Wk+1|18 yJ+1|j1 " Pyo‘flnkj ' 1) Note that this definition of innovation process is different than one that was recently given by Railath (cf. [K-IJ). Our definition is the one that was given by Cramér [C-3]. (Also see [W‘3]) s 29 where d .. Rios-+1) - [ac-”‘1‘. yk-I-l‘k] 2 o . QED It is Obvious that the stochastic process {yk’ k I 0,1,...} is purely non-deterministic if rank P~(k+l) 2 l for all k I 0,1,... . Let rank P~(k) I r(k) {2 1. We shall refer to r(k) as the rank Of the stochasZic process {yk, k I 0,1,2,...}. It is clear that r(k) s m, Vk I 0,1,2,..., if r(k) I m, Vk then we say that the stochastic process has full rank. We note that P~(k+l) is invertible iff r I m, that is, the stochastic process {:k, k I 0,1,2,...} has full rank (cf. [C-B], [W-3]). 
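As an illustration of Definition (2.24) and Lemma (2.25), the innovation sequence can be computed from a batch of observations by successively projecting each observation onto the span of the earlier ones, with Gramians playing the role of inner products. The NumPy sketch below is a minimal illustration only: it estimates the Gramians by sample averages over N independent realizations (an assumption made here for the example, whereas in the text the Gramians are taken as known), and the function name innovations is illustrative rather than notation from the thesis.

```python
import numpy as np

def innovations(Y, m):
    """Form the innovation sequence ytilde_{k+1|k} = y_{k+1} - (y_{k+1}|L(y;k))
    for stacked realizations of m-dimensional, zero-mean observations.

    Y : array of shape (N, L*m), each row one realization of [y_1, ..., y_L];
        N is assumed large enough that sample Gramians approximate [y_i, y_j].
    Returns an array of the same shape holding [ytilde_1|0, ..., ytilde_L|L-1].
    """
    N, Lm = Y.shape
    L = Lm // m
    Ytil = Y.copy()                              # ytilde_1|0 = y_1 is left untouched
    for k in range(1, L):
        past = Y[:, : k * m]                     # y_1, ..., y_k (spanning L(y;k))
        cur = Y[:, k * m : (k + 1) * m]          # y_{k+1}
        G_pp = past.T @ past / N                 # sample Gramian of the past
        G_cp = cur.T @ past / N                  # sample cross Gramian [y_{k+1}, past]
        A = G_cp @ np.linalg.pinv(G_pp)          # projection coefficients
        Ytil[:, k * m : (k + 1) * m] = cur - past @ A.T
    return Ytil

# Example: a scalar wide-sense martingale observed in white noise.  The sample
# Gramian of the innovations is approximately diagonal, as Lemma (2.25) asserts.
rng = np.random.default_rng(1)
N, L, m = 20000, 4, 1
u = np.cumsum(rng.standard_normal((N, L)), axis=1)   # martingale: u_k = w_1 + ... + w_k
Y = u + rng.standard_normal((N, L))                  # y_k = u_k + v_k
Yt = innovations(Y, m)
print(np.round(Yt.T @ Yt / N, 2))                    # off-diagonal entries near zero
```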
Related to the innovation process associated with a signal process {yk, k I 0,1,2,...} we have the following result (see also [C-3], [W-3]). (2.26) LDIMA. Let {ykfl‘w k I 0,1,...] be the innovation process associated with a stochastic process {yk, k I 0,1,2,...}. Then for L < k, L,k I 0,1,2,..., and L(y;k) . LOW.) 69 LGL-tl‘t) 69 Lfiti-ZIL-I-l) e...c+> Lfik‘kd). PROOF. In view of (2.25) the subspace Last-HR.) , L(YL+2‘L+1), . . . , Lgk‘k-rl) are orthogonal to each other. So that is a subspace by virtue of (2.9a). Since by (2.24) 3O yL-I'llL’ yL+21L+1"°°’ yk‘k-l J. L(Y;L)s L(Y;L+1)s-°'a L(y;k'1) respectively, and L(y,L) is contained in all these subspaces, it follows that and therefore by (2.9a) Mm) 69 L6 ) ems) LG 4+1“ k‘k- l) is a subSpace. Now it remains to show that this subspace is equal to L(y;k). Since L(y,k-1)C L(y,k) and yk‘k_1€ L(y;k), we have L(Ysk) D L(Ysk'1) @ Llek-l) 0 On the other hand, by (2.24) (*> yk . and + (ykIL(y;k-1)) E LGk‘k_1) (+3 L(y.k-1)» and for L < k, yL 6 L(y;k-1) C LGk‘k-l) @ L(y;k-1). It follows that (...) L c L.L I 1,2,..., Optimal prediction: (2.28) fik‘L - fikjL-l + C(k‘L)§L‘L-1 , Optimal predicted estimate, _ ~ ~ -1 (2.29) C(k‘L) [xle_l, yL‘L'1][yL‘L'1’ iL‘L‘l] , predictor gain matrix 2.30 k -Pk-1- k .5? ( ) PRHU id; ) mung";1 14:,- error covar iance matrix . 1], prediction For I, - o, 2140 - (${xk} - o and pump) - rum . (b) If k I L I 1,2,..., optimal filtering: d (2.31) 2 I R + C(k)? , optimal filtered estimate, I k|k k " StIc|1c-I 14m (2.32) 600 I [ik‘k-l’ S’k‘k_1][§k‘k_1. $410434, filter gain matrix, (2.33) P~(k‘k) 9- P~(k) - P~(k|k11) - C(k)[§k‘k_1. 1,4191]. x X x filtering error covariance matrix. For k I O, 20 I 6{xo} I 0 and P%(0) I P;(0). (c) If k.< L, optimal smoothing: L (2.34) It IR + r, c(k,I|I-I)y , optimal smoothed estimate, 14:, k iIk+1 I|I-1 32 (2-35) G(k,i|i-l) ‘ [ik‘i-l’ 21‘1_1][yi‘1_1a 21‘1_1]-1, smoother gain matrix, L (2.36) 1:04;) = P~(k) - z C(k, i‘i- -l)[y x x iIk+l i‘i-l’ Sikh-11’ smoothing error covariance matrix. For k - L, 9141‘ - *1. and p (k|k) - p (k). i i PROOF. (a) Since by (2.26) n . - n . - n n . _ n 1) we have gm .. (kuL2(Y;L- 1) s L“ 26“?» 1 I (xleZO'm-ID + (xkiLZG‘L‘LdD . by (2-28) G(k|L)y L-1,2,... k2; ' itlick-1+ L-|L 1 where G(k|L) is the n X m gain matrix to be determined. To determine the gain matrix G(k|L), notice that from (2.7b) xk - G(k‘L)Sr‘L‘L_1 .L yL‘L'l . SO that [1:18 yL‘L_1] - G L I 1,2,... . For L I 0, Obviously 1 .. -1 ~ “,-1,b virteof 2.5b ) YL‘L_1.LL(}',L )gyth-IH'ZG!’ ) y u ( ) ~ n - a yL‘LIl I gk|L-l , since Rk|L-1 E 1.2(y,L l). 34 fik‘o I optimal estimate xk’ given no Observation I dka} I 0 and d P%(k‘o) . [xk - fik‘o’ xk.- gk‘o] " [xk’xk] d I Px (k) . (b) Since the above equations hold for k 2 L, by letting k I L I 1,2,... we obtain expression for filtering (2.31-2.33). For k I L I 0, 20 I 0, since 20 I Optimal estimate xo given no Observation I I O . Thus P~(0) 2 [£0,20] I [xo,xo] S Pk(0). x (c) Since by (2.26) 130:2) - 1.3030 s gem”) owe L26 1). s > k L‘LI \ 5“ and the subspaces on the right of this equality are orthogonal to each other, the optimal smoothed estimate is £14 L I (xkl 1‘2"“) - (x I “(y-k) s “6' Name IN? >) k L2 ' L2 k+l‘k 2 ”(,1 L (2.34) -R + I: G(k,i I-Iw 1‘ iIk+l ' 1‘1'1 where G(k,i‘i-l), i I k+l,...,L are the n X‘m gain matrices to be determined. 
If the steps lead to (2.29) repeated here, their 35 results ~ ~ -1 (2.35) C(k.i\i-l) I [xk‘1-l: 71‘1-13[VI‘I-I' yi‘i-l] for i g k+1’ooo,{,s TO obtain an expression for the smoothing error covariance matrix we note that k|L k le L e xk - ij§+lc(k'1‘1'1)71|I-1 for L > k I 0,1,2,... . Hence L L P (km = [i’ - 2: cam 14):; . g - a Mk 1 14>? - J i k iIk+l ‘ 1‘1'1 k iIk+l ’ | 1‘1 1 L ~ ~ ~ N T ' s +' Z G k,i 1-1 , G k,i i-l - [xk. xx] 1_k+1 ( ‘ )[yi‘i-l yI|I-1] ( 1 ) L - Z C(k 1 1-1) y ,f iIk+l ' ‘ L 1‘1'1 k] L T - i ’ G k,i 1-1 £51 [ k 7I‘I-1] ( 1 ) where we made use of (2.25). Substituting (2.35) into this expression and noting that [§k,9 ] we obtain i‘i-l] ' [xk‘i-l’yi‘i-l | L | t (2.36) Hum-Pao- E cacti-1)? -.5? -1 i g 1dk+1 ' I|I 1 k|I 1 for L > k I 0,1,2,... . For k I.L, it is obvious that Rk‘k . ik and P;(k‘k) - P;(k). QED X x This theorem is basic in the study of discrete-time linear estimation. To the best of the author's knowledge, the results of this theorem.do not exist in the literature. At this point, we 36 make the following remarks related to this theorem. (2.37) REMARK. The only assumption related to the signal process is that the signal process is second order. The Observation process is assumed to be second order and have full rank. The full-rank assumption can be dropped by using the generalized inverse (cf. Appendix A) instead of the inverse. If the process has constant rank r, then by using a suitable invertible transformation on the Observation process one may Obtain a full rank equivalent Observation process. (2.38) REMARK. To determine explicitly the estimation equations in Theorem 2.27 we need only deterndnethe following matrices: Lik‘i-l’ yi‘i-l] and [yi‘i-l’ yi‘i-l] ' Since the process {yk|k-l’ k I 1,2,...} is a white-noise process, the computation of the second matrix above is easy, actually it is usually given as part of the problem. The determination of the first matrix is rather involved and usually is possible by making further assumptions on the signal process, such as being generated by a given linear difference equation driven by a white-noise pro- cess (Kalman filtering theory) or as in (1.4) (cf. Section 2 Of Chapter 1). CHAPTER 3 BASIC PROBIm (BP) This chapter is devoted to the derivation of optimal estimation equations for the basic problem (BP). In this problem the signal and observation equation are described by (1.4) and (1.5), which are repeated here for convenience signal: xk I Q(k)uk, (uh, k I 0,1,2,...} is a wide-sense martingale Observation: k I 1,2,... yk I M(k)uk + v ks The assumptions on the initial signal xo (or no), output noise are the same as those stated in (1.4) and (A.1.1). The matrices {(k) and M(k) have been described in (1.4) and (1.5). The problem is simply to find the minim mean-square estimation equations for the signal x1‘ I Q(k)uk based on the observation record Y(L), for k,L I 0,1,2,... . In view of Section 3 of Glapter 2, we need to derive the equations for the stochastic process {uk, k I 0,1,2,...], then use (2.22s) to get the equations for the process {xk I 900%. k I 0,1,2,...}. The estimation equations are derived in the following three sections. Section 1 is devoted to the prediction problem. Where three distinct classes of prediction are defined and a recursive algorithm developed for the single-stage prediction. Sections 2 and 3, the filtering and smoothing problem are 37 38 considered, and recursive algorithms are derived for the filter and smother. 
A simple example is given in Section 4 to illustrate the application of the results of the earlier sections. We need to note here that the results of the present Chapter can be Obtained from the results of the following chapter. The primary reason, however, for its separate treatment is to pro- vide a full exposition of the new approach, with an explicit state- ment of the terminology, followed by the derivations of major est imat ion equat ions . 3.1 OPTIMAL PREDICTIm FOR BP We recall from Chapter 1 that in the prediction problem, we wish to obtain the optimal estimate fl‘k‘L of the signal “k’ based on the observation record Y(L) I {y1.y2,...,y‘c}, where k > L. In other words, we wish to obtain the estimate of the signal at a time in future in terms of the existing data at the present time. Our primary interest here is to obtain data prediction algorithm for the signal uk. In particular, we wish to develop \ algorithm which are recursive in time, thereby permitting us to “aeor perform predict ion efficiently with a digital computer. Before attempting this, however, it will prove expedient first to classify predicted estimates according to the possible relationship between the two time indices, k and ’L. The need for this classification arises because both indices are variables or one may be fixed and the other may be allowed to vary. Depending on how the time indices vary, three distinct classes of prediction can be defined: 39 (3.1) DEFINITICN. (a) Fixed-interval prediction: 6k”, L I L I fixed positive integer, k I L+l, L+2,... (b) Fixed-point prediction: fik‘L L I 0,1,2,...,N-1. (c) Fixed-lead prediction: 0k”, L I 0,1,2,..., k I L + L where L is a fixed positive integer. Having introduced three distinct classes Of prediction we now derive a formla for the general prediction problem, and then proceed to Obtain algorithm for computing the Optimal fixed- interval, fixed-point, and fixed-lead predictions. From the orthogonal projection lama, we know that the predicted estimate of the signal u based on the observation k record Y(L) which is optimal for the mean-square error risk function is a - (u | “(rm up, I: I'2 ’ for k > L, k,L I 0,1,2,... . In addition, since the orthogonal projection is unique, all the predicted estimates are unique. Using the fact that uk - UL .L 1.;(y3L) V k 2 L (cf. (1.4) and (A.1.1)) we have I _ n . “le (uk “c“ch'z‘m” - (uk - uL\L‘2‘ L, based on Y(L) is accomplished via (3.3) and (3.6) with the initial condition (0L, P;(L)). If we are u given (0L, P (L)) then the fixed-interval predicted estimate 11 is obtained without calculation, in fact 6 I 6L, and uk‘L k|L the fixed-interval prediction error covariance matrix is computed through simple algebraic operations. But, the only value of L for which (0L, P~(L)) is known without processing the filter U algorithm is L I 0. On the other hand for L I 0, d 6 I G I optimal estimate u given no Observation 0‘0 0 o I 6(uo} I o and d ~ ~ P (0) I [u , no] G I Ph(0) , since u I uo - no I uo . ...g\ 80 that Nah. I O a d P I . ak‘o n G(k\o) Ph(k) This result, though,trivial, shows that the future estimates with- out Observation is zero, and the covariance matrix of the estimate is the same as that of the signal process. Thus expressions (3.3) and (3.6) have limited practical use as far as performing the fixed-interval prediction is concerned. 43 FIXED-POINT PREDICTION. In the fixed-point prediction problem, we wish to Obtain the optimal estimate of the signal at a given time in the future as function of the current time. 
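As a numerical illustration, the fixed-interval prediction step described by (3.3) and (3.6) amounts to carrying the filtered estimate forward unchanged while adding the increment covariances Q(i) to the filtered error covariance. The sketch below assumes NumPy and assumes that the filtered pair and the matrices Q(i) are supplied; the function and variable names are illustrative, not notation from the text.

```python
import numpy as np

def fixed_interval_predict(u_hat_L, P_L, Q_list):
    """Fixed-interval prediction for the wide-sense-martingale signal u_k,
    following (3.3) and (3.6).

    u_hat_L : filtered estimate at the last observation time L        (n,)
    P_L     : filtering error covariance at time L                    (n, n)
    Q_list  : [Q(L), Q(L+1), ..., Q(k-1)], the increment covariances
    Returns the predicted pair (u_hat_{k|L}, P(k|L)).
    """
    u_hat_kL = u_hat_L.copy()        # (3.3): the predicted estimate does not move
    P_kL = P_L + sum(Q_list)         # (3.6): the covariance accumulates the Q(i)
    return u_hat_kL, P_kL

# For the signal x_k = Phi(k) u_k the corresponding pair follows as in (2.22a):
#   x_hat_{k|L} = Phi_k @ u_hat_kL,   P_x(k|L) = Phi_k @ P_kL @ Phi_k.T
```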
Here the form of (3.2) that is of interest is (3.12) aN‘L I 6L L < N I fixed positive integer. In this case we continue to process the filter algorithm as the data arrives as opposed to the fixed-interval prediction, where we stop processing the filter algorithm once all the data are received. The equations of the error and the error covariance matrix Of the fixed point prediction are easily obtained from their definitions as: (3.13) fiNlé =E +u - u {mm = {(2) + rum) - Pun) u U N-1 (3.14) I P~(:,) + 2 (2(1) u iIL for L I 0,1,2,...,N-1 with the boundary conditions 5 MN - {IN and 11mm) - P~(N). u u Note that the fixed-point prediction error covariance matrix eventually is equal to the filtering error covariance matrix, since N is fixed and L eventually is equal to N. FIXED-LEAD PREDICTION. This is probably the most used case as it has application to control systems for "lead" correction action, etc. [L-lg. Here, we wish to predict the value of the signal a fixed amount of discrete-time L in the future from the current 44 time L, i.e., we wish to predict the signal with lead L. Hence the form of (3.2) that is of interest is (3'15) A L - 091329000 GL'HJL ' “I. SO that as in the fixed-point prediction, we must continue to pro- cess the filter algorithm as the data arrives. The error and error covariance matrix of the fixed-lead prediction, respectively, are given by the following equations: (3.16) EHL‘L I UL + uL+L - “‘6 , L I 0,1,2,... P~(L+L‘L) . 11(1) + Puma - Pam u u L-hL-l (3-17) I P (L) + 2 0(1) . L I 0.1.2.”- E iIL with the initial conditions GL‘O . 0 and P (L‘o) - Pu(L). 8 From (3.16) we see that the magnitude of the fixed-lead prediction error depends upon the amount of lead I” When L I 1, the magnitude has its smallest value. For that reason, this special case has merit attention, and is called the single-stage prediction. \ We shall obtain a recursive algorithm.for computing the single-stage ““* prediction 0 I 0,1,2,...,. In develOping the algorithm k+1‘k’ 1‘ for the single-stage prediction, we assume, only the initial estimate fil‘o I 0 and the corresponding covariance matrix of the error P;(l‘o) I Pb(l) are given. The algorithm‘will depend only on the pgevious estimate lek-l and the new observation yk. The result is given in the following theorem. To prove the theorem, we need the following technical lemma, which will be referred to 45 as the innovation lemma. (3.18) INNOVATION LEMMA. The innovation process {§k+1‘k’ k I 0,1,2,...] associated with the observation process {yr k I 1.2,...) of the BP defined by (3.19) - M(k)uk + v k - 1,2,... 3'1. k (see (A.1.l) for the meaning of the symbols) is generated by (3.20) k - 0,1,2,... s"1c+1|1c " yk+1 ‘ "(k)fi1c+1|1c The process {yk’ k I 1,2,...] has full rank, i.e., Dune huh] > 0 I - 0,1,2,... - PROOF. By definition yk+l‘k " y1c+1 ' (yk+1‘L(y‘k)) ' yk+1 ' (M(k+l)uk+1 + VR+I|L("")) . yk+1 - M(k+1)0k+1‘k - (vk+l|L(y;k)) for k I 0,1,2,... . Since vk, k I 0,1,2,... is a white-noise process, and by (A.1.l) vk Iuj vk,j I 0,1,2,..., we have Vk+1 J. L(y,k). So that (vk+1‘L(y;k)) - 0 by virtue of (2.10c) and therefore - M(k+l)0 k I 0,1,2,... . 31t+1| k ' yk+1 1t+1|1c Note that for k I 0, yl‘o I yl, since 61 I 6{u1} I 0 (cf. |o (3.11)). “w-OI l! \ 46 Substituting (3.19) into (3.20) and rearranging the terms we get - 11(1.+1)0 S"1c+1|1c ' y1c+1 1t+1|1c "-' M(k+1)fi k 3 0,1,2,... 1c+1\1c + v1c+1 Thus, for k I 0,1,2,... 
I [M(k+l)fi' + VH1, M(k+1)fi Wk+l|k’ yk-i-l‘k] 1c+1|1t k+1|k + vk-l-l] - M(k+l)[fi' MT(k+1) + [v 1c+1|1t’ t"1c+1| k] 1c+1’ 3+1] ’ since v k I 0,1,2,... . Noting that k+1 * fik+l|k V .. .. d [uk-l-l‘k’ uk+l‘k] ' Pfiml‘k) 2 ° and [VHP vk+1] I Pv(k+l) > O , by (A.1.l) , we Obtain ~ _ T (3.21) [yHI‘kfikfl‘k] M(k-l-1)Pfi(k+l‘k)M (k‘l'l) + Pv(k+l) -_\\ for k I 0,1,2,... . Since V k I 0,1,2,..., Pv(k+l) >'0 and ..M‘, P (k+l‘k) 2 0, by 08.2) the inverse of [y exists u k-l-l‘k’ yk-I-l‘k] for all k I 0,1,2,... . Thus the Observation process has full rank. QED (3.22) THEOREM. The single-stage prediction (8 P (k+1‘k)) E 1c+1|1t’ k I 0,1,2,... for the process {uk, k I 0,1,2,...} is accomplished via the following equations: (a) The stochastic process {6 k I 0,1,2,...], which is 1c+1|1c’ defined by the single-stage prediction estimate, is a zero-mean 47 wide-sense martingale, and is generated by the recursive equation (3.23) a + c(1<+1|1c)[yk - M(k)‘u 1c+1|1c e oklk-l k‘k-ll for k I 1,2,... with the initial condition a I 0, where llo G(k+l‘k) is the n X m gain matrix and is given by the follow- ing expression: (3.24) G(k+l|k) a P~(k‘k-1)MT(k)[M(k)P (klk-1)MT(1<) + Pv(k)‘.l'1 u 6 k B 1,2,see (b) The stochastic process I 0,1,2,...}, which is {fik+1\k’ 1‘ defined by the single-stage prediction error E is the 1t+1|k’ solution of the following stochastic linear difference equation (3.25) 11 a (In - c(k+1‘k)M(k))fi k+l - k - (:(1c+1|1c)vk + u a1c+111t k‘k-l for k I 1,2,... with the initial condition 6 I u . This 1\o 1 process is a zero mean wide-sense Markov process whose covariance matrix is given by the recursive equation (3.26) p (k+1|k) - P (k‘k-l) - P (k‘k-1)MT(k)[M(k)P (k‘k-1)MT(k) + a u a G + Pv(k)]-1M(k) P~(k| k-l) + Q(k) u for k - 1,2,... with the initial condition P (1|o) - Pu(l). a {Gene k is a zero mean wide-sense martingale. Obviously, the process has PROOF. (a) We first prove that the process I 0,1,2,...} zero-mean. To prove that it is a wide-sense martingale, we must show that (I) 0 e L; Vk - 0,1,2,..., and (II) lei-1‘ k 48 |L(u. L+1|L))" L s k . mk-l-II k fi’L-i-I‘L Since uk 6 LE, so that 6 by virtue Of (3.9b). It remains Hl‘k to verify the requirement (ii). The verification of (ii) is as follows: For k 2 L (fim‘klmaalm - ((uk+1|L:(y;k))|L(3.L+1|L)) (ammo ..wlm by (2.1%), since L(fi,L+l‘L) c L'z‘(y;k) for L s k <. same reason- ing as above (€1\L(U.L+1IL) . by (3 2) L+1|L , since 0 1, +1| L e L(0,L+I|L) . ' aL+1|L Since the observation process has full rank, by (3.18), the single-stage predicted estimate is given by G1c+1|1c a °k+1|1c-1 + G - [a 3 107 3' 1‘1 1c+1\1c-1’ k‘k-l 14km k‘k-l From (3.21) we know that (3.21) [Mk-15141.4] - M(k)Pfi(k‘k-1)MT(k) + Pv(k) . On the other hand, [fik-i-l‘k-l’yklk-l] " [gut-1 + “H1 ' “k' M(k)fik|k-l"+ ”13' by (3.19) (3.27) I P~ (kl k-1)MT(1<) U k Substituting (3.21) and (3.27) into the expression for G(k+l|k) since uk+1 - uk’ uk|k-l and v are orthogonal. above we Obtain C(k-i’l‘k) I P (k‘k-1)MT(k)[M(k)P (k‘k-1)MT(k) + Pv(k)]-1 . u 5 (b) The expression for the single-stage prediction error is Obtained as follows: IO. “1c+1|k uk‘l‘l ‘ °k+1|k in“ * k+1 k k u b 3 23 “A . uk+1 - uk‘k'l - G( ‘ )[yk - M( ) k‘k'l] 9 y ( ' ) . fiklk-l - G(k+l|k)[M(k)uk|k-l + vk] + uk+1 - uk , by letting u1c+1 I “kid - u +'u and substituting (3.20) k k (3.25) (In - G(k+1‘k)M(k))fi'k‘k_1 - G(k+1‘k)vk + uk+1 - uk for k I 1,2,... . For k I 0, it is Obvious that fillo I u', since “1‘0 I O. I! 
\ I 50 TO show that the process I 0,1,2,...], which [qu‘k’ k is generated by (3.25), is a zero-mean wide-sense Markov process, define F(k) g In - c(1c+1\1c)h(1c), r(k) ‘3- [-c(k+1\k). In]. and then (3.25) can be written as (3.28) uk+l|k I F(k)i'ik‘k_1 +'I‘(k)rk , k I 1,2,... From the definitions F(k) and rk it is clear that F(k) V k I 1,2,... is invertible and the stochastic process {rk, k I 1,2,...] is a zero-mean white-noise process. I 111 and therefore u1|o 1 rk, k I 0,1,...] is defined by Furthermore, since a 1'0 k - 1,2,..., the process {GHI‘R, (3.28) is of the same form and is subject to the same conditions as the process defined by (2.17). Hence, it is a zero-mean wide- sense Markov process. It now remains to determine the single-stage prediction error covariance matrix P~(k+1‘k). To obtain an equation for P~(k+1|k), we may multiplyu(3.25) with its transpose and take u mathematical expectations, or let k I L in (2.30) and then sub- stitute (3.21) and (3.27) into it to get 51 p (k-l-l‘k) - P (Relic-1) - P (k1k-1)MT(k>1M(k>Pen-Indus) 5 fi 5 + Pv(k)]-]M(k)P~(k‘k-l) U for k 1,2,... . Since, by (3.7) P~(k+1‘k-1) . P (k‘k-l) +Q(k), u G we have (3.26) P~(k+1\k> - {Mk-1) - P~(k|k-l)MT(k)[M(k)P(k|k-1)MT(k) + U U U + Pv(k)]'lM(k)P~(k|k-l) +000 11 for k I 1,2,... . For k I 0, obviously P (1‘0) I Ph(l), 5 since a I u QED 1‘0 1 ' This theorem gives a recursive algorithm for computing the single-stage prediction. The recursive algorithms are given in Theorem 3.22 are extremely useful in processing Observations to obtain the predicted estimate utilizing a digital computer. The Observations can be processed as they occur, and there is no need to store any observation data. In fact, so far as storage of the observations and the signal is concerned, only flk‘k-l need to be stored in proceding from time k to time knl. An additional feature is that the error covariance P~(k+l‘k) is ‘II"' computed as a direct part of the estimator, and ma; be used to judge the accuracy of the estimation procedure as in the Kalman filtering theory. This is based on the assumption that the observation models and the means and covariances of related pro- cesses are correctly known. A block diagram of the single-stage predictor is shown in Figure 3.1. The information flow in the predictor can be explained very simply by considering this block diagram, which 17": 52 is a representation of (3.23). From this figure we see that single- stage predictor consists of a model discrete-time linear dynanical system fip I (In’ G(°‘°).In) (cf. [Ii-5]): 0H1” - nk‘k-l + G(k+l\k)yk‘k_1 , state equation M» 2k I 8H1” , output equation in which the gain-times-innovation term is applied to the model as a forcing function. Observe that the predictor operates in a predict-correct fashion. That is, the correction term caet1|1t)yk+1‘k is added to the predicted estimate d to k|k-l determine the current predicted estimate. The correction term involves a weighting of the innovation associated with the obser- vation progess by the gain matrix C(k'I-l' k). Input 0; 2p 5 Dynamical System 8p Output 0f fip Observation n X m Corr'Lcti-O Current estimate vatiofi I M(k) I : klk-fll guitl - e- 1 n X m : Previous a ...,N estimate " I-a-O' k-1,2’eee : 01‘0.0 Figure 3.1. Block diagram of single-stage predictor. The estimation equations derived above, can be used to estimate the signal {xk I Q(k)uk, k I 0,1,2,...} as pointed out in Section 3 of Chapter 2. 
If we wish to have these equations involve only estimates of the signal {x_k = Φ(k)u_k, k = 0,1,2,...}, then, as shown there, we need the assumption that

u_k ∈ R(Φ^T(k))    for all k = 0,1,2,... .

Assuming that this is so, we state the result in the following corollary for the single-stage predictor. The proof of this corollary follows easily from (2.22) and (3.22).

(3.29) COROLLARY. The single-stage prediction (x̂_{k+1|k}, P_x̃(k+1|k)), k = 0,1,2,..., for the process {x_k = Φ(k)u_k, k = 0,1,2,...} (cf. (1.4)) such that u_k ∈ R(Φ^T(k)) for all k = 0,1,2,..., is accomplished via the following equations:

(a) The stochastic process {x̂_{k+1|k}, k = 0,1,2,...}, which is defined by the single-stage prediction estimate, is a zero-mean wide-sense Markov process, and is generated by the recursive equation

(3.30)    x̂_{k+1|k} = Φ(k+1)Φ^+(k)x̂_{k|k-1} + K(k+1|k)[y_k - H(k)x̂_{k|k-1}]

for k = 1,2,... with the initial condition x̂_{1|0} = 0, where K(k+1|k) is the n x m gain matrix and is given by

(3.31)    K(k+1|k) = Φ(k+1)Φ^+(k)P_x̃(k|k-1)H^T(k)[H(k)P_x̃(k|k-1)H^T(k) + P_v(k)]^{-1} .

(b) The stochastic process {x̃_{k+1|k}, k = 0,1,2,...}, which is defined by the single-stage prediction error x̃_{k+1|k}, is the solution of the linear stochastic difference equation

(3.32)    x̃_{k+1|k} = (Φ(k+1)Φ^+(k) - K(k+1|k)H(k))x̃_{k|k-1} - K(k+1|k)v_k + Φ(k+1)(u_{k+1} - u_k)

for k = 1,2,..., with the initial condition x̃_{1|0} = Φ(1)u_1. This process is a zero-mean wide-sense Markov process whose covariance matrix is given by the recursive equation

(3.33)    P_x̃(k+1|k) = Φ(k+1)Φ^+(k)P_x̃(k|k-1)Φ^{+T}(k)Φ^T(k+1)
              - Φ(k+1)Φ^+(k)P_x̃(k|k-1)H^T(k)[H(k)P_x̃(k|k-1)H^T(k) + P_v(k)]^{-1}H(k)P_x̃(k|k-1)Φ^{+T}(k)Φ^T(k+1)
              + Φ(k+1)Q(k)Φ^T(k+1)

for k = 1,2,..., with the initial condition P_x̃(1|0) = Φ(1)P_u(1)Φ^T(1).

(3.34) REMARK. It is shown in (2.17) that the signals of the Kalman filtering theory can be written in the form x_k = Φ(k)u_k for k = 1,2,... and x_0 = u_0, where Φ(k) = Φ(k,0) is a transition matrix. By letting Φ(k) = Φ(k,0) and Φ^+(k) = Φ^{-1}(k,0) = Φ(0,k) in (3.29), and noting that Φ(k,j)Φ(j,k) = I_n, one obtains the results of the Kalman filtering theory (cf. [K-3], [K-4]) for the BP.

(3.35) REMARK. The algorithms given in Corollary 3.29 are not available in the current literature to the best of my knowledge, and cannot be obtained directly from the Kalman filtering theory.

(3.36) REMARK. A comparison of (3.22) and (3.29) shows that, in the estimation of the signal x_k = Φ(k)u_k, k = 0,1,2,..., computation time is saved if the algorithms given in (3.22) are used first, followed by (2.22a), to obtain the desired results.

3.2 OPTIMAL FILTERING FOR BP

We now examine the problem of obtaining an algorithm for computing the basic problem of interest, namely, the filtering problem. In developing the algorithm for optimal filtering of the signal {u_k, k = 0,1,2,...}, and therefore of the signal {x_k = Φ(k)u_k, k = 0,1,2,...}, we assume that only the initial estimate û_0 = 0 and the filtering error covariance matrix at the initial time, P_ũ(0) = P_u(0), are given. From (3.2), we observe that prediction and filtering are interdependent in terms of the determination of the predicted estimate given the filtered estimate and vice versa. In fact,

û_{k+1|k} = û_k ,    k = 0,1,2,... .

Hence, the single-stage predicted estimate algorithm (3.23) can be used to compute the filtered estimate. Of course, the filtering error covariance matrix will not be the same one that is given by (3.26) for the single-stage predictor error.
It must be computed to judge the accuracy of the estimation procedure. With these preliminaries completed, we now state and prove the basic theorem of optimal filtering for the signal (nu, k I 0,1,2,...}. (3.37) THEOREM. The filtering (0k, PL(k)), k I 0,1,2,... for the stochastic process {uk’ k I 0,1,2,?..} is accomplished via the following equations: V (a) The stochastic process {fik’ k I 0,1,2,...}, which is defined by the filtered estimate, is a zero-mean wide-sense martingale. It is generated by the recursive equation (3.38) 11k - 01", + c(k)[yk - M(k)fik-1] for k I 1,2,... with the initial condition 60 I 0, where C(k) is the n X m gain matrix and is given by 56 T -1 (3.39) C(k) I P (k)M (k)Pv (k) . u (b) The stochastic process {fik’ k I 0,1,2,...}, which is defined by the filtering error fik that satisfies the stochastic linear difference equation (3.40) 61‘ - (IIn - G(k)M(k))fik_1 + [In - c;(1c)h(1c)](uk - uk_1)IG(k)vk for k I 1,2,... with the initial condition Go I no, is a zero- mean wide-sense Markov process. Its covariance matrix is given by the following recursive equation: (3.41) P~(k) I P~(k|k-1) - P~(k)MT(k)P;1(k)M(k)P~(k‘k-l) u u u u for k I 1,2,... with the initial condition P~(0) I Ph(0). u PROOF. (a) Since, by (3.2) 0 I Oh, it follows from (3.22s) 1t+1|k that the process (Gk, k I 0,1,2,...} is a zero-mean wide-sense martingale and 0 - a _1 + c(1t+1|1t)[yk - Rana k k k-l] for k I 1,2,... . For k I 0, obviously 00 I 0 (cf. 3.11). The filter gain matrix C(k) is Obviously equal to the "" single-stage predictor gain matrix G(k+l|k). Thus, from.(3.24) C(k) I C(k-i-llk) I P (k‘k-1)MT(k)[M(k)P (k‘k-1)MT(k) + P (k)].1 . ~ ~ v u u To obtain an expression in terms of the filtering error covariance matrix for the gain matrix C(k), note that since P (k‘k-l) 2 0 u and Pv(k) >10, from (B.2) and (B.1) 57 p (k‘k-1)MT(k)[M(k)P (1t|1c-1)NT(R) + Pv(k)]-1 ‘u a - P~(k‘k-1)MT(k)R-1(k) - P (k‘k-1)MT(k)[M(k)P (k‘k-1)MT(k) + 11 fi' '11 + Pv (kn-114(k) P” (kl k-1)MT (k) P;1(k) . u It is shown in part (b) of this theoremtthat P~(k) - P (k‘k-l) - P (k‘k-1)MT(k)[M(k)P (1t|1t-1)NT(1c) + Pv(k)]'1 u G a E x M(k)P (k‘k-l) . a Hence C(k) - P (klk-1>M?(k)tu(k)r (t|t-1)hT(k) +1300)”1 u a -gemhmghm. u (b) The filtering error is by definition 6k B Uk ' Gk, k - 0,1,2,eee e Substituting (3.38) for Gk, and rearranging the terms we get uk I uk - ilk-1 - G(k)[yk - M(k)0k+1] I 310']. I G(k)[M(k)fik-l + M(k) (“k ' uk-l) + vk] +Iuk - uk-l (3.40) - [In - G(k)M(k)]fik_1 + [In - <3(1c)h(1c)](uk - uk_l) - c:(lt)yk for k I 1,2,... . For k I 0, it is clear that fio I uo, since a - O. O A procedure similar to that used in (3.22b), shows that the stochastic process {fik’ k I 0,1,2,...}, which is generated by (3.40), is a zero-mean wide-sense‘Markov process. I! \ .Z. 58 To obtain an expression for the filtering error covariance matrix P (k), we note from (3.7) that G P~(k) - P~(k+1‘k) - Q(k) . u 11 Substituting (3.26) for P~(k+1‘k) and noting that u C(k) - c(1t+1|1c) - P~(k‘k-1)MT(k)[M(k)P~(k|k-1)MT(k) + Pv(1t)]"1 u u . P~ (k)MT (1t) P;1(k) u we obtain P_,(k) - P (k‘k-l) - C(k)M(k)P (k‘k-l) “ a ti . P~(k|k-l) - Pu(k)MT(k)P;1(k)M(k) P~(k|k-1) u u for k I 1,2,... . For k I O, P (0) I Ph(0), since 60 I uo. QED {I A block diagram of the filter is shown in Figure 3.2 which is a representation of (3.38). 
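In code the filter takes only a few lines. The sketch below is again illustrative only (Python with NumPy; the scalar model and all numerical values are invented). It performs one step of (3.38)-(3.41), using the fact established in the proof that the filter gain C(k) coincides with the single-stage predictor gain, and it propagates the covariances through P_ũ(k+1|k) = P_ũ(k) + Q(k).

import numpy as np

def filter_step(u_prev, P_pred, y, M, Pv, Q):
    """One step of the optimal filter (3.38)-(3.41).
    Inputs:  u_hat_{k-1}, P_u~(k|k-1) and the observation y_k.
    Returns: u_hat_k, P_u~(k) and P_u~(k+1|k) for the next step."""
    S = M @ P_pred @ M.T + Pv                    # innovation covariance
    C = P_pred @ M.T @ np.linalg.inv(S)          # C(k) = G(k+1|k)
    u_filt = u_prev + C @ (y - M @ u_prev)       # (3.38)
    P_filt = P_pred - C @ M @ P_pred             # (3.41), in the form used in the proof
    P_pred_next = P_filt + Q                     # P_u~(k+1|k) = P_u~(k) + Q(k)
    return u_filt, P_filt, P_pred_next

# Scalar illustration with invented numbers.
rng = np.random.default_rng(1)
M, Pv, Q = np.array([[1.0]]), np.array([[0.25]]), np.array([[0.05]])
u = rng.normal(0.0, 1.0, size=1)                 # u_1, with P_u(1) = 1
u_hat, P_pred = np.zeros(1), np.eye(1)           # u_hat_0 = 0, P_u~(1|0) = P_u(1)
for k in range(1, 15):
    y = M @ u + rng.normal(0.0, np.sqrt(Pv[0, 0]), size=1)     # y_k
    u_hat, P_filt, P_pred = filter_step(u_hat, P_pred, y, M, Pv, Q)
    u = u + rng.normal(0.0, np.sqrt(Q[0, 0]), size=1)          # martingale step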
By comparing Figures 3.1 and 3.2, we see that the single-stage predictor and filter for the stochastic process {“k’ k I 0,1,2,...} based on the same observation record have exactly the same structure: III” fiP I (In, G(°|°), In) , the single-stage predictor 2F - (In. c(-). In) , the filter . The computation, of filtering differs from the computation of single-stage prediction, in the determination of the filtering error covariance matrix. To compute the filtering error co- variance matrix P (k), one may use (3.41) or the following 8 59 equality (cf. (3 .7)): p (k) = P (k+1|k) -Q(1t) 1t - 0,1,2,... . ~ {I A u A Input of 2p :Dynamical System 21" Output of 8F E yk k‘k- C(k):k k_ 3 (1k 2k . Gk ‘E’ ‘n X m 2 1.. Observation Inno—. _Corr ction Current estimate vation I 140:) W I in“ Unit L . de- k x l M( kIl n m [Previous a estimate I kIlflvu, Mo-o Figure 3.2. Block diagram of filter. We now state without proof the result of Theorem (3.37) for the signal [xk I 900%. k ' oslszs-n) (Cf- (1J0). assuming that u 6 R(QT(k)), k I 0,1,2,... . k (3.42) COROLLARY. The filtering (2“, Pica), k I 0,1,2,... for the signal {xk I Q(k)uk, k I 0,1,2,...} is accomplished via the following equations: (a) The stochastic process {ik’ k I 0,1,2,...}, which is defined by the single-stage prediction estimate, is a zero-mean wide-sense Markov process, and is generated by the recursive equation (3.43) R - Q(k)§+(k-1)fik_1 + KOOIPk - H(k)9(k)9+(k-l)fi k k-1] for k I 1,2,... . The initial condition is *0 I 0, where K(k) is the n X m gain matrix and is given by (3.44) R(k) - P. (1011" (k) P;1(k) x 60 (b) The stochastic process {fik’ k I 0,1,2,...], which is defined by the filtering error ik that satisfies the linear stochastic difference equation (3.45) gk = [o¢*rk-1) - xn¢+kk-l)]§k_1 + [Q(k) - K(k)H(k)9(k)](uk - u - K(k)vk k-l) for k I 1,2,... with the initial condition i0 I x0 I §(0)uo, is a zero-mean wide-sense Markov process whose covariance matrix is given by the following equation: T (3.46) P;(k) = 4¢+P (k-m+ 9T x i T - + - P;(k)HT(k)Bvl(k)H(k)P (k‘k-l)§ (k-l)§T(k) x i for k I 1,2,... with the initial condition P§(O) I P;(O) I 4(0)ph(0)4T(0). 3.3 OPTIMAL SMOOTHING FOR BP. Recall from Chapter 1 that the optimal smoothing problem deals with estimate of the signal at a time k based on the observation record Y(L) I {y1,y2,...,yL}, where k‘<.L: that is, the time at which it is desired to estimate the signal precedes the time of the last observation yL. Just as in the case of prediction, the smoothed estimate of a signal is classified according to possible relationships between the two time indices, L and k. Depending upon how the time indices L and k vary, analogous to the prediction, three classes of smoothing can be defined (cf. [M-6], [M-7]): a.“ 61 (3.47) DEFINITION. (a) Fixed-interval.smoothing: G , L I L I H; fixed-positive integer, k I 0,1,2,...,L-1. (b) Fixed-point smoothing: fik‘L’ k I‘N I fixed-positive integer, 4, - n+1, n+2,... . (c) Fixed-lag smoothing: fi , L I k+L, L I fixed-positive kl; integer, k I 0,1,2,... . Before examining each of these classes separately in terms of developing algorithms, we seek a general formula to the optimal smoothing. First we recall from (3.18) that the observation process has full rank, so, by (2.27c) the optimal smoothing (flkJL, P%(klc)), k.< L is accomplished via the following equations with the initial condition (fik, f~(k)): u L 614; - 0k + “glam, i‘i-l)§1‘1_1 , ,, -1 “(1" 1‘1'1) " [“141-1' S7:1|1-1]w1|1-1’ 9'1|1-1] ' L P (klu - P (k) - z 00:. 1|1-1)[§ . 
a 3 a ti 1-k+1 1| 1'1 kl 1'1 where, by (3.18), ini-l ' yi ' M(imqiul and ... ~ '1‘ a 1 .. . [’ili-l’ yi‘H] M( )Pfiuli 1|)M (0+ Pvu) Notice that in these equations, which are valid for all the classes of smoothing, the only unknown is the n x m ‘matrix [uk‘i-l’ yi‘i-l] i I k+1,k+2,...,{,. This matrix is determined as follows: 62 vk, k I 0,1,2,... is a white-noise process and vi l'uj’ i,j = 0,1,2,... 0 3V1 J-fik‘i-l i=1,2’ooo, k-0,1’2,ooo a [“k|1-1’ yi‘i-l] [uk‘i-l’ M91‘1-1'“1‘1-1 ' G<1+1‘1)§1|1-1'+ + ui-l-l - “1] . [gunman-1] - c(k.i|1-1)[91‘1_1.fi1‘1-1] - [ak‘1_1,§1‘1_1]cr(1+1| 1) + + G(k,i| 1-1)[§1‘1_1,91‘ 1_1]cT(1+1‘ 1) 63 for i = k+1,k&2,... . In obtaining the last equality we used the fact that u - u for ki< 1 1+1 1 * fik‘i-l’ y1|1-1 which is an easy consequence of (1.4) and (A.1.l). Substituting (2.34) for C(k, i‘i-l) and (2.29) for G(i+l|i) into this last equation, and performing the necessary operations we obtain P60" 1+1“) = [filth-1’ 61‘1-1] ' [fik‘1-1’ 3"1‘1-1] [mi-1' y1‘1-11-1W1h-1’ “111-11 for i I k+l,k+2,... . Since ~ ~ d [uk‘1_1, ui|i_1] - P&(k, 1|1-1), [y1‘1_1,91‘1_1] . M(i)P%(i|i-1)M?(i) + 2v(1) by (3.18), [fik‘i-l’yili-l] = P;(k, 1|1-1)uF(1) by (3.48), u u we can express P (k, i‘i-l) as -.\\ fi ..¢~ P;(k, 1+1|1) - P;(k, 1|1-1) - P;(k, 1|1-13u3(1)tu(1)p (1|1-1)u?(1) + u u u E + Pv(i)]-1M(1)P~(i|i-l) u for 1 - k,k+1,... with p (k, 141.4) - r (k‘k-I) as the initial G a condition. This completes our derivation of a general formula for Optimal smoothing, and we summarize our results below. 64 (3.49) THEOREM. Optimal smoothing (fiklt’ P;(k‘L)), k4< L for u the signal u k I 0,1,2,... is accomplished via the following k9 equations with (fik, P;(k)) as the initial condition: u (a) Optimal smoothed estimate is given by the algebraic equation L (3.50) a - o + z C(k, 1 1-1)[y - M(i)fi 1‘“ L 1=1t+1 ‘ i ”(’11 where the n X m matrix C(k, i‘i-l) is (3.51) C(k, 1|1-1) - P (k, 1|1-1)hT(1)[u(1)P (1|1-1)uT(1) + Pv(1)]‘1. a h (b) The n x n cross-covariance matrix P (k, i‘i-l) satisfies a the recursion (3.52) P (k,i+1| 1) - P (k,i‘i-l) - P (k,iIi-l)MT(i)[M(i)P (i|i-l)MT(i) + ti ti 6 u + Pv(1)]")1(1)P (1|1-1) a for i I k,k+l,1,... with the initial condition P (k, k‘k-l) I G P (k‘k-l). E (c) Optimal smoothing error covariance matrix is given by the algebraic equation L (3-53) P (144) - P (k) - z: 60:, 1|1-1)u(1)P (1, 141-1) . a G 1-1 G This theorem provides a solution to general optimal smoothing problem for the signal u It is an easy’task to show k. that the optimal smoothing (at P (140), for the signal i k|(,’ xk I Q(k)uk such that uk 6 R(§T(k)), is accomplished via Theorem 3.49 where u is replaced by x and M1 is replaced by 8. Thus the optimal smoothing equations for the Kahan signal (cf. 65 (2.17)) are exactly those given in (3.49) with x replaced by u and H is replaced by M. This solution, for discrete-time, Kalman signals, is not known at the present time to the best of the author's knowledge. (For previous works in this area see e.g. [H-6], [K-Z], [M—S]. [K-l]. [W-5]-) The Optimal estimation equations given in Theorem 3.49 are obviously valid for all the classes of optimal smoothing. They may be written in different forms for each class as in the case of optimal prediction. In the following we shall give the forms of these equations for the single-stage smoothing (fixed- lag smoothing with lag 1) and fixed-point smoothing. SINGLE-STAGE SMOOTHING. Here we wish to obtain the optimal estimate 6 for k I 0,1,2,... . 
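Before specializing (3.49) to these cases, it may help to see the recursion (3.50)-(3.53) organized as a single routine. The sketch below is illustrative only (Python with NumPy); the argument names, the time-invariant M and P_v, and the convention that the filtered and single-stage predicted quantities have already been stored by the algorithms of Sections 3.1 and 3.2 are mine, not the thesis's.

import numpy as np

def smooth(k, ell, u_filt, P_filt, P_pred, u_pred, y, M, Pv):
    """Optimal smoothing (3.50)-(3.53) of u_k from the data Y(ell), k < ell.
    Assumed inputs, indexed by i:
      u_filt[i] = u_hat_i,        P_filt[i] = P_u~(i),
      u_pred[i] = u_hat_{i|i-1},  P_pred[i] = P_u~(i|i-1),
      y[i]      = observation y_i,
      M, Pv     = M(i) and P_v(i), taken time-invariant for simplicity.
    Returns (u_hat_{k|ell}, P_u~(k|ell))."""
    u_s, P_s = u_filt[k].copy(), P_filt[k].copy()
    # Running (3.52) once from its initial condition P_u~(k,k|k-1) = P_u~(k|k-1)
    # gives P_u~(k,k+1|k) = P_u~(k), so the running cross-covariance P_u~(k,i|i-1)
    # is started from the filtering covariance.
    P_cross = P_filt[k].copy()
    for i in range(k + 1, ell + 1):
        S_inv = np.linalg.inv(M @ P_pred[i] @ M.T + Pv)
        G = P_cross @ M.T @ S_inv                                  # gain (3.51)
        u_s = u_s + G @ (y[i] - M @ u_pred[i])                     # estimate (3.50)
        P_s = P_s - G @ M @ P_cross.T                              # covariance (3.53)
        P_cross = P_cross - P_cross @ M.T @ S_inv @ M @ P_pred[i]  # cross-covariance (3.52)
    return u_s, P_s

Taking ell = k+1 reproduces the single-stage smoother written out next, and holding k fixed while ell increases gives the fixed-point form.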
By letting L I k+l 14 H1 in (3.49) and noting (3.2), (3.37) we get the results: ak‘ H1 - 11k + C(k, k+1|1t)[yk+1 - M(k+l)€lk], C(k. k+1|k) - P~(k)MT(k-I-l)[M(k+l)P (k+1|k)uT(k+1) + Pvat-a-ln'1 u 5 P (It, k+1|k) a P (k) , s s P (k|k+1) = P (k) - C(k, k+l‘k)M(k+l)P (k) a G a for k I 0,1,2,... with the initial conditions Go‘o I 00 I 0, P~(o|o) - Pu(0). U FIXED-POINT SMOOTHING. Recall from (3.47b) that the optimal fixed- point smoothing problem deals with the estimate fiNIL where N I positive-integer and L I N+l,N+2,... . Let k I N be fixed in (3.49), then from (3.50) it is seen that 66 L-l “up, . “h - 1.3“ cm, 1|1-1)[y1 - “(1)°1|1-1] 4” Gm. L‘L-DUL ' “(UGL‘LJJ (3.54) = “Nu-1 + GCN. LlL-DEyL - M(L)uL|L_1] where, by (3.51), (3.55) G(N,L|L-1) - P~(N.L|L-1)MT(L)[M(L)P~(LIL-1)MT(L) + PVmJ"1 U u where P (N, L‘L-l) is given by (3.52). E To complete the derivation of optimal fixed-point smothing equations we need to find an expression for the smoothing error- covariance matrix P GN‘L). From (3.53) it follows that the u covariance matrix satisfies the recursion (3.56) P~(NIL) = P~(N|L-1) - cm, LIL-1)M(L)P~(L. le-l) u U U for 4, - N+l,N+2,... with P (N‘N) - P (N) as the initial con- fl 6 dition. Thus we have found that the optimal fixed-point smoothing k accomplished via the recursive equations (3.54), (3.55), (3.52) (ft-Nu: P (MD), L I N+l,N+2,... for the signal u is a and (3.56), with the initial condition (8“, P (N)). When equa- u tions are written for a Kalmsn signal,i.e. u is replaced by x (cf. (2.17)) and M is replaced by a, one obtains a new procedure in smoothing of Kalmsn signals. 67 3.4 AN EXAMPEE Let us consider a scalar stochastic process {xk, k I 0,1,2,...}, which has zero mean and whose covariance function is Pk(k,j) I e'(k+i), where o I constant > 0. Suppose that we observe this 0 process in the presence of a zero mean white noise process {vk, k I 1,2,...} for which 6{vk vj] I r(k)6kj and 6{vkxj}I0 Vk,j, where OSr(k) O . Since [uk - ui, ui] I a - a I 0 V k,i I 0,1,2,..., the stochastic process {uk’ k I 0,1,2,...] is a wide-sense martingale with zero mean and constant covariance function 0. Hence, I! _. \~. 68 xk I e'kuk, and it is a wide-sense Markov process. Now, we are ready to compute the Optimal estimation equa- tions for the process {xk, k I 0,1,2,...} which is observed by (3.57). We first note that since Ph(k) I a I constant, Q(k) fl Pu(k+1) - Ph(k) I 0 (cf. (1.4)). Therefore it follows from (3.17) that (3.53) P (k+1|1t) . P (k) W: - 0,1,2,... . a 6 Hence, the optimal single-stage prediction and filtering for the process {uk’ k I 0,1,2,...] are accomplished via the same set of equations (cf. (3.22), (3.37)). Note that for the process {xk, k I 0,1,2,...} we have (cf. (2.22), (3.58)) P~(k+1‘k) - .e'2(k+l)e"'2k P (k) x i (3.59) - e’2P (k) W - 0,1,2,... i We summarize the results below. The computations here are exceedingly simple hence the derivations of these results will not be demonstrated in detail. OPTIMAL REDICHON. In the following, N and L denote fixed- " positive integers (a) Fixed-interval prediction (cf. (3.3), (3.6)): For k - L+1’L+2’.OO . _ _ -(k-L) “k‘ L 0L ’ fik‘ L e gL -2 (k-L) P~(k| L) - P~(L). P~(k‘L) - e P~(L) u u 3! X 69 (b) Fixed-point prediction (cf. (3.13), (3.14)): a - a ’ - '01-!) “up, L ’ML 8 it p cm) - P (L) . P (ML) - 8-2m-L)P (t) 5 {I § § (c) Fixed-lead prediction (cf. (3.15), (3.17)): For L I 0,1,2,... fi 6 , \ 3 8-1%; , Hm, ' z, . girl-ML P__(L+1|L) - {(1).} P (mm - em”? (1.) . u u 3': 5E OPTIMAL SINCE-STAGE PREDICTION AND FILTERING. 
(a) Optimal estimate: From (3.38) (or (3.23)) we have Gk - a!“ + c;(1<)[yk - e'knk_1] =9 5%,, - e'kkd + x(h)[yk - e'lakd] for k - 1,2,... with ac - 00 (b) Gain matrix C(k): From (3.39) (or (3.38)) -k P..(1t)e Pi. (1t) G<*>"’*'?a?)' = ‘0‘) 'as' (c) Error covariance matrix: From (3.41) and (3.58) P~(k+1|k) a P~(k) u U P.,(k-1)e'2k - P (k-l) - 4:71? P (k-l) a e __P (It-1)+t(k) a a t (k) PG (k- 1) e-ZkP~(k- 1)+r(k) u (3 . 60) I for k I 1,2,... with P (0) I a. It is easily shown that (3.60) E with the initial condition P (0) I a has the unique solution {1' 70 1 (3.61) P~(k) I k _21 k I 0,1,2,... . u 2 e +.l i=1 r(i) 0 Hence, from (2.22) we have e.2k l (3.62)P§(k)I k e'2(i'k) em‘ I k e_2(1_k) eZk k I 0,1,2,... . + ._______. ___ if, r(i) a 151 1(1) *' a Notice that C(k) I 1 k e-(EI-k) ek ’ r(k) (131 W + r) (b), (3.61) and (3.62) l (k) ' k e-2(1-k) e2k ° r(k)( Z W + 7) iIl OPTIMAL SPOOTHING. We shall only derive the optimal estimation equations for general smoothing (cf. (3.49)). ‘We see from the equations given (3.49) that the only quantity that needs to be determined is P (k, j|j-l), the other quantities in these equa- u tions are computed above for the filtering. From (3.52) and (3.58) we have e-szfi(j-1) ..r4 P (k.1+1|1> - P (k.J|J-1)[1 - -:11 1 fi fi e Pg(J-1)+r(1) r(i) e'23P3(1-1)+r(1) ' 2~(ksj‘j'l) U for j I k,k+1,..., with the initial condition P (k,k|k-l) I u P (k‘k-l). It is shown that the tlnnique solution of this equation is a i- P~(k-1) n r(i) ‘1 iIl P (k,j|j-l) - 1-1 2 j 2 k+l . " - 1 “ n [e P3<1-1)+r<1>] iIk ‘b h 71 For the stochastic process {xk, k I 0,1,2,...} this equation becomes j-l e.(j+l)P_.(k-l)ig1 r(i) {awn-1) = j—_1———L j 2 1+1 . x n [P (i-l)+f(1)] iIk x This completes our discussion on the example. We note here that the above solutions for this simple example can be obtained by other techniques as well. One will obtain the same result. as»; CHAPTER 4 CROSS-CORREIATED NOISE HOBLEM (CCP) In this chapter, we derive the optimal estimation equations for the cross-correlated noise problem.(A.l.2), that is, the estimation equations for the process {xk I Q(k)uk, k I 0,1,...] where the process {uk+l - uk, k I 0,1,2,...} and output noise {vk, k I 0,1,2,...} are correlated (cf. Assumption A.1.2). As we did in the preceeding chapter, we shall derive the equations for the process {uk, k I 0,1,2,...} then use (2.22s) to obtain equations for the process {xk I Q(k)uk, k I 0,1,...}. 4.1 OPTIMAL PREDICTION FOR CCP We first derive a general formula of the prediction for the problem of interest then seek the form of this formla for the three distinct classes of the prediction problems introduced in Chapter 3. The general formula is derived easily by using the 5,; orthogonal projection lemma as follows: The linear minimum mean-square predicted estimate of the signal uk based on the observation record Y(L) is n Ok‘, (uk\L2(y.(,)) k > L where k,L . 0,1,2,... . By adding and subtracting the term “4+1 to uk and using the linearity of the orthogonal projection we get 72 73 . - _ n . n . “RH (uk u “III-2W .m + amazes» k > L. From the assumptions made in (1.4), (A.1.2) we have 1, and u “I. L+1 k ' uL+1 * Vt ’ n for all k > L. Therefore uk - “0+1 1 L2(y,L) and 6,4, - «Wham» ((4.1) I GL'H-‘L for k > L. From this result we observe that the general predicted estimate is equal to the single-stage predicted estimate. SO we need to develop an algorithm for the single-stage predicted estimate. From this algorithm‘we may obtain the three classes of prediction by using (6.1) as shown below. 
We now state the predictor (4.1) and the corresponding covariance matrices of the estimation errors for the classes of prediction, that is, the equations for prediction (“kit’ P;(k‘L)) k >.L are developed for the three distinct classes. The dzrivation of these equations is straightforward and will not be demonstrated here. FIXED-INTERVAL PREDICTION. Fixed-interval prediction (ak‘L’ P (k‘L)) k > L, L I fixed positive integer, is accomplished 6 via the following equations with the initial condition (a . P (L+1|L)): fi IH-l‘L (4'2) fik‘L ' GL+1‘L ’ (j 74 (4.3) P~(k|L) - P~(L+1‘L) + Pu(k) - Pu(L+1), 1t 2 L+1 u u k-I a P (L+1|L) + 2 (2(1)- 6 1-L+1 FIXED-POINT PREDICTION. Fixed-point prediction (111,”, P (M0) E L I 0,1,...,N-1, N I fixed positive integer, is accomplished via the following equations with the boundary condition (%’ P (N)): u 4.4 t. - a , ( ) N‘L H111. (4.5) P~(N‘L) - P~(L+1|L) + Pom) - Pawn . U U FIXED-LEAD PREDICTICN. Fixed-lead prediction (at +1., L, P (L+L|L)) E L I 0,1,2,... with lead L I fixed positive integer, is accomplished via the following equations with the initial condition (oL‘o, P~(L|o)) - (o, Pu(L)) (cf. Remark (3.10) of Chapter 2): u (4.6) a “1+th ' all; ’ (4.7) PE(L+L‘L) - P~(L+1‘L) + Pu(L+L) - Pu(L+l) . u Note that in order to compute the above three classes of prediction we need to know the fixed-lead prediction with lead one, i.e. the single-stage prediction. In the fixed-interval pre- diction, the only value of the single-stage estimate we must know 13 fl , where L is the end-point of the observation interval L+1| L and it is fixed. For the fixed-point and fixed-lead prediction we need to process the single-stage prediction algorithm as the data arrives. 75 In the following,wwe develOp a recursive algorithm for single- stage prediction based on the previous estimate and the new observa- tions To do this we need the innovation lemma for the observation process defined by (1.5) and (A.1.2). (4.8) INNOVATION LEMMA. The observation process {yk’ k I 1,2,...} defined by (1.5) and (A.1.2) has full rank, i.e., Wk+1|h5h+1|k3 . M(k'l-l) Pfi(k+l|k)MT (PM) + Pv(k+l) > o for all k - 0,1,2,... . PROOF. By definition d y1t+1|lt ' yk+l ' (yk+1‘1'(y‘k))’ Since yk I M(k)uk + vk, we have - M(ki1)8 k I 0,1,2,... yea-int ' y1t+1 1t+1|1t I °k+1|k ' Recall from (1.4) and (A.1.2) that v , j I 0,1,2,... is a white- .1 noise process such that [uk+1 - uk, VJ] I C(k)6kj and uo .Lvj 1,1 for all k,j I 0,1,2,... . It follows that (4.9) [uk, v1] I O V'k s j, k,j I 0,1,2,... . Thus VH1 1 L(y;k) and therefore (le|L(y;k)) - 0. Then the innovation vector is (4.10) yk-i-l‘k I yk+l - M(k-l-1)(ik+1‘k I M(k'Il-l)il'k+1‘k + vb”, k I 0,1,2,... 76 Using the expression (4.10) for the innovation vector we Obtain Wk-tl‘k’yk-u‘k] B [M(k+1)uk+l|k + le'ua‘flmmqk "' VH1] - M(k+l)P (k+1‘k)MT(k+1) + Pv(k+l) s for k - 0,1,2,... . s1hee P (k-I-l‘k) 2 o and by (A.1.2), a Pv(k+l) > o the matrix ” 0 V k I O l 2 ... [yk-i-l‘k’ 9k+l‘k] > ’ ’ ’ as desired. QED (4.11) THEOREM. The single-stage prediction (0 , P (1t+1|1t)) . 1t+1‘1t u k I 0,1,2,... of the signal {uk, k I 0,1,2,...] is accomplished as follows: (a) The stochastic process {0k+l‘k’ k I 0,1,2,...}, which is defined by the single-stage predicted estimate °k+1‘k’ (4.12) fik+1|k - °k\k+1 + G(k+1‘k)[yk - M(k)Ok‘k_1] for k I 1,2,... with fil‘o I O as the initial condition, is a zero-mean wide-sense martingale. The predictor gain matrix C(k-i-l‘k) is (“'13) ““1“" " [P (Rik-D1300 + C(k)][ump (k|k-1)MT(k) + oncn'l. 
a a (b) The stochastic process {fik+l|k’ k I 0,1,2,...}, which is defined by the single-stage prediction error fik+l‘k satisfying the linear stochastic difference equation (4.14) 6H1“, - [In - G(k+l‘k)M(k)]t"i - c:(1t+1|1t)yk + u - u k‘k-l k-l k L] 77 for k I 1,2,... with the initial condition 3 I ul, is a wide- l‘o sense Markov process. The covariance matrix for this process is determined by the recursive equation (4.15) P (h+1|k) - P (k‘k-l) - [P (k‘k-1)Mi(k) + C(k)][M(k)P (k‘k-1)M1(k) + a a a a R(k)]-1[M(k)P;(k‘k~1)MI(k) + CT(k)] +-Q(k) u for k I 1,2,... with P (1‘0) I Ph(l) as the initial condition. 5 PROOF. (a) The proof of the fact that the process {fik+1‘k’ k I 0,1,2,...} is a zero-mean wide-sense martingale is similar to the one that is given in (3.22s) and hence omitted. From (2.278), (4.1) and (4.10) it is seen that, o + G(l¢+1‘k)[yk - M(k)0 k+1‘k ' fik‘k-l k‘k-l] for k - 1,2,..., and b - o for k - o, a1hee by (4.8) the l‘o observation process has full rank. Now, it remains to find an expression for the predictor gain matrix G(k+l‘k). From (2.29) and (4.3), we have C(h+1|k) - [ 51 ~ -1 “k+1\k-1’ yk‘k-l][yk‘k-l’ yk‘k-IJ e [fik+l\k-l’ 9k‘k_1][M(k)P%(k‘k-1)Mr(k) +-Pv(k)]'1. To complete the determination of the gain matrix we must compute the matrix [uk+1‘k-l’ yk‘k-l]° This is done as follows: 78 [Gk-tl'k-l’ yhut-1] " ["kn ' °k+1|k-1’ 9141M] ' [“k+1’ yapt-11' ”in“ yk‘k-l ‘ Ok‘l-l‘k-l e L;(y‘k'1) . [uk+1, 11006141“) + [uk+1, vk], by (4.10) 2 [uk, fik‘k_1]MT(k) + (“kn - uk, vk], by (4.9) (4.16) = P~(k‘k-1)MT(k) + C(k), a1hee u U k ' fik‘k-l I “1410-1 and ak‘k-l .L nk‘k-l . Substitution of this result into the expression for the predictor gain matrix, yields G(k+1,k) I [P~(k‘k-1)MT(k) + C(k)]EM(k)P (k‘k-1)MT(k) +'Pv(k)]-l. u 5 (b) By definition .. g _. _ uk+1‘k uk+1 uk‘i’l‘k’ k 0,1,2,... Substitute (4.12) into this equation, and rearrange the terms to Obtain fik+1‘k ' “H1 ' Ok‘k-l ' C(Hl‘k)[yk ' Mok|kIIJ 1,1 - fik‘k_1 - G(k+1‘k)[M(k)fik‘k_l + vk] + ulc+1 - uk (4.14) - [In - G(k+1|k)M(k)]fi' - c;(h+1|1t)yk + u - u k‘k-l 1t+1 k for k I 1,2,... . For k I O, obviously O I ul, since l‘o A procedure analogous to the one that is used in (3.22b) shows that the stochastic process {fik+l‘k’ k I 0,1,2,...} is a zero-mean wide-sense Markov process. The covariance matrix of this process is given by (2.30): I! ‘5 ‘1‘ 79 P~(k+1‘k) - P~(k+l‘k-l) - c°1|1~1 and If the steps which lead to (3.48) and (3.52) are repeated for the present case, their results [uk|i-l’yi|i-1]- Pfi (k, i|i-1)MT (i), by noting v11.uk‘1_1 and P~(k,i+l‘i)IP (k,1|1-1)-P (k,i‘i-1)Mx(i)EM(i)P (1|1-1)M?(1)+Pv(1)]'bu(1)P (1|1-1) u E E E E for 1 - 1t, 1t+1,... with the initial condition P (1t,1t|1t-1) - 11 P (k‘k-l). a (1 Thus we have found the following Optimal estimation equa- t1ohs for the smoothing (0144' P (km) k < 4, with the 1h1t1a1 a condition (0k, P (k)): O L (4.27) 8k|LIfik+ 1: cat, 1|1- l)[yi- M(i)0= 1-k+1 1‘1'1] (4.2s) c(1t,1|1-1) - P~(k,i‘i-1)MT(1)[M(1)P (i‘i-1)MT(i) + Pv(1)]"1 u 6 (4.29) P (k,i+1‘ i)IPfi(k,i|i-l)-P (k,i‘i-1)MT(i)[M(i)P (i‘i-1)MT(i) a s s + Pv(1)]'1M(i)Pfi(i‘i-l), P (1t,1t\1t-1) - P (Mk-I): a a ll .... ‘\.s 86 1. (4.30) P (km I P (k) - z G(k,i|i-l)M(i)P (1t,1|1-1). a 11 1'10”. 6 Note that these equations have exactly the same form of those obtained for the BP (cf. (3.50)-(3.53)). The cross-correla- tion effecusthese estimation equations only through the single- stage prediction-error covariance matrix P (i‘i-l). We expect 6 1 is correlated with 111 if j 2 1+1. 
The optimal smoothing equations (4.27)-(4.29) for the signal u_k, which are valid for all the classes of smoothing, hold for the signal x_k = Φ(k)u_k with u_k ∈ R(Φ^T(k)) and hence for Kalman signals (cf. (2.17)). The form of these equations for single-stage smoothing and fixed-point smoothing is easily derived by repeating the steps leading to the analogous results given in Chapter 3. The equations derived here for Kalman signals extend well-known results (cf. [M-4]) to the cross-correlated noise case, which was not previously solved to the author's knowledge.

SINGLE-STAGE SMOOTHING. Optimal single-stage smoothing (û_{k|k+1}, P_ũ(k|k+1)), k = 0,1,2,..., for the signal u_k is accomplished via the following equations with (û_0, P_ũ(0)) = (0, P_u(0)) as the initial condition:

û_{k|k+1} = û_k + G(k,k+1|k)[y_{k+1} - M(k+1)û_{k+1|k}] ,
G(k,k+1|k) = P_ũ(k)M^T(k+1)[M(k+1)P_ũ(k+1|k)M^T(k+1) + P_v(k+1)]^{-1} ,
P_ũ(k,k+1|k) = P_ũ(k) ,
P_ũ(k|k+1) = P_ũ(k) - G(k,k+1|k)M(k+1)P_ũ(k) .

FIXED-POINT SMOOTHING. Optimal fixed-point smoothing (û_{N|L}, P_ũ(N|L)), L = N+1,N+2,..., for the signal u_k is accomplished via the following equations with (û_N, P_ũ(N)) as the initial condition:

û_{N|L} = û_{N|L-1} + G(N,L|L-1)[y_L - M(L)û_{L|L-1}] ,
G(N,L|L-1) = P_ũ(N,L|L-1)M^T(L)[M(L)P_ũ(L|L-1)M^T(L) + P_v(L)]^{-1} ,
P_ũ(N,L+1|L) = P_ũ(N,L|L-1) - P_ũ(N,L|L-1)M^T(L)[M(L)P_ũ(L|L-1)M^T(L) + P_v(L)]^{-1} M(L)P_ũ(L|L-1) ,    P_ũ(N,N|N-1) = P_ũ(N|N-1) ,
P_ũ(N|L) = P_ũ(N|L-1) - G(N,L|L-1)M(L)P_ũ(L,N|L-1) .

CHAPTER 5
COLORED NOISE PROBLEM (CNP)

Having treated the problem of optimal estimation for uncorrelated and cross-correlated observation noises (cf. (A.1.1), (A.1.2)) in the preceding two chapters, we turn now to the estimation problem for cross-correlated colored observation noise. This is the most general problem studied in this dissertation.

The colored noise problem in Kalman filtering theory was first discussed by Cox [C-1], Bryson and Johansen [B-5], and Bucy [B-9], and by Bryson and Henrikson [B-6], Stear and Stubberud [S-4] and others [M-7], [Z-5], [F-2]. Bryson and Johansen's work was based on (i) the "augmented state" procedure suggested by Kalman [K-4], and (ii) the assumption that the colored noise is generated by a given linear difference (or differential) equation forced by white noise. The augmented state procedure has not been widely used because it leads to ill-conditioned computations in constructing the filter. Assumption (ii) has been used in all investigations published to date.

Here we solve the colored noise problem by reducing it to a wide-sense martingale noise problem and then applying the technique developed in the preceding chapters. This is a more direct method than the previous work in this area; it gives a different perspective and yields new results. An advantage of this approach is that the colored noise, which is a wide-sense Markov process by assumption, need not necessarily be given by a linear difference equation with white noise input. In addition, the "martingale" approach provides optimal estimation algorithms for observations corrupted by additive wide-sense martingale noise. This case has not been considered in the existing literature as far as the author knows. We shall begin our study by reformulating the problem in the following section.

5.1 REFORMULATION OF THE PROBLEM

Recall from Section 2 of Chapter 1 that the signal and observation are described by, respectively,

(5.1)    x_k = Φ(k)u_k ,    k = 0,1,2,...
(5.2)    y_k = M(k)u_k + v_k

where

(5.3)    v_{k+1} = ψ(k+1,k)v_k + n_k ,    k = 0,1,2,... .
The assumptions on the initial conditions x_0 (or u_0), v_0 and the output noise are the same as those stated in (1.4) and (A.1.3). The matrices Φ(k) and M(k) have been defined in (1.4) and (1.5) respectively. Note that the unique solution of (5.3), with the initial condition v_0, is

(5.4)    v_k = ψ(k,0)v_0 + Σ_{i=1}^{k} ψ(k,i)n_{i-1} = ψ(k,0)[v_0 + Σ_{i=1}^{k} ψ(0,i)n_{i-1}] ,    since ψ(k,i) = ψ(k,0)ψ(0,i) ,

for k = 1,2,... . Now, define

(5.5)    m_k ≜ v_0 + Σ_{i=1}^{k} ψ(0,i)n_{i-1}  if k = 1,2,... ,    and    m_0 ≜ v_0 .

Then it follows that m_k satisfies the linear stochastic difference equation

(5.6)    m_{k+1} = m_k + ψ(0,k+1)n_k

with m_0 = v_0 as the initial condition. By Assumption A.1.3, v_0 ⊥ n_k for all k = 0,1,2,...; hence, in view of (2.14), the stochastic process {m_k, k = 0,1,2,...} is a wide-sense martingale process with zero mean, and also

(5.7)    [m_{k+1} - m_k, m_{k+1} - m_k] = ψ(0,k+1)P_n(k)ψ^T(0,k+1) ,    k = 0,1,2,...

where P_n(k) ≜ [n_k, n_k]. In addition,

(5.8)    [u_{k+1} - u_k, m_{j+1} - m_j] = C(k)ψ^T(0,k+1)δ_{kj} ,    k,j = 0,1,2,... ,

since, by (A.1.3), [u_{k+1} - u_k, n_j] = C(k)δ_{kj}. Finally we note from (1.4) that [u_0, n_k] = 0 for all k = 0,1,2,..., which implies

(5.9)    [u_0, m_{k+1} - m_k] = 0 ,    k = 0,1,2,... .

From (A.1.3) and (5.5) we conclude the following useful results:

(5.10)    u_{k+1} - u_k ⊥ m_j    and    m_{k+1} - m_k ⊥ u_j    for all j ≤ k, k,j = 0,1,2,... .

Observe from (5.4) and (5.5) that v_k = ψ(k,0)m_k, k = 0,1,2,... . Hence, (5.2) and (5.3) can be combined in one equation as

(5.11)    y_k = M(k)u_k + ψ(k,0)m_k ,    k = 1,2,... .

The problem is thus to find the minimum mean-square estimate of the signal x_k = Φ(k)u_k, or, equivalently, in view of Section 3 of Chapter 2, of the signal u_k, from the data Y(L) = {y_1, y_2, ..., y_L} when u_k is related to the data Y(L) by (5.11). The solution to this problem is given in the following three sections.

5.2 OPTIMAL PREDICTION FOR CNP

As in the preceding two chapters, we first derive a formula for the general predicted estimate û_{k|L}, k > L. To do so, we proceed as follows:

û_{k|L} ≜ (u_k | L^n_2(y;L)) ,    by the orthogonal projection lemma,
        = (u_k - u_L + u_L | L^n_2(y;L)) = û_L + (u_k - u_L | L^n_2(y;L)) ,    k ≥ L, k,L = 0,1,2,... .

From (1.4) and (5.10), we see that u_k - u_L ⊥ L^n_2(y;L). Therefore

(5.12)    û_{k|L} = û_L ,    k ≥ L, k,L = 0,1,2,... ,

where û_L is the optimal filtered estimate at the present observation time L. It is assumed that the filtered estimate û_L is obtained by using the filtering algorithms which will be derived in Section 3 of the present chapter.
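As a quick numerical check of the reformulation of Section 5.1 (the check is mine, not part of the thesis), the sketch below simulates a time-invariant version of (5.3) with invented matrices Ψ, M and P_n, and verifies that m_k = ψ(0,k)v_k obeys the increment relation (5.6) and that the rewritten observation (5.11) agrees with (5.2). Here ψ(k,j) = Ψ^{k-j}, and Ψ is taken invertible, as a transition matrix must be.

import numpy as np

rng = np.random.default_rng(2)
m = 2
Psi = np.array([[0.9, 0.1], [0.0, 0.8]])           # psi(k+1,k), constant in this toy model
Psi_inv = np.linalg.inv(Psi)
M = np.eye(m)
Pn = 0.1 * np.eye(m)                               # P_n(k)

def psi(k, j):
    """Transition matrix psi(k,j) of (5.3) in the time-invariant case."""
    p = k - j
    return np.linalg.matrix_power(Psi, p) if p >= 0 else np.linalg.matrix_power(Psi_inv, -p)

K = 10
noise = rng.multivariate_normal(np.zeros(m), Pn, size=K)    # n_0, ..., n_{K-1}
v = np.zeros((K + 1, m))
v[0] = rng.normal(size=m)                          # v_0
for k in range(K):
    v[k + 1] = Psi @ v[k] + noise[k]               # (5.3)

# m_k = psi(0,k) v_k satisfies m_{k+1} = m_k + psi(0,k+1) n_k, i.e. (5.6).
mart = np.array([psi(0, k) @ v[k] for k in range(K + 1)])
for k in range(K):
    assert np.allclose(mart[k + 1] - mart[k], psi(0, k + 1) @ noise[k])

# The observation (5.2) can be rewritten as (5.11): y_k = M u_k + psi(k,0) m_k.
u_k = rng.normal(size=m)                           # an arbitrary signal value u_k
k = 5
assert np.allclose(M @ u_k + v[k], M @ u_k + psi(k, 0) @ mart[k])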
Optimal fixed-lead prediction (fiIHL‘L’ P (L+L‘L)), L I 0,1,2,..., L I fixed-positive integer, is given E by the following equation with (0140, P (L|o)) - (o, Pu(L)) as E l! ~I“ \ ‘wd 93 the initial condition: 8 (5°15) uLfl‘L '3 L s (5.16) Puar'tm - {(16) + Puma.) - Pun) . 11 Consider the special case L I l, i.e., the single-stage prediction (0L+11L’ P&(L+1‘L)): “um ' 6:, ’ P3141111.) - P51.) +90.) . u u As in the preceding two chapters, we now develop the recursive algorithms for this case based on the last observation and previous predicted estimate. To do so, we need the following lemma which was called the innovation lemma in the previous chapters. (5.17) INNOVATION IEMMA. The observation process {yk’ k I 1,2,...] which is defined by (5.11) has full rank, i.e. [yk+1|1t’ yk+l|k1 > 0 ° PROOF. We first derive an expression for the innovation vector yk+l‘k' Doing so, we note by definition PM” - VH1 - swims» . Since, by (5.11), yk-I-l I M(k+l)uk+1 + fi(k+l,o)mk+1 we have (5'18) S"1t+1|1t " y1t+1 " “(humans ' “Hlmmk-tqk for k I 0,1,2,... . Note this expression involves the optimal I! I \m‘ 94 single-stage predicted estimate of the observation noise mk. This estimate is computed as follows: a‘1t+1\1t a (mk+1‘L(y‘k)) (mkfl - Ink + mk|L(y;k)) (mk|I-(y;k)). since, by (5.10), mk+1 - “‘1t 1 L(y;k) No.10 (1(k.o)mk\L(y;k)). by 1(o.k)1(k.o) - IIn and (2.2191) I H0.1<)(yk - M(k)uk\L(y;k)). by (5-11) (5.19) I y(o,k)[yk - M(k)fik] , since yk E L(y;k) . Substituting (5.19) into (5.18) and noting that 8k I and °k+1|1t y(k+l,o)y(o,k) I y(k+l,k) we get the result - M(k+l)8 - '(k+1’k)[yk.- M(k)8 91+” k ' y1t+1 1t+1\1t 1t+1|1t] (5.20) - yk+1 - V(k-i-l,k)yk - [M(Hl) - '(k+l,k)M(k)]Ok+l‘k for k I 0,1,2,... . Note that to process the innovation vector 9k+l‘k we need the observation vectors yk*1, yk (i.e. the last and preceding observation vectors) and the single-stage predicted estimate based on the observation record Y(k). In order to prove that the observation process has full rank, in view of Section 3 of Chapter 2, we must show that 8"le 71.1111 > 0. v1: - 0,1,2,... . From (5.20) and (5.11), it follows that ~'~ \ “nu 95 I (M(k-l-l) - s(k+l,k)M(k))fi' + §(k+l,k)M(k) (u S"la-I‘lt k‘l'l‘k 1t+1 ' uk) + t(k+l,o)(mk+1 - mk) (5.21) - mansion” + 1(1t+1,1t)11(1t)(uk+1 - uk) + 1(1t+1,o)(mk+1 - mk) for k I 0,1,2,..., where the definition (5.22) iota) - M(k+l) - '(HIJOMOC) is made as a notational convenience. Using (5.21) we may write wk+111t51t+1|RJIEmkak-tuk 4" t1T . by (5.7) we obtain the result l! ~--§. \ ‘wd 96 [yk+1‘k,yk Mk] - M(k+l)Pfi(k+l\k)MT(k+l) + §(k+1)Q(k)uT(1t)yT(k+1,k) + fi(k+1)c(1t) + 1(k+1,k)M(1t)Q (k)MT (1t+1) + 1 (k+l,k)M(k)Q (k)uT(k)1T(1t+1,k) + y(k+1,k)M(1t)C(1t) + CT (1t)fiT (1t+1) + cT(k)MT(1t)¢T(1t+1,1t) + Pn(k) (5.23) - Roan) P~(k+1‘k)fiT (1t+1) + M(k+l)Q (k)MT(k)¢T(1t+1,1t) u + 1(1t+1,k)M(k)Q(1t)MT(1t+1) + M(k+l)C(k) + CT(k)MT(k+1) + Pn(k) - 1n(k><2 (k)MT 1T(k+1.k> for k I 0,1,2,... . From (5.16) we have P (k+l‘k) I P (k) +-Q(k). fi “11' Substituting this into (5.23) and simplifying the result using (5.22), we get (5.24) [§Hl‘k’yk+l‘k] - fi(1.+1) P~(k)fiT (1t+1) + u (1t+1)Q(1t)uT(1t+1) U + M(Hl)C(k) + CT(k)MT(k+1) + Punt) for k I 0,1,2,... . It is clear that this m X‘m matrix is in- vertible if M(k+l)c(k) + CT(k)MT(k+1) + Pn(k) > o . By assumption A.l.3 this is so, therefore WHI‘P’ yk‘i‘l‘k] > o as desired. For notational convenience, we shall denote this matrix by P~ (1t+1| 11) . QED y ll. ~.~ “wad 97 (5.25) THEOREM. Optimal single-stage prediction (11 P (1t+1\1t)) a 1t+1|1t' k I 0,1,2,... 
for the signal u is accomplished as follows: k (a) The stochastic process {Bk+l|k’ k I 0,1,2,...}, which is defined by the single-stage prediction estimate, is a zero mean wide-sense martingale, and is generated by the recursion (5.26) 6W,k - 11M,“1 + c<1<+1|1<>tyk - 11ua<-1>. -c1. 163 and r- - “1t “1911 w 9. - 16 “'k “fit-1 Lqu - “it; Then (5.26) can be written as fik‘l‘l‘k - “10614191 + rmwk 1t - 1,2,... From the definitions it is clear that F(k) is invertible and the stochastic process {w , k I 1,2,...] is a white-noise process with zero mean and that T r(we-1) cot-1)) (O,k) o T [wwwj] . ,(o,k)cT(P-1) 1(o,k)Pn(1t-1)¢T(1t,o) o 51.5 . L. o 0 0(li I 100 Furthermore, since M(O) I O and C(l‘o) I O, fil‘o I u1 l.w1 and therefore fil‘o .Lwk V k I 1,2,... . Hence, the stochastic process {fik+1\k’ k I 0,1,2,...] is a zerowmean wide-sense'Markov process, by virtue of (2.17). Now it remains to find an expression for the single-stage prediction error covariance matrix P (k+l‘k). From (2.30) we have 5 P~(k+1‘k) . P~(k+1‘k-1) - c(lt+1|1t)[yk‘k_1, u u ti1416-1] for k I 1,2,... with P;(l‘o) I Ph(l) as the initial condition. Substituting (5.27) for :(k+l‘k), (5.30) for [yk‘k_1,sk‘k_1]T, and noting from (5.26) that P~(k+1‘k-l) - P~(k‘k-l) +Q(k) U U we obtain U P (1t+1|k) - P (k‘k-l) - [P (k‘k-1)fir(k) + cT(1t-1)_]P'1(1t|1t-1)[f1(1t)P (k‘k-l) .. s s 9 a + CT(k-1)] +Q(k) for k I 1,2,... . QED This theorem gives the recursive algorithm for computing w the single-stage prediction. The information flow in the predictor is shown in the Figure (5.1) which is a representation of (5.26). We observe that the predictor (5.26) requires the storage of one observation. This is the only difference between the computational procedure of the predictor given in Figure 5.1 and the preceding two given in Chapters 3 and 4. We now write the optimal single-stage prediction equations for the signal. xk_I {(k)uk’ by assuming that “k 6 R(§T(k)) .56 now acuouooua owwuauoawsao .uOu Sumac seq—n {Wm enough - a x a 3.2.5 o m A3! 0 0' —HQ 0 OOOONOH.3 nauseous» Onion oeuOuoo 333$ - mwo osoueoum sxfl — 533 3.5 cs: 3.5 segues» . a is 83:83 - s x s , x .23 i cocoon.— a3 sateen—o ‘1‘ Dion 6 - x 3.5 2.1.5 . 6 4 Monaco“:— amounted we.“ a Hue.“ sec 0 It. ‘0“. \ ‘sml 102 v'k I 0,1,2,... . The derivations of these equations are straight- forward and are omitted. Optimal estimate: (5.31) 2H1“, = (1t-1)1+(k) . Error covariance matrix: T (5.34) P~(k+l|k) a 6(k+1)6+(k) P~(k‘k-1)§+ (1t)1T(1t+1) x x ~ +T '1' T - K(k+1|k)[H(k)P~(k‘k-l)§ (k) + c (k-l)]§ (1t+1) x + 1(1t+1)Q(1t)6T(1t+1) for k I 1,2,... with P (l‘o) I Px(l) as the initial condition. 51 We remark.here that the proposed single-stage predictor (5.31)-(5.34) for the signal xk.- 9(k)uk is of dimension n instead of (n+m) as in the augmented state predictor given by Bryson and Johanson [8-5] who considered a smaller class of signals (Kalman signals). The results of this chapter also extend Bryson and Henrikson's [B-6] work to cross-correlated colored noise in the larger signal class. The wide-sense martingale approach we develop is conceptually and computationally simpler. 5.3 OPTIMAL FIIJ‘ERING FOR (NP We continue our study by an examination of the optimal filtering problem. We wish to develop an algorithm for optimal filtering of the signal' u In doing so, we assume that only k. 103 the initial estimate 6010 I 00, and the filtering error covariance matrix at the initial time, P~(o\o) I P (0), are known. 
As in Chapter 3, fromr(5.12) weuobserve that optimal pre- diction and filtering are interdependent in terms of the determina- tion of the filtered estimate given the predicted estimate and vice-versa. In fact Thus the single-stage predicted estimate algorithm (5.26) can be used to process the optimal filtered estimate. So, in order to solve the filtering problem, it remains to find a recursive expression for the filtering error covariance matrix. This is done in the following theorem. (5.35) THEOREM. Optimal filtering (0k, P~(k)) k I 0,1,2,... u for the signal u is accomplished as follows: k (a) The stochastic process {flk’ k I 0,1,2,...}, which is defined by the filtered estimate, is a zero-mean wide-sense martingale, and is generated by the recursion (5.36) fik I fik-l +‘C(k)[yk ' f(k,k-l)yk_1 "M(k)°k-1] for k I 1,2,... with 00 I 0 as the initial condition. The n X m matrix C(k) is given by (5.37) out) - [P~(k\k-1)MT(k) + c(k-1)]P'1(1t|1t-1) u I where P (k‘k-l) is given by (5.24) . y (b) The stochastic process {fik’ k I 0,1,2,...], which is defined by the filtering error Gk, given by 104 (5.38) 11“, I [In-C(k)fi(k)]fik-I+[In-G(k)M(k)](uk-uk-l) + 6001090) (Ink - mk_1) for k I 1,2,... with the initial condition 60 I no, is a zero- mean wide-sense Markov process. The covariance matrix of this process is given by the recursion (5.39) P~(k) I P~(k‘k-l) - G(k)[M(k)PL(k|k-l) +-CT(k)] u u U for k I 1,2,... with P;(0) I Pu(0) as the initial condition. u PROOF. (a) It follows from (5.12) and (5.25s). (b) By definition 3 I u - G k k k, k'0,1,2,... 0 Substitute (5.32) for 0k to obtain C! I k uk - (1k_1 - cat)[yk - 1(k.k-l)yk_1 - fi(k)iik_,1] . fik- l-l'uk-uk_ 1-c(1t)[u(1t)uk-1 (k,1t-1)M(1t- 1) uk_ 1i(k)8k_ 1 + '(ksoflnk ' *(k307mk_1] ak_1-cnk_,-M (“1‘“).-1) + §(k,o) (Ink - mk_1)] + uk - uk_1 (In-C(k)fi1fik_ 1+1 In-c (mum (uk-uk_1>+c(k)1 (km) (Ink-m1“ 1) for k I 1,2,... . Obviously uo I no - 80 I no, since 00 I O. Utilizing the same procedure as that in (5.32b), we prove that the stochastic process {fik’ k I 0,1,2,...} is defined by (5.34) is a zero-mean wide-sense Markov process. ~,~‘ \ "\.-l 105 To complete the proof, notice from (5.16) that P (k) = P (1t+1|1t) - Q(k) k - 0,1,2,... . fi 6 Substituting (5.29) for P (k+l‘k) and noting that G(k+l‘k) I C(k), ll we obtain P~(k) - P~(k‘k-l) - C(k)[ii(1t)P~(k‘1t-1) + CT(k-l)] U U u 01' P~(k) = P~(k-1) - G(k)[M(k)P~(k-l) + fi(k)q (16-1) + cT(1t-1)] +Q(k-l) u u u for k I 1,2,..... For k I 0, it is clear that P;(O) I Pb(l) by (2.27b). u QED We see from the theorem that, as we expected from the results of Chapter 3, the proposed filter has exactly the same dynamical structure as the single-stage predictor discussed in the preceding section. Thus, the same dynamics can be used to generate the filtered and predicted estimates. The error covariance matrices of these estimates can be computed by the recursive equations (5.29) and (5.39) or after one of them is computed by one of these equations then the other follows from the relation (cf. (5.16)): P~(k+l‘k) - P~(k) +Q(1t) k - 01,2..... . u u The optimal filtering equations for the signal xk I §(k)uk, such that uk 6 R(QT(k)), is obtained from (5.35) by making use of (2.22) as follows: Optimal estimate: (3.40) 11, - C(k)§+(k-1)Rk_1 + wow, - 1(k.k-1)yk_1 - fi(k)fik_1] 106 for k I 1,2,... with the initial condition so I 0, where (3.41) we =1P~ +P]. P (k,k+l\k) - P (k). u .. u L (5.47) P (147,) - P (k) - z-w c(1t,1|1-1)fi(1)P (1,1t|1-1) . 
a a lIk+1 11 These optimal estimation equations for the signal uk, which are valid for all the classes of smoothing hold for the signal xk I §(k)uk such that uk 6 R(§+(k)) as in the preceding two chapters when u is replaced by x and M1 is replaced by H which is defined by (5.33). For the class of Ralman signals, the proposed smoother, i.e., (5.44)-(5.47) is apparently new. The forms of the optimal estimation equations (5.44)- (5.47) for the single-stage and fixed-point smoothing can easily be obtained by repeating the steps leading to analogous results in Chapter 3. We state only the results. SINGlE-STAGE SMOOTHING. Optimal.single-stage smoothing (1114“,. Pfi(k|k+1)) 1t - 0,1,2,... for the Signal uk is accomplished via the following equations with (00, P (0)) I (0,Ph(0)) u as the initial condition: fik‘k‘l-l I 0k + G(k,k+1‘k)[yk+1 ‘ 1(k+1.k)yk ' fi(k+1)ok+1‘k] C(k,1t+1|1t) - P (k)fiT(1t+1)P'1(1t+1|1t) fi 9 P (k,k+l‘k) - P (k) a s P (k‘k-i-l) - P (1:) - c(1t,1t+1\1t)fi(1t+1)P (k) a a u where all the terms are defined as before. 110 FIXED-POINT SMOOTHING. Optimal fixed-point smoothing (“W7 P (u|L)) U L I N+l,N+2,... NII fixed positive integer, is accomplished via the following equations with the initial condition (“11’ P 01)): ti “ as ‘1' '1 " a ’1 ~ 6k” (ML-1 601,111. ml, ((1.1. )YL_1M(L)0L‘L_11 comm-1) - PGCN.L‘L-1)§T(L)P:1(L‘L-1) y P (N.L+1\L) I P (Nah-1) _ C(N.LIL-1)[i(L>P (1116-1) + era-1)] s s a P (N,N+1‘N) I P (N) s a P 0111.) I P (ML-1) - G o a (1 + 111311'51)’1 - I - IMT(MPMT + 11)")1 . PROOF. Since (I + st'lm (I - memT + 10'1“) - I + MIR-1M - 311.0131: + 10-111 - mra'lumrmmr + 10411 - I + mTR'l‘M - me'lm «1 11111501111"3 + 11) "111 I I, and similarly (I - memT + a)'h1) (I + st'lu) - I, by the definition inverse of a matrix (I + mTR'lhfl - I - Putnam! + 10'111 . QED (3.2) mm. P 2 0 and R > 0 a (I + mTR-]M)-1mTR-l :- memI + R)'1. PROOF. Multiply (3.1) on the right by MIR-1; Obtain 121 122 (I + mTR'1u)'1mTR’1 - 1:11:11"1 - 11301114T + a)')}1st'1 -1 - -1 -1 - mTR - 111101111.r + 11) 111111311 - lMT(MPMI + R) - FMT(M1MT + 11.)']1111'1 = PMT(MIMT + 11)’1 . QED
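A numerical spot-check of the two identities of this appendix, read as (I + PM^T R^{-1} M)^{-1} = I - PM^T(MPM^T + R)^{-1}M and (I + PM^T R^{-1} M)^{-1} PM^T R^{-1} = PM^T(MPM^T + R)^{-1}, which is the form in which they are used in the filtering proofs, is sketched below in Python with NumPy. The matrices P ≥ 0, R > 0 and M are generated at random and are not from the thesis.

import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
A = rng.normal(size=(n, n))
P = A @ A.T                                # P >= 0
B = rng.normal(size=(m, m))
R = B @ B.T + np.eye(m)                    # R > 0
M = rng.normal(size=(m, n))

inv = np.linalg.inv
S = M @ P @ M.T + R

# (B.1): (I + P M^T R^{-1} M)^{-1} = I - P M^T (M P M^T + R)^{-1} M
lhs1 = inv(np.eye(n) + P @ M.T @ inv(R) @ M)
rhs1 = np.eye(n) - P @ M.T @ inv(S) @ M
assert np.allclose(lhs1, rhs1)

# (B.2): (I + P M^T R^{-1} M)^{-1} P M^T R^{-1} = P M^T (M P M^T + R)^{-1}
lhs2 = lhs1 @ P @ M.T @ inv(R)
rhs2 = P @ M.T @ inv(S)
assert np.allclose(lhs2, rhs2)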