ON ALGORITHMS FOR NONLINEAR PREDICTION Dissertation for the Degree of Ph. D. MICHIGAN STATE UNIVERSITY SHASHI PHOHA 1976 LIB Atlfi A? Y Mitsheceam SEN-La» Luv-8:31? as“ v WWI-1",” This is to certify that the thesis entitled ON ALGORITHMS FOR NONLINEAR PREDICTION presented by Shashi Phoha has been accepted towards fulfillment of the requirements for Ph.D. degree inwand Probability %/ fcéeé ’ Major professor DateL/4/7/fl V // T Ll/Zfij G/d" ABSTRACT ON ALGORITHMS FOR NONLINEAR PREDICTION By Shashi Phoha Any claim to prediction, i.e. to foretell the future in what- ever limited sense, of a dynamic process, is based on a quantitative understanding of the model. Most processes in nature are such that their future behaviour is not completely determined by their past. For such random phenomena the idea of perfect prediction is replaced by that of a conditional distribution given the past. In 1959 in their paper in the Harald Cramér volume, N. Wiener and P. Masani formally defined the non-linear prediction problem for a univariate strictly stationary stochastic process and obtained an algorithm for determining the nonlinear predictor in terms of the moments of the process. The corresponding non-linear prediction problem for multivariate processes takes into consideration the in- formation contained in the correlative behaviour of these variables. In this thesis, by introducing certain operations on vectors to de— fine a matrix-algebraic structure on the past, a determinate mathe- matical problem is established and its solution demonstrated by a corresponding algorithm. In an attempt to obtain a more efficient algorithm for the problem of non-linear prediction for multivariate processes we proceed Shashi Phoha along N. Wiener's idea of linearizing statistics of time-series to obtain the non-linear predictor. The latter half of the thesis deals with transferring the non-linear prediction problem of a univariate stationary process to linear prediction of a related infinite-variate process. ON ALGORITHMS FOR NONLINEAR PREDICTION By -. ."\. Q \' ilk?" “ Shashi Phoha A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1976 TO MY PARENTS ii ACKNOWLEDGEMENTS I wish to express my sincere thanks to Professor H. Salehi whose constant encouragement and guidance made this work possible. Sincere appreciation is also extended to Professor V. Mandrekar whose valuable criticisms gave a shape and form to unformulated ideas. I wish to thank Professors J. Hannan, H.L. Koul and J. Shapiro for serving on my committee. And for the very understanding attitude and personal interest one almost takes for granted here, I wish to thank all members of the faculty. Let me also thank Mrs. Noralee Burkhardt for her excellent typing of this thesis. iii TABLE OF CONTENTS Page INTRODUCTION . . . . . . . . . . . . . . . . . . . . . 1 Chapter I. PRELIMINARIES . . . . . . . . . . . . . 5 II. MULTIVARIATE NONLINEAR PREDICTION . . . . . . . . . . 16 III. EXTENSION OF THE ALGORITHM FOR LINEAR PREDICTION TO BANACH SPACE VALUED STATIONARY STOCHASTIC PROCESSES O O O O O O O O O O 0 O I O O O O O O O O O 43 IV. ON LINEARIZING STATISTICS OF TIME-SERIES FOR NONLINEAR PREDICTION . . . . . . . . . . . . . . . . . 56 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . 66 iv INTRODUCTION The ultimate objective of modeling a dynamic system is, of course, to predict or control the output of the system by observing and manipulating the inputs. As opposed to the hitherto acceptable idealization that the laws of nature could be formulated linearly, technology in its trends towards both refinements and magnifications of scope and complexity, has been pointing with increasing insistence to the fact that in the formulation of natural laws, modern require- ments of precision forbid the suppression of nonlinear elements, for the suitable formulations are almost without exception, nonlinear. Fundamentals of nonlinear prediction theory as an outstanding con- tribution of World War II, came out of N. Wiener's work on predicting aircraft paths for purposes of fire control. The nonlinear theory is still in a juvenile state. This is a difficult terrain in which the going has been rough. But beyond the challenge to master difficulties, the demand for clearing of paths is arising for more mundane considera- tions. Wiener's work is geared towards applications. In particular, the theory modified and extended in various ways, constitutes a tool of considerable potential in forecasting and regulation. In prediction theory of a stationary stochastic process, there is first the problem of defining the prediction, i.e., of showing that we have a determinate mathematical problem, and of demonstrating its theoretical solution. There is then a theoretical analysis of time and spectral domains. Finally, there is an attempt to use the time and spectral analysis to determine the predictor at a more efficient level. The program for linear prediction has largely been carried out in the multivariate case by N. Wiener and P. Masani [20], [21]. Linear predictors for weakly stationary stochastic pro— cesses are obtained and a factorization of the spectral density under the boundedness condition yields an algorithm for computing the gen- erating function. An algorithm for computing the linear predictor is then obtained from the generating function. This work was gen- eralized to Hilbert space valued random variables by A.G. Miamee [7]. A needed generalization to random variables defined on a Banach space, is the content of Chapter III. .For nonlinear prediction of a univariate, strictly stationary, discrete parameter stochastic process, work has been done at the first level by N. Wiener and P. Masani [19]; and the ultimate objec- tive to determine the relevant conditional expectation has been es- tablished. The predictor, which is the conditional expectation, is obtained in terms of the moments of the process by reducing the prob- lem to a projection on the Lz-closure of the algebra generated by the present and past of the process. In Chapter II of this thesis the corresponding nonlinear problem is defined for multivariate random variables. By introducing certain operators on matrices to define a matrix-algebraic structure on the past, a determinate mathematical problem is established and its solution demonstrated. But a solution to the problem -- a solution as an algorithm which produces a numerical estimate from numerical observations, perhaps with the intercession of a digital computer, has not yet been satisfactorily obtained in the literature. Chapters III and IV of this thesis deal with transferring the univariate nonlinear prediction to linear prediction of an infinite-variate stationary stochastic process. In his paper [18] in the Berkeley Symposium, N. Wiener sug- gested a closely related method of linearizing statistics of time- series to obtain the non-linear predictor. Chapter IV of the thesis deals with this method. For real—valued, measurable, univariate random variables {fn: —m < n < m}, with all powers of fn Lebesgue integrable, and an ergodic measure preserving transformation T, he suggested develOping a linear theory of multiple prediction for random variables Xn whose components are products of all combinations of f1(Tn1),...,fv(Tnv). The autocorrelations demanded are expressions of the form n n n E{fl(T 1) x f2(T 2) x...x £v(r v)}. A dense subspace of L2 generated by {fn: -m < n < m} and consisting of finite products of {fn}:m, can then be arranged as an infinite-variate stationary stochastic process {Xn: -m < n < m} whose linear past at any stage is exactly the nonlinear past (poly- nomials and their limits) of the univariate process {fnz -m < n < m}. Due to duplication of subfactors in the components, however, the vectors Xn(m) do not possess a Hilbert norm. Xn's may then possess a Banach norm. The linear predictor for {Xn: -w < n < on} may then be obtained by methods established in Chapter III, which further gives the nonlinear predictor for the univariate process {fn: —m < n < 00}. Conditions on the moments of {fn} that would guarantee the bounded- ness condition for {Xn: -w < n < co} have not been obtained explicitly. An example of a two state Markov chain sheds light on the procedures and some of the intricacies involved. Following the work of Wiener and Masani much interest has been aroused in the areas of nonlinear prediction. Some of the more recent 'work related to this problem appears in [A], [8], [9], [10], [ll], [14], [16]. CHAPTER I PRELIMINARIES While observing a random quantity associated with some long enduring mechanism in nature, from the remote past to the present moment, we obtain a sequence of readings {Xk: k = O,-l,-2,...}. For the problems of predicting a future value based on this information we shall be considering {xk: k = O,-1,-2,...} as part of a stationary stochastic process {fnz n = O, + 1, + 2,...} defined on a probability space (Q,F,P) such that xn = fn(m0), mo in 9, -w < n < m . Now fn may be square integrable and hence lie in some Hilbert space H. Following Masani in [5] we have 1.1. The Gram-matricial structure of HN for a fixedgpositive integer N. Let fl HN={f=Z :f €H,1 0}. - 0 5.3. Further, define a weakly measurable B(X,K) valued function A(eie) on the unit circle to be l3 analytic, if for each x 6 X, A(eie)(x) is in Lg+(K), conjugate analytic, if for each x E X, A(eie)(x) E Lg-(K). 5.2.1. Factorization of the Spectral Density. * A nonnegative weakly summable B(X,X ) valued function de- fined on the unit circle is said to be factorable if there exists a Hilbert space K and a conjugate analytic B(X,K) valued function 16 19 * A(e ) defined on the unit circle such that f(eie) = A (eie)A(e ), in the sense that (f(eie) x)(y) = (A(eie)x, A(eie)y) for all x E X, y_€ X. Regarding factorization of the spectral density of a B(X,K) valued regular stationary stochastic process, the following has been established in [7] Theorem 3.3.12, p. 23. 5.3.2. Theorem. The spectral distribution F of a regular B(X,K) valued stationary stochastic process is absolutely continuous and -§§ (x) = z e'ikeAk (e ) where «eig is a generating function of the stochastic process. In chapter 111 we obtain a factorization of the spectral density for B(X, K) valued regular stochastic process and thus obtain the linear predictor and the prediction error matrix Gv’ for any v Z_l, for such 15 a process. Chapter IV proceeds, to suggest the possibility of utilizing this analysis for nonlinear prediction problem. Results in chapter IV are not yet complete. CHAPTER II MULTIVARIATE NONLINEAR PREDICTION In genuine applicationscxuaalmost always works with systems whose states must be described by several random variables. Weather forecasts depend on a set of interdependent variables, for example. Although it is reasonable that ideas and methods should be develOped first for the univariate case, ultimately one must be able to cope with the multivariate situation. The price {pm} and demand {dn} of a certain commodity, tabulated by days, in themselves form univariate time-series, which are by no means independent. Nonlinear predictors for each of the pro- cesses {pn: -m < n < w} and {dn: —w < n < co} may be obtained under regularity conditions by methods of Wiener and Masani in [19]. However the predictors 6V and dv for v 3_1, thus obtained would ignore the information contained in the correlative behavior of pn and (1H for n.: 0. It would therefore be natural to consider {(dn): -m < n < m} as a multivariate process and to define a corresr pausing structure on the past which formalizes the notions of non- linear predictor and prediction error for a multivariate process. In fact with the appropriate formalism, all of the work of Wiener and Masani in [19] generalizes to this case. We proceed to introduce the same in this chapter and to obtain the corresponding generalization. 16 17 l. Multivariate Stationarity. Let f f —m < n < m, be real random variables nl’fn2’°°" nN’ . defined on a probability space (X,A,u). Then the multivariate stochastic process {fn: -w < n < 0o} where fn = fnl is defined n2 .an to be strictly stationary if and only if for each integer v > O, c c = u[f E Bl’ f \ B2,...,f \ B ] n1+v n2+v nk+v k u[fn1 6 Bl’ fn2 E 32,...,fnk.€ Bk]’ whatever B1,...,Bk Borel measur- able in RN and -w < n < n <...< n < w. 2 k 1.1. Note. (i) The multivariate stochastic process {fnz -m < n < w} is strictly stationary if and only if for every integer v > 0 ulw E S: fi+vj(w) E Aij’ (1,3) 6 I] = pr E S: fij(w) e A1 9 (193) e I] j whatever finite I c {—m < n < m} x {1,2,...,N} and Borel measurable Aij c R. (ii) If {fn: -m < n < Go} is a multivariate strictly stationary process then the univariate stochastic processes {fn : —m < n < w} j for j = 1,2,...,N are all strictly stationary. 1.2. A Characterization of Multivariate Strictly Stationary Stochastic Processes. Notation. For s,t eRN let 18 St= 1 "NZ m n 1 Theorem. The process {fn: -m < n < m} is strictly stationary iff for N each tl,...,tk O and t = (t j j1,...’t for l:j_<_k andany u‘ER- N k E[exp iu(t1fn +...+tkfn )]= E[exp{i Z X in 1 k . j=l 2=1 ut f 2 1 nj k N E[exp{i Z Z j=l i=1 }] “tjefnj+le E[exp{iu(tlfnl+l + t2fn2+l +...+ tkfnk+1)}]° Since u and t 's were chosen arbitrarily, the characteristic J function of the joint distributions of {fn l: l_: j‘: k, 1.: 2.: N} . J and {fn +11; 1.: j_: k, l_: 1.: N} agree. 3 m Stationarity of {fn}_0° follows by Note 1.1(i). 19 1.3. Theorem. CD With the multivariate strictly stationary process {fn}_m defined on a complete probability space .(X,A,u), can be associated a probability space (Q,F,P), a measure preserving transformation T on Q onto itself and a random variable f on R such that the stochastic process {fglfw defined by End.) = f(an), m e :2 G) has the same joint distribution functions as {fn}_m, i.e. (in obvious notation) F =F ,—m 0, define the predictor fv with lead v of the function fv as j/E(fvllBO) A I E(fOZIBO) 'f =I . v . E/ ' Let the right hand side of the equation be denoted by B(fv|80)' 31 3.2. The Prediction Problem. Given the moments for O . . . (DJ-931) (“2932) 9 ° ' 9 9 (nk9Jk) -m < n1_i n2.:..s: nk < m and l _<_jl,j2,...,jk i N of the multi- variate strictly stationary stochastic process {fn: -m < n < m} with fn EL.» and with zero expectations, to determine polynomials ¢q of q + l N-dimensional variables with matricial coefficients such that E= =' . v E[fleO] 11m Oq(f0,f1, ,f_q) qv-H'D Note. 1. Since the joint distributions F of f ,f ,...,f n ,...,n n n n l q l 2 q have compact support, their moments characterize them completely [12] Cor. 1.1, p. 11. 2. The assumption fn E Lm is made to reduce the prediction prob- lem to a projection on the closure of the algebra generated by the past. The weaker condition of existence of all moments is not sufficient. Ref. [19] Theorem 6.5. 4. Main Theorems. 4.1. The following are based on [5]. * 5 L2 with the inner—product (fl,f2) - {f flfzdP} which yields the trace norm, is complete (almost everywhere equal functions are identified). Theorem. For v > O, fv is the orthogonal projection in L2 of fv on M0. E[fv|80,q] is the orthogonal projection in L2 of fv on M0.q' 32 Proof. _ _ N M0 Non L2 is a subspace of L2 - L2. Hence M0, the space of all co-ordinates of elements of MO, is _ N In fact M0 — L2(BO). L2(BO). Also as in the proof of Lemma 2.8 of [5], p. 354. / . (fleo) = /E(valMo) E(stIBO) / I ._ I B(fOZIMO) — B(fvleO) I3, =.~ \ z ' 7 \ M M )./ M 3 )/ .\ le O/ le O// = B(fVIBO). Similarly for projections on M 4.2. Theorem. A function f E AQ)q iff there exists a Baire function s: (11.1‘1)‘1+1 +'RN such that f = ¢(f0,f ,f -l,ooo -q)o 4.3. Theorem ' For F a distribution function on (RN)q with a compact carrier Jq for some compact set J c RN and for 1.: p < m, let Lp F be the space of all RN-valued functions O on (RN)q which 9 are measurable w.r.t. the Lebesgue—Stieltjes measure generated by F p I * and are such that IIOI dF < m where lol R] O O. Then under the norm H¢HF p = { f§|¢|de}1/p, the polynomials in q-variables with values in ’ J RN and coefficients in HN form an everywhere dense subset of Lp F' Proof. Lp F = {¢: (RN)q + R : f{¢*¢}de < m}. As in 2.3 P: denotes the family of polynomials in q N-variate vari- I‘llll II If «I III" \III ||Ill|ul IIIOIII 33 ables {xi}:=l' (Péq)N is the family of polynomials in Nq variables . N _ 1 N {xij' 1 §_i :_q, 1 §_j :_N}. By Theorem 2.3 we have Pq f (PNq) . 1 Now PNq is a dense subset of the space L F of all Lebesgue-Stieltjes P. measurable functions w defined on RNq into RN with it [(n 4,)de < a, Ref. [19] §6.2, p. 203. Hence for each coordinate ¢i of ¢ 6 Lp,F there corresponds for given 5 > 0, pi.6 Fifi such P u 1 that H¢i - pi”: < e/Np. And then p = :1' E (PNq pN )N = P2 is such that p, _Zp/2 -1~_:p th¢-FhF’p [{X(¢i pi) I dF: M) ZN¢1 pin < 5. Hence P: is everywhere dense in Lp F' 9 4.4. Theorem: on (1) M0 is the closure in L2 of the linear manifold U M q=0 O’q (ii) M0,c1 = A0,Cl (iii) M0 = A0. Proof. Let M0 denote the subspace of L2 ordinates of elements of MO. Similarly define M consisting of all co- . Then b Lemma 0.q y 1.2 of Chapter I Now let f'E M Since the truncations {fX of f 0' lflgn} approach f in the trace norm, each co-ordinate of f is approximable by simple Bo measurable functions. Since 80 is the Borel algebra 34 co generated by LJ 80 q’ simple Borel measurable functions are q=O ’ k approximable by simple functions of the form 2 aiIF with w m i=1 i F' E U B which obviously belong to LJ N . Such functions 1 =0 O.q _ DA 9 q—O being bounded are in L and hence in. U (N n L ) = IJ M . 2 _ 0"] 2 _ Oeq q—O q-O Hence the result. (ii) Let f'E MO Then by Theorem 4.2, f = ¢(f0,f_l,...,f_q) where ¢: (RN)q+1 + RN is a Baire function. Now f E [2, hence o 6 La F where F denotes the joint dis- tribution of {f ° -q‘: i_: 0}, because 1° “fo2 = huiz ’ Also F has a compact carrier. Hence by Theorem 4.3 there exists a sequence of real polynomials Qn of q+1 N-variate variables such that OJ m :5 +00. 4.4.1 HO - QnHF + 0 Now define wn=Qn(fO,f_l,...,f ). Then tyne Aoq. Since " 9 f6 L2, tyne L2 and Hf - wnHLZ = ”‘1’ - QnHF° By 4.4.1 then f = lim wn in L2-norm. n+m (iii) From Definition 2.4 it follows immediately that A0 = $20 A0,q' Also from (ii) above I: A0, is everywhere dense in I: M . By (1) therEEgre, A0 is everywhere dense in M0 i.e. q_0 on AO - Mo. 35 Corollaries. (i) For the strictly stationary stochastic process {fn: -m < n < m} and for v > O fv = lim E[fleO’q] in L2 . q—HD This is immediate from Theorem 4.4(i) above. (ii) For the strictly stationary stochastic process {fnz -m < n < m} there exists a sequence ¢q of Baire functions on (RN)q+1 to RN such that fv = lim ¢q(f0,f_l,...,f_q) in L2. q-W Proof. From Corollary (i) f = lim E[f IM ]. V V Oeq q—mo Also by Theorem 4.2 there exists a Baire function ¢q such that E[fleO,q] = ¢q(f0,f_l,...,f_ ). Hence the result. (iii) For the strictly stationary stochastic process {fn: -m < n < 0°} and for any integer v > 0, there exists a sequence of polynomials Qq in q+1 N-variate variables with coefficients in HN and values in RN such that fv = lim Qq(f0,f q-wo _1,...,f_q) in L2. Proof. By Corollary (ii) above there exist Baire functions ¢q such that “Ev - ¢q(f0,f f_q)h2+ 0. Furthermore for given -1’...’ e > 0 there exist polynomials Qq of the above type such that I‘llllllll‘ 36 E_. C _ lieq - Qqhz < I? Thus [I¢q(f0,fl,...,f ) - Qq(f0,f_1,...,f_q)[.2 < 2“. '9 Hence fv - lim Qq(f0,f q—W _l,...,f_q) in L2. 5. Computation of the Predictor. In this section we shall obtain an orthonormal basis for the space MO =LAb in L2 with the trace norm. The expansion of the pre- dictor in this basis is then actually computable from the given data. Define a subset H c L2 to be linearly independent iff for any finite collection h .,hn 6 H and A1 6 HN with l_: i‘: n, 1,0. zAihi=O_=Ai=O vi with l_<_i:n. Define the span S(H) of a subset H of L2 to be the family 11 {1:1 Aihi: A1 é HN, h1 E H and O < n < m} . A subset H of a subspace M of L2 is said to be a basis for M iff H is linearly independent and spans M. A basis H of a subspace M of L2 is said to be orthogonal iff for hl # h2 E H * (h1,h2) = Ehlh2 = 0 . An orthogonal basis H is said to be orthonormal if in addition V h 6 H 37 .”h"2 = trace(h,h) = 1. 5.1. Lemma. A subset HCIL2 is linearly independent in L2 iff the .h ' corresponding set H = {hj : l §_j §_N, h = :1 E H} is linearly independent in L . . 2 \hN Proof. Let hl,...,hn be a finite set which is linearly independent in H. Then n N 2 Z a .h , = O ... .= .= i; 13 ” 811 812 a1N h11 1 1 J l O O 0 h ... 12 +... . . I I \- s ‘\thl." an1 ... anN 'hnl g + O .;. O E = . w aij = 0 for 1 §_i §_n, l §_j :_N. 0 ° 0 - hnN ° \0/ Conversely if {hij: 1 :_i §_n, l §_j §_N} is linearly independent in L2 then hl"'°’hn must be linearly independent in L2 since for k ' k matrices Ak [aileXN’ kzl Akhk - O a aij - O for l_: 1.: N, l §_j §_N and I C {1,2,...,n}, i.e. Ak = O for each k E I. Hence {h h } is linearly independent in L2. 1’...,n The case when the set H is not finite is immediate now. 5.1.1. A linear arrangement of the set H = {fij: «w < i :_O, l £_j 5_N} is: 38 f ... 01’ ’fON’f-11"°°’f-IN’f-2I""’f-ki"" Then H = {hnz 0 :_n < m} with h0 = 1 and for each n > 0 h = n f n-1 n-1 -L_fi—]sn-L N IN h 9- d n w ere [N] enotes the integral part of E} One method of arranging all finite products of elements of H in a row is given by Wiener and Masani as follows. Ref. [19] §7, p. 205. Let pk = k+1th prime, and V j > 1 with the prime factoriza- tion j+l = pk pk ... pkr , O :_k define R1 R2 kr {sz 1 :_j < 00} is then a linear rearrangement of all finite products of elements of H. Thus M0 = clos Sioj: 0.: j < m} where O0 = 1 M0 is therefore the closure of the subspace of L2 spanned ¢j=01j O, i.e. w # 0. Hence {¢j: O < j < 0R} is linearly independent. 5.2.2. Corollary. Under the above assumptions the set {¢ ° 0 §_j < 00} can be j' orthonormalized to obtain an orthonormal basis {wj: O §_j < 0°} for A0 in L2 as follows: 41 ‘I’ a O 1 wl - ¢l//<¢l.¢l) (¢09¢0) (¢03¢1) o o o (¢O)¢k_l) ¢O l . W - _—__—_—— (¢ 9¢ ) (¢ 9¢ ) .. . . (¢ 9¢ ) ¢ k m l 0 1 l l k-l l (¢k9¢0) (¢ka¢l) o o o (¢k,¢k_l) ¢k where Ak = det[(¢1,¢j)]O 3.1’j.i k’ k > 1. 5.2.3. Corollary. . n For every integer v > O, fvj = E[fvaBO] = :1: k:0(fvj,wk)wk. 5.3. Computation of the Predictor. Theorem. For every v > O, fv = ii: Qn(f0,f_l,...,f_m ) where mn is an increasing sequence of nonnegative integers and Qn is a polynomial in mn+l N-variate variables with co—efficients in HN, these co— efficients being computable in terms of the moments O . (n19j1)(n2’j2) ° ° 0 (uijk) Proof. As in the univariate case (Ref. [19] Theorem 7.9, p. 209), let “j be the subscript of the last prime in the factorization of j+l and let, for any n > O m =maX so. . n {U1’. sun} Then wk is expressible as a linear polynomial in terms of {¢j: 0.: j.§ k}, each one of which is further expressible as a poly- 42 nomial in {hjz l_: j_: k}. Thus from the Corollary 5.2.3 there exist real polynomials qnj such that . n f = lim 2 (f .,w )w = lim q (f : -m §_i :_O, l §_£ f N). vj n+m k=O vj k k n+w nj ii nj Thus qn (fij -mn-i i < O, 1‘: j < N) l qn2(fij: -mn: i _<_ 0, 1: j :N) f =11!“ ooooooooooooooooooooooooooooooo o W qu(fiJ -mn fi'i §_O, 1.: j_: N . N = C Since A0,m A0,m , there exist polynomials Qn" A0,m n n n such that 5.3.1 A = fv lim Qn(f0’f-l’f-2’°'°’f-m ). n+m n Note: Actually qnj is a linear polynomial in mn+l variables and A lim qnj(h£: -m < E §_O) vj new n'- = lim q (f : -m < 2 §_O) n-m “3 4%] .n-Ifim “ So the coordinates of the predictor are obtained through the procedure of this section and are written in terms of limits of polynomials in N-variate variables in 5.3.1. CHAPTER III EXTENSION OF THE ALGORITHM FOR LINEAR PREDICTION TO BANACH SPACE VALUED STATIONARY STOCHASTIC PROCESSES An algorithm for computation of the linear predictor for Banach space valued random variables is obtained in this chapter. Time domain and spectral analysis for such processes, including Wold Cramer con- cordance and necessary and sufficient conditions for factorability of the spectral density were obtained by A. G. Miammee Ref. [7], Chapter III. However the algorithm of Wiener and Masani under the boundedness condition, was obtained only for Hilbert space valued random variables using Fourier analysis of infinite matrix valued functions Ref. [7], Chapter VII. Under an extension of the boundedness condition of Wiener and Masani on the spectral density of the Banach space valued stationary stochastic process, a corresponding algorithm for computing the generating function and the linear predictor is obtained. 1.1 The Boundedness condition on the Spectral Density Let the spectral density fe , of the B(X,K) valued stationary stochastic process {an: -m < n < m} satisfy 0 < m(6)A*A 5 f6 5 M(6)A*A a.e. 6 e [O,2h) 1 11(6) m(6)’ m(6) for some A: X + K with “A” = l and M(6), summable. 43 44 1.2. Lemma. f * Under boundedness condition 1.1 for Ne=-—g — A A with agnm)+mm 9 6 2 ’ M(e) - m(e) stimm+nw) = %[(A*A(x+y).x+y) - ]. Therefore (fax,y) * (Nn = -——;;;—— - (A Ax.y> f (x+y) E (x-y) =‘IIt-9————— - ’X-Y)}I- a9 39 Due to boundedness assumption 1.1 then 45 M(6) |.y)|;;%[t;——— (A*A} a.e. 6 Thus 1.2.1. ”N H < “(9) ’ m(e) sup “Ax”2 + “Ayuz a.e e 6 B "’ M(6) + m(6) x:IIxII=l Y=HYH=1 §_ggg; ; 2E2; a.e. 6 (since IIAII2 = 1). 1.3. Lemma. Let the spectral density f satisfy boundedness condition 1.1 and the image AX be dense in K. If A is one—to-one onto then * (a) A is one-to-one (b) A*‘1 = (A'1)* 2) x AX and (c) |A*“1N6A51(k),i| 5_figg; :nEEgIHRH “in a.e. e for R52 e AX. Proof. * * (a) Since AX is dense in K, A is defined on K to X * * * as follows V k6 K, A (k) = x where x (y) = (k,Ay) V y‘E X. 46 Further, for each k,kd 5 K t * A (k) = A (k')¢= VyE X. (k,Ay) = (k'.Ay) e we AX. (k.£) = (16.2) c: V26 K, (k,2) = (k',2) (since AX is dense in K) c: k = k' * Hence A is one-to-one. * * * * (b) Range of A = {x 6 X. :51 k5 K with x (y) = (k,Ay) VyE X}. _ - * Also A 1 is defined on a dense subspace of K, and domain of (A 1) * is a subspace of X . In fact * {x 0((A'1>*) m * .. x.*: site K with x (A 12) =<1t,2) vie AX} * * * = {x e X.: ake K with x (y) = (k,Ay) VyE X} (since A is one-to-one onto AX from X). Therefore 0(A*'1) = 0((A’1)*). * *- Also for x E D.(A 1) *-1 * * A (x) = kw x (y) = (k,Ay) VyE X- e x*(A‘12) = (k,AA'li) V2 5 AX e x*(A‘19.) - (k,2.) V2 5 A-X - * * e (A 1) (x ) = (c) From 1.2.1 in the proof of the previous lemma 1.2, for x,yE X INe (x),yI < ”(9) " m(e) LIL—ML- a. e. M(6) + m(6) Thus for k,2,e A(X) 47 I(A*"1N9A'1(k>,2)l = I(NnA"l(k),A'12)l < M(6) — m(6) AAA-I(mliz + helm? e -— M(6) + m(6) 2 a°e' n M(6) — m(e):llkll2 + IIRIIZ a e e M(6) + m(6) 2 ° ° ° Thus 2 2 :*- -li < M(6) - m(6) k' +. Q “A lNeA llB-— .:”E M(6) + m(6) 2 I E1 III—1 = Me) - mm) 6 M(6) + m(6) ° Hence the result. 1.4. Relationship with the Case of Hilbert Space Valued Random Vari- ables. Main Theorem I. If the spectral density fe satisfies the boundedness con- dition and A: X + K is one-to—one and AX dense in K then there is a unique stationary stochastic process (nu: 1m < n < 00} which is B(K,K) valued and is such that (i) Rn(n) = A*-1Rg(n)A—l on AX 2 M(6) + m(6 lNeA'l] = A*-1f€(6)A-l on AX _ *- (ii) fn(6) ' ) [IK + A where IK denotes the identity Operator on K and f€(6) is f in our previous notation. 6 Proof. 2 "' M(6) + m(6) UK a bounded operator defined on AX. Let ge denote its unique con- *— - *.. .— By Lemma 1.3 A lf€(6)A 1 + A 1NeA 1] is tinuous extension to K. g6 is then a B+(K,K) valued function 48 defined on the unit circle. Further g(6) is strongly measurable since f(6) is assumed to be so. Also by 1.3(c) .“ge“.€ L1[0,2n). Hence g6 '18 Bochner integrable. Thus for any n, EDA-1 defined on AX is such that V k.€ AX 2n . -1 :2 -.l_ *-1 -1 .hgnA (k)h - 2” g (A feA (k),k)d6 1 2n = 27 (I) (gn.k>de I A IIgIILl[0,2n)IIkII2 < w. Hence gnA-l admits of a unique continuous extension to K, say nn. (nu: -m < n < 00} is then a B(K,K) valued stochastic process. In fact {nnz -m < n < 00} is stationary as shown in the following reasoning. For k,£ 6 K we must show that (nnk,nm£) depends only on m—n. Since A(X) is dense in K, 3 sequences {xp}, {yq} in X such that A(x ) + k in K P and A(yp) + R in K. Then, since nn and nm are bounded (nnk.nm£) 11m (nn(Axp).nm(Ayp)) p-W = lim (tnA'1(Axp),th‘1(Ayp)) p—hm = 11 , p+2 (Enxp Emyp) 2 = lim -—'f P+00 0 H -i(n-m)6 e (fexp,yp)d6 49 which depends on m and n only through n-m. Furthermore 2" -i(n-m)6 -l -1 (nnk.nm2) = 1m m-E— fe (feA (Axp),A (Ayp))d6 pw 0 2N _- _ *_ _ = Iim-—— f e 1(“ m>e(A 1f A 1(Ax ),Ay )d6 2n 6 p p pw 0 2N _ _ *_ - = 1—-f e i(“ m) {lim (A 1f A l(Ax ),Ay )}de 2n 0 6 P P p-NX) 2n =-%; f e-i(n-m)e(ge(k),(1))d9a O the last two steps being true since ge is a bounded operator. Thus on A(X) _ _ -1 -1 _ *-1 -1 _ *-1 -1 Rn - (nn.n0) - (EnA ,EOA ) - A (En,€O)A - A RE A n n f (e) = A*'1f (6)A-1. n E n D and due to continuity of all functions involved, Rn and fn (6) are *-1 -1 n *-l -1 the unique continuous extensions of A RE A and A fg (6)A n n respectively to K. 1.5. Factorization of the spectral density. Corollary. If O6: K + K is the generating function given in [7] 7.3.5, for the B(K,K) valued stationary stochastic process {nn: -m < n < 0°} and if fe satisfies the boundedness condition and A: X + K is one-to-one and AX dense in K then ¢6A: X + K is such that * fe - (¢nA) <¢nA>. 50 Proof. By Theorem 1.4, ge is the unique continuous extension of *_ - A 1f A 1. So that f = A 86A *3! _ * A ¢e¢oA - (¢9A) (¢9A). 1.6. The Prediction Error Matrix and the Predictor for a Banach Space Valued Process. For the B(K,K) valued stationary stochastic process {nn: -w < n < Go} a schematic algorithm to obtain the prediction error matrix Gn and the linear predictor av of nv for v > 0 based on the past Inn: n.: O}, is given in [7], Chapter VII. We shall now find the same for the B(X,K) valued process {§n: -w < n < 00}. 1.6.1. Notation. For 3 c B(X,K) let 3(3) denote the smallest (strongly) closed subspace of B(X,K) containing the set {SB: 8 E S, B 6 B(X,X)} 0(3) denote the smallest closed subspace of K containing the set {Sx: S E S, x e X}. We shall use the same notation also for subsets S of B(K,K). In this notation then 5(3) = B(X,O(S)) (Ref. [7], 3.2.3, Chapter III, p. 9). 1.6.2. Definition. For the stationary stochastic B(X,X) valued process {En: -m < n < 00} define 51 Mn = O{Ek: k.: n}, Mn = oIEkx: x E X; k :_n}, -® < n < w M = n M , M = n M n n M = E{Ek: -w < k < co}, Mco = O{Ekx: x 6 X, -m < k < 0°}. 00 Furthermore let Bn’ Bn for -w :.n i w denote the corresponding subspaces for the B(K,K) valued stationary process {nn: -m < n < 00}. 1.6.3. Main Theorem II. The two stationary stochastic processes {Enz -w < n < 00} and (nu: -m < n < 00} are further related as follows (i) For each integer v > 0, if Ev denotes the projection of Ev on M0 and nv the prOJection of nv on B0 then * ii G = A G A () n ”n (iii) Ev = lim 2 Evkg-k n+0° k=0 iv6 16 -1 where EV is the kth Fourier coefficient of [e- O(e )]O+O , k O being the generating function of the process {nn: -w < n < 00}. Proof. (1) Note that E = n A for each k. Also AX is dense in K k k and nk is bounded. Therefore for each k o{£kx: x E X} = O{nk2: t 6 K}. Now for each x E X 52 nvA(x) (nVIBO)(AX) = (nvAxlBO) = (g A-leIB ) v 0 = (EvXIBO) = (EVXIMO) _ = (anIMO)(x> H m C A >< v (ii) Now for each x E X, y E X (5*GnA(X))(y) ((nl - fil)(Ax), (n1 — Ol)(Ay)) (by definition of Gn) (nle - nle, nle — nlAy) (flA'1(Ax> - 51(x).zlA‘1(Ax) - zl> (using nn = EnA-l on AX and (1)) (61x - EIX.€1x - EIX) G€(x). (iii) For each integer v > 0, n h (x) = lim ( z E n )(x) Ref. [7], 7.4.11, Chapter VII, v vk -k . n+m k=0 p 108. Therefore 53 €v(X) = nvA(X) = nv(AX) n = lim ( Z Evkn-k)(AX) n-*0° k=0 n -1 = lim ( z Eka_kA )(Ax) n+00 k=0 " -1 = lim 2 E t A (Ax) n+m k=0 vk -k n = lim 2 E E (x). n+w k=0 vk -k 1.7. Note. For results in this chapter the boundedness assumption 1.1 was made on the spectral density of the process and it was further assumed that the map A: X + K be one-to-one with the image of X dense in K. The restriction of AX being dense in K is easily deleted by re- placing K by the Hilbert space H generated by AX in defining the process {nn: -w < n < 00}. Generalization when A is not one-to- one calls for a closer look and may be handled as follows: Let K(P) denote the kernel of any operator P. Then due to boundednessassumption 1.1 1.7.1 K(A) = K(fe) a.e. e where f6 denotes the quadratic form of f Let the quotient space, 6. denoted by I, be such that v x t x,. 15;“;- inf IIx - sII = d(x,K(A)) ) where ; is the equivalence class x + K(A) of elements of x. Now (X,“ h) is a Banach space. Ref. [3], P. 140. 54 The linear map AQ defined on it as follows AQ(§) = Ax for Q t Y is continuous in the norm of X. This is shown as follows IIAQII= sup IAQ<§>II= a. ,stu. Hxh=l x:d(x,K(A))=l Also for each x 6 X with d(x,K(A)) = l, x = x + 6 - 6 whatever 6 6 K(A). So .hAx“ i inf {IA(x + 6)“ + hAdh}_: inf {“x + 5“} 5_1 6€K(A) deR(A) since A6 = 0 for O E K(A) and hA“ = 1. Hence A is continuous. Furthermore A is such that Q Q (AQAQ(;)’;) = = (A*Ax.y) for x,y e x. Thus * : M(6)A A a.e. 6 m(6)A*A _: f Q Q Q Q 9 and A : X + K is one-to-one, and without loss of generality A (X) Q is dense in K. Q To make sense of the definition of EnAQ under AQ’ we must have an uniquely defined on X. It is here that we would need the assumption of linearity of En. Let, for x and y on the image of X in X, Ax = Ay. Then 55 h€n(X) - £n(y)h2 = hanh2 = (£n(x - y), £n(x - y)) = (50(x - y). €O(x - y)) 1 2n . = as fe(x - y)d6 = 0 (due to 1.7.1). So if x = y mod K(A) then gn(x) = €n(y). Hence ‘V x.€ X we may define €n(;) = €n(x). And the preceding procedures now apply to -1 Q dictor for the process {€n: -w < n < co}. EnA to ultimately yield the prediction error matrix and the pre- CHAPTER IV ON LINEARIZING STATISTICS OF TIME-SERIES FOR NONLINEAR PREDICTION The problem of nonlinear prediction of a stationary stochastic process was dealt with, in the second chapter of this thesis at the level of defining the predictor and showing that we have a determinate mathe- matical problem and then obtaining the predictor by a more or less direct attack. In order to utilize the time and spectral domain analysis to obtain an algorithm for the predictor at a more efficient level, N. Weiner in [18] suggests a method of relating the nonlinear prediction problem to linear prediction of an infinite-variate stationary stochastic process. In this chapter we will explore this relationship further. 1. The Related Infinite-Variate Process Let {fn : -m < n < 00} be a univariate strictly stationary stochastic process defined on a probability space (9, F, P), with zero expectations. Let T denote the shift associated with the process. Further assume that there exists a > ()such that IfCI §_a. Let H = { H f1 : I C {0,-l,-2,...} and I finite}, where elements of H are writiZfi as products of £1 with decreasing indices. Let {¢j' : 1 §_j' 0 (fle0) = XvO. where X; denotes the linear predictor of Xv based on the present and past of the process {Xn : -w < n < 00}. Proof (i) The polynomials in {fn : n §_k} form a dense subset of Mk' Mk’ on the other hand, is the closure of finite linear combi- nations of products from {fn : n §_k}. Both MR and MR therefore, contain a common dense set. Both being closed, it follows that Mk = Mk' (ii) then follows from the definition of predictors. 1.3. MAIN RESULT I . The nonlinear predictor (fV'MO) may thus be obtained as the 0th coordinate of the linear predictor RV of the ‘zm-valued strictly stationary random process {Xn : - m < n < w}. 1.4 Let C()denote the set of all infinite vector sequences that tend to zero. C()is then a (separable) Banach space and its dual space £' is separable. 1.4.1. MAIN RESULT II Under the assumption that Rh 6 C0, the process {En : -w < n < co} 1 * defined from the separable Banach space CO = .K to L2(O) as follows 1 y mi . £n(x) (o) = x 0, the n step transition matrix is I l 1 n ' 1 l n I— _ a- _ — — - 2 + 2 (2 l) 2 2 (2O 1) PD = 1 1 l l n n 2 .. 2 (Za—l) 2 + 2 (20’1) J All moments of finite products of {fa}0° are then obtainable as —00 v follows. Since for any integer v > O, fn is l or fn according as v is even or odd respectively, to obtain moments of finite products of {fn}0° we need just evaluate B(fn f ... f ) for -w < n1 < < 1 n2 “R -—00 < :1k < w. Notice that for an integer v > 0 n2 6O B(vaf0=l) - E(vaf0=-l) = P“(1,1) — PV(1,-1) -{PV(-1,1) - PV(-1,-1)} l. 2 + Ian—i)" 4% - l, (mi—1)") - g A, (zen-1)" — § -% (2a-1>V} 2 (2o-1)V. Similarly f(fnlf0=1) + B(fVIfO = -1) = PV(1,1) - PV(1,-1) + PV(-1,1) - P“(-1,—1) = 0. Now f(fofn1 fn2 ... fnk) = f:(fn1 fn2 ... fnkIfO = 1) P(f0 = 1) — E(fnl fn2 ... fnklfo = -1) P (f0 = -1) 8.]... = — =— 2[B(fn1 fn2 ... fnkIfO 1) f(fn1 fnz ... fnklfo 1)] 1 = E-[E(fn2 ...fnkIfnl - l, f = 1) P(fn.l = llf0=1> O =-l,f0=l) P(fnl =-1|f0=l) -E(f ...f f n2 nkI n1 -E(fn2 ...fnkIfnl =l,fO=-l) P(fnl =1If +E(fn2 ...fnkIfnl - —1, f0=-1) P(fnl=-1If0=—l)] l n n - §{E(fn2 ...fnkIfnl =1) P 1 (1,1)-E(fn2 ...fnklfnl= -1)P l(-1.1) —E(f ...fnkIf - 91 s- “1 _ _ n2 n1 -1) P (-l,l)+E(fn2...fnkIfnl 1)? ( 1, 1) 1 61 = listf f If -1) {P“1(1 l) p“1 1 1 } _ 2 n2"° Bk nl- ’ — (- s ) +E(fn2ooofnklfn1 - -1) xiPnl(-l.-l) -Pn1<-1.1>}1 = l{B(f f If =1) (2 -1>n1 + B(f f If “ -1)(2 1>nl 2 n ... n n a n ... n n - o- ] 2 k 1 2 k 1 1 n1 2 “2 “k “1 2 “k 1 Again repeating the same process n "n “n =1) {P 2 l<1.1>+Pn2 1(4.1)} E(f f f ) = 1-(2 —l)n1 [B(f 0 n ...n 2 a n3... 2 f |fn 1 k “k _E(fn ...fnklfn =-1) {Pnz-n1(1,-1)+p“2'“1 (-1,-1) 1 3 2 _ 1 n1 1 n2‘n1 1 1 n2"“1 - -2-(2a-1) [B(fn . ..fn lfn =1){ ‘2' + (20"1) + 7 - f(ZG-l) } 3 k 2 nz-nl nz‘n -E(fn ...fn |f = —1) §-- %-(2a-1) + %-— %-(2a—1) } 3 k 2 1 n1 s _(2a_1) [E(f ...f If =1) -E(f ...f If = -1)] 2 n3 nk n2 n3 nk n2 Proceeding thus we obtain n n “n E(f(fnl...fnk) ='%(2a-1) 1 (2s-1) 3 2 =-1)] ...[E(fnkIf -1)+(-1)kE(fnkIfnk_l nk-l n +(n3—n )+... n -n _ - %{(2a-l) l 2 }x 2 (2a-l) k k 1 if k is odd 0 if k is even 62 n +n +...+n —(n +n +... + n ) l k 4 k-l = (2a—1) 3 2 if k is odd 0 if k is even We will actually consider the corresponding prediction problem for {gn: -~nI 0 nl nI 1 J 63 16h = z E(X , xh,) e + 0, if I + J is odd 01 J lhlipl '6 nl+n —n2+ ...+n -nI_l+m +m3-m2+. B(xoi xh,) e1 h + z (2a-1) 3 I l Ihlinl J h >nI if I and J are both odd n +n -n +...+(h-nI)+(m2+h—ml—h)+ 2 B(XOi xh.) sigh + 2 {(Zo—l) 1 3 2 lhlfPI ' J lhl>n1 16h X e } if I and J are both even = 2 16h Ihlinl B(x01 th) e if I + J is odd ..‘HD -m J J-l ...+ +h- -h) (mJ mJ—l 6 16h 2 E(X X ) e1 h + c(i,j) 2 e if I, J odd Ihl —-I n1 6 h i h 2 B(X Xh,) e1 h + c(i,j) Z (Zn-l) e 6 if I, J even, 01 3 where c(i,j) is the appropriate constant bounded by l. The spectral density in all these cases is bounded above by some M(6) which is integrable since 19h (i)| Z B(g g ---g 3 g ) e I Ihlinl 0 n1 nI h ”1+“ 64 i(n +1)6 (ii) c(i,j) X ' eieh = 2 c(i,j) e I lhl>n1 l - 816 which is integrable on the unit circle, n +1 (111) |c(i,j) 2 (201—1)h eiehl :_c(i,j) 2(2a-l) I z (201—1)h |h|>nI h>0 n +1 = 2c(i,j) (2a-1) I l - (Zn-l) nI+l = c(i,j) (Zn-1) l - a The spectral density, however, is not bounded away from 0. Consider the diagonal element fe(i,i) where XOi is gogl...gI. Then ieh ZI ||r£1|1 E K(fi th e tends to 0 as I tends to w. Thus there exists a subsequence {in} such that f (in, in) tends to 0 6 as n tends to w. Hence fe cannot be bounded away from zero. 1 So if we consider the infinite variate process {ETE1 gn: -w