llllHWIIIHIIIHIHIHIIll!JillNillflllllllllllllllllllll THESIS 31293 01417 2690 I This is to certify that the dissertation entitled 0N PERIODIC AUTOREGRESSION: MAXIMUM ENTROPY MODELING AND PARAMETER ESTIMATION presented by Hao Zhang has been accepted towards fulfillment of the requirements for Doctor degree in Statistics V. Mandrekar %an&<§kfi2b Major professor Date JunefilO. 1995 MS U is an Affirmative Action/Equal Opportunity Institution 042771 A s, ———_— _. LIBRARY Mlchlgan State University PLACE N RETURN BOX to remove this checkout from your record. TO AVOID FINES Mum on or baton duo duo. DATE DUE DATE DUE DATE DUE msu lo An Affirmative Acuavsquu omuy It'lotltulon “WMJ ON PERIODIC AUTOREGRESSION: MAXIMUM ENTROPY MODELING AND PARAMETER ESTIMATION By Hao Zhang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1995 ABSTRACT ON PERIODIC AUTOREGRESSION: MAXIMUM ENTROPY MODELING AND PARAMETER ESTIMATION By Hao Zhang We study a special class of periodically correlated time series for which the best linear predictor of a:(t) based on all past information , denoted by 5:(t), depends on at most p steps back, i.e., there exists an integer p such that 5:05) = P7‘OJ’($(1¢)III=(t - 1), - - - ,w(t - 10)),Vt- We call such a periodically correlated process a periodic autoregression (PAR). A PAR is equivalent to the following time domain model: t W) - 200', t)5'3(t - j) = 0(t)€(t), 3: where p(t), 0(t), a.( j, t) are all periodic in t and 6(t) is the innovation process. We first show that Burg’s maximum entropy principle can be generalized to periodically correlated case and the generalized principle results in a PAR. We then study estimation problems in a PAR model. We show that the Yule- Walker equations provide consistent estimators for the coefficients a( j, t). We also give a uniform convergence rate of the estimators. Finally, We generalize Akaike’s Bayesian Information Criterion to give consistent estimation for the orders p(1),p(2), - - - ,p(T)- To my wife and my parents iii Acknowledgment My deepest thanks go to my advisor, Dr. Mandrekar for his guidance and encouragement. His patience and thoughtfulness have made my stay much easier. I also thank my thesis committee members, Drs. LePage, Salehi and Sledd for their time and interest. I appreciate the support of the Department of Statistics and Probability in the last five years when I experienced growth in research, teaching and statistical consulting. Never forgotten will be the numerous discussions with many faculty members. Finally and importantly, I thank my wife, Hong, for her support and encour- agement, my parents and sisters for their long-time support. iv Contents Introduction 1 Periodic Autoregression 2 Maximum Entropy Modeling of PC Time Series 2.1 Introduction ................... . ............ 2.2 Maximum Entropy Modeling of PC Time Series ........... 3 Parameter Estimation in PAR Model 3.1 Preliminary Results on Martingale Differences ............ 3.2 Convergence Rate of Sample Covariances ............... 3.3 Convergence Rate of Coeficients ................... 3.4 BIC for Order Estimation ....................... 13 13 14 22 23 34 Introduction Time series analysis is one of the fields that have attracted interests of proba- bilistists, statisticians and researchers from economics, engineering, social sciences and other areas. Stationary time series has been studied extensively because of its application in many fields and the adequate mathematical tools to handle it. One of the most important class of stationary time series is ARMA models which are widely used in applications. Problems studied for ARMA models are param- eter estimation, spectral estimation and prediction. AR models, as special cases of ARMA models, received more attention since stationary time series can be ap- proximated by AR models (An, Chen and Hannan, 1982). What makes AR models even more interesting is the application of informa- tion theory in the study of stationary time series. Burg (1967) used the idea of maximizing entropy in spectral estimation. He showed this approach results in an AR model. Burg’s maximum entropy principle is better justified by Parzen (1983) and has been widely used today in spectral estimation. Akaike (1974) ap- plied information theory to develop the well-known Akaike’s Information Criterion (AIC) for order estimation in AR models. AIC tends to overestimate orders and yields inconsistent estimators (Shibata,1976). To get consistent estimators for the order in an AR model, Akaike (1977) later modified AIC to Bayesian Information Criterion (BIC). An, Chen and Harman (1982) proved that BIC gives consistent order estimation. Although stationary time series describe many phenomena well, there are situa- tions when data exhibit non-stationarity. Efforts to study non- stationary processes can be found constantly in literature. One natural generalization of stationary process is Loéve’s harmonizable pro- cess. Spectral domain problems for harmonizable processes are studied by Cramér (1961). Cramér (1964) also showed that time domain problems are difficult for harmonizable processes. So far, no time domain model is discussed for general harmonizable processes. The effort is thus concentrated on non-stationary process for which both time domain and spectral domain approaches can be applied. The major example in this direction is periodically correlated (PC) processes, which are harmonizable as shown by Gladyshev ( 1961). After the early work of Gudzenko (1959) and Gladyshev (1961), several authors have studied the Kolmogorov-Wiener problem (Miamee and Salehi,1980, Hurd and Mandrekar,1991). Motivated from applications (e.g., Bloomfield et al, 1994, Gardner, 1986), Hurd (1989), Hurd and Gerr (1991) studied some inference problems for spectral measure of PC processes. Time domain models have been also studied for PC processes. Pagano (1978) introduced periodic autoregression (PAR). Anderson and Vecchia (1993) studied periodic ARMA model and gave the asymptotic properties of sample covariances. Adams and Goodwin (1995) studied on-line parameter estimation for periodic ARMA models. We study PC time series systematically by first showing that the analogue of Burg’s algorithm holds for PC time series. For this purpose, we introduce a slightly different definition of PAR. From this definition, we can easily show that PAR satisfies maximum entropy principle. Our definition reveals an intrinsic property of PAR models and overcomes a technical difficulty encountered by Pagano’s. These are all discussed in Chapter 2. We then consider parameter estimation for PAR models in Chapter 3. Here, we study estimation of coefficients in PAR models. We give convergence rates for these estimates by first studying the convergence rate for sample covariances. Then we apply these rates to give consistent estimators for the order of PAR models. Our work generalizes Akaike’s BIC order estimation to PAR model. Numerical results are shown at the end of Chapter 3. Throughout this thesis, we assume all sequences of random variables have zero means. Let "A“ denote the sup-norm of a real matrix A and Z denote the set of integers. Chapter 1 Periodic Autoregression A sequence of real valued random variables :1: = {$(t), t E Z} is called a periodically correlated if for some integer T and any t, s, E$(t) = Ea:(t + T), Cov(:r(t), 3(8)) = Cov(a:(t + T), :1:(s + T)). We always assume that Ea:(t) = 0 for all t. Then the definition is equivalent to that there exists a unitary operator U such that for some T, a;(t + T) = Ua:(t), Vt. One chooses the minimum T as the period of the PC sequence. The best linear predictor of :z:(t), given 316(3), 3 < t, is the projection of a:(t) onto the closed space spanned by 31(3), 3 < t (Kolmogorov, 1941). Hereafter, we denote this projection by Proj(a:(t)|a:(s), s < t). From the definition of PC sequences, we see that if Projwnzc), s < t) = fauna — j), i=1 then Proj(:1:(t+T)|ac(s),s 0, such that for any t at) = Proj|x. We do not consider the case :i:(t) is zero which is less significant in view of prediction. Then for each t, there is a smallest positive integer p(t) satisfying 53(t) = Pr0j($(t)lm(t - 1),$(t - 2), - ° - ,$(t - p(t)»- We will call p(t) the order of p(t). We will show in Proposition 1.0.1 that p(t) is periodic, then we can simply say that x(t) has order p(l), p(2), - - - , p(T). We can write p(t) (t) = 20(1) 13):!C (t - j) i=1 8) or equivalently p(t) -Za(j,t) t - i) — 6(13) (11) where 5(t) = :L‘(t) — :i:(t). Thus e(t) lS orthogonal to 23(3) for all s < t. Let 02(t) = E|a:(t) — 52(t)|2. If 02(t) > 0 for every t, we say that $05) is non-deterministic. In this case, we can write (1.1) as 90) at) — g as, me - j) = a(t)e(t>, (1.2) where {e(t)} is a the innovation sequence, 1.e., 6 = at) — at) (t) 0(t) - The next proposition says all parameters are periodic in t. Proposition 1.0.1 If a:(t) is a non-deterministic PAR, then the functions p(-),a(-, -) and 02(t) are unique and for any t p(t) = W + T) 02(t) = 02(t + T) 0(1', t) = 0(1) t + T) Proof. p(.) and a2 (.) are obviously unique by definition. Since a: is non-deterministic, it follows that :z:(t — 1), ...,x(t — p(t)) are linearly independent and thus a(-, -) is unique. Notice that p(t) is periodically correlated if and only if there exists a unitary operator U such that U$(t) = :r(t + T). It is very easy to justify UProj(y|S) = Proj(Uy|US) for any y and (closed) subspace S, where US = {Um : a: E S}. It follows that UProj(a:(t)|:r(s), s < t) = Proj(:r(t + T)|:1:(s + T), s < t) UProj(:I:(t)|:r(s),t —p S s g t) = Proj($(t +T)|:r(s + T),t —p S s S t). 6 We conclude from the above two equations that p(t) = p(t + T), a"(t) = 02(t + T) Since then p(t) U§:(t) = ;a(j,t)a:(t -— j + T). On the other hand, p(t+T) U5:(t) = :i:(t + T) = Z a(j,t+ T):1:(t + T — j). i=1 From the facts that p(t) = p(t + T) and (12(3), for t+ T — p(t) g s S t+ T — 1 are linearly independent, we obtain a(j, t + T) = a(j, t),Vj, t. We see that for non-deterministic PC sequences, our definition is equivalent to (1.2) with periodic parameters, which was used to define periodic autoregression by Pagano (1978). The definition here overcomes a technical difficulty. (See the remark at the end of this section.) We are ready to give a characterization of PAR model now. Let £(t) = (a:(1 + tT),a:(2 +tT),. - -,:1:(T + tT) )’, v t, R(t, s) = Ea:(t):c(s). Theorem 1.0.1 Let a: be a PC sequence. Then :1: is a non-deterministic PAR is equivalent to the followings. (i) $(t) satisfies the following time domain model p(t) $(t) - Z: a(J',t):1«'(t - j) = 0(t)€(t), (13) where €(t) is the innovation process and 0(t) > 0. (ii) The Yule- Walker equations hold, i. e., p(t) R_ 0, Vt. (iii) 5(t) is T dimensional AR(p) model for “[190;- Pisa 1+1 (where [a] denotes the integer part of a.) Namely, there exist matrizes AJ- and G and a T dimensional time series €(t) such that 5(t) — z A,-:i:'(t — j) = €(t) and (1.5) Cov(é(t), 5(3)) = G6”, Eé‘(t):i:'(s)' = 0,Vs < t. (1.6) Proof. It is obvious that (i) and (ii) are equivalent and are necessary and sufficient conditions for :z: to be PAR by definition 1. Let us prove (i)=>(iii). Suppose a: is PAR. Then a: has the time domain expres- sion given by (1.3) which can be written in vector form 801:“) —:Bj$ =D€(t) (1.7) where p is defined in the theorem and D=diag(a(1), ..., o(T)), B, = (bj(k, l))Z:,___1 for bo(k,l) = 6k,l—a(k—lil)X{k>l} bj(k1l) : (1(Tj + k _ 1’ k)X{Tj+k—lgp(k)}a 1 S j S p Let €(t) = (6(1 +tT), 6(2 + tT), - - ~ ,e(T+ tT))'. Then it is obvious that €(t) 1 55(3) for any 3 < t. Let A,- = 80—18,, €(t) = 130-119;. Then €(t) satisfies (1.6) with G = B51D2(Bo-1)’ and (1.5) follows from (1.7). (iii)=> (i): Since G is positive definite, it has the following Cholesky decomposition G = LHL' where L is lower triangular with all diagonal entries being 1 and H is diagonal and non-singular. Multiplying (1.5) by L“, we get 9 L-lsa) — Z L"1A,-:I:'(t — j) = L-la j=1 Since L‘1 is also lower triangular and Var(L’1é) = H is a diagonal matrix, the scalar form of the above equation will yield (i). QED. Corollary 1.0.1 A nondeterministic PAR is purely nondeterministic. Proof. Let a: be a nondeterministic PAR. Then the corresponding multiple se- quence 53' is a stationary AR. It is well known multiple AR is purely nondetermin- istic. In [19], and [22], it is proved that :1: is purely nondeterministic if and only if 53' is so. Thus the proof is completed. QED. Remark. A stationary AR(p) model is defined in literature (Hannan, 1973) as a second order stationary sequence satisfying :0 $(t) — Z"? 01$(t- j) = 0603), for some constants a > 0, aj such that :0 |1 — Zajzjl 75 0, for |z| S 1. j=l and a white noise {e(t)}. The constraint for the coefficients is a necessary and sufficient condition for the existence of a solution of stationary sequence (see, e. g., Hannan, 1970). Analogously, we need to know constraints on the coefficients a( j, t) to guarantee a PC solution of (1.3) exists. We can give the constraint in two equivalent ways. We note that (1.3) with periodic parameters has a non- deterministic PC solution if and only if {p(t),02(t),a(j, t), j = 1,---,p(t),t = 1, 2, ~ - - ,T} uniquely determines R(t, s) for It — s] g p(t) such that R(t, s) = R(s, t) = R(t + T, s + T) and for any t = 1,2, - - - ,T, the matrix P. = (W — it — with (1.8) is positive definite. The necessity is obvious and sufficiency follows from Theorem 2.2.1 in the next chapter. We also see from Theorem 1.0.1 that (1.3) has a PC solution if and only if P det(I — Z Ajzi) 7e o,v |z| g 1. (1.9) i=1 10 Indeed, (1.9) implies there is a stationary solution of (1.5)(Hannan, 1970, page 326). The corresponding scalar sequence must be a PAR satisfying (1.3) by Theo— rem 1.0.1 It must be non-deterministic since a(t) is positive. We use the technique of Whittle (1963) to show the other way. Suppose that (1.3) has a PC solution. Then the corresponding vector-valued stationary sequence satisfies (1.5) and (1.6). Define the pT dimensional random vector / 5(t) ) EU—l) ( :‘E(t — p + 1) f Project Yt such that Y: = PYt—l + Z, Z, _L Yt-l and P is the projection matrix. We see from (1.5) that (Al A2 A,,_1 AP) I (....1) Zt = (é(t)’,0, ...,0)’. Let /\ be an eigenvalue of P and E be the corresponding left eigenvector. Then {P = A5. (1.10) Observe that Cov(Yt) = Cou(Yt_1) is positive definite , then Eléztl2 = EIEKP -- EIEPYHI2 = £Var(K)€’(1 — IAIZ) 2 0. 11 it follows that IAI S 1 with equality if and only if {Z = 0. Because of the special form of P, ( 1.10) implies that E and {0 must be 0 together where {0 is the first T entries of 6. Since Var(é't) is non-singular, for non-zero E, EZt 2 £053 74 01 So all the eigenvalues of P have modules less than 1. Now suppose 2 satisfies p det(I — ZAJ-fj) = 0. i=1 1 It suffices to prove z‘ is an eigenvalue of P. There exists an row vector 61 6 RT such that 51(1 — f: 14,51): 0. (1.11) ":1 Let J 5,- : 25,--1 — £1Aj_1, j: 2, ...,p. (1.12) Set 5 =(€1,...,€p). Notice (1.11) and (1.12) imply immediately 5 P = 25. (1.9) now follows. 12 Chapter 2 Maximum Entropy Modeling of PC Time Series 2.1 Introduction The entropy of a random vector in R" with probability density function f (:13) is defined as H(X) = -E1nf(X) = — l... f(:v)1nf(x)d$- Burg (1967) developed a maximum entropy approach for spectral estimation of stationary time series which has been widely used since then. Burg’s approach can be stated in the following way. Suppose p+ 1 autocovariances R(O),R(1), - - - ,R(p) of a stationary sequence are known (usually estimated from observations). Instead of taking R(n) to be 0 for all n greater than p, as in windowed spectral estimation, we extrapolate R(n) for n > p in such a way that maximizes the entropy H(CL‘(t), $(t — 1)1' ' ° ,£E(t — 71.)), for all n > p. 13 It turns out that the only such extrapolation is given by Yule-Walker equations, thus this maximum entropy method results in an AR model. We consider here the same question for PC sequence. Suppose for each t = 1, 2, - - - ,T, we know the covariance matrix of (~73(t),:v(t - 1), - - - ,$(t —pt)) for some integers pt > 0. Because the time series is PC, we do not require that the pt’s are the same. We will extrapolate the covariance function in such a way that maximizes the entropy H(:v(t),:v(t — 1). - - - .xa — s» for all s < t. Problems we will consider are ( 1). whether there is a PC solution to this maximizing problem and (2). the properties of such PC sequences which maximize the entropies. We will prove in the next section that there is a unique Gaussian PAR sequences which maximizes the entropies. 2.2 Maximum Entropy Modeling of PC Time Series To avoid the ambiguity of saying part of covariances of a sequence is known without knowing such sequence exists, we state the problem in a more mathematical way. Let p1, p2, - - - , pT be positive integers and r(., .) be defined on the set A = Uf=1{(u,v) E Z x Z :t—pt S u,v S t}. We assume that Pt = (TU _j1t_ k))§,tk=o 14 is positive definite for all t = 1, 2, - - - ,T and r(s, t) = r(s + T, t + T) = r(t, s) (2.1) whenever (s, t), (t, s) and (s + T, t + T) are in A. These assumptions are seen necessary for r to be a covariance function of a PC time series. Let [C be the set of all PC time series with period T whose covariances are r(t, s) for (t, s) E A. The next theorem says IC is not empty. Theorem 2.2.1 There is a non-deterministic Gaussian PAR in IC. Proof. Since Ft is positive definite, the equations Pt r(t _ k3 t) — 20(j1 t)7“(t - kit — j) : 6k,002(t)1 for k = 0111' ° '1pt1 (2'2) i=1 have unique solution a(1,t),a(2, t), - - - ,a(pt,t),02(t) and 02(t) > 0. These Yule- Walker equations actually provide a way to extend r(t, s) to be a covariance func- tion of a PC time series. But we will adapt a statistical approach here. Let i E {1,2,---,T} be such that i—p,St—ptfoth=1,2,---,T. Then there are Gaussian random variables r(t) of 0 mean, for i — p,- S t S i — 1, such that Ea:(t):1:(s) = r(t, s), for i —p,~ S t,s S i — 1. Let €(t), t Z i be a sequence of i.i.d standard normal random variables and also independent of {x(t),i — p,- S t S i — 1}. Define, for t 2 i, p(t) ac(t) = Z: 0(3) t)$(t - j) + 0(t)€(t) 15 where p(t),a(j, t) and 0(t) are the periodic version of pt,a(j, t) and 0(t) respec- tively. This definition together with (2.2) yield Ea:(t):r(s) = r(t,s), for (t,s) E A. We now show that r(t) is PC, i.e., E:r(t + T):1:(s + T) = Ea:(t):1:(s), (2.3) for Vt, s _>_ i — pi. We observe that (2.3) is true for i -— p,- S t, s S 0, because of (2.1). Assume it is true for i — p,- S t, s S n. Replacing r(n + 1) by the definition, we have for t < n+1, p(n+1) Ecr(t):1:(n + 1) = Z Ea:(t):r(n + 1 — j) j=1 p(n+1) = 2 Ed?“ + T):13(n + 1 + T) (by the induction assumption) j=l = Ea:(t + T)a:(n + 1 + T) Similarly, we can prove Ex2(n +1) = E$2(n + 1+ T). Thus (2.3) is true for t, s S n + 1. So we have proved (2.3). Let r(t, s) = Ea:(t):r(s) for all t, s 2 i-p,. Then r(t, 3) still satisfies (2.1). Now we extend r(t, s) to Z2 by r(t — mT, s — mT) = r(t, s),Vm Z O. 16 For any m S n, the matrix {r(t,s) : m S t,s S n} is positive definite because of the periodicity of r(t, s) and the fact that {r(t), t Z 0} are linearly independent. So there is a Gaussian sequence with r(t, s) as covariance function by Kolmogorov’s Theorem. This sequence must be a PAR by Theorem 2 and non-deterministic since 02(t) > 0. QED. This Gaussian PAR must be unique in distribution. It might have orders p(t) S pt for t = 1,2, - - - ,T because a(j,pt) might be zero. But the orders are uniquely determined by I‘t’s. The next theorem says that it is the one that maximizes entropy. Theorem 2.2.2 Let r(t) be a Gaussian PAR in IC, then for any 3 S t, H(Iv(t),rv(t - 1), .--,x(8)) = 313 lib/(13).“t - 1), ---,y(8)) (2.4) where the supremum is taken over all sequences Y in K: for which the entropies in (2.4) can be defined. Conversely, if a PC sequence y(t) in IC satisfies (2.4), then y(t) is a Gaussian PAR. Remark. The problem we considered here is more general than assuming R0, R1, - - - the covariance of a vector-valued stationary sequence, are known. Actually, the later is a special case of our problem here. Thus Theorem 3 contains the maximum entropy method for stationary vector-valued sequence as a special case. There is a practical consideration why we assume r(t, s) is known in the set A instead of 17 a square area {1 — q S t, s S T} for some q > 0. To approximate a PC sequence using a PAR, we might choose different orders p(l),p(2), - - - ,p(T). Given finite observations of a PC sequence, we should just estimate r(t, s) for (t, s) E A and extrapolate it through Yule-Walker equations since smaller |t —- s| tends to give better estimate of r(t, 3). To prove this theorem, we need some basic properties of entropy. The following two lemmas are known.(e.g., for Lemma 1, Choi, 1983, Parzen, 1983; for Lemma 2, Gallager, 1965, Kullback, 1978.) Lemma 2.2.1 For any random vector Y, let X be a normal random vector having the same covariance matrix as Y. Then H (Y) S H (X) The equality holds if and only if Y has normal distribution. For two random vectors with joint probability function f (x, y), the conditional entropy of X given Y is defined as, provided it exists, H(X|Y) = — f1uf 0, sup E|e(t) |4+6 < oo tez 3.1 Preliminary Results on Martingale Differ- ences In this section, we give some results of sample covariances of a martingale differ- ence. We need this to derive results for PAR model. We will give a law of the iterated logarithm for martingale difference and apply it to give the convergence rate of covariance of martingale difference. For stationary and ergodic martingale difference, a law of the iterated logarithm has been given in literature. ( See, for example,[30]). But we assume higher moments instead of stationarity and ergodicity since we believe that the assumption of higher moments 23 being bounded is less restrictive in time series than the assumption of stationarity and ergodicity. Theorem 3.1.1 Let {Ym .77", n 2 1} be a martingale difierence such that for some 6>0, M>0 andforanyn, ElYn|2+6 S M Let sf, 2 L1 E{Y,-2|.7:,-_1}. Suppose also that 2 C I 8 11m 1nf—'—l > 0, a.s. n—-+oo n Then almost surely - ?=1 Y" 11m sup = 1 "40° \/2s,2, ln ln 5?, Proof. Stout [29] proved that for martingale difference Y", the following law of the iterated logarithm holds " Y - i=1 n 11m sup 2 1 we ,/23,2,1n1ns;~; if 3?, = 2;, E(Y,,2|.7-'-_1) —> 00 and there exists a sequence Kn which is 75,4 measurable and goes to zero such that Emma: > v3.» < oo, (32) i=1 where 2 K7152: v = . " lnlnsf, In our theorem, it is obvious that sf, —+ 00. Take 1 Kn = . lnlns,2, 24 We only need to check (3.2). Since E(Y.3X{Y.3 > v33) 3 (EIYnI2+P)v;P s M - 3,7», we see that the sum in (3.2) is bounded by M i s;2_p n=l which is finite almost surely because lim inf sf, / n > 0. So the theorem follows. QED. We first state a lemma which is needed to prove our next theorem. Lemma 3.1.1 Let {me'mn Z 1} be a supermartingale difference with EYl = 0 and for some K 6(0, 1/2/, 311 ‘/21nlnsf, where 3?, = 2;, E {K2|E_1}. Suppose for some constants b 2 9, almost surely, YnSK a.s sf, S b2,Vn. Then for any 0 < 5 < 2, P(supZY, > 6{2b2 lnlnb2}1/2) _<_ exp(—s1n1nb2) "21 i=1 where ,8 = 62(1 - 6—53). Proof. Let c = Kb/VZlnlnbz, /\ = 6b—1V21nlnb2. 25 Then Ac = 6K S 1. —1/2 Since x(x ln In x) is increasing for x 2 9, we have Y” S c, Vn. It follows [31] (Lemma 5.4.1 on page 299) that n A2 A Tn = €XP(/\ZY1) eXP(“"'2"(1 + 'Z—C i=1 193.)- is a super-martingale w.r.t f}, and ETl S 1. Thus for any a > 0, P(sngn > a)S1/a. Then P(supzn: Y,- > (5(2b2 lnln b2)1/2) " i=1 2 P(supZYi > Abz) " i=1 A S P(sup Tn > exp(A2b2 - -b23(1 + 30)b2) S exp(—A2b2(1/2 — Ac/4)) = exp(—fllnln b2). QED. Theorem 3.1.2 If {e(t),.7:t,t 2 1} satisfies Assumption 3.0.1, then for any posi- tive real number d and integer T, . I 2":1 62(ST) — n] 11m su 5 < 00, 3.3 'Hoop V2nlnlnn ( ) lim sup maX0 0, let I f : fsT-l-tVT 8 14(8) = €(ST)E(ST + t) - E{€(ST)€(ST + t)|}'§_1} Then {K(s),.7-';,s Z 1} is a martingale difference. We will finish the proof by proving the following, 0A 1}, we get . . mind); Y,(s) > . 11,33”: (2nlnlnn)1/2 _ —f. (329) Now (3.7) follows (3.28) and(3.29). So it is enough to prove (3.27). To reach that goal, let p > 1 and define the stopping times rm, = inf{n Z 1, s?(n + 1) Z pzm} Let _ - 2m ——1nf{n Z 1, Krtrégxnns, (n + 1) _>_ p } Then Tm S Tm, for any t. (Notice that Tm is not a stopping time.) Then for O<6 6J2nlnlnn,i.o) P(BSItnSdlnns S P(mgup+1 351:3an Z( Y,(s )> 6\/('rm + 1) lnln(rm + 1),1.o mm) (3.30) Since we have proved lim n‘1 max sf(n) = 1, "400 lStSdlnn then almost surely for sufficiently large m, 2m—1 p < Tm < p2m+1 Using this inequality and the fact that \/p2m—1 ln ln p2m-l > p-2 \/p2m+2 ln ln p2m+2, for sufficiently large m, the probability in (3.30) is less than P({ sup ax217,03>6p'2\/p2m+2lnlnp2m‘1,i.o in m)}) (3.31) "ST m+1 3SItnSdlnn8 32 It is easy to verify for a fixed t, the martingale difference {Y,(s)X{s S Tm+1,,}, f}, s 2 1} satisfies conditions in Lemma 3.1.1 with this b2 = pm”. From Lemma 3.1.1 and the fact Tm S rm, for any t, we have (sup ZY,(s) )>6p-2Vb21nlnb2) Pn6p 2\/b21nlnb5") ”(Tm+1¢ 8’1 S exp(— —=,Blnlnp2m‘1)) ((2m —1)lnp)"fi, V1 S t S dlnn where ,8 = map-2, K). Since ,B(x, K) is increasing in x E (0,4/(3K)), then 3 > 2 for 6 > p27(K). It follows then for such a 6, Z; Z: P({Sgp 2; 13(8) > 6\/p2m-1 1111np2m—1}) 5i d( 27” + 3) )(lnp)((2m — 1) Imp)” < oo. (3.32) (3.31), (3.32) and Borel-Cantelli lemma imply P( sup max ZY,() > )6\/(Tm + 1) lnln(Tm + 1), i.o inm)) = 0 (3.33) "(Tm-+1 3StSd Inns for 6 > p27(K). Since p > 1 is arbitrary, let p goes to 1,then (3.33) is true for 6 > 7(K). Then we have proved P( max if“) 7(K)\/2nlnlnn, i.o)=0. lStSdlnns (3.27) follows now. The proof is completed. QED. 33 3.2 Convergence Rate of Sample Covariances In this section, we prove the following theorem. Theorem 3.2.1 If {x(t) : t E Z} is PAR and Assumption 3.0.1 holds, then for any constant d > 0, almost surely, lnlnN sup |R~(t,s)—R(t,s)l=0( N ) |t—s| 0 and 72 > 0 such that for any j,t ICU, t)| S warm-723') Proof. It is well known that the Wold coefficients C" of a stationary multivariate process go to zero at an exponential rate, i.e., there exists positive constants a and fi such that ”Cull S aexp(-nfi), where the norm ”Call is the maximum of entries in C". Observe that for any 0 S m < T and j = nT + m, C(j, t) is an element of the matrix C" for any t: 1,2,---,T. Hence I C(J'J) IS IICnll S aexp(-nfi) S aexp(-j)6/T)- QED. Next, we consider the sample covariance of the innovation process e(t). For any positive integers t, s and positive real number b, let ['9] u(t, s; b) = Z {e(t + mT)e(s + mT) — Ee(t)e(s)} (3.36) m=0 where [b] as before denotes the integer part of b. Lemma 3.2.2 Let {Mn} be a sequence of increasing , non-negative random vari- ables and {An} be an increasing sequence of real numbers. If An —) 00 and E'(Mn) = 0(An), then Mn 2 0(An ln An(ln 1n An)1+5) for any 6 > 0. 35 Proof. For a given 6 > 0, let A(n) = AnlnAn(lnlnA,,)1+5. Without loss of generality, assume that E(Mn) S CA" (3.37) for some constant c > O. For j Z 1, let n, = inf{n Z 1:1nAn > j}. (3.38) Then n,- increases to 00 as j —+ 00. It follows from Markov’s inequality, (3.37) and (3.38) that P(Mn, > A(n)” S Since 23:3 “Tip“ < oo, Borel-Cantelli Lemma implies that P(Mnj > A(nj),i.o) = 0. Now, for n,- S n S n,“ — 1 , we have from the monotonicity A(n) Z A(nj) and Mn S Mn. 1+1-1 Then almost surely, for sufficiently large n, M" Mnj+l A(nj'f'l) (339) Since , ' ' 1+6 ' 1 J-too A(nj) J—mo exp(])_71+5lnj (3.39) implies M, = 0(A(n)) = O(A,, lnAn(lnln A,)1+5). Since it is true for any 6 > 0, O can be replaced by 0. The proof is completed. QED. 36 The following lemma is needed to prove our theorem and is of interests of its own. Lemma 3.2.3 Under Assumption 3. 0.1, for any constant (1 > 0, t t' limsup max |u(, ,n)| n—too ItISdlnn V 2nln lnn t o limsup max M- < V2, a.s. (3.41) 1).—+00 |t|.lsISdln mess \/2n ln 1n n ‘ Proof. The proof is an application of Theorem 3.1.2 and Lemma 3.2.2 with 00, a.s. (3.40) some computation. We only need to prove the lemma for t S 3 since u(t, s; n) is symmetric in t, s. For a fixed |t| < dln n, let no be the integer such that to = t — 710T E[1,T] First notice that Theorem 3.1.2 implies that for any fixed to, n 2 _ ,imsuplz =.0. The lemma now follows. QED. Proof of Theorem 3.2.1 Without loss of generality, assume N = nT. Then by (3.1) n—max(t,s)/T RN(t, s) = n—1 : x(t + mT)x(s + mT) (3.46) m=l fort: 1,---,T.,and s=0,1,-~,N—t—1. Since both RN and R are symmetric and periodic, we only need to prove the theorem for t = 1, ..,T. and t S s S t+ dlnn, and Let Q" = W. Notice that from (3.35) and the orthogonality of {e(t)}, oo R(ts) = z c(j,t)c(k.s)cs._.~,._, (3.47) J'.k=0 Then it follows from (3.46) and (3.47) that RN(t, s) — R(t, s) = n—1 it): c(j,t)c(k, s) j,k=0 n—s/T x Z [e(t + mT — j)e(s + mT — k) — 6,_j,3_k] m=0 +n — [n - S/T] i c(j,t)c(k,8)61—j,s—lc j,k=0 n 38 The sum in the second term is finite by Lemma 3.2.1, thus the second term is obviously 0(n’1 1n n) uniformly in s S dln. Denote the first term by Wn(t, 3). Then w, (t, s)— _ n 1 Z c(j,t) c(k ,s)u(t —j,s — k, n— s/T) (3.43) j, k=0 where u(t, s; x) is defined by (3.36). Next, truncate the sum in (3.48) at j, k S dln n. Denote by Zn(t, s) the truncated sum, i.e. dlnn Zn(t, s) = n"1 E c(j, t)c(k, s)u(t — j, s — k; n — s/T) (3.49) j,k=0 Then it follows Lemma 3.2.3 max max |u(t—j,s—k;n—s/T)|=O(\/nlnlnn). 0dlnn Applying Markov’s inequality, it is easy to see [)(OSsInax tSdlnnn '2 ZC(j,t)C )’Ll(t—j,S—' k;n—S/T)| >Qn) j=d In 11 k=0 39 0000 S Q;2n‘2 Z E| Z Z c(j, t)c(k, s)u(t — j, s — k; n - s/T)|2 OSs—tSdlnn j=dlnnk=0 (3.52) We have proved in (3.45) E[u(t, s; i)|2 S c(i + 1), then Holder’s inequality implies for Vt,, nj,i = 1, - - - ,4, j = 1, 2, E|u(t1, t2; n1)u(t3, t4; n2)] S c\/(n1 + 1)(n2 + 1). This together with Lemma 3.2.1 implies that the expectation in (3.52) is dominated by oo . 00 f _2 d n+1c cg,t2 ck,szS n+1n 72. where 71, 72 are the same as in proposition 2. Then (3.52) is bounded by (dln n)2Q;2cn'1“272d = 0(n‘2). It follows that 213(1va > Q”) < oo. Borel-Cantelli lemma implies [1,” = 0(6),) a.s Similarly, we can prove that 12,1: = 0(Qn) “-3- (3.51) is established now and the proof is finished. QED. 40 3.3 Convergence Rate of Coefficients Solution of Yule-Walker equations provides estimators for the coefficients. If the orders p(1),p(2),- - - ,p(T) are known, then we will have no dificulties to show, using the results in last section, that these estimators are consistent and have the same convergence rate as the sample covariances. Since the orders are unknown, we need a little extra work and notations get complicated. To make our statements clearer, we will define random inner product which will simplify our statements. Let L262, F, dP) denote the Hilbert space of random variables with zero means and finite second moments. Then {x(t), t E Z} is a set in this Hilbert space and Yule—Walker equations are just normal equations of projection. We want to use this convenience of projection even when the covariances R(t, s) in Yule-Walker equations are replaced by the sample covariances RN (t, s). For this purpose, we introduce random inner product. Let X denote the the subset {x(t), t 6 Z} of L2(Q,F,dP). For each integer N, let < -,- >N () be a map from X x X X (2 to the set of real numbers such that < x(t),$(8) >N (w) = R~(t, 3)(w)o We can not yet say < -, . > (w) is an inner product for a fixed w. But for a given finite sequence of integers to,t1, . . .,tm and a fixed (1), (RN(tj,tk);-’,‘k=o is positive definite for sufficiently large N. So for such a N, < -, -, >N (at) can be regarded as an inner product on a linear space spanned by x(to), . . . ,x(tm) such that < x(tj),x(t;,) >N (w) = RN(tj,tk)(w). We will suppress w in the inner product and write it as < ., . > N. The correspond- ing norm will be denoted by H.” N. 41 For the sake of convenience and unity of notations, let < ., . >00 and [|.||co be the inner product and norm in L2(Q, F, dP), i.e. Ex(t)x(s) < x(t),x(s) >co ”x(t)“... = 3132(0- Then for any t, s, almost surely, Iii—130 < x(t),x(s) >N=< x(t), x(s) >oo . For each N = 1, 2, - . - ,oo, denote by ProjN[x(to)|x(t1), - - - ,x(tm)] the projec- tion of x(to) onto the subspace spanned by x(tl), - - - ,x(tm) under the ||.||N. Let )0 PTOle$(t) l x(t - 1), - -- ,~'I:(t - 19)] = Eat/(j. t;p)$(t - 1'). (3-53) j=1 aN(j, t;p),j = 1, . - . ,p are actually the solution of I‘~(t;p)a~(t;p) = R~(t;p), (3-54) where I‘m/(tun) = (RMt - it - k))§,k=1 (3-55) RN(t;P) = (RN(t_1)t))"°:RN(t —P,t))’ (3-56) a~(t;p) = (a~(1,t;p), - - - ,a~(p, t;p))' (13-57) For N = 00, the above equations are just the Yule—Walker equations we dis- cussed in Chapter 2. Let l(.) he a periodic function from N to N with period T. l (t) may depend on sample and serves as an estimator of p(t). Choose a dominating function L(N) from N to N and assume the following throughout the rest of this section. 42 Assumption 3.3.1 L(N) increases to 00 and L(N) = 0(ln N) Theorem 3.3.1 Let {x(t),t E Z} be a PAR with order p(l),---,p(T). Then under Assumption 3. 0.1 and 3.3.1, mp | a~(j,t;l(t)) — a.. 0 such that for any t and q, |]I“1(t; q)” S M. Proof. We note that for any positive definite matrix F, ”F“ is less than or equal to the maximum eigen value of I‘. Since F‘1(t; q) is positive definite, we only need to show that the eigenvalues of F‘1 (t; q) is bounded from above, or equivalently, all eigenvalues of I‘(t; q), for any t = 1, 2, - - - ,T and q 2 l, are no less than a positive number A. Let Aq be the minimum of all eigenvalues of P(t; q) for all t = 1,2, . - - ,T. Evidently, Aq > 0. It suffices to show that min 02 (t) 1 + max, 25:] 0(1', t)2 . Ag“ 2 min(Aq, (3.58) 43 Let (x(t) \ x,(t)= ‘3‘“) mm \$(t-q)) Then Xq+1(t) = $(t) xq(t " 1) For any vector CH, of q + 2 dimension such that llC9+1ll2 : 11 C Cq+1 = 1 C9 where Cq is a (q + 1)-dimensional vector. Using (1.2), we get we write it as C;+1Xq+1(t) = ”(t) + (cdp(t) + Cpllpr): where ('1'? = (a(1,t),a(2,t), . - - ,a(p(t),t),0, - - - ,0). The orthogonality of c(t) with x(s), s < t together with the definition of Ag imply C;+1P(t; q + 1)Cq+1 = IICQHXQHUHIZ = c202(t) + ”(05120) + Cp)’Xp(t)||2 2 0202(1) + A,||ea,(t) + qul2 2 620205) + Aq|62||5q(t)||2 - ||qu|2| Since c2 + IIqul2 = 1, and (ax + Abe — 1]) = min(A,a/b),Va > 0,A > 0,b > 1, inf OSxSl 44 we have C’ I P(t'q + 1)C , > min(A 2(t) ) q 1 , q l q, llaq(t)ll2+1 . (3.58) now follows. QED. Proof of Theorem 3.3.1 For brevity, we omit l (t) in FA), RN and aN. Observe that I + I‘;1(t)(I‘N(t) — I‘oo(t)) (am(t) — GNU» = I‘:(t)[R..(t) — R~ + (F~(t) — r...)a..(t)) (3.59) Theorem 3.2.1 and Lemma 3.3.1 imply that the maximum absolute value of the entries of F;1(t))(I‘N(t) — Foo(t)) is 0(‘/M13—N). Thus ll I‘;.'J(t)(1‘1v(t) - Poo(t)(aoo(t) - a~(t))ll _ 0( 1"]3’" )12(t))|a..(t)-a~(t)n (3.60) = 0(1)|laoo(t)-a~(t)ll (3.61) Similar argument proves RHS of (3.59) is lnlnN N Also notice that every 0(1) and 0(1) appeared above is uniform in t and functions I such that l(t) S L(N). Then the Theorem follows from (3.59),(3.61) and (3.62). 0( )(1, 1, - -. ,1)'. (3.62) 3.4 BIC for Order Estimation For stationary AR(po) model, Akaike(1977) first proposed to estimate p0 by 15 which minimizes 1nd; +plnN/N 45 Here 6: is the estimate of 02 from the Yule-Walker equations of order p. An—Chen—Hannan (1982) proved BIC estimator is consistent under general con- ditions. In this section, we develop similar criterion for PAR models. It turns out that BIC is included as a special case. Let x(l), x(2), - - - ,x(N) be observations from a PAR model with order p(). Let RN(t, s) be defined by (3.1). We will follow the notations in the last section. Let amt; 1) = ”x(t) — Proms) I x(t — 1), - - - ,x(t — 1))“; (3.63) Let q(N) be a sequences of positive integers such that , lnlnN _ . q(N) _ 1313,13» q(N) — 0, and [[1130 — — O. (3.64) Let p(t) = pN(t) minimize lnai,(t;l) + l - q(N)/N, V0 S l S L(N). (3.65) Then p(t) is a consistent estimator of p(t) under general assumptions. Theorem 3.4.1 Let x(t) be PAR satisfying Assumption 3.0.1 and L(N) satisfy Assumption 3. 3.1. Then for any t, almost surely p(t) —) p(t), asN —> 00. Proof. Since RN(t, s) —> R(t, s), a.s, then ll 2:33 cj$( tilllN —) “213013 t1)”; (3-66) for all real c1, - - - ,cm and integers t1, t2, ' - - ,tm. As a special case of (3.66),we have 012v(t;l) 3 02.9; l) = “x(t) — Fromm) Im 0300.120» (370) Thus 03w) 2 amps) — 1) > amps» (3.71) It follows that for any I < p(t), . q N <1Vlgn°o(ln012V(t,l)+l(—l). M) N N 13320011 Uzi/(15,190)) + p(t) 47 This inequality implies that asymptotically W) Z p(t) (3.72) Using (3.69) repeatedly and applying Theorem 3.2.1, we get for l > p(t), viva, 1) _ 012v(t,P(t)) l = Z amtm)=(z—p(t))0(ln1nN/N) j=p(t)+1 Since 1&3?» 012v(t;l) = gigglvfimfifl = 030(t;p(t)), then for sufficiently large N, 012v (t, l) _ _ 0737:1017» 2 (1 It follows from (3.73)-(3.74) that 01%; l) 1“ 0?»;(t_;p(t)) ln 01%,(t, l) — ln afv(t,p(t)) = (l —-p(t))0(lnlnN/N). where the 0(1) is uniform in l S L(N). The assumption on q(N) and (3.75) imply that min [mom I) — mammt» + [z — p(t)]q(N)/N] > 0. p(t)