
This is to certify that the

dissertation entitled

ON PERIODIC AUTOREGRESSION: MAXIMUM ENTROPY
MODELING AND PARAMETER ESTIMATION

presented by

Hao Zhang

has been accepted towards fulﬁllment
of the requirements for

Doctor degree in Statistics

 

V. Mandrekar

Major professor

Date June 10, 1995

MSU is an Affirmative Action/Equal Opportunity Institution

ON PERIODIC AUTOREGRESSION: MAXIMUM ENTROPY
MODELING AND PARAMETER ESTIMATION

By

Hao Zhang

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1995

ABSTRACT

ON PERIODIC AUTOREGRESSION: MAXIMUM ENTROPY
MODELING AND PARAMETER ESTIMATION

By

Hao Zhang

We study a special class of periodically correlated time series for which the best
linear predictor of $x(t)$ based on all past information, denoted by $\hat{x}(t)$, depends
on at most $p$ steps back, i.e., there exists an integer $p$ such that

\[ \hat{x}(t) = \mathrm{Proj}(x(t) \mid x(t-1), \ldots, x(t-p)), \quad \forall t. \]

We call such a periodically correlated process a periodic autoregression (PAR). A
PAR is equivalent to the following time domain model:

\[ x(t) - \sum_{j=1}^{p(t)} a(j,t)\, x(t-j) = \sigma(t)\epsilon(t), \]

where $p(t)$, $\sigma(t)$, $a(j,t)$ are all periodic in $t$ and $\epsilon(t)$ is the innovation process.
We first show that Burg's maximum entropy principle can be generalized to the
periodically correlated case and that the generalized principle results in a PAR.
We then study estimation problems in a PAR model. We show that the Yule-Walker
equations provide consistent estimators for the coefficients $a(j,t)$. We
also give a uniform convergence rate of the estimators. Finally, we generalize
Akaike's Bayesian Information Criterion to give consistent estimation of the orders
$p(1), p(2), \ldots, p(T)$.

To my wife and my parents


Acknowledgment

My deepest thanks go to my advisor, Dr. Mandrekar, for his guidance and
encouragement. His patience and thoughtfulness have made my stay much easier.

I also thank my thesis committee members, Drs. LePage, Salehi and Sledd for
their time and interest.

I appreciate the support of the Department of Statistics and Probability in
the last ﬁve years when I experienced growth in research, teaching and statistical
consulting. Never forgotten will be the numerous discussions with many faculty
members.

Finally and importantly, I thank my wife, Hong, for her support and
encouragement, and my parents and sisters for their long-time support.

Contents

Introduction

1 Periodic Autoregression

2 Maximum Entropy Modeling of PC Time Series
2.1 Introduction
2.2 Maximum Entropy Modeling of PC Time Series

3 Parameter Estimation in PAR Model
3.1 Preliminary Results on Martingale Differences
3.2 Convergence Rate of Sample Covariances
3.3 Convergence Rate of Coefficients
3.4 BIC for Order Estimation

Introduction

Time series analysis is one of the fields that have attracted the interest of
probabilists, statisticians and researchers from economics, engineering, social sciences
and other areas. Stationary time series have been studied extensively because of
their applications in many fields and the adequate mathematical tools available to
handle them. One of the most important classes of stationary time series is the ARMA
models, which are widely used in applications. Problems studied for ARMA models are
parameter estimation, spectral estimation and prediction. AR models, as special cases
of ARMA models, have received more attention since stationary time series can be
approximated by AR models (An, Chen and Hannan, 1982).

What makes AR models even more interesting is the application of information
theory in the study of stationary time series. Burg (1967) used the idea of
maximizing entropy in spectral estimation. He showed that this approach results in
an AR model. Burg's maximum entropy principle was better justified by Parzen
(1983) and is widely used today in spectral estimation. Akaike (1974) applied
information theory to develop the well-known Akaike's Information Criterion
(AIC) for order estimation in AR models. AIC tends to overestimate orders and
yields inconsistent estimators (Shibata, 1976). To get consistent estimators for the
order in an AR model, Akaike (1977) later modified AIC to the Bayesian Information
Criterion (BIC). An, Chen and Hannan (1982) proved that BIC gives consistent
order estimation.

Although stationary time series describe many phenomena well, there are situations
in which data exhibit non-stationarity. Efforts to study non-stationary processes
appear constantly in the literature.

One natural generalization of stationary processes is Loève's harmonizable
process. Spectral domain problems for harmonizable processes are studied by Cramér
(1961). Cramér (1964) also showed that time domain problems are difficult for
harmonizable processes. So far, no time domain model has been discussed for general
harmonizable processes.

The effort is thus concentrated on non-stationary processes for which both time
domain and spectral domain approaches can be applied. The major example in
this direction is periodically correlated (PC) processes, which are harmonizable
as shown by Gladyshev (1961). After the early work of Gudzenko (1959) and
Gladyshev (1961), several authors have studied the Kolmogorov-Wiener problem
(Miamee and Salehi, 1980; Hurd and Mandrekar, 1991).

Motivated by applications (e.g., Bloomfield et al., 1994; Gardner, 1986), Hurd
(1989) and Hurd and Gerr (1991) studied some inference problems for the spectral
measure of PC processes. Time domain models have also been studied for PC processes.
Pagano (1978) introduced periodic autoregression (PAR). Anderson and Vecchia
(1993) studied the periodic ARMA model and gave the asymptotic properties of sample
covariances. Adams and Goodwin (1995) studied on-line parameter estimation for
periodic ARMA models.

We study PC time series systematically by first showing that the analogue of
Burg's algorithm holds for PC time series. For this purpose, we introduce a slightly
different definition of PAR. From this definition, we can easily show that a PAR
satisfies the maximum entropy principle. Our definition reveals an intrinsic property of
PAR models and overcomes a technical difficulty encountered in Pagano's definition.
These are all discussed in Chapter 2.

We then consider parameter estimation for PAR models in Chapter 3. Here,
we study estimation of the coefficients in PAR models. We give convergence rates for
these estimates by first studying the convergence rate for sample covariances. Then
we apply these rates to give consistent estimators for the order of PAR models.
Our work generalizes Akaike's BIC order estimation to the PAR model. Numerical
results are shown at the end of Chapter 3.

Throughout this thesis, we assume all sequences of random variables have zero
means. Let $\|A\|$ denote the sup-norm of a real matrix $A$ and $\mathbb{Z}$ denote the set of
integers.

Chapter 1

Periodic Autoregression

A sequence of real-valued random variables $x = \{x(t), t \in \mathbb{Z}\}$ is called periodically
correlated (PC) if for some integer $T$ and any $t, s$,

\[ E x(t) = E x(t+T), \]
\[ \mathrm{Cov}(x(t), x(s)) = \mathrm{Cov}(x(t+T), x(s+T)). \]

We always assume that $E x(t) = 0$ for all $t$. Then the definition is equivalent to
the existence of a unitary operator $U$ such that, for some $T$,

\[ x(t+T) = U x(t), \quad \forall t. \]

One chooses the minimum $T$ as the period of the PC sequence. The best linear
predictor of $x(t)$, given $x(s)$, $s < t$, is the projection of $x(t)$ onto the closed space
spanned by $x(s)$, $s < t$ (Kolmogorov, 1941). Hereafter, we denote this projection
by

\[ \mathrm{Proj}(x(t) \mid x(s), s < t). \]

From the definition of PC sequences, we see that if

\[ \mathrm{Proj}(x(t) \mid x(s), s < t) = \sum_{j=1}^{\infty} c_j(t)\, x(t-j), \]

then

\[ \mathrm{Proj}(x(t+T) \mid x(s), s < t+T) = \sum_{j=1}^{\infty} c_j(t)\, x(t+T-j). \]

In other words, the coefficients $c_j(t)$ are periodic in $t$ for every $j \ge 1$, whereas for
general non-stationary sequences, nothing can be said about $c_j(t)$.

From the point of view of prediction, the simplest class of PC sequences consists of those
for which prediction depends only on a finite history. This leads to the following
definition.

Definition 1.0.1 A PC sequence is called a periodic autoregression (PAR for
abbreviation) if there exists an integer $p > 0$ such that for any $t$

\[ \hat{x}(t) = \mathrm{Proj}(x(t) \mid x(t-1), x(t-2), \ldots, x(t-p)), \]

where $\hat{x}(t) = \mathrm{Proj}(x(t) \mid x(s), s < t)$ is the best linear predictor.

We do not consider the case in which $\hat{x}(t)$ is zero, which is less significant in view of
prediction. Then for each $t$, there is a smallest positive integer $p(t)$ satisfying

\[ \hat{x}(t) = \mathrm{Proj}(x(t) \mid x(t-1), x(t-2), \ldots, x(t-p(t))). \]

We will call $p(t)$ the order of $x(t)$. We will show in Proposition 1.0.1 that $p(t)$ is
periodic, so we can simply say that $x(t)$ has order $p(1), p(2), \ldots, p(T)$. We can
write

\[ \hat{x}(t) = \sum_{j=1}^{p(t)} a(j,t)\, x(t-j) \]

or equivalently

\[ x(t) - \sum_{j=1}^{p(t)} a(j,t)\, x(t-j) = \delta(t) \tag{1.1} \]

where $\delta(t) = x(t) - \hat{x}(t)$. Thus $\delta(t)$ is orthogonal to $x(s)$ for all $s < t$.
Let

\[ \sigma^2(t) = E|x(t) - \hat{x}(t)|^2. \]

If $\sigma^2(t) > 0$ for every $t$, we say that $x(t)$ is non-deterministic. In this case, we can
write (1.1) as

\[ x(t) - \sum_{j=1}^{p(t)} a(j,t)\, x(t-j) = \sigma(t)\epsilon(t), \tag{1.2} \]

where $\{\epsilon(t)\}$ is the innovation sequence, i.e.,

\[ \epsilon(t) = \frac{x(t) - \hat{x}(t)}{\sigma(t)}. \]
The next proposition says all parameters are periodic in t.

Proposition 1.0.1 If $x(t)$ is a non-deterministic PAR, then the functions $p(\cdot)$, $a(\cdot,\cdot)$
and $\sigma^2(t)$ are unique and for any $t$

\[ p(t) = p(t+T), \]
\[ \sigma^2(t) = \sigma^2(t+T), \]
\[ a(j,t) = a(j,t+T). \]
Proof. $p(\cdot)$ and $\sigma^2(\cdot)$ are obviously unique by definition. Since $x$ is non-deterministic,
it follows that $x(t-1), \ldots, x(t-p(t))$ are linearly independent and thus $a(\cdot,\cdot)$ is
unique. Notice that $x(t)$ is periodically correlated if and only if there exists a
unitary operator $U$ such that

\[ U x(t) = x(t+T). \]

It is easy to verify that

\[ U\,\mathrm{Proj}(y \mid S) = \mathrm{Proj}(Uy \mid US) \]

for any $y$ and (closed) subspace $S$, where $US = \{Ux : x \in S\}$.
It follows that

\[ U\,\mathrm{Proj}(x(t) \mid x(s), s < t) = \mathrm{Proj}(x(t+T) \mid x(s+T), s < t), \]
\[ U\,\mathrm{Proj}(x(t) \mid x(s), t-p \le s < t) = \mathrm{Proj}(x(t+T) \mid x(s+T), t-p \le s < t). \]

We conclude from the above two equations that

\[ p(t) = p(t+T), \quad \sigma^2(t) = \sigma^2(t+T). \]

Since

\[ \hat{x}(t) = \sum_{j=1}^{p(t)} a(j,t)\, x(t-j), \]

then

\[ U\hat{x}(t) = \sum_{j=1}^{p(t)} a(j,t)\, x(t-j+T). \]

On the other hand,

\[ U\hat{x}(t) = \hat{x}(t+T) = \sum_{j=1}^{p(t+T)} a(j,t+T)\, x(t+T-j). \]

From the facts that $p(t) = p(t+T)$ and that $x(s)$, for $t+T-p(t) \le s \le t+T-1$, are
linearly independent, we obtain

\[ a(j,t+T) = a(j,t), \quad \forall j, t. \]

We see that for non-deterministic PC sequences, our definition is equivalent to
(1.2) with periodic parameters, which was used to define periodic autoregression
by Pagano (1978). The definition here overcomes a technical difficulty (see the
remark at the end of this section). We are now ready to give a characterization of the PAR
model. Let

\[ \tilde{x}(t) = (x(1+tT), x(2+tT), \ldots, x(T+tT))', \quad \forall t, \]

\[ R(t,s) = E x(t) x(s). \]

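Theorem 1.0.1 Let $x$ be a PC sequence. Then $x$ being a non-deterministic PAR is
equivalent to each of the following.

(i) $x(t)$ satisfies the following time domain model

\[ x(t) - \sum_{j=1}^{p(t)} a(j,t)\, x(t-j) = \sigma(t)\epsilon(t), \tag{1.3} \]

where $\epsilon(t)$ is the innovation process and $\sigma(t) > 0$.

(ii) The Yule-Walker equations hold, i.e.,

\[ R(t-k,t) - \sum_{j=1}^{p(t)} a(j,t)\, R(t-k,t-j) = \sigma^2(t)\,\delta_{k,0} \tag{1.4} \]

for any $k \ge 0$, $\forall t$.

(iii) $\tilde{x}(t)$ is a $T$-dimensional AR($p$) model for

\[ p = \max_{1 \le t \le T}\left[\frac{p(t)-t}{T}\right] + 1 \]

(where $[a]$ denotes the integer part of $a$). Namely, there exist matrices $A_j$ and $G$
and a $T$-dimensional time series $\tilde{\epsilon}(t)$ such that

\[ \tilde{x}(t) - \sum_{j=1}^{p} A_j \tilde{x}(t-j) = \tilde{\epsilon}(t) \quad \text{and} \tag{1.5} \]
\[ \mathrm{Cov}(\tilde{\epsilon}(t), \tilde{\epsilon}(s)) = G\,\delta_{t,s}, \quad E\,\tilde{\epsilon}(t)\tilde{x}(s)' = 0, \ \forall s < t. \tag{1.6} \]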

Proof. It is obvious that (i) and (ii) are equivalent and are necessary and sufficient
conditions for $x$ to be a PAR by Definition 1.0.1.
Let us prove (i) $\Rightarrow$ (iii). Suppose $x$ is a PAR. Then $x$ has the time domain
expression given by (1.3), which can be written in vector form

\[ B_0 \tilde{x}(t) - \sum_{j=1}^{p} B_j \tilde{x}(t-j) = D\,\tilde{e}(t) \tag{1.7} \]

where $p$ is defined in the theorem, $D = \mathrm{diag}(\sigma(1), \ldots, \sigma(T))$ and $B_j = (b_j(k,l))_{k,l=1}^{T}$
for

\[ b_0(k,l) = \delta_{k,l} - a(k-l,k)\,\chi_{\{k>l\}}, \]
\[ b_j(k,l) = a(Tj+k-l,k)\,\chi_{\{Tj+k-l \le p(k)\}}, \quad 1 \le j \le p. \]

Here $\tilde{e}(t) = (\epsilon(1+tT), \epsilon(2+tT), \ldots, \epsilon(T+tT))'$. Then it is obvious that $\tilde{e}(t) \perp \tilde{x}(s)$
for any $s < t$. Let

\[ A_j = B_0^{-1} B_j, \quad \tilde{\epsilon}(t) = B_0^{-1} D\,\tilde{e}(t). \]

Then $\tilde{\epsilon}(t)$ satisfies (1.6) with $G = B_0^{-1} D^2 (B_0^{-1})'$ and (1.5) follows from (1.7).

(iii) $\Rightarrow$ (i): Since $G$ is positive definite, it has the following Cholesky decomposition

\[ G = L H L' \]

where $L$ is lower triangular with all diagonal entries equal to 1 and $H$ is diagonal and
non-singular. Multiplying (1.5) by $L^{-1}$, we get

\[ L^{-1}\tilde{x}(t) - \sum_{j=1}^{p} L^{-1} A_j \tilde{x}(t-j) = L^{-1}\tilde{\epsilon}(t). \]

Since $L^{-1}$ is also lower triangular and $\mathrm{Var}(L^{-1}\tilde{\epsilon}) = H$ is a diagonal matrix, the
scalar form of the above equation yields (i).

QED.
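The passage from the scalar model (1.3) to the vector model (1.5) can be checked numerically in the smallest case. The sketch below assumes $T = 2$ and $p(1) = p(2) = 1$, a special case chosen here for illustration, and forms $A_1 = B_0^{-1}B_1$ and $G = B_0^{-1}D^2(B_0^{-1})'$ as in the proof of Theorem 1.0.1. The function names and the numerical values are hypothetical.

```python
def matmul(a, b):
    """Product of two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def par1_to_var1(a1, a2, s1, s2):
    """Vector AR(1) form of a PAR with T = 2 and p(1) = p(2) = 1.

    a1 = a(1,1), a2 = a(1,2), s1 = sigma(1), s2 = sigma(2).
    Here B0 = [[1, 0], [-a2, 1]] and B1 = [[0, a1], [0, 0]].
    """
    b0_inv = [[1.0, 0.0], [a2, 1.0]]          # inverse of the unit lower triangular B0
    b1 = [[0.0, a1], [0.0, 0.0]]
    d2 = [[s1 ** 2, 0.0], [0.0, s2 ** 2]]     # D^2 with D = diag(sigma(1), sigma(2))
    A1 = matmul(b0_inv, b1)
    G = matmul(matmul(b0_inv, d2), transpose(b0_inv))
    return A1, G

A1, G = par1_to_var1(0.5, 0.4, 1.0, 2.0)
```

In this special case $A_1$ has the form $\begin{pmatrix} 0 & a_1 \\ 0 & a_1 a_2 \end{pmatrix}$, so the second component of the vector series is driven by the product of the two seasonal coefficients.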

Corollary 1.0.1 A nondeterministic PAR is purely nondeterministic.

Proof. Let $x$ be a nondeterministic PAR. Then the corresponding multiple
sequence $\tilde{x}$ is a stationary AR. It is well known that a multiple AR is purely
nondeterministic. In [19] and [22], it is proved that $x$ is purely nondeterministic if and only if
$\tilde{x}$ is so. Thus the proof is completed.

QED.

Remark. A stationary AR($p$) model is defined in the literature (Hannan, 1973) as a
second order stationary sequence satisfying

\[ x(t) - \sum_{j=1}^{p} a_j x(t-j) = \sigma\epsilon(t), \]

for some constants $\sigma > 0$, $a_j$ such that

\[ \Big|1 - \sum_{j=1}^{p} a_j z^j\Big| \ne 0, \quad \text{for } |z| \le 1, \]

and a white noise $\{\epsilon(t)\}$. The constraint on the coefficients is a necessary and
sufficient condition for the existence of a stationary solution (see,
e.g., Hannan, 1970). Analogously, we need to know constraints on the coefficients
$a(j,t)$ that guarantee that a PC solution of (1.3) exists. We can give the constraint in
two equivalent ways. We note that (1.3) with periodic parameters has a
non-deterministic PC solution if and only if $\{p(t), \sigma^2(t), a(j,t), j = 1, \ldots, p(t), t =
1, 2, \ldots, T\}$ uniquely determines $R(t,s)$ for $|t-s| \le p(t)$ such that

\[ R(t,s) = R(s,t) = R(t+T, s+T) \]

and for any $t = 1, 2, \ldots, T$, the matrix

\[ \Gamma_t = (R(t-j, t-k))_{j,k=0}^{p(t)} \tag{1.8} \]

is positive definite. The necessity is obvious and sufficiency follows from Theorem
2.2.1 in the next chapter.
We also see from Theorem 1.0.1 that (1.3) has a PC solution if and only if

\[ \det\Big(I - \sum_{j=1}^{p} A_j z^j\Big) \ne 0, \quad \forall\, |z| \le 1. \tag{1.9} \]

Indeed, (1.9) implies there is a stationary solution of (1.5) (Hannan, 1970, page
326). The corresponding scalar sequence must be a PAR satisfying (1.3) by
Theorem 1.0.1. It must be non-deterministic since $\sigma(t)$ is positive. We use the technique
of Whittle (1963) to show the other direction. Suppose that (1.3) has a PC solution.
Then the corresponding vector-valued stationary sequence satisfies (1.5) and (1.6).
Define the $pT$-dimensional random vector

\[ Y_t = \begin{pmatrix} \tilde{x}(t) \\ \tilde{x}(t-1) \\ \vdots \\ \tilde{x}(t-p+1) \end{pmatrix}. \]

Project $Y_t$ onto $Y_{t-1}$, so that

\[ Y_t = P Y_{t-1} + Z_t, \quad Z_t \perp Y_{t-1}, \]

where $P$ is the projection matrix. We see from (1.5) that

\[ P = \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{pmatrix}, \quad Z_t = (\tilde{\epsilon}(t)', 0, \ldots, 0)'. \]

Let $\lambda$ be an eigenvalue of $P$ and $\xi$ be the corresponding left eigenvector. Then

\[ \xi P = \lambda \xi. \tag{1.10} \]

Observe that $\mathrm{Cov}(Y_t) = \mathrm{Cov}(Y_{t-1})$ is positive definite; then

\[ E|\xi Z_t|^2 = E|\xi Y_t|^2 - E|\xi P Y_{t-1}|^2 = \xi\,\mathrm{Var}(Y_t)\,\xi'\,(1 - |\lambda|^2) \ge 0. \]

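It follows that $|\lambda| \le 1$ with equality if and only if $\xi Z_t = 0$. Because of the special
form of $P$, (1.10) implies that $\xi$ and $\xi_0$ must be 0 together, where $\xi_0$ consists of the first $T$
entries of $\xi$. Since $\mathrm{Var}(\tilde{\epsilon}(t))$ is non-singular, for non-zero $\xi$,

\[ \xi Z_t = \xi_0 \tilde{\epsilon}(t) \ne 0. \]

So all the eigenvalues of $P$ have moduli less than 1. Now suppose $z$ satisfies

\[ \det\Big(I - \sum_{j=1}^{p} A_j z^{-j}\Big) = 0. \]

It suffices to prove that $z$ is an eigenvalue of $P$. There exists a row vector $\xi_1 \in \mathbb{R}^T$
such that

\[ \xi_1\Big(I - \sum_{j=1}^{p} A_j z^{-j}\Big) = 0. \tag{1.11} \]

Let

\[ \xi_j = z\,\xi_{j-1} - \xi_1 A_{j-1}, \quad j = 2, \ldots, p. \tag{1.12} \]

Set

\[ \xi = (\xi_1, \ldots, \xi_p). \]

Notice that (1.11) and (1.12) imply immediately

\[ \xi P = z\,\xi. \]

Hence $|z| < 1$, so every root $z^{-1}$ of the determinant in (1.9) lies outside the closed unit disk.
(1.9) now follows.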
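As a small worked illustration of condition (1.9), take $T = 2$ and $p(1) = p(2) = 1$. This special case is chosen here for illustration and is not treated in the text. Write $a_1 = a(1,1)$ and $a_2 = a(1,2)$, and use the matrices $B_0$, $B_j$ from the proof of Theorem 1.0.1, so that $p = 1$:

```latex
\begin{align*}
B_0 &= \begin{pmatrix} 1 & 0 \\ -a_2 & 1 \end{pmatrix}, \qquad
B_1 = \begin{pmatrix} 0 & a_1 \\ 0 & 0 \end{pmatrix}, \qquad
A_1 = B_0^{-1} B_1 = \begin{pmatrix} 0 & a_1 \\ 0 & a_1 a_2 \end{pmatrix}, \\
\det(I - A_1 z) &= \det \begin{pmatrix} 1 & -a_1 z \\ 0 & 1 - a_1 a_2 z \end{pmatrix}
= 1 - a_1 a_2 z .
\end{align*}
```

Under these assumptions (1.9) holds exactly when $|a_1 a_2| < 1$: the only possible root is $z = 1/(a_1 a_2)$, which lies outside the closed unit disk if and only if the product of the two seasonal coefficients is less than 1 in absolute value. Neither coefficient needs to be smaller than 1 individually.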


Chapter 2

Maximum Entropy Modeling of
PC Time Series

2.1 Introduction

The entropy of a random vector in $\mathbb{R}^n$ with probability density function $f(x)$ is
defined as

\[ H(X) = -E \ln f(X) = -\int_{\mathbb{R}^n} f(x) \ln f(x)\, dx. \]

Burg (1967) developed a maximum entropy approach for spectral estimation of
stationary time series which has been widely used since then. Burg's approach can
be stated in the following way. Suppose $p+1$ autocovariances $R(0), R(1), \ldots, R(p)$
of a stationary sequence are known (usually estimated from observations). Instead
of taking $R(n)$ to be 0 for all $n$ greater than $p$, as in windowed spectral estimation,
we extrapolate $R(n)$ for $n > p$ in such a way that maximizes the entropy

\[ H(x(t), x(t-1), \ldots, x(t-n)), \]

for all $n > p$.


It turns out that the only such extrapolation is given by Yule-Walker equations,
thus this maximum entropy method results in an AR model.
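The claim can be checked by hand in the smallest case $p = 1$, $n = 2$. For a Gaussian vector with covariance matrix $\Sigma$ the entropy is $\frac{1}{2}\ln\big((2\pi e)^n \det\Sigma\big)$, so maximizing entropy over the unknown $R(2)$ amounts to maximizing the determinant of the $3 \times 3$ Toeplitz matrix built from $R(0), R(1), R(2)$. The sketch below does this by a plain grid search; the function names, the grid resolution and the numerical values are assumptions made for illustration only.

```python
import math

def toeplitz3_det(r0, r1, r2):
    """Determinant of [[r0, r1, r2], [r1, r0, r1], [r2, r1, r0]]."""
    return r0 ** 3 + 2.0 * r1 * r1 * r2 - r0 * r2 * r2 - 2.0 * r0 * r1 * r1

def gaussian_entropy3(r0, r1, r2):
    """Entropy of a 3-dimensional Gaussian vector with that Toeplitz covariance."""
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 3 * toeplitz3_det(r0, r1, r2))

def max_entropy_r2(r0, r1, steps=3000):
    """Grid search for the R(2) that maximizes the entropy, given R(0) and R(1)."""
    # The matrix is positive definite exactly for r2 in (2 r1^2 / r0 - r0, r0).
    lo, hi = 2.0 * r1 * r1 / r0 - r0, r0
    grid = [lo + (hi - lo) * k / steps for k in range(1, steps)]
    return max(grid, key=lambda r2: gaussian_entropy3(r0, r1, r2))

best = max_entropy_r2(1.0, 0.5)
yule_walker_value = 0.5 ** 2 / 1.0   # R(2) = a R(1) with a = R(1) / R(0)
```

The maximizer found by the search agrees, up to the grid spacing, with the Yule-Walker extrapolation $R(2) = R(1)^2 / R(0)$ of an AR(1) model.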
We consider here the same question for PC sequences. Suppose for each $t =
1, 2, \ldots, T$, we know the covariance matrix of

\[ (x(t), x(t-1), \ldots, x(t-p_t)) \]

for some integers $p_t > 0$. Because the time series is PC, we do not require that the
$p_t$'s are the same. We will extrapolate the covariance function in such a way that
maximizes the entropy

\[ H(x(t), x(t-1), \ldots, x(t-s)) \]

for all s < t. Problems we will consider are
(1) whether there is a PC solution to this maximizing problem and
(2) the properties of such PC sequences which maximize the entropies.
We will prove in the next section that there is a unique Gaussian PAR sequence
which maximizes the entropies.

2.2 Maximum Entropy Modeling of PC Time
Series

To avoid the ambiguity of saying that part of the covariances of a sequence is known without
knowing that such a sequence exists, we state the problem in a more mathematical way.

Let $p_1, p_2, \ldots, p_T$ be positive integers and $r(\cdot,\cdot)$ be defined on the set

\[ A = \bigcup_{t=1}^{T} \{(u,v) \in \mathbb{Z} \times \mathbb{Z} : t - p_t \le u, v \le t\}. \]

We assume that

\[ \Gamma_t = (r(t-j, t-k))_{j,k=0}^{p_t} \]

is positive definite for all $t = 1, 2, \ldots, T$ and

\[ r(s,t) = r(s+T, t+T) = r(t,s) \tag{2.1} \]

whenever $(s,t)$, $(t,s)$ and $(s+T, t+T)$ are in $A$.
These assumptions are seen to be necessary for $r$ to be a covariance function of a PC
time series. Let $\mathcal{K}$ be the set of all PC time series with period $T$ whose covariances
are $r(t,s)$ for $(t,s) \in A$. The next theorem says $\mathcal{K}$ is not empty.
Theorem 2.2.1 There is a non-deterministic Gaussian PAR in $\mathcal{K}$.

Proof. Since $\Gamma_t$ is positive definite, the equations

\[ r(t-k, t) - \sum_{j=1}^{p_t} a(j,t)\, r(t-k, t-j) = \delta_{k,0}\,\sigma^2(t), \quad \text{for } k = 0, 1, \ldots, p_t, \tag{2.2} \]

have a unique solution $a(1,t), a(2,t), \ldots, a(p_t,t), \sigma^2(t)$, and $\sigma^2(t) > 0$. These
Yule-Walker equations actually provide a way to extend $r(t,s)$ to be a covariance
function of a PC time series. But we will adopt a statistical approach here.
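For a single season $t$, the system (2.2) is an ordinary linear system: the equations with $k = 1, \ldots, p_t$ determine the coefficients and the equation with $k = 0$ then gives $\sigma^2(t)$. A minimal sketch follows, assuming the covariance is supplied as a Python callable `r(u, v)`. The function name, the callable interface and the test covariance are hypothetical choices made for illustration.

```python
def solve_yule_walker(r, t, p):
    """Solve (2.2) at season t for a(1,t), ..., a(p,t) and sigma^2(t).

    r(u, v) returns the covariance at times u and v; p plays the role of p_t.
    """
    # Augmented system for k = 1..p: sum_j a(j,t) r(t-k, t-j) = r(t-k, t).
    m = [[r(t - k, t - j) for j in range(1, p + 1)] + [r(t - k, t)]
         for k in range(1, p + 1)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(p):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [mi - f * mc for mi, mc in zip(m[i], m[c])]
    a = [m[i][p] / m[i][i] for i in range(p)]
    # Equation k = 0 of (2.2) gives the innovation variance.
    sigma2 = r(t, t) - sum(a[j - 1] * r(t, t - j) for j in range(1, p + 1))
    return a, sigma2

# Stationary AR(1) covariance with coefficient 0.5, used only as a check.
cov = lambda u, v: 0.5 ** abs(u - v) / (1.0 - 0.25)
coeffs, s2 = solve_yule_walker(cov, t=3, p=2)
```

In this check the covariance comes from a stationary AR(1) with unit innovation variance, so fitting order 2 should return a second coefficient equal to zero, which is the situation described after the proof where the true order is smaller than $p_t$.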

Let $i \in \{1, 2, \ldots, T\}$ be such that

\[ i - p_i \le t - p_t \quad \text{for all } t = 1, 2, \ldots, T. \]

Then there are Gaussian random variables $x(t)$ of mean 0, for $i - p_i \le t \le i - 1$,
such that

\[ E x(t) x(s) = r(t,s), \quad \text{for } i - p_i \le t, s \le i - 1. \]

Let $\epsilon(t)$, $t \ge i$, be a sequence of i.i.d. standard normal random variables, also
independent of $\{x(t), i - p_i \le t \le i - 1\}$. Define, for $t \ge i$,

\[ x(t) = \sum_{j=1}^{p(t)} a(j,t)\, x(t-j) + \sigma(t)\epsilon(t) \]

where $p(t)$, $a(j,t)$ and $\sigma(t)$ are the periodic versions of $p_t$, $a(j,t)$ and $\sigma(t)$
respectively.
This definition together with (2.2) yields

\[ E x(t) x(s) = r(t,s), \quad \text{for } (t,s) \in A. \]
We now show that $x(t)$ is PC, i.e.,

\[ E x(t+T) x(s+T) = E x(t) x(s), \tag{2.3} \]

for all $t, s \ge i - p_i$. We observe that (2.3) is true for $i - p_i \le t, s \le 0$, because of
(2.1). Assume it is true for $i - p_i \le t, s \le n$. Replacing $x(n+1)$ by its definition,
we have for $t < n+1$,

\[
\begin{aligned}
E x(t) x(n+1) &= \sum_{j=1}^{p(n+1)} a(j,n+1)\, E x(t) x(n+1-j) \\
&= \sum_{j=1}^{p(n+1)} a(j,n+1)\, E x(t+T) x(n+1-j+T) \quad \text{(by the induction assumption)} \\
&= E x(t+T) x(n+1+T).
\end{aligned}
\]

Similarly, we can prove

\[ E x^2(n+1) = E x^2(n+1+T). \]

Thus (2.3) is true for $t, s \le n+1$. So we have proved (2.3).
Let

\[ r(t,s) = E x(t) x(s) \]

for all $t, s \ge i - p_i$. Then $r(t,s)$ still satisfies (2.1). Now we extend $r(t,s)$ to $\mathbb{Z}^2$ by

\[ r(t - mT, s - mT) = r(t,s), \quad \forall m \ge 0. \]

For any $m \le n$, the matrix

\[ \{r(t,s) : m \le t, s \le n\} \]

is positive definite because of the periodicity of $r(t,s)$ and the fact that $\{x(t), t \ge 0\}$
are linearly independent. So there is a Gaussian sequence with $r(t,s)$ as covariance
function by Kolmogorov's Theorem. This sequence must be a PAR by Theorem 1.0.1
and non-deterministic since $\sigma^2(t) > 0$.
QED.

This Gaussian PAR must be unique in distribution. It might have orders $p(t) \le p_t$
for $t = 1, 2, \ldots, T$ because $a(p_t, t)$ might be zero. But the orders are uniquely
determined by the $\Gamma_t$'s.

The next theorem says that it is the one that maximizes entropy.

Theorem 2.2.2 Let $x(t)$ be a Gaussian PAR in $\mathcal{K}$. Then for any $s \le t$,

\[ H(x(t), x(t-1), \ldots, x(s)) = \sup_{y} H(y(t), y(t-1), \ldots, y(s)) \tag{2.4} \]

where the supremum is taken over all sequences $y$ in $\mathcal{K}$ for which the entropies in
(2.4) can be defined.

Conversely, if a PC sequence $y(t)$ in $\mathcal{K}$ satisfies (2.4), then $y(t)$ is a Gaussian
PAR.

Remark. The problem we considered here is more general than assuming that $R_0, R_1, \ldots$,
the covariances of a vector-valued stationary sequence, are known. Actually, the
latter is a special case of our problem here. Thus Theorem 2.2.2 contains the maximum
entropy method for stationary vector-valued sequences as a special case. There is
a practical consideration why we assume $r(t,s)$ is known in the set $A$ instead of
a square area $\{1 - q \le t, s \le T\}$ for some $q > 0$. To approximate a PC sequence
using a PAR, we might choose different orders $p(1), p(2), \ldots, p(T)$. Given finite
observations of a PC sequence, we should just estimate $r(t,s)$ for $(t,s) \in A$ and
extrapolate it through the Yule-Walker equations, since a smaller $|t - s|$ tends to give
a better estimate of $r(t,s)$.

To prove this theorem, we need some basic properties of entropy. The following
two lemmas are known (e.g., for Lemma 2.2.1, Choi, 1983, Parzen, 1983; for Lemma
2.2.2, Gallager, 1965, Kullback, 1978).

Lemma 2.2.1 For any random vector $Y$, let $X$ be a normal random vector having
the same covariance matrix as $Y$. Then

\[ H(Y) \le H(X). \]

The equality holds if and only if $Y$ has a normal distribution.
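A one-dimensional instance of Lemma 2.2.1 can be computed in closed form. The sketch below compares the entropy of a uniform variable with that of a normal variable of the same variance, using the standard formulas $\ln(2h)$ for the uniform law on $[-h, h]$ and $\frac{1}{2}\ln(2\pi e \sigma^2)$ for the normal law. The uniform distribution is an example chosen here, not one used in the thesis, and the function names are hypothetical.

```python
import math

def normal_entropy(var):
    """Entropy of a one-dimensional normal distribution with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def uniform_entropy(var):
    """Entropy of the uniform distribution on [-h, h] with variance var = h^2 / 3."""
    h = math.sqrt(3.0 * var)
    return math.log(2.0 * h)

# The gap does not depend on the variance: 0.5 * ln(2 * pi * e / 12) > 0.
gap = normal_entropy(1.0) - uniform_entropy(1.0)
```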

For two random vectors with joint probability density function $f(x,y)$, the conditional
entropy of $X$ given $Y$ is defined as, provided it exists,

\[ H(X \mid Y) = -\int \ln f(x \mid y)\, f(x,y)\, dx\, dy, \]

where $f(x \mid y)$ is the conditional probability density of $X$ given $Y$. Thus

\[ H(X \mid Y) = H(X, Y) - H(Y). \tag{2.5} \]

We use $H(X \mid Y, Z)$ to denote the conditional entropy of $X$ given $Y$ and $Z$. $H(X \mid Y)$
can be interpreted as the remaining uncertainty of $X$ once $Y$ is observed. The
following lemma is then intuitively clear from this interpretation.

Lemma 2.2.2 For any three random vectors $X, Y, Z$ with joint probability density
function $f(x,y,z)$,

\[ H(X \mid Y, Z) \le H(X \mid Y), \]

with equality if and only if $X$ and $Z$ are independent conditionally on $Y$, i.e.,

\[ f(x \mid y, z) = f(x \mid y), \quad \text{a.e.} \]
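For jointly Gaussian scalars both sides of Lemma 2.2.2 reduce to conditional variances, since $H(X \mid \cdot) = \frac{1}{2}\ln(2\pi e \cdot \text{conditional variance})$, which follows from (2.5) and the Gaussian entropy formula. The sketch below checks the equality case on a Markov triple and the strict inequality on a non-Markov one. The covariance matrices and function names are hypothetical examples chosen for illustration.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def h_x_given_y(c):
    """H(X|Y) for a Gaussian triple with covariance c ordered as (X, Y, Z)."""
    var = c[0][0] - c[0][1] ** 2 / c[1][1]
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def h_x_given_yz(c):
    """H(X|Y,Z): the conditional variance is det(c) / det(cov of (Y, Z))."""
    yz = [[c[1][1], c[1][2]], [c[2][1], c[2][2]]]
    return 0.5 * math.log(2.0 * math.pi * math.e * det3(c) / det2(yz))

# (X, Y, Z) = (x(t), x(t-1), x(t-2)) of a stationary AR(1) with coefficient 0.5:
# X depends on Z only through Y, so the two conditional entropies coincide.
r0, r1, r2 = 4.0 / 3.0, 2.0 / 3.0, 1.0 / 3.0
markov = [[r0, r1, r2], [r1, r0, r1], [r2, r1, r0]]
# Changing Cov(X, Z) breaks the conditional independence.
non_markov = [[r0, r1, 0.6], [r1, r0, r1], [0.6, r1, r0]]
```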

Proof of Theorem 2.2.2. We know from Lemma 2.2.1 that we should look for a
maximizer of entropies among Gaussian sequences. Let $y(t)$ be a Gaussian sequence
in $\mathcal{K}$. Using (2.5) repeatedly, we get for any $s < t$,

\[ H(y(t), y(t-1), \ldots, y(s)) = H(y(s)) + \sum_{u=s+1}^{t} H(y(u) \mid y(u-1), \ldots, y(s)). \]

We see that for those terms for which $u \ge s \ge u - p(u)$, the conditional entropies
are known (since $y$ is Gaussian and the covariance matrix is known).
Thus, to find a maximizer is to maximize, for $s < u - p(u)$,

\[
\begin{aligned}
H(y(u) \mid y(u-1), \ldots, y(s)) &\le H(y(u) \mid y(u-1), \ldots, y(u-p(u))) \quad \text{(by Lemma 2.2.2)} \\
&= H(x(u) \mid x(u-1), \ldots, x(u-p(u))).
\end{aligned} \tag{2.6}
\]

The last equality is true because $(x(u), x(u-1), \ldots, x(u-p(u)))$ and
$(y(u), y(u-1), \ldots, y(u-p(u)))$ have the same Gaussian distribution. We see that this
upper bound is reached by the Gaussian PAR $x(t)$. So $x(t)$ maximizes the entropy.
Conversely, if a Gaussian sequence $y$ in $\mathcal{K}$ maximizes all the entropies, then the
equality in (2.6) must hold. Then, following Lemma 2.2.2,

\[ f(y(u) \mid y(u-1), \ldots, y(s)) = f(y(u) \mid y(u-1), \ldots, y(u-p(u))). \]

In terms of conditional expectation, it means

\[ E(y(u) \mid y(u-1), \ldots, y(s)) = E(y(u) \mid y(u-1), \ldots, y(u-p(u))). \]

Since $y$ has a Gaussian distribution, the conditional expectation is the projection
onto the corresponding space. Then the last equality is exactly

\[ \mathrm{Proj}(y(u) \mid y(u-1), \ldots, y(s)) = \mathrm{Proj}(y(u) \mid y(u-1), \ldots, y(u-p(u))), \]

so $y$ is a PAR.

QED

Finally, we note that MEM picks out the most random or the most unpredictable
time series. It is much clearer to state this in terms of prediction. Let $\sigma^2(t; x)$ be the
mean square prediction error defined by

\[ \sigma^2(t; x) = E|x(t) - \mathrm{Proj}(x(t) \mid x(s), s < t)|^2 \]

for a sequence $\{x(t), t \in \mathbb{Z}\}$ in $\mathcal{K}$.

Theorem 2.2.3 $x \in \mathcal{K}$ is a PAR if and only if

\[ \sigma^2(t; x) \ge \sigma^2(t; y), \quad \forall y \in \mathcal{K}. \]

Proof. Let $x(t)$ be the PAR in $\mathcal{K}$ and $y(t)$ be any sequence in $\mathcal{K}$. We have for any
$t = 1, 2, \ldots, T$,

\[
\begin{aligned}
\sigma^2(t; y) &\le E\Big|y(t) - \sum_{j=1}^{p(t)} a(j,t)\, y(t-j)\Big|^2 \\
&= r(t,t) - \sum_{j=1}^{p(t)} a(j,t)\, r(t, t-j) = \sigma^2(t; x).
\end{aligned} \tag{2.7}
\]

Conversely, if $y(t)$ satisfies

\[ \sigma^2(t; y) = \sup_{\xi \in \mathcal{K}} \sigma^2(t; \xi), \]

the equality must hold in (2.7). It follows that

\[ \sum_{j=1}^{p(t)} a(j,t)\, y(t-j) = \mathrm{Proj}(y(t) \mid y(s), s < t). \]

So $y(t)$ is a PAR.

QED.

Chapter 3

Parameter Estimation in PAR

Model

In this Chapter, we consider parameter estimation in PAR. We assume, throughout
this Chapter, that $x(1), x(2), \ldots, x(N)$ are observations from a non-deterministic
PAR $\{x(t) : t = 0, \pm 1, \pm 2, \ldots\}$.

Define the sample covariance by

\[ R_N(t,s) = [N T^{-1}]^{-1} \sum_{j \in D(t,s)} x(t+jT)\, x(s+jT) \tag{3.1} \]

for $t = 1, 2, \ldots, T$, $s = 0, 1, \ldots, N-t-1$, where $D(t,s) = \{j : \max(t,s) + jT \le N\}$.
$R_N(\cdot,\cdot)$ can be extended by

\[ R_N(t,s) = R_N(s,t), \quad R_N(t+kT, s+kT) = R_N(t,s). \]

Then $R_N(t,s)$ serves as an estimator of $R(t,s)$. In Theorem 3.2.1, we give the
uniform convergence rate of these estimators. The solution of the Yule-Walker equations
with $R$ replaced by $R_N$ estimates the regression coefficients. The uniform convergence rate
of these estimates is given in Theorem 3.3.1. In the last part of this Chapter, we generalize
Akaike's BIC for the stationary AR model to the PAR model to get a consistent estimator
of the orders $p(1), p(2), \ldots, p(T)$.
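The estimation scheme just described can be sketched for the simplest case in which every season has order 1. The code below computes sample covariances in the spirit of (3.1) and plugs them into the Yule-Walker equations. Several points are assumptions of this sketch rather than statements from the thesis: the restriction to order 1, the lower bound $\min(t,s) + jT \ge 1$ on the summation index (so that only observed values are used), the Gaussian test data, and all function names and parameter values.

```python
import random

def simulate_par1(a, n, seed=0, burn_periods=100):
    """x(k) = a[(k-1) % T] * x(k-1) + eps(k), k = 1..n, with unit innovation variance."""
    T = len(a)
    rng = random.Random(seed)
    prev, out = 0.0, []
    for k in range(1 - burn_periods * T, n + 1):
        prev = a[(k - 1) % T] * prev + rng.gauss(0.0, 1.0)
        if k >= 1:
            out.append(prev)
    return out

def sample_cov(x, T, t, s):
    """Sample covariance R_N(t, s); x[k-1] holds the observation x(k), k = 1..N."""
    N = len(x)
    lo, hi = min(t, s), max(t, s)
    j = -((lo - 1) // T)              # smallest j with lo + j*T >= 1 (assumed bound)
    total = 0.0
    while hi + j * T <= N:            # upper bound as in D(t, s)
        total += x[t + j * T - 1] * x[s + j * T - 1]
        j += 1
    return total / (N // T)           # [N / T] is the integer part

def yule_walker_par1(x, T):
    """Order-1 Yule-Walker estimates (a(1,t), sigma^2(t)) for t = 1..T."""
    est = []
    for t in range(1, T + 1):
        a_hat = sample_cov(x, T, t - 1, t) / sample_cov(x, T, t - 1, t - 1)
        s2_hat = sample_cov(x, T, t, t) - a_hat * sample_cov(x, T, t, t - 1)
        est.append((a_hat, s2_hat))
    return est

data = simulate_par1([0.9, -0.5], n=20000, seed=3)
estimates = yule_walker_par1(data, T=2)
```

With a long simulated series the estimated coefficients should lie close to the hypothetical true values 0.9 and -0.5, which illustrates the consistency claim without saying anything about the rate.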

The following assumption is made throughout this Chapter. Since we will
consider properties of sample covariances, we assume the moments of order more
than 4 exist and are bounded. For two sequences of real numbers $a_n$ and $b_n$,
$a_n \approx b_n$ means $a_n = O(b_n)$ and $b_n = O(a_n)$.
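Assumption 3.0.1 $\{\epsilon(t) : t \in \mathbb{Z}\}$ is the innovation process of $\{x(t) : t \in \mathbb{Z}\}$ and

\[ E(\epsilon(t) \mid \mathcal{F}_{t-1}) = 0, \]
\[ E(\epsilon^2(t) \mid \mathcal{F}_{t-1}) = 1, \quad \text{a.s.}, \]
\[ \sum_{t=1}^{n} V(\epsilon^2(tT) \mid \mathcal{F}_{tT-T}) \approx n, \quad \text{a.s.}, \]

where $\mathcal{F}_t = \sigma\{\epsilon(s) : s \le t\}$. Suppose also that for some $\delta > 0$,

\[ \sup_{t \in \mathbb{Z}} E|\epsilon(t)|^{4+\delta} < \infty. \]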

3.1 Preliminary Results on Martingale Differ-
ences

In this section, we give some results on sample covariances of a martingale
difference. We need these to derive results for the PAR model.

We will give a law of the iterated logarithm for martingale differences and apply
it to give the convergence rate of covariances of martingale differences. For stationary
and ergodic martingale differences, a law of the iterated logarithm has been given
in the literature (see, for example, [30]). But we assume higher moments instead of
stationarity and ergodicity, since we believe that the assumption of bounded higher
moments is less restrictive in time series than the assumption of stationarity
and ergodicity.

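Theorem 3.1.1 Let $\{Y_n, \mathcal{F}_n, n \ge 1\}$ be a martingale difference such that for some
$\delta > 0$, $M > 0$ and for any $n$,

\[ E|Y_n|^{2+\delta} \le M. \]

Let $s_n^2 = \sum_{i=1}^{n} E\{Y_i^2 \mid \mathcal{F}_{i-1}\}$. Suppose also that

\[ \liminf_{n \to \infty} \frac{s_n^2}{n} > 0, \quad \text{a.s.} \]

Then almost surely

\[ \limsup_{n \to \infty} \frac{\sum_{i=1}^{n} Y_i}{\sqrt{2 s_n^2 \ln\ln s_n^2}} = 1. \]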

Proof. Stout [29] proved that for a martingale difference $Y_n$, the following law of
the iterated logarithm holds:

\[ \limsup_{n \to \infty} \frac{\sum_{i=1}^{n} Y_i}{\sqrt{2 s_n^2 \ln\ln s_n^2}} = 1 \]

if $s_n^2 = \sum_{i=1}^{n} E(Y_i^2 \mid \mathcal{F}_{i-1}) \to \infty$ and there exists a sequence $K_n$ which is $\mathcal{F}_{n-1}$
measurable and goes to zero such that
Emma: > v3.» < oo, (32)
i=1
where
2 K7152:
v = .
" lnlnsf,
In our theorem, it is obvious that sf, —+ 00. Take
1
Kn = .
lnlns,2,


We only need to check (3.2). Since

E(Y.3X{Y.3 > v33)

3 (EIYnI2+P)v;P s M - 3,7»,
we see that the sum in (3.2) is bounded by
M i s;2_p
n=l
which is finite almost surely because $\liminf s_n^2 / n > 0$. So the theorem follows.
QED.

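We first state a lemma which is needed to prove our next theorem.

Lemma 3.1.1 Let $\{Y_n, \mathcal{F}_n, n \ge 1\}$ be a supermartingale difference with $E Y_1 = 0$
and for some $K \in (0, 1/2]$,

\[ Y_n \le \frac{K s_n}{\sqrt{2 \ln\ln s_n^2}}, \quad \text{a.s.}, \]

where $s_n^2 = \sum_{i=1}^{n} E\{Y_i^2 \mid \mathcal{F}_{i-1}\}$. Suppose for some constant $b \ge 9$, almost surely,

\[ s_n^2 \le b^2, \quad \forall n. \]

Then for any $0 < \delta < 2$,

\[ P\Big(\sup_{n \ge 1} \sum_{i=1}^{n} Y_i > \delta\{2 b^2 \ln\ln b^2\}^{1/2}\Big) \le \exp(-\beta \ln\ln b^2) \]

where $\beta = \delta^2(1 - \delta K / 2)$.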

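Proof. Let

\[ c = K b / \sqrt{2 \ln\ln b^2}, \quad \lambda = \delta b^{-1} \sqrt{2 \ln\ln b^2}. \]

Then

\[ \lambda c = \delta K \le 1. \]

Since $x (x \ln\ln x)^{-1/2}$ is increasing for $x \ge 9$, we have

\[ Y_n \le c, \quad \forall n. \]

It follows from [31] (Lemma 5.4.1 on page 299) that

\[ T_n = \exp\Big(\lambda \sum_{i=1}^{n} Y_i\Big) \exp\Big(-\frac{\lambda^2}{2}\Big(1 + \frac{\lambda c}{2}\Big) s_n^2\Big) \]

is a supermartingale with respect to $\mathcal{F}_n$ and $E T_1 \le 1$. Thus for any $a > 0$,

\[ P\Big(\sup_n T_n > a\Big) \le 1/a. \]

Then

\[
\begin{aligned}
P\Big(\sup_n \sum_{i=1}^{n} Y_i > \delta (2 b^2 \ln\ln b^2)^{1/2}\Big)
&= P\Big(\sup_n \sum_{i=1}^{n} Y_i > \lambda b^2\Big) \\
&\le P\Big(\sup_n T_n > \exp\Big(\lambda^2 b^2 - \frac{\lambda^2}{2}\Big(1 + \frac{\lambda c}{2}\Big) b^2\Big)\Big) \\
&\le \exp\big(-\lambda^2 b^2 (1/2 - \lambda c / 4)\big) = \exp(-\beta \ln\ln b^2).
\end{aligned}
\]

QED.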

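Theorem 3.1.2 If $\{\epsilon(t), \mathcal{F}_t, t \ge 1\}$ satisfies Assumption 3.0.1, then for any
positive real number $d$ and integer $T$,

\[ \limsup_{n \to \infty} \frac{\big|\sum_{s=1}^{n} \epsilon^2(sT) - n\big|}{\sqrt{2 n \ln\ln n}} < \infty, \tag{3.3} \]

\[ \limsup_{n \to \infty} \frac{\max_{0 < t < d \ln n} \big|\sum_{s=1}^{n} \epsilon(sT)\epsilon(sT+t)\big|}{\sqrt{2 n \ln\ln n}} \le \sqrt{2}. \tag{3.4} \]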


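Proof. Let

\[ Y(s) = \epsilon^2(sT) - 1. \]

Notice that Assumption 3.0.1 implies that $\{Y(s), \mathcal{F}_{sT}, s \ge 1\}$ is a martingale difference
and

\[ \sup_t E|Y(t)|^{2+\delta/2} < \infty. \]

Applying Theorem 3.1.1 to $\{Y(s), \mathcal{F}_{sT}, s \ge 1\}$ and the fact

\[ \sum_{s=1}^{n} E(Y^2(s) \mid \mathcal{F}_{sT-T}) \approx n, \]

we get

\[ \limsup_{n \to \infty} \frac{\big|\sum_{s=1}^{n} Y(s)\big|}{\sqrt{2 n \ln\ln n}} < \infty. \tag{3.5} \]

(3.3) is proved now. To prove (3.4), we first need to truncate $\epsilon(s)$. Let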

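\[ \lambda(s) = \Big(\frac{s}{\ln\ln s}\Big)^{1/2} \frac{1}{\ln s}, \]

\[ \xi(s) = \epsilon(s)\,\chi_{\{|\epsilon(s)| < \sqrt{\lambda(s)}\}}, \]
\[ \eta(s) = \epsilon(s)\,\chi_{\{|\epsilon(s)| \ge \sqrt{\lambda(s)}\}}. \]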

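For a fixed $t > 0$, let

\[ \mathcal{F}'_s = \mathcal{F}_{sT + t \vee T}, \]

\[ Y_t(s) = \xi(sT)\xi(sT+t) - E\{\xi(sT)\xi(sT+t) \mid \mathcal{F}'_{s-1}\}. \]

Then $\{Y_t(s), \mathcal{F}'_s, s \ge 1\}$ is a martingale difference. We will finish the proof by
proving the following:

\[ \max_{0 < t \le d \ln n} \sum_{s=1}^{n} |\epsilon(sT)\epsilon(sT+t) - Y_t(s)| = o(\sqrt{n}), \tag{3.6} \]

\[ \limsup_{n \to \infty} \max_{0 < t < d \ln n} \frac{\big|\sum_{s=1}^{n} Y_t(s)\big|}{\sqrt{2 n \ln\ln n}} \le \sqrt{2}. \tag{3.7} \]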

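To prove (3.6), we first note that

\[ E|\eta(s)|^2 \le E|\epsilon(s)|^{4+\delta}\, \lambda(s)^{-1-\delta/2}. \]

It follows that

\[ \sum_{s=3}^{\infty} \frac{E|\eta(s)|^2}{\sqrt{s}} < \infty. \tag{3.8} \]

Then Kronecker's lemma implies

\[ \sum_{s=1}^{n} \eta(s)^2 = o(\sqrt{n}) \quad \text{a.s.} \tag{3.9} \]

and

\[ \sum_{s=1}^{n} E(\eta^2(s) \mid \mathcal{F}_{s-1}) = o(\sqrt{n}) \quad \text{a.s.} \tag{3.10} \]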
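We will show first

\[
\begin{aligned}
&\max_{1 \le t \le d \ln n} \sum_{s=3}^{n} |\epsilon(sT)\epsilon(sT+t) - \xi(sT)\xi(sT+t)| \\
&\quad = \max_{1 \le t \le d \ln n} \sum_{s=3}^{n} |\eta(sT)\eta(sT+t) + \eta(sT)\xi(sT+t) + \xi(sT)\eta(sT+t)| \\
&\quad = o(\sqrt{n}).
\end{aligned} \tag{3.11}
\]

Applying Hölder's inequality and (3.9), we get

\[ \max_{1 \le t \le d \ln n} \sum_{s=3}^{n} |\eta(sT)\eta(sT+t)| \le \sum_{s=1}^{nT + d \ln n} \eta^2(s) = o(\sqrt{nT + d \ln n}) = o(\sqrt{n}). \tag{3.12} \]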
Since $\sqrt{\ln\ln s} \cdot \ln s$ is increasing and $(s+t)^{1/4} \le s^{1/4} + t^{1/4}$,
then

\[ \lambda^{1/2}(s+t) \le \lambda^{1/2}(s) + \lambda^{1/2}(t). \tag{3.13} \]

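Because $|\xi(s)| \le \lambda^{1/2}(s)$, (3.13) implies

\[ \sum_{s=3}^{n} |\eta(sT)\xi(sT+t)| \le \sum_{s=3}^{nT} |\eta(s)\xi(s+t)| \le \sum_{s=3}^{nT} |\eta(s)|\big(\lambda^{1/2}(s) + \lambda^{1/2}(t)\big). \tag{3.14} \]

Notice that by (3.8)

\[ \sum_{s=3}^{\infty} \frac{|\eta(s)|\,\lambda^{1/2}(s)}{\sqrt{s}} \le \sum_{s=3}^{\infty} \frac{|\eta(s)|^2}{\sqrt{s}} < \infty, \quad \text{a.s.} \]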

Then Borel-Cantelli lemma implies

nT n
2; ln(8)l -—- 0((/-,(—nTT—,).

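Applying it to (3.14), we get

\[ \max_{1 \le t \le d \ln n} \sum_{s=3}^{n} |\eta(sT)\xi(sT+t)| = o(\sqrt{n}). \tag{3.15} \]

Similarly, we can prove that

\[ \max_{1 \le t \le d \ln n} \sum_{s=3}^{n} |\xi(sT)\eta(sT+t)| = o(\sqrt{n}). \tag{3.16} \]

Now, (3.11) follows from (3.12), (3.15) and (3.16).

Using the same approach, we can prove that

\[ \max_{1 \le t \le d \ln n} \sum_{s=3}^{n} \big|E\big(\epsilon(sT)\epsilon(sT+t) - \xi(sT)\xi(sT+t) \mid \mathcal{F}'_{s-1}\big)\big| = o(\sqrt{n}). \tag{3.17} \]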

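(3.6) follows from (3.11) and (3.17). Next, we will use the exponential inequality in
Lemma 3.1.1 to get (3.7). Let us now investigate the sum of the conditional variances
of $Y_t(s)$:

\[
\begin{aligned}
\max_{1 \le t \le d \ln n} \sum_{s=3}^{n} \big|E\{\xi(sT)\xi(sT+t) \mid \mathcal{F}'_{s-1}\}\big|^2
&\le \max_{1 \le t \le d \ln n} \sum_{s=3}^{n} |\xi(sT)|^2\, \big|E\{\eta(sT+t) \mid \mathcal{F}'_{s-1}\}\big|^2 \\
&\le \lambda(nT) \sum_{s=3}^{nT + d \ln n} E(\eta^2(s) \mid \mathcal{F}_{s-1}) \\
&= o(\sqrt{n})\, o(\sqrt{nT + d \ln n}) = o(n).
\end{aligned} \tag{3.18}
\]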

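Here we have used (3.10). Since

\[ E(\epsilon^2(sT+t) \mid \mathcal{F}_{sT+t-1}) = 1 \ \text{a.s. and} \ \mathcal{F}'_{s-1} \subset \mathcal{F}_{sT+t-1}, \]

then

\[ E(\epsilon^2(sT+t) \mid \mathcal{F}'_{s-1}) = 1. \]

Consequently,

\[ \sum_{s=3}^{n} E\big(\xi^2(sT)\xi^2(sT+t) \mid \mathcal{F}'_{s-1}\big) = \sum_{s=3}^{n} \xi^2(sT)\big(1 - E(\eta^2(sT+t) \mid \mathcal{F}'_{s-1})\big). \tag{3.19} \]

Applying (3.10) and the fact that

\[ \xi^2(sT) \le \lambda(sT), \]

we get

\[
\begin{aligned}
\max_{1 \le t \le d \ln n} \sum_{s=3}^{n} \xi^2(sT)\, E(\eta^2(sT+t) \mid \mathcal{F}'_{s-1})
&\le \lambda(nT) \sum_{s=3}^{n} E(\eta^2(sT+t) \mid \mathcal{F}'_{s-1}) \\
&\le o(\sqrt{n}) \cdot o(\sqrt{n}) = o(n).
\end{aligned} \tag{3.20}
\]

Notice also that (3.3) and (3.9) imply

\[ \sum_{s=1}^{n} \xi^2(sT) = \sum_{s=1}^{n} \big(\epsilon^2(sT) - \eta^2(sT)\big) = n + o(n). \tag{3.21} \]

Then (3.19), (3.20) and (3.21) yield that uniformly in $1 \le t \le d \ln n$,

\[ \sum_{s=1}^{n} E\{\xi^2(sT)\xi^2(sT+t) \mid \mathcal{F}'_{s-1}\} = n + o(n). \tag{3.22} \]

It follows from (3.18) and (3.22) that uniformly in $t \le d \ln n$,

\[ s_t^2(n) = \sum_{s=1}^{n} \mathrm{Var}\{Y_t(s) \mid \mathcal{F}'_{s-1}\} = n + o(n). \tag{3.23} \]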
Notice that from the deﬁnition of Yt(n),

mm) s 2A<nT>A<nT + t)

= n st (n)
Kt( ) lnlns?(n) (3.24)
where

_ nT nT + t 1/4 st(n) _1 1
K101) _ 2(ln ln nT lnln(nT + t)) lnln s?(n)) (1n nT) ln(nT + t) '

 

 

Then by (3.23)
lim K,(n) = 0. (3.25)

1}.—NI)

Choose 0 < K < 3?, let
1748) = Yt(3)X{K¢(s)SK}-

 

 

 

Then ~
, max, Zn=11’t(s) . max; 22:1 Yt($)
3 = . 3.26
113183.31) (2n ln 1n n)”2 11:33:11 (2" In 1n n)”2 ( )
We will prove that ~
lim sup max: 25:1 ”(8) S "y(K), d (3.27)

n—mo (2nlnlnn)1/2
where y(K) E (0, 51471) is the unique root of [3(x, K) — 2 = 0 and )6(x,K) =
x2(1 — %). Since 3(2):, K) is decreasing in K, we see that y(K) decreases to x/2

as K goes to 0. Then (3.26) and (3.27) yield

. maxt 23:1 Yt(3)
11313.3" (2nlnln mm 5 ‘5' (3'28)


Apply this results to the martingale difference {—Y,(s ), .7” ,s > 1}, we get

 

. . mind); Y,(s)
> .
11,33”: (2nlnlnn)1/2 _ —f. (329)

Now (3.7) follows from (3.28) and (3.29). So it is enough to prove (3.27). To reach that
goal, let p > 1 and define the stopping times

rm, = inf{n Z 1, s?(n + 1) Z pzm}

Let

_ - 2m
——1nf{n Z 1, Krtrégxnns, (n + 1) _>_ p }

Then Tm S Tm, for any t. (Notice that Tm is not a stopping time.) Then for

O<6<oo,

3.1:}:ij )> 6J2nlnlnn,i.o)
P(BSItnSdlnns

 

S P(mgup+1 351:3an Z( Y,(s )> 6\/('rm + 1) lnln(rm + 1),1.o mm) (3.30)
Since we have proved

lim n‘1 max sf(n) = 1,
"400 lStSdlnn

then almost surely for sufﬁciently large m,

2m—1

p < Tm < p2m+1

Using this inequality and the fact that

 

 

\/p2m—1 ln ln p2m-l > p-2 \/p2m+2 ln ln p2m+2,

for sufﬁciently large m, the probability in (3.30) is less than

 

P({ sup ax217,03>6p'2\/p2m+2lnlnp2m‘1,i.o in m)}) (3.31)

"ST m+1 3SItnSdlnn8


It is easy to verify for a ﬁxed t, the martingale difference {Y,(s)X{s S Tm+1,,}, f}, s 2
1} satisﬁes conditions in Lemma 3.1.1 with this b2 = pm”.
From Lemma 3.1.1 and the fact Tm S rm, for any t, we have

(sup ZY,(s) )>6p-2Vb21nlnb2)

Pn<Tm+1 8:].

SP( sup ZY,(S)>6p 2\/b21nlnb5")

”(Tm+1¢ 8’1
S exp(— —=,Blnlnp2m‘1)) ((2m —1)lnp)"ﬁ, V1 S t S dlnn
where $\beta = \beta(\delta p^{-2}, K)$. Since $\beta(x, K)$ is increasing in $x \in (0, 4/(3K))$, we have $\beta > 2$
for $\delta > p^2\gamma(K)$. It follows then for such a $\delta$,

 

Z; Z: P({Sgp 2; 13(8) > 6\/p2m-1 1111np2m—1})
5i d( 27” + 3) )(lnp)((2m — 1) Imp)” < oo. (3.32)

(3.31), (3.32) and Borel-Cantelli lemma imply

 

P( sup max ZY,() > )6\/(Tm + 1) lnln(Tm + 1), i.o inm)) = 0 (3.33)

"(Tm-+1 3StSd Inns

for $\delta > p^2\gamma(K)$. Since p > 1 is arbitrary, letting p go to 1 shows that (3.33) is true for
$\delta > \gamma(K)$.

Then we have proved

P( max if“) 7(K)\/2nlnlnn, i.o)=0.

lStSdlnns

(3.27) follows now. The proof is completed.

QED.


3.2 Convergence Rate of Sample Covariances

In this section, we prove the following theorem.

Theorem 3.2.1 If $\{x(t) : t \in Z\}$ is PAR and Assumption 3.0.1 holds, then for
any constant $d > 0$, almost surely,
\[ \sup_{|t-s| < d\ln N} |R_N(t,s) - R(t,s)| = O\Bigl(\sqrt{\frac{\ln\ln N}{N}}\Bigr), \]
where $R_N$ is defined by (3.1) and $R(t,s)$ is the autocovariance function of $x$.
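
As a rough illustration of the quantity in Theorem 3.2.1, the sketch below computes a sample covariance of the form $n^{-1}\sum_m x(t+mT)x(s+mT)$, the form that appears in (3.46) of the proof. The function name `sample_cov`, the 1-based time index, and the convention of summing over every $m \ge 0$ for which both indices fall inside the sample are assumptions made for this illustration; definition (3.1) remains the authoritative one.

```python
def sample_cov(x, t, s, T):
    """Sample covariance n^{-1} * sum_m x(t + m*T) * x(s + m*T).

    Assumptions of this sketch: x[0], ..., x[N-1] hold the zero-mean
    observations x(1), ..., x(N); n = N // T is the number of complete
    periods; m runs over all m >= 0 with both time indices in 1..N.
    """
    N = len(x)
    n = N // T
    total = 0.0
    m = 0
    while max(t, s) + m * T <= N:
        if min(t, s) + m * T >= 1:
            # shift by one because Python lists are 0-based
            total += x[t + m * T - 1] * x[s + m * T - 1]
        m += 1
    return total / n
```

Under these conventions the estimate is symmetric in (t, s), matching the symmetry of $R_N$ used in the proof.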

We will need some lemmas to prove the theorem. Clearly, $R_N(t,s)$ can be
linearly expressed by the sample covariances of the corresponding martingale
difference $\epsilon(t)$. So, we first investigate the Wold coefficients for a PAR model. It
is well known that the corresponding multivariate stationary AR model $\tilde{x}$ has the
representation
\[ \tilde{x}(t) = \sum_{j=0}^{\infty} C_j\,\tilde{\epsilon}(t-j), \tag{3.34} \]
where
\[ \tilde{\epsilon}(t) = (\epsilon(1+tT), \epsilon(2+tT), \ldots, \epsilon(T+tT))', \]
\[ \tilde{x}(t) = (x(1+tT), x(2+tT), \ldots, x(T+tT))'. \]
If we write (3.34) for each component, we get
\[ x(t) = \sum_{j=0}^{\infty} c(j,t)\,\epsilon(t-j), \tag{3.35} \]
and obviously $c(j,t)$ is periodic in $t$. We call $c(j,t)$ the Wold coefficients of the
PAR $x(t)$.
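
The Wold coefficients in (3.35) can be generated recursively from the PAR coefficients. The sketch below assumes the time domain model $x(t) - \sum_{j=1}^{p(t)} a(j,t)x(t-j) = \sigma(t)\epsilon(t)$ with unit-variance innovations; substituting (3.35) into it gives $c(0,t) = \sigma(t)$ and $c(j,t) = \sum_{i=1}^{\min(j,p(t))} a(i,t)\,c(j-i,t-i)$. The function name `wold_coefficients` and its argument layout are hypothetical, chosen only for this illustration.

```python
def wold_coefficients(a, sigma, T, J):
    """Return c with c[j][t-1] = c(j, t) for j = 0..J and t = 1..T.

    a[t-1] lists a(1, t), ..., a(p(t), t); sigma[t-1] = sigma(t).
    Recursion (from substituting the Wold expansion into the model):
        c(0, t) = sigma(t),
        c(j, t) = sum_{i=1}^{min(j, p(t))} a(i, t) * c(j - i, t - i).
    """
    c = [[0.0] * T for _ in range(J + 1)]
    for t in range(1, T + 1):
        c[0][t - 1] = sigma[t - 1]
    for j in range(1, J + 1):
        for t in range(1, T + 1):
            coeffs = a[t - 1]
            acc = 0.0
            for i in range(1, min(j, len(coeffs)) + 1):
                # c(., t - i) is periodic in its second argument, so the
                # season of t - i is looked up modulo T (0-based index)
                acc += coeffs[i - 1] * c[j - i][(t - i - 1) % T]
            c[j][t - 1] = acc
    return c
```

For T = 1 this reduces to the usual AR recursion; for a causal model, printing `abs(c[j][t-1])` for growing j gives a quick visual check of the exponential decay stated in Lemma 3.2.1.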
The following fact about PAR is analogous to a well-known one for stationary
AR models.

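Lemma 3.2.1 There exist constants $\gamma_1 > 0$ and $\gamma_2 > 0$ such that for any $j, t$,
\[ |c(j,t)| \le \gamma_1 \exp(-\gamma_2 j). \]

Proof. It is well known that the Wold coefficients $C_n$ of the stationary multivariate
AR process go to zero at an exponential rate, i.e., there exist positive constants $\alpha$ and
$\beta$ such that
\[ \|C_n\| \le \alpha \exp(-n\beta), \]
where the norm $\|C_n\|$ is the maximum absolute value of the entries of $C_n$. Observe that for any
$0 \le m < T$ and $j = nT + m$, $c(j,t)$ is an element of the matrix $C_n$ for any
$t = 1, 2, \ldots, T$. Since $n > j/T - 1$, we get
\[ |c(j,t)| \le \|C_n\| \le \alpha \exp(-n\beta) \le \alpha e^{\beta} \exp(-j\beta/T), \]
so the lemma holds with $\gamma_1 = \alpha e^{\beta}$ and $\gamma_2 = \beta/T$.
QED.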

Next, we consider the sample covariance of the innovation process $\epsilon(t)$. For any
positive integers $t, s$ and positive real number $b$, let
\[ u(t,s;b) = \sum_{m=0}^{[b]} \{\epsilon(t+mT)\epsilon(s+mT) - E\epsilon(t)\epsilon(s)\}, \tag{3.36} \]
where $[b]$ as before denotes the integer part of $b$.
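
To make definition (3.36) concrete, here is a minimal sketch that evaluates $u(t,s;b)$ for a given innovation sequence. It assumes orthonormal innovations, so that $E\epsilon(t)\epsilon(s)$ equals 1 when $t = s$ and 0 otherwise; the name `u_stat` and the choice of passing the innovations as a callable are hypothetical.

```python
import math


def u_stat(eps, t, s, b, T):
    """u(t, s; b) = sum_{m=0}^{[b]} (eps(t+mT) * eps(s+mT) - E eps(t) eps(s)).

    eps is a callable returning the innovation eps(k) at integer time k.
    Assumption of this sketch: orthonormal innovations, so the centering
    term E eps(t) eps(s) is 1 if t == s and 0 otherwise.
    """
    mean = 1.0 if t == s else 0.0
    upper = math.floor(b)  # [b], the integer part of the positive real b
    return sum(eps(t + m * T) * eps(s + m * T) - mean
               for m in range(upper + 1))
```

The function is symmetric in (t, s), which is the property used at the start of the proof of Lemma 3.2.3.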

Lemma 3.2.2 Let $\{M_n\}$ be an increasing sequence of non-negative random variables
and $\{A_n\}$ be an increasing sequence of real numbers. If $A_n \to \infty$ and
$E(M_n) = O(A_n)$, then
\[ M_n = o\bigl(A_n \ln A_n (\ln\ln A_n)^{1+\delta}\bigr) \]
for any $\delta > 0$.

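Proof. For a given $\delta > 0$, let $\Lambda(n) = A_n \ln A_n (\ln\ln A_n)^{1+\delta}$. Without loss of
generality, assume that
\[ E(M_n) \le cA_n \tag{3.37} \]
for some constant $c > 0$.
For $j \ge 1$, let
\[ n_j = \inf\{n \ge 1 : \ln A_n > j\}. \tag{3.38} \]
Then $n_j$ increases to $\infty$ as $j \to \infty$.

It follows from Markov's inequality, (3.37) and (3.38) that
\[ P(M_{n_j} > \Lambda(n_j)) \le \frac{c}{j(\ln j)^{1+\delta}}. \]
Since $\sum_{j=3}^{\infty} \frac{1}{j(\ln j)^{1+\delta}} < \infty$, the Borel-Cantelli lemma implies that
\[ P(M_{n_j} > \Lambda(n_j), \text{ i.o.}) = 0. \]
Now, for $n_j \le n \le n_{j+1} - 1$, we have from the monotonicity
\[ \Lambda(n) \ge \Lambda(n_j) \quad \text{and} \quad M_n \le M_{n_{j+1}-1}. \]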

Then almost surely, for sufﬁciently large n,

 

 

M" Mnj+l A(nj'f'l) (339)

Since

 

, ' ' 1+6 ' 1
J-too A(nj) J—mo exp(])_71+5lnj

(3.39) implies
\[ M_n = O(\Lambda(n)) = O\bigl(A_n \ln A_n (\ln\ln A_n)^{1+\delta}\bigr). \]
Since this is true for any $\delta > 0$, $O$ can be replaced by $o$. The proof is completed.

QED.

The following lemma is needed to prove our theorem and is of interest in its
own right.

Lemma 3.2.3 Under Assumption 3.0.1, for any constant $d > 0$,
\[ \limsup_{n\to\infty} \max_{|t| \le d\ln n} \frac{|u(t,t;n)|}{\sqrt{2n\ln\ln n}} < \infty, \quad \text{a.s.} \tag{3.40} \]
\[ \limsup_{n\to\infty} \max_{|t|,|s| \le d\ln n,\ t \ne s} \frac{|u(t,s;n)|}{\sqrt{2n\ln\ln n}} \le \sqrt{2}, \quad \text{a.s.} \tag{3.41} \]

Proof. The proof is an application of Theorem 3.1.2 and Lemma 3.2.2 with
some computation. We only need to prove the lemma for $t \le s$ since $u(t,s;n)$ is
symmetric in $t, s$.

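For a fixed $|t| < d\ln n$, let $n_0$ be the integer such that
\[ t_0 = t - n_0T \in [1, T]. \]
First notice that Theorem 3.1.2 implies that for any fixed $t_0$,
\[ \limsup_{n\to\infty} \frac{\bigl|\sum_{m=1}^{n} (\epsilon^2(t_0+mT) - 1)\bigr|}{\sqrt{2n\ln\ln n}} < \infty, \quad \text{a.s.} \tag{3.42} \]
\[ \limsup_{n\to\infty} \max_{t_0 < s < d\ln n} \frac{\bigl|\sum_{m=1}^{n} \epsilon(t_0+mT)\epsilon(s+mT)\bigr|}{\sqrt{2n\ln\ln n}} \le \sqrt{2}, \quad \text{a.s.} \tag{3.43} \]
Let $s_0 = s - n_0T$. It is clear that $u(t,s;n)$ can be written as
\[ u(t,s;n) = \sum_{m=n_0}^{n+n_0} \bigl(\epsilon(t_0+mT)\epsilon(s_0+mT) - \delta_{t,s}\bigr) \]
and
\[ \Bigl|u(t,s;n) - \sum_{m=0}^{n} \epsilon(t_0+mT)\epsilon(s_0+mT)\Bigr| \le 2M_n, \tag{3.44} \]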
where $M_n$ denotes the maximum of $|u(t,s;i)|$ over $|t|, |s|, i \le d\ln n$. Since
$\{u(t,s;i), \mathcal{F}_{\max(t,s)+iT-1}, i \ge 1\}$ is an $L^2$ martingale under Assumption 3.0.1 and
\[ E|u(t,s;i)|^2 = \sum_{m=0}^{i} E|\epsilon(t+mT)\epsilon(s+mT) - E\epsilon(t)\epsilon(s)|^2 \le c(i+1) \tag{3.45} \]

for $c = 1 + \sup_t E|\epsilon(t)|^4$. It is evident that
\[ E(M_n^2) \le E\sum_{t,s,i} |u(t,s;i)|^2 \le (1 + 2d\ln n)^3 \cdot c(\ln n + 1). \]
Then $E(M_n) = O(A_n)$ for $A_n = (\ln n)^2$. Lemma 3.2.2 implies
\[ M_n = o\bigl(A_n \ln A_n (\ln\ln A_n)^2\bigr) = o(n^{\alpha}), \quad \forall \alpha > 0. \]
The lemma now follows.

QED.

Proof of Theorem 3.2.1 Without loss of generality, assume N = nT. Then by
(3.1)

n—max(t,s)/T
RN(t, s) = n—1 : x(t + mT)x(s + mT) (3.46)

m=l

for $t = 1, \ldots, T$ and $s = 0, 1, \ldots, N - t - 1$.

Since both $R_N$ and $R$ are symmetric and periodic, we only need to prove the
theorem for $t = 1, \ldots, T$ and $t \le s \le t + d\ln n$.

Let $Q_n = \sqrt{\ln\ln n / n}$. Notice that from (3.35) and the orthogonality of $\{\epsilon(t)\}$,
\[ R(t,s) = \sum_{j,k=0}^{\infty} c(j,t)c(k,s)\,\delta_{t-j,s-k}. \tag{3.47} \]

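Then it follows from (3.46) and (3.47) that
\[ \begin{aligned}
R_N(t,s) - R(t,s) ={}& n^{-1} \sum_{j,k=0}^{\infty} c(j,t)c(k,s) \sum_{m=0}^{n-s/T} \bigl[\epsilon(t+mT-j)\epsilon(s+mT-k) - \delta_{t-j,s-k}\bigr] \\
&+ \frac{n - [n - s/T]}{n} \sum_{j,k=0}^{\infty} c(j,t)c(k,s)\,\delta_{t-j,s-k}.
\end{aligned} \]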

The sum in the second term is finite by Lemma 3.2.1, thus the second term is
obviously $O(n^{-1}\ln n)$ uniformly in $s \le d\ln n$.
Denote the first term by $W_n(t,s)$. Then
\[ W_n(t,s) = n^{-1} \sum_{j,k=0}^{\infty} c(j,t)c(k,s)\,u(t-j,s-k;n-s/T), \tag{3.48} \]
where $u(t,s;x)$ is defined by (3.36).
Next, truncate the sum in (3.48) at $j, k \le d\ln n$. Denote by $Z_n(t,s)$ the truncated
sum, i.e.
\[ Z_n(t,s) = n^{-1} \sum_{j,k=0}^{d\ln n} c(j,t)c(k,s)\,u(t-j,s-k;n-s/T). \tag{3.49} \]
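Then it follows from Lemma 3.2.3 that
\[ \max_{0 \le s-t \le d\ln n}\ \max_{0 \le j,k \le d\ln n} |u(t-j,s-k;n-s/T)| = O(\sqrt{n\ln\ln n}). \]
Consequently
\[ \max_{0 \le s-t \le d\ln n} |Z_n(t,s)| \le n^{-1}\,O(\sqrt{n\ln\ln n}) \max_{0 \le s-t \le d\ln n} \sum_{j,k=0}^{d\ln n} |c(j,t)c(k,s)| = O(Q_n). \tag{3.50} \]
So it is sufficient to show
\[ \max_{0 \le s-t \le d\ln n} |W_n(t,s) - Z_n(t,s)| = O(Q_n). \tag{3.51} \]
The left hand side is dominated by $I_{1,n} + I_{2,n}$, where
\[ I_{1,n} = \max_{0 \le s-t \le d\ln n} n^{-1} \Bigl|\sum_{j=d\ln n}^{\infty} \sum_{k=0}^{\infty} c(j,t)c(k,s)\,u(t-j,s-k;n-s/T)\Bigr|, \]
\[ I_{2,n} = \max_{0 \le s-t \le d\ln n} n^{-1} \Bigl|\sum_{0 \le j < d\ln n}\ \sum_{k > d\ln n} c(j,t)c(k,s)\,u(t-j,s-k;n-s/T)\Bigr|. \]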

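Applying Markov's inequality, it is easy to see
\[ \begin{aligned}
& P\Bigl(\max_{0 \le s-t \le d\ln n} n^{-1} \Bigl|\sum_{j=d\ln n}^{\infty} \sum_{k=0}^{\infty} c(j,t)c(k,s)\,u(t-j,s-k;n-s/T)\Bigr| > Q_n\Bigr) \\
&\quad \le Q_n^{-2} n^{-2} \sum_{0 \le s-t \le d\ln n} E\Bigl|\sum_{j=d\ln n}^{\infty} \sum_{k=0}^{\infty} c(j,t)c(k,s)\,u(t-j,s-k;n-s/T)\Bigr|^2.
\end{aligned} \tag{3.52} \]
We have proved in (3.45) that
\[ E|u(t,s;i)|^2 \le c(i+1); \]
then Hölder's inequality implies for all $t_i$, $n_j$, $i = 1, \ldots, 4$, $j = 1, 2$,
\[ E|u(t_1,t_2;n_1)\,u(t_3,t_4;n_2)| \le c\sqrt{(n_1+1)(n_2+1)}. \]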

This together with Lemma 3.2.1 implies that the expectation in (3.52) is dominated
by

 

oo . 00 f _2 d
n+1c cg,t2 ck,szS n+1n 72.

where 71, 72 are the same as in proposition 2.

Then (3.52) is bounded by (dln n)2Q;2cn'1“272d = 0(n‘2). It follows that
\[ \sum_{n} P(I_{1,n} > Q_n) < \infty. \]
The Borel-Cantelli lemma implies
\[ I_{1,n} = O(Q_n) \quad \text{a.s.} \]
Similarly, we can prove that
\[ I_{2,n} = O(Q_n) \quad \text{a.s.} \]

(3.51) is established now and the proof is finished.

QED.

3.3 Convergence Rate of Coefﬁcients

The solution of the Yule-Walker equations provides estimators for the coefficients. If the
orders $p(1), p(2), \ldots, p(T)$ were known, we would have no difficulty showing,
using the results of the last section, that these estimators are consistent and have the
same convergence rate as the sample covariances. Since the orders are unknown,
we need a little extra work and the notation gets complicated. To make our statements
clearer, we define a random inner product.
Let $L^2(\Omega, \mathcal{F}, dP)$ denote the Hilbert space of random variables with zero means
and finite second moments. Then $\{x(t), t \in Z\}$ is a set in this Hilbert space and
the Yule-Walker equations are just the normal equations of projection. We want to use
this convenience of projection even when the covariances $R(t,s)$ in the Yule-Walker
equations are replaced by the sample covariances $R_N(t,s)$. For this purpose, we
introduce a random inner product. Let $X$ denote the subset $\{x(t), t \in Z\}$ of
$L^2(\Omega, \mathcal{F}, dP)$. For each integer $N$, let $\langle\cdot,\cdot\rangle_N(\cdot)$ be a map from $X \times X \times \Omega$ to
the set of real numbers such that
\[ \langle x(t), x(s)\rangle_N(\omega) = R_N(t,s)(\omega). \]

We cannot yet say that $\langle\cdot,\cdot\rangle_N(\omega)$ is an inner product for a fixed $\omega$. But for a given
finite sequence of integers $t_0, t_1, \ldots, t_m$ and a fixed $\omega$, $(R_N(t_j,t_k))_{j,k=0}^{m}$ is positive
definite for sufficiently large $N$. So for such an $N$, $\langle\cdot,\cdot\rangle_N(\omega)$ can be regarded as
an inner product on the linear space spanned by $x(t_0), \ldots, x(t_m)$ such that
\[ \langle x(t_j), x(t_k)\rangle_N(\omega) = R_N(t_j,t_k)(\omega). \]
We will suppress $\omega$ in the inner product and write it as $\langle\cdot,\cdot\rangle_N$. The corresponding
norm will be denoted by $\|\cdot\|_N$.

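For the sake of convenience and unity of notation, let $\langle\cdot,\cdot\rangle_\infty$ and $\|\cdot\|_\infty$ be
the inner product and norm in $L^2(\Omega, \mathcal{F}, dP)$, i.e.
\[ \langle x(t), x(s)\rangle_\infty = E\,x(t)x(s), \qquad \|x(t)\|_\infty^2 = E\,x^2(t). \]
Then for any $t, s$, almost surely,
\[ \lim_{N\to\infty} \langle x(t), x(s)\rangle_N = \langle x(t), x(s)\rangle_\infty. \]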

For each $N = 1, 2, \ldots, \infty$, denote by $\mathrm{Proj}_N[x(t_0) \mid x(t_1), \ldots, x(t_m)]$ the projection
of $x(t_0)$ onto the subspace spanned by $x(t_1), \ldots, x(t_m)$ under $\|\cdot\|_N$. Let
\[ \mathrm{Proj}_N[x(t) \mid x(t-1), \ldots, x(t-p)] = \sum_{j=1}^{p} a_N(j,t;p)\,x(t-j). \tag{3.53} \]
The coefficients $a_N(j,t;p)$, $j = 1, \ldots, p$, are actually the solution of
\[ \Gamma_N(t;p)\,a_N(t;p) = R_N(t;p), \tag{3.54} \]
where
\[ \Gamma_N(t;p) = (R_N(t-j,t-k))_{j,k=1}^{p}, \tag{3.55} \]
\[ R_N(t;p) = (R_N(t-1,t), \ldots, R_N(t-p,t))', \tag{3.56} \]
\[ a_N(t;p) = (a_N(1,t;p), \ldots, a_N(p,t;p))'. \tag{3.57} \]
For $N = \infty$, the above equations are just the Yule-Walker equations we discussed
in Chapter 2.
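
The system (3.54)-(3.57) is a small linear solve once the covariances are available. The sketch below is one possible way to carry it out in plain Python; it is not taken from the text. The covariance is passed as a callable `R(t, s)` (either the true $R$ or a sample version such as $R_N$), and the helper names `solve_linear` and `yule_walker` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def yule_walker(R, t, p):
    """Solve Gamma(t;p) a = R(t;p), cf. (3.54)-(3.57).

    R is a callable R(t, s) returning a (sample or true) covariance.
    Returns the list [a(1,t;p), ..., a(p,t;p)].
    """
    gamma = [[R(t - j, t - k) for k in range(1, p + 1)]
             for j in range(1, p + 1)]
    rhs = [R(t - j, t) for j in range(1, p + 1)]
    return solve_linear(gamma, rhs)
```

With the exact covariance of a stationary AR(1) process with coefficient 0.6, fitting order 3 returns approximately (0.6, 0, 0), in line with the Remark after Theorem 3.3.1 that over-fitting the order still recovers the true coefficients.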

Let $l(\cdot)$ be a periodic function from N to N with period $T$. $l(t)$ may depend
on the sample and serves as an estimator of $p(t)$. Choose a dominating function $L(N)$
from N to N and assume the following throughout the rest of this section.

Assumption 3.3.1 $L(N)$ increases to $\infty$ and $L(N) = O(\ln N)$.

Theorem 3.3.1 Let $\{x(t), t \in Z\}$ be a PAR with order $p(1), \ldots, p(T)$. Then
under Assumptions 3.0.1 and 3.3.1,
\[ \sup_{t,j,l} |a_N(j,t;l(t)) - a_\infty(j,t;l(t))| = O\Bigl(\sqrt{\frac{\ln\ln N}{N}}\Bigr), \]
where the supremum is taken over any $t$, $j \le l(t)$ and any periodic function $l(\cdot)$
with period $T$ such that $l(t) \le L(N)$, $\forall t$.

Remark. Notice that for $p \ge p(t)$, the $a_\infty(j,t;p)$ are the actual regression coefficients
in the PAR model. This theorem says that if we choose an order $p$ greater than the true
order in (3.54), then the estimator from the Yule-Walker equations converges to the true
parameter at the rate $\sqrt{\ln\ln N / N}$.

The proof of this theorem needs the following lemma.

Lemma 3.3.1 Let $x(t)$ be a PAR and $\Gamma(t;q) = (R(t-j,t-k))_{j,k=0}^{q}$. Then there exists
an $M > 0$ such that for any $t$ and $q$, $\|\Gamma^{-1}(t;q)\| \le M$.

Proof. We note that for any positive definite matrix $\Gamma$, $\|\Gamma\|$ is less than or equal
to the maximum eigenvalue of $\Gamma$. Since $\Gamma^{-1}(t;q)$ is positive definite, we only need
to show that the eigenvalues of $\Gamma^{-1}(t;q)$ are bounded from above, or equivalently, that all
eigenvalues of $\Gamma(t;q)$, for any $t = 1, 2, \ldots, T$ and $q \ge 1$, are no less than a positive
number $\lambda$.

Let $\lambda_q$ be the minimum of all eigenvalues of $\Gamma(t;q)$ for all $t = 1, 2, \ldots, T$.
Evidently,
\[ \lambda_q > 0. \]
It suffices to show that
\[ \lambda_{q+1} \ge \min\Bigl(\lambda_q,\ \frac{\min_t \sigma^2(t)}{1 + \max_t \sum_{j=1}^{p(t)} a(j,t)^2}\Bigr). \tag{3.58} \]

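Let
\[ X_q(t) = (x(t), x(t-1), \ldots, x(t-q))'. \]
Then
\[ X_{q+1}(t) = \begin{pmatrix} x(t) \\ X_q(t-1) \end{pmatrix}. \]
For any vector $C_{q+1}$ of dimension $q+2$ such that
\[ \|C_{q+1}\|^2 = 1, \]
we write it as
\[ C_{q+1} = \begin{pmatrix} c \\ C_q \end{pmatrix}, \]
where $C_q$ is a $(q+1)$-dimensional vector. Using (1.2), we get
\[ C_{q+1}'X_{q+1}(t) = c\,\sigma(t)\epsilon(t) + (c\,a_q(t) + C_q)'X_q(t-1), \]
where $a_q(t)' = (a(1,t), a(2,t), \ldots, a(p(t),t), 0, \ldots, 0)$.
The orthogonality of $\epsilon(t)$ with $x(s)$, $s < t$, together with the definition of $\lambda_q$
imply
\[ \begin{aligned}
C_{q+1}'\Gamma(t;q+1)C_{q+1} &= \|C_{q+1}'X_{q+1}(t)\|^2 \\
&= c^2\sigma^2(t) + \|(c\,a_q(t) + C_q)'X_q(t-1)\|^2 \\
&\ge c^2\sigma^2(t) + \lambda_q \|c\,a_q(t) + C_q\|^2 \\
&\ge c^2\sigma^2(t) + \lambda_q \bigl|c^2\|a_q(t)\|^2 - \|C_q\|^2\bigr|.
\end{aligned} \]
Since $c^2 + \|C_q\|^2 = 1$, and
\[ \inf_{0 \le x \le 1} (ax + \lambda|bx - 1|) = \min(\lambda, a/b), \quad \forall a > 0,\ \lambda > 0,\ b > 1, \]
we have
\[ C_{q+1}'\Gamma(t;q+1)C_{q+1} \ge \min\Bigl(\lambda_q,\ \frac{\sigma^2(t)}{\|a_q(t)\|^2 + 1}\Bigr). \]
(3.58) now follows.

QED.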
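
The elementary identity $\inf_{0 \le x \le 1}(ax + \lambda|bx - 1|) = \min(\lambda, a/b)$ used in the proof can be checked numerically. The sketch below is only a sanity check, not part of the argument: the function is piecewise linear, so for $a > 0$, $\lambda > 0$, $b > 1$ its minimum on $[0,1]$ is attained at $x = 0$ or at the kink $x = 1/b$. The function name and the grid size are hypothetical choices.

```python
def inf_on_unit_interval(a, lam, b, grid=10000):
    """Approximate inf over 0 <= x <= 1 of a*x + lam*|b*x - 1|.

    The kink x = 1/b is added to the grid explicitly, because the
    function is piecewise linear and, for a > 0, lam > 0 and b > 1,
    attains its minimum either at x = 0 or at x = 1/b.
    """
    xs = [i / grid for i in range(grid + 1)] + [1.0 / b]
    return min(a * x + lam * abs(b * x - 1.0) for x in xs)
```

In the proof the identity is applied with $x = c^2$, $a = \sigma^2(t)$, $\lambda = \lambda_q$ and $b = \|a_q(t)\|^2 + 1$.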

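Proof of Theorem 3.3.1 For brevity, we omit $l(t)$ in $\Gamma_N$, $R_N$ and $a_N$. Observe
that
\[ \bigl[I + \Gamma_\infty^{-1}(t)(\Gamma_N(t) - \Gamma_\infty(t))\bigr](a_\infty(t) - a_N(t)) = \Gamma_\infty^{-1}(t)\bigl[R_\infty(t) - R_N(t) + (\Gamma_N(t) - \Gamma_\infty(t))\,a_\infty(t)\bigr]. \tag{3.59} \]
Theorem 3.2.1 and Lemma 3.3.1 imply that the maximum absolute value of the
entries of $\Gamma_\infty^{-1}(t)(\Gamma_N(t) - \Gamma_\infty(t))$ is $O(\sqrt{\ln\ln N / N})$. Thus
\[ \|\Gamma_\infty^{-1}(t)(\Gamma_N(t) - \Gamma_\infty(t))(a_\infty(t) - a_N(t))\| = O\Bigl(\sqrt{\frac{\ln\ln N}{N}}\Bigr)\,l^2(t)\,\|a_\infty(t) - a_N(t)\| \tag{3.60} \]
\[ = o(1)\,\|a_\infty(t) - a_N(t)\|. \tag{3.61} \]
A similar argument proves that the RHS of (3.59) is
\[ O\Bigl(\sqrt{\frac{\ln\ln N}{N}}\Bigr)(1, 1, \ldots, 1)'. \tag{3.62} \]
Also notice that every $O(1)$ and $o(1)$ appearing above is uniform in $t$ and in functions
$l$ such that $l(t) \le L(N)$. Then the theorem follows from (3.59), (3.61) and (3.62).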
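
For readers who want to verify identity (3.59), the following short derivation may help. It uses only the two normal equations $\Gamma_N a_N = R_N$ and $\Gamma_\infty a_\infty = R_\infty$ from (3.54); the argument $t$ is suppressed, and the layout is a sketch rather than the author's own presentation.

```latex
\begin{align*}
\bigl[I + \Gamma_\infty^{-1}(\Gamma_N - \Gamma_\infty)\bigr](a_\infty - a_N)
  &= \Gamma_\infty^{-1}\bigl[\Gamma_\infty + \Gamma_N - \Gamma_\infty\bigr](a_\infty - a_N) \\
  &= \Gamma_\infty^{-1}\bigl[\Gamma_N a_\infty - \Gamma_N a_N\bigr] \\
  &= \Gamma_\infty^{-1}\bigl[\Gamma_N a_\infty - R_N\bigr]
     && \text{since } \Gamma_N a_N = R_N \\
  &= \Gamma_\infty^{-1}\bigl[R_\infty - R_N + (\Gamma_N - \Gamma_\infty)\,a_\infty\bigr]
     && \text{since } \Gamma_\infty a_\infty = R_\infty .
\end{align*}
```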

3.4 BIC for Order Estimation

For a stationary AR($p_0$) model, Akaike (1977) first proposed to estimate $p_0$ by the $\hat{p}$
which minimizes
\[ \ln\hat{\sigma}_p^2 + p\ln N / N. \]
Here $\hat{\sigma}_p^2$ is the estimate of $\sigma^2$ from the Yule-Walker equations of order $p$.
An, Chen and Hannan (1982) proved that the BIC estimator is consistent under general
conditions.
In this section, we develop a similar criterion for PAR models. It turns out that
BIC is included as a special case. Let $x(1), x(2), \ldots, x(N)$ be observations from a
PAR model with order $p(\cdot)$. Let $R_N(t,s)$ be defined by (3.1). We will follow the
notation of the last section. Let
\[ \sigma_N^2(t;l) = \|x(t) - \mathrm{Proj}_N(x(t) \mid x(t-1), \ldots, x(t-l))\|_N^2. \tag{3.63} \]
Let $q(N)$ be a sequence of positive integers such that
\[ \lim_{N\to\infty} \frac{\ln\ln N}{q(N)} = 0 \quad \text{and} \quad \lim_{N\to\infty} \frac{q(N)}{N} = 0. \tag{3.64} \]
Let $\hat{p}(t) = \hat{p}_N(t)$ minimize
\[ \ln\sigma_N^2(t;l) + l \cdot q(N)/N, \quad \forall\, 0 \le l \le L(N). \tag{3.65} \]
Then $\hat{p}(t)$ is a consistent estimator of $p(t)$ under general assumptions.
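
The criterion (3.65) can be evaluated directly once covariances are available. The sketch below is one hedged way to do so in plain Python: it computes $\sigma^2(t;l) = R(t,t) - \sum_j a(j,t;l)R(t-j,t)$ from the Yule-Walker solution of order $l$ and returns the minimizing order. The covariance is passed as a callable `R(t, s)`, and the names `solve_linear`, `prediction_variance` and `select_order` are hypothetical; with noisy sample covariances the estimated variance may fail to be positive, which this sketch does not guard against.

```python
import math


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def prediction_variance(R, t, l):
    """sigma^2(t; l) = R(t,t) - sum_j a(j,t;l) * R(t-j,t), cf. (3.63)."""
    if l == 0:
        return R(t, t)
    gamma = [[R(t - j, t - k) for k in range(1, l + 1)]
             for j in range(1, l + 1)]
    rhs = [R(t - j, t) for j in range(1, l + 1)]
    a = solve_linear(gamma, rhs)
    return R(t, t) - sum(aj * rj for aj, rj in zip(a, rhs))


def select_order(R, t, N, L, q):
    """Minimize ln sigma^2(t; l) + l * q / N over 0 <= l <= L, cf. (3.65)."""
    crit = [math.log(prediction_variance(R, t, l)) + l * q / N
            for l in range(L + 1)]
    best = min(range(L + 1), key=lambda l: crit[l])
    return best, crit
```

Taking `q = math.log(N)` corresponds to the BIC-type choice $q(N) = \ln N$ used in the simulation at the end of this section.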

Theorem 3.4.1 Let $x(t)$ be a PAR satisfying Assumption 3.0.1 and let $L(N)$ satisfy
Assumption 3.3.1. Then for any $t$, almost surely
\[ \hat{p}(t) \to p(t), \quad \text{as } N \to \infty. \]
Proof. Since $R_N(t,s) \to R(t,s)$ a.s., then
\[ \Bigl\|\sum_{j=1}^{m} c_j x(t_j)\Bigr\|_N \to \Bigl\|\sum_{j=1}^{m} c_j x(t_j)\Bigr\|_\infty \tag{3.66} \]
for all real $c_1, \ldots, c_m$ and integers $t_1, t_2, \ldots, t_m$.

As a special case of (3.66), we have
\[ \sigma_N^2(t;l) \to \sigma_\infty^2(t;l) = \|x(t) - \mathrm{Proj}_\infty(x(t) \mid x(t-1), \ldots, x(t-l))\|_\infty^2. \tag{3.67} \]

Suppose that
\[ \mathrm{Proj}_N[x(t) \mid x(t-1), \ldots, x(t-l)] = \sum_{j=1}^{l} a_N(j,t;l)\,x(t-j) \tag{3.68} \]
for $N = 1, 2, \ldots, \infty$.
It is helpful to realize that for $l \ge 2$,
\[ 1 - \frac{\sigma_N^2(t;l)}{\sigma_N^2(t;l-1)} = a_N^2(l,t;l). \tag{3.69} \]
In fact, (3.69) is just application of Pythagorean Theorem. In fact, let
p(t) = ProjN[:1:(t) | x(t — 1), - - - ,x(t — l + 1)].
Then
am — 1) — owl)
= |$(t) - Pr0j~[$(t) |96(t - 1), - ' - ,x(t - l + 1)]lliv
—||$(t) - PTOjNIIIIU) |$(t"1),“',$(t—l)l”iv
lli‘(t) - 1'31‘01'1v[373(t)|2=(t - Ur - - ,x(t - 1+ 1)]II2

diva: t; l)llx(t) - Fran/[x(t) |$(t - 1), - ° - ,x(t - l)]||?v

a (l,t;l)0}"v(t; 1)

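Since $a_\infty^2(p(t),t;p(t))$ is positive, (3.69) implies
\[ \sigma_\infty^2(t;p(t)-1) > \sigma_\infty^2(t;p(t)). \tag{3.70} \]
Thus
\[ \sigma_\infty^2(t;l) \ge \sigma_\infty^2(t;p(t)-1) > \sigma_\infty^2(t;p(t)). \tag{3.71} \]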

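It follows that for any $l < p(t)$,
\[ \lim_{N\to\infty} \Bigl(\ln\sigma_N^2(t;p(t)) + p(t)\,\frac{q(N)}{N}\Bigr) < \lim_{N\to\infty} \Bigl(\ln\sigma_N^2(t;l) + l\,\frac{q(N)}{N}\Bigr). \]
This inequality implies that asymptotically
\[ \hat{p}(t) \ge p(t). \tag{3.72} \]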

Using (3.69) repeatedly and applying Theorem 3.2.1, we get for l > p(t),

viva, 1)

_ 012v(t,P(t))
l

= Z amtm)=(z—p(t))0(ln1nN/N)
j=p(t)+1
Since
1&3?» 012v(t;l) = gigglvﬁmﬁﬂ = 030(t;p(t)),

then for sufﬁciently large N,

012v (t, l) _ _
0737:1017» 2 (1

It follows from (3.73)-(3.74) that
\[ \ln\frac{\sigma_N^2(t;l)}{\sigma_N^2(t;p(t))} = \ln\sigma_N^2(t;l) - \ln\sigma_N^2(t;p(t)) = (l - p(t))\,O(\ln\ln N / N), \tag{3.75} \]
where the $O(\cdot)$ is uniform in $l \le L(N)$.
The assumption on $q(N)$ and (3.75) imply that
\[ \min_{p(t) < l \le L(N)} \bigl[\ln\sigma_N^2(t;l) - \ln\sigma_N^2(t;p(t)) + (l - p(t))\,q(N)/N\bigr] > 0 \]
for sufficiently large $N$. So asymptotically
\[ \hat{p}(t) \le p(t). \tag{3.76} \]
Then the assertion follows from (3.72) and (3.76).

QED.

Corollary 3.4.1 Suppose $\tilde{x}$ is a multivariate AR($p$) model and $\hat{p}(t)$, $t = 1, \ldots, T$, are
the order estimators for the corresponding PAR $x$ defined before. Then
\[ \hat{p} = \max_{1 \le t \le T} \Bigl[\frac{\hat{p}(t) - t}{T}\Bigr] + 1 \]
is a consistent estimator of $p$.
Proof. It follows from Theorem 3.4.1 and Theorem 1.0.1 (iii).
QED.
We also use simulated data to estimate order of the following model

132:; — 0-7172n-1 + 35332714 = €2n

$2n+1 - (15332:: — -25$2n—1 = 6211—1

where $\epsilon_n$ is an i.i.d. normal sequence. So $T = 2$, $p_1 = p_2 = 2$. We took $N = 200$,
$q(N) = \ln N$. Our simulated results are shown in the tables. We see that the criterion
picks up the right order.

 

p_1    1         2         3         4         5         6         7
BIC    .05429    .02231    .04451    .06129    .08767    .10455    .12899

p_2    1         2         3         4         5         6         7
BIC    .06602    003244    -0.00682  0.01884   0.04428   .06567    .09199

Bibliography

[1] ADAMS, G. J. AND GOODWIN, G. C. (1995) Parameter estimation for
periodic ARMA models. J. Time Ser. Anal. Vol. 16, No. 2, 127-145.

[2] ANDERSON, P. L. AND VECCHIA, A. V. (1993) Asymptotic results for
periodic autoregressive moving average processes. J. Time Ser. Anal. Vol.
1, 1-18.

[3] AKAIKE, H. (1974) A new look at the statistical model identification. IEEE
Trans. Automatic Control. AC-19, 716-722.

[4] AKAIKE, H. (1977) On entropy maximization principle. In Application of
Statistics, Ed. P.R. Krishnaiah, 27-41. North-Holland, Amsterdam.

[5] AN, H. Z., CHEN, Z. C. AND HANNAN, E. J. (1982) Autocovariance,
Autoregression and Autoregressive Approximation. The Ann. of Statist.
Vol. 10, No. 3, 926-936.

[6] CRAMÉR, H. (1961) On some classes of non-stationary processes. Proc. 4th
Berkeley Sympos. II, 57-77.

[7] CRAMÉR, H. (1964) Stochastic processes as curves in Hilbert space.
Probability Theory and Applications, 2, 1964.

[8] BLOOMFIELD, P., HURD, H. L. AND LUND, R. B. (1994) Periodic Correlation
in Stratospheric Ozone Data. J. Time Series Analysis, Vol. 15, No. 2.

[9] BURG, J.P. (1967) Maximum entropy spectral analysis. Proc. 37th Ann.
Intern. Meeting, Soc. of Explor. Geophys., Oklahoma City, Oklahoma. Also
reprinted in Modern spectrum analysis, ed. D. G. Childers (1978), IEEE
Press, New York.

[10] CHILDERS, D. G. ed. (1978) Modern spectrum analysis. IEEE Press, New
York.

[11] CHOI, B. S. AND COVER, T. M. (1987) A proof of Burg's theorem. In C. R.
Smith and G. J. Erickson (eds.), Maximum-Entropy and Bayesian Spectral
Analysis and Estimation Problems, 75-84, D. Reidel Pub. Co., Boston.

[12] GALLAGER, R. (1968) Information Theory and Reliable Communication.
Wiley, New York.

[13] GARDNER, W.A. (1986) Introduction to random processes with application
to signals and systems. Macmillan, New York.

[14] GLADYSHEV, E.G. (1961) Periodically Correlated Random Sequence.
Soviet Math. 2, 383-388.

[15] HANNAN, E.J. (1970) Multiple Time Series. Wiley, New York.

[16] GUDZENKO, L.I. (1959) On Periodically Nonstationary Processes. Radiotek.
Elektron. Vol. 4, No. 6, 1062-1064.

[17] HAYKIN, S. S. ed. (1983) Nonlinear methods of spectral analysis.
Springer-Verlag, New York.

[18] HURD, H.L. (1989) Nonparametric time series analysis for periodically
correlated processes. IEEE Trans. Inform. Theory. Vol. 35, No. 2, 350-359.

[19] HURD, H.L. AND MANDREKAR, V. (1991) Spectral Theory of Periodically
and Quasi-Periodically Stationary SαS Sequences. Tech. Report No. 349.
Center For Stochastic Processes, Univ. of North Carolina at Chapel Hill.

[20] KOLMOGOROV, A. N. (1941) Stationary sequences in Hilbert space. Bull.
Math. Moscow. 2, No. 6.

[21] KULLBACK, S. (1978) Information Theory and Statistics. Peter Smith,
Massachusetts.

[22] MIAMEE, A.G. AND SALEHI, H. (1980) On the Prediction of Periodically
Correlated Stochastic Processes. In Multivariate Analysis-V, P.R. Krishnaiah
ed., North Holland, 167-179.

[23] MORF, M., VIEIRA, A., LEE, D. T. L. AND KAILATH, T. (1978) Recursive
multichannel maximum entropy spectral estimation. IEEE Trans.
Acoust. Speech, Signal Process. ASSP-28, 441-454.

[24] PAGANO, M. (1978) On Periodic and Multiple Autoregressions. The Ann.
of Statist. Vol. 6, No. 6, 1310-1317.

[25] PARZEN, E. (1977) Multiple time series: determining the order of
approximation autoregressive scheme. In: P. Krishnaiah, ed. Multivariate Analysis:
IV, 283-295. North-Holland, Amsterdam.

[26] PARZEN, E. (1983) Autoregressive spectral estimation. In D. R. Brillinger
and P. R. Krishnaiah ed. Time Series in the Frequency Domain (Handbook
of Statistics 3), 221-247. North-Holland, Amsterdam.

[27] SHIBATA, R. (1976) Selection of the order of an autoregressive model by
Akaike's information criterion. Biometrika, 63, 117-126.

[28] SMITH, C. R. AND ERICKSON, G. J. (1983) Maximum-Entropy and
Bayesian Spectral Analysis and Estimation Problems. D. Reidel Pub. Co.,
Boston.

[29] STOUT, W.F. (1970) A Martingale Analogue of Kolmogorov's Law of the
Iterated Logarithm. Z. Wahrscheinlichkeitstheorie verw. Geb. 12, 279-290.

[30] STOUT, W.F. (1970) The Hartman-Wintner Law of the Iterated Logarithm
for Martingales. Ann. Math. Statist. 41, 2158-2160.

[31] STOUT, W.F. (1974) Almost Sure Convergence. Academic, New York.

[32] TIAN, C. J. (1988) A Limiting Property of Sample Autocovariance of
Periodically Correlated Processes with Application to Period Detection. Journal
of Time Series Analysis, Vol. 9, No. 4, 411-417.

[33] ULRYCH, T. J. AND BISHOP, Y. N. (1975) Maximum Entropy Spectral
Analysis and Autoregressive Decomposition. Rev. Geophysics and Space
Physics 13, 183-200.

[34] VECCHIA, A.V. (1985) Periodic autoregressive-moving average modeling
with applications to water resources. Water Resources Bulletin, Vol. 21, No.
5, 721-730.

[35] VECCHIA, A. V. (1991) Testing for Periodic Autoregressions in Seasonal
Time Series Data, Biometrika,Vol 78.

[36] WHITTLE, P. (1963) On the Fitting Of Multivariate Autoregressions, and
the Approximate Canonical Factorization Of A Spectral Density Matrix.
Biometrika, 50. 1 and 2, p129.
