PERIODIC ARMA MODEL: FORECASTING, PARSIMONY, ASYMPTOTIC NORMALITY AND AIC By Kai Zhang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics - Doctor of Philosophy 2014 ABSTRACT PERIODIC ARMA MODEL: FORECASTING, PARSIMONY, ASYMPTOTIC NORMALITY AND AIC By Kai Zhang Periodic autoregressive moving average (PARMA) models are indicated for time series whose mean, variance, and covariance function vary with the season. In this thesis, I develop and implement forecasting procedures for PARMA models. The required computations are documented in detail. An application to monthly river flow forecasting is provided. For my father. iii ACKNOWLEDGMENTS I was lucky to meet with my thesis advisor Dr. Mark Meerschaert, who led me into the fascinating world of time series and academic research. I want to say a million thanks, however it will not be enough for my gratitude. I could not make it without his encouragement and belief in me. I will also benefit from his life attitude for the rest of my life. I am also grateful for the tremendous help from Dr. Paul Anderson, who introduced me into the periodic ARMA modeling. Thanks him for numerous of transits to East Lansing for discussing my thesis. Special thanks go to my thesis committee member and department Chairperson Dr. Hira Koul. I am indebted for his help in every little detail of my doctoral years, including the learning in STT 954, failing and passing two prelim exams, student travel fund applications, summer research fellowships, job-hunting and thesis updates, etc. Additionally, Dr. Chae Young Lim was always supportive in my research and job-hunting process. I am deeply grateful of her time spent reading my thesis and valuable feedbacks. Lastly, I want to express my gratitude for Ms. Sue Watson. iv TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Computation of PARMA Autocovariances . . . . . . . . . . . . . . . . . . . 1 8 Chapter 2 Forecasting and forecast error . . . . . . . . . . . . . . . . . . . . 2.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Forecast errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 13 25 Chapter 3 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 A simulation study for the convergence of the coefficients in innovations algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Chapter 4 59 Application to a Natural River Flow . . . . . . . . . . . . . . . . Chapter 5 Maximum likelihood estimation and reduced model . 5.1 Maximum likelihood Function for PARMAS (p, q) Model . . . . . 5.2 Reduced PARMAS (1, 1) model. . . . . . . . . . . . . . . . . . . 5.2.1 Reduced model by asymptotic distribution of ψˆi ( ) . . . . 5.2.2 Reduced model by asymptotic distribution of MLE . . . . Chapter 6 . . . . . 67 67 83 83 85 Asymptotic normality of PARMA model . . . . . . . . . . . . . 98 Chapter 7 Periodic AIC for PARMAS (p, q) model . . . . . 7.1 Kullback-Liebler (K-L) Information . . . . . . . . . . . . . 7.2 Derivation of periodic AIC for PARMAS (p, q) process . . . 7.2.1 Application to model selection for the Fraser River 7.3 Future Research . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 . . . . . . . . . . 126 126 128 145 147 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 BIBLIOGRAPHY . . . . . . . . . . . . . v . . . . . . . . . . . . . . . . . 186 LIST OF TABLES Table 1.1 Parameters in PARMA12 (1, 1) for Example 1.1.1. . . . . . . . . . . 11 Table 1.2 Part of autocovariances in PARMA12 (1, 1) for Example 1.1.1. . . . . 12 Table 3.1 Moving average parameter estimates ψˆi ( ) at season i and lag = 1, 2, . . . , 6, and p-values, after k = 20 iterations of the innovations algorithm applied to average monthly flow series for the Fraser River near Hope BC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Table 3.2 Model parameters and estimates for simulated PARMA12 (1, 1) data. 56 Table 4.1 Parameter estimates for the PARMA model (1.0.1) of average monthly flow series for the Fraser River near Hope BC from October 1912 to September 1982. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Sample mean, standard deviation and autocorrelation at lag 1 and 2 for an average monthly flow series for the Fraser River near Hope BC, from October 1912 to September 1982. . . . . . . . . . . . . . . 63 Estimated parameters for PARMA12 (1, 1) model, from innovations algorithm, using the logarithm of the original data. The resulting negative likelihood value is -362.655. . . . . . . . . . . . . . . . . . . 77 Table 5.2 A compare of different algorithm in optim function. . . . . . . . . . 78 Table 5.3 MLE result by BFGS method, with σi as nuisance parameters. The ˆ = −463.8585. . . . . . . . . . resulting MLE value was −2 log L(β) 79 MLE by BFGS method, using the values of σ ∗ from Figure 5.1. The ˆ = −506.6053. We take those resulting MLE value was −2 log L(β) parameters as our best results in optimization. . . . . . . . . . . . 83 Parameter estimates for the reduced PARMA model (1.0.1) of average monthly flow series for the Fraser River near Hope BC from October 1912 to September 1982. The resulting negative likelihood value is -351.3939. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 Table 4.2 Table 5.1 Table 5.4 Table 5.5 vi Table 5.6 MLE and its standard error. . . . . . . . . . . . . . . . . . . . . . . 88 Table 5.7 Model parameters by MLE optimization and their p-values. . . . . . 95 Table 7.1 Comparison of AIC values for different models . . . . . . . . . . . . 146 Table 7.2 The model parameters in PAR12 (p) automatic model fitting by pear package in R, where the number of estimable parameters is 19, assuming σ ˆ i as nuisance parameters in the AIC computation. . . . . . 146 Table 7.3 A compare of AIC values for different models, after taking the log of the data. Note that both full and reduced PARMA12 (1, 1) models perform better than PAR models. . . . . . . . . . . . . . . . . . . . 147 LIST OF FIGURES Convergence of ψˆi ( ) as iterations k increase,where 0 ≤ i < S −1, 1 ≤ ≤ 2, k = 1, 2, . . . , 30. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. . . . . . . . . . . . . . . . . . . . . . . 50 Convergence of vk, i−k as iterations k increase,where 0 ≤ i < S − 1, 1 ≤ ≤ 2, k = 1, 2, . . . , 30. . . . . . . . . . . . . . . . . . . . . . . 
51 Plots of R1k and R2k against iterations k, which show the convergence of ψˆi ( ) and vˆk, i−k as iterations k increase,where 0 ≤ i ≤ 11, 1 ≤ ≤ 2, k = 1, 2, . . . , 50. . . . . . . . . . . . . . . . . . . . . . . 57 Convergence of φˆi and θˆi as iterations k increase,where 0 ≤ i ≤ 11, k = 1, 2, . . . , 50. . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 Average monthly flows in cubic meters per second (cms) for the Fraser River at Hope, BC indicate a seasonal pattern. . . . . . . . . . . . . 60 Statistics for the Fraser river time series: (a) seasonal mean; (b) standard deviation; (c,d) autocorrelations at lags 1 and 2. Dotted lines are 95% confidence intervals. Season = 0 corresponds to October and Season = 11 corresponds to September. . . . . . . . . . . . . . 61 √ ACF and PACF for model residuals, showing the bounds ±1.96/ N , indicate no serial dependence. With no apparent pattern, these plots indicate that the PARMA12 (1, 1) model is adequate. . . . . . . . . . 62 24-month forecast (solid line with dots) based on 70 years of Fraser river data, with 95% prediction bounds (dotted lines). For comparison, the actual data (solid line) is also shown. This data was not used in the forecast. . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Figure 4.5 Width of 95% prediction bounds for the Fraser river. . . . . . . . . 66 Figure 5.1 −2 log{L(β, σ 2 )} as a function of each σi , i = 0, . . . , 11, with the remaining parameters fixed. . . . . . . . . . . . . . . . . . . . . . . 80 Figure 3.1 Figure 3.2 Figure 3.3 Figure 3.4 Figure 4.1 Figure 4.2 Figure 4.3 Figure 4.4 viii Figure 5.2 Figure 5.3 −2 log{L(β, σ 2 )} as a function of each θi , i = 0, . . . , 11, with the remaining parameters fixed. . . . . . . . . . . . . . . . . . . . . . . . 81 −2 log{L(β, σ 2 )} as a function of each φi , i = 0, . . . , 11, with the remaining parameters fixed. . . . . . . . . . . . . . . . . . . . . . . 82 Figure 5.4 A 24-month forecast using the reduced PARMA12 (1, 1) model. Solid line is the original data; Solid line with dots is the reduced PARMA(1,1) forecast; The two dotted lines are 95% Gaussian prediction bounds. The 24-month forecast is from October 1982 to September 1984. . . 86 Figure 5.5 A comparison of 24-month forecasts between the full PARMA12 (1, 1) model and reduced PARMA12 (1, 1) model. In the first two graphs, dotted line is forecast, and solid line is real data. . . . . . . . . . . . 87 A 24-month forecast using the reduced PARMA12 (1, 1) model in Table 5.7. Solid line is the original data; Solid line with dots is the reduced PARMA(1,1) forecast; The two dotted lines are 95% Gaussian prediction bounds. The 24-month forecast is from October 1982 to September 1984. . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 A comparison of 24-month forecasts between the full PARMA12 (1, 1) model and reduced PARMA12 (1, 1) model, by asymptotic distribution of MLE. In the first two graphs, dotted line is forecast, and solid line is real data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Figure 5.6 Figure 5.7 Chapter 1 Introduction Mathematical modeling and simulation of river flow time series are critical issues in hydrology and water resources. Most river flow time series are periodically stationary, that is, their mean and covariance functions are periodic with respect to time. To account for the periodic correlation structure, a periodic autoregressive moving average (PARMA) model can be useful. 
PARMA models are also appropriate for a wide variety of time series applications in geophysics and climatology. In a PARMA model, the parameters in a classical ARMA model are allowed to vary with the season. Since PARMA models explicitly describe seasonal fluctuations in mean, standard deviation and autocorrelation, they have been used to generate more faithful models and simulations of natural river flows. Historically, Gladyshev [18] first defined the concept of periodically correlated stochastic process; In [20], some early work of Jones and Brelsford studied the problem of predicting time series with periodic structure, including the estimation of necessary parameters; In [32] and [47], Pagano and Troutman studied the elementary properties of univariate processes, 1 and connections with stationary multivariate processes; For stationary time series models, Box and Jenkins [10] presented a systematic approach for modeling time series based on three stages: (1) Model identification; (2) parameter estimation; and (3) diagnostic checks or tests of goodness of fit. Since then, applications and extensions of this modeling approach to hydrology have been widespread. Adams and Goodwin [1] described an on-line parameter estimation technique, based on methods from automatic control, which provides consistent estimates of PARMA model parameters. Anderson and Vecchia [3] obtained the asymptotic distribution for the sample autocovariance and sample autocorrelation functions of PARMA process, and they also studied the asymptotic properties of the discrete Fourier transform of the estimated periodic autocovariance and autocorrelation functions. Anderson and Meerschaert [5] established the basic asymptotic theory for periodic moving averages of i.i.d. random variables with regularly varying tails. They showed that when the underlying random variables have finite variance but infinite fourth moment, the sample autocorrelations are asymptotically stable. Lund and Basawa [22, 23] explored recursive prediction and likelihood evaluation techniques for PARMA models. Time series analysis involves four general steps: model identification, parameter estimation, diagnostic checking, and forecasting. Model identification is the most difficult step for PARMA modeling. Noakes et al. [31] suggested examining the plots of the periodic partial autocorrelation function as the best approach to identify PARMA models. This method is highly recommended when the parameter space is not constrained, however it requires a high level of user experience. Another method is to use an automatic selection criterion, such as the Akaike Information Criterion (AIC) [2], or the Bayesian information criterion 2 (BIC) [37] when all possible candidates of models are examined. However, this procedure requires investigating quite a large number of models, especially when the number of parameter estimates is fairly large, for example, monthly data with 12 seasons. Ursu and Turkman [52] applied the genetic algorithm as a method of identifying the PAR model, which greatly improved the model selection efficiency. Salehi H. has also made tremendous contributions on periodic stochastic process, see [26, 27, 28, 29]. Additionally, model identification for general PARMA times series was discussed in Tesfaye, Meerschaert and Anderson [42]. Anderson, Meerschaert and Vecchia [6] developed an innovations algorithm for PARMA parameter estimation. 
Tesfaye, Meerschaert and Anderson [42] demonstrated model diagnostics for a PARMA model of monthly river flows for the Fraser River in British Columbia, Canada. In this thesis, I develop a practical method for forecasting PARMA models, and I demonstrate the method by forecasting monthly river flows for the same time series of monthly flows. ˜ } ˜ A stochastic process {X t t∈Z is periodically stationary if its mean EXt and covariance ˜ ,X ˜ Cov(X t t+h ) for h ∈ Z are periodic functions of time t with the same period S, i.e., for some integer S ≥ 1, for i = 0, 1, ..., S − 1, and for all integers k and h, I have ˜ = EX ˜ EX i i+kS ˜ ,X ˜ ˜ ˜ and Cov X i i+h = Cov Xi+kS , Xi+kS+h . ˜ } ia called a PARMA (p, q) process if the meanA periodically stationary process {X t S ˜ − µ is of the form centered process Xt = X t t p Xt − q φt (k)Xt−k = εt + θt (j)εt−j j=1 k=1 3 (1.0.1) where {εt } is a sequence of random variables with mean zero and standard deviation σt > 0 such that {δt = σt−1 εt } is independent and identically distributed. ({εt } is called periodic i.i.d. Gaussian noise if Xt is a Gaussian process.) Here the autoregressive parameters φt (j), the moving average parameters θt (j), and the residual standard deviations σt are all assumed to be periodic functions of t with the same period S ≥ 1. Throughout this paper I will also assume: (i) The model (1.0.1) admits a causal representation ∞ ψt (j)εt−j Xt = (1.0.2) j=0 where ψt (0) = 1 and ∞ |ψ (j)| < ∞ for all t. Note that ψ (j) = ψ t t+kS (j) for all j=0 t j. (ii) The model (1.0.1) also satisfies an invertibility condition ∞ πt (j)Xt−j εt = (1.0.3) j=0 where πt (0) = 1 and ∞ |π (j)| < ∞ for all t, and define X t−j = 0 when t − j < 0. j=0 t Again, πt (j) = πt+kS (j) for all j. The notation used in this paper is consistent with: Anderson and Vecchia [3]; Anderson and Meerschaert [4, 5]; Anderson, Meerschaert, and Vecchia [6]; Anderson and Meerschaert [7]; Tesfaye, Meerschaert, and Anderson [42]; and Tesfaye, Anderson, and Meerschaert [43]. This notation is also an extension of the notation in Brockwell and Davis [11]. ˜ ,X ˜ ˜ Suppose I have N years of data, consisting of n = N × S data points, X 0 1 , . . . , Xn−1 , 4 where S is the number of seasons. For example, for monthly data I have S = 12, and our convention is to let i = 0 represent the first month, i = 1 represent the second, . . . , and i = S − 1 = 11 represent the last. The sample mean for season i is µ ˆi = N −1 N −1 ˜ X kS+i . (1.0.4) k=0 The sample autocovariance for season i at lag γˆi ( ) = N −1 is N −1−hi ˜ X ˆi jS+i − µ ˜ X ˆi+ jS+i+ − µ , (1.0.5) j=0 where ≥ 0, hi = (i + )/S and = · is the greatest integer function. The sample autocorrelation for season i at lag ρˆi ( ) = is γˆi ( ) , γˆi (0)ˆ γi+ (0) (1.0.6) which is also the sample cross-correlation between two different seasons. In (1.0.5) the divisor N is used rather than N − hi , since this ensures that the autocoˆ (i) = γˆ (j − ) n variance matrix at season i, Γ i j, =1 is non-negative definite, where N      (i)  ˆ = γˆ (j − ) n Γ = i j, =1  N     γˆi (0) γˆi (1) ... γˆi (1) .. . γˆi (0) ... .. . γˆi (N − 1) γˆi (N − 2) . . . 5  γˆi (N − 1)   γˆi (N − 2)  .  ..  .   γˆi (0) ˆ (i) = 1 ΓΓ , where Γ is the N × 2N matrix To see this we may write Γ N N  0   0  Γ=  .. .   0  ... ... 0 Y1 Y2 . . . ... YN    . . . 0 Y1 Y2 . . . . . . YN 0   ,  . . . . . . .. .. . . . . . . . . . . . . . .    Y1 Y2 . . . YN 0 . . . . . . 0 ˜ ˆ (i) a = N −1 (a Γ)(a Γ) ≥ j = 1, . . . 
, N , and Yj = X µi . Then ∀N ×1 vector a, we have a Γ jS+i −ˆ N ˆ (i) = 0. By definition of non-negative definiteness, since a is arbitrary, then the matrix Γ N γˆi (j − ) n j, =1 is non-negative definite. Then the sample covariance matrix  γˆi (1)  γˆi (0)    γˆ (1) γˆi (0) i ˆ (i) =  Γ  k  ..  .   γˆi (k − 1) γˆi (k − 2) ... ... .. . ...  γˆi (k − 1)   γˆi (k − 2)    ..  .   γˆi (0) converges in the operator norm A = sup{ Ax : x = 1} to the covariance matrix  γi (1)  γi (0)    γi (0) (i)  γi (1) Γ = k  ..  .   γi (k − 1) γi (k − 2) ... ... .. . ...  γi (k − 1)   γi (k − 2)    ..  .   γi (0) in probability as N → ∞, if k → ∞ in such a way that k 2 /N → 0. This result also assumes 6 a spectral bound, see [6, Theorem 3.1], and that the underlying noise sequence has finite fourth moment. Note that, if every σi > 0, then Proposition 4.1 of Lund and Basawa [22] (i) shows that the covariance matrix Γn = γi (j − ) n j, =1 is invertible for every n ≥ 1 and each i = 0, 1, . . . , S − 1. Since the set of invertible matrices is open, the convergence in probability from [6, Theorem 3.1] implies that the sample covariance matrix is invertible with probability approaching 1 as k → ∞. Given a PARMAS (p, q) model (1.0.1) for a periodic time series, a recursive forecasting algorithm is developed in this thesis based on minimizing mean squared error. I detail the computation of h-step ahead forecasts for a PARMA model, based on the innovations algorithm, and an idea of Ansley (see Ansley [9]; Lund and Basawa [23]). I also have developed R codes to implement these forecasts, and compute the asymptotic variance of the forecast errors. All R codes are listed in the appendix. This thesis is laid out as follows. Chapter 2 develops the algorithms for computing h-step ahead forecasts for any h ≥ 1, and computes the associated forecast error variances. Chapter 3 specifies the details of computation in this paper, mainly by R software. Chapter 4 illustrates the methods of this thesis by forecasting average monthly flows for the Fraser River. Chapter 5 develops a reduced PARMAS (p, q) model to achieve parsimony. Chapter 6 derives the asymptotic theory of PARMA models, and Chapter 7 discusses the periodic AIC for automatic model selection. 7 1.1 Computation of PARMA Autocovariances Given a PARMAS (p, q) time series (1.0.1), Define the covariance function γj ( − j) = E(Xj X ) = Cov(Xj , X ), (1.1.1) and note that γj ( − j) = E(Xj X ) = E(X Xj ) = γ (j − ) for all j, ∈ Z. Then the covariance function γj ( −j) = E(Xj X ) can be explicitly computed by two methods. The first approach is to use the causal representation in (1.0.2). Given the model parameters, we may write  E Xj X ∞ = E  ∞ ψj (k)εj−k ψ (r)ε −r  r=0 k=0 ∞ ∞ ψj (k)ψ (r)E εj−k ε −r . = k=0 r=0 2 , when j − k = − r, and E ε Notice that E εj−k ε −r = σj−k j−k ε −r = 0, otherwise. Letting r = − j + k, therefore ∞ γj ( − j) = E Xj X = 2 . ψj (k)ψ ( − j + k)σj−k (1.1.2) k=0 However, (1.1.2) is computationally impractical since it requires determination and infinite summation of ψj (k) and ψ ( − j + k). Let γt (h) = Cov Xt , Xt+h be the autocovariance of Xt at season i = t − S t/S and lag h ≥ 0. Now I will consider the second method, by mimicking Yule-Walker methods for 8 stationary ARMA series. Multiplying both sides of (1.0.1) by Xt−h , where h > max (p, q), I have  Xt−h Xt −   p  q φt (k)Xt−k  = Xt−h εt + θt (j)εt−j  . 
j=1 k=1 Take expectations on both sides, and use causal representation to compute the right hand side:    p E Xt−h Xt −   φt (k)Xt−k  = E Xt−h εt + θt (j)εt−j  j=1 k=1  p  ∞ φt (k)γt−h (h − k) = E  γt−h (h) −  q θt (j)εt−j  ψt−h (k)εt−h−k  εt + k=0 k=1  q j=1 = 0, since εt−h−k ⊥ εt−j , ∀j = 0, 1, . . . , q, k = 0, 1, . . ., where x ⊥ y if and only if x, y = 0, and x, y is the inner product of x and y in Hilbert space H. Recall from Chapter 2 of Brockwell and Davis [11] that x, y is called the inner product of x and y, such that (a) x, y = y, x , the bar denoting complex conjugation, (b) x + y, z = x, z + y, z for all x, y, z ∈ H, (c) αx, y = α x, y for all x, y ∈ H and α ∈ R, (d) x, x ≥ 0 for all x ∈ H, (e) x, x = 0 if and only if x = 0. In this way I obtain p φt (k)γt−h (h − k). γt−h (h) = k=1 9 (1.1.3) Equation (1.1.3) expresses γt−h (h) in terms of autocovariances of the previous p lags when h > max (p, q). The advantage of this method is that the computational complexity of (1.1.3) does not increase with increasing h. Therefore, once γt−h (h) is identified for all lags 0 ≤ h ≤ max (p, q) and for all seasons i = 0, 1, . . . , S − 1, then the PARMA autocovariances at higher lags can be efficiently computed. Next I will focus on computation of γt−h (h) for 0 ≤ h ≤ max (p, q) and all seasons i = 0, 1, . . . , S − 1. With a similar technique, multiply both sides of (1.0.1) by Xt−h and take expectations:    p E Xt−h Xt −    q φt (k)Xt−k  = E Xt−h εt + θt (j)εt−j  . j=1 k=1 Notice that for 0 ≤ h ≤ max (p, q), then p φt (k)γmin(t−k,t−h) (|h − k|) γt−h (h) − k=1   ∞ q  θt (j)εt−j  ψt−h (k)εt−h−k εt + = E j=1 k=0 q 2 ψ = θt (j)σt−j t−h (j − h), j=h 2 , when k = j − h, and E(ε where θt (0) = 1 and E(εt−h−k εt−j ) = σt−j t−h−k εt−j ) = 0, otherwise. In this way I get the general form of autocovariance function for 0 ≤ h ≤ max (p, q), p q 2 ψ θt (j)σt−j t−h (j − h). φt (k)γmin(t−k,t−h) (|h − k|) + γt−h (h) = k=1 (1.1.4) j=h In a straightforward way, (1.1.4) is actually an S × [max(p, q) + 1] dimensional linear system. 10 Table 1.1: Parameters in PARMA12 (1, 1) for Example 1.1.1. season i φi θi σi 0 0.198 0.687 11875.479 1 0.568 0.056 11598.254 2 0.560 -0.052 7311.452 3 0.565 -0.050 5940.845 0.321 0.470 4160.214 4 5 0.956 -0.389 4610.209 1.254 -0.178 15232.867 6 7 0.636 -0.114 31114.514 8 -1.942 2.393 32824.370 9 -0.092 0.710 29712.190 0.662 -0.213 15511.187 10 0.355 0.322 12077.991 11 The matrix associated with this linear system is invertible as long as the PARMA model is causal. See the appendix for a simple R code written to solve the linear system, for γt−h (h), for all 0 ≤ h ≤ max (p, q) and i = 0, 1, . . . , S − 1. Note that (1.1.4) only requires ψt−h (j − h) for j ≤ q, which great reduces the computation, compared with the first method in (1.1.2). Example 1.1.1. Consider the PARMA12 (1, 1) model in Table 1.1.1, used in Tesfaye et al. [42] to fit the 72-year monthly observations of Fraser river. By (1.1.4), when 0 ≤ h ≤ max(1, 1), I have h = 0, 1, and   2 2  γt (0) − φt (1)γ t−1 (1) = θt (0)ψt (0)σt + θt (1)ψt (1)σt−1   γ 2 t−1 (1) − φt (1)γt−1 (0) = θt (1)σt−1 (1.1.5) Note that θt (0) = 1, ψt (0) = 1, ψt (1) = φt (1) + θt (1), and (1.1.5) is a 12 × 2 dimensional linear system containing 24 unknown paramters γt (0) and γt (1), for t = 0, 1, . . . , 11. 
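To make the computation concrete, the following is a minimal R sketch, separate from the appendix code, that assembles and solves the 24 x 24 linear system (1.1.5) for γ_t(0) and γ_t(1) using the parameter values of Table 1.1; the object names phi, theta, sigma, gamma0 and gamma1 are illustrative choices rather than the appendix notation.

```r
# Sketch: solve the linear system (1.1.5) for gamma_t(0), gamma_t(1), t = 0,...,11,
# for the PARMA_12(1,1) model with the parameters of Table 1.1.
S     <- 12
phi   <- c( 0.198, 0.568, 0.560, 0.565, 0.321,  0.956,
            1.254, 0.636,-1.942,-0.092, 0.662,  0.355)
theta <- c( 0.687, 0.056,-0.052,-0.050, 0.470, -0.389,
           -0.178,-0.114, 2.393, 0.710,-0.213,  0.322)
sigma <- c(11875.479, 11598.254, 7311.452, 5940.845, 4160.214, 4610.209,
           15232.867, 31114.514, 32824.370, 29712.190, 15511.187, 12077.991)

prev <- function(t) ((t - 2) %% S) + 1   # 1-based index of season t-1, modulo S
psi1 <- phi + theta                      # psi_t(1) = phi_t(1) + theta_t(1)

# Unknowns ordered as x = (gamma_0(0),...,gamma_11(0), gamma_0(1),...,gamma_11(1))
A <- matrix(0, 2 * S, 2 * S)
b <- numeric(2 * S)
for (t in 1:S) {
  tm1 <- prev(t)
  # gamma_t(0) - phi_t(1) gamma_{t-1}(1) = sigma_t^2 + theta_t(1) psi_t(1) sigma_{t-1}^2
  A[t, t]       <- 1
  A[t, S + tm1] <- -phi[t]
  b[t]          <- sigma[t]^2 + theta[t] * psi1[t] * sigma[tm1]^2
  # gamma_{t-1}(1) - phi_t(1) gamma_{t-1}(0) = theta_t(1) sigma_{t-1}^2
  A[S + t, S + tm1] <- 1
  A[S + t, tm1]     <- -phi[t]
  b[S + t]          <- theta[t] * sigma[tm1]^2
}
x      <- solve(A, b)
gamma0 <- x[1:S]             # gamma_t(0) for t = 0,...,11
gamma1 <- x[(S + 1):(2 * S)] # gamma_t(1) for t = 0,...,11
```

Under these assumptions, gamma0 and gamma1 should agree, up to rounding, with the first two columns of Table 1.2.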
Apply the computation in R, I can get γt (0) and γt (1) for all the seasons, which are shown in the 11 Table 1.2: Part season i 0 1 2 3 4 5 6 7 8 9 10 11 of autocovariances in PARMA12 (1, 1) for Example 1.1.1. γt (0) γt (1) γt (2) γt (3) 261385575 156364519 87564130 49473734 228262590 120832037 68270101 21914702 117569804 63754073 20465057 19564595 69938164 39038161 37320482 46799885 42959747 34336947 43058531 27385226 50262780 59246310 37680653 -73175828 302264368 165787551 -321959424 29620267 1059745614 258668383 -23797491 -15753939 1619934424 615947912 407757518 144753919 1298905828 671836226 238501860 47223368 600922799 290799803 57578361 32704509 301560482 159927070 90838576 50869602 first two columns of Table 1.1.1. For h > 1, γt−h (h) = φt γt−h (h − 1), (1.1.6) therefore γt (h) at higher lags h > 1 can be computed. Partial results for higher lags are shown in Table 1.1.1. 12 Chapter 2 Forecasting and forecast error 2.1 Forecasting The PARMA prediction equations are based on orthogonal projection, to minimize the mean squared prediction error among all linear predictors. If the PARMA process has Gaussian innovations, then this will also minimize the mean squared prediction error among all predictors. To simplify notation, I denote the first season to be forecast as season 0, and define the season of the oldest data point as season S − 1. If the total number of available data is not a multiple of S, I discard a few(< S) of the oldest observations, to obtain the data set ˜ ,X ˜ ˜ X 0 1 , . . . , Xn−1 , where n = N × S. ˜ − µ is the mean-centered process in (1.0.1). Fix a probability space Recall that Xt = X t t on which the PARMAS (p, q) model (1.0.1) is defined, and let ˜ n = sp{1, ˜ ,...,X ˜ H ¯ X ¯ X0 , . . . , Xn−1 } 0 n−1 } = sp{1, 13 denote the set of all linear combinations of these random variables in the Hilbert space of random variables on that probability space, with the inner product X, Y = E(XY ). Note that ˜ =P P˜ X ˜ n (Xn + µn ) Hn n H = µn + P ˜ Xn Hn (2.1.1) = µn + Psp{1,X Xn ¯ 0 ,...,Xn−1 } = µn + Psp{X Xn , ¯ 0 ,...,Xn−1 } ˜ can be accomplished by forecasting the meanso that forecasting for the original data X t centered process Xt , and then adding the seasonal mean. In order to develop a more efficient forecasting algorithm, it is useful to consider a transformed process (cf. Ansley [9]; Lund and Basawa [6]) defined by Wt =      Xt ,     Xt − t = 0, . . . , m − 1 p (2.1.2) φt (k)Xt−k , t ≥ m k=1 where m = max(p, q). Then it follows from (1.0.1) that, for t ≥ m, the transformed process (2.1.2) has the moving average representation q Wt = θt (j)εt−j , (2.1.3) j=0 where θt (0) = 1 for all t. Notice that φt (k) and θt (k) are periodic in S, such that φt (k) = φ t (k) and θt (k) = θ t (k), where t is the season corresponding to index t, so that t = t 14 mod S. Proposition 2.1.1. Hn = sp{X ¯ ¯ 0 , . . . , Xn−1 } = sp{W 0 , . . . , Wn−1 } (2.1.4) for all n ≥ 1. ∗ = sp{W Proof. Define Hn ¯ ¯ ¯ 0 , . . . , Wn−1 }. For j = 1, obviously H1 = sp{X 0 } = sp{W 0} = H1∗ . Assume that when j = n − 1, (2.1.4) holds, i.e. Hn−1 = sp{X ¯ 0 , . . . , Xn−2 } = ∗ . By (2.1.2), sp{W ¯ . , Wn−2 } = Hn−1 0, . .    X n−1 m − 1 ≥ j. Finally, for the last case, when m ≤ j ≤ , using (1.0.1) again I have  q E(Wj W ) = E   q θj (k)εj−k θ (r)ε −r  r=0 k=0 q θj (k)θ (r)E(εj−k ε −r ), = k=0 r=0 q and E(εj−k ε −r ) = 0 unless j − k = − r. Remark 2.1.4. Since θ0 (k) = 0 for k > q, it follows from the final case in Proposition 2.1.3 that C(j, ) = 0 whenever > m and > j + q. 
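Before applying the innovations algorithm, it may help to see the transformation (2.1.2) written out as code. The following minimal R sketch, with illustrative names transform_W and PHI that are not the appendix arrays, forms W_t from a mean-centered series whose first observation is season 0, given an S x p matrix of seasonal autoregressive coefficients (assuming the series is longer than m = max(p, q)).

```r
# Sketch of (2.1.2): x is the mean-centered series of length n = N*S, first
# observation is season 0; PHI[i+1, k] = phi_i(k) for seasons i = 0,...,S-1.
transform_W <- function(x, PHI, q) {
  S <- nrow(PHI); p <- ncol(PHI); m <- max(p, q)
  n <- length(x)
  W <- x                                  # W_t = X_t for t = 0,...,m-1
  for (t in (m + 1):n) {                  # R index t corresponds to time t-1
    season <- (t - 1) %% S + 1            # season of time t-1, in 1,...,S
    W[t] <- x[t] - sum(PHI[season, ] * x[(t - 1):(t - p)])
  }
  W
}
```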
Using the covariance function C(j, ) computed in Proposition 2.1.3, I can now apply the innovations algorithm from Brockwell and Davis [11, Proposition 5.2.2 ] to the transformed process (2.1.2) to compute the one-step ahead predictor n ˆn = W ˆ θn,j Wn−j − W n−j , (2.1.7) j=1 where θn,1 , . . . , θn,n are the unique projection coefficients that minimize the mean squared error ˆn vn = E Wn − W 20 2 . (2.1.8) The coefficients θn,j in (2.1.7) and vn in (2.1.8) are computed from the system of equations v0 = C(0, 0)  θn,n−k = v −1 C(n, k) − k  k−1 θk,k−j θn,n−j vj  j=0 n−1 vn = C(n, n) − θn,n−j 2 (2.1.9) vj j=0 solved in the order v0 , θ1,1 , v1 , θ2,2 , θ2,1 , v2 , θ3,3 , θ3,2 , θ3,1 , v3 , . . . and so forth. Proposition 2.1.5. Given a PARMAS (p, q) process (1.0.1), the innovations algorithm (2.1.9) applied to the transformed process (2.1.2) with covariance function C(j, ) from Proposition 2.1.3 yields      0       n   ˆ θn,j Wn−j − W n−j ˆ Wn = j=1      q    ˆ  θn,j Wn−j − W  n−j   j=1 n = 0, 1 ≤ n < m, (2.1.10) n ≥ m. In particular, I have θn,j = 0 whenever j > q and n ≥ m. Proof. If n ≥ m and j > q, then Remark 2.1.4 implies that C(n, n − j) = E(Wn Wn−j ) = 0. Since ˆ W ¯ 0 , . . . , Wn−j−1 }, n−j ∈ sp{W ˆ another application of Remark 2.1.4 shows that E(Wn W n−j ) = 0, and then it follows using the linearity of the expectation operator that ˆ E Wn (Wn−j − W n−j ) = 0. 21 (2.1.11) The proof of the innovations algorithm in [11, Proposition 5.2.2 ] shows that the random variables ˆ {Wn−j − W n−j : j = 1, 2, . . . , n} are uncorrelated, and hence the coefficients −1 E W (W ˆ θn,j = vn−j n n−j − W n−j ) (2.1.12) are uniquely defined. Then (2.1.10) follows from (2.1.11), (2.1.12), and (2.1.9). Remark 2.1.6. Proposition 2.1.5 shows the advantage of the transformed process (2.1.2) for computing the innovations, since the sum in (2.1.10) terminates after q lags when n ≥ m. Since the forecast equations developed later in this paper all depend on the innovations, this fact will be used to speed up the forecast computations. ˆ n for the best linear predictor, that The next result gives the one-step ahead predictors X minimize the mean squared prediction error. Equation (2.1.13) was proven by Lund and Basawa [23, Equation (3.4)] in a different notation. Theorem 2.1.7. The one-step predictors (2.1.6) for a PARMAS (p, q) process (1.0.1) can be computed recursively using  n   ˆ  θn,j Xn−j − X  n−j   j=1 ˆn = X p q    ˆ  φn (j)Xn−j + θn,j Xn−j − X  n−j  j=1 j=1 1≤n h, it follows using Brockwell and Davis [11, Proposition 2.3.2] that n+h ˆ θn+h,j Wn+h−j − W n+h−j . PHn Wn+h = (2.1.18) j=h+1 ˆ Since each Wn+h−j − W n+h−j lies in Hn , for j > h, and Wn+h −PHn Wn+h is orthogonal to any element of Hn , it follows that the mean square error of the h-step prediction is E Wn+h − PHn Wn+h 2 n+h = C(n + h, n + h) − θn+h,j j=h+1 24 2 vn+h−j . From (2.1.2), I can write PHn Wn+h = PHn Xn+h when 0 ≤ h ≤ m − n − 1, and p PHn Wn+h = PHn Xn+h − φn+h (k)PHn Xn+h−k (2.1.19) k=1 when h ≥ m − n. Substitute (2.1.15) into (2.1.18) to get n+h ˆ θn+h,j Xn+h−j − X n+h−j . PHn Wn+h = (2.1.20) j=h+1 If 0 ≤ h ≤ m − n − 1, then (3.0.6) and (2.1.2) imply that (2.1.16) holds. If h ≥ m − n, substitute (3.0.6) into (2.1.19) to get (2.1.17), using the fact that θn+h,j = 0 if j > q and h ≥ m − n. ˜ ,...,X ˜ Given a PARMAS (p, q) time series data set X 0 n−1 , I can forecast future values using the h-step ahead predictors from Theorem 2.1.9. 
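For concreteness, the innovations recursion (2.1.9) can be coded directly from a covariance kernel. Below is a minimal R sketch; the names innovations, Cfun, Theta and v are illustrative, and Cfun(j, l) is assumed to return C(j, ℓ) for 0-based indices, for example the covariance of the transformed process from Proposition 2.1.3.

```r
# Sketch of the innovations algorithm (2.1.9): returns theta_{n,j} and v_n
# for n = 1,...,nmax, given a covariance kernel Cfun(j, l) with 0-based indices.
innovations <- function(Cfun, nmax) {
  v <- numeric(nmax + 1)
  Theta <- matrix(0, nmax, nmax)      # Theta[n, j] stores theta_{n,j}, 1 <= j <= n
  v[1] <- Cfun(0, 0)                  # v_0
  for (n in 1:nmax) {
    for (k in 0:(n - 1)) {
      s <- Cfun(n, k)
      if (k > 0) {                    # for k = 0 the correction sum is empty
        j <- 0:(k - 1)
        s <- s - sum(Theta[k, k - j] * Theta[n, n - j] * v[j + 1])
      }
      Theta[n, n - k] <- s / v[k + 1] # theta_{n,n-k} = v_k^{-1} [ C(n,k) - sum ]
    }
    j <- 0:(n - 1)
    v[n + 1] <- Cfun(n, n) - sum(Theta[n, n - j]^2 * v[j + 1])
  }
  list(theta = Theta, v = v)          # v[n+1] = v_n, Theta[n, j] = theta_{n,j}
}
```

By Proposition 2.1.5, for n ≥ m only the q most recent coefficients θ_{n,1}, ..., θ_{n,q} are nonzero, so in practice the coefficient array can be stored in a much smaller n x q array, as is done in Chapter 3.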
Note that this requires computing all of the innovations ˆ Xt − X t for t = 0, 1, . . . , n − 1 using the recursive equation (2.1.13). Complete details will be provided in Chapter 3. In the next chapter, I explicitly compute the forecast errors, which will be needed to compute prediction bands for this forecast. 2.2 Forecast errors The next theorem is the main theoretical result of this chapter. It explicitly computes the variance of the forecast errors, and a simpler asymptotic variance that is useful for computations. Formula (2.2.12) for the covariance matrix of the forecast errors was established 25 by Lund and Basawa [23, Equation (3.41)] in a different notation. However, that paper did not develop an explicit formula for the forecast error variance. I begin with the data set {X0 , X1 , . . . , Xn−1 } as in Chapter 2, where n = N × S. Theorem 2.2.1. Define χj (0) = 1 for all j ≥ 0, χj (j − ) = 0 for all j ≥ 0 and > j, and recursively min(p, ) φj (k)χj−k ( − k) χj ( ) = (2.2.1) k=1 for all j ≥ 0 and 0 ≤ < j. 2 (h) = E Then the mean-squared error σn Xn+h − PHn Xn+h 2 of the h-step predictors PHn Xn+h for the PARMAS (p, q) process (1.0.1) can be computed recursively using 2 (h) = σn  h 2 j χh (k)θn+h−k,j−k  vn+h−j  j=0 (2.2.2) k=0 when n ≥ m := max(p, q), where the coefficients θn+h−k,j−k and vn+h−j are computed via the innovations algorithm (2.1.9) applied to the transformed process (2.1.2). Furthermore, the asymptotic mean squared error is given by 2 (h) → σn h 2 (j)σ 2 ψh h−j as n = N S → ∞, (2.2.3) j=0 where j χh (k)θh−k (j − k). ψh (j) = k=0 ˜ − µ is the mean-centered process in (1.0.1), and W is the Proof. Recall that Xt = X t t t 2 (h) = transformed process in (2.1.2). Note that the mean squared prediction error σn 26 E(Xn+h − PHn Xn+h )2 for the mean-centered PARMA process (1.0.1) is not the same as the mean squared prediction error E(Wn+h − PHn Wn+h )2 for the transformed process (2.1.2). When n ≥ m := max(p, q), Theorem 2.1.9 implies that (2.1.17) holds for any h ≥ 0, and note that the second term in (2.1.17) vanishes when h + 1 > q. Write ˆ ˆ Xn+h = X n+h + (Xn+h − Xn+h ), and since n = N S, then φn+h (j) = φh (j). Substitute (2.1.13) with n ≥ m to get Xn+h = φh (1)Xn+h−1 + . . . + φh (p)Xn+h−p q ˆ θn+h,j Xn+h−j − X + n+h−j , j=0 (2.2.4) with θn,0 = 1 for all n. Subtract (2.1.17) from (2.2.4) and rearrange terms to get p Xn+h − PHn Xn+h − φh (k) Xn+h−k − PHn Xn+h−k k=1 h (2.2.5) ˆ θn+h,j Xn+h−j − X n+h−j . = j=0 Define the random vectors     Mn =    ˆn Xn − X .. . ˆ Xn+h − X n+h           X n − PH n X n      . . and Fn =   .     Xn+h − PHn Xn+h . 27 (2.2.6) Write h Φh = −φj (j − ) j, =0 (2.2.7) where I define φj (0) = −1 for all j, and φj (k) = 0 for k > p or k < 0. Note that φj (k) is periodic in S, such that φj (k) = φ j (k), where j is the season corresponding to index j, so that j = j mod S. Then from the innovations algorithm, write h Θn = θn+j,j− j, =0 (2.2.8) where I define θn,0 = 1, and θn,k := 0 for k > q or k < 0. Then I can use (5.1.3) to write Φh Fn = Θn Mn , (2.2.9) where I note that   0 0 0  1    −φ (1) 1 0 0  1    −φ (2) −φ2 (1) 1 0  2 Φh =    −φ3 (3) −φ3 (2) −φ3 (1) 1   .. .. .. ..   . . . .   −φh (h) −φh (h − 1) −φh (h − 2) . . . 28 . . . 0   . . . 0    . . . 0    . . . 0   .. ..  . .   ... 1 and   0 0  1   θ 1 0  n+1,1   θ θn+2,1 1  n+2,2 Θn =    θn+3,3 θn+3,2 θn+3,1   .. ..  ..  . . .   θn+h,h θn+h,h−1 θn+h,h−2 . . . 0   . . . 0    . . . 0   ..  1 .   .. 
..  . .   ... 1 are lower triangular matrices. The entries of the innovations vector Mn in (2.2.6) are uncorrelated, with covariance matrix Vn := E Mn Mn = diag vn , vn+1 , . . . , vn+h , (2.2.10) where I recall that vn = E ˆn Xn − X 2 (2.2.11) are the one step ahead prediction errors. Then the covariance matrix of the vector Fn = Φ−1 Θn Mn of prediction errors is h Cn := E Fn Fn = Ψn Vn Ψn where Ψn = Φ−1 Θn h (2.2.12) and denotes the transpose. In order to compute the matrix Cn in (2.2.12), I first need to compute the inverse matrix h Φ−1 = χj (j − ) h j, =0 29 (2.2.13) and show that (2.2.1) holds. An elegant way to compute the inverse matrix (5.1.11) is to use operator notation along with the z-transform. Use (1.0.1) to write Φ(B)Xn+j = Θ(B)εn+j for j ≥ 0, (2.2.14) where p p p φn+j (k)z k = − Φ(z) = 1 − φn+j (k)z k = − φj (k)z k k=1 q k=0 k=0 q q Θ(z) = 1 + θn+j (k)z k = θn+j (k)z k = θj (k)z k , k=1 k=0 k=0 since n = N S, BXt = Xt−1 is the backward shift operator. Then the infinite order moving average representation (1.0.2) for the mean-centered process Xt can be written in the form Xn+j = Ψ(B)εn+j for j ≥ 0, (2.2.15) where Ψ(z) = Φ−1 (z)Θ(z) (2.2.16) Notice that Φh = Φ, however Θn = Θ and Ψn = Ψ. Write ∞ Ψ(z) = Φ−1 (z) = k=0 ∞ ψj (k)z k χj (k)z k k=0 extending the notation (5.1.11). If n ≥ m = max(p, q), then the infinite order moving 30 average representation (2.1.3) of the transformed process Wt can be written in the form for j ≥ 0 and n ≥ m. Wn+j = Θ(B)εn+j (2.2.17) Equating (2.2.15) and (2.2.17) and using (2.2.16) shows that Xn+j = Φ−1 (B)Wn+j for j ≥ 0. (2.2.18) Since Φ−1 (z)Φ(z) = I, I have  ∞    p χj+k (k)z k  1 − k=0 φj (k)z k  = 1 k=1 for all j ≥ 0. By expanding the above equation, I have χj (0) + χj+1 (1)z + · · · + χj+k (k)z k + · · · 1 − φj (1)z − · · · − φj (p)z p = 1, which could be separated as χj (0) + χj+1 (1) − χj (0)φj (1) z + · · · + χj+p (p) − χj+p−1 (p − 1)φj (1) − · · · − χj (0)φj (p) z p + · · · = 1 31 Setting every coefficient of z k to 0, for k = 1, 2, . . ., I have for all j ≥ 0, χj (0) = 1 χj+1 (1) = χj (0)φj (1) .. . χj+p (p) = χj+p−1 (p − 1)φj (1) − · · · − χj (0)φj (p). Hence, min(p, ) φj (k)χj−k ( − k), φj (k)χj−k ( − k) = χj ( ) = k=1 k=1 using the fact that φj (k) = 0, k > p and χj ( − k) = 0, k > , which proves (2.2.1). Since Φh = Φ, this shows that the inverse matrix (5.1.11) can be computed for any h ≥ 0 by taking χj (0) = 1 for all j ≥ 0, χj (j − ) = 0 for all j ≥ 0 and > j, and recursively applying the formula (2.2.1) for all j > 0 and 0 ≤ < j. Now I can proceed to compute the matrix Cn in (2.2.12). First write h Ψn = Φ−1 Θn = ψn+j,j− h j, =0 32 where h ψn+j,j− = s=0 h (Θ ) Φ−1 h j,s n s, χj (j − s)θn+s,s− = (2.2.19) s=0 j χj (j − s)θn+s,s− = s= since χj (j − s) = 0 for s > j, and θn+s,s− = 0 for s < . Substitute s = j − k in the sum (5.1.5) to obtain j− ψn+j,j− = χj (k)θn+j−k,j− −k k=0 33 (2.2.20) Now it follows from (5.1.5), (5.1.9), and (5.1.14) that 2 (h) = E σn Xn+h − PHn Xn+h 2 = Ch,h h h = Ψh,u Vu,w Ψw,h u=0 w=0 h h = Ψh,u Vu,w Ψh,w u=0 w=0 h Ψh,u Vu,u Ψh,u = u=0 h 2 = ψn+h,h−u vn+u u=0  2 h h−u  = χh (k)θn+h−k,h−u−k  vn+u u=0 k=0 (2.2.21) Substitute u = h − j to obtain (2.2.2). Next I require a few preparatory results. Lemma 2.2.2. With the innovations algorithm (2.1.9) applied to the transformed process (2.1.2), I have vr − σr2 → 0, as r → ∞. Proof. Recall that Hr = sp{X ¯ ¯ 0 , . . . , Xr−1 } = sp{W 0 , . . . , Wr−1 }, and define Hr = sp{W ¯ j , −∞ < j ≤ r − 1}. 
34 By the invertibility condition of Wr process in Proposition 2.1.2, ∞ σr2 = E(ε2r ) = E(Wr + j=1 πr (j)Wr−j )2 = E(Wr − PHr Wr )2 where ∞ j=1 πr (j)Wr−j = PHr (εr − Wr ) = −PHr Wr , since εr ⊥Hr and Wr−j ∈ Hr , j = 1, 2, . . . , ∞. Since Hr ⊆ Hr , I have PHr Wr ∈ Hr , thus by the Projection Theorem in Brockwell and Davis [11, Theorem 2.3.1 ], it follows that ˆ r )2 = vr . σr2 = E(Wr − PHr Wr )2 ≤ E(Wr − PHr Wr )2 = E(Wr − W On the other hand, r πr (j)Wr−j ∈ Hr , − j=1 so that by another application of the Projection Theorem in Brockwell and Davis [11, Theorem 2.3.1 ], I obtain ˆ r )2 = E(Wr − P Wr )2 vr = E(Wr − W Hr r ≤ E Wr − (− πr (j)Wr−j ) 2 j=1 r = E πr (0)Wr + πr (j)Wr−j 2 j=1 r = E( πr (j)Wr−j )2 j=0 35 Therefore, since εr ⊥ Hr ,  2 r vr ≤ E  πr (j)Wr−j  j=0 2  = E εr − πr (j)Wr−j  j>r  2 = E(εr )2 + E  πr (j)Wr−j  j>r   = σr2 + E  πr (j)Wr−j j>r ≤ σr2 + πr (k)Wr−k  k>r | πr (j) || πr (k) |E | Wr−j Wr−k | j,k>r ≤ σr2 + | πr (j) || πr (k) | C(r − j, r − j)C(r − k, r − k) j,k>r  ≤ σr2 +  2 | πr (j) | M, j>r where M = max{C(i, i) : i = 0, 1, . . . , S − 1} < ∞. In summary, I have proved | πr (j) | 2 M σr2 ≤ vr ≤ σr2 + j>r where 2 → 0 uniformly over all seasons, as r → ∞. Hence , as r → ∞, I j>r | πr (j) | have | πr (j) | 2 M → 0, |vr − σr2 | ≤ j>r which completes the proof. 36 ˆ r be given by (2.1.10). Then Lemma 2.2.3. Let Wr be given by (2.1.2) and W ˆ r − εr )2 = 0. lim E (Wr − W r→∞ q j=0 θr (j)εr−j in (2.1.3), I may write Proof. By the causal representation Wr = q ˆ r )] = E(ε2 ) = σ 2 , θr (j)εr−j − W r r ˆ r )] = E[εr (εr + E[εr (Wr − W j=1 ˆ r . Hence E[εr (Wr − W ˆ r )] = since εr ⊥ εr−j , for j = 1, . . . , q, and εr ⊥ Hr , therefore εr ⊥ W σr2 . Then ˆ r − εr )2 E (Wr − W ˆ r )2 − 2E[εr (Wr − W ˆ r )] + E(ε2 ) = E (Wr − W r = vr − 2σr2 + σr2 = vr − σr2 → 0, using Lemma 2.2.2. Proposition 2.2.4. Let θr,k be the projection coefficients from (2.1.7) and let θr (k) be the moving average coefficients from (1.0.1). Then |θr,k − θr (k)| → 0 as r → ∞, for all k = 1, 2, . . . Proof. I know that by (2.1.12) ˆ θr,k = v −1 E Wr (Wr−k − W r−k ) . r−k 37 I also have θr (k) = σ −2 E(Wr εr−k ), r−k since 2 , E(Wr εr−k ) = E θr (k)ε2r−k = θr (k)σr−k using the causal representation (2.1.3). By the triangle inequality, ˆ θr,k − θr (k) ≤ θr,k − σ −2 E Wr (Wr−k − W r−k ) r−k ˆ + σ −2 E Wr (Wr−k − W r−k − εr−k ) r−k = θr,k − σ −2 θr,k vr−k r−k (2.2.22) ˆ + σ −2 E Wr (Wr−k − W r−k − εr−k ) r−k ≤ θr,k 1 − σ −2 vr−k r−k 1 1 ˆ r − εr ) 2 2 , + σ −2 C(r, r) 2 E(Wr − W r−k using the Cauchy-Schwarz inequality. Note that θr,k is uniformly bounded in r over all seasons, since (2.1.12) and the Cauchy-Schwarz Inequality imply, ˆ θr,k = v −1 E Wr (Wr−k − W r−k ) r−k 2 ˆ ≤ v −1 C(r, r) E(Wr−k − W r−k ) r−k −1/2 =v C(r, r). r−k Then, as r → ∞, the first term on the right-hand side of (2.2.22) approaches 0 by Lemma 2.2.2. The second term on the right-hand side of (2.2.22) approaches 0 by Lemma 2.2.3 1 and the fact that σ −2 C(r, r) 2 is uniformly bounded in r. Thus, θr,k − θr (k) → 0 as r−k 38 r → ∞. Proof of Theorem 2.2.1 continued. Using the covariance function C(j, ) computed in Proposition 2.1.3, I can now apply the innovations algorithm from Brockwell and Davis [11, Proposition 5.2.2 ] to the transformed process (2.1.2) to compute the one-step ahead predictor n ˆn = W ˆ θn,j Wn−j − W n−j j=1 ˆ = 0. Proposition 2.2.4 shows that for n > 0, where W 0 θs, − θs ( ) → 0 as s → ∞, Note that (2.2.23) also holds for s = n + h − k and for all > 0. 
(2.2.23) = 0, since θs,0 = θs (0) = 1 by definition. Substitute = j − k to see that θn+h−k,j−k − θn+h−k (j − k) → 0 as n → ∞ (2.2.24) for all j ≥ k ≥ 0. Since n = N S, then θn+h−k (j − k) = θN S+h−k (j − k) = θh−k (j − k). Therefore (5.1.19) is better written as θN S+h−k,j−k − θh−k (j − k) → 0 as N → ∞. (2.2.25) The mean-centered process Xt has moving average representation (2.2.15), which can be related to the moving average representation (2.2.17) of the transformed process Wt by the 39 relation (2.2.16). Expanding both sides I obtain ∞ ψh (j)z j = j=0 q ∞ χh (j)z j j=0 θh (j)z j . j=0 Expanding the product on the right-hand side and equating coefficients, we have ∞ j=0 ψh (j)z j = ∞ χh (j)z j θh (0)1 + θh (1)z + · · · + θh (q)z q j=0 = χh (0)θh (0)1 + χh (0)θh (1) + χh (1)θh−1 (0) z j + ··· + χh (k)θh−k (j − k) z j + · · · k=0 j ∞ = χh (k)θh−k (j − k) z k . j=0 k=0 Then j χh (k)θh−k (j − k) ψh (j) = (2.2.26) k=0 for all h ≥ 0 and j ≥ 0. Now it follows from (5.1.5), (2.2.25) and (2.2.26) that lim ψ (j) − ψN S+h,j N →∞ h j j = lim χh (k)θh−k (j − k) − χh (k)θN S+h−k,j−k N →∞ k=0 k=0 j ≤ lim χh (k) θh−k (j − k) − θN S+h−k,j−k N →∞ k=0 (2.2.27) = 0, where for each h ≥ 0, k = 0, . . . , j and 0 ≤ j ≤ h, χh (k) is a constant independent of 40 n = N S. Notice that from (2.2.25), as N → ∞ I have j j χh (k)θh−k (j − k) χh (k)θN S+h−k,j−k → k=0 k=0 and therefore,  2 j  2 j χh (k)θh−k (j − k) χh (k)θN S+h−k,j−k  →   (2.2.28) k=0 k=0 2 is periodic with period S and thus Furthermore, by (2.2.28), since h is finite and σh−j bounded, then as N → ∞, h lim N →∞  j  j χh (k)θN S+h−k (j − k) −   j=0 2 k=0 2 χh (k)θN S+h−k,j−k  2 σh−j →0 k=0 (2.2.29) 41 Then it follows from (2.2.2), (2.2.27), (2.2.29) and Lemma 2.2.2 that lim σ 2 (h) − n→∞ n = = h 2 (j)σ 2 ψn+h n+h−j j=0 2 (h) − lim σN S N →∞ 2 (h) − lim σN S N →∞ h 2 (j)σ 2 ψh h−j j=0 h   2 j 2 χh (k)θh−k (j − k) σh−j j=0 k=0  2 j h 2 2 (h) −  χh (k)θN S+h−k,j−k  σh−j ≤ lim σN S N →∞ j=0 k=0  2  2 j j h 2  χh (k)θh−k (j − k) −  χh (k)θN S+h−k,j−k  σh−j + lim N →∞ j=0 k=0 k=0  2 j h 2 (h) − 2  χh (k)θN S+h−k,j−k  σh−j = lim σN S N →∞ j=0 k=0  2 j h 2 (h) − 2  χh (k)θN S+h−k,j−k  vN S+h−j + σh−j − vN S+h−j = lim σN S N →∞ j=0 k=0  2 j h 2 (h) −  χh (k)θN S+h−k,j−k  vN S+h−j ≤ lim σN S N →∞ j=0 k=0  2 j h 2  + lim χh (k)θN S+h−k,j−k  σh−j − vN S+h−j N →∞ j=0 k=0  2 j h 2 (h) −  = lim σn χh (k)θn+h−k,j−k  vn+h−j + 0 n→∞ j=0 k=0 = 0. This proves (2.2.3). 42 The following corollary gives asymptotic prediction intervals for the forecast, in the Gaussian case. Corollary 2.2.5. If {Xt } is a 0-mean Gaussian process, then the probability that Xn+h lies 1 h ψ 2 (j)σ 2 2 approaches (1 − α) as n → ∞, between the bounds PHn Xn+h ± zα/2 j=0 h h−j where zα is the (1 − α)-quantile of the standard normal distribution. Proof. Since (X0 , X1 , . . . , Xn+h ) has a multivariate normal distribution, then by Problem 2.20 in Brockwell and Davis [11], Xn+h = E(Xn+h |X0 , . . . , Xn−1 ). PHn Xn+h = Esp ¯ {X ,...,X 0 n−1 } Then since (2.2.3) holds, letting Φ(t) denote the cumulative distribution function of the standard normal distribution, it follows that, as n → ∞, h P PHn Xn+h − zα/2 1 2 (j)σ 2 2 ψh h−j j=0 h ≤ Xn+h ≤ PHn Xn+h + zα/2 1 2 (j)σ 2 2 ψh h−j j=0 = P |Xn+h − PHn Xn+h | ≤ zα/2 σn (h) → Φ(zα/2 ) − Φ(−zα/2 ) = 1 − α. 43 Chapter 3 Computation In this chapter, I outline the computations needed to produce forecasts and the associated prediction intervals. 
All the calculations are carried out by R software, and the codes are in the appendix. This chapter details the codes and their usage. The first step is model selection, i.e., the number of seasons S, the order p of the autoregressive part, the order q of the moving average part must be chosen. Usually S is known from the application. For example, I use S = 4 for quarterly data and S = 12 for monthly data. The next step is to estimate the autoregressive parameters φt (k) for k = 1, . . . , p, the moving average parameters θt (j) for j = 1, . . . , q, and the residual standard deviations σt of a PARMAS (p, q) model ˜ −µ for the sample-mean corrected data, Xt = X t ˆt . These two steps are closely connected, since the process of model selection requires fitting a proposed model to judge its adequacy. The entire procedure of model selection and parameter estimation is outlined in Tesfaye, et al.[42]. A brief synopsis is given in the following paragraph. ˜ ,X ˜ ˜ , we can extract a subset X ˜ ,X ˜ ˜ For any data set X 0 1 , . . . , Xn i i+1 , . . . , Xi+n−1 , where ˜ i represents the i-th season, n = N S, N equals the number of years of data and S equals the 44 number of seasons in a year. Use (1.0.4) and (1.0.5), respectively, to compute the seasonal sample means and autocovariances. The means µ ˆi for i = 0, 1, . . . , S − 1 are stored in an S × 1 array MU(I) for I= 1, . . . , S and the autocovariances γˆi ( ) for i = 0, 1, . . . , S − 1 and = 0, . . . , N − 1 are stored in an S × N array GAMMA(I,L) for I= 1, 2, . . . , S and L= 1, . . . , N . Since our notation begins with season i = 0 and lag = 0, and since many coding platforms (including R) do not allow zero or negative subscripts, there is a change of notation I= i + 1 and L= + 1. In this way, MU(I) = µ ˆi and GAMMA(I,L) = γˆi ( ). Now consider a general innovations algorithm for all seasons i, where i = 0, 1, . . . , S − 1: v0,i = C(i, i)  (i) θ = v −1 C(i + n, i + k) − n,n−k k,i n−1 vn,i = C(i + n, i + n) −  k−1 θ j=0 (i) (i) θn,n−j vj,i  k,k−j (3.0.1) 2 (i) vj,i , θn,n−j j=0 (i) (i) (i) (i) (i) (i) solved in the order v0,i , θ1,1 , v1,i , θ2,2 , θ2,1 , v2,i , θ3,3 , θ3,2 , θ3,1 , v3,i , . . . and so forth. Now k + 1 iterations of the innovations algorithm (3.0.1) with C(j, ) = γˆj ( − j) must be computed for every initial season i = 0, 1, . . . , S − 1 to obtain the estimates of the seasonal variances σ ˆi2 = vk, i−k , (3.0.2) and estimates of the coefficients in the infinite order moving average representation (1.0.2), ( i−k ) ψˆi ( ) = θ k, 45 (3.0.3) for = 0, . . . , k, where t is the season corresponding to the index t, so that jS + i = i. See Anderson, Meerschaert, and Vecchia [6] for more details. It is necessary to repeat the innovations algorithm for every initial season i = 0, 1, . . . , S − 1 because the final estimates of the seasonal variances σ ˆi2 and the infinite order moving average coefficients ψˆi (j) depend on both the starting season i and the number of iterations. The number of iterations k + 1 should be chosen so that all the parameter estimates σ ˆi2 for i = 0, 1, . . . , S − 1 and ψˆi (j) for i = 0, 1, . . . , S − 1 and = 0, . . . , m show evidence of convergence, where m = max{p, q}. I use the idea of relative error to show the convergence. For example, as the number of iterations increases, for a fixed season i, define σ ˆi2 (k) as the seasonal sample variance after k iterations, and σ ˆi2 (k + 1) the seasonal variance after k + 1 iterations. 
Since the value of variance can be large, I consider the relative change σ ˆi2 (k + 1) − σ ˆi2 (k) /ˆ σi2 (k). As a general rule of thumb, I interpret a relative error less than 0.05 after k + 1 iterations as evidence of convergence. The estimated seasonal variances are stored in an S × 1 array VAR(I) for I= 1, . . . , S and the estimated coefficients in the infinite order moving average representation are stored in an S × N array PSI(I,L) for I= 1, . . . , S and L= 1, . . . , N . In this way, VAR(I) = σ ˆi2 and PSI(I,L) = ψˆi ( ) , where I= i + 1 and L= + 1. Anderson and Meerschaert (2005) show that P vk, i−k −−−→ σi2 , (3.0.4) ( i−k ) P θˆ −−−→ ψi (j), k,j (3.0.5) 46 and ( i−k ) − ψi (j) ⇒ N 0, N 1/2 θˆ k,j j−1 σ 2 i−n ψ 2 (n) i 2 n=0 σi−j (3.0.6) as N → ∞ and k → ∞ for any fixed i = 0, 1, . . . , S − 1, where “ ⇒ ” indicates convergence in distribution, and N (m, v) is a normal random variable with mean m and variance v. The main technical condition for the convergence (3.0.6) to hold is that the noise sequence εt has a finite fourth moment. In practical applications, N is the number of years of data, k is the number of iterations of the innovations algorithm (typically on the order of k = 10, 15 or 20, see the discussion later), and the convergence in distribution is used to approximate the quantity on the lefthand side of (3.0.6) by a normal random variable. Equation (3.0.6) can be used to produce confidence intervals and hypothesis tests for the moving average parameters in (1.0.1). For example, an α-level test statistic rejects the null hypothesis H0 : ψi ( ) = 0 in favor of the alternative Ha : ψi ( ) = 0, indicating that the model parameter is statistically significantly different from zero) if |Z| > zα/2 . The p-value for this test is p = P (|Z| > |z|) ( i−k ) N 1/2 θˆ k, z= , W where and Z ∼ N (0, 1), −1 n=0 vˆk, i−k−n W2 = vˆk, i−k− ( i−k ) 2 θˆ k,n (3.0.7) . The innovations algorithm allows us to identify an appropriate model for the periodic time series at hand, and the p-value formula gives us a way to determine which coefficients in the identified PARMA model are statistically significantly different from zero (those with a 47 small p-value, say, p < 0.05). Once estimates of the infinite order moving average coefficients ψˆi (j) have been computed, a system of vector difference equations must be solved to determine estimates of the autoregressive parameters φˆi (j) for i = 0, 1, . . . , S − 1 and j = 0, . . . , p, and estimates of the moving average parameters θˆi (j) for i = 0, 1, . . . , S − 1 and j = 0, . . . , q. See Tesfaye, Anderson and Meerschaert [43] for complete details. In the special case of a PARMAS (1, 1) model, it is possible to solve those difference equations by hand to obtain φˆt (1) = ψˆt (2)/ψˆt−1 (1) and θˆt (1) = ψˆt (1) − φˆt (1) (3.0.8) where ψˆt (0) = 1. Hence k + 1 iterations of the innovations algorithm for every initial season i = 0, 1, . . . , S −1 are sufficient to estimate these parameters, assuming that k is large enough to ensure convergence for the variance estimates σ ˆi2 and the infinite order moving average coefficients ψˆi (j) for all seasons i = 0, 1, . . . , S − 1 and for all lags j = 0, . . . , m, where m = max{p, q}. In general, the number of iterations needed for convergence will depend on the order of the model being fit. I use an S ×(m+1) array to store θˆt (j), for a PARMAS (p, q) model, and use the same size array to store φˆt (j). In my R code, the corresponding names of these arrays are named as THETA and PHI respectively. 
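As a check on these formulas, here is a minimal R sketch of the PARMA_S(1,1) parameter extraction (3.0.8) and the p-value computation (3.0.7). It assumes the arrays described above: VAR with VAR[i+1] = σ̂²_i and PSI with PSI[i+1, ℓ+1] = ψ̂_i(ℓ) and PSI[, 1] = 1, together with N years of data. The helper names prev_s, PHI1, THETA1 and pval_psi are illustrative, not the appendix names.

```r
S <- 12
prev_s <- function(i) ((i - 2) %% S) + 1            # 1-based index of season i-1, mod S

# PARMA_S(1,1) parameter extraction, equation (3.0.8)
PHI1   <- PSI[, 3] / PSI[prev_s(1:S), 2]            # phi-hat_t(1) = psi-hat_t(2) / psi-hat_{t-1}(1)
THETA1 <- PSI[, 2] - PHI1                           # theta-hat_t(1) = psi-hat_t(1) - phi-hat_t(1)

# Two-sided p-value for H0: psi_i(l) = 0, using the normal limit (3.0.6)-(3.0.7);
# i is the 0-based season and l >= 1 the lag.
pval_psi <- function(i, l, PSI, VAR, N) {
  seas <- function(j) ((i - j) %% S) + 1            # 1-based index of season i-j
  W2   <- sum(VAR[seas(0:(l - 1))] * PSI[i + 1, (0:(l - 1)) + 1]^2) / VAR[seas(l)]
  z    <- sqrt(N) * PSI[i + 1, l + 1] / sqrt(W2)
  2 * pnorm(-abs(z))                                # p = P(|Z| > |z|)
}
```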
Table 3 lists moving average parameter estimates ψˆi ( ) at season i and lag = 1, 2, . . . , 6, and p-values, after k = 20 iterations of the innovations algorithm applied to average monthly flow series for the Fraser River near Hope BC. In the discussion of a PARMAS (1, 1) model, by (3.0.8) I only consider ψˆi ( ) at lag 1 and lag 2, and the ones with a higher p-value are shown in bold font in Table 3. In order to study the convergence of ψˆi ( ) as iterations k increase, I exclude ψˆi ( ) if its corresponding p-value is more than p0 of 0.05, since that value 48 Table 3.1: Moving average parameter estimates ψˆi ( ) at season i and lag = 1, 2, . . . , 6, and p-values, after k = 20 iterations of the innovations algorithm applied to average monthly flow series for the Fraser River near Hope BC. i ψˆi (1) p ψˆi (2) p ψˆi (3) p ψˆi (4) p ψˆi (5) p ψˆi (6) p 0 0.885 .00 0.134 .28 0.105 .10 0.163 .01 0.006 .93 0.038 1 0.625 .00 0.503 .00 0.085 .46 0.140 .02 0.077 .17 -0.004 .94 2 0.508 .00 0.350 .00 0.419 .00 0.032 .72 0.097 .03 0.019 .65 3 0.515 .00 0.287 .00 0.140 .07 0.239 .00 0.034 .60 0.030 .37 4 0.791 .00 0.165 .10 0.295 .00 0.112 .12 0.160 .03 0.045 .43 5 0.567 .00 0.757 .00 0.057 .61 0.250 .00 0.062 .40 0.139 .06 6 1.076 .01 0.711 .11 0.856 .01 0.415 .13 0.241 .17 0.112 .52 7 0.522 .03 0.684 .41 0.988 .28 1.095 .09 0.350 .51 0.198 .56 8 0.451 .00 -1.014 .00 -0.062 .66 -0.745 .50 0.128 .87 -0.635 .31 9 0.618 .00 -0.041 .77 -0.746 .01 -1.083 .26 -0.047 .97 0.514 .50 10 0.448 .00 0.409 .00 0.026 .78 -0.241 .20 -1.125 .08 0.799 .26 11 0.677 .00 0.159 .01 0.194 .00 0.050 .46 -0.190 .17 -0.402 .38 would not be significantly different from zero by (3.0.7). And consider the relative error ψˆi ( )(k+1) − ψˆi ( )(k) . ERRk ( ) = i ψˆi ( )(k) (3.0.9) And define the maximum of the relative error Rk = max{ERRk / Ik }, i ( ) : (i, ) ∈ where Ik is defined as Ik = {i, : p < p0 in (3.0.7)} Figure 3.1 shows plots of Rk against number of iterations k, and the convergence of ψˆi ( ) as k increase. Figure 3.1 also shows plots excluding ψˆi ( ) with p values above p0 of 0.01 and 0.10. Similar plots for vk, i−k are given in Figure 3.2. Once the model is fit, the adequacy of the model can be judged. One way to do this is 49 0.5 0.4 0.3 0.2 0.1 bound of 0.1 0.0 Value of max relative error 0.6 (a) Max relative error, P−value = 1% 0 5 10 15 20 25 30 Number of iterations Figure 3.1: Convergence of ψˆi ( ) as iterations k increase,where 0 ≤ i < S − 1, 1 ≤ ≤ 2, k = 1, 2, . . . , 30. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. to compute the residuals and check for any remaining serial correlation. To compute the residuals from the data, the invertible representation (1.0.3) is used and the weights πt (j) must be computed by solving another system of vector difference equations. In the special case of a PARMAS (1, 1) model, it is possible to solve those difference equations by hand (see Anderson, Meerschaert and Tesfaye [8]) to obtain δˆt = σ ˆt−1 Xt − t−i + (−1)j j=2 φˆt (1) + θˆt (1) Xt−1 (3.0.10) φˆt−j+1 (1) + θˆt−j+1 (1) θˆt (1)θˆt−1 (1)...θˆt−j+2 (1)Xt−j where i is the season of the first data point. This produces n − 1 residuals δˆt for t = i + 1, . . . , i + n − 1. In general, one obtains n − m residuals where m = max{p, q} depends on the order of the model being fit. 
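For the PARMA_S(1,1) case, the residuals (3.0.10) can also be evaluated by running the model recursion forward rather than expanding the finite sum explicitly; treating the unobserved values before the sample as zero, so that the first innovation equals the first observation, matches the finite expansion in (3.0.10). A minimal R sketch follows, assuming a mean-centered series x whose first observation has 0-based season i0, and length-S vectors PHI1, THETA1, SIGMA holding the fitted φ̂_t(1), θ̂_t(1), σ̂_t; the function name parma11_residuals is an illustrative choice.

```r
# Sketch: standardized residuals delta-hat_t of a fitted PARMA_S(1,1) model,
# computed by the recursion eps_t = x_t - phi_t x_{t-1} - theta_t eps_{t-1}.
parma11_residuals <- function(x, PHI1, THETA1, SIGMA, i0 = 0) {
  S <- length(PHI1)
  n <- length(x)
  season <- ((i0 + (0:(n - 1))) %% S) + 1     # 1-based season of each observation
  eps <- numeric(n)
  eps[1] <- x[1]                              # values before the sample treated as zero
  for (t in 2:n) {
    eps[t] <- x[t] - PHI1[season[t]] * x[t - 1] - THETA1[season[t]] * eps[t - 1]
  }
  delta <- eps / SIGMA[season]                # standardized residuals delta-hat_t
  delta[-1]                                   # n - 1 residuals, as in the text
}
```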
Now plot the autocorrelation function (ACF) and the partial 50 0.5 0.4 0.3 0.1 0.2 bound of 0.2 0.0 Value of max relative error 0.6 (a) Max relative error 0 5 10 15 20 25 30 35 Number of iterations Figure 3.2: Convergence of vk, i−k as iterations k increase,where 0 ≤ i < S − 1, 1 ≤ 2, k = 1, 2, . . . , 30. ≤ autocorrelation function (PACF) of the residuals and check for any remaining serial correlation, in exactly the same way as for ARMA modeling. An example could be seen in Figure 4.3 of next chapter. Since the standardized errors δt = σt−1 εt in a PARMAS (p, q) model are also iid observations under this model, 95% of the ACF and PACF values should fall √ within the confidence bands ±1.96/ n − m if the model is adequate, see Tesfaye, et al [42]. The principle of parsimony suggests that I choose an adequate model with m = max (p, q) as small as possible. Once the order p of the autoregressive part and the order q of the moving average part are chosen and found adequate, it is then advisable to fit a reduced model with fewer parameters. One method for finding a reduced model using discrete Fourier transforms is discussed in Anderson, Meerschaert and Tesfaye [8]. Validation of a time series model is tantamount to the application of diagnostic checks to 51 the model residuals, to see if they resemble white noise. The Ljung-Box test can be used to test the white-noise null hypothesis (see [11]). If the null hypothesis of white-noise residuals is not rejected, and if the autocorrelation and partial autocorrelation functions of the residuals show no evidence of serial correlation, then I judge the model to be adequate. Fitting a suitable distribution to the residuals allows for a faithful simulation based on this model. To obtain additional parsimony, it is also advisable to consider simpler models where some statistically insignificant model parameters are set to zero. If the resulting model residuals pass the same diagnostic tests, then the simplified model is also deemed adequate. Next I detail the computations required to produce forecasts, and their prediction intervals. As before, I assume that the first season to be forecast is season i = 0, and write the ˜ ,X ˜ ˜ data in the form X 0 1 , . . . , Xn−1 , discarding a few of the oldest observations if necessary. ˜ −µ Then I compute the sample means µ ˆi using (1.0.4) and set X(I) = X i ˆi for I = 1, . . . , S. Step 0: Compute the sample autocovariance by (1.0.5). Apply innovations algorithm with γˆi ( ), and get estimates for σ ˆi2 and ψˆi ( ) by (3.0.2) and (3.0.3). Then model parameter estimates θˆt (j) and φˆt (k) could be computed by (3.0.8). From the constructed model, create an S × Q array to store θˆt (j), j = 0, 1, . . . , q with θˆt (0) = 1 and t = 0, 1, . . . , S − 1, where Q = q + 1. Also, create an S × p array to store φˆt (k), k = 1, . . . , p and an S × 1 array to store σ ˆt2 . The corresponding array names in my R codes are THETA, PHI and SIGMA respectively. Step 1: Use Proposition 2.1.3 to compute the covariances, C(j, ), of the transformed process Wt given in (2.1.2). Notice that in the computation process, γj ( − j), θˆt (j), φˆt (k), and σ ˆt2 are obtained in Step 0. Create an n × n array C(J,L) to store C(j, ), where C(J,L) = C(j, ), for J = j + 1, L = + 1. Speed and efficiency of the algorithm given in Proposition 52 2.1.3 is provided for by noting that C(j, ) = 0 if − j > q and ≥ i + m where m = max(p,q). 
I may let the n × n array defined just now as a zero-array, in this way, with the previous condition, I could avoid a lot of computations. Step 2: From the innovations algorithm given by (2.1.9), once C(j, ) is given, calculate the coefficients θn,j . Since θn,j = 0 if j > q and n ≥ m, I will need an array of size n × Q (again, θn,0 = 1) to store the innovation coefficients, where the corresponding array name is THETA in my R code. The innovations algorithm is solved in the order v0 , θ1,1 , v1 , θ2,2 , θ2,1 , v2 , θ3,3 , θ3,2 , θ3,1 , v3 , . . . and so forth. Use an n × 1 array to store vk . ˆ ,X ˆ ˆ Step 3: Compute the one-step predictors, X 1 2 , . . . , Xn−1 , by (2.1.13), and they are stored in the array Xhat in my R code. At most only an m-step computation is needed, where m = max(p, q), since (2.1.13) only requires the storage of at most p past observations ˆ Xn−1 , . . . , Xn−p and at most q past innovations (Xn−j − X n−j ), j = 1, . . . , q. Therefore computation is much faster when p and q are small, for example, PARMA(1,1) model. The one-step prediction is based on the mean-subtracted data, so I add the seasonal mean back ˆ n once the prediction is obtained. See step 4 for specific storage of X ˆn. to X Step 4: The one- and h-step predictors are stored in the same vector Y of length n + h, where h is the desired number of forecast steps. For example, Y [n + 2] is the prediction at 2-step. Compute the h-step predictors PHn Xn , PHn Xn+1 , . . ., in order, using the recursion given by (2.1.17). The calculation of Y [h] is based on the information on Y [h − 1], so the computation is recursive. At most only an m-step computation is needed. Step 5: By Corollary 2.2.5, compute approximate 95% Gaussian prediction bounds for Xn+h given by PHn Xn+h ± 1.96 × σn (h). The large sample approximation of σn (h) is obtained from (2.2.3). The weights, ψt (j) in (2.2.3) are computed from the model parameters 53 φt and θt via the following recursions: p φi (k)ψi−k (j − k) = 0, j ≥ max(p, q + 1) ψi (j) − k=1 and j ψi (j) − φi (k)ψi−k (j − k) = 0, 0 ≤ j < max(p, q + 1) k=1 To avoid this computation, I may adopt the parameter estimate ψˆi (j) in (3.0.3). Finally, ˜ ˜ noting that X ˆn+h = Xn+h , the approximate 95% prediction bounds for X n+h − µ n+h ˆn+h = µ ˆ n+h . Two h-length vectors are (ˆ µn+h + PHn Xn+h ) ± 1.96 × σn (h). Also, µ U BOUND and L BOUND are defined to store the prediction bounds. 3.1 A simulation study for the convergence of the coefficients in innovations algorithm. To better testify the convergence of ψˆi ( ) and vˆk, i−k as iterations k increase in Figure 3.1 and Figure 3.2, we will conduct a detailed simulation to show the actual error in convergence of ψˆi ( ) and vˆk, i−k in innovations algorithm. 72-year of monthly data for PARMA12 (1, 1) were simulated, as shown in Table 3.1, and the innovations algorithm was used to obtain parameter estimates. Some general conclusions can be drawn from this study, which in practice proves the results in (3.0.4) and (3.0.5). Define the relative error for ψˆi ( ) as follows: ψˆi ( )(k+1) − ψˆi ( )(k) ERR1k ( ) = . i ψˆi ( )(k) 54 (3.1.1) And define the maximum of the relative error R1k = max{ERR1k / Ik }, i ( ) : (i, ) ∈ where Ik is defined as Ik = {i, : p < p0 in (3.0.7)} And define the relative error for vˆk, i−k : ERR2k i = vˆk+1, i−(k+1) − vˆk, i−k vˆk, i−k . (3.1.2) And define the maximum of the relative error R2k = max{ERR2k i : ∀i = 0, . . . 
, 11}, Figure 3.3 illustrates plots of R1k and R2k against iterations k, which shows the convergence of ψˆi ( ) and vˆk, i−k as iterations k increase,where 0 ≤ i ≤ 11, 1 ≤ ≤ 2, k = 1, 2, . . . , 50. For ψˆi ( ) , the maximum relative error over all seasons i drop below 5% when k is between 10 and 20, and it increases when k is larger than 20, since the sample autocovariance becomes relatively small when the lag is large. Similar observation could be seen for vˆk, i−k , where the maximum relative error over all seasons i drop below 7% when k is between 10 and 20, and it increases when k is larger than 20. Therefore, we conclude that 10 to 20 iterations of the innovations algorithm are usually sufficient to obtain reasonable estimates of the model parameters. Furthermore, a convergence test for the model parameters φˆ and θˆ is shown in FIgure 3.4, and it shows similar results, which proves our previous 55 Table 3.2: Model parameters and estimates season i φi φˆi θi 0 0.20 0.34 0.70 0.60 0.72 0.10 1 2 0.60 0.56 -0.10 0.60 0.45 -0.10 3 4 0.30 0.30 0.50 5 0.90 0.85 -0.40 6 1.30 2.03 -0.20 0.60 0.06 -0.10 7 8 -1.90 -4.85 2.40 -0.09 0.38 0.70 9 0.70 0.65 -0.20 10 11 0.40 0.10 0.30 conclusion. 56 for simulated PARMA12 (1, 1) data. θˆi σi σˆi 0.58 11900 8766 0.10 11600 7626 -0.16 7300 6693 0.05 6000 4402 0.48 4200 3521 -0.29 4600 4287 -0.23 15200 15880 0.20 31100 22339 5.43 32800 31761 0.17 29700 29657 -0.25 15500 13426 0.51 12100 11126 0.3 0.2 0.1 Max relative error 0.4 Convergence of psi (median result in 100 simulations) 0.0 5% 0 10 20 30 40 50 Number of iterations 0.4 0.3 0.2 Max relative error 0.5 0.6 Convergence of sigma (median result in 100 simulations) 0.1 7% 0 10 20 30 40 50 Number of iterations Figure 3.3: Plots of R1k and R2k against iterations k, which show the convergence of ψˆi ( ) and vˆk, i−k as iterations k increase,where 0 ≤ i ≤ 11, 1 ≤ ≤ 2, k = 1, 2, . . . , 50. 57 0.3 0.2 Max relative error 0.4 0.5 Convergence of phi (median result in 100 simulations) 0.0 0.1 9% 0 10 20 30 40 50 Number of iterations 3 2 50% 0 1 Max relative error 4 5 Convergence of theta (median result in 100 simulations) 0 10 20 30 40 50 Number of iterations Figure 3.4: Convergence of φˆi and θˆi as iterations k increase,where 0 ≤ i ≤ 11, k = 1, 2, . . . , 50. 58 Chapter 4 Application to a Natural River Flow In this chapter, I apply the R codes in Chapter 3 to forecast future flows for a time series of monthly river flows, get 95% Gaussian prediction bounds for these forecasts. I model a monthly river flow time series from the Fraser River at Hope, British Columbia, which is the longest river in British Columbia, traveling almost 1400 km and sustained by a drainage area covering 220,000 square kilometers. See for maps and river flow data downloads at http://www.wateroffice.ec.gc.ca/ . Daily discharge measurements, in cubic meters per second (cms), were averaged over each of the respective months to produce monthly Fraser River flow time series. The series contains 72 years of data from October 1912 to September 1984, and part of the data is shown in Figure 4.1. To better test the prediction results, I based our forecast on the first 70 years of data, from 1912 to 1982. Then, I computed a 24-month prediction from October 1982 to September 1984. In our analysis, i = 0 corresponds to October and i = 11 corresponds to September. Using the “water year” starting on 1 October is customary for modeling of river flows, because of low correlation between Fall monthly flows. 
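As a side note on data preparation, the monthly series and the 70-year fitting window can be assembled in a few lines of base R. The sketch below is only illustrative: the data frame daily and its columns Date and Flow are hypothetical names for the daily discharge records downloaded from the Water Office site referenced above.

# Sketch: monthly means and the 70-year fitting window (hypothetical input 'daily'
# with columns Date and Flow holding the daily discharge records in cms).
monthly <- aggregate(Flow ~ format(Date, "%Y-%m"), data = daily, FUN = mean)
names(monthly) <- c("month", "flow")
monthly <- monthly[order(monthly$month), ]        # "YYYY-MM" keys sort chronologically
flow <- ts(monthly$flow, start = c(1912, 10), frequency = 12)
fit_window <- window(flow, start = c(1912, 10), end = c(1982, 9))   # first 70 water years, used to fit
hold_out   <- window(flow, start = c(1982, 10), end = c(1984, 9))   # 24 months held back for comparison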
The sample seasonal mean, 59 250000 100000 0 Flow (cms) 10/1912 10/1918 10/1924 Month / Year Figure 4.1: Average monthly flows in cubic meters per second (cms) for the Fraser River at Hope, BC indicate a seasonal pattern. standard deviation and autocorrelations at lag 1 and lag 2 are given in Table 4 and Figure 4.2, with 95% confidence intervals. The non-stationary of the series is apparent, since the mean, standard deviation and correlation functions vary significantly from month to month (the confidence bands for some wet and dry seasons don’t overlap). Removing the periodicity in mean and variance will not yield a stationary series. Therefore a periodically stationary series model is appropriate. Tesfaye, et al. [42], identified a PARMA12 (1, 1) model Xt − φi Xt−1 = εt + θi εt−1 , 60 (4.0.1) 50000 0 20000 Sample sd (cms) 350000 150000 0 2 4 6 8 10 0 2 4 6 8 10 (c) Sample Autocorrelations : lag = 1 (d) Sample Autocorrelations : lag = 2 2 4 6 8 10 0.5 0.0 −0.5 0.8 0.4 0 1.0 Season Autocorrelations Season 0.0 Autocorrelations (b) Sample Standard Deviations 0 Sample mean (cms) (a) Sample Means 0 Season 2 4 6 8 10 Season Figure 4.2: Statistics for the Fraser river time series: (a) seasonal mean; (b) standard deviation; (c,d) autocorrelations at lags 1 and 2. Dotted lines are 95% confidence intervals. Season = 0 corresponds to October and Season = 11 corresponds to September. ˜ − µ , E (ε ) = 0, V (ε ) = σ 2 , σ −1 ε iid, for the series with S = 12, and where Xt = X t t t t t t t used the innovations algorithm at k = 20 iterations for periodically stationary processes by (3.0.1) and (3.0.8) to estimate φi (1), θi (1), and σi , i = 0, 1, . . . , 11. Table 4 gives the parameter estimates of the model where φˆ = (φ0 (1), . . . , φ11 (1)) , θˆ = (θ0 (1), . . . , θ11 (1)) , and σ ˆ = (σ0 , . . . , σ11 ) . I employ these estimates as the parameters of the model. Although the model is periodically stationary, the standardized residuals √ (3.0.10) should be stationary, so the standard 95% limits ( that is, 1.96/ n ) still apply. Figure 4.3 shows the ACF and PACF of the model residuals. Although a few values lie 61 √ Figure 4.3: ACF and PACF for model residuals, showing the bounds ±1.96/ N , indicate no serial dependence. With no apparent pattern, these plots indicate that the PARMA12 (1, 1) model is adequate. 62 Table 4.1: Parameter estimates for the PARMA model (1.0.1) of average monthly flow series for the Fraser River near Hope BC from October 1912 to September 1982. month φˆ θˆ σ ˆ OCT NOV DEC JAN FEB MAR APR MAY JUN JUL AUG SEP 0.187 0.592 0.575 0.519 0.337 0.931 1.286 1.059 -2.245 -1.105 0.679 0.353 0.704 0.050 -0.038 -0.041 0.469 -0.388 -0.088 -0.592 2.661 0.730 -0.236 0.326 11761.042 11468.539 7104.342 5879.327 4170.111 4469.202 15414.905 30017.508 32955.491 30069.997 15511.989 12111.919 Table 4.2: Sample mean, standard deviation and autocorrelation at lag 1 and 2 for an average monthly flow series for the Fraser River near Hope BC, from October 1912 to September 1982. 
month OCT NOV DEC JAN FEB MAR APR MAY JUN JUL AUG SEP µ ˆ 69850 55824 40502 33006 30740 29348 58959 173308 249564 198844 127138 86437 Parameter 1 γˆ (0) 2 19976 17709 12858 9269 8878 8864 20314 39437 45154 42627 28253 20071 63 ρˆ(1) 0.712 0.748 0.731 0.786 0.787 0.504 0.333 0.260 0.577 0.780 0.720 0.621 ρˆ(2) 0.515 0.577 0.541 0.697 0.380 0.286 -0.286 -0.031 0.499 0.456 0.308 0.472 350000 250000 150000 0 50000 Flow (cms) 10/1982 02/1983 06/1983 10/1983 02/1984 06/1984 Month / Year Figure 4.4: 24-month forecast (solid line with dots) based on 70 years of Fraser river data, with 95% prediction bounds (dotted lines). For comparison, the actual data (solid line) is also shown. This data was not used in the forecast. slightly outside of the 95% confidence bands, there is no apparent pattern. The p value from the Ljung-Box test was 0.08 indicating that I do not reject the null hypothesis that the residuals resemble iid white noise. Hence we conclude that the PARMA12 (1, 1) model is adequate. If I reject the null hypothesis that PARMA(1, 1) model residuals resemble iid noise, I would abandon that model and fit a PMA(q) model to the data, q ≥ 2. Using Theorem 1 in [8], I would identify the order, q, of the pure moving average and then parsimoniously estimate the moving average parameters that I deem to be nonzero. However, from Tesfaye, 64 et al. [42], the PARMA(1, 1) model is generally adequate to model most river flow time series. I then compute a 24-step future prediction for the Fraser river data, that is, a forecast for the next 24 months from October 1982 to September 1984.. The prediction is compared to the original data in Figure 4.4. Note that the forecast curve is close to the original data curve, and that the historical Fraser River data stay well within the 95% prediction bands. Figure 4.5 illustrates how the width of the prediction intervals vary with the season. This is a consequence of the non-stationarity of the river flow series, and specifically the fact that the standard deviation and the correlation functions vary significantly from month to month. 65 50000 0 −50000 Width of prediction bounds Width of prediction bounds (mean subtracted) 5 10 15 20 Month Figure 4.5: Width of 95% prediction bounds for the Fraser river. 66 Chapter 5 Maximum likelihood estimation and reduced model 5.1 Maximum likelihood Function for PARMAS (p, q) Model (i) (i) (i) By Yule-Walker equations, the vector of autoregressive coefficients φn = (φn,1 , . . . , φn,n ) solves the prediction equations (i) (i) Γn,i φn = γn (5.1.1) (i) with γn = (γi+n−1 (1), γi+n−2 (2), . . . , γi (n)) and Γn,i = γi+n− ( − m) (5.1.2) ,m=1,...,n is the covariance matrix of Xn,i = (Xi , ..., Xi+n−1 ) for each i = 0, ..., S − 1. If σi2 > 0 for i = 0, . . . , S − 1, then for a causal PARMAS (p, q) process the covariance matrix Γn,i is 67 nonsingular for every n ≥ 1 and each i in Lund and Basawa [22, Theorem 3.1]. Assuming that Xn,i is a Gaussian process, its likelihood function can be written using the general formula for a multivariate normal random vector with a given mean and covariance matrix L(Γn,i ) = (2π)−n/2 (det Γn,i )−1/2 exp − 21 Xn,i Γ−1 n,i Xn,i , (5.1.3) where Γn,i is the covariance matrix of Xn,i defined in (5.1.2). 
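To make the cost of (5.1.3) concrete, a direct evaluation of −2 log L would look like the short R sketch below, where the data vector x and the covariance matrix Gamma are assumed to be already available. The determinant and the quadratic form require an O(n³) factorization of the n × n matrix Γn,i, which is precisely the computation that the innovations representation developed next avoids.

# Sketch: direct evaluation of -2 log L(Gamma_{n,i}) in (5.1.3).
# x is the observed (mean-zero) data vector, Gamma its n x n covariance matrix.
neg2loglik_direct <- function(x, Gamma) {
  n <- length(x)
  R <- chol(Gamma)                          # upper-triangular factor, Gamma = t(R) %*% R
  z <- backsolve(R, x, transpose = TRUE)    # solves t(R) z = x, so sum(z^2) = x' Gamma^{-1} x
  n * log(2 * pi) + 2 * sum(log(diag(R))) + sum(z^2)
}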
Whenever Γn,i is invertible, the direct calculation of det Γn,i and Γ−1 n,i can be avoided ˆ by expressing it in terms of the one-step predictors X i+j and the mean square errors vj,i , both of which are easily calculated recursively from the innovations algorithm, where vj,i is from (3.0.1), and ˆ (i) ∼ N (0, v ), Xi+j − X j,i i+j j = 0, 1, . . . , n − 1, Recall the innovations representation for PARMA models in Anderson et al [6], where     t=0   0, (i) ˆ t X (i) i+t =  ˆ (i)  θt,j (Xi+t−j − X  i+t−j ),   j=1 (5.1.4) t = 1, . . . , n − 1. Then the computation for (5.1.3) can be simplified in the following theorem: Theorem 5.1.1. When Γn,i is invertible, the likelihood (5.1.3) can be equivalently written 68 in the form of L(Γn,i ) =(2π)−n/2 (v0,i v1,i · · · vn−1,i )−1/2 n−1 1 ˆ (i) )2 /v exp − 2 (Xi+j − X j,i , i+j j=0 (5.1.5) ˆ where X i+j are the one-step predictors in (2.1.13) and vj,i are the mean square errors in (3.0.1). Proof. For each i, define the n × n lower triangular matrix Ci = [θk,k−j (i)]n−1 , k,j=0 (5.1.6) where det Ci = 1, since all the diagonal elements θk,0 (i) = 1 for k = 0, . . . , n − 1, and define the n × n diagonal matrix, Di = diag(v0,i , v1,i , . . . , vn−1,i ). (5.1.7) ˆ (i) = (X ˆ (i) , . . . , X ˆ (i) Therefore X n,i i i+n−1 ) can be written in the from, ˆ ˆ (i) X n,i = (Ci − I)(Xn,i − Xn,i ), (5.1.8) where I is the n × n identity matrix. Hence, ˆ ˆ ˆ Xn,i = Xn,i − X n,i + Xn,i = Ci (Xn,i − Xn,i ). 69 (5.1.9) Since Di = E ˆ Xn,i − X n,i ˆ Xn,i − X n,i , and Γn,i = E Xn,i Xn,i , it follows that Γn,i = Ci Di Ci . (5.1.10) By (5.1.9) and (5.1.10), we obtain −1 ˆ ˆ Xn,i Γ−1 n,i Xn,i = (Xn,i − Xn,i ) Di (Xn,i − Xn,i ) = n−1 ˆ (i) )2 /v , (5.1.11) (Xi+j − X j,i i+j j=0 and det Γn,i = (det Ci )2 (det Di ) = v0,i v1,i · · · vn−1,i . (5.1.12) Therefore the likelihood (5.1.3) of the vector Xn,i reduces to (5.1.5). Whenever Γn,i is invertible, the likelihood in (5.1.3) has the equivalent innovations representation as (5.1.5). Actually Γn,i is invertible for causal PARMA models, see the proof in Lund and Basawa [23, Proposition 4.1]. Given the seasonal one-step error variances vj,i from the innovations and ˆ (i) , then L(Γ ) in (5.1.3) can easily be calculated. the one-step ahead predictor X n,i i+j For notation, let φt = (φt (1), . . . , φt (p)) and θ t = (θt (1), . . . , θt (q)) denote the autoregressive and moving-average parameters during season t, respectively. Then we can write L(Γn,i ) = L(β), where we use β = (φ0 , θ 0 , φ1 , θ 1 , . . . , φS−1 , θ S−1 ) to denote the collection of all PARMAS (p, q) parameters. The dimension of β is (p+q)S ×1. Following the ideas 2 ) from Basawa and Lund [24], we treat the white noise variances σ 2 = (σ02 , σ12 , . . . , σS−1 70 as nuisance parameters. Then the likelihood function is given by LX (β) = (2π)−n/2 (v0,i v1,i · · · vn−1,i )−1/2 exp − 21 n−1 ˆ (i) )2 /v (Xi+j − X j,i . i+j j=0 (5.1.13) Another way to compute the PARMA likelihood parameter estimates is to equivalently minimize the negative log likelihood n−1 n−1 ˆ (i) )2 /v . −2 log{L(β)} = n log(2π) + log(vj,i ) + (Xi+j − X j,i i+j j=0 j=0 (5.1.14) ˆ According to Basawa and Lund [24], once we obtain the maximum likelihood estimate β, the MLE of σi2 for 0 ≤ i ≤ S − 1, have the large sample form σ ˆi2 = N −1 N −1 ˆ 2, εjS+i (β) j=0 2 ) . However, they didn’t give a proof for the large sample form. where σ ˆ 2 = (ˆ σ02 , . . . 
, σ ˆS−1 In the following we propose our method of MLE computation, where there is an exact form of MLE for σ 2 , and the computation is much faster. Given the data set Xi , . . . , Xi+n−1 , ˆ (i) , j = 0, . . . , n − 1 apply innovations algorithm to compute the 1-step ahead predictors X i+j 2 ˆ (i) and the forecast errors vj,i = E Xi+j − X . i+j Corollary 5.1.2. |rj,i − 1| → 0 as j → ∞. Proof. By Anderson et al. [6, Corollary 2.2.1] vk, i−k → σi2 , 71 we have 2 | → 0, as j → ∞. |vj,i − σi+j 2 r , then we have Recall that vj,i = σi+j j,i 2 r − σ 2 | → 0, as j → ∞, |σi+j j,i i+j so that 2 |r − 1| → 0, as j → ∞. σi+j j,i 2 is periodic of S seasons, so Notice that for PARMAS (p, q) model σi+j 2 ≤ max{σ 2 , . . . , σ 2 } < ∞, 0 < σi+j 0 S−1 then we have proved |rj,i − 1| → 0, as j → ∞. 72 Then the negative log likelihood (5.1.14) becomes n−1 (β) = −2 log{L(β)} = n log(2π) + n−1 2 ˆ (i) Xi+j − X /vj,i i+j log(vj,i ) + j=0 S−1 N −1 = n log(2π) + j=0 log rkS+i σi2 i=0 k=0 S−1 N −1 ˆ (i) XkS+i − X kS+i + i=0 k=0 N −1 = n log(2π) + 2 (5.1.15) / rkS+i σi2 log(rj ) + N log(σi2 ) + j=0 where N −1 (X Si = k=0 S−1 Si /σi2 , i=0 2 ˆ (i) kS+i − XkS+i ) . rkS+i Then for i = 0, . . . , S − 1, we have ∂ ∂σi2 = N −1 + S = 0, 2 i σi2 2 σi therefore N σi2 − Si = 0, and S σi2 = i . N So the MLE of σi2 is 1 σ ˆi2 = N N −1 ˆ (i) XkS+i − X kS+i k=0 73 2 /rkS+i = Si /N, (5.1.16) ˆ (i) where X and rkS+i come from the innovations algorithm applied to the model. Now kS+i we can substitute σ ˆi2 for σi2 in the negative log likelihood to get n−1 −2 log{L(β)} = n log(2π) + n + S−1 log(rj,i ) + N j=0 log Si /N . i=0 Then it suffices to minimize ∗ (β) = 1 N n−1 S−1 log(rj,i ) + j=0 log Si /N . i=0 Also, since rj,i → 1 as j → ∞, then log(rj ) → 0, so 1 N and approximate MLE minimizes n−1 log(rj ) ≈ 0, j=0 S−1 log S /N . i i=0 Example 5.1.3. A PARMA(1, 1) example from Chapter 4 will be shown here, to demonstrate how to explicitly solve the mean square errors, and obtain the likelihood value and maximum likelihood estimate. We adopt the 70 years of monthly Fraser river data from October 1912 to September 1982, and since there are 840 observations, we denote them as {Xt } = {X0 , X1 , . . . , X839 }. Therefore the negative log likelihood is simplified as n−1 ˆ )2 /v log(vj ) + (Xj − X j j j=0 j=0 N −1 S−1 2 = n log(2π) + log(rj ) + N log(σi ) + Si /σi2 , j=0 i=0 −2 log{L(β, σ 2 )} = n log(2π) + n−1 74 (5.1.17) 2 ) . Computations in where β = (φ0 , θ0 , φ1 , θ1 , . . . , φ11 , θ11 ) and σ 2 = (σ02 , σ12 , . . . , σ11 Lund and Basawa [23, Equations (2.7), (2.11)] show that the PARMA(1, 1) model is causal when |φ0 φ1 · · · φ11 | < 1 and invertible when |θ0 θ1 · · · θ11 | < 1. By Theorem 2.1.7, we may write ˆ =0 X 0 ˆ =θ ˆ X 1 1,1 X0 − X0 (5.1.18) ˆ =φ X ˆ X t t t−1 + θt,1 Xt−1 − Xt−1 t = 2, . . . , 839. For the covariance structure of {Xt }, we will use the γt (h) results from Section 1.1. The innovations algorithm explicitly identifies the prediction coefficients as −1 C (t, t − 1) = v −1 θ σ 2 . θt,1 = vt−1 t−1 t t−1 (5.1.19) Use (5.1.19) in innovation algorithm gives −1 2 4 2 v 2 2 2 vt = C(t, t) − θt,1 t−1 = (σt + θt σt−1 ) − vt−1 θt σt−1 , (5.1.20) where t ≥ 1, and the initial condition is v0 = γ0 (0). Equation (5.1.20) is a difference equation for vt , which can be explicitly solved as follows. 
First, write y vt = σt2 t , yt − 1 yt−1 2 vt−1 = σt−1 , yt−1 − 1 75 (5.1.21) and substitute (5.1.21) in (5.1.20) to get 4 θt2 σt−1 (yt−1 − 1) yt 2 2 2 2 σt = (σt + θt σt−1 ) − 2 yt − 1 yt−1 σt−1 2 )y 2 2 σt2 yt yt−1 = (σt2 + θt2 σt−1 t−1 (yt − 1) − θt σt−1 (yt−1 − 1)(yt − 1), therefore yt = 1 + σt2 t ≥ 1, yt−1 σt−1 θt2 (5.1.22) σ0−2 γ0 (0) where the initial condition of (5.1.22) is y0 = −2 . Letting σ0 γ0 (0)−1 σt2 P (t) = , 2 θ2 σt−1 t (5.1.23) then the solution to (5.1.22) is yt = 1 + P (t)yt−1 = 1 + P (t) 1 + P (t − 1) yt−2 = 1 + P (t) 1 + P (t − 1) 1 + P (t − 2) yt−3 (5.1.24) .. .  t  t  t   =1+ P (r) y0 + P (r) , r=1 j=2 r=j In the computation, notice that t j=2 t ≥ 1. t r=j P (r) is the row-summation of the product 76 of each row in a matrix, where the matrix is   1 P (2) P (3) . . . . . . P (t)     1  1 P (3) . . . . . . P (t)       . 1 1 1 P (4) . . . P (t)     .. .. .. .. ..   .. . . . . . .      1 1 1 ... 1 P (t) Use (5.1.24) in (5.1.21) to compute vt . Then use vt in (5.1.19) to compute θt,1 . Next obtain ˆ in (5.1.18). Finally substitute into (5.1.17) to get the negative log likelihood function. X t In Chapter 4, we fit a PARMA(1,1) model to the 70-year Fraser river flows data. The model parameters θˆt , φˆt and σ ˆt are shown in Table 4 of Chapter 4. Using these values to ˆ in this example, we obtain the negative log likelihood value of 18543 compute vt and X t for this model. However, in the following we consider to take the logarithm of the data, for all computations in MLE. The model parameters for the logarithm of the data are shown in Table 5.1.3, where the negative log likelihood value is -362.655. Season θˆi φˆi σi Season θˆi φˆi σi To get the maximum 0 1 2 3 4 5 0.650 0.206 -0.026 -0.033 0.426 -0.400 0.393 0.740 0.712 0.715 0.431 1.020 0.165 0.187 0.164 0.159 0.135 0.134 6 7 8 9 10 11 0.446 -0.171 1.918 0.836 -0.299 0.540 0.425 0.313 -1.628 -0.045 0.979 0.414 0.260 0.184 0.127 0.143 0.108 0.138 Table 5.1: Estimated parameters for PARMA12 (1, 1) model, from innovations algorithm, using the logarithm of the original data. The resulting negative likelihood value is -362.655. likelihood estimators, we will differentiate −2 log L(β) partially with respect to the parame77 ters. A non-linear optimization algorithm will be applied, using R software to find the MLE. 2 ) as By our earlier assumption, we treat the white noise variances σ 2 = (σ02 , σ12 , . . . , σS−1 nuisance parameters, so that the independent variables in the optimization are the model ˆ we solve parameters β = (φ0 , θ0 , φ1 , θ1 , . . . , φS−1 , θS−1 ) . After we compute the MLE β, 2 ) using the MLE for σ ˆ 2 = (ˆ σ02 , . . . , σ ˆS−1 1 σ ˆi2 = N N −1 ˆ (i) XkS+i − X kS+i 2 /rkS+i = Si /N, k=0 The function optim in R is used, a general-purpose optimization based on Nelder–Mead, quasi-Newton and conjugate-gradient algorithms. It includes an option for box-constrained optimization and simulated annealing. Several different options were explored, using our likelihood function, and the initial values from Table 4. A summary of the performance of the different options is provided in Table 5.2. The BFGS option was found to be the fastest convergent method. This option uses the results published simultaneously by Broyden, Fletcher, Goldfarb and Shanno in 1970 [12], [17], [19] and [38], which uses function values and gradients to approximate the optimization surface. The parameter estimates that resulted using the BFGS algorithm are listed in Table 5.3. 
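The optimization itself reduces to a single call to optim once the negative log likelihood has been coded. The sketch below is illustrative only: negloglik is a hypothetical wrapper that unpacks the 24 parameters (φ0, θ0, . . . , φ11, θ11), runs the recursions of Example 5.1.3 on the log-transformed data and returns −2 log L(β), while beta0 holds the innovations-algorithm estimates used as starting values.

# Sketch: BFGS maximum likelihood fit, with the sigma_i^2 treated as nuisance
# parameters. 'negloglik' and 'beta0' are hypothetical (see the lead-in above).
fit <- optim(par = beta0, fn = negloglik, method = "BFGS",
             control = list(maxit = 500))
beta_hat <- fit$par        # MLE of (phi_0, theta_0, ..., phi_11, theta_11)
fit$value                  # attained value of -2 log L(beta_hat)
fit$convergence            # 0 indicates that optim reports convergence
# With beta_hat in hand, sigma_hat_i^2 = S_i / N follows from (5.1.16), using the
# innovations algorithm output evaluated at beta_hat.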
Table 5.2: A comparison of the different algorithms available in the optim function.

Method         −2 log L(β̂)    Running time    Converged    Depends on initial value
BFGS           −463.8585       137s            Yes          Yes
CG             −505.37         282s            No           Yes
SANN           −362.65         378s            Yes          Yes
Nelder-Mead    1182.61         17.29s          No           Yes

In order to test convergence, we use the result in Table 5.3 as an initial value and run the optimization procedure again. The algorithm converges quickly to the same value of −2 log{L(β̂)} as before.

Table 5.3: MLE results from the BFGS method, with the σi treated as nuisance parameters. The resulting MLE value was −2 log L(β̂) = −463.8585.

Season    θ̂i       φ̂i       σi     |  Season    θ̂i       φ̂i       σi
0         0.586    0.546    0.165  |  6         0.204    0.720    0.260
1         0.154    0.800    0.187  |  7         0.071    0.162    0.184
2         0.119    0.709    0.164  |  8         1.035   −0.690    0.127
3        −0.008    0.679    0.159  |  9         0.398    0.417    0.143
4         0.085    0.738    0.135  |  10       −0.110    0.855    0.108
5        −0.206    0.884    0.134  |  11        0.415    0.558    0.138

Next we plot −2 log{L(β, σ²)} as a function of each σi, for i = 0, 1, . . . , 11, giving the 12 plots shown in Figure 5.1. There is a clear minimum point in each plot, which we locate using the optimize function in R; the minimizing value is denoted σi*, with σ* = (σ0*, σ1*, . . . , σ11*). With these σi* values we run optim again for −2 log{L(β, σ²)}, using the same BFGS algorithm, resulting in a lower negative log likelihood value of −506.6053. For different starting values and different numbers of iterations, the routine converges to the same optimum. See the results in Table 5.4.

Finally, to check our results, we use one-variable optimizations to minimize the negative log likelihood as a function of each individual parameter, using similar techniques to find the minimum points θ* = (θ0*, θ1*, . . . , θ11*) in Figure 5.2 and φ* = (φ0*, φ1*, . . . , φ11*) in Figure 5.3. Since these results are consistent with the MLE results from the BFGS optimization shown in Table 5.4, we are confident that the values reported in Table 5.4 represent the true MLE. We therefore take the results in Table 5.4 as our final optimization results for the model parameters.

Figure 5.1: −2 log{L(β, σ²)} as a function of each σi, i = 0, . . . , 11, with the remaining parameters fixed.
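The twelve one-dimensional searches behind Figure 5.1 can be carried out with R's optimize function. In the sketch below, negloglik_sigma is a hypothetical function returning −2 log{L(β, σ²)} with β fixed at the BFGS estimate, the i-th standard deviation set to s, and the remaining σ's held at their current values; the search interval is likewise an assumption chosen to cover the range shown in the plots.

# Sketch: profile each sigma_i by one-dimensional minimization, as in Figure 5.1.
# 'negloglik_sigma(s, season)' is hypothetical (see the lead-in above).
sigma_star <- numeric(12)
for (i in 0:11) {
  opt <- optimize(function(s) negloglik_sigma(s, season = i),
                  interval = c(0.05, 1.5))
  sigma_star[i + 1] <- opt$minimum
}
# sigma_star is then held fixed while optim(..., method = "BFGS") is rerun,
# which is how the lower value -2 log L = -506.6053 in Table 5.4 was obtained.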
Figure 5.2: −2 log{L(β, σ²)} as a function of each θi, i = 0, . . . , 11, with the remaining parameters fixed.

Figure 5.3: −2 log{L(β, σ²)} as a function of each φi, i = 0, . . . , 11, with the remaining parameters fixed.

Table 5.4: MLE by the BFGS method, using the values of σ* from Figure 5.1. The resulting MLE value was −2 log L(β̂) = −506.6053. We take these parameters as our best optimization results.

Season    θ̂i       φ̂i       σi*    |  Season    θ̂i       φ̂i       σi*
0         0.586    0.546    0.199  |  6         0.204    0.720    0.300
1         0.154    0.800    0.208  |  7         0.071    0.162    0.219
2         0.119    0.709    0.194  |  8         1.035   −0.690    0.153
3        −0.008    0.679    0.171  |  9         0.398    0.417    0.162
4         0.085    0.738    0.166  |  10       −0.110    0.855    0.126
5        −0.206    0.884    0.151  |  11        0.415    0.558    0.154

5.2 Reduced PARMAS (1, 1) model.

To obtain additional parsimony, it is also advisable to consider simpler models in which some statistically insignificant model parameters are set to zero.

5.2.1 Reduced model by asymptotic distribution of ψ̂i(ℓ)

In the discussion of a reduced PARMAS (1, 1) model, by (3.0.8), φ̂t(1) = ψ̂t(2)/ψ̂t−1(1) and θ̂t(1) = ψ̂t(1) − φ̂t(1). In order to obtain the estimates of the PARMA12 (1, 1) model parameters, we therefore need only consider ψ̂i(ℓ) at lags 1 and 2. An α-level test rejects the null hypothesis H0 : ψi(ℓ) = 0 in favor of the alternative Ha : ψi(ℓ) ≠ 0 (indicating that the model parameter is statistically significantly different from zero) if |Z| > zα/2. The p-value for this test is given by (3.0.7), and it provides a way to determine which coefficients in the identified PARMA model are statistically significantly different from zero (those with a small p-value, say, p < 0.05). Coefficients with a higher p-value are shown in bold font in Table 3.1, and we set those coefficients ψ̂i(ℓ) = 0.

Table 5.5: Parameter estimates for the reduced PARMA model (1.0.1) of the average monthly flow series for the Fraser River near Hope, BC, from October 1912 to September 1982. The resulting negative log likelihood value is −351.3939.

i      0      1      2      3      4      5      6      7      8      9      10     11
θi     1.04   0.20  −0.02  −0.03   0.42  −0.40   0.87   0.00  −0.71   0.79  −0.29   0.54
φi     0.00   0.74   0.71   0.71   0.43   1.02   0.00   0.00   1.00   0.00   0.97   0.41
σi     0.16   0.18   0.16   0.15   0.13   0.13   0.26   0.18   0.12   0.14   0.10   0.13
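In code, the screening step described above is a vectorized two-sided Z-test. The sketch below is illustrative: psi and se are hypothetical S × 2 matrices holding the lag-1 and lag-2 estimates ψ̂i(ℓ) and their asymptotic standard errors, and the lag-1 entries are assumed to remain nonzero so that the ratio in (3.0.8) stays well defined.

# Sketch: set insignificant psi_i(l) to zero at level alpha = 0.05, then form the
# reduced PARMA(1,1) parameters via (3.0.8). 'psi' and 'se' are hypothetical
# S x 2 matrices (rows = seasons 0, ..., S-1; columns = lags 1 and 2).
alpha <- 0.05
pval  <- 2 * (1 - pnorm(abs(psi / se)))        # two-sided p-values, as in (3.0.7)
psi_red <- ifelse(pval < alpha, psi, 0)        # keep only the significant coefficients
S <- nrow(psi_red)
prev <- c(S, seq_len(S - 1))                   # row index of season i-1, wrapping around
phi_red   <- psi_red[, 2] / psi_red[prev, 1]   # phi_t(1) = psi_t(2) / psi_{t-1}(1)
theta_red <- psi_red[, 1] - phi_red            # theta_t(1) = psi_t(1) - phi_t(1)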
Using (3.0.8), we then list in Table 5.5 the parameter estimates of the reduced PARMA12 (1, 1) model, where autoregressive coefficients in season 0, 6, 7 and 9 are set to 0. Here we apply innovations algorithm to get our model parameters. Later in this chapter we will shown a different method using MLE. Once we obtain the estimates for the reduced PARMA12 (1, 1) model parameters, we can carry out forecast procedures similar to the full PARMA12 (1, 1) model, as follows. Since autoregressive coefficients in season 0, 6, 7 and 9 are set to 0, the computation is simpler for the Wt process. • Compute the transformed process (2.1.2) using the reduced model parameters. • Compute the sample autocovariance of that process by Proposition 2.1.3. • Apply the innovations algorithm (2.1.9) to get the projection coefficients θn,j . ˆ n for n = 1, 2, . . . , 840 = • Use (2.1.13) to compute the one-step-ahead predictors X 70 × 12. • Apply (2.1.17) to get the forecasts. • Use the asymptotic formula (2.2.3) to compute 95% prediction bounds, based on the assumption of Gaussian innovations. 84 The resulting prediction, along with the 95% prediction bands, is shown in Figure 5.4. The actual data (solid line) is also shown for comparison. Note that the forecast (solid line with dots) is in reasonable agreement with the actual data (which were not used in the forecast), and that the actual data lies well within the 95% prediction bands. Furthermore, we give a detailed comparison in Figure 5.5 between full model and the reduced one. As shown in the first two graphs, the prediction results are quite similar. In a further investigation, we check the difference and relative error between the two forecasts, for every step. The differences are small. In fact, when compared to predictions, the relative errors are small. When the step h ≥ 5, the relative errors are nearly zero. We conclude that the removed autoregressive coefficients in season 0, 6, 7 and 9 are insignificant, and the reduced model is a viable substitute for the full model. We prefer the reduced model, since it has fewer parameters. 5.2.2 Reduced model by asymptotic distribution of MLE Another method for obtaining a reduced model is to apply the asymptotics of MLE for the PARMA process, which were developed in Basawa and Lund [24]. For a causal and invertible Gaussian PARMA model, with the assumption of {εt } being periodic i.i.d. Gaussian noise, ˆ as N → ∞, Basawa and Lund [24, Theorem 3.1] gives the asymptotic distribution of β, N 1/2 (βˆ − β) → N 0, A−1 (β, σ 2 ) , (5.2.1) where A(β, σ 2 ) = S−1 i=0 85 σi−2 Γi (β, σ 2 ), (5.2.2) 11 9 10 Flow (cms) 12 13 Forecasting by reduced model 10/1982 02/1983 06/1983 10/1983 02/1984 06/1984 Month / Year Figure 5.4: A 24-month forecast using the reduced PARMA12 (1, 1) model. Solid line is the original data; Solid line with dots is the reduced PARMA(1,1) forecast; The two dotted lines are 95% Gaussian prediction bounds. The 24-month forecast is from October 1982 to September 1984. and Γi (β, σ 2 ) = E ∂εt (β) ∂εt (β) )( ∂β ∂β . Furthermore, Basawa and Lund [24, Remark 3.1] states that if {εt } is Gaussian, then the maximum likelihood estimate has the same asymptotic distribution as the weighted least ˆ the MLE of σ 2 for squares estimate. 
Once we obtain the maximum likelihood estimate β, i 0 ≤ i ≤ S − 1 could be computed by 1 σ ˆi2 = N N −1 ˆ (i) XkS+i − X kS+i k=0 86 2 /rkS+i = Si /N, (5.2.3) 12 10 9 5 10 15 20 5 10 15 20 Month Difference of forecast between full model and reduced model Relative error between full model and reduced model 0.010 0.002 0.006 0.10 Flow (cms) 0.15 0.014 Month 0.05 Flow (cms) 11 Flow (cms) 11 9 10 Flow (cms) 12 13 Reduced model forecast vs Data 13 Full model forecast vs Data 5 10 15 20 5 Month 10 15 20 Month Figure 5.5: A comparison of 24-month forecasts between the full PARMA12 (1, 1) model and reduced PARMA12 (1, 1) model. In the first two graphs, dotted line is forecast, and solid line is real data. 87 2 ). ˆS−1 where σ ˆ 2 = (ˆ σ02 , . . . , σ Example 5.2.1. We adopt the causal and invertible first-order PARMA12 (1, 1) model from Example 5.1.3, and show how to compute the asymptotic covariance matrix A−1 (β, σ 2 ). diag(A−1 ) The results are shown in Table 5.6, where se = stands for the standard error N 1/2 ˆ for β. In the following, p = q = 1, and S = 12 denotes the number of seasons, and Season 0 1 2 3 4 5 6 7 8 9 10 11 γˆi (0) se(θˆi ) se(φˆi ) 0.06 0.36 0.30 0.07 0.26 0.21 0.06 0.23 0.19 0.05 0.22 0.18 0.05 0.27 0.21 0.05 0.24 0.17 0.16 0.29 0.19 0.08 0.08 0.06 0.11 0.25 0.19 0.05 0.19 0.09 0.03 0.18 0.14 0.03 0.24 0.18 Table 5.6: MLE and its standard error. i = 0, 1, . . . , S − 1, and t denotes the corresponding season index for t, where t =     t − S[t/S] if t ≥ 0,    S + t − S[t/S + 1] if t < 0, and [ · ] stands for the greatest integer function. The model is Xt = φt Xt−1 + εt + θt εt−1 . 88 (5.2.4) Taking partial derivations in (5.2.4) gives ∂ε (β) ∂εt (β) = −e2 t +1 Xt−1 − θt t−1 − e2 t +2 εt−1 (β), ∂β ∂β (5.2.5) where ej denotes a (p + q)S × 1 unit vector whose entries are all zero, except for a 1 in the ∂ε (β) j th row. Note that in the following Xt−1 , θt and εt−1 (β) are scalars, however t and ∂β ej denote the (p + q)S × 1 vectors. Therefore by causality 2 , E Xt−1 εt−1 (β) = E εt−1 (β)Xt−1 = E ε2t−1 (β) = σt−1 by Leibniz integral rule, provided that εt−1 (β) and ∂εt−1 (β) are both continuous, we take ∂β the derivative outside the expectation, E ∂εt−1 (β) ∂ ∂ 2 = 0, Xt−1 = E εt−1 (β)Xt−1 = σt−1 ∂β ∂β ∂β where 0 is a (p + q)S × 1 zero vector. Similarly  E Xt−1 ∂εt−1 (β) ∂β  =0, and E ∂εt−1 (β) ∂ ∂ 2 εt−1 (β) = E εt−1 (β)εt−1 (β) = σt−1 = 0, ∂β ∂β ∂β and similarly  E  ∂εt−1 (β) ∂β εt−1 (β) = 0 . Therefore, if we multiply both sides of (5.2.5) by its own transpose and take an expectation, 89 we have E ∂εt (β) ∂β ∂εt (β) ∂β = E − e2 t +1 Xt−1 − θt ∂εt−1 (β) − e2 t +2 εt−1 (β) ∂β ∂εt−1 (β) − e2 t +2 εt−1 (β) ∂β ∂εt−1 (β) + var(Xt−1 )E2 t +1,2 t +1 ∂β − e2 t +1 Xt−1 − θt ∂εt−1 (β) = θt2 E ∂β 2 E2 t +2,2 t +2 + E2 t +2,2 t +1 + E2 t +1,2 t +2 + σt−1 + θt e2 t +1 0 + 0e2 t +1 + e2 t +2 0 + 0e2 t +2 ∂εt−1 (β) = θt2 E ∂β ∂εt−1 (β) ∂β + var(Xt−1 )E2 t +1,2 t +1 2 + σt−1 E2 t +2,2 t +2 + E2 t +2,2 t +1 + E2 t +1,2 t +2 +O ∂εt−1 (β) = θt2 E ∂β ∂εt−1 (β) ∂β + var(Xt−1 )E2 t +1,2 t +1 2 + σt−1 E2 t +2,2 t +2 + E2 t +2,2 t +1 + E2 t +1,2 t +2 , where var(Xt ) = γt (0) is the variance of Xt , and Ei,j = ei ej denotes a (p + q)S × (p + q)S matrix whose entries are all zero, except for a one in the ith row and j th column, and O is a (p + q)S × (p + q)S zero matrix. Recall that Γt (β, σ 2 ) = E ∂εt (β) ∂β ∂εt (β) ∂β then we obtain Γt (β, σ 2 ) = θt2 Γt−1 (β, σ 2 ) + Mt , 90 , where 2 Mt =σt−1 E2 t +2,2 t +2 + E2 t +2,2 t +1 + E2 t +1,2 t +2 + var(Xt−1 )E2 t +1,2 t +1 . 
For simplicity we adopt the notation Γi (β, σ 2 ) = θi2 Γi−1 (β, σ 2 ) + Mi , (5.2.6) with the boundary condition Γ0 (β, σ 2 ) = ΓS (β, σ 2 ), hence in this way all the index i are bounded, 0 ≤ i ≤ S − 1. The solution to (5.2.6) is 2 Γi (β, σ 2 ) = rθ,i 2 S−1 rθ,S−1 Mk Mk 2 +( )rθ,i , r2 1 − r2 r2 θ,S−1 k=0 θ,k k=0 θ,k i (5.2.7) where t rθ,t = θt , (5.2.8) j=0 the proof for (5.2.7) is given as follows. By (5.2.6), ∀ 0 ≤ i ≤ S − 1, we could write 91 recursively: Γi = θi2 Γi−1 + Mi 2 Γ = θi2 (θi−1 i−2 + Mi−1 ) + Mi 2 (θ2 Γ 2 = θi2 θi−1 i−2 i−3 + Mi−2 ) + θi Mi−1 + Mi 2 θ2 Γ 2 2 2 = θi2 θi−1 i−2 i−3 + θi θi−1 Mi−2 + θi Mi−1 + Mi ... = θi θi−1 · · · θi−(S−1) + θi θi−1 · · · θi−(S−2) 2 2 Γi−S Mi−(S−1) . . . + θi θi−1 · · · θ0 2 M−1 + θi θi−1 · · · θ1 2 M0 + θi θi−1 · · · θ2 2 M1 + . . . + (θi θi−1 )2 Mi−2 + θi2 Mi−1 + Mi 2 (5.2.9) = θi θi−1 · · · θi−(S−1) Γi−S + I + II, where we define I = θi θi−1 · · · θi−(S−2) 2 Mi−(S−1) . . . + θi θi−1 · · · θ0 2 M−1 II = θi θi−1 · · · θ1 2 M0 + θi θi−1 · · · θ2 2 M1 + . . . + (θi θi−1 )2 Mi−2 + θi2 Mi−1 + Mi , since Γi−S = Γi by periodic property, θi θi−1 · · · θi−(S−1) = rθ,S−1 by definition in 92 (5.2.8), we could write (5.2.9) as I + II 1 − r2 θ,S−1 2 2 II(1 − rθ,S−1 ) + I + II(rθ,S−1 ) = 1 − r2 θ,S−1 2 I + II(rθ,S−1 ) . = II + 1 − r2 θ,S−1 Γi = (5.2.10) Note that II = θi θi−1 · · · θ1 2 M0 + θi θi−1 · · · θ2 2 M1 + . . . + (θi θi−1 )2 Mi−2 + θi2 Mi−1 + Mi Mi M0 M1 + ... + = θi θi−1 · · · θ0 2 + (θ0 θ1 )2 (θ0 θ1 · · · θi )2 θ02   2  M0 + M1 + . . . + Mi  = rθ,i r2 r2 r2 θ,0 θ,1 θ,i i Mk 2 = rθ,i . r2 k=0 θ,k (5.2.11) On the other hand, we want to prove that 2 I + II(rθ,S−1 ) 1 − r2 θ,S−1 2 S−1 rθ,S−1 Mk 2 =( )rθ,i , 1 − r2 r2 θ,S−1 k=0 θ,k 93 (5.2.12) in this way we will finish the proof of (5.2.7). Note that 2 2 I + (rθ,S−1 )II = θi θi−1 · · · θi−(S−2) Mi−(S−1) . . . + θi θi−1 · · · θ0 2 M−1 2 Mi M0 + ... + (θ0 θ1 · · · θi )2 θ02 2 = θi θi−1 · · · θi−(S−2) Mi+1 . . . + θi θi−1 · · · θ0 2 MS−1 i Mk 2 2 + rθ,S−1 rθ,i r2 k=0 θ,k MS−1 Mi+1 2 θi · · · θ0 2 = θS−1 · · · θ0 + ... + (θ0 · · · θi+1 )2 (θ0 · · · θS−1 )2 i Mk 2 2 + rθ,S−1 rθ,i r2 k=0 θ,k S−1 i Mk Mk 2 2 2 2 = rθ,S−1 rθ,i + rθ,S−1 rθ,i r2 r2 k=i+1 θ,k k=0 θ,k S−1 Mk 2 2 = rθ,S−1 rθ,i , r2 k=0 θ,k + θS−1 θS−2 · · · θ0 θi θi−1 · · · θ0 2 2 dividing both sides by 1−rθ,S−1 , (5.2.12) is proved. Add (5.2.11) and (5.2.12) into (5.2.10), we complete the proof of (5.2.7). The invertibility of the model requires that |rθ,S−1 | < 1. In the end, A−1 (β, σ 2 ) is a (p + q)S × (p + q)S matrix, and it could be computed by the inverse of A(β, σ 2 ) in (5.2.2). Equation (5.2.1) can be used to produce confidence intervals and hypothesis tests for the model parameters β. An α-level test statistic rejects the null hypothesis H0 : β i = 0 in favor of the alternative hypothesis Ha : β i = 0, indicating that β i is statistically significantly 94 different from zero if |Z| > zα/2 . The p value for this test is p = P (|Z| > |z|), (5.2.13) where Z ∼ N (0, 1), and ˆ − 0)/se. Z = (β After we get the MLE optimization results from Example 5.1.3 and Table 5.4, we do a ˆ φ), ˆ where the standard error se is obhypothesis test for the model parameters βˆ = (θ, tained from Table 5.6. Using the level of significance of α = 5%, we get the reduced model parameters, which achieve the goal of parsimony. The results are shown in Table 5.7. 
Season θˆi p-value reduced θˆi φˆi p-value reduced φˆi 0 1 2 3 4 0.55 -0.01 0.04 -0.02 0.09 0.13 0.96 0.87 0.93 0.74 0.00 0.00 0.00 0.00 0.00 0.37 0.63 0.52 0.54 0.71 0.22 0.00 0.01 0.00 0.00 0.00 0.63 0.52 0.54 0.71 5 -0.26 0.27 0.00 0.86 0.00 0.86 6 0.01 0.98 0.00 1.18 0.00 1.18 7 8 0.37 1.89 0.00 0.00 0.37 1.89 0.23 -1.41 0.00 0.00 0.23 -1.41 9 10 0.27 -0.08 0.16 0.64 0.00 0.00 0.35 0.57 0.00 0.00 0.35 0.57 11 0.32 0.18 0.00 0.37 0.04 0.37 Table 5.7: Model parameters by MLE optimization and their p-values. We obtain the forecast in a similar manner as the previous reduced model. The resulting prediction, along with the 95% prediction bands, is shown in Figure 5.6. The actual data (solid line) is also shown for comparison. The forecast (solid line with dots) is in reasonable agreement with the actual data (which were not used in the forecast), and the actual data lies well within the 95% prediction bands. A detailed comparison can be seen in Figure 5.7 between full model and the reduced one. As shown in the first two graphs, the reduced model predicts as well as the full model. In the next two graphs, we check the difference and 95 relative error between the two forecasts, for every step. The differences are small. In fact, when compared to predictions, the relative errors are quite small. When the step h ≥ 5, the relative errors are nearly zero. We conclude that the removed autoregressive coefficients in Table 5.7 are insignificant, and the reduced model performs as well as the full model. In summary, we prefer the reduced PARMA12 (1, 1) model simply because it is a simpler model with fewer parameters and the same adequacy. 11 9 10 Flow (cms) 12 13 Forecasting by reduced PARMA model by asymptotic distribution of MLE 10/1982 02/1983 06/1983 10/1983 02/1984 06/1984 Month / Year Figure 5.6: A 24-month forecast using the reduced PARMA12 (1, 1) model in Table 5.7. Solid line is the original data; Solid line with dots is the reduced PARMA(1,1) forecast; The two dotted lines are 95% Gaussian prediction bounds. The 24-month forecast is from October 1982 to September 1984. 96 0 100000 Flow (cms) 100000 0 Flow (cms) 250000 Reduced model forecast vs Data 250000 Full model forecast vs Data 5 10 15 20 5 10 15 20 Month Difference of forecast between full model and reduced model Relative error between full model and reduced model 0.08 0.00 0.04 Flow (cms) 4000 0 Flow (cms) 8000 Month 5 10 15 20 5 Month 10 15 20 Month Figure 5.7: A comparison of 24-month forecasts between the full PARMA12 (1, 1) model and reduced PARMA12 (1, 1) model, by asymptotic distribution of MLE. In the first two graphs, dotted line is forecast, and solid line is real data. 97 Chapter 6 Asymptotic normality of PARMA model Let Xn,i = (Xi , ..., Xi+n−1 ) be a causal and invertible PARMAS (p, q) process (1.0.1), where n is the number of observations and i = 0, ..., S − 1. For notations, let φt = (φt (1), . . . , φt (p)) and θ t = (θt (1), . . . , θt (q)) denote the autoregressive and moving-average parameters during season t, respectively. Then we can write the likelihood function as L(β), where we use β = (φ0 , θ 0 , φ1 , θ 1 , . . . , φS−1 , θ S−1 ) to denote the collection of all PARMAS (p, q) parameters. The dimension of β is (p+q)S ×1. Following the ideas from Ba2 ) as nuisance sawa and Lund [24], we treat the white noise variances σ 2 = (σ02 , σ12 , . . . , σS−1 parameters. Then the likelihood function is given by (5.1.13) in Chapter 5. 
By (5.1.16), once ˆ the MLE of σ 2 for 0 ≤ i ≤ S − 1 could be we obtain the maximum likelihood estimate β, i computed by 1 σ ˆi2 = N N −1 ˆ (i) XkS+i − X kS+i k=0 98 2 /rkS+i = Si /N, 2 ) . Equivalently, we could minimize the negative log likelihood ˆS−1 where σ ˆ 2 = (ˆ σ02 , . . . , σ (β) = −2 log{L(β)} n−1 = n log(2π) + n−1 (X log(vj,i ) + j=0 n−1 = n log(2π) + j=0 2 r )+ log(σi+j j,i j=0 n−1 = n log(2π) + ˆ (i) 2 i+j − Xi+j ) vj,i n−1 (X j=0 2 )+ log(σi+j j=0 ˆ (i) 2 i+j − Xi+j ) 2 r σi+j j,i n−1 n−1 (X log(rj,i ) + j=0 j=0 Next we consider the properties of the first derivative (6.0.1) ˆ (i) 2 i+j − Xi+j ) . 2 r σi+j j,i ∂ (β) in a few lemmas. ∂βk Lemma 6.0.2. For k = 1, . . . , p+q, there exist constants C > 0 and s ∈ (0, 1) such that ∂rt,i (β) ≤ C(β)st , ∂βk where t ≥ 1, and 1 ≤ k ≤ p + q. Proof. Note that rt,i (β) = E Yt+i − Yˆt+i 2 ˆ E Xt+i − X t+i = 2 σt+i 2 , where Y t,i = (Yi , . . . , Yt+i−1 ) is the mean zero PARMA process with period S given by p Yt − q at (k)Yt−k = Zt + bt (j)Zt−j j=1 k=1 99 (6.0.2) where {Zt } is a sequence of random variables with mean zero and standard deviation 1. Here the autoregressive parameters at (j) and the moving average parameters bt (j) are assumed to be periodic functions of t with the same period S ≥ 1. By the invertibility assumption, we write ∞ Yt+i + Yt+i−j πt+i (j) = Zt+i , (6.0.3) j=1 where πt+i (j) = πt+i+kS (j) by the periodic property of PARMA model. In the following, we write πt+i (j) = πt+i (j; β) to emphasize the dependence of πt+i (j) on β, and the autocovariance function is ηi (h; β) = Cov(Yi , Yi−h ), By Anderson et al. [6], the best linear predictor of Yt+i is defined as (i) (i) (i) Yˆt+i = φt,1 Yt+i−1 + . . . + φt,t Yi , t ≥ 1, where the vector of coefficients, φt,i = (i) (i) φt,1 , . . . , φt,t , appears in the prediction equations Gt,i φt,i = η t,i , where η t,i = ηt+i−1 (1), ηt+i−2 (2), . . . , ηi (t) , and t Gt,i = ηt+i− ( − j; β) j, =1 is the covariance matrix of (Yt+i−1 , . . . , Yi ) . Proposition 2.1.1 of Anderson et al. [6] shows 100 that if σi2 > 0 for i = 0, 1, . . . , S − 1, then for a causal PARMAS (p, q) process the covariance matrix Gt,i is nonsingular for every t ≥ 1 and i. In this way we can generalize Corollary 5.1.1 in Brockwell and Davis [11] to periodically stationary process, rt,i (β) = ηt+i (0; β) − η t,i G−1 t,i η t,i . (6.0.4) If we multiple Yt+i−1 and take expectations on both sides of (6.0.3), we could obtain ∞ ηt+i−1 (1; β) = − πt+i (j; β)ηt+i−j (j − 1; β), j=1 similarly, if we multiple Yt+i−2 , Yt+i−3 , . . . individually on both sides of (6.0.3) and take expectations, we have ∞ ηt+i−2 (2; β) = − πt+i (j; β)ηt+i−j (j − 2; β), j=1 ∞ πt+i (j; β)ηt+i−j (j − 3; β), ηt+i−3 (3; β) = − j=1 .. . Therefore we could have η ∞,i = −G∞,i π ∞,i , where η ∞,i = ηt+i−1 (1; β), ηt+i−2 (2; β), . . . , ∞ G∞,i = ηt+i− (j − ; β) , j, =1 101 π ∞,i = πt+i (1; β), πt+i (2; β), . . . . Additionally, G−1 ∞,i can be written as G−1 i,∞ = Ti,∞ Ti,∞ , where Ti,∞ = πt+i (k − j; β) ∞ k,j=1 , πt+i (0; β) = 1 and πt+i (j; β) = 0, for j < 0. Furthermore, if we multiply Yt+i on both sides of (6.0.3) and take expectation, we have ηt+i (0; β) = π ∞,i G∞,i π ∞,i + 1. 
(6.0.5) Substituting (6.0.5) into (6.0.4), we have rt,i (β) = 1 + π ∞,i G∞,i π ∞,i − η t,i G−1 t,i η t,i −1 = 1 + η ∞,i G−1 ∞,i η ∞,i − η t,i Gt,i η t,i Therefore ∂rt,i (β 0 ) ∂βk = 2 ∂η ∞,i −1 ∂G∞,i G−1 η G−1 η + η G ∞,i ∞,i ∂β ∞,i ∞,i ∞,i ∞,i ∂βk k −2 ∂η t,i −1 ∂Gt,i G−1 η , η + η G G−1 t,i t,i ∂β t,i t,i ∂βk t,i t,i k where all the terms on the right hand side are evaluated at β = β 0 , and β 0 is the truth. By 102 G−1 t,i η t,i = φt,i , the above equation reduces to   ∂η ∂η ∂rt,i (β 0 ) t,i  ∞,i  = −2  π ∞,i + φt,i  ∂βk ∂βk ∂βk ∂G∞,i ∂Gt,i − π ∞,i π ∞,i − φt,i φ . ∂βk ∂βk t,i In Anderson et al. [6, Corollary 2.2.4], they show that (i) φ → −πt+i (k) as k → ∞, t,k and 2  n−1  2 2  (i)  πt+i (j)  M  , ≤ φt,j + πt+i (j)  πC j≥n j=0 where M = max{ηt+i (0; β), i = 0, . . . , S − 1}. Note that j≥n πt+i (j) 2 is bounded in n. Therefore we could have ∂rt,i (β 0 ) ∂βk  ≤ 2K1  n−1  (i) πt+i (j) + φt,j + πt+i (j)  j≥n j=0  t (i) (i) +K1  πt+i (i)πt+i ( ) − φt,j φ +2 πt+i (j) t, j≥n j, =1  πt+i ( )  , 1 π ∂g(λ;β0 ) dλ and g(λ; β ) is the spectral density matrix of the equivwhere K1 = 2π 0 −π ∂βk alent vector ARMA process with length S. Note that |A| is the determinant of A if A is a 103 matrix. By the Cauchy-Schwarz inequality, ∂rt,i (β 0 ) ∂βk ≤ K2 t1/2 st/2 + K3 πt+i (j) + K4 tst/2 j≥n (6.0.6) ≤ K5 st1 , where K2 , K3 , K4 , K5 are positive constants, 0 < s < 1, and 0 < s1 < 1. This finishes the proof. Lemma  6.0.3. For k = 1, . . . , p+q, n−1/2  n−1 log r t,i + t=0 ∂ ∂βk Proof. Given  2 ˆ p n−1 (Xt+i −Xt +i) ∂rt,i  → 0. 2 t=0 ∂βk rt,i β=βˆ ∂rt,i (β 0 ) 2 ≥ 1, we ≤ C(β){S(β)}t from Lemma 6.0.2 and the fact rt−1,i ∂βk have n−1/2 ∂ ∂βk n−1 ≤ n−1/2 log rt−1,i β=βˆ t=0 n−1 1 2 t=0 rt−1,i ∂rt−1,i ∂βk n−1 ∂r t−1,i ∂βk β=βˆ t=0 n−1 −1/2 st1 ≤ C1 n t=0 → 0. ≤ n−1/2 Therefore n−1/2 ∂ ∂βk n−1 p → 0. log rt−1,i β=βˆ t=0 104 β=βˆ On the other hand,   n−1 (X 2 ˆ  t+i − Xt+i ) ∂rt,i I(A) 2 ∂βk rt,i t=0 β=βˆ   n−1 (X ˆ −X )2 t t+i t+i  ≤ C1 E s1 I(A) 2 r t,i t=0   2 2 n−1 σ r t+i t,i t = C1  s1 I(A) 2 r t,i t=0  n−1/2 E  n−1 ≤ CC1 st1 t=0 → 0, where I(·) is the indicator function and C > 0 is a constant, A is an event with the probability arbitrarily close to 1 and on which βˆ − β 0 arbitrarily small for all large n, where we adopt the strong consistency for MLE of a vector ARMA model from Dunsmuir and Hannan [15], and apply it for PARMA model. n−1/2 n−1 (X t=0 2 ˆ t+i − Xt+i ) ∂rt,i 2 ∂βk rt,i p → 0. β=βˆ Remark 6.0.4. From Section 2 of Basawa and Lund [24], the PARMA model (1.0.1) has the S−variate vector ARMA representation → − Φ0 X N − p∗ → − − Φk X N −k = Θ0 → ε0+ k=1 q∗ k=1 105 − Θk → ε N −k , (6.0.7) → − → − − where { X N } and {→ ε N } are the S−variate series { X N } = XN S+1 , . . . , XN S+S and − {→ ε N } = εN S+1 , . . . , εN S+S . The model orders in (6.0.7) are p∗ = [p/S] and q ∗ = [q/S], where [x] denotes the smallest integer greater than or equal to x. Remark 6.0.5. In Dunsmuir and Hannan [15], the following assumption is needed for the strong consistency of MLE of the vector ARMA model. They restricted the elements of Φj , 1 ≤ j ≤ p∗ and Θj , 1 ≤ j ≤ q ∗ to lie in a ball specified by   tr  p∗ q∗ Φj Φj + 1 1   Θj Θj  < ∞. (6.0.8) With condition (6.0.8), Dunsmuir and Hannan [15, Theorem 3] gave the strong consistency of the MLE of the vector ARMA model. Remark 6.0.6. By Section 3 of Basawa and Lund [24], we could work in the univariate PARMA setting rather than transform to an S−variate vector ARMA model. This is eligible by two reasons. 
First of all, one has to invert the S-variate ARMA transformation to obtain the individual PARMA model cofficients. Hence the results derived directly in terms of the univariate PARMA model will be more readily usable. Second of all, even though one could obtain a standard vector ARMA model, the covariance matrix of the vector noises and the moving average parameters would still depend both the PARMA autoregressive parameters and the variance of vector ARMA model. For the above reasons, we will work directly in the PARMA setting. Therefore we adopt the strong consistency for MLE of a vector ARMA model from Dunsmuir and Hannan [15], and apply it for PARMA model. 106 Lemma 6.0.7. For k = 1, . . . , p + q, n−1/2 n−1 X ˆ ˆ p t+i − Xt+i + Zt+i ∂(Xt+i + Zt+i ) → 0, rt,i ∂βk β=βˆ t=0 n−1/2 n−1 X ˆ ˆ p t+i − Xt+i − Zt+i ∂(Xt+i − Zt+i ) → 0. rt,i ∂βk β=βˆ t=0 and Proof. We only prove n−1/2 ˆ ˆ n−1 Xt+i −Xt+i −Zt+i ∂(Xt+i −Zt+i ) rt,i t=0 ∂βk p → 0, since β=βˆ the other result could be proved in a similar way. By the invertible assumption of PARMAS (p, q) model, ∞ πt (j)Xt−j , Zt = j=0 and from the model set-up in (1.0.1), we can write φt (z) = 1 − φt (1)z − . . . − φt (p)z p , and θ t (z) = 1 + θt (1)z + . . . + θt (q)z q . Additionally, π t (z) = φt (z) =1+ θ t (z) ∞ πj z j . j=1 Notice that β = (φt (1), φt (2), . . . , φt (p), θt (1), θt (2), . . . , θt (q)) , and when k = 1, . . . , p, we 107 have ∞ −∂φt (z)/∂βk zk ∂ (φt (z)/θ t (z)) ∂π t (z) ∂πt (j) j = =− =− =− z , θ t (z) θ t (z) ∂βk ∂βk ∂βk j=1 and when k = 1, 2, . . . , q, we have ∂ (π t (z)) = ∂βp+k ∞ ∂πt (j) j ∂ (φt (z)/θ t (z)) φ (z) ∂θ t (z) φ (z) z = =− t = − t zk . ∂β ∂βp+k θ 2t (z) ∂βp+k θ 2t (z) j=1 p+k In summary,     zk ∞ ∂πt (j) z j   θ (z) = − j=1 ∂β t k  φt (z) k  ∞ ∂πt (j) j    θ 2 (z) z = − j=1 ∂βp+k z . t (6.0.9) Then we can prove from (6.0.9) that there exist constants C > 0 and s ∈ (0, 1) such that ∂πj ∂βk ≤ Csj , j ≥ 1, 1 ≤ k ≤ p + q. (i) (i) Furthermore, we may expand the Yule-Walker equation Γn,i φn = γ n as                  γi+n−1 (1)        γi+n−2 (2)        = γi+n−3 (3)      ..     .     γi (n)  (i) φn,1 (i) φn,2 (i) φn,3 .. . γi+n−1 (0) γi+n−1 (−1) . . . γi+n−1 (−n + 1)      γi+n−2 (1) γi+n−2 (0) . . . γi+n−2 (−n + 2)      γi+n−3 (2) γi+n−3 (1) . . . γi+n−3 (−n + 3)    .. .. ..   . . ... .   (i) γi (n − 1) γi (n − 2) ... γi (0) φn,n 108         ,       then for 1 ≤ j ≤ n, we have n (i) φ γi+n−j (j − ), n, γi+n−j (j) = =1 therefore for 1 ≤ k ≤ p + q, ∂γi+n−j (j) ∂βk n ∂γ n ∂φ(i) (j − ) (i) i+n−j n, = φ + γ (j − ). n, ∂βk ∂βk i+n−j =1 =1 (6.0.10) On the other hand, ∞ Zn+i = Xn+i + πn+i ( )Xn+i− , =1 if we multiply Xn+i−j on both sides and take expectations, we can obtain ∞ πn+i ( )γn+i−j (j − ), γn+i−j (j) = =1 then similarly for 1 ≤ k ≤ p + q, ∂γn+i−j (j) ∂βk = ∞ ∂π ∞ ∂γn+i−j (j − ) n+i ( ) γ (j − ) + πn+i ( ) . n+i−j ∂βk ∂βk =1 =1 (6.0.11) If we subtract (6.0.10) from (6.0.11), we have n 0 = ∂γi+n−j (j − ) (i) ∂γi+n−j (j − ) πi+n ( ) − φ + πi+n ( ) (6.0.12) n, ∂βk ∂βk =1 >n   (i) n ∂φ ∂πi+n ( ) n,   ∂πi+n ( ) − γi+n−j (j − ).  γi+n−j (j − ) +  ∂βk ∂βk ∂βk >n =1 109 Furthermore, we may write (6.0.12) as  (i) n ∂φ ∂π ( )  n, − i+n  γi+n−j (j − )  ∂βk ∂βk =1 n (i) ∂γi+n−j (j − ) = πi+n ( ) − φ n, ∂βk =1 ∂γi+n−j (j − ) ∂πi+n ( ) + πi+n ( ) + γi+n−j (j − ) . 
∂βk ∂βk >n  (6.0.13) Consequently, we write (6.0.13) as ∂ π n,i − φn,i ∂βk ∂Γn,i = Γ−1 n,i ∂β (φn,i − π n,i ) + dn,i , k where dn,i is an n × 1 vector with πi+n ( ) >n ∂γi+n−j (j − ) ∂βk + ∂πi+n ( ) γi+n−j (j − ) ∂βk as its j-th component. Then under the assumption of Z−t = X−t = 0 for all t ≥ 0 we have Zt+i = θ(B)−1 φ(B)Xt+i = π(B)Xt+i ∞ = Xt+i + Xt+i−j πt+i (j) j=1 n−1 = Xt+i + Xt+i−j πt+i (j). j=0 110 Now it follows from (6.0.13) that 2 (i) ˆ  ∂ Xt+i − Zt+i   E   ∂βk   (i) (i)  ∂ φt,1 Xt+i−1 + . . . + φt,t Xi − Xt+i − =E   ∂βk 2 n−1 X j=0 t+i−j πt+i (j)    2 (i) n−1 ∂ φt,j − πt+i (j)  =E  Xt+i−j    ∂βk j=0  = = ≤ ∂ φt,i − π t,i ∂βk Γt,i ∂ φt,i − π t,i ∂βk ∂Γt,i φt,i − π t,i + dn,i ∂βk 2 λmin (Γt,i ) ∂Γt,i Γ−1 φt,i − π t,i + dt,i t,i ∂β k ∂Γt,i φt,i − π t,i ∂βk  2+ d 2 t,i 2  ∂πt+i ( ) 2max{α, γi (0)}t  2   ≤ πt+i ( ) +  φt,i − π t,i +   λmin (Γt,i ) ∂βk >t  ≤C1 st1 , where λmin (Γt,i ) > 0 denotes the minimum eigenvalue of Γt,i , and 2 π ∂ θ(eiw )/φ(eiw ) α −π ∂βk dw ≥ ∂γi (j) , ∂βk for j ≥ 1 and 1 ≤ k ≤ p + q, and C1 > 0 and s1 ∈ (0, 1) are some constants, which depend on β ∈ B continuously. Additionally, a detailed discussion of spectral density for PARMA 111 model could be seen in Wyloma´ nska [56]. Next using the similar argument as in the proof of Lemma 6.0.3, we may show that ˆ ∂(X t+i − Zt+i ) I(A) E ∂βk 2 β=βˆ ≤ C2 st2 , t ≥ 1, 1 ≤ k ≤ p + q, where A is an event with the probability arbitrarily close to 1 and on which βˆ − β 0 arbitrarily small for all large n, and C2 > 0 and S2 ∈ (0, 1) are some constants. Hence   ˆ ˆ n−1 E (Xt+i −Xt+i −Zt+i ) ∂(Xt+i −Zt+i ) I(A) rt,i t=0 ∂βk ˆ β=β n−1/2 1/2 2 ˆ ∂(Xt+i −Zt+i ) n−1 E X ˆ I(A) t+i − Xt+i (β) − Zt+i (β) E t=0 ∂βk ˆ β=β t/2 Cn−1/2 n−1 t=0 s2  ≤ n−1/2 ≤ 2 → 0, where C > 0 is a constant. Thus n−1/2 n−1 (X ˆ ˆ t+i − Xt+i − Zt+i ) ∂(Xt+i − Zt+i ) → 0. rt,i ∂βk ˆ β=β t=0 In a similar manner, we could prove n−1/2 n−1 X ˆ ˆ p t+i − Xt+i + Zt+i ∂(Xt+i + Zt+i ) → 0, rt,i ∂βk β=βˆ t=0 and this ends the proof. We will show that the asymptotic variance of the estimated periodic AR and MA coeffi112 cient vector can be nicely represented in terms of the two PAR models. Define φt (B)ξt = Wt , and θt (B)ςt = Wt, where {Wt } ∼ WN(0, 1) is a white noise process with mean 0 and variance 1. Let ξ = ξ−1 , ξ−2 , . . . , ξ−p , ς−1 , . . . , ς−q , and W(β) = W(φt , θt ) = {Var(ξ)}−1 . Before we state our next lemma, we need to clarify notation. By the invertible assumption of PARMA model (1.0.1), we may write εt+i = εt+i (β) = Xt+i −φt (i)Xt+i−1 −. . . φt (p)Xt+i−p −θt (1)εt+i−1 . . . −θt (q)εt+i−q . Then for k = 1, . . . .p, we may write ∂ε (i) U = − t+i , tk ∂βk and for k = 1, 2, . . . , q, V ∂ε (i) = − t+i . tk ∂βp+k Let χ = (X, Z) and τ = (U , V ), where X and U are n × p matrices with Xi+j− (i) and U as their (j, )-th elements, respectively, and Z and V are n × q matrices with j 113 (i) εi+j− and V as their (j, )-th elements respectively. Denote R as the diagonal matrix j diag r0,i (β0 ), . . . , rn−1,i (β0 ) . P Lemma 6.0.8. n−1 τ R−1 τ → W∗ (β) = {Var(σ t ξ t )}−1 . (i) (i) Proof. In the following proof, all U ,V ,Zt+i , and rt,i are evaluated at β = β 0 . We tk tk adopt the notations that (i) (i) B k Utj = U , t−k,j and (i) (i) B k Vtj = V . t−k,j From the invertible representation, we have (i) Utj = θt−1 (B)Xt+i−j , and (i) Vtj = θt−1 (B)Zt+i−j = φt (B)θt−2 (B)Xt+i−j, assuming that X−t = Z−t = 0 for all t ≥ 0. 
Let     1/θt (z) = 1 − j≥k ψt (j)z    φt (z)/θ2 (z) = 1 − t 114 j j j≥1 ηj z , then (i) Utj =Xt+i−j − t+i−j−1 ψt (j)Xt+i−j−k k=1 =φt (B)−1 Zt+i−j + ψt (k)Xt+i−j−k (6.0.14) k≥t+i−j ˜(i) (i) =Utj + utj , and (i) Vtj =Xt+i−j − t+i−j−1 ηk Xt+i−j−k k=1 =θt (B)−1 Zt+i−j + ηk Xt+i−j−k (6.0.15) k≥t+i−j ˜(i) (i) =Vtj + vtj . Note that  2 ∞ 2 (i) = E E utj ψt (k)Xt+i−j−k  k≥t+i−j ∞ ≤ γt+i−j−k (k − )ψt (k)ψt ( ) =1,k≥t+i−j ∞ 2 ≤ M ψt+i−j−k k=1 ≤ Cst−j , (6.0.16) for t ≥ 1, and 1 ≤ j ≤ p, where 0 < s < 1 and M = max{γi (0) : i = 0, 1, . . . , S − 1}. 115 Similarly, we could have (i) 2 E vtj ≤ Cst−j , for t ≥ 1, and 1 ≤ j ≤ q. Notice that the (j, )−th element of n−1 τ R−1 τ is 1 n n−1 t=0 1 (i) (i) Utj U /rt,i = t n n−1 t=0 ˜(i) ˜(i) ˜(i) (i) (i) ˜(i) (i) (i) Utj U + Utj u + utj U + utj u /rt,i . t t t t By the ergodic theorem, under the assumption of a measurable density function for PARMA process, 1 n n−1 ˜ ˜ (i) (i) Utj U t t=0 = a.s. → = Cov = n−1 −1 [φ−1 t (B)εt+i−j ][φt (B)εt+i− ] t=0 ˜(i) ˜(i) Cov(Utj U ) t 1 n −1 φ−1 t (B)Zt+i−j , φt (B)Zt+i− (σt+i−j σt+i− )Cov(ξt+i−j , ξt+i− ). By Corollary 5.1.2 |rt,i − 1| → 0 as t → ∞, we have 1 n n−1 ˜ ˜ (i) (i) a.s. Utj U /rt,i → (σt+i−j σt+i− )Cov(ξt+i−j , ξt+i− ). t t=0 116 By Cauchy-Schwartz inequality and (6.0.16), 1 n n−1 t=0 1 (i) (i) E utj u /rt,i ≤ t n 1 ≤ n ≤ n−1 t=0 n−1 C n (i) (i) E utj u t t=0 n−1 1/2 (i) 2 (i) 2 E utj E u t s2t−j− t=0 → 0. Then 1 n n−1 t=0 p (i) (i) utj u /rt,i → 0. t With a similar method, we may also show that 1 n n−1 ˜ p (i) (i) Utj u /rt,i → 0, t t=0 and 1 n n−1 t=0 p (i) ˜(i) utj U /rt,i → 0. t Therefore for 1 ≤ j, ≤ p, 1 n n−1 t=0 p (i) (i) Utj U /rt,i → (σt+i−j σt+i− )Cov(ξt+i−j , ξt+i− ). t We can prove in a similar manner that for 1 ≤ j ≤ p and 1 ≤ ≤ q, 1 n n−1 t=0 p (i) (i) Utj V /rt,i → (σt+i−j σt+i− )Cov(ξt+i−j , ςt+i− ), t 117 (6.0.17) and 1 ≤ j, ≤ q, 1 n n−1 t=0 p (i) (i) Vtj V /rt,i → (σt+i−j σt+i− )Cov(ςt+i−j , ςt+i− ). t Putting the above three equations together, we have completed the proof. (i) Lemma 6.0.9. Let τ = (U , V ), U is an n × p matrix with U as its (j, )-th elej (i) ments, respectively, and V is an n × q matrix with V as their (j, )-th elements. Both j U and V follow the assumption of second finite moment. R denotes the diagonal matrix diag r0,i (β0 ), . . . , rn−1,i (β0 ) , and Z = Zt+i (β0 ), Zt+i+1 (β0 ) . . . , Zt+n−1 (β0 ) , then D n−1/2 τ R−1 Z → N 0, σt2 W ∗ (β 0 )−1 . Proof. Define (i) U˜t = (i) (i) (i) (i) U˜t1 , . . . , U˜tp , V˜t1 , . . . , V˜tp and (i) ut = (i) (i) (i) (i) ut1 , . . . , utp , vt1 , . . . , vtq , (i) (i) (i) (i) where U˜tj , V˜tj , utj and vtj are defined in (6.0.14) and (6.0.15). From the model invertible assumption, we may write Zt = εt + zt , where zt = − πt (j)Xt−j , and π t (z) = j≥t 118 φt (z) = θ t (z) 1+ ∞ π z j . then j=1 j n−1/2 τ R−1 Z = n−1/2 n−1 (i) (i) U˜t + ut εt + zt rt,i (6.0.18) t=0 n−1 U˜ (i) ε + U˜ (i) z + u(i) ε + u(i) z −1/2 t t t t t t t t. = n rt,i t=0 Using similar method as (6.0.16), we may obtain that for all t ≥ 1, E zt2 ≤ Cst , where C > 0, and s ∈ (0, 1) are some constants. Additionally based on the same argument as (6.0.17), we may obtain that n−1/2 n−1 U˜ (i) z + u(i) ε + u(i) z P t t t t t t → 0. rt,i t=0 (6.0.19) (i) (i) Let Ft be the σ−algebra generated by {εt+i−k, k ≥ 0}, then {α U˜t εt /rt,i } are martin(i) (i) gale differences with respect to {Ft }, for any α ∈ Rp+q . Furthermore, α U˜t εt /rt,i is (i) Ft −measurable and E (i) (i) α U˜t εt /rt,i |Ft−1 = 119 (i) α U˜t /rt,i Eεt = 0. 
Further for any ε > 0, 1 n ≤ 1 n n−1 2 (i) α U˜t εt /rt,i I E 2 (i) α U˜t εt I t=0 n−1 (i) α U˜t εt /rt,i (i) |Ft−1 (i) α U˜t εt > n1/2 ε t=0 I 1 ≤ n E (i) α U˜t > log n + I n−1 (i) 2 2 ˜ σt α Ut I (i) α U˜t ≤ log n (i) |Ft−1 (i) (i) 2 ˜ ˜ α Ut > log n + α Ut E ε2t I n1/2 ε |εt | > log n t=0 ∼σt2 E (i) 2 I α U˜1 (i) 2 + E α U˜1 E ε21 I (i) α U˜1 > log n |ε1 | > n1/2 ε log n →0. (i) The last limit follows from the fact that both εt and α U˜1 have finite second moments. Note that since rt,i → 1 as t → ∞, 1 n n−1 (i) α U˜t εt /rt,i 2 ∼ t=0 1 n n−1 2 (i) α U˜t εt t=0 2 (i) a.s. → E α U˜t εt (i) 2 = σt2 E α U˜t = σt4 α W (β 0 )−1 α = σt2 α W ∗ (β 0 )−1 α. 120 Then it follows from Theorem 4 of p. 511 of Shiryaev [40] that n−1/2 n−1 (i) α U˜t εt /rt,i d →N 0, σt2 α W ∗ (β 0 )−1 α , (6.0.20) t=0 for any α ∈ Rp+q . Now the limit n−1/2 n−1 D τ R−1 Z → N 0, σt2 W ∗ (β 0 )−1 t=0 follows from (6.0.20), (6.0.18) and (6.0.19). Theorem 6.0.10. Let Xt be the PARMAS (p, q) process defined by (1.0.1), and suppose that the vector of true parameter values β 0 ∈ B, where B is the parameter space containing all β. Then as n → ∞, D ˆ −β )→ n1/2 (β N 0, W ∗ (β 0 ) . 0 (i) X − Xˆ , where H (i) is given as ˆ ,...,X ˆ Proof. Let Xˆn = X n n i i+n−1 , then Xn = H         (i) H =        1 0 ... 0    (i) h11 1 0 ... 0     (i) (i) , h22 h21 1 ... 0     (i) (i) (i) h33 h32 h31 ... 0    (i) (i) (i) hn−1,n−1 hn−1,n−2 hn−1,n−3 . . . 1 121 0 (i) where the coefficients h22 depend on i. Notice that Consequently, Xn Σ(β)−1 Xn = n−1 2 ˆ (i) Xt+i − X t+i σt2 rt−1,i t=0 where |Σ(β)| = , S−1 σ 2 N , with N = n/S. Now it follows from (6.0.1) that i=0 i n−1 (β) = −2 log L(β) = n log(2π) + n + S−1 log(rj,i ) + N j=0 where N −1 ˆ (i) XkS+i − X kS+i k=0 rkS+i Si = log(Si /N ), i=0 2 . ˆ is the solution of the equation ∂ (β) = 0, and for 1 ≤ k ≤ p, the equality Note that β ∂β ∂ (β)| = 0 leads to ∂β β=βˆ n−1 Z (β)U ˆ (i) (β) ˆ t (i) tk 0 = +δ (6.0.21) k ˆ rt,i (β) t=0   (i) p q n−1 Uˆ (β 0 ) ˆ (j)X ˆ (p + i)Z ˆ − β ) + δ (i) , Xt −  tk = β β + η k (β t t 0 t−j − t−i (β 0 ) k rt,i t=0 j=1 i=1 122 where X−j = Z−j = 0 for all j ≥ 0, and n−1 ˆ )2 ∂rt,i σt2 ∂ n−1 1 (Xt − X (i) t δ = log rt,i − k 2 2 ∂βk 2 ∂βk r t,i t=0 t=0 n−1 ˆ + Z ∂(X ˆ +Z ) X −X ˆ − Z ∂(X ˆ −Z ) 1 Xt − X t t t t + t t t t t − 2 rt,i ∂βk rt,i ∂βk t=0 ˆ β=β , (6.0.22) and n−1 U (i) (β ) q (i) tk n ˆ β Ut−j (β n ) ηk = p+j r (β ) t=0 t−1 n j=1     (i) q p n−1 U ˆ  ∂  tk  Xt − βˆj Xt−j − β + p+j Zt−i (β n ) ∂β r t−1 t=0 j=1 j=1 β=β n   (i) n−1 n−1 U (i) (β ) q ∂  Utk  0 (i) tk Zt (β 0 ) β p+j Ut−j (β 0 ) + = rt−1 (β 0 ) ∂β rt−1 t=0 t=0 j=1 β=β 0 ˆ − β ). +Op (n β 0 (i) (i) (i) (i) Ut1 , . . . , Utp , Vt1 , . . . , Vtq , and β n is always between ∂ ˆ and β . Similarly, the equation β (β)| 0 ˆ = 0 (1 ≤ k ≤ q) leads to ∂βp+j β=β In the above expression, U t = n−1  p ˆ X β j t−j − Xt − 0= t=0 +η  q j=1 j=1 ˆ  (i) β p+j Zt−i (β 0 ) Vtk (β 0 )rt−1 (β 0 ) (i) (i) ˆ (i) (β − β 0 ) + δ , p+k p+k 123 where δ n−1 n−1 ˆ )2 ∂rt,i σt2 ∂ 1 (Xt − X t log rt,i − 2 2 ∂βp+k 2 ∂βp+k rt,i t=0 t=0 n−1 ˆ + Z ∂(X ˆ +Z ) X −X ˆ − Z ∂(X ˆ −Z ) 1 Xt − X t t t t + t t t t t − 2 rt,i ∂βp+k rt,i ∂βp+k t=0 (i) = p+k ˆ β=β , (6.0.23) and   (i) n−1 U (i) (β ) q n−1 V ∂  tk  (i) (i) tk 0 η = β p+j Ut−j (β 0 ) + Zt (β 0 ) p+k r (β ) ∂β rt−1 t=0 t−1 0 j=1 t=0 β=β 0 ˆ − β ). +Op (n β 0 It follows from (6.0.21) and (6.0.23) that ˆ = U R−1 Y + A (β ˆ − β ) + δ (i) , U R−1 X β 0 (6.0.24) where (i) (i) δ (i) = (δ1 , . . . 
, δp+q ) , and A is the (p + q) × (p + q) matrix with η (i) as its k-th column. Note that Y − Xβ 0 = Z k and  q    U =X− β p+j    j=1 124  U −j (β 0 ) .. . U n−1−j (β 0 )    .   By (6.0.21), (6.0.23) and (6.0.24), we could obtain ˆ − β ) = U R−1 Z + B (i) U R−1 U (β 0 ˆ − β ) + δ (i) , (β 0 where B (i) is the (p + q) × (p + q) matrix, and the sum of last two terms on the right hand side of (6.0.23) is the (p + k)-th term for B (i) , with k = 1, . . . , q. Therefore,  ˆ −β ) =  n1/2 (β 0 = U R−1 U n U R−1 U n − B (i) −1 n −1 U R−1 Z − δ (i)  n1/2 U R−1 Z + op (1), n1/2 (i) p where the last equality follows from Lemma 6.0.3 and Lemma 6.0.7, and the fact that Bn → 0, with a similar proof as (6.0.17). Then the theorem follows from Lemma 6.0.9. 125 Chapter 7 Periodic AIC for PARMAS (p, q) model 7.1 Kullback-Liebler (K-L) Information The development of the AIC is predicted on the Kullback-Liebler (K-L) information between two probability density functions f and g, where K-L information is defined to be I(f, g) = f (t) ln f (t) g(t) dt. The notation I(f, g) denotes a measure of the information lost when g is used to approximate f , which is also the expectation of the logarithmic difference between the two density functions f and g. Example 7.1.1. Suppose we approximate the normal distribution given by f (t|µ, σ 2 ) = 1 t−µ 2 1 t−ξ 2 √1 e− 2 ( σ ) with g(t|ξ, τ 2 ) = √1 e− 2 ( τ ) . Then σ 2π τ 2π ln f (t) g(t) 1 τ2 t−µ 2 t−ξ 2 = {ln −( ) +( ) }, 2 σ τ σ2 126 and the K-L information is 1 τ2 t−µ 2 t−ξ 2 ln − E( ) + E( ) 2 σ τ σ2 1 τ2 σ 2 + (µ − ξ)2 = ln −1+ . 2 σ2 τ2 I(f, g) = Ef ln f (t) g(t) = If the true distribution f is the standard normal and g ∼ N(0.1, 1.5), then 1.5 1 + (0 − 0.1)2 1 −1+ } = 0.0394. I(f, g) = {ln 2 1 1.5 Proposition 7.1.2. I(f, g) ≥ 0. Proof. For a convex function, C, Jensen’s Inequality from Durrett [16, Theorem 1.5.1] asserts that C [E(X)] ≤ E (C(X)). Therefore by letting C = − ln(t), we write I(f, g) = = = = ≥ = f (t) dt g(t) g(t) − f (t) ln dt f (t) g(t) dt f (t)C f (t) g(t) Ef C f (t) g(t) C Ef f (t) g(t) − ln f (t) dt f (t) f (t) ln = − ln = − ln 1 = 0. 127 g(t)dt 7.2 Derivation of periodic AIC for PARMAS (p, q) process Following the ideas from Basawa and Lund [24], we treat the white noise variances σ 2 = 2 ) as nuisance parameters. We use β = (φ , θ , φ , θ , . . . , φ (σ02 , σ12 , . . . , σS−1 0 0 1 1 S−1 , θ S−1 ) to denote the collection of all PARMAS (p, q) parameters. The dimension of β is (p+q)S ×1. ˆ (i) depend on β. We Then the likelihood function is given by (5.1.13), where vj,i and X i+j also need the asymptotic results of MLE for PARMA process in Basawa and Lund [24]. For a causal and invertible Gaussian PARMA model, with the assumption of {εt } being periodic i.i.d. Gaussian noise, Theorem 3.1 in Basawa and Lund [24] gave the asymptotic distribution ˆ of β, N 1/2 (βˆ − β) → N 0, A−1 (β, σ 2 ) , (7.2.1) where A(β, σ 2 ) = S−1 σi−2 Γi (β, σ 2 ), (7.2.2) i=0 and Γi (β, σ 2 ) = E ∂εt (β) ∂β ∂εt (β) ∂β , where the right hand side also depends on β and σ 2 by (5.2.7). However, their proof for the asymptotic distribution is based on the asymptotic equivalence of lease square estimator and MLE. In Chapter 6 of my thesis, I gave a direct proof in Theorem 6.0.10. ˆ the MLE of σ 2 for 0 ≤ i ≤ S − 1 Once we obtain the maximum likelihood estimate β, i 128 could be computed from (5.1.16), where 1 σ ˆi2 = N N −1 ˆ XkS+i − X kS+i 2 /rkS+i = Si /N, k=0 ˆ where X kS+i and rkS+i come from the innovations algorithm applied to the model. 
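Referring back to Example 7.1.1, the Gaussian K-L information is easy to check numerically. The following short R snippet is a minimal sketch (the helper name kl_normal is ours and not part of any package); it reproduces the value 0.0394 for f the standard normal density and g the N(0.1, 1.5) density.

# K-L information I(f,g) between two univariate normal densities
# f = N(mu1, v1) and g = N(mu2, v2), where v1 and v2 are variances:
# I(f,g) = (1/2) * ( log(v2/v1) - 1 + (v1 + (mu1 - mu2)^2)/v2 )
kl_normal <- function(mu1, v1, mu2, v2)
{
  0.5 * (log(v2 / v1) - 1 + (v1 + (mu1 - mu2)^2) / v2)
}
kl_normal(0, 1, 0.1, 1.5)   # 0.0394, as computed in Example 7.1.1
kl_normal(0, 1, 0, 1)       # 0, since I(f,g) = 0 when g = f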
Now we are ready to discuss the derivation of AIC. In the K-L information, f represents the true probability distribution and g represents a model distribution that estimates f . In the context of PARMA time series modeling, we assume that the truth f and all approximating alternatives g are Gaussian. Additionally, define it as g(t|β0 ). Suppose X is an n-length PARMAS (p, q) time series whose probability density is given by f (t) = g(t|β0 ), where X = (Xi , Xi+1 , . . . , Xi+n−1 ) . To see how we use the K-L information to determine which model, g(t|β), best fits the truth f , so that it would minimize I(f, g), we write I(f, g) = = f (t) ln( f (t) )dt g(t|β) f (t) ln(f (t))dt − f (t) ln(g(t|β))dt. (7.2.3) For all models g, the first integral on the right-hand side of (7.2.3) is a constant, so it suffices to maximize f (t) ln(g(t|β))dt. Given that we have data Y = (Yi , Yi+1 , . . . , Yi+n−1 ) as ˆ ), since a sample from the same truth f (t), the logical step would be to find the MLE β(Y ˆ ) approximates βˆ that minimizes the K-L information. Then we compute an estimate β(Y 0 of I(f, g(t|β0 )) as ˆ )) = I(f, g(t|β(Y f (t) ln( f (t) )dt. ˆ )) g(t|β(Y ˆ ) other than β results in I(f, g(t|β(Y ˆ ))) > Since f = g(t|β0 ), any value of β(Y 0 129 I(f, g(t|β0 )). Consider the method of repeated sampling as a guide to inference, and minimize the K-L ˆ )))]. We therefore want to select the model g that minimizes discrepancy EY [I(f, g(t|β(Y ˆ )))] = EY [I(f, g(t|β(Y f (t) ln(f (t))dt − EY ˆ )) dt f (t) ln g(t|β(Y ˆ ) = constant − EY EX ln g(t|β(Y = constant − T, where all expectations are taken under the assumption that f is the density of X and Y , and X and Y are independent. Hence, the K-L information criterion for selecting the best model, g, is to maximize the objective function denoted by ˆ ) T = EY EX ln g(t|β(Y . (7.2.4) ˆ ) We have assumed that f (t) has the true PARMAS (p, q) model structure, and g t|β(Y ˆ ) → β as N → ∞. Note that βˆ is not necessarily is an estimate of f (t). By (7.2.1), β(Y equal to βˆ0 or even of the same dimension. Here βˆ is the parameter vector for the PARMA model under consideration with the smallest K-L discrepancy from the true model. Without loss of generality we take the likelihood of β(X) as g(t|β(X)) = LX (β(X)), by simply then interpreting g as a function of β(X) given X, which equals the likelihood under data X. Similarly, under data Y we write g(t|β(Y )) = LY (β(Y )). In the following ˆ )))] would be ln(L (β(X))), ˆ we will show that our estimate of T = EY EX [ln(LY (β(Y X 130 where the data X = (Xi , Xi+1 , . . . , Xi+n−1 ) are given. The bias of this estimate is ˆ bias = EX [ln(LX (β(X)))] − T. (7.2.5) We will obtain a first order estimate of the bias, and then remove this. Ultimately, AIC is defined to be ˆ AIC = −2 ln(LX (β(X))) + 2 bias. (7.2.6) Next we will state a few prerequisite results for our main theorem. Define n−1 ε2 (β) N −1 S−1 ε2 i+j jS+k+i (β) S(β) = = , 2 2 σ σ i+j j=0 j=0 k=0 k+i (7.2.7) where n = N S, and p εt = X t − q φt (k)Xt−k − θt (j)εt−j , j=1 k=1 are the model residuals. Then we can have the following result, where we write the residual εt as εt (β) to emphasize explicit dependence of εt on β, and β needs to be estimated. Proposition 7.2.1. Under the assumption of finite second moment for the causal and invertible PARMA process, as N → ∞, we have 1 ∂ 2 S(β) → 2A(β, σ 2 ) in probability, N ∂β 2 where A(β, σ 2 ) is given in (7.2.2). Proof. 
In the following, β is a (p+q)S ×1 vector, 131 ∂ 2 S(β) is a (p+q)S ×(p+q)S dimensional ∂β 2 matrix. 1 ∂ 2 S(β) 1 = 2 N ∂β N ∂ 2( ∂( = 2 N 2 = N N −1 j=0 N −1 j=0 N −1 S−1 ε2 (β) S−1 jS+k+i ) k=0 σ2 k+i 2 ∂β S−1 εjS+k+i (β) ∂εjS+k+i (β) ) ∂β k=0 σ2 k+i ∂β σ −2 ( jS+k+i j=0 k=0 N −1 S−1 ε ∂εjS+k+i (β) ∂εjS+k+i (β) )( ) ∂β ∂β 2 jS+k+i (β) ∂ εjS+k+i (β) ∂β 2 σ2 j=0 k=0 k+i   S−1 N −1 ∂ε (β) ∂ε (β) 1 jS+k+i jS+k+i = 2 σ −2  ( )( ) i+k N ∂β ∂β j=0 k=0   2ε S−1 N −1 ε ∂ (β) (β) jS+k+i jS+k+i 1 , + 2 2 2 N ∂β σ j=0 k+i k=0 2 + N As N → ∞, by equation (3.13) in Basawa and Lund [24], 1 N N −1 ∂ε jS+k+i (β) ∂εjS+k+i (β) ( )( ) → Γi+k (β, σ 2 ) in probability, ∂β ∂β j=0 and by equation (3.14) in Basawa and Lund [24], 1 N N −1 ε 2 jS+k+i (β) ∂ εjS+k+i (β) → 0 in probability, 2 ∂β 2 σ j=0 k+i 132 therefore as N → ∞, S−1 1 ∂ 2 S(β) → 2 σ −2 Γi+k (β, σ 2 ) + 0 in probability i+k N ∂β 2 k=0 = 2A(β, σ 2 ). Lemma 7.2.2. Given X a random variable, if E|X| < ∞, then as x → ∞, E |X|I {|X|>x} → 0. Thus every random variable X such that E|X| < ∞ is by itself uniform integrable. Proof. 0 ≤ |X|I{|X|>x} is monotone increasing in x to |X|, and therefore using the monotone convergence theorem yields E |X|I{|X|≤x} → E|X|, as x → ∞. Notice that E|X| = E |X|I{|X|≤x} + E |X|I{|X|>x} , by assumption E|X| < ∞, we conclude that E |X|I {|X|>x} → 0, and X is by definition uniform integrable. 133 Lemma 7.2.3. If Xn → X in probability, the following statements are equivalent: (i) {Xn : n ≥ 0} is uniform integrable. (ii) Xn → X in L1 . (iii) E|Xn | → E|X| < ∞. Proof. See Durrett [16, pp. 221-222]. In the following, we rewrite the second derivatives of log likelihood function, to simplify notation. Define ∂ 2 ln(LX (β)) , Ω(β) = ∂β 2 then we have the following lemma, which was given in Lund et al. [25, Equation 10] without proof. A few necessary assumptions are needed for Lemma 7.2.4 and Theorem 7.2.5, and 2 ∗ 2 1 ∂ S (β) , 1 ∂ S(β) and 1 (−2Ω(β)) are not far apart. If we could they guarantee that N N ∂β 2 N ∂β 2 ˆ for PARMA model, some of the assumptions below may prove the strong consistency of β be reduced. This would be the direction of our further investigation. Currently we adopt the strong consistency for MLE of a vector ARMA model from Section 3 of Basawa and Lund [24], and apply it for PARMA model. • A1. Ω(β) is bounded and continuous on the parameter space B. 2 ∗ 2 1 ( ∂ S (β) − ∂ S(β) ) = o (1). • A2. N p ∂β 2 ∂β 2 1 • A3. N ∂ 2 S ∗ (β) Ω(β) + 21 ∂β 2 • A4. N β − βˆ β − βˆ = op (1). is uniform integrable. Lemma 7.2.4. − Ω(β) → A(β,σ 2 ) N in probability as 134 N → ∞. Proof. By Proposition 7.2.1, as N → ∞, 1 ∂ 2 S(β) → A(β, σ 2 ) in probability, 2N ∂β 2 then by assumption A2 and A3 we can write, as N → ∞, − Ω(β) 1 =− N N Ω(β) + 1 ∂ 2 S ∗ (β) 2 ∂β 2 + 1 N 1 ∂ 2 S ∗ (β) 1 ∂ 2 S(β) − 2 ∂β 2 2 ∂β 2 + 1 ∂ 2 S(β) 2N ∂β 2 1 1 ∂ 2 S(β) = −op (1) + op (1) + 2 2N ∂β 2 → A(β,σ 2 ) in probability, where the last equality is by Brockwell and Davis [11, Proposition 6.1.3]. Now we are ready to state our main theorem. Theorem 7.2.5. Suppose that Xt is a causal and invertible Gaussian PARMAS (p, q) process, such that assumptions A1 − A4 hold. Let f (t) be the joint probability density function ˆ of X = (Xi , Xi+1 , . . . , Xi+n−1 ) , and β(X) denote the maximum likelihood estimates given data X, suppose that we are given an independent realization of the same process ˆ ) as its MLE. Let T = E E [ln(L (β(Y ˆ )))], Y = (Yi , Yi+1 , . . . , Yi+n−1 ) , with β(Y Y X Y then ˆ T = EX [ln(LX (β(X)))] − (p + q)S + o(1). Proof. 
In the following, both the expectations EX and EY are with respect to f , where the samples X and Y are independent. Let true parameter values be β 0 ∈ B, where B is the 135 parameter space containing all β. From (7.2.5), ˆ EX [ln(LX (β(X)))] −T ˆ ˆ ))] =EX [ln(LX (β(X)))] − EY EX [ln(LX (β(Y ˆ ˆ ))] =EY EX [ln(LX (β(X))) − ln(LX (β(Y ˆ ˆ ))] =EY EX [ln(LX (β(X))) − ln LX (β0 ) ] + EY EX [ln(LX (β0 )) − ln(LX (β(Y ˆ ˆ ))] =EX [ln(LX (β(X))) − ln LX (β0 ) ] + EY EX [ln(LX (β0 )) − ln(LX (β(Y (7.2.8) ˆ We will prove in the following that EX ln(LX (β(X))) − ln LX (β0 ) = 21 (p+q)S +o(1) ˆ ))] = 1 (p + q)S + o(1), respectively. and EY EX [ln(LX (β0 )) − ln(LX (β(Y 2 ˆ First of all, we compute EX [ln(LX (β(X))) − ln LX (β0 ) ] in (7.2.8). Apply a Taylor ˆ series expansion to ln LX (β0 ) about the MLE β(X) for a sample of data X yielding ˆ ∂ ln(LX (β(X))) ˆ ˆ ln LX (β0 ) = ln(LX (β(X))) +[ ] (β0 − β(X)) ∂β (7.2.9) ˆ ∂ 2 ln(LX (β(X))) 1 ˆ ˆ + (β0 − β(X)) [ ](β0 − β(X)) + Re, 2 ∂β 2 where ˆ ∂ 2 ln(LX (β(X))) ˆ is a (p + q)S × (p + q)S matrix, β0 − β(X) is a 1 × (p + q)S ∂β 2 column vector, and Re represents the exact remainder term for the quadratic Taylor series 1 ), and the convergence expansion and assume it is uniformly integrable. Note that Re = op ( n of Re in probability could be inferred from Lemma 6.0.7 and Brockwell and Davis [11, Proposition 6.1.5]. 136 ˆ Since β(X) is the MLE from data X, then [ ˆ ∂ ln(LX (β(X))) ∂ ln(LX (β)) ]=[ ] = 0. β=βˆ ∂β ∂β Taking expectations on both sides of (7.2.9), ˆ EX [ln LX (β0 )] = EX [ln(LX (β(X)))] ˆ ∂ 2 ln(LX (β(X))) 1 ˆ ˆ (β0 − β(X))] + EX [Re] + EX [(β0 − β(X)) 2 ∂β 2 ˆ = EX [ln(LX (β(X)))] ˆ ∂ 2 ln(LX (β(X))) 1 ˆ ˆ + tr{EX [ ](β0 − β(X))(β 0 − β(X)) } + EX [Re], 2 ∂β 2 1 ) by Lemma 6.0.7 and Brockwell and Davis [11, Proposition 6.1.5]. Therewhere Re = op ( n fore the expectation of the remainder term is negligible for large sample sizes if we assume the uniform integrability, i.e. limN →∞ EX [Re] = 0. Additionally, because βˆ is the MLE under LX (β0 ), by (7.2.1), βˆ → β0 in probability as N → ∞. By assumption A1 and [11, Proposition 6.1.4], we have ˆ → Ω(β ) in probability as N → ∞. Ω(β) 0 ˆ is also uniformly integrable. Then by Lemma By assumption A1 and Lemma 7.2.2, Ω(β) 7.2.3 we can get: lim EX [ N →∞ ˆ ∂ 2 ln(LX (β)) ∂ 2 ln(LX (β(X))) ] = EX [ ]. ∂β 2 ∂β 2 137 Hence we may write ˆ ∂ 2 ln(LX (β(X))) ˆ ˆ (β0 − β(X)) (β0 − β(X)) ∂β 2 ˆ ˆ ˆ = (β0 − β(X)) Ω(β)(β 0 − β(X)) ˆ ˆ = (β0 − β(X)) Ω(β0 )(β0 − β(X)) ˆ + (β0 − β(X)) ˆ − Ω(β ) (β − β(X)) ˆ Ω(β) 0 0 ˆ ˆ = (β0 − β(X)) Ω(β0 )(β0 − β(X)) + op (1), ˆ since (β0 − β(X)) = op (1) so that ˆ − Ω(β ) = op (1), Ω(β) 0 and ˆ (β0 − β(X)) ˆ − Ω(β ) (β − β(X)) ˆ Ω(β) = op (1) 0 0 by Brockwell and Davis [11, Proposition 6.1.1]. Therefore ˆ ˆ ˆ ˆ ˆ (β0 − β(X)) Ω(β)(β 0 − β(X)) − (β0 − β(X)) Ω(β0 )(β0 − β(X)) = op (1). (7.2.10) Now by assumption A1 and A4, ˆ ˆ ˆ ˆ ˆ EX (β0 − β(X)) Ω(β)(β 0 − β(X)) − (β0 − β(X)) Ω(β0 )(β0 − β(X)) = o(1). 138 Therefore ˆ ˆ ˆ Ω(β)(β EX (β0 − β(X)) 0 − β(X)) ˆ ˆ ˆ ˆ ˆ = EX (β0 − β(X)) Ω(β)(β 0 − β(X)) − (β0 − β(X)) Ω(β0 )(β0 − β(X)) ˆ ˆ + EX (β0 − β(X)) Ω(β0 )(β0 − β(X)) ˆ ˆ = o(1) + EX (β0 − β(X)) Ω(β0 )(β0 − β(X)) ˆ = o(1) + tr EX Ω(β0 ) β0 − β(X) ˆ β0 − β(X) . 
Then ˆ lim EX [ln LX (β0 )] = lim EX [ln(LX (β(X)))] N →∞ N →∞ 1 ˆ ˆ ˆ + (β0 − β(X)) Ω(β)(β lim E 0 − β(X)) 2 N →∞ X + lim EX [Re] N →∞ ˆ lim EX [ln(LX (β(X)))] N →∞ 1 ˆ ˆ + + lim tr{EX Ω(β0 ) β0 − β(X) β0 − β(X) 2 N →∞ 1 ˆ = EX [ln(LX (β(X)))] − lim tr{EX AN WN }, 2 N →∞ = Ω(β0 ) ˆ N , and WN = N β0 − β(X) √ ˆ we define WN = ZN ZN , where ZN = N β0 − β(X) where we define AN = − } ˆ β0 − β(X) . Additionally → Z as N → ∞ and Z ∼ N(0, A−1 (β, σ 2 )) by (7.2.1). Next we will show that lim EX AN WN = I(p+q)S, N →∞ 139 (7.2.11) where I(p+q)S is the identity matrix. Notice that AN → A(β,σ 2 ) in probability as N → ∞, by Lemma 7.2.4, and WN → W = ZZ , then EWN → EW = E ZZ = A−1 (β,σ 2 ), where the uniform integrability is guaranteed in assumption A4. Hence EX AN WN = EX AN WN − W AN − A(β,σ 2 ) W + EX (7.2.12) + EX A(β,σ 2 )W . By assumption A1, |AN | < M , where M < ∞, therefore as N → ∞, 0 ≤ |EX AN WN − W | ≤ |EX |AN | WN − W | ≤ M |EX WN − W | → 0, then EX AN WN − W → 0. (7.2.13) Similarly, EX AN − A(β,σ 2 ) W → 0, (7.2.14) and EX A(β,σ 2 )W = A(β,σ 2 )EX [W ] = A(β,σ 2 )A−1 (β,σ 2 ) = I(p+q)S. (7.2.15) Substitute (7.2.13), (7.2.14) and (7.2.15) into (7.2.12), and let N → ∞, to arrive at (7.2.11). Then 1 ˆ lim tr{I(p+q)S } lim EX [ln LX (β0 )] = EX [ln(LX (β(X)))] − 2 N →∞ N →∞ 1 ˆ = EX [ln(LX (β(X)))] − (p + q)S. 2 140 So we obtain 1 ˆ lim {EX [ln(LX (β(X)))] − EX [ln LX (β0 )]} = (p + q)S, 2 N →∞ or, in other words, 1 ˆ EX [ln(LX (β(X)))] − EX [ln LX (β0 )] = (p + q)S + o(1). 2 (7.2.16) ˆ ))], the remaining term in the last Now let us consider EY EX [ln(LX (β0 )) − ln(LX (β(Y ˆ )) around β for any line of (7.2.8). Similarly, apply the Taylor expansion to ln(LX (β(Y 0 given data X yielding ˆ )) = ln(L (β )) + [ ∂ ln(LX (β)) ] (β(Y ˆ )−β ) ln LX (β(Y 0 X 0 ∂β ∂ 2 ln(LX (β)) ˆ 1 ˆ + (β(Y ) − β0 ) [ ](β(Y ) − β0 ) + Re. 2 ∂β 2 Taking expectations with respect to X yields ˆ ))] = E [ln(L (β ))] + E [ ∂ ln(LX (β)) ] (β(Y ˆ )−β ) EX [ln LX (β(Y 0 X X 0 X ∂β ∂ 2 ln(LX (β)) ˆ 1 ˆ + (β(Y ) − β0 ) EX [ ](β(Y ) − β0 ) 2 ∂β 2 +EX [Re] (7.2.17) where Y is independent of X. 141 Taking expectations of (7.2.17) with respect to Y yields ˆ ))] = E [ln(L (β ))] + E [ ∂ ln(LX (β)) ] E ˆ EY EX [ln LX (β(Y X X 0 X Y β(Y ) − β0 ∂β + 2 1 ˆ ) − β )] ˆ ) − β ) E [ ∂ ln(LX (β)) ](β(Y EY [(β(Y 0 0 X 2 ∂β 2 + EX [Re], ˆ ) − β → 0 by letting N → ∞ on both sides, the linear terms vanishes since EY β(Y 0 142 (7.2.1), then the above equations becomes ˆ ))] lim EY EX [ln LX (β(Y N →∞ ∂ ln(LX (β)) ˆ )−β = lim EX [ln(LX (β0 ))] + lim EX [ ] EY β(Y 0 ∂β N →∞ N →∞ ∂ 2 ln(LX (β)) ˆ 1 ˆ + lim E [(β(Y ) − β0 ) EX [ ](β(Y ) − β0 )] 2 N →∞ Y ∂β 2 + lim E [Re] N→∞ X ∂ ln(LX (β)) ˆ )−β ] lim EY β(Y 0 ∂β N →∞ 1 1 1 1 ∂ 2 ln(LX (β)) ˆ ) − β )N 2 (β(Y ˆ ) − β ) ]} + lim tr{EX [ ]EY [N 2 (β(Y 0 0 2 N →∞ N ∂β 2 =EX [ln(LX (β0 ))] + EX [ =EX [ln(LX (β0 ))] 1 1 1 1 ∂ 2 ln(LX (β)) ˆ ) − β )N 2 (β(Y ˆ ) − β ) ]} ]EY [N 2 (β(Y + tr{ lim EX [ 0 0 2 N →∞ N ∂β 2 1 ∂ 2 ln(LX (β)) 1 ]A(β, σ 2 )−1 } =EX [ln(LX (β0 ))] + tr{EX lim [ 2 ∂β 2 N →∞ N =EX [ln(LX (β0 ))] 1 1 ∂ 2 ln(LX (β)) 1 ∂ 2 S ∗ (β) + tr{EX lim [ + ]A(β, σ 2 )−1 } 2 2 ∂β 2 ∂β 2 N →∞ N 1 ∂ 2 S ∗ (β) 1 ]A(β, σ 2 )−1 } − tr{EX lim [ 2 2 2 ∂β N →∞ 1 =EX [ln(LX (β0 ))] + 0 − tr{[2A(β, σ 2 )A(β, σ 2 )−1 ]} 4 1 =EX [ln(LX (β0 ))] − tr{I(p+q)S } 2 1 =EX [ln(LX (β0 ))] − (p + q)S, 2 Hence ˆ ))]} = 1 (p + q)S, lim {EX [ln(LX (β0 ))] − EY EX [ln(LX (β(Y 2 N →∞ 143 or, in other words, ˆ ))] = 1 (p + q)S + o(1). 
E_X[ln L_X(β_0)] − E_Y E_X[ln L_X(β̂(Y))] = (1/2)(p + q)S + o(1).   (7.2.18)

Substituting (7.2.16) and (7.2.18) into (7.2.8), we have

E_X[ln L_X(β̂(X))] − T = (p + q)S + o(1),

which completes the proof.

Akaike [2] defined an information criterion (AIC) by multiplying ln L_X(β̂(X)) by −2, to get

AIC = −2 ln L_X(β̂(X)) + 2K,

where K is the bias of the maximized log-likelihood as an estimator of

T = E_Y E_X[ln L_Y(β̂(Y))],

and K equals the number of estimable parameters in the model. This has become known as Akaike's information criterion, or AIC. For a PARMA_S(p, q) model, if we treat the white noise variances as nuisance parameters, there are K = (p + q)S estimable parameters. This is also what Theorem 7.2.5 proves, since

lim_{N→∞} bias = lim_{N→∞} { E_X[ln L_X(β̂(X))] − T } = (p + q)S,

so that bias = (p + q)S + o(1). Therefore an asymptotically unbiased estimator of T = E_Y E_X[ln L_Y(β̂(Y))] is

E_X[ln L_X(β̂(X))] − (p + q)S.

From (7.2.6), the AIC for a PARMA model is therefore

AIC = −2 ln L_X(β̂(X)) + 2(p + q)S.   (7.2.19)

7.2.1 Application to model selection for the Fraser River

Besides the comparison of forecast plots in Chapter 5, we can also compute the value of the AIC for each candidate model; the model yielding the minimum AIC is the best model. The results are shown in Table 7.1. Among the full model candidates, the full PARMA_12(1, 1) model fitted by MLE in Table 5.4 has the minimum AIC, so it is the best full model. Among the reduced models, the model in Table 5.7, obtained from the asymptotic distribution of the MLE, has the minimum AIC, with only 13 estimable parameters. Lastly, for PAR_S(p) models we tried two different approaches. The first approach removes all θ̂_i in Table 5.7, since there are only three of them. In this way we obtain a PAR_12(1) model with only 12 estimable parameters; however, its AIC turns out to be very large. The second approach is more systematic: we use the pear package in R to perform automatic model selection for a PAR_12(p) model. The pear package was developed by A.I. McLeod and Mehmet Balcilar for estimating periodic autoregressive models, and it provides a built-in data set of the historical flows of the Fraser River. The best model selected is shown in Table 7.2; it is a PAR_12(3) model with 19 estimable parameters. Its AIC is also reported in Table 7.1. Additionally, Table 7.3 reports the AIC values after taking the logarithm of the data, where the model parameters have a larger impact on the AIC. Note that both the full and the reduced PARMA_12(1, 1) models perform better than the PAR models. Therefore the PARMA_12(1, 1) model is a better model for the Fraser River flows.
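For reference, the AIC values reported in Tables 7.1 and 7.3 below follow directly from (7.2.19) once the minimized value of −2 ln L_X(β̂(X)) is available from the fitting routine. A minimal R sketch is given below; the helper name parma_aic and its arguments are illustrative only and are not part of the pear or perARMA packages.

# AIC for a fitted periodic model, cf. (7.2.19):
# AIC = -2 log-likelihood + 2 * (number of estimable parameters).
# For a full PARMA_S(p,q) model with the S noise variances treated as
# nuisance parameters, k = (p + q) * S; for a reduced model, k is the
# number of coefficients actually retained.
parma_aic <- function(neg2loglik, k)
{
  neg2loglik + 2 * k
}
# Example with a hypothetical value of -2 log L for a full PARMA_12(1,1) fit:
# parma_aic(neg2loglik = 18428.47, k = (1 + 1) * 12)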
Model                                                     Number of parameters      AIC
Full PARMA_12(1, 1) model in Table 5.3                             24            18524.25
Full PARMA_12(1, 1) model in Table 5.4                             24            18476.47
Reduced PARMA_12(1, 1) model in Table 5.5                          19            18768.79
Reduced PARMA_12(1, 1) model in Table 5.7                          13            18528.13
PAR_12(1) model, removing all θ̂_i, in Table 5.7                    12            18754.09
PAR_12(3) model in Table 7.2                                       19            17714.33

Table 7.1: Comparison of AIC values for different models.

Season i    φ̂_i(1)    φ̂_i(2)    φ̂_i(3)        σ̂_i
   0         0.527     0.000     0.000     6325.612
   1         0.779    -0.231     0.189     5004.514
   2         0.764     0.000     0.000     5326.131
   3         1.188     0.000     0.000    17540.533
   4         0.647     0.000     0.000    37180.017
   5         0.411    -1.237     1.562    38084.824
   6         0.545     0.000     0.000    34809.521
   7         0.517     0.000     0.000    17666.091
   8         0.661    -0.127     0.000    13518.955
   9         0.890    -0.434     0.165    14685.378
  10         0.631     0.000     0.000    12434.951
  11         0.543     0.000     0.000     8535.279

Table 7.2: The model parameters from the automatic PAR_12(p) model fit by the pear package in R, where the number of estimable parameters is 19, treating the σ̂_i as nuisance parameters in the AIC computation.

Model                                                     Number of parameters      AIC
Full PARMA_12(1, 1) model in Table 5.3                             24            -415.8586
Full PARMA_12(1, 1) model in Table 5.4                             24            -506.6053
Reduced PARMA_12(1, 1) model in Table 5.5                          19            -313.3939
Reduced PARMA_12(1, 1) model in Table 5.7                          14            -297.9929
PAR_12(3) model in Table 7.2                                       19             520.9146

Table 7.3: Comparison of AIC values for different models after taking the logarithm of the data. Note that both the full and the reduced PARMA_12(1, 1) models perform better than the PAR models.

7.3 Future Research

In this section I list the open problems remaining from my research, which may also be helpful for researchers interested in studying this topic in more depth.

• Strong consistency of the MLE. This could follow the corresponding result for ARMA models in Yao and Brockwell [57]; the details of extending it to the PARMA model need to be worked out carefully.

• A stronger version of Theorem 7.2.5 with fewer assumptions. The uniform integrability of N(β − β̂)(β − β̂)′ still awaits a complete proof, and the properties of the second derivative of the PARMA likelihood function need to be studied in detail.

• I will work on cleaning up my R code and adding the forecasting tool to the perARMA package. This would be very useful for researchers who work on hydrology and on prediction for periodically stationary time series.
147 APPENDIX 148 APPENDIX # R code for the paper "Forecasting for periodic ARMA models" # by Anderson, Paul; Meerschaert, Mark; Zhang, Kai # Please put the file "frazierc.txt" under your work directory ####################################### # Seasonal Sample Mean for (4.1) ####################################### data <- read.table("frazierc.txt") XMEAN <- array(0,c(12)) # XMEAN is a vector of sample means for 12 seasons for(I in 1:12) { XMEAN[I] <- 0 for(T in 0:71) { XMEAN[I] <- XMEAN[I] + data[T*12+I,1] } XMEAN[I] <- XMEAN[I]/72 149 } ###################################### # Sample Autocovariance for (4.2) ###################################### # season i = 0, 1, ..., 11 # LAG = 0, 1, ..., 124 COVAR <- array(0,c(12,125)) for(L in 1:125) { LAG <- L-1 for (I in 1:12) { i <- (I-1) COVAR[I,L] <- 0 J <- as.integer((i+LAG)/12) K <- ((I+LAG)-(12*J)) for (T in 0:(71-J)) { COVAR[I,L] <- COVAR[I,L] +(data[T*12+I,1]-XMEAN[I])*(data[T*12+I+LAG,1]-XMEAN[K]) 150 } COVAR[I,L] <- COVAR[I,L]/(72-J) } } #################################### # Sample Autocorrelation for (4.3) #################################### rho <- array(0,c(12,125)) for (I in 1:12) { for (L in 1:125) { R <- (I+L-2)%%12 + 1 rho[I,L] <- COVAR[I,L]/((COVAR[I,1]*COVAR[R,1])^.5) } } ############################################## # Innovations Algorithm for X_t process ############################################## ## I = i + 1 151 ## J = j + 1 ## K = k + 1 ## L = ell + 1 ## N = n + 1 ## COVAR(I,L) = gamma_i(ell) ## V(N,I) = v_{n,i} ## THETA(N,M,I) = theta_{n,m}^{(i)} WHERE m = M + 1 ## NOTE THAT: theta_{n,n-k}^{(i)} = THETA(N,N-K+1,I) ## theta_{k,k-j}^{(i)} = THETA(K,K-J+1,I) ## gamma_k(n-k) = COVAR(K,N-K+1) V<- array(0,c(50,12)) THETA <- array(1,c(50,50,12)) for (I in 1:12) { V[1,I] <- COVAR[I,1] for (N in 2:50) { for (K in 1:(N-1)) { S <- 0 if (K == 1) { 152 K0 <- as.integer((I+1-2)/12) K1 <- (I+K-1)-12*K0 THETA[N,N,I] <- (COVAR[K1,N]-S)/V[1,I] } else { for (J in 1:(K-1)) { S <- S+THETA[K,K-J+1,I]*THETA[N,N-J+1,I]*V[J,I] K0 <- as.integer((I+K-2)/12) K1 <- (I+K-1)-12*K0 THETA[N,N-K+1,I] <- (COVAR[K1,N-K+1]-S)/V[K,I] } } } R <- 0 for(J in 1:(N-1)) { R <- R+V[J,I]*(THETA[N,N-J+1,I])^2 N0 <- as.integer((I+N-2)/12) N1 <- (I+N-1)-12*N0 V[N,I] <- COVAR[N1,1]-R 153 } } } # At k = 20 iterations, get the convergence of THETA and V psi1 <- array(0,c(12,12)) for ( I in 1:12 ) { R <- 0 for (J in 1:12) { R <- (I-20-1)%%12+1 psi1[I,J] <- THETA[21,J,R] } } sigma_square <- array(0,c(12)) for (I in 1:12) { S <- 0 S <- (I-20-1)%%12+1 sigma_square[I] <- V[21,S] } 154 ############################################ # Get model parameter estimates by (4.4) ############################################ phi <- array(0,c(12)) sigma <- array(0,c(12)) theta <- array(0,c(12,864)) for (I in 1:12) { R <- ((I-1)-1)%%12+1 phi[I] <- psi1[I,3]/psi1[R,2] } for (I in 1:12) { theta[I,1] <- -1 } for (I in 1:12) { theta[I,2] <- psi1[I,2]-phi[I] } for (I in 1:12) { 155 sigma[I] <- (sigma_square[I])^.5 } # A simple output of model estimates phi theta[,2] sigma # reduced model # phi[1] <- 0 # phi[5] <- 0 # phi[7] <- 0 # phi[8] <- 0 # phi[10] <- 0 ###################################### # The following is for prediction ###################################### # Autocovariances K(J,L) for W_t process # K(J,L) = C in (2.6) K <- array(0,c(865,865)) 156 for (I in 1:12) # I is season { K[I,I] <- COVAR[I,1] # when J = L = I for (J in 1:12) { for (L in 1:13) { if ( J <= I && L == (I+1) ) { s1 <- (J-1)%%12+1 l1 <- (L-1)%%12+1 K[J,L] <- COVAR[s1,abs(J-L)+1] - phi[l1]*COVAR[s1,abs(L-1-J)+1] } } } 
} for (I in 1:12) { 157 for (J in 1:865) { for (L in 1:865) { if (min(J,L) >= (I+1) && abs(J-L) <= 1) { s1 <- (J-1)%%12+1 n1 <- (L-1)%%12+1 q1 <- (J-2)%%12+1 K[J,L] <- theta[s1,1]*theta[n1,abs(J-L)+1]*(sigma[s1])^2 + theta[s1,2]*theta[n1,(abs(1+J-L))+1]*(sigma[q1])^2 } } } } ########################################## # Innovations algorithm for W_t process # The computation in (2.6) ########################################## V<- array(0,c(865,12)) THETA <- array(0,c(865,865,12)) 158 for (I in 1:12) { for (J in 1:865) { THETA[J,1,I] <- 1 } } for (I in 1:12) { V[1,I] <- K[I,I] for (N in 2:(865-I)) { THETA[N,2,I] <- K[I+N,I+N-1]/V[N-1,I] V[N,I] <- K[I+N,I+N]- V[N-1,I]*(THETA[N,2,I])^2 } } ##################################### # Computation of \hat{X} in (2.7) ##################################### 159 # Subtract seasonal mean fraser <- t(data) X <- array(0,c(864)) for(I in 1:12) { for(T in 0:71) { X[T*12+I] <- fraser[T*12+I]-XMEAN[I] } } # Xhat = \hat{X} in (2.7) Xhat <- array(0,c(12,880)) for (I in 1:12) { Xhat[I,1+I] <- 0 Xhat[I,2+I] <- THETA[2,2,I]*(X[I+1]-Xhat[I,I+1]) for (N in 3:(840-I)) { s1 <- (I+N-1)%%12+1 Xhat[I,N+I] <- phi[s1]*X[I+N-1]+ THETA[N,2,I]*(X[I+N-1]-Xhat[I,I+N-1]) } 160 } ############################################# # h-step prediction in (2.8) and (2.9) ############################################# for (I in 1:12) { s1 <- (841-1)%%12+1 Xhat[I,841] <- phi[s1]*X[840] + THETA[841-I,2,I]*(X[840]-Xhat[I,840]) for (h in 2:24) { m1 <- (840+h-1)%%12+1 Xhat[I,840+h] <- phi[m1]*Xhat[I,840+h-1] } } ############################# # Forecast error in (3.3) ############################# # Calculation of Casual represention for PARMA(1,1) # The casual coefficient psi is periodic in S psi <- array(0,c(12,24)) # I = 12, h =24 161 for (I in 1:12) { psi[I,1] <- 1 psi[I,2] <- (phi[I]+theta[I,2]) S <- 1 for (k in 3:24) { for (j in 0:(k-3)) { j0 <- (I-j-1)%%12+1 S <- S*phi[j0] } j1 <- (I-(k-1)-1)%%12+1 psi[I,k] <- S*(phi[j1]+theta[j1,2]) } } # h-step prediction error # Use this error for confidence band in Corr.3.2 sigma_h2 <- array(0,c(12,24)) for (I in 1:12) { for (h in 1:24) 162 { R <- 0 for (J in 1:h) { s0 <- (840+h-1)%%12+1 s1 <- (840+h-J-1)%%12+1 R <- R + (psi[s0,J])^2*(sigma[s1])^2 } sigma_h2[I,h] <- R } } # Add seasonal mean to Xhat Yhat <- array(0,c(880)) for(I in 1:12) { for(T in 0:71) { Yhat[T*12+I] <- Xhat[1,T*12+I]+XMEAN[I] } } # Computation of residuals 163 res <- X - Yhat[1:864] for (I in 1:12) { for (T in 0:71) { res[T*12+I] <- res[T*12+I]/sigma[I] } } ############################################## # 95% Prediction bounds for h-step prediction in Figure 3 ############################################## CI_low <- array(0,c(24)) CI_up <- array(0,c(24)) for (I in 1:12) { for (h in 1:24) { CI_low[h] <- Yhat[840+h]-1.96*sqrt(sigma_h2[I,h]) CI_up[h] <- Yhat[840+h]+1.96*sqrt(sigma_h2[I,h]) } } 164 ########################################### # 95% Confidence Intervals for sample mean ########################################### CI_low_mean <- array(0,c(12)) CI_up_mean <- array(0,c(12)) for (I in 1:12) { CI_low_mean[I] <- XMEAN[I]-1.96*(COVAR[I,1])^.5 CI_up_mean[I] <- XMEAN[I]+1.96*(COVAR[I,1])^.5 } ######################################### # 95% Confidence Intervals for gamma_0 ######################################### error <- array(0,c(12)) ## define error as sqrt{[(V_00)_ellell]/72} for (I in 1:12) { S <- 0 for (L in 0:10) { S <- S + 4*((COVAR[I,12*L+1])^2) } error[I] <- ((S - 2*((COVAR[I,1])^2))/72)^.5 165 ## subtract the exact one from niu = 0 } CI_low_gamma0 <- array(0,c(12)) 
CI_up_gamma0 <- array(0,c(12)) for (I in 1:12) { CI_low_gamma0[I] <- COVAR[I,1]-1.96*error[I] CI_up_gamma0[I] <- COVAR[I,1]+1.96*error[I] } ######################################### # 95% Confidence Intervals for rho_1 ######################################### W_11 <- array(0,c(12)) for (I in 1:12) { S <- 0 R <- ((I-1)%%12)+1 ## S is the sum at n = 0, niu = 12 S <- rho[I,1]*rho[R,1] + rho[I,2]*rho[R,2] - rho[I,2]*(rho[I,1]*rho[I,2]+ 166 rho[R,2]*rho[R,1]) - rho[I,2]*(rho[I,1]*rho[R,2]+rho[I,2]*rho[R,1]) + .5*rho[I,2]^2*(rho[I,1]^2+rho[I,2]^2+rho[R,2]^2+rho[R,1]^2) ## Next loop is sum from n = -10 to 10 for (L in 1:10) { S <- S + (rho[I,L*12+1]*rho[R,L*12+1] + rho[I,L*12+2]*rho[R,L*12] rho[I,2]*(rho[I,L*12+1]*rho[I,L*12+2]+rho[R,L*12]*rho[R,L*12+1]) rho[I,2]*(rho[I,L*12+1]*rho[R,L*12]+rho[I,L*12+2]*rho[R,L*12+1]) + .5*rho[I,2]^2*(rho[I,L*12+1]^2 +rho[I,L*12+2]^2+rho[R,L*12]^2+rho[R,L*12+1]^2)) } P <- 0 for (L in (-10):(-1)) { P <- P + (rho[I,abs(L*12)+1]*rho[R,abs(L*12)+1] + rho[I,abs(L*12+1)+1]*rho[R,abs(L*12-1)+1] rho[I,2]*(rho[I,abs(L*12)+1]*rho[I,abs(L*12+1)+1]+ rho[R,abs(L*12-1)+1]*rho[R,abs(L*12)+1]) rho[I,2]*(rho[I,abs(L*12)+1]*rho[R,abs(L*12-1)+1]+ rho[I,abs(L*12+1)+1]*rho[R,abs(L*12)+1]) + .5*rho[I,2]^2*(rho[I,abs(L*12)+1]^2+rho[I,abs(L*12+1)+1]^2 +rho[R,abs(L*12-1)+1]^2+rho[R,abs(L*12)+1]^2)) } 167 W_11[I] <- P+S } CI_low_rho1 <- array(0,c(12)) CI_up_rho1 <- array(0,c(12)) for (I in 1:12) { CI_low_rho1[I] <- rho[I,2]-1.96*((W_11[I]/72)^.5) CI_up_rho1[I] <- rho[I,2]+1.96*((W_11[I]/72)^.5) } ######################################### # 95% Confidence Intervals for rho_2 ######################################### W_22 <- array(0,c(12)) for (I in 1:12) { S <- 0 R <- ((I)%%12)+1 ## S is the sum at n = 0, niu = 12 S <- rho[I,1]*rho[R,1] + rho[I,3]*rho[R,3] rho[I,3]*(rho[I,1]*rho[I,3]+rho[R,3]*rho[R,1]) - 168 rho[I,3]*(rho[I,1]*rho[R,3]+rho[I,3]*rho[R,1]) + .5*rho[I,3]^2*(rho[I,1]^2+rho[I,3]^2+rho[R,3]^2+rho[R,1]^2) ## Next loop is sum from n = -10 to 10 for (L in 1:10) { S <- S + (rho[I,L*12+1]*rho[R,L*12+1] + rho[I,L*12+3]*rho[R,L*12-1] - rho[I,3]*(rho[I,L*12+1]*rho[I,L*12+3]+ rho[R,L*12-1]*rho[R,L*12+1]) - rho[I,3]*(rho[I,L*12+1]*rho[R,L*12-1]+ rho[I,L*12+3]*rho[R,L*12+1]) + .5*rho[I,3]^2*(rho[I,L*12+1]^2+ rho[I,L*12+3]^2+rho[R,L*12-1]^2+rho[R,L*12+1]^2)) } P <- 0 for (L in (-10):(-1)) { P <- P + (rho[I,abs(L*12)+1]*rho[R,abs(L*12)+1] + rho[I,abs(L*12+2)+1]*rho[R,abs(L*12-2)+1] rho[I,3]*(rho[I,abs(L*12)+1]*rho[I,abs(L*12+2)+1]+ rho[R,abs(L*12-2)+1]*rho[R,abs(L*12)+1]) rho[I,3]*(rho[I,abs(L*12)+1]*rho[R,abs(L*12-2)+1]+ rho[I,abs(L*12+2)+1]*rho[R,abs(L*12)+1]) + .5*rho[I,3]^2*(rho[I,abs(L*12)+1]^2+rho[I,abs(L*12+2)+1]^2+ 169 rho[R,abs(L*12-2)+1]^2+rho[R,abs(L*12)+1]^2)) } W_22[I] <- P+S } CI_low_rho2 <- array(0,c(12)) CI_up_rho2 <- array(0,c(12)) for (I in 1:12) { CI_low_rho2[I] <- rho[I,3]-1.96*((W_22[I]/72)^.5) CI_up_rho2[I] <- rho[I,3]+1.96*((W_22[I]/72)^.5) } ############# # Output ############# # Please remove "#" if you want to generate output files write(CI_low,file="CI_low_prediction.txt",ncolumns=1) write(CI_up,file="CI_up_prediction.txt",ncolumns=1) 170 write(fraser,file="Data of Fraser River.txt",ncolumns=1) write(Yhat[1:864],file="24-month Predictions.txt",ncolumns=1) # write(XMEAN,file="Sample Mean.txt",ncolumns=1) # write(CI_low_mean,file="CI_low_mean.txt",ncolumns=1) # write(CI_up_mean,file="CI_up_mean.txt",ncolumns=1) # write(COVAR[,1],file="Sample Variance.txt",ncolumns=1) # write(CI_low_gamma0,file="CI_low_variance.txt",ncolumns=1) # 
write(CI_up_gamma0,file="CI_up_variance.txt",ncolumns=1) # write(rho[,2],file="rho_1.txt",ncolumns=1) # write(CI_low_rho1,file="CI_low_rho1.txt",ncolumns=1) # write(CI_up_rho1,file="CI_up_rho1.txt",ncolumns=1) # write(rho[,3],file="rho_2.txt",ncolumns=1) # write(CI_low_rho2,file="CI_low_rho2.txt",ncolumns=1) # write(CI_up_rho2,file="CI_up_rho2.txt",ncolumns=1) # write(res,file="Residuals.txt",ncolumns=1) ########################### # Plots 171 ########################### # The plots in this paper were producted by Minitab # The following plots are provided as drafts for reference ############ # Figure 1 g_range <- range(0, data) plot(data[1:180,1], col = "black", ylab = "Flow (cms)", axes=FALSE, type = "l", ylim=g_range, xlab="Month / Year", cex.lab=1.5, lty = 5) axis(1,at = c(1,37,73,109,145,180),lab= c("10/1912", "10/1915","10/1918","10/1921","10/1924","09/1927")) axis(2, at = c(1,50000,100000,150000,200000,250000, 300000,350000,400000), lab =c("0","50000","100000", "150000","200000","250000","300000","350000","400000") ) ############ # Figure 3 ############ g_range <- range(0, CI_up) plot(CI_low, col = "blue", ylab = "Flow (cms)", lwd = 2, axes=FALSE, type = "l", ylim=g_range, xlab="Month / Year", 172 cex.main = 2, cex.lab=1.5, lty = 5) axis(1,at = c(1,5,9,13,17,21,24),lab= c("10/1982","02/1983", "06/1983","10/1983","02/1984","06/1984","")) axis(2) lines(CI_up, col = "blue", type="l", lty=5, lwd = 2) lines(Yhat[841:864], col = "red", type="o", pch=20, lty=1, lwd = 2) lines(data[841:864,1],col = "black", type="l", lty=1, lwd = 2) ############ # Figure 4 ############ g_range <- range((CI_low-c(XMEAN,XMEAN)), (CI_up-c(XMEAN,XMEAN))) plot((CI_low-c(XMEAN,XMEAN)), col = "blue", type = "l", ylab = "Width of prediction bounds", ylim=g_range, xlab="Month", main="Width of prediction bounds (mean subtracted)", lty = 5) lines(Yhat[841:864]-c(XMEAN,XMEAN), col = "red", type="o", pch=20, lty=1) lines((CI_up-c(XMEAN,XMEAN)), col = "blue", type="l", lty=5) ############ # Figure 2 ############ par(mfrow = c(2,2)) g_range <- range(0, (CI_up_mean)) 173 plot(XMEAN, col = "black", ylab = "Sample mean (cms)", lwd =2, ylim=g_range,axes=FALSE, xlab="Season", main="(a) Sample Means", lty = 5, cex.main = 2.5, cex.lab=1.5) axis(1,at = 1:12,lab= c("0","1","2","3","4","5","6","7","8","9","10","11")) axis(2) lines(XMEAN, type="o",lwd = 2,pch = 20) lines((CI_low_mean), col = "red", type="l", lty=5,lwd =2) lines((CI_up_mean), col = "red", type="l", lty=5,lwd =2) g_range <- range(0, (CI_up_gamma0)^.5 ) plot((COVAR[,1])^.5, col = "black", ylab = "Sample sd (cms)", axes=FALSE, lwd=2, ylim=g_range, xlab="Season", main="(b) Sample Standard Deviations", lty = 5,cex.main = 2.2, cex.lab=1.5) axis(1,at = 1:12,lab= c("0","1","2","3","4","5","6","7","8","9","10","11")) axis(2) lines((COVAR[,1])^.5, type="o",lwd = 2,pch = 20) lines((CI_low_gamma0)^.5, col = "red", type="l", lty=5,lwd=2) lines((CI_up_gamma0)^.5, col = "red", type="l", lty=5,lwd=2) g_range <- range(0, 1) plot(rho[,2], col = "black", ylab = "Autocorrelations", axes=FALSE, ylim=g_range, xlab="Season", main="(c) Sample Autocorrelations : lag = 1", lty = 5,cex.main = 1.9, cex.lab=1.5) 174 axis(1,at = 1:12,lab= c("0","1","2","3","4","5","6","7","8","9","10","11")) axis(2) lines(rho[,2], type="o",lwd = 2,pch = 20) lines(CI_low_rho1, col = "red", type="l", lty=5,lwd=2) lines(CI_up_rho1, col = "red", type="l", lty=5,lwd=2) g_range <- range(-.5, 1) plot(rho[,3], col = "black", ylab = "Autocorrelations", axes=FALSE,lwd=2, ylim=g_range, xlab="Season", main="(d) Sample 
Autocorrelations : lag = 2", lty = 5,cex.main = 1.9, cex.lab=1.5) axis(1,at = 1:12,lab= c("0","1","2","3","4","5","6","7","8","9","10","11")) axis(2) lines(rho[,3], type="o",lwd = 2,pch = 20) lines(CI_low_rho2, col = "red", type="l", lty=5,lwd=2) lines(CI_up_rho2, col = "red", type="l", lty=5,lwd=2) ################################################# # Computation of PARMA Autocovariances in Chapter 1.1 ################################################# # Model is PARMA_12(1,1) i.e. there are 12 seasons # Model is based on table 5 in Tesfaye, Meerschaert and Anderson (2006) # phi_1 <- c(.198,.568,.560,.565,.321,.956,1.254,.636,-1.942,-.092,.662,.355) 175 phi_1 <- c(0,.568,.560,.565,0,.956,0,0,-1.942,0,.662,.355) theta_1 <- c(.687,.056,-.052,-.05,.47,-.389,-.178,-.114,2.393,.71,-.213,.322) sigma <- c(11875.479,11598.254,7311.452,5940.845,4160.214,4610.209, 15232.867,31114.514,32824.370,29712.190,15511.187,12077.991) theta_0 <- array(1,c(12)) psi_0 <- array(1,c(12)) psi_1 <- phi_1 + theta_1 # By (16) in Tesfaye, Meerschaert and Anderson (2006) # Set AX = b, then solve X, where X is a vector # X gives ACVF = gamma_i(h), when h <= max(p,q), i = 0, 1, ... 11 # In PARMA_12(1,1), p = q = 1 A <- array(0,c(24,24)) for (i in 1:12) { A[i,i] <- 1 A[i+12,(i-2)%%12+1+12] <- 1 A[i,(i-2)%%12+1+12] <- (-phi_1[i]) A[i+12,(i-2)%%12+1] <- (-phi_1[i]) } b <- array(0,c(24)) 176 for (i in 1:12) { b[i] <- theta_0[i] * psi_0[i] * (sigma[i])^2 + theta_1[i] * psi_1[i] * (sigma[(i-2)%%12+1])^2 b[i+12] <- theta_1[i] * psi_0[(i-2)%%12+1] * (sigma[(i-2)%%12+1])^2 } X <- solve(A,b) X matrix(X, ncol = 2) # COVAR is autocovariance function # COVAR[I,H] = \gamma_{i}(h) # I = i + 1, i is season, i = 0, 1, 2, ... 11 # H = h + 1, h is lag, h = 0, 1, 2, ... 79 COVAR <- array(0,c(12,80)) COVAR[,1:2] <- X # Read X into first columns of COVAR # A good way to get rid of the for loop below #for (I in 1:12) # This loop reads X into COVAR, for h <= max(p,q); here h = 0,1 177 # { # COVAR[I,1] <- X[I] # COVAR[I,2] <- X[I+12] # } for (H in 3:80) # This loop computes ACVF for h > max(p,q) { for (I in 1:12) { COVAR[(I-H)%%12+1,H] <- phi_1[I]*COVAR[(I-H)%%12+1,H-1] #this one is right! 
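# For lags h > max(p,q) = 1 the moving average terms no longer contribute,
# so the PARMA_12(1,1) autocovariances satisfy the purely autoregressive
# recursion gamma_{(i-h) mod 12}(h) = phi_1(i) * gamma_{(i-h) mod 12}(h-1);
# the assignment above implements this recursion season by season.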
# (I-2)%%12+1 represents season I-1 # COVAR[I,H] <- phi_1[I]*COVAR[(I-2)%%12+1,H-1] } } ############################### # Innovations Algorithm ############################### V<- array(0,c(50,12)) THETA <- array(0,c(50,50,12)) 178 for (I in 1:12) # I = i+1, i is season { V[1,I] <- COVAR[I,1] for (N in 2:50) # N = n+1, n is number of iterations { for (K in 1:(N-1)) { S <- 0 if (K == 1) { K0 <- as.integer((I+1-2)/12) K1 <- (I+K-1)-12*K0 THETA[N,N,I] <- (COVAR[K1,N]-S)/V[1,I] } else { for (J in 1:(K-1)) { S <- S+THETA[K,K-J+1,I]*THETA[N,N-J+1,I]*V[J,I] K0 <- as.integer((I+K-2)/12) K1 <- (I+K-1)-12*K0 179 THETA[N,N-K+1,I] <- (COVAR[K1,N-K+1]-S)/V[K,I] THETA[N,1,I] <- 1 # This defines \theta_{n,0}(i) = 1 } } } R <- 0 for(J in 1:(N-1)) { R <- R+V[J,I]*(THETA[N,N-J+1,I])^2 N0 <- as.integer((I+N-2)/12) N1 <- (I+N-1)-12*N0 V[N,I] <- COVAR[N1,1]-R } } } ################################# # convergence of theta to psi ################################# psi_k <- array(0,c(50,12,50)) # psi_k is \psi_{i}(\ell) for (K in 1:50) # K = k+1, k is number of iterations { 180 for ( I in 1:12 ) # I = i+1, i is season { R <- 0 for (J in 1:50) { R <- (I-K)%%12+1 psi_k[K,I,J] <- THETA[K,J,R] } } } # Test output for lag = 1 # This matches values of psi_1 = phi_1 + theta_1, after 5 iterations psi_k[,,2] psi_1 # Test output for lag = 2 # psi_k[,,3] # This shows the error between psi_k[,,2] and psi_1 error <- array(0,c(50,12)) for (I in 1:12) { for (K in 1:50) 181 { error[K,I] <- psi_k[K,I,2]-psi_1[I] } } max_error <- array(0,c(49)) # for all season, for lag =1 only for (K in 1:49) for (I in 1:12) { { max_error[K] <- max(abs(error[K,I])) } } plot(max_error,ylab = "Value of convergence error", xlab="Number of iterations", main="error",cex.main = 2, cex.lab=1.8) lines(max_error, type="o", pch=20, lty=1, col="red") ######################################## # convergence of v to sigma^2 ######################################## 182 sigma_square <- array(0,c(50,12)) for (K in 1:50) { for (I in 1:12) { S <- 0 S <- (I-K)%%12+1 sigma_square[K,I] <- V[K,S] } } # Test output, which matches sigma, after 5 iterations sigma_square^.5 sigma # This gives the error between sigma_square and sigma^2 error2 <- array(0,c(50,12)) for (I in 1:12) { for (K in 1:50) { error2[K,I] <- (sigma_square[K,I]-(sigma[I])^2) } 183 } 184 BIBLIOGRAPHY 185 BIBLIOGRAPHY [1] Adams, G.J and G.C. Goodwin, (1995), Parameter estimation for periodically ARMA models. Journal of Time Series Analysis, 16, 127–145. [2] Akaike, H. (1974), A new look at the statistical model identification. IEEE Transactions of Automatic Control, 19, 716–723. [3] Anderson, P.L. and A.V. Vecchia (1993), Asymptotic results for periodic autoregressive moving -average processes. Journal of Time Series Analysis, 14, 1–18. [4] Anderson, P.L. and M.M. Meerschaert (1997), Periodic moving averages of random variables with regularly varying tails. Annals of Statistics, 25, 771–785. [5] Anderson, P.L. and M.M. Meerschaert (1998), Modeling river flows with heavy tails. Water Resources Research , 34 (9), 2271–2280. [6] Anderson, P.L., M.M. Meerschaert and A.V. Veccia (1999), Innovations algorithm for periodically stationary time series. Stochastic Processes and their Applications, 83, 149– 169. [7] Anderson, P.L. and M.M. Meerschaert (2005), Parameter estimates for periodically stationary time series. Journal of Time Series Analysis, 26, 489–518. [8] Anderson, P.L., M.M. Meerschaert, Y.G. Tesfaye (2007), Fourier-PARMA models and their application to modeling of river flows. 
Journal of Hydrologic Engineering, 12 (5), 462–472.

[9] Ansley, C.F. (1979), An algorithm for the exact likelihood of a mixed autoregressive-moving average process. Biometrika, 66 (1), 59–65.

[10] Box, G.E. and G.M. Jenkins (1976), Time Series Analysis: Forecasting and Control, 2nd edition, San Francisco: Holden-Day.

[11] Brockwell, P.J. and R.A. Davis (1991), Time Series: Theory and Methods, 2nd ed. New York: Springer-Verlag.

[12] Broyden, C.G. (1970), The convergence of a class of double-rank minimization algorithms. Journal of the Institute of Mathematics and Its Applications, 6, 76–90.

[13] Burnham, K.P. and Anderson, D.R. (2010), Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach, 1st ed. Springer.

[14] Chen, H.-L. and A.R. Rao (2002), Testing hydrologic time series for stationarity. Journal of Hydrologic Engineering, 7 (2), 129–136.

[15] Dunsmuir, W. and Hannan, E.J. (1976), Vector linear time series models. Advances in Applied Probability, 8 (2), 339–364.

[16] Durrett, R. (2010), Probability: Theory and Examples, 4th ed. Cambridge University Press.

[17] Fletcher, R. (1970), A new approach to variable metric algorithms. Computer Journal, 13 (3), 317–322.

[18] Gladyshev, E.G. (1961), Periodically correlated random sequences. Soviet Mathematics, 2, 385–388.

[19] Goldfarb, D. (1970), A family of variable metric updates derived by variational means. Mathematics of Computation, 24 (109), 23–26.

[20] Jones, R.H. and W.M. Brelsford (1967), Time series with periodic structure. Biometrika, 54, 403–408.

[21] Lehmann, E.L. and Casella, G. (1998), Theory of Point Estimation, 2nd ed. Springer.

[22] Lund, R.B. and I.V. Basawa (1999), Modeling and inference for periodically correlated time series. Asymptotics, Nonparametrics, and Time Series (ed. S. Ghosh), 37–62. Marcel Dekker, New York.

[23] Lund, R.B. and I.V. Basawa (2000), Recursive prediction and likelihood evaluation for periodic ARMA models. Journal of Time Series Analysis, 20 (1), 75–93.

[24] Basawa, I.V. and Lund, R.B. (2001), Large sample properties of parameter estimates for periodic ARMA models. Journal of Time Series Analysis, 22 (6), 651–663.

[25] Lund, R.B., Shao, Q. and I.V. Basawa (2006), Parsimonious periodic time series modeling. Australian & New Zealand Journal of Statistics, 48 (1), 33–47.

[26] Makagon, A., Miamee, A.G. and Salehi, H. (1991), Periodically correlated processes and their spectrum. Nonstationary Stochastic Processes and Their Applications, 147–164.

[27] Makagon, A., Miamee, A.G. and Salehi, H. (1994), Continuous time periodically correlated processes: spectrum and prediction. Stochastic Processes and their Applications, 49 (2), 277–295.

[28] Makagon, A., Miamee, A.G., Salehi, H. and Soltani, A.R. (2008), On spectral domain of periodically correlated processes. Theory of Probability and its Applications, 52 (2), 353–361.

[29] Makagon, A. and Salehi, H. (1993), Structure of periodically distributed stochastic sequences. Stochastic Processes, 245–251.

[30] McLeod, A.I. (1994), Diagnostic checking of periodic autoregression models with application. Journal of Time Series Analysis, 15 (2), 221–233.

[31] Noakes, D.J., McLeod, A.I. and Hipel, K.W. (1985), Forecasting monthly riverflow time series. International Journal of Forecasting, 1, 179–190.

[32] Pagano, M. (1978), On periodic and multiple autoregressions. Annals of Statistics, 6 (6), 1310–1317.

[33] R Development Core Team (2008), R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org.

[34] Salas, J.D., D.C. Boes and R.A. Smith (1982), Estimation of ARMA models with seasonal parameters. Water Resources Research, 18, 1006–1010.

[35] Salas, J.D., G.Q. Tabios III and P. Bartolini (1985), Approaches to multivariate modeling of water resources time series. Water Resources Bulletin, 21, 683–708.

[36] Salas, J.D. and J.T.B. Obeysekera (1992), Conceptual basis of seasonal streamflow time series models. Journal of Hydraulic Engineering, 118 (8), 1186–1194.

[37] Schwarz, G. (1978), Estimating the dimension of a model. Annals of Statistics, 6, 461–464.

[38] Shanno, D.F. (1970), Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, 24 (111), 647–656.

[39] Shao, Q. and R.B. Lund (2004), Computation and characterization of autocorrelations and partial autocorrelations in periodic ARMA models. Journal of Time Series Analysis, 25 (3), 359–372.

[40] Shiryaev, A.N. (1984), Probability. Springer-Verlag, New York.

[41] Tesfaye, Y.G. (2005), Seasonal Time Series Models and Their Application to the Modeling of River Flows. PhD Dissertation, University of Nevada, Reno, Nevada.

[42] Tesfaye, Y.G., M.M. Meerschaert and P.L. Anderson (2006), Identification of PARMA models and their application to the modeling of river flows. Water Resources Research, 42 (1), W01419.

[43] Tesfaye, Y.G., P.L. Anderson and M.M. Meerschaert (2011), Asymptotic results for Fourier-PARMA time series. Journal of Time Series Analysis, 32 (2), 157–174.

[44] Tiao, G.C. and M.R. Grupe (1980), Hidden periodic autoregressive moving average models in time series data. Biometrika, 67, 365–373.

[45] Tjøstheim, D. and J. Paulsen (1982), Empirical identification of multiple time series. Journal of Time Series Analysis, 3, 265–282.

[46] Thompstone, R.M., K.W. Hipel and A.I. McLeod (1985), Forecasting quarter-monthly riverflow. Water Resources Bulletin, 25 (5), 731–741.

[47] Troutman, B.M. (1979), Some results in periodic autoregression. Biometrika, 66, 219–228.

[48] Ula, T.A. (1990), Periodic covariance stationarity of multivariate periodic autoregressive moving average processes. Water Resources Research, 26 (5), 855–861.

[49] Ula, T.A. (1993), Forecasting of multivariate periodic autoregressive moving average processes. Journal of Time Series Analysis, 14, 645–657.

[50] Ula, T.A. and A.A. Smadi (1997), Periodic stationarity conditions for periodic autoregressive moving average processes as eigenvalue problems. Water Resources Research, 33 (8), 1929–1934.

[51] Ula, T.A. and A.A. Smadi (2003), Identification of periodic moving average models. Communications in Statistics: Theory and Methods, 32 (12), 2465–2475.

[52] Ursu, E. and Turkman, K.F. (2012), Periodic autoregression model identification using genetic algorithms. Journal of Time Series Analysis, doi: 10.1111/j.1467-9892.2011.00772.x.

[53] Vecchia, A.V. (1985a), Periodic autoregressive-moving average (PARMA) modelling with applications to water resources. Water Resources Bulletin, 21, 721–730.

[54] Vecchia, A.V. (1985b), Maximum likelihood estimation for periodic moving average models. Technometrics, 27 (4), 375–384.

[55] Vecchia, A.V. and R. Ballerini (1991), Testing for periodic autocorrelations in seasonal time series data. Biometrika, 78 (1), 53–63.

[56] Wylomańska, A. (2008), Spectral measures of PARMA sequences. Journal of Time Series Analysis, 29 (1), 113.

[57] Yao, Q. and Brockwell, P.J. (2006), Gaussian maximum likelihood estimation for ARMA models I: time series. Journal of Time Series Analysis, 27 (6), 857–875.