ESSAYS ON ESTIMATION AND INFERENCE IN MODELS WITH DETERMINISTIC TRENDS WITH AND WITHOUT STRUCTURAL CHANGE By Jingjing Yang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Economics 2010

ABSTRACT

ESSAYS ON ESTIMATION AND INFERENCE IN MODELS WITH DETERMINISTIC TRENDS WITH AND WITHOUT STRUCTURAL CHANGE By Jingjing Yang

Empirical macroeconomists who analyze typical time series, such as GDP, interest rates, and stock returns, have to worry about structural change. The possibility of structural change over time and the properties of structural change parameters are the focus of my dissertation. It covers the choice between break point estimators from the level shift model and the trend shift model, break tests robust to I(0)/I(1) errors, and the estimation of the number of breaks.

I. Break Point Estimates for a Shift in Trend: Levels versus First Differences

In the first chapter I analyze the estimation of an unknown break point in a univariate trend shift model under I(1) errors by minimizing the sum of squared residuals. Two break point estimators are considered: one from the original trend shift model and the other from its first difference, a mean shift model with I(0) errors. Simulations show a discrepancy between existing asymptotic theories and the finite sample distributions of the break point estimators. To achieve a closer approximation, I derive an asymptotic theory for the break point estimators assuming the break magnitude is within a $T^{-1/2}$ neighborhood of zero. The break point estimator from the trend break model converges to its true value at rate $T^{1/2}$ under I(1) errors, while the break point estimator from the first difference model converges at rate $T$ under I(0) errors. Given this fact, many researchers would think they should use the estimator that converges at the faster rate. However, I show that when the break magnitude is small relative to the noise magnitude, the break point estimator from the trend shift model may have thinner tails and concentrate more around the true break point than that from the first difference transformation. This indicates a preference for the break point estimator from the level model.

II. Fixed-b Analysis of LM Type Tests for a Shift in Mean

We analyze Lagrange multiplier (LM) tests for a shift in the mean of a univariate time series at an unknown date. We consider a class of LM statistics based on nonparametric kernel estimates of the long run variance, and we develop a fixed-b asymptotic theory for the statistics. We provide results for the case of I(0) and I(1) errors and use the fixed-b theory to explain finite sample null rejection probabilities and finite sample power of the LM tests. We show that the choice of bandwidth has a large impact on the size and power of the tests. In particular, we find that larger bandwidths lead to non-monotonic power whereas smaller bandwidths give tests with monotonic power. The fixed-b theory suggests that, for a given statistic, kernel, and significance level, there exists a "robust" bandwidth such that the fixed-b asymptotic critical value is the same for both I(0) and I(1) errors. In the case of the supremum statistic, the robust bandwidth LM test has good power that is monotonic, whereas the power of the mean statistic is non-monotonic.

III.
Consistency of Break Point Estimator under Misspecification of Break Number In this chapter, I discuss the inconsistency of sequential trend break point estimators in the presence of underspecification of the number of breaks. The analysis of models with level shifts has been documented by researchers under comprehensive settings such as allowing a time trend in the model. Despite the consistency of break point estimators of level shifts, there are few papers on the consistency of trend shift point estimators under misspecification. My simulation study and asymptotic analysis show that the trend break point estimators do not converge to the true breaks points under most conditions when the number of estimated breaks is smaller than the true number of breaks. This inconsistency leads to a potential power loss for testing for multiple trend breaks. Taking first difference is proposed to deal with this problem under certain circumstances. Copyright by Jingjing Yang 2010 ACKNOWLEDGMENTS I am very grateful to my advisor, Professor Tim Vogelsang for his continuous support and guidance. I also wish to express my gratitude to my dissertation committee, Professor Emma Iglesias, Professor Peter Schmidt, and Professor Lijian Yang. My thanks also go to my parents. v TABLE OF CONTENTS LIST OF TABLES viii LIST OF FIGURES x 1 Break Point Estimates for a Shift in Trend: Levels versus First Differences 1.1 Introduction and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Models, Assumptions, and Two Break Point Estimators . . . . . . . . . . . ˆ ˆ 1.3 Existing analysis of λT S and λM S . . . . . . . . . . . . . . . . . . . . . . ˆ ˆ 1.4 Finite Sample Behavior of λT S and λM S . . . . . . . . . . . . . . . . . . ˆ ˆ 1.5 Asymptotic Analysis of λT S and λM S when δ is Local to 0 at Rate T 1/2 . . 1.6 Break Point Estimators of The Trend Shift Model and its Partial Sum Model 1.7 Application to One-step Ahead Forecasts . . . . . . . . . . . . . . . . . . . 1.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 3 5 9 12 26 30 37 2 Fixed-b Analysis of LM Type Tests for a Shift in Mean 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 2.2 Model and Assumptions . . . . . . . . . . . . . . . . . 2.3 LM Tests for a Shift in Mean . . . . . . . . . . . . . . 2.4 Finite Sample Behavior of the LM Tests . . . . . . . . 2.5 Fixed-b Asymptotic Analysis of LM Mean Shift Tests 2.6 Bandwidths That Control Size . . . . . . . . . . . . . 2.7 The W ald∗ Statistic of Kejriwal (2009) . . . . . . . . . . . . . . . 38 38 40 42 43 52 60 67 . . . . . . . 74 74 76 78 84 84 89 90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Consistency of the Sequential Trend Break Point Estimator 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Model Assumption . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Existing Analysis and Finite Sample Simulations . . . . . . . . 3.4 Break Date Estimator under Multiple Breaks . . . . . . . . . . . 3.4.1 Multiple mean shifts . . . . . . . . . . . . . . . . . . . 3.4.2 Multiple trend shifts . . . . . . . . . . . . . . . . . . . ˆ ˆ 3.4.3 Consistency/Inconsistency conclusion of λM S and λT S . 3.5 Break point estimators for level and first difference model under breaks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 Application to Sequential Tests of Multiple Breaks Model . . . . vi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . multiple . . . . . . . . . . . 91 . 96 APPENDICES A.1 Extension of the asymptotics in Theorem 1.5.1 to near-I(1) errors A.2 Proof of Theorem 1.5.1 . . . . . . . . . . . . . . . . . . . . . . A.2.1 Proof of part 1 in Theorem 1.5.1 . . . . . . . . . . . . . A.2.2 Proof of part 2 in Theorem 1.5.1 . . . . . . . . . . . . . A.3 Proof that arg maxλ G2(λ, λc ) = λc . . . . . . . . . . . . . . . A.4 Proof of Corollary 0.1.1 . . . . . . . . . . . . . . . . . . . . . . A.4.1 Proof of part 1 in Corollary 0.1.1 . . . . . . . . . . . . . A.4.2 Proof of part 2 in Corollary 0.1.1 . . . . . . . . . . . . . A.5 Proof of Theorem 1.6.2 . . . . . . . . . . . . . . . . . . . . . . A.5.1 Proof of part 1 in Theorem 1.6.2 . . . . . . . . . . . . . A.5.2 Proof of part 2 in Theorem 1.6.2 . . . . . . . . . . . . . B.1 Proofs and Additional Results of Chapter 2 . . . . . . . . . . . . C.1 Proof of Theorem 3.4.4 . . . . . . . . . . . . . . . . . . . . . . C.2 Proof of Theorem 3.4.5 . . . . . . . . . . . . . . . . . . . . . . C.3 Analysis of G2T S under two breaks . . . . . . . . . . . . . . . C.4 Proof of Theorem 3.5.6 . . . . . . . . . . . . . . . . . . . . . . ˆ C.4.1 Proof of part 1: λT S . . . . . . . . . . . . . . . . . . . ˆ C.4.2 Proof of part 2: λM S . . . . . . . . . . . . . . . . . . . BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 113 114 114 117 119 121 121 121 122 122 124 125 126 130 133 137 137 137 139 vii LIST OF TABLES Table 1.1 Mean squared error (MSE) of one-step ahead forecasts of the trend shift model (1.2.1) under I(1) errors (For one-step forecast yt+1 , the ˆ 2 .) Simulation settings: λc = 0.5; MSE is defined as (ˆt+1 − yt+1 ) y δ = 0, 0.1, 0.2, 0.3, 0.4, 0.5; ut is I(1); T = 101; and N = 10, 000. ∗ ∗ OLS1 and OLS2 assume that ut is known to be I(1). . . . . . . . . . 32 Table 1.2 MSE in one-step forecast with the log real per capita GDP series (1870-1996) with trend shift model (1.2.1) and different break point ˆ ˆ estimators (λT S and λM S ), where MSE of one-step forecast are calculated for 1987-1996. . . . . . . . . . . . . . . . . . . . . . . . . . 35 Table 1.3 Mean squared error (MSE) of one-step ahead forecasts of the trend shift model (1.2.1) under I(0) errors (For one-step forecast yt+1 , the ˆ 2 for point forecasts.) Simulation MSE is defined as (ˆt+1 − yt+1 ) y c = 0.5; δ = 0, 0.01, 0.02, 0.03, 0.04, 0.05; u is I(0); T = settings: λ t ∗ and OLS ∗ assume that u is known to 101; and N = 10, 000. OLS1 t 2 be I(0). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 Table 1.4 MSE in one-step forecast with the log real per capita GDP series (1870-1996) with trend shift model (1.2.1) and different break point ˆ ˆ estimators (λT S and λQS ), where MSE of one-step forecast is calculated for 1987-1996. . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 Table 2.1 Null Rejection Probabilities Using Standard (b = 0) I(0) Critical Values, 5% Nominal Level, 15% Trimming, QS Kernel. . . . . . . . . 45 Table 2.2 Null Rejection Probabilities Using Standard (b = 0) I(0) Critical Values, 5% Nominal Level, 15% Trimming, Bartlett Kernel. . . . . . 46 Table 2.3 Finite Sample Behavior of Data Dependent Bandwidth to Sample Size Ratios, T = 120, QS Kernel. . . . . . . . . . . . . . . . . . . . . 
51 Table 2.4 Finite Sample Behavior of Data Dependent Bandwidth to Sample Size Ratios, T = 120, Bartlett Kernel . . . . . . . . . . . . . . . . . . 51 viii Table 2.5 Fixed-b Asymptotic Null Rejection Probabilities Using Standard (b = 0) I(0) Critical Values, 5% Nominal Level, 15% Trimming, QS Kernel 61 Table 2.6 Fixed-b Asymptotic Null Rejection Probabilities Using Standard (b = 0) I(0) Critical Values, 5% Nominal Level, 15% Trimming, Bartlett Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Table 2.7 I(0)/I(1) Robust Bandwidths and Critical Values QS kernel, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Table 2.8 I(0)/I(1) Robust Bandwidths and Critical Values Bart kernel, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Table 2.9 Finite Sample Null Rejection Probabilities for Tests Using Size Robust Bandwidths and Fixed-b I(0)/I(1) Critical Values, 5% Nominal Level, 15% Trimming, QS Kernel. . . . . . . . . . . . . . . . . . . . 68 Table 2.10 Finite Sample Null Rejection Probabilities for Tests Using Size Robust Bandwidths and Fixed-b I(0)/I(1) Critical Values, 5% Nominal Level, 15% Trimming, Bartlett Kernel. . . . . . . . . . . . . . . . . . 69 Table 3.1 Sum of densities at the true break {1/3, 2/3} under different ρ and M1 λc and λc where {λc , λc } = 1 2 1 2 = M2 . . . . . . . . . . . . . . . 95 Table 3.2 Sum of densities at the true break {1/3, 2/3} under different ρ and M1 λc and λc where {λc , λc } = 1 2 1 2 = −M2 . . . . . . . . . . . . . . 95 Table 3.3 Sum of densities at the true break λc and λc where {λc , λc } = 1 2 1 2 {1/3, 2/3} under different ρ and |M1 | = |M2 |, where M1 = 50(δ1 = 5). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 Table 3.4 Probability of Break Number Selection m for Trend Shift Model with ˆ c , λc } = {1/2, 2/3}, δ = 1, θ = 0, T = 120. . . . . . . . 99 2 breaks: {λ1 2 1 Table 3.5 Probability of Break Number Selection m for Trend Shift Model with ˆ c , λc } = {1/2, 2/3}, δ = 1, θ = 0.5, T = 120. . . . . . . 111 2 breaks: {λ1 2 1 ix LIST OF FIGURES ˆ ˆ Figure 1.1 Comparison of the pdf of λT S and λM S using the asymptotics of Bai(1994) and PZ(2005) with λc = 0.5. x-axis: λ; y-axis: pdf. The left from top to bottom: M = 1, 3, 5, 7; the right from top to bottom: M = 2, 4, 6, 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 ˆ ˆ Figure 1.2 Histograms of the break point estimators λT S and λM S . µ = β = 0; λc = 0.5; δ = 0.2, 0.4, 0.6, 0.8; ut : I(1) errors; T = 100; and N = 30, 000. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Figure 1.3 The asymptotic pdfs of Bai(1994), PZ(2005) and Theorem 1.5.1 ˆ ˆ with the empirical pdf of λT S and λM S . λc = 0.5; M = 2, 4, 6, 8; ut : I(1) errors; T = 100; and N = 30, 000. The left: λT S (solid: Finite sample; dash: PZ(2005); dash-dot: Theorem 1). The right: λM S (solid: Finite sample; dash: Bai(1994); dash-dot: Theorem 1). . 11 ˆ ˆ Figure 1.4 Asymptotic pdf of λT S and λM S by Theorem 1.5.1 at λc = 0.5 and M = 1, 2, 3, 4, 5, 6, 7, 8. . . . . . . . . . . . . . . . . . . . . . . . . 15 ˆ ˆ Figure 1.5 Finite sample histograms and asymptotic pdf of λT S and λM S in ˆ the case of no breaks. (a) Histogram of λT S (N=30,000 replications ˆ and sample length T=100). (b) Histogram of λM S (N=30,000 repliˆ cations and sample length T=100). (c) Asymptotic pdf of λT S and ˆ λM S under no breaks. . . . . . . . . . . . . . . . . . . . . . . . . . 
17 Figure 1.6 G2T S (λ, λc ) and G2M S (λ, λc ) in equation (1.5.16) and (1.5.17) when λc = 0.5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Figure 1.7 The finite sample pdf and the asymptotic pdf by Bai(1994) and ˆ ˆ PZ(2005) of λT S and λM S under fixed M = 2, 4, 6, 8, d(1) = 1, and T = 100, 200, 500, 1000. solid: T = 100; dash: T = 200; dot: T = 500; dash-dot: T = 1000; ’·’: pdf(Bai); ’o’: pdf(Theo. 1). . . . 20 x Figure 1.8 The finite sample pdf and theoretical pdf by Bai(1994), PZ(2005), ˆ ˆ and Theorem 1.5.1 of λT S and λM S under fixed δ = 0.4, d(1) = 1, and T = 100, 200, 500, 1000. Solid: finite sample; ’·’: Theorem 1; dash: PZ(the left) or Bai(the right). . . . . . . . . . . . . . . . . . . 21 Figure 1.9 The asymptotic pdfs of Bai(1994), PZ(2005) and Theorem 1.5.1 ˆ ˆ with the empirical pdfs of λT S and λM S . λc = 0.2; M = 2, 4, 6, 8; ut : I(1) errors; T = 100; and N = 30, 000. Solid: finite sample; dash-dot: Theorem 1; dash: PZ(left) or Bai(right). . . . . . . . . . . 23 ˆ ˆ Figure 1.10 Asymptotic pdfs of λT S and λM S by Theorem 1.5.1 for λc = 0.2 and M = 1, 2, 3, 4, 5, 6, 7, 8. . . . . . . . . . . . . . . . . . . . . . . 24 Figure 1.11 Pdfs of the break point estimators from the “TS-MS” models, under I(1) ut s, λc = 0.5, T = 100, M = 4(δ = 0.4), and different trimmings λ∗ = 0.05, 0.1, 0.15, 0.2. . . . . . . . . . . . . . . . . . . 25 Figure 1.12 Finite and asymptotic pdf by Theorem 1.6.2 and PZ(2005) (only for ˆ ˆ ˆ λT S ) of λT S and λQS (“TS-QS”) under I(0) ut s, λc = 0.5, T = 100, d(1) = 1, and δ = 0, 0.02, 0.04, 0.1. Solid: finite sample; dashdot: Theorem 2; dash: PZ(left) or Bai(right). . . . . . . . . . . . . . 28 Figure 1.13 The real (log) per capita GDP of Italy, Norway, and Sweden, which are of I(1) errors. x-axes: year; y-axes: (log)Per Capita GDP. . . . . 33 Figure 1.14 The real (log) per capita GDP of Australia, Canada, Germany, UK, and US, which are of I(0) errors. x-axes: year; y-axes: (log)Per Capita GDP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 Figure 2.1 Finite Sample Power, QS kernel, 15% Trimming. . . . . . . . . . . . 47 Figure 2.2 Finite Sample Power, QS kernel, 15% Trimming. . . . . . . . . . . . 48 Figure 2.3 Finite Sample Power, Bartlett kernel, 15% Trimming. . . . . . . . . 49 Figure 2.4 Finite Sample Power, Bartlett kernel, 15% Trimming. . . . . . . . . 50 Figure 2.5 Finite Sample and Asymptotic Power of M eanLM , QS kernel, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Figure 2.6 Finite Sample and Asymptotic Power of SupLM , QS kernel, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 xi Figure 2.7 Finite and Asymptotic Power of M eanLM , Bartlett kernel, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Figure 2.8 Finite and Asymptotic Power of SupLM , Bartlett kernel, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 Figure 2.9 Asymptotic Fixed-b Critical Values, 5% Level, QS kernel, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Figure 2.10 Asymptotic Fixed-b Critical Values, 5% Level, Bartlett kernel, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Figure 2.11 Finite Sample Power of Robust Bandwidth Tests, 5% Level, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Figure 2.12 Finite Sample Power of Robust Bandwidth Tests, 5% Level, 15% Trimming. . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . 71 ˆ Figure 3.1 Histogram of single break point estimator λM S in two breaks model: {λc , λc } = {1/3, 2/3}. δ1 = 1 always. From left to right: 1 2 ν = −2(δ2 = −2), −1(δ2 = −1); from top to bottom: T = 100, 250, 500, 1000. . . . . . . . . . . . . . . . . . . . . . . . . . . 80 ˆ Figure 3.2 Histogram of single break point estimator λM S in two breaks model: {λc , λc } = {1/3, 2/3}. δ1 = 1 always. From left to right: ν = 1 2 1(δ2 = 1), 2(δ2 = 2); from top to bottom: T = 100, 250, 500, 1000. . 81 ˆ Figure 3.3 Histogram of single break point estimator λT S in two breaks: {λc , λc } = {1/3, 2/3}. δ1 = 1 always. The left to right: 1 2 ν = −2(δ2 = −2), −1(δ2 = −1); The top to bottom: T = 100, 250, 500, 1000. . . . . . . . . . . . . . . . . . . . . . . . . . . 82 ˆ Figure 3.4 Histogram of single break point estimator λT S in two breaks: {λc , λc } = {1/3, 2/3}. δ1 = 1 always. The left to right: ν = 1 2 1(δ2 = 1), 2(δ2 = 2); The top to bottom: T = 100, 250, 500, 1000. . 83 Figure 3.5 G2M S (λ, λc ) under λc = 0.5 for mean shift model . . . . . . . . . . 86 Figure 3.6 |G2M S (λ, λc ) + ν · G2M S (λ, λc )| under different ν = 1 and -1 for 1 2 mean shift model, where {λc , λc } = {1/4, 3/4}. . . . . . . . . . . . 87 1 2 Figure 3.7 G2T S (λ, λc ) under λc = 0.5 for trend shift model . . . . . . . . . . 88 xii Figure 3.8 |G2T S (λ, λc ) + ν · G2T S (λ, λc )| under ν = 1 and -1 for trend shift 1 2 model, where {λc , λc } = {1/4, 3/4}. . . . . . . . . . . . . . . . . . 100 1 2 Figure 3.9 Finite sample distribution with the asymptotic distribution of ˆ ˆ λT S and λM S at ν = −5. The left to right: {λc , λc } = 1 2 {1/4, 3/4}, {1/3, 2/3}; the top to bottom: T = 100, 250, 500, 1000. ˆ ˆ ρ = 1. Solid: finite sample λT S ; dash: finite sample λM S ; dot: ˆ ˆ asymptotic λT S ; dot-solid: asymptotic λM S . . . . . . . . . . . . . . 101 ˆ Figure 3.10 Finite sample distribution with the asymptotic distribution of λT S c , λc } = {1/4, 3/4}; the right: ˆ and λM S at ν = −2. The left: {λ1 2 c , λc } = {1/3, 2/3}; the top to bottom: T = 100, 250, 500, 1000. {λ1 2 ˆ ˆ ρ = 1. Solid: finite sample λT S ; dash: finite sample λM S ; dot: ˆ ˆ asymptotic λT S ; dot-solid: asymptotic λM S . . . . . . . . . . . . . . 102 ˆ Figure 3.11 Finite sample distribution with the asymptotic distribution of λT S c , λc } = {1/4, 3/4}; the right: ˆ and λM S at ν = −1. The left: {λ 1 2 {λc , λc } = {1/3, 2/3}; the top to bottom: T = 100, 250, 500, 1000. 1 2 ˆ ˆ ρ = 1. Solid: finite sample λT S ; dash: finite sample λM S ; dot: ˆ ˆ asymptotic λT S ; dot-solid: asymptotic λM S . . . . . . . . . . . . . . 103 ˆ Figure 3.12 Finite sample distribution with the asymptotic distribution of λT S ˆ M S at ν = −0.5. The left: {λc , λc } = {1/4, 3/4}; the right: and λ 1 2 {λc , λc } = {1/3, 2/3}; the top to bottom: T = 100, 250, 500, 1000. 1 2 ˆ ˆ ρ = 1. Solid: finite sample λT S ; dash: finite sample λM S ; dot: ˆ ˆ asymptotic λT S ; dot-solid: asymptotic λM S . . . . . . . . . . . . . . 104 ˆ Figure 3.13 Finite sample distribution with the asymptotic distribution of λT S ˆ and λM S at ν = 0.5. The left: {λc , λc } = {1/4, 3/4}; the right: 1 2 {λc , λc } = {1/3, 2/3}; the top to bottom: T = 100, 250, 500, 1000. 1 2 ˆ ˆ ρ = 1. Solid: finite sample λT S ; dash: finite sample λM S ; dot: ˆ ˆ asymptotic λT S ; dot-solid: asymptotic λM S . . . . . . . . . . . . . . 105 ˆ Figure 3.14 Finite sample distribution with the asymptotic distribution of λT S ˆ and λM S at ν = 1. 
The left: {λc , λc } = {1/4, 3/4}; the right: 1 2 {λc , λc } = {1/3, 2/3}; the top to bottom: T = 100, 250, 500, 1000. 1 2 ˆ ˆ ρ = 1. Solid: finite sample λT S ; dash: finite sample λM S ; dot: ˆ ˆ asymptotic λT S ; dot-solid: asymptotic λM S . . . . . . . . . . . . . . 106 xiii ˆ Figure 3.15 Finite sample distribution with the asymptotic distribution of λT S c , λc } = {1/4, 3/4}; the right: ˆ and λM S at ν = 2. The left: {λ1 2 c , λc } = {1/3, 2/3}; the top to bottom: T = 100, 250, 500, 1000. {λ1 2 ˆ ˆ ρ = 1. Solid: finite sample λT S ; dash: finite sample λM S ; dot: ˆ ˆ asymptotic λT S ; dot-solid: asymptotic λM S . . . . . . . . . . . . . . 107 ˆ Figure 3.16 Finite sample distribution with the asymptotic distribution of λT S ˆ and λM S at ν = 5. The left: {λc , λc } = {1/4, 3/4}; the right: 1 2 {λc , λc } = {1/3, 2/3}; the top to bottom: T = 100, 250, 500, 1000. 1 2 ˆ ˆ ρ = 1. Solid: finite sample λT S ; dash: finite sample λM S ; dot: ˆ ˆ asymptotic λT S ; dot-solid: asymptotic λM S . . . . . . . . . . . . . . 108 Figure 3.17 λ to achieve maximal G2T S (λ, λc ) + ν · G2T S (λ, λc ), {λc , λc } = 1 2 2 1 {1/3, 2/3}, ν = −10, · · · , 10. . . . . . . . . . . . . . . . . . . . . . 109 Figure 3.18 λ to achieve maximal G2T S (λ, λc ) + ν · G2T S (λ, λc ), {λc , λc } = 1 2 1 2 {1/4, 3/4}, ν = −10, · · · , 10. . . . . . . . . . . . . . . . . . . . . . 110 xiv CHAPTER 1 Break Point Estimates for a Shift in Trend: Levels versus First Differences 1.1 Introduction and Motivation The break point estimator in the mean shift model or the trend shift model is analyzed extensively by Bai (1994), Bai and Perron (1998)(BP hereafter), Perron and Zhu (2005)(PZ hereafter). The least squares (LS) estimator is considered in these papers, and the break points are estimated by minimizing the sum of squared residuals (SSR). Bai (1994) analyzes the break point estimator of the mean shift model under the assumption that the break magnitude is much greater than T −1/2 , where T is the sample size. He derives that, for the mean shift model with I(0) errors, the break point estimator converges to the true break at rate T . BP(1998) extend the single unknown break to multiple unknown breaks under both fixed and shrinking shift magnitudes. PZ(2005) analyze the break point estimator of the trend shift model, which allows joint breaks in both the intercept and the trend under both I(0) and I(1) errors. They assume a fixed shift in trend, and show that the break point estimator converges at rate T 1/2 for the trend shift model under I(1) errors. The existing literature examines break point estimators of the mean shift model and the 1 trend shift model separately. I analyze the two estimators from these two models using the same data generating process (DGP). A trend shift model with unit root errors can be transformed into a mean shift model with stationary errors and vice versa. In other words, there are two ways to represent the same DGP: we could start with a trend shift model with I(1) errors and first differencing it to obtain a mean shift model with I(0) errors; or we could start with a level shift model with I(0) error and partial sum it to a trend shift model with I(1) errors. Based on the convergence rate of the estimators, many researchers would estimate the break point using the first differenced form, which has a faster convergence rate based on Bai and Perron (1998)’s and PZ(2005)’s results. 
However, it will be shown in this chapter that first differencing is not always better; i.e., the break point estimator from the trend shift model can be preferred even though it converges more slowly. The finite sample results show that the break point estimators have special tail behaviors which are not captured by the existing asymptotic approximations. Therefore, I develop a new asymptotic theory to capture the tail behavior. I assume that the break magnitude is within a local $T^{-1/2}$ neighborhood of zero and show the following in this chapter: a) The asymptotics of Bai and Perron (1998) and PZ(2005) indicate a certain range of break magnitudes where the level model break point estimator behaves better than the first difference estimator. However, there is considerable discrepancy between these asymptotics and the finite sample distributions; b) The proposed asymptotics more closely resemble the true distributions of the break point estimators. My results lead to the counter-intuitive conclusion that first differencing is not always the better way to estimate the break point under I(1) errors. In fact, the break point estimator from the level model can have thinner tails and concentrate more around the true break point when the break magnitude is small relative to the noise.

The rest of this chapter is organized as follows. Section 2 describes the model and lays out the assumptions. It defines the break point estimators from the level model and the first difference model. Section 3 summarizes and compares the existing asymptotic results of Bai and Perron (1998) and PZ(2005). Section 4 provides the finite sample results for the two break point estimators, showing the discrepancy between the existing asymptotics and finite sample behavior. In Section 5, I develop the new asymptotic theory assuming the break magnitude is local to 0 at rate $T^{-1/2}$ and show that the new theory captures the important finite sample patterns. Extensions to the trend shift model with I(0) errors and its partial sum model are included in Section 6. Section 7 gives an example where using the break point estimator from the trend shift model may reduce one-step ahead forecast errors. Section 8 summarizes the major results of the chapter.

1.2 Models, Assumptions, and Two Break Point Estimators

For simplicity, I use "TS-MS" to denote a pair consisting of the level model (the trend shift model under I(1) errors) and its first difference (the mean shift model under I(0) errors). Let us start with a simple linear trend shift model (TS model):
$$ y_t = \mu + \beta t + \delta DT_t(\lambda^c) + u_t, \quad t = 1, \dots, T, \qquad (1.2.1) $$
where $\delta$ is the break magnitude, $\lambda^c$ is the true break point with $T_b^c = \lambda^c T$, and
$$ DT_t(\lambda^c) = \begin{cases} 0, & t \le T_b^c, \\ t - T_b^c, & t > T_b^c. \end{cases} $$
The error is assumed to be I(1), defined by assumption (A1.a).

(A1.a) $u_t = u_{t-1} + \varepsilon_t$, where $\varepsilon_t = d(L)e_t$; $d(L) = \sum_{i=0}^{\infty} d_i L^i$, $\sum_{i=0}^{\infty} i|d_i| < \infty$, $d(1)^2 > 0$; $L$ is the lag operator; $\{e_t\}$ is a martingale difference sequence with $\sup_t E(e_t^4) < \infty$, $E(e_t \mid e_{t-1}, e_{t-2}, \dots) = 0$, and $E(e_t^2 \mid e_{t-1}, e_{t-2}, \dots) = 1$.

The first differenced model can be written as
$$ \Delta y_t = \beta + \delta DU_t(\lambda^c) + \Delta u_t, \quad t = 2, \dots, T, \qquad (1.2.2) $$
where $DU_t(\lambda^c) = \mathbf{1}(t > T_b^c)$, and
$$ \mathbf{1}(t > T_b^c) = \begin{cases} 0, & t \le T_b^c, \\ 1, & t > T_b^c. \end{cases} $$
Because $\{u_t\}$ is I(1), GLS estimates are obtained using this first difference transformation. The error of the first differenced model is I(0), given by $\Delta u_t = \varepsilon_t$.

Existing asymptotics of the break point estimators depend on assumptions about $\delta$.
Typical assumptions on $\delta$ in the literature are

(A2.a) $\delta$ = a constant scalar;

(A2.b) $\delta \to 0$, $\dfrac{T^{1/2}\delta}{(\log T)^{1/2}} \to \infty$.

(A2.a) is the assumption used by PZ(2005). (A2.b) is the assumption used by Bai (1994), where $\delta \gg T^{-1/2}$.

Though trimming is not necessary in break point estimation, as stated in PZ(2005), it is commonly used in break tests. Consider the grid of possible break dates $\Lambda^* = [T_{\lambda^*}, T_{\lambda^*}+1, \dots, T - T_{\lambda^*}]$. The corresponding grid of break points is defined as $\Lambda = [\lambda^*, \dots, 1 - \lambda^*]$, where $\lambda^* = T_{\lambda^*}/T$.

Denote $SSR(\lambda)$ as the sum of squared residuals (SSR) with a single break at $T_b = [\lambda T]$ and $SSR^0$ as the SSR with no break. We further define $SSR^0_{TS}$ and $SSR_{TS}(\lambda)$ as $SSR^0$ and $SSR(\lambda)$ for the trend shift model (1.2.1), and $SSR^0_{MS}$ and $SSR_{MS}(\lambda)$ as those for the mean shift model (1.2.2). $SSR^0_{TS}$ and $SSR_{TS}(\lambda)$ are calculated as
$$ SSR^0_{TS} = \sum_{t=1}^{T} \left[ y_t - (\tilde\mu + \tilde\beta t) \right]^2, \qquad SSR_{TS}(\lambda) = \sum_{t=1}^{T} \left[ y_t - (\hat\mu + \hat\beta t + \hat\delta DT_t(\lambda)) \right]^2, $$
where $\tilde\mu$ and $\tilde\beta$ are the OLS estimates of model (1.2.1) with $\delta = 0$ imposed, and $\hat\mu$, $\hat\beta$, and $\hat\delta$ are the OLS estimates of model (1.2.1) with no restrictions imposed. $SSR^0_{MS}$ and $SSR_{MS}(\lambda)$ are calculated likewise as
$$ SSR^0_{MS} = \sum_{t=2}^{T} (\Delta y_t - \tilde\beta)^2, \qquad SSR_{MS}(\lambda) = \sum_{t=2}^{T} \left[ \Delta y_t - (\hat\beta + \hat\delta DU_t(\lambda)) \right]^2, $$
where $\tilde\beta$ is the OLS estimate of model (1.2.2) with $\delta = 0$ imposed, and $\hat\beta$ and $\hat\delta$ are the OLS estimates of model (1.2.2) with no restrictions imposed.

The break points are estimated by minimizing $SSR_{TS}(\lambda)$ or $SSR_{MS}(\lambda)$ over the set $\Lambda$. $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ denote the break point estimators for the level model and its first difference, respectively, and are defined as
$$ \hat\lambda_{TS} = \arg\min_{\lambda\in\Lambda} SSR_{TS}(\lambda), \qquad (1.2.3) $$
$$ \hat\lambda_{MS} = \arg\min_{\lambda\in\Lambda} SSR_{MS}(\lambda). \qquad (1.2.4) $$

1.3 Existing analysis of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$

Bai and Perron (1998) and PZ(2005) provide limiting distributions of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ under fixed or shrinking break magnitudes. Deng and Perron (2006) extend PZ(2005)'s results from an "unbounded-trend" asymptotic framework to a "bounded-trend" asymptotic framework where the break point estimator is restricted to [0, 1] by normalizing the range of $T$. This extension for the trend shift model under I(1) errors is of little use here since the limiting distribution under the "bounded-trend" asymptotic framework is essentially equivalent to that obtained assuming no trend shift. The points of reference in this chapter are the results of Bai and Perron (1998) and PZ(2005), which I now review.

Under assumptions (A1.a) and (A2.a), for the level model (1.2.1), PZ(2005) prove that
$$ \sqrt{T}(\hat\lambda_{TS} - \lambda^c) \xrightarrow{d} N\!\left(0, \frac{2 d(1)^2}{15\delta^2}\right). \qquad (1.3.5) $$
Under assumptions (A1.a) and (A2.b), for the first differenced model (1.2.2), Bai (1994) proves that
$$ T(\hat\lambda_{MS} - \lambda^c) \xrightarrow{d} \frac{d(1)^2}{\delta^2} \arg\max_{r}\left\{ W_1(r) - \tfrac{1}{2}|r| \right\}, \qquad (1.3.6) $$
where $r \in \mathbb{R}$ and $W_1(r)$ is a two-sided Brownian motion on $\mathbb{R}^1$.

Equations (1.3.5) and (1.3.6) show that the break point estimator from the first differenced model converges to the true break point at rate $T$, faster than the $T^{1/2}$ rate for the level model. For given $T$, $d(1)$, and $\delta$, if we define $M = \frac{T^{1/2}\delta}{d(1)}$, the limiting distributions of Bai (1994) and PZ(2005) can be approximated in terms of $M$. We can describe the effect of $M$ on the limiting distributions and compare the performance of the two estimators as $M$ varies. For a given $T$, we have the following implications from the existing asymptotic theories.

a) Under assumptions (A1.a) and (A2.a), for model (1.2.1),
$$ \hat\lambda_{TS} - \lambda^c \approx N\!\left(0, \frac{2}{15M^2}\right). \qquad (1.3.7) $$
The probability density function of $\hat\lambda_{TS}$ is given by
$$ h(\hat\lambda_{TS}) = \frac{\sqrt{15}\,M}{\sqrt{4\pi}} \exp\!\left( -\frac{15(\hat\lambda_{TS} - \lambda^c)^2 M^2}{4} \right). \qquad (1.3.8) $$

b) Under assumptions (A1.a) and (A2.b), for model (1.2.2),
$$ \hat\lambda_{MS} - \lambda^c \approx \frac{1}{M^2} \arg\max_{r}\left\{ W_1(r) - \tfrac{1}{2}|r| \right\}. \qquad (1.3.9) $$
It has been shown by Yao (1987) that $\arg\max_r\{W_1(r) - |r|/2\}$ has the distribution function
$$ H(x) = 1 + (2\pi)^{-1/2}\sqrt{x}\, e^{-x/8} - \frac{x+5}{2}\,\Phi\!\left(-\frac{\sqrt{x}}{2}\right) + \frac{3}{2}\, e^{x}\, \Phi\!\left(-\frac{3\sqrt{x}}{2}\right) $$
for $x > 0$ and $H(x) = 1 - H(-x)$ for $x < 0$, with $\Phi(x)$ the distribution function of a standard normal random variable. The density function can be derived as
$$ h(x) = H'(x) = \frac{3}{2}\, e^{x}\, \Phi\!\left(-\frac{3\sqrt{x}}{2}\right) - \frac{1}{2}\,\Phi\!\left(-\frac{\sqrt{x}}{2}\right). $$
The probability density function of $\hat\lambda_{MS}$ is given by
$$ h(\hat\lambda_{MS}) = \frac{3M^2}{2}\exp\!\left(M^2|\hat\lambda_{MS} - \lambda^c|\right)\Phi\!\left(-\frac{3M}{2}\sqrt{|\hat\lambda_{MS} - \lambda^c|}\right) - \frac{M^2}{2}\,\Phi\!\left(-\frac{M}{2}\sqrt{|\hat\lambda_{MS} - \lambda^c|}\right). \qquad (1.3.10) $$

Based on equations (1.3.8) and (1.3.10), we can compare the densities of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ with respect to $M$. In Figure 1.1, the limiting densities are depicted in pairs for $M = 1, 2, \dots, 8$. We can see that $\hat\lambda_{TS}$ mostly ($M > 1$) has thinner tails than $\hat\lambda_{MS}$, but the concentrations around $\lambda^c$ are different. For small values of $M$, $\hat\lambda_{TS}$ is more concentrated around $\lambda^c$ than $\hat\lambda_{MS}$. In this situation, first differencing does not help because $\hat\lambda_{MS}$ is dominated by $\hat\lambda_{TS}$ in terms of concentration. For large values of $M$, $\hat\lambda_{MS}$ is more concentrated around $\lambda^c$. This comparison shows a crossing in the distributions of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ along $M$.

To describe the crossing more accurately, we can define concrete criteria for how concentrated an estimator is around $\lambda^c$. For a specific significance level, the critical values (CVs) describe how tight the estimator is, and we can compare the behaviors at that significance level. Take the 80% significance level as an example. From Figure 1.1, when $\lambda^c = 0.5$, the CVs of both $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ decrease as $M$ increases. However, the CVs of $\hat\lambda_{MS}$ decrease faster than those of $\hat\lambda_{TS}$ but start from a much larger value at small $M$. Therefore, there exists a specific value, $M_0$, such that for $M \le M_0$, $\hat\lambda_{TS}$ has smaller CVs, i.e., $\hat\lambda_{TS}$ has higher density around $\lambda^c$ at the 80% significance level; and for $M \ge M_0$, $\hat\lambda_{MS}$ has smaller CVs and is tighter around $\lambda^c$. Based on the probability density function (pdf) curves in Figure 1.1, we can estimate that for the 80% significance level $M_0$ is between 7 and 8 for $\lambda^c = 0.5$.

[Figure 1.1. Comparison of the pdfs of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ using the asymptotics of Bai (1994) and PZ (2005) with $\lambda^c = 0.5$. x-axis: $\lambda$; y-axis: pdf. The left column from top to bottom: $M = 1, 3, 5, 7$; the right column from top to bottom: $M = 2, 4, 6, 8$.]

We can also observe that for small values of $M$, the densities of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ do not collapse to zero when $\lambda$ is outside of [0, 1]. Because the estimators cannot lie outside [0, 1], this implies potential problems with these asymptotic approximations in practice.

1.4 Finite Sample Behavior of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$

In this section, I first use a simple simulation to illustrate the properties of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ in finite samples. I generate data based on model (1.2.1) where $u_t$ is I(1), $d(L) = 1$, and $e_t$ is an iid $N(0,1)$ process. Set $\mu = \beta = 0$ without loss of generality. Equations (1.2.3) and (1.2.4) are used to estimate $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ in each replication.
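To make the procedure concrete, the following is a minimal sketch (not the dissertation's code) of one replication of this design in Python, assuming NumPy; the helper names (shift_regressors, estimate_breaks) are illustrative, and the grid search simply evaluates the SSR of each candidate break date as in (1.2.3)-(1.2.4).

# Minimal sketch: one replication of the Monte Carlo design above (assumes NumPy).
import numpy as np

def shift_regressors(T, Tb):
    """DT_t(lambda) and DU_t(lambda) for a candidate break date Tb = [lambda*T]."""
    t = np.arange(1, T + 1)
    DT = np.where(t > Tb, t - Tb, 0.0)   # trend shift regressor
    DU = (t > Tb).astype(float)          # mean (level) shift regressor
    return DT, DU

def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

def estimate_breaks(y, trim=0.02):
    """Break fractions minimizing SSR_TS(lambda) in (1.2.3) and SSR_MS(lambda) in (1.2.4)."""
    T = len(y)
    t = np.arange(1, T + 1)
    dy = np.diff(y)
    dates = range(int(np.ceil(trim * T)), int(np.floor((1 - trim) * T)) + 1)
    ssr_ts, ssr_ms = [], []
    for Tb in dates:
        DT, DU = shift_regressors(T, Tb)
        ssr_ts.append(ssr(y, np.column_stack([np.ones(T), t, DT])))        # level model (1.2.1)
        ssr_ms.append(ssr(dy, np.column_stack([np.ones(T - 1), DU[1:]])))  # first difference (1.2.2)
    dates = np.array(list(dates))
    return dates[np.argmin(ssr_ts)] / T, dates[np.argmin(ssr_ms)] / T

rng = np.random.default_rng(0)
T, lam_c, delta = 100, 0.5, 0.4
DT_c, _ = shift_regressors(T, int(lam_c * T))
u = np.cumsum(rng.standard_normal(T))   # I(1) errors with d(L) = 1
y = delta * DT_c + u                    # mu = beta = 0 without loss of generality
lam_ts, lam_ms = estimate_breaks(y)
print(lam_ts, lam_ms)

Repeating this over many replications and collecting the two estimated break fractions produces the finite sample distributions discussed below.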
Trimming is not necessary, however in order to ensure the invertibility of the regression matrix I use 2% trimming, i.e., λ∗ = 0.02. The results are reported for λc = 0.5, T = 100, and replications are N = 30, 000 for all cases. ˆ ˆ Figure 1.2 plots the histograms of λT S and λM S for δ = 0.2, 0.4, 0.6, 0.8. The left ˆ ˆ are the histograms of λT S , and the right are the histograms of λM S . When δ = 0.2, the ˆ histogram of λT S has one peak at λ = 0.5 and little mass at {0.02} and {0.98}. It is not ˆ close to normal in appearance. More interestingly, the histogram of λM S has three peaks ˆ ˆ around {0.02}, {0.5}, and {0.98}. Compared to λT S , λM S is less concentrated around ˆ λc for small δ. With an increase of δ, the peaks of the histogram of λM S around {0.02} ˆ and {0.98} decrease gradually. For large δ, λM S still has fatter tails but concentrates more around λc . The comparison of the concentrations matches the asymptotic results in ˆ Figure 1.1. However, the fact that there is a large mass on the tails of λM S when δ is small is missed by the asymptotic approximation given by (1.3.9). 9 λ λTS,δ=0.2 ,δ=0.2 MS 1000 1000 500 500 0 0 0.5 λ 0 0 1 0.5 λ ,δ=0.4 TS 1 ,δ=0.4 MS 1500 1500 1000 1000 500 500 0 0 0.5 λTS,δ=0.6 0 0 1 4000 1 0.5 1 2000 1000 0.5 λMS,δ=0.8 3000 2000 1 4000 3000 0.5 λMS,δ=0.6 1000 0 0 0.5 λTS,δ=0.8 0 0 1 6000 6000 4000 4000 2000 2000 0 0 0.5 1 0 0 ˆ ˆ Figure 1.2. Histograms of the break point estimators λT S and λM S . µ = β = 0; λc = 0.5; δ = 0.2, 0.4, 0.6, 0.8; ut : I(1) errors; T = 100; and N = 30, 000. 10 Let us further compare the two asymptotic theories to see how the asymptotic approximations do in practice. Figure 1.3 compares the asymptotic pdfs and the finite sample pdfs. (Each of the plots contains a third density curve corresponding to the new approximation given in Section 5 which will be discussed later.) I obtain the finite sample pdf using a non-parametric kernel density smoothing method 1 . Consistent with the histograms, under ˆ ˆ small M , λM S tends to pile up more on the tails, and less on λc . As M grows, λM S is more concentrated around λc , which is what the existing asymptotics predicts. The existing asymptotics predict the concentration patterns well in finite samples. What these asymptotics do not get right is the tail behavior. Neither of the approximations captures the finite sample tail behavior under small M . PZ(2005)’s result tends to put too little density on the tails, as does Bai (1994)’s density. It becomes less of a concern when M is large. Technically, when M is large, the tails keep going outside [0, 1], but in a practical sense it does not matter. The previous discussion shows the existing theories by Bai (1994) and PZ(2005) correctly capture the concentration patterns of the two break point estimators, but both of them miss the tail behavior and provide less accurate approximation in finite samples. Because M is a multiplicative factor in equation (1.3.8) and (1.3.10), changes in M cannot be linked to the bimodality or trimodality of the finite sample behavior in Figure 1.2. This suggests that an alternative theoretical explanation for the finite sample patterns is desirable. 1 For a given set of statistics, {X }|i = 1, · · · , N , we estimate the pdf f at x by a ˜ i ˜ kernel smooth form, f (x) = 1/n · K((x − Xi )/h)|i = 1, · · · , n, where K(.) is the kernel function and h is the bandwidth. For details see Bowman and Azzalini (1997). In this chapter, I use the standard normal distribution as the kernel function. 
For the same reason as in PZ(2005), i.e., the optimal data-dependent bandwidth may not work well, I choose a simple bandwidth $h = 0.5\sigma$ for any error. Simulations show that $h$ does not affect the pdf estimate much.

[Figure 1.3. The asymptotic pdfs of Bai (1994), PZ (2005), and Theorem 1.5.1 with the empirical pdfs of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$. $\lambda^c = 0.5$; $M = 2, 4, 6, 8$; $u_t$: I(1) errors; $T = 100$; and $N = 30{,}000$. Left: $\hat\lambda_{TS}$ (solid: finite sample; dash: PZ (2005); dash-dot: Theorem 1.5.1). Right: $\hat\lambda_{MS}$ (solid: finite sample; dash: Bai (1994); dash-dot: Theorem 1.5.1).]

1.5 Asymptotic Analysis of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ when $\delta$ is Local to 0 at Rate $T^{1/2}$

Bai (1994) and PZ(2005) assume that the break magnitude $\delta$ is outside a $T^{-1/2}$ neighborhood of zero. An alternative way to develop an asymptotic theory is to assume that $\delta$ is within a $T^{-1/2}$ neighborhood using the assumption

(A2.c) $\delta = \dfrac{\delta^*}{T^{1/2}}$, $\delta^*$ = a constant scalar. $\qquad$ (1.5.11)

Next, the limiting distributions of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ are derived under assumption (A2.c).

Theorem 1.5.1 Suppose the regressions of the level model (1.2.1) and its first difference (1.2.2) are estimated using $\lambda \in \Lambda \subseteq (0,1)$ and $T_b^c = \lambda^c T$ is the true break. Under assumptions (A1.a) and (A2.c), the break point estimators defined by (1.2.3) and (1.2.4) have the following limiting distributions:

1. For the level model (1.2.1),
$$ \hat\lambda_{TS} \xrightarrow{d} \arg\max_{\lambda\in\Lambda}\left\{ \frac{\left[\int_0^1 F(r,\lambda)W(r)\,dr + \frac{\delta^*}{d(1)}\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr\right]^2}{\int_0^1 F(r,\lambda)^2\,dr} \right\}, \qquad (1.5.12) $$
where
$$ F(r,\lambda) = \begin{cases} \lambda^3 - 2\lambda^2 + \lambda - (2\lambda^3 - 3\lambda^2 + 1)r, & r \le \lambda, \\ \lambda^3 - 2\lambda^2 - (2\lambda^3 - 3\lambda^2)r, & r > \lambda, \end{cases} $$
which implies the approximation
$$ \hat\lambda_{TS} - \lambda^c \approx \arg\max_{\lambda\in\Lambda}\left\{ \frac{\left[\int_0^1 F(r,\lambda)W(r)\,dr + M\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr\right]^2}{\int_0^1 F(r,\lambda)^2\,dr} \right\} - \lambda^c, \qquad (1.5.13) $$
where $M = \frac{\delta^*}{d(1)} \equiv \frac{\delta T^{1/2}}{d(1)}$.

2. For the first difference model (1.2.2),
$$ \hat\lambda_{MS} \xrightarrow{d} \arg\max_{\lambda\in\Lambda}\left\{ \frac{\left[\big(\lambda W(1) - W(\lambda)\big) + \frac{\delta^*}{d(1)}\Psi(\lambda,\lambda^c)\right]^2}{\lambda(1-\lambda)} \right\}, \qquad (1.5.14) $$
where
$$ \Psi(\lambda,\lambda^c) = \begin{cases} (1-\lambda^c)\lambda, & \lambda \le \lambda^c, \\ (1-\lambda)\lambda^c, & \lambda > \lambda^c, \end{cases} $$
which implies the approximation
$$ \hat\lambda_{MS} - \lambda^c \approx \arg\max_{\lambda\in\Lambda}\left\{ \frac{\left[\big(\lambda W(1) - W(\lambda)\big) + M\Psi(\lambda,\lambda^c)\right]^2}{\lambda(1-\lambda)} \right\} - \lambda^c. \qquad (1.5.15) $$

The limits in Theorem 1.5.1 are different from what we have seen before, but just as before, $M$ shows up in the approximations, so we can directly compare them with the existing theory. Figure 1.3 compares the finite sample pdfs with the asymptotic pdfs of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ from equations (1.3.8), (1.3.10), and Theorem 1.5.1 with $\lambda^c = 0.5$, when $M = 2, 4, 6, 8$ (i.e., $\delta = 0.2, 0.4, 0.6, 0.8$ and $T = 100$). We can see that the new asymptotic theory captures the density of $\hat\lambda_{TS}$ on the boundary of [0, 1]. It also tracks the unusual tail behavior of $\hat\lambda_{MS}$, the large mass on the boundary, and it predicts the densities of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ in the middle of [0, 1] in finite samples as well.

I also compare $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ using the new asymptotics. Figure 1.4 shows the concentration patterns of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$. Similar to Bai (1994) and PZ(2005), it shows that $\hat\lambda_{TS}$ concentrates more around $\lambda^c$ for small $M$. If we take high probability in a small area around $\lambda^c$ as our criterion, we can see that $\hat\lambda_{TS}$ can be a more precise estimator than $\hat\lambda_{MS}$ for small $M$. Why does this new asymptotic theory pick up the tail behavior better?
To discover the effect of $M$ on the limiting distributions, I decompose the terms inside the arg max of equations (1.5.12) and (1.5.14) into two parts:
$$ G_{TS}(\lambda,\lambda^c) = G1_{TS}(\lambda) + M\cdot G2_{TS}(\lambda,\lambda^c) = \frac{\int_0^1 F(r,\lambda)W(r)\,dr}{\sqrt{\int_0^1 F(r,\lambda)^2\,dr}} + M\cdot\frac{\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr}{\sqrt{\int_0^1 F(r,\lambda)^2\,dr}} \qquad (1.5.16) $$
and
$$ G_{MS}(\lambda,\lambda^c) = G1_{MS}(\lambda) + M\cdot G2_{MS}(\lambda,\lambda^c) = \frac{\lambda W(1) - W(\lambda)}{\sqrt{\lambda(1-\lambda)}} + M\cdot\frac{\Psi(\lambda,\lambda^c)}{\sqrt{\lambda(1-\lambda)}}. \qquad (1.5.17) $$
For conciseness, denote $G1_{TS}(\lambda)$ and $G1_{MS}(\lambda)$ as $G1$, $G2_{TS}(\lambda,\lambda^c)$ and $G2_{MS}(\lambda,\lambda^c)$ as $G2$, and $G_{TS}(\lambda,\lambda^c)$ and $G_{MS}(\lambda,\lambda^c)$ as $G$.

[Figure 1.4. Asymptotic pdfs of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ by Theorem 1.5.1 at $\lambda^c = 0.5$ and $M = 1, 2, 3, 4, 5, 6, 7, 8$.]

From the decomposition we can see, first, that the asymptotics in Theorem 1.5.1 are continuous at $M = 0$, i.e., $M$ can be arbitrarily small in the asymptotics. The existing theories by Bai and PZ need to assume there is a break. If there is no break, their distribution theory breaks down, generating a discontinuity in their asymptotic theory as $M$ converges to zero, while the new approximation is continuous with respect to the magnitude of the break.

Figure 1.5 depicts the finite sample histograms and the asymptotic pdfs of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ when $M = 0$. The asymptotic pdfs are obtained by Theorem 1.5.1. Both the finite sample histograms and the asymptotic pdfs show that when there is no break, the tail behaviors of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ are very different: $\hat\lambda_{TS}$ concentrates more in the middle while $\hat\lambda_{MS}$ concentrates more around $\{0\}$ and $\{1\}$. The distribution of $\hat\lambda_{TS}$ when $M = 0$ goes less often to $\{0\}$ and $\{1\}$ but more to the middle, which is consistent with the results of Nunes, Kuan and Newbold (1995) and Bai (1998) about "spurious breaks", while the distribution of $\hat\lambda_{MS}$ has peaks at $\{0\}$ and $\{1\}$ and is flat in the middle. Theorem 1.5.1 picks up the tails as shown in Figure 1.5, where $\hat\lambda_{TS}$ has higher probability in the middle but lower probability on the boundary, while $\hat\lambda_{MS}$ has higher probability on the boundary but lower probability in the middle, and both pdfs are flat in the middle range of [0, 1]. Although under no break $\hat\lambda_{TS}$ is spurious, this behavior is a major source of the precision of $\hat\lambda_{TS}$ (in the sense of more concentration in the pdf at certain significance levels) when $M$ is small and $\lambda^c$ is around 0.5.

[Figure 1.5. Finite sample histograms and asymptotic pdfs of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ in the case of no breaks. (a) Histogram of $\hat\lambda_{TS}$ (N = 30,000 replications and sample length T = 100). (b) Histogram of $\hat\lambda_{MS}$ (N = 30,000 replications and sample length T = 100). (c) Asymptotic pdfs of $\hat\lambda_{TS}$ and $\hat\lambda_{MS}$ under no breaks.]

With the form of $(G1 + M\cdot G2)$ in the limiting distributions, Theorem 1.5.1 provides a bridge between the $\delta = 0$ asymptotics and the $\delta \ne 0$ asymptotics. When $M$ is small, the random component $G1$ dominates $G2$ and the distribution tends to have the null tail. When $M$ is significantly different from zero, $G2$ dominates $G1$ and $M$ affects the limiting distribution through $M\cdot G2_{TS}(\lambda,\lambda^c)$ and $M\cdot G2_{MS}(\lambda,\lambda^c)$. Both of the $G2$ parts attain their global maxima at the same place, $\lambda^c$, as shown in Figure 1.6. (For a detailed proof see Appendix A.3.)
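As a quick numerical illustration of this property, the sketch below (not from the dissertation; it assumes NumPy and takes the $G2$ terms as written in (1.5.16)-(1.5.17) above, with the integrals approximated by averages on a fine grid) evaluates $G2_{TS}(\lambda,\lambda^c)$ and $G2_{MS}(\lambda,\lambda^c)$ over candidate values of $\lambda$ and confirms that both are maximized at $\lambda = \lambda^c$.

# Minimal numerical check of arg max_lambda G2(lambda, lambda_c) = lambda_c (assumes NumPy).
import numpy as np

def F(r, lam):
    """F(r, lambda) from Theorem 1.5.1."""
    a = lam**3 - 2 * lam**2
    return np.where(r <= lam,
                    a + lam - (2 * lam**3 - 3 * lam**2 + 1) * r,
                    a - (2 * lam**3 - 3 * lam**2) * r)

def G2_TS(lam, lam_c, r):
    f, fc = F(r, lam), F(r, lam_c)
    # integrals over [0,1] approximated by averages on the uniform grid r
    return (f * fc).mean() / np.sqrt((f * f).mean())

def G2_MS(lam, lam_c):
    psi = (1 - lam_c) * lam if lam <= lam_c else (1 - lam) * lam_c
    return psi / np.sqrt(lam * (1 - lam))

r = np.linspace(0.0, 1.0, 5001)
grid = np.linspace(0.02, 0.98, 97)        # candidate lambda values
lam_c = 0.5
g2_ts = np.array([G2_TS(l, lam_c, r) for l in grid])
g2_ms = np.array([G2_MS(l, lam_c) for l in grid])
print(grid[g2_ts.argmax()], grid[g2_ms.argmax()])   # both maxima occur at lambda = lambda_c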
If M is big enough, the G2 parts are completely dominant in (G1+M ·G2), ˆ ˆ which makes λT S and λM S arbitrarily close to λc . Therefore, Theorem 1.5.1 explains why ˆ ˆ as M grows, λT S and λM S are consistent to some extent. For a moderately large to small ˆ M , the limiting distribution of λT S exhibits a shape of “ ˆ ” and λM S exhibits a shape of “w”, resulting from the mixed effects of G1 and G2 parts in the asymptotics. It is useful to see how the asymptotics approximates finite sample behaviors under different sample sizes. Figure 1.7 compares the finite sample distributions with the asymptotic theories by Bai (1994), PZ(2005), and Theorem 1.5.1 under different fixed M but various T s (M = 2, 4, 6, 8 and T = 100, 200, 500, 1000). We can see that if M is fixed, increasing the sample size does not improve the approximation of the asymptotics by Bai (1994) and PZ(2005) in finite samples. In contrast, the compatibility of the new approximation with the finite sample pdfs at T = 100, 200, 500, 1000 shows the approximation of Theorem 1.5.1 is adequate no matter what T is. The reason why increasing the sample size does not improve the approximation of the existing asymptotics in this case is because M is fixed. It is not T per se or δ per se but the relative magnitude of them that drives the shape of finite sample patterns of the break point estimator. This relative effect is picked up by M . For a given value of M , the finite sample behavior of the break point estimator is the same whether T is large and δ is small or T is small and δ is large. Figure 1.8 looks at fixed δ with T getting bigger. In that case, M is also getting bigger. As is expected that the asymptotic approximations of the existing theories are getting better as M increases, while Theorem 1.5.1 continues to provide a close approximation to finite ˆ ˆ sample behaviors of λT S and λM S as shown in this figure. One additional advantage of the new asymptotics lies in the finite sample approximation 18 λc=0.5 0.08 G2TS 0.06 0.04 0.02 0 0 0.2 0.4 0.6 0.8 1 0.6 0.8 1 λ c λ =0.5 0.5 G2MS 0.4 0.3 0.2 0.1 0 0.2 0.4 λ Figure 1.6. G2T S (λ, λc ) and G2M S (λ, λc ) in equation (1.5.16) and (1.5.17) when λc = 0.5. 19 pdf of λ pdf of λ ,M=2 TS ,M=2 TS 3 3 2 2 1 1 0 0 0 1 0 0.5 pdf of λ 0.5 pdf of λ ,M=4 TS 1 ,M=4 MS 8 6 6 4 4 2 0 0 2 0 1 0 0.5 pdf of λ 0.5 pdf of λ ,M=6 TS 1 ,M=6 MS 10 15 10 5 5 0 0 0.5 0 1 0 pdf of λTS,M=8 0.5 1 pdf of λMS,M=8 30 10 20 5 0 0 10 0.5 0 1 0 0.5 1 Figure 1.7. The finite sample pdf and the asymptotic pdf by Bai(1994) and PZ(2005) of ˆ ˆ λT S and λM S under fixed M = 2, 4, 6, 8, d(1) = 1, and T = 100, 200, 500, 1000. solid: T = 100; dash: T = 200; dot: T = 500; dash-dot: T = 1000; ’·’: pdf(Bai); ’o’: pdf(Theo. 1). 20 λ λTS:T=100,δ=0.4 :T=100,δ=0.4 MS 6 6 4 4 2 2 0 0 0.5 0 0 1 λTS:T=200,δ=0.4 0.5 λ 1 :T=200,δ=0.4 MS 10 8 6 5 4 2 0 0 0.5 0 0 1 λTS:T=500,δ=0.4 0.5 1 λMS:T=500,δ=0.4 10 20 15 5 10 5 0 0 0.5 0 0.3 1 λTS:T=1000,δ=0.4 0.4 0.5 0.6 0.7 λMS:T=1000,δ=0.4 50 40 10 30 20 5 10 0 0.2 0.4 0.6 0 0.4 0.8 0.45 0.5 0.55 0.6 Figure 1.8. The finite sample pdf and theoretical pdf by Bai(1994), PZ(2005), and Theoˆ ˆ rem 1.5.1 of λT S and λM S under fixed δ = 0.4, d(1) = 1, and T = 100, 200, 500, 1000. Solid: finite sample; ’·’: Theorem 1; dash: PZ(the left) or Bai(the right). 21 ˆ under different λc . The normalized asymptotics (the limiting distributions of λ − λc ) by Bai (1994) and PZ(2005) are invariant to λc . (See equation (1.3.5)and (1.3.6).) 
In contrast, the finite sample behavior depends on λc , and this is captured by Theorem 1.5.1. We can see in equations (1.5.13) and (1.5.15), G1, the leading term, does not depend on λc , while G2 does depend on λc and attains its maximum at λc ; hence the limiting distributions of ˆ ˆ λT S − λc and λM S − λc are functions of λc . Figure 1.9 compares the new and existing aymptotics with finite sample pdfs for λc = 0.2. Compared to the λc = 0.5 case in Figure 1.3, when λc = 0.5, the existing asymptotics miss the tail behavior to an even greater extent, while the new asymptotics nails them down. ˆ ˆ Figure 1.10 compares λT S and λM S at λc = 0.2 using the new asymptotics. Compared ˆ ˆ to the λc = 0.5 case in Figure 1.4, where the distributions of λT S and λM S are symmetric ˆ ˆ around λ = 0.5, the distributions of λT S and λM S here are asymmetric around λ = 0.2. ˆ ˆ The estimator at λ < 0.2 has a lower density than that at λ > 0.2 for λT S . For λM S , the behavior on the right and left side are reversed. This is fairly intuitive: when M is small, the δ = 0 asymptotics dominates, which results in this asymmetry in the distributions. We may also notice that GT S (λ, λc ) − λc is symmetric to GT S (λ, λc ) − λc around λ = 0, 1 1 2 2 . ˆ where λc = 1 − λc . This analysis also holds for λM S . 2 1 Trimming is used by most break tests to deal with unknown break dates. Trimming also affects the performance of break point estimators. Theorem 1.5.1 captures the effect of trimming well. The trimming affects the asymptotics through the set Λ in equation (1.5.12) and (1.5.14), which explains big differences in the tail behaviors. Figure 1.11 plots the asymptotic distributions and finite sample distributions with trimming of 0.05, 0.10, 0.15, and 0.20. As expected, with the increase of trimming, the pile up in the tails becomes more pronounced. Theorem 1.5.1 again captures the tail behavior well. ˆ ˆ The pattern of concentration around λc for λT S and λM S changes with different trimˆ ˆ ming. When trimming=0.05, λT S dominates λM S in the densities around λ = λc . With the ˆ increase of trimming, this dominant effect tends to reverse. When trimming=0.2, λM S has 22 λTS:M=2 λMS: M=2 10 2 1.5 5 1 0.5 0 0 0.5 λ 0 0 1 0.5 λ :M=4 TS 1 : M=4 MS 8 4 6 3 2 4 1 2 0 0 0.5 0 0 1 λTS:M=6 0.5 1 λMS: M=6 6 15 4 10 2 5 0 0 0.5 λ 0 0 1 0.5 λ :M=8 TS 1 : M=8 MS 30 8 6 20 4 10 2 0 0 0.5 0 0 1 0.5 1 Figure 1.9. The asymptotic pdfs of Bai(1994), PZ(2005) and Theorem 1.5.1 with the emˆ ˆ pirical pdfs of λT S and λM S . λc = 0.2; M = 2, 4, 6, 8; ut : I(1) errors; T = 100; and N = 30, 000. Solid: finite sample; dash-dot: Theorem 1; dash: PZ(left) or Bai(right). 23 pdf: M=1 15 pdf: M=2 15 λMS λTS 10 λTS 10 5 0 0 λMS 5 0.5 0 0 1 pdf: M=3 10 1 pdf: M=4 8 λMS λTS 8 0.5 λMS λTS 6 6 4 4 2 2 0 0 0.5 0 0 1 pdf: M=5 6 1 pdf: M=6 8 λMS λTS 4 0.5 λMS λTS 6 4 2 2 0 0 0.5 0 0 1 pdf: M=7 10 1 pdf: M=8 15 λMS λTS 8 0.5 λMS λTS 10 6 4 5 2 0 0 0.5 0 0 1 0.5 1 ˆ ˆ Figure 1.10. Asymptotic pdfs of λT S and λM S by Theorem 1.5.1 for λc = 0.2 and M = 1, 2, 3, 4, 5, 6, 7, 8. 24 pdf: λ∗ =0.05 4 λ 3 λMS TS 2 1 0 0 0.5 pdf: λ∗ =0.10 1 4 λTS 3 λMS 2 1 0 0 0.5 pdf: λ∗ =0.15 1 4 λTS 3 λMS 2 1 0 0 0.5 pdf: λ∗ =0.20 1 λTS 4 λMS 3 2 1 0 0 0.5 1 Figure 1.11. Pdfs of the break point estimators from the “TS-MS” models, under I(1) ut s, λc = 0.5, T = 100, M = 4(δ = 0.4), and different trimmings λ∗ = 0.05, 0.1, 0.15, 0.2. 25 higher density at λ = λc . Also the tails change according to different trimmings, especially ˆ for λT S . 
When trimming=0.05, it has little density in the tails. But when trimming=0.2, ˆ ˆ λT S has considerable mass in the tails while λM S does not change that much, which might ˆ be the reason why λM S becomes dominant. 1.6 Break Point Estimators of The Trend Shift Model and its Partial Sum Model ˆ ˆ Given that λT S can be more accurate than λM S when ut ∼ I(1), it might be possible that when ut ∼ I(0), a more precise estimator of λc can be obtained by partial summing the model and inducing a unit root in the error. Similar to the “TS-MS” models, I define a second pair of models: the level trend shift model (TS), and its partial sum, the quadratic shift model (QS). The level model is still the trend shift model (1.2.1). However, the noise assumption is changed from I(1) errors to I(0), defined in assumption (A1.c). (A1.c) ut = et , where t = 1, · · · , T, and et is defined in (A1.a). Because of I(0) errors, the break magnitude δ is assumed to be within a T −3/2 neighborhood of zero using the assumption (A2.d) ∗ δ = δ , δ ∗ = a constant scalar. T 3/2 (1.6.18) Under assumption (A1.c) and (A2.d), for the trend shift model under I(0) errors, PZ(2005) prove that 2 4 d(1) d ˆ T 3/2 (λT S − λc ) − N (0, c → c ) δ 2 ). λ (1 − λ 3/2 . If we define M = δT , PZ’s result can be rewritten as d(1) 26 (1.6.19) 4 1 ˆ λT S − λc ≈ N (0, c c ) M 2 ). λ (1 − λ Taking the partial sum of model (1.2.1), we can obtain the partial sum model: 1 St = αt + β (t2 + t) + δ 2 t DTj (λc ) + vt (1.6.20) DTj (λc ) + εt . (1.6.21) j=1 vt = vt−1 + et , t = 1, · · · , T, . where St = . t j=1 yj , vt = t j=1 uj . We rewrite the partial sum model as β β St = (α + )t + t2 + δ 2 2 . . . Define α∗ = α + β , β ∗ = β , and DQt (λc ) = 2 2 t j=1 t c j=1 DTj (λ ), a quadratic shift. Equation (1.6.21) is expressed as St = α∗ t + β ∗ t2 + δDQt (λc ) + εt . (1.6.22) 0 Define SSRQS and SSRQS (λ) as SSR0 and SSR(λ) for the quadratic shift model 0 (1.6.22). SSRQS and SSRQS (λ) are calculated as 0 SSRQS . = . SSRQS (λ) = T ˜ ˜ [St − (α∗ t + β ∗ t2 )]2 , t=1 T ˆ ˆ ˆ [St − (α∗ t + β ∗ t2 + δDTt (λ))]2 , t=1 ˜ ˜ ˆ ˆ where α∗ and β ∗ are the OLS estimates of model (1.6.22) with δ = 0 imposed; α∗ , β ∗ , and ˆ δ are the OLS estimates of model (1.6.22) with no restrictions imposed. ˆ The break point estimator λQS is obtained by minimizing the SSRQS (λ): ˆ λQS = arg min SSRQS (λ). λ∈Λ (1.6.23) ˆ Similar to what we see before in the distribution of λT S from the trend shift model under I(1) errors, there is also a discrepancy between their asymptotic theory and the finite 27 λT S : δ=0 λQS : δ=0 4 8 3 6 2 4 1 2 0 0.2 0.4 0.6 0 0.8 λT S : δ=0.02 6 0.2 0.4 0.6 λQS : δ=0.02 0.2 0.4 0.6 0.8 λQS : δ=0.04 0.2 0.4 0.6 λQS : δ=0.1 0.8 0.2 0.4 0.8 3 4 2 2 0.8 1 0 0.2 5 0.4 0.6 λT S : δ=0.04 0 0.8 4 4 3 3 2 2 1 1 0 0.2 0.4 0.6 λT S : δ=0.1 0 0.8 10 8 8 6 6 4 4 2 2 0 0.2 0.4 0.6 0 0.8 0.6 ˆ Figure 1.12. Finite and asymptotic pdf by Theorem 1.6.2 and PZ(2005) (only for λT S ) ˆ ˆ of λT S and λQS (“TS-QS”) under I(0) ut s, λc = 0.5, T = 100, d(1) = 1, and δ = 0, 0.02, 0.04, 0.1. Solid: finite sample; dash-dot: Theorem 2; dash: PZ(left) or Bai(right). 28 sample behaviors of this estimator. On the left column of Figure 1.12 are the finite sample ˆ histograms (T = 100) of λT S (The histograms on the right column are for the break point estimator from the partial sum model which will be analyzed in Theorem 1.6.2). The finite sample histograms exhibit not only non-normal distributions but also the complicated tail ˆ behaviors of the break point estimator. 
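Before turning to the asymptotics, the following is a minimal sketch (not the dissertation's code) of the partial-sum "TS-QS" construction in (1.6.20)-(1.6.23), assuming NumPy; the function names are illustrative. It builds $S_t$, the quadratic shift regressor $DQ_t(\lambda)$, and obtains $\hat\lambda_{QS}$ by minimizing $SSR_{QS}(\lambda)$ over candidate break dates.

# Minimal sketch of the quadratic-shift (partial sum) break point estimator (assumes NumPy).
import numpy as np

def quad_shift(T, Tb):
    """DQ_t(lambda) = sum_{j<=t} DT_j(lambda), the quadratic shift regressor."""
    t = np.arange(1, T + 1)
    DT = np.where(t > Tb, t - Tb, 0.0)
    return np.cumsum(DT)

def lambda_QS(y, trim=0.02):
    """Break fraction minimizing SSR_QS(lambda) in the partial-sum regression (1.6.22)."""
    T = len(y)
    S = np.cumsum(y)                      # S_t = sum_{j<=t} y_j
    t = np.arange(1, T + 1)
    base = np.column_stack([t, t**2])     # regressors alpha* t and beta* t^2
    dates = range(int(np.ceil(trim * T)), int(np.floor((1 - trim) * T)) + 1)
    ssr = []
    for Tb in dates:
        X = np.column_stack([base, quad_shift(T, Tb)])
        b, *_ = np.linalg.lstsq(X, S, rcond=None)
        e = S - X @ b
        ssr.append(e @ e)
    return list(dates)[int(np.argmin(ssr))] / T

# example with I(0) errors and a small trend break, as in Figure 1.12
rng = np.random.default_rng(1)
T, lam_c, delta = 100, 0.5, 0.04
t = np.arange(1, T + 1)
Tb_c = int(lam_c * T)
y = delta * np.where(t > Tb_c, t - Tb_c, 0.0) + rng.standard_normal(T)  # mu = beta = 0
print(lambda_QS(y))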
The limiting distribution of $\hat\lambda_{QS}$ is not available in the literature. Using the same approach as in Theorem 1.5.1, I derive new asymptotic limits of $\hat\lambda_{TS}$ and $\hat\lambda_{QS}$ from the trend shift model under I(0) errors and its partial sum model. The proof is similar to that of Theorem 1.5.1. (See Appendix A.5.)

Theorem 1.6.2 Suppose the regressions of the level model (1.2.1) and its partial sum (1.6.20) are estimated using $\lambda \in \Lambda \subseteq (0,1)$ and $T_b^c = \lambda^c T$ is the true break. Under assumptions (A1.c) and (A2.d), the break point estimators obtained by minimizing $SSR(\lambda)$ have the following limiting distributions.

1. For the level model (1.2.1) under I(0) errors (A1.c),
$$ \hat\lambda_{TS} \Rightarrow \arg\max_{\lambda\in\Lambda}\left\{ \frac{\left[\int_0^1 F(r,\lambda)\,dW(r) + \frac{\delta^*}{d(1)}\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr\right]^2}{\int_0^1 F(r,\lambda)^2\,dr} \right\}, \qquad (1.6.24) $$
where
$$ F(r,\lambda) = \begin{cases} \lambda^3 - 2\lambda^2 + \lambda - (2\lambda^3 - 3\lambda^2 + 1)r, & r \le \lambda, \\ \lambda^3 - 2\lambda^2 - (2\lambda^3 - 3\lambda^2)r, & r > \lambda, \end{cases} $$
which implies the approximation
$$ \hat\lambda_{TS} - \lambda^c \approx \arg\max_{\lambda\in\Lambda}\left\{ \frac{\left[\int_0^1 F(r,\lambda)\,dW(r) + M\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr\right]^2}{\int_0^1 F(r,\lambda)^2\,dr} \right\} - \lambda^c, \qquad (1.6.25) $$
where $M = \frac{\delta^*}{d(1)} \equiv \frac{\delta T^{3/2}}{d(1)}$.

2. For the partial sum model (1.6.20),
$$ \hat\lambda_{QS} \Rightarrow \arg\max_{\lambda\in\Lambda}\left\{ \frac{\left[\int_0^1 Q(r,\lambda)W(r)\,dr + \frac{\delta^*}{d(1)}\int_0^1 Q(r,\lambda)Q(r,\lambda^c)\,dr\right]^2}{\int_0^1 Q(r,\lambda)^2\,dr} \right\}, \qquad (1.6.26) $$
where
$$ Q(r,\lambda) = \frac{(r-\lambda)^2}{2}\,\mathbf{1}(r>\lambda) - \left(-\lambda + 2\lambda^2 - 2\lambda^4 + \lambda^5\right)r - \left(\frac{1}{2} - \frac{5\lambda^2}{3} + \frac{5\lambda^4}{2} - \frac{4\lambda^5}{3}\right)r^2, $$
which implies the approximation
$$ \hat\lambda_{QS} - \lambda^c \approx \arg\max_{\lambda\in\Lambda}\left\{ \frac{\left[\int_0^1 Q(r,\lambda)W(r)\,dr + M\int_0^1 Q(r,\lambda)Q(r,\lambda^c)\,dr\right]^2}{\int_0^1 Q(r,\lambda)^2\,dr} \right\} - \lambda^c. \qquad (1.6.27) $$

Similar to Theorem 1.5.1, the distribution can be decomposed into two parts: one determined by the null asymptotic distribution and the other determined by the break magnitude. Also, the asymptotic distributions are continuous in the break magnitude. For the same reason, both $\hat\lambda_{TS}$ and $\hat\lambda_{QS}$ are consistent to some extent when $M$ is large. Figure 1.12 shows that the comparison of the break point estimators from the "TS-QS" models displays patterns similar to those from the "TS-MS" models. Generally, for small $\delta^*$, the break point estimator from the partial sum model has thinner tails than that from the level model. This means that for small values of the break magnitude, it would be better to use the partial sum model to obtain the break point estimator.

1.7 Application to One-step Ahead Forecasts

The previous analysis shows that $\hat\lambda_{TS}$ can be more precise than $\hat\lambda_{MS}$, which means choosing $\hat\lambda_{TS}$ might be sensible in applications that use break point estimates, such as modeling, tests, and forecasts. In this section, we will see whether the thinner tails of $\hat\lambda_{TS}$ can result in better forecasts compared to $\hat\lambda_{MS}$.

Ng and Vogelsang (2002) (NV hereafter) discuss forecasting of dynamic time series in the presence of deterministic components, where the MSE of the forecast is used as a criterion to evaluate different modeling approaches. Two approaches, $OLS_1$ and $OLS_2$, defined in NV(2002), are used for the trend shift model in this chapter. These approaches are defined as follows.

Model (1.2.1) is the focus, which is the same model used by NV(2002). Let $m_t$ denote the deterministic part of $y_t$, i.e., $m_t = \mu + \beta t + \delta DT_t(\lambda^c)$. The error is defined as
$$ u_t = \alpha u_{t-1} + \varepsilon_t, \qquad (1.7.28) $$
where $\varepsilon_t \sim$ i.i.d. $N(0,1)$. The assumption on $u_t$ is extended to cover both I(1) and I(0) cases by allowing $|\alpha| \le 1$.

1. $OLS_1$. The data generating process (DGP) can be written as
$$ y_t = d_0 + d_1 t + d_2 DT_t(\lambda^c) + \alpha y_{t-1} + \varepsilon_t. \qquad (1.7.29) $$
The feasible one-step forecast is
$$ \hat y_{T+1|T} = \hat d_0 + \hat d_1 (T+1) + \hat d_2 DT_{T+1}(\hat\lambda) + \hat\alpha y_T. \qquad (1.7.30) $$
ˆ (1.7.30) ˆ ˆ ˆ ˆ The OLS1 approach first applies OLS to equation (1.7.29) to obtain d0 , d1 , d2 , α, and ut ; ˆ then the estimated parameters are used in equation (1.7.30) to obtain the yT +1|T . 2. OLS2 The feasible one-step forecast is given by yT +1|T = mT +1 + α(yT − mT ). ˆ ˆ ˆ ˆ (1.7.31) ˆ The OLS2 approach first applies OLS to equation (1.2.1) to obtain ut , µ, β, and δ; then ˆ ˆ ˆ applies OLS to equation (1.7.28) with ut to obtain α; finally, yT +1|T is obtained based on ˆ ˆ ˆ equation (1.7.31). ˆ ˆ The closer λ is to the true break λc , the closer DTt (λ) is to DTt (λc ), and hence the less model misspecification is a concern. A smaller mismatch between the estimated model and the true model leads to a smaller MSE of forecast. I compare the MSE of one-step forecasts by OLS1 and OLS2 to illustrate the effect of different accuracies of the break ˆ ˆ point estimators λT S and λM S on the forecasts. 31 Table 1.1. Mean squared error (MSE) of one-step ahead forecasts of the trend shift model (1.2.1) under I(1) errors (For one-step forecast yt+1 , the MSE is defined as (ˆt+1 −yt+1 )2 .) ˆ y c = 0.5; δ = 0, 0.1, 0.2, 0.3, 0.4, 0.5; u is I(1); T = 101; and Simulation settings: λ t ∗ and OLS ∗ assume that u is known to be I(1). N = 10, 000. OLS1 t 2 δ 0 0.1 0.2 0.3 0.4 0.5 OLS1 ˆ T S λM S ˆ λ 1.214 1.300 1.208 1.301 1.185 1.286 1.174 1.258 1.159 1.231 1.148 1.196 OLS2 ˆ T S λM S ˆ λ 1.240 1.267 1.230 1.269 1.209 1.290 1.195 1.302 1.181 1.301 1.167 1.291 ∗ OLS1 ˆ ˆ λT S λ M S 1.107 1.323 1.109 1.317 1.093 1.282 1.091 1.237 1.083 1.197 1.078 1.162 ∗ OLS2 ˆ ˆ λT S λ M S 1.049 1.090 1.046 1.089 1.033 1.090 1.025 1.087 1.016 1.076 1.010 1.063 First, I provide simulation results with the setting: λc = 0.5; δ = 0, 0.1, 0.2, 0.3, 0.4, 0.5; ut is I(1); T = 101; and N = 10, 000. Table 1.1 gives the OLS1 and OLS2 MSE of oneˆ ˆ step forecasts with λT S and λM S under different δ. The MSE of one-step forecasts using ˆ ˆ λT S are smaller than those using λM S with both OLS1 and OLS2 methods for all δ s ˆ in this example. This happens because λT S concentrates more around λc and has thinner ˆ ˆ tails than λM S , which leads to less misspecification in DTt (λ). With the same break point estimator, the MSE by OLS2 are mostly bigger than OLS1 , which is consistent to the conclusion in NV(2002). ˆ ˆ Next, I describe an empirical illustration of forecast errors using λT S and λM S . Similar to PZ(2005), I estimate the break points and calculate one-step forecast of the annual (log) real per capita GDP series between 1870 and 1996. All the data are taken from the Groningen Growth and Development Centre 2 (See Figure 1.13 and 1.14). World War I and II along with other factors may affect the location of possible breaks in different ways on different countries. Sweden seems to have a break in around 1920, while Italy more likely has the break in around 1945. To choose the series with errors that are likely I(1), I follow PZ(2005) and use the series of Italy, Norway and Sweden (See Figure 1.13). I use the trend 2 http://www.ggdc.net/maddison/Historical Statistics. 32 Figure 1.13. The real (log) per capita GDP of Italy, Norway, and Sweden, which are of I(1) errors. x-axes: year; y-axes: (log)Per Capita GDP. 33 Figure 1.14. The real (log) per capita GDP of Australia, Canada, Germany, UK, and US, which are of I(0) errors. x-axes: year; y-axes: (log)Per Capita GDP. 34 Table 1.2. 
MSE in one-step forecast with the log real per capita GDP series (1870-1996) ˆ ˆ with trend shift model (1.2.1) and different break point estimators (λT S and λM S ), where MSE of one-step forecast are calculated for 1987-1996. Country Italy Norway Sweden OLS1 ˆ ˆ λT S λM S 0.0016 0.0017 0.0003 0.0006 0.0014 0.0015 OLS2 ˆ ˆ λT S λM S 0.0018 0.0015 0.0002 0.0008 0.0016 0.0014 shift model (1.2.1) for parameter estimation. The one-step forecast is only applied on the data from year 1987 to 1996, based on estimated model using data of the whole period prior to each forecasted year. Consider the one-step forecast in 1990 as an example. It is based on the model estimated using the data from 1970 to 1989, and the MSE error is computed as (ˆ1990 − y1990 )2 . The overall error is the average MSE over these years. Table 1.2 lists y the average MSE errors of forecast for the real per capita GDP from 1987 to 1996 by OLS1 ˆ ˆ and OLS2 . For OLS1 , the MSE using λT S is smaller than that using λM S for all three countries. For OLS2 , it is hard to conclude which break point estimator leads to a smaller MSE. Since some series have I(0) errors around the trend and Theorem 1.6.2 reveals the advanˆ ˆ tage of λQS over λT S under small break magnitude, it would be interesting to look at both ˆ ˆ the Monte Carlo simulations and the empirical data to see how λQS and λT S behave in ˆ finite samples. Table 1.3 lists the MSE of one-step forecast by OLS1 and OLS2 with λT S ˆ and λQS under different δ. The setting is: λc = 0.5; δ = 0, 0.01, 0.02, 0.03, 0.04, 0.05; ut ˆ is I(0); T = 101; and N = 10, 000. The MSE of one-step forecast using λQS are smaller ˆ than those using λT S by both OLS1 and OLS2 for all δ s, which is what we expected. ˆ Next λQS is used to see whether its more concentration around λc can lead to smaller ˆ MSE in one-step forecast than λT S for the GDP series. To choose the (log) GDP series with I(0) errors, I follow PZ(2005) and choose Australia, Germany, United Kingdom, and United States. Similar to the previous application, I choose the data from 1870 to 1996. 35 Table 1.3. Mean squared error (MSE) of one-step ahead forecasts of the trend shift model (1.2.1) under I(0) errors (For one-step forecast yt+1 , the MSE is defined as (ˆt+1 − yt+1 )2 ˆ y c = 0.5; δ = 0, 0.01, 0.02, 0.03, 0.04, 0.05; u is for point forecasts.) Simulation settings: λ t ∗ and OLS ∗ assume that u is known to be I(0). I(0); T = 101; and N = 10, 000. OLS1 t 2 δ 0 0.01 0.02 0.03 0.04 0.05 OLS1 ˆT S ˆ λ λQS 1.454 1.158 1.417 1.152 1.336 1.134 1.261 1.110 1.191 1.098 1.150 1.094 OLS2 ˆT S ˆ λ λQS 1.453 1.155 1.416 1.149 1.335 1.132 1.260 1.108 1.190 1.096 1.149 1.092 ∗ OLS1 ˆ ˆ λT S λQS 2.539 2.204 2.502 2.202 2.400 2.197 2.303 2.188 2.238 2.179 2.204 2.173 ∗ OLS2 ˆ ˆ λT S λQS 2.124 2.031 2.114 2.030 2.091 2.026 2.068 2.024 2.049 2.022 2.037 2.022 Table 1.4. MSE in one-step forecast with the log real per capita GDP series (1870-1996) ˆ ˆ with trend shift model (1.2.1) and different break point estimators (λT S and λQS ), where MSE of one-step forecast is calculated for 1987-1996. Country Australia Germany United Kingdom United States OLS1 ˆ ˆ λQS λT S 0.0035 0.0007 0.0028 0.0012 0.0004 0.0005 0.0003 0.0002 OLS2 ˆ ˆ λQS λT S 0.0033 0.0021 0.0031 0.0015 0.0004 0.0009 0.0002 0.0003 Figure 1.14 shows the raw data of the (log) real per capita GDP of these countries. I use the same forecast methods on this data set. The one-step forecasts are provided from 1987 to 1996, and the overall error is the average MSE over 10 years. 
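For reference, the two forecasting schemes can be written down compactly. The sketch below (Python with NumPy; illustrative only, with the break date estimate supplied by the caller and the forecast equation (1.7.30) read at t = T + 1) produces the feasible one-step forecasts for OLS1 and OLS2. In the empirical exercise every parameter, including the break date, is re-estimated on each expanding estimation sample before forecasting the next year.

```python
import numpy as np

def trend_shift_dummy(T, Tb):
    t = np.arange(1, T + 1)
    return np.where(t > Tb, t - Tb, 0.0)

def forecast_ols1(y, Tb):
    """OLS1: one regression including the lagged level, as in (1.7.29)."""
    T = len(y)
    t = np.arange(1, T + 1)
    DT = trend_shift_dummy(T, Tb)
    X = np.column_stack([np.ones(T - 1), t[1:], DT[1:], y[:-1]])
    d0, d1, d2, a = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    DT_next = max(T + 1 - Tb, 0)
    return d0 + d1 * (T + 1) + d2 * DT_next + a * y[-1]

def forecast_ols2(y, Tb):
    """OLS2: detrend first, then fit an AR(1) to the residuals, as in (1.7.31)."""
    T = len(y)
    t = np.arange(1, T + 1)
    DT = trend_shift_dummy(T, Tb)
    X = np.column_stack([np.ones(T), t, DT])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ coef
    a = (u[:-1] @ u[1:]) / (u[:-1] @ u[:-1])          # AR(1) coefficient on residuals
    m_next = coef @ np.array([1.0, T + 1, max(T + 1 - Tb, 0)])
    m_T = X[-1] @ coef
    return m_next + a * (y[-1] - m_T)
```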
Table 1.4 lists the average ˆ MSE of one-step forecast for the real per capita GDP from 1987 to 1996 when λT S and ˆ λQS are used. We can see that, for United Kingdom and United States, the MSE of one-step ˆ ˆ forecast is similar when λT S and λQS are applied under both OLS1 and OLS2 forecasts. ˆ For Australia and Germany, MSE of one-step forecast with λQS is considerably lower. 36 1.8 Conclusions ˆ In this chapter, I derive a new asymptotic theory for two break point estimators: one (λT S ) ˆ is from the trend shift model and the other (λM S ) is from the first difference, the mean shift model. Existing theories do not fully capture the finite sample behaviors, especially the tail behavior of the finite sample distributions with small break magnitude. This discrepancy is stronger when the true break is not in the middle of the sample. To better approximate the finite sample distributions, a new asymptotic theory is developed under the assumption that the break magnitude is within a small neighborhood of zero. The new asymptotic ˆ ˆ theory captures the finite sample behaviors of λT S and λM S , especially the tails in the ˆ ˆ densities. Under the same break magnitude, λT S and λM S are compared in precision using the new asymptotics. Both theoretical analysis and simulations reveal that, under ˆ ˆ small break magnitude, λT S concentrates more around the true break. Using λT S instead ˆ of λM S can decrease the MSE in one-step ahead forecasts. There are other potentially interesting topics accompanying the comparison of the break point estimators using the new approximation. A possible improvement of break tests could be achieved if we choose the break estimator properly according to break magnitude in a data dependent way. Also, this limiting distribution analysis of the single break estimators would help the research on the multiple break point estimates, e.g. the break point estimates in the presence of under-specification of the break numbers. 37 CHAPTER 2 Fixed-b Analysis of LM Type Tests for a Shift in Mean 2.1 Introduction In this chapter we provide a theoretical analysis of lagrange multiplier (LM ) tests for a shift in the mean of a univariate time series at an unknown date. We consider a class of LM statistics based on nonparametric kernel estimators of the long run variance. The main theoretical contribution of this part is to develop a fixed-b asymptotic theory for the long run variance estimator. The fixed-b limit of the LM statistics depends on the kernel and bandwidth needed to implement the long run variance estimator and the fixed-b limit also depends on the magnitude of the mean shift under the alterative. This allows us to theoretically capture to impact of the choice of bandwidth on both the size and power of the tests. In particular we show that the bandwidth plays an important role on determining whether the tests exhibit non-monotonic power (power that is not necessarily increasing as the magnitude of the mean shift increases). Small bandwidths lead to tests with monotonic power whereas large bandwidths lead to non-monotonic power. We derive fixed-b results for both the case of stationary I(0) errors and nearly integrated 38 I(1) errors. We obtain an unexpected and very useful finding. There exist bandwidths such that, for a given significance level, the critical values of the LM statistics are the same for both I(0) and I(1). 
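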
Use of these "robust" bandwidths and the associated fixed-b critical values provides tests that are asymptotically robust to whether the errors are I(0) or I(1). Such a simple way of obtaining robustness to the strength of serial correlation in the errors should appeal to empirical researchers. Our robust LM tests complement the I(0)/I(1) robust tests in the literature that have been developed for Wald-type tests. See Vogelsang (1997), Vogelsang (1998) and Sayginsoy and Vogelsang (2010). While the various I(0)/I(1) robust tests are asymptotically valid whether the errors are I(0) or I(1), in finite samples the Wald-type tests tend to over-reject when there is a negative moving average component and an autoregressive root near one in the errors. In contrast, the robust LM tests do not over-reject in this case, although they do tend to over-reject when there is a negative moving average component but no autoregressive component is present. These complementary finite sample properties of robust Wald and LM tests could be exploited to provide more robust inference overall.
The approach and analysis in this chapter are related to some recent papers in the econometrics literature on LM tests for a shift in mean. The possibility of non-monotonic power of LM tests for a shift in mean was documented by Vogelsang (1999). The reason that power can be non-monotonic is simple. LM statistics use long run variance estimators based on residuals from the model estimated under the null hypothesis of no mean shift. Therefore, when there is a shift in mean, the long run variance estimator is not invariant to the magnitude of the mean shift. A large shift in mean can cause the denominator of an LM statistic to be large and this can cause power to be low. While Vogelsang (1999) pinpointed the long run variance estimator as the source of non-monotonic power, he did not examine the role played by the choice of bandwidth. A recent paper by Kejriwal (2009) proposed the use of a hybrid long run variance estimator that can restore monotonic power to LM tests. The hybrid estimator blends components of long run variance estimators based on null and alternative residuals. We show in this chapter that there is a direct link between the statistics proposed by Kejriwal (2009) and the LM statistics based on a specific bandwidth choice. Our theory shows that an explanation for the monotonic power of the statistics proposed by Kejriwal (2009) is the use of a data dependent bandwidth based on alternative residuals.
The results in this chapter on non-monotonic power add to a small but growing literature on non-monotonic power of tests for a shift in mean. This literature was started by Perron (1991), where simulation results were given for some well known tests for a shift in mean. Vogelsang (1999) provided some theoretical explanations for non-monotonic power for a large group of statistics. Other papers have given results for specific statistics, with contributions by Vogelsang (1997), Crainiceanu and Vogelsang (2007), Deng and Perron (2008), Juhl and Xiao (2009) and Kejriwal (2009). Our research parallels the work by Crainiceanu and Vogelsang (2007) in establishing a direct link between the bandwidth and non-monotonic power. Crainiceanu and Vogelsang (2007) showed that the bandwidth choice of the long run variance estimator is directly linked to non-monotonic power of CUSUM and related tests for a shift in mean. The remainder of the chapter is organized as follows.
In the next section we describe the model and lay out the assumptions. In Section 3 we define the statistics and their finite sample properties are illustrated Section 4. In Section 5 we develop the fixed-b asymptotic theory for the LM tests and show that the fixed-b theory explains the important finite sample patterns. In Section 6 we compute the I(0)/I(1) ”robust” bandwidths and examine their finite sample performance when used with fixed-b critical values. For the most part, the bandwidths effectively control the over-rejection problem caused by strong serial correlation and in some cases retain good power. Section 7 establishes a direct relationship between the hybrid Wald statistics proposed by Kejriwal (2009) and the LM statistics. The proof of the main theoretical result of this chapter is given in an appendix. 40 2.2 Model and Assumptions Consider a simple mean shift model yt = µ + δDUt (Tb ) + ut , t = 1, 2, ..., T, (2.2.1) where DUt (Tb ) = 1(t > Tb ) and 1(·) is the indicator function. We denote the true break 0 data as Tb , and following standard practice in the structural change literature, we assume 0 that the break point, λ0 = Tb /T , remains fixed as the sample size increases. Throughout this chapter we assume that λ0 in unknown. Following Canjels and Watson (1997) and Bunzel and Vogelsang (2005) among others we assume the error term is given by ut = ρut−1 + εt , t = 1, · · · , T ∞ ∞ i|di | < ∞, d(1)2 > 0 di Li , εt = d(L)et , d(L) = (2.2.2) (2.2.3) i=0 i=0 where L is the lag operator, {et } is a martingale difference sequence with supt E(e4 ) < ∞, t E(et |et−1 , et−2 , · · · ) = 0 and E(e2 |et−1 , et−2 , · · · ) = 1. When |ρ| < 1, the errors are t I(0) and when ρ = 1 − c/T , where c is a constant the errors are nearly I(1). The pure unit root error case is given when c = 0. Under assumptions (2.2.2) and (2.2.3), some standard results are (see, for example, Phillips (1987)): [rT ] T −1/2 ut ⇒ σW (r) if |ρ| < 1, t=1 T −1/2 u[rT ] ⇒ d(1)Vc (r) if ρ = 1 − c , T r where σ 2 = d(1)2 /(1 − ρ)2 , W (r) is a standard Wiener process, Vc (r) = 0 exp{−c(r − s)}dW (s), [rT ] is the integer part of rT where r ∈ [0, 1] and ⇒ denotes weak convergence. The parameter σ 2 needs to be estimated in order to test the null hypothesis that the mean of yt is stable. Here we focus on the class of nonparametric spectral density estimators given by, T −1 T k(j/m)˜j , γj = T −1 γ ˜ σ 2 (m) = γ0 + 2 ˜ ˜ j=1 ut ut−j ; ˜ ˜ t=j+1 41 where ut = yt − y are the OLS residuals from regression (2.2.1) with δ = 0 (no ˜ ¯ shift in mean) imposed on the model. As usual, k(x) is the kernel function and m is the bandwidth (or truncation lag for kernels that truncate). A kernel is labelled type 1 if k (x) is twice continuously differentiable everywhere, and as a type 2 kernel if k (x) is continuous, twice continuously differentiable everywhere except at |x| = 1 and k (x) = 0 for |x| ≥ 1. For type 2 kernels define the derivative from the left at x = 1 as k− (1) = limh→0 [(k(1) − k(1 − h)) /h]. 2.3 LM Tests for a Shift in Mean We focus on testing the null hypothesis that there is no shift in mean: H0 : δ = 0, against the alternative H1 : δ = 0. For a given break date, Tb define the LM test as LM (Tb , m) = where SSR0 = SSR0 − SSR(Tb ) σ 2 (m) ˜ T ˜2 t=1 ut is the sum of squared residuals under the null hypothesis and SSR(Tb ) is the sum of squared residuals from the regression yt = µ + δDUt (Tb ) + ut . 
(2.3.4) Because we treat the break date as unknown, we follow Andrews (1993) and Andrews and Ploberger (1994) and consider supremum and mean tests of the form M eanLMm = T −1 LM (Tb , m), Tb ∈Λ∗ 42 SupLMm = sup LM (Tb , m) Tb ∈Λ∗ ∗ ∗ ∗ where Λ∗ = {Tb , Tb + 1, · · · , T − Tb } is the set of possible break dates. The pa∗ rameter λ∗ = Tb /T is held fixed as T increases and λ∗ determines the amount of trim- ming used in computing the statistics. Note that because Tb only shows up through the −SSR(Tb ) component of LM (Tb , m), it follow that SupLMm = LM (Tb , m) where Tb = arg minT ∈Λ∗ SSR(Tb ). b 2.4 Finite Sample Behavior of the LM Tests In this section we use a simple simulation design to illustrate the impact of the bandwidth choice on the performance of the Mean and Sup LM statistics. We generate data according model (2.2.2) for the case where d(L) = 1 and et is an iid N (0, 1) process, i.e. a Gaussian AR(1) model. We set µ = 0 without loss of generality. We focus exclusively on the quadratic spectral (QS) kernel and we consider two data dependent bandwidth rules for m based on Andrews (1991) using the AR(1) plug-in method. In the first case we use the null OLS residuals, ut when computing the AR(1) estimate needed for the bandwidth formula. In the second case we compute the AR(1) estimate using the alternative OLS residuals ut (Tb ) = yt − µ − δDUt (Tb ) where µ, δ are the OLS estimates from regression (2.3.4) using the estimated break date Tb = arg minT ∈Λ∗ SSR(Tb ) We denote the respective bandwidths by m and m. We b report results for the sample size T = 120 and we use 5, 000 replications in all cases. We use 15% trimming, i.e. λ∗ = 0.15. We compute rejection probabilities using critical values taken from the I(0) asymptotic distribution of the statistics using results in Andrews (1993) and Andrews and Ploberger (1994) which require consistency of σ 2 (m). Results ˜ are reported for the nominal level of 0.05. Empirical null rejections are reported in Table 2.1 and 2.2. The first column gives the values of ρ used in the simulations. Columns two through four give results for the LM tests using the null bandwidth, m and whereas columns five through seven give results using the 43 alternative bandwidth, m. For each bandwidth we report the average, across replications, of the bandwidth relative to the sample size: b = m/T and b = m/T . Some interesting patterns appear in the table. When the data is iid (ρ = 0), both data dependent bandwidths are small and the tests have rejections not far from the nominal level. As ρ increases and serial correlation becomes stronger, both bandwidths increase although b increases faster than b; in fact b is quite large when ρ = 1. This makes sense because b is based on an AR(1) estimate that has less downward bias and is hence closer to one when ρ is close to one and this inflates the bandwidth. For the M eanLM statistics, we see that rejections become larger than 0.05 as ρ becomes closer to one with severe over-rejections when ρ = 1. This is the usual over-rejection problem cased by a unit root in the error. The pattern for the SupLM statistics are different. As ρ increases, both SupLM statistics tend to under-reject and when ρ = 1, SupLMm substantially under-rejects whereas SupLMm slightly overrejects. It is surprising that SupLMm under-rejects in the unit root case. Clearly there is a complicated relationship between the values of ρ, the bandwidths and whether a test tends to over-reject or under-reject. 
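To make the construction of the statistics explicit, the sketch below (Python with NumPy, written for exposition rather than taken from our simulation code) computes the long run variance estimate from the null residuals with the Bartlett kernel and then the SupLM and MeanLM statistics over a trimmed set of break dates. The data dependent bandwidth rules of Andrews (1991) used above are not reproduced; the bandwidth ratio b is simply passed in, and the QS kernel is omitted for brevity.

```python
import numpy as np

def bartlett_lrv(u, m):
    """Nonparametric long run variance estimate with the Bartlett kernel
    and bandwidth m, computed from the supplied residuals."""
    T = len(u)
    lrv = u @ u / T                                   # gamma_0
    for j in range(1, min(int(m), T - 1) + 1):
        gamma_j = u[j:] @ u[:-j] / T
        lrv += 2.0 * (1.0 - j / m) * gamma_j          # k(j/m) = 1 - j/m
    return lrv

def sup_mean_lm(y, b=0.1, trim=0.15):
    """SupLM and MeanLM for a shift in mean at an unknown date, with the
    long run variance built from the null residuals and bandwidth m = b*T."""
    T = len(y)
    u0 = y - y.mean()                                 # null residuals (no shift imposed)
    ssr0 = u0 @ u0
    lrv = bartlett_lrv(u0, b * T)
    lm_values = []
    for Tb in range(int(trim * T), int((1 - trim) * T) + 1):
        d = np.zeros(T)
        d[Tb:] = 1.0                                  # DU_t(Tb) = 1(t > Tb)
        X = np.column_stack([np.ones(T), d])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ beta
        lm_values.append((ssr0 - e @ e) / lrv)        # LM(Tb, m)
    lm_values = np.asarray(lm_values)
    # MeanLM is defined as T^{-1} times the sum over the trimmed break dates
    return lm_values.max(), lm_values.sum() / T
```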
Figures 2.1, 2.2, 2.3 and 2.4 depict finite sample power for the case where the break occurs in the middle of the sample (λ0 = 0.5). The four panels correspond to the values of ρ = 0.0.0.5, 0.9, 1.0. For the cases where ρ < 1, we see that power is non-monotonic when the data dependent bandwidth is computed under the null. This finding of non-monotonic power for LM-type tests was also documented by Vogelsang (1999) and Kejriwal (2009). Interestingly, if the data dependent bandwidth is based on alternative residuals, we see that the LM tests have monotonic power. This suggests the bandwidth has an important effect on finite sample power functions. Table 2.3 and 2.4 and reports the average bandwidth ratios (across replications) for the four values of ρ and a grid of values for the mean shift magnitude δ. Notice that as δ increases, b steadily grows and can become very large when the mean shift is large. This is not surprising given the well known result of Perron (1990). Because the AR(1) parameter is being estimated using null residuals, the estimated AR(1) 44 Table 2.1. Null Rejection Probabilities Using Standard (b = 0) I(0) Critical Values, 5% Nominal Level, 15% Trimming, QS Kernel. ρ 0 0.5 0.7 0.9 1 Panel A: T = 120 ave(b) M eanLMm SupLMm ave(b) M eanLMm ˜ ˜ ˆ 0.012 0.022 0.013 0.012 0.052 0.048 0.026 0.006 0.044 0.069 0.082 0.018 0.001 0.072 0.077 0.199 0.002 0.002 0.142 0.096 0.589 0.219 0.041 0.214 0.265 Panel B: Fixed-b Asymptotic Rejections b M eanLM SupLM 0.02 0.049 0.044 I(1), c = 60 0.04 0.06 0.08 0.092 0.065 0.051 0.047 0.016 0.006 I(1), c = 36 0.06 0.08 0.10 0.096 0.069 0.051 0.025 0.006 0.002 I(1), c = 12 0.12 0.14 0.16 0.18 0.20 0.22 0.093 0.063 0.041 0.023 0.014 0.008 0.000 0.000 0.001 0.001 0.003 0.005 I(1), c = 0 0.18 0.20 0.22 0.58 0.60 0.281 0.182 0.077 0.059 0.071 0.000 0.000 0.000 0.087 0.098 I(0) 45 SupLMm ˆ 0.04 0.038 0.028 0.012 0.016 Table 2.2. Null Rejection Probabilities Using Standard (b = 0) I(0) Critical Values, 5% Nominal Level, 15% Trimming, Bartlett Kernel. Panel A: T = 120 ρ 0 0.5 0.7 0.9 1 ave(b) M eanLMm ˜ 0.012 0.055 0.088 0.185 0.501 0.051 0.079 0.085 0.066 0.191 SupLMm ˜ 0.037 0.042 0.023 0.002 0.012 ave(b) M eanLMm ˆ 0.013 0.050 0.078 0.140 0.197 0.056 0.090 0.104 0.162 0.501 Panel B: Fixed-b Asymptotic Rejections b M eanLM SupLM 0.02 0.048 0.041 I(1), c = 60 0.04 0.06 0.08 0.139 0.099 0.077 0.106 0.047 0.023 I(1), c = 36 0.06 0.08 0.10 0.149 0.110 0.090 0.079 0.033 0.012 I(1), c = 12 0.12 0.14 0.16 0.18 0.20 0.22 0.175 0.141 0.111 0.087 0.069 0.050 0.004 0.001 0.000 0.001 0.001 0.001 I(1), c = 0 0.18 0.20 0.22 0.50 0.52 0.478 0.434 0.382 0.016 0.019 0.000 0.000 0.000 0.020 0.026 I(0) 46 SupLMm ˆ 0.042 0.058 0.051 0.035 0.042 Table 2.3. Finite Sample Behavior of Data Dependent Bandwidth to Sample Size Ratios, T = 120, QS Kernel. 
ρ=0 δ 0 0.5 1 1.5 2 3 4 5 6 7 8 9 10 25 50 100 ρ = 0.5 ρ = 0.7 ρ = 0.9 ρ=1 ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) 0.012 0.014 0.023 0.035 0.049 0.082 0.120 0.162 0.207 0.252 0.297 0.342 0.385 0.771 0.921 0.969 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.048 0.051 0.059 0.072 0.088 0.128 0.176 0.228 0.282 0.336 0.388 0.437 0.484 0.834 0.942 0.975 0.044 0.044 0.045 0.045 0.045 0.046 0.046 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.082 0.085 0.092 0.104 0.120 0.162 0.212 0.266 0.322 0.377 0.429 0.479 0.524 0.853 0.947 0.976 0.072 0.072 0.072 0.072 0.073 0.074 0.075 0.076 0.077 0.078 0.078 0.078 0.078 0.078 0.078 0.078 0.199 0.201 0.207 0.218 0.231 0.268 0.314 0.364 0.416 0.467 0.515 0.559 0.600 0.870 0.947 0.974 0.142 0.142 0.142 0.141 0.141 0.142 0.143 0.147 0.152 0.156 0.161 0.165 0.167 0.171 0.171 0.171 0.589 0.589 0.590 0.592 0.594 0.599 0.607 0.615 0.625 0.636 0.647 0.660 0.673 0.821 0.905 0.950 0.214 0.214 0.214 0.214 0.214 0.214 0.216 0.220 0.227 0.235 0.245 0.256 0.268 0.334 0.334 0.334 parameter approaches one as δ increases and this inflates the bandwidth. In contrast, b changes very little as δ increases when ρ is small and b increases much more slowly as δ increases when ρ is close to one. This reflects the fact that b is based on the alternative 0 residuals which are nearly invariant to δ when Tb is close to Tb . It appears that large bandwidths are leading to tests with non-monotonic power. To see the link between the bandwidth and monotonic power more clearly, we simulated finite sample for the case of T = 120 and ρ = 0.7 using both data dependent bandwidths and several fixed values of b. Results for the M eanLM and SupLM statistics are given, respectively, in the top panels of Figures 2.5, 2.6, 2.7 and 2.8. For M eanLM we see that power is monotonic for b = 0.02, 0.1. For b = 0.18 power is lower but is still monotonic. By just increasing b to 0.2, power suddenly becomes non-monotonic. This is the equivalent of changing m from 22 to 24 given the sample size of 120. As b increases further power completely collapses. Similar patterns hold for SupLM although the change from monotonic power to non-monotonic power happens more quickly as b increases. Because 47 ρ=0 1 M eanLMm ˜ M eanLMm ˆ 0.8 Power SupLMm ˜ SupLMm ˆ 0.6 0.4 0.2 0 0 2 4 6 8 10 δ ρ=0.5 1 M eanLMm ˜ 0.8 M eanLMm ˆ Power SupLMm ˜ SupLMm ˆ 0.6 0.4 0.2 0 0 2 4 6 8 δ Figure 2.1. Finite Sample Power, QS kernel, 15% Trimming. 48 10 ρ=0.9 M eanLMm ˜ 0.5 M eanLMm ˆ SupLMm ˜ Power 0.4 SupLMm ˆ 0.3 0.2 0.1 0 0 2 4 δ ρ=1.0 6 8 10 M eanLMm ˜ M eanLMm ˆ SupLMm ˜ SupLMm ˆ 0.5 Power 0.4 0.3 0.2 0.1 0 0 2 4 6 8 δ Figure 2.2. Finite Sample Power, QS kernel, 15% Trimming. 49 10 ρ=0 1 Power 0.8 0.6 0.4 M eanLMm ˜ M eanLMm ˆ SupLMm ˜ 0.2 SupLMm ˆ 0 0 2 4 6 8 10 δ ρ=0.5 1 M eanLMm ˜ M eanLMm ˆ 0.8 Power SupLMm ˜ SupLMm ˆ 0.6 0.4 0.2 0 0 2 4 6 8 δ Figure 2.3. Finite Sample Power, Bartlett kernel, 15% Trimming. 50 10 ρ=0.9 0.9 0.8 M eanLMm ˜ 0.7 M eanLMm ˆ Power 0.6 SupLMm ˜ 0.5 SupLMm ˆ 0.4 0.3 0.2 0.1 0 0 2 4 6 8 10 δ ρ=1.0 0.5 M eanLMm ˜ 0.4 Power M eanLMm ˆ SupLMm ˜ 0.3 SupLMm ˆ 0.2 0.1 0 0 2 4 δ 6 8 Figure 2.4. Finite Sample Power, Bartlett kernel, 15% Trimming. 51 10 Table 2.4. 
Finite Sample Behavior of Data Dependent Bandwidth to Sample Size Ratios, T = 120, Bartlett Kernel ρ=0 ρ = 0.5 ρ = 0.7 ρ = 0.9 ρ=1 δ ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) ave(b) 0 0.012 0.013 0.055 0.050 0.088 0.078 0.185 0.140 0.501 0.197 0.5 0.014 0.013 0.058 0.051 0.090 0.078 0.187 0.140 0.502 0.197 1 0.025 0.013 0.066 0.051 0.097 0.079 0.192 0.140 0.503 0.197 1.5 0.041 0.013 0.078 0.051 0.108 0.079 0.200 0.140 0.504 0.197 2 0.056 0.012 0.093 0.052 0.122 0.080 0.210 0.140 0.505 0.197 3 0.088 0.012 0.129 0.052 0.157 0.081 0.238 0.140 0.510 0.197 4 0.122 0.012 0.169 0.053 0.196 0.082 0.271 0.141 0.515 0.199 5 0.158 0.012 0.210 0.053 0.238 0.083 0.307 0.144 0.522 0.202 6 0.193 0.012 0.250 0.053 0.279 0.083 0.344 0.148 0.530 0.207 7 0.228 0.012 0.290 0.053 0.318 0.084 0.379 0.152 0.539 0.213 8 0.262 0.012 0.327 0.053 0.355 0.084 0.412 0.155 0.548 0.220 9 0.294 0.012 0.361 0.053 0.389 0.084 0.443 0.159 0.558 0.229 10 0.324 0.012 0.393 0.053 0.420 0.084 0.470 0.161 0.568 0.238 25 0.579 0.012 0.619 0.053 0.631 0.084 0.649 0.164 0.670 0.288 50 0.673 0.012 0.686 0.053 0.690 0.084 0.695 0.164 0.707 0.288 100 0.702 0.012 0.706 0.053 0.707 0.084 0.709 0.164 0.712 0.288 the alternative data dependent bandwidth, m, leads to relatively small values of b, the LM tests using m are, on average, using bandwidths in the monotonic power range. In contrast, because m increases as δ increases, LM tests based on m are using bandwidths in the non-monotonic power range when δ is large. In other words, as δ increases, the LMm tests jump from ”low b” power curves to ”high b” power curves and this results in non-monotonic power. In summary, the finite sample simulations show that patterns in null rejection probabilities and power depend on the bandwidth and this relationship in turn depends on the value of ρ. This suggests that a theoretical explanation for the finite sample patterns requires an asymptotic theory that depends on the bandwidth and the strength of the serial correlation. The natural candidate is fixed-b asymptotic theory used in conjunction with nearly I(1) asymptotics which we explore in the next section. 52 T=120,ρ=0.7 1 M eanLMb=0.02 M eanLMb=0.1 0.8 M eanLMb=0.18 Power M eanLMb=0.2 0.6 M eanLMb=0.5 M eanLMm ˜ 0.4 M eaLMm ˆ 0.2 0 0 2 4 6 8 10 δ Asymptotic Power, I(1) erros, c=36 1 M eanLMb=0.02 M eanLMb=0.1 0.8 Power M eanLMb=0.18 M eanLMb=0.2 0.6 M eanLMb=0.5 0.4 0.2 0 0 0.2 0.4 0.6 0.8 * δ Figure 2.5. Finite Sample and Asymptotic Power of M eanLM , QS kernel, 15% Trimming. 53 T=120,ρ=0.7 1 Power 0.8 SupLMb=0.02 0.6 SupLMb=0.08 SupLMb=0.1 0.4 SupLMb=0.2 SupLMb=0.5 0.2 SupLMm ˜ SupLMm ˆ 0 0 2 4 6 8 10 δ Asymptotic Power, I(1) erros, c=36 1 Power 0.8 0.6 SupLMb=0.02 0.4 SupLMb=0.08 SupLMb=0.1 0.2 SupLMb=0.2 SupLMb=0.5 0 0 0.2 0.4 0.6 0.8 * δ Figure 2.6. Finite Sample and Asymptotic Power of SupLM , QS kernel, 15% Trimming. 54 T=120,ρ=0.7 1 M eanLMb=0.02 0.8 M eanLMb=0.1 Power M eanLMb=0.2 0.6 M eanLMb=0.3 M eanLMb=0.5 0.4 M eanLM m ˜ M eanLMm ˆ 0.2 0 0 2 4 6 8 δ Asymptotic Power, I(1) erros, c=36 10 1 M eanLMb=0.02 0.8 Power M eanLMb=0.1 M eanLMb=0.18 0.6 M eanLMb=0.2 M eanLMb=0.5 0.4 0.2 0 0 0.2 0.4 0.6 0.8 * δ Figure 2.7. Finite and Asymptotic Power of M eanLM , Bartlett kernel, 15% Trimming. 55 T=120,ρ=0.7 1 SupLMb=0.02 SupLMb=0.05 0.8 Power SupLMb=0.1 SupLMb=0.15 0.6 SupLMb=0.2 SupLMm ˜ 0.4 SupLMm ˆ 0.2 0 0 2 4 6 8 10 δ Asymptotic Power, I(1) erros, c=36 1 SupLMb=0.02 SupLMb=0.05 0.8 SupLMb=0.1 Power SupLMb=0.15 0.6 SupLMb=0.2 0.4 0.2 0 0 0.2 0.4 0.6 0.8 δ* Figure 2.8. 
Finite and Asymptotic Power of SupLM , Bartlett kernel, 15% Trimming. 56 2.5 Fixed-b Asymptotic Analysis of LM Mean Shift Tests In this section, we provide fixed-b asymptotic results for the LM tests. These results complement the fixed-b results derived by Sayginsoy and Vogelsang (2010) for the case of nonparametric HAC W ald statistics for testing for a shift in mean. We derive results under the local alternative HA : δ = δ0 g(T ) where g(T ) = T −1/2 if |ρ| < 1 and g(T ) = T 1/2 if ρ = 1 − c/T . Because the numerator of LM , SSR0 − SSR(Tb ), is identical to the W ald statistic, its limit follows directly from the results of Sayginsoy and Vogelsang (2010) (Theorems 1 & 2). Our theoretical contribution is obtaining the fixed-b limit of σ 2 (m) under HA . Obviously, results for the ˜ null distribution of the LM tests follow by setting δ0 = 0. The following theorem gives the limiting distribution of LM under the local alternative. 0 Theorem 2.5.3 Suppose the true model is given by (2.2.1) with break date Tb = λ0 T . Suppose the LM statistic is computed using model (2.2.1) using the break date Tb = λT . Let m = bT , where b ∈ (0, 1] is fixed as T increases. Under the local alternative HA , as T → ∞, LM (Tb , m) ⇒ λ(1 − λ)[Pi (λ) + Ψ(λ, λ0 )δ ∗ ]2 , i = 0 if {ut } is I(0), i = 1 if {ut } is I(1) Φi (b, δ0 ) where 1 1 [1(r > λ) − (1 − λ)]dW (r), λ(1 − λ) 0 1 1 P1 (λ) = [1(r > λ) − (1 − λ)]Vc (r)dr, λ(1 − λ) 0  1−λ 0 , if λ ≤ λ ,  0 λ(1−λ)2 Ψ(λ, λ0 ) = λ0  , if λ > λ0 . 2 P0 (λ) = λ (1−λ) δ ∗ = δ0 /σ, if {ut } is I(0); δ0 /d(1), if {ut } is I(1). 57 δ Q0 (r) = 0 [(r − λ0 )1(r > λ0 ) − r(1 − λ0 )] + W (r) − rW (1), σ if {ut } is I(0), Q1 (r) = r 1 δ0 [(r − λ0 )1(r > λ0 ) − r(1 − λ0 )] + Vc (s)ds − r Vc (s)ds, d(1) 0 0 if {ut } is I(1). Φi (b, δ0 ) =  1 1 1  0 0 − 2 k ( r−s )Qi (r)Qi (s)drds,  b b    if k(.) is of type 1;     1 k ((r − s)/(b))Q (r)Q (s)drds + 2 k (1) 1−b Q (r + b)Q (r)dr, i i i i |r−s| Tb ), Tb = T λc , T is the sample length, and c . I(t > Tb ) = c 0, t ≤ Tb c . 1, t > Tb As a comparison, the trend shift model with two breaks is yt = µ + βt + δ1 DTt (λc ) + δ2 DTt (λc ) + ut , 1 2 (3.2.3) where δ1 and δ2 are the break magnitudes, λc and λc are the true break points. And if the 1 2 model (3.2.3) is misspecified with only 1 break, it is denoted as yt = µ + βt + δDTt (λc ) + ut , where 76 (3.2.4) c 0, t ≤ Tb c, t > T c . t − Tb b . DTt (λc ) = I(0) errors are defined by assumption (A1.a). (C1.a) ut = ρut−1 + εt , where |ρ| < 1 and ∞ ∞ di Li , εt = d(L)et ; d(L) = i=0 i|di | < ∞, d(1)2 > 0; i=0 L is the lag operator; {et } is a martingale difference sequence with supt E(e4 ) < ∞, t E(et |et−1 , et−2 , · · · ) = 0, and E(e2 |et−1 , et−2 , · · · ) = 1. t And I(1) errors are defined by assumption (C1.b) (C1.b) ut = ρut−1 + εt , c where ρ = 1 − T , c ≥ 0 is a constant scalar. The break point is obtained by minimizing the sum of squared residuals (SSR) over the . gridding set Λ = {λ∗ , · · · , 1 − λ∗ }. ˆ λM S = arg min {SSRM S (λ)}, λ∈Λ∗ ˆ λT S = arg min {SSRT S (λ)}, λ∈Λ∗ where . SSRM S (λ) = . SSRT S (λ) = T ˆ ˆ [yt − µM S − δM S DUt (λ)]2 , ˆ (3.2.5) ˆ ˆ ˆ [yt − µT S − βT S t − δT S DTt (λ)]2 . ˆ (3.2.6) t=1 T t=1 ˆ ˆ βM S and δM S are the OLS estimates of Model (3.2.2) with no restrictions imposed. µT S , ˆ ˆ ˆ βT S , and δT S are the OLS estimates of Model (3.2.4) with no restrictions imposed. 
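Before turning to the existing results and the simulations, it may help to see how the two under-specified estimators are computed. The sketch below (Python with NumPy; the trimming, grid and seed are illustrative rather than those of the reported experiments) draws once from the two-break mean shift model (3.2.1) and the two-break trend shift model (3.2.3) with i.i.d. N(0,1) errors and applies the single-break estimators (3.2.5) and (3.2.6).

```python
import numpy as np

def fit_single_break(y, regressors, trim=0.02):
    """Break fraction that minimizes the OLS SSR when exactly one break
    dummy (built by `regressors`) is added to the model."""
    T = len(y)
    best = (np.inf, None)
    for Tb in range(int(trim * T) + 1, int((1 - trim) * T)):
        X = regressors(T, Tb)
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        ssr = np.sum((y - X @ beta) ** 2)
        if ssr < best[0]:
            best = (ssr, Tb)
    return best[1] / T

def trend_shift_X(T, Tb):        # model (3.2.4): constant, trend, DT_t
    t = np.arange(1, T + 1)
    return np.column_stack([np.ones(T), t, np.where(t > Tb, t - Tb, 0.0)])

def mean_shift_X(T, Tb):         # model (3.2.2): constant, DU_t
    t = np.arange(1, T + 1)
    return np.column_stack([np.ones(T), (t > Tb).astype(float)])

# one draw from each two-break DGP with lambda_1 = 1/3, lambda_2 = 2/3, nu = 1
rng = np.random.default_rng(1)
T, d1, nu = 500, 1.0, 1.0
t = np.arange(1, T + 1)
DU1, DU2 = (t > T / 3).astype(float), (t > 2 * T / 3).astype(float)
DT1 = np.where(t > T / 3, t - T / 3, 0.0)
DT2 = np.where(t > 2 * T / 3, t - 2 * T / 3, 0.0)
e = rng.standard_normal(T)
y_ms = d1 * DU1 + nu * d1 * DU2 + e     # two mean shifts, model (3.2.1)
y_ts = d1 * DT1 + nu * d1 * DT2 + e     # two trend shifts, model (3.2.3)

print(fit_single_break(y_ms, mean_shift_X),   # tends to land near one true break
      fit_single_break(y_ts, trend_shift_X))  # tends to land between the breaks
```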
77 3.3 Existing Analysis and Finite Sample Simulations Chong (1994), Chong (1995), and Bai (1995) studied the consequences of underspecifying the number of change points in structural change models. A general case with a single break in the intercept is estimated when the data sequence has two breaks. Their discussion covers the mean shift model with or without trending. Bai and Perron (1998)(BP hereafter) extended the estimate of single unknown break to multiple unknown breaks under both fixed and shrinking shift magnitudes. They concluded that the break point estimator still converges to one of the true breaks for the mean shifts model. Based on this argument, they proposed a sequential procedure for multi-break estimates without estimating the multiple breaks simultaneously. Dynamic programming was introduced by Bai and Perron (2003) to deal with the computational burden in multiple break point estimation. Kejriwal and Perron (2010)(KP hereafter) extended the sequential tests to the multiple trend shifts model to be robust to the persistence in the noise. ˆ In the following, I first use a simple simulation to illustrate the properties of λM S and ˆ λT S in finite samples in the presence of under-specification of break number. I generate data based on model (3.2.1) and (3.2.3) with two breaks, where T = 100, 250, 500, 1000, {λc , λc } = {1/3, 2/3}, ν = −2, −1, 1, 2 (δ1 = 1), d(L) = 1 and εt is an iid N (0, 1) 1 2 process. And set δ1 = 1 without loss of generality. Equation (3.2.5) and (3.2.6) are used to ˆ ˆ estimate λM S and λT S separately in each replication. Trimming is not necessary, however in order to ensure the invertibility of the regression matrix, I use 2% trimming, i.e., λ∗ = 0.02. The replications N = 20000, 10000, 5000, 2500 for T = 100, 250, 500, 1000 cases. ˆ Figure 3.1 and 3.2 plot the histograms of λM S when µ = −2, −1, 1, 2, T = 100, 250, 500, 1000 and error ut is i.i.d. N (0, 1). Interestingly, when |ν| = 1 and T = 100, ˆ the histogram of λM S has a wide spread over the area around both ends or the middle ˆ λ = 0.5. This can be explained by Yang (2010) through the behavior of the mean shift 78 break point estimator, where the break point estimates concentrate around the ends of the ˆ gridding area in no break case. With the increase of T , the tails of λM S decrease graduˆ ally. When T = 1000, the histogram of λT S concentrates at one true break λc = 1/3 or λc = 2/3. For µ = 1, there are two peaks in histogram because δ1 and δ2 has the same effects on the break point estimates. ˆ Figure 3.3 and 3.4 plot the histograms of λT S when ν = −2, −1, 1, 2(δ1 = 1), T = ˆ 100, 250, 500, 1000. For ν = −2, the only peak in histogram of λT S is at λ > 2/3. When ˆ ν = −1, λT S has two equivalent peaks in histogram at around λ = 0.2 and 0.8. When ˆ ν = 1, the histogram of λT S roughly has only one peak in the histograms for the stationary cases, at around λ = 0.5. When T = 100, the break point estimates are less concentrated around λ = 0.5. With the increase of T , the pattern of only one peak in the histograms still holds, and the break estimates are more concentrated around λ = 0.5. When ν = 2, ˆ the histogram of λT S concentrates at 1/3 < λ < 2/3. In all these cases, the break date estimates are mostly at 0.5. When ν = 2 and T = 1000, the histograms concentrates on the points other than the true breaks. It shows that when the misspecification of break number exists, the break point estimator for the trend shift model does not converge to either of the true breaks. 
And if λc and λc are different, the concentration of the break point estimators 1 2 ˆ varies, i.e., the limits of the break point estimator λT S depend on both break magnitudes and break dates. How these parameters matters will be analyzed later. The finite sample histograms show two interesting points: a) In the case of underspecification of the break number, the mean shift break estimator converges to a subset of the true break points, while the trend shift counterpart does not converge to either of the true break points. The limits of the break point estimators depend on the break dates and the break magnitudes. b) When break magnitude increases, the break point estimators have 79 15000 2500 2000 10000 1500 1000 5000 500 0 0.2 0.4 0.6 0 0.8 6000 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 2000 1500 4000 1000 2000 0 500 0.2 0.4 0.6 0 0.8 3000 1500 2000 1000 1000 500 0 0.2 0.4 0.6 0 0.8 1500 500 400 1000 300 200 500 100 0 0.2 0.4 0.6 0 0.8 Figure 3.1. Histogram of single break point estimator {λc , λc } = {1/3, 2/3}. δ1 = 1 always. From left to right: 1 2 −1); from top to bottom: T = 100, 250, 500, 1000. 80 ˆ λM S in two breaks model: ν = −2(δ2 = −2), −1(δ2 = 2500 15000 2000 10000 1500 1000 5000 500 0 0.2 0.4 0.6 0 0.8 1500 0.6 0.8 4000 500 0.4 6000 1000 0.2 2000 0 0.2 0.4 0.6 0 0.8 1000 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 3000 800 2000 600 400 1000 200 0 0.2 0.4 0.6 0 0.8 500 1500 400 1000 300 200 500 100 0 0.2 0.4 0.6 0 0.8 Figure 3.2. Histogram of single break point estimator {λc , λc } = {1/3, 2/3}. δ1 = 1 always. From left to right: ν 1 2 top to bottom: T = 100, 250, 500, 1000. 81 ˆ λM S in two breaks model: = 1(δ2 = 1), 2(δ2 = 2); from 4000 1000 800 3000 600 2000 400 1000 0 200 0.2 0.4 0.6 0 0.8 1500 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 400 500 0.4 600 1000 0.2 200 0 0.2 0.4 0.6 0 0.8 400 400 300 300 200 200 100 100 0 0.2 0.4 0.6 0 0.8 150 200 150 100 100 50 0 50 0.2 0.4 0.6 0 0.8 ˆ Figure 3.3. Histogram of single break point estimator λT S in two breaks: {λc , λc } = 1 2 {1/3, 2/3}. δ1 = 1 always. The left to right: ν = −2(δ2 = −2), −1(δ2 = −1); The top to bottom: T = 100, 250, 500, 1000. 82 800 2500 2000 600 1500 400 1000 200 0 500 0.2 0.4 0.6 0 0.8 250 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 800 200 600 150 400 100 200 50 0 0.2 0.4 0.6 0 0.8 200 300 150 200 100 100 50 0 0.2 0.4 0.6 0 0.8 60 100 80 40 60 40 20 20 0 0.2 0.4 0.6 0 0.8 ˆ Figure 3.4. Histogram of single break point estimator λT S in two breaks: {λc , λc } = 1 2 {1/3, 2/3}. δ1 = 1 always. The left to right: ν = 1(δ2 = 1), 2(δ2 = 2); The top to bottom: T = 100, 250, 500, 1000. 83 complicated distributions which have not been explained by current results. 3.4 Break Date Estimator under Multiple Breaks I can assume that break magnitude δ1 and δ2 are within a T −1/2 neighborhood: (C2.a) δ∗ δ∗ ∗ ∗ 2 1 δ1 = 1/2 , δ2 = 1/2 , where δ1 , δ2 = constant scalars. (3.4.7) T T Also I can assume that break magnitude δ1 and δ2 are within a T −3/2 neighborhood: (C2.b) δ∗ δ∗ ∗ ∗ 1 2 δ1 = 3/2 , δ2 = 3/2 , where δ1 , δ2 = constant scalars. (3.4.8) T T ˆ ˆ The limiting distributions of λM S and λT S with under-specified break number are derived under assumption (C2.a) or (C2.b) for different models and errors. 3.4.1 Multiple mean shifts Theorem 3.4.4 Assume the mean shift model has two break points, λc and λc , as in (3.2.1). 
1 2 When the break number is underspecified as one and the assumptions (C1.a) and (C2.a) ∗ ∗ ∗ ∗ hold such that δ1 = T −1/2 δ1 and δ2 = T −1/2 δ2 , where δ1 and δ2 are constant, the break ˆ point estimator λM S has the limiting distributions as follows: δ∗ δ∗ 1 2 [(λW (1) − W (λ)) + d(1) Ψ(λ, λc ) + d(1) Ψ(λ, λc )]2 1 2 d ˆ M S − arg max{ λ → } (3.4.9) λ(1 − λ) λ∈Λ where . Ψ(λ, λc ) = (1 − λc )λ, if λ ≤ λc , (1 − λ)λc , if λ > λc , ∗ ∗ . δ1 . δ2 If we define M1 = d(1) and M2 = d(1) , c c 2 [(λW (1) − W (λ)) + M1 Ψ(λ, λ1 ) + M2 Ψ(λ, λ2 )] ˆ λM S ≈ arg max{ }. λ(1 − λ) λ∈Λ 84 (3.4.10) To discover the effect of M1 , λc , M2 , and λc on the limiting distributions, I decompose 2 1 the terms inside arg max in equation (3.4.10) into three parts: GM S (λ, λc , λc ) 1 2 c c . λW (1) − W (λ) + M1 Ψ(λ, λ1 ) + M2 Ψ(λ, λ2 ) = λ(1 − λ) . = G1M S (λ) + M1 · G2M S (λ, λc ) + M2 · G2M S (λ, λc ) 2 1 Ψ(λ, λc ) Ψ(λ, λc ) . (λW (1) − W (λ)) 1 2 = + M1 · + M2 · λ(1 − λ) λ(1 − λ) λ(1 − λ) (3.4.11) (3.4.12) With the form of G1M S (λ) + M1 · G2M S (λ, λc ) + M2 · G2M S (λ, λc ) in the limiting 2 1 distributions, Theorem 3.4.4 provides a bridge between the δ = 0 asymptotics and the δ = 0 asymptotics. When M1 and M2 are small, the random component G1M S dominates GM S and the distribution is close to the case of no break. ˆ Theorem 3.4.4 also explains why as M grows, λM S are closer to the true breaks. With the increase of T , where M1 and M2 increase, G2M S parts will be dominant in G1M S + ˆ M1 · G2M S (λc ) + M2 · G2M S (λc ). For a moderate M , the limiting distribution of λM S 1 2 exhibits a shape of “w”, resulting from the mixed effects of G1M S and G2M S parts in the asymptotics. If T → ∞, both M1 and M2 increase to ∞, ˆ lim λM S T →∞ = = lim arg max[G1M S (λ) + M1 · G2M S (λ, λc ) + M2 · G2M S (λ, λc )]2 1 2 T →∞ λ∈Λ lim arg max[G1M S (λ)/M1 + G2M S (λ, λc ) + ν · G2M S (λ, λc )]2 1 2 T →∞ λ∈Λ → arg max |G2M S (λ, λc ) + ν · G2M S (λ, λc )|. 1 2 λ∈Λ . where ν = M2 /M1 = δ2 /δ1 . ˆ Therefore, the limit of λM S is determined by |G2M S (λ, λc ) + ν · G2M S (λ, λc )|, which attains global maximum at either λc or λc as 1 2 1 2 ˆ shown in Figure 3.5 (The proof is straightforward and omitted here). Hence, λM S conˆ verges to λc or λc as T → ∞, that is the break point estimator λM S is consistent to one 1 2 of the true breaks. Our results agree the existing literatures in consistency analysis, and moreover the distribution is derived here. 85 λc=0.5 0.5 0.45 0.4 G2FD 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 0.2 0.4 λ 0.6 0.8 Figure 3.5. G2M S (λ, λc ) under λc = 0.5 for mean shift model 86 1 Mean shift model: ν=1; λc =0.25; λc =0.75 2 1 1.2 G2(λc ) 1 |G2(λ,λc )+ν*G2(λ,λc )| 2 1 1 ν*G2(λc ) 2 |G2(λ,λc )+ν*G2(λ,λc )| 2 1 0.8 [λ=1/4] 0.6 [λ=3/4] 0.4 0.2 0 0 0.2 0.4 λ 0.6 0.8 1 Mean shift model: ν=−1; λc =0.25; λc =0.75 2 1 |G2(λ,λc )+ν*G2(λ,λc )| 1 2 0.8 G2(λc ) 1 0.6 ν*G2(λc ) 2 0.4 |G2(λ,λc )+ν*G2(λ,λc )| 2 1 [λ=1/4] [λ=3/4] 0.2 0 −0.2 −0.4 0 0.2 0.4 λ 0.6 0.8 1 Figure 3.6. |G2M S (λ, λc ) + ν · G2M S (λ, λc )| under different ν = 1 and -1 for mean shift 1 2 model, where {λc , λc } = {1/4, 3/4}. 1 2 87 λc=0.5 0.08 0.07 0.06 G2TS 0.05 0.04 0.03 0.02 0.01 0 0 0.2 0.4 λ 0.6 0.8 Figure 3.7. G2T S (λ, λc ) under λc = 0.5 for trend shift model 88 1 3.4.2 Multiple trend shifts When the break number is underspecified, we estimate one break point by model (3.2.4), while the true model is (3.2.3). In the following I explore the consistency of break points by deriving the asymptotics under local alternative. 
Theorem 3.4.5 gives the limiting disˆ tribution of λT S under assumption (C1.a) and (C2.b) and shows that it is inconsistent to any of the true break points. Theorem 3.4.5 Assume the trend shift model has two break points, λc and λc , as in (3.2.3). 1 2 When the break number is underspecified as one and assumption (C1.a) and (C2.b) hold ∗ ∗ ˆ such that δ1 = T −3/2 δ1 and δ2 = T −3/2 δ2 , the break point estimator λT S has the limiting distributions as follows: ˆ λT S ⇒ 1 arg max{[ λ∈Λ 1 ∗ δ2 d(1) 0 F (r, λ)dW (r) + 0 F (r, λ)F (r, λc )dr]2 / 2 ∗ 1 δ1 F (r, λ)F (r, λc )dr + 1 d(1) 0 1 F (r, λ)2 dr} (3.4.13) 0 where λ3 − 2λ2 + λ − (2λ3 − 3λ2 + 1)r, if r ≤ λ, λ3 − 2λ2 − (2λ3 − 3λ2 )r, if r > λ. . F (r, λ) = δ∗ δ∗ δ T 3/2 δ T 3/2 1 2 If we define M1 = d(1) ≡ 1d(1) and M2 = d(1) ≡ 2d(1) , there is ˆ λT S ⇒ 1 arg max{[ λ∈Λ 1 M2 0 0 1 F (r, λ)dW (r) + M1 F (r, λ)F (r, λc )dr]2 2 1 0 89 0 F (r, λ)F (r, λc )dr + 1 F (r, λ)2 dr}. (3.4.14) ˆ For the asymptotic distribution of λT S , we can also decompose the part inside arg min into G1T S and G2T S part, where GT S (λ, λc ) . = G1T S (λ) + M1 · G2T S (λ, λc ) + M2 · G2T S (λ, λc ) 2 1 1 F (r, λ)dW (r) . = 0 + 1 2 dr 0 F (r, λ) 1 1 c c 0 F (r, λ)F (r, λ1 )dr + M · 0 F (r, λ)F (r, λ2 )dr M1 · 2 1 1 2 F (r, λ)2 dr 0 0 F (r, λ) dr (3.4.15) Similar to previous discussion on mean shifts, if M1 and M2 are small, GT S is dominated by that of G1T S . The asymptotic distribution will be close to the distribution when there is no break. Later on G2T S part starts to dominate. If T → ∞ (M1 , M2 → ∞), ˆ lim λT S → arg max |G2T S (λ, λc ) + ν · G2T S (λ, λc )|. 1 2 T →∞ λ∈Λ And it is still true that G2(λ, λc ) achieves maximum at λ = λc as shown in Figure 3.7. i i What makes it different from the mean shifts case is: when we stack one part of G2 to the other, the function smooths out through the two peaks at the λc ’s. Hence when the number i of trend breaks is two while assumed to be one, |G2T S (λ, λc ) + ν · G2T S (λ, λc )| achieves 1 2 maximum neither at λc nor at λc . Figure 3.8 plots |G2(λ, λc )+ν·G2(λ, λc )| under different 1 2 1 2 ν when λc = 1/3 and λc = 2/3, which shows in both cases |G2(λ, λc ) + ν · G2(λ, λc )| 1 2 1 2 ˆ reaches peak at neither of the true break points. Certainly, if |ν| is smaller than 1, λT S ˆ will be closer to λc ; if |ν| is bigger than 1, λT S will be closer to λc . This indicates the 1 2 inconsistency of the trend shifts estimator when the break number is underspecified. 3.4.3 ˆ ˆ Consistency/Inconsistency conclusion of λM S and λT S In this section, we summarize the previous analysis on the consistency/inconsistency of ˆ ˆ λM S and λT S under assumption (C1.a) and (C2.a): 90 1. For mean shift model with 2 breaks, if the break magnitude δ1 = 0 and δ2 = 0, the ˆ single break point estimator λM S is consistent to either λ1 or λ2 : ˆ lim λM S → λc or λc . 1 2 T →∞ 2. For trend shift model with two breaks, if the break magnitude δ1 = 0 and δ2 = 0, the ˆ single break point estimator λT S is inconsistent to either λ1 or λ2 : ˆ lim λT S → λc and λc . 2 1 T →∞ The limit depends on λc , λc , and ν, and will be 1 2 ˆ lim λT S = arg max |G2T S (λ, λc ) + ν · G2T S (λ, λc )|. 1 2 T →∞ 3.5 λ∈Λ Break point estimators for level and first difference model under multiple breaks In this section, I first difference the trend shift model with multiple breaks to solve the inconsistency problem under near-I(1) errors. We choose near-I(1) errors because it can provide a good approximation to finite sample case with different persistence in the errors. 
In the case of near-I(1) errors, I assume that δ1 and δ2 are within a T −1/2 neighborhood as δ∗ δ∗ 1 2 in (C2.a): δ1 = 1/2 and δ2 = 1/2 . T T As we can see in the previous section, the limit of the break point estimator with underspecified break number depends on the model, that is, the mean shift model leads to the consistent break point estimator while the trend shift model does not. What follows is that if we take the first difference on the trend shift model, we might solve the inconsistency problem. Let us start with the trend shift model with two breaks: yt = µ + βt + δ1 DTt (λc ) + δ2 DTt (λc ) + ut . 1 2 91 (3.5.16) First Difference it, we get: ∆yt = β + δ1 DUt (λc ) + δ2 DUt (λc ) + ∆ut . 2 1 (3.5.17) ˆ ˆ The asymptotics are different for λT S and λM S during this procedure. I extend the results in Theorem 3.4.4 and 3.4.5 for I(0) errors to near-I(1) errors to describe the perforˆ ˆ mance of λT S and λM S under different λc , δ1 , λc , and δ2 . 1 1 Theorem 3.5.6 Assume there are two breaks in the trend shift model (3.5.16). Suppose the regressions in the level model (3.5.16) and its first difference (3.5.17) are estimated assuming a single break λ ∈ Λ ⊆ (0, 1) where λc and λc are the true breaks. Under the 1 2 assumption (C1.b) and (C2.a), the break point estimators by minimizing the SSR(λ) have the limiting distributions as follows. 1. For the level model (3.2.3), ˆ λT S ⇒ 1 1 [ 1 F (r,λ)Vc (r)dr+M1 0 F (r,λ)F (r,λc )dr+M2 0 F (r,λ)F (r,λc )dr]2 2 1 arg maxλ∈Λ { 0 } 1 F (r,λ)2 dr 0 . where Vc (r) = . δ∗ r exp(−c(r − s))dW (s), M = d(1) and F (r, λ) is defined in Theo0 rem 3.4.5. 2. For the first difference (3.2.1), ˆ λM S ⇒ arg maxλ∈Λ {[ 1 (λW (1)−W (λ)−c 0 (1(r>λ)−(1−λ))Vc (r)dr) √ + λ(1−λ) M1 (Ψ(λ,λc )+M2 (Ψ(λ,λc ) 2 1 2 √ λ(1−λ) ] } . δ∗ where M = d(1) and Ψ(λ, λc ) is defined in Theorem 3.4.4. The asymptotics in Theorem 3.5.6 are the extension of the work by Yang (2010) from single break case to multiple breaks case. Compared to Theorem 3.4.5 and 3.4.4, G1T S and G1M S are different in Theorem 3.5.6: 92 . G1T S = 1 0 F (r, λ)Vc (r)dr , 1 2 0 F (r, λ) dr 1 . (λW (1) − W (λ) − c 0 (1(r > λ) − (1 − λ))Vc (r)dr) G1M S = . λ(1 − λ) Though G1 is different, when T → ∞, G1 will not show in the limit equation, therefore ˆ ˆ ˆ the limits of λT S and λM S are the same as in Theorem 3.4.5 and 3.4.4. Hence λT S on ˆ the level model is inconsistent, while λM S on the first difference model converges to either ˆ λc or λc . That is the first difference break point estimator λM S solves the inconsistency 1 2 ˆ problem of λT S . Figure 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, 3.15 and 3.16 plot the finite sample distribution ˆ ˆ and asymptotic distribution of λT S and λM S for ν = −5, −2, −1, −0.5, 0.5, 1, 2, 5 under T = 100, 250, 500, 1000, ρ = 1, µ = β = 0, and {λc , λc } = {1/4, 3/4} or {1/3, 2/3}. 1 2 For all figures, without loss of generality, I assume δ1 = 1 and is fixed. In these Figures, the pdfs of λT S and λM S are plotted in the same panel to compare the performance in presence of under-specified break numbers. We use kernel smoothing to obtain the pdf based on the simulations 3 . Figures 3.9 to 3.16 show the evolution of the distributions along the increase of ν. For ν < −1 and ν > 1, the break point estimator tends to be closer to λc ; For |ν| < 1, the break 2 ˆ point estimator will be closer to λc . With the increase of T , we see that λT S converges to 1 some points which are not the true ones. 
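The location of these limit points can be computed directly from the asymptotics: for given λc1, λc2 and ν, evaluate |G2TS(λ, λc1) + ν · G2TS(λ, λc2)| on a grid of λ and take the maximizer. The sketch below (Python with NumPy; the quadrature and grid sizes are illustrative) does this using F(r, λ) from Theorem 3.4.5.

```python
import numpy as np

def F(r, lam):
    """F(r, lambda) from Theorem 3.4.5: the broken trend (r - lambda)1(r > lambda)
    after projecting out a constant and a linear trend on [0, 1]."""
    a = lam ** 3 - 2 * lam ** 2
    return np.where(r <= lam,
                    a + lam - (2 * lam ** 3 - 3 * lam ** 2 + 1) * r,
                    a - (2 * lam ** 3 - 3 * lam ** 2) * r)

def G2_TS(lam, lam_c, r):
    """G2_TS(lambda, lambda_c) = int F(r,lam)F(r,lam_c)dr / int F(r,lam)^2 dr."""
    num = np.trapz(F(r, lam) * F(r, lam_c), r)
    den = np.trapz(F(r, lam) ** 2, r)
    return num / den

def limit_of_lambda_TS(lam1, lam2, nu, grid_size=2000):
    """arg max over lambda of |G2_TS(lambda, lam1) + nu * G2_TS(lambda, lam2)|."""
    r = np.linspace(0.0, 1.0, 20001)
    grid = np.linspace(0.02, 0.98, grid_size)
    vals = [abs(G2_TS(l, lam1, r) + nu * G2_TS(l, lam2, r)) for l in grid]
    return grid[int(np.argmax(vals))]

# two equal trend breaks at 1/3 and 2/3: the maximizer sits near 0.5,
# i.e. at neither of the true break points
print(limit_of_lambda_TS(1 / 3, 2 / 3, nu=1.0))
```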
Therefore, from the point of the consistency to the ˆ ˆ true break points, we prefer λM S . For ν > 0 as in Figure 3.9 to 3.12, λT S would converge ˆ to point between λc and λc , while for ν < 0 in Figure 3.13 to 3.16, λT S would converge 1 2 to point in [0, λc ] or [λc , 1]. 1 2 3 In this chapter, I use the standard normal distribution as the kernel function. For the same reason as in PZ(2005), that is, the optimal data dependant bandwidth may not work well, I choose a simple bandwidth equals to 0.5σ for any error, where σ is the STD of the data sequence. Simulations show that h does not affect the pdf estimator much. 93 For either of the two estimators, whether the pdf has one peak or two peaks and the location of the peak(s) depend on several factors: the space between the breaks, the relative magnitude, the signs of two breaks, and the persistence of the errors as well. For example, ˆ when break magnitudes are the same, λT S tends to have one peak in pdf if the distance in ˆ between is small, i.e. {λc , λc } = {1/3, 2/3}. While for {λc , λc } = {1/4, 3/4}, λT S has 1 2 1 2 two peaks in pdf. The effect of persistence on the shapes of density seems non-monotonic. ˆ ˆ To compare the performance of λT S and λM S quantitatively, we use the sum of pdf values at the true break points as a criterion to compare the precision. We consider 3 cases: ν = 1, ν = −1, and |ν| = 1. Table 3.1, 3.2, and 3.3 list the sum of pdf values at the true ˆ break point. We can see that λT S stands at an advantage when the break magnitudes are small, disregarding the sign of the breaks and how strong the persistence in the errors is. One exception is when we are in stationary case and the difference between the magnitudes ˆ and the magnitudes themselves are small. In that case, we may prefer λM S . This can be explained clearly by both finite sample pdf and the theoretical limiting distributions. When the break magnitude is small, the null hypothesis plays the major role in the distribution. ˆ ˆ λT S has high pdf around the true break, and λM S has most mass around 0 and 1. Therefore ˆ when break magnitude is small, we prefer λT S . However, persistence may not be such a dominant factor as break magnitudes. With the increase of break magnitude, this condition ˆ ˆ changes. The pdf of λM S gradually gets denser at the true breaks, while that of λT S ˆ gradually becomes smaller. When the break magnitude becomes big enough, λM S has ˆ much higher pdf at the true breaks. And eventually the inconsistency problem for λT S drives it to converge to the points other than the true break points. Then even in a stationary case, we still want to “over-difference” the trend shift model and get a higher density at the true values. In Figure 3.9 to 3.16, the asymptotic distributions are plotted together with the finite sample distributions via simulations. We can see that the asymptotic distribution approximates well to the finite sample distributions. When the pdf is calculated, the bandwidth 94 Table 3.1. Sum of densities at the true break λc and λc where {λc , λc } = {1/3, 2/3} under 1 2 1 2 different ρ and M1 = M2 . ρ = 0 ρ = 0 ρ = 0.5 ρ = 0.5 ρ = 1 ρ = 1 ˆ ˆ ˆ ˆ ˆ ˆ M 1 = M 2 λT S λ M S λT S λM S λT S λM S 10 0.11 0.71 1.00 1.01 1.42 0.87 0.00 0.97 0.71 2.51 1.50 1.27 20 50 0.00 6.02 0.00 5.13 1.52 3.42 100 0.00 8.17 0.00 6.71 2.23 6.11 150 0.00 11.31 0.00 9.92 2.76 8.97 Table 3.2. Sum of densities at the true break λc and λc where {λc , λc } = {1/3, 2/3} under 1 2 1 2 different ρ and M1 = −M2 . 
ρ = 0 ρ = 0 ρ = 0.5 ρ = 0.5 ρ = 1 ρ = 1 ˆ ˆ ˆ ˆ ˆ ˆ M1 = −M2 λT S λM S λT S λM S λ T S λM S 10 1.89 0.07 1.73 0.12 1.71 0.86 20 1.31 0.48 1.47 0.81 1.51 1.03 50 0.17 2.11 0.97 2.51 1.77 2.01 100 0.02 6.31 0.00 5.97 1.51 4.13 0.00 8.12 0.00 7.46 1.43 6.47 150 used in the kernel smooth plays an important role. To provide a comparison under the same base, we should choose the same bandwidth in the pdf calculation. For different T , under the same M1 and M2 , our results also show that the approximation of Theorem 3.5.6 to the finite sample distribution is pretty good. The data will be provided upon request. Figure 3.17 and Figure 3.18 plot the λ to achieve the maximal value of |G2T S (λ, λc ) + 1 ν · G2T S (λ, λc )| along ν at (λc , λc ) = (1/3, 2/3) and (1/4, 3/4), which shows where the 2 1 2 Table 3.3. Sum of densities at the true break λc and λc where {λc , λc } = {1/3, 2/3} under 1 2 1 2 different ρ and |M1 | = |M2 |, where M1 = 50(δ1 = 5). ρ = 0 ρ = 0 ρ = 0.5 ρ = 0.5 ρ = 1 ρ = 1 ˆ ˆ ˆ ˆ ˆ ˆ M 2 λT S λ M S λT S λM S λT S λM S 10 4.97 19.16 6.31 13.17 4.27 4.53 20 2.35 20.37 4.12 14.76 3.97 4.46 40 0.00 5.76 0.00 5.07 1.51 3.36 100 0.00 0.00 0.00 0.00 0.31 0.96 150 0.00 0.00 0.00 0.00 0.00 0.00 95 ˆ limits of λT S would be when T → ∞. When ν = 0, the λ to achieve maximum will be λc , 1 one of the true break points. Other than that, all the limits would not be at the true break point. With the increase of ν from -10 to 10, there are two kinks in the limit at |ν| = 1. And when |ν| goes to ∞, the limit of the break point estimator will be the true break λc . Take 2 {λc , λc } = {1/3, 2/3} as an example. When ν < −1, the limiting point is greater than 1 2 2/3. When −1 < ν < 0, the limiting point is less than 1/3. In both cases, the limiting point is between the two true break points. When ν > 0, the limiting point will be between the true break points. And when ν = 1, the limiting point is at λ = 0.5. This can be extended to be a general pattern. 3.6 Application to Sequential Tests of Multiple Breaks Model ˆ ˆ The previous analysis shows that λM S can deal with the inconsistency of λT S in the presˆ ence of under-specification of break numbers, which means using λM S might be better than ˆ λT S in some applications with unknown break number. BP(1998) and BP(2003) proposed the sequential process to test mean shift hypothesis and locate the break points step by step, where the consistency of the break point estimator is critical. PY(2008) develop a test for an unknown break point in a univariate trend break model where the noise component can be either stationary or integrated. A bias-corrected estimate of the serial correlation parameter is used and a super efficient estimate of ρ is applied to choose the test for I(0) or I(1) errors. KP(2010) applied the sequential test in BP(2003) to a multiple trend break model and made it robust to I(0) and I(1) errors using the method in PY(2008). A general model is defined in these papers as follows. y t = xt Ψ + ut (3.6.18) ut = ρut−1 + εt . (3.6.19) 96 . . For a linear trend shift model, xt = (1, t, DTt ) , Ψ = (µ, β, δ) . Testing hypothesis H0 : RΨ = γ, PY(2008)’s test is defined as ˆ ˆ WF S = (RΨ − γ) [s2 R(X X)−1 R](RΨ − γ), where R = (0, 0, 1), γ = 0, X = (x1 , (1 − ρ)x2 , · · · , (1 − ρ)xT ) ,s2 = T −1 ˆ ˆ T ˆ2 t=1 εt , and εt are the residuals associated with the feasible GLS regression. 
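To fix ideas, a rough sketch of a feasible GLS Wald statistic for a single trend break at a given date is shown below (Python with NumPy). It is only illustrative: the bias-corrected, super-efficient estimation of ρ in PY(2008) and the exact treatment of the first observation are not reproduced here, and the estimate of ρ is simply passed in.

```python
import numpy as np

def feasible_gls_wald(y, Tb, rho_hat):
    """Quasi-difference the trend shift regression with a supplied estimate
    of rho and compute a Wald statistic for H0: delta = 0 (no trend break)."""
    T = len(y)
    t = np.arange(1, T + 1)
    DT = np.where(t > Tb, t - Tb, 0.0)
    X = np.column_stack([np.ones(T), t, DT])          # x_t = (1, t, DT_t)'
    # quasi-differencing; the first observation is kept in levels (a simplification)
    y_q = np.concatenate(([y[0]], y[1:] - rho_hat * y[:-1]))
    X_q = np.vstack([X[0], X[1:] - rho_hat * X[:-1]])
    XtX_inv = np.linalg.inv(X_q.T @ X_q)
    psi = XtX_inv @ X_q.T @ y_q                       # GLS estimates of (mu, beta, delta)
    resid = y_q - X_q @ psi
    s2 = resid @ resid / T                            # s^2 = T^{-1} sum of squared residuals
    delta_hat = psi[2]                                # R = (0, 0, 1)
    return delta_hat ** 2 / (s2 * XtX_inv[2, 2])
```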
KP(2010) describe the sequential procedure as follows. First, we obtain estimates of the break dates T_1, ..., T_l as global minimizers of the sum of squared residuals from the model with l breaks estimated by OLS:

(\hat{T}_1, \ldots, \hat{T}_l) = \arg\min_{(T_1,\ldots,T_l)} SSR(T_1, \ldots, T_l).

This can be achieved using the dynamic programming algorithm proposed by BP(2003). Second, we test for the presence of an additional break in each of the (l + 1) segments partitioned by (T̂_1, ..., T̂_l); the test statistic tests the null hypothesis of l breaks against the alternative hypothesis of (l + 1) breaks. In practice, l starts from 0. The break test in each segment follows PY(2009) in detail.

The inconsistency problem of the trend break point estimator when the break number is under-specified is not considered by KP(2010) or the other related literature. Theorem 3.5.6 establishes the inconsistency of λ̂_TS, which can hurt the sequential test. Hence, instead of using λ̂_TS as in KP(2010), which does not take the inconsistency problem into account, I use λ̂_MS as the break point estimator to show how solving the inconsistency problem can improve the power of the test. To do so, we should make sure this does not change the null behavior of the KP(2010) test. KP(2010)'s results are based on the assumption that the break point estimators are consistent for the true breaks; hence the null distribution of the KP(2010) test does not depend on the break point estimators, i.e. consistency of the break point estimators ensures that they do not show up asymptotically in the null distribution. From this point of view, we can use λ̂_MS instead of λ̂_TS to solve the inconsistency problem.

I consider a similar variety of data generating processes as in KP(2010), especially for the errors. Let m̂ denote the selected break number. The simulation DGP for a two-break case (the true break number is 2) is

y_t = \delta_1 DT_t(\lambda_1^c) + \delta_2 DT_t(\lambda_2^c) + u_t,   (3.6.20)
u_t = \rho u_{t-1} + e_t + \theta e_{t-1}.   (3.6.21)

In Tables 3.4 and 3.5 we set δ_1 = 1, λ_1^c = 1/3, and λ_2^c = 2/3, with θ = 0 and 0.5 respectively; the true break number is 2. The results show that, for large δ or when ρ = 1, 0.9, or 0.8, λ̂_MS is always preferred: power is raised by introducing λ̂_MS, and the probability of over-estimation (m̂ ≥ 3) is lowered. When ρ = 0.5 and the break magnitude is small, the improvement from using λ̂_MS is not as large as in the other cases. This is because when ρ and δ are both small there is a combined effect of over-differencing and the consistency problem, and the cost of over-differencing outweighs the value of consistency. The power comparison between λ̂_MS and λ̂_TS at T = 240 (available upon request) is similar to that at T = 120. The asymptotic result predicts the finite sample distribution well; that is, the finite sample performance depends mainly on M_1, M_2, λ_1^c, and λ_2^c, and not much on T. When θ = 0.5, the power improvement from λ̂_MS is smaller than when θ = 0.
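For reference, data from (3.6.20)–(3.6.21) can be generated as follows. This is a minimal sketch: the helper name simulate_two_trend_breaks is illustrative, and standard normal e_t with starting value u_0 = 0 are assumptions, since the chapter does not restate those details here.

```python
import numpy as np

def simulate_two_trend_breaks(T=120, d1=1.0, d2=0.6, lam1=1/3, lam2=2/3,
                              rho=0.9, theta=0.0, rng=None):
    """Generate y_t = d1*DT_t(lam1) + d2*DT_t(lam2) + u_t with ARMA(1,1) errors
    u_t = rho*u_{t-1} + e_t + theta*e_{t-1}, as in (3.6.20)-(3.6.21)."""
    rng = rng or np.random.default_rng()
    t = np.arange(1, T + 1)
    DT1 = np.where(t > lam1 * T, t - lam1 * T, 0.0)
    DT2 = np.where(t > lam2 * T, t - lam2 * T, 0.0)
    e = rng.standard_normal(T)
    u = np.zeros(T)
    for s in range(1, T):
        u[s] = rho * u[s - 1] + e[s] + theta * e[s - 1]
    return d1 * DT1 + d2 * DT2 + u

y = simulate_two_trend_breaks(rho=1.0, theta=0.5)   # one draw from the theta = 0.5 design
```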
Table 3.4. Probability of break number selection m̂ for the trend shift model with 2 breaks: {λ_1^c, λ_2^c} = {1/3, 2/3}, δ_1 = 1, θ = 0, T = 120.

                    m̂ using λ̂_TS                m̂ using λ̂_MS
ρ      δ_2       0      1      2     ≥3        0      1      2     ≥3
1.00   0.50    0.00   0.58   0.30   0.12     0.00   0.63   0.31   0.06
       0.60    0.00   0.43   0.43   0.14     0.00   0.47   0.48   0.05
       0.70    0.00   0.33   0.52   0.15     0.00   0.38   0.54   0.08
       0.80    0.00   0.23   0.60   0.17     0.00   0.27   0.64   0.09
       0.90    0.00   0.16   0.67   0.17     0.00   0.18   0.70   0.12
       1.00    0.00   0.10   0.72   0.18     0.00   0.16   0.78   0.06
0.90   0.50    0.00   0.63   0.30   0.07     0.00   0.66   0.33   0.01
       0.60    0.00   0.44   0.46   0.10     0.00   0.49   0.48   0.03
       0.70    0.00   0.27   0.58   0.15     0.00   0.28   0.60   0.12
       0.80    0.00   0.16   0.71   0.13     0.00   0.18   0.76   0.06
       0.90    0.00   0.07   0.79   0.14     0.00   0.10   0.81   0.09
       1.00    0.00   0.05   0.80   0.15     0.00   0.09   0.83   0.08
0.80   0.50    0.00   0.58   0.37   0.05     0.00   0.61   0.38   0.01
       0.60    0.00   0.40   0.52   0.08     0.00   0.42   0.53   0.05
       0.70    0.00   0.23   0.65   0.12     0.00   0.26   0.68   0.06
       0.80    0.00   0.10   0.79   0.11     0.00   0.13   0.82   0.05
       0.90    0.00   0.05   0.84   0.11     0.00   0.08   0.88   0.04
       1.00    0.00   0.02   0.86   0.12     0.00   0.03   0.86   0.11
0.50   0.50    0.00   0.03   0.88   0.09     0.00   0.06   0.87   0.07
       0.60    0.00   0.02   0.87   0.11     0.00   0.03   0.85   0.12
       0.70    0.00   0.01   0.88   0.11     0.00   0.03   0.86   0.11
       0.80    0.00   0.01   0.90   0.09     0.00   0.02   0.88   0.10
       0.90    0.00   0.00   0.90   0.10     0.00   0.01   0.91   0.08
       1.00    0.00   0.00   0.90   0.10     0.00   0.00   0.93   0.07

[Figure 3.8. |G2_TS(λ, λ_1^c) + ν·G2_TS(λ, λ_2^c)| under ν = 1 and ν = −1 for the trend shift model, where {λ_1^c, λ_2^c} = {1/4, 3/4}. Top panel (ν = 1): maximum at λ = 1/2. Bottom panel (ν = −1): maxima at λ = 0.2 and λ = 0.8.]

[Figure 3.9. Finite sample and asymptotic distributions of λ̂_TS and λ̂_MS at ν = −5. Left: {λ_1^c, λ_2^c} = {1/4, 3/4}; right: {1/3, 2/3}; top to bottom: T = 100, 250, 500, 1000; ρ = 1. Solid: finite sample λ̂_TS; dash: finite sample λ̂_MS; dot: asymptotic λ̂_TS; dot-solid: asymptotic λ̂_MS.]

[Figure 3.10. Same layout and line styles as Figure 3.9, at ν = −2.]

[Figure 3.11. Same layout and line styles as Figure 3.9, at ν = −1.]
[Figure 3.12. Same layout and line styles as Figure 3.9, at ν = −0.5.]

[Figure 3.13. Same layout and line styles as Figure 3.9, at ν = 0.5.]

[Figure 3.14. Same layout and line styles as Figure 3.9, at ν = 1.]

[Figure 3.15. Same layout and line styles as Figure 3.9, at ν = 2.]

[Figure 3.16. Same layout and line styles as Figure 3.9, at ν = 5.]

[Figure 3.17. λ achieving the maximal |G2_TS(λ, λ_1^c) + ν·G2_TS(λ, λ_2^c)|, {λ_1^c, λ_2^c} = {1/3, 2/3}, ν = −10, ..., 10.]

[Figure 3.18. λ achieving the maximal |G2_TS(λ, λ_1^c) + ν·G2_TS(λ, λ_2^c)|, {λ_1^c, λ_2^c} = {1/4, 3/4}, ν = −10, ..., 10.]
Table 3.5. Probability of break number selection m̂ for the trend shift model with 2 breaks: {λ_1^c, λ_2^c} = {1/3, 2/3}, δ_1 = 1, θ = 0.5, T = 120.

                    m̂ using λ̂_TS                m̂ using λ̂_MS
ρ      δ_2       0      1      2     ≥3        0      1      2     ≥3
1.00   0.50    0.00   0.59   0.24   0.17     0.00   0.61   0.26   0.13
       0.60    0.00   0.50   0.30   0.20     0.00   0.50   0.32   0.18
       0.70    0.00   0.43   0.35   0.22     0.00   0.45   0.35   0.20
       0.80    0.00   0.35   0.43   0.22     0.00   0.33   0.42   0.25
       0.90    0.00   0.28   0.47   0.25     0.00   0.27   0.48   0.25
       1.00    0.00   0.25   0.50   0.25     0.00   0.26   0.51   0.23
0.90   0.50    0.00   0.61   0.28   0.11     0.00   0.62   0.27   0.11
       0.60    0.00   0.51   0.35   0.14     0.00   0.52   0.36   0.12
       0.70    0.00   0.39   0.45   0.16     0.00   0.39   0.47   0.14
       0.80    0.00   0.29   0.53   0.18     0.00   0.31   0.56   0.13
       0.90    0.00   0.19   0.62   0.19     0.00   0.20   0.66   0.12
       1.00    0.00   0.14   0.69   0.17     0.00   0.16   0.73   0.11
0.80   0.50    0.00   0.55   0.36   0.09     0.00   0.54   0.38   0.08
       0.60    0.00   0.45   0.45   0.10     0.00   0.44   0.47   0.09
       0.70    0.00   0.26   0.62   0.12     0.00   0.27   0.65   0.08
       0.80    0.00   0.18   0.69   0.13     0.00   0.19   0.72   0.09
       0.90    0.00   0.09   0.78   0.13     0.00   0.06   0.81   0.13
       1.00    0.00   0.06   0.80   0.14     0.00   0.04   0.84   0.12
0.50   0.50    0.00   0.55   0.36   0.09     0.00   0.53   0.36   0.12
       0.60    0.00   0.45   0.45   0.10     0.00   0.42   0.42   0.16
       0.70    0.00   0.26   0.62   0.12     0.00   0.25   0.60   0.15
       0.80    0.00   0.18   0.69   0.13     0.00   0.18   0.71   0.11
       0.90    0.00   0.09   0.78   0.13     0.00   0.10   0.82   0.08
       1.00    0.00   0.06   0.80   0.14     0.00   0.07   0.83   0.10

APPENDICES

A.1 Extension of the asymptotics in Theorem 1.5.1 to near-I(1) errors

The results in Theorem 1.5.1 are easily extended from pure I(1) errors to near-I(1) errors. Assume

(A1.b)  u_t = \rho u_{t-1} + \varepsilon_t, \quad t = 2, \ldots, T, \quad \rho = 1 - \frac{c}{T},

where c ≥ 0 is a constant scalar. The asymptotics of the break point estimators are derived under near-I(1) errors to show that the limiting distributions depend on c.

Corollary 0.1.1 Suppose the regressions in the level model (1.2.1) and its first difference (1.2.2) are estimated using λ ∈ Λ ⊆ (0, 1), and T_b^c = λ^c T is the true break. Under assumptions (A1.b) and (A2.c), the break point estimators obtained by minimizing SSR(λ) have the following limiting distributions.

1. For the level model (1.2.1),

\hat{\lambda}_{TS} \Rightarrow \arg\max_{\lambda\in\Lambda} \frac{\left[\int_0^1 F(r,\lambda)V_c(r)\,dr + M\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr\right]^2}{\int_0^1 F(r,\lambda)^2\,dr},

where V_c(r) = \int_0^r \exp(-c(r-s))\,dW(s), M = \delta^*/d(1), and F(r, λ) is defined in Theorem 1.5.1.

2. For the first difference model (1.2.2),

\hat{\lambda}_{MS} \Rightarrow \arg\max_{\lambda\in\Lambda} \frac{\left[\lambda W(1) - W(\lambda) - c\int_0^1 (1(r>\lambda) - (1-\lambda))V_c(r)\,dr + M\,\Psi(\lambda,\lambda^c)\right]^2}{\lambda(1-\lambda)},

where M = \delta^*/d(1) and Ψ(λ, λ^c) is defined in Theorem 1.5.1.
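The near-I(1) error process in assumption (A1.b) is easy to simulate; the sketch below generates u_t with ρ = 1 − c/T, whose scaled path T^{-1/2}u_[rT] approximates d(1)V_c(r). The values of T and c, the zero initial condition, and the helper name near_integrated_path are illustrative assumptions.

```python
import numpy as np

def near_integrated_path(T=500, c=10.0, rng=None):
    """Simulate u_t = (1 - c/T) u_{t-1} + eps_t; T^{-1/2} u_[rT] approximates V_c(r)."""
    rng = rng or np.random.default_rng()
    rho = 1.0 - c / T
    eps = rng.standard_normal(T)
    u = np.zeros(T)
    for t in range(1, T):
        u[t] = rho * u[t - 1] + eps[t]
    return u / np.sqrt(T)

path = near_integrated_path()        # one approximate draw of V_c(r) on r = t/T
```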
A.2 Proof of Theorem 1.5.1

A.2.1 Proof of part 1 in Theorem 1.5.1

The break point estimator λ̂_TS is obtained by minimizing SSR_TS(λ) (see (1.2.3)). Because SSR_TS^0 does not depend on λ, we can equivalently define λ̂_TS as

\hat{\lambda}_{TS} = \arg\max_{\lambda\in\Lambda}\{SSR_{TS}^0 - SSR_{TS}(\lambda)\},

where SSR_TS^0 denotes the SSR under the assumption of no breaks. Using the Frisch and Waugh (1933) theorem,

\hat{\delta} = \Big[\sum_{t=1}^T \widetilde{DT}_t(\lambda)^2\Big]^{-1} \sum_{t=1}^T \widetilde{DT}_t(\lambda)\tilde{y}_t,   (0.2.22)

where \{\widetilde{DT}_t(\lambda)\} and \{\tilde{y}_t\} are the residuals from the OLS regressions of \{DT_t(\lambda)\} and \{y_t\} on [1\ t]'. There is a standard result (see Sayginsoy and Vogelsang (2010)) that

SSR_{TS}^0 - SSR_{TS}(\lambda) = \Big[\sum_{t=1}^T \widetilde{DT}_t(\lambda)^2\Big]\hat{\delta}^2.   (0.2.23)

Consider T^{-1}\widetilde{DT}_t(\lambda). Simple algebra gives

T^{-1}\widetilde{DT}_{[rT]}(\lambda) \Rightarrow (r-\lambda)1(r>\lambda) - \Big[\int_0^1 (s-\lambda)1(s>\lambda)\,ds \quad \int_0^1 (s-\lambda)1(s>\lambda)s\,ds\Big]\begin{pmatrix} 4 & -6\\ -6 & 12\end{pmatrix}\binom{1}{r}.

Because

\int_0^1 (r-\lambda)1(r>\lambda)\,dr = \frac{1}{2} - \lambda + \frac{\lambda^2}{2}, \qquad \int_0^1 (r-\lambda)1(r>\lambda)\,r\,dr = \frac{1}{3} - \frac{\lambda}{2} + \frac{\lambda^3}{6},

we have

T^{-1}\widetilde{DT}_{[rT]}(\lambda) \Rightarrow (r-\lambda)1(r>\lambda) + (\lambda^3 - 2\lambda^2 + \lambda) - (2\lambda^3 - 3\lambda^2 + 1)r.

For simplicity, we define

F(r,\lambda) := (r-\lambda)1(r>\lambda) + (\lambda^3 - 2\lambda^2 + \lambda) - (2\lambda^3 - 3\lambda^2 + 1)r = \begin{cases} (\lambda^3 - 2\lambda^2 + \lambda) - (2\lambda^3 - 3\lambda^2 + 1)r, & r \le \lambda,\\ (\lambda^3 - 2\lambda^2) - (2\lambda^3 - 3\lambda^2)r, & r > \lambda.\end{cases}

Because \{u_t\} is I(1), T^{-1/2}u_{[rT]} \Rightarrow d(1)W(r), where W(r) is the standard Wiener process. Well known results give

T^{-3/2}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)u_t \Rightarrow d(1)\int_0^1 F(r,\lambda)W(r)\,dr.

Scaling (0.2.23) by T^{-2} gives

T^{-2}[SSR_{TS}^0 - SSR_{TS}(\lambda)] = [T^{1/2}\hat{\delta}]^2\Big[T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)\,T^{-1}\widetilde{DT}_t(\lambda)\Big].

From the previous results it follows that

T^{1/2}\hat{\delta} \Rightarrow \frac{d(1)\int_0^1 F(r,\lambda)W(r)\,dr + \delta^*\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr}{\int_0^1 F(r,\lambda)^2\,dr}, \qquad T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)\,T^{-1}\widetilde{DT}_t(\lambda) \Rightarrow \int_0^1 F(r,\lambda)^2\,dr,

which gives

T^{-2}[SSR_{TS}^0 - SSR_{TS}(\lambda)] \Rightarrow \frac{\big[d(1)\int_0^1 F(r,\lambda)W(r)\,dr + \delta^*\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr\big]^2}{\int_0^1 F(r,\lambda)^2\,dr}.

Furthermore, using the continuous mapping theorem (CMT), we obtain the limit of the break point estimator as

\hat{\lambda}_{TS} = \arg\max_{\lambda\in\Lambda}\{SSR_{TS}^0 - SSR_{TS}(\lambda)\} = \arg\max_{\lambda\in\Lambda}\{T^{-2}[SSR_{TS}^0 - SSR_{TS}(\lambda)]\} \Rightarrow \arg\max_{\lambda\in\Lambda} \frac{\big[\int_0^1 F(r,\lambda)W(r)\,dr + M\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr\big]^2}{\int_0^1 F(r,\lambda)^2\,dr},

where M := \delta^*/d(1).
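The limiting distribution in part 1 can be approximated numerically by discretizing W(r) and the integrals on a grid; the sketch below is one way to do so (the grid sizes, the values of M and λ^c, and the helper name limit_argmax_TS are illustrative choices, not taken from the text).

```python
import numpy as np

def F(r, lam):
    """F(r, lambda) as defined in the proof of Theorem 1.5.1."""
    return ((r - lam) * (r > lam) + (lam**3 - 2*lam**2 + lam)
            - (2*lam**3 - 3*lam**2 + 1) * r)

def limit_argmax_TS(M, lam_c, n=1000, grid=None, rng=None):
    """One draw of arg max over lambda of
    [int F(r,lam) W(r) dr + M int F(r,lam) F(r,lam_c) dr]^2 / int F(r,lam)^2 dr."""
    rng = rng or np.random.default_rng()
    grid = grid if grid is not None else np.linspace(0.05, 0.95, 91)
    r = np.arange(1, n + 1) / n
    W = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)   # discretized Wiener process
    Fc = F(r, lam_c)
    obj = np.empty(len(grid))
    for i, lam in enumerate(grid):
        Fl = F(r, lam)
        obj[i] = (np.mean(Fl * W) + M * np.mean(Fl * Fc)) ** 2 / np.mean(Fl ** 2)
    return grid[np.argmax(obj)]

draws = [limit_argmax_TS(M=5.0, lam_c=0.5) for _ in range(200)]   # Monte Carlo draws of the limit
```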
A.2.2 Proof of part 2 in Theorem 1.5.1

Using similar arguments as for the level model, we have

SSR_{MS}^0 - SSR_{MS}(\lambda) = \Big[\sum_{t=1}^T \widetilde{DU}_t(\lambda)^2\Big]\hat{\delta}^2.

Under the assumptions of model (1.2.2), the OLS estimate of δ is

\hat{\delta} = \Big[\sum_{t=1}^T \widetilde{DU}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)\widetilde{\Delta y}_t,

where \widetilde{DU}_t(\lambda) = DU_t(\lambda) - \sum_{t=1}^T DU_t(\lambda)/T = DU_t(\lambda) - \overline{DU}(\lambda) and \widetilde{\Delta y}_t = \Delta y_t - \sum_{t=1}^T \Delta y_t/T = \Delta y_t - \overline{\Delta y}. Simple algebra gives

\hat{\delta} = \Big[\sum_{t=1}^T \widetilde{DU}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)\widetilde{DU}_t(\lambda^c)\delta + \Big[\sum_{t=1}^T \widetilde{DU}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)\Delta u_t.

Because \Delta u_t = \varepsilon_t,

T^{1/2}\hat{\delta} = \Big[T^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)^2\Big]^{-1}\Big[T^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)\widetilde{DU}_t(\lambda^c)\delta^*\Big] + \Big[T^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)^2\Big]^{-1}\Big[T^{-1/2}\sum_{t=1}^T \widetilde{DU}_t(\lambda)\varepsilon_t\Big];

also, because

T^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)^2 \Rightarrow \int_0^1[1(r>\lambda)-(1-\lambda)]^2\,dr = \lambda(1-\lambda),

T^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)\widetilde{DU}_t(\lambda^c) \Rightarrow \int_0^1[1(r>\lambda)-(1-\lambda)][1(r>\lambda^c)-(1-\lambda^c)]\,dr = \begin{cases}(1-\lambda^c)\lambda, & \lambda\le\lambda^c,\\ (1-\lambda)\lambda^c, & \lambda > \lambda^c,\end{cases}

and

T^{-1/2}\sum_{t=1}^T \widetilde{DU}_t(\lambda)\varepsilon_t \Rightarrow d(1)\int_0^1[1(r>\lambda)-(1-\lambda)]\,dW(r) = d(1)[\lambda W(1) - W(\lambda)],

we obtain

T^{1/2}\hat{\delta} \Rightarrow \frac{\delta^*}{\lambda(1-\lambda)}\Phi(\lambda,\lambda^c) + \frac{d(1)}{\lambda(1-\lambda)}[\lambda W(1)-W(\lambda)], \qquad \Phi(\lambda,\lambda^c) = \begin{cases}(1-\lambda^c)\lambda, & \lambda\le\lambda^c,\\ (1-\lambda)\lambda^c, & \lambda>\lambda^c.\end{cases}

Using this result, it immediately follows that

SSR_{MS}^0 - SSR_{MS}(\lambda) = \Big[T^{-1}\sum_{t=1}^T \widetilde{DU}_t(\lambda)^2\Big][T^{1/2}\hat{\delta}]^2 \Rightarrow \frac{1}{\lambda(1-\lambda)}\big[d(1)(\lambda W(1)-W(\lambda)) + \delta^*\Psi(\lambda,\lambda^c)\big]^2.

Applying the CMT gives

\hat{\lambda}_{MS} = \arg\max_{\lambda\in\Lambda}\{SSR_{MS}^0 - SSR_{MS}(\lambda)\} \Rightarrow \arg\max_{\lambda\in\Lambda}\frac{[(\lambda W(1)-W(\lambda)) + M\,\Psi(\lambda,\lambda^c)]^2}{\lambda(1-\lambda)},

where M = \delta^*/d(1).

A.3 Proof that arg max_λ G2(λ, λ^c) = λ^c

a) First I derive G2_TS(λ, λ^c), which is a function of λ and λ^c, and then prove that it always achieves its global maximum at λ = λ^c on [0, 1]. This result implies that arg max{G1_TS(λ) + G2_TS(λ, λ^c)} converges to λ^c as M increases. Take λ ≤ λ^c; then

G2_{TS}(\lambda,\lambda^c) = \frac{(1-\lambda^c)^2(\lambda + \lambda^c + 2\lambda\lambda^c + 2)}{2\sqrt{3}\sqrt{\lambda^2+\lambda+1}}.

Taking the derivative of G2_TS with respect to λ gives

G2_{TS}'(\lambda) = \frac{\sqrt{3}(1-\lambda^c)^2(\lambda^c-\lambda)}{4(\lambda^2+\lambda+1)^{3/2}}.

We can see that G2_{TS}'(λ) ≥ 0 when λ ≤ λ^c, which proves that the maximum of G2_TS over λ ≤ λ^c is attained at λ = λ^c. Now take λ ≥ λ^c, giving

G2_{TS}(\lambda,\lambda^c) = \frac{\sqrt{3}(\lambda^c)^2[(6-3\lambda^c)+(2\lambda^c-3)\lambda]}{4\sqrt{\lambda^2-3\lambda+3}},

whose derivative with respect to λ is

G2_{TS}'(\lambda) = \frac{\sqrt{3}(\lambda^c)^2(\lambda^c-\lambda)}{4(\lambda^2-3\lambda+3)^{3/2}}.

The fact that G2_{TS}'(λ) ≤ 0 shows that the maximum of G2_TS over λ ≥ λ^c is attained at λ = λ^c.

b) Next I calculate G2_MS(λ, λ^c) and show that it achieves its global maximum at λ = λ^c on [0, 1]. The conclusion is analogous to part a): arg max{G1_MS(λ) + G2_MS(λ, λ^c)} converges to λ^c as M increases. When λ ≤ λ^c,

G2_{MS}(\lambda,\lambda^c) = \frac{(1-\lambda^c)\lambda}{\sqrt{\lambda(1-\lambda)}} = (1-\lambda^c)\frac{\sqrt{\lambda}}{\sqrt{1-\lambda}} \le \sqrt{(1-\lambda^c)\lambda^c},

hence G2_MS(λ, λ^c) attains its maximum at λ = λ^c for λ ≤ λ^c. The case λ ≥ λ^c is proved similarly. Combining a) and b), we obtain arg max_λ G2_TS(λ, λ^c) = λ^c and arg max_λ G2_MS(λ, λ^c) = λ^c.

A.4 Proof of Corollary 0.1.1

A.4.1 Proof of part 1 in Corollary 0.1.1

Under assumption (A1.b), u_t = ρu_{t−1} + ε_t with ρ = 1 − c/T,

T^{-3/2}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)u_t \Rightarrow d(1)\int_0^1 F(r,\lambda)V_c(r)\,dr,

where V_c(r) = \int_0^r \exp(-c(r-s))\,dW(s).

A.4.2 Proof of part 2 in Corollary 0.1.1

Because u_t = (1 − c/T)u_{t−1} + ε_t, it follows that Δu_t = −(c/T)u_{t−1} + ε_t. This gives

(T-1)^{-1/2}\sum_{t=2}^T \widetilde{DU}_t(\lambda)\Delta u_t = (T-1)^{-1/2}\sum_{t=2}^T \widetilde{DU}_t(\lambda)\varepsilon_t - (T-1)^{-1/2}T^{-1}c\sum_{t=2}^T \widetilde{DU}_t(\lambda)u_{t-1},

where

(T-1)^{-1/2}\sum_{t=2}^T \widetilde{DU}_t(\lambda)\varepsilon_t \Rightarrow d(1)[\lambda W(1)-W(\lambda)], \qquad (T-1)^{-1/2}T^{-1}c\sum_{t=2}^T \widetilde{DU}_t(\lambda)u_{t-1} \Rightarrow d(1)\,c\int_0^1(1(r>\lambda)-(1-\lambda))V_c(r)\,dr.

The rest of the proof is straightforward and follows the proof of Theorem 1.5.1.
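The claim in part b) is easy to check numerically from the piecewise expression for G2_MS used above; a quick grid check, with an arbitrary value of λ^c:

```python
import numpy as np

lam_c = 0.4
grid = np.linspace(0.01, 0.99, 9801)
Phi = np.where(grid <= lam_c, (1 - lam_c) * grid, (1 - grid) * lam_c)
G2_MS = Phi / np.sqrt(grid * (1 - grid))
print(grid[np.argmax(G2_MS)])          # approximately 0.4, i.e. the maximum is at lam_c
```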
A.5 Proof of Theorem 1.6.2

A.5.1 Proof of part 1 in Theorem 1.6.2

The break point estimator λ̂_TS is obtained by minimizing SSR_TS(λ) (see (1.2.3)). Because SSR_TS^0 does not depend on λ, we can equivalently define

\hat{\lambda}_{TS} = \arg\max_{\lambda\in\Lambda}\{SSR_{TS}^0 - SSR_{TS}(\lambda)\},

where SSR_TS^0 denotes the SSR under the assumption of no breaks. Using the Frisch and Waugh (1933) theorem,

\hat{\delta} = \Big[\sum_{t=1}^T \widetilde{DT}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T\widetilde{DT}_t(\lambda)\tilde{y}_t,   (0.5.24)

where \{\widetilde{DT}_t(\lambda)\} and \{\tilde{y}_t\} are the residuals from the OLS regressions of \{DT_t(\lambda)\} and \{y_t\} on [1\ t]'. There is a standard result (see Sayginsoy and Vogelsang (2010)) that

SSR_{TS}^0 - SSR_{TS}(\lambda) = \Big[\sum_{t=1}^T\widetilde{DT}_t(\lambda)^2\Big]\hat{\delta}^2.

Consider T^{-1}\widetilde{DT}_t(\lambda); the algebra in the proof of Theorem 1.5.1 gives

T^{-1}\widetilde{DT}_{[rT]}(\lambda) \Rightarrow F(r,\lambda) = (r-\lambda)1(r>\lambda) + (\lambda^3-2\lambda^2+\lambda) - (2\lambda^3-3\lambda^2+1)r.

Because \{u_t\} is I(0), T^{-1/2}\sum_{t=1}^{[rT]}u_t \Rightarrow d(1)W(r), where W(r) is the standard Wiener process. Well known results give

T^{-1/2}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)u_t \Rightarrow d(1)\int_0^1 F(r,\lambda)\,dW(r).

Scaling δ̂ by T^{3/2}, equation (1.2.3) can be written as

SSR_{TS}^0 - SSR_{TS}(\lambda) = [T^{3/2}\hat{\delta}]^2\Big[T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)\,T^{-1}\widetilde{DT}_t(\lambda)\Big].

From the previous results it follows that

T^{3/2}\hat{\delta} \Rightarrow \frac{\delta^*\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr + d(1)\int_0^1 F(r,\lambda)\,dW(r)}{\int_0^1 F(r,\lambda)^2\,dr}, \qquad T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)\,T^{-1}\widetilde{DT}_t(\lambda) \Rightarrow \int_0^1 F(r,\lambda)^2\,dr,

which gives

SSR_{TS}^0 - SSR_{TS}(\lambda) \Rightarrow \frac{[d(1)\int_0^1 F(r,\lambda)\,dW(r) + \delta^*\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr]^2}{\int_0^1 F(r,\lambda)^2\,dr}.

Furthermore, using the CMT we obtain the limit of the break point estimator as

\hat{\lambda}_{TS} = \arg\max_{\lambda\in\Lambda}\{SSR_{TS}^0 - SSR_{TS}(\lambda)\} \Rightarrow \arg\max_{\lambda\in\Lambda}\frac{[\int_0^1 F(r,\lambda)\,dW(r) + M\int_0^1 F(r,\lambda)F(r,\lambda^c)\,dr]^2}{\int_0^1 F(r,\lambda)^2\,dr},

where M := \delta^*/d(1) \equiv \delta T^{3/2}/d(1).

A.5.2 Proof of part 2 in Theorem 1.6.2

Because SSR^0 does not depend on λ, we can equivalently define λ̂_QS as

\hat{\lambda}_{QS} = \arg\max_{\lambda\in\Lambda}\{SSR_{QS}^0 - SSR_{QS}(\lambda)\}.

Using the Frisch and Waugh (1933) theorem,

\hat{\delta} = \Big[\sum_{t=1}^T \widetilde{DQ}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T \widetilde{DQ}_t(\lambda)\tilde{S}_t,

where \{\widetilde{DQ}_t(\lambda)\} and \{\tilde{S}_t\} are the residuals from the OLS regressions of \{DQ_t(\lambda)\} and \{S_t\} on [t\ t^2]'. There is a standard result (see Sayginsoy and Vogelsang (2010)) that

SSR_{QS}^0 - SSR_{QS}(\lambda) = \Big[\sum_{t=1}^T \widetilde{DQ}_t(\lambda)^2\Big]\hat{\delta}^2.   (0.5.25)

Consider T^{-2}\widetilde{DQ}_t(\lambda). Simple algebra gives

T^{-2}\widetilde{DQ}_{[rT]}(\lambda) \Rightarrow \frac{(r-\lambda)^2}{2}1(r>\lambda) - \Big[\int_0^1\frac{(s-\lambda)^2}{2}1(s>\lambda)s\,ds \quad \int_0^1\frac{(s-\lambda)^2}{2}1(s>\lambda)s^2\,ds\Big]\begin{pmatrix}48 & -60\\ -60 & 80\end{pmatrix}\binom{r}{r^2}.

Because

\int_0^1\frac{(r-\lambda)^2}{2}1(r>\lambda)\,r\,dr = \frac{1}{8} - \frac{\lambda}{3} + \frac{\lambda^2}{4} - \frac{\lambda^4}{24}, \qquad \int_0^1\frac{(r-\lambda)^2}{2}1(r>\lambda)\,r^2\,dr = \frac{1}{10} - \frac{\lambda}{4} + \frac{\lambda^2}{6} - \frac{\lambda^5}{60},

we have

T^{-2}\widetilde{DQ}_{[rT]}(\lambda) \Rightarrow \frac{(r-\lambda)^2}{2}1(r>\lambda) - (-\lambda+2\lambda^2-2\lambda^4+\lambda^5)r - \Big(\frac{1}{2}-\frac{5\lambda^2}{3}+\frac{5\lambda^4}{2}-\frac{4\lambda^5}{3}\Big)r^2.

For simplicity, we define

Q(r,\lambda) := \frac{(r-\lambda)^2}{2}1(r>\lambda) - (-\lambda+2\lambda^2-2\lambda^4+\lambda^5)r - \Big(\frac{1}{2}-\frac{5\lambda^2}{3}+\frac{5\lambda^4}{2}-\frac{4\lambda^5}{3}\Big)r^2.

Because \{u_t\} is I(1), T^{-1/2}u_{[rT]} \Rightarrow d(1)W(r), where W(r) is the standard Wiener process. Well known results give

T^{-3/2}\sum_{t=1}^T T^{-2}\widetilde{DQ}_t(\lambda)u_t \Rightarrow d(1)\int_0^1 Q(r,\lambda)W(r)\,dr.

Scaling (0.5.25) by T^{-2} gives

T^{-2}[SSR_{QS}^0 - SSR_{QS}(\lambda)] = [T^{3/2}\hat{\delta}]^2\Big[T^{-1}\sum_{t=1}^T T^{-2}\widetilde{DQ}_t(\lambda)\,T^{-2}\widetilde{DQ}_t(\lambda)\Big].

The rest of the proof is straightforward and follows the proof of part 1 of Theorem 1.5.1.

B.1 Proofs and Additional Results of Chapter 2

Proof of Theorem 2.5.3. The result for the numerator of LM(m, T_b) follows directly from Sayginsoy and Vogelsang (2010). All that is needed to complete the proof is the fixed-b limit of σ̃²(m). Because the fixed-b algebra for σ̃²(m) is the same as the algebra used by Hashimzade and Vogelsang (2008), once we derive the limit of the partial sums of ũ_t the fixed-b limits follow directly using arguments in Kiefer and Vogelsang (2005), Hashimzade and Vogelsang (2008), and Sayginsoy and Vogelsang (2010). Define

\tilde{S}_{[rT]} = \sum_{t=1}^{[rT]}\tilde{u}_t, \quad \text{where}\quad \tilde{u}_t = y_t - \bar{y} = \delta(DU_t(T_b^0) - \overline{DU}(T_b^0)) + u_t - \bar{u},

giving

\tilde{S}_{[rT]} = \delta\sum_{t=1}^{[rT]}(DU_t(T_b^0) - \overline{DU}(T_b^0)) + \sum_{t=1}^{[rT]}(u_t - \bar{u}).
For I(0) errors recall that under H_A we have δ = T^{-1/2}δ_0, and it follows that

T^{-1/2}\tilde{S}_{[rT]} = \delta_0 T^{-1}\sum_{t=1}^{[rT]}(DU_t(T_b^0) - \overline{DU}(T_b^0)) + T^{-1/2}\sum_{t=1}^{[rT]}(u_t - \bar{u}) \Rightarrow \delta_0[(r-\lambda^0)1(r>\lambda^0) - r(1-\lambda^0)] + \sigma[W(r) - rW(1)] = \sigma\Big\{\frac{\delta_0}{\sigma}[(r-\lambda^0)1(r>\lambda^0) - r(1-\lambda^0)] + W(r) - rW(1)\Big\} \equiv \sigma Q_0(r).

For I(1) errors δ = T^{1/2}δ_0, giving

T^{-3/2}\tilde{S}_{[rT]} = \delta_0 T^{-1}\sum_{t=1}^{[rT]}(DU_t(T_b^0)-\overline{DU}(T_b^0)) + T^{-3/2}\sum_{t=1}^{[rT]}(u_t-\bar{u}) \Rightarrow \delta_0[(r-\lambda^0)1(r>\lambda^0) - r(1-\lambda^0)] + d(1)\Big[\int_0^r V_c(s)\,ds - r\int_0^1 V_c(s)\,ds\Big] = d(1)\Big\{\frac{\delta_0}{d(1)}[(r-\lambda^0)1(r>\lambda^0)-r(1-\lambda^0)] + \int_0^r V_c(s)\,ds - r\int_0^1 V_c(s)\,ds\Big\} \equiv d(1)Q_1(r).

C.1 Proof of Theorem 3.4.4

1) Derivation of the asymptotic distribution with under-specified break number. We have

RSS_{MS}^0 - RSS_{MS}(\lambda) = \Big[\sum_{t=1}^T\widetilde{DU}_t(\lambda)^2\Big]\hat{\delta}_{MS}^2.

Under the assumptions of model (3.2.1), the OLS estimate of δ is

\hat{\delta}_{MS} = \Big[\sum_{t=1}^T\widetilde{DU}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)\tilde{y}_t,

where \widetilde{DU}_t(\lambda) = DU_t(\lambda) - \sum_{t=1}^T DU_t(\lambda)/T = DU_t(\lambda) - \overline{DU}(\lambda). When the break number is under-estimated, simple algebra gives

\hat{\delta}_{MS} = \Big[\sum_{t=1}^T\widetilde{DU}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)[\widetilde{DU}_t(\lambda_1^c)\delta_1 + \widetilde{DU}_t(\lambda_2^c)\delta_2] + \Big[\sum_{t=1}^T\widetilde{DU}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)u_t.

Multiplying both sides by T^{1/2},

T^{1/2}\hat{\delta}_{MS} = \Big[T^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)^2\Big]^{-1}\Big[T^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)(\widetilde{DU}_t(\lambda_1^c)\delta_1^* + \widetilde{DU}_t(\lambda_2^c)\delta_2^*)\Big] + \Big[T^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)^2\Big]^{-1}\Big[T^{-1/2}\sum_{t=1}^T\widetilde{DU}_t(\lambda)u_t\Big].

Because

T^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)^2 \Rightarrow \int_0^1[1(r>\lambda)-(1-\lambda)]^2\,dr = \lambda(1-\lambda),

T^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)\widetilde{DU}_t(\lambda^c) \Rightarrow \int_0^1[1(r>\lambda)-(1-\lambda)][1(r>\lambda^c)-(1-\lambda^c)]\,dr = \Phi(\lambda,\lambda^c) = \begin{cases}(1-\lambda^c)\lambda,&\lambda\le\lambda^c,\\(1-\lambda)\lambda^c,&\lambda>\lambda^c,\end{cases}

and

T^{-1/2}\sum_{t=1}^T\widetilde{DU}_t(\lambda)\varepsilon_t \Rightarrow d(1)\int_0^1[1(r>\lambda)-(1-\lambda)]\,dW(r) = d(1)[\lambda W(1)-W(\lambda)],

we obtain

T^{1/2}\hat{\delta}_{MS} \Rightarrow \frac{\delta_1^*}{\lambda(1-\lambda)}\Phi(\lambda,\lambda_1^c) + \frac{\delta_2^*}{\lambda(1-\lambda)}\Phi(\lambda,\lambda_2^c) + \frac{d(1)}{\lambda(1-\lambda)}[\lambda W(1)-W(\lambda)].

From this it immediately follows that

RSS_{MS}^0 - RSS_{MS}(\lambda) = \Big[T^{-1}\sum_{t=1}^T\widetilde{DU}_t(\lambda)^2\Big][T^{1/2}\hat{\delta}_{MS}]^2 \Rightarrow \frac{1}{\lambda(1-\lambda)}\big[d(1)(\lambda W(1)-W(\lambda)) + \delta_1^*\Psi(\lambda,\lambda_1^c) + \delta_2^*\Psi(\lambda,\lambda_2^c)\big]^2.

Applying the CMT gives

\hat{\lambda}_{MS} \Rightarrow \arg\max_{\lambda\in\Lambda}\frac{[(\lambda W(1)-W(\lambda)) + M_1\Psi(\lambda,\lambda_1^c) + M_2\Psi(\lambda,\lambda_2^c)]^2}{\lambda(1-\lambda)},

where M_1 = \delta_1^*/d(1) and M_2 = \delta_2^*/d(1).

Let us further examine M_1 G2(λ, λ_1^c) + M_2 G2(λ, λ_2^c). First take the derivative of G2_MS with respect to λ:

G2_{MS}'(\lambda,\lambda^c) = \frac{1-\lambda^c}{2(1-\lambda)\sqrt{\lambda(1-\lambda)}} \quad\text{for } \lambda\le\lambda^c, \qquad G2_{MS}'(\lambda,\lambda^c) = -\frac{\lambda^c}{2\lambda\sqrt{\lambda(1-\lambda)}} \quad\text{for }\lambda\ge\lambda^c.

Assume λ_1^c < λ_2^c. Then the derivative of M_1 G2(λ, λ_1^c) + M_2 G2(λ, λ_2^c) is

M_1\frac{1-\lambda_1^c}{2(1-\lambda)\sqrt{\lambda(1-\lambda)}} + M_2\frac{1-\lambda_2^c}{2(1-\lambda)\sqrt{\lambda(1-\lambda)}} \quad\text{when } \lambda\le\lambda_1^c,

-M_1\frac{\lambda_1^c}{2\lambda\sqrt{\lambda(1-\lambda)}} + M_2\frac{1-\lambda_2^c}{2(1-\lambda)\sqrt{\lambda(1-\lambda)}} \quad\text{when } \lambda_1^c\le\lambda\le\lambda_2^c,

and

-M_1\frac{\lambda_1^c}{2\lambda\sqrt{\lambda(1-\lambda)}} - M_2\frac{\lambda_2^c}{2\lambda\sqrt{\lambda(1-\lambda)}} \quad\text{when } \lambda\ge\lambda_2^c.

Through simple algebra we can show that the peak values are obtained at either λ_1^c or λ_2^c.
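The conclusion just derived for the mean shift model can be inspected numerically by locating the peak of |M_1·G2_MS(λ, λ_1^c) + M_2·G2_MS(λ, λ_2^c)| on a grid; a short check under the piecewise form of G2_MS used above (the values of M_1 and M_2 are arbitrary examples):

```python
import numpy as np

def G2_MS(lam, lam_c):
    phi = np.where(lam <= lam_c, (1 - lam_c) * lam, (1 - lam) * lam_c)
    return phi / np.sqrt(lam * (1 - lam))

lam = np.linspace(0.01, 0.99, 9801)
lam1, lam2 = 1/3, 2/3
for M1, M2 in [(2.0, 1.0), (1.0, 3.0), (2.0, -1.0)]:
    drift = np.abs(M1 * G2_MS(lam, lam1) + M2 * G2_MS(lam, lam2))
    print(M1, M2, lam[np.argmax(drift)])   # in these examples the peak is at lam1 or lam2
```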
C.2 Proof of Theorem 3.4.5

1) Asymptotic distribution of λ̂_TS. Under under-specification of the break number, the only difference from the correctly specified case in the form of RSS^0 − RSS_1(λ) is δ̂_TS. We again have the standard result that

SSR_{TS}^0 - SSR_{TS}(\lambda) = \Big[\sum_{t=1}^T\widetilde{DT}_t(\lambda)^2\Big]\hat{\delta}_{TS}^2.

Consider T^{-1}\widetilde{DT}_t(\lambda); simple algebra as in Yang (2010) gives

T^{-1}\widetilde{DT}_{[rT]}(\lambda) \Rightarrow F(r,\lambda) = (r-\lambda)1(r>\lambda) + (\lambda^3-2\lambda^2+\lambda) - (2\lambda^3-3\lambda^2+1)r.

Because \{u_t\} is I(0), T^{-1/2}\sum_{t=1}^{[rT]}u_t \Rightarrow d(1)W(r), where W(r) is the standard Wiener process, and well known results give

T^{-1/2}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)u_t \Rightarrow d(1)\int_0^1 F(r,\lambda)\,dW(r).

Now consider δ̂_TS. Scaling δ̂_TS by T^{3/2}, equation (3.2.6) can be written as

SSR_{TS}^0 - SSR_{TS}(\lambda) = [T^{3/2}\hat{\delta}_{TS}]^2\Big[T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)\,T^{-1}\widetilde{DT}_t(\lambda)\Big].

From the previous results it follows that

\hat{\delta}_{TS} = \Big[\sum_{t=1}^T\widetilde{DT}_t(\lambda)^2\Big]^{-1}\sum_{t=1}^T\widetilde{DT}_t(\lambda)\tilde{y}_t.

Let X_0 denote the stacked [1\ t] and DT(λ^c) the stacked DT_t(λ^c), t = 1, ..., T. If there are two breaks at λ_1^c and λ_2^c in model (3.2.3), then

\tilde{Y} = Y - X_0(X_0'X_0)^{-1}X_0'Y = \delta_1(I - X_0(X_0'X_0)^{-1}X_0')DT(\lambda_1^c) + \delta_2(I - X_0(X_0'X_0)^{-1}X_0')DT(\lambda_2^c) + (I - X_0(X_0'X_0)^{-1}X_0')U = \delta_1\widetilde{DT}(\lambda_1^c) + \delta_2\widetilde{DT}(\lambda_2^c) + \tilde{U}.

Under I(0) errors I define the break magnitudes within a T^{-3/2} neighborhood of 0, as in Assumption (C2.b). Next I derive the asymptotic distribution of λ̂_TS with an under-specified break number. From the previous results it follows that

T^{3/2}\hat{\delta}_{TS} = \Big[T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)T^{-1}\widetilde{DT}_t(\lambda)\Big]^{-1}\Big[T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)T^{-1}\widetilde{DT}_t(\lambda_1^c)(T^{3/2}\delta_1) + T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)T^{-1}\widetilde{DT}_t(\lambda_2^c)(T^{3/2}\delta_2) + T^{-1/2}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)u_t\Big]
\Rightarrow \frac{\delta_1^*\int_0^1 F(r,\lambda)F(r,\lambda_1^c)\,dr + \delta_2^*\int_0^1 F(r,\lambda)F(r,\lambda_2^c)\,dr + d(1)\int_0^1 F(r,\lambda)\,dW(r)}{\int_0^1 F(r,\lambda)^2\,dr},

and T^{-1}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)T^{-1}\widetilde{DT}_t(\lambda) \Rightarrow \int_0^1 F(r,\lambda)^2\,dr, which gives

SSR_{TS}^0 - SSR_{TS}(\lambda) \Rightarrow \frac{[d(1)\int_0^1 F(r,\lambda)\,dW(r) + \delta_1^*\int_0^1 F(r,\lambda)F(r,\lambda_1^c)\,dr + \delta_2^*\int_0^1 F(r,\lambda)F(r,\lambda_2^c)\,dr]^2}{\int_0^1 F(r,\lambda)^2\,dr}.

Furthermore, using the CMT we obtain the limit of the break point estimator as

\hat{\lambda}_{TS} = \arg\max_{\lambda\in\Lambda}\{SSR_{TS}^0 - SSR_{TS}(\lambda)\} \Rightarrow \arg\max_{\lambda\in\Lambda}\frac{[\int_0^1 F(r,\lambda)\,dW(r) + M_1\int_0^1 F(r,\lambda)F(r,\lambda_1^c)\,dr + M_2\int_0^1 F(r,\lambda)F(r,\lambda_2^c)\,dr]^2}{\int_0^1 F(r,\lambda)^2\,dr},

where M_1 = \delta_1^*/d(1) \equiv \delta_1 T^{3/2}/d(1) and M_2 = \delta_2^*/d(1) \equiv \delta_2 T^{3/2}/d(1).
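In the spirit of Figures 3.17 and 3.18, the deterministic part of this limit can be traced out directly from F(r, λ) by numerical integration; the sketch below locates the λ that maximizes |∫F(r,λ)F(r,λ_1^c)dr + ν∫F(r,λ)F(r,λ_2^c)dr| / (∫F(r,λ)²dr)^{1/2} on a grid. The values of ν = M_2/M_1 and the grid sizes are arbitrary illustrations.

```python
import numpy as np

def F(r, lam):
    return ((r - lam) * (r > lam) + (lam**3 - 2*lam**2 + lam)
            - (2*lam**3 - 3*lam**2 + 1) * r)

r = np.linspace(0, 1, 2001)
lam_grid = np.linspace(0.02, 0.98, 481)
lam1, lam2 = 1/3, 2/3
F1, F2 = F(r, lam1), F(r, lam2)
for nu in (-2.0, -0.5, 0.5, 1.0, 2.0):                 # nu = M2 / M1
    drift = np.array([abs(np.mean(F(r, l) * (F1 + nu * F2)))
                      / np.sqrt(np.mean(F(r, l) ** 2)) for l in lam_grid])
    print(nu, lam_grid[np.argmax(drift)])              # where the drift term peaks
```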
C.3 Analysis of G2_TS under two breaks

In this section we analyze G2_TS to show how the break point estimate behaves in the presence of under-specification of the break number. G2_TS(λ, λ^c) is a function of λ and λ^c. I show that, in the presence of two break points λ_1^c and λ_2^c, it does not in general achieve its global maximum at either true break point λ_1^c or λ_2^c in [0, 1]. This result illustrates the inconsistency problem of the break point estimator for the trend shift model.

When M_1 and M_2 are large, the break point estimator λ̂_TS is dominated by the properties of M_1 G2(λ, λ_1^c) + M_2 G2(λ, λ_2^c). Yang (2010) shows that the maximal value of G2(λ, λ^c) is achieved at λ = λ^c; hence when there is only one break, λ̂_TS is consistent for the true break point. However, when there are two true break points and the break number is estimated as one, M_1 G2(λ, λ_1^c) + M_2 G2(λ, λ_2^c) in general achieves its maximum at neither λ_1^c nor λ_2^c, which makes the break point estimator inconsistent.

For a single break fraction λ^c and λ ≤ λ^c,

G2_{TS}(\lambda,\lambda^c) = \frac{(1-\lambda^c)^2(\lambda+\lambda^c+2\lambda\lambda^c+2)}{2\sqrt{3}\sqrt{\lambda^2+\lambda+1}},

and taking the derivative with respect to λ gives

G2_{TS}'(\lambda,\lambda^c) = \frac{\sqrt{3}(1-\lambda^c)^2(\lambda^c-\lambda)}{4(\lambda^2+\lambda+1)^{3/2}} \ge 0 \quad\text{for }\lambda\le\lambda^c,

so over λ ≤ λ^c the maximum of G2_TS is attained at λ = λ^c. For λ ≥ λ^c,

G2_{TS}(\lambda,\lambda^c) = \frac{\sqrt{3}(\lambda^c)^2[(6-3\lambda^c)+(2\lambda^c-3)\lambda]}{4\sqrt{\lambda^2-3\lambda+3}},

with derivative

G2_{TS}'(\lambda,\lambda^c) = \frac{\sqrt{3}(\lambda^c)^2(\lambda^c-\lambda)}{4(\lambda^2-3\lambda+3)^{3/2}} \le 0,

so over λ ≥ λ^c the maximum is again attained at λ = λ^c. The same expressions hold with λ^c replaced by λ_1^c or λ_2^c.

However, if there are two breaks λ_1^c and λ_2^c, we need to analyze where the break point estimate goes. For the two breaks define

G2^*_{TS} = [M_1\,G2_{TS}(\lambda,\lambda_1^c) + M_2\,G2_{TS}(\lambda,\lambda_2^c)]^2.

The following shows that its maximum need not be achieved at λ_1^c or λ_2^c.

a) When λ < λ_1^c, an interior stationary point of G2^*_TS requires

M_1\,G2_{TS}'(\lambda,\lambda_1^c) + M_2\,G2_{TS}'(\lambda,\lambda_2^c) = M_1\frac{\sqrt{3}(1-\lambda_1^c)^2(\lambda_1^c-\lambda)}{4(\lambda^2+\lambda+1)^{3/2}} + M_2\frac{\sqrt{3}(1-\lambda_2^c)^2(\lambda_2^c-\lambda)}{4(\lambda^2+\lambda+1)^{3/2}} = 0.

a.1) If M_1 and M_2 are both positive or both negative, no λ < λ_1^c satisfies this condition, so there is no interior maximum in this region.

a.2) If M_1 > 0 and M_2 < 0, or M_1 < 0 and M_2 > 0, the condition is solved by

\lambda = \frac{M_1(1-\lambda_1^c)^2\lambda_1^c + M_2(1-\lambda_2^c)^2\lambda_2^c}{M_1(1-\lambda_1^c)^2 + M_2(1-\lambda_2^c)^2}.

b) When λ > λ_2^c, we similarly need M_1 G2_{TS}'(λ, λ_1^c) + M_2 G2_{TS}'(λ, λ_2^c) = 0, i.e.

M_1\frac{\sqrt{3}(\lambda_1^c)^2(\lambda_1^c-\lambda)}{4(\lambda^2-3\lambda+3)^{3/2}} + M_2\frac{\sqrt{3}(\lambda_2^c)^2(\lambda_2^c-\lambda)}{4(\lambda^2-3\lambda+3)^{3/2}} = 0.

b.1) If M_1 and M_2 have the same sign, there is again no solution in this region.

b.2) If M_1 and M_2 have opposite signs, the solution is

\lambda = \frac{M_1(\lambda_1^c)^2\lambda_1^c + M_2(\lambda_2^c)^2\lambda_2^c}{M_1(\lambda_1^c)^2 + M_2(\lambda_2^c)^2}.

c) When λ_1^c < λ < λ_2^c, we need

M_1\frac{\sqrt{3}(\lambda_1^c)^2(\lambda_1^c-\lambda)}{4(\lambda^2-3\lambda+3)^{3/2}} + M_2\frac{\sqrt{3}(1-\lambda_2^c)^2(\lambda_2^c-\lambda)}{4(\lambda^2+\lambda+1)^{3/2}} = 0,

and the solution depends on M_1 and M_2. Our analysis therefore shows that the break point estimator behaves very differently for different values of M_1 and M_2.
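The interior stationary point in case a.2) can be evaluated directly; a small worked example with {λ_1^c, λ_2^c} = {1/3, 2/3} and illustrative magnitudes of opposite sign:

```python
lam1, lam2 = 1/3, 2/3
M1, M2 = 2.0, -1.0                       # example magnitudes of opposite sign
lam_star = (M1 * (1 - lam1)**2 * lam1 + M2 * (1 - lam2)**2 * lam2) \
           / (M1 * (1 - lam1)**2 + M2 * (1 - lam2)**2)
print(lam_star)                          # 2/7, a stationary point at neither lam1 nor lam2
```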
Whether M_1 and M_2 are positive or negative is also important. This demonstrates the existence of the inconsistency problem for the multiple trend shift estimation problem.

C.4 Proof of Theorem 3.5.6

C.4.1 Proof of part 1: λ̂_TS

Under assumption (C1.b), u_t = ρu_{t−1} + ε_t with ρ = 1 − c/T,

T^{-3/2}\sum_{t=1}^T T^{-1}\widetilde{DT}_t(\lambda)u_t \Rightarrow d(1)\int_0^1 F(r,\lambda)V_c(r)\,dr,

where V_c(r) = \int_0^r \exp(-c(r-s))\,dW(s).

C.4.2 Proof of part 2: λ̂_MS

Because u_t = (1 − c/T)u_{t−1} + ε_t, it follows that Δu_t = −(c/T)u_{t−1} + ε_t. This gives

(T-1)^{-1/2}\sum_{t=2}^T\widetilde{DU}_t(\lambda)\Delta u_t = (T-1)^{-1/2}\sum_{t=2}^T\widetilde{DU}_t(\lambda)\varepsilon_t - (T-1)^{-1/2}T^{-1}c\sum_{t=2}^T\widetilde{DU}_t(\lambda)u_{t-1},

where

(T-1)^{-1/2}\sum_{t=2}^T\widetilde{DU}_t(\lambda)\varepsilon_t \Rightarrow d(1)[\lambda W(1)-W(\lambda)] \quad\text{and}\quad (T-1)^{-1/2}T^{-1}c\sum_{t=2}^T\widetilde{DU}_t(\lambda)u_{t-1} \Rightarrow d(1)\,c\int_0^1(1(r>\lambda)-(1-\lambda))V_c(r)\,dr.

The rest of the proof is straightforward and follows the proofs of Theorem 3.4.5 and Theorem 3.4.4.

BIBLIOGRAPHY

Andrews, D. W. K.: (1991), Heteroskedasticity and autocorrelation consistent covariance matrix estimation, Econometrica 59, 817–854.

Andrews, D. W. K.: (1993), Tests for parameter instability and structural change with unknown change point, Econometrica 61, 821–856.

Andrews, D. W. K. and Ploberger, W.: (1994), Optimal tests when a nuisance parameter is present only under the alternative, Econometrica 62, 1383–1414.

Bai, J.: (1998), A note on spurious breaks, Econometric Theory 14, 663–669.

Bai, J. S.: (1994), Least squares estimation of a shift in linear process, Journal of Time Series Analysis 15, 453–472.

Bai, J. S.: (1995), Least absolute deviation estimation of a shift, Econometric Theory 11, 403–436.

Bai, J. S. and Perron, P.: (1998), Estimating and testing linear models with multiple structural breaks, Econometrica 66, 47–78.

Bai, J. S. and Perron, P.: (2003), Computation and analysis of multiple structural change models, Journal of Applied Econometrics 18, 1–22.

Bowman, A. W. and Azzalini, A.: (1997), Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations, Oxford University Press, United Kingdom.

Bunzel, H. and Vogelsang, T. J.: (2005), Powerful trend function tests that are robust to strong serial correlation with an application to the Prebisch-Singer hypothesis, Journal of Business and Economic Statistics 23, 381–394.

Canjels, E. and Watson, M. W.: (1997), Estimating deterministic trends in the presence of serially correlated errors, Review of Economics and Statistics May, 184–200.

Chong, T.: (1994), Consistency of change-point estimators when the number of change points in structural change models is underspecified. Working paper, Department of Economics, University of Rochester.

Chong, T.: (1995), Partial parameter consistency in a misspecified structural change model, Economics Letters 49, 351–357.

Crainiceanu, C. and Vogelsang, T. J.: (2007), Non-monotonic power for tests of mean shift in a time series, Journal of Statistical Computation and Simulation 77, 457–476.

Deng, A. and Perron, P.: (2006), A comparison of alternative asymptotic frameworks to analyze a structural change in a linear time trend, Econometrics Journal 9, 423–447.

Deng, A. and Perron, P.: (2008), A non-local perspective on the power properties of the CUSUM and CUSUM of squares tests for structural change, Journal of Econometrics 141, 212–240.

Frisch, R. and Waugh, F.: (1933), Partial time regressions as compared with individual trends, Econometrica 45, 939–953.

Hashimzade, N. and Vogelsang, T. J.: (2008), Fixed-b asymptotic approximation of the sampling behavior of nonparametric spectral density estimators, Journal of Time Series Analysis 29, 142–162.
Juhl, T. and Xiao, Z.: (2009), Tests for changing mean with monotonic power, Journal of Econometrics 148, 14–24.

Kejriwal, M.: (2009), Tests for a mean shift with good size and monotonic power, Economics Letters 102, 78–82.

Kejriwal, M. and Perron, P.: (2010), A sequential procedure to determine the number of breaks in trend with an integrated or stationary noise component. Working paper, Department of Economics, Purdue University.

Kiefer, N. M. and Vogelsang, T. J.: (2005), A new asymptotic theory for heteroskedasticity-autocorrelation robust tests, Econometric Theory 21, 1130–1164.

Ng, S. and Vogelsang, T. J.: (2002), Analysis of vector autoregressions in the presence of shifts in mean, Econometric Reviews 21, 353–381.

Nunes, L. C., Kuan, C. M. and Newbold, P.: (1995), Spurious break, Econometric Theory 11, 736–749.

Perron, P.: (1990), Testing for a unit root in a time series with a changing mean, Journal of Business and Economic Statistics 8, 153–162.

Perron, P.: (1991), A test for changes in a polynomial trend function for a dynamic time series. Manuscript, Princeton University.

Perron, P. and Zhu, X.: (2005), Structural breaks with stochastic and deterministic trends, Journal of Econometrics 129, 65–119.

Phillips, P. C. B.: (1987), Time series regression with unit roots, Econometrica 55, 277–302.

Sayginsoy, O. and Vogelsang, T. J.: (2010), Testing for a shift in trend at an unknown date: A fixed-b analysis of heteroskedasticity autocorrelation robust OLS based tests, Econometric Theory. Forthcoming.

Vogelsang, T. J.: (1997), Wald-type tests for detecting shifts in the trend function of a dynamic time series, Econometric Theory 13, 818–849.

Vogelsang, T. J.: (1998), Testing for a shift in mean without having to estimate serial correlation parameters, Journal of Business and Economic Statistics 16, 73–80.

Vogelsang, T. J.: (1999), Sources of nonmonotonic power when testing for a shift in mean of a dynamic time series, Journal of Econometrics 88, 283–300.

Yang, J.: (2010), Break point estimates for a shift in trend: Levels versus first differences. Working paper, Department of Economics, Michigan State University.

Yao, Y.-C.: (1987), Approximating the distribution of the ML estimate of the change point in a sequence of independent r.v.'s, Annals of Statistics 4, 1321–1328.