ESTIMATION AND INFERENCE IN COINTEGRATED PANELS

By

Yi Li

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Economics - Doctor of Philosophy

2017

ABSTRACT

ESTIMATION AND INFERENCE IN COINTEGRATED PANELS

By

Yi Li

This dissertation investigates parameter estimation and inference in cointegrated panel data models.

In Chapter 1, for homogeneous cointegrated panels, a simple, new estimation method is proposed based on Vogelsang and Wagner [2014]. The estimator is labeled panel integrated modified ordinary least squares (panel IM-OLS). Similar to panel fully modified ordinary least squares (panel FM-OLS) and panel dynamic ordinary least squares (panel DOLS), the panel IM-OLS estimator has a zero mean Gaussian mixture limiting distribution. However, panel IM-OLS does not require estimation of long run variance matrices and avoids the need to choose tuning parameters such as kernel functions, bandwidths, leads and lags. Inference based on panel IM-OLS estimates does require an estimator of a scalar long run variance, and critical values for test statistics are obtained from traditional and fixed-b methods. The properties of panel IM-OLS are analyzed using asymptotic theory and finite sample simulations. Panel IM-OLS performs well relative to other estimators.

Chapter 2 compares asymptotic and bootstrap hypothesis tests in cointegrated panels with cross-sectionally uncorrelated units and endogenous regressors. All the tests are based on the panel IM-OLS estimator from Chapter 1. The aim of using the bootstrap tests is to address the finite sample size distortions of fixed-b tests. Finite sample simulations show that the bootstrap method outperforms the asymptotic method in terms of having lower size distortions. In general, the stationary bootstrap is better than the conditional-on-regressors bootstrap, although in some cases the conditional-on-regressors bootstrap has smaller size distortions.
The improvement in size comes with only minor power losses, which can be ignored when the sample size is large.

Chapter 3 is concerned with parameter estimation and inference in a more general case than Chapter 1, with endogenous regressors and heterogeneous long run variances in the cross section. In addition, the model allows a limited degree of cross-sectional dependence due to a common time effect. The panel IM-OLS estimator is provided for this less restricted model. As in Chapter 1, this panel IM-OLS estimator has a zero mean Gaussian mixture limiting distribution. However, standard asymptotic inference is infeasible because of nuisance parameters, so inference based on panel IM-OLS relies on the stationary bootstrap. The properties of panel IM-OLS are analyzed using the stationary bootstrap in finite sample simulations.

This dissertation is dedicated to my parents and my wife. Thank you for nurturing me with love and always believing in me.

ACKNOWLEDGMENTS

The past five years have been a wonderful journey for me, and I would never have been able to get this far without the support of my professors, my friends and my family. It is my pleasure to acknowledge the people who have given me guidance, help and encouragement.

First and foremost, I would like to gratefully and sincerely thank my PhD advisor, Professor Tim Vogelsang, for his guidance, understanding and patience. His Time Series course was definitely my favorite class, and it led me into this field of research. I really appreciate all the support and encouragement he has given me in my research. In particular, I will never forget the effort he spent on revising this dissertation. For everything you have done for me, Professor Vogelsang, I thank you.

I would also like to express my deepest gratitude to my committee members Professor Peter Schmidt and Professor Jeffrey Wooldridge. Thank you both for your guidance and help during my graduate study; I could not have obtained my PhD without your support.
It has been such an honor for me to be your student. Special thanks go to Professor Martin Wagner for his important suggestions and remarks on my dissertation, as well as for the effort he spent as my committee member. I must thank all of the professors in the Economics Department from whom I learned during the last five years for showing me what a researcher is meant to be; I will keep it in mind in my future career.

I would like to give special thanks to my cohort in the Economics Department. I really enjoyed the time we studied together. We discussed, debated, exchanged ideas and learned from each other. It was definitely one of the best periods of my life. My thanks also go to my friends in the Math Department. In particular, I would like to thank Xin Yang for his expertise and patience when I had tough math problems.

Lastly, I would like to thank my family for all their love and the faith they put in me: my parents, who raised me with endless love and supported me in all my pursuits, and my grandfather, who gave me all his trust and encouraged me all the time. Most importantly, I would like to express my deepest appreciation to my wife, Yanbo Shen. Without her love, understanding and encouragement, I would not have been able to finish this work.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
Chapter 1 Integrated modified OLS estimation and fixed-b inference for homogeneous cointegrated panels
  1.1 Introduction
  1.2 Homogeneous cointegrated panels for benchmark estimators
  1.3 Panel integrated modified OLS
    1.3.1 Panel IM-OLS estimator
    1.3.2 Inference using Panel IM-OLS
  1.4 Finite sample bias and root mean squared error
    1.4.1 Sample size N = 5
    1.4.2 Sample size N = 10
    1.4.3 Sample size N = 25
    1.4.4 Summary of finite sample bias and RMSE
  1.5 Finite sample performance of test statistics
  1.6 Summary and conclusions
  APPENDIX
Chapter 2 Hypothesis testing in cointegrated panels: Asymptotic and Bootstrap method
  2.1 Introduction
  2.2 The model, assumptions and asymptotic inference
    2.2.1 The model and assumptions
    2.2.2 Inference based on panel IM-OLS
  2.3 Bootstrap hypothesis tests
    2.3.1 Conditional-on-regressors bootstrap
    2.3.2 Stationary bootstrap
  2.4 Finite sample simulations
  2.5 Summary and conclusion
  APPENDIX
Chapter 3 Estimation and Inference for Heterogeneous Cointegrated Panels with Limited Cross Sectional Dependence
  3.1 Introduction
  3.2 Model set up and estimation
    3.2.1 The model and assumptions
    3.2.2 Panel IM-OLS estimator
  3.3 Inference about θ
    3.3.1 Inference using panel IM-OLS
    3.3.2 Inference using the stationary bootstrap
  3.4 Finite sample simulation
  3.5 Summary and conclusions
  APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table 1.1: Finite sample bias and RMSE of the various estimators of β1, N = 5, T = 50, Bartlett kernel
Table 1.2: Finite sample bias and RMSE of the various estimators of β1, N = 5, T = 100, Bartlett kernel
Table 1.3: Finite sample bias and RMSE of the various estimators of β1, N = 10, T = 50, Bartlett kernel
Table 1.4: Finite sample bias and RMSE of the various estimators of β1, N = 10, T = 100, Bartlett kernel
Table 1.5: Finite sample bias and RMSE of the various estimators of β1, N = 25, T = 50, Bartlett kernel
Table 1.6: Finite sample bias and RMSE of the various estimators of β1, N = 25, T = 100, Bartlett kernel
Table 1.7: Empirical null rejection probabilities, 0.05 level, t-tests for H0: β1 = 1, N = 5, data dependent bandwidths and lag lengths
Table 1.8: Empirical null rejection probabilities, 0.05 level, t-tests for H0: β1 = 1, N = 10, data dependent bandwidths and lag lengths
Table 1.9: Empirical null rejection probabilities, 0.05 level, t-tests for H0: β1 = 1, N = 25, data dependent bandwidths and lag lengths
Table 1.10: Empirical null rejection probabilities, 0.05 level, Wald-tests for H0: β1 = 1, β2 = 1, N = 5, data dependent bandwidths and lag lengths
Table 1.11: Empirical null rejection probabilities, 0.05 level, Wald-tests for H0: β1 = 1, β2 = 1, N = 10, data dependent bandwidths and lag lengths
Table 1.12: Empirical null rejection probabilities, 0.05 level, Wald-tests for H0: β1 = 1, β2 = 1, N = 25, data dependent bandwidths and lag lengths
Table 1.13: Fixed-b asymptotic critical value for t-test of β in regression with intercept and two regressors, N = 25, Bartlett kernel
Table 1.14: Fixed-b asymptotic critical value for t-test of β in regression with intercept and two regressors, N = 25, QS kernel
Table 3.1: Empirical null rejection probabilities, 5% level, t-tests for H0: β1 = 1, N = 5, ρ = 0.6, Bartlett kernel
Table 3.2: Empirical null rejection probabilities, 5% level, t-tests for H0: β1 = 1, N = 5, ρ = 0.9, Bartlett kernel
Table 3.3: Empirical null rejection probabilities, 5% level, Wald-tests for H0: β1 = 1, β2 = 1, N = 5, ρ = 0.6, Bartlett kernel
Table 3.4: Empirical null rejection probabilities, 5% level, Wald-tests for H0: β1 = 1, β2 = 1, N = 5, ρ = 0.9, Bartlett kernel

LIST OF FIGURES

Figure 1.1: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.3, Bartlett kernel
Figure 1.2: Empirical null rejections, t-test, N = 10, T = 100, ρ1 = ρ2 = 0.3, Bartlett kernel
Figure 1.3: Empirical null rejections, t-test, N = 25, T = 100, ρ1 = ρ2 = 0.3, Bartlett kernel
Figure 1.4: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 1.5: Empirical null rejections, t-test, N = 10, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 1.6: Empirical null rejections, t-test, N = 25, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 1.7: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 1.8: Empirical null rejections, t-test, N = 10, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 1.9: Empirical null rejections, t-test, N = 25, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 1.10: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.3, QS kernel
Figure 1.11: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.9, QS kernel
Figure 1.12: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.9, QS kernel
Figure 1.13: Size adjusted power, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, b = 0.3, QS kernel
Figure 1.14: Size adjusted power of panel IM, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, QS kernel
Figure 1.15: Size adjusted power of panel IM, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 1.16: Size adjusted power, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, QS kernel
Figure 1.17: Size adjusted power, Wald test, N = 10, T = 50, ρ1 = ρ2 = 0.6, b = 0.3, QS kernel
Figure 1.18: Size adjusted power of panel IM, Wald test, N = 10, T = 50, ρ1 = ρ2 = 0.6, QS kernel
Figure 1.19: Size adjusted power, Wald test, N = 10, T = 50, ρ1 = ρ2 = 0.6, QS kernel
Figure 2.1: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 2.2: Empirical null rejections, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 2.3: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 2.4: Empirical null rejections, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 2.5: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 2.6: Empirical null rejections, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 2.7: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 2.8: Empirical null rejections, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 2.9: Raw power, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, b = 0.1, Bartlett kernel
Figure 2.10: Raw power, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.6, b = 0.1, Bartlett kernel
Figure 2.11: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.12: Empirical null rejections, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.13: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.6, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.14: Empirical null rejections, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.6, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.15: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.16: Empirical null rejections, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.17: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.18: Empirical null rejections, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.19: Raw power, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, b = 0.1, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.20: Raw power, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.6, b = 0.1, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 3.1: Power of bootstrap Stat-BS IM (D), Wald test, N = 5, ρ1 = ρ2 = 0.6, b = 0.5, Bartlett kernel with different T and pT
Figure 3.2: Power of bootstrap Stat-BS IM (D), Wald test, N = 5, T = 500, b = 0.5, Bartlett kernel with different ρ and pT
Figure 3.3: Size adjusted power, Wald-tests, N = 5, T = 50, ρ1 = ρ2 = 0.6, b = 0.5, Bartlett kernel
Figure 3.4: Size adjusted power, Wald-tests, N = 15, T = 50, ρ1 = ρ2 = 0.6, b = 0.5, Bartlett kernel

KEY TO ABBREVIATIONS

OLS      Ordinary Least Squares
IM-OLS   Integrated Modified Ordinary Least Squares
FM-OLS   Fully Modified Ordinary Least Squares
DOLS     Dynamic Ordinary Least Squares
HAC      Heteroskedasticity and Autocorrelation Consistent
RMSE     Root Mean Squared Error
Fb       Fixed-b
QS       Quadratic Spectral

Chapter 1

Integrated modified OLS estimation and fixed-b inference for homogeneous cointegrated panels

This paper is concerned with parameter estimation and inference in homogeneous cointegrated panels. We propose a simple, new estimation method based on Vogelsang and Wagner [2014]. The estimator is labeled panel integrated modified ordinary least squares (panel IM-OLS). Similar to panel fully modified ordinary least squares (panel FM-OLS) and panel dynamic ordinary least squares (panel DOLS), the panel IM-OLS estimator has a zero mean Gaussian mixture limiting distribution.
However, panel IM-OLS does not require estimation of long run variance matrices and avoids the need to choose tuning parameters such as kernel functions, bandwidths, leads and lags. Inference based on panel IM-OLS estimates does require an estimator of a scalar long run variance, and we propose both traditional and fixed-b methods for obtaining critical values for test statistics. The properties of panel IM-OLS are analyzed using asymptotic theory and finite sample simulations. Panel IM-OLS performs well relative to other estimators.

1.1 Introduction

This paper extends the pure time series integrated modified ordinary least squares (IM-OLS) method of Vogelsang and Wagner [2014] for estimating and testing hypotheses about a cointegrating vector to a balanced panel of N individuals observed over T time periods. We call the estimator panel IM-OLS. We derive its limiting distribution and provide a finite sample simulation comparing panel IM-OLS with pooled OLS, panel fully modified OLS (panel FM-OLS) and panel dynamic OLS (panel DOLS).

It is well known that in panel cointegrating regressions with endogenous regressors, the limiting distribution of the pooled OLS estimator is contaminated by second order bias terms. Inference is difficult in this situation because the nuisance parameters cannot be removed by simple scaling methods. Consequently, panel FM-OLS and panel DOLS were proposed. Both deal with the endogeneity problem and lead to zero mean Gaussian mixture limiting distributions, which in turn make standard asymptotic inference available.

The panel IM-OLS estimator is based on pooled OLS estimation of a partial sum transformation of the cointegrating panel regression. Similar to the panel FM-OLS and panel DOLS estimators, the panel IM-OLS estimator also has a zero mean Gaussian mixture limiting distribution, but it has an advantage over its two counterparts.
The panel IM-OLS estimator avoids the kernel function and bandwidth choices for long run variance estimation that panel FM-OLS requires, as well as the choice of leads and lags used to augment the regression that panel DOLS requires. However, for inference, panel IM-OLS does need an estimate of a scalar long run variance parameter.

The limit theory considered here is obtained for a fixed number of cross-sectional units N, letting T → ∞. This limit theory is widely used in empirical macroeconomics, empirical energy economics and empirical finance problems. In this case, even though the panel IM-OLS estimator converges to a zero mean Gaussian mixture distribution, inference based on this estimator still requires the estimation of a long run variance parameter. As in Vogelsang and Wagner [2014], there are two solutions to this problem. The first is standard asymptotic inference based on a consistent estimator of the long run variance; the second is fixed-b inference. The latter has an advantage over standard asymptotic theory: fixed-b inference captures the impact of kernel and bandwidth choices on the test statistics, whereas standard asymptotic theory does not. As will be discussed in detail later, the pooled OLS residuals of the panel IM-OLS regression need to be further adjusted to obtain pivotal fixed-b test statistics.

All estimators and tests in this paper are derived for a cross-sectionally uncorrelated homogeneous panel. Although this assumption is unrealistic for many applications, it is still commonly employed when developing panel cointegration methods, especially for estimation procedures. Only a few partial results concerning both cointegration estimation and inference are available for cross-sectionally dependent panels to date. One branch of the literature considers panel data with spatial interaction among cross-sectional units (e.g., Kapoor et al. [2007]; Yu et al. [2008], [2010], [2012]).
An alternative to the spatial approach is the factor structure approach, which can capture common stochastic shocks and trends (e.g., Bai and Ng [2004]). Bai and Kao [2006] derive an extension of FM-OLS estimation to panels with short-run cross-sectional correlation. Pesaran [2006] proposes the Common Correlated Effects (CCE) approach to estimation of panel data models with a multi-factor error structure, which is further developed by Kapetanios et al. [2011] to allow for nonstationary common factors. Estimation and inference are challenging for cross-sectionally dependent heterogeneous panels with endogenous regressors, but our ongoing work shows that the methods developed in this paper, with some modifications, will be able to estimate the parameters and make valid inference in that scenario.

After the theoretical analysis, we provide a finite sample simulation study to assess the performance of the estimators and tests. Benchmarks are given by pooled OLS, panel FM-OLS and panel DOLS. In the simulations, panel IM-OLS performs relatively well, with smaller bias and only slightly larger RMSE than the other estimators. The simulations of size and power of the tests show that fixed-b test statistics based on the panel IM-OLS estimator lead to the smallest size distortions at the price of only minor losses in size-corrected power.

The remainder of the paper is organized as follows. In the next section we present a standard panel cointegrating regression and review several key results for the benchmark estimators. Section 1.3 describes the panel IM-OLS estimator and its asymptotic distribution and discusses inference based on it. Section 1.4 reports the finite sample bias and root mean squared error of the various estimators. Section 1.5 assesses the finite sample performance of the test statistics described in Section 1.3. Section 1.6 concludes the paper. The Appendix contains the proofs.
1.2 Homogeneous cointegrated panels for benchmark estimators

Consider the following data generating process
\[ y_{it} = \mu + x_{it}'\beta + u_{it}, \tag{1.1} \]
\[ x_{it} = x_{it-1} + v_{it}, \tag{1.2} \]
where $y_{it}$ and $u_{it}$ are scalars and $x_{it}$ and $v_{it}$ are $k \times 1$ vectors, with sub-index $i = 1, 2, \ldots, N$ for the $i$th cross-sectional unit and sub-index $t = 1, 2, \ldots, T$ for the time period; $\beta$ is a $k \times 1$ vector of slope parameters. For notational brevity, here we include only the intercept $\mu$ as the deterministic component (later, when we discuss the panel IM-OLS estimator, we extend it to more general deterministic time trends such as $\mu_0 + \mu_1 t + \cdots + \mu_{p-1} t^{p-1}$). Define the error vector as $\eta_{it} = (u_{it}, v_{it}')'$. It is assumed that $\eta_{it}$ is a vector of I(0) processes, in which case $x_{it}$ is a non-cointegrating vector of I(1) processes and there exists a cointegrating relationship among $(y_{it}, x_{it}')'$ with cointegrating vector $(1, -\beta')'$.

Assumption 1. Assume that $\{\eta_{it}\}_{i=1}^{N}$ are cross-sectionally uncorrelated and that their second order moments are constant across $i$.

Note that Assumption 1 only requires the panel to be homogeneous in its second order moments; the higher order moment structure may be heterogeneous across $i$.

Assumption 2. Assume that $\eta_{it}$ satisfies a functional central limit theorem (FCLT) of the form
\[ T^{-1/2} \sum_{t=1}^{[rT]} \eta_{it} \Rightarrow B_i(r) = \Omega^{1/2} W_i(r), \qquad r \in [0, 1]. \]

In Assumption 2, $[rT]$ represents the integer part of $rT$, and $W_i(r)$ is a $(k+1) \times 1$ vector of independent standard Brownian motions. $\Omega^{1/2}$ is a $(k+1) \times (k+1)$ matrix that satisfies $\Omega = \Omega^{1/2} (\Omega^{1/2})'$, and
\[ \Omega = \sum_{j=-\infty}^{\infty} E(\eta_{it} \eta_{it-j}') = \begin{pmatrix} \Omega_{uu} & \Omega_{uv} \\ \Omega_{vu} & \Omega_{vv} \end{pmatrix} > 0, \]
where it is clear that $\Omega_{vu} = \Omega_{uv}'$. The assumption $\Omega_{vv} > 0$ rules out cointegration in $x_{it}$. Partition $B_i(r)$ as
\[ B_i(r) = \begin{pmatrix} B_{u,i}(r) \\ B_{v,i}(r) \end{pmatrix}, \]
and likewise partition $W_i(r)$ as
\[ W_i(r) = \begin{pmatrix} w_{u,i}(r) \\ W_{v,i}(r) \end{pmatrix}, \]
where $w_{u,i}(r)$ and $W_{v,i}(r)$ are a scalar and a $k$-dimensional standard Brownian motion, respectively.
Using the Cholesky form of $\Omega^{1/2}$,
\[ \Omega^{1/2} = \begin{pmatrix} \sigma_{u \cdot v} & \lambda_{uv} \\ 0_{k \times 1} & \Omega_{vv}^{1/2} \end{pmatrix}, \]
it can be shown that $\sigma_{u \cdot v}^2 = \Omega_{uu} - \Omega_{uv} \Omega_{vv}^{-1} \Omega_{vu}$ and $\lambda_{uv} = \Omega_{uv} (\Omega_{vv}^{-1/2})'$. By this Cholesky decomposition, we can write
\[ B_i(r) = \begin{pmatrix} B_{u,i}(r) \\ B_{v,i}(r) \end{pmatrix} = \begin{pmatrix} \sigma_{u \cdot v} w_{u,i}(r) + \lambda_{uv} W_{v,i}(r) \\ \Omega_{vv}^{1/2} W_{v,i}(r) \end{pmatrix}. \]

Next define the one-sided long run covariance matrix. For each $i \in \{1, 2, \ldots, N\}$,
\[ \Lambda = \sum_{j=1}^{\infty} E(\eta_{i,t-j} \eta_{it}') = \begin{pmatrix} \Lambda_{uu} & \Lambda_{uv} \\ \Lambda_{vu} & \Lambda_{vv} \end{pmatrix}. \]
Also define the contemporaneous covariances, that is, for each $i \in \{1, 2, \ldots, N\}$,
\[ \Sigma = E(\eta_{it} \eta_{it}') = \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{vu} & \Sigma_{vv} \end{pmatrix}. \]
Note that $\Delta = \Sigma + \Lambda$ is the half long run variance, and it is likewise partitioned as
\[ \Delta = \sum_{j=0}^{\infty} E(\eta_{i,t-j} \eta_{it}') = \begin{pmatrix} \Delta_{uu} & \Delta_{uv} \\ \Delta_{vu} & \Delta_{vv} \end{pmatrix}. \]
The long run variance, $\Omega$, is related to $\Lambda$ and $\Sigma$ as $\Omega = \Sigma + \Lambda + \Lambda'$.

Remark 1.
1. If there is heterogeneity in the second order moment structure, i.e., $\Omega_i$, $\Lambda_i$, $\Sigma_i$ and $\Delta_i$ vary across $i$, then these moments can still be estimated unit by unit. However, finding pivotal fixed-b statistics is challenging, and we have not yet found any. In this case, one possible way to make valid inference is to use the bootstrap to mimic the non-pivotal distributions. The stationary bootstrap is one method that could be applied in this scenario.
2. If the $\eta_{it}$ are cross-sectionally dependent, then inference is much more complicated. The spatial approach and the factor structure approach are possible solutions under different dependence assumptions. We have not been able to find a way to deal with the general cross-sectional dependence case. However, if the dependence originates only from time fixed-effect dummy variables, then the methods developed in this paper go through with some natural modifications, and the bootstrap, both over time and across units, is needed for valid inference.

As mentioned before, the benchmark estimators are the pooled OLS, the panel FM-OLS and the panel DOLS estimators.
To conserve space, we do not provide detailed results for all of these estimators, but we review several key results. When the regressors are endogenous, the pooled OLS estimator has an asymptotic bias due to the nuisance parameters $\Delta_{vu}$, which cannot be removed by simple scaling methods. The panel FM-OLS estimator as considered here is an extension of the FM-OLS estimator of Phillips and Hansen [1990], which is designed to asymptotically remove $\Delta_{vu}$ and to deal with the correlation between $B_{u,i}(r)$ and $B_{v,i}(r)$. Conditional on $B_{v,i}(r)$ for all $i = 1, 2, \ldots, N$, the limit of the scaled panel FM-OLS estimator is a mean zero mixture of normals. Asymptotically pivotal t and Wald statistics with $N(0, 1)$ and chi-square limiting distributions can be constructed by estimating $\sigma_{u \cdot v}^2$. The panel DOLS estimator considered here is almost identical to that of Mark and Sul [2003]. The only difference is that there is no fixed effect in the data generating process (1.1). The homogeneous panel DOLS estimator of $\beta$ has the same limiting distribution as the homogeneous panel FM-OLS estimator. Hence, they are asymptotically equivalent. This result was shown by Kao and Chiang [2000], and it can also be extended to heterogeneous panels.

1.3 Panel integrated modified OLS

1.3.1 Panel IM-OLS estimator

In this section, we present a new estimator for homogeneous cointegrated panels. This estimator is an extension of Vogelsang and Wagner [2014], who propose the IM-OLS estimator for the time series case. The transformation used by IM-OLS provides an asymptotically unbiased estimator with a zero mean Gaussian mixture limiting distribution. Compared with panel FM-OLS, the transformation does not require estimators of $\Omega$, so the choice of bandwidth and kernel is avoided for parameter estimation.
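To make concrete what the kernel and bandwidth choices avoided by panel IM-OLS involve, the following minimal sketch computes a Bartlett-kernel estimate of a long run variance matrix such as $\Omega$ from a $T \times m$ array of mean zero series. It is an illustration only, not the estimation code used in this paper; the function name `bartlett_lrv`, its arguments, and the simulated AR(1) example are assumptions introduced here.

```python
import numpy as np


def bartlett_lrv(eta, bandwidth):
    """Bartlett-kernel estimate of the long run variance of eta.

    eta: array of shape (T,) or (T, m), assumed to have mean zero.
    bandwidth: positive number M; lag j receives weight max(0, 1 - j / M).
    Hypothetical helper, for illustration only.
    """
    eta = np.asarray(eta, dtype=float)
    if eta.ndim == 1:
        eta = eta[:, None]
    T = eta.shape[0]
    omega = eta.T @ eta / T  # lag-0 term, the sample analogue of Sigma
    for j in range(1, T):
        weight = 1.0 - j / bandwidth
        if weight <= 0.0:
            break  # Bartlett weights are zero from this lag onward
        # sample analogue of E(eta_t eta_{t-j}')
        gamma_j = eta[j:].T @ eta[:-j] / T
        omega += weight * (gamma_j + gamma_j.T)
    return omega


# Illustrative use on a simulated AR(1) series (all settings are assumptions).
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
u = np.zeros(500)
for t in range(1, 500):
    u[t] = 0.6 * u[t - 1] + e[t]
estimates = {M: bartlett_lrv(u, M)[0, 0] for M in (2, 8, 32)}
```

The entries of `estimates` generally differ across the three bandwidths, which is the kind of tuning sensitivity that the panel IM-OLS point estimator sidesteps; a long run variance estimate is still needed at the inference stage.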
We consider a slightly more general version of (1.1) given by
\[ y_{it} = D_t' \delta_i + x_{it}' \beta + u_{it}, \tag{1.3} \]
where $\delta_i$ and $\beta$ are $p \times 1$ and $k \times 1$ parameter vectors respectively, $x_{it}$ continues to follow (1.2), and for the deterministic component, $D_t$, we assume that there is a $p \times p$ matrix $G_D$ and a vector of functions, $D(s)$, such that
\[ \lim_{T \to \infty} \sqrt{T}\, G_D^{-1} D_{[sT]} = D(s) \quad \text{with} \quad 0 < \int_0^r D(s) D(s)'\, ds < \infty, \quad 0 < r \le 1. \tag{1.4} \]
The deterministic component $D_t$ could include an intercept, a time trend and polynomials of the time trend.

Remark 2.
1. Note that, in regression (1.3), the intercept in $D_t$ together with $\delta_i$ allows fixed effect estimation of the system. In a simpler case, suppose that $\delta_i$ is the same constant for all $i$, so that $y_{it} = D_t' \delta + x_{it}' \beta + u_{it}$. In this case, the estimation and inference procedures introduced later in the paper go through with minor changes.
2. In regression (1.3), $\beta$ is constant across $i$, which means that the same long-run relation between $y_{it}$ and $x_{it}$ applies for all $i$. As in Phillips and Moon [1999], we could also allow this coefficient to differ randomly across $i$, which leads to the heterogeneous panel cointegration model. In that case, as long as the error vectors are uncorrelated across $i$ and their second order moments are constant, the results will be similar to those in this paper. Otherwise, if the panel is heterogeneous in both the cointegrating relation and the second moment structure, then inference is challenging and might require the bootstrap.
3. We could consider a more traditional panel data model such as $y_{it} = \mu_i + \lambda_t + x_{it}' \beta + u_{it}$. In this case, once the time effect $\lambda_t$ has been eliminated by cross-sectional demeaning, and if the panel is also homogeneous, the estimation and inference procedures will be similar to those in this paper.
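But if there is heterogeneity in the second moment structure, then, even though the estimation of $\beta$ is not affected, inference is much more complicated, as we discussed in Remark 1, and a bootstrap method could be used for inference.

Computing the partial sum of both sides of (1.3) gives

\[ S_{it}^{y} = S_t^{D\prime}\delta_i + S_{it}^{x\prime}\beta + S_{it}^{u}, \tag{1.5} \]

where $S_{it}^{y} = \sum_{j=1}^{t} y_{ij}$, and $S_t^{D}$, $S_{it}^{x}$ and $S_{it}^{u}$ are defined analogously. As in Vogelsang and Wagner [2014], we need to add $x_{it}$ as regressors in (1.5) to deal with correlation between $u_{it}$ and $v_{it}$, which leads to

\[ S_{it}^{y} = S_t^{D\prime}\delta_i + S_{it}^{x\prime}\beta + x_{it}'\gamma + S_{it}^{u} - x_{it}'\gamma = S_t^{D\prime}\delta_i + S_{it}^{x\prime}\beta + x_{it}'\gamma + S_{it}^{\tilde u}. \tag{1.6} \]

We now focus on the asymptotic behavior of the pooled OLS estimators of $\delta_i$, $\beta$ and $\gamma$ from (1.6), which we label the panel IM-OLS estimators of $\delta_i$, $\beta$ and $\gamma$. Define the stacked vectors and matrices as follows:

\[ S^{y} = \begin{pmatrix} S_1^{y} \\ \vdots \\ S_N^{y} \end{pmatrix}, \quad S_i^{y} = \begin{pmatrix} S_{i1}^{y} \\ \vdots \\ S_{iT}^{y} \end{pmatrix}; \quad \theta = \begin{pmatrix} \beta \\ \gamma \\ \delta_1 \\ \vdots \\ \delta_N \end{pmatrix}; \quad S^{\tilde u} = \begin{pmatrix} S_1^{\tilde u} \\ \vdots \\ S_N^{\tilde u} \end{pmatrix}, \quad S_i^{\tilde u} = \begin{pmatrix} S_{i1}^{u} - x_{i1}'\gamma \\ \vdots \\ S_{iT}^{u} - x_{iT}'\gamma \end{pmatrix}; \]

\[ S^{\tilde x} = \begin{pmatrix} S_1^{\tilde x} \\ \vdots \\ S_N^{\tilde x} \end{pmatrix}, \quad S_i^{\tilde x} = \begin{pmatrix} S_{i1}^{x\prime} & x_{i1}' & 0_{1 \times p} & \cdots & S_1^{D\prime} & \cdots & 0_{1 \times p} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ S_{iT}^{x\prime} & x_{iT}' & 0_{1 \times p} & \cdots & S_T^{D\prime} & \cdots & 0_{1 \times p} \end{pmatrix}, \]

where, among the last $N$ blocks of $p$ columns (1st block, ..., $i$th block, ..., $N$th block), the column of $S_t^{D\prime}$ entries occupies the $i$th block.

With the above notation, the matrix form of (1.6) is given by

\[ S^{y} = S^{\tilde x}\theta + S^{\tilde u}, \tag{1.7} \]

and the OLS estimator of (1.7) is given by

\[ \tilde\theta = \left( S^{\tilde x\prime} S^{\tilde x} \right)^{-1} S^{\tilde x\prime} S^{y}, \tag{1.8} \]

which leads to

\[ \tilde\theta - \theta = \left( S^{\tilde x\prime} S^{\tilde x} \right)^{-1} S^{\tilde x\prime} S^{\tilde u} = \left( \sum_{i=1}^{N} \sum_{t=1}^{T} q_{it} q_{it}' \right)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} q_{it} \left( S_{it}^{u} - x_{it}'\gamma \right), \tag{1.9} \]

where $q_{it} = \left( S_{it}^{x\prime},\ x_{it}',\ 0_{1 \times p},\ \cdots,\ S_t^{D\prime},\ \cdots,\ 0_{1 \times p} \right)'$ for $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$. The subvector $\left( 0_{1 \times p},\ \cdots,\ S_t^{D\prime},\ \cdots,\ 0_{1 \times p} \right)$ of $q_{it}$ consists of $S_t^{D}$ as its $i$th block and $N - 1$ other zero vector blocks.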
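The construction in (1.6)-(1.8) can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the code used for this dissertation: the function name `panel_im_ols`, the array layout and the noise-free toy panel are all hypothetical choices made for illustration. Partial sums are formed with cumulative sums, each $S_i^{\tilde x}$ block is filled as described above, and the stacked system is solved by least squares.

```python
import numpy as np


def panel_im_ols(y, x, D):
    """Pooled OLS on the partial-sum regression (1.6) (illustrative sketch).

    y : (N, T) array of y_it
    x : (N, T, k) array of integrated regressors x_it
    D : (T, p) array of deterministic components D_t
    Returns (beta, gamma, delta), with delta of shape (N, p).
    """
    N, T, k = x.shape
    p = D.shape[1]
    Sy = np.cumsum(y, axis=1)   # S^y_it, partial sums over time
    Sx = np.cumsum(x, axis=1)   # S^x_it
    SD = np.cumsum(D, axis=0)   # S^D_t

    blocks = []
    for i in range(N):
        block = np.zeros((T, 2 * k + N * p))
        block[:, :k] = Sx[i]          # regressors for beta
        block[:, k:2 * k] = x[i]      # added regressors x_it for gamma
        # S^D_t goes into the i-th block of p columns (unit-specific delta_i)
        block[:, 2 * k + i * p:2 * k + (i + 1) * p] = SD
        blocks.append(block)

    S_xt = np.vstack(blocks)          # stacked S^{x~}, shape (N*T, 2k + N*p)
    S_y = Sy.reshape(N * T)           # stacked S^y, unit by unit
    theta, *_ = np.linalg.lstsq(S_xt, S_y, rcond=None)
    return theta[:k], theta[k:2 * k], theta[2 * k:].reshape(N, p)


# Toy check on a noise-free panel (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
N, T, k = 3, 40, 2
x = np.cumsum(rng.standard_normal((N, T, k)), axis=1)   # random walks
D = np.ones((T, 1))                                      # intercept only
delta_true = np.array([[1.0], [2.0], [3.0]])
beta_true = np.array([1.0, 1.0])
y = (D @ delta_true.T).T + x @ beta_true                 # u_it = 0
beta_hat, gamma_hat, delta_hat = panel_im_ols(y, x, D)
```

Because the toy panel has no error term, the partial-sum regression holds exactly with $\gamma = 0$, so the sketch should return the true $\beta$ and $\delta_i$ up to rounding error. With real data the error term $S_{it}^{\tilde u}$ is present and the estimates differ from the true values.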
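Define the scaling matrix

\[ A_{PIM}^{-1} = \begin{pmatrix} T I_k & & 0 \\ & I_k & \\ 0 & & I_N \otimes G_D \end{pmatrix} \]

as a $(2k + Np) \times (2k + Np)$ diagonal matrix. The following theorem gives the asymptotic distribution of $\tilde\delta_i$, $\tilde\beta$ and $\tilde\gamma$.

Theorem 1. Assume that the data are generated by (1.2) and (1.3), that the deterministic components satisfy (1.4) for all $i \in \{1, 2, \ldots, N\}$, and that Assumptions 1 and 2 hold. Define $\theta$ by stacking the vectors $\delta_i$, $\beta$ and $\Omega_{vv}^{-1}\Omega_{vu}$. Then for fixed $N$, as $T \to \infty$,

\[ \begin{pmatrix} T(\tilde\beta - \beta) \\ \tilde\gamma - \Omega_{vv}^{-1}\Omega_{vu} \\ G_D(\tilde\delta_1 - \delta_1) \\ \vdots \\ G_D(\tilde\delta_N - \delta_N) \end{pmatrix} = A_{PIM}^{-1}\left( \tilde\theta - \theta \right) = \left( A_{PIM} S^{\tilde x\prime} S^{\tilde x} A_{PIM} \right)^{-1} \left( A_{PIM} S^{\tilde x\prime} S^{\tilde u} \right) - \begin{pmatrix} 0_{k \times 1} \\ \Omega_{vv}^{-1}\Omega_{vu} \\ 0_{p \times 1} \\ \vdots \\ 0_{p \times 1} \end{pmatrix} \]

\[ \Rightarrow \sigma_{u \cdot v}\, \Pi^{-1} \left( \sum_{i=1}^{N} \int_0^1 g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \sum_{i=1}^{N} \int_0^1 g_{1,i}(s) w_{u,i}(s)\, ds \]

\[ = \sigma_{u \cdot v}\, \Pi^{-1} \left( \sum_{i=1}^{N} \int_0^1 g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \sum_{i=1}^{N} \int_0^1 \left[ G_{1,i}(1) - G_{1,i}(s) \right] dw_{u,i}(s) \equiv \Psi, \]

where

\[ \Pi = \begin{pmatrix} \Omega_{vv}^{1/2} & & 0 \\ & \Omega_{vv}^{1/2} & \\ 0 & & I_N \otimes I_p \end{pmatrix}, \quad g_{1,i}(r) = \begin{pmatrix} \int_0^r w_{v,i}(s)\, ds \\ w_{v,i}(r) \\ 0_{p \times 1} \\ \vdots \\ \int_0^r D(s)\, ds \\ \vdots \\ 0_{p \times 1} \end{pmatrix}, \quad G_{1,i}(r) = \int_0^r g_{1,i}(s)\, ds. \]

Conditional on $g_{1,i}(r)$ for all $i \in \{1, 2, \ldots, N\}$, it holds that

\[ \Psi \sim N\left( 0, V_{PIM} \right), \tag{1.10} \]

where $V_{PIM}$ is given by

\[ V_{PIM} = \sigma_{u \cdot v}^2\, \Pi^{-1} \left( \sum_{i=1}^{N} \int_0^1 g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \left( \sum_{i=1}^{N} \int_0^1 \left[ G_{1,i}(1) - G_{1,i}(s) \right] \left[ G_{1,i}(1) - G_{1,i}(s) \right]' ds \right) \left( \sum_{i=1}^{N} \int_0^1 g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \Pi^{-1}. \tag{1.11} \]

It is clear that this conditional asymptotic variance differs from the conditional asymptotic variance of the panel FM-OLS and panel DOLS estimators of $\delta$ and $\beta$. Denoting $m_i(s) = \left[ D(s)', w_{v,i}(s)' \right]'$ and $\Pi_{PFM} = \mathrm{diag}\left( I_p, \Omega_{vv}^{1/2} \right)$, the latter is given by

\[ V_{PFM} = \sigma_{u \cdot v}^2\, \Pi_{PFM}^{-1} \left( \sum_{i=1}^{N} \int_0^1 m_i(s) m_i(s)'\, ds \right)^{-1} \Pi_{PFM}^{-1}. \tag{1.12} \]

It is important to note that we can extend Theorem 1 further to obtain a sequential result.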
Since the parameters in $\theta$ require different scaling for the sequential limits, and our interest is mainly in $\beta$, we only provide the result for $\tilde\beta - \beta$. That is, if all the assumptions in Theorem 1 hold, and first $T \to \infty$ and then $N \to \infty$, we have the following asymptotic distribution:

\[ \sqrt{N}\, T \left( \tilde\beta - \beta \right) \Rightarrow \Phi, \tag{1.13} \]

where $\Phi \sim N\left( 0, V_{seq}^{\beta} \right)$, and the asymptotic variance $V_{seq}^{\beta}$ is given by

\[ V_{seq}^{\beta} = \sigma_{u \cdot v}^2 \left( \Omega_{vv}^{1/2} \right)^{-1} \left( \int_0^1 A_1(r)\, dr \right)^{-1} \left( \int_0^1 A_2(r)\, dr \right) \left( \int_0^1 A_1(r)\, dr \right)^{-1} \left( \Omega_{vv}^{1/2} \right)^{-1} = 5.6\, \sigma_{u \cdot v}^2\, \Omega_{vv}^{-1}, \tag{1.14} \]

where $A_1(r) = \frac{r^3}{3} I_k$ and $A_2(r) = \frac{1 - r - r^4 + r^5}{12} I_k$. The constant follows from $\int_0^1 A_1(r)\, dr = \frac{1}{12} I_k$ and $\int_0^1 A_2(r)\, dr = \frac{7}{180} I_k$, so that $12 \cdot \frac{7}{180} \cdot 12 = 5.6$. The importance of this result is that it leads to standard inference based on the large $T$ and large $N$ approximation. The details of the derivation are in the Appendix.

Remark 3. The above is the sequential limit for the homogeneous panel. If the panel has heterogeneity in the second moment structure, then the variance $V_{seq}^{\beta}$ takes the same expression, but with $\sigma_{u \cdot v}^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sigma_{u \cdot v, i}^2$ and $\Omega_{vv} = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \Omega_{vv,i}$.

1.3.2 Inference using Panel IM-OLS

This section provides a discussion of hypothesis testing using the panel IM-OLS estimator. The zero mean Gaussian mixture limiting distribution of the panel IM-OLS estimator given in Theorem 1 and the conditional asymptotic variance given in (1.11) offer the theoretical basis for this discussion. In particular, we consider Wald tests for testing multiple linear hypotheses of the form

\[ H_0 : R\theta = r, \]

where $R \in \mathbb{R}^{q \times (2k + Np)}$ has full rank $q$ and $r \in \mathbb{R}^{q}$. Because the vector $\tilde\theta$ has elements that converge at different rates, we need a restriction on $R$ to obtain formal Wald statistics. We assume that there exists a nonsingular $q \times q$ scaling matrix $A_R$ such that

\[ \lim_{T \to \infty} A_R^{-1} R A_{PIM} = R^{*}, \]

where $R^{*}$ has rank $q$.

In order to carry out statistical inference, we need to scale out the asymptotic variance of panel IM-OLS. Suppose that $\breve\sigma_{u \cdot v}^2$ is an estimator for $\sigma_{u \cdot v}^2$. Then an estimator for $V_{PIM}$ is given by

\[ \breve V_{PIM} = \breve\sigma_{u \cdot v}^2 \left( T^{-2} A_{PIM} \sum_{i=1}^{N} \sum_{t=1}^{T} q_{it} q_{it}' A_{PIM} \right)^{-1} \left( T^{-4} A_{PIM} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ S_{iT}^{q} - S_{i,t-1}^{q} \right] \left[ S_{iT}^{q} - S_{i,t-1}^{q} \right]' A_{PIM} \right) \left( T^{-2} A_{PIM} \sum_{i=1}^{N} \sum_{t=1}^{T} q_{it} q_{it}' A_{PIM} \right)^{-1}, \]

where $S_{it}^{q} = \sum_{j=1}^{t} q_{ij}$ and $S_{i0}^{q} = 0$ for all $i = 1, 2, \ldots, N$.

There are several obvious candidates for $\breve\sigma_{u \cdot v}^2$. The first is based on the pooled OLS residuals. Let the pooled OLS residuals be $\hat u_{it} = y_{it} - x_{it}'\hat\beta - D_t'\hat\delta_i$, where $\hat\delta_i$ and $\hat\beta$ are the pooled OLS estimators. Using these residuals, we can define estimators for the error vector, $\hat\eta_{it} = \left[ \hat u_{it}, \Delta x_{it}' \right]'$. Then $\hat\Gamma_{ij} = T^{-1} \sum_{t=j+1}^{T} \hat\eta_{it} \hat\eta_{i,t-j}'$, and

\[ \hat\Omega_i = \begin{pmatrix} \hat\Omega_{uu,i} & \hat\Omega_{uv,i} \\ \hat\Omega_{vu,i} & \hat\Omega_{vv,i} \end{pmatrix} = \hat\Gamma_{i0} + \sum_{j=1}^{T-1} k\left( \frac{j}{M} \right) \left( \hat\Gamma_{ij} + \hat\Gamma_{ij}' \right). \]

The estimator for $\sigma_{u \cdot v}^2$ is given by

\[ \hat\sigma_{u \cdot v}^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \hat\Omega_{uu,i} - \hat\Omega_{uv,i} \hat\Omega_{vv,i}^{-1} \hat\Omega_{vu,i} \right). \]

The second estimation approach is to use $\Delta \tilde S_{it}^{u}$, the first differences of the pooled OLS residuals of the panel IM-OLS regression (1.6), to directly estimate $\sigma_{u \cdot v}^2$:

\[ \tilde\sigma_{u \cdot v}^2 = \frac{1}{N} \sum_{i=1}^{N} \left[ T^{-1} \sum_{j=2}^{T} \sum_{h=2}^{T} k\left( \frac{|j - h|}{M} \right) \Delta \tilde S_{ij}^{u}\, \Delta \tilde S_{ih}^{u} \right]. \]

Extending a result in Vogelsang and Wagner [2014], it can be shown that $\tilde\sigma_{u \cdot v}^2$ is not a consistent estimator under traditional assumptions on the bandwidth and kernel functions. Under traditional bandwidth assumptions, we can show that the limit of $\tilde\sigma_{u \cdot v}^2$ is larger than $\sigma_{u \cdot v}^2$, which leads to asymptotically conservative results when we build test statistics using $\tilde\sigma_{u \cdot v}^2$ and use critical values from the standard normal or a chi-square distribution.

The third estimation approach is based on OLS residuals from a further augmented regression. As discussed in Vogelsang and Wagner [2014], an estimator of $\sigma_{u \cdot v}^2$ based on the residuals defined below has a fixed-b limit that is proportional to $\sigma_{u \cdot v}^2$, is independent of $\tilde\theta$, and does not depend upon additional nuisance parameters, whereas the estimators $\hat\sigma_{u \cdot v}^2$ and $\tilde\sigma_{u \cdot v}^2$ both fail those requirements. Define this estimator as

\[ \tilde\sigma_{u \cdot v}^{2*} = \frac{1}{N} \sum_{i=1}^{N} \left[ T^{-1} \sum_{j=2}^{T} \sum_{h=2}^{T} k\left( \frac{|j - h|}{M} \right) \Delta \tilde S_{ij}^{u*}\, \Delta \tilde S_{ih}^{u*} \right], \]

where $\Delta \tilde S_{it}^{u*} = \tilde S_{it}^{u*} - \tilde S_{i,t-1}^{u*}$ is the first difference of the residuals $\tilde S_{it}^{u*}$, obtained by running the further augmented IM-OLS regression individual by individual, $\tilde S_{it}^{u*} = S_{it}^{y} - S_t^{D\prime}\tilde\delta_i - S_{it}^{x\prime}\tilde\beta_i - x_{it}'\tilde\gamma_i - z_{it}'\tilde\lambda_i$. The augmented regressors, $z_{it}$, are given by

\[ z_{it} = t \sum_{j=1}^{T} q_{ij}^{x} - \sum_{j=1}^{t-1} \sum_{s=1}^{j} q_{is}^{x}, \quad \text{where } q_{it}^{x} = \left[ S_t^{D\prime}, S_{it}^{x\prime}, x_{it}' \right]', \]

for all $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$.

Using an estimator of $\sigma_{u \cdot v}^2$, we can define the t and Wald statistics as

\[ \breve t = \frac{R\tilde\theta - r}{\sqrt{R A_{PIM} \breve V_{PIM} A_{PIM} R'}}, \qquad \breve W = \left( R\tilde\theta - r \right)' \left( R A_{PIM} \breve V_{PIM} A_{PIM} R' \right)^{-1} \left( R\tilde\theta - r \right), \]

where $\breve V_{PIM}$ could be $\hat V_{PIM}$ using $\hat\sigma_{u \cdot v}^2$, which defines $\hat t$ and $\hat W$, or $\tilde V_{PIM}$ using $\tilde\sigma_{u \cdot v}^2$, which defines $\tilde t$ and $\tilde W$, or $\tilde V_{PIM}^{*}$ using $\tilde\sigma_{u \cdot v}^{2*}$, which defines $\tilde t^{*}$ and $\tilde W^{*}$. The asymptotic null distributions of these test statistics are given in Theorem 2. Standard asymptotic results based on traditional bandwidth and kernel assumptions are given for $\hat t$, $\hat W$, $\tilde t$ and $\tilde W$, whereas a fixed-b result is given for $\tilde t^{*}$ and $\tilde W^{*}$.

Theorem 2. Assume that the data are generated by (1.2) and (1.3), that the deterministic components satisfy (1.4), and that Assumptions 1 and 2 hold. Suppose that the bandwidth, $M$, and kernel function, $k(\cdot)$, satisfy conditions such that $\hat\sigma_{u \cdot v}^2$ is consistent. Then for fixed $N$, as $T \to \infty$:

• $\hat W \Rightarrow \chi_q^2$, where $\chi_q^2$ is a chi-square random variable with $q$ degrees of freedom. When $q = 1$, $\hat t \Rightarrow Z$, where $Z$ is a standard normal random variable.

• Under the above assumptions, $\tilde\sigma_{u \cdot v}^2 \Rightarrow \sigma_{u \cdot v}^2 \left( 1 + d_\gamma' d_\gamma \right)$, with $d_\gamma$ denoting the second $k \times 1$ block of $\left( \sum_{i=1}^{N} \int_0^1 g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \sum_{i=1}^{N} \int_0^1 \left[ G_{1,i}(1) - G_{1,i}(s) \right] dw_{u,i}(s)$. Therefore, it follows that $\tilde W \Rightarrow \frac{\chi_q^2}{1 + d_\gamma' d_\gamma}$, where $\chi_q^2$ is a chi-square random variable with $q$ degrees of freedom that is correlated with $d_\gamma$. When $q = 1$, $\tilde t \Rightarrow \frac{Z}{\sqrt{1 + d_\gamma' d_\gamma}}$, where $Z$ is distributed standard normal and is correlated with $d_\gamma$.
• If $M = bT$, where $b \in (0, 1]$ is held fixed as $T \to \infty$, then

\[ \tilde W^{*} \Rightarrow \frac{\chi_q^2}{\frac{1}{N} \sum_{i=1}^{N} Q_i^{*}(b)}, \]

where $Q_i^{*}(b)$ has exactly the same form as $Q_b(\tilde P^{*}, \tilde P^{*})$ in Vogelsang and Wagner [2014],¹ and $\chi_q^2$ is a chi-square random variable with $q$ degrees of freedom independent of $\frac{1}{N} \sum_{i=1}^{N} Q_i^{*}(b)$. When $q = 1$,

\[ \tilde t^{*} \Rightarrow \frac{Z}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} Q_i^{*}(b)}}, \]

where $Z$ is a standard normal random variable independent of $\frac{1}{N} \sum_{i=1}^{N} Q_i^{*}(b)$.

• Due to the independence between the numerator and denominator of the limits of $\tilde W^{*}$ and $\tilde t^{*}$, we can further obtain a sequential limit result, where $T$ grows large, followed sequentially by the limit as $N$ grows large. Define

\[ \tilde W_{\mu_Q}^{*} = \mu_Q \cdot \tilde W^{*}, \]

where $\mu_Q = E\left[ Q_i^{*}(b) \right]$. Then as $T \to \infty$,

\[ \tilde W_{\mu_Q}^{*} \Rightarrow \frac{\chi_q^2}{\frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mu_Q} Q_i^{*}(b)} \xrightarrow{P} \chi_q^2 \quad \text{as } N \to \infty. \]

When $q = 1$,

\[ \tilde t_{\mu_Q}^{*} = \sqrt{\mu_Q} \cdot \tilde t^{*} \Rightarrow \frac{Z}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mu_Q} Q_i^{*}(b)}} \xrightarrow{P} Z \quad \text{as } N \to \infty. \]

¹ See Vogelsang and Wagner (2014), page 744, for $Q_b(P_1, P_2)$, and formula (30) for $\tilde P^{*}(r)$.

When appealing to consistency of $\hat\sigma_{u \cdot v}^2$, inference using $\hat W$ is standard. In contrast, $\tilde\sigma_{u \cdot v}^2$ is inconsistent under traditional bandwidth assumptions. Since $d_\gamma' d_\gamma > 0$, the critical values of $\tilde W$ are smaller than those of the $\chi_q^2$ distribution. Thus, using $\chi_q^2$ critical values for $\tilde W$ leads to a conservative test under the traditional bandwidth assumptions. The fixed-b limiting distribution of $\tilde W^{*}$ is complicated due to the presence of $\frac{1}{N} \sum_{i=1}^{N} Q_i^{*}(b)$, which depends on $w_{v,i}(r)$ for $i = 1, 2, \ldots, N$. Therefore, critical values should be simulated taking into account the cross-sectional sample size, the specification of the deterministic components, the number of integrated regressors, the kernel function and the bandwidth choice. For the sake of brevity, in Table 1.13 and Table 1.14, we only tabulate critical values for the t-statistic for the parameter associated with $x_{it}$ in models with an intercept and 2 integrated regressors for the Bartlett and QS kernels and a grid of bandwidths indexed by $b$ for $N = 25$.
However, when both $T$ and $N$ are large, inference using $\tilde W_{\hat\mu_Q}^{*}$, with $\hat\mu_Q$ as an estimator of $\mu_Q$, requires merely $\chi_q^2$ critical values rather than simulated critical values, which is quite convenient. Similar results hold for the t statistics.

1.4 Finite sample bias and root mean squared error

In this section, we compare the performance of the pooled OLS, panel FM-OLS, panel DOLS, and panel IM-OLS estimators as measured by bias and root mean squared error (RMSE) within a small simulation study. We report results for the case in which individual dummy variables are not included. The data generating process is given by

\[ y_{it} = \mu + x_{1it}\beta_1 + x_{2it}\beta_2 + u_{it}, \]
\[ x_{1it} = x_{1i,t-1} + v_{1it}, \]
\[ x_{2it} = x_{2i,t-1} + v_{2it}, \]

where, for all $i \in \{1, 2, \ldots, N\}$, $x_{1i0} = 0$, $x_{2i0} = 0$, and

\[ u_{it} = \rho_1 u_{i,t-1} + \varepsilon_{it} + \rho_2 \left( e_{1it} + e_{2it} \right), \quad u_{i0} = 0, \]
\[ v_{1it} = e_{1it} + 0.5 e_{1i,t-1}, \quad v_{2it} = e_{2it} + 0.5 e_{2i,t-1}, \]

where $\varepsilon_{it}$, $e_{1it}$ and $e_{2it}$ are i.i.d. standard normal random variables independent of each other. The parameter values chosen are $\mu = 3$, $\beta_1 = \beta_2 = 1$. Note that the estimators of $\beta_1$ and $\beta_2$ are exactly invariant to the value of $\mu$, so the value of $\mu$ has no effect on our results. In addition, we use $\rho_1$ and $\rho_2$ from the set $\{0, 0.3, 0.6, 0.9\}$. The parameter $\rho_1$ controls serial correlation in the regression error, and $\rho_2$ determines whether the regressors are endogenous or not. The kernel functions chosen for panel FM-OLS are the Bartlett and the Quadratic Spectral kernels, and the bandwidths are given by $M = bT$ with $b \in \{0.06, 0.1, 0.3, 0.5, 0.7, 0.9\}$. We also use the data dependent bandwidth from Andrews [1991]. The panel DOLS estimator is implemented using the information criterion based lead and lag length choice developed in Kejriwal and Perron [2008], where we use the more flexible version discussed in Choi and Kurozumi [2012] in which the numbers of leads and lags included are not required to be the same. The sample sizes are $N \in \{5, 10, 25\}$, $T \in \{50, 100\}$ and the number of replications is 5000.
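The data generating process above can be written as a short simulation routine. This is a hedged sketch rather than the code used for the reported results: the function name `simulate_panel` and its arguments are hypothetical, and the pre-sample innovations $e_{1i0}$ and $e_{2i0}$, which the text does not specify, are assumed here to be standard normal draws.

```python
import numpy as np


def simulate_panel(N, T, rho1, rho2, mu=3.0, beta=(1.0, 1.0), seed=0):
    """Simulate one panel from the Section 1.4 design (illustrative sketch).

    Returns y of shape (N, T), x of shape (N, T, 2) and u of shape (N, T).
    """
    rng = np.random.default_rng(seed)
    # T + 1 draws so that e_{j,i,0} exists for the MA(1) term at t = 1
    # (assumption: the pre-sample value is N(0, 1), not stated in the text).
    e = rng.standard_normal((N, T + 1, 2))
    eps = rng.standard_normal((N, T))

    v = e[:, 1:, :] + 0.5 * e[:, :-1, :]   # v_jit = e_jit + 0.5 e_ji,t-1
    x = np.cumsum(v, axis=1)               # x_jit = x_ji,t-1 + v_jit, x_ji0 = 0

    u = np.zeros((N, T))
    prev = np.zeros(N)                     # u_i0 = 0
    for t in range(T):
        # u_it = rho1 u_i,t-1 + eps_it + rho2 (e_1it + e_2it)
        prev = rho1 * prev + eps[:, t] + rho2 * e[:, t + 1, :].sum(axis=1)
        u[:, t] = prev

    y = mu + x @ np.asarray(beta) + u
    return y, x, u


# One draw from a design with serial correlation and endogeneity.
y, x, u = simulate_panel(N=5, T=50, rho1=0.3, rho2=0.3, seed=1)
```

In a Monte Carlo loop, bias and RMSE of an estimator of $\beta_1$ would then be the mean of the estimation errors and the square root of the mean squared estimation error across replications, each replication using a different seed.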
In Tables 1.1-1.6 we display the results for $N = 5, 10, 25$ and $T = 50, 100$ using the Bartlett kernel only. General patterns are similar for the QS kernel. In each of those tables, Panel A reports bias and Panel B reports RMSE.

1.4.1 Sample size N = 5

Table 1.1 shows the results for the case $N = 5$ with $T = 50$. When $\rho_2 = 0$ (no endogeneity), none of the estimators shows much bias for any value of $\rho_1$. When the bandwidth is relatively small, panel FM-OLS has a slightly larger RMSE than pooled OLS. As the bandwidth increases, the RMSE of panel FM-OLS tends to first increase and then decrease, indicating a hump shape in the RMSE, with the turning point around $b = 0.5$, i.e. $M = 0.5T$. Pooled OLS and panel FM-OLS have smaller RMSE than panel IM-OLS, and this holds regardless of the bandwidth used for panel FM-OLS. Panel IM-OLS has the largest RMSE in all cases. The RMSE of panel DOLS is slightly smaller than that of panel IM-OLS, but it is still larger than that of pooled OLS and panel FM-OLS.

When $\rho_2 \neq 0$ (there is endogeneity), the estimators show different patterns. For a given value of $\rho_1$, as $\rho_2$ increases, the bias of pooled OLS increases. Panel FM-OLS shows the same pattern, but its bias is smaller than that of pooled OLS, which is expected from the theory. In addition, the bias of panel FM-OLS also depends on the bandwidth and the value of $\rho_1$: when $\rho_1$ is relatively small, the bias of panel FM-OLS increases as the bandwidth increases, whereas when $\rho_1$ is far from zero, the bias of panel FM-OLS initially falls as the bandwidth increases and then tends to increase as the bandwidth becomes large. But no matter how large the bandwidth is, the bias of panel FM-OLS does not exceed that of pooled OLS. In contrast, the biases of panel IM-OLS and panel DOLS are much less sensitive to $\rho_2$, especially when $\rho_1$ is relatively small, and are always smaller than those of pooled OLS and panel FM-OLS.
The bias of panel DOLS is similar to the bias of panel IM-OLS when $\rho_1$ is small, whereas for larger values of $\rho_1$ the bias of panel DOLS tends to be larger than that of panel IM-OLS. The overall picture in this case is that panel IM-OLS has smaller bias than panel DOLS, which in turn has lower bias than both pooled OLS and panel FM-OLS. The magnitudes of the biases of both panel IM-OLS and panel DOLS are less sensitive to the value of $\rho_2$ than those of pooled OLS and panel FM-OLS.

Considering the RMSE when there is endogeneity, we see that for a given value of $\rho_1$, as $\rho_2$ increases, the RMSE of pooled OLS increases. Panel FM-OLS shows the same pattern, but its RMSE is smaller than that of pooled OLS, especially when $\rho_1$ is relatively small. Focusing on the bandwidth, we see that the RMSE of panel FM-OLS has the same pattern as its bias: if $\rho_1$ is small, the RMSE of panel FM-OLS increases as the bandwidth increases, whereas if $\rho_1$ is relatively large, the RMSE of panel FM-OLS initially falls as the bandwidth increases and then tends to increase as the bandwidth becomes large. The RMSE of panel IM-OLS is larger than that of pooled OLS when both $\rho_1$ and $\rho_2$ are small; otherwise, it is smaller than that of pooled OLS. The RMSE of panel DOLS has a similar pattern but is smaller than that of panel IM-OLS. The RMSE of panel IM-OLS does not vary with $\rho_2$ when $\rho_1$ is small. The comparison of RMSE between panel IM-OLS and panel FM-OLS depends on the values of $\rho_1$, $\rho_2$ and $b$. When both $\rho_1$ and $\rho_2$ are small, the RMSE of panel IM-OLS is larger than that of panel FM-OLS, no matter what bandwidth is used. However, when both $\rho_1$ and $\rho_2$ are large, the RMSE of panel IM-OLS can be smaller than that of panel FM-OLS with a very large bandwidth. Also, in Table 1.1, we can see that when there is endogeneity but no serial correlation, panel FM-OLS using the data dependent bandwidth performs better than all other estimators, with very small bias and the smallest RMSE.
This holds for all combinations of $N$ and $T$, as can be seen from Tables 1.2 to 1.6.

When we increase $T$ to 100, all the estimators tend to have smaller bias than in the $T = 50$ case, which is expected because the estimators are consistent. Almost all of the results are similar to the $T = 50$ case. One slight difference is that when there is endogeneity and both $\rho_1$ and $\rho_2$ are large, the bias of panel IM-OLS is slightly larger than that of panel DOLS, even though both are still less biased than pooled OLS. The other difference when we increase $T$ from 50 to 100 is that when both $\rho_1$ and $\rho_2$ are very large, the RMSEs of panel IM-OLS and panel DOLS are very similar and both are smaller than that of panel FM-OLS, no matter what bandwidth is used, which in turn is smaller than the RMSE of pooled OLS.

1.4.2 Sample size N = 10

The bias results in the $N = 10$ case are similar to those in the $N = 5$ case. From Panel B of Table 1.3, most of the results for the RMSEs are similar to the $N = 5$ case, except that when both $\rho_1$ and $\rho_2$ are large, the RMSEs of panel IM-OLS and panel DOLS are very similar and both are smaller than that of panel FM-OLS for both $T = 50$ and $T = 100$. Also, when $\rho_1$ and $\rho_2$ are very large, the RMSE of panel IM-OLS is slightly smaller than that of panel DOLS for $T = 50$, and this relation is reversed when $T = 100$.

1.4.3 Sample size N = 25

When we increase the cross section sample size to $N = 25$, the bias results are similar, but a different pattern emerges in the RMSEs. From Panel B of Tables 1.5 and 1.6, when there is endogeneity, for both $T = 50$ and $T = 100$, the RMSE of pooled OLS is the largest in all cases. This implies that when there is endogeneity, pooled OLS has the largest bias and the largest RMSE. Also, we can see that when both $\rho_1$ and $\rho_2$ are large, the RMSEs of panel IM-OLS and panel DOLS are very similar and both are smaller than that of panel FM-OLS with any bandwidth for both $T = 50$ and $T = 100$.
As in the $N = 10$ case, when both $\rho_1$ and $\rho_2$ are very large, the RMSE of panel IM-OLS is slightly smaller than that of panel DOLS when $T = 50$, and slightly larger than that of panel DOLS when $T = 100$.

1.4.4 Summary of finite sample bias and RMSE

The simulation shows that, when there is no endogeneity ($\rho_2 = 0$), pooled OLS dominates the other estimators with no bias and the smallest variance. When there is no serial correlation ($\rho_1 = 0$), panel FM-OLS with the data dependent bandwidth performs better than the other estimators. When both serial correlation and endogeneity exist ($\rho_1 \neq 0$, $\rho_2 \neq 0$), the relative performance of the estimators depends on the values of $N$, $T$, $\rho_1$ and $\rho_2$. Panel IM-OLS is more effective in reducing bias than the other estimators, and both the bias and the RMSE of panel IM-OLS are less sensitive to the parameters $\rho_1$ and $\rho_2$ than are the bias and RMSE of panel FM-OLS. For small $N$ ($N = 5, 10$) and small $T$ ($T = 50$), panel IM-OLS has the smallest bias but at the cost of a larger RMSE, except when both $\rho_1$ and $\rho_2$ are large, in which case panel IM-OLS has the smallest RMSE. For small $N$ and relatively large $T$ ($T = 100$), panel IM-OLS and panel DOLS are similar, and dominate pooled OLS and panel FM-OLS. When $N$ is relatively large ($N = 25$), pooled OLS has the largest bias and the largest RMSE in all cases, and if $T$ is small and $\rho_1$, $\rho_2$ are relatively large, then panel IM-OLS is better than the other estimators in reducing bias and has the smallest RMSE. When $N$ and $T$ are large and $\rho_1$, $\rho_2$ are large, panel DOLS is slightly better than panel IM-OLS, which in turn is better than pooled OLS and panel FM-OLS.

1.5 Finite sample performance of test statistics

In this section we provide some finite sample results on the performance of the tests using the simulation design from Section 1.4. Here, we only report results for cases where $\rho_1 = \rho_2$.
The results include t-statistics for testing the null hypothesis $H_0 : \beta_1 = 1$ and Wald statistics for testing the joint null hypothesis $H_0 : \beta_1 = 1, \beta_2 = 1$. The pooled OLS statistics serve as a benchmark. The panel FM-OLS statistics were implemented using $\hat\sigma_{u}^{2+}$. The panel IM-OLS statistics were implemented in three ways. The first uses $\hat\sigma_{u \cdot v}^2$ and is labeled panel IM(O), the second uses $\tilde\sigma_{u \cdot v}^2$ and is labeled panel IM(D), and the third uses $\tilde\sigma_{u \cdot v}^{2*}$ and is labeled panel IM(Fb). We report results for both the Bartlett and Quadratic Spectral kernels. As for the choice of bandwidth for the panel FM-OLS and panel IM-OLS statistics, we follow Vogelsang and Wagner [2014]. One bandwidth choice is the data dependent bandwidth rule of Andrews [1991]. The other choice is the fixed-b bandwidth, that is $b = M/T$, where $M = 1, 2, \ldots, T$. Rejections for the pooled OLS, panel FM-OLS, panel DOLS, panel IM(O) and panel IM(D) tests are carried out using $N(0, 1)$ critical values for all values of $M$. From Theorem 2, the panel IM(D) test statistic is asymptotically conservative under traditional asymptotic theory. In contrast, rejections for panel IM(Fb) are carried out using fixed-b asymptotic critical values. The empirical rejection probabilities were computed using 5000 replications, and the nominal level is 0.05 in all cases.

Tables 1.7 to 1.9 and Tables 1.10 to 1.12 report empirical null rejection probabilities using data dependent bandwidth choices for the Bartlett and QS kernels. Tables 1.7 to 1.9 show results for the t-tests and Tables 1.10 to 1.12 contain results for the Wald tests. In each table Panel A corresponds to $T = 50$ and Panel B to $T = 100$. We only briefly summarize some main findings in the tables. When $\rho_1 = \rho_2 = 0$ (no serial correlation and no endogeneity), we can see that the pooled OLS tests have rejection probabilities close to 0.05, but there are huge over-rejections as the values of $\rho_1$ and $\rho_2$ increase.
For $\rho_1 = \rho_2 = 0$, when using the QS kernel, panel IM(Fb) tests tend to have rejection probabilities less than 0.05, whereas other tests show some over-rejections. For $\rho_1 = \rho_2 = 0$, when using the Bartlett kernel, all the tests show some over-rejections, but the over-rejection problem is less severe when $T = 100$ than when $T = 50$. Note that both panel IM(O) and panel IM(D) show some over-rejections, but those are less severe than for panel FM-OLS, especially when there is strong serial correlation and strong endogeneity. Generally, panel IM(D) tests have rejection probabilities that are smaller than those of panel IM(O), which is what we expected because the panel IM(D) test is conservative under standard asymptotic theory. In addition, increasing the values of $\rho_1$ and $\rho_2$ leads to over-rejection problems for all the tests. The problem with panel IM-OLS is that the data dependent bandwidth is too small to reduce size distortions. In contrast to the pure time series case, there is no test that dominates the others in that scenario.

In order to see the impact of bandwidth and kernel choices on the over-rejection problem, we plot in Figures 1.1-1.3 null rejection probabilities of the t-tests as a function of $b \in (0, 1]$. The first three figures give the results for $N \in \{5, 10, 25\}$, $T = 100$ using the Bartlett kernel and $\rho_1 = \rho_2 = 0.3$. In Figure 1.1, with cross-section sample size $N = 5$, we can see that with small bandwidths, all tests have some over-rejection problems. Over-rejection is less severe for panel IM(D) than for the other tests because it is conservative. As the bandwidth increases, all rejection probabilities increase substantially except for panel IM(Fb). The panel IM(Fb) rejection probabilities are close to 10% for all values of $b$, which indicates that the fixed-b approximation performs relatively well for panel IM(Fb). In Figures 1.2 and 1.3, the cross-section sample size increases to 10 and 25 and the patterns of rejection probabilities are similar to those in Figure 1.1.
However, when the bandwidth is small, such as $b = 0.08$, panel FM(Fb) has the lowest rejection probabilities, around 8% and 7.5%, respectively. In addition, as $N$ increases, panel IM(Fb) rejection probabilities are close to 10% and 12% when a large bandwidth is used.

As the values of $\rho_1$, $\rho_2$ increase to 0.9, there is strong serial correlation and endogeneity. We can see from Figures 1.4-1.6 that all the tests have serious over-rejection problems regardless of bandwidth. Interestingly, for small $N$ ($N = 5, 10$), panel FM(Fb) has less of an over-rejection problem than panel IM(Fb), although both tests are severely size distorted. As $N$ increases to 25, the rejection probabilities of panel IM(Fb) tend to be smaller than those of panel FM(Fb). In Vogelsang and Wagner [2014], it was pointed out that the over-rejection problems of IM(Fb) become less problematic as $T$ increases. We find similar patterns in our simulations. In Figures 1.7 to 1.9, we show the results with $T = 100$, and it is clear that the panel IM(Fb) over-rejections are reduced although they are still large. We believe that, similar to the pure time series case, if we further increase $T$ to 500 or 1000, the rejections for panel IM(Fb) with non-small bandwidths will be substantially reduced to a reasonable size, whereas the other statistics will continue to have over-rejection problems.

Figures 1.10-1.12 give some results for the QS kernel. For brevity we only report results for $N = 5$, $T = 50, 100$, $\rho_1 = \rho_2 \in \{0.3, 0.9\}$. Compared with the results using the Bartlett kernel, panel IM(Fb) has fewer over-rejection problems using the QS kernel. In Figure 1.10, when serial correlation and endogeneity are not that strong, panel IM(Fb) tends to have rejections close to 10%, whereas all other tests have over-rejection problems. In Figures 1.11 and 1.12, when there is strong serial correlation and endogeneity, all the tests have some over-rejection problems; however, the QS kernel leads to less size distorted results than the Bartlett kernel.
The overall picture is that the panel IM(Fb) test is the most robust statistic in terms of controlling over-rejection problems, although for given sample sizes $N$, $T$, increasing the values of $\rho_1$, $\rho_2$ causes over-rejections to emerge. Large sample sizes in both $N$ and $T$ in conjunction with large bandwidths and the QS kernel are desirable when serial correlation and endogeneity are strong.

Now, we turn to the analysis of the power properties of the tests. For the sake of brevity we only display results for the case $\rho_1 = \rho_2 = 0.6$ for the Wald test for $N = 5$, $T = 50$ and using the QS kernel. Patterns are similar for other values of $\rho_1$, $\rho_2$, for t tests, and for other combinations of $N$, $T$ with the Bartlett kernel. Starting from the null values of $\beta_1$ and $\beta_2$ equal to 1, we consider under the alternative $\beta_1 = \beta_2 = \beta \in (1, 1.4]$, using (including the null value) a total of 21 values on a grid with mesh 0.02. We focus on size-corrected power because of the potential over-rejection problems under the null hypothesis. This allows us to see power differences across tests while holding null rejection probabilities constant at 0.05. This is useful for theoretical power comparisons, but such size corrections are not feasible in practice.

In Figure 1.13 we display size adjusted power of the panel FM(Fb) and panel IM-OLS Wald tests using the QS kernel with $b = 0.3$. For all other values of $b$, the patterns are very similar. From Figure 1.13, we can see that panel IM(Fb) has the least power across the four tests. The use of $\tilde\sigma_{u \cdot v}^{2*}$ to obtain asymptotically fixed-b inference and smaller finite sample size distortions comes at the price of a small reduction in power. Figure 1.14 shows the effect of the bandwidth on size adjusted power of the panel IM(Fb) test by plotting power curves for eight values of $b$: 0.02, 0.06, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0.
We can see that panel IM(Fb) power depends on the bandwidth and tends to decrease as the bandwidth increases, but power is not very sensitive to the bandwidth. In addition, when $b \geq 0.5$, the power of panel IM(Fb) is almost constant. In Figure 1.15, we display power using the Bartlett kernel, and it is clear that all tests have almost the same power, and that power is not sensitive to $b$.

Figure 1.16 gives power comparisons across the various tests: pooled OLS, panel FM-OLS, panel DOLS and panel IM-OLS. In Figure 1.16, the panel IM(Fb) test is shown for $b = 0.06, 0.1, 1.0$, and using the Andrews data dependent bandwidth. The panel FM-OLS test is implemented with the Andrews data dependent bandwidth. We note that the pooled OLS and panel FM-OLS tests have the largest size-adjusted power, with the power of the panel DOLS test being slightly smaller, and the panel IM-OLS tests have the smallest power. But the power differences between panel IM-OLS and all other tests are relatively small.

Finally, Figures 1.17-1.19 provide size adjusted power comparisons similar to Figures 1.13, 1.14 and 1.16 but with $N = 10$. The main feature is that size adjusted power increases with $N$. In addition, as $N$ increases, the power of panel IM(Fb) becomes less sensitive to the bandwidth. With a larger $N$, the power rankings are the same as before, but the difference in power between panel IM-OLS and panel FM-OLS is smaller.

1.6 Summary and conclusions

This paper considers the extension of the integrated modified ordinary least squares (IM-OLS) method of Vogelsang and Wagner [2014] to estimation and inference about a cointegrating vector in homogeneous cointegrated panels. We label the estimator panel IM-OLS. It is a tuning parameter free estimator that is based on a partial sum transformed regression augmented by the original integrated regressors themselves.
The advantage is that it leads to a zero mean mixed Gaussian limiting distribution without requiring the choice of tuning parameters (such as bandwidth, kernel, and numbers of leads and lags). For inference based on panel IM-OLS estimates, a long run variance still needs to be scaled out. Using a consistent estimator of the corresponding long run variance leads to tests having standard asymptotic distributions. Fixed-b inference is another way to obtain pivotal test statistics. Critical values of fixed-b t and Wald tests need to be simulated taking into account the specification of the deterministic components, the number of integrated regressors, the kernel function and the bandwidth choice.

We provide a finite sample simulation study in which the performance of the panel IM-OLS estimator and test statistics is compared with pooled OLS, panel FM-OLS and panel DOLS. Typically, panel IM-OLS shows good performance in terms of bias and RMSE, especially in the following two scenarios: (i) the panel has a large sample size; (ii) the panel has a small sample size with strong serial correlation and endogeneity. The size and power analysis of the tests shows that the fixed-b test statistics are more robust, in terms of having lower size distortions than all other test statistics, especially for larger sample sizes. This robustness comes at the cost of minor power losses provided serial correlation and endogeneity are not that strong. When there is strong serial correlation and endogeneity, all tests have severe over-rejection problems, and we prefer the panel IM-OLS test with the QS kernel and a large bandwidth in this case.

Further research will study the panel IM-OLS estimator for panels with identical dependent units in the cross section, for panels with non-identical dependent units in the cross section, for higher order cointegrating regressions and for nonlinear cointegration relationships.
32 APPENDIX 33 Tables and Figures Table 1.1: Finite sample bias and RMSE of the various estimator of β1 , N = 5, T = 50, Bartlett kernel ρ1 ρ2 P-OLS P-IM P-DOLS Panel FM-OLS b=0.06 0.1 0.3 0.5 0.7 0.9 AND Panel A: Bias 0 0.3 0.6 0.9 0 -0.0002 -0.0001 -0.0004 -0.0003 -0.0003 -0.0003 -0.0003 -0.0003 -0.0003 -0.0003 0.3 0.0057 -0.0004 -0.0006 -0.0004 0.0003 0.0021 0.0030 0.0036 0.0040 0.0001 0.6 0.0116 -0.0008 -0.0003 -0.0006 0.0009 0.0044 0.0063 0.0075 0.0082 0.0005 0.9 0.0175 -0.0011 -0.0002 -0.0007 0.0015 0.0068 0.0096 0.0113 0.0124 0.0010 0 -0.0003 -0.0002 -0.0004 -0.0004 -0.0004 -0.0004 -0.0004 -0.0004 -0.0004 -0.0004 0.3 0.0100 -0.0001 -0.0001 0.0013 0.0017 0.0039 0.0055 0.0064 0.0071 0.0016 0.6 0.0202 0.0001 0.0001 0.0029 0.0038 0.0083 0.0113 0.0132 0.0145 0.0035 0.9 0.0304 0.0002 0.0002 0.0045 0.0059 0.0126 0.0171 0.0200 0.0219 0.0055 0 -0.0004 -0.0003 -0.0008 -0.0005 -0.0005 -0.0006 -0.0005 -0.0005 -0.0005 -0.0005 0.3 0.0200 0.0024 0.0034 0.0083 0.0075 0.0091 0.0116 0.0134 0.0146 0.0076 0.6 0.0404 0.0051 0.0068 0.0172 0.0155 0.0188 0.0238 0.0274 0.0297 0.0158 0.9 0.0608 0.0078 0.0090 0.0260 0.0235 0.0285 0.0360 0.0413 0.0448 0.0240 0 -0.0009 -0.0007 -0.0007 -0.0011 -0.0011 -0.0012 -0.0011 -0.0010 -0.0010 -0.0011 0.3 0.0715 0.0404 0.0521 0.0598 0.0568 0.0511 0.0529 0.0561 0.0587 0.0575 0.6 0.1438 0.0815 0.1031 0.1206 0.1147 0.1035 0.1069 0.1131 0.1183 0.1162 0.9 0.2162 0.1226 0.1534 0.1815 0.1726 0.1559 0.1610 0.1702 0.1780 0.1749 0 0.0115 0.0202 0.0147 0.0119 0.0120 0.0124 0.0124 0.0124 0.0124 0.0120 0.3 0.0134 0.0202 0.0151 0.0119 0.0121 0.0128 0.0131 0.0133 0.0134 0.0121 0.6 0.0180 0.0202 0.0156 0.0121 0.0124 0.0141 0.0151 0.0157 0.0160 0.0123 0.9 0.0239 0.0203 0.0158 0.0123 0.0130 0.0160 0.0179 0.0190 0.0197 0.0127 Panel B: RMSE 0 0.3 0.6 0.9 0 0.0161 0.0286 0.0211 0.0168 0.0170 0.0175 0.0175 0.0175 0.0175 0.0169 0.3 0.0198 0.0286 0.0213 0.0169 0.0172 0.0183 0.0189 0.0192 0.0194 0.0171 0.6 0.0283 0.0286 0.0215 0.0173 0.0179 0.0208 0.0227 0.0238 0.0246 
0.0177 0.9 0.0386 0.0287 0.0217 0.0180 0.0192 0.0244 0.0279 0.0300 0.0313 0.0188 0 0.0270 0.0490 0.0358 0.0283 0.0287 0.0297 0.0297 0.0296 0.0295 0.0286 0.3 0.0352 0.0491 0.0365 0.0298 0.0301 0.0318 0.0330 0.0337 0.0341 0.0300 0.6 0.0529 0.0494 0.0378 0.0345 0.0341 0.0379 0.0416 0.0439 0.0455 0.0341 0.9 0.0735 0.0499 0.0391 0.0411 0.0401 0.0464 0.0529 0.0571 0.0598 0.0402 0 0.0843 0.1638 0.1090 0.0887 0.0910 0.0967 0.0973 0.0967 0.0958 0.0904 0.3 0.1131 0.1689 0.1231 0.1089 0.1091 0.1113 0.1131 0.1142 0.1149 0.1090 0.6 0.1736 0.1839 0.1563 0.1552 0.1518 0.1475 0.1512 0.1557 0.1591 0.1526 0.9 0.2432 0.2068 0.1986 0.2111 0.2041 0.1935 0.1992 0.2071 0.2134 0.2059 34 Table 1.2: Finite sample bias and RMSE of the various estimator of β1 , N = 5, T = 100, Bartlett kernel ρ1 ρ2 P-OLS P-IM P-DOLS Panel FM-OLS b=0.06 0.1 0.3 0.5 0.7 0.9 AND Panel A: Bias 0 0.3 0.6 0.9 0 0.0001 0.0001 0.0000 0.0001 0.0001 0.0000 0.0000 0.0000 0.0000 0.0001 0.3 0.0030 0.0000 0.0001 0.0002 0.0005 0.0012 0.0017 0.0020 0.0021 0.0001 0.6 0.0060 0.0000 0.0002 0.0003 0.0009 0.0024 0.0033 0.0039 0.0043 0.0002 0.9 0.0089 -0.0001 0.0001 0.0004 0.0013 0.0036 0.0050 0.0058 0.0064 0.0003 0 0.0001 0.0002 0.0002 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000 0.0001 0.3 0.0052 0.0002 0.0003 0.0008 0.0010 0.0022 0.0029 0.0034 0.0038 0.0007 0.6 0.0104 0.0002 0.0004 0.0015 0.0020 0.0043 0.0059 0.0069 0.0075 0.0014 0.9 0.0156 0.0003 0.0004 0.0022 0.0030 0.0064 0.0088 0.0103 0.0113 0.0021 0 0.0002 0.0003 0.0002 0.0002 0.0002 0.0001 0.0001 0.0000 0.0000 0.0002 0.3 0.0107 0.0011 0.0011 0.0034 0.0032 0.0047 0.0061 0.0071 0.0078 0.0035 0.6 0.0212 0.0018 0.0015 0.0067 0.0062 0.0093 0.0122 0.0142 0.0155 0.0069 0.9 0.0317 0.0026 0.0016 0.0099 0.0093 0.0139 0.0183 0.0212 0.0232 0.0102 0 0.0008 0.0012 0.0002 0.0008 0.0008 0.0006 0.0005 0.0004 0.0004 0.0008 0.3 0.0431 0.0174 0.0174 0.0325 0.0294 0.0261 0.0286 0.0314 0.0335 0.0331 0.6 0.0854 0.0336 0.0335 0.0642 0.0580 0.0515 0.0567 0.0623 0.0666 0.0654 0.9 0.1277 0.0499 0.0486 
0.0959 0.0866 0.0769 0.0849 0.0933 0.0997 0.0976 0 0.0056 0.0102 0.0082 0.0058 0.0059 0.0061 0.0062 0.0062 0.0061 0.0058 0.3 0.0067 0.0102 0.0085 0.0058 0.0059 0.0064 0.0066 0.0067 0.0067 0.0058 0.6 0.0091 0.0102 0.0087 0.0059 0.0061 0.0070 0.0076 0.0079 0.0081 0.0059 0.9 0.0121 0.0102 0.0089 0.0061 0.0065 0.0080 0.0090 0.0096 0.0099 0.0060 Panel B: RMSE 0 0.3 0.6 0.9 0 0.0080 0.0145 0.0131 0.0083 0.0083 0.0087 0.0088 0.0088 0.0087 0.0082 0.3 0.0101 0.0145 0.0133 0.0083 0.0085 0.0092 0.0096 0.0098 0.0099 0.0083 0.6 0.0145 0.0145 0.0135 0.0086 0.0089 0.0105 0.0115 0.0121 0.0125 0.0085 0.9 0.0198 0.0145 0.0137 0.0090 0.0096 0.0123 0.0142 0.0153 0.0160 0.0089 0 0.0136 0.0251 0.0254 0.0142 0.0144 0.0150 0.0151 0.0151 0.0150 0.0142 0.3 0.0184 0.0252 0.0258 0.0149 0.0150 0.0162 0.0170 0.0174 0.0177 0.0148 0.6 0.0278 0.0252 0.0262 0.0164 0.0166 0.0193 0.0214 0.0228 0.0237 0.0165 0.9 0.0387 0.0253 0.0264 0.0187 0.0188 0.0235 0.0272 0.0296 0.0311 0.0188 0 0.0474 0.0932 0.0855 0.0501 0.0513 0.0541 0.0545 0.0541 0.0537 0.0499 0.3 0.0669 0.0950 0.0881 0.0616 0.0609 0.0619 0.0637 0.0648 0.0656 0.0618 0.6 0.1046 0.0997 0.0957 0.0865 0.0822 0.0800 0.0848 0.0888 0.0917 0.0874 0.9 0.1471 0.1071 0.1045 0.1164 0.1086 0.1031 0.1111 0.1182 0.1234 0.1180 35 Table 1.3: Finite sample bias and RMSE of the various estimator of β1 , N = 10, T = 50, Bartlett kernel ρ1 ρ2 P-OLS P-IM P-DOLS Panel FM-OLS b=0.06 0.1 0.3 0.5 0.7 0.9 AND Panel A: Bias 0 0.3 0.6 0.9 0 -0.0001 0.0000 -0.0001 -0.0001 -0.0001 -0.0001 0.0000 0.0000 0.0000 -0.0001 0.3 0.0055 -0.0002 -0.0003 -0.0003 0.0003 0.0017 0.0026 0.0033 0.0037 0.0002 0.6 0.0111 -0.0005 0.0000 -0.0005 0.0006 0.0035 0.0053 0.0065 0.0074 0.0004 0.9 0.0167 -0.0008 0.0001 -0.0007 0.0010 0.0053 0.0080 0.0098 0.0111 0.0006 0 -0.0001 0.0001 -0.0001 -0.0001 -0.0001 -0.0001 0.0000 -0.0001 0.0000 -0.0001 0.3 0.0092 0.0001 0.0002 0.0012 0.0014 0.0032 0.0046 0.0056 0.0063 0.0014 0.6 0.0186 0.0002 0.0004 0.0024 0.0030 0.0064 0.0092 0.0112 0.0126 0.0028 0.9 
0.0279 0.0003 0.0005 0.0037 0.0045 0.0097 0.0138 0.0168 0.0189 0.0043 0 -0.0001 0.0001 -0.0003 -0.0002 -0.0002 -0.0001 -0.0001 -0.0001 0.0000 -0.0002 0.3 0.0182 0.0024 0.0036 0.0074 0.0064 0.0074 0.0096 0.0114 0.0127 0.0066 0.6 0.0365 0.0047 0.0066 0.0149 0.0130 0.0150 0.0193 0.0229 0.0254 0.0134 0.9 0.0548 0.0070 0.0085 0.0224 0.0196 0.0225 0.0290 0.0343 0.0382 0.0202 0 0.0001 0.0005 -0.0001 0.0001 0.0000 0.0001 0.0003 0.0004 0.0005 0.0001 0.3 0.0672 0.0386 0.0488 0.0560 0.0530 0.0469 0.0482 0.0512 0.0541 0.0538 0.6 0.1343 0.0768 0.0959 0.1119 0.1060 0.0937 0.0960 0.1020 0.1077 0.1075 0.9 0.2014 0.1149 0.1419 0.1679 0.1590 0.1405 0.1439 0.1528 0.1613 0.1612 0 0.0069 0.0118 0.0088 0.0071 0.0072 0.0074 0.0075 0.0075 0.0074 0.0072 0.3 0.0091 0.0118 0.0090 0.0071 0.0072 0.0077 0.0081 0.0083 0.0084 0.0072 0.6 0.0138 0.0118 0.0092 0.0072 0.0073 0.0086 0.0097 0.0104 0.0110 0.0073 0.9 0.0193 0.0119 0.0094 0.0073 0.0076 0.0099 0.0119 0.0133 0.0142 0.0075 Panel B: RMSE 0 0.3 0.6 0.9 0 0.0098 0.0168 0.0125 0.0101 0.0102 0.0105 0.0106 0.0106 0.0105 0.0101 0.3 0.0139 0.0168 0.0127 0.0102 0.0103 0.0111 0.0118 0.0122 0.0125 0.0103 0.6 0.0220 0.0168 0.0128 0.0105 0.0108 0.0129 0.0148 0.0161 0.0171 0.0107 0.9 0.0312 0.0168 0.0130 0.0110 0.0116 0.0154 0.0187 0.0212 0.0229 0.0114 0 0.0166 0.0290 0.0215 0.0172 0.0174 0.0180 0.0181 0.0181 0.0179 0.0173 0.3 0.0253 0.0291 0.0221 0.0189 0.0187 0.0198 0.0210 0.0218 0.0225 0.0187 0.6 0.0418 0.0294 0.0233 0.0233 0.0223 0.0244 0.0278 0.0305 0.0325 0.0225 0.9 0.0600 0.0299 0.0244 0.0293 0.0273 0.0307 0.0365 0.0411 0.0444 0.0277 0 0.0548 0.0994 0.0688 0.0568 0.0579 0.0610 0.0616 0.0612 0.0606 0.0576 0.3 0.0878 0.1068 0.0854 0.0806 0.0792 0.0776 0.0790 0.0807 0.0820 0.0795 0.6 0.1477 0.1261 0.1210 0.1276 0.1227 0.1138 0.1163 0.1213 0.1259 0.1240 0.9 0.2129 0.1529 0.1620 0.1805 0.1724 0.1564 0.1602 0.1684 0.1760 0.1744 36 Table 1.4: Finite sample bias and RMSE of the various estimator of β1 , N = 10, T = 100, Bartlett kernel ρ1 ρ2 P-OLS P-IM 
P-DOLS Panel FM-OLS b=0.06 0.1 0.3 0.5 0.7 0.9 AND Panel A: Bias 0 0.3 0.6 0.9 0 0.0000 -0.0001 0.0000 0.0000 0.0000 -0.0001 -0.0001 -0.0001 -0.0001 0.0000 0.3 0.0027 -0.0001 0.0000 0.0000 0.0002 0.0009 0.0013 0.0016 0.0018 0.0000 0.6 0.0055 -0.0002 0.0001 0.0001 0.0005 0.0018 0.0026 0.0033 0.0037 0.0000 0.9 0.0083 -0.0003 0.0000 0.0002 0.0008 0.0027 0.0040 0.0049 0.0056 0.0000 0 -0.0001 -0.0001 0.0000 -0.0001 -0.0001 -0.0001 -0.0001 -0.0001 -0.0001 -0.0001 0.3 0.0046 -0.0001 0.0001 0.0005 0.0006 0.0015 0.0022 0.0027 0.0031 0.0004 0.6 0.0093 -0.0001 0.0002 0.0010 0.0013 0.0031 0.0045 0.0055 0.0063 0.0009 0.9 0.0140 0.0000 0.0002 0.0015 0.0020 0.0048 0.0068 0.0084 0.0094 0.0014 0 -0.0001 -0.0002 0.0000 -0.0001 -0.0001 -0.0001 -0.0002 -0.0002 -0.0002 -0.0001 0.3 0.0092 0.0005 0.0009 0.0026 0.0023 0.0034 0.0046 0.0056 0.0063 0.0027 0.6 0.0186 0.0011 0.0012 0.0052 0.0047 0.0069 0.0094 0.0113 0.0127 0.0055 0.9 0.0280 0.0017 0.0013 0.0079 0.0071 0.0104 0.0142 0.0170 0.0191 0.0082 0 -0.0001 -0.0005 -0.0005 -0.0001 -0.0002 -0.0003 -0.0003 -0.0004 -0.0004 -0.0001 0.3 0.0381 0.0141 0.0146 0.0281 0.0251 0.0213 0.0233 0.0260 0.0282 0.0286 0.6 0.0763 0.0287 0.0287 0.0563 0.0504 0.0429 0.0470 0.0523 0.0568 0.0574 0.9 0.1145 0.0433 0.0417 0.0846 0.0757 0.0645 0.0707 0.0787 0.0853 0.0862 0 0.0034 0.0059 0.0047 0.0035 0.0035 0.0036 0.0037 0.0036 0.0036 0.0035 0.3 0.0045 0.0059 0.0049 0.0035 0.0035 0.0038 0.0039 0.0040 0.0041 0.0035 0.6 0.0068 0.0059 0.0050 0.0035 0.0036 0.0042 0.0048 0.0051 0.0054 0.0035 0.9 0.0096 0.0059 0.0051 0.0036 0.0038 0.0050 0.0059 0.0066 0.0071 0.0036 Panel B: RMSE 0 0.3 0.6 0.9 0 0.0048 0.0084 0.0075 0.0050 0.0050 0.0052 0.0052 0.0052 0.0051 0.0049 0.3 0.0069 0.0084 0.0075 0.0050 0.0051 0.0055 0.0058 0.0060 0.0061 0.0050 0.6 0.0110 0.0084 0.0077 0.0051 0.0053 0.0064 0.0073 0.0080 0.0085 0.0051 0.9 0.0157 0.0084 0.0078 0.0053 0.0057 0.0076 0.0093 0.0105 0.0114 0.0053 0 0.0083 0.0146 0.0145 0.0086 0.0087 0.0089 0.0090 0.0089 0.0089 0.0085 0.3 0.0128 0.0146 
0.0148 0.0090 0.0090 0.0097 0.0103 0.0108 0.0111 0.0090 0.6 0.0213 0.0147 0.0151 0.0102 0.0101 0.0119 0.0137 0.0152 0.0162 0.0103 0.9 0.0307 0.0148 0.0152 0.0121 0.0117 0.0148 0.0181 0.0206 0.0223 0.0123 0 0.0299 0.0548 0.0519 0.0311 0.0317 0.0331 0.0333 0.0330 0.0326 0.0310 0.3 0.0494 0.0566 0.0540 0.0425 0.0410 0.0399 0.0414 0.0428 0.0439 0.0429 0.6 0.0845 0.0620 0.0608 0.0661 0.0611 0.0559 0.0598 0.0641 0.0676 0.0671 0.9 0.1223 0.0703 0.0692 0.0930 0.0848 0.0755 0.0818 0.0890 0.0949 0.0946 37 Table 1.5: Finite sample bias and RMSE of the various estimator of β1 , N = 25, T = 50, Bartlett kernel ρ1 ρ2 P-OLS P-IM P-DOLS Panel FM-OLS b=0.06 0.1 0.3 0.5 0.7 0.9 AND Panel A: Bias 0 0.3 0.6 0.9 0 -0.0001 0.0001 -0.0001 -0.0001 -0.0001 0.0000 0.0000 0.0000 0.0000 -0.0001 0.3 0.0054 -0.0002 -0.0002 -0.0003 0.0002 0.0015 0.0023 0.0030 0.0034 0.0001 0.6 0.0108 -0.0004 0.0000 -0.0005 0.0005 0.0030 0.0047 0.0060 0.0069 0.0003 0.9 0.0162 -0.0007 0.0001 -0.0007 0.0008 0.0045 0.0071 0.0090 0.0104 0.0004 0 -0.0001 0.0001 -0.0001 -0.0001 -0.0001 -0.0001 0.0000 0.0000 -0.0001 -0.0001 0.3 0.0088 0.0002 0.0002 0.0010 0.0012 0.0027 0.0040 0.0050 0.0057 0.0012 0.6 0.0177 0.0003 0.0004 0.0021 0.0026 0.0055 0.0080 0.0100 0.0115 0.0024 0.9 0.0266 0.0003 0.0005 0.0033 0.0039 0.0083 0.0121 0.0151 0.0173 0.0037 0 -0.0001 0.0002 -0.0002 -0.0002 -0.0002 -0.0001 -0.0001 -0.0001 -0.0001 -0.0002 0.3 0.0172 0.0024 0.0034 0.0067 0.0058 0.0065 0.0085 0.0102 0.0116 0.0060 0.6 0.0345 0.0045 0.0062 0.0137 0.0118 0.0131 0.0170 0.0205 0.0232 0.0122 0.9 0.0518 0.0066 0.0080 0.0206 0.0178 0.0197 0.0256 0.0308 0.0348 0.0184 0 0.0000 0.0008 -0.0002 -0.0001 -0.0002 -0.0001 0.0000 0.0001 0.0001 -0.0002 0.3 0.0644 0.0378 0.0467 0.0534 0.0505 0.0441 0.0451 0.0480 0.0510 0.0512 0.6 0.1289 0.0747 0.0919 0.1070 0.1011 0.0884 0.0901 0.0959 0.1018 0.1026 0.9 0.1933 0.1116 0.1354 0.1605 0.1518 0.1327 0.1351 0.1439 0.1527 0.1540 0 0.0040 0.0068 0.0049 0.0041 0.0041 0.0042 0.0042 0.0042 0.0042 0.0041 0.3 0.0068 0.0068 
0.0051 0.0041 0.0041 0.0045 0.0049 0.0052 0.0055 0.0041 0.6 0.0118 0.0068 0.0052 0.0042 0.0042 0.0054 0.0065 0.0075 0.0083 0.0042 0.9 0.0171 0.0068 0.0053 0.0042 0.0043 0.0066 0.0086 0.0103 0.0115 0.0043 Panel B: RMSE 0 0.3 0.6 0.9 0 0.0056 0.0097 0.0071 0.0058 0.0058 0.0059 0.0060 0.0059 0.0059 0.0058 0.3 0.0106 0.0097 0.0072 0.0059 0.0060 0.0066 0.0073 0.0079 0.0083 0.0059 0.6 0.0190 0.0097 0.0072 0.0062 0.0065 0.0084 0.0103 0.0120 0.0132 0.0064 0.9 0.0278 0.0097 0.0074 0.0067 0.0072 0.0107 0.0140 0.0167 0.0188 0.0071 0 0.0096 0.0167 0.0122 0.0099 0.0100 0.0102 0.0102 0.0101 0.0100 0.0099 0.3 0.0199 0.0169 0.0128 0.0120 0.0116 0.0122 0.0135 0.0146 0.0155 0.0117 0.6 0.0364 0.0173 0.0141 0.0171 0.0157 0.0170 0.0204 0.0234 0.0258 0.0160 0.9 0.0536 0.0180 0.0152 0.0232 0.0208 0.0229 0.0284 0.0333 0.0370 0.0213 0 0.0322 0.0578 0.0397 0.0332 0.0337 0.0351 0.0352 0.0349 0.0345 0.0336 0.3 0.0725 0.0691 0.0618 0.0633 0.0610 0.0567 0.0576 0.0597 0.0619 0.0616 0.6 0.1338 0.0946 0.1012 0.1128 0.1073 0.0959 0.0976 0.1030 0.1084 0.1087 0.9 0.1975 0.1260 0.1427 0.1651 0.1566 0.1384 0.1410 0.1494 0.1578 0.1587 38 Table 1.6: Finite sample bias and RMSE of the various estimator of β1 , N = 25, T = 100, Bartlett kernel ρ1 ρ2 P-OLS P-IM P-DOLS Panel FM-OLS b=0.06 0.1 0.3 0.5 0.7 0.9 AND Panel A: Bias 0 0.3 0.6 0.9 0 0.0000 -0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0001 -0.0001 0.0000 0.3 0.0027 -0.0002 0.0000 0.0000 0.0002 0.0007 0.0011 0.0015 0.0017 0.0000 0.6 0.0054 -0.0002 0.0001 0.0000 0.0004 0.0015 0.0023 0.0030 0.0034 0.0000 0.9 0.0081 -0.0003 0.0000 0.0001 0.0006 0.0023 0.0035 0.0045 0.0052 0.0000 0 -0.0001 -0.0001 0.0000 0.0000 0.0000 -0.0001 -0.0001 -0.0001 -0.0001 -0.0001 0.3 0.0044 -0.0001 0.0001 0.0004 0.0005 0.0013 0.0019 0.0024 0.0028 0.0004 0.6 0.0089 -0.0001 0.0002 0.0008 0.0011 0.0027 0.0040 0.0050 0.0057 0.0008 0.9 0.0134 -0.0001 0.0002 0.0013 0.0017 0.0040 0.0060 0.0075 0.0086 0.0012 0 -0.0001 -0.0002 -0.0001 -0.0001 -0.0001 -0.0001 -0.0001 -0.0001 -0.0001 
-0.0001 0.3 0.0087 0.0004 0.0008 0.0023 0.0020 0.0029 0.0040 0.0050 0.0057 0.0024 0.6 0.0175 0.0009 0.0011 0.0047 0.0041 0.0059 0.0082 0.0101 0.0115 0.0049 0.9 0.0264 0.0015 0.0012 0.0071 0.0062 0.0089 0.0123 0.0151 0.0173 0.0074 0 -0.0003 -0.0009 -0.0001 -0.0003 -0.0002 -0.0003 -0.0003 -0.0003 -0.0003 -0.0003 0.3 0.0360 0.0130 0.0142 0.0263 0.0235 0.0194 0.0212 0.0238 0.0261 0.0269 0.6 0.0722 0.0270 0.0271 0.0530 0.0472 0.0391 0.0427 0.0479 0.0525 0.0540 0.9 0.1084 0.0409 0.0390 0.0796 0.0709 0.0588 0.0642 0.0720 0.0789 0.0811 0 0.0020 0.0035 0.0026 0.0020 0.0021 0.0021 0.0021 0.0021 0.0021 0.0020 0.3 0.0034 0.0035 0.0027 0.0020 0.0021 0.0023 0.0025 0.0026 0.0027 0.0020 0.6 0.0059 0.0035 0.0028 0.0021 0.0021 0.0027 0.0033 0.0038 0.0041 0.0021 0.9 0.0085 0.0035 0.0029 0.0021 0.0023 0.0033 0.0043 0.0051 0.0058 0.0021 Panel B: RMSE 0 0.3 0.6 0.9 0 0.0028 0.0050 0.0041 0.0029 0.0029 0.0030 0.0030 0.0030 0.0030 0.0029 0.3 0.0053 0.0050 0.0042 0.0029 0.0030 0.0033 0.0037 0.0040 0.0042 0.0029 0.6 0.0095 0.0050 0.0043 0.0031 0.0032 0.0042 0.0052 0.0060 0.0066 0.0030 0.9 0.0140 0.0050 0.0044 0.0032 0.0035 0.0053 0.0070 0.0083 0.0094 0.0032 0 0.0049 0.0087 0.0081 0.0050 0.0051 0.0052 0.0053 0.0052 0.0051 0.0050 0.3 0.0102 0.0087 0.0083 0.0056 0.0055 0.0061 0.0068 0.0073 0.0078 0.0056 0.6 0.0186 0.0087 0.0085 0.0070 0.0067 0.0081 0.0100 0.0116 0.0129 0.0072 0.9 0.0273 0.0088 0.0086 0.0089 0.0083 0.0107 0.0139 0.0165 0.0185 0.0092 0 0.0181 0.0328 0.0297 0.0186 0.0189 0.0196 0.0197 0.0195 0.0192 0.0186 0.3 0.0407 0.0354 0.0333 0.0326 0.0304 0.0279 0.0293 0.0311 0.0327 0.0330 0.6 0.0753 0.0426 0.0412 0.0568 0.0515 0.0445 0.0479 0.0526 0.0567 0.0578 0.9 0.1113 0.0527 0.0504 0.0827 0.0743 0.0631 0.0684 0.0759 0.0824 0.0843 39 Table 1.7: Empirical null rejection probabilities, 0.05 level, t-tests for h0 : β1 = 1, N = 5, data dependent bandwidths and lag lengths. 
ρ1 , ρ2 P-OLS Bartlett kernel P- P-FM DOLS QS kernel P- P- P- P- IM(O) IM(D) IM(Fb) DOLS P-FM P- P- P- IM(O) IM(D) IM(Fb) Panel A: T = 50 0 0.0514 0.1294 0.0762 0.0724 0.0544 0.0744 0.1264 0.0836 0.0808 0.0624 0.0422 0.3 0.2178 0.1792 0.1032 0.0974 0.0796 0.132 0.1632 0.0984 0.0936 0.0746 0.101 0.6 0.6064 0.2452 0.2214 0.1462 0.1212 0.2694 0.2176 0.1908 0.1246 0.109 0.1924 0.9 0.9366 0.6068 0.777 0.4804 0.422 0.7628 0.5736 0.7336 0.4462 0.4042 0.7052 0.0524 0.0458 Panel B: T = 100 0 0.0472 0.1416 0.0588 0.0586 0.0516 0.0524 0.1402 0.062 0.0626 0.3 0.6 0.2082 0.227 0.0794 0.0774 0.0678 0.1038 0.2158 0.0756 0.0726 0.0602 0.0866 0.5914 0.3256 0.1584 0.111 0.0952 0.1776 0.2976 0.1326 0.094 0.0784 0.1308 0.9 0.934 0.4804 0.7266 0.3376 0.286 0.6198 0.4494 0.6974 0.3112 0.2626 0.561 Table 1.8: Empirical null rejection probabilities, 0.05 level, t-tests for h0 : β1 = 1, N = 10, data dependent bandwidths and lag lengths. ρ1 , ρ2 P-OLS Bartlett kernel P- P-FM DOLS QS kernel P- P- P- P- IM(O) IM(D) IM(Fb) DOLS P-FM P- P- P- IM(O) IM(D) IM(Fb) Panel A: T = 50 0 0.051 0.1142 0.0654 0.0708 0.0642 0.0762 0.1116 0.0712 0.08 0.072 0.0516 0.3 0.2846 0.1652 0.0902 0.0982 0.0908 0.1418 0.1454 0.0848 0.0912 0.0852 0.1054 0.6 0.7924 0.2188 0.2422 0.1482 0.1352 0.3306 0.1898 0.1994 0.1256 0.1222 0.2892 0.9 0.9924 0.7518 0.9332 0.5832 0.5276 0.8576 0.7094 0.908 0.5464 0.5094 0.8362 0 0.0476 0.139 0.0542 0.0582 0.0534 0.0536 0.137 0.0558 0.0628 0.057 0.046 0.3 0.2882 0.2252 0.0758 0.0794 0.075 0.1096 0.21 0.0662 0.0728 0.0688 0.0908 Panel B: T = 100 0.6 0.787 0.308 0.1802 0.1098 0.0984 0.1776 0.281 0.1466 0.0938 0.0848 0.1366 0.9 0.9896 0.4982 0.9002 0.3888 0.3396 0.6724 0.4598 0.8742 0.3588 0.316 0.6254 40 Table 1.9: Empirical null rejection probabilities, 0.05 level, t-tests for h0 : β1 = 1, N = 25, data dependent bandwidths and lag lengths. 
ρ1 , ρ2 P-OLS Bartlett kernel P- P-FM DOLS QS kernel P- P- P- P- IM(O) IM(D) IM(Fb) DOLS P-FM P- P- P- IM(O) IM(D) IM(Fb) Panel A: T = 50 0 0.0482 0.1022 0.0566 0.0622 0.0592 0.0664 0.099 0.0622 0.0706 0.0682 0.0404 0.3 0.5116 0.1534 0.0822 0.0848 0.0826 0.1292 0.1358 0.0724 0.0796 0.079 0.0972 0.6 0.9786 0.2362 0.3852 0.1428 0.1328 0.335 0.2042 0.2952 0.1204 0.1202 0.2966 0.9 1 0.962 0.9976 0.8058 0.7674 0.9542 0.9448 0.9966 0.7804 0.7474 0.9476 0.1332 0.0562 0.0638 0.0612 0.046 Panel B: T = 100 0 0.0476 0.1354 0.0554 0.0612 0.0588 0.0558 0.3 0.5124 0.2032 0.0766 0.08 0.079 0.1106 0.188 0.067 0.0726 0.0716 0.0954 0.6 0.9744 0.2752 0.2708 0.1128 0.1056 0.1844 0.2478 0.2064 0.0988 0.095 0.1424 0.9 1 0.6172 0.9958 0.5474 0.5132 0.7944 0.5736 0.993 0.5224 0.4876 0.7528 Table 1.10: Empirical null rejection probabilities, 0.05 level, Wald-tests for h0 : β1 = 1, β2 = 1, N = 5, data dependent bandwidths and lag lengths. ρ1 , ρ2 P-OLS Bartlett kernel P- P-FM DOLS QS kernel P- P- P- P- IM(O) IM(D) IM(Fb) DOLS P-FM P- P- P- IM(O) IM(D) IM(Fb) 0.0696 0.04 Panel A: T = 50 0 0.05 0.1552 0.3 0.3026 0.2344 0.6 0.822 0.3362 0.9 0.9972 0.8098 0 0.049 0.3 0.0794 0.0804 0.0574 0.0794 0.1514 0.0938 0.1006 0.1254 0.1188 0.3098 0.1996 0.0952 0.1746 0.2086 0.1168 0.1156 0.088 0.122 0.161 0.3802 0.297 0.2578 0.167 0.1408 0.2664 0.9486 0.7018 0.6244 0.9414 0.7728 0.9202 0.6646 0.6006 0.9068 0.185 0.0626 0.0636 0.0478 0.056 0.1832 0.0686 0.071 0.052 0.0418 0.2898 0.3258 0.0908 0.0912 0.0772 0.1302 0.3048 0.085 0.0812 0.067 0.105 0.6 0.9 0.8276 0.4574 0.2068 0.1372 0.1142 0.2356 0.4216 0.1696 0.1168 0.0952 0.169 0.9982 0.6786 0.9156 0.5056 0.4282 0.8274 0.6352 0.8962 0.4684 0.3962 0.7724 Panel B: T = 100 41 Table 1.11: Empirical null rejection probabilities, 0.05 level, Wald-tests for h0 : β1 = 1, β2 = 1, N = 10, data dependent bandwidths and lag lengths. 
ρ1 , ρ2 P-OLS Bartlett kernel P- P-FM DOLS QS kernel P- P- P- P- IM(O) IM(D) IM(Fb) DOLS P-FM P- P- P- IM(O) IM(D) IM(Fb) Panel A: T = 50 0 0.0486 0.1396 0.0708 0.0798 0.0704 0.0874 0.134 0.0766 0.0928 0.078 0.049 0.3 0.4166 0.2162 0.109 0.1148 0.1038 0.1782 0.1886 0.099 0.1094 0.0976 0.127 0.6 0.9562 0.3076 0.3512 0.1914 0.1706 0.4722 0.2586 0.2786 0.1558 0.1494 0.4046 0.9 1 0.916 0.9952 0.8002 0.7398 0.9756 0.8894 0.9888 0.765 0.7182 0.9672 0.0608 0.0444 Panel B: T = 100 0 0.0484 0.1718 0.0602 0.0636 0.0564 0.0546 0.169 0.065 0.068 0.3 0.4212 0.311 0.085 0.6 0.9602 0.4354 0.2532 0.091 0.082 0.1342 0.2854 0.0762 0.0806 0.073 0.1084 0.1358 0.1186 0.2432 0.3916 0.1936 0.1106 0.0964 0.1782 0.9 1 0.688 0.9916 0.566 0.5024 0.8694 0.6454 0.9854 0.528 0.468 0.8304 Table 1.12: Empirical null rejection probabilities, 0.05 level, Wald-tests for h0 : β1 = 1, β2 = 1, N = 25, data dependent bandwidths and lag lengths. ρ1 , ρ2 P-OLS Bartlett kernel P- P-FM DOLS QS kernel P- P- P- P- IM(O) IM(D) IM(Fb) DOLS P-FM P- P- P- IM(O) IM(D) IM(Fb) Panel A: T = 50 0 0.0528 0.132 0.0656 0.073 0.0702 0.0836 0.1292 0.0702 0.085 0.0806 0.0472 0.3 0.6 0.7142 0.208 0.1128 0.9998 0.3328 0.5454 0.1078 0.105 0.1686 0.1796 0.0984 0.1016 0.0982 0.1202 0.1914 0.1786 0.484 0.2736 0.4302 0.1598 0.1608 0.4234 0.9 1 0.9986 1 0.9474 0.9264 0.9978 0.9972 1 0.9344 0.9138 0.9962 0 0.0538 0.1706 0.0606 0.0628 0.0596 0.058 0.1674 0.0628 0.069 0.065 0.046 0.3 0.7098 0.2822 0.0868 0.0876 0.0866 0.135 0.257 0.076 0.08 0.0786 0.1074 0.6 0.9998 0.4058 0.3832 0.1362 0.125 0.2498 0.3598 0.2922 0.1124 0.1054 0.1796 0.9 1 0.8108 1 0.7578 0.7186 0.9488 0.774 1 0.7304 0.6874 0.93 Panel B: T = 100 42 Table 1.13: Fixed-b asymptotic critical value for t-test of β in regression with intercept and two regressors, N = 25, Bartlett kernel b 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 95% 1.7329 1.9363 2.1731 2.4357 2.7227 3.0220 3.3221 3.6112 3.8807 4.1140 97.5% 2.0630 2.3079 2.5845 2.8934 3.2298 3.5864 3.9396 4.2836 
4.5986 4.8755 99% 2.4683 2.7561 3.0980 3.4713 3.8776 4.3055 4.7293 5.1508 5.5243 5.8506 99.5% 2.7275 3.0599 3.4300 3.8481 4.3085 4.7773 5.2548 5.7118 6.1411 6.5021 b 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.40 95% 4.3072 4.4702 4.6004 4.7217 4.8296 4.9306 5.0294 5.1291 5.2242 5.3160 97.5% 5.1085 5.2956 5.4627 5.5963 5.7331 5.8559 5.9673 6.0789 6.1903 6.3132 99% 6.1355 6.3751 6.5593 6.7345 6.8818 7.0326 7.1687 7.3157 7.4607 7.5930 99.5% 6.8015 7.0752 7.2862 7.4722 7.6352 7.7980 7.9644 8.1320 8.2883 8.4333 b 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.60 95% 5.4079 5.5070 5.5967 5.6805 5.7679 5.8447 5.9164 5.9942 6.0680 6.1366 97.5% 6.4189 6.5329 6.6388 6.7461 6.8517 6.9494 7.0410 7.1285 7.2094 7.2829 99% 7.7228 7.8553 7.9922 8.1188 8.2377 8.3674 8.4834 8.5767 8.6779 8.7842 99.5% 8.5820 8.7258 8.8732 9.0041 9.1516 9.2895 9.4089 9.5389 9.6542 9.7574 b 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.80 95% 6.2004 6.2597 6.3208 6.3811 6.4329 6.4885 6.5452 6.5982 6.6506 6.7020 97.5% 7.3600 7.4337 7.5122 7.5804 7.6560 7.7180 7.7900 7.8533 7.9185 7.9721 99% 8.8838 8.9709 9.0515 9.1402 9.2231 9.3065 9.3759 9.4616 9.5297 9.5959 99.5% 9.8619 9.9699 10.0559 10.1605 10.2384 10.3414 10.4381 10.5210 10.6046 10.6947 b 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.00 95% 6.7480 6.7906 6.8332 6.8817 6.9216 6.9658 7.0077 7.0485 7.0834 7.1205 97.5% 8.0248 8.0815 8.1347 8.1868 8.2422 8.2949 8.3433 8.3876 8.4348 8.4781 99% 9.6626 9.7342 9.8024 9.8573 9.9243 9.9824 10.0446 10.1029 10.1630 10.2177 99.5% 10.7659 10.8418 10.9134 10.9985 11.0667 11.1332 11.1945 11.2611 11.3375 11.4049 43 Table 1.14: Fixed-b asymptotic critical value for t-test of β in regression with intercept and two regressors, N = 25, QS kernel b 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 95% 1.7870 2.0829 2.4752 2.9884 3.6602 4.5278 5.5699 6.6972 7.7744 8.6329 97.5% 2.1270 2.4815 2.9371 3.5436 4.3513 5.3846 6.6572 8.0320 9.3128 10.3492 99% 2.5440 2.9713 3.5281 4.2744 5.2392 6.5058 8.0225 9.6946 11.2728 
12.5370 99.5% 2.8123 3.2967 3.9182 4.7454 5.8424 7.2470 8.9767 10.8163 12.6241 14.0653 b 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.40 95% 9.2475 9.6378 9.8866 10.0412 10.1446 10.2071 10.2604 10.2899 10.3154 10.3393 97.5% 11.1166 11.5874 11.8763 12.0800 12.2085 12.2906 12.3468 12.3913 12.4277 12.4530 99% 13.4557 14.0671 14.4533 14.7158 14.8841 14.9774 15.0857 15.1470 15.1848 15.2096 99.5% 15.1063 15.7767 16.2224 16.5170 16.6887 16.8283 16.9298 16.9828 17.0212 17.0680 b 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.60 95% 10.3527 10.3643 10.3739 10.3828 10.3897 10.3990 10.4029 10.4086 10.4133 10.4200 97.5% 12.4740 12.4893 12.5025 12.5136 12.5235 12.5329 12.5409 12.5420 12.5483 12.5543 99% 15.2323 15.2485 15.2564 15.2734 15.2875 15.2977 15.3124 15.3196 15.3307 15.3435 99.5% 17.0886 17.1114 17.1343 17.1512 17.1645 17.1798 17.1817 17.1869 17.2133 17.2270 b 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.80 95% 10.4231 10.4271 10.4279 10.4262 10.4278 10.4319 10.4349 10.4357 10.4382 10.4410 97.5% 12.5603 12.5697 12.5720 12.5781 12.5785 12.5849 12.5881 12.5913 12.5929 12.5952 99% 15.3526 15.3495 15.3554 15.3513 15.3575 15.3655 15.3730 15.3769 15.3827 15.3920 99.5% 17.2313 17.2400 17.2423 17.2405 17.2307 17.2383 17.2374 17.2485 17.2494 17.2503 b 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.00 95% 10.4437 10.4459 10.4456 10.4464 10.4469 10.4463 10.4480 10.4482 10.4474 10.4481 97.5% 12.5982 12.5994 12.6013 12.6013 12.6019 12.6023 12.6019 12.6021 12.6028 12.6043 99% 15.3955 15.3959 15.3964 15.3969 15.3977 15.4018 15.4060 15.4099 15.4127 15.4150 99.5% 17.2619 17.2727 17.2749 17.2764 17.2739 17.2792 17.2822 17.2859 17.2892 17.2920 44 Figure 1.1: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.3, Bartlett kernel 45 Figure 1.2: Empirical null rejections, t-test, N = 10, T = 100, ρ1 = ρ2 = 0.3, Bartlett kernel 46 Figure 1.3: Empirical null rejections, t-test, N = 25, T = 100, ρ1 = ρ2 = 0.3, Bartlett kernel 47 Figure 1.4: Empirical null rejections, t-test, N = 5, T 
= 50, ρ1 = ρ2 = 0.9, Bartlett kernel

Figure 1.5: Empirical null rejections, t-test, N = 10, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel

Figure 1.6: Empirical null rejections, t-test, N = 25, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel

Figure 1.7: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel

Figure 1.8: Empirical null rejections, t-test, N = 10, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel

Figure 1.9: Empirical null rejections, t-test, N = 25, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel

Figure 1.10: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.3, QS kernel

Figure 1.11: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.9, QS kernel

Figure 1.12: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.9, QS kernel

Figure 1.13: Size adjusted power, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, b = 0.3, QS kernel

Figure 1.14: Size adjusted power of panel IM, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, QS kernel

Figure 1.15: Size adjusted power of panel IM, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel

Figure 1.16: Size adjusted power, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, QS kernel

Figure 1.17: Size adjusted power, Wald test, N = 10, T = 50, ρ1 = ρ2 = 0.6, b = 0.3, QS kernel

Figure 1.18: Size adjusted power of panel IM, Wald test, N = 10, T = 50, ρ1 = ρ2 = 0.6, QS kernel

Figure 1.19: Size adjusted power, Wald test, N = 10, T = 50, ρ1 = ρ2 = 0.6, QS kernel

Proof of Theorem 1

By Assumption 2 above, we can define the stacked innovation vector $\eta_t = [u_{1t}, \cdots, u_{Nt}, v_{1t}', \cdots, v_{Nt}']'$, which has dimension $(N + Nk) \times 1$, and assume that

$$T^{-1/2}\sum_{t=1}^{[rT]} \eta_t \Rightarrow \Omega_\eta^{1/2} W(r) = \begin{bmatrix} B_u(r) \\ B_v(r) \end{bmatrix} = \begin{bmatrix} \Lambda_{11} W_1(r) + \Lambda_{12} W_2(r) \\ \Lambda_{22} W_2(r) \end{bmatrix},$$

where

$$\Omega_\eta^{1/2} = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} \\ 0_{Nk \times N} & \Lambda_{22} \end{bmatrix}, \qquad W(r) = \begin{bmatrix} W_1(r) \\ W_2(r) \end{bmatrix}.$$

The dimensions of the above matrices are as follows: $\Lambda_{11}$ is $N \times N$, $\Lambda_{12}$ is $N \times Nk$, $0_{Nk \times N}$ is an $Nk \times N$ zero matrix, and $\Lambda_{22}$ is $Nk \times Nk$; $W_1(r) = [w_{u,1}(r), \cdots, w_{u,N}(r)]'$ is an $N \times 1$ vector, and $W_2(r) = [W_{v,1}(r)', \cdots, W_{v,N}(r)']'$ is an $Nk \times 1$ vector.

The long run variance of $\eta_t$ is

$$\Omega_\eta = \Omega_\eta^{1/2}\left(\Omega_\eta^{1/2}\right)' = \sum_{j=-\infty}^{\infty} E\left(\eta_t \eta_{t-j}'\right).$$

Also we have

$$\Omega_\eta = \begin{bmatrix} \Omega^\eta_{11} & \Omega^\eta_{12} \\ \Omega^\eta_{21} & \Omega^\eta_{22} \end{bmatrix} = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} \\ 0_{Nk \times N} & \Lambda_{22} \end{bmatrix} \begin{bmatrix} \Lambda_{11}' & 0_{N \times Nk} \\ \Lambda_{12}' & \Lambda_{22}' \end{bmatrix} = \begin{bmatrix} \Lambda_{11}\Lambda_{11}' + \Lambda_{12}\Lambda_{12}' & \Lambda_{12}\Lambda_{22}' \\ \Lambda_{22}\Lambda_{12}' & \Lambda_{22}\Lambda_{22}' \end{bmatrix},$$

where $\Omega^\eta_{11} = \Lambda_{11}\Lambda_{11}' + \Lambda_{12}\Lambda_{12}'$ is the long run variance of $u_t = [u_{1t}, \cdots, u_{Nt}]'$, $\Omega^\eta_{22} = \Lambda_{22}\Lambda_{22}'$ is the long run variance of $v_t = [v_{1t}', \cdots, v_{Nt}']'$, and $\Omega^\eta_{12} = (\Omega^\eta_{21})' = \Lambda_{12}\Lambda_{22}'$ is the long run covariance of $u_t$ and $v_t$. From Assumption 1, we know that $\Lambda_{11} = I_N \otimes \sigma_{u \cdot v}$ is an $N \times N$ diagonal matrix, $\Lambda_{12} = I_N \otimes \lambda_{uv}$ is an $N \times Nk$ block diagonal matrix, and $\Lambda_{22} = I_N \otimes \Omega_{vv}^{1/2}$ is an $Nk \times Nk$ block diagonal matrix, where $\sigma_{u \cdot v}$ is a scalar, $\lambda_{uv}$ is a $1 \times k$ vector, and $\Omega_{vv}^{1/2}$ is a $k \times k$ matrix.

Using the diagonal scaling matrix

$$A_{1T} = \begin{bmatrix} T^{3/2} I_k & & 0 \\ & T^{1/2} I_k & \\ 0 & & T^{1/2} I_N \otimes G_D \end{bmatrix},$$

we have

$$T^{-1/2} A_{1T}\left(\tilde\theta - \theta\right) = A_{PIM}^{-1}\left(\tilde\theta - \theta\right) = \begin{bmatrix} T(\tilde\beta - \beta) \\ \tilde\gamma - \Omega_{vv}^{-1}\Omega_{vu} \\ G_D(\tilde\delta_1 - \delta_1) \\ \vdots \\ G_D(\tilde\delta_N - \delta_N) \end{bmatrix} = \left( T^{-1} \sum_{t=1}^{T}\sum_{i=1}^{N} A_{1T}^{-1} q_{it} q_{it}' A_{1T}^{-1} \right)^{-1} \left( T^{-1} \sum_{t=1}^{T}\sum_{i=1}^{N} A_{1T}^{-1} q_{it}\, T^{-1/2}\left(S^u_{it} - x_{it}'\gamma\right) \right).$$

By our assumptions, with $t = [rT]$,

$$A_{1T}^{-1} q_{it} = \begin{bmatrix} T^{-3/2} S^x_{i[rT]} \\ T^{-1/2} x_{i[rT]} \\ 0_{p \times 1} \\ \vdots \\ T^{-1/2} G_D^{-1} S^D_t \\ \vdots \\ 0_{p \times 1} \end{bmatrix} \Rightarrow \begin{bmatrix} \int_0^r B_{v,i}(s)\,ds \\ B_{v,i}(r) \\ 0_{p \times 1} \\ \vdots \\ \int_0^r D(s)\,ds \\ \vdots \\ 0_{p \times 1} \end{bmatrix} = \begin{bmatrix} \Omega_{vv}^{1/2} \int_0^r W_{v,i}(s)\,ds \\ \Omega_{vv}^{1/2} W_{v,i}(r) \\ 0_{p \times 1} \\ \vdots \\ \int_0^r D(s)\,ds \\ \vdots \\ 0_{p \times 1} \end{bmatrix} = \Pi g_{1,i}(r),$$

and it follows that

$$T^{-1} \sum_{t=1}^{T}\sum_{i=1}^{N} A_{1T}^{-1} q_{it} q_{it}' A_{1T}^{-1} \Rightarrow \int_0^1 \sum_{i=1}^{N} \Pi g_{1,i}(r) g_{1,i}(r)' \Pi'\, dr.$$
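The $T^{-3/2}$ normalization of the partial-sum regressor can be checked in a simple special case. The sketch below assumes a scalar random walk $x_t$ with i.i.d. unit-variance increments (an assumption made only for this illustration; the proof allows serially correlated innovations) and computes the exact variance of $T^{-3/2}\sum_{t \le T} x_t$, which approaches $1/3$, the variance of $\int_0^1 W(s)\,ds$. The function name is hypothetical.

```python
from fractions import Fraction


def scaled_partial_sum_variance(T):
    """Exact variance of T**(-3/2) * sum_{t<=T} x_t for a random walk x_t
    with i.i.d. unit-variance increments v_s (illustrative assumption).

    Since sum_t x_t = sum_s (T - s + 1) * v_s, the variance of the
    unscaled sum is sum_{j=1}^T j**2; dividing by T**3 applies the scaling.
    """
    return Fraction(sum(j * j for j in range(1, T + 1)), T ** 3)


for T in (10, 100, 1000):
    print(T, float(scaled_partial_sum_variance(T)))
```

The printed values shrink toward $1/3$ as $T$ grows, which is the $r = 1$ case of the variance $r^3/3$ of $\int_0^r W(s)\,ds$ for a scalar standard Brownian motion.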
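Also, by previous assumptions,

$$\begin{aligned} T^{-1/2}\left(S^u_{it} - x_{it}'\gamma\right) &= T^{-1/2} S^u_{it} - T^{-1/2} x_{it}'\gamma \\ &\Rightarrow B_{u,i}(r) - B_{v,i}(r)'\gamma \\ &= \sigma_{u \cdot v} w_{u,i}(r) + \lambda_{uv} W_{v,i}(r) - \left(\Omega_{vv}^{1/2} W_{v,i}(r)\right)'\gamma \\ &= \sigma_{u \cdot v} w_{u,i}(r) - W_{v,i}(r)' \left(\Omega_{vv}^{1/2}\right)' \left( \gamma - \left(\left(\Omega_{vv}^{1/2}\right)'\right)^{-1} \lambda_{uv}' \right) \\ &= \sigma_{u \cdot v} w_{u,i}(r) - W_{v,i}(r)' \left(\Omega_{vv}^{1/2}\right)' \left( \gamma - \left(\left(\Omega_{vv}^{1/2}\right)'\right)^{-1} \left(\Omega_{vv}^{1/2}\right)^{-1} \Omega_{vu} \right) \\ &= \sigma_{u \cdot v} w_{u,i}(r) - W_{v,i}(r)' \left(\Omega_{vv}^{1/2}\right)' \left( \gamma - \Omega_{vv}^{-1}\Omega_{vu} \right), \end{aligned}$$

so if $\gamma = \Omega_{vv}^{-1}\Omega_{vu}$, then

$$T^{-1/2}\left(S^u_{it} - x_{it}'\gamma\right) \Rightarrow \sigma_{u \cdot v} w_{u,i}(r).$$

Combining the above results, we have

$$\begin{aligned} \begin{bmatrix} T(\tilde\beta - \beta) \\ \tilde\gamma - \Omega_{vv}^{-1}\Omega_{vu} \\ G_D(\tilde\delta_1 - \delta_1) \\ \vdots \\ G_D(\tilde\delta_N - \delta_N) \end{bmatrix} = A_{PIM}^{-1}\left(\tilde\theta - \theta\right) &\Rightarrow \left( \int_0^1 \sum_{i=1}^{N} \Pi g_{1,i}(s) g_{1,i}(s)' \Pi'\, ds \right)^{-1} \int_0^1 \sum_{i=1}^{N} \Pi g_{1,i}(s)\, \sigma_{u \cdot v} w_{u,i}(s)\, ds \\ &= \sigma_{u \cdot v} (\Pi')^{-1} \left( \int_0^1 \sum_{i=1}^{N} g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \int_0^1 \sum_{i=1}^{N} g_{1,i}(s) w_{u,i}(s)\, ds \\ &= \sigma_{u \cdot v} (\Pi')^{-1} \left( \int_0^1 \sum_{i=1}^{N} g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \sum_{i=1}^{N} \int_0^1 \left[G_{1,i}(1) - G_{1,i}(s)\right] dw_{u,i}(s) = \Psi. \end{aligned}$$

For the sequential limit of $\tilde\beta - \beta$, we first let $T \to \infty$ and then let $N \to \infty$, so we have

$$\begin{aligned} \sqrt{N}\, T(\tilde\beta - \beta) &= \sqrt{N} \begin{bmatrix} I_k & 0_{k \times k} & 0_{k \times p} & \cdots & 0_{k \times p} \end{bmatrix} A_{PIM}^{-1}\left(\tilde\theta - \theta\right) \\ &= \begin{bmatrix} I_k & 0_{k \times k} & 0_{k \times p} & \cdots & 0_{k \times p} \end{bmatrix} \left( \frac{1}{NT} \sum_{t=1}^{T}\sum_{i=1}^{N} A_{1T}^{-1} q_{it} q_{it}' A_{1T}^{-1} \right)^{-1} \times \left( \frac{1}{\sqrt{N}} \frac{1}{T} \sum_{t=1}^{T}\sum_{i=1}^{N} A_{1T}^{-1} q_{it} \frac{1}{\sqrt{T}} \left(S^u_{it} - x_{it}'\gamma\right) \right) \\ &= \begin{bmatrix} I_k & 0_{k \times k} & 0_{k \times p} & \cdots & 0_{k \times p} \end{bmatrix} \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} A_{1T}^{-1} q_{it} q_{it}' A_{1T}^{-1} \right)^{-1} \times \left( \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} A_{1T}^{-1} q_{it} \frac{1}{\sqrt{T}} \left(S^u_{it} - x_{it}'\gamma\right) \right) \\ &\overset{T \to \infty}{\Longrightarrow} \sigma_{u \cdot v} \begin{bmatrix} I_k & 0_{k \times k} & 0_{k \times p} & \cdots & 0_{k \times p} \end{bmatrix} (\Pi')^{-1} \times \left( \frac{1}{N} \sum_{i=1}^{N} \int_0^1 g_{1,i}(r) g_{1,i}(r)'\, dr \right)^{-1} \left( \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \int_0^1 \left[G_{1,i}(1) - G_{1,i}(r)\right] dw_{u,i}(r) \right) \\ &\overset{N \to \infty}{\Longrightarrow} \Phi. \end{aligned}$$

In order to get the distribution of $\Phi$, we need to know the limit of the upper-left $k \times k$ block of $\frac{1}{N} \sum_{i=1}^{N} \int_0^1 g_{1,i}(r) g_{1,i}(r)'\, dr$ and the distribution of the upper $k \times 1$ block of $\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \int_0^1 [G_{1,i}(1) - G_{1,i}(r)]\, dw_{u,i}(r)$ as $N \to \infty$.

First, consider the limit of the upper-left $k \times k$ block of $\frac{1}{N} \sum_{i=1}^{N} \int_0^1 g_{1,i}(r) g_{1,i}(r)'\, dr$. Note that

$$\frac{1}{N} \sum_{i=1}^{N} \int_0^1 g_{1,i}(r) g_{1,i}(r)'\, dr = \int_0^1 \frac{1}{N} \sum_{i=1}^{N} g_{1,i}(r) g_{1,i}(r)'\, dr.$$

The relevant component for the sequential limit is the integral of the upper-left $k \times k$ block of the limit of $\frac{1}{N} \sum_{i=1}^{N} g_{1,i}(r) g_{1,i}(r)'$, which is given by

$$\begin{aligned} \frac{1}{N} \sum_{i=1}^{N} \int_0^r W_{v,i}(s)\,ds \int_0^r W_{v,i}(s)'\,ds &= \frac{1}{N} \sum_{i=1}^{N} \int_0^r \left( \int_0^r W_{v,i}(u)\,du \right) W_{v,i}(s)'\,ds \\ &= \frac{1}{N} \sum_{i=1}^{N} \int_0^r \int_0^r W_{v,i}(u) W_{v,i}(s)'\,du\,ds \\ &= \frac{1}{N} \sum_{i=1}^{N} \int_0^r \int_0^s W_{v,i}(u) W_{v,i}(s)'\,du\,ds + \frac{1}{N} \sum_{i=1}^{N} \int_0^r \int_s^r W_{v,i}(u) W_{v,i}(s)'\,du\,ds \\ &\to \int_0^r \int_0^s u\,du\,ds \cdot I_k + \int_0^r \int_s^r s\,du\,ds \cdot I_k \\ &= \frac{r^3}{3} I_k = A_1(r). \end{aligned}$$

Therefore, $\int_0^1 A_1(r)\,dr = (1/12) I_k$.

Second, consider the distribution of the upper $k \times 1$ block of

$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \int_0^1 \left[G_{1,i}(1) - G_{1,i}(r)\right] dw_{u,i}(r).$$

It has an asymptotic normal distribution with zero mean, conditional on $G_{1,i}(r)$ for all $i = 1, 2, \ldots, N$. So we only need to find its asymptotic variance. Also, recall that the units are cross-sectionally independent, so for fixed $N$ we have

$$\mathrm{var}\left( \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \int_0^1 \left[G_{1,i}(1) - G_{1,i}(r)\right] dw_{u,i}(r) \right) = \int_0^1 \frac{1}{N} \sum_{i=1}^{N} \left[G_{1,i}(1) - G_{1,i}(r)\right]\left[G_{1,i}(1) - G_{1,i}(r)\right]' dr.$$

Note that the variance of the upper $k \times 1$ block of $\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \int_0^1 [G_{1,i}(1) - G_{1,i}(r)]\, dw_{u,i}(r)$ is just the upper-left $k \times k$ block of

$$\int_0^1 \frac{1}{N} \sum_{i=1}^{N} \left[G_{1,i}(1) - G_{1,i}(r)\right]\left[G_{1,i}(1) - G_{1,i}(r)\right]' dr,$$

whose integrand has upper-left block given by

$$\begin{aligned} \frac{1}{N} \sum_{i=1}^{N} \int_r^1 \int_0^s W_{v,i}(u)\,du\,ds \int_r^1 \int_0^s W_{v,i}(u)'\,du\,ds &= \frac{1}{N} \sum_{i=1}^{N} \int_r^1 \int_0^s \int_r^1 \int_0^s W_{v,i}(v) W_{v,i}(u)'\,dv\,dt\,du\,ds \\ &= \frac{1}{N} \sum_{i=1}^{N} \int_r^1 \int_0^s \int_r^1 \int_0^u W_{v,i}(v) W_{v,i}(u)'\,dv\,dt\,du\,ds + \frac{1}{N} \sum_{i=1}^{N} \int_r^1 \int_0^s \int_r^1 \int_u^s W_{v,i}(v) W_{v,i}(u)'\,dv\,dt\,du\,ds \\ &\to \int_r^1 \int_0^s \int_r^1 \int_0^u v\,dv\,dt\,du\,ds \cdot I_k + \int_r^1 \int_0^s \int_r^1 \int_u^s u\,dv\,dt\,du\,ds \cdot I_k \\ &= \frac{1}{12} (1 - r)\left(1 - r^4\right) I_k = A_2(r). \end{aligned}$$

So, we have $\int_0^1 A_2(r)\,dr = (7/180) I_k$.

Using the above notation, the sequential asymptotic distribution $\Phi$ is given by

$$\Phi \sim N\left( 0,\; \sigma^2_{u \cdot v} \left(\left(\Omega_{vv}^{1/2}\right)'\right)^{-1} \left( \int_0^1 A_1(r)\,dr \right)^{-1} \left( \int_0^1 A_2(r)\,dr \right) \left( \int_0^1 A_1(r)\,dr \right)^{-1} \left(\Omega_{vv}^{1/2}\right)^{-1} \right).$$

We denote its variance as

$$V^{\beta}_{seq} = \sigma^2_{u \cdot v} \left(\left(\Omega_{vv}^{1/2}\right)'\right)^{-1} \left( \int_0^1 A_1(r)\,dr \right)^{-1} \left( \int_0^1 A_2(r)\,dr \right) \left( \int_0^1 A_1(r)\,dr \right)^{-1} \left(\Omega_{vv}^{1/2}\right)^{-1} = 5.6 \cdot \sigma^2_{u \cdot v}\, \Omega_{vv}^{-1},$$

where the constant follows from $12 \cdot (7/180) \cdot 12 = 5.6$.

Proof of Theorem 2

In Theorem 2, the Wald statistic considered is $\breve W$, with $\breve W \in \{\hat W, \tilde W, \tilde W^*\}$. These statistics differ only in the estimator used for the long run variance parameter, $\breve\sigma^2_{u \cdot v} \in \{\hat\sigma^2_{u \cdot v}, \tilde\sigma^2_{u \cdot v}, \tilde\sigma^{2*}_{u \cdot v}\}$. As in the proof of Theorem 1, $\tilde\theta$ represents the vector of panel IM-OLS estimators $(\tilde\delta', \tilde\beta', \tilde\gamma')'$, and $\theta$ denotes the vector $(\delta', \beta', \Omega_{vu}'\Omega_{vv}^{-1})'$. The estimator for $V_{PIM}$ is given by

$$\begin{aligned} \breve V_{PIM} &= \breve\sigma^2_{u \cdot v} \left( T^{-2} \sum_{t=1}^{T}\sum_{i=1}^{N} A_{PIM}\, q_{it} q_{it}' A_{PIM} \right)^{-1} \times \left( T^{-4} \sum_{t=1}^{T}\sum_{i=1}^{N} A_{PIM} \left(S^q_{iT} - S^q_{i,t-1}\right)\left(S^q_{iT} - S^q_{i,t-1}\right)' A_{PIM} \right) \times \left( T^{-2} \sum_{t=1}^{T}\sum_{i=1}^{N} A_{PIM}\, q_{it} q_{it}' A_{PIM} \right)^{-1} \\ &= \breve\sigma^2_{u \cdot v} \cdot \breve V, \end{aligned}$$

where $\breve V$ is the estimator for

$$V = (\Pi')^{-1} \left( \int_0^1 \sum_{i=1}^{N} g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \left( \int_0^1 \sum_{i=1}^{N} \left[G_{1,i}(1) - G_{1,i}(s)\right]\left[G_{1,i}(1) - G_{1,i}(s)\right]' ds \right) \times \left( \int_0^1 \sum_{i=1}^{N} g_{1,i}(s) g_{1,i}(s)'\, ds \right)^{-1} \Pi^{-1}.$$

Under the null hypothesis the Wald statistics and t statistics can be written as

$$\begin{aligned} \breve W &= \left(R\tilde\theta - r\right)' \left( R A_{PIM} \breve V_{PIM} A_{PIM} R' \right)^{-1} \left(R\tilde\theta - r\right) \\ &= \left(R(\tilde\theta - \theta)\right)' \left( R A_{PIM} \breve V_{PIM} A_{PIM} R' \right)^{-1} \left(R(\tilde\theta - \theta)\right) \\ &= \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1} (\tilde\theta - \theta) \right)' \left( A_R^{-1} R A_{PIM} \breve V_{PIM} A_{PIM} R' A_R^{-1} \right)^{-1} \times \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1} (\tilde\theta - \theta) \right), \end{aligned}$$

and

$$\breve t = \frac{R\tilde\theta - r}{\sqrt{R A_{PIM} \breve V_{PIM} A_{PIM} R'}} = \frac{A_R^{-1} R A_{PIM} A_{PIM}^{-1} (\tilde\theta - \theta)}{\sqrt{A_R^{-1} R A_{PIM} \breve V_{PIM} A_{PIM} R' A_R^{-1}}}.$$

Now, by assumption the restriction matrix fulfills

$$\lim_{T \to \infty} A_R^{-1} R A_{PIM} = R^*,$$

and

$$A_{PIM}^{-1} \left(\tilde\theta - \theta\right) \Rightarrow \Psi(V_{PIM})$$

under the null hypothesis. Therefore, in the case of consistent estimation of the conditional long run variance $\sigma^2_{u \cdot v}$, using $\hat V_{PIM}$ it follows that

$$\begin{aligned} \hat W &= \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1} (\tilde\theta - \theta) \right)' \left( A_R^{-1} R A_{PIM} \hat V_{PIM} A_{PIM} R' A_R^{-1} \right)^{-1} \times \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1} (\tilde\theta - \theta) \right) \\ &= \frac{\sigma^2_{u \cdot v}}{\hat\sigma^2_{u \cdot v}} \times \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1} (\tilde\theta - \theta) \right)' \left( A_R^{-1} R A_{PIM}\, \sigma^2_{u \cdot v} \breve V A_{PIM} R' A_R^{-1} \right)^{-1} \times \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1} (\tilde\theta - \theta) \right) \\ &\Rightarrow \left( R^* \Psi(V_{PIM}) \right)' \left( R^* V_{PIM} R^{*\prime} \right)^{-1} \left( R^* \Psi(V_{PIM}) \right) \sim \chi^2_q, \end{aligned}$$

and for $q = 1$, we have

$$\hat t \Rightarrow \left( R^* V_{PIM} R^{*\prime} \right)^{-1/2} R^* \Psi(V_{PIM}) \sim Z.$$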
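Next, we consider the asymptotic behavior of the test statistic $\tilde W$, which uses $\tilde\sigma^2_{u\cdot v}$. From the construction of $\tilde\sigma^2_{u\cdot v}$, we know that it is an estimator based on $\Delta\tilde S^u_{it}$, which is the first difference of $\tilde S^u_{it}$, where $\tilde S^u_{it} = S^y_{it} - S^{D\prime}_t\tilde\delta_i - S^{x\prime}_{it}\tilde\beta - x_{it}'\tilde\gamma$. Then, we have

\[
\begin{aligned}
\Delta\tilde S^u_{it} &= \Delta S^y_{it} - \Delta S^{D\prime}_t\tilde\delta_i - \Delta S^{x\prime}_{it}\tilde\beta - \Delta x_{it}'\tilde\gamma \\
&= y_{it} - D_t'\tilde\delta_i - x_{it}'\tilde\beta - v_{it}'\tilde\gamma \\
&= D_t'\delta_i + x_{it}'\beta + u_{it} - D_t'\tilde\delta_i - x_{it}'\tilde\beta - v_{it}'\tilde\gamma \\
&= u_{it} - v_{it}'\gamma - v_{it}'(\tilde\gamma-\gamma) - D_t'(\tilde\delta_i-\delta_i) - x_{it}'(\tilde\beta-\beta) \\
&= u^+_{it} - v_{it}'(\tilde\gamma-\gamma) - D_t'(\tilde\delta_i-\delta_i) - x_{it}'(\tilde\beta-\beta),
\end{aligned}
\]

where $u^+_{it} = u_{it} - v_{it}'\gamma$. It can be shown that the last two terms can be neglected for long run variance estimation based on $\Delta\tilde S^u_{it}$. Thus, the long run variance estimator based on $\Delta\tilde S^u_{it}$, that is $\tilde\sigma^2_{u\cdot v}$, asymptotically coincides with the long run variance estimator of $u^+_{it} - v_{it}'(\tilde\gamma-\gamma)$.

Define $\eta^+_{it} = [u^+_{it}, v_{it}']'$. Its long run variance is

\[
\Omega^+ = \begin{bmatrix} \sigma^2_{u\cdot v} & 0 \\ 0 & \Omega_{vv} \end{bmatrix},
\]

so an infeasible long run variance estimator $\hat\Omega^+_i$, using the unobserved $\eta^+_{it}$, is consistent: $\hat\Omega^+_i \xrightarrow{p} \Omega^+$. Note that $u^+_{it} - v_{it}'(\tilde\gamma-\gamma) = [1, -(\tilde\gamma-\gamma)']\,\eta^+_{it}$, so the HAC estimator, $\tilde\Omega^+_i$, for $u^+_{it} - v_{it}'(\tilde\gamma-\gamma)$ can be written as

\[
\tilde\Omega^+_i = \begin{bmatrix} 1 & -(\tilde\gamma-\gamma)' \end{bmatrix} \hat\Omega^+_i \begin{bmatrix} 1 \\ -(\tilde\gamma-\gamma) \end{bmatrix}
\]

with

\[
(\tilde\gamma-\gamma) \Rightarrow \begin{bmatrix} 0_{k\times k} & I_k & 0_{k\times p} & \cdots & 0_{k\times p} \end{bmatrix} \sigma_{u\cdot v}\,(\Pi')^{-1} \left( \sum_{i=1}^N \int_0^1 g_{1,i}(s) g_{1,i}(s)'\,ds \right)^{-1} \sum_{i=1}^N \int_0^1 [G_{1,i}(1) - G_{1,i}(s)]\,dw_{u,i}(s) = \sigma_{u\cdot v}\,\Omega_{vv}^{-1/2} d_\gamma,
\]

where $d_\gamma$ is the second $k\times 1$ block of

\[
\left( \sum_{i=1}^N \int_0^1 g_{1,i}(s) g_{1,i}(s)'\,ds \right)^{-1} \sum_{i=1}^N \int_0^1 [G_{1,i}(1) - G_{1,i}(s)]\,dw_{u,i}(s).
\]

This implies that $\tilde\Omega^+_i$ will converge to

\[
\begin{bmatrix} 1 & -\sigma_{u\cdot v}\, d_\gamma' (\Omega_{vv}^{-1/2})' \end{bmatrix}
\begin{bmatrix} \sigma^2_{u\cdot v} & 0 \\ 0 & \Omega_{vv} \end{bmatrix}
\begin{bmatrix} 1 \\ -\sigma_{u\cdot v}\, \Omega_{vv}^{-1/2} d_\gamma \end{bmatrix}
= \sigma^2_{u\cdot v} + \sigma^2_{u\cdot v}\, d_\gamma' d_\gamma = \sigma^2_{u\cdot v}\left(1 + d_\gamma' d_\gamma\right).
\]

So we have

\[
\tilde\sigma^2_{u\cdot v} = \frac{1}{N}\sum_{i=1}^N \tilde\Omega^+_i \Rightarrow \sigma^2_{u\cdot v}\left(1 + d_\gamma' d_\gamma\right).
\]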
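This implies that

\[
\begin{aligned}
\tilde W &= \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1}(\tilde\theta-\theta) \right)' \left( A_R^{-1} R A_{PIM} \tilde V_{PIM} A_{PIM} R' A_R^{-1} \right)^{-1} \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1}(\tilde\theta-\theta) \right) \\
&= \frac{\sigma^2_{u\cdot v}}{\tilde\sigma^2_{u\cdot v}} \times \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1}(\tilde\theta-\theta) \right)' \left( A_R^{-1} R A_{PIM}\, \sigma^2_{u\cdot v} \breve V A_{PIM} R' A_R^{-1} \right)^{-1} \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1}(\tilde\theta-\theta) \right) \\
&\Rightarrow \frac{\left(R^*\Psi(V_{PIM})\right)' \left(R^* V_{PIM} R^{*\prime}\right)^{-1} \left(R^*\Psi(V_{PIM})\right)}{1 + d_\gamma' d_\gamma} \sim \frac{\chi^2_q}{1 + d_\gamma' d_\gamma},
\end{aligned}
\]

and when $q = 1$, we have

\[
\tilde t \Rightarrow \frac{Z}{\sqrt{1 + d_\gamma' d_\gamma}}.
\]

For the fixed-b test statistic, which uses $\tilde\sigma^{2*}_{u\cdot v}$, we have

\[
\begin{aligned}
\tilde W^* &= \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1}(\tilde\theta-\theta) \right)' \left( A_R^{-1} R A_{PIM} \tilde V^*_{PIM} A_{PIM} R' A_R^{-1} \right)^{-1} \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1}(\tilde\theta-\theta) \right) \\
&= \frac{\sigma^2_{u\cdot v}}{\tilde\sigma^{2*}_{u\cdot v}} \times \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1}(\tilde\theta-\theta) \right)' \left( A_R^{-1} R A_{PIM}\, \sigma^2_{u\cdot v} \breve V A_{PIM} R' A_R^{-1} \right)^{-1} \left( A_R^{-1} R A_{PIM} A_{PIM}^{-1}(\tilde\theta-\theta) \right) \\
&\Rightarrow \frac{\left(R^*\Psi(V_{PIM})\right)' \left(R^* V_{PIM} R^{*\prime}\right)^{-1} \left(R^*\Psi(V_{PIM})\right)}{\frac{1}{N}\sum_{i=1}^N Q^*_i(b)} \sim \frac{\chi^2_q}{\frac{1}{N}\sum_{i=1}^N Q^*_i(b)},
\end{aligned}
\]

and when $q = 1$, we have

\[
\tilde t^* \Rightarrow \frac{Z}{\sqrt{\frac{1}{N}\sum_{i=1}^N Q^*_i(b)}}.
\]

Note that the numerator and the denominator of the limiting distribution are independent: Vogelsang and Wagner [2014] prove that the numerator is independent of $Q^*_i(b)$ for all $i = 1, 2, \cdots, N$, so it follows that the numerator is independent of the average $\frac{1}{N}\sum_{i=1}^N Q^*_i(b)$.

Due to this independence, if we know $\mu_Q = E[Q^*_i(b)]$, then as $T \to \infty$ followed by $N \to \infty$, we have the following sequential limit results:

\[
\tilde W^*_{\mu_Q} = \left(T(R\tilde\beta - r)\right)' \left( \frac{\tilde\sigma^{2*}_{u\cdot v}}{\mu_Q} R^* \hat V R^{*\prime} \right)^{-1} \left(T(R\tilde\beta - r)\right) = \mu_Q \cdot \tilde W^* \Rightarrow \frac{\chi^2_q}{\frac{1}{N}\sum_{i=1}^N \frac{1}{\mu_Q} Q^*_i(b)} \xrightarrow{P} \chi^2_q
\]

as $N \to \infty$. Also, when $q = 1$, we similarly have

\[
\tilde t^*_{\mu_Q} = \sqrt{\mu_Q}\cdot \tilde t^* \Rightarrow \frac{Z}{\sqrt{\frac{1}{N}\sum_{i=1}^N \frac{1}{\mu_Q} Q^*_i(b)}} \xrightarrow{P} Z
\]

as $N \to \infty$.

Chapter 2

Hypothesis testing in cointegrated panels: Asymptotic and Bootstrap method

This paper compares asymptotic and bootstrap hypothesis tests in cointegrated panels with cross-sectionally uncorrelated units and endogenous regressors. All the tests are based on the panel integrated modified ordinary least squares (panel IM-OLS) estimator from Vogelsang et al. [2016].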
The aim of using the bootstrap tests is to address the finite sample size distortions of the fixed-b tests. Finite sample simulations show that the bootstrap tests have smaller size distortions than the asymptotic tests. In general, the stationary bootstrap performs better than the conditional-on-regressors bootstrap, although in some cases the conditional-on-regressors bootstrap has smaller size distortions. The improvement in size comes with only minor power losses, which can be ignored when the sample size is large.

2.1 Introduction

The bootstrap has become common in econometric analysis, especially in hypothesis testing. The basic idea of a hypothesis test is to compare the observed value of a test statistic with the distribution that it would follow if the null hypothesis were true. If that distribution is known, then we can perform exact tests. However, in many cases of interest, the distribution of the test statistic is known only asymptotically or depends on nuisance parameters. In many such cases, bootstrap hypothesis testing works well because the bootstrap statistics converge to the same asymptotic distributions as the sample statistics. Therefore, limit distributions that depend on nuisance parameters can be approximated by bootstrap simulations, which makes inference feasible.

The purpose of the present paper is to compare the fixed-b asymptotic hypothesis test with two bootstrap hypothesis tests, the conditional-on-regressors bootstrap test and the stationary bootstrap test, for panel cointegrated regressions with endogenous regressors.
When the regressors are endogenous, it is well known that a variety of methods, such as panel fully modified ordinary least squares (panel FM-OLS), panel dynamic ordinary least squares (panel DOLS) and panel integrated modified ordinary least squares (panel IM-OLS), deliver estimators that have zero mean Gaussian mixture limiting distributions, which in turn allow asymptotic inference to be carried out (see Kao and Chiang [2000], Pedroni [2000], Bai et al. [2009], Mark and Sul [2003], Vogelsang et al. [2016]). Among those methodologies, panel IM-OLS relies on fixed-b asymptotic theory. Compared with traditional asymptotic theory, fixed-b asymptotic theory captures the impact of kernel and bandwidth choices on the sampling distributions of HAC-type test statistics. However, both asymptotic theories often provide poor approximations to the distributions of the associated test statistics in finite samples, which leads to size distortions. To improve the quality of finite sample inference by reducing size distortions, the bootstrap method is considered in this paper.

Although bootstrap methods are widely employed for analyzing nonstationary time series data, surprisingly few studies are devoted to bootstrap inference in cointegrated regressions. Li and Maddala [1997] investigated the usefulness of bootstrap methods for small sample inference in cointegrated regression models. Their simulation results showed that the substantial size distortions of the asymptotic tests can be corrected by properly implemented bootstrap methods. Psaradakis [2001] applied the sieve bootstrap procedure to cointegrated regressions, and his simulation study demonstrated the small-sample superiority of the sieve bootstrap over both the traditional asymptotic approximation and the blockwise bootstrap. Chang et al. [2006] considered the sieve bootstrap based on a VAR model for cointegrated regressions.
They established bootstrap consistency for both OLS and DOLS, which leads to valid bootstrap inference. Shin and Hwang [2013] applied the stationary bootstrap to cointegrated regressions. They established the limiting distribution of the bootstrap ordinary least squares estimator (OLSE) as well as the limiting null distribution of the bootstrap Wald-type test regarding the cointegration parameter. The finite sample size and power properties of the bootstrap test were also studied by Monte Carlo simulation. Note that in the above literature, the bootstrap methods are applied in a pure time series setting.

The contribution of this paper is twofold. First, the results complement the existing literature by applying bootstrap inference to panel cointegrated regressions. Second, comparisons are made between bootstrap and fixed-b methods for inference using panel IM-OLS. Bootstrap methods are applied to a cointegrated panel with uncorrelated cross-sectional units and homogeneous second order moments. Finite sample size and power properties of the bootstrap tests are studied by Monte Carlo simulation.

The bootstrap methods applied in this paper are the conditional-on-regressors bootstrap and the stationary bootstrap. We do not consider the sieve bootstrap for two reasons. First, even though the sieve bootstrap can be applied in fairly general models and performs well in a pure time series setting, Smeekes and Urbain [2014] questioned the validity of the VAR sieve bootstrap in panels with a moderate cross-sectional dimension. Second, when estimating the models and carrying out inference, we do not assume that the error terms follow AR or VAR models.

The rest of the paper is organized as follows. Section 2.2 introduces the model, assumptions and asymptotic inference based on the panel IM-OLS estimators. In Section 2.3, the conditional-on-regressors bootstrap and stationary bootstrap procedures are presented.
Section 2.4 provides a simulation study that compares the size and power of the bootstrap tests with those of the fixed-b asymptotic test. Section 2.5 summarizes the results and concludes the paper.

2.2 The model, assumptions and asymptotic inference

2.2.1 The model and assumptions

Consider the panel data model given by

\[ y_{it} = D_t'\delta + x_{it}'\beta + u_{it} \tag{2.1} \]
\[ x_{it} = x_{it-1} + v_{it} \tag{2.2} \]

where $i = 1, 2, \cdots, N$ and $t = 1, 2, \cdots, T$ index the cross-sectional and time series units respectively; $y_{it}$ and $u_{it}$ are scalars; $D_t$ is the deterministic component, and $\delta$ is a $p \times 1$ vector; $x_{it}$, $v_{it}$ and $\beta$ are $k \times 1$ vectors. Suppose that $\eta_{it} = [u_{it}, v_{it}']'$ is a $(k+1)$-dimensional stationary vector process for every $i$. Then the model introduced in (2.1) describes a system of panel cointegrated regressions, i.e. $y_{it}$ is cointegrated with $x_{it}$.

In the above system, we are interested in inference about $\beta$ based on the panel IM-OLS estimator. Before we define the panel IM-OLS estimator of $\beta$, we make the following assumptions.

Assumption 3. Assume that $\{\eta_{it}\}_{i=1}^N$ are cross-sectionally uncorrelated and that second order moments are constant across $i$.

Note that Assumption 3 only requires the panel to be homogeneous in its second order moments; higher order moment structures may be heterogeneous across $i$.

Assumption 4. Assume that for all $i$, $\eta_{it}$ is a stationary process and satisfies a functional central limit theorem (FCLT) of the form

\[ T^{-1/2} \sum_{t=1}^{[rT]} \eta_{it} \Rightarrow B_i(r) = \Omega^{1/2} W_i(r), \quad r \in (0, 1]. \]

In Assumption 4, $[rT]$ represents the integer part of $rT$, and $W_i(r)$ is a $(k+1) \times 1$ vector of independent standard Brownian motions. $\Omega^{1/2}$ is a $(k+1) \times (k+1)$ matrix that satisfies $\Omega = \Omega^{1/2}(\Omega^{1/2})'$, and

\[ \Omega = \sum_{j=-\infty}^{\infty} E\left(\eta_{it}\eta_{it-j}'\right) = \begin{bmatrix} \Omega_{uu} & \Omega_{uv} \\ \Omega_{vu} & \Omega_{vv} \end{bmatrix} > 0, \]

where it is clear that $\Omega_{uv} = \Omega_{vu}'$. The assumption $\Omega_{vv} > 0$ rules out cointegration in $x_{it}$.
Partition $B_i(r)$ as $B_i(r) = [B_{u,i}(r), B_{v,i}(r)']'$, and likewise partition $W_i(r)$ as $W_i(r) = [w_{u,i}(r), W_{v,i}(r)']'$, where $w_{u,i}(r)$ and $W_{v,i}(r)$ are a scalar and a $k$-dimensional standard Brownian motion respectively. Using the Cholesky form of $\Omega^{1/2}$,

\[ \Omega^{1/2} = \begin{bmatrix} \sigma_{u\cdot v} & \lambda_{uv} \\ 0_{k\times 1} & \Omega_{vv}^{1/2} \end{bmatrix}, \]

it can be shown that $\sigma^2_{u\cdot v} = \Omega_{uu} - \Omega_{uv}\Omega_{vv}^{-1}\Omega_{vu}$ and $\lambda_{uv} = \Omega_{uv}\Omega_{vv}^{-1/2}$. By this Cholesky decomposition, we can write

\[ B_i(r) = \begin{bmatrix} B_{u,i}(r) \\ B_{v,i}(r) \end{bmatrix} = \begin{bmatrix} \sigma_{u\cdot v} w_{u,i}(r) + \lambda_{uv} W_{v,i}(r) \\ \Omega_{vv}^{1/2} W_{v,i}(r) \end{bmatrix}. \]

Assumption 5. For the deterministic component, $D_t$, assume that there is a $p \times p$ matrix $G_D$ and a vector of functions, $D(s)$, such that

\[ \lim_{T\to\infty} \sqrt{T}\, G_D^{-1} D_{[sT]} = D(s) \quad \text{with} \quad 0 < \int_0^r D(s)D(s)'\,ds < \infty, \quad 0 < \]

[...]

\[ p^*(\breve W) = \frac{1}{B}\sum_{j=1}^B I\left(\breve W^*_{CBS,j} > \breve W\right) \]

where $I(\cdot)$ is the indicator function. Reject the null hypothesis if the equal tail bootstrap p-value is less than 5%.

2.3.2 Stationary bootstrap

The stationary bootstrap, proposed by Politis and Romano [1994], is a special type of block bootstrap in which the block size follows a geometric distribution instead of being a fixed number. For a geometric distribution with parameter $p_T$, the expected block size of the stationary bootstrap is $1/p_T$. The stationary bootstrap has been used in the literature on unit root tests, cointegration tests and cointegrated regression inference (see Swensen [2003], Paparoditis and Politis [2005], Parker et al. [2006], Shin [2015] and Shin and Hwang [2013]). It captures the serial correlation structure in the original sample through block resampling, and it produces stationary bootstrap samples. A formal description of the stationary bootstrap inference procedure is given below.

1. Calculate the residuals as
\[ \hat S^u_{it} = S^y_{it} - S^{D\prime}_t\hat\delta - S^{x\prime}_{it}\hat\beta - x_{it}'\hat\gamma \]
where $\hat\delta$, $\hat\beta$ and $\hat\gamma$ are the panel IM-OLS estimators.

2. Define $\hat\eta_{it} = [\Delta\hat S^u_{it}, \Delta x_{it}']'$ for $t = 1, 2, \cdots, T$, where $\hat S^u_{i0} = 0$ and $x_{i0}$ is a zero vector for all $i$.

3. Resample the series $\{\hat\eta_{it}\}_{t=1}^T$ via the stationary bootstrap, obtaining $\{\hat\eta^*_{it}\}_{t=1}^T$.

4. Partition $\hat\eta^*_{it} = [u^*_{it}, v^{*\prime}_{it}]'$ analogously to $\eta_{it} = [u_{it}, v_{it}']'$. Obtain the bootstrap samples $\{x^*_{it}\}_{t=1}^T$ by
\[ x^*_{it} = \sum_{j=1}^t v^*_{ij}, \]
and generate the bootstrap samples $\{y^*_{it}\}_{t=1}^T$ from
\[ y^*_{it} = D_t'\hat\delta + x^{*\prime}_{it}\hat\beta + u^*_{it}. \tag{2.6} \]

5. Define the bootstrap statistics as
\[ \breve t^*_{SBS} = \frac{R\tilde\theta^* - R\hat\theta}{\sqrt{R A_{PIM} \breve V^*_{PIM} A_{PIM} R'}} \]
\[ \breve W^*_{SBS} = \left(R\tilde\theta^* - R\hat\theta\right)' \left(R A_{PIM} \breve V^*_{PIM} A_{PIM} R'\right)^{-1} \left(R\tilde\theta^* - R\hat\theta\right) \]
where $\tilde\theta^*$ is the bootstrap panel IM-OLS estimator from regression (2.6), and $\breve V^*_{PIM}$ is constructed exactly as $\breve V_{PIM}$ but using the bootstrap data.

6. Repeat steps 3-5 independently $B$ times to obtain samples $\{\breve t^*_{SBS,j}\}_{j=1}^B$ and $\{\breve W^*_{SBS,j}\}_{j=1}^B$.

7. Compute the equal tail bootstrap p-value as
\[ p^*(\breve t) = 2\min\left\{ \frac{1}{B}\sum_{j=1}^B I\left(\breve t^*_{SBS,j} \le \breve t\right),\ \frac{1}{B}\sum_{j=1}^B I\left(\breve t^*_{SBS,j} > \breve t\right) \right\} \]
\[ p^*(\breve W) = \frac{1}{B}\sum_{j=1}^B I\left(\breve W^*_{SBS,j} > \breve W\right) \]
where $I(\cdot)$ is the indicator function. Reject the null hypothesis if the equal tail bootstrap p-value is less than 5%.

Note that step 1 in both Section 2.3.1 and Section 2.3.2 is based on regression (2.3), which includes the augmented regressor $x_{it}$. It might be worth exploring how the bootstrap works if the residuals are calculated from the non-augmented partial sum regression. That is, for both the stationary and the conditional-on-regressors bootstrap, the residuals are obtained from

\[ \hat S^u_{it} = S^y_{it} - S^{D\prime}_t\hat\delta - S^{x\prime}_{it}\hat\beta \tag{2.7} \]

where $\hat\delta$ and $\hat\beta$ are the panel IM-OLS estimators, and all other steps are the same as in the corresponding procedures. Using these residuals, the stationary bootstrap resampling and the conditional-on-regressors bootstrap resampling become more comparable, and the resamples could capture some of the endogeneity. In the next section, we provide the bootstrap results based on Section 2.3.1 and Section 2.3.2 as well as the bootstrap results based on the residuals from regression (2.7).
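The resampling and p-value steps of the stationary bootstrap procedure (steps 3, 4 and 7) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the simulations: the function names, the array layout (`eta_hat` of shape N x T x (k+1), with the differenced residual in the first column) and the unit-by-unit resampling loop are all hypothetical choices. Computing the panel IM-OLS estimates and the statistics in step 5 is outside the sketch.

```python
import numpy as np

def stationary_bootstrap_indices(T, p, rng):
    """Time indices for one stationary bootstrap draw of length T.

    With probability p a new block starts at a uniformly drawn position;
    otherwise the next observation (wrapping around the sample) is taken,
    so block lengths are geometric with mean 1/p.
    """
    idx = np.empty(T, dtype=int)
    idx[0] = rng.integers(T)
    for t in range(1, T):
        if rng.random() < p:
            idx[t] = rng.integers(T)          # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % T     # continue the current block
    return idx

def bootstrap_panel(eta_hat, D, delta_hat, beta_hat, p, rng):
    """Steps 3-4: resample eta_hat unit by unit and rebuild (y*, x*).

    eta_hat: (N, T, k+1), column 0 = differenced residual, rest = Delta x.
    D: (T, p_d) deterministic regressors (assumed layout).
    """
    N, T, k1 = eta_hat.shape
    y_star = np.empty((N, T))
    x_star = np.empty((N, T, k1 - 1))
    for i in range(N):
        idx = stationary_bootstrap_indices(T, p, rng)
        eta_star = eta_hat[i, idx]
        u_star, v_star = eta_star[:, 0], eta_star[:, 1:]
        x_star[i] = np.cumsum(v_star, axis=0)            # x*_it = sum of v*_ij
        y_star[i] = D @ delta_hat + x_star[i] @ beta_hat + u_star
    return y_star, x_star

def equal_tail_pvalue(t_obs, t_boot):
    """Step 7 for the t statistic: 2 * min(lower share, upper share)."""
    t_boot = np.asarray(t_boot)
    return 2 * min(np.mean(t_boot <= t_obs), np.mean(t_boot > t_obs))

def upper_tail_pvalue(w_obs, w_boot):
    """Step 7 for the Wald statistic: share of draws above the sample value."""
    return np.mean(np.asarray(w_boot) > w_obs)
```

In use, one would call `bootstrap_panel` B times, recompute the statistic on each bootstrap panel, and pass the B values to the p-value functions.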
2.4 Finite sample simulations

In this section, we compare the finite sample size and power performance of the bootstrap tests with that of the asymptotic tests based on the panel IM-OLS estimators. The data generating process is the same as in Vogelsang et al. [2016], which is given by

\[ y_{it} = \mu + x_{1it}\beta_1 + x_{2it}\beta_2 + u_{it} \]
\[ x_{1it} = x_{1i,t-1} + v_{1it} \]
\[ x_{2it} = x_{2i,t-1} + v_{2it} \]

where for all $i = 1, 2, \cdots, N$, $u_{i0} = 0$, $x_{1i0} = x_{2i0} = 0$, and

\[ u_{it} = \rho_1 u_{i,t-1} + \epsilon_{it} + \rho_2 (e_{1it} + e_{2it}) \]
\[ v_{1it} = e_{1it} + 0.5\, e_{1i,t-1} \]
\[ v_{2it} = e_{2it} + 0.5\, e_{2i,t-1} \]

where $\epsilon_{it}$, $e_{1it}$ and $e_{2it}$ are i.i.d. standard normal random variables independent of each other. The parameter values are $\mu = 3$, $\beta_1 = \beta_2 = 1$. In addition, we use $\rho_1, \rho_2 \in \{0.6, 0.9\}$. The parameter $\rho_1$ controls serial correlation in the regression error, and $\rho_2$ determines the endogeneity of the regressors. In this paper, we only provide results where both $\rho_1$ and $\rho_2$ are relatively large because, according to the findings in Vogelsang et al. [2016], if $\rho_1$ and $\rho_2$ are relatively small ($\rho_1 = \rho_2 = 0.3$), there are only minor size distortions for the fixed-b asymptotic tests. Therefore, the bootstrap method is not necessary when $\rho_1$ and $\rho_2$ are small. The kernel function used in this simulation study is the Bartlett kernel, and the bandwidths are given by $M = bT$ with $b \in \{0.06, 0.1, 0.3, 0.5, 0.7, 0.9, 1\}$. We use $p_T = 0.02(T/50)^{-1/3}$ as the block length parameter in the stationary bootstrap (see footnote 2). The sample sizes are $N = 5$, $T \in \{50, 100\}$. The number of bootstrap replications is $B = 399$, and the number of simulation replications is 1000.

Footnote 2. Politis and White (2004, 2009) considered estimators constructed via the stationary bootstrap to approximate the sampling distribution of the mean of a finite sample from a (strictly) stationary real-valued sequence. They showed that the optimal block length parameter minimizing the MSE of the stationary bootstrap sample mean is $cT^{-1/3}$ for some constant $c$. In addition, Shin and Hwang (2013) considered the bootstrap ordinary least squares estimator for cointegrating regressions. They established the large sample validity of a bootstrap test regarding cointegration parameters and showed that the block length parameter $0.02(T/50)^{-1/3}$ provides stable size performance for the stationary bootstrap test.

Using the simulation design above, we only report results for cases where $\rho_1 = \rho_2$. The results include t-statistics for testing the null hypothesis $H_0: \beta_1 = 1$ and Wald statistics for testing the joint null hypothesis $H_0: \beta_1 = 1, \beta_2 = 1$. The asymptotic panel IM-OLS statistics were implemented in two ways. The first uses $\tilde\sigma^2_{u\cdot v}$ and is labeled panel IMOLS(D), and the second uses $\tilde\sigma^{2*}_{u\cdot v}$ and is labeled panel IMOLS(fb). The bootstrap panel IM-OLS statistics were implemented in four ways. The first two statistics are based on the bootstrapped $\tilde\sigma^2_{u\cdot v}$ and are labeled Cond-BS IMOLS(D) and Stat-BS IMOLS(D) for the conditional-on-regressors bootstrap and the stationary bootstrap respectively. The second two statistics are based on the bootstrapped $\tilde\sigma^{2*}_{u\cdot v}$ and are labeled Cond-BS IMOLS(fb) and Stat-BS IMOLS(fb) respectively. Rejections for panel IMOLS(D) are carried out using $N(0, 1)$ critical values for the t test and $\chi^2_2$ critical values for the Wald test. Rejections for panel IMOLS(fb) are carried out using fixed-b asymptotic critical values. In contrast, rejections for the bootstrap statistics are carried out by comparing the bootstrap p-value with the nominal level, which is 5% in this simulation.

To see whether the bootstrap methods can help solve the over-rejection problem of the asymptotic tests in finite samples, we plot in Figures 2.1-2.8 the null rejection probabilities of the t and Wald tests as a function of $b \in (0, 1]$. The first two figures give the results for $N = 5$, $T = 50$ using the Bartlett kernel and $\rho_1 = \rho_2 = 0.6$. In Figure 2.1, all t-tests over-reject to some degree, and no test dominates the others in this scenario.
When the bandwidth is small ($b = 0.1$), panel IMOLS(D) is better than the other tests because it is conservative. But when the bandwidth is relatively large ($b > 0.2$), Cond-BS IMOLS(D) turns out to be the best. Even though it is better than all other tests, the Cond-BS IMOLS(D) rejection probabilities are close to 15%, which is much larger than the nominal level of 5%. In Figure 2.2, for Wald tests, the stationary bootstrap tests dominate the other tests for all values of $b$. Their rejection probabilities are close to 10% for all values of $b$, which is much better than the asymptotic tests. Also, Cond-BS IMOLS(D) has rejection probabilities around 12% as long as the bandwidth is not very small ($b > 0.1$).

In Figures 2.3 and 2.4, all the settings are the same as in Figures 2.1 and 2.2 except that the time series sample size increases from $T = 50$ to $T = 100$. Comparing Figure 2.1 with 2.3 and Figure 2.2 with 2.4, we see that both t and Wald tests have less size distortion when the sample size increases. In Figure 2.3, the patterns of the rejection probabilities of all t tests are similar to those in Figure 2.1, and again no t test dominates the others. For Wald tests, the pattern is very clear. As we can see from Figure 2.4, for all values of $b$, the asymptotic tests have the largest size distortions. The rejection probabilities of the stationary bootstrap tests, in contrast, are stable and close to 5%, which implies that the stationary bootstrap successfully solves the over-rejection problem in this scenario. The null rejection probabilities of the conditional-on-regressors bootstrap tests are higher than 5% but lower than those of the asymptotic tests.

As the values of $\rho_1$, $\rho_2$ increase to 0.9, there is strong serial correlation and endogeneity. We can see from Figures 2.5-2.8 that all the tests have serious over-rejection problems regardless of bandwidth.
For $N = 5$, a time series sample size of $T = 100$ is not large enough for the stationary bootstrap to deliver size close to 5%. But among the three methods, the stationary bootstrap tests perform better than the conditional-on-regressors bootstrap and the asymptotic fixed-b tests. This holds for both t and Wald tests, which is not the case when $\rho_1 = \rho_2 = 0.6$. In addition, unlike the earlier results, the Stat-BS IMOLS(D) and Stat-BS IMOLS(fb) rejection probabilities are no longer close to each other. Generally speaking, when both $\rho_1$ and $\rho_2$ are very large, Stat-BS IMOLS(D) tends to have smaller size distortion than Stat-BS IMOLS(fb). Therefore, when $\rho_1$, $\rho_2$ are large, obtaining reasonable size requires a very large time series sample size and the use of the Stat-BS IMOLS(D) statistics.

From the above, we see that the bootstrap tests generally have smaller size distortions than the asymptotic tests. However, if the power of the bootstrap tests is low, then the bootstrap methods are less useful. When the alternative is true, some bootstrap methods fail to simulate critical values that are valid under the null, in which case the tests have no power. Therefore, an analysis of the power properties of the bootstrap tests is necessary. For the sake of brevity, we only display results of the stationary bootstrap for the case $\rho_1 = \rho_2 = 0.6$ for the Wald test with $N = 5$, $T \in \{50, 100\}$ and the Bartlett kernel. Starting from the null values of $\beta_1$ and $\beta_2$ equal to 1, we consider under the alternative $\beta_1 = \beta_2 = \beta \in (1, 1.25]$, using (including the null value) a total of 13 values on a grid with mesh 0.02. We focus on raw power using bootstrapped critical values.

Using $N = 5$, $T = 50$, the Bartlett kernel and $b = 0.1$, Figure 2.9 provides power comparisons between Stat-BS IMOLS(D) and Stat-BS IMOLS(fb). The power plots indicate that when the alternative is true, the stationary bootstrap still simulates critical values that are valid under the null.
Figure 2.10 displays the same power comparisons as in Figure 2.9 but with $T = 100$. The main finding is that power increases as $T$ increases. From Figures 2.9 and 2.10, we can see that the bootstrap tests have good power.

As mentioned at the end of Section 2.3, we also consider the stationary bootstrap and the conditional-on-regressors bootstrap based on the residuals from the non-augmented partial sum regression. Null rejection probabilities of the t and Wald tests as a function of $b \in (0, 1]$ are shown in Figures 2.11-2.18. The general patterns in Figures 2.11-2.18 are close to those in Figures 2.1-2.8. Overall, the bootstrap methods based on the residuals from the non-augmented partial sum regression have smaller size distortions than the asymptotic methods, especially for the Wald test with the larger sample size. But when serial correlation and endogeneity are both large, the bootstrap results seem to depend little on the choice of residuals. The power results in this case are displayed in Figures 2.19 and 2.20, which are very similar to the power results in Figures 2.9 and 2.10.

2.5 Summary and conclusion

This paper compares bootstrap tests with fixed-b asymptotic tests based on the panel IM-OLS estimator of Vogelsang et al. [2016] for a homogeneous panel cointegrated regression with endogenous regressors. The bootstrap methods used are the conditional-on-regressors bootstrap and the stationary bootstrap. The purpose of using the bootstrap tests is to improve the quality of finite sample inference.

The Monte Carlo simulations show that the bootstrap methods can effectively reduce size distortions in finite samples. In general, the stationary bootstrap has smaller size distortions than the conditional-on-regressors bootstrap and the asymptotic fixed-b tests, especially when there is strong serial correlation and endogeneity ($\rho_1 = \rho_2 = 0.9$). A large time series sample size is necessary for the tests to have reasonable size.
When the serial correlation and endogeneity are moderate ($\rho_1 = \rho_2 = 0.6$), the bootstrap methods still have smaller size distortions, but the results differ between t and Wald tests. For Wald tests, the stationary bootstrap is always better than the other two methods. In contrast, for t-tests, Cond-BS IMOLS(D), the statistic constructed using a HAC estimator of $\sigma^2_{u\cdot v}$ based on the first differences of the residuals from the augmented partial sum regression, has smaller size distortions when the bandwidth is relatively large ($b > 0.25$). In addition, the stationary bootstrap statistics are more robust than all other test statistics across all values of the bandwidth. Finally, the power plots from the simulation show that the bootstrap tests have good power.

Further research will study the panel IM-OLS method for estimation and inference in a heterogeneous cointegrating panel with endogenous regressors. In that more general scenario, finding a fixed-b asymptotically pivotal statistic based on panel IM-OLS will be challenging. However, the results in this paper indicate that the bootstrap method could be an alternative solution for hypothesis tests. In addition, if the panel consists of cross-sectionally dependent units, then the bootstrap procedure will need to be modified to resample all individuals together rather than individual by individual. Another topic of future research is to establish the consistency of the bootstrap for panel IM-OLS tests.
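For readers who wish to reproduce the simulation design of Section 2.4, the data generating process and the block length parameter can be sketched as below. This is a hedged illustration rather than the original simulation code: the function names are hypothetical, and the treatment of the pre-sample MA(1) innovation (drawn from the same standard normal distribution) is an assumption, since the text only fixes $u_{i0}$ and the initial values of the regressors.

```python
import numpy as np

def simulate_panel(N, T, rho1, rho2, mu=3.0, beta=(1.0, 1.0), rng=None):
    """Simulate (y, x, u) from the Section 2.4 design.

    Returns y of shape (N, T), x of shape (N, T, 2) and u of shape (N, T).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((N, T))
    # One extra draw per unit supplies e_{i,0} for the MA(1) term (assumption).
    e = rng.standard_normal((N, T + 1, 2))
    v = e[:, 1:, :] + 0.5 * e[:, :-1, :]      # v_jit = e_jit + 0.5 e_ji,t-1
    x = np.cumsum(v, axis=1)                  # random walks started at zero
    shock = eps + rho2 * (e[:, 1:, 0] + e[:, 1:, 1])
    u = np.empty((N, T))
    prev = np.zeros(N)                        # u_i0 = 0
    for t in range(T):
        prev = rho1 * prev + shock[:, t]      # AR(1) recursion in the error
        u[:, t] = prev
    y = mu + x @ np.asarray(beta) + u
    return y, x, u

def block_probability(T):
    """Stationary bootstrap parameter p_T = 0.02 (T/50)^(-1/3)."""
    return 0.02 * (T / 50.0) ** (-1.0 / 3.0)
```

With this parameterization the expected block length $1/p_T$ is 50 at $T = 50$ and grows slowly with $T$.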
APPENDIX

Figures

Figure 2.1: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 2.2: Empirical null rejections, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 2.3: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 2.4: Empirical null rejections, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.6, Bartlett kernel
Figure 2.5: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 2.6: Empirical null rejections, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 2.7: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 2.8: Empirical null rejections, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel
Figure 2.9: Raw power, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, b = 0.1, Bartlett kernel
Figure 2.10: Raw power, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.6, b = 0.1, Bartlett kernel
Figure 2.11: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.12: Empirical null rejections, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.13: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.6, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.14: Empirical null rejections, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.6, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.15: Empirical null rejections, t-test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.16: Empirical null rejections, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.9, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.17: Empirical null rejections, t-test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.18: Empirical null rejections, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.9, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.19: Raw power, Wald test, N = 5, T = 50, ρ1 = ρ2 = 0.6, b = 0.1, Bartlett kernel, residuals from non-augmented partial sum regression
Figure 2.20: Raw power, Wald test, N = 5, T = 100, ρ1 = ρ2 = 0.6, b = 0.1, Bartlett kernel, residuals from non-augmented partial sum regression

Chapter 3

Estimation and Inference for Heterogeneous Cointegrated Panels with Limited Cross Sectional Dependence

This paper is concerned with parameter estimation and inference in a panel cointegrating regression with endogenous regressors and heterogeneous long run variances in the cross section. In addition, the model allows a limited degree of cross-sectional dependence due to a common time effect. The estimator is labeled panel integrated modified ordinary least squares (panel IM-OLS). Similar to panel fully modified OLS (panel FM-OLS) and panel dynamic OLS (panel DOLS), the panel IM-OLS estimator has a zero mean Gaussian mixture limiting distribution. However, standard asymptotic inference is infeasible due to the existence of nuisance parameters. Inference based on panel IM-OLS relies on the stationary bootstrap. The properties of panel IM-OLS are analyzed using the stationary bootstrap in finite sample simulations.

3.1 Introduction

In the past decade, panel cointegration methods have drawn much attention in empirical research. The attractive feature of panel cointegration methods is that they permit investigation of the long-run relationship among nonstationary variables more efficiently than using time series data alone. However, panel cointegration is more complicated than single time-series cointegration when cross-sectional dependence and heterogeneity exist.
Ignoring cross-sectional dependence and heterogeneity might lead to poor inference and inconsistent estimators. It is well known that applying the first generation panel unit root tests, which generally assume cross-sectional independence, to series with cross-sectional correlation leads to size distortion and low power. This might also be the case for panel cointegration estimation and testing. For example, Westerlund and Edgerton [2008] claim that the tests of McCoskey and Kao [1998], Pedroni [1999], [2004] and Westerlund [2005] all require independence among the cross-sectional units, and their size properties become suspect when this assumption does not hold. The homogeneity assumption is often not well supported by the data. Therefore a framework that allows potential heterogeneity is necessary.

In the panel cointegrated regression literature, the panel fully modified OLS (panel FM-OLS) and the panel dynamic OLS (panel DOLS) methods are the most popular methods (see Kao and Chiang [2000], Pedroni [2000], Bai et al. [2009] and Mark and Sul [2003]). They are extensions of the single time series fully modified OLS (FM-OLS) and dynamic OLS (DOLS). Integrated modified OLS (IM-OLS), proposed by Vogelsang and Wagner [2014], provides a fully parametric and computationally convenient alternative to the FM-OLS and DOLS estimators. Vogelsang et al. [2016] extend IM-OLS to panel data models with individual dummies and a homogeneous second moment structure. The present paper considers an extension of Vogelsang et al. [2016] that allows time dummies and a heterogeneous variance structure in the model. The benefit of adding time dummies is twofold. First, time dummies can handle deterministic components and common factor shocks. Second, time dummies make the model robust to limited degrees of cross-sectional dependence.
Allowing heterogeneous, rather than homogeneous, variance structure makes the framework discussed in this paper more applicable in empirical research. Bai et al. [2009] and Mark and Sul [2003] consider similar problems using the panel FM-OLS and the panel DOLS estimators.

The limit theory considered here is obtained for a fixed number of cross-sectional units, $N$, letting the number of time periods, $T$, go to infinity. The setting of $N$ fixed and $T \to \infty$ is widely used in empirical macroeconomics, empirical energy economics and empirical finance (see Christopoulos and Tsionas [2004], Lee [2005], Apergis and Payne [2009], Narayan and Smyth [2008] and Canzoneri et al. [1999]). Under this scenario, even though the panel IM-OLS estimator converges to a zero mean Gaussian mixture distribution, asymptotic inference is complicated by the presence of nuisance parameters. One way to implement valid hypothesis tests is to use bootstrap methods.

Although bootstrap methods are widely employed for analyzing nonstationary time series data, e.g. bootstrap unit root tests and bootstrap cointegration tests (see Chang [2004], Paparoditis and Politis [2003], Parker et al. [2006], Westerlund and Edgerton [2007]), surprisingly few papers are devoted to bootstrap inference in cointegrated regressions. Psaradakis [2001] and Chang et al. [2006] apply the sieve bootstrap procedure to cointegrated regressions. Li and Maddala [1997] and Shin and Hwang [2013] apply the stationary bootstrap to cointegrated regression. The sieve bootstrap can be applied in fairly general models and performs well in a pure time series setting, but it requires fitting a finite order AR or VAR model to the errors. Smeekes and Urbain [2014] questioned the validity of the VAR sieve bootstrap in panels with a moderate cross-sectional dimension and showed that the AR sieve bootstrap might be misleading when cross-sectional dependence is present.
In contrast, the stationary bootstrap requires no parametric structure for drawing bootstrap samples. In addition, a working paper by Li [2016] shows that the stationary bootstrap performs well in panel cointegrated regressions with fixed effects and homogeneous variance structure when cross-sectional units are uncorrelated. The bootstrap method used in the present paper is the stationary bootstrap.

The rest of the paper is organized as follows. Section 3.2 introduces the model, the assumptions and the panel IM-OLS estimator. Section 3.3 presents asymptotic inference and stationary bootstrap inference. Section 3.4 provides a Monte Carlo simulation to investigate the finite sample properties of the proposed bootstrap test. Section 3.5 summarizes the results and concludes the paper. All proofs are collected in Appendices C - F.

3.2 Model set up and estimation

3.2.1 The model and assumptions

Consider the following panel data model

$$y_{it} = \alpha_i + x_{it}'\beta + e_{it} \qquad (3.1)$$

$$x_{it} = x_{i,t-1} + v_{it} \qquad (3.2)$$

where $i = 1, 2, \cdots, N$ and $t = 1, 2, \cdots, T$ index the cross-sectional and time series units, respectively; $y_{it}$, $\alpha_i$ and $e_{it}$ are scalars; $x_{it}$, $\beta$ and $v_{it}$ are $k \times 1$ vectors. The regressor, $x_{it}$, is potentially endogenous for each individual $i$.

Assumption 6. Assume that the error term, $e_{it}$, follows a special case of a factor model

$$e_{it} = F_t\lambda + u_{it}. \qquad (3.3)$$

In Assumption 6, $F_t$ is the common factor and $u_{it}$ is the idiosyncratic component. Assumption 6 is a special case of a factor model because the factor loading $\lambda$ is constant across $i$. Under this assumption, $e_{it}$ and $e_{jt}$ are correlated due to the common factor $F_t$; therefore the panel data model is cross-sectionally dependent. Because the regressor considered here is endogenous, $v_{it}$ is assumed to be correlated with $u_{it}$. In addition, there is no restriction on the correlation between $v_{it}$ and $F_t$. Define the error vector as $\eta_{it} = \left(u_{it}, v_{it}'\right)'$ and suppose that it is a $(k+1)$-dimensional stationary vector for each $i$.
In addition, assume that $F_t$ is an I(0) process. This implies that the model introduced in (3.1) describes a system of panel cointegrated regressions, i.e. $y_{it}$ is cointegrated with $x_{it}$. It might also be interesting to consider the case where $F_t$ is an I(1) process. Bai et al. [2009] consider the CupBC (continuously-updated and bias-corrected) and the CupFM (continuously-updated and fully-modified) estimators for panel cointegration models with cross-sectional dependence generated by unobserved global stochastic trends, where $F_t$ is non-stationary. In this paper, the interest is in estimation and inference about $\beta$ based on the panel IM-OLS estimator when $F_t$ is stationary. In order to derive the limiting distribution of the panel IM-OLS estimator, a second assumption is sufficient.

Assumption 7. Assume that $\eta_{it}$ is independent across $i$ and satisfies the functional CLT

$$T^{-1/2}\sum_{t=1}^{[rT]} \eta_{it} \Rightarrow B_i(r) = \begin{pmatrix} B_{u,i}(r) \\ B_{v,i}(r) \end{pmatrix} = \Omega_i^{1/2} W_i(r),$$

where $r \in (0, 1]$ and $[rT]$ denotes the integer part of $rT$.

In Assumption 7, $\Omega_i^{1/2}$ is a $(k+1) \times (k+1)$ matrix that satisfies $\Omega_i = \Omega_i^{1/2}\left(\Omega_i^{1/2}\right)'$, where

$$\Omega_i = \sum_{j=-\infty}^{\infty} E\left(\eta_{it}\eta_{i,t-j}'\right) = \begin{pmatrix} \Omega_{uu,i} & \Omega_{uv,i} \\ \Omega_{vu,i} & \Omega_{vv,i} \end{pmatrix} > 0,$$

and where $\Omega_{uv,i} = \Omega_{vu,i}'$. Assume that $\Omega_{vv,i}$ is non-singular, which implies that the elements of $\{x_{it}\}$ are not cointegrated among themselves. Partition $B_i(r)$ as $B_i(r) = \left(B_{u,i}(r), B_{v,i}(r)'\right)'$, and likewise partition $W_i(r)$ as $W_i(r) = \left(w_{u,i}(r), W_{v,i}(r)'\right)'$, where $w_{u,i}(r)$ and $W_{v,i}(r)$ are a scalar and a $k$-dimensional standard Brownian motion, respectively. Using the Cholesky form of $\Omega_i^{1/2}$,

$$\Omega_i^{1/2} = \begin{pmatrix} \sigma_{u\cdot v,i} & \lambda_{uv,i} \\ 0_{k\times 1} & \Omega_{vv,i}^{1/2} \end{pmatrix},$$

it can be shown that $\sigma_{u\cdot v,i}^2 = \Omega_{uu,i} - \Omega_{uv,i}\Omega_{vv,i}^{-1}\Omega_{vu,i}$ and $\lambda_{uv,i} = \Omega_{uv,i}\left(\Omega_{vv,i}^{-1/2}\right)'$. In addition, it follows that

$$B_i(r) = \begin{pmatrix} B_{u,i}(r) \\ B_{v,i}(r) \end{pmatrix} = \begin{pmatrix} \sigma_{u\cdot v,i}\, w_{u,i}(r) + \lambda_{uv,i} W_{v,i}(r) \\ \Omega_{vv,i}^{1/2} W_{v,i}(r) \end{pmatrix}.$$

Note that $\lambda_{uv,i} \neq 0$ is allowed for all $i$ because the regressors are allowed to be endogenous. Notice also that the second order moment structure is heterogeneous across $i$.
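As a quick numerical check of the Cholesky-form expressions above, the following sketch takes the scalar-regressor case ($k = 1$) and verifies that the upper triangular factor built from $\sigma_{u\cdot v,i}$, $\lambda_{uv,i}$ and $\Omega_{vv,i}^{1/2}$ reproduces $\Omega_i$. The function names `cholesky_form` and `rebuild`, and the numerical values, are illustrative assumptions and do not come from the original text.

```python
import math

def cholesky_form(omega_uu, omega_uv, omega_vv):
    """Upper triangular square root of a 2x2 long run variance matrix (k = 1).

    Returns (sigma_uv, lambda_uv, root_vv) such that
        [[sigma_uv, lambda_uv], [0, root_vv]]
    times its transpose equals [[omega_uu, omega_uv], [omega_uv, omega_vv]].
    """
    if omega_vv <= 0:
        raise ValueError("omega_vv must be positive (non-singular)")
    sigma2 = omega_uu - omega_uv * omega_uv / omega_vv  # sigma^2_{u.v}
    if sigma2 <= 0:
        raise ValueError("Omega must be positive definite")
    root_vv = math.sqrt(omega_vv)    # Omega_vv^{1/2}
    lambda_uv = omega_uv / root_vv   # Omega_uv * Omega_vv^{-1/2}
    return math.sqrt(sigma2), lambda_uv, root_vv

def rebuild(sigma_uv, lambda_uv, root_vv):
    """Multiply the factor by its transpose to recover (Omega_uu, Omega_uv, Omega_vv)."""
    return (sigma_uv ** 2 + lambda_uv ** 2,
            lambda_uv * root_vv,
            root_vv ** 2)

# Illustrative values only: an endogenous regressor corresponds to omega_uv != 0.
sigma_uv, lambda_uv, root_vv = cholesky_form(2.0, 0.8, 1.6)
omega_back = rebuild(sigma_uv, lambda_uv, root_vv)
```

With an exogenous regressor (`omega_uv = 0`) the sketch returns `lambda_uv = 0`, which is the case the text rules out as the general one.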
3.2.2 Panel IM-OLS estimator

From regression (3.1) and Assumption 6, the system can be rewritten as

$$y_{it} = \alpha_i + x_{it}'\beta + F_t\lambda + u_{it},$$

and its cross-sectional mean is given by

$$\bar{y}_t = \bar{\alpha} + \bar{x}_t'\beta + F_t\lambda + \bar{u}_t,$$

where

$$\bar{y}_t = \frac{1}{N}\sum_{i=1}^{N} y_{it}, \qquad \bar{\alpha} = \frac{1}{N}\sum_{i=1}^{N} \alpha_i, \qquad \bar{x}_t = \frac{1}{N}\sum_{i=1}^{N} x_{it}, \qquad \bar{u}_t = \frac{1}{N}\sum_{i=1}^{N} u_{it}.$$

Cross-sectional demeaning can be used to remove $F_t\lambda$ and provides an estimation equation that is exactly invariant to $F_t\lambda$. Note that cross-sectional demeaning is exactly the same as including time period dummies and projecting them out of the regression. Since $F_t$ may be an unobserved time shock, projecting it out before partial summing is crucial. Cross-sectional demeaning gives

$$y_{it} - \bar{y}_t = \alpha_i - \bar{\alpha} + \left(x_{it} - \bar{x}_t\right)'\beta + u_{it} - \bar{u}_t, \qquad (3.4)$$

which is denoted as

$$\ddot{y}_{it} = \mu_i + \ddot{x}_{it}'\beta + \ddot{u}_{it}, \qquad (3.5)$$

where

$$\mu_i = \alpha_i - \bar{\alpha}, \qquad \ddot{y}_{it} = y_{it} - \bar{y}_t, \qquad \ddot{x}_{it} = x_{it} - \bar{x}_t, \qquad \ddot{u}_{it} = u_{it} - \bar{u}_t.$$

Following Vogelsang and Wagner [2014], compute the partial sum of regression (3.5) to give

$$S_{it}^{\ddot{y}} = t\mu_i + S_{it}^{\ddot{x}\prime}\beta + S_{it}^{\ddot{u}}, \qquad (3.6)$$

where

$$S_{it}^{\ddot{y}} = \sum_{j=1}^{t} \ddot{y}_{ij}, \qquad S_{it}^{\ddot{x}} = \sum_{j=1}^{t} \ddot{x}_{ij}, \qquad S_{it}^{\ddot{u}} = \sum_{j=1}^{t} \ddot{u}_{ij}.$$

In order to deal with the endogeneity problem generated by the correlation between $u_{it}$ and $v_{it}$, it is sufficient to add additional regressors to regression (3.6). A natural candidate is the demeaned regressor $\ddot{x}_{it}$; however, this does not work due to the heterogeneity in the model. This is formally shown in the Appendix. The endogeneity problem, which is complicated by the heterogeneity in the variance structure, can be solved by adding the components of the decomposed $\ddot{x}_{it}$ to (3.6). The decomposed $\ddot{x}_{it}$ can be expressed as

$$-\frac{1}{N}x_{1t}, \ \cdots, \ -\frac{1}{N}x_{i-1,t}, \ \frac{N-1}{N}x_{it}, \ -\frac{1}{N}x_{i+1,t}, \ \cdots, \ -\frac{1}{N}x_{Nt}.$$

Adding these regressors separately overcomes the heterogeneous variance problem when dealing with endogeneity. Details are given in the Appendix.

Remark 4. In regression (3.5), if $\beta$ is the only parameter of interest, then it is possible to demean across time to remove $\mu_i$ before partial summing.
That is,

$$\ddot{y}_{it}^{+} = \ddot{x}_{it}^{+\prime}\beta + \ddot{u}_{it}^{+},$$

where

$$\ddot{y}_{it}^{+} = \ddot{y}_{it} - \frac{1}{T}\sum_{k=1}^{T}\ddot{y}_{ik}, \qquad \ddot{x}_{it}^{+} = \ddot{x}_{it} - \frac{1}{T}\sum_{k=1}^{T}\ddot{x}_{ik}, \qquad \ddot{u}_{it}^{+} = \ddot{u}_{it} - \frac{1}{T}\sum_{k=1}^{T}\ddot{u}_{ik}.$$

Then regression (3.6) becomes

$$S_{it}^{\ddot{y}+} = S_{it}^{\ddot{x}+\prime}\beta + S_{it}^{\ddot{u}+},$$

where

$$S_{it}^{\ddot{y}+} = \sum_{j=1}^{t}\ddot{y}_{ij}^{+}, \qquad S_{it}^{\ddot{x}+} = \sum_{j=1}^{t}\ddot{x}_{ij}^{+}, \qquad S_{it}^{\ddot{u}+} = \sum_{j=1}^{t}\ddot{u}_{ij}^{+}.$$

However, in this case, including the components of $\ddot{x}_{it}$ is not sufficient to deal with the endogeneity problem. Finding the additional regressors for this partial sum regression is much more challenging, if not impossible; therefore this method is not considered in this paper.

Remark 5. If the system does have a homogeneous second order moment structure, i.e. $\Omega_i = \Omega_j$ for any $i, j \in \{1, 2, \cdots, N\}$, then adding $\ddot{x}_{it}$ to the partial sum regression is sufficient for solving the endogeneity problem.

Including the additional regressors in (3.6) gives

$$S_{it}^{\ddot{y}} = t\mu_i + S_{it}^{\ddot{x}\prime}\beta + \left[\frac{N-1}{N}x_{it}'\gamma_i + \frac{-1}{N}\sum_{j=1, j\neq i}^{N} x_{jt}'\gamma_j\right] + S_{it}^{\tilde{u}}, \qquad (3.7)$$

where

$$S_{it}^{\tilde{u}} = S_{it}^{\ddot{u}} - \frac{N-1}{N}x_{it}'\gamma_i + \frac{1}{N}\sum_{j=1, j\neq i}^{N} x_{jt}'\gamma_j.$$

Stacking all time periods and all individuals' data together, the matrix form of the system is given by

$$S^{\ddot{y}} = S^{\ddot{x}}\theta + S^{\ddot{u}}, \qquad (3.8)$$

where

$$S^{\ddot{y}} = \begin{pmatrix} S_{11}^{\ddot{y}} \\ \vdots \\ S_{1T}^{\ddot{y}} \\ \vdots \\ S_{N1}^{\ddot{y}} \\ \vdots \\ S_{NT}^{\ddot{y}} \end{pmatrix}, \qquad \theta = \begin{pmatrix} \beta \\ \gamma_1 \\ \vdots \\ \gamma_N \\ \mu_1 \\ \vdots \\ \mu_N \end{pmatrix}, \qquad S^{\ddot{u}} = \begin{pmatrix} S_{11}^{\tilde{u}} \\ \vdots \\ S_{1T}^{\tilde{u}} \\ \vdots \\ S_{N1}^{\tilde{u}} \\ \vdots \\ S_{NT}^{\tilde{u}} \end{pmatrix},$$

$$S^{\ddot{x}} = \begin{pmatrix}
S_{11}^{\ddot{x}\prime} & \frac{N-1}{N}x_{11}' & -\frac{1}{N}x_{21}' & \cdots & -\frac{1}{N}x_{N1}' & 1 & \cdots & 0 \\
S_{12}^{\ddot{x}\prime} & \frac{N-1}{N}x_{12}' & -\frac{1}{N}x_{22}' & \cdots & -\frac{1}{N}x_{N2}' & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \cdots & \vdots \\
S_{1T}^{\ddot{x}\prime} & \frac{N-1}{N}x_{1T}' & -\frac{1}{N}x_{2T}' & \cdots & -\frac{1}{N}x_{NT}' & T & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
S_{N1}^{\ddot{x}\prime} & -\frac{1}{N}x_{11}' & -\frac{1}{N}x_{21}' & \cdots & \frac{N-1}{N}x_{N1}' & 0 & \cdots & 1 \\
S_{N2}^{\ddot{x}\prime} & -\frac{1}{N}x_{12}' & -\frac{1}{N}x_{22}' & \cdots & \frac{N-1}{N}x_{N2}' & 0 & \cdots & 2 \\
\vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \cdots & \vdots \\
S_{NT}^{\ddot{x}\prime} & -\frac{1}{N}x_{1T}' & -\frac{1}{N}x_{2T}' & \cdots & \frac{N-1}{N}x_{NT}' & 0 & \cdots & T
\end{pmatrix}.$$

The panel IM-OLS estimator is the OLS estimator of regression (3.8), which is given by

$$\hat{\theta} = \left(S^{\ddot{x}\prime}S^{\ddot{x}}\right)^{-1} S^{\ddot{x}\prime}S^{\ddot{y}}.$$

It follows that

$$\hat{\theta} - \theta = \left(S^{\ddot{x}\prime}S^{\ddot{x}}\right)^{-1} S^{\ddot{x}\prime}S^{\ddot{u}} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T} q_{it}q_{it}'\right)^{-1} \left(\sum_{i=1}^{N}\sum_{t=1}^{T} q_{it}S_{it}^{\tilde{u}}\right),$$

where

$$q_{1t} = \left[S_{1t}^{\ddot{x}\prime}, \ \tfrac{N-1}{N}x_{1t}', \ -\tfrac{1}{N}x_{2t}', \ \cdots, \ -\tfrac{1}{N}x_{Nt}', \ t, \ 0, \ \cdots, \ 0\right]'$$

$$q_{2t} = \left[S_{2t}^{\ddot{x}\prime}, \ -\tfrac{1}{N}x_{1t}', \ \tfrac{N-1}{N}x_{2t}', \ \cdots, \ -\tfrac{1}{N}x_{Nt}', \ 0, \ t, \ \cdots, \ 0\right]'$$

$$\vdots$$

$$q_{Nt} = \left[S_{Nt}^{\ddot{x}\prime}, \ -\tfrac{1}{N}x_{1t}', \ -\tfrac{1}{N}x_{2t}', \ \cdots, \ \tfrac{N-1}{N}x_{Nt}', \ 0, \ 0, \ \cdots, \ t\right]'.$$

Define the scaling matrix

$$A_{PIM} = \begin{pmatrix} T^{-1}\cdot I_k & & 0 \\ & I_N \otimes I_k & \\ 0 & & I_N \otimes T^{-1/2} \end{pmatrix}$$

as a $(k + Nk + N) \times (k + Nk + N)$ diagonal matrix. The following theorem gives the asymptotic distribution of the panel IM-OLS estimator.

Theorem 3. Assume that the data are generated by (3.1) and (3.2), and that Assumptions 6 and 7 hold. Define $\theta$ by stacking $\beta$, $\gamma_i$ and $\mu_i$. Then for fixed $N$, as $T \to \infty$,

$$A_{PIM}^{-1}\left(\hat{\theta} - \theta\right) = \begin{pmatrix} T\left(\hat{\beta} - \beta\right) \\ \left(\hat{\gamma}_1 - \gamma_1\right) \\ \vdots \\ \left(\hat{\gamma}_N - \gamma_N\right) \\ \sqrt{T}\left(\hat{\mu}_1 - \mu_1\right) \\ \vdots \\ \sqrt{T}\left(\hat{\mu}_N - \mu_N\right) \end{pmatrix} = \left(T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} A_{1T}^{-1}q_{it}q_{it}'A_{1T}^{-1}\right)^{-1} \left(T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} A_{1T}^{-1}q_{it}\,T^{-1/2}S_{it}^{\tilde{u}}\right)$$

$$\Rightarrow \left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1} \times \sum_{i=1}^{N}\int_0^1 h_i(r)\left[\sigma_{u\cdot v,i}\,w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}\,w_{u,j}(r)\right]dr = \Psi,$$

where

$$h_i(r) = \begin{pmatrix}
\int_0^r \left[\Omega_{vv,i}^{1/2}W_{v,i}(s) - \frac{1}{N}\sum_{j=1}^{N}\Omega_{vv,j}^{1/2}W_{v,j}(s)\right]ds \\
-\frac{1}{N}\Omega_{vv,1}^{1/2}W_{v,1}(r) \\
\vdots \\
\frac{N-1}{N}\Omega_{vv,i}^{1/2}W_{v,i}(r) \\
\vdots \\
-\frac{1}{N}\Omega_{vv,N}^{1/2}W_{v,N}(r) \\
0 \\
\vdots \\
r \\
\vdots \\
0
\end{pmatrix}.$$

Conditional on $h_i(r)$ for $i = 1, 2, \cdots, N$, it can be shown that $\Psi \sim N(0, V_{PIM})$, where $V_{PIM}$ is given by

$$V_{PIM} = \left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1} \times \left[\sum_{i=1}^{N}\sigma_{u\cdot v,i}^2 \int_0^1 \left(\ddot{H}_i(1) - \ddot{H}_i(r)\right)\left(\ddot{H}_i(1) - \ddot{H}_i(r)\right)'dr\right] \times \left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1}$$

and

$$\ddot{H}_i(r) = H_i(r) - \frac{1}{N}\sum_{j=1}^{N} H_j(r).$$

The derivation of this conditional variance is given in the Appendix.

3.3 Inference about $\theta$

3.3.1 Inference using panel IM-OLS

This section provides a discussion of hypothesis testing using the panel IM-OLS estimator. In particular, the hypothesis being considered is

$$H_0: R\theta = r,$$

where $R \in \mathbb{R}^{q \times (k + Nk + N)}$ has full rank $q$ and $r \in \mathbb{R}^q$. Because the vector $\hat{\theta}$ has elements that converge at different rates, restrictions on $R$ are necessary. Assume that there exists a non-singular $q \times q$ matrix $A_R$ such that

$$\lim_{T \to \infty} A_R^{-1} R A_{PIM} = R^{*},$$

where $R^{*}$ has rank $q$. In order to carry out statistical inference, the asymptotic variance, $V_{PIM}$, needs to be estimated. The outside parts of the sandwich form can be estimated by

$$\left(T^{-2}A_{PIM}\sum_{i=1}^{N}\sum_{t=1}^{T} q_{it}q_{it}'A_{PIM}\right)^{-1}.$$

The more difficult part is estimating the middle part of the sandwich form of the variance. Suppose that $\breve{\sigma}_{u\cdot v,i}^2$ is an estimator for $\sigma_{u\cdot v,i}^2$; then an estimator for the middle part of the variance is given by

$$T^{-4}\sum_{t=1}^{T}\sum_{i=1}^{N}\left[A_{PIM}\left(\ddot{S}_{iT}^{q} - \ddot{S}_{i,t-1}^{q}\right)\right]\breve{\sigma}_{u\cdot v,i}^2\left[A_{PIM}\left(\ddot{S}_{iT}^{q} - \ddot{S}_{i,t-1}^{q}\right)\right]',$$

with $\ddot{S}_{it}^{q} = S_{it}^{q} - \frac{1}{N}\sum_{j=1}^{N}S_{jt}^{q}$ and $S_{it}^{q} = \sum_{k=1}^{t} q_{ik}$. Therefore, the estimator of $V_{PIM}$ takes the form

$$\breve{V}_{PIM} = \left(T^{-2}A_{PIM}\sum_{i=1}^{N}\sum_{t=1}^{T} q_{it}q_{it}'A_{PIM}\right)^{-1} \times \left(T^{-4}\sum_{t=1}^{T}\sum_{i=1}^{N}\left[A_{PIM}\left(\ddot{S}_{iT}^{q} - \ddot{S}_{i,t-1}^{q}\right)\right]\breve{\sigma}_{u\cdot v,i}^2\left[A_{PIM}\left(\ddot{S}_{iT}^{q} - \ddot{S}_{i,t-1}^{q}\right)\right]'\right) \times \left(T^{-2}A_{PIM}\sum_{i=1}^{N}\sum_{t=1}^{T} q_{it}q_{it}'A_{PIM}\right)^{-1}.$$

Here, two potential candidates for $\breve{\sigma}_{u\cdot v,i}^2$ are considered.

1. The first candidate, $\hat{\sigma}_{u\cdot v,i}^2$, is based on the residuals of regression (3.7), i.e.
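$$\hat{S}_{it}^{\tilde{u}} = S_{it}^{\ddot{y}} - t\hat{\mu}_i - S_{it}^{\ddot{x}\prime}\hat{\beta} - \frac{N-1}{N}x_{it}'\hat{\gamma}_i + \frac{1}{N}\sum_{j=1, j\neq i}^{N} x_{jt}'\hat{\gamma}_j,$$

where $\hat{\mu}_i$, $\hat{\beta}$ and $\hat{\gamma}_i$ ($i = 1, 2, \cdots, N$) are the panel IM-OLS estimators. Define a HAC estimator using the first difference of $\hat{S}_{it}^{\tilde{u}}$:

$$\hat{\sigma}_{u\cdot v,i}^2 = T^{-1}\sum_{j=2}^{T}\sum_{h=2}^{T} k\left(\frac{|j - h|}{M}\right)\Delta\hat{S}_{ij}^{\tilde{u}}\,\Delta\hat{S}_{ih}^{\tilde{u}},$$

where $k(\cdot)$ is the kernel function and $M$ is the bandwidth.

2. The second candidate, $\tilde{\sigma}_{u\cdot v,i}^2$, is based on the residuals of a further augmented version of the partial sum regression (3.6), i.e.

$$\tilde{S}_{it}^{\tilde{u}} = S_{it}^{\ddot{y}} - t\tilde{\mu}_i - S_{it}^{\ddot{x}\prime}\tilde{\beta}_i - \frac{N-1}{N}x_{it}'\tilde{\gamma}_{i,i} + \frac{1}{N}\sum_{j=1, j\neq i}^{N} x_{jt}'\tilde{\gamma}_{i,j} - z_{it}'\tilde{\lambda}_i,$$

where

$$z_{it} = t\sum_{j=1}^{T} D_{ij} - \sum_{j=1}^{t-1}\sum_{s=1}^{j} D_{is}, \qquad D_{it} = \left[S_{it}^{\ddot{x}\prime}, \ -\tfrac{1}{N}x_{1t}', \ \cdots, \ \tfrac{N-1}{N}x_{it}', \ \cdots, \ -\tfrac{1}{N}x_{Nt}', \ t\right]',$$

and $\tilde{\mu}_i$, $\tilde{\beta}_i$, $\tilde{\gamma}_{i,i}$ and $\tilde{\gamma}_{i,j}$ are the OLS estimators from the further augmented regression given by

$$S_{it}^{\ddot{y}} = t\mu_i + S_{it}^{\ddot{x}\prime}\beta_i + \frac{N-1}{N}x_{it}'\gamma_{i,i} - \frac{1}{N}\sum_{j=1, j\neq i}^{N} x_{jt}'\gamma_{i,j} + z_{it}'\lambda_i. \qquad (3.9)$$

Note that, for given $i$, $\gamma_{i,i}$ is the parameter associated with the $i$th individual's own regressor $x_{it}$, and the $\gamma_{i,j}$ are the parameters associated with the regressors $x_{jt}$ for $j = 1, 2, \cdots, N$ and $j \neq i$ in the $i$th individual's regression. They are allowed to differ across individuals because the further augmented regressions are run individual by individual. The HAC estimator using $\tilde{S}_{it}^{\tilde{u}}$ is therefore defined as

$$\tilde{\sigma}_{u\cdot v,i}^2 = T^{-1}\sum_{j=2}^{T}\sum_{h=2}^{T} k\left(\frac{|j - h|}{M}\right)\Delta\tilde{S}_{ij}^{\tilde{u}}\,\Delta\tilde{S}_{ih}^{\tilde{u}}.$$

Remark 6. The reason for considering the second variance estimator, $\tilde{\sigma}_{u\cdot v,i}^2$, is that it delivers an asymptotically pivotal limit in the following two cases: (i) $N = 1$ (see Vogelsang and Wagner [2014]); (ii) $N > 1$ with homogeneous variance structure (see Vogelsang et al. [2016]). However, when $N > 1$ and the variance structure is heterogeneous, $\tilde{\sigma}_{u\cdot v,i}^2$ no longer leads to an asymptotically pivotal limit. In practice, this estimator should be considered for the $N = 1$ case or for the $N > 1$ case with homogeneous variance structure.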
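The two HAC estimators above share the same double-sum form. The following sketch computes that sum for one cross-sectional unit with the Bartlett kernel, which is the kernel used in the simulations of this chapter. The function names and the input series are illustrative assumptions; the residual series itself would come from regression (3.7) or (3.9).

```python
def bartlett_kernel(z):
    """Bartlett kernel: k(z) = 1 - |z| for |z| <= 1, and 0 otherwise."""
    z = abs(z)
    return 1.0 - z if z <= 1.0 else 0.0

def hac_from_partial_sum_residuals(s_hat, bandwidth):
    """Compute T^{-1} sum_j sum_h k(|j - h| / M) dS_j dS_h for one unit i.

    s_hat     : list of partial sum residuals S_1, ..., S_T (length T)
    bandwidth : M > 0; M = b * T corresponds to the fixed-b choice
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    T = len(s_hat)
    # First differences for t = 2, ..., T.
    d = [s_hat[t] - s_hat[t - 1] for t in range(1, T)]
    total = 0.0
    for j in range(len(d)):
        for h in range(len(d)):
            total += bartlett_kernel((j - h) / bandwidth) * d[j] * d[h]
    return total / T
```

A call such as `hac_from_partial_sum_residuals(residuals, bandwidth=0.1 * len(residuals))` would correspond to $b = 0.1$. The double loop is kept for readability and costs $O(T^2)$ operations.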
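Let $\hat{t}$ and $\hat{W}$ denote statistics defined using $\hat{\sigma}_{u\cdot v,i}^2$ to construct $\breve{V}_{PIM}$, and likewise let $\tilde{t}$ and $\tilde{W}$ denote statistics defined using $\tilde{\sigma}_{u\cdot v,i}^2$ to construct $\breve{V}_{PIM}$. Letting $\breve{t}$ and $\breve{W}$ denote either $\hat{t}$ and $\hat{W}$ or $\tilde{t}$ and $\tilde{W}$, define the $t$ and Wald statistics as

$$\breve{t} = \frac{R\hat{\theta} - r}{\sqrt{R A_{PIM}\breve{V}_{PIM}A_{PIM}R'}},$$

$$\breve{W} = \left(R\hat{\theta} - r\right)'\left[R A_{PIM}\breve{V}_{PIM}A_{PIM}R'\right]^{-1}\left(R\hat{\theta} - r\right).$$

Theorem 4. Assume that the data are generated by (3.1) and (3.2), and that Assumptions 6 and 7 hold. Then, with $N$ fixed as $T \to \infty$:

• Under traditional bandwidth and kernel assumptions,

$$\hat{W} \Rightarrow \frac{\chi_q^2}{1 + d_\gamma' d_\gamma}$$

and when $q = 1$,

$$\hat{t} \Rightarrow \frac{Z}{\sqrt{1 + d_\gamma' d_\gamma}},$$

where $\chi_q^2$ is a chi-square random variable with $q$ degrees of freedom, $Z$ is a standard normal random variable,

$$d_\gamma' d_\gamma = V_{PIM}^{-1}\left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1} \times \left[\sum_{i=1}^{N}\sigma_{u\cdot v,i}^2\, d_{\gamma_i}' d_{\gamma_i} \int_0^1 \left(\ddot{H}_i(1) - \ddot{H}_i(r)\right)\left(\ddot{H}_i(1) - \ddot{H}_i(r)\right)'dr\right] \times \left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1},$$

and

$$d_{\gamma_i}' d_{\gamma_i} = \sigma_{u\cdot v,i}^{-2}\, d_{\Psi_i}'\,\Omega_{vv,i}\, d_{\Psi_i},$$

where $d_{\Psi_i}$ is the $(i+1)$th $k \times 1$ block of the distribution $\Psi$.

• Under fixed-$b$ asymptotics, where $M = bT$ and $b \in (0, 1]$ is held fixed as $T \to \infty$, the fixed-$b$ limits of $\hat{W}$ and $\hat{t}$ are given by

$$\hat{W} \Rightarrow \frac{\chi_q^2}{\hat{Q}(b)}$$

and when $q = 1$,

$$\hat{t} \Rightarrow \frac{Z}{\sqrt{\hat{Q}(b)}},$$

where

$$\hat{Q}(b) = V_{PIM}^{-1}\left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1} \times \left[\sum_{i=1}^{N}\sigma_{u\cdot v,i}^2\, Q_b\left(\hat{P}_i(r)\right) \int_0^1 \left(\ddot{H}_i(1) - \ddot{H}_i(r)\right)\left(\ddot{H}_i(1) - \ddot{H}_i(r)\right)'dr\right] \times \left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1}$$

is a stochastic process that depends on the kernel function, the bandwidth and $W_i(r)$.

• Under fixed-$b$ asymptotics, where $M = bT$ and $b \in (0, 1]$ is held fixed as $T \to \infty$,

$$\tilde{W} \Rightarrow \frac{\chi_q^2}{\tilde{Q}(b)}$$

and when $q = 1$,

$$\tilde{t} \Rightarrow \frac{Z}{\sqrt{\tilde{Q}(b)}},$$

where

$$\tilde{Q}(b) = V_{PIM}^{-1}\left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1} \times \left[\sum_{i=1}^{N}\sigma_{u\cdot v,i}^2\, Q_b\left(\tilde{P}_i(r)\right) \int_0^1 \left(\ddot{H}_i(1) - \ddot{H}_i(r)\right)\left(\ddot{H}_i(1) - \ddot{H}_i(r)\right)'dr\right] \times \left(\sum_{i=1}^{N}\int_0^1 h_i(r)h_i(r)'\,dr\right)^{-1}$$

is a stochastic process that depends on the kernel function, the bandwidth and $W_i(r)$.

3.3.2 Inference using the stationary bootstrap

Unfortunately, the statistics $\breve{t}$ and $\breve{W}$ are not asymptotically pivotal, which makes asymptotic inference infeasible.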
One possible solution is to apply the stationary bootstrap to mimic the non-pivotal asymptotic distribution of those statistics.

The stationary bootstrap, proposed by Politis and Romano [1994], is a special type of block bootstrap in which the block size follows a geometric distribution instead of being a fixed number. For a geometric distribution with parameter $p_T$, the expected block size of the stationary bootstrap is $1/p_T$. The stationary bootstrap has been used in the literature on unit root tests, cointegration tests and cointegrated regression inference; see Swensen [2003], Paparoditis and Politis [2005], Parker et al. [2006], Shin [2015], and Shin and Hwang [2013]. It can capture the serial correlation structure in the original sample by block resampling, and it produces stationary bootstrap samples. A formal description of the stationary bootstrap inference procedure is given below.

1. Calculate the residuals based on regression (3.7) as

$$\hat{S}_{it}^{\tilde{u}} = S_{it}^{\ddot{y}} - t\hat{\mu}_i - S_{it}^{\ddot{x}\prime}\hat{\beta} - \frac{N-1}{N}x_{it}'\hat{\gamma}_i + \frac{1}{N}\sum_{j=1, j\neq i}^{N} x_{jt}'\hat{\gamma}_j,$$

where $\hat{\mu}_i$, $\hat{\beta}$, $\hat{\gamma}_i$ and $\hat{\gamma}_j$ for $j = 1, \cdots, N$ and $j \neq i$ are the panel IM-OLS estimators.

2. Define $\Delta\hat{S}_{it}^{\tilde{u}}$ as a proxy for $\ddot{u}_{it}$, and $\Delta\ddot{x}_{it}$ as a proxy for $\ddot{v}_{it}$, which is the cross-sectionally demeaned $v_{it}$. Based on those proxies, define $\hat{\ddot{\eta}}_{it} = \left(\Delta\hat{S}_{it}^{\tilde{u}}, \Delta\ddot{x}_{it}'\right)' = \left(\hat{\ddot{u}}_{it}, \hat{\ddot{v}}_{it}'\right)'$ as a proxy for $\ddot{\eta}_{it} = \left(\ddot{u}_{it}, \ddot{v}_{it}'\right)'$ for $t = 1, 2, \cdots, T$, and set $\hat{S}_{i0}^{\tilde{u}} = 0$ and $\ddot{x}_{i0}$ to be the zero vector for all $i$.

3. Resample the series $\hat{\ddot{\eta}}_{it}$ via the stationary bootstrap, obtaining $\hat{\ddot{\eta}}_{it}^{*}$, which can be partitioned in the same way as $\hat{\ddot{\eta}}_{it}$ into $\hat{\ddot{\eta}}_{it}^{*} = \left(\hat{\ddot{u}}_{it}^{*}, \hat{\ddot{v}}_{it}^{*\prime}\right)'$.

4. Obtain the bootstrap samples $\ddot{x}_{it}^{*}$ by

$$\ddot{x}_{it}^{*} = \sum_{j=1}^{t}\hat{\ddot{v}}_{ij}^{*},$$

and generate the bootstrap samples $\ddot{y}_{it}^{*}$ from (see footnote 1)

$$\ddot{y}_{it}^{*} = \hat{\mu}_i + \ddot{x}_{it}^{*\prime}\hat{\beta} + \hat{\ddot{u}}_{it}^{*}.$$

5. After obtaining the bootstrap demeaned variables $\ddot{x}_{it}^{*}$ and $\ddot{y}_{it}^{*}$, follow the same procedure as discussed before to estimate $\theta$, denoted by $\hat{\theta}^{*}$, and compute the bootstrap estimator of the limiting variance $V_{PIM}$, say $\breve{V}_{PIM}^{*}$.
Define the bootstrap statistics as follows:

$$\breve{t}^{*} = \frac{R\hat{\theta}^{*} - R\hat{\theta}}{\sqrt{R A_{PIM}\breve{V}_{PIM}^{*}A_{PIM}R'}},$$

$$\breve{W}^{*} = \left(R\hat{\theta}^{*} - R\hat{\theta}\right)'\left[R A_{PIM}\breve{V}_{PIM}^{*}A_{PIM}R'\right]^{-1}\left(R\hat{\theta}^{*} - R\hat{\theta}\right).$$

6. Repeat steps 3-5 independently $B$ times to obtain the samples $\left\{\breve{t}_j^{*}\right\}_{j=1}^{B}$ and $\left\{\breve{W}_j^{*}\right\}_{j=1}^{B}$.

7. Compute the equal tail bootstrap p-values as

$$p_{\breve{t}} = 2\min\left\{\frac{1}{B}\sum_{j=1}^{B} I\left(\breve{t}_j^{*} \leq \breve{t}\right), \ \frac{1}{B}\sum_{j=1}^{B} I\left(\breve{t}_j^{*} > \breve{t}\right)\right\},$$

$$p_{\breve{W}} = \frac{1}{B}\sum_{j=1}^{B} I\left(\breve{W}_j^{*} > \breve{W}\right),$$

where $I(\cdot)$ is the indicator function. Reject the null hypothesis if the equal tail bootstrap p-value is less than 5%.

Footnote 1: Note that there is another method to obtain $\ddot{x}_{it}^{*}$. One can resample directly from the original regressor $x_{it}$ to obtain $x_{it}^{*}$ and then apply cross-sectional demeaning to $x_{it}^{*}$ to obtain $\ddot{x}_{it}^{*}$. However, the size of the tests based on this method is higher than that of the method introduced in steps 1 to 4. Therefore, this method is not included in this paper.

3.4 Finite sample simulation

This section investigates the finite sample size and power of the bootstrap tests based on the panel IM-OLS estimators. The data generating process is given by

$$y_{it} = x_{1it}\beta_1 + x_{2it}\beta_2 + u_{it}$$

$$x_{1it} = x_{1i,t-1} + v_{1it}$$

$$x_{2it} = x_{2i,t-1} + v_{2it}$$

where for all $i = 1, 2, \cdots, N$, $u_{i0} = 0$, $x_{1i0}$ and $x_{2i0}$ are set to zero, and

$$u_{it} = \rho_1 u_{i,t-1} + \rho_2\left(e_{1it} + e_{2it}\right) + \varepsilon_{it}$$

$$v_{1it} = e_{1it} + 0.5\,e_{1i,t-1}$$

$$v_{2it} = e_{2it} + 0.5\,e_{2i,t-1}$$
For the block length parameter $p_T$ in the stationary bootstrap, two different settings are presented. One is $p_T = 0.01(4 - j)(T/50)^{-1/3}$ with $j \in \{1, 2, 3\}$; the other is $p_T = 0.04(4 - j)(T/50)^{-1/3}$ with $j \in \{1, 2, 3\}$. The sample sizes are $N = 5$, $T \in \{50, 500\}$. The number of bootstrap replications is $B = 399$, and the number of simulation replications is 1000.

Results are reported only for cases where $\rho_1 = \rho_2$. The results include $t$-statistics for testing the null hypothesis $H_0: \beta_1 = 1$ and Wald statistics for testing the joint null hypothesis $H_0: \beta_1 = \beta_2 = 1$. The bootstrap panel IM-OLS statistics were implemented in two ways. The first uses the stationary bootstrap procedure with the bootstrap version of $\hat{\sigma}_{u\cdot v,i}^2$ and is labeled Stat-BS IM-OLS(D). The second uses the stationary bootstrap procedure with the bootstrap version of $\tilde{\sigma}_{u\cdot v,i}^2$ and is labeled Stat-BS IM-OLS(fb). Rejections for the bootstrap statistics are carried out by comparing the bootstrap p-value with the nominal level, which is 5% in this simulation.

Tables 3.1 to 3.4 report empirical null rejection probabilities of the $t$ and Wald tests. In each table Panel A corresponds to $T = 50$ and Panel B to $T = 500$. Some common findings about the $t$ and Wald tests can be summarized as follows. For both $t$ and Wald tests, the Stat-BS IM-OLS(D) statistics tend to have smaller null rejection probabilities than the Stat-BS IM-OLS(fb) statistics. When the bandwidth parameter $b$ varies, the rejection probabilities are relatively stable for both $t$ and Wald tests, which shows that the bootstrap method can successfully capture the impact of the bandwidth on the test statistics. In addition, when the sample size $T$ increases from 50 to 500, rejection probabilities approach 0.05 as expected. As the values of $\rho_1$ and $\rho_2$ increase from 0.6 to 0.9, serial correlation and endogeneity become strong.
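The simulation design described above can be sketched as follows. This is a minimal illustration rather than the code used for the reported results: the function names are hypothetical, the innovations are read as $N(0, i^2)$ (standard deviation $i$ for unit $i$), and the pre-sample innovations $e_{1i0}$ and $e_{2i0}$ are set to zero as an assumption, since the text does not specify them.

```python
import random

def block_length_parameter(c, j, T):
    """p_T = c * (4 - j) * (T / 50)^(-1/3), with c in {0.01, 0.04}, j in {1, 2, 3}."""
    return c * (4 - j) * (T / 50.0) ** (-1.0 / 3.0)

def simulate_unit(i, T, rho1, rho2, beta1=1.0, beta2=1.0, rng=None):
    """Simulate (y, x1, x2, u) for unit i = 1, ..., N over t = 1, ..., T.

    Assumption: eps, e1 and e2 are i.i.d. normal with standard deviation i.
    """
    rng = rng or random.Random()
    y, x1, x2, u = [], [], [], []
    u_prev = x1_prev = x2_prev = 0.0   # u_{i0} = 0, x_{1i0} = x_{2i0} = 0
    e1_prev = e2_prev = 0.0            # assumption: e_{1i0} = e_{2i0} = 0
    for _ in range(T):
        eps, e1, e2 = (rng.gauss(0.0, i) for _ in range(3))
        u_t = rho1 * u_prev + rho2 * (e1 + e2) + eps
        x1_t = x1_prev + e1 + 0.5 * e1_prev   # x1_t = x1_{t-1} + v1_t
        x2_t = x2_prev + e2 + 0.5 * e2_prev   # x2_t = x2_{t-1} + v2_t
        y.append(beta1 * x1_t + beta2 * x2_t + u_t)
        x1.append(x1_t)
        x2.append(x2_t)
        u.append(u_t)
        u_prev, x1_prev, x2_prev = u_t, x1_t, x2_t
        e1_prev, e2_prev = e1, e2
    return y, x1, x2, u

def simulate_panel(N, T, rho1, rho2, seed=0):
    """Simulate all N units with one seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [simulate_unit(i, T, rho1, rho2, rng=rng) for i in range(1, N + 1)]

# The two grids of block length parameters for T = 500.
p_grid = [block_length_parameter(c, j, 500) for c in (0.01, 0.04) for j in (3, 2, 1)]
```

For $T = 50$ the scaling factor $(T/50)^{-1/3}$ equals one, so the grid reduces to 0.01, 0.02, 0.03, 0.04, 0.08 and 0.12, which are the values of $p_T$ listed in Panel A of the tables.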
It can be seen from Tables 3.1-3.4 that the rejection probabilities generally increase in all cases, but the size of the increase depends on the sample size and the test statistic. If the time sample size is small ($T = 50$), the rejection probabilities increase substantially for all tests. In contrast, if the time sample size is large ($T = 500$), the Stat-BS IM-OLS(D) statistics have rejection probabilities similar to those for $\rho_1 = \rho_2 = 0.6$, whereas the rejection probabilities increase noticeably for the Stat-BS IM-OLS(fb) statistics. This implies that when the time sample size is large enough, the Stat-BS IM-OLS(D) statistics can effectively handle strong serial correlation and endogeneity.

Another important pattern in Tables 3.1-3.4 is that the size of the tests depends heavily on the tuning parameter $p_T$. This is not surprising because the stationary bootstrap is a moving block bootstrap with random block lengths. Theoretically, there is no rule of thumb for choosing the value of $p_T$ that ensures the hypothesis test has correct size. For a given sample, Politis and White [2004] and Patton et al. [2009] propose a method to obtain an optimal block length parameter for the stationary bootstrap. However, that optimal block length parameter is based on minimizing the MSE of the stationary bootstrap sample mean, which does not necessarily guarantee the correct size of the tests. Therefore, several different values for $p_T$ were used in this simulation study. In Tables 3.1-3.4, for both $t$ and Wald tests, when $p_T$ is small, corresponding to a large average block length, the tests tend to over-reject. As $p_T$ increases, the over-rejection problem becomes less severe and under-rejection appears in some of the cases. To obtain the correct size, the $t$ tests require a larger $p_T$ than the Wald tests.

Next consider the power properties of the tests.
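To make the role of $p_T$ in the size discussion above concrete, the sketch below draws stationary bootstrap time indices with geometric block lengths of mean $1/p_T$ and rebuilds a bootstrap regressor by partial summation, in the spirit of steps 3 and 4 of the bootstrap procedure. The function names are hypothetical, and the regressor is taken to be scalar ($k = 1$) purely for brevity.

```python
import random

def stationary_bootstrap_indices(T, p, rng):
    """Draw T time indices by the stationary bootstrap of Politis and Romano [1994].

    Each block starts at a uniformly drawn index; the block then continues
    (wrapping around at T) with probability 1 - p, so block lengths are
    geometric with mean 1 / p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    idx = [rng.randrange(T)]
    while len(idx) < T:
        if rng.random() < p:
            idx.append(rng.randrange(T))     # start a new block
        else:
            idx.append((idx[-1] + 1) % T)    # continue the current block
    return idx

def resample_and_integrate(eta, p, rng):
    """Resample pairs (u_t, v_t) jointly and rebuild x*_t as the partial sum of v*_t.

    eta : list of (u_t, v_t) tuples with scalar v_t (assumption: k = 1)
    """
    idx = stationary_bootstrap_indices(len(eta), p, rng)
    u_star = [eta[t][0] for t in idx]
    x_star, running = [], 0.0
    for t in idx:
        running += eta[t][1]
        x_star.append(running)
    return u_star, x_star
```

A small $p$ keeps long stretches of the original series intact, which preserves more of the serial correlation, while $p = 1$ reduces to resampling single observations independently.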
When the alternative is true, some bootstrap methods fail to simulate critical values that are valid under the null, in which case the tests have no power. Therefore, the analysis of the power properties of bootstrap tests is important. Here, only results for the case $\rho_1 = \rho_2 \in \{0.6, 0.9\}$ for the Wald test for $N \in \{5, 15\}$, $T \in \{50, 500\}$ with the Bartlett kernel are provided. If the power of the test is not an issue for a small sample size, such as $T = 50$, then it will not be a concern when the sample size is large, such as $T = 500$. Starting from the null values of $\beta_1$ and $\beta_2$ equal to 1, the alternative values being considered are $\beta_1 = \beta_2 = \beta \in (1, 1.4]$, which gives a total of 21 values on a grid with mesh 0.02 including the null value. Power and size-adjusted power are reported. Note that size-adjusted power is not feasible in practice, but it allows us to see the theoretical power differences across tests while holding null rejection probabilities constant at 0.05.

Figures 3.1-3.4 show that, using the bootstrap method, the Stat-BS IM-OLS(D) and Stat-BS IM-OLS(fb) Wald tests do have power. Figure 3.1 shows the power comparison of the Stat-BS IM-OLS(D) Wald test for small ($T = 50$) and large ($T = 500$) sample sizes, using the respective block size parameter values, $p_T \in \{0.08, 0.00464\}$, that give null rejections close to 5%. It can be seen that the power of the test with the larger sample size ($T = 500$) and smaller $p_T$ ($p_T = 0.00464$) rises much faster. This implies that if the sample size is large enough and the resampling block size parameter $p_T$ is well chosen, the Stat-BS IM-OLS(D) Wald test tends to have very high power. Even if the sample size is relatively small ($T = 50$), the power of the test is still acceptable if $p_T$ is carefully chosen.

Next consider the impact of the serial correlation and endogeneity on the power of the
Figure 3.2 displays the power comparison of the Stat-BS IM-OLS(D) Wald test for small ($\rho_1 = \rho_2 = 0.6$) and large ($\rho_1 = \rho_2 = 0.9$) serial correlation and endogeneity, using the respective block size parameter values, $p_T \in \{0.00464, 0.00696\}$, that give null rejections close to 5%. The power of the test with smaller serial correlation and endogeneity ($\rho_1 = \rho_2 = 0.6$) is higher than that of the test with larger serial correlation and endogeneity ($\rho_1 = \rho_2 = 0.9$). If the sample size is small, the power of the tests is lower, as expected.

Figures 3.3 and 3.4 provide size-corrected power comparisons between the Stat-BS IM-OLS(D) and Stat-BS IM-OLS(fb) Wald tests for the same values of $T$, $\rho_1$, $\rho_2$ and $b$ but different cross-sectional sample sizes $N$. In Figure 3.3, $N$ is 5, while in Figure 3.4, $N$ is 15. These two figures allow us to see power differences across tests while holding null rejection probabilities constant at 0.05. It can be seen that when the cross-sectional sample size is small, the Stat-BS IM-OLS(fb) test has slightly higher power than the Stat-BS IM-OLS(D) test. However, when the cross-sectional sample size increases, the power of the Stat-BS IM-OLS(D) test is much higher. This implies that the Stat-BS IM-OLS(fb) test should not be used when $N$ is large, because it has large size distortions and lower power in this scenario.

3.5 Summary and conclusions

This paper considers the estimation of, and inference about, a homogeneous cointegrating vector in a panel data model with individual heterogeneity and heterogeneous variance structure. In addition, the model allows a limited degree of cross-sectional dependence due to a common time effect. The estimator is labeled panel IM-OLS. It is a fully parametric estimator that is based on a partial sum transformed regression augmented by the decomposed demeaned
The advantage is that it leads to a zero mean mixed Gaussian limiting distribution without requiring the choice of tuning parameters (such as the bandwidth, the kernel function and the numbers of leads and lags). Asymptotic inference is infeasible due to the presence of nuisance parameters, and the stationary bootstrap is used for hypothesis testing.

Monte Carlo simulations show that the bootstrap method can deliver good size and power for $t$ and Wald tests, depending on the sample size, serial correlation, endogeneity and the stationary bootstrap block length resampling parameter. When there is strong serial correlation and endogeneity, for moderate time sample sizes, the sizes of the tests are close to the nominal level for certain values of $p_T$. Unlike in Vogelsang et al. [2016], the further augmented regression residuals do not lead to an asymptotically pivotal test, and the bootstrap hypothesis test based on them has more size distortion. When the cross-sectional sample size $N$ is small, the power of the test based on the further augmented regression residuals is slightly higher than that of the test based on the augmented regression residuals. However, when $N$ increases, the power of the test based on the further augmented regression residuals is much lower than that of the test based on the augmented regression residuals. This power loss as $N$ increases arises because the further augmented regression requires adding many additional regressors to compute the residuals. Therefore, in practice, when $N$ is large and the panel has cross-sectional dependence and heterogeneous variance structure, inference based on the further augmented regression residuals is not recommended.

One limitation of the present paper is that the cross-sectional dependence comes only from a common time effect with a constant factor loading. This might be restrictive in some applications. Therefore, a model with more general cross-sectional dependence may be worth considering in the future.
In that more general scenario, the theory of inference based on panel IM-OLS type estimators will rely on more general bootstrap procedures. If the stationary bootstrap can mimic the non-pivotal limit of the original statistics, then formally proving the asymptotic equivalence between the stationary bootstrap statistics and the original test statistics may be a viable research topic in the future.

APPENDIX

Tables and Figures

Table 3.1: Empirical null rejection probabilities, 5% level, $t$-tests for $H_0: \beta_1 = 1$, $N = 5$, $\rho = 0.6$, Bartlett kernel

                 Stat-BS(D)                  Stat-BS(fb)
pT           b=0.1    b=0.5    b=1       b=0.1    b=0.5    b=1
Panel A: T = 50
0.01         0.246    0.214    0.226     0.346    0.335    0.336
0.02         0.219    0.191    0.198     0.31     0.309    0.31
0.03         0.199    0.17     0.173     0.3      0.308    0.303
0.04         0.184    0.162    0.157     0.285    0.286    0.287
0.08         0.127    0.114    0.115     0.231    0.239    0.239
0.12         0.102    0.094    0.095     0.213    0.209    0.207
Panel B: T = 500
0.00464      0.152    0.145    0.138     0.174    0.176    0.162
0.00928      0.099    0.101    0.102     0.123    0.122    0.106
0.01393      0.074    0.075    0.081     0.086    0.074    0.085
0.01857      0.053    0.062    0.065     0.066    0.065    0.068
0.03713      0.032    0.044    0.033     0.037    0.039    0.033
0.05570      0.023    0.028    0.024     0.03     0.02     0.025

Table 3.2: Empirical null rejection probabilities, 5% level, $t$-tests for $H_0: \beta_1 = 1$, $N = 5$, $\rho = 0.9$, Bartlett kernel

                 Stat-BS(D)                  Stat-BS(fb)
pT           b=0.1    b=0.5    b=1       b=0.1    b=0.5    b=1
Panel A: T = 50
0.01         0.427    0.375    0.371     0.74     0.745    0.75
0.02         0.405    0.349    0.347     0.747    0.743    0.75
0.03         0.395    0.328    0.328     0.738    0.733    0.734
0.04         0.356    0.307    0.291     0.738    0.74     0.73
0.08         0.333    0.279    0.267     0.724    0.724    0.72
0.12         0.307    0.247    0.235     0.715    0.716    0.715
Panel B: T = 500
0.00464      0.144    0.138    0.137     0.243    0.23     0.23
0.00928      0.095    0.096    0.098     0.175    0.172    0.165
0.01393      0.07     0.07     0.074     0.144    0.125    0.13
0.01857      0.053    0.061    0.06      0.115    0.113    0.114
0.03713      0.031    0.035    0.033     0.093    0.08     0.077
0.05570      0.022    0.027    0.024     0.073    0.077    0.072

Table 3.3: Empirical null rejection probabilities, 5% level, Wald-tests for $H_0: \beta_1 = 1, \beta_2 = 1$, $N = 5$, $\rho = 0.6$, Bartlett kernel

                 Stat-BS(D)                    Stat-BS(fb)
pT          b=0.1    b=0.5    b=1       b=0.1    b=0.5    b=1
Panel A: T = 50
0.01        0.159    0.142    0.153     0.296    0.299    0.302
0.02        0.125    0.122    0.122     0.266    0.268    0.269
0.03        0.107    0.109    0.109     0.239    0.236    0.236
0.04        0.092    0.101    0.086     0.216    0.22     0.214
0.08        0.049    0.047    0.047     0.155    0.151    0.157
0.12        0.029    0.031    0.027     0.1      0.113    0.119
Panel B: T = 500
0.00464     0.053    0.053    0.051     0.081    0.073    0.074
0.00928     0.02     0.023    0.022     0.037    0.038    0.039
0.01393     0.01     0.011    0.019     0.019    0.019    0.021
0.01857     0.002    0.01     0.007     0.008    0.01     0.016
0.03713     0.002    0.003    0.004     0.002    0.002    0.005
0.05570     0        0.003    0.003     0        0.002    0.003

Table 3.4: Empirical null rejection probabilities, 5% level, Wald-tests for H0 : β1 = 1, β2 = 1, N = 5, ρ = 0.9, Bartlett kernel

                 Stat-BS(D)                    Stat-BS(fb)
pT          b=0.1    b=0.5    b=1       b=0.1    b=0.5    b=1
Panel A: T = 50
0.01        0.474    0.402    0.393     0.876    0.876    0.867
0.02        0.448    0.374    0.368     0.874    0.873    0.864
0.03        0.43     0.346    0.348     0.874    0.87     0.865
0.04        0.402    0.337    0.316     0.869    0.867    0.865
0.08        0.325    0.261    0.25      0.859    0.861    0.846
0.12        0.268    0.212    0.21      0.848    0.854    0.841
Panel B: T = 500
0.00464     0.07     0.062    0.069     0.16     0.159    0.166
0.00928     0.031    0.04     0.041     0.107    0.096    0.097
0.01393     0.014    0.016    0.021     0.069    0.063    0.069
0.01857     0.011    0.018    0.013     0.057    0.043    0.056
0.03713     0.004    0.006    0.008     0.028    0.031    0.029
0.05570     0.003    0.005    0.004     0.019    0.019    0.021

Figure 3.1: Power of bootstrap Stat-BS IM (D), Wald test, N = 5, ρ1 = ρ2 = 0.6, b = 0.5, Bartlett kernel with different T and pT

Figure 3.2: Power of bootstrap Stat-BS IM (D), Wald test, N = 5, T = 500, b = 0.5, Bartlett kernel with different ρ and pT

Figure 3.3: Size adjusted power, Wald-tests, N = 5, T = 50, ρ1 = ρ2 = 0.6, b = 0.5, Bartlett kernel

Figure 3.4: Size adjusted power, Wald-tests, N = 15, T = 50, ρ1 = ρ2 = 0.6, b = 0.5, Bartlett kernel

Proof of failure of using $\ddot{x}_{it}$ to solve endogeneity problem

This is the proof showing that directly adding $\ddot{x}_{it}$ to regression (3.6) cannot fully deal with the endogeneity problem in the model considered in this paper.
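Suppose we add $\ddot{x}_{it}$ to the partial sum model, which gives

$$S_{it}^{\ddot{y}} = t\mu_i + S_{it}^{\ddot{x}\prime}\beta + \ddot{x}_{it}'\gamma_i + \left(S_{it}^{\ddot{u}} - \ddot{x}_{it}'\gamma_i\right).$$

Consider the behavior of $T^{-1/2}\left(S_{it}^{\ddot{u}} - \ddot{x}_{it}'\gamma_i\right)$ as $T \to \infty$:

$$
\begin{aligned}
T^{-1/2}\left(S_{it}^{\ddot{u}} - \ddot{x}_{it}'\gamma_i\right)
&= T^{-1/2}\Big(S_{it}^{u} - \frac{1}{N}\sum_{j=1}^{N}S_{jt}^{u} - x_{it}'\gamma_i + \frac{1}{N}\sum_{j=1}^{N}x_{jt}'\gamma_i\Big)\\
&= T^{-1/2}\left(S_{it}^{u} - x_{it}'\gamma_i\right) - \frac{1}{N}\sum_{j=1}^{N}T^{-1/2}\left(S_{jt}^{u} - x_{jt}'\gamma_i\right)\\
&\Rightarrow B_{u,i}(r) - B_{v,i}(r)'\gamma_i - \frac{1}{N}\sum_{j=1}^{N}\left(B_{u,j}(r) - B_{v,j}(r)'\gamma_i\right)\\
&= \sigma_{u\cdot v,i}w_{u,i}(r) + \lambda_{uv,i}W_{v,i}(r) - W_{v,i}(r)'\Omega_{vv,i}^{1/2}\gamma_i\\
&\quad - \frac{1}{N}\sum_{j=1}^{N}\left(\sigma_{u\cdot v,j}w_{u,j}(r) + \lambda_{uv,j}W_{v,j}(r) - W_{v,j}(r)'\Omega_{vv,j}^{1/2}\gamma_i\right)\\
&= \sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r)
 - W_{v,i}(r)'\Omega_{vv,i}^{1/2}\left(\gamma_i - \Omega_{vv,i}^{-1/2}\lambda_{uv,i}'\right)\\
&\quad + \frac{1}{N}\sum_{j=1}^{N}W_{v,j}(r)'\Omega_{vv,j}^{1/2}\left(\gamma_i - \Omega_{vv,j}^{-1/2}\lambda_{uv,j}'\right)\\
&\neq \sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r).
\end{aligned}
$$

Note that the last inequality holds because, when there is heterogeneity in the second moment structure, it is almost impossible that $\gamma_i = \Omega_{vv,i}^{-1/2}\lambda_{uv,i}' = \Omega_{vv,j}^{-1/2}\lambda_{uv,j}'$ for all $j = 1, 2, \cdots, N$. Therefore, just adding $\ddot{x}_{it}$ to regression (3.6) cannot fully deal with the endogeneity problem. Note that, if the second moment structure is homogeneous, then only adding $\ddot{x}_{it}$ to regression (3.6) will work, because $\gamma_i = \Omega_{vv,i}^{-1/2}\lambda_{uv,i}' = \Omega_{vv,j}^{-1/2}\lambda_{uv,j}' = \gamma_j = \Omega_{vv}^{-1/2}\lambda_{uv}'$ for all $i, j$.

Proof of Theorem 3

In order to derive the asymptotic distribution of the panel IM-OLS estimator, we start with regression (3.7).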
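First consider the limit of $T^{-1/2}S_{it}^{\tilde{u}}$ with $N$ fixed and $T \to \infty$:

$$
\begin{aligned}
T^{-1/2}S_{it}^{\tilde{u}}
&= T^{-1/2}\Big(S_{it}^{\ddot{u}} - \frac{N-1}{N}x_{it}'\gamma_i + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N}x_{jt}'\gamma_j\Big)\\
&= T^{-1/2}\Big(S_{it}^{u} - \frac{1}{N}\sum_{j=1}^{N}S_{jt}^{u} - x_{it}'\gamma_i + \frac{1}{N}\sum_{j=1}^{N}x_{jt}'\gamma_j\Big)\\
&= T^{-1/2}\left(S_{it}^{u} - x_{it}'\gamma_i\right) - \frac{1}{N}\sum_{j=1}^{N}T^{-1/2}\left(S_{jt}^{u} - x_{jt}'\gamma_j\right)\\
&\Rightarrow B_{u,i}(r) - B_{v,i}(r)'\gamma_i - \frac{1}{N}\sum_{j=1}^{N}\left(B_{u,j}(r) - B_{v,j}(r)'\gamma_j\right)\\
&= \sigma_{u\cdot v,i}w_{u,i}(r) + \lambda_{uv,i}W_{v,i}(r) - W_{v,i}(r)'\Omega_{vv,i}^{1/2}\gamma_i\\
&\quad - \frac{1}{N}\sum_{j=1}^{N}\left(\sigma_{u\cdot v,j}w_{u,j}(r) + \lambda_{uv,j}W_{v,j}(r) - W_{v,j}(r)'\Omega_{vv,j}^{1/2}\gamma_j\right)\\
&= \sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r)
 - W_{v,i}(r)'\Omega_{vv,i}^{1/2}\left(\gamma_i - \Omega_{vv,i}^{-1/2}\lambda_{uv,i}'\right)\\
&\quad + \frac{1}{N}\sum_{j=1}^{N}W_{v,j}(r)'\Omega_{vv,j}^{1/2}\left(\gamma_j - \Omega_{vv,j}^{-1/2}\lambda_{uv,j}'\right).
\end{aligned}
$$

Therefore, when $\gamma_i = \Omega_{vv,i}^{-1/2}\lambda_{uv,i}' = \Omega_{vv,i}^{-1}\Omega_{vu,i}$, it follows that

$$T^{-1/2}S_{it}^{\tilde{u}} \Rightarrow \sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r).$$

Define $A_{1T}^{-1} = T^{-1/2}A_{PIM}$. The next step of the proof is to obtain the limit of $A_{1T}^{-1}q_{it}$ for $N$ fixed and $T \to \infty$. Using $T^{-3/2}S_{it}^{\ddot{x}} = T^{-3/2}S_{it}^{x} - \frac{1}{N}\sum_{j=1}^{N}T^{-3/2}S_{jt}^{x}$,

$$
A_{1T}^{-1}q_{it} =
\begin{pmatrix}
T^{-3/2}S_{it}^{\ddot{x}}\\
-N^{-1}T^{-1/2}x_{1t}\\
\vdots\\
\frac{N-1}{N}T^{-1/2}x_{it}\\
\vdots\\
-N^{-1}T^{-1/2}x_{Nt}\\
0\\
\vdots\\
t/T\\
\vdots\\
0
\end{pmatrix}
\Rightarrow
\begin{pmatrix}
\int_0^r B_{v,i}(s)\,ds - \frac{1}{N}\sum_{j=1}^{N}\int_0^r B_{v,j}(s)\,ds\\
-N^{-1}B_{v,1}(r)\\
\vdots\\
\frac{N-1}{N}B_{v,i}(r)\\
\vdots\\
-N^{-1}B_{v,N}(r)\\
0\\
\vdots\\
r\\
\vdots\\
0
\end{pmatrix}
=
\begin{pmatrix}
\Omega_{vv,i}^{1/2}\int_0^r W_{v,i}(s)\,ds - \frac{1}{N}\sum_{j=1}^{N}\Omega_{vv,j}^{1/2}\int_0^r W_{v,j}(s)\,ds\\
-N^{-1}\Omega_{vv,1}^{1/2}W_{v,1}(r)\\
\vdots\\
\frac{N-1}{N}\Omega_{vv,i}^{1/2}W_{v,i}(r)\\
\vdots\\
-N^{-1}\Omega_{vv,N}^{1/2}W_{v,N}(r)\\
0\\
\vdots\\
r\\
\vdots\\
0
\end{pmatrix}
= h_i(r).
$$

Therefore, as $T \to \infty$, $A_{1T}^{-1}q_{it} \Rightarrow h_i(r)$.

For fixed $N$, as $T \to \infty$,

$$
\begin{aligned}
A_{PIM}^{-1}\left(\hat{\theta} - \theta\right)
&= \begin{pmatrix}
T(\hat{\beta} - \beta)\\
\hat{\gamma}_1 - \gamma_1\\
\vdots\\
\hat{\gamma}_N - \gamma_N\\
\sqrt{T}(\hat{\mu}_1 - \mu_1)\\
\vdots\\
\sqrt{T}(\hat{\mu}_N - \mu_N)
\end{pmatrix}
= T^{-1/2}A_{1T}\left(\hat{\theta} - \theta\right)\\
&= \Big(T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}A_{1T}^{-1}q_{it}q_{it}'A_{1T}^{-1}\Big)^{-1}
\Big(T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}A_{1T}^{-1}q_{it}\,T^{-1/2}S_{it}^{\tilde{u}}\Big)\\
&\Rightarrow \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\times \int_0^1\sum_{i=1}^{N}h_i(r)\Big(\sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r)\Big)dr
= \Psi.
\end{aligned}
$$

Proof of the derivation of the form of the asymptotic variance of Ψ

We start by rewriting $\int_0^1\sum_{i=1}^{N}h_i(r)\big(\sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r)\big)dr$ as

$$
\begin{aligned}
&\int_0^1\sum_{i=1}^{N}h_i(r)\Big(\sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r)\Big)dr\\
&\quad= \int_0^1\sum_{i=1}^{N}\Big(h_i(r) - \frac{1}{N}\sum_{j=1}^{N}h_j(r)\Big)\sigma_{u\cdot v,i}w_{u,i}(r)\,dr\\
&\quad= \sum_{i=1}^{N}\sigma_{u\cdot v,i}\int_0^1\Big(h_i(r) - \frac{1}{N}\sum_{j=1}^{N}h_j(r)\Big)w_{u,i}(r)\,dr\\
&\quad= \sum_{i=1}^{N}\sigma_{u\cdot v,i}\int_0^1 w_{u,i}(r)\,d\Big(H_i(r) - \frac{1}{N}\sum_{j=1}^{N}H_j(r)\Big)\\
&\quad= \sum_{i=1}^{N}\sigma_{u\cdot v,i}\Big[w_{u,i}(r)\Big(H_i(r) - \frac{1}{N}\sum_{j=1}^{N}H_j(r)\Big)\Big|_0^1 - \int_0^1\Big(H_i(r) - \frac{1}{N}\sum_{j=1}^{N}H_j(r)\Big)dw_{u,i}(r)\Big]\\
&\quad= \sum_{i=1}^{N}\sigma_{u\cdot v,i}\Big[w_{u,i}(1)\Big(H_i(1) - \frac{1}{N}\sum_{j=1}^{N}H_j(1)\Big) - \int_0^1\Big(H_i(r) - \frac{1}{N}\sum_{j=1}^{N}H_j(r)\Big)dw_{u,i}(r)\Big]\\
&\quad= \sum_{i=1}^{N}\sigma_{u\cdot v,i}\Big[\int_0^1\Big(H_i(1) - \frac{1}{N}\sum_{j=1}^{N}H_j(1)\Big)dw_{u,i}(r) - \int_0^1\Big(H_i(r) - \frac{1}{N}\sum_{j=1}^{N}H_j(r)\Big)dw_{u,i}(r)\Big]\\
&\quad= \sum_{i=1}^{N}\sigma_{u\cdot v,i}\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\,dw_{u,i}(r).
\end{aligned}
$$

Therefore, the variance of $\int_0^1\sum_{i=1}^{N}h_i(r)\big(\sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r)\big)dr$ will be the same as the variance of

$$\sum_{i=1}^{N}\sigma_{u\cdot v,i}\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\,dw_{u,i}(r),$$

which is

$$\sum_{i=1}^{N}\sigma_{u\cdot v,i}^{2}\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]'\,dr.$$

Then, the variance of $\Psi$ is

$$
V_{PIM} = \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\times \Big(\sum_{i=1}^{N}\sigma_{u\cdot v,i}^{2}\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]'\,dr\Big)
\times \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}.
$$

Proof of Theorem 4

This is the proof of the null limiting distribution of the test statistics in Theorem 4. First, consider the behavior of $\hat{\sigma}_{u\cdot v,i}^{2}$.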
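It is based on $\Delta\hat{S}_{it}^{\tilde{u}}$, where

$$
\begin{aligned}
\Delta\hat{S}_{it}^{\tilde{u}}
&= \Delta\Big(S_{it}^{\ddot{y}} - t\hat{\mu}_i - S_{it}^{\ddot{x}\prime}\hat{\beta} - \frac{N-1}{N}x_{it}'\hat{\gamma}_i + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N}x_{jt}'\hat{\gamma}_j\Big)\\
&= \ddot{y}_{it} - \hat{\mu}_i - \ddot{x}_{it}'\hat{\beta} - \frac{N-1}{N}v_{it}'\hat{\gamma}_i + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N}v_{jt}'\hat{\gamma}_j\\
&= \mu_i + \ddot{x}_{it}'\beta + \ddot{u}_{it} - \hat{\mu}_i - \ddot{x}_{it}'\hat{\beta} - \frac{N-1}{N}v_{it}'\hat{\gamma}_i + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N}v_{jt}'\hat{\gamma}_j\\
&= \mu_i + \ddot{x}_{it}'\beta + u_{it} - \frac{1}{N}\sum_{j=1}^{N}u_{jt} - \hat{\mu}_i - \ddot{x}_{it}'\hat{\beta} - v_{it}'\hat{\gamma}_i + \frac{1}{N}\sum_{j=1}^{N}v_{jt}'\hat{\gamma}_j\\
&= u_{it} - v_{it}'\gamma_i - \frac{1}{N}\sum_{j=1}^{N}\left(u_{jt} - v_{jt}'\gamma_j\right) - v_{it}'(\hat{\gamma}_i - \gamma_i)
 + \frac{1}{N}\sum_{j=1}^{N}v_{jt}'(\hat{\gamma}_j - \gamma_j) - (\hat{\mu}_i - \mu_i) - \ddot{x}_{it}'(\hat{\beta} - \beta)\\
&= u_{it}^{+} - v_{it}'(\hat{\gamma}_i - \gamma_i) - \frac{1}{N}\sum_{j=1}^{N}\left(u_{jt}^{+} - v_{jt}'(\hat{\gamma}_j - \gamma_j)\right) - (\hat{\mu}_i - \mu_i) - \ddot{x}_{it}'(\hat{\beta} - \beta),
\end{aligned}
$$

where $u_{it}^{+} = u_{it} - v_{it}'\gamma_i$. It can be shown that the last three parts of the formula can be neglected for long run variance estimation of $\Delta\hat{S}_{it}^{\tilde{u}}$. Thus, the long run variance estimator based on $\Delta\hat{S}_{it}^{\tilde{u}}$ asymptotically coincides with the long run variance estimator based on $u_{it}^{+} - v_{it}'(\hat{\gamma}_i - \gamma_i)$.

Define $\eta_{it}^{+} = \big[u_{it}^{+}, v_{it}'\big]'$, and then its long run variance is

$$\Omega_i^{+} = \begin{pmatrix}\sigma_{u\cdot v,i}^{2} & 0\\ 0 & \Omega_{vv,i}\end{pmatrix}.$$

Using the unobserved $\eta_{it}^{+}$, an infeasible long run variance estimator, $\hat{\Omega}_i^{+}$, is consistent. That is, $\hat{\Omega}_i^{+} \overset{p}{\to} \Omega_i^{+}$.

Note that $u_{it}^{+} - v_{it}'(\hat{\gamma}_i - \gamma_i) = \big[1, -(\hat{\gamma}_i - \gamma_i)'\big]\eta_{it}^{+}$, so the HAC estimator for $u_{it}^{+} - v_{it}'(\hat{\gamma}_i - \gamma_i)$ can be written as

$$\big[1, -(\hat{\gamma}_i - \gamma_i)'\big]\,\hat{\Omega}_i^{+}\begin{pmatrix}1\\ -(\hat{\gamma}_i - \gamma_i)\end{pmatrix}$$

with

$$
(\hat{\gamma}_i - \gamma_i) \Rightarrow
\big[\,0_{k\times k}\ \ 0_{k\times k}\ \cdots\ I_k\ \cdots\ 0_{k\times k}\ \ 0\ \cdots\ 0\,\big]
\times \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\sum_{i=1}^{N}\sigma_{u\cdot v,i}\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\,dw_{u,i}(r)
= d_{\Psi_i},
$$

where $d_{\Psi_i}$ represents the $(i+1)$th $k \times 1$ block of the distribution $\Psi$. Combining the above results shows that this HAC estimator converges to

$$
\big[1, -d_{\Psi_i}'\big]\begin{pmatrix}\sigma_{u\cdot v,i}^{2} & 0\\ 0 & \Omega_{vv,i}\end{pmatrix}\begin{pmatrix}1\\ -d_{\Psi_i}\end{pmatrix}
= \sigma_{u\cdot v,i}^{2} + d_{\Psi_i}'\Omega_{vv,i}d_{\Psi_i}
= \sigma_{u\cdot v,i}^{2}\big(1 + \sigma_{u\cdot v,i}^{-2}d_{\Psi_i}'\Omega_{vv,i}d_{\Psi_i}\big)
= \sigma_{u\cdot v,i}^{2}\big(1 + d_{\gamma_i}'d_{\gamma_i}\big),
$$

which leads to $\hat{\sigma}_{u\cdot v,i}^{2} \Rightarrow \sigma_{u\cdot v,i}^{2}\big(1 + d_{\gamma_i}'d_{\gamma_i}\big)$.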
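This implies that $\hat{V}_{PIM}$, using $\hat{\sigma}_{u\cdot v,i}^{2}$, converges to

$$
\begin{aligned}
\hat{V}_{PIM} &\Rightarrow \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\times \Big(\sum_{i=1}^{N}\sigma_{u\cdot v,i}^{2}\big(1 + d_{\gamma_i}'d_{\gamma_i}\big)\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]'\,dr\Big)
\times \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}\\
&= V_{PIM}\big(1 + d_{\gamma}'d_{\gamma}\big),
\end{aligned}
$$

where

$$
d_{\gamma}'d_{\gamma} = V_{PIM}^{-1}\Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\times \Big(\sum_{i=1}^{N}\sigma_{u\cdot v,i}^{2}\,d_{\gamma_i}'d_{\gamma_i}\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]'\,dr\Big)
\times \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}.
$$

The null limiting distribution of Wald and t statistics can be computed as follows.

$$
\begin{aligned}
\hat{W} &= \big(R\hat{\theta} - r\big)'\big[RA_{PIM}\breve{V}_{PIM}A_{PIM}R'\big]^{-1}\big(R\hat{\theta} - r\big)\\
&= \big(R(\hat{\theta} - \theta)\big)'\big[RA_{PIM}\breve{V}_{PIM}A_{PIM}R'\big]^{-1}\big(R(\hat{\theta} - \theta)\big)\\
&= \big(A_R^{-1}RA_{PIM}A_{PIM}^{-1}(\hat{\theta} - \theta)\big)'\big[A_R^{-1}RA_{PIM}\breve{V}_{PIM}A_{PIM}R'A_R^{-1}\big]^{-1}\big(A_R^{-1}RA_{PIM}A_{PIM}^{-1}(\hat{\theta} - \theta)\big)\\
&\Rightarrow [R^{*}\Psi]'\big[R^{*}V_{PIM}\big(1 + d_{\gamma}'d_{\gamma}\big)(R^{*})'\big]^{-1}[R^{*}\Psi]
= \frac{\chi_q^{2}}{1 + d_{\gamma}'d_{\gamma}},
\end{aligned}
$$

and for $q = 1$,

$$
\begin{aligned}
\hat{t} &= \frac{R\hat{\theta} - r}{\sqrt{RA_{PIM}\breve{V}_{PIM}A_{PIM}R'}}
= \frac{R(\hat{\theta} - \theta)}{\sqrt{RA_{PIM}\breve{V}_{PIM}A_{PIM}R'}}
= \frac{A_R^{-1}RA_{PIM}A_{PIM}^{-1}(\hat{\theta} - \theta)}{\sqrt{A_R^{-1}RA_{PIM}\breve{V}_{PIM}A_{PIM}R'A_R^{-1}}}\\
&\Rightarrow \frac{R^{*}\Psi}{\sqrt{R^{*}V_{PIM}\big(1 + d_{\gamma}'d_{\gamma}\big)(R^{*})'}}
= \frac{Z}{\sqrt{1 + d_{\gamma}'d_{\gamma}}}.
\end{aligned}
$$

Second, consider the fixed-b limit of $\hat{t}$ and $\hat{W}$. Recall that

$$
\begin{aligned}
\hat{S}_{it}^{\tilde{u}}
&= S_{it}^{\ddot{y}} - t\hat{\mu}_i - S_{it}^{\ddot{x}\prime}\hat{\beta} - \frac{N-1}{N}x_{it}'\hat{\gamma}_i + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N}x_{jt}'\hat{\gamma}_j\\
&= t\mu_i + S_{it}^{\ddot{x}\prime}\beta + \frac{N-1}{N}x_{it}'\gamma_i - \frac{1}{N}\sum_{j=1,\,j\neq i}^{N}x_{jt}'\gamma_j + S_{it}^{\tilde{u}}
 - t\hat{\mu}_i - S_{it}^{\ddot{x}\prime}\hat{\beta} - \frac{N-1}{N}x_{it}'\hat{\gamma}_i + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N}x_{jt}'\hat{\gamma}_j\\
&= S_{it}^{\ddot{u}} - \frac{N-1}{N}x_{it}'\Omega_{vv,i}^{-1}\Omega_{vu,i} + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N}x_{jt}'\Omega_{vv,j}^{-1}\Omega_{vu,j} - q_{it}'(\hat{\theta} - \theta)\\
&= S_{it}^{u} - \frac{1}{N}\sum_{j=1}^{N}S_{jt}^{u} - x_{it}'\Omega_{vv,i}^{-1}\Omega_{vu,i} + \frac{1}{N}\sum_{j=1}^{N}x_{jt}'\Omega_{vv,j}^{-1}\Omega_{vu,j} - q_{it}'(\hat{\theta} - \theta),
\end{aligned}
$$

where $q_{it}$ is defined above. Then, the first difference of $\hat{S}_{it}^{\tilde{u}}$ can be written as

$$
\begin{aligned}
\Delta\hat{S}_{it}^{\tilde{u}}
&= \Delta S_{it}^{u} - \frac{1}{N}\sum_{j=1}^{N}\Delta S_{jt}^{u} - \Delta x_{it}'\Omega_{vv,i}^{-1}\Omega_{vu,i} + \frac{1}{N}\sum_{j=1}^{N}\Delta x_{jt}'\Omega_{vv,j}^{-1}\Omega_{vu,j} - \Delta q_{it}'(\hat{\theta} - \theta)\\
&= u_{it} - \frac{1}{N}\sum_{j=1}^{N}u_{jt} - v_{it}'\Omega_{vv,i}^{-1}\Omega_{vu,i} + \frac{1}{N}\sum_{j=1}^{N}v_{jt}'\Omega_{vv,j}^{-1}\Omega_{vu,j} - \Delta q_{it}'(\hat{\theta} - \theta).
\end{aligned}
$$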
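Consequently,

$$
\begin{aligned}
T^{-1/2}\sum_{t=1}^{[rT]}\Delta\hat{S}_{it}^{\tilde{u}}
&= T^{-1/2}\sum_{t=1}^{[rT]}\Big(u_{it} - \frac{1}{N}\sum_{j=1}^{N}u_{jt}\Big) - T^{-1/2}x_{i[rT]}'\Omega_{vv,i}^{-1}\Omega_{vu,i}
 + \frac{1}{N}\sum_{j=1}^{N}T^{-1/2}x_{j[rT]}'\Omega_{vv,j}^{-1}\Omega_{vu,j} - T^{-1/2}q_{i[rT]}'(\hat{\theta} - \theta)\\
&= T^{-1/2}S_{i[rT]}^{u} - \frac{1}{N}\sum_{j=1}^{N}T^{-1/2}S_{j[rT]}^{u} - T^{-1/2}x_{i[rT]}'\Omega_{vv,i}^{-1}\Omega_{vu,i}
 + \frac{1}{N}\sum_{j=1}^{N}T^{-1/2}x_{j[rT]}'\Omega_{vv,j}^{-1}\Omega_{vu,j} - T^{-1/2}q_{i[rT]}'A_{PIM}A_{PIM}^{-1}(\hat{\theta} - \theta)\\
&\Rightarrow B_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}B_{u,j}(r) - B_{v,i}(r)'\Omega_{vv,i}^{-1}\Omega_{vu,i} + \frac{1}{N}\sum_{j=1}^{N}B_{v,j}(r)'\Omega_{vv,j}^{-1}\Omega_{vu,j} - h_i(r)'\Psi\\
&= \sigma_{u\cdot v,i}w_{u,i}(r) + \lambda_{uv,i}W_{v,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\big(\sigma_{u\cdot v,j}w_{u,j}(r) + \lambda_{uv,j}W_{v,j}(r)\big)\\
&\quad - W_{v,i}(r)'\Omega_{vv,i}^{1/2}\Omega_{vv,i}^{-1}\Omega_{vu,i} + \frac{1}{N}\sum_{j=1}^{N}W_{v,j}(r)'\Omega_{vv,j}^{1/2}\Omega_{vv,j}^{-1}\Omega_{vu,j} - h_i(r)'\Psi\\
&= \sigma_{u\cdot v,i}w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(r) - h_i(r)'\Psi\\
&= \sigma_{u\cdot v,i}\Big(w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\frac{\sigma_{u\cdot v,j}}{\sigma_{u\cdot v,i}}w_{u,j}(r) - \sigma_{u\cdot v,i}^{-1}h_i(r)'\Psi\Big)\\
&= \sigma_{u\cdot v,i}\cdot\hat{P}_i(r),
\end{aligned}
$$

where

$$
\begin{aligned}
\hat{P}_i(r) &= w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\frac{\sigma_{u\cdot v,j}}{\sigma_{u\cdot v,i}}w_{u,j}(r) - \sigma_{u\cdot v,i}^{-1}h_i(r)'\Psi\\
&= w_{u,i}(r) - \frac{1}{N}\sum_{j=1}^{N}\frac{\sigma_{u\cdot v,j}}{\sigma_{u\cdot v,i}}w_{u,j}(r)
 - \sigma_{u\cdot v,i}^{-1}h_i(r)'\Big(\int_0^1\sum_{\ell=1}^{N}h_\ell(s)h_\ell(s)'\,ds\Big)^{-1}
 \times \int_0^1\sum_{\ell=1}^{N}h_\ell(s)\Big(\sigma_{u\cdot v,\ell}w_{u,\ell}(s) - \frac{1}{N}\sum_{j=1}^{N}\sigma_{u\cdot v,j}w_{u,j}(s)\Big)ds.
\end{aligned}
$$

In sum,

$$T^{-1/2}\sum_{t=1}^{[rT]}\Delta\hat{S}_{it}^{\tilde{u}} = T^{-1/2}\hat{S}_{i[rT]}^{\tilde{u}} \Rightarrow \sigma_{u\cdot v,i}\hat{P}_i(r). \qquad (3.10)$$

Next, write $\hat{\sigma}_{u\cdot v,i}^{2}$ in terms of $T^{-1/2}\hat{S}_{i[rT]}^{\tilde{u}}$. The kernel function used here is the Bartlett kernel. Define

$$K_{ts} = k\Big(\frac{|t-s|}{M}\Big), \qquad \Delta^{2}K_{ts} = (K_{ts} - K_{t,s+1}) - (K_{t+1,s} - K_{t+1,s+1}).$$

Simple algebra gives

$$
\hat{\sigma}_{u\cdot v,i}^{2} = T^{-1}\sum_{j=2}^{T}\sum_{h=2}^{T}k\Big(\frac{|j-h|}{M}\Big)\Delta\hat{S}_{ij}^{\tilde{u}}\Delta\hat{S}_{ih}^{\tilde{u}}
= T^{-1}\sum_{j=2}^{T}\Delta\hat{S}_{ij}^{\tilde{u}}\Big(\sum_{h=2}^{T}K_{jh}\Delta\hat{S}_{ih}^{\tilde{u}}\Big)
= T^{-1}\sum_{j=2}^{T}a_jb_j,
$$

where

$$a_j = \Delta\hat{S}_{ij}^{\tilde{u}}, \qquad b_j = \sum_{h=2}^{T}K_{jh}\Delta\hat{S}_{ih}^{\tilde{u}}.$$

Using summation by parts we can write

$$\sum_{t=2}^{T}a_tb_t = \Big(\sum_{s=1}^{T}a_s\Big)b_T - a_1b_2 + \sum_{t=2}^{T-1}\Big(\sum_{s=1}^{t}a_s\Big)(b_t - b_{t+1}), \qquad (3.11)$$

which gives

$$
\hat{\sigma}_{u\cdot v,i}^{2} = T^{-1}\hat{S}_{iT}^{\tilde{u}}\sum_{h=2}^{T}K_{Th}\Delta\hat{S}_{ih}^{\tilde{u}} - T^{-1}\hat{S}_{i1}^{\tilde{u}}\sum_{h=2}^{T}K_{2h}\Delta\hat{S}_{ih}^{\tilde{u}}
+ T^{-1}\sum_{j=2}^{T-1}\hat{S}_{ij}^{\tilde{u}}\Big(\sum_{h=2}^{T}K_{jh}\Delta\hat{S}_{ih}^{\tilde{u}} - \sum_{h=2}^{T}K_{j+1,h}\Delta\hat{S}_{ih}^{\tilde{u}}\Big).
$$

We need to apply (3.11) to the sums over $h$:

$$
\begin{aligned}
(1)\quad \sum_{h=2}^{T}K_{Th}\Delta\hat{S}_{ih}^{\tilde{u}} &= \hat{S}_{iT}^{\tilde{u}}K_{TT} - \hat{S}_{i1}^{\tilde{u}}K_{T2} + \sum_{h=2}^{T-1}\hat{S}_{ih}^{\tilde{u}}\left(K_{Th} - K_{T,h+1}\right)\\
(2)\quad \sum_{h=2}^{T}K_{2h}\Delta\hat{S}_{ih}^{\tilde{u}} &= \hat{S}_{iT}^{\tilde{u}}K_{2T} - \hat{S}_{i1}^{\tilde{u}}K_{22} + \sum_{h=2}^{T-1}\hat{S}_{ih}^{\tilde{u}}\left(K_{2h} - K_{2,h+1}\right)\\
(3)\quad \sum_{h=2}^{T}K_{jh}\Delta\hat{S}_{ih}^{\tilde{u}} &= \hat{S}_{iT}^{\tilde{u}}K_{jT} - \hat{S}_{i1}^{\tilde{u}}K_{j2} + \sum_{h=2}^{T-1}\hat{S}_{ih}^{\tilde{u}}\left(K_{jh} - K_{j,h+1}\right)\\
(4)\quad \sum_{h=2}^{T}K_{j+1,h}\Delta\hat{S}_{ih}^{\tilde{u}} &= \hat{S}_{iT}^{\tilde{u}}K_{j+1,T} - \hat{S}_{i1}^{\tilde{u}}K_{j+1,2} + \sum_{h=2}^{T-1}\hat{S}_{ih}^{\tilde{u}}\left(K_{j+1,h} - K_{j+1,h+1}\right).
\end{aligned}
$$

Plugging these expressions into $\hat{\sigma}_{u\cdot v,i}^{2}$ gives

$$
\begin{aligned}
\hat{\sigma}_{u\cdot v,i}^{2}
&= T^{-1}\hat{S}_{iT}^{\tilde{u}}\Big(\hat{S}_{iT}^{\tilde{u}}K_{TT} - \hat{S}_{i1}^{\tilde{u}}K_{T2} + \sum_{h=2}^{T-1}\hat{S}_{ih}^{\tilde{u}}\left(K_{Th} - K_{T,h+1}\right)\Big)\\
&\quad - T^{-1}\hat{S}_{i1}^{\tilde{u}}\Big(\hat{S}_{iT}^{\tilde{u}}K_{2T} - \hat{S}_{i1}^{\tilde{u}}K_{22} + \sum_{h=2}^{T-1}\hat{S}_{ih}^{\tilde{u}}\left(K_{2h} - K_{2,h+1}\right)\Big)\\
&\quad + T^{-1}\sum_{j=2}^{T-1}\hat{S}_{ij}^{\tilde{u}}\Big(\hat{S}_{iT}^{\tilde{u}}K_{jT} - \hat{S}_{i1}^{\tilde{u}}K_{j2} + \sum_{h=2}^{T-1}\hat{S}_{ih}^{\tilde{u}}\left(K_{jh} - K_{j,h+1}\right)\Big)\\
&\quad - T^{-1}\sum_{j=2}^{T-1}\hat{S}_{ij}^{\tilde{u}}\Big(\hat{S}_{iT}^{\tilde{u}}K_{j+1,T} - \hat{S}_{i1}^{\tilde{u}}K_{j+1,2} + \sum_{h=2}^{T-1}\hat{S}_{ih}^{\tilde{u}}\left(K_{j+1,h} - K_{j+1,h+1}\right)\Big)\\
&= T^{-1}\hat{S}_{iT}^{\tilde{u}}K_{TT}\hat{S}_{iT}^{\tilde{u}} + T^{-1}\sum_{h=2}^{T-1}\hat{S}_{iT}^{\tilde{u}}\left(K_{Th} - K_{T,h+1}\right)\hat{S}_{ih}^{\tilde{u}}
 + T^{-1}\sum_{j=2}^{T-1}\hat{S}_{ij}^{\tilde{u}}\left(K_{jT} - K_{j+1,T}\right)\hat{S}_{iT}^{\tilde{u}} + \text{terms related to } \hat{S}_{i1}^{\tilde{u}}\\
&\quad + T^{-1}\sum_{j=2}^{T-1}\sum_{h=2}^{T-1}\hat{S}_{ij}^{\tilde{u}}\big[(K_{jh} - K_{j,h+1}) - (K_{j+1,h} - K_{j+1,h+1})\big]\hat{S}_{ih}^{\tilde{u}}\\
&= T^{-1}\sum_{j=2}^{T-1}\sum_{h=2}^{T-1}\hat{S}_{ij}^{\tilde{u}}\,\Delta^{2}K_{jh}\,\hat{S}_{ih}^{\tilde{u}}
 + T^{-1}\sum_{j=2}^{T-1}\hat{S}_{ij}^{\tilde{u}}\left(K_{jT} - K_{j+1,T}\right)\hat{S}_{iT}^{\tilde{u}}\\
&\quad + T^{-1}\sum_{h=2}^{T-1}\hat{S}_{iT}^{\tilde{u}}\left(K_{Th} - K_{T,h+1}\right)\hat{S}_{ih}^{\tilde{u}}
 + T^{-1}\hat{S}_{iT}^{\tilde{u}}K_{TT}\hat{S}_{iT}^{\tilde{u}} + \text{terms related to } \hat{S}_{i1}^{\tilde{u}}.
\end{aligned}
$$

Note that the terms related to $\hat{S}_{i1}^{\tilde{u}}$ vanish as $T \to \infty$, because $T^{-1/2}\hat{S}_{i1}^{\tilde{u}}$ converges to $\sigma_{u\cdot v,i}\hat{P}_i(0)$, which equals zero.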
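The summation-by-parts step can be checked numerically. The sketch below is illustrative only and is not taken from the dissertation: it uses a generic series `e` in place of $\Delta\hat{S}_{ij}^{\tilde{u}}$, adopts the convention $S_0 = 0$ so that sums start at 1 rather than 2 (an assumption made here for compactness), and all function names are hypothetical. It forms the kernel matrix $K$, obtains the second differences of $K$ through a differencing matrix $D$, and compares the double sum in first differences with the quadratic form in partial sums.

```python
import numpy as np


def bartlett_matrix(n, M):
    """Kernel weights K[t, s] = k(|t - s| / M) with k(x) = max(1 - |x|, 0)."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.clip(1.0 - dist / M, 0.0, None)


def lrv_first_differences(e, M):
    """T^{-1} * sum_j sum_h K[j, h] * e[j] * e[h] (assumed sum range: all j, h)."""
    K = bartlett_matrix(len(e), M)
    return float(e @ K @ e) / len(e)


def lrv_partial_sums(e, M):
    """Same quantity written in terms of partial sums S_t = e_1 + ... + e_t."""
    n = len(e)
    S = np.cumsum(e)
    K = bartlett_matrix(n, M)
    # D maps partial sums back to increments: (D @ S)[t] = S[t] - S[t-1], S_0 = 0.
    D = np.eye(n) - np.eye(n, k=-1)
    # Interior entries of D.T @ K @ D are the second differences of K;
    # the last row and column carry the boundary terms involving S_T.
    second_diff = D.T @ K @ D
    return float(S @ second_diff @ S) / n


rng = np.random.default_rng(0)
e = rng.standard_normal(200)
direct = lrv_first_differences(e, M=20)
by_parts = lrv_partial_sums(e, M=20)
```

The two values agree up to floating-point error for any kernel matrix, since the identity is purely algebraic; the Bartlett kernel is used here only because it is the kernel considered in the proof.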
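For the Bartlett kernel we have

$$
K_{ts} = k\Big(\frac{|t-s|}{M}\Big) =
\begin{cases}
1 - \dfrac{|t-s|}{M}, & |t-s| \le M\\[4pt]
0, & |t-s| > M.
\end{cases}
$$

Then it follows that

$$
K_{ts} - K_{t,s+1} =
\begin{cases}
0, & t \le s-M\\
\frac{1}{M}, & s+1-M \le t \le s\\
-\frac{1}{M}, & s+1 \le t \le s+M\\
0, & t \ge s+M+1
\end{cases}
\qquad
K_{t+1,s} - K_{t+1,s+1} =
\begin{cases}
0, & t \le s-M-1\\
\frac{1}{M}, & s-M \le t \le s-1\\
-\frac{1}{M}, & s \le t \le s-1+M\\
0, & t \ge s+M
\end{cases}
$$

and

$$
\Delta^{2}K_{ts} =
\begin{cases}
\frac{2}{M}, & t = s\\
-\frac{1}{M}, & t = s \pm M\\
0, & \text{otherwise.}
\end{cases}
$$

Using these results we have

$$
\begin{aligned}
\hat{\sigma}_{u\cdot v,i}^{2}
&= T^{-1}\Big[\sum_{j=2}^{T-1}\frac{2}{M}\hat{S}_{ij}^{\tilde{u}}\hat{S}_{ij}^{\tilde{u}} - \sum_{j=2}^{T-M-1}\frac{1}{M}\Big(\hat{S}_{ij}^{\tilde{u}}\hat{S}_{i,j+M}^{\tilde{u}} + \hat{S}_{i,j+M}^{\tilde{u}}\hat{S}_{ij}^{\tilde{u}}\Big)\Big]\\
&\quad + T^{-1}\Big[-\sum_{j=T-M}^{T-1}\frac{1}{M}\hat{S}_{ij}^{\tilde{u}}\hat{S}_{iT}^{\tilde{u}} - \sum_{h=T-M}^{T-1}\frac{1}{M}\hat{S}_{iT}^{\tilde{u}}\hat{S}_{ih}^{\tilde{u}}\Big]
 + T^{-1}\hat{S}_{iT}^{\tilde{u}}\hat{S}_{iT}^{\tilde{u}} + \text{terms related to } \hat{S}_{i1}^{\tilde{u}}\\
&= \frac{2}{MT}\sum_{j=2}^{T-1}\hat{S}_{ij}^{\tilde{u}}\hat{S}_{ij}^{\tilde{u}} - \frac{2}{MT}\sum_{j=2}^{T-M-1}\hat{S}_{ij}^{\tilde{u}}\hat{S}_{i,j+M}^{\tilde{u}} - \frac{2}{MT}\sum_{j=T-M}^{T-1}\hat{S}_{ij}^{\tilde{u}}\hat{S}_{iT}^{\tilde{u}}
 + T^{-1}\hat{S}_{iT}^{\tilde{u}}\hat{S}_{iT}^{\tilde{u}} + \text{terms related to } \hat{S}_{i1}^{\tilde{u}},
\end{aligned}
$$

where the last term follows from the fact that $K_{TT} = 1$.

Under fixed-b asymptotics we set $M = bT$ where $b \in (0, 1]$ is held fixed as $T \to \infty$. Plugging in $bT$ for $M$ into $\hat{\sigma}_{u\cdot v,i}^{2}$ gives

$$
\begin{aligned}
\hat{\sigma}_{u\cdot v,i}^{2}
&= \frac{2}{bT}\sum_{j=2}^{T-1}T^{-1/2}\hat{S}_{ij}^{\tilde{u}}\,T^{-1/2}\hat{S}_{ij}^{\tilde{u}} - \frac{2}{bT}\sum_{j=2}^{T-bT-1}T^{-1/2}\hat{S}_{ij}^{\tilde{u}}\,T^{-1/2}\hat{S}_{i,j+M}^{\tilde{u}}\\
&\quad - \frac{2}{bT}\sum_{j=T-bT}^{T-1}T^{-1/2}\hat{S}_{ij}^{\tilde{u}}\,T^{-1/2}\hat{S}_{iT}^{\tilde{u}} + T^{-1/2}\hat{S}_{iT}^{\tilde{u}}\,T^{-1/2}\hat{S}_{iT}^{\tilde{u}} + \text{terms related to } T^{-1/2}\hat{S}_{i1}^{\tilde{u}}.
\end{aligned}
$$

Using (3.10) and the continuous mapping theorem gives

$$
\begin{aligned}
\hat{\sigma}_{u\cdot v,i}^{2}
&\Rightarrow \frac{2}{b}\int_0^1\big(\sigma_{u\cdot v,i}\hat{P}_i(r)\big)^{2}dr - \frac{2}{b}\int_0^{1-b}\sigma_{u\cdot v,i}\hat{P}_i(r)\,\sigma_{u\cdot v,i}\hat{P}_i(r+b)\,dr\\
&\quad - \frac{2}{b}\int_{1-b}^{1}\sigma_{u\cdot v,i}\hat{P}_i(r)\,\sigma_{u\cdot v,i}\hat{P}_i(1)\,dr + \big(\sigma_{u\cdot v,i}\hat{P}_i(1)\big)^{2}\\
&= \sigma_{u\cdot v,i}^{2}\Big[\frac{2}{b}\int_0^1\hat{P}_i^{2}(r)\,dr - \frac{2}{b}\int_0^{1-b}\hat{P}_i(r)\hat{P}_i(r+b)\,dr - \frac{2}{b}\int_{1-b}^{1}\hat{P}_i(r)\hat{P}_i(1)\,dr + \hat{P}_i^{2}(1)\Big]\\
&= \sigma_{u\cdot v,i}^{2}\,Q_b\big(\hat{P}_i(r)\big),
\end{aligned}
$$

where

$$Q_b\big(\hat{P}_i(r)\big) = \frac{2}{b}\int_0^1\hat{P}_i^{2}(r)\,dr - \frac{2}{b}\int_0^{1-b}\hat{P}_i(r)\hat{P}_i(r+b)\,dr - \frac{2}{b}\int_{1-b}^{1}\hat{P}_i(r)\hat{P}_i(1)\,dr + \hat{P}_i^{2}(1).$$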
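For the Bartlett kernel, the closed form in partial sums can be compared with the direct kernel-weighted double sum. The following is a hedged sketch rather than the dissertation's own code: it assumes an integer bandwidth `M`, uses the convention $S_0 = 0$ so that the terms related to the first partial sum are absorbed into sums starting at $j = 1$, and the function names are hypothetical. With $M = bT$ and partial sums scaled by $T^{-1/2}$, the closed form is a Riemann-sum analogue of $Q_b$.

```python
import numpy as np


def bartlett_lrv_direct(e, M):
    """T^{-1} * sum_j sum_h k(|j - h| / M) * e[j] * e[h] with Bartlett weights."""
    n = len(e)
    idx = np.arange(n)
    K = np.clip(1.0 - np.abs(idx[:, None] - idx[None, :]) / M, 0.0, None)
    return float(e @ K @ e) / n


def bartlett_lrv_closed_form(e, M):
    """Partial-sum form: 2/M * sum S_j^2 - 2/M * sum S_j S_{j+M}
    - 2/M * S_T * sum_{j >= T-M} S_j + S_T^2, all divided by T."""
    n = len(e)
    S = np.cumsum(e)                  # S[j - 1] holds S_j for j = 1, ..., n
    S_T = S[-1]
    term1 = 2.0 / M * np.sum(S[:-1] ** 2)
    m = max(n - 1 - M, 0)             # number of pairs (j, j + M) with j + M <= n - 1
    term2 = 2.0 / M * np.sum(S[:m] * S[M:M + m])
    lo = max(n - M, 1)                # first j with a nonzero boundary weight
    term3 = 2.0 / M * S_T * np.sum(S[lo - 1:n - 1])
    return float(term1 - term2 - term3 + S_T ** 2) / n


rng = np.random.default_rng(1)
e = rng.standard_normal(300)
M = 30  # corresponds to b = M / T = 0.1 in the fixed-b notation
direct = bartlett_lrv_direct(e, M)
closed = bartlett_lrv_closed_form(e, M)
```

The closed form avoids building the $T \times T$ kernel matrix, which is the practical reason for writing the estimator in terms of partial sums; the case $M = T$ (that is, $b = 1$) is covered by the same function because the middle sum is then empty.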
Therefore, based on $\hat{\sigma}_{u\cdot v,i}^{2}$, the fixed-b limit of the covariance matrix is given by

$$
\begin{aligned}
\hat{V}_{PIM} &\Rightarrow \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\times \Big(\sum_{i=1}^{N}\sigma_{u\cdot v,i}^{2}\,Q_b\big(\hat{P}_i(r)\big)\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]'\,dr\Big)
\times \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}\\
&= V_{PIM}\cdot\hat{Q}(b),
\end{aligned}
$$

where

$$
\hat{Q}(b) = V_{PIM}^{-1}\Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\times \Big(\sum_{i=1}^{N}\sigma_{u\cdot v,i}^{2}\,Q_b\big(\hat{P}_i(r)\big)\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]'\,dr\Big)
\times \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}.
$$

This implies that

$$\hat{W} \Rightarrow \frac{\chi_q^{2}}{\hat{Q}(b)},$$

and for $q = 1$,

$$\hat{t} \Rightarrow \frac{Z}{\sqrt{\hat{Q}(b)}}.$$

Lastly, consider the result for the fixed-b test statistics. Similar to the above and to Vogelsang and Wagner [2014], $\tilde{\sigma}_{u\cdot v,i}^{2} \Rightarrow \sigma_{u\cdot v,i}^{2}\,Q_b\big(\tilde{P}_i(r)\big)$, where $Q_b(\cdot)$ is the same as above and $\tilde{P}_i(r)$ is similar to $\hat{P}_i(r)$ but its components come from the further augmented regression (3.9). Therefore, the fixed-b limit of the covariance matrix is such that

$$
\begin{aligned}
\tilde{V}_{PIM} &\Rightarrow \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\times \Big(\sum_{i=1}^{N}\sigma_{u\cdot v,i}^{2}\,Q_b\big(\tilde{P}_i(r)\big)\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]'\,dr\Big)
\times \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}\\
&= V_{PIM}\cdot\tilde{Q}(b),
\end{aligned}
$$

where

$$
\tilde{Q}(b) = V_{PIM}^{-1}\Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}
\times \Big(\sum_{i=1}^{N}\sigma_{u\cdot v,i}^{2}\,Q_b\big(\tilde{P}_i(r)\big)\int_0^1\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]\big[\ddot{H}_i(1) - \ddot{H}_i(r)\big]'\,dr\Big)
\times \Big(\int_0^1\sum_{i=1}^{N}h_i(r)h_i(r)'\,dr\Big)^{-1}.
$$

This implies that

$$\tilde{W} \Rightarrow \frac{\chi_q^{2}}{\tilde{Q}(b)},$$

and for $q = 1$,

$$\tilde{t} \Rightarrow \frac{Z}{\sqrt{\tilde{Q}(b)}}.$$

BIBLIOGRAPHY

Donald W. K. Andrews. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59(3):817–858, May 1991.

Nicholas Apergis and James E. Payne. Energy consumption and economic growth in Central America: Evidence from a panel cointegration and error correction model. Energy Economics, 31(2):211–216, 2009.

Jushan Bai and Chihwa Kao. Chapter 1 on the estimation and inference of a panel cointegration model with cross-sectional dependence. In Badi H. Baltagi, editor, Panel Data Econometrics: Theoretical Contributions and Empirical Applications, volume 274 of Contributions to Economic Analysis, pages 3–30. Elsevier, 2006.

Jushan Bai and Serena Ng. A PANIC attack on unit roots and cointegration. Econometrica, 72(4):1127–1177, July 2004.

Jushan Bai, Chihwa Kao, and Serena Ng. Panel cointegration with global stochastic trends. Journal of Econometrics, 149(1):82–99, 2009.

Matthew B. Canzoneri, Robert E. Cumby, and Behzad Diba. Relative labor productivity and the real exchange rate in the long run: Evidence for a panel of OECD countries. Journal of International Economics, 47(2):245–266, 1999.

Yoosoon Chang. Bootstrap unit root tests in panels with cross-sectional dependency. Journal of Econometrics, 120(2):263–293, 2004.

Yoosoon Chang, Joon Y. Park, and Kevin Song. Bootstrapping cointegrating regressions. Journal of Econometrics, 133(2):703–739, 2006.

In Choi and Eiji Kurozumi. Model selection criteria for the leads-and-lags cointegrating regression. Journal of Econometrics, 169(2):224–238, August 2012.

Dimitris K. Christopoulos and Efthymios G. Tsionas. Financial development and economic growth: Evidence from panel unit root and cointegration tests. Journal of Development Economics, 73(1):55–74, 2004.

Lung-fei Lee and Jihai Yu. Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics, 154(2):165–185, February 2010.

Chihwa Kao and Min-Hsien Chiang. On the estimation and inference of a cointegrated regression in panel data. In Nonstationary Panels, Panel Cointegration, and Dynamic Panels, volume 1st, chapter 7, pages 179–222. Emerald Group Publishing Limited, 2000.

G. Kapetanios, M. Hashem Pesaran, and T. Yamagata. Panels with non-stationary multifactor error structures. Journal of Econometrics, 160(2):326–348, February 2011.

Mudit Kapoor, Harry H. Kelejian, and Ingmar R. Prucha. Panel data models with spatially correlated error components. Journal of Econometrics, 140(1):97–130, September 2007. Analysis of spatially dependent data.

Mohitosh Kejriwal and Pierre Perron. Data dependent rules for selection of the number of leads and lags in the dynamic OLS cointegrating regression. Econometric Theory, 24(5):1425–1441, October 2008.

Chien-Chiang Lee. Energy consumption and GDP in developing countries: A cointegrated panel analysis. Energy Economics, 27(3):415–427, 2005.

Hongyi Li and G. S. Maddala. Bootstrapping cointegrating regressions. Journal of Econometrics, 80(2):297–318, 1997.

Yi Li. Hypothesis testing in cointegrated panels: Asymptotic and bootstrap method. 2016.

Nelson C. Mark and Donggyu Sul. Cointegration vector estimation by panel DOLS and long-run money demand. Oxford Bulletin of Economics & Statistics, 65(5):655–680, 2003.

Suzanne McCoskey and Chihwa Kao. A residual-based test of the null of cointegration in panel data. Econometric Reviews, 17(1):57–84, 1998.

Paresh Kumar Narayan and Russell Smyth. Energy consumption and real GDP in G7 countries: New evidence from panel cointegration with structural breaks. Energy Economics, 30(5):2331–2341, 2008.

Efstathios Paparoditis and Dimitris N. Politis. Residual-based block bootstrap for unit root testing. Econometrica, 71(3):813–855, 2003.

Efstathios Paparoditis and Dimitris N. Politis. Bootstrapping unit root tests for autoregressive time series. Journal of the American Statistical Association, 100(470):545–553, 2005.

Cameron Parker, Efstathios Paparoditis, and Dimitris N. Politis. Unit root testing via the stationary bootstrap. Journal of Econometrics, 133(2):601–638, 2006.

Andrew Patton, Dimitris N. Politis, and Halbert White. Correction to "Automatic block-length selection for the dependent bootstrap" by D. Politis and H. White. Econometric Reviews, 28(4):372–375, 2009.

Peter Pedroni. Critical values for cointegration tests in heterogeneous panels with multiple regressors. Oxford Bulletin of Economics & Statistics, 61(4):653, 1999.

Peter Pedroni. Fully modified OLS for heterogeneous cointegrated panels. In Nonstationary Panels, Panel Cointegration, and Dynamic Panels, volume 1st, chapter 4, pages 93–130. Emerald Group Publishing Limited, 2000.

Peter Pedroni. Panel cointegration: Asymptotic and finite sample properties of pooled time series tests with an application to the PPP hypothesis. Econometric Theory, 20(3):597–625, 2004.

M. Hashem Pesaran. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica, 74(4):967–1012, July 2006.

Peter C. B. Phillips and Bruce E. Hansen. Statistical inference in instrumental variables regression with I(1) processes. The Review of Economic Studies, 57(1):99–125, January 1990.

Dimitris N. Politis and Joseph P. Romano. The stationary bootstrap. Journal of the American Statistical Association, 89(428):1303–1313, 1994.

Dimitris N. Politis and Halbert White. Automatic block-length selection for the dependent bootstrap. Econometric Reviews, 23(1):53–70, 2004.

Zacharias Psaradakis. On bootstrap inference in cointegrating regressions. Economics Letters, 72(1):1–10, 2001.

Dong Wan Shin. Stationary bootstrapping for panel cointegration tests under cross-sectional dependence. Statistics, 49(1):209–223, 2015.

Dong Wan Shin and Eunju Hwang. Stationary bootstrapping for cointegrating regressions. Statistics & Probability Letters, 83(2):474–480, 2013.

Stephan Smeekes and Jean-Pierre Urbain. On the applicability of the sieve bootstrap in time series panels. Oxford Bulletin of Economics and Statistics, 76(1):139–151, 2014.

Anders Rygh Swensen. Bootstrapping unit root tests for integrated processes. Journal of Time Series Analysis, 24(1):99–126, 2003.

Timothy J. Vogelsang and Martin Wagner. Integrated modified OLS estimation and fixed-b inference for cointegrating regressions. Journal of Econometrics, 178(2):741–760, 2014.

Timothy J. Vogelsang, Martin Wagner, and Yi Li. Integrated modified OLS estimation and fixed-b inference for homogeneous cointegrated panels. 2016.

Joakim Westerlund. New simple tests for panel cointegration. Econometric Reviews, 24(3):297–316, 2005.

Joakim Westerlund and David L. Edgerton. A panel bootstrap cointegration test. Economics Letters, 97(3):185–190, 2007.

Joakim Westerlund and David L. Edgerton. A simple test for cointegration in dependent panels with structural breaks. Oxford Bulletin of Economics and Statistics, 70(5):665–704, 2008.

Jihai Yu, Robert de Jong, and Lung-fei Lee. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics, 146(1):118–134, September 2008.

Jihai Yu, Robert de Jong, and Lung-fei Lee. Estimation for spatial dynamic panel data with fixed effects: The case of spatial cointegration. Journal of Econometrics, 167(1):16–37, March 2012.