THREE ESSAYS ON ROBUST INFERENCE FOR LINEAR PANEL MODELS WITH MANY
TIME PERIODS
By
Yu Sun

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Economics – Doctor of Philosophy
2013

ABSTRACT
THREE ESSAYS ON ROBUST INFERENCE FOR LINEAR PANEL MODELS WITH MANY
TIME PERIODS
By
Yu Sun
This dissertation consists of three chapters. The first chapter is a critique of two-way cluster-robust standard errors. In the presence of both cross-sectional correlation and serial correlation, traditional one-way cluster-robust standard errors are not valid. A new robust variance estimator, the two-way cluster-robust standard errors, has been proposed by Thompson (2011) and Cameron et al. (2011) to conduct accurate inference when double clustering exists. However, this approach does not allow for correlation across different firms in different time periods. If such correlation exists, then the two-way cluster-robust standard errors will fail to work. Monte Carlo simulation results demonstrate that using two-way cluster-robust standard errors may lead to unreliable inference even when there is a simple AR(1) time effect. One solution to this problem was proposed by Thompson (2011), who revised the original formula for the two-way cluster-robust standard errors to account for correlation across different firms in different time periods. An alternative solution is the standard errors proposed by Driscoll and Kraay (1998), which are robust to cross-sectional correlation of general and unknown form as well as to heteroskedasticity and serial correlation under covariance stationarity and weak dependence. The Driscoll and Kraay (1998) (DK) standard errors perform well when firm dummies are included. Interestingly, without removing the firm effect, the DK standard errors do not behave well. Simulation results illustrate these findings.
The second chapter provides an analysis of the standard errors proposed by Driscoll and Kraay (1998) in linear Difference-in-Differences (DD) models with fixed effects and individual-specific time trends. The analysis is carried out within the fixed-b asymptotic framework developed by Kiefer and Vogelsang (2005) for tests based on heteroskedasticity and autocorrelation (HAC) robust covariance matrix estimators. For the fixed-N, large-T case, it is shown that the fixed-b asymptotic distributions of test statistics constructed using the DD estimator and the DK standard errors differ from the results found by Kiefer and Vogelsang (2005) and Vogelsang (2012). The newly derived fixed-b asymptotic distributions depend on the date of the policy change, λ, the individual-specific trend functions, and the choice of kernel and bandwidth. Whether time period dummies are included does not affect the fixed-b limits. For other regressors that do not have a structural change, the usual fixed-b asymptotic distributions still apply. Monte Carlo simulations illustrate the performance of the fixed-b approximations in practice.
The third chapter studies finite sample properties of naive moving blocks bootstrap (MBB) tests based on the DK standard errors in linear DD models with individual fixed effects. The naive bootstrap is a bootstrap in which the formula used to compute the standard errors on the resampled data is the same as the formula used on the original data. Following the approach in Gonçalves (2011), the so-called “panel MBB” method is used in this chapter. This method applies the standard MBB to the time series of vectors containing all the individual observations at each time period. Monte Carlo simulation results show that the bootstrap is much more accurate than the standard normal approximation, and that it closely follows the new fixed-b approximation proposed in the second chapter. This improvement holds for the special case of the Bartlett kernel, and results would look similar for other kernels. It even holds when the independent and identically distributed (i.i.d.) bootstrap is used, despite potential serial correlation in the data. Simulation results also show that if the block length is appropriately chosen, the bootstrap can outperform the fixed-b approximation when there is strong serial correlation.

Copyright by
YU SUN
2013

To my parents.


ACKNOWLEDGEMENTS

I would like to express the deepest appreciation to my committee chair, Professor Timothy Vogelsang, for his excellent guidance, caring, patience, and support throughout my dissertation. His
wisdom, knowledge, and commitment to the highest standards inspired and motivated me. He
kept me optimistic and cheered me up during the toughest moments. Without his guidance and
persistent help this dissertation would not have been possible.
I want to thank my committee members, Professors Jeffrey Wooldridge and Richard Baillie, for providing me with useful comments and great advice that improved my dissertation. Special thanks go to Professor John Jiang, who always shared his valuable experiences with me. I also want to thank Professors Soren Anderson, Steven Haider, Thomas Jeitschko, Peter Schmidt, Byron Brown and many other great faculty members at Michigan State University, who helped me navigate life as a Ph.D. student.
I appreciate the assistance given by our friendly staff Belén Feight, Margaret Lynch, Lori Jean Nichols and Jon Glazier. I am grateful to my friend Seunghwa Rho, who always supported me and helped me a lot. Many thanks to my friends Sukampon Chongwilaikasaem, Luke Chu, Cheol Keun Cho, Gaoyang Wang, Xiaoni Guo, Yuqing Zhou, Cuicui Lu, Wei Li, Xiaojun Wang and many others for every moment I shared with you at Michigan State University.
Last, but by no means least, I am particularly grateful for the courage, support and endless love I have received from my parents and my grandparents throughout my life. Thank you so much for giving me the opportunity of an education at the best universities, and for your understanding.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 ROBUST INFERENCE FOR LINEAR PANEL MODELS
1.1 Introduction
1.2 The Model and Standard Errors
1.2.1 White and One-Way Cluster-Robust Standard Errors
1.2.2 FM Standard Errors
1.2.3 Original and Revised Two-Way Cluster-Robust Standard Errors
1.2.4 DK Standard Errors
1.2.5 Test Statistics and Asymptotic Distributions
1.3 Finite Sample Performances
1.3.1 Data Generating Process
1.3.2 Results
1.3.3 Strange Patterns of the DK Standard Errors
1.4 Conclusion
CHAPTER 2 FIXED-B INFERENCE FOR DIFFERENCE-IN-DIFFERENCES ESTIMATION
2.1 Introduction
2.2 Model Setup and Test Statistics
2.3 Asymptotic Theory and Critical Values
2.3.1 Models With No Additional Regressors
2.3.2 Models With Additional Regressors
2.3.3 Asymptotic Critical Values
2.4 Finite Sample Properties
2.5 Conclusion
CHAPTER 3 FINITE SAMPLE PERFORMANCES OF THE MOVING BLOCKS BOOTSTRAP FOR LINEAR DIFFERENCE-IN-DIFFERENCES MODELS WITH INDIVIDUAL FIXED EFFECTS
3.1 Introduction
3.2 The Difference-in-Differences Model
3.2.1 The Model and DD Estimator
3.2.2 The DK Standard Errors
3.2.3 Test Statistics and Asymptotic Distributions
3.3 Bootstrap Methods
3.4 Finite Sample Performances
3.5 Conclusion
APPENDICES
Appendix A: PROOFS IN CHAPTER 1
Appendix B: TABLES IN CHAPTER 1
Appendix C: PROOFS IN CHAPTER 2
Appendix D: TABLES IN CHAPTER 2
Appendix E: FIGURES IN CHAPTER 3
BIBLIOGRAPHY

LIST OF TABLES

Table 1.1 Residual cross product matrix
Table B.1 Estimating coefficient, standard errors and null rejection probabilities with firm effects: OLS and one-way clustered standard errors.
Table B.2 Estimating coefficient, standard errors and null rejection probabilities with firm effects: FM standard errors.
Table B.3 Estimating coefficient, standard errors and null rejection probabilities with time effects: OLS and clustered standard errors.
Table B.4 Estimating coefficient, standard errors and null rejection probabilities with time effects: FM standard errors.
Table B.5 Comparing performances of White, one-way cluster-robust and two-way cluster-robust standard errors in the presence of both firm effects and time effects when N and T vary separately. For time effects with ρ = 0.
Table B.6 Comparing performances of White, one-way cluster-robust and two-way cluster-robust standard errors in the presence of firm effects and AR(1) time effects when N = T = 10.
Table B.7 Comparing performances of White, one-way cluster-robust and two-way cluster-robust standard errors in the presence of firm effects and AR(1) time effects when N = T = 50.
Table B.8 Comparing performances of White, one-way cluster-robust and two-way cluster-robust standard errors in the presence of firm effects and AR(1) time effects when N = T = 250.
Table B.9 Comparing performances of one-way cluster-robust, two-way cluster-robust and DK standard errors in the presence of firm effects and AR(1) time effects when N = T = 50 and N = T = 250. No firm dummies.
Table B.10 Comparing performances of one-way cluster-robust, two-way cluster-robust and DK standard errors in the presence of firm effects and AR(1) time effects when N = T = 50 and N = T = 250. Firm dummies.
Table B.11 Comparing performances of one-way cluster-robust, two-way cluster-robust and DK standard errors in the presence of firm effects and AR(1) time effects. No firm dummies.
Table B.12 Comparing performances of one-way cluster-robust, two-way cluster-robust and DK standard errors in the presence of a firm effect. No firm dummies.
Table D.1 90% Asymptotic Critical Values for tDD (Bartlett Kernel) Without Trend.
Table D.2 95% Asymptotic Critical Values for tDD (Bartlett Kernel) Without Trend.
Table D.3 97.5% Asymptotic Critical Values for tDD (Bartlett Kernel) Without Trend.
Table D.4 99% Asymptotic Critical Values for tDD (Bartlett Kernel) Without Trend.
Table D.5 90% Asymptotic Critical Values for tDD (Bartlett Kernel) With A Simple Trend.
Table D.6 95% Asymptotic Critical Values for tDD (Bartlett Kernel) With A Simple Trend.
Table D.7 97.5% Asymptotic Critical Values for tDD (Bartlett Kernel) With A Simple Trend.
Table D.8 99% Asymptotic Critical Values for tDD (Bartlett Kernel) With A Simple Trend.
Table D.9 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). No trend or additional regressors. λ = .5, k = .5. AR(1) error. Two-Tailed Test of H0: β3 = 0.
Table D.10 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). No trend or additional regressors. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of H0: β3 = 0.
Table D.11 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). No trend or additional regressors. Time dummies. λ = .5, k = .5. AR(1) error. Two-Tailed Test of H0: β3 = 0.
Table D.12 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend. No additional regressors. λ = .5, k = .5. AR(1) errors. Two-Tailed Test of H0: β3 = 0.
Table D.13 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend. No additional regressors. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of H0: β3 = 0.
Table D.14 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend. Time Dummies. No additional regressors. λ = .5, k = .5. AR(1) errors. Two-Tailed Test of H0: β3 = 0.
Table D.15 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend. Time Dummies. No additional regressors. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of H0: β3 = 0.
Table D.16 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). One additional regressor. No trend. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of H0: β3 = 0 and H0: γ = 0.
Table D.17 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend and one additional regressor. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of H0: β3 = 0 and H0: γ = 0.
Table D.18 Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). No trend and additional regressors. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of H0: β3 = 0.

LIST OF FIGURES

Figure E.1 Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, N = 100, T = 250, ρ = 0.3, b = 0.02.
Figure E.2 Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, N = 100, T = 250, ρ = 0.3, b = 0.5.
Figure E.3 Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, N = 50, λ = 0.5.
Figure E.4 Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, N = 49, λ = 0.5.
Figure E.5 Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, N = 250, λ = 0.5.
Figure E.6 Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, N = 256, λ = 0.5.
Figure E.7 Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, T = 50, λ = 0.5.
Figure E.8 Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, T = 49, λ = 0.5.
Figure E.9 Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, T = 250, λ = 0.5.
Figure E.10 Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, T = 250, λ = 0.5.
Figure E.11 Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, N = 49, T = 250, λ = 0.5.
Figure E.12 Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, N = 256, T = 250, λ = 0.5.
Figure E.13 Empirical null rejection probabilities, additional regressor, spatial MA(2), Bartlett kernel, N = 49, T = 50, λ = 0.5.
Figure E.14 Empirical null rejection probabilities, additional regressor, spatial MA(2), Bartlett kernel, N = 49, T = 250, λ = 0.5.
Figure E.15 Empirical null rejection probabilities, additional regressor, spatial MA(2), Bartlett kernel, N = 256, T = 50, λ = 0.5.
Figure E.16 Empirical null rejection probabilities, additional regressor, spatial MA(2), Bartlett kernel, N = 256, T = 250, λ = 0.5.
Figure E.17 Empirical null rejection probabilities for DD parameter, additional regressor, spatial MA(2), Bartlett kernel, N = 49, T = 250, λ = 0.5.
Figure E.18 Empirical null rejection probabilities for DD parameter, additional regressor, spatial MA(2), Bartlett kernel, N = 256, T = 250, λ = 0.5.
Figure E.19 Empirical null rejection probabilities for z parameter, additional regressor, spatial MA(2), Bartlett kernel, N = 49, T = 250, λ = 0.5.
Figure E.20 Empirical null rejection probabilities for z parameter, additional regressor, spatial MA(2), Bartlett kernel, N = 256, T = 250, λ = 0.5.

CHAPTER 1
ROBUST INFERENCE FOR LINEAR PANEL MODELS

1.1 Introduction

Many empirical papers in the accounting and finance literatures use panel data sets with observations on multiple firms over multiple time periods. In such panel data settings, the common assumption of independence in regression errors is likely to be violated. For example, temporary market-wide common shocks will cause correlation across firms in the same time period, and persistent firm characteristics will cause correlation over time. Moreover, persistent common shocks, such as business cycles, will cause correlation across different firms in different time periods. Such clustering poses a serious challenge: if we fail to account for it, we will typically underestimate the standard errors and hence over-reject the null hypothesis in hypothesis tests. Therefore, conducting robust inference plays a key role in empirical research. Throughout this chapter, we call one dimension firm and the other time.
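The under-estimation claim can be checked with a small Monte Carlo. The Python sketch below is our own illustration, not one of the designs used later in this chapter: it assumes unit-variance firm effects in both the regressor and the error, no time effects, N = 50 and T = 10, and compares the average conventional (i.i.d.) OLS standard error with the Monte Carlo standard deviation of $\hat\beta$. Under these assumptions a back-of-the-envelope calculation puts the ratio near $\sqrt{40/130} \approx 0.55$, so the conventional standard error should come out well below the true sampling variability.

```python
import math
import random
import statistics


def one_panel(rng, N, T, beta=1.0):
    """Draw one panel with unit-variance firm effects in both x and the error
    (an illustrative assumption), fit pooled OLS without an intercept, and
    return (beta_hat, conventional i.i.d. standard error)."""
    xs, ys = [], []
    for _ in range(N):
        mu = rng.gauss(0.0, 1.0)     # firm effect in the regressor
        gamma = rng.gauss(0.0, 1.0)  # firm effect in the error
        for _ in range(T):
            x = mu + rng.gauss(0.0, 1.0)
            xs.append(x)
            ys.append(beta * x + gamma + rng.gauss(0.0, 1.0))
    sxx = sum(x * x for x in xs)
    b = sum(x * y for x, y in zip(xs, ys)) / sxx
    # Conventional OLS variance s^2 / sum(x^2); one parameter, so N*T - 1 df
    s2 = sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / (len(xs) - 1)
    return b, math.sqrt(s2 / sxx)


def naive_se_ratio(N=50, T=10, reps=200, seed=0):
    """Mean conventional SE divided by the Monte Carlo SD of beta_hat."""
    rng = random.Random(seed)
    draws = [one_panel(rng, N, T) for _ in range(reps)]
    mean_se = statistics.mean(se for _, se in draws)
    return mean_se / statistics.stdev(b for b, _ in draws)


print(f"conventional SE / sampling SD = {naive_se_ratio():.2f}")
```

A ratio below one means the conventional standard error is too small, so nominal 5% tests reject too often.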
Various approaches are available to obtain “robust” standard errors. White (1980) proposed an
approach to account for heteroskedasticity in cross-section data. Later White (1984) presented a
formula for a multivariate dependent variable. Arellano (1987) proposed the well-known one-way
cluster-robust standard errors in linear panel models. Wooldridge (2003) provided an overview
of applications of cluster methods. Hansen (2007) investigated asymptotic properties of a robust
variance matrix estimator for panel data when T is large. Fama and MacBeth (1973) proposed a
method that computes standard errors robust to correlation across firms in the same time period.
White standard errors and one-way cluster-robust standard errors are common in econometrics
textbooks (e.g., Wooldridge, 2002).
Most papers in the literature only deal with clustering in one dimension and ignore clustering in the other dimension. Methods that control for clustering in one dimension usually assume independence in the other dimension. However, when both cross-sectional and serial correlation exist, the one-way cluster-robust method misspecifies the error structure and underestimates the true standard error. This leads to over-rejections in hypothesis testing. One solution is the two-way cluster-robust standard errors proposed by Thompson (2011) and Cameron et al. (2011). This variance estimator is designed to produce robust inference when there is two-way non-nested clustering. Specifically, in finance applications, clustering at the firm level and at the time (e.g., day) level is of interest. This method allows for serial correlation for a given firm and correlation across different firms in the same time period (cross-sectional correlation). However, this approach assumes that there is no correlation across different firms in different time periods. The method generalizes the standard cluster-robust variance estimator for one-way clustering to two-way clustering, and relies on similar, relatively weak distributional assumptions. It can also be generalized to clustering in more than two dimensions (see Cameron et al., 2011).
Petersen (2009) compared these robust standard errors and suggested using the two-way cluster-robust standard errors as a robustness check. Gow et al. (2010) find that two-way cluster-robust standard errors are required for valid inference in many accounting applications. However, the two-way clustering method only works for a specific and restricted error structure. In practice, the assumption that there is no correlation across different firms in different time periods is likely to be violated. Suppose there is a common shock to all the firms in the same industry; it is much more realistic that this shock would continue to affect those firms to some extent in the future rather than completely disappear at the end of the current time period. Hence different firms in different time periods may be correlated with each other due to the lagged effect, as could happen over a business cycle. If so, then the two-way cluster-robust standard errors will probably fail. There are two solutions available to correct this problem. Thompson (2011) improved the original formula for the two-way cluster-robust standard errors to account for correlation across different firms in different time periods; we call this the revised two-way cluster-robust standard errors. An alternative solution is to use the Driscoll and Kraay (1998) (DK) standard errors, which account for heteroskedasticity, autocorrelation and cross-sectional correlation of general and unknown form. A recent paper by Vogelsang (2012) has shown that fixed-b asymptotic approximations (see Kiefer and Vogelsang, 2005) for the DK standard errors perform substantially better than standard normal asymptotic approximations for either the DK standard errors or the one-way cluster-robust standard errors in the context of linear panel models with individual fixed effects and cross-sectional correlation.
The objective of this chapter is to show that, in the presence of both firm effects and time effects, the two-way cluster-robust method fails if there is correlation across different firms in different time periods. Furthermore, two possible solutions to this problem are analyzed using simulations. First, several tables from Petersen (2009) are replicated, and our simulations give similar results. In these tables, the sensitivity of standard error estimates to the presence of either firm effects or time effects is examined. Next, we study the performance of the two-way cluster-robust standard errors in the presence of both firm effects and time effects by comparing them to the White standard errors and the one-way cluster-robust standard errors. In this scenario, the two-way cluster-robust standard errors perform better than the one-way clustering method. Then, we assume that the time effect follows an AR(1) process and analyze the performance of the two-way clustering method. When the absolute value of the autocorrelation parameter, ρ, is close to 1, the two-way clustering method generally fails and leads to over-rejections. Finally, we examine the performance of the revised two-way clustering method and the DK standard errors. The DK standard errors perform well when firm dummies are included; without removing the firm effect, they do not behave well. Moreover, firm dummies should be included if we are concerned about endogeneity.
The rest of this chapter is organized as follows. Section 1.2 describes the model and reviews several methods for estimating standard errors in panel data sets, including White, one-way cluster-robust, FM, original two-way cluster-robust, revised two-way cluster-robust and DK standard errors. Test statistics and their asymptotic distributions are also included in this section. Section 1.3 reports Monte Carlo simulation results and also develops theory for DK tests that explains some strange patterns in the simulations. Section 1.4 concludes. Appendix A contains the proof of a theorem that explains the strange pattern of the DK standard errors when firm effects are not removed in the large-N, large-T case. Appendix B contains all simulation result tables.

1.2 The Model and Standard Errors

We follow the definitions of firm effects, time effects and persistent common shocks in Thompson (2011). A firm effect means that the errors have arbitrary serial correlation for a given firm. A time effect means that the errors have arbitrary correlation across different firms in the same time period. A persistent common shock means that the errors have arbitrary correlation across different firms in different time periods. Consider a linear regression model given by

$$y_{it} = x_{it}\beta + \varepsilon_{it}, \qquad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T, \tag{1.1}$$

where $y_{it}$, $x_{it}$ and $\varepsilon_{it}$ are scalars. The error $\varepsilon_{it}$ and the regressor $x_{it}$ are assumed to have the same structure, given by

$$\varepsilon_{it} = \gamma_i + \delta_t + \eta_{it}, \tag{1.2}$$

$$x_{it} = \mu_i + \theta_t + \xi_{it}, \tag{1.3}$$

with

$$\delta_t = \rho\,\delta_{t-1} + e_t, \tag{1.4}$$

$$\theta_t = \rho\,\theta_{t-1} + u_t, \tag{1.5}$$

where $\delta_t$ and $\theta_t$ have the same autocorrelation parameter ρ. $\gamma_i$ and $\mu_i$ are firm effects, $\delta_t$ and $\theta_t$ are time effects, and $\eta_{it}$ and $\xi_{it}$ are idiosyncratic errors. All error components have zero mean and finite variance, and are independent of each other. It is assumed that $\gamma_i$, $\mu_i$, $e_t$, $u_t$, $\eta_{it}$ and $\xi_{it}$ all follow normal distributions. $\delta_t$ and $\theta_t$ follow AR(1) processes, so they are serially correlated when ρ ≠ 0 and reduce to i.i.d. normal time effects when ρ = 0.
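To make the data generating process concrete, here is a hedged sketch of one draw from (1.1) to (1.5) in Python, using only the standard library. The function name `simulate_panel`, the default standard deviations, the 0-based indexing and the choice to start each AR(1) time effect from its stationary distribution are assumptions made for illustration, not settings taken from the simulations in Section 1.3.

```python
import math
import random


def simulate_panel(N, T, beta=1.0, rho=0.0, sd_firm=1.0, sd_time=1.0,
                   sd_idio=1.0, seed=None):
    """Draw one panel (y, x) from (1.1)-(1.5).

    Returns nested lists y[i][t] and x[i][t] for firms i = 0..N-1 and
    periods t = 0..T-1. The standard deviations are illustrative choices
    (assumed, not from the text); |rho| < 1 is required so that each AR(1)
    time effect can start from its stationary distribution.
    """
    if abs(rho) >= 1:
        raise ValueError("need |rho| < 1")
    rng = random.Random(seed)

    def ar1_path():
        # First value from the stationary N(0, sd_time^2 / (1 - rho^2)), then recurse.
        path = [rng.gauss(0.0, sd_time / math.sqrt(1.0 - rho ** 2))]
        for _ in range(T - 1):
            path.append(rho * path[-1] + rng.gauss(0.0, sd_time))
        return path

    delta = ar1_path()  # time effect in the error, eq. (1.4)
    theta = ar1_path()  # time effect in the regressor, eq. (1.5)
    y, x = [], []
    for _ in range(N):
        gamma = rng.gauss(0.0, sd_firm)  # firm effect in the error
        mu = rng.gauss(0.0, sd_firm)     # firm effect in the regressor
        x_i = [mu + theta[t] + rng.gauss(0.0, sd_idio) for t in range(T)]       # (1.3)
        eps_i = [gamma + delta[t] + rng.gauss(0.0, sd_idio) for t in range(T)]  # (1.2)
        x.append(x_i)
        y.append([beta * x_i[t] + eps_i[t] for t in range(T)])                  # (1.1)
    return y, x


y, x = simulate_panel(N=3, T=4, rho=0.8, seed=1)
print(len(y), len(y[0]))  # 3 4
```

Because the same `theta` path is shared by every firm, the regressor inherits the cross-sectional and serial dependence that the error has, which is what makes ignoring either dimension costly.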


The parameter of interest is β, and the estimation method is the ordinary least squares (OLS) estimator

$$\hat\beta = \left(\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}^2\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}y_{it} = \beta + \left(\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}^2\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}\varepsilon_{it}. \tag{1.6}$$

Let $v_{it} = x_{it}\varepsilon_{it}$ and define $\hat v_{it} = x_{it}\hat\varepsilon_{it}$, where $\hat\varepsilon_{it}$ are the OLS residuals given by $\hat\varepsilon_{it} = y_{it} - x_{it}\hat\beta$. Let $\hat Q = \sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}^2$ and $\Omega = \sum_{i,j,t,s} E(v_{it}v_{js})$. By (1.6), the variance of $\hat\beta$ is approximately $\hat Q^{-1}\Omega\hat Q^{-1}$, so robust tests require an estimate of $\Omega$. We focus on the following approaches in this chapter: White standard errors, one-way cluster-robust standard errors, FM standard errors, original and revised two-way cluster-robust standard errors, and DK standard errors. Note that the FM approach also uses a different estimator of β. Details are discussed in subsection 1.2.2.
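All the variance estimators below take $\hat Q$ and the scores $\hat v_{it}$ as inputs. A minimal Python sketch of computing them from (1.6), assuming the data are stored as nested lists indexed `[i][t]`; the function name `ols_scores` is hypothetical.

```python
def ols_scores(y, x):
    """Pooled OLS of y_it on x_it without an intercept, as in (1.6).

    y, x: nested lists indexed [i][t] (firm i, period t).
    Returns (beta_hat, Q_hat, v_hat) with Q_hat = sum of x_it^2 and
    v_hat[i][t] = x_it * e_it, where e_it = y_it - x_it * beta_hat.
    """
    pairs = [list(zip(x_i, y_i)) for x_i, y_i in zip(x, y)]
    Q_hat = sum(xit * xit for row in pairs for xit, _ in row)
    beta_hat = sum(xit * yit for row in pairs for xit, yit in row) / Q_hat
    v_hat = [[xit * (yit - xit * beta_hat) for xit, yit in row] for row in pairs]
    return beta_hat, Q_hat, v_hat


beta_hat, Q_hat, v_hat = ols_scores(y=[[2, 4], [6, 8]], x=[[1, 2], [3, 4]])
print(beta_hat, Q_hat)  # 2.0 30
```

One property worth keeping in mind: the OLS normal equation forces $\sum_{i,t}\hat v_{it} = 0$, so any estimator that simply squares the grand total of the scores would be identically zero.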

1.2.1 White and One-Way Cluster-Robust Standard Errors

In order to write down a general notation that nests each one-way approach, we use group notation in this subsection. With observations grouped into G clusters of $N_g$ observations, for $g \in \{1, \ldots, G\}$, we can rewrite model (1.1) as

$$y_g = x_g\beta + \varepsilon_g,$$

where $y_g$, $x_g$ and $\varepsilon_g$ are $N_g \times 1$ vectors. The one-way cluster-robust variance estimator is

$$\hat V_C = \hat Q^{-1}\left(\sum_{g=1}^{G}\hat v_g'\,\iota_g\,\iota_g'\,\hat v_g\right)\hat Q^{-1}, \tag{1.7}$$

where $\hat v_g$ is an $N_g \times 1$ vector containing all $\hat v_{it}$ in cluster g and $\iota_g$ is an $N_g \times 1$ vector of ones, so that $\hat v_g'\iota_g\iota_g'\hat v_g$ is the squared sum of the $\hat v_{it}$ in cluster g. If each cluster contains only a single observation, this estimator gives the White (1980) standard errors

$$\hat V_{White} = \hat Q^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat v_{it}^2\right)\hat Q^{-1}. \tag{1.8}$$

If we cluster by firm, then $G = N$ and $N_g = T$. If we cluster by time, then $G = T$ and $N_g = N$. This estimator is consistent if

$$G^{-1}\sum_{g=1}^{G}\left[\hat v_g'\iota_g\iota_g'\hat v_g - E\left(v_g'\iota_g\iota_g' v_g\right)\right] \xrightarrow{p} 0 \quad \text{as } G \to \infty. \tag{1.9}$$

When either firm effects or time effects exist, White standard errors are not valid. If there are firm effects only, we can cluster by firm. If there are time effects only, we can cluster by time. One-way cluster-robust standard errors allow for correlation of any unknown form within clusters, but the errors are assumed to be uncorrelated across clusters. When both firm effects and time effects are present, the consistency condition (1.9) is violated, and thus the one-way clustering method fails.
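Since (1.7) nests the White and the firm- and time-clustered estimators through the choice of clusters alone, one function can cover all three. A minimal Python sketch for a scalar regressor, assuming $\hat v_{it}$ and $\hat Q$ are already computed (e.g., from OLS); the function names are illustrative, and no finite-sample degrees-of-freedom adjustment is applied, matching (1.7) as written.

```python
from collections import defaultdict


def cluster_robust_var(v_hat, Q_hat, cluster_of):
    """One-way cluster-robust variance (1.7) for a scalar regressor.

    v_hat: nested list v_hat[i][t]; cluster_of(i, t) returns the cluster label g.
    Computes Q^{-1} * sum_g (sum of v_hat_it in cluster g)^2 * Q^{-1}.
    """
    sums = defaultdict(float)
    for i, row in enumerate(v_hat):
        for t, v in enumerate(row):
            sums[cluster_of(i, t)] += v
    return sum(s * s for s in sums.values()) / Q_hat ** 2


def white_var(v_hat, Q_hat):         # (1.8): each observation is its own cluster
    return cluster_robust_var(v_hat, Q_hat, lambda i, t: (i, t))


def firm_cluster_var(v_hat, Q_hat):  # G = N clusters of N_g = T observations
    return cluster_robust_var(v_hat, Q_hat, lambda i, t: i)


def time_cluster_var(v_hat, Q_hat):  # G = T clusters of N_g = N observations
    return cluster_robust_var(v_hat, Q_hat, lambda i, t: t)


v = [[1.0, -1.0], [2.0, 0.0]]  # toy scores: 2 firms x 2 periods
print(white_var(v, 2.0), firm_cluster_var(v, 2.0), time_cluster_var(v, 2.0))  # 1.5 1.0 2.5
```

The standard error is the square root of the returned variance. The toy numbers show how the three choices can differ sharply on the same scores.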

1.2.2 FM Standard Errors

The Fama and MacBeth (1973) approach was originally used in asset pricing models such as the well-known capital asset pricing model (CAPM). Since stock returns exhibit weak serial correlation over daily and weekly holding periods, this approach is designed to correct for cross-sectional correlation. In the original version of this approach, researchers run T cross-sectional regressions (one for each time period). For each coefficient $\beta_j$, the FM estimator is the average of the T estimates

$$\hat\beta_j^{FM} = \frac{1}{T}\sum_{t=1}^{T}\hat\beta_{t,j}, \tag{1.10}$$

and the FM variance estimator is given by

$$s^2\left(\hat\beta_j^{FM}\right) = \frac{1}{T}\sum_{t=1}^{T}\frac{\left(\hat\beta_{t,j} - \hat\beta_j^{FM}\right)^2}{T-1}. \tag{1.11}$$

The variance formula assumes no correlation over time. Therefore, when there are only time effects, this approach produces a consistent variance estimator as T → ∞. However, in the presence of firm effects, this assumption does not hold, and hence the FM standard errors tend to be too small.
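For the single-regressor model (1.1), (1.10) and (1.11) reduce to a loop over periods. A hedged Python sketch, assuming each period's cross-sectional regression is run without an intercept (as in (1.1)) and the data are nested lists indexed `[i][t]`; `fama_macbeth` is a hypothetical name.

```python
import math


def fama_macbeth(y, x):
    """FM estimate (1.10) and standard error from (1.11) for model (1.1).

    y, x: nested lists indexed [i][t]. Runs one no-intercept cross-sectional
    OLS per period t, then averages the T slope estimates.
    """
    N, T = len(x), len(x[0])
    betas = []
    for t in range(T):
        sxy = sum(x[i][t] * y[i][t] for i in range(N))
        sxx = sum(x[i][t] ** 2 for i in range(N))
        betas.append(sxy / sxx)
    b_fm = sum(betas) / T
    # (1.11): (1/T) * sum_t (beta_t - b_fm)^2 / (T - 1)
    s2 = sum((b - b_fm) ** 2 for b in betas) / (T * (T - 1))
    return b_fm, math.sqrt(s2)


b_fm, se = fama_macbeth(y=[[1, 2, 3], [1, 2, 3]], x=[[1, 1, 1], [1, 1, 1]])
print(b_fm, round(se, 4))  # 2.0 0.5774
```

Because the period-by-period estimates are treated as independent draws, any persistence in them (as produced by a firm effect) is ignored, which is why the FM standard errors tend to be too small in that case.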


1.2.3 Original and Revised Two-Way Cluster-Robust Standard Errors

Thompson (2011) and Cameron et al. (2011) have extended one-way cluster-robust standard errors
to two-way cluster-robust standard errors that are robust to double clustering by ﬁrm and time. The
original version just generalizes the one-way clustering method, and assumes no correlation across
different ﬁrms in different time periods. Thompson (2011) noticed this limitation and proposed a
revised version which takes into account correlation across different ﬁrms in different time periods.
The revised formula is
ˆr
ˆ
ˆ
ˆ
Vdouble = V f irm + Vtime,0 − VW hite,0 +

L

L
ˆ
ˆ
ˆ
ˆ
∑ (Vtime,l + Vtime,l ) − ∑ (VW hite,l + VW hite,l ),
l=1
l=1
(1.12)

with
ˆ
ˆ
V f irm = Q−1
ˆ
ˆ
Vtime,l = Q−1
ˆ
ˆ
VW hite,l = Q−1

N

∑ sˆ2
i

i=1
T

∑

ˆ
Q−1 ,
ˆ
st st−l Q−1 ,
ˆ ˆ

t=l+1
N
T

∑ ∑

i=1 t=l+1

ˆ
vit vi,t−l Q−1 .
ˆ ˆ

Here $\hat{s}_i = \sum_{t=1}^{T}\hat{v}_{it}$ is the sum of all observations for firm $i$, and $\hat{s}_t = \sum_{i=1}^{N}\hat{v}_{it}$ is the sum of all observations for time $t$. This estimator is consistent as $\min(N, T) \to \infty$ (see Thompson, 2011). $\hat{V}_{firm}$ is the usual formula for standard errors clustered by firm, $\hat{V}_{time,0}$ is the usual formula for standard errors clustered by time, and $\hat{V}_{White,0}$ is the usual White formula. $\hat{V}_{firm}$ accounts for serial correlation within each firm, while $\hat{V}_{time,0}$ accounts for correlation across different firms in the same time period. The terms $\hat{V}_{time,l}$ with $l \ge 1$ account for correlation across different firms in different time periods. The terms $\hat{V}_{White,l}$ with $l \ge 0$ are subtracted because they would otherwise be counted twice.
The original two-way formula contains only the first three terms in (1.12):
$$
\hat{V}_{double} = \hat{V}_{firm} + \hat{V}_{time,0} - \hat{V}_{White,0}. \tag{1.13}
$$
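Formulas (1.12) and (1.13) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not Thompson's (2011) code. It assumes a balanced panel, takes $\hat{v}_{it} = x_{it}\hat{\varepsilon}_{it}$ (the OLS score), and uses an unnormalized $\hat{Q} = \sum_i\sum_t x_{it}x_{it}'$. The name `two_way_cluster_vcov` is hypothetical.

```python
import numpy as np

def two_way_cluster_vcov(X, resid, L=0):
    """Two-way cluster-robust covariance: revised formula (1.12); L=0 gives (1.13).

    X: regressors, shape (N, T, K); resid: OLS residuals, shape (N, T).
    Assumes a balanced panel, v_it = x_it * e_it and Q = sum_{i,t} x_it x_it'.
    """
    N, T, K = X.shape
    v = X * resid[:, :, None]                       # scores v_it, shape (N, T, K)
    Qinv = np.linalg.inv(np.einsum("itk,itl->kl", X, X))
    s_firm = v.sum(axis=1)                          # s_i = sum_t v_it, shape (N, K)
    s_time = v.sum(axis=0)                          # s_t = sum_i v_it, shape (T, K)

    def sandwich(meat):
        return Qinv @ meat @ Qinv

    def time_meat(lag):                             # sum_{t>lag} s_t s_{t-lag}'
        return s_time[lag:].T @ s_time[: T - lag]

    def white_meat(lag):                            # sum_i sum_{t>lag} v_it v_{i,t-lag}'
        return np.einsum("itk,itl->kl", v[:, lag:], v[:, : T - lag])

    # Original formula (1.13): V_firm + V_time,0 - V_White,0.
    V = sandwich(s_firm.T @ s_firm + time_meat(0) - white_meat(0))
    # Revised formula (1.12): add lagged time terms, subtract lagged White terms.
    for lag in range(1, L + 1):
        A, B = time_meat(lag), white_meat(lag)
        V += sandwich(A + A.T - B - B.T)
    return V
```

With `L=0` the function returns the original estimator (1.13). Counting terms shows that the revised estimator keeps every cross product $\hat{v}_{it}\hat{v}_{js}'$ with $i = j$ or $|t - s| \le L$. This can be checked against a brute-force loop over all pairs of observations.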

Suppose there are 3 firms and 3 time periods. Table 1.1 illustrates the sample covariance matrix of the residuals under the assumptions for the original formula. The original version allows for correlation of any unknown form within clusters, clustering either by firm or by time, but it assumes no correlation across different firms in different time periods. The revised version corrects for potential persistent common shocks in the data. In fact, the $\hat{V}_{time,0} + \sum_{l=1}^{L}\left(\hat{V}_{time,l} + \hat{V}_{time,l}'\right)$ part is exactly the DK standard errors using the truncated kernel with truncation lag $L$. We discuss the DK standard errors in detail in the next subsection.
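The pattern in Table 1.1 can be generated mechanically. The short sketch below builds the 9×9 indicator of which residual cross products the original two-way formula keeps. It also shows why the White term is subtracted: the same-firm and same-period indicators overlap exactly on the diagonal. The helper name `two_way_mask` is hypothetical.

```python
import numpy as np

def two_way_mask(n_firms, n_periods):
    """Indicators for same-firm pairs, same-period pairs, and pairs kept by (1.13).

    Observations are ordered firm by firm, (1,1), (1,2), ..., as in Table 1.1.
    """
    firm = np.repeat(np.arange(n_firms), n_periods)   # firm index of each observation
    time = np.tile(np.arange(n_periods), n_firms)     # period index of each observation
    same_firm = firm[:, None] == firm[None, :]
    same_time = time[:, None] == time[None, :]
    return same_firm, same_time, same_firm | same_time

same_firm, same_time, kept = two_way_mask(3, 3)
print(kept.astype(int))              # 1 where Table 1.1 has a nonzero entry

# Inclusion-exclusion: firm part + time part - overlap, where the overlap is
# the diagonal (the White term), reproduces the kept entries exactly.
overlap = same_firm & same_time
assert np.array_equal(overlap, np.eye(9, dtype=bool))
assert np.array_equal(same_firm.astype(int) + same_time.astype(int) - overlap.astype(int),
                      kept.astype(int))
```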
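Table 1.1: Residual cross-product matrix. When standard errors are clustered by both firm and time, correlations between residuals of the same firm in different years, and between residuals of different firms in the same year, may be nonzero. However, correlations between residuals of different firms in different years are assumed to be zero.

$$
\begin{array}{cc|ccc|ccc|ccc}
 & & \multicolumn{3}{c|}{\text{Firm 1}} & \multicolumn{3}{c|}{\text{Firm 2}} & \multicolumn{3}{c}{\text{Firm 3}} \\
 & & \varepsilon_{11} & \varepsilon_{12} & \varepsilon_{13} & \varepsilon_{21} & \varepsilon_{22} & \varepsilon_{23} & \varepsilon_{31} & \varepsilon_{32} & \varepsilon_{33} \\
\hline
\text{Firm 1} & \varepsilon_{11} & \varepsilon_{11}^2 & \varepsilon_{11}\varepsilon_{12} & \varepsilon_{11}\varepsilon_{13} & \varepsilon_{11}\varepsilon_{21} & 0 & 0 & \varepsilon_{11}\varepsilon_{31} & 0 & 0 \\
 & \varepsilon_{12} & \varepsilon_{12}\varepsilon_{11} & \varepsilon_{12}^2 & \varepsilon_{12}\varepsilon_{13} & 0 & \varepsilon_{12}\varepsilon_{22} & 0 & 0 & \varepsilon_{12}\varepsilon_{32} & 0 \\
 & \varepsilon_{13} & \varepsilon_{13}\varepsilon_{11} & \varepsilon_{13}\varepsilon_{12} & \varepsilon_{13}^2 & 0 & 0 & \varepsilon_{13}\varepsilon_{23} & 0 & 0 & \varepsilon_{13}\varepsilon_{33} \\
\hline
\text{Firm 2} & \varepsilon_{21} & \varepsilon_{21}\varepsilon_{11} & 0 & 0 & \varepsilon_{21}^2 & \varepsilon_{21}\varepsilon_{22} & \varepsilon_{21}\varepsilon_{23} & \varepsilon_{21}\varepsilon_{31} & 0 & 0 \\
 & \varepsilon_{22} & 0 & \varepsilon_{22}\varepsilon_{12} & 0 & \varepsilon_{22}\varepsilon_{21} & \varepsilon_{22}^2 & \varepsilon_{22}\varepsilon_{23} & 0 & \varepsilon_{22}\varepsilon_{32} & 0 \\
 & \varepsilon_{23} & 0 & 0 & \varepsilon_{23}\varepsilon_{13} & \varepsilon_{23}\varepsilon_{21} & \varepsilon_{23}\varepsilon_{22} & \varepsilon_{23}^2 & 0 & 0 & \varepsilon_{23}\varepsilon_{33} \\
\hline
\text{Firm 3} & \varepsilon_{31} & \varepsilon_{31}\varepsilon_{11} & 0 & 0 & \varepsilon_{31}\varepsilon_{21} & 0 & 0 & \varepsilon_{31}^2 & \varepsilon_{31}\varepsilon_{32} & \varepsilon_{31}\varepsilon_{33} \\
 & \varepsilon_{32} & 0 & \varepsilon_{32}\varepsilon_{12} & 0 & 0 & \varepsilon_{32}\varepsilon_{22} & 0 & \varepsilon_{32}\varepsilon_{31} & \varepsilon_{32}^2 & \varepsilon_{32}\varepsilon_{33} \\
 & \varepsilon_{33} & 0 & 0 & \varepsilon_{33}\varepsilon_{13} & 0 & 0 & \varepsilon_{33}\varepsilon_{23} & \varepsilon_{33}\varepsilon_{31} & \varepsilon_{33}\varepsilon_{32} & \varepsilon_{33}^2
\end{array}
$$

1.2.4

DK Standard Errors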

Driscoll and Kraay (1998) first proposed the heteroskedasticity, autocorrelation and cross-section correlation (HACC) robust variance estimator using the time series of cross-sectional sums of observations. The idea is to first aggregate all the individual observations at each time period and then apply the HAC estimator to the time series of the sums. The first step takes into account potential cross-sectional correlation in the data, and the second step takes into account potential serial correlation in the data. Therefore, the DK standard errors are robust to cross-sectional correlation of unknown form as well as heteroskedasticity and serial correlation, assuming covariance stationarity and weak dependence in the time dimension.
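Define $\hat{\bar{v}}_t = \sum_{i=1}^{N}\hat{v}_{it}$, and let $\hat{\Gamma}_j = T^{-1}\sum_{t=j+1}^{T}\hat{\bar{v}}_t\hat{\bar{v}}_{t-j}'$. The DK standard errors are given by
$$
\hat{V}_{DK} = T\hat{Q}^{-1}\hat{\Omega}\hat{Q}^{-1}, \quad \text{with} \quad \hat{\Omega} = \hat{\Gamma}_0 + \sum_{j=1}^{T-1} k\left(\frac{j}{M}\right)\left(\hat{\Gamma}_j + \hat{\Gamma}_j'\right), \tag{1.14}
$$
where $k(x)$ is a kernel function such that $k(x) = k(-x)$, $k(0) = 1$, $|k(x)| \le 1$, $k(x)$ is continuous at $x = 0$, and $\int_{-\infty}^{\infty} k^2(x)\,dx < \infty$. $M$ is the bandwidth parameter, or the truncation lag.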
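A compact NumPy sketch of (1.14) follows. It is illustrative only. It assumes a balanced panel, $\hat{v}_{it} = x_{it}\hat{\varepsilon}_{it}$, and an unnormalized $\hat{Q} = \sum_i\sum_t x_{it}x_{it}'$, under which $T\hat{Q}^{-1}\hat{\Omega}\hat{Q}^{-1}$ is on the scale of $\operatorname{Var}(\hat{\beta})$. The names `dk_vcov` and `bartlett` are hypothetical.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1 and 0 otherwise."""
    return max(1.0 - abs(x), 0.0)

def dk_vcov(X, resid, M, kernel=bartlett):
    """Driscoll-Kraay covariance matrix (1.14) for a balanced panel.

    X: regressors, shape (N, T, K); resid: residuals, shape (N, T); M: bandwidth.
    Assumes v_it = x_it * e_it and Q = sum_{i,t} x_it x_it' (not normalized).
    """
    N, T, K = X.shape
    vbar = (X * resid[:, :, None]).sum(axis=0)      # cross-sectional sums, shape (T, K)
    Qinv = np.linalg.inv(np.einsum("itk,itl->kl", X, X))
    omega = vbar.T @ vbar / T                       # Gamma_0
    for j in range(1, T):
        w = kernel(j / M)
        if w == 0.0:
            continue                                # zero-weight lags contribute nothing
        gamma_j = vbar[j:].T @ vbar[: T - j] / T    # Gamma_j = T^{-1} sum_{t>j} vbar_t vbar_{t-j}'
        omega = omega + w * (gamma_j + gamma_j.T)
    return T * Qinv @ omega @ Qinv
```

Passing a truncated kernel ($k(x) = 1$ for $|x| \le 1$, $0$ otherwise) with $M = L$ reproduces the time-clustering part $\hat{V}_{time,0} + \sum_{l=1}^{L}(\hat{V}_{time,l} + \hat{V}_{time,l}')$ of (1.12) under the same normalization. This matches the remark in subsection 1.2.3. For fixed-b inference, one would set `M = b * T`.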

1.2.5

Test Statistics and Asymptotic Distributions

Consider testing null hypotheses about $\beta$ of the form $H_0: \beta = \beta_0$. Define the t-statistic as
$$
t = \frac{\hat{\beta} - \beta_0}{\sqrt{\hat{V}}}.
$$
If we assume only heteroskedasticity, White standard errors are consistent as $N \to \infty$. If we allow for heteroskedasticity and general forms of serial correlation, firm-clustered standard errors are consistent as $N \to \infty$. If we assume independence over time and allow for cross-sectional correlation, FM and time-clustered standard errors are consistent as $T \to \infty$. Two-way clustered standard errors are consistent if there is serial correlation within a given firm and cross-sectional correlation within a given time period, but no correlation across different firms in different time periods; their consistency requires $N, T \to \infty$. Under these respective conditions, t-statistics based on these standard errors have a limiting standard normal distribution.
For the DK standard errors, the traditional asymptotic approach relies on $\hat{\Omega}$ being a consistent estimator of $\Omega$. Consistency of $\hat{\Omega}$ requires that $M \to \infty$ as $T \to \infty$, but at a slower rate, so that $M/T \to 0$. Under the traditional approach, the t-statistic has a limiting standard normal distribution. An alternative asymptotic theory has been proposed by Kiefer and Vogelsang (2005). They model the bandwidth as a fixed proportion of the sample size, that is, $M = bT$ with $b$ a fixed constant in $(0, 1]$. Because $b$ is held fixed, this alternative approach is usually labeled fixed-b asymptotics, while the traditional approach is labeled small-b asymptotics. Under the fixed-b approach, $\hat{\Omega}$ converges to a random variable that depends on the kernel function and bandwidth, rather than to a constant. As a result, the t-statistic has a nonstandard limiting distribution. This limiting distribution reflects the choice of kernel and bandwidth, but is otherwise pivotal. Fixed-b asymptotics provide more accurate and reliable inference than small-b asymptotics. For each kernel function, fixed-b critical values can be simulated. In particular, in linear panel models with individual fixed effects, Vogelsang (2012) has shown that
$$
t \Rightarrow \frac{W_1(1)}{\sqrt{P_1(b)}},
$$
where $\Rightarrow$ denotes weak convergence, $W_1(r)$ is a standard Wiener process, and $P_1(b)$ is a random matrix that depends on the kernel function and bandwidth. For example, in the case of the Bartlett kernel,
$$
P_1(b) = \frac{2}{b}\left(\int_0^1 B_1(r)^2\,dr - \int_0^{1-b} B_1(r)B_1(r+b)\,dr\right),
$$
where $B_1(r) = W_1(r) - rW_1(1)$.
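The remark that fixed-b critical values can be simulated can be made concrete. The sketch below approximates the Bartlett-kernel limit $W_1(1)/\sqrt{P_1(b)}$ by replacing the Wiener process with scaled Gaussian random walks on a grid. The function name, grid size and number of replications are illustrative choices, not the settings used in this dissertation.

```python
import numpy as np

def fixed_b_critical_value(b, alpha=0.05, n_grid=500, reps=5000, seed=0):
    """Simulated two-sided fixed-b critical value for the Bartlett kernel.

    Approximates W(1) / sqrt(P(b)), where
    P(b) = (2/b) * (int_0^1 B(r)^2 dr - int_0^{1-b} B(r) B(r+b) dr)
    and B(r) = W(r) - r W(1), using scaled Gaussian random walks on a grid.
    """
    rng = np.random.default_rng(seed)
    r = np.arange(1, n_grid + 1) / n_grid                   # grid r = 1/n, ..., 1
    # Scaled partial sums of i.i.d. N(0, 1) draws approximate Wiener paths W(r).
    W = np.cumsum(rng.standard_normal((reps, n_grid)), axis=1) / np.sqrt(n_grid)
    W1 = W[:, -1]                                           # W(1)
    B = W - r * W1[:, None]                                 # Brownian bridge B(r)
    m = int(round(b * n_grid))                              # lag b in grid steps
    int_sq = np.mean(B ** 2, axis=1)                        # ~ int_0^1 B(r)^2 dr
    int_cross = np.sum(B[:, : n_grid - m] * B[:, m:], axis=1) / n_grid  # ~ int_0^{1-b} B(r)B(r+b) dr
    P = (2.0 / b) * (int_sq - int_cross)
    t = W1 / np.sqrt(P)
    return float(np.quantile(np.abs(t), 1 - alpha))         # two-sided critical value
```

Because $W_1(1)$ is independent of the Brownian bridge $B_1(r)$, a single set of simulated paths serves for both the numerator and the denominator. Larger `n_grid` and `reps` give more precise critical values at higher cost.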

1.3

Finite Sample Performances

This section compares the finite sample performance of the covariance matrix estimators described in Section 1.2 under different error structures. First, errors with one-way clustering are considered. We follow Petersen (2009) and analyze the sensitivity of standard errors to the presence of firm effects or time effects. Next, we compare the performance of White, one-way cluster-robust, and original two-way cluster-robust standard errors in the context of double clustering and persistent common shocks. Finally, we examine the performance of revised two-way cluster-robust and DK standard errors in the context of persistent common shocks.


1.3.1

Data Generating Process

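The data generating process (DGP) is based on model (1.1). Suppose the structures of $\varepsilon_{it}$ and $x_{it}$ satisfy (1.2), (1.3), (1.4) and (1.5). The true slope coefficient $\beta$ is 1. When there are only firm effects, the correlation structures of $\varepsilon_{it}$ and $x_{it}$ take the following form:
$$
\operatorname{corr}(x_{it}, x_{js}) =
\begin{cases}
1, & \text{for } i = j \text{ and } t = s,\\
\rho_x = \dfrac{\sigma_\mu^2}{\sigma_x^2}, & \text{for } i = j \text{ and all } t \neq s,\\
0, & \text{for all } i \neq j,
\end{cases}
$$
$$
\operatorname{corr}(\varepsilon_{it}, \varepsilon_{js}) =
\begin{cases}
1, & \text{for } i = j \text{ and } t = s,\\
\rho_\varepsilon = \dfrac{\sigma_\gamma^2}{\sigma_\varepsilon^2}, & \text{for } i = j \text{ and all } t \neq s,\\
0, & \text{for all } i \neq j.
\end{cases}
$$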
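When there are only time effects, the correlation structures of $\varepsilon_{it}$ and $x_{it}$ take the following form:
$$
\operatorname{corr}(x_{it}, x_{js}) =
\begin{cases}
1, & \text{for } i = j \text{ and } t = s,\\
\rho_x = \dfrac{\sigma_\theta^2}{\sigma_x^2}, & \text{for } t = s \text{ and all } i \neq j,\\
0, & \text{for all } t \neq s,
\end{cases}
$$
$$
\operatorname{corr}(\varepsilon_{it}, \varepsilon_{js}) =
\begin{cases}
1, & \text{for } i = j \text{ and } t = s,\\
\rho_\varepsilon = \dfrac{\sigma_\delta^2}{\sigma_\varepsilon^2}, & \text{for } t = s \text{ and all } i \neq j,\\
0, & \text{for all } t \neq s.
\end{cases}
$$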
So the variances of $\gamma_i$ (or $\delta_t$), $\mu_i$ (or $\theta_t$), $\eta_{it}$ and $\xi_{it}$ can be written as $\rho_\varepsilon\sigma_\varepsilon^2$, $\rho_x\sigma_x^2$, $(1-\rho_\varepsilon)\sigma_\varepsilon^2$ and $(1-\rho_x)\sigma_x^2$, respectively. In order to examine the sensitivity of standard errors to the presence of either firm effects or time effects, we set $\sigma_x = 1$ and $\sigma_\varepsilon = 2$. We allow the fraction of the variance of $x_{it}$ and $\varepsilon_{it}$ caused by the firm effect (or the time effect), i.e. $\rho_x$ and $\rho_\varepsilon$ respectively, to vary from 0% to 75%. The simulation results are based on 5,000 random samples with 500 firms and 10 years per firm. The empirical null rejection probabilities of t-statistics built upon White, one-way cluster-robust and FM standard errors are reported at the 1% two-sided significance level.
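The first design can be sketched as a small Monte Carlo experiment. This sketch makes several assumptions beyond what is stated here: $x_{it} = \mu_i + \xi_{it}$ and $\varepsilon_{it} = \gamma_i + \eta_{it}$ with normal components, $y_{it} = \beta x_{it} + \varepsilon_{it}$, and OLS run without an intercept. The function name and the smaller default number of replications are also illustrative.

```python
import numpy as np

def firm_effect_rejections(rho_x, rho_e, N=500, T=10, reps=200,
                           sigma_x=1.0, sigma_e=2.0, crit=2.58, seed=0):
    """Monte Carlo null rejection rates of White and firm-clustered t-tests of beta = 1.

    Assumed DGP (firm effects only): x_it = mu_i + xi_it, e_it = gamma_i + eta_it,
    all components normal with variance shares rho_x and rho_e; y_it = x_it + e_it.
    The regression is OLS of y on x without an intercept (an illustrative simplification).
    """
    rng = np.random.default_rng(seed)
    rej_white = rej_cluster = 0
    for _ in range(reps):
        mu = rng.normal(0.0, np.sqrt(rho_x) * sigma_x, size=(N, 1))
        gamma = rng.normal(0.0, np.sqrt(rho_e) * sigma_e, size=(N, 1))
        x = mu + rng.normal(0.0, np.sqrt(1 - rho_x) * sigma_x, size=(N, T))
        e = gamma + rng.normal(0.0, np.sqrt(1 - rho_e) * sigma_e, size=(N, T))
        y = x + e                                   # true beta = 1
        sxx = np.sum(x * x)
        b = np.sum(x * y) / sxx
        v = x * (y - b * x)                         # scores x_it * residual_it
        se_white = np.sqrt(np.sum(v ** 2)) / sxx
        se_firm = np.sqrt(np.sum(v.sum(axis=1) ** 2)) / sxx   # clustered by firm
        rej_white += abs(b - 1.0) / se_white > crit
        rej_cluster += abs(b - 1.0) / se_firm > crit
    return rej_white / reps, rej_cluster / reps
```

The sketch is meant to reproduce the qualitative pattern discussed for Table B.1, not its exact figures.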

When there is double clustering and there are persistent common shocks, we focus on comparing the performance of each variance estimator. The DGP follows (1.2) and (1.3), with both firm effects and time effects. Firm effects ($\gamma_i$, $\mu_i$) and idiosyncratic errors ($\eta_{it}$, $\xi_{it}$) follow a standard normal distribution. For the special case of double clustering without persistent common shocks, the time effects ($\delta_t$, $\theta_t$) are assumed to follow a standard normal distribution ($\rho = 0$). For the special case of persistent common shocks, the time effects ($\delta_t$, $\theta_t$) are assumed to follow an AR(1) process ($\rho \neq 0$). The $(N, T)$ combinations vary across simulations, but all simulations are based on 2,000 random samples. In the double clustering case, we allow $N$ and $T$ to vary separately from 10 to 250. In the persistent common shock case, we set $N = T = 10, 50, 250$. The autocorrelation parameter $\rho$ takes values from $-0.95$ to $0.95$ in Tables B.6, B.7 and B.8, and $\rho = 0, 0.3, 0.6, 0.9$ in Tables B.9 and B.10. For the DK standard errors, we focus on the Bartlett kernel, $k(x) = 1 - |x|$ for $|x| \le 1$ and $k(x) = 0$ for $|x| > 1$. We set the bandwidth $b = 0.1, 0.2, \ldots, 0.9$. The truncation lag in the revised two-way clustering method is set to be the same as the bandwidth in DK. The empirical null rejection probabilities of t-statistics are reported at the 5% two-sided significance level.
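One draw from the second design can be generated as below. The sketch assumes $x_{it} = \mu_i + \theta_t + \xi_{it}$ and $\varepsilon_{it} = \gamma_i + \delta_t + \eta_{it}$, with independent AR(1) time effects driven by standard normal innovations and started from their stationary distribution. These details, and the function name `simulate_two_way_panel`, are illustrative assumptions rather than the exact specification in (1.2) to (1.5).

```python
import numpy as np

def simulate_two_way_panel(N, T, rho, beta=1.0, seed=None):
    """One draw of a double-clustering panel with AR(1) time effects.

    Assumed DGP: x_it = mu_i + theta_t + xi_it, e_it = gamma_i + delta_t + eta_it,
    y_it = beta * x_it + e_it, with mu, gamma, xi, eta ~ N(0, 1) and independent
    AR(1) time effects z_t = rho * z_{t-1} + u_t, u_t ~ N(0, 1).
    rho = 0 gives i.i.d. N(0, 1) time effects (double clustering, no persistence).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie in (-1, 1) for a stationary AR(1)")
    rng = np.random.default_rng(seed)

    def ar1():
        z = np.empty(T)
        z[0] = rng.standard_normal() / np.sqrt(1.0 - rho ** 2)   # stationary start
        for t in range(1, T):
            z[t] = rho * z[t - 1] + rng.standard_normal()
        return z

    mu, gamma = rng.standard_normal((2, N, 1))      # firm effects, shape (N, 1)
    theta, delta = ar1(), ar1()                     # time effects, shape (T,)
    x = mu + theta + rng.standard_normal((N, T))
    e = gamma + delta + rng.standard_normal((N, T))
    return x, beta * x + e
```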

1.3.2

Results

Tables B.1 to B.4 illustrate how sensitive standard errors are to the presence of either firm effects or time effects. The DGP of Tables B.1 and B.2 contains firm effects only, and the DGP of Tables B.3 and B.4 contains time effects only, with $\rho = 0$. Tables B.1 and B.3 report empirical null rejection probabilities of t-statistics based on White standard errors and one-way cluster-robust standard errors. Tables B.2 and B.4 report empirical null rejection probabilities of t-statistics based on FM standard errors. $\rho_x$ varies across columns while $\rho_\varepsilon$ varies across rows. In Tables B.1 and B.3, each cell contains the average OLS estimate of $\beta$ and the standard deviation of $\hat{\beta}$. The third and fifth entries are the average White standard errors and clustered standard errors, respectively. The empirical null rejection probabilities of White and clustered t-statistics at the 1% two-sided significance level are shown in square brackets below the standard error estimates. In Tables B.2 and B.4, each cell contains the average FM coefficient estimate and the standard deviation of $\hat{\beta}$. The third entry is the average FM standard error. The empirical null rejection probabilities of FM t-statistics at the 1% two-sided significance level are shown in square brackets below.
For example, consider the case where 50% of the variability in both the error and the regressor is due to the firm effect or the time effect, i.e. $\rho_x = \rho_\varepsilon = 0.50$. In Table B.1, the average OLS coefficient estimate is 1.0008 and the standard deviation of the OLS coefficient estimate is 0.0510. The White standard error estimate is 0.0283 and the clustered standard error is 0.0508. 15.98% of the White t-statistics are greater than 2.58 in absolute value, while 1.02% of the clustered t-statistics are greater than 2.58 in absolute value. In Table B.2, the average FM coefficient estimate is 1.0008 and the standard deviation of the FM coefficient estimate is 0.0511. The FM standard error estimate is 0.0239, and 24.98% of the FM t-statistics are greater than 2.58 in absolute value.
In Table B.3, the average OLS coefficient estimate is 0.9966 and the standard deviation of the OLS coefficient estimate is 0.3073. The White standard error estimate is 0.0277 and the clustered standard error estimate is 0.2445. 81.28% of the White t-statistics are greater than 2.58 in absolute value, while 7.40% of the clustered t-statistics are greater than 2.58 in absolute value. In Table B.4, the average FM coefficient estimate is 0.9999 and the standard deviation of the FM coefficient estimate is 0.0282. The FM standard error estimate is 0.0276, and 2.68% of the FM t-statistics are greater than 2.58 in absolute value.
If either the error or the regressor contains no firm (time) effect, White standard errors work well. As Tables B.1 and B.3 show, the rejection probabilities in the first row and the first column are around 1%. However, as long as both the regressor and the error contain firm (time) effects, White standard errors underestimate the variance and lead to over-rejections. As $\rho_x$ and $\rho_\varepsilon$ increase, White standard errors remain essentially the same across both columns and rows, but the true standard errors increase. In contrast, standard errors clustered by firm are very close to the true standard errors. In Table B.1, the rejection probabilities for clustered t-statistics are around 1% regardless of $\rho_x$ and $\rho_\varepsilon$. In this setting, one-way cluster-robust standard errors correctly account for the correlation in the data and produce accurate inference. In Table B.3, standard errors clustered by time are much more accurate than White standard errors, but they still underestimate the true standard errors. Moving down the diagonal of Table B.3 from upper left to bottom right, the rejection probabilities for clustered t-statistics at the 1% two-sided significance level go from 4.04% to 9.16%. One possible explanation is that the DGP has large $N$ and small $T$ ($N = 500$ and $T = 10$). Clustering by time yields only ten clusters, which is not enough for the standard normal approximation to be valid.
The FM approach is designed to account for correlation across different firms in the same time period, so when there are only firm effects, FM standard errors fail to account for serial correlation. From Table B.2, we can see that FM standard errors are biased downward. Moving down the diagonal of Table B.2 from upper left to bottom right, the true standard errors rise while the FM standard errors shrink. In the presence of time effects only, the FM approach works well. FM standard errors are very close to the true standard errors, and the rejection probabilities for FM t-statistics at the 1% two-sided significance level are approximately 3% for all cells in Table B.4.
When there are both firm effects and time effects, one-way cluster-robust standard errors are likely to be biased. According to Petersen (2009), a common approach to address double clustering is to include a full set of time dummies and then cluster by firm. If the time effect is constant across firms in the same time period, then time dummies completely eliminate the time effect, and what is left in the error term is just the firm effect. However, this approach only works when the correlation is correctly specified. If the time effect is not constant across firms, time dummies will not completely remove the time effect, and thus standard errors clustered by firm would be biased. Another limitation of including dummies, one that empirical researchers care about, is that it restricts the types of regressors that can be included. One solution suggested by Petersen (2009) is to cluster by firm and time simultaneously, using the two-way cluster-robust standard errors proposed by Thompson (2011) and Cameron et al. (2011). Table B.5 compares the performance of White, one-way cluster-robust and original two-way cluster-robust standard errors.
In Table B.5, the DGP contains firm effects and time effects, but no persistent common shocks ($\rho = 0$). $N$ and $T$ vary separately from 10 to 250. Column 1 reports the average OLS coefficient estimates, and columns 2-5 report the empirical null rejection probabilities for t-statistics based on White, firm-clustered, time-clustered and original two-way clustered standard errors, respectively, at the 5% two-sided significance level. Rejection probabilities of White and one-way clustered t-statistics are substantially larger than 5%. When $N$ and $T$ are close and both are large, the original two-way cluster-robust standard errors work well. Table B.5 shows that when $N = T = 50$, the rejection probability is 7.55%. When $N = 50$ and $T = 100$, it is 6.70%. When $N = T = 100$, it is 6.65%. When $N = 100$ and $T = 250$, it is 4.85%. When $N = T = 250$, it is 6.10%. When $N = 250$ and $T = 100$, it is 5.60%. The larger the sample size, the greater the improvement.
The limitation of the original two-way clustering method is that, although it accounts for cross-sectional correlation within the same time period, it does not allow for correlation across different firms in different time periods. If persistent common shocks such as business cycles exist, failure to account for them leads to over-rejections. A robust approach should therefore account for cross-sectional correlation of general form.
Tables B.6 to B.8 compare the performance of White, one-way cluster-robust and original two-way cluster-robust standard errors when the time effect follows an AR(1) process. We set $N = T = 10, 50, 250$, respectively. Column 1 reports the average OLS coefficient estimates, and columns 2-5 report the empirical null rejection probabilities for t-statistics based on White, firm-clustered, time-clustered and original two-way clustered standard errors, respectively, at the 5% two-sided significance level.
Again, rejection probabilities of White and one-way clustered t-statistics are substantially larger than 5%. When $N$ and $T$ are small, the original two-way clustered standard errors do not work, whatever value $\rho$ takes. Even when $\rho = 0$, this method produces a rejection probability of 12.85%. This confirms that the two-way clustering approach needs both $N$ and $T$ to be sufficiently large. When $N = T = 50$, the results differ depending on whether $\rho$ is close to zero or close to one. When $\rho$ is close to zero, correlation across different firms in different time periods is weak, and the original two-way cluster-robust standard errors are still reasonable. For example, when $\rho = 0.1$, the rejection probability is 6.60%. However, when correlation across different firms in different time periods is strong, the original two-way clustering method over-rejects. When $\rho = 0.7$, the rejection probability is 22.15%. When $\rho = 0.9$, it rises to 45.60%. An increase in sample size improves inference if $\rho$ is small ($|\rho| \le 0.7$ in the tables). As shown in Table B.8, when $\rho = 0.1$, the rejection probability is 5.05%, compared with 6.60% in Table B.7. For large $\rho$, however, increasing $N$ and $T$ makes the two-way approach even worse, and over-rejection becomes more severe as $\rho$ approaches 1. When $\rho = 0.9$, the rejection probability is 52.65% in Table B.8, compared with 45.60% in Table B.7.
Tables B.9 and B.10 compare the performance of one-way cluster-robust, original and revised two-way cluster-robust, and DK standard errors when the time effect follows an AR(1) process. The usual fixed-b critical values are used for t-statistics based on the DK standard errors. Table B.9 uses the standard OLS estimator, while Table B.10 uses the fixed-effects OLS estimator. We set $N = T = 50, 250$. Several findings are worth noting. In both tables, one-way cluster-robust standard errors over-reject substantially. The original double clustering method performs reasonably when $T$ is large and $\rho$ is small. When $N = T = 250$ and $\rho = 0.3$, the rejection probability is 6%. The revised double clustering method performs better than the original one only when $\rho$ is large and the truncation lag is not large. Even then, it still over-rejects. When $N = T = 50$, $\rho = 0.9$, and the truncation lag is $L = 5$, the rejection probability of the original version is 52.5%, while that of the revised version is 29%. When $N = T = 250$, $\rho = 0.9$, and $L = 5$, the rejection probability of the original version is 50.9%, while that of the revised version is 17.1%. Also, rejection probabilities of the revised method increase as the truncation lag gets larger. Without firm dummies, the DK standard errors exhibit a strange pattern: their rejection probabilities fall as $\rho$ increases.
In Table B.10, rejection probabilities of firm-clustered standard errors are substantially larger than 5%. Rejection probabilities of time-clustered standard errors and original two-way cluster-robust standard errors are very close, since the firm effects are removed by firm dummies. Similar patterns are found for the revised double clustering method. The patterns of the DK standard errors are consistent with those in Vogelsang (2012), and the DK standard errors behave very well. When $N = T = 250$ and $\rho = 0$ or $0.3$, the rejection probabilities are approximately 5% for all values of the bandwidth $b$. The DK standard errors still behave well even when $\rho = 0.9$. When $N = T = 250$, $\rho = 0.9$, and $b = 0.9$, the rejection probability is 8.8%.
The strange pattern of the DK standard errors in Table B.9 is caused by the presence of firm effects; theoretical evidence is provided in the next subsection. The patterns of the revised double clustering method can be explained in two ways. First, as mentioned in subsection 1.2.3, the part that accounts for potential persistent common shocks in the data is exactly the DK standard errors with the truncated kernel. The downweighting causes a downward bias in the variance estimator, and thus over-rejections. This explains why rejection probabilities of the revised version can be bigger than those of the original version. Second, the revised two-way approach relies on the variance estimator being consistent, that is, on the traditional asymptotic approach, and using the traditional approach leads to unreliable inference.

1.3.3

Strange Patterns of the DK Standard Errors

This section presents theoretical evidence to explain the strange patterns of the DK standard errors
in the large-N, large-T case. All limits are taken as N, T → ∞. Proofs are provided in Appendix A.
Consider model (1.1) with $x_{it}$ and $\varepsilon_{it}$ satisfying (1.2), (1.3), (1.4) and (1.5). Consider testing null hypotheses about $\beta$ of the form
$$
H_0: \beta = \beta_0.
$$
Define the t-statistic as
$$
t_{DK} = \frac{\hat{\beta} - \beta_0}{\sqrt{\hat{V}_{DK}}}.
$$
The following theorem summarizes the theoretical results for the large-N, large-T case when firm dummies are not included in the model.
Theorem 1.1. Suppose model (1.1) has one regressor $x_{it}$, and the structures of $\varepsilon_{it}$ and $x_{it}$ satisfy (1.2), (1.3), (1.4) and (1.5). Suppose firm dummies are not included in the model. Assume $M = bT$, where $b \in (0, 1]$ is fixed. Assume $N = \phi T$, so that $N \to \infty$ as $T \to \infty$. The Bartlett kernel is considered. As $T \to \infty$:
1. If the regressor and errors in model (1.1) contain both firm effects and time effects, then
$$
\sqrt{N}\left(\hat{\beta} - \beta\right) \Rightarrow Q^{-1}\sqrt{1 + \phi\sigma^2}\, Z_1, \tag{1.15}
$$
$$
t_{DK} \Rightarrow \sqrt{1 + \frac{1}{\phi\sigma^2}} \cdot \frac{Z_1}{\sqrt{P(b)}}, \tag{1.16}
$$
where $Z_1 \sim N(0, 1)$ and $P(b)$ is a random variable depending on the bandwidth. $Z_1$ is independent of $P(b)$, and $\sigma^2$ is the long run variance of $\theta_t\delta_t$.
2. If the regressor and errors in model (1.1) contain only firm effects, then
$$
\sqrt{N}\left(\hat{\beta} - \beta\right) \Rightarrow Q^{-1} Z_2, \tag{1.17}
$$
$$
t_{DK} \to \infty, \tag{1.18}
$$
where $Z_2 \sim N(0, 1)$.
3. If the regressor and errors in model (1.1) contain only time effects, then the usual fixed-b limits (see Vogelsang, 2012) are obtained.
Note that when the model satisfies (1.2), (1.3), (1.4) and (1.5), it is easy to show that $\theta_t\delta_t$ satisfies a functional central limit theorem (FCLT). However, it is not necessary to assume that the time effects $\theta_t$ and $\delta_t$ are independent and that both follow an AR(1) process. The assumption can be relaxed to allow for a more general setting: we only need to assume that $\theta_t\delta_t$ satisfies an FCLT, that is,
$$
T^{-1/2}\sum_{t=1}^{[rT]}\theta_t\delta_t \Rightarrow \sigma W(r),
$$
where $W(r)$ is a standard Wiener process and $\sigma^2$ is the long run variance of $\theta_t\delta_t$.
Theorem 1.1 shows that, in the presence of firm effects and time effects, if firm dummies are not included, the fixed-b limit of $t_{DK}$ is not asymptotically pivotal as usual. It depends on the ratio $\phi = N/T$ and on $\sigma^2$, the long run variance of $\theta_t\delta_t$. The reason is that the firm effect destroys the weak dependence needed for the results of Vogelsang (2012) to hold. Result (1.16) indicates that the usual fixed-b critical values have to be scaled by a nuisance parameter which is generally unknown in practice. As a consequence, in practice one would have to either i) estimate the scaling factor or ii) include firm dummies to recover the asymptotically pivotal limit. Another important reason to recommend the inclusion of firm dummies is the problem of endogeneity. Empirical researchers who are concerned about regressors that do not vary over time often want to leave out firm dummies. However, they must be very careful, because solving the endogeneity problem should be a priority. Including firm dummies removes the individual heterogeneity that is correlated with the regressors. Furthermore, if the individual heterogeneity is the source that generates cross-sectional correlation, the inclusion of firm dummies would completely eliminate the cross-sectional correlation, and one-way clustered standard errors would then work.
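Result (1.16) suggests a simple adjustment. Assuming $Z_1/\sqrt{P(b)}$ has the usual fixed-b distribution, as the description of the scaling suggests, the adjusted critical value is the usual one multiplied by $\sqrt{1 + 1/(\phi\sigma^2)}$. As a worked example, suppose $\theta_t$ and $\delta_t$ are independent i.i.d. $N(0, 1)$ (the case $\rho = 0$). Then $\theta_t\delta_t$ is i.i.d. with variance 1, so $\sigma^2 = 1$, and with $N = T$ ($\phi = 1$) the factor is $\sqrt{2} \approx 1.414$. The helper below, whose name is hypothetical, only encodes this scaling; it treats $\phi$ and $\sigma^2$ as known.

```python
import math

def adjusted_fixed_b_cv(usual_cv, phi, sigma2):
    """Scale a usual fixed-b critical value by sqrt(1 + 1/(phi * sigma2)), following (1.16).

    phi = N / T; sigma2 = long run variance of theta_t * delta_t.
    Both are treated as known here; in practice sigma2 would have to be estimated.
    """
    if phi <= 0 or sigma2 <= 0:
        raise ValueError("phi and sigma2 must be positive")
    return math.sqrt(1.0 + 1.0 / (phi * sigma2)) * usual_cv
```

Here `usual_cv` would be a simulated fixed-b critical value for the chosen kernel and $b$. As $\phi\sigma^2$ grows, the factor shrinks toward 1 and the usual critical value is recovered.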
Table B.11 demonstrates the performance of the DK standard errors in the presence of firm effects and AR(1) time effects, using the adjusted fixed-b critical values derived from Theorem 1.1. The patterns are similar to those in Vogelsang (2012). For a given $(N, T, \rho)$ combination, rejection probabilities are above 5% with small $b$, and they steadily decline as $b$ increases. For a given value of $\rho$, as $T$ increases, rejection probabilities approach 5% for all bandwidths. When $T = 250$ and $b = 1$, rejection probabilities are around 7% or 8% when there is strong serial correlation ($\rho = 0.9$). Rejection probabilities rise as $\rho$ increases.
When there are only firm effects and no time effects, the DK standard error estimate tends to decline toward zero, and thus the t-statistic diverges. Table B.12 illustrates the performance of the DK standard errors in this case, using the usual fixed-b critical values. For a given $N$, as $T$ increases, rejection probabilities for the DK standard errors rise toward 1 for all bandwidths. In contrast, rejection probabilities for firm-clustered standard errors are close to 5% when $N$ is large, which is expected because the one-way approach is designed to account for any form of serial correlation assuming independence in the cross section. Also, when both $N$ and $T$ are large, the two-way approach gives results similar to those of the one-way approach.


1.4

Conclusion

This chapter compares the finite sample performance of White, FM, one-way cluster-robust, two-way cluster-robust and DK standard errors using Monte Carlo simulations. If there is only one-way clustering, one-way clustered standard errors can work very well. However, in the presence of two-way clustering, one-way clustered standard errors are not sufficient to take into account all potential correlations in the data. Petersen (2009) suggests that applied researchers use the original two-way cluster-robust standard errors. When there are no persistent common shocks, this two-way clustering method is valid, and it allows for any unknown form of correlation within clusters. The limitation of this method is that it does not take into account correlation across different firms in different time periods. If the time effect is a simple AR(1) process, which generates correlation across different firms in different time periods, the original two-way clustering approach over-rejects when there is strong serial correlation ($\rho$ is large). A remedy is therefore needed. Thompson (2011) has improved the original formula for the two-way cluster-robust standard errors to account for correlation across different firms in different time periods.
An alternative solution is to use the DK standard errors, which account for heteroskedasticity, autocorrelation and cross-sectional correlation of general and unknown form. The DK standard errors are valid only when firm effects are removed. The presence of firm effects distorts the results and leads to strange outcomes for the DK standard errors. Theoretical results indicate that the usual fixed-b critical values have to be scaled by a nuisance parameter which is generally unknown in practice. Therefore, empirical researchers have to choose between estimating the scaling factor and including firm dummies. Another reason to include firm dummies is that they eliminate the individual heterogeneity that is potentially correlated with the regressors. After firm effects are removed, the DK standard errors perform remarkably better than the other standard errors.
In sum, using the original two-way cluster-robust standard errors as a robustness check only works in a special case of double clustering. When persistent common shocks are a concern, the DK standard errors should be considered as a robustness check. However, the DK standard errors are valid only under the assumptions of covariance stationarity and weak dependence in the time dimension. Also, firm dummies should be included to remove firm effects. Otherwise, one has to estimate the nuisance parameter to adjust the fixed-b critical values.


CHAPTER 2
FIXED-b INFERENCE FOR DIFFERENCE-IN-DIFFERENCES ESTIMATION

2.1

Introduction

This chapter focuses on fixed-b asymptotic distributions of the Wald and t statistics for Difference-in-Differences (DD) estimation in linear panel settings. Recently, DD estimation has become increasingly popular in policy analysis. DD estimation involves identifying a specific intervention or treatment (often a policy change or the passage of a law). Applied researchers then compare the difference in outcomes before and after the intervention for groups affected by the intervention (treatment groups) to the same difference for unaffected groups (control groups). Such panel data sets often contain serial correlation and/or spatial correlation in the cross section. Even though the correlation structure is not of interest, failure to account for potential serial and spatial correlation may lead to severe distortions in inference about the parameters of interest. After Bertrand et al. (2004) pointed out that standard errors robust to serial correlation should be considered in DD estimation, using clustered standard errors (see Arellano, 1987) has become a standard method to deal with serial correlation in the DD context. Hansen (2007) extended the results from the traditional short-panel (large-N, fixed-T) case to the large-N, large-T and fixed-N, large-T cases. The clustered standard errors are valid under the assumption that individuals are uncorrelated with each other. In other words, spatial correlation in the cross section is often ignored. Wooldridge (2003) provided a useful discussion of cluster methods. Sometimes the cross-sectional observations can be divided into groups or clusters, where it is assumed that individuals within a cluster are correlated while individuals across clusters are uncorrelated. In this case, standard errors robust to cross-section clustering can be constructed, although the number of clusters could be small.
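In the simplest two-group, two-period case, the comparison described above reduces to a difference of differences of cell means. The minimal sketch below (the function name is hypothetical) omits the fixed effects, individual-specific trends and robust standard errors that this chapter is concerned with.

```python
import numpy as np

def did_2x2(y, treated, post):
    """Difference-in-differences estimate from the four group-by-period means."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)

    def cell_mean(g, p):
        return y[(treated == g) & (post == p)].mean()

    change_treated = cell_mean(True, True) - cell_mean(True, False)   # before/after, treatment group
    change_control = cell_mean(False, True) - cell_mean(False, False) # before/after, control group
    return change_treated - change_control
```

In a regression of $y$ on a treatment indicator, a post-period indicator and their interaction, the OLS coefficient on the interaction equals this same quantity. The inference question is how to attach a valid standard error to it when errors are serially and spatially correlated.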
In time series econometrics, the nonparametric HAC robust covariance matrix estimator (see Newey and West, 1987) is widely used. To handle spatial correlation, robust standard errors can be obtained using the approaches of Conley (1999), Kelejian and Prucha (2007), Bester et al. (2008), Bester et al. (2011) or Kim and Sun (2011a) when a distance measure is available. Kim and Sun (2011b) provide results on kernel HAC standard errors in linear panel models with individual and time dummy variables using a distance measure. When a distance measure is either unavailable or unknown for the cross section of the panel, the DK approach can be used to obtain robust standard errors. Driscoll and Kraay (1998) established consistency of these standard errors under mixing conditions. However, the mixing conditions do not hold for the fixed-effects estimator. Fortunately, Gonçalves (2011) has established consistency of the DK standard errors for the fixed-effects estimator in the presence of general forms of cross-sectional correlation. A recent paper by Vogelsang (2012) develops a fixed-b asymptotic theory for test statistics based on the fixed-effects estimator and the DK standard errors, following Kiefer and Vogelsang (2005).
This chapter provides an analysis of the DK standard errors in linear DD models with fixed effects and individual-specific time trends. The analysis is carried out within the fixed-b asymptotic framework proposed by Kiefer and Vogelsang (2005) for tests based on HAC estimators. Fixed-b asymptotics are appealing because, unlike traditional asymptotics, they reflect the influence of the choice of kernel and bandwidth on the behavior of the standard errors. The fixed-b approach requires a large-T framework. According to the survey of DD papers in Bertrand et al. (2004), of the 92 DD papers they found, 10% have at least 36 time periods and 5% have at least 51 time periods. Therefore, it is feasible to use the DK standard errors in DD estimation to cope with general forms of spatial correlation in the cross section, given covariance stationarity and weak dependence in the time dimension. This chapter only considers the fixed-N, large-T case, although simulation results suggest that the asymptotic theory extends to the large-N, large-T case.
The main objective of this chapter is to derive fixed-b asymptotic distributions of test statistics constructed using the DD estimator and the DK standard errors. The fixed-b limits turn out to differ from those derived by Kiefer and Vogelsang (2005) and Vogelsang (2012): the newly derived distributions depend on the date of the policy change, λ, and on the individual-specific trend functions, in addition to the choice of kernel and bandwidth. For the individual fixed-effects model with no trend, the fixed-b asymptotic distributions are the same as those found in a pure time series model with a shift in mean. New critical values are simulated in this study, and they have a U-shape with respect to λ. Whether time period dummies are included does not affect the fixed-b asymptotic distributions. For other regressors that do not have a structural break, the fixed-b asymptotic distributions for DK test statistics found in Vogelsang (2012) still apply. The traditional short panel case is not covered: with T fixed, there is not sufficient information in the time dimension for the DK approach to work.
The remainder of the chapter is organized as follows. The next section describes the DD models
and test statistics. Section 2.3 presents the ﬁxed-b asymptotic results for test statistics constructed
using the DD estimator and the DK standard errors, and new critical values for t statistics in two
special cases. Finite sample properties are examined in Section 2.4. Section 2.5 concludes. Proofs
are given in Appendix C, and tables are given in Appendix D.
Throughout the chapter, x_it and β denote the full set of regressors and parameters, respectively, in each model. When used with a vector or matrix, a prime (') denotes the transpose.

2.2 Model Setup and Test Statistics

Consider a DD model with fixed effects and individual-specific deterministic trends given by

$$y_{it} = f(t)'a_i + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i \cdot DU_t + u_{it}, \qquad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T, \qquad (2.1)$$

where $y_{it}$ and $u_{it}$ are scalars, $f(t)$ denotes a J × 1 vector of trend functions, and $a_i$ denotes a J × 1 vector of individual-specific unobservable variables.¹ $Treat_i$ is an indicator for individuals in the treatment group: it equals one if individual i is in the treatment group. Without loss of generality, we assume that the first kN individuals are in the treatment group, so $Treat_i = 1(i \le kN)$. $DU_t$ is an indicator for post-policy-change time periods: it equals one after the policy change. That is, $DU_t = 1(t > \lambda T) = 1(r > \lambda)$, where $r = t/T$ and the parameter λ is the relative date of the policy change within the time sample. Both k and λ are assumed known.

¹ $a_i$ could be either random or deterministic. The asymptotic results do not differ between the two cases because $a_i$ is removed by the de-trending transformation.

Time fixed effects are often included, which gives the model
$$y_{it} = \lambda_t + f(t)'a_i + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i \cdot DU_t + u_{it}. \qquad (2.2)$$

An alternative model includes common time trends instead of time ﬁxed effects. The asymptotic
results for the alternative model remain unchanged. A more general model with additional regressors is
$$y_{it} = f(t)'a_i + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i \cdot DU_t + z_{it}'\gamma + u_{it}, \qquad (2.3)$$

where $z_{it}$ is a K × 1 vector of additional regressors. Including time fixed effects gives the model

$$y_{it} = \lambda_t + f(t)'a_i + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i \cdot DU_t + z_{it}'\gamma + u_{it}. \qquad (2.4)$$

The focus is on estimation of and inference about β3, which measures the impact of the policy change on y. The ordinary least squares (OLS) estimator of β3, $\hat\beta_3$, is usually referred to as the DD estimator. Since we are primarily interested in the DD estimator, we can apply a de-trending transformation, analogous to the fixed-effects transformation, to remove the unobservable variables $\lambda_t$ and $a_i$. We therefore call the de-trended OLS estimator the "fixed-effects OLS estimator" in the remainder of the chapter.
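Consider the fixed-effects OLS estimator of β given by

$$\hat\beta = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}'\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde y_{it}, \qquad (2.5)$$

where in model (2.1)

$$\beta = \begin{pmatrix}\beta_2\\ \beta_3\end{pmatrix}, \qquad \tilde x_{it} = x_{it} - \hat x_{it} = \begin{pmatrix}\widetilde{DU}_t\\ Treat_i\cdot\widetilde{DU}_t\end{pmatrix}, \qquad \tilde y_{it} = y_{it} - \hat y_{it}, \qquad \widetilde{DU}_t = DU_t - \widehat{DU}_t,$$

with

$$\hat y_{it} = \left(\sum_{s=1}^{T} y_{is}f(s)'\right)\left(\sum_{s=1}^{T} f(s)f(s)'\right)^{-1} f(t), \qquad \widehat{DU}_t = \left(\sum_{s=1}^{T} DU_s f(s)'\right)\left(\sum_{s=1}^{T} f(s)f(s)'\right)^{-1} f(t).$$

Note that $Treat_i$ drops out after the transformation as long as f(t) contains an intercept. In model (2.2) we have $\beta = \beta_3$,

$$\tilde y_{it} = y_{it} - \hat y_{it} - \frac{1}{N}\sum_{j=1}^{N}(y_{jt} - \hat y_{jt}), \qquad \tilde x_{it} = x_{it} - \hat x_{it} - \frac{1}{N}\sum_{j=1}^{N}(x_{jt} - \hat x_{jt}) = \widetilde{Treat}_i\cdot\widetilde{DU}_t,$$

with

$$\widetilde{Treat}_i = Treat_i - \frac{1}{N}\sum_{j=1}^{N} Treat_j = 1(i \le kN) - k.$$

Here, both $Treat_i$ and $DU_t$ drop out after the transformation. Let

$$h_{it} = \begin{pmatrix}\widetilde{DU}_t\\ Treat_i\cdot\widetilde{DU}_t\end{pmatrix}.$$

In model (2.3) we have the same $\tilde y_{it}$ and $\widetilde{DU}_t$ as in model (2.1), but different β and $\tilde x_{it}$ given by

$$\beta = \begin{pmatrix}\beta_2\\ \beta_3\\ \gamma\end{pmatrix}, \qquad \tilde x_{it} = \begin{pmatrix}h_{it}\\ \tilde z_{it}\end{pmatrix},$$

where $\tilde z_{it} = z_{it} - \hat z_{it} = z_{it} - \left(\sum_{s=1}^{T} z_{is}f(s)'\right)\left(\sum_{s=1}^{T} f(s)f(s)'\right)^{-1} f(t)$. In model (2.4), $\tilde y_{it}$, $\tilde z_{it}$, $\widetilde{DU}_t$ and $\widetilde{Treat}_i$ take the same form as in model (2.2). However, β and $\tilde x_{it}$ now become

$$\beta = \begin{pmatrix}\beta_3\\ \gamma\end{pmatrix}, \qquad \tilde x_{it} = \begin{pmatrix}\widetilde{Treat}_i\cdot\widetilde{DU}_t\\ \tilde z_{it}\end{pmatrix}.$$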
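Plugging (2.1), (2.2), (2.3) or (2.4) into (2.5) for $y_{it}$ yields

$$\hat\beta - \beta = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}'\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}u_{it}. \qquad (2.6)$$

Let $v_{it} = \tilde x_{it}u_{it}$ and define $\hat v_{it} = \tilde x_{it}\hat u_{it}$, where $\hat u_{it}$ are the OLS residuals given by

$$\hat u_{it} = \tilde y_{it} - \tilde x_{it}'\hat\beta.$$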
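The de-trending step and the estimator (2.5) for model (2.1) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's code: the panel is balanced and stored as an N × T array, the first kN units are treated, DU_t = 1(t > λT), and the helper names `detrend` and `dd_fe_ols` are hypothetical. Each unit's series is projected on the trend functions (the rows of F are f(t)'), and OLS is run on the projection residuals.

```python
import numpy as np


def detrend(a, F):
    """Remove the projection of each row of `a` (a length-T series) on the columns of F.

    Computes a_t - (sum_s a_s f(s)') (sum_s f(s) f(s)')^{-1} f(t) row by row.
    a: rows x T array; F: T x J matrix whose t-th row is f(t)'.
    """
    coef = np.linalg.solve(F.T @ F, F.T @ a.T)   # J x rows projection coefficients
    return a - (F @ coef).T


def dd_fe_ols(y, k, lam, F):
    """De-trended ("fixed-effects") OLS of (beta2, beta3) in model (2.1).

    y: N x T outcomes; the first kN units are treated; DU_t = 1(t > lam * T).
    Returns beta_hat, the N x T x 2 array of x~_it, and the N x T residuals.
    """
    N, T = y.shape
    t = np.arange(1, T + 1)
    du_tilde = detrend((t > lam * T).astype(float)[None, :], F)[0]   # DU~_t
    treat = (np.arange(1, N + 1) <= k * N).astype(float)             # Treat_i
    # x~_it = (DU~_t, Treat_i * DU~_t)'; Treat_i alone is removed when f(t) has an intercept
    X = np.stack([np.tile(du_tilde, (N, 1)), treat[:, None] * du_tilde[None, :]], axis=-1)
    y_tilde = detrend(y, F)
    XX = np.einsum("itp,itq->pq", X, X)
    Xy = np.einsum("itp,it->p", X, y_tilde)
    beta_hat = np.linalg.solve(XX, Xy)
    u_hat = y_tilde - X @ beta_hat                                    # OLS residuals
    return beta_hat, X, u_hat
```

Passing `F = np.ones((T, 1))` gives the usual within transformation, while `F = np.column_stack([np.ones(T), np.arange(1, T + 1)])` also removes individual-specific linear trends.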
As shown by Driscoll and Kraay (1998), it is possible to obtain standard errors in a panel model that are robust to spatial correlation of unknown form, as well as to heteroskedasticity and serial correlation, under covariance stationarity and weak dependence conditions. Define

$$\hat{\bar v}_t = \sum_{i=1}^{N}\hat v_{it},$$

and the partial sums of $\hat{\bar v}_t$ as

$$\hat{\bar S}_{[rT]} = \sum_{t=1}^{[rT]}\hat{\bar v}_t,$$

where $r \in (0, 1]$ and $[rT]$ is the integer part of $rT$. Let

$$\hat{\bar\Gamma}_j = T^{-1}\sum_{t=j+1}^{T}\hat{\bar v}_t\hat{\bar v}_{t-j}',$$

and then define

$$\hat{\bar\Omega} = \hat{\bar\Gamma}_0 + \sum_{j=1}^{T-1} k\!\left(\frac{j}{M}\right)\left(\hat{\bar\Gamma}_j + \hat{\bar\Gamma}_j'\right),$$

which is the nonparametric kernel HAC estimator based on the cross-sectional sum $\hat{\bar v}_t$, the kernel $k(x)$ and the bandwidth M. An equivalent expression for $\hat{\bar\Omega}$ is

$$\hat{\bar\Omega} = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} K_{ts}\hat{\bar v}_t\hat{\bar v}_s', \qquad K_{ts} = k\!\left(\frac{|t-s|}{M}\right).$$

When $\hat{\bar\Omega}$ is used as the middle term of the sandwich form of the covariance matrix, we obtain the robust covariance matrix estimator proposed by Driscoll and Kraay (1998)

$$\hat V = T\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}'\right)^{-1}\hat{\bar\Omega}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}'\right)^{-1}.$$
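A sketch of the DK covariance matrix follows, assuming the de-trended regressors are held in an N × T × p array `X` and the residuals in an N × T array `u_hat` (for instance, as produced by a de-trended OLS fit). The function names are hypothetical, and the Bartlett kernel is hard-coded only for concreteness; another kernel k(x) could be swapped in. The cross-sectional sums are formed first, so the HAC step is a pure time series computation on $\hat{\bar v}_t$, which is why no distance measure or cross-sectional structure is needed.

```python
import numpy as np


def bartlett(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1 and 0 otherwise."""
    return np.clip(1.0 - np.abs(x), 0.0, None)


def dk_covariance(X, u_hat, M):
    """Driscoll-Kraay covariance V = T (sum x x')^{-1} Omega (sum x x')^{-1}.

    X: N x T x p de-trended regressors x~_it; u_hat: N x T residuals; M: bandwidth.
    """
    _, T, _ = X.shape
    vbar = np.einsum("itp,it->tp", X, u_hat)       # cross-sectional sums vbar_t, T x p
    omega = vbar.T @ vbar / T                      # Gamma_0
    for j in range(1, T):
        w = bartlett(j / M)
        if w == 0.0:                               # Bartlett weights vanish for j >= M
            break
        gamma_j = vbar[j:].T @ vbar[:T - j] / T    # T^{-1} sum_{t=j+1}^T vbar_t vbar_{t-j}'
        omega = omega + w * (gamma_j + gamma_j.T)
    XX_inv = np.linalg.inv(np.einsum("itp,itq->pq", X, X))
    return T * XX_inv @ omega @ XX_inv


def t_stat(beta_hat, V, R, r):
    """t = (R beta_hat - r) / sqrt(R V R') for a single linear restriction R."""
    R = np.asarray(R, dtype=float)
    return (R @ beta_hat - r) / np.sqrt(R @ V @ R)
```

For model (2.1), `t_stat(beta_hat, dk_covariance(X, u_hat, M), [0.0, 1.0], 0.0)` would give the t-statistic for H0: β3 = 0 with bandwidth M = bT.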

Consider testing linear hypotheses about β of the form $H_0: R\beta = r$, where R is a q × K* full-rank matrix of known constants with q ≤ K*, and r is a q × 1 vector of known constants. Define the Wald statistic as

$$Wald = (R\hat\beta - r)'\left[R\hat V R'\right]^{-1}(R\hat\beta - r).$$

When q = 1 we can define the t-statistic

$$t = \frac{R\hat\beta - r}{\sqrt{R\hat V R'}}.$$

Note that q ≤ 2 in model (2.1) and q = 1 in model (2.2). In these two cases, the focus is on the asymptotic behavior of the t-statistics under null hypotheses involving restrictions on the DD estimator. For models (2.3) and (2.4), the asymptotic behavior of the Wald statistics under null hypotheses involving linear restrictions on the γ vector is also analyzed.

2.3 Asymptotic Theory and Critical Values

This section analyzes the asymptotic properties of the test statistics under null hypotheses in the large-T, fixed-N case. All limits are taken as T → ∞ with N held fixed. Simulated critical values are provided. Throughout, the symbol "⇒" denotes weak convergence, and both "$\xrightarrow{p}$" and "plim" denote convergence in probability.
The asymptotic distributions of the Wald and t statistics under the null hypotheses are obtained using large-T asymptotics. This approach allows the standard errors to be approximated within the fixed-b asymptotic framework developed by Kiefer and Vogelsang (2005), which captures the choice of kernel and bandwidth in the asymptotic approximation. Moreover, it generates limits that are invariant to general forms of spatial correlation under covariance stationarity and weak dependence in the time dimension. The asymptotic distributions of the statistics depend on the kernel used to compute the HAC estimator. Here we focus on the Bartlett kernel, k(x) = 1 − |x| for |x| ≤ 1 and k(x) = 0 for |x| > 1. Before proceeding, some definitions are required. The random matrices that appear in the asymptotic results are expressed in terms of the following functions and random variables.
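Definition 2.1. Let W(r) denote a generic vector of independent standard Wiener processes. Define

$$H^F(r,\lambda) = 1(r > \lambda) - \left(\int_\lambda^1 F(s)'\,ds\right)\left(\int_0^1 F(s)F(s)'\,ds\right)^{-1}F(r),$$

$$N^F(W) = \int_0^1 H^F(r,\lambda)\,dW(r),$$

$$Q^F(r,\lambda,W) = \int_0^r H^F(s,\lambda)\,dW(s) - \left(\int_0^1 dW(s)F(s)'\right)\left(\int_0^1 F(s)F(s)'\,ds\right)^{-1}\int_0^r F(s)H^F(s,\lambda)\,ds - \left(\int_0^r H^F(s,\lambda)^2\,ds\right)\left(\int_0^1 H^F(s,\lambda)^2\,ds\right)^{-1}N^F(W).$$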
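The function $H^F(r,\lambda)$ in Definition 2.1 is the break dummy 1(r > λ) with its projection on F removed. A Riemann-sum sketch on an equally spaced grid over (0, 1] is given below; the grid and the helper name `h_F` are illustrative choices, not part of the chapter. By construction $\int_0^1 F(s)H^F(s,\lambda)\,ds = 0$, which the sketch can be used to confirm numerically, and for F(r) = 1 it reduces to 1(r > λ) − (1 − λ).

```python
import numpy as np


def h_F(r, lam, F_of_r):
    """Riemann-sum approximation of H^F(r, lam) from Definition 2.1.

    r: equally spaced grid on (0, 1]; F_of_r maps the grid to an n x J matrix of F(r)'.
    Returns 1(r > lam) - (int_lam^1 F')(int_0^1 F F')^{-1} F(r) on the grid.
    """
    F = F_of_r(r)
    n = len(r)
    int_FF = F.T @ F / n                        # int_0^1 F(s) F(s)' ds
    int_F_post = F[r > lam].sum(axis=0) / n     # int_lam^1 F(s) ds
    return (r > lam) - F @ np.linalg.solve(int_FF, int_F_post)
```

For the individual-specific trend case one would pass `lambda s: np.column_stack([np.ones_like(s), s])` as `F_of_r`.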

The next definition introduces the random matrices that appear in the asymptotic results.

Definition 2.2. Let B(r) denote a generic vector of Brownian bridges. If k(x) is the Bartlett kernel, let the random matrices $P^F(b,\lambda,Q^F)$, $P(b,B)$, $P_{12}(b,\lambda,Q^F,B)$ and $P_{21}(b,\lambda,Q^F,B)$ be defined as follows for $b \in (0, 1]$:

$$P^F(b,\lambda,Q^F) = \frac{2}{b}\int_0^1 Q^F(r,\lambda,W)Q^F(r,\lambda,W)'\,dr - \frac{1}{b}\int_0^{1-b}\left[Q^F(r,\lambda,W)Q^F(r+b,\lambda,W)' + Q^F(r+b,\lambda,W)Q^F(r,\lambda,W)'\right]dr,$$

$$P(b,B) = \frac{2}{b}\int_0^1 B(r)B(r)'\,dr - \frac{1}{b}\int_0^{1-b}\left[B(r)B(r+b)' + B(r+b)B(r)'\right]dr,$$

$$P_{12}(b,\lambda,Q^F,B) = \frac{2}{b}\int_0^1 Q^F(r,\lambda,W)B(r)'\,dr - \frac{1}{b}\int_0^{1-b}\left[Q^F(r,\lambda,W)B(r+b)' + Q^F(r+b,\lambda,W)B(r)'\right]dr,$$

$$P_{21}(b,\lambda,Q^F,B) = \frac{2}{b}\int_0^1 B(r)Q^F(r,\lambda,W)'\,dr - \frac{1}{b}\int_0^{1-b}\left[B(r)Q^F(r+b,\lambda,W)' + B(r+b)Q^F(r,\lambda,W)'\right]dr.$$

For all models, the following assumption on the trend functions is sufficient to obtain the main results of this chapter.

Assumption 2.1. f(t) includes a constant, and there exist a J × J diagonal matrix $\tau_T$ and a vector of functions F such that $\tau_T f(t) = F(t/T) + o_p(1)$, $\int_0^1 F_i(r)\,dr < \infty$ for $i = 1, \ldots, J$, and $\det\left[\int_0^1 F(r)F(r)'\,dr\right] > 0$.

Assumption 2.1 is fairly standard and is the same as the assumption used by Bunzel and Vogelsang (2005). Note that the standard individual fixed-effects model is the special case f(t) = 1, and the individual-specific trend model is the special case f(t) = (1, t)'.
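As a worked illustration of Assumption 2.1 (an example added here, not taken from the chapter's proofs), consider the individual-specific linear trend f(t) = (1, t)'. Choosing $\tau_T = \operatorname{diag}(1, T^{-1})$ gives

$$\tau_T f(t) = \begin{pmatrix}1\\ t/T\end{pmatrix} = F\!\left(\frac{t}{T}\right), \qquad F(r) = \begin{pmatrix}1\\ r\end{pmatrix},$$

$$\int_0^1 F(r)F(r)'\,dr = \begin{pmatrix}1 & 1/2\\ 1/2 & 1/3\end{pmatrix}, \qquad \det\left[\int_0^1 F(r)F(r)'\,dr\right] = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} > 0,$$

so the assumption holds with no approximation error, and both components of F are integrable on [0, 1].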

2.3.1 Models With No Additional Regressors

This subsection investigates the asymptotic properties of the statistics in models (2.1) and (2.2). For a given time period t, stack $u_{1t}, u_{2t}, \ldots, u_{Nt}$ into an N × 1 vector

$$u_t = (u_{1t}, u_{2t}, \ldots, u_{Nt})'.$$

The following assumption is sufficient to obtain results for the fixed-effects OLS estimator based on models (2.1) and (2.2).

Assumption 2.2. $T^{-1/2}\sum_{t=1}^{[rT]} u_t \Rightarrow \Lambda W_N(r)$, where $W_N(r)$ is an N × 1 vector of independent standard Wiener processes and $\Lambda\Lambda'$ is the N × N long run variance matrix of $u_t$.
Stacking the N cross-sectional errors of a given time period into a vector accounts for general forms of spatial correlation. Assumption 2.2 holds under covariance stationarity and weak dependence in the time dimension; it essentially requires that $u_t$ satisfy a functional central limit theorem (FCLT). Because $\Lambda\Lambda'$ is not restricted to be diagonal, the assumption allows for general forms of spatial correlation. Stationarity is not required in the cross section in the large-T, fixed-N case. This is analogous to the large-N, fixed-T case, where random sampling in the cross section allows for general forms of serial correlation in the model, including nonstationarity.

Before deriving the results for model (2.1), it is worth noting that the t-statistics on the DD estimator in the following three models are exactly the same:²
1. $y_{it} = a_i + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i\cdot DU_t + u_{it}$,
2. $y_{it} = \lambda_t + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i\cdot DU_t + u_{it}$,
3. $y_{it} = a_i + \lambda_t + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i\cdot DU_t + u_{it}$,
where $a_i$ is a full set of individual dummies and $\lambda_t$ is a full set of time period dummies. This exact equivalence directly implies that whether time period dummies are included does not affect the limit of the t-statistic on the DD estimator in the individual fixed-effects model. Proofs of the exact equivalence result are provided in Appendix C. Furthermore, Monte Carlo simulation results suggest that the exact equivalence continues to hold when trends are also included in the model, although no proof is given for this case.
Let

$$A = \begin{pmatrix}1 & \cdots & 1 & 1 & \cdots & 1\\ 1 & \cdots & 1 & 0 & \cdots & 0\end{pmatrix},$$

where A is a 2 × N matrix with all elements in the first row, and the first kN elements in the second row, equal to one. Let G = AA'. The following proposition and lemma present the asymptotic distributions of $(\hat\beta - \beta)$ and of the partial sums in model (2.1).
Proposition 2.1. Suppose Assumptions 2.1 and 2.2 hold. Let $W^*(r)$ denote a 2 × 1 vector of standard Wiener processes and let $\Lambda^*$ denote the matrix square root of the matrix $A\Lambda\Lambda'A'$. In model (2.1), for N fixed as T → ∞ the following holds:

$$\sqrt{T}(\hat\beta - \beta) \Rightarrow G^{-1}\left(\int_0^1 H^F(r,\lambda)^2\,dr\right)^{-1}\Lambda^*\int_0^1 H^F(r,\lambda)\,dW^*(r).$$

Lemma 2.2. Suppose Assumptions 2.1 and 2.2 hold. Assume M = bT, where b ∈ (0, 1] is fixed. Let $W^*(r)$ denote a 2 × 1 vector of standard Wiener processes and let $\Lambda^*$ denote the matrix square root of the matrix $A\Lambda\Lambda'A'$. In model (2.1), for N fixed as T → ∞ the following holds:

$$T^{-1/2}\hat{\bar S}_{[rT]} \Rightarrow \Lambda^* Q^F(r,\lambda,W^*).$$

² The result also holds when a global intercept is included.

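When k(x) is the Bartlett kernel, the calculations in Hashimzade and Vogelsang (2008a) give

$$\hat{\bar\Omega} = \frac{2}{b}T^{-2}\sum_{t=1}^{T-1}\hat{\bar S}_t\hat{\bar S}_t' - \frac{1}{b}T^{-2}\sum_{t=1}^{T-M-1}\left(\hat{\bar S}_t\hat{\bar S}_{t+M}' + \hat{\bar S}_{t+M}\hat{\bar S}_t'\right), \qquad (2.7)$$

using the fact that $\hat{\bar S}_T = 0$, which follows from the OLS normal equations $\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\hat u_{it} = 0$. The following proposition presents the fixed-b limit of the HAC estimator.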
Proposition 2.3. Suppose Assumptions 2.1 and 2.2 hold. Assume M = bT, where b ∈ (0, 1] is fixed. Let $W^*(r)$ denote a 2 × 1 vector of standard Wiener processes and let $\Lambda^*$ denote the matrix square root of the matrix $A\Lambda\Lambda'A'$. In model (2.1), for N fixed as T → ∞ the following holds:

$$\hat{\bar\Omega} \Rightarrow \Lambda^* P^F(b,\lambda,Q^F)\Lambda^{*\prime}.$$
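The partial-sum representation (2.7) used for Proposition 2.3 is an exact algebraic identity whenever the partial sums end at zero, and it can be checked numerically. The sketch below computes the Bartlett HAC estimator once from the kernel double sum and once from (2.7). It assumes an integer bandwidth M and cross-sectional sums stored as a T × p array whose columns sum to zero (as the OLS normal equations guarantee for $\hat{\bar v}_t$); the function names are hypothetical.

```python
import numpy as np


def omega_kernel(vbar, M):
    """Bartlett HAC: T^{-1} sum_t sum_s k(|t - s| / M) vbar_t vbar_s'."""
    T = vbar.shape[0]
    t = np.arange(T)
    K = np.clip(1.0 - np.abs(t[:, None] - t[None, :]) / M, 0.0, None)
    return vbar.T @ K @ vbar / T


def omega_partial_sums(vbar, M):
    """Partial-sum form (2.7); valid when S_T = 0 and M = bT is an integer bandwidth."""
    T = vbar.shape[0]
    b = M / T
    S = np.cumsum(vbar, axis=0)                    # row t-1 holds S_t
    out = (2.0 / b) * (S[:T - 1].T @ S[:T - 1]) / T**2
    if M < T - 1:                                  # the cross term is empty otherwise
        cross = S[:T - M - 1].T @ S[M:T - 1]       # sum_{t=1}^{T-M-1} S_t S_{t+M}'
        out -= (cross + cross.T) / (b * T**2)
    return out
```

The identity follows from summation by parts: the second difference of the Bartlett weights is 2/M at lag zero, −1/M at lag M, and zero elsewhere.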
Based on Propositions 2.1 and 2.3, the following theorem summarizes the theoretical results for model (2.1).

Theorem 2.1. Suppose the model includes neither time period dummies nor additional regressors. Suppose Assumptions 2.1 and 2.2 hold. Assume M = bT, where b ∈ (0, 1] is fixed. Let $W_q^{**}$ denote a q × 1 vector of standard Wiener processes. For N fixed as T → ∞,

$$Wald \Rightarrow N^F(W_q^{**})'\,P^F(b,\lambda,Q_q^{F**})^{-1}\,N^F(W_q^{**}), \qquad t \Rightarrow \frac{N^F(W_1^{**})}{\sqrt{P^F(b,\lambda,Q_1^{F**})}}.$$

Theorem 2.1 demonstrates that asymptotically pivotal test statistics are obtained within the fixed-b framework in the presence of spatial correlation in the cross section. Therefore, statistics based on the DK standard errors under fixed-b asymptotics have broader robustness properties with respect to correlation in the model. The limiting distributions differ from those derived by Kiefer and Vogelsang (2005) and Vogelsang (2012) in two ways. First, the fixed-b limits here depend not only on the choice of kernel and bandwidth, but also on the date of the policy change, λ, and on the individual-specific trend functions. Second, the asymptotic distribution differs from that in Vogelsang (2012) because $DU_t$ is deterministic, so there are extra terms in the asymptotic distribution of the partial sums. $N^F(W_q^{**})$ follows a normal distribution, and $P^F(b,\lambda,Q_q^{F**})$ is a random matrix that depends on the date of the policy change, the trend functions and the choice of kernel and bandwidth. Moreover, $N^F(W_q^{**})$ and $P^F(b,\lambda,Q_q^{F**})$ are independent. The limiting distributions of the test statistics are identical to those in the pure time series model with a shift in mean and deterministic trends. They are non-standard, but critical values can be obtained by simulation.
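Corollary 2.2. Suppose model (2.1) is a standard individual fixed-effects model with no time trends, that is, f(t) = 1. Define $\lambda W(1) - W(\lambda) = (\lambda - 1)\tilde W\!\left(\frac{\lambda}{\lambda-1}\right)$. Let $W_q^{**}$ denote a q × 1 vector of standard Wiener processes. Then

$$H^F(r,\lambda) = 1(r > \lambda) - (1 - \lambda),$$

$$N^F(W) = \lambda W(1) - W(\lambda) = (\lambda - 1)\tilde W\!\left(\frac{\lambda}{\lambda-1}\right),$$

$$Q^F(r,\lambda,W) = \int_0^r H^F(s,\lambda)\,dW(s) - W(1)\int_0^r H^F(s,\lambda)\,ds - \left(\int_0^r H^F(s,\lambda)^2\,ds\right)\left(\int_0^1 H^F(s,\lambda)^2\,ds\right)^{-1}N^F(W).$$

For N fixed as T → ∞, the following hold:

$$\sqrt{T}(\hat\beta - \beta) \Rightarrow \frac{1}{\lambda(1-\lambda)}G^{-1}\Lambda^*(\lambda - 1)\tilde W\!\left(\frac{\lambda}{\lambda-1}\right),$$

$$Wald \Rightarrow N^F(W_q^{**})'\,P^F(b,\lambda,Q_q^{F**})^{-1}\,N^F(W_q^{**}), \qquad t \Rightarrow \frac{N^F(W_1^{**})}{\sqrt{P^F(b,\lambda,Q_1^{F**})}}.$$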

Corollary 2.2 provides results for a standard individual ﬁxed-effects DD model. The limits are
identical to the results in the pure time series model with a shift in mean.
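For f(t) = 1 the quantities in Definition 2.1 can be worked out by hand, which explains the factor $1/(\lambda(1-\lambda))$ in Corollary 2.2. Since $\int_\lambda^1 ds = 1 - \lambda$ and $\int_0^1 ds = 1$,

$$H^F(r,\lambda) = 1(r > \lambda) - (1 - \lambda) = \begin{cases} -(1-\lambda), & r \le \lambda,\\ \lambda, & r > \lambda,\end{cases}$$

$$\int_0^1 H^F(r,\lambda)^2\,dr = \lambda(1-\lambda)^2 + (1-\lambda)\lambda^2 = \lambda(1-\lambda),$$

$$N^F(W) = \int_0^1 H^F(r,\lambda)\,dW(r) = \lambda\left[W(1) - W(\lambda)\right] - (1-\lambda)W(\lambda) = \lambda W(1) - W(\lambda).$$

For a scalar W, $\operatorname{Var}(N^F(W)) = \lambda^2 + \lambda - 2\lambda^2 = \lambda(1-\lambda)$, which equals $\int_0^1 H^F(r,\lambda)^2\,dr$, as the Itô isometry requires.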
When time period dummies are also included, as in model (2.2), the limiting distributions of the statistics remain the same because of the exact equivalence result. This finding is useful since empirical researchers often include a full set of time period dummies in their models.

2.3.2 Models With Additional Regressors

This subsection analyzes the asymptotic properties of the statistics in models (2.3) and (2.4). Some additional notation is needed. Let $I_h$ denote an h × h identity matrix. Let ι denote an N × 1 vector of ones. Let $e_i$ denote an N × 1 vector with ith element equal to one and zeros otherwise, i.e.,

$$e_i = (0, \ldots, 0, 1, 0, \ldots, 0)'.$$

Define a K × (K + 1) matrix B and a K × N(K + 1) matrix $A_i$ as follows:

$$B = [0, I_K], \qquad A_i = (e_i' \otimes B).$$

Let $\tilde e_1$ denote a (K + 1) × 1 vector with first element equal to one and zeros otherwise, i.e., $\tilde e_1 = (1, 0, \ldots, 0)'$. Let $\bar e_1$ denote an (NK + 1) × 1 vector with first element equal to one and zeros otherwise, i.e., $\bar e_1 = (1, 0, \ldots, 0)'$. The following assumption on the additional regressors $z_{it}$ is sufficient to obtain results for the fixed-effects OLS estimator based on models (2.3) and (2.4).
Assumption 2.3. Suppose there is no structural change in $z_{it}$ over the entire sample period. Assume that $\operatorname{plim} T^{-1}\sum_{t=1}^{T} z_{it} = \mu_i \equiv E(z_i)$ and $\operatorname{plim} T^{-1}\sum_{t=1}^{[rT]}\tilde z_{it}\tilde z_{it}' = rQ_i$ for $r \in (0, 1]$, where $\bar Q = \sum_{i=1}^{N} Q_i$ is nonsingular.
Note that Assumption 2.3 requires that the additional regressors have no structural change at the date of the policy change. In other words, $z_{it}$ is uncorrelated with $Treat_i$ and $DU_t$. Under this assumption, $z_{it}$ is included to reduce the variance of the error. However, empirical researchers are often more interested in the case where the additional regressors also have a structural change. In that case, the fixed-b limits of test statistics based on the $z_{it}$ coefficients may differ from the usual fixed-b limits.

To handle the case where additional regressors are included (model (2.3)), Assumption 2.2 needs to be strengthened as follows. Stack the additional regressors $z_{it}$ and the trend functions f(t) over time, and consider the reduced form of the T × K stacked matrix $z_i$ (that is, the linear projection of $z_i$ onto the space spanned by the T × J stacked matrix of trend functions f(T)) with an error term:

$$z_i = f(T)b_i + e_i,$$

where $e_i$ is a T × K matrix and $b_i$ is a J × K matrix. It is easy to show that $\tilde z_{it}$ are the OLS residuals given by

$$\tilde z_{it} = z_{it} - \hat b_i' f(t),$$

where $\hat b_i$ is the OLS estimator of $b_i$. Define the (K + 1) × 1 vector

$$v_t^{ii} = \begin{pmatrix}u_{it}\\ (z_{it} - b_i'f(t))u_{it}\end{pmatrix}.$$

Stack the vectors $v_t^{11}, \ldots, v_t^{NN}$ to form the N(K + 1) × 1 vector of time series

$$\dot v_t = \begin{pmatrix}v_t^{11}\\ v_t^{22}\\ \vdots\\ v_t^{NN}\end{pmatrix}.$$

Assumption 2.4. $E(u_{it} \mid z_{it}) = 0$ and $T^{-1/2}\sum_{t=1}^{[rT]}\dot v_t \Rightarrow \dot\Lambda W(r)$, where W(r) is an N(K + 1) × 1 vector of standard Wiener processes and $\dot\Lambda\dot\Lambda'$ is the N(K + 1) × N(K + 1) long run variance matrix of $\dot v_t$.

Assumption 2.3 requires that the sample mean and the sample variance-covariance matrix of the additional regressors over time have well-defined limits. The form of $Q_i$ depends on which dummies are included in the model and on the choice of trend functions. Assumption 2.4 allows weak exogeneity in the cross section and over time and requires that an FCLT hold for $\dot v_t$. Because $Q_i$ is not restricted to be identical across i and $\dot\Lambda\dot\Lambda'$ is not restricted to be block diagonal, the assumptions allow for heterogeneity in conditional heteroskedasticity and serial correlation as well as for general forms of spatial correlation.
The following lemma shows that $h_{it}$ and $\tilde z_{it}$ are asymptotically uncorrelated.

Lemma 2.4. Under Assumptions 2.1 and 2.3, for N fixed and as T → ∞, the following holds:

$$T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{[rT]} h_{it}\tilde z_{it}' \xrightarrow{p} 0.$$

In particular, when r = 1, $T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} h_{it}\tilde z_{it}' \xrightarrow{p} 0$.
Let

$$R = \begin{pmatrix}R_{11} & R_{12}\\ R_{21} & R_{22}\end{pmatrix},$$

where $R_{11}$ is a $q_1$ × 2 matrix, $R_{12}$ is a $q_1$ × K matrix, $R_{21}$ is a $q_2$ × 2 matrix and $R_{22}$ is a $q_2$ × K matrix. Usually we are interested in restrictions either on the DD estimator or on the coefficients of the additional explanatory variables, not on both at the same time. In other words, we are interested in the cases $q_2 = 0$ and $R_{12} = 0$, or $q_1 = 0$ and $R_{21} = 0$. The next theorem presents the results for model (2.3).
Theorem 2.3. Suppose the model includes additional regressors but no time period dummies. Suppose Assumptions 2.1, 2.3 and 2.4 hold. Assume M = bT, where b ∈ (0, 1] is fixed. Let $\bar W(r)$ denote a $q_1$ × 1 vector of standard Wiener processes. Let $W_q(r)$ denote a $q_2$ × 1 vector of standard Wiener processes. Let $W^*(r)$ denote a 2 × 1 vector of standard Wiener processes and let $\dot\Lambda^*$ be the matrix square root of the matrix $(A \otimes \tilde e_1')\dot\Lambda\dot\Lambda'(A \otimes \tilde e_1')'$. For N fixed as T → ∞, the following hold:

$$\sqrt{T}(\hat\beta - \beta) \Rightarrow \begin{pmatrix}\left(G\int_0^1 H^F(r,\lambda)^2\,dr\right)^{-1}\dot\Lambda^*\int_0^1 H^F(r,\lambda)\,dW^*(r)\\ \bar Q^{-1}\left(\sum_{i=1}^{N}A_i\right)\dot\Lambda W(1)\end{pmatrix}.$$

If $q_2 = 0$ and $R_{12} = 0$, that is, if we are testing restrictions on the DD estimator, then $R = [R_{11}, 0]$ and, with $Q^F = Q^F(r,\lambda,\bar W)$,

$$Wald \Rightarrow N^F(\bar W)'\left(P^F(b,\lambda,Q^F)\right)^{-1}N^F(\bar W), \qquad t \Rightarrow \frac{N^F(\bar W)}{\sqrt{P_1^F(b,\lambda,Q^F)}}.$$

If $q_1 = 0$ and $R_{21} = 0$, that is, if we are testing restrictions on the coefficients of the additional regressors, then $R = [0, R_{22}]$ and

$$Wald \Rightarrow W_q(1)'P_q(b,B)^{-1}W_q(1), \qquad t \Rightarrow \frac{W_q(1)}{\sqrt{P_q(b,B)}}.$$

Theorem 2.3 provides some interesting insights into inference on the DD estimator and on the $z_{it}$ coefficient estimator $\hat\gamma$ under fixed-b asymptotics. If we only test restrictions on the DD estimator, the limiting distributions of the test statistics are the same as in Theorem 2.1. If we only test restrictions on $\hat\gamma$, the limiting distributions of the test statistics are identical to the results in Vogelsang (2012). Note that the limiting distributions of test statistics based on $\hat\gamma$ do not depend on the trend functions. In either case, the test statistics are asymptotically pivotal. Testing restrictions on both sets of coefficients at the same time, however, is much more complicated: the test statistics are no longer asymptotically pivotal. General forms of the limits of the test statistics are provided in the proof of Theorem 2.3 in Appendix C.
The most general model, which includes both additional regressors and time period dummies (model (2.4)), requires a stronger assumption than Assumption 2.4. To cope with this case, Assumption 2.4 is strengthened as follows. Define the K × 1 vector $v_t^{ij} = (z_{it} - b_i'f(t))u_{jt}$. For a given j, stack $u_{jt}$ and the vectors $v_t^{1j}, v_t^{2j}, \ldots, v_t^{Nj}$ into an (NK + 1) × 1 vector

$$v_t^{j} = \begin{pmatrix}u_{jt}\\ v_t^{1j}\\ v_t^{2j}\\ \vdots\\ v_t^{Nj}\end{pmatrix},$$

and then stack the vectors $v_t^{1}, v_t^{2}, \ldots, v_t^{N}$ into an N(NK + 1) × 1 vector

$$v_t^{ex} = \begin{pmatrix}v_t^{1}\\ v_t^{2}\\ \vdots\\ v_t^{N}\end{pmatrix},$$

where the "ex" superscript denotes an extended vector that includes the vectors $v_t^{ij}$ for $i \ne j$.

Assumption 2.5. $E(u_{it} \mid z_{jt}) = 0$ for all $i, j = 1, 2, \ldots, N$ and $T^{-1/2}\sum_{t=1}^{[rT]} v_t^{ex} \Rightarrow \Lambda^{ex}W^{ex}(r)$, where $W^{ex}(r)$ is an N(NK + 1) × 1 vector of standard Wiener processes and $\Lambda^{ex}\Lambda^{ex\prime}$ is the N(NK + 1) × N(NK + 1) long run variance matrix of $v_t^{ex}$.

Assumption 2.5 requires strict exogeneity in the cross section but allows weak exogeneity over time. It also requires that an FCLT hold for the extended vector $v_t^{ex}$. Here, $\Lambda^{ex}\Lambda^{ex\prime}$ is not restricted to be block diagonal, which permits general spatial correlation. Assumptions 2.4 and 2.5 indicate that the form of exogeneity needed depends on whether time period dummies are included in the model. Without time period dummies, only weak exogeneity is required in both the time and cross-section dimensions. When time period dummies are included, strict exogeneity is needed in the cross-section dimension, while only weak exogeneity is required in the time dimension.
As in model (2.2), including time period dummies does not affect the fixed-b limits. The following theorem summarizes the results for model (2.4). Note that Assumption 2.4 is now replaced by the stronger Assumption 2.5.
Theorem 2.4. Suppose the model includes both additional regressors and time period dummies. Suppose Assumptions 2.1, 2.3 and 2.5 hold. Assume M = bT, where b ∈ (0, 1] is fixed. Let $\tilde A = [1-k, \ldots, 1-k, -k, \ldots, -k]$ and $\tilde G = \tilde A\tilde A' = \sum_{i=1}^{N}\widetilde{Treat}_i^2$. Let $W_1^{ex*}(r)$ denote a standard Wiener process and let $\Lambda^{ex*}$ be the square root of the long run variance $\Lambda^{ex*2} = (\tilde A \otimes \bar e_1')\Lambda^{ex}\Lambda^{ex\prime}(\tilde A \otimes \bar e_1')'$. For N fixed as T → ∞, the following hold:

$$\sqrt{T}(\hat\beta - \beta) \Rightarrow \begin{pmatrix}\left(\tilde G\int_0^1 H^F(r,\lambda)^2\,dr\right)^{-1}\Lambda^{ex*}\int_0^1 H^F(r,\lambda)\,dW_1^{ex*}(r)\\ \bar Q^{-1}\left(\sum_{i=1}^{N}A_i^{ex}\right)\Lambda^{ex}W^{ex}(1)\end{pmatrix},$$

and the limits of the statistics are the same as those given in Theorem 2.3.

Theorem 2.4 demonstrates that the results for the statistics in Theorem 2.3 continue to hold when time period dummies are included. This is consistent with the findings for model (2.2).

2.3.3 Asymptotic Critical Values

The asymptotic critical values for the Wald and t statistics based on the DD estimator can be obtained through Monte Carlo simulation. To keep the analysis straightforward, we consider the case q = 1 and focus on the individual fixed-effects model and the individual-specific trend model. The asymptotic critical values are simulated using 50,000 replications, with the Wiener processes approximated by normalized sums of i.i.d. N(0, 1) errors using 1000 steps. The critical values for t statistics in the standard individual fixed-effects model are presented in Tables D.1–D.4, and those for the individual-specific trend model in Tables D.5–D.8. Using the Bartlett kernel, critical values are computed for the percentage points 90%, 95%, 97.5% and 99%. Right-tail critical values are given; left-tail critical values follow from symmetry around zero. The policy change point λ ranges from 0.1 to 0.9 in steps of 0.1, and the bandwidth b ranges from 0.02 to 1 in steps of 0.02.
The critical values are invariant to the value of k. For a given b, the critical values are symmetric in λ around λ = 0.5, where they attain their minimum; as λ approaches zero or one, the critical values increase. This pattern is the same as in the pure time series model with a known structural break (see Cho, 2012). For a given λ, with b = 0.02 the critical values are close to the N(0, 1) critical values regardless of the choice of trend functions. As b grows, the tails get fatter, and with b = 1 they are quite fat. The tails get fatter at different rates for different trend functions. For example, when λ = 0.5, the critical values at the 5%/2.5% tails with b = 0.02 and b = 1 are 1.712/2.056 and 4.781/5.958, respectively, in the standard individual fixed-effects model, while in the individual-specific trend model they are 1.745/2.073 and 5.098/6.395, respectively. Therefore, the tails get fatter more quickly in the individual-specific trend model. The critical values predict that if N(0, 1) critical values are used for t statistics, then for a given T, increasing the bandwidth M increases b = M/T, and t will increasingly over-reject.
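Critical values of the kind reported in Tables D.1–D.8 can be approximated by simulating the limit $t \Rightarrow N^F(W_1)/\sqrt{P^F(b,\lambda,Q^F)}$ directly from Definitions 2.1 and 2.2. The sketch below is an illustrative reimplementation, not the code used for the tables: it covers f(t) = 1 and f(t) = (1, t)' with the Bartlett kernel, replaces integrals by Riemann sums on n steps and Wiener processes by scaled sums of i.i.d. N(0, 1) draws, and uses smaller default sizes than the 50,000 replications and 1000 steps described above to keep memory use modest. The function name and its arguments are hypothetical.

```python
import numpy as np


def simulate_fixed_b_cv(lam, b, trend=False, n=500, reps=10_000, level=0.975, seed=0):
    """Simulated quantile of N^F(W_1) / sqrt(P^F(b, lam, Q^F)), Bartlett kernel.

    trend=False uses F(r) = 1; trend=True uses F(r) = (1, r)'.
    """
    rng = np.random.default_rng(seed)
    r = np.arange(1, n + 1) / n
    F = np.column_stack([np.ones(n), r]) if trend else np.ones((n, 1))
    FF_inv = np.linalg.inv(F.T @ F / n)                        # (int F F')^{-1}
    h = (r > lam) - F @ FF_inv @ (F[r > lam].sum(axis=0) / n)  # H^F(r, lam)
    dW = rng.standard_normal((reps, n)) / np.sqrt(n)           # Wiener increments
    int_hdW = np.cumsum(h * dW, axis=1)                        # int_0^r H dW
    N = int_hdW[:, -1:]                                        # N^F(W)
    int_FH = np.cumsum(F * h[:, None], axis=0) / n             # int_0^r F H ds, n x J
    int_H2 = np.cumsum(h ** 2) / n                             # int_0^r H^2 ds
    Q = int_hdW - (dW @ F) @ FF_inv @ int_FH.T - (int_H2 / int_H2[-1]) * N
    m = int(round(b * n))                                      # b on the grid
    P = (2.0 / b) * np.mean(Q ** 2, axis=1)                    # (2/b) int Q^2
    if m < n:                                                  # -(2/b) int_0^{1-b} Q(r)Q(r+b)
        P -= (2.0 / b) * np.sum(Q[:, :n - m] * Q[:, m:], axis=1) / n
    t = N[:, 0] / np.sqrt(P)
    return np.quantile(t, level)
```

With enough replications, the λ = 0.5 output should be close to the values quoted above (for example, about 2.056 at b = 0.02 and 5.958 at b = 1 for the 97.5% point in the fixed-effects model), up to simulation and discretization error.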

2.4 Finite Sample Properties

This section analyzes the finite-sample performance of the DK standard errors using a simulation study. Because traditional clustered standard errors are the most common tool for robust inference on the DD estimator, the fixed-b approximations for the DK standard errors given by the theorems are compared with the standard normal approximations for the traditional clustered and the DK standard errors. $t_{clus}$ denotes t-statistics constructed using traditional clustered standard errors, and $t_{DK}$ denotes t-statistics constructed using the DK standard errors.
Since applied researchers are interested in the double clustering approach proposed by Cameron et al. (2011) and Thompson (2011), the finite-sample performance of the two-way clustered standard errors is also examined. $t_{double}$ denotes t-statistics constructed using the original formula of the double clustering approach, while $t_{double}^r$ denotes t-statistics constructed using the revised formula. The revised formula is
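$$\hat V_{double}^r = \hat V_{firm} + \hat V_{time,0} - \hat V_{White,0} + \sum_{l=1}^{L}\left(\hat V_{time,l} + \hat V_{time,l}'\right) - \sum_{l=1}^{L}\left(\hat V_{White,l} + \hat V_{White,l}'\right), \qquad (2.8)$$

with

$$\hat V_{firm} = \hat Q^{-1}\left(\sum_{i=1}^{N}\hat s_i\hat s_i'\right)\hat Q^{-1}, \qquad \hat V_{time,l} = \hat Q^{-1}\left(\sum_{t=l+1}^{T}\hat s_t\hat s_{t-l}'\right)\hat Q^{-1}, \qquad \hat V_{White,l} = \hat Q^{-1}\left(\sum_{i=1}^{N}\sum_{t=l+1}^{T}\hat v_{it}\hat v_{i,t-l}'\right)\hat Q^{-1}.$$

Here $\hat s_i = \sum_{t=1}^{T}\hat v_{it}$ is the sum of all observations for individual i, and $\hat s_t = \sum_{i=1}^{N}\hat v_{it}$ is the sum of all observations for time t. The original formula contains only the first three terms in (2.8):

$$\hat V_{double} = \hat V_{firm} + \hat V_{time,0} - \hat V_{White,0}. \qquad (2.9)$$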
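The revised formula (2.8) and the original formula (2.9) share their building blocks, so one function can compute both. The sketch below assumes the scores $\hat v_{it}$ are stored as an N × T × p array and that the matrix $\hat Q$ is supplied by the caller (it is not constructed here); setting L = 0 gives (2.9). The function name is hypothetical.

```python
import numpy as np


def two_way_cluster_cov(v, Q, L=0):
    """Two-way cluster covariance built from scores v_it.

    v: N x T x p array of v_it; Q: p x p matrix supplied by the caller.
    L = 0 returns the original formula (2.9); L >= 1 adds the lag terms of (2.8).
    """
    _, T, _ = v.shape
    Q_inv = np.linalg.inv(Q)

    def sandwich(meat):
        return Q_inv @ meat @ Q_inv

    s_i = v.sum(axis=1)            # N x p: s_i = sum_t v_it (firm sums)
    s_t = v.sum(axis=0)            # T x p: s_t = sum_i v_it (time sums)

    def v_time(l):                 # Q^{-1} (sum_{t=l+1}^T s_t s_{t-l}') Q^{-1}
        return sandwich(s_t[l:].T @ s_t[:T - l])

    def v_white(l):                # Q^{-1} (sum_i sum_{t=l+1}^T v_it v_{i,t-l}') Q^{-1}
        return sandwich(np.einsum("itp,itq->pq", v[:, l:], v[:, :T - l]))

    V = sandwich(s_i.T @ s_i) + v_time(0) - v_white(0)
    for l in range(1, L + 1):
        Vt, Vw = v_time(l), v_white(l)
        V = V + Vt + Vt.T - Vw - Vw.T
    return V
```

The lag terms enter in symmetric pairs, so the result is symmetric for any L whenever $\hat Q$ is symmetric.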

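The DGP used for the simulations is very similar to the one used in Vogelsang (2012). The model is

$$y_{it} = c_i + g_i t + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i\cdot DU_t + z_{it}\gamma + u_{it}, \qquad (2.10)$$

where

$$u_{it} = \rho u_{i,t-1} + \varepsilon_{it}, \qquad u_{i0} = 0, \qquad \varepsilon_{it} \sim N(0,1), \qquad cov(\varepsilon_{it}, \varepsilon_{js}) = 0 \text{ for } t \ne s;$$

$$z_{it} = \rho z_{i,t-1} + e_{it}, \qquad z_{i0} = 0, \qquad e_{it} \sim N(0,1), \qquad cov(e_{it}, e_{js}) = 0 \text{ for } t \ne s.$$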
$c_i$ is the individual fixed effect and $g_i t$ is the individual-specific simple linear trend. In all cases, all coefficients are set to zero. We also set $c_i = 0$, $g_i = 0$, k = 0.5 and λ = 0.5. Setting $c_i = 0$ is without loss of generality because the fixed-effects OLS estimator is exactly invariant to $c_i$. Only one additional regressor $z_{it}$ is included, and it is uncorrelated with $u_{it}$. Both $z_{it}$ and $u_{it}$ are modeled as AR(1) processes with the same autoregressive parameter ρ. $\varepsilon_{it}$ and $e_{it}$ are spatially correlated in the cross section but uncorrelated over time. In particular, they are constructed as follows. For a given time period t, N i.i.d. N(0, 1) random variables are placed on a square grid. At each grid point, $\varepsilon_{it}$ is the weighted sum of the normal random variable at that grid point, the normal random variables one step away to the left, right, up or down with weight θ, and the normal random variables two steps away in the same directions with weight θ². Hence, $\varepsilon_{it}$ is a spatial MA(2) process with parameter θ, and the distance measure is the maximum coordinate-wise distance on the grid. $e_{it}$ is constructed in the same way. In all cases, θ = 0.5.
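The error design can be sketched as follows. The grid layout (an m × m square, so N = m²) follows the description above, but two details are assumptions of this sketch rather than statements from the text: the grid is padded with two rings of extra N(0, 1) draws so that edge points have a full neighborhood, and the helper names are hypothetical. Under this construction each $\varepsilon_{it}$ has variance 1 + 4θ² + 4θ⁴, which is 2.25 for θ = 0.5.

```python
import numpy as np


def spatial_ma2(T, m, theta, rng):
    """T x N spatial MA(2) shocks on an m x m grid (N = m * m).

    Each point gets its own N(0, 1) draw, plus theta times the draws one step
    left/right/up/down and theta**2 times the draws two steps away in those
    directions. Assumption: the grid is padded with two rings of extra draws.
    """
    z = rng.standard_normal((T, m + 4, m + 4))
    eps = z[:, 2:m + 2, 2:m + 2].copy()                     # own draw
    for step, w in ((1, theta), (2, theta ** 2)):
        eps += w * (z[:, 2 - step:m + 2 - step, 2:m + 2]    # up
                    + z[:, 2 + step:m + 2 + step, 2:m + 2]  # down
                    + z[:, 2:m + 2, 2 - step:m + 2 - step]  # left
                    + z[:, 2:m + 2, 2 + step:m + 2 + step]) # right
    return eps.reshape(T, m * m)


def ar1(eps, rho):
    """u_t = rho * u_{t-1} + eps_t with u_0 = 0, applied column by column (T x N)."""
    u = np.empty_like(eps)
    prev = np.zeros(eps.shape[1])
    for t in range(eps.shape[0]):
        prev = rho * prev + eps[t]
        u[t] = prev
    return u
```

For example, `u = ar1(spatial_ma2(T, 7, 0.5, np.random.default_rng(0)), rho)` would give the errors for N = 49, stored as a T × N array.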
Results are given for sample sizes T = 10, 50, 250 and N = 10, 50, 250 for AR(1) errors, and N = 9, 49, 256 for spatial MA(2) errors. The number of replications is 2,500 in all cases and the significance level is 5%. Results are reported for the Bartlett kernel. Fixed-effects OLS, as discussed in Section 2.2, is used to estimate the model. Results for testing the null hypothesis $H_0: \beta_3 = 0$ against the alternative $H_1: \beta_3 \ne 0$ are labeled $t_{DD}$. Results for testing the null hypothesis $H_0: \gamma = 0$ against the alternative $H_1: \gamma \ne 0$ are labeled $t_z$.
Tables D.9–D.11 report empirical null rejection probabilities for the $t_{clus}$ and $t_{DK}$ statistics in the individual fixed-effects model with no additional regressor $z_{it}$. Tables D.12–D.15 report empirical null rejection probabilities for $t_{clus}$ and $t_{DK}$ in the individual-specific trend model with no additional regressor $z_{it}$. Tables D.16–D.17 report empirical null rejection probabilities for $t_{DD}$ and $t_z$ when one additional regressor $z_{it}$ is included. Table D.18 compares the empirical null rejection probabilities for $t_{clus}$, $t_{double}$, $t^{r}_{double}$ and $t_{DK}$ in the individual fixed-effects model with no additional regressor $z_{it}$. Tables D.9, D.11, D.12 and D.14 consider AR(1) errors, while the other tables focus on spatial MA(2) errors. In Tables D.11, D.14 and D.15, a full set of time period dummies is included.
A small selection of bandwidths is considered: $b = 0.02, 0.06, 0.1, 0.4, 0.7, 1$. The autocorrelation parameter takes the values $\rho = 0, 0.3, 0.6, 0.9$. For $t_{DK}$, two sets of null rejection probabilities are reported. The first set uses the 5% $N(0,1)$ critical value. The second set uses the new fixed-b critical values (adjusted fixed-b critical values) obtained in subsection 2.3.3. For $t_{clus}$, $t_{double}$ and $t^{r}_{double}$, rejection probabilities are reported using the 5% $N(0,1)$ critical value.
There are several points worth noting. First, in Tables D.9 and D.11 the rejection probabilities for each combination of $N$, $T$, $\rho$ and $b$ are exactly the same. This pattern demonstrates the exact equivalence result shown in subsection 2.3.1. Similar patterns can be found in Tables D.12 and D.14 with AR(1) errors, and in Tables D.13 and D.15 with spatial MA(2) errors. These four tables suggest that the exact equivalence continues to hold in the individual-specific trend model with no additional regressors, regardless of the correlation structure of the error.
Next, similar patterns for $t_{DK}$ can be found in all tables. The patterns for $t_{DK}$ are quite different when the $N(0,1)$ critical value is used compared to when the adjusted fixed-b critical values are used. Using the $N(0,1)$ critical value, rejection probabilities tend to be much higher than 5%, and this over-rejection problem gets worse as $b$ increases or as $\rho$ increases. Only when $b$ is small, $T$ is large, and $\rho$ is close to zero are rejection probabilities close to 5%. In contrast, when the adjusted fixed-b critical values are used, the over-rejection problem is less severe. For a given $(N, T, \rho)$ combination, rejection probabilities are above 5% with small $b$ and steadily decline as $b$ increases. For a given value of $\rho$, as $T$ increases, rejection probabilities approach 5% for all bandwidths. When $T = 250$ and $b = 1$, rejection probabilities are around 8% or 9% when there is strong serial correlation ($\rho = 0.9$). In the presence of spatial correlation, rejection probabilities for $t_{clus}$ are substantially larger than 5%. This is expected, since the traditional clustered standard errors are not robust to spatial correlation in the cross section. For AR(1) errors in Tables D.9 and D.12, the traditional clustered standard errors behave well, and can outperform the DK standard errors when there is strong serial correlation and the bandwidth is small.
The patterns in the rejection probabilities of $t_{DK}$ are similar to those in Vogelsang (2012). As explained in Vogelsang (2012), the bias in $\hat{\bar{\Omega}}$ consists of two parts. One part depends on the strength of the serial correlation and rises as the serial correlation becomes stronger, which explains why the over-rejection problem gets worse as $\rho$ increases. This bias causes over-rejection with either the $N(0,1)$ critical value or the adjusted fixed-b critical values. However, this bias declines as $b$ increases. The other part is captured by the adjusted fixed-b approximations, but not by the $N(0,1)$ approximation. Therefore, over-rejection becomes less severe when fixed-b critical values are used. It is shown (see Vogelsang, 2008) that as $b$ increases, the bias in $\hat{\bar{\Omega}}$ initially decreases but then increases as $b$ increases further. Because of this, when $b$ is close to one, $\hat{\bar{\Omega}}$ has substantial downward bias and $t_{DK}$ tends to over-reject when the $N(0,1)$ critical value is used. Overall, the $N(0,1)$ approximation does not reflect the influence of the bandwidth, and thus using the $N(0,1)$ critical value may lead to severe size distortions. In contrast, the fixed-b approximations capture most of the bias in $\hat{\bar{\Omega}}$, and the part that they cannot capture decreases as $b$ increases. This explains why the rejection probability of $t_{DK}$ is lowest at $b = 1$ when adjusted fixed-b critical values are used.
Tables D.16 and D.17 report empirical null rejection probabilities in the individual fixed-effects model and the individual-specific trend model with one additional regressor $z_{it}$, respectively. For $t_{DD}$, the adjusted fixed-b critical values are used. For $t_z$, the usual fixed-b critical values in Kiefer and Vogelsang (2005) and Vogelsang (2012) are used. The usual fixed-b critical values are appropriate for $t_z$ because there is no structural break in $z_{it}$; these critical values are invariant to the choice of trend functions. The patterns of the rejection probabilities are consistent with the findings in Vogelsang (2012). The fixed-b approximation for $t_{DD}$ reflects the change in trend functions when a simple linear trend is included in the model.
Table D.18 reports the null rejection probabilities for the individual fixed-effects model with spatial MA(2) errors. Note that the correlation structure here is different from that used in chapter 1. The results illustrate that the DK standard errors with fixed-b approximations lead to much more accurate inference than the two-way clustered standard errors in the presence of this different form of cross-sectional correlation. The findings are similar to those in chapter 1. The original double clustering method performs reasonably well when $T$ is large and $\rho$ is small. The revised double clustering method performs better than the original one only when $\rho$ is large and the truncation lag is small, and its rejection probabilities increase as the truncation lag gets larger. The DK approach using fixed-b critical values outperforms the double clustering approach when the bandwidth is chosen appropriately.

2.5 Conclusion

This chapter derives a fixed-b asymptotic theory for test statistics in DD models with fixed effects and individual-specific trends in linear panel settings. The standard errors proposed by Driscoll and Kraay (1998), which are robust to heteroskedasticity, autocorrelation and spatial correlation of general form, are analyzed. This chapter establishes the conditions under which the DK standard errors lead to valid tests in linear DD models with fixed effects and individual-specific time trends in the fixed-$N$, large-$T$ case. It is shown that the fixed-b asymptotic distributions of tests on the DD estimator are different from the limits in Vogelsang (2012); for the individual fixed-effects model, they are identical to the limits in the pure time series model with a shift in mean. Tests on additional regressors without a structural break have the same fixed-b asymptotic distributions as in Vogelsang (2012).

The exact equivalence result holds when only individual dummies are included, when only time period dummies are included, and when both sets of dummies are included. As a result, whether time period dummies are included in the model does not affect the asymptotic distribution. It is also shown that the fixed-b asymptotic distributions of tests on the DD estimator depend on the individual-specific deterministic trends included and on the date of the policy change $\lambda$. New critical values are simulated for the individual fixed-effects model and the individual-specific trend model. For each bandwidth, the adjusted critical values show a U-shaped pattern in $\lambda$, and the tails get fatter at different rates for different trend functions. Simulation results illustrate that using fixed-b critical values leads to much more reliable inference in practice in the presence of spatial correlation.
In a more interesting case where the additional regressors also have a structural change, the fixed-b limits of test statistics on the $z_{it}$ parameters would change. We conjecture that the fixed-b asymptotic distributions in this case would be similar to the findings in the pure time series model with a structural break (see Cho, 2012).

CHAPTER 3
FINITE SAMPLE PERFORMANCES OF THE MOVING BLOCKS BOOTSTRAP FOR
LINEAR DIFFERENCE-IN-DIFFERENCES MODELS WITH INDIVIDUAL FIXED
EFFECTS

3.1 Introduction

This chapter studies the finite sample performance of the bootstrap procedure for linear Difference-in-Differences (DD) models with individual fixed effects. The bootstrap method consists of randomly resampling the original data many times and then using quantities computed from the simulated pseudo-data to make inference based on the original observed data. This chapter discusses bootstrap methods in the context of hypothesis testing. Bootstrap methods are widely used in empirical studies, especially when the distributions of test statistics are nonstandard and critical values are complicated to compute or difficult to derive theoretically. Moreover, it is not even necessary to know the asymptotic distribution when applying the bootstrap method.
What determines the reliability of the bootstrap is how well the bootstrap data generating process (DGP) mimics the features of the true DGP. The bootstrap was originally proposed by Efron (1979) for independent and identically distributed (i.i.d.) data. Later, the wild bootstrap was proposed by Wu (1986) to account for heteroskedasticity. Implementing bootstrap methods for dependent data is more complicated. Several bootstrap procedures have been proposed for time series data, including the moving blocks bootstrap (MBB) proposed by Kunsch (1989) and Liu and Singh (1992). More recently, the bootstrap has been applied to panel data models. Following the approach in Gonçalves (2011), the so-called “panel MBB” method is used in this chapter. This method applies the standard MBB to the time series of vectors containing all the individual observations at each time period. Since this method resamples only whole vectors at each time period, it preserves any cross-sectional correlation structure in the data. Therefore, the panel MBB allows for inference that is robust to heteroskedasticity, serial correlation and cross-sectional correlation of unknown form. We also use the naive bootstrap, in which the formula used to compute the standard errors on the resampled data is the same as the formula used on the original data.
The DD coefficient is of interest and the estimation method is the fixed-effects ordinary least squares (OLS) estimator. The main focus is on tests based on the DD estimator and the DK standard errors. In particular, we consider panels with many time periods, where the Driscoll and Kraay, 1998 (DK) standard errors are valid. The DD estimator has become increasingly popular in recent empirical research because it allows researchers to evaluate the causal effects of a policy change. Researchers are concerned with the reliability of inference based on the DD estimator, and there has been extensive research on robust inference for DD models. As pointed out in Bertrand, Duflo, and Mullainathan, 2004 (BDM), ignoring serial correlation leads to very unreliable inference. Wooldridge (2003) and other econometricians had already strongly recommended the use of clustered standard errors. Motivated by the results in BDM, empirical researchers now commonly use clustered standard errors. Alternatively, Bertrand et al. (2004) also suggested a block bootstrap method where each cluster is a block. Taking state-level data as an example, this method first stacks the residuals for each state into vectors and then, for each state, draws with replacement a new residual vector from this distribution, leaving the residuals within each state unchanged. The bootstrap method is straightforward and easy to implement. However, both methods lead to biased inference when the number of clusters is small. Building on the work of BDM, Cameron, Gelbach, and Miller, 2008 (CGM) proposed a wild bootstrap-based procedure. Following CGM, applied researchers use the wild cluster bootstrap to obtain improved inference. It is usually assumed that the data are independent in the cross-section dimension, or independent across clusters, but correlated in the time dimension. This chapter explores improved inference that is robust to cross-sectional correlation of more general form.
In linear panel models with individual fixed effects, a recent paper by Gonçalves (2011) provides both theoretical and simulation evidence indicating that the panel MBB, including the i.i.d. bootstrap, outperforms the standard normal approximation and closely mimics the fixed-b approximation proposed in Vogelsang (2012) when a standard nonparametric heteroskedasticity and autocorrelation consistent (HAC) variance estimator is used to compute test statistics. Gonçalves and Vogelsang (2011) found similar results in pure time series models. Following the approach of Kiefer and Vogelsang (2005) and Vogelsang (2012), in chapter 2 we derived the asymptotic distributions of test statistics based on the DD estimator and the DK standard errors, assuming that the bandwidth is a fixed proportion of the sample size in the time dimension. This new fixed-b limiting distribution is different from the one in Vogelsang (2012). Therefore, the first-order asymptotic validity of the panel MBB needs to be examined in linear DD models.
The main goal of this chapter is to analyze the finite sample properties of the panel MBB in linear DD models with individual fixed effects using Monte Carlo simulations. Simulation results show that the panel MBB performs very well, even when there is strong serial correlation. The bootstrap is much more accurate than the standard normal approximation, and it closely follows the new fixed-b approximation proposed in chapter 2. This improvement is documented for the special case of the Bartlett kernel; results are expected to be similar for other kernels. The improvement holds even when the i.i.d. bootstrap is used, despite potential serial correlation in the data. Simulation results also show that if the block length is appropriately chosen, the panel MBB can outperform the fixed-b approximation when there is strong serial correlation. Theoretical results are not provided in this chapter, but they can be derived directly following Gonçalves (2011).
The remainder of this chapter is organized as follows. In the next section we describe the model
and test statistics. We also review the ﬁxed-b asymptotic approximation. Section 3.3 describes
the bootstrap method. Section 3.4 reports simulation results which compare the standard normal
approximation, the ﬁxed-b approximation and the bootstrap. Section 3.5 concludes. Appendix E
contains all ﬁgures.


3.2 The Difference-in-Differences Model

3.2.1 The Model and DD Estimator

Consider a DD model with individual fixed effects given by

$$y_{it} = c_i + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i \cdot DU_t + z_{it}'\gamma + u_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T, \qquad (3.1)$$

where $y_{it}$ and $u_{it}$ are scalars and $c_i$ denotes the unobserved individual heterogeneity. $Treat_i$ is an indicator for individuals in the treatment group, equal to one if individual $i$ is in the treatment group. Without loss of generality, we assume that the first $kN$ individuals are in the treatment group, so that $Treat_i = 1(i \le kN)$. $DU_t$ is an indicator for post-policy-change time periods, equal to one after the policy change. That is, $DU_t = 1(t > \lambda T) = 1(r > \lambda)$ with $r = t/T$, where the parameter $\lambda$ is the relative date of the policy change within the time sample. Both $k$ and $\lambda$ are assumed known. $z_{it}$ is a $K \times 1$ vector of additional regressors.
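The parameter of interest is $\beta_3$, which measures the impact of the policy change on $y$. The estimation method is the fixed-effects ordinary least squares (OLS) estimator, or the DD estimator

$$\hat{\beta} = \Big(\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)'\Big)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i), \qquad (3.2)$$

where

$$\beta = \begin{pmatrix}\beta_2\\ \beta_3\\ \gamma\end{pmatrix}, \quad x_{it} = \begin{pmatrix}DU_t\\ Treat_i \cdot DU_t\\ z_{it}\end{pmatrix}, \quad \bar{y}_i = T^{-1}\sum_{t=1}^{T}y_{it}, \quad \bar{x}_i = T^{-1}\sum_{t=1}^{T}x_{it}.$$

Because $Treat_i$ does not vary over time, $\beta_1$ is absorbed by the individual effects and is not separately identified; hence it does not appear in $\beta$.

3.2.2 The DK Standard Errors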

Driscoll and Kraay (1998) first proposed an HAC-type robust variance estimator based on the time series of cross-sectional sums of the individual observations at each time period. The idea is to first aggregate all the individual observations at each time period and then apply the HAC estimator to the resulting time series of sums. The first step accounts for potential cross-sectional correlation in the data, and the second step accounts for potential serial correlation. Therefore, the DK standard errors are robust to cross-sectional correlation of unknown form as well as heteroskedasticity and serial correlation, assuming covariance stationarity and weak dependence in the time dimension.
Let $v_{it} = \tilde{x}_{it}u_{it}$ and define $\hat{v}_{it} = \tilde{x}_{it}\hat{u}_{it}$, where $\tilde{x}_{it} = x_{it} - \bar{x}_i$, $\tilde{y}_{it} = y_{it} - \bar{y}_i$, and $\hat{u}_{it}$ are the OLS residuals given by $\hat{u}_{it} = \tilde{y}_{it} - \tilde{x}_{it}'\hat{\beta}$. Define $\hat{\bar{v}}_t = \sum_{i=1}^{N}\hat{v}_{it}$, and let $\hat{\bar{\Gamma}}_j = T^{-1}\sum_{t=j+1}^{T}\hat{\bar{v}}_t\hat{\bar{v}}_{t-j}'$. Let $\Omega = \lim_{T\to\infty} Var\big(T^{-1/2}\sum_{t=1}^{T}\sum_{i=1}^{N}v_{it}\big)$. Following the approach of Driscoll and Kraay (1998), $\Omega$ is estimated with the nonparametric kernel HAC estimator

$$\hat{\bar{\Omega}} = \hat{\bar{\Gamma}}_0 + \sum_{j=1}^{T-1}k\Big(\frac{j}{M}\Big)\big(\hat{\bar{\Gamma}}_j + \hat{\bar{\Gamma}}_j'\big),$$

where $k(x)$ is a kernel function such that $k(x) = k(-x)$, $k(0) = 1$, $|k(x)| \le 1$, $k(x)$ is continuous at $x = 0$, and $\int_{-\infty}^{\infty}k^2(x)\,dx < \infty$. $M$ is the bandwidth parameter. When $\hat{\bar{\Omega}}$ is used as the middle term of the sandwich form of the covariance matrix, we obtain the robust covariance matrix estimator proposed by Driscoll and Kraay (1998)

$$\hat{V} = T\Big(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}\tilde{x}_{it}'\Big)^{-1}\hat{\bar{\Omega}}\Big(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}\tilde{x}_{it}'\Big)^{-1}.$$
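The two-step construction of the DK estimator, together with the DD estimator (3.2), can be sketched in NumPy. This is a minimal sketch assuming the Bartlett kernel and a bandwidth $M = bT$; arrays are indexed (individual, time, regressor), and the function names `bartlett` and `fe_ols_dk` are hypothetical.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1 and 0 otherwise."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def fe_ols_dk(y, X, b):
    """Fixed-effects OLS (3.2) with Driscoll-Kraay variance, Bartlett kernel, M = b*T.

    y: (N, T) outcomes; X: (N, T, p) regressors. Time-invariant regressors
    such as Treat_i must be left out: the within transformation removes them.
    """
    N, T, p = X.shape
    Xd = X - X.mean(axis=1, keepdims=True)            # x~_it = x_it - xbar_i
    yd = y - y.mean(axis=1, keepdims=True)            # y~_it = y_it - ybar_i
    Xf = Xd.reshape(N * T, p)
    A = Xf.T @ Xf                                     # sum_i sum_t x~_it x~_it'
    beta = np.linalg.solve(A, Xf.T @ yd.reshape(N * T))
    u = yd - Xd @ beta                                # OLS residuals u^_it, (N, T)
    vbar = (Xd * u[:, :, None]).sum(axis=0)           # step 1: vbar^_t = sum_i x~_it u^_it, (T, p)
    M = b * T
    Omega = vbar.T @ vbar / T                         # step 2 starts from Gamma^_0
    for j in range(1, T):
        w = bartlett(j / M)
        if w == 0.0:                                  # Bartlett weights vanish for j >= M
            break
        G = vbar[j:].T @ vbar[:-j] / T                # Gamma^_j = T^-1 sum_{t>j} vbar_t vbar_{t-j}'
        Omega += w * (G + G.T)
    A_inv = np.linalg.inv(A)
    V = T * A_inv @ Omega @ A_inv                     # DK covariance matrix V^
    return beta, V
```

Since $\hat{\beta}$ is ordered as $(\hat{\beta}_2, \hat{\beta}_3, \hat{\gamma}')'$, the DK standard error of $\hat{\beta}_3$ is `np.sqrt(V[1, 1])`. Adding any $c_i$ to $y_{it}$ leaves both outputs unchanged, which reflects the invariance of the fixed-effects estimator to $c_i$.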

3.2.3 Test Statistics and Asymptotic Distributions

Consider testing linear hypotheses about $\beta$ of the form

$$H_0: R\beta = r,$$

where $R$ is a $q \times (K+2)$ matrix of known constants of full rank $q \le K+2$, and $r$ is a $q \times 1$ vector of known constants. In the case where $q = 1$, we can define the t-statistic

$$t = \frac{R\hat{\beta} - r}{\sqrt{R\hat{V}R'}}.$$

The main focus is on the asymptotic behavior of t-statistics based on the DD estimator. For comparison purposes, t-statistics based on $\hat{\gamma}$ are also considered in models with additional regressors. The traditional asymptotic approach relies on $\hat{\bar{\Omega}}$ being a consistent estimator of $\Omega$. Consistency of $\hat{\bar{\Omega}}$ requires that $M \to \infty$ as $T \to \infty$, but at a slower rate, so that $M/T \to 0$. Under the traditional approach, the t-statistic has a limiting standard normal distribution.
An alternative asymptotic theory was proposed by Kiefer and Vogelsang (2005), who model the bandwidth as a fixed proportion of the sample size. That is, $M = bT$ with $b$ a fixed constant in $(0, 1]$. Because $b$ is held fixed, this approach is usually labeled fixed-b asymptotics, while the traditional approach is labeled small-b asymptotics. Under the fixed-b approach, $\hat{\bar{\Omega}}$ converges to a random matrix rather than a constant. In Vogelsang (2012), the random matrix depends on the kernel function and the bandwidth. In chapter 2, the random matrix also depends on the date of the policy change, $\lambda$, in DD models. As a result, the t-statistic has a nonstandard limiting distribution. This limiting distribution reflects the date of the policy change and the choice of kernel and bandwidth, but is otherwise pivotal. Fixed-b asymptotics provide more accurate and reliable inference than small-b asymptotics because they reflect the influence of the kernel and bandwidth choices. For a given date of the policy change, kernel function and bandwidth, fixed-b critical values can be simulated.
In linear DD models with individual fixed effects, we have shown in chapter 2 that

$$t \Rightarrow \frac{N^F(W_1^{**})}{\sqrt{P^F(b, \lambda, Q_1^{F**})}},$$

where $\Rightarrow$ denotes weak convergence, $W_1^{**}$ is the standard Wiener process, and $P^F(b, \lambda, Q_1^{F**})$ is the random matrix that depends on the date of the policy change $\lambda$, the kernel function and the bandwidth.

In the special case of the Bartlett kernel, $k(x) = 1 - |x|$ for $|x| \le 1$ and $k(x) = 0$ for $|x| \ge 1$, we have

$$H^F(r, \lambda) = 1(r > \lambda) - (1 - \lambda),$$

$$N^F(W) = \lambda W(1) - W(\lambda) = (\lambda - 1)\tilde{W}\Big(\frac{\lambda}{\lambda - 1}\Big),$$

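$$Q_1^{F**} = Q^F(r, \lambda, W_1^{**}) = \int_0^r H^F(s, \lambda)\,dW_1^{**}(s) - W_1^{**}(1)\int_0^r H^F(s, \lambda)\,ds - \int_0^r H^F(s, \lambda)^2\,ds\Big(\int_0^1 H^F(s, \lambda)^2\,ds\Big)^{-1}N^F(W_1^{**}),$$

$$P^F(b, \lambda, Q^F) = \frac{2}{b}\int_0^1 Q^F(r, \lambda, W)Q^F(r, \lambda, W)'\,dr - \frac{1}{b}\int_0^{1-b}\big[Q^F(r, \lambda, W)Q^F(r+b, \lambda, W)' + Q^F(r+b, \lambda, W)Q^F(r, \lambda, W)'\big]\,dr.$$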
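Because $P^F$ has no closed form, fixed-b critical values are obtained by simulation. The sketch below approximates the scalar Bartlett-kernel limit on a grid of `n_grid` points, treating $W_1^{**}$ as a standard Wiener process as stated above. It assumes $\lambda \cdot$ `n_grid` is an integer so that the grid contains $\lambda$; the function name and defaults are hypothetical, and this illustrates the formulas rather than reproducing the code behind the chapter 2 critical values.

```python
import numpy as np

def fixed_b_t_draws(b, lam, n_grid=500, n_reps=2000, seed=0):
    """Simulated draws from N^F(W) / sqrt(P^F(b, lam, Q^F)) for the Bartlett kernel."""
    if abs(lam * n_grid - round(lam * n_grid)) > 1e-9:
        raise ValueError("lam * n_grid must be an integer")
    rng = np.random.default_rng(seed)
    s = np.arange(1, n_grid + 1) / n_grid                 # grid points r = 1/n, ..., 1
    H = (s > lam).astype(float) - (1.0 - lam)             # H^F(s, lam)
    int_H = np.cumsum(H) / n_grid                         # int_0^r H ds
    int_H2 = np.cumsum(H ** 2) / n_grid                   # int_0^r H^2 ds
    dW = rng.standard_normal((n_reps, n_grid)) / np.sqrt(n_grid)   # Wiener increments
    W1 = dW.sum(axis=1)                                   # W(1)
    N = (H * dW).sum(axis=1)                              # int_0^1 H dW = lam W(1) - W(lam)
    Q = (np.cumsum(H * dW, axis=1)                        # int_0^r H dW
         - W1[:, None] * int_H
         - (int_H2 / int_H2[-1]) * N[:, None])            # Q^F(r, lam, W); Q at r = 1 is 0
    m = max(1, int(round(b * n_grid)))                    # lag b on the grid
    # P = (2/b) int Q^2 dr - (2/b) int_0^{1-b} Q(r) Q(r+b) dr in the scalar case
    P = (2.0 / m) * ((Q ** 2).sum(axis=1)
                     - (Q[:, :n_grid - m] * Q[:, m:]).sum(axis=1))
    return N / np.sqrt(P)
```

For example, `np.quantile(fixed_b_t_draws(0.5, 0.5), 0.975)` approximates the 97.5% fixed-b critical value for $b = \lambda = 0.5$; larger `n_grid` and `n_reps` reduce discretization and simulation error.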

3.3 Bootstrap Methods

Another alternative to asymptotic approximations is the bootstrap. To obtain inference that is robust to heteroskedasticity, autocorrelation and cross-sectional correlation, we follow the panel MBB approach proposed by Gonçalves (2011). Motivated by the idea of Driscoll and Kraay (1998), Gonçalves (2011) proposed the panel MBB as an extension of the standard MBB to linear panel models. The panel MBB first stacks all the individual observations at each time period into vectors and then applies the standard MBB to the time series of these vectors. Gonçalves (2011) proved that this method is robust to heteroskedasticity, serial correlation and cross-sectional correlation of unknown form when the fixed-effects OLS estimator is used, under the assumption that $N$ is an arbitrary nondecreasing function of $T$ and $T \to \infty$. Weak dependence in the time dimension is required for the MBB to be valid, but the dependence in the cross-section dimension may be either weak or strong.
Define the bootstrap fixed-effects OLS estimator $\hat{\beta}^*$ as

$$\hat{\beta}^* = \Big(\sum_{i=1}^{N}\sum_{t=1}^{T}(x^*_{it} - \bar{x}^*_i)(x^*_{it} - \bar{x}^*_i)'\Big)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(x^*_{it} - \bar{x}^*_i)(y^*_{it} - \bar{y}^*_i), \qquad (3.3)$$

where $\bar{y}^*_i = T^{-1}\sum_{t=1}^{T}y^*_{it}$ and $\bar{x}^*_i = T^{-1}\sum_{t=1}^{T}x^*_{it}$. Note that (3.3) is calculated using the bootstrap data $(y^*_{it}, x^*_{it})$. The method used to construct the pseudo-data with the panel MBB is described below.
The first step is to run the pooled OLS regression to obtain the fixed-effects OLS estimator $\hat{\beta}$ and the residuals $\hat{u}_{it}$. Define the $(K+1) \times 1$ vector $\omega_{it} = (z_{it}', \hat{u}_{it})'$, which collects the additional regressors and the OLS residual for each observation in model (3.1). Let $\omega_t = (\omega_{1t}', \omega_{2t}', \ldots, \omega_{Nt}')'$ denote the $N(K+1) \times 1$ vector containing the $N$ cross-sectional observations at time period $t$. Let $l \in \mathbb{N}$ ($1 \le l < T$) be the block length, and let $B_{t,l} = \{\omega_t, \omega_{t+1}, \ldots, \omega_{t+l-1}\}$ be the block of $l$ consecutive observations starting at $\omega_t$. For simplicity, assume $T = hl$. Note that $l = 1$ corresponds to the standard i.i.d. bootstrap. The MBB randomly draws $h = T/l$ blocks with replacement from the set of overlapping blocks $\{B_{1,l}, B_{2,l}, \ldots, B_{T-l+1,l}\}$. Thus the pseudo-data $\omega^*_t$ take the form

$$\omega^*_1 = \omega_{I_1+1},\ \omega^*_2 = \omega_{I_1+2},\ \ldots,\ \omega^*_l = \omega_{I_1+l},$$
$$\omega^*_{l+1} = \omega_{I_2+1},\ \ldots,\ \omega^*_{2l} = \omega_{I_2+l},$$
$$\vdots$$
$$\omega^*_{(h-1)l+1} = \omega_{I_h+1},\ \ldots,\ \omega^*_{hl} = \omega_{I_h+l},$$

where the indices $I_1, I_2, \ldots, I_h$ are i.i.d. random variables distributed uniformly on $\{0, 1, \ldots, T-l\}$. Let $x^*_{it} = (DU_t, Treat_i \cdot DU_t, z^{*\prime}_{it})'$. The pseudo-values $y^*_{it}$ are given by

$$y^*_{it} = x^{*\prime}_{it}\hat{\beta} + \hat{u}^*_{it}. \qquad (3.4)$$
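The resampling step and the construction of (3.4) can be sketched as follows. This minimal sketch assumes the data are stored as arrays indexed (time, individual[, regressor]), so that resampling a time index moves an entire cross-section $\omega_t$ at once; the function names are hypothetical, and indices are 0-based, so a block drawn at $I_j$ covers original positions $I_j, \ldots, I_j + l - 1$.

```python
import numpy as np

def mbb_time_indices(T, l, rng):
    """Time indices of one panel-MBB resample (0-based), assuming T = h * l."""
    if not 1 <= l < T or T % l != 0:
        raise ValueError("need 1 <= l < T and T divisible by l")
    h = T // l
    starts = rng.integers(0, T - l + 1, size=h)       # I_1, ..., I_h uniform on {0, ..., T-l}
    # omega*_{(j-1)l+s} = omega_{I_j+s}, s = 1..l  ->  0-based positions I_j + (0, ..., l-1)
    return (starts[:, None] + np.arange(l)[None, :]).ravel()

def panel_mbb_pseudo_data(z, u_hat, DU, treat, beta_hat, l, rng):
    """Build (y*, z*) from (3.4); DU_t and Treat_i * DU_t are not resampled.

    z: (T, N, K) additional regressors, u_hat: (T, N) residuals,
    DU: (T,), treat: (N,), beta_hat = (beta2, beta3, gamma_1, ..., gamma_K).
    """
    T = u_hat.shape[0]
    idx = mbb_time_indices(T, l, rng)
    z_star, u_star = z[idx], u_hat[idx]                # whole cross-sections move together
    b2, b3, gamma = beta_hat[0], beta_hat[1], np.asarray(beta_hat[2:])
    y_star = (b2 * DU[:, None] + b3 * DU[:, None] * treat[None, :]
              + z_star @ gamma + u_star)
    return y_star, z_star
```

The fixed-effects regression (3.3) and the DK standard errors are then recomputed on $(y^*_{it}, x^*_{it})$ exactly as on the original data, which is what makes this the naive bootstrap.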

It is worth noting that this bootstrap data generating process (DGP) differs slightly from that in Gonçalves (2011). Gonçalves (2011) uses the pairs bootstrap, where the bootstrap data $(y^*_{it}, x^*_{it})$ are drawn directly from the original data $(y_{it}, x_{it})$ without a first-step regression to obtain the OLS residuals. The pairs bootstrap does not work in DD models because it may mix the pre- and post-policy-change values and thus lead to a biased estimator $\hat{\beta}^*$.
One might instead apply the pairs bootstrap within the pre- and post-policy-change subgroups. However, if testing the additional regressors is of interest, this method gives biased estimators for the additional regressors. Therefore, a combination of the residual bootstrap and the pairs bootstrap is used in this chapter. Since $DU_t$ and $Treat_i \cdot DU_t$ are indicators, they are not resampled in the bootstrap procedure. Only the pairs of additional regressors and residuals are resampled, and new pseudo-values of the dependent variable are computed using (3.4).
For example, consider a simple time series model with one random regressor $z$:

$$y_t = \mu + \beta z_t + u_t.$$

We have

$$y_t = \hat{\mu} + \hat{\beta}z_t + \hat{u}_t, \qquad (3.5)$$

where $\hat{\mu}$ and $\hat{\beta}$ are the OLS estimators and $\hat{u}_t$ is the OLS residual. Equation (3.5) holds for all $(y_t, z_t)$. For each bootstrap sample $(y^*_t, z^*_t)$,

$$y^*_t = \hat{\mu} + \hat{\beta}z^*_t + \hat{u}^*_t \qquad (3.6)$$

is always true. Equation (3.5) is the “population model” for the bootstrap sample, and $\hat{\mu}$ and $\hat{\beta}$ are the “population coefficients”. As usual in the bootstrap literature, let $E^*$ denote the expected value induced by the bootstrap resampling, conditional on a realization of the original time series. We have

$$E^*(\hat{u}^*_t) = \frac{1}{T}\sum_{t=1}^{T}\hat{u}_t = 0,$$

because $\hat{u}^*_t$ is uniformly distributed on $\{\hat{u}_1, \ldots, \hat{u}_T\}$ conditional on the original sample. The second equality holds because of the normal equations of the OLS estimator. Similarly, we have

$$E^*(z^*_t\hat{u}^*_t) = \frac{1}{T}\sum_{t=1}^{T}z_t\hat{u}_t = 0.$$

These two conditions ensure that the OLS estimators $\hat{\mu}^*$ and $\hat{\beta}^*$ consistently estimate $\hat{\mu}$ and $\hat{\beta}$, respectively, which explains intuitively why the bootstrap works. If we resample $(y_t, z_t)$ within the pre/post policy change subgroups, the expected value $E^*(z^*_t\hat{u}^*_t)$ becomes

$$E^*(z^*_t\hat{u}^*_t) = \frac{1}{\lambda T}\sum_{t=1}^{\lambda T}z_t\hat{u}_t + \frac{1}{(1-\lambda)T}\sum_{t=1}^{(1-\lambda)T}z_t\hat{u}_t \neq 0.$$

This method causes $z^*_t$ to be correlated with $\hat{u}^*_t$ and thus leads to a biased OLS estimator.

Next, consider model (3.1). Without loss of generality, we can set $c_i = 0$ and $\beta_1 = 0$. We have

$$y_{it} = \hat{\beta}_2 DU_t + \hat{\beta}_3 Treat_i \cdot DU_t + z_{it}'\hat{\gamma} + \hat{u}_{it}.$$

If we directly draw $(y_{it}, z_{it})$ from the original data, pre- and post-policy-change values may be mixed in the bootstrap sample. For example, suppose an original post-policy-change pair $(y_{is}, z_{is})$ appears as a pre-policy-change pair in the bootstrap data. Then in the original data we have $y_{is} = \hat{\beta}_2 + \hat{\beta}_3 Treat_i + z_{is}'\hat{\gamma} + \hat{u}_{is}$, while in the bootstrap data $y_{is} = z_{is}'\hat{\gamma} + \hat{u}^*_{is}$. Thus $\hat{u}^*_{is}$ is no longer the original OLS residual $\hat{u}_{is}$ associated with $(y_{is}, z_{is})$. This causes $z^*_{it}$ to be correlated with $\hat{u}^*_{it}$ and thus leads to a biased OLS estimator. Therefore, we have to resample $(z_{it}, \hat{u}_{it})$ and reconstruct $y_{it}$ using (3.4). In (3.4), we have

$$E^*(z^*_{it}\hat{u}^*_{it}) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}z_{it}\hat{u}_{it} = 0.$$

The OLS estimator based on (3.4) therefore consistently estimates $\hat{\beta}$.
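The last moment condition can be checked numerically. Two facts combine here: the fixed-effects residuals sum to zero over $t$ for every $i$, and $\sum_i\sum_t \tilde{z}_{it}\hat{u}_{it} = 0$ by the normal equations, so $\sum_i\sum_t z_{it}\hat{u}_{it} = \sum_i\sum_t \tilde{z}_{it}\hat{u}_{it} + \sum_i \bar{z}_i\sum_t \hat{u}_{it} = 0$. In this minimal sketch the simulated data and the function name `fe_residual_moments` are illustrative assumptions.

```python
import numpy as np

def fe_residual_moments(y, X, z_col):
    """Return max_i |sum_t u_hat_it| and (NT)^-1 sum_i sum_t z_it u_hat_it."""
    N, T, p = X.shape
    Xd = X - X.mean(axis=1, keepdims=True)          # within transformation
    yd = y - y.mean(axis=1, keepdims=True)
    beta, *_ = np.linalg.lstsq(Xd.reshape(N * T, p), yd.ravel(), rcond=None)
    u_hat = yd - Xd @ beta                          # fixed-effects OLS residuals
    z = X[:, :, z_col]                              # undemeaned z_it
    return np.abs(u_hat.sum(axis=1)).max(), np.mean(z * u_hat)

rng = np.random.default_rng(7)
N, T = 20, 40
DU = (np.arange(1, T + 1) > T // 2).astype(float)
treat = (np.arange(1, N + 1) <= N // 2).astype(float)
z = rng.standard_normal((N, T)) + 3.0               # nonzero mean, so z_it differs from z~_it
X = np.stack([np.tile(DU, (N, 1)), treat[:, None] * DU[None, :], z], axis=2)
y = rng.standard_normal((N, T)) + rng.standard_normal(N)[:, None]
row_sum, moment = fe_residual_moments(y, X, z_col=2)
print(row_sum, moment)   # both are zero up to floating-point rounding
```

Because $(z^*_{it}, \hat{u}^*_{it})$ is drawn uniformly from the original pairs, this sample mean is exactly $E^*(z^*_{it}\hat{u}^*_{it})$.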
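Given a bootstrap sample $(y^*_{it}, x^*_{it})$, let

$$\tilde{x}^*_{it} = x^*_{it} - \bar{x}^*_i, \quad \hat{v}^*_{it} = \tilde{x}^*_{it}\hat{u}^*_{it}, \quad \hat{\bar{v}}^*_t = \sum_{i=1}^{N}\hat{v}^*_{it}, \quad \hat{\bar{\Gamma}}^*_j = T^{-1}\sum_{t=j+1}^{T}\hat{\bar{v}}^*_t\hat{\bar{v}}^{*\prime}_{t-j},$$

$$\hat{\bar{\Omega}}^* = \hat{\bar{\Gamma}}^*_0 + \sum_{j=1}^{T-1}k\Big(\frac{j}{M}\Big)\big(\hat{\bar{\Gamma}}^*_j + \hat{\bar{\Gamma}}^{*\prime}_j\big),$$

$$\hat{V}^* = T\Big(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}^*_{it}\tilde{x}^{*\prime}_{it}\Big)^{-1}\hat{\bar{\Omega}}^*\Big(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}^*_{it}\tilde{x}^{*\prime}_{it}\Big)^{-1}.$$

The naive bootstrap t-statistic $t^*$ can be defined as

$$t^* = \frac{R\hat{\beta}^* - r^*}{\sqrt{R\hat{V}^*R'}},$$

where $r^* = R\hat{\beta}$.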
To obtain the bootstrap critical value $t^*_c$ for a test with significance level $\alpha$, we generate $B$ bootstrap samples indexed by $j$ and compute $t^*_j$ for each. We sort the $t^*_j$ from smallest to largest and then calculate $t^*_c = t^*_{[\alpha(B+1)]}$, where $[\alpha(B+1)]$ is the integer part of $\alpha(B+1)$.
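The order-statistic rule can be written in a few lines. This sketch implements $t^*_c = t^*_{[\alpha(B+1)]}$ literally, with 1-based order statistics; the function name is hypothetical. For instance, with $B = 499$ and $\alpha = 0.975$ (the 97.5th percentile used in section 3.4), $[\alpha(B+1)] = [487.5] = 487$, so the 487th smallest $t^*_j$ is returned.

```python
import math
import numpy as np

def bootstrap_critical_value(t_star, alpha):
    """Return t*_([alpha(B+1)]), the [alpha(B+1)]-th smallest bootstrap t-statistic."""
    t_sorted = np.sort(np.asarray(t_star, dtype=float))   # ascending order statistics
    B = t_sorted.size
    pos = math.floor(alpha * (B + 1))                     # integer part of alpha(B+1), 1-based
    if not 1 <= pos <= B:
        raise ValueError("alpha * (B + 1) must lie in [1, B + 1)")
    return t_sorted[pos - 1]
```

Choosing $B$ so that $\alpha(B+1)$ is close to an integer, as with $B = 499$, keeps the truncation in $[\alpha(B+1)]$ small.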

3.4 Finite Sample Performances

This section compares the finite sample performance of the standard normal asymptotic approximation, the fixed-b asymptotic approximation and the naive panel MBB using Monte Carlo simulations. We first present results for the simplest DD model without additional regressors, and then add one additional regressor to the model and report the results. The patterns found in Gonçalves (2011) and Gonçalves and Vogelsang (2011) hold in the simplest DD model, and they continue to hold after one additional regressor is added to the model.
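The DGP used for the simulations is very similar to the one used in Vogelsang (2012). The model is

$$y_{it} = c_i + \beta_1 Treat_i + \beta_2 DU_t + \beta_3 Treat_i \cdot DU_t + z_{it}'\gamma + u_{it}, \qquad (3.7)$$

where

$$u_{it} = \rho u_{i,t-1} + \varepsilon_{it}, \quad u_{i0} = 0, \quad \varepsilon_{it} \sim N(0,1), \quad cov(\varepsilon_{it}, \varepsilon_{js}) = 0 \text{ for } t \neq s;$$

$$z_{it} = \rho z_{i,t-1} + e_{it}, \quad z_{i0} = 0, \quad e_{it} \sim N(0,1), \quad cov(e_{it}, e_{js}) = 0 \text{ for } t \neq s.$$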

$c_i$ is the unobserved individual fixed effect. Only one additional regressor $z_{it}$ is included, and it is uncorrelated with $u_{it}$. $z_{it}$ and $u_{it}$ are modeled as AR(1) processes with the same autoregressive parameter. $\varepsilon_{it}$ and $e_{it}$ are spatially correlated in the cross-section dimension but uncorrelated over time. They are constructed as follows. For a given time period $t$, $N$ i.i.d. standard normal random variables are placed on a square grid. At each grid point, $\varepsilon_{it}$ is the weighted sum of the normal random variable at that grid point, the normal random variables one step away to the left, right, up or down with weight $\theta$, and the normal random variables two steps away in the same directions with weight $\theta^2$. Hence, $\varepsilon_{it}$ is a spatial MA(2) process with parameter $\theta$, where distance is measured as the maximum coordinate-wise distance on the grid. $e_{it}$ is constructed in the same way.
We consider testing the null hypothesis $H_0: \beta_3 = 0$ against the alternative $H_1: \beta_3 \neq 0$ at the 5% significance level using the t-statistic

$$t_{DD} = \frac{\hat{\beta}_3}{se(\hat{\beta}_3)},$$

where $se(\hat{\beta}_3)$ is the DK standard error estimate. In the cases where the additional regressor $z_{it}$ is included, we also consider testing the null hypothesis $H_0: \gamma = 0$ against the alternative $H_1: \gamma \neq 0$ at the 5% significance level using the t-statistic

$$t_z = \frac{\hat{\gamma}}{se(\hat{\gamma})},$$

where $se(\hat{\gamma})$ is the DK standard error estimate.
In all cases, $\beta_1$, $\beta_2$, $\beta_3$ and $\gamma$ are set to zero. We also set $c_i = 0$, $\theta = 0.5$, $k = 0.5$ and $\lambda = 0.5$ unless otherwise specified. Setting $c_i = 0$ is without loss of generality because the fixed-effects OLS estimator is exactly invariant to $c_i$. Results are reported for sample sizes $T = 50, 250$ and $N = 50, 250$ when there is no cross-sectional correlation, and $T = 50, 250$ and $N = 49, 256$ when there is spatial correlation. In the simulations, 1,000 random samples are generated for each pair $(N, T)$. We consider three values of the AR parameter, $\rho = 0.0, 0.3, 0.9$, and four values of the bandwidth, $b = 0.02, 0.1, 0.5, 0.7$. Only the Bartlett kernel is considered. We reject the null hypothesis whenever $|t_{DD}| > t_{c1}$ (test on $\beta_3$) or $|t_z| > t_{c2}$ (test on $\gamma$), where $t_{c1}$ and $t_{c2}$ are critical values. In particular, $t_{c1} = t_{c2} = 1.96$ is used for the standard normal asymptotic approximation. For the fixed-b asymptotic approximation, $t_{c1}$ is the 97.5th percentile of the fixed-b asymptotic distribution derived in chapter 2, while $t_{c2}$ is the 97.5th percentile of the fixed-b asymptotic distribution derived by Kiefer and Vogelsang (2005). For the naive panel MBB, both $t_{c1}$ and $t_{c2}$ are the 97.5th bootstrap percentiles of the corresponding bootstrap t-statistics. For each sample, the bootstrap tests are based on 499 replications. In most cases, we consider the block length $l = 1$, i.e. the i.i.d. bootstrap. Results for the block length $l = 25$ when $T = 250$ are reported in the case of spatial correlation.
All results are shown in figures (see Appendix E). Figures E.1 and E.2 show the empirical null rejection probabilities as a function of $\lambda$ when there is no cross-sectional correlation, $N = 100$, $T = 250$, $\rho = 0.3$, and $b = 0.02$ and $0.5$, respectively. We consider five values of $\lambda$: 0.1, 0.3, 0.5, 0.7 and 0.9. The standard i.i.d. bootstrap is used. In both figures, the standard normal asymptotic approximation leads to over-rejection. The empirical null rejection probabilities using the standard normal approximation show a U-shape with the bottom at $\lambda = 0.5$, and the over-rejection problem gets worse as $\lambda$ approaches either 0 or 1. In contrast, the naive panel MBB is considerably more accurate than the standard normal approximation, and the larger the bandwidth $b$, the bigger the improvement. In fact, the bootstrap closely follows the fixed-b asymptotic approximation and thus reflects the date of the policy change $\lambda$. The bootstrap rejection probabilities do not vary much across values of $\lambda$.
Figures E.3–E.20 each contain two columns, and each column contains three graphs corresponding to the three values of ρ. Every sub-figure shows the empirical null rejection probabilities
as a function of the bandwidth b with λ = 0.5. Figures E.3–E.12 present results for the simplest
DD model without the additional regressor. Figures E.3, E.5, E.7 and E.9 present results for models without cross-sectional correlation, while Figures E.4, E.6, E.8 and E.10–E.12 present
results for models with spatial MA(2) correlation.
Figures E.3 and E.5 focus on N = 50 and N = 250, respectively. Figures E.4 and E.6
focus on N = 49 and N = 256, respectively. In each figure, the first column presents
results for T = 50 and the second column presents results for T = 250. Several patterns emerge.
Under the standard normal approximation, rejection probabilities tend to be much larger than 5%,
and the over-rejection worsens as b increases. In contrast,
the i.i.d. bootstrap is always much more accurate than the standard normal approximation. The
improvement is larger for larger bandwidths b and grows as the sample size T increases. It holds
for both N = 50 and N = 250, and regardless of whether cross-sectional correlation is present.
The i.i.d. bootstrap closely mimics the fixed-b approximation for all DGPs, all (N, T) combinations and all bandwidths,
despite potential serial correlation in the data. In Figures E.4 and E.6, where spatial MA(2)
correlation exists, when ρ = 0 (cross-sectional correlation but no serial correlation),
the bootstrap rejection probabilities are very close to 5%. Even with strong serial
correlation (ρ = 0.9), the bootstrap rejection probabilities can still be around 10% or less
if the bandwidth is large enough.
Figures E.7-E.10 illustrate how different values of N would affect the improvement of the
i.i.d. bootstrap over the standard normal approximation. Figures E.7 and E.8 focus on cases when
T = 50, and Figures E.9 and E.10 focus on cases when T = 250. In Figures E.7 and E.9, the
ﬁrst column presents results for N = 50 while the second column presents results for N = 250.
In Figures E.8 and E.10, the ﬁrst column presents results for N = 49 while the second column
presents results for N = 256. Across all DGPs, all (N, T ) combinations and all values of ρ, no
signiﬁcant improvement of the i.i.d. bootstrap over the standard normal approximation is observed
as N increases.
Figures E.11 and E.12 compare the performance of the bootstrap with different block lengths.
In each ﬁgure, the ﬁrst column presents results for the block length l = 25 while the second column
presents results for l = 1, the i.i.d. bootstrap. Figure E.11 focuses on the case when N = 49 and
T = 250. Figure E.12 focuses on the case when N = 256 and T = 250. It is worth noting that when
there is strong serial correlation (e.g., ρ = 0.9), increasing the block length to 25 helps further
improve the inference, and the bootstrap is likely to outperform the ﬁxed-b approximation across
all the bandwidths. But when there is no serial correlation in the data (ρ = 0) and the
block length is set to 25, the bootstrap can over-reject slightly. When N = 49 and l = 25, the
improvement over the ﬁxed-b approximation is very small. However, when N increases from 49
to 256, signiﬁcant improvement can be found in Figure E.12. The results suggest that if the block
length is appropriately chosen, the panel MBB can outperform the ﬁxed-b approximation when
there is strong serial correlation.
Figures E.13–E.20 present results for the DD model with one additional regressor z. Since we
are interested in the performance of the bootstrap when cross-sectional correlation exists, all
DGPs include spatial MA(2) correlation in the cross section. Figures E.13–E.16 illustrate the
empirical null rejection probabilities for tests based on β3 and γ. The first column shows results for
β3, and the second column shows results for γ. The (N, T) combinations (49, 50), (49, 250), (256, 50)
and (256, 250) are considered in Figures E.13–E.16, respectively. In other words, the (small-T, small-N),
(large-T, small-N), (small-T, large-N) and (large-T, large-N) cases are all included. Figures E.17–E.20 compare
the performance of the bootstrap with different block lengths. Figures E.17 and E.19 focus on
(N, T) = (49, 250), and Figures E.18 and E.20 focus on (N, T) = (256, 250). The patterns for the
DD estimator found in the simplest DD model continue to hold after the additional regressor z
is added. Similar patterns also hold for inference on the z coefficient, which is consistent with the
findings in Gonçalves (2011).

3.5

Conclusion

In this chapter we use Monte Carlo simulations to investigate the finite-sample performance of the
naive panel MBB applied to heteroskedasticity, autocorrelation and cross-sectional correlation robust tests based on the DD estimator and the DK standard errors. Simulation results show that the
naive panel MBB outperforms the standard normal approximation in the special case of the Bartlett
kernel. This improvement holds even for the i.i.d. bootstrap, despite potential serial correlation in
the data. The results suggest that the finite-sample performance of the naive panel bootstrap closely
follows that of the fixed-b approximation to first order. In addition, the results
suggest that the bootstrap can be more accurate than the fixed-b approximation when an appropriate
block length is chosen. We expect the results to look similar for other kernels, although only the Bartlett kernel is examined here.
Gonçalves and Vogelsang (2011) have shown that the naive MBB, including the i.i.d. bootstrap, has the same limiting distribution as the fixed-b asymptotic distribution. For the special case
of a location model, Gonçalves and Vogelsang (2011) have proved that the i.i.d. bootstrap can produce more accurate inference than the standard normal approximation, depending on the choice of
the bandwidth and the number of finite moments in the data. Given the patterns in the simulations,
we conjecture that the asymptotic equivalence of the panel MBB and the fixed-b distribution
holds in our settings, and that the improvement of the i.i.d. bootstrap over the standard normal approximation extends to panel models and to inference on the DD parameter. Theoretical
explanations are left for future research.


APPENDICES


Appendix A
PROOFS IN CHAPTER 1

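The proof of Theorem 1.1 is provided below.
Proof of Theorem 1.1. First, we show that the sample second moment of x_it has a well-defined limit:
$$
\begin{aligned}
\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{[rT]} x_{it}^{2}
&= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{[rT]} (\mu_i+\theta_t+\xi_{it})^{2} \\
&= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{[rT]} \left(\mu_i^{2}+\theta_t^{2}+\xi_{it}^{2}+2\mu_i\theta_t+2\mu_i\xi_{it}+2\theta_t\xi_{it}\right) \\
&= \frac{[rT]}{T}\cdot\frac{1}{N}\sum_{i=1}^{N}\mu_i^{2}+\frac{1}{T}\sum_{t=1}^{[rT]}\theta_t^{2}+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\xi_{it}^{2}+2\Big(\frac{1}{N}\sum_{i=1}^{N}\mu_i\Big)\Big(\frac{1}{T}\sum_{t=1}^{[rT]}\theta_t\Big) \\
&\quad+\frac{2}{NT}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\mu_i\xi_{it}+\frac{2}{NT}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\theta_t\xi_{it} \\
&\xrightarrow{p} r\left[E(\mu_i^{2})+E(\theta_t^{2})+E(\xi_{it}^{2})+2E(\mu_i)E(\theta_t)+2E(\mu_i\xi_{it})+2E(\theta_t\xi_{it})\right] = rQ,
\end{aligned}
$$
where $Q = E(\mu_i^{2})+E(\theta_t^{2})+E(\xi_{it}^{2})$.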
Next, we prove (1.15) and (1.16). We first show that θ_tδ_t is a zero-mean covariance stationary process, so that by Wold's theorem it can be represented as an MA(∞) process. Therefore θ_tδ_t satisfies an FCLT, and $T^{-1/2}\sum_{t=1}^{[rT]}\theta_t\delta_t \Rightarrow \sigma W(r)$, where $W(r)$ is a standard Wiener process and $\sigma^{2}$ is the long-run variance of θ_tδ_t. It is straightforward to obtain
$$ E(\theta_t\delta_t) = E(\theta_t)E(\delta_t) = 0, $$
$$ \gamma_j = \operatorname{cov}(\theta_t\delta_t,\ \theta_{t-j}\delta_{t-j}) = E(\theta_t\delta_t\theta_{t-j}\delta_{t-j}) = E(\theta_t\theta_{t-j})E(\delta_t\delta_{t-j}) = \frac{\rho^{2j}}{(1-\rho^{2})^{2}}. $$
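The autocovariances above can be summed in closed form. Assuming, as the expression for γ_j implies, that θ_t and δ_t are independent stationary AR(1) processes with common coefficient ρ and unit-variance innovations, and using γ_{−j} = γ_j, the long-run variance is σ² = Σ_{j=−∞}^{∞} γ_j = (1 + ρ²)/(1 − ρ²)³; this closed form is our own summation of the geometric series, not a statement from the text. The Python sketch below checks it against a truncated sum and compares the simulated variance of θ_tδ_t with γ_0. The function names are illustrative.

```python
import numpy as np


def gamma_j(rho, j):
    """Lag-j autocovariance of theta_t * delta_t: rho^(2|j|) / (1 - rho^2)^2."""
    return rho ** (2 * abs(j)) / (1.0 - rho ** 2) ** 2


def long_run_variance(rho):
    """sigma^2 = sum over all integers j of gamma_j = (1 + rho^2) / (1 - rho^2)^3."""
    return (1.0 + rho ** 2) / (1.0 - rho ** 2) ** 3


def truncated_long_run_variance(rho, J):
    """gamma_0 + 2 * sum_{j=1}^{J} gamma_j, which converges to sigma^2 as J grows."""
    return gamma_j(rho, 0) + 2.0 * sum(gamma_j(rho, j) for j in range(1, J + 1))


def simulate_product(rho, T, seed=0):
    """Simulate theta_t * delta_t for two independent stationary AR(1) processes."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((2, T))
    x = np.empty((2, T))
    x[:, 0] = e[:, 0] / np.sqrt(1.0 - rho ** 2)  # start in the stationary distribution
    for t in range(1, T):
        x[:, t] = rho * x[:, t - 1] + e[:, t]
    return x[0] * x[1]


if __name__ == "__main__":
    rho = 0.3
    p = simulate_product(rho, 100_000)
    print(long_run_variance(rho), truncated_long_run_variance(rho, 50))
    print(p.var(), gamma_j(rho, 0))  # sample variance vs. theoretical gamma_0
```

Since σ² grows like (1 − ρ²)^{−3}, the term φσ² in the limit of the DK t-statistic becomes large when the time effects are highly persistent.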

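Some algebra yields
$$
\begin{aligned}
N^{-1/2}T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{[rT]} v_{it}
&= N^{-1/2}T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}(\mu_i+\theta_t+\xi_{it})(\gamma_i+\delta_t+\eta_{it}) \\
&= N^{-1/2}T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\left(\mu_i\gamma_i+\mu_i\delta_t+\mu_i\eta_{it}+\gamma_i\theta_t+\theta_t\delta_t+\theta_t\eta_{it}+\gamma_i\xi_{it}+\xi_{it}\delta_t+\eta_{it}\xi_{it}\right) \\
&= N^{-1/2}T^{-1}\Big([rT]\sum_{i=1}^{N}\mu_i\gamma_i+\sum_{i=1}^{N}\mu_i\sum_{t=1}^{[rT]}\delta_t+\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\mu_i\eta_{it}+\sum_{i=1}^{N}\gamma_i\sum_{t=1}^{[rT]}\theta_t+N\sum_{t=1}^{[rT]}\theta_t\delta_t \\
&\qquad+\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\theta_t\eta_{it}+\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\gamma_i\xi_{it}+\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\xi_{it}\delta_t+\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\eta_{it}\xi_{it}\Big) \\
&= \frac{[rT]}{T}N^{-1/2}\sum_{i=1}^{N}\mu_i\gamma_i+T^{-1/2}\Big(N^{-1/2}\sum_{i=1}^{N}\mu_i\Big)\Big(T^{-1/2}\sum_{t=1}^{[rT]}\delta_t\Big)+T^{-1/2}(NT)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\mu_i\eta_{it} \\
&\qquad+T^{-1/2}\Big(N^{-1/2}\sum_{i=1}^{N}\gamma_i\Big)\Big(T^{-1/2}\sum_{t=1}^{[rT]}\theta_t\Big)+\phi^{1/2}T^{-1/2}\sum_{t=1}^{[rT]}\theta_t\delta_t+T^{-1/2}(NT)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\theta_t\eta_{it} \\
&\qquad+T^{-1/2}(NT)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\gamma_i\xi_{it}+T^{-1/2}(NT)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\delta_t\xi_{it}+T^{-1/2}(NT)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\eta_{it}\xi_{it} \\
&= \frac{[rT]}{T}N^{-1/2}\sum_{i=1}^{N}\mu_i\gamma_i+\phi^{1/2}T^{-1/2}\sum_{t=1}^{[rT]}\theta_t\delta_t+o_p(1) \\
&\Rightarrow rZ_1^{*}+\phi^{1/2}\sigma W^{*}(r),
\end{aligned}
$$
where $Z_1^{*}\sim N(0,1)$ and $W^{*}(r)$ is a standard Wiener process. $Z_1^{*}$ is independent of $W^{*}(r)$
because $\mu_i,\gamma_i$ are independent of $\theta_t,\delta_t$. Therefore,
$$
\sqrt{N}(\hat\beta-\beta) = \Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}^{2}\Big)^{-1}\cdot\frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{T}v_{it}
\Rightarrow Q^{-1}\left(Z_1^{*}+\phi^{1/2}\sigma W^{*}(1)\right) = Q^{-1}\sqrt{1+\phi\sigma^{2}}\,Z_1,
$$
where $Z_1\sim N(0,1)$. Define the partial sums of $\hat{\bar v}_t$ as
$$ \hat{\bar S}_{[rT]} = \sum_{t=1}^{[rT]}\hat{\bar v}_t, $$
where $r\in(0,1]$ and $[rT]$ is the integer part of $rT$. The limiting distribution of $\hat{\bar S}_{[rT]}$ is
$$
\begin{aligned}
\frac{1}{N^{1/2}T}\hat{\bar S}_{[rT]}
&= \frac{1}{N^{1/2}T}\sum_{t=1}^{[rT]}\hat{\bar v}_t = \frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}x_{it}\hat\varepsilon_{it} = \frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}x_{it}\left(y_{it}-x_{it}\hat\beta\right) \\
&= \frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}x_{it}\left(\varepsilon_{it}-x_{it}(\hat\beta-\beta)\right) \\
&= \frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}v_{it}-\Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}x_{it}^{2}\Big)\cdot\sqrt{N}(\hat\beta-\beta) \\
&\Rightarrow rZ_1^{*}+\phi^{1/2}\sigma W^{*}(r)-(rQ)\cdot Q^{-1}\left(Z_1^{*}+\phi^{1/2}\sigma W^{*}(1)\right) \\
&= rZ_1^{*}+\phi^{1/2}\sigma W^{*}(r)-rZ_1^{*}-r\phi^{1/2}\sigma W^{*}(1) \\
&= \phi^{1/2}\sigma\left(W^{*}(r)-rW^{*}(1)\right)\equiv\phi^{1/2}\sigma B(r),
\end{aligned}
$$
where $B(r)$ is a Brownian bridge.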
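Following the approach of Kiefer and Vogelsang (2005), we rewrite $\hat{\bar\Omega}$ in terms of the partial
sums of $\hat{\bar v}_t$. Consider the Bartlett kernel
$$ K(x) = \begin{cases} 1-|x|, & |x|\le 1,\\ 0, & |x|>1. \end{cases} $$
Algebra from Hashimzade and Vogelsang (2008b) gives, with $K_{ts} = K((t-s)/M)$,
$$
\begin{aligned}
T\hat{\bar\Omega} &= \sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\hat{\bar v}_t\hat{\bar v}_s \\
&= \frac{2}{M}\sum_{t=1}^{T-1}\hat{\bar S}_t\hat{\bar S}_t-\frac{1}{M}\sum_{t=1}^{T-M-1}\left(\hat{\bar S}_t\hat{\bar S}_{t+M}+\hat{\bar S}_{t+M}\hat{\bar S}_t\right)-\frac{1}{M}\sum_{t=T-M}^{T-1}\left(\hat{\bar S}_t\hat{\bar S}_T+\hat{\bar S}_T\hat{\bar S}_t\right)+\hat{\bar S}_T\hat{\bar S}_T \\
&= \frac{2}{M}\sum_{t=1}^{T-1}\hat{\bar S}_t\hat{\bar S}_t-\frac{1}{M}\sum_{t=1}^{T-M-1}\left(\hat{\bar S}_t\hat{\bar S}_{t+M}+\hat{\bar S}_{t+M}\hat{\bar S}_t\right),
\end{aligned}
$$
using the fact that $\hat{\bar S}_T = 0$ by the OLS normal equations. Note that in this setting $\hat{\bar S}_t$ is a scalar and
$M = bT$. Continuing the algebra,
$$ T\hat{\bar\Omega} = \frac{2}{bT}\sum_{t=1}^{T-1}\hat{\bar S}_t^{2}-\frac{2}{bT}\sum_{t=1}^{T-bT-1}\hat{\bar S}_t\hat{\bar S}_{t+M}. $$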
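The partial-sum representation of the Bartlett double sum can be checked numerically. The Python sketch below assumes $K_{ts} = K((t-s)/M)$ with the Bartlett kernel and an integer bandwidth 1 ≤ M ≤ T − 1 (so that M = bT is treated as an integer). It evaluates both sides of the identity, including the boundary terms in $\hat{\bar S}_T$, on arbitrary data. Those boundary terms vanish when the data are demeaned, which mimics $\hat{\bar S}_T = 0$ from the OLS normal equations. The function names are illustrative.

```python
import numpy as np


def bartlett_matrix(T, M):
    """K[t, s] = K((t - s) / M) with the Bartlett kernel K(x) = max(0, 1 - |x|)."""
    idx = np.arange(T)
    return np.clip(1.0 - np.abs(idx[:, None] - idx[None, :]) / M, 0.0, None)


def t_omega_direct(v, M):
    """T * Omega-hat as the double sum sum_t sum_s K_ts v_t v_s."""
    return float(v @ bartlett_matrix(len(v), M) @ v)


def t_omega_partial_sums(v, M):
    """The same quantity written in terms of the partial sums S_t = v_1 + ... + v_t."""
    T = len(v)
    S = np.cumsum(v)        # S[k] holds S_{k+1} (0-based storage)
    S_T = S[-1]
    term_sq = 2.0 / M * np.sum(S[: T - 1] ** 2)                 # t = 1, ..., T-1
    term_lag = 2.0 / M * np.sum(S[: T - M - 1] * S[M : T - 1])  # S_t S_{t+M}, t <= T-M-1
    term_end = 2.0 / M * np.sum(S[T - M - 1 : T - 1]) * S_T     # t = T-M, ..., T-1
    return float(term_sq - term_lag - term_end + S_T ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.standard_normal(100)
    v_demeaned = v - v.mean()  # makes S_T = 0, as the OLS normal equations do
    for M in (1, 10, 50, 99):
        print(M, t_omega_direct(v, M), t_omega_partial_sums(v, M),
              t_omega_partial_sums(v_demeaned, M))
```

Because the second difference of the Bartlett weights is nonzero only at lags 0 and ±M, the representation has only the two sums used in the proof, and each converges to an integral of the Brownian bridge.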
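Then
$$
\begin{aligned}
\frac{1}{NT^{2}}\cdot T\hat{\bar\Omega} &= \frac{2}{bT}\sum_{t=1}^{T-1}\Big(\frac{1}{N^{1/2}T}\hat{\bar S}_t\Big)^{2}-\frac{2}{bT}\sum_{t=1}^{T-bT-1}\Big(\frac{1}{N^{1/2}T}\hat{\bar S}_t\Big)\Big(\frac{1}{N^{1/2}T}\hat{\bar S}_{t+M}\Big) \\
&\Rightarrow \frac{2}{b}\int_0^1\phi\sigma^{2}B(r)^{2}\,dr-\frac{2}{b}\int_0^{1-b}\phi\sigma^{2}B(r)B(r+b)\,dr = \phi\sigma^{2}P(b),
\end{aligned}
$$
where $P(b) = \frac{2}{b}\left(\int_0^1 B(r)^{2}\,dr-\int_0^{1-b}B(r)B(r+b)\,dr\right)$. It directly follows that
$$
\begin{aligned}
N\cdot\hat V_{DK} &= NT\Big(\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}^{2}\Big)^{-1}\hat{\bar\Omega}\Big(\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}^{2}\Big)^{-1} \\
&= \Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}^{2}\Big)^{-1}\cdot\frac{1}{NT^{2}}\,T\hat{\bar\Omega}\cdot\Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}^{2}\Big)^{-1} \\
&\Rightarrow Q^{-1}\cdot\phi\sigma^{2}P(b)\cdot Q^{-1} = Q^{-2}\phi\sigma^{2}P(b).
\end{aligned}
$$
Therefore,
$$
t_{DK} = \frac{\hat\beta-\beta_0}{\sqrt{\hat V_{DK}}} = \frac{\sqrt{N}(\hat\beta-\beta)}{\sqrt{N\cdot\hat V_{DK}}}
\Rightarrow \frac{Q^{-1}\sqrt{1+\phi\sigma^{2}}\,Z_1}{\sqrt{Q^{-2}\phi\sigma^{2}P(b)}} = \sqrt{1+\frac{1}{\phi\sigma^{2}}}\cdot\frac{Z_1}{\sqrt{P(b)}}.
$$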
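The limit of t_DK is nonstandard but easy to simulate. The Python sketch below approximates W*(r) by scaled Gaussian random-walk partial sums on an n-point grid and the integrals in P(b) by Riemann sums. It draws Z_1 independently of P(b): Z_1 is a combination of Z_1* and W*(1), and both are independent of the bridge W*(r) − rW*(1). The value of φσ² must be supplied by the user. The grid size, number of draws and function names are illustrative choices, not the settings used in this dissertation.

```python
import numpy as np


def simulate_P(b, n_reps=4000, n_grid=400, rng=None):
    """Draws of P(b) = (2/b) [int_0^1 B(r)^2 dr - int_0^{1-b} B(r) B(r+b) dr].

    B is a Brownian bridge approximated on the grid r = 1/n, ..., 1 from
    scaled Gaussian random-walk partial sums; integrals are Riemann sums.
    """
    rng = np.random.default_rng() if rng is None else rng
    lag = int(round(b * n_grid))  # grid analogue of M = bT
    if not 1 <= lag < n_grid:
        raise ValueError("b must satisfy 1/n_grid <= b < 1")
    W = np.cumsum(rng.standard_normal((n_reps, n_grid)), axis=1) / np.sqrt(n_grid)
    r = np.arange(1, n_grid + 1) / n_grid
    B = W - r * W[:, -1:]         # B(r) = W(r) - r W(1), so B(1) = 0
    int_B2 = np.mean(B ** 2, axis=1)
    int_BB = np.sum(B[:, : n_grid - lag] * B[:, lag:], axis=1) / n_grid
    return 2.0 / b * (int_B2 - int_BB)


def simulate_tdk_limit(b, phi_sigma2, n_reps=4000, n_grid=400, seed=0):
    """Draws of sqrt(1 + 1/(phi sigma^2)) * Z_1 / sqrt(P(b)), Z_1 independent of P(b)."""
    rng = np.random.default_rng(seed)
    P = simulate_P(b, n_reps, n_grid, rng)
    Z = rng.standard_normal(n_reps)
    return np.sqrt(1.0 + 1.0 / phi_sigma2) * Z / np.sqrt(P)


if __name__ == "__main__":
    draws = simulate_tdk_limit(b=0.5, phi_sigma2=1.0)
    print("97.5% quantile:", np.quantile(draws, 0.975))
```

The factor √(1 + 1/(φσ²)) exceeds one, so even fixed-b critical values computed from Z_1/√P(b) alone would not control size in this case.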

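Next, we prove (1.17) and (1.18) following the same steps as above. Now
$$
\begin{aligned}
\frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}v_{it} &= \frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}(\mu_i+\xi_{it})(\gamma_i+\eta_{it}) \\
&= \frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}\left(\mu_i\gamma_i+\mu_i\eta_{it}+\gamma_i\xi_{it}+\eta_{it}\xi_{it}\right) \\
&= \frac{[rT]}{T}N^{-1/2}\sum_{i=1}^{N}\mu_i\gamma_i+o_p(1) \\
&\Rightarrow rZ_2,
\end{aligned}
$$
where $Z_2\sim N(0,1)$. Therefore,
$$ \sqrt{N}(\hat\beta-\beta) = \Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}^{2}\Big)^{-1}\cdot\frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{T}v_{it} \Rightarrow Q^{-1}Z_2. $$
The limiting distribution of $\hat{\bar S}_{[rT]}$ is
$$ \frac{1}{N^{1/2}T}\hat{\bar S}_{[rT]} = \frac{1}{N^{1/2}T}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}v_{it}-\Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{[rT]}x_{it}^{2}\Big)\cdot\sqrt{N}(\hat\beta-\beta) \Rightarrow rZ_2-(rQ)\cdot Q^{-1}Z_2 = 0. $$
Therefore,
$$
\frac{1}{NT^{2}}\cdot T\hat{\bar\Omega} = \frac{2}{bT}\sum_{t=1}^{T-1}\Big(\frac{1}{N^{1/2}T}\hat{\bar S}_t\Big)^{2}-\frac{2}{bT}\sum_{t=1}^{T-bT-1}\Big(\frac{1}{N^{1/2}T}\hat{\bar S}_t\Big)\Big(\frac{1}{N^{1/2}T}\hat{\bar S}_{t+M}\Big)
\Rightarrow \frac{2}{b}\int_0^1 0\cdot 0\,dr-\frac{2}{b}\int_0^{1-b}0\cdot 0\,dr = 0.
$$
It directly follows that
$$ N\cdot\hat V_{DK} = \Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}^{2}\Big)^{-1}\cdot\frac{1}{NT^{2}}\,T\hat{\bar\Omega}\cdot\Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}^{2}\Big)^{-1} \Rightarrow Q^{-1}\cdot 0\cdot Q^{-1} = 0. $$
Therefore, because the numerator converges to the nondegenerate limit $Q^{-1}Z_2$ while the variance estimator in the denominator converges to zero,
$$ |t_{DK}| = \frac{|\hat\beta-\beta_0|}{\sqrt{\hat V_{DK}}} = \frac{|\sqrt{N}(\hat\beta-\beta)|}{\sqrt{N\cdot\hat V_{DK}}} \to \infty \text{ in probability.} $$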

Appendix B
TABLES IN CHAPTER 1

Table B.1: Estimating coefficient, standard errors and null rejection probabilities with firm effects:
OLS and one-way clustered standard errors.
Rows: source of error volatility; columns: source of regressor volatility. Null rejection probabilities are in brackets.

Error vol.  Statistic         0%        25%       50%       75%
0%          Avg(β_OLS)        1.0003    1.0004    1.0004    1.0004
            Std(β_OLS)        0.0285    0.0283    0.0283    0.0283
            Avg(SE_White)     0.0283    0.0283    0.0283    0.0283
            % Sig(t_White)    [0.0108]  [0.0098]  [0.0086]  [0.0078]
            Avg(SE_C^f)       0.0282    0.0282    0.0282    0.0282
            % Sig(t_C^f)      [0.0108]  [0.0098]  [0.0086]  [0.0090]
25%         Avg(β_OLS)        1.0001    1.0005    1.0007    1.0008
            Std(β_OLS)        0.0284    0.0353    0.0411    0.0463
            Avg(SE_White)     0.0283    0.0283    0.0283    0.0283
            % Sig(t_White)    [0.0094]  [0.0402]  [0.0756]  [0.1180]
            Avg(SE_C^f)       0.0282    0.0352    0.0411    0.0462
            % Sig(t_C^f)      [0.0090]  [0.0108]  [0.0092]  [0.0104]
50%         Avg(β_OLS)        1         1.0006    1.0008    1.0009
            Std(β_OLS)        0.0283    0.0412    0.051     0.0592
            Avg(SE_White)     0.0283    0.0283    0.0283    0.0283
            % Sig(t_White)    [0.0110]  [0.0762]  [0.1598]  [0.2262]
            Avg(SE_C^f)       0.0282    0.0411    0.0508    0.0589
            % Sig(t_C^f)      [0.0112]  [0.0100]  [0.0102]  [0.0098]
75%         Avg(β_OLS)        0.9999    1.0006    1.0008    1.0010
            Std(β_OLS)        0.0283    0.0464    0.0593    0.0699
            Avg(SE_White)     0.0283    0.0282    0.0282    0.0282
            % Sig(t_White)    [0.0120]  [0.1156]  [0.2218]  [0.3068]
            Avg(SE_C^f)       0.0282    0.0462    0.0589    0.0694
            % Sig(t_C^f)      [0.0112]  [0.0090]  [0.0088]  [0.0102]

Table B.2: Estimating coefficient, standard errors and null rejection probabilities with firm effects:
FM standard errors.
Rows: source of error volatility; columns: source of regressor volatility. Null rejection probabilities are in brackets.

Error vol.  Statistic         0%        25%       50%       75%
0%          Avg(β_FM)         1.0003    1.0004    1.0004    1.0004
            Std(β_FM)         0.0286    0.0284    0.0283    0.0283
            Avg(SE_FM)        0.0276    0.0275    0.0275    0.0275
            % Sig(t_FM)       [0.0322]  [0.0304]  [0.0282]  [0.0284]
25%         Avg(β_FM)         1.0001    1.0006    1.0007    1.0008
            Std(β_FM)         0.0285    0.0355    0.0412    0.0463
            Avg(SE_FM)        0.0276    0.0267    0.0258    0.0248
            % Sig(t_FM)       [0.0304]  [0.0766]  [0.1302]  [0.1902]
50%         Avg(β_FM)         1         1.0006    1.0008    1.001
            Std(β_FM)         0.0285    0.0414    0.0511    0.0593
            Avg(SE_FM)        0.0276    0.0258    0.0239    0.0218
            % Sig(t_FM)       [0.0316]  [0.1336]  [0.2498]  [0.3662]
75%         Avg(β_FM)         0.9999    1.0006    1.0008    1.001
            Std(β_FM)         0.0284    0.0466    0.0594    0.07
            Avg(SE_FM)        0.0276    0.0249    0.0218    0.0183
            % Sig(t_FM)       [0.0290]  [0.1928]  [0.3660]  [0.5134]

Table B.3: Estimating coefficient, standard errors and null rejection probabilities with time effects:
OLS and clustered standard errors.
Rows: source of error volatility; columns: source of regressor volatility.

Error vol.  Statistic         0%        25%       50%       75%
0%          Avg(β_OLS)        1.0005    1.0005    1.0005    1.0005
            Std(β_OLS)        0.0285    0.0289    0.0298    0.0312
            Avg(SE_White)     0.0283    0.0287    0.0294    0.0305
            % Sig(t_White)    0.01      0.01      0.0098    0.0102
            Avg(SE_C^t)       0.026     0.026     0.0259    0.0257
            % Sig(t_C^t)      0.0404    0.0406    0.0476    0.0642
25%         Avg(β_OLS)        1.0003    0.999     0.9978    0.9961
            Std(β_OLS)        0.028     0.1518    0.2181    0.2831
            Avg(SE_White)     0.0279    0.0281    0.0286    0.0295
            % Sig(t_White)    0.0116    0.6208    0.7292    0.7904
            Avg(SE_C^t)       0.0254    0.124     0.1739    0.2202
            % Sig(t_C^t)      0.0396    0.0524    0.0734    0.0908
50%         Avg(β_OLS)        1.0002    0.9984    0.9966    0.9942
            Std(β_OLS)        0.0276    0.213     0.3073    0.3994
            Avg(SE_White)     0.0275    0.0274    0.0277    0.0283
            % Sig(t_White)    0.0096    0.7344    0.8128    0.8540
            Avg(SE_C^t)       0.0245    0.1732    0.2445    0.3103
            % Sig(t_C^t)      0.0412    0.0526    0.074     0.0910
75%         Avg(β_OLS)        1         0.9978    0.9957    0.9927
            Std(β_OLS)        0.0272    0.2602    0.376     0.4889
            Avg(SE_White)     0.0269    0.0266    0.0267    0.0269
            % Sig(t_White)    0.0092    0.7856    0.853     0.8806
            Avg(SE_C^t)       0.0235    0.2113    0.2989    0.3796
            % Sig(t_C^t)      0.0364    0.052     0.0738    0.0916

Table B.4: Estimating coefficient, standard errors and null rejection probabilities with time effects:
FM standard errors.
Rows: source of error volatility; columns: source of regressor volatility. Null rejection probabilities are in brackets.

Error vol.  Statistic         0%        25%       50%       75%
0%          Avg(β_FM)         1.0006    0.9999    0.9995    0.9986
            Std(β_FM)         0.0285    0.0323    0.0405    0.0561
            Avg(SE_FM)        0.0275    0.0316    0.0389    0.0551
            % Sig(t_FM)       [0.0308]  [0.0300]  [0.0348]  [0.0306]
25%         Avg(β_FM)         1.0003    0.9994    0.9999    0.999
            Std(β_FM)         0.0247    0.0285    0.0348    0.0492
            Avg(SE_FM)        0.0237    0.0275    0.0337    0.0476
            % Sig(t_FM)       [0.0344]  [0.0300]  [0.0272]  [0.0318]
50%         Avg(β_FM)         1         0.9996    0.9999    0.9999
            Std(β_FM)         0.0199    0.0232    0.0282    0.0391
            Avg(SE_FM)        0.0195    0.0225    0.0276    0.0394
            % Sig(t_FM)       [0.0258]  [0.0296]  [0.0268]  [0.0236]
75%         Avg(β_FM)         0.9997    1.0001    1.0005    0.9998
            Std(β_FM)         0.0143    0.0166    0.0202    0.0281
            Avg(SE_FM)        0.0138    0.0159    0.0195    0.0277
            % Sig(t_FM)       [0.0322]  [0.0292]  [0.0308]  [0.0280]

Table B.5: Comparing performances of White, one-way cluster-robust and two-way cluster-robust
standard errors in the presence of both firm effects and time effects when N and T vary separately.
For time effects with ρ = 0.

N     T     β_OLS    SE_White  SE_C^f   SE_C^t   SE_double
10    10    0.9999   0.2645    0.23     0.241    0.181
10    25    0.9996   0.3735    0.209    0.271    0.137
10    50    0.9977   0.463     0.1875   0.346    0.1395
10    100   1.0004   0.566     0.166    0.4345   0.13
10    250   1.0014   0.694     0.1395   0.5915   0.1175
25    10    0.997    0.383     0.262    0.211    0.145
25    25    0.999    0.423     0.1945   0.192    0.0845
25    50    1.0013   0.52      0.1405   0.241    0.0775
25    100   1.0014   0.603     0.1295   0.35     0.0815
25    250   1.0005   0.7225    0.104    0.5205   0.08
50    10    0.9964   0.4565    0.3325   0.18     0.1295
50    25    1.0019   0.5295    0.2495   0.154    0.084
50    50    1.0001   0.554     0.1845   0.194    0.0755
50    100   1.0004   0.635     0.1385   0.2645   0.067
50    250   0.9998   0.7255    0.1065   0.4075   0.0715
100   10    1.0031   0.563     0.4395   0.166    0.133
100   25    1.002    0.604     0.336    0.131    0.0745
100   50    1.0012   0.6425    0.258    0.1485   0.078
100   100   1.0006   0.67      0.1865   0.1825   0.0665
100   250   0.9999   0.7485    0.108    0.291    0.0485
250   10    0.9962   0.7065    0.611    0.146    0.1315
250   25    1.0016   0.7165    0.497    0.104    0.0825
250   50    1.0004   0.7315    0.3945   0.1015   0.0755
250   100   1.0011   0.7575    0.2935   0.1145   0.056
250   250   1.0003   0.7925    0.1735   0.176    0.061

Table B.6: Comparing performances of White, one-way cluster-robust and two-way cluster-robust
standard errors in the presence of ﬁrm effects and AR(1) time effects when N = T = 10.
ρ

f
SEC

βOLS SEW hite

t
SEC SEdouble

-0.95 0.9984

0.6215

0.644 0.5035

0.499

-0.9 1.0053

0.5855

0.599

0.451

0.442

-0.7 1.0059

0.3945

0.393 0.2895

0.265

-0.5 1.0065

0.283

0.27 0.2145

0.181

0.2275 0.2155 0.1815

0.1365

0.996

0.203 0.1745 0.1715

0.1205

0 0.9995

0.2135 0.1805 0.1855

0.138

0.1 1.0066

0.219 0.1785 0.1875

0.1365

-0.3 0.9928
-0.1

0.3 1.0029

0.2195

0.186 0.1995

0.142

0.5 0.9973

0.2395 0.2035 0.2075

0.163

0.7 1.0025
0.9 1.0035
0.95

0.992

0.28

0.257

0.238

0.1985

0.3465 0.3125

0.273

0.2365

0.348

0.316

0.424

0.403


Table B.7: Comparing performances of White, one-way cluster-robust and two-way cluster-robust
standard errors in the presence of ﬁrm effects and AR(1) time effects when N = T = 50.
ρ
-0.95

f
SEC

βOLS SEW hite
0.992

-0.9 0.9953

t
SEC SEdouble

0.927 0.9225 0.6525
0.531

0.518

-0.7 1.0007

0.7655 0.5415 0.3105

0.2465

-0.5 1.0037

0.645 0.3295 0.2135

0.113

-0.3 1.0029

0.563

0.203 0.1725

-0.1 0.9974

0.566

0.198

0.183

0.074

0.565 0.1655

0.166

0.055

0 1.0015

0.896

0.846

0.6485

0.0695

0.1 0.9979

0.5765

0.184

0.191

0.066

0.3 1.0019

0.5715 0.2025

0.176

0.074

0.5 0.9995

0.6255 0.2785

0.197

0.1125

0.72 0.4825 0.2915

0.2215

0.7 0.9989
0.9 1.0005
0.95 0.9966

0.8505

0.766 0.4835

0.456

0.887 0.8345 0.5525

0.536


Table B.8: Comparing performances of White, one-way cluster-robust and two-way cluster-robust
standard errors in the presence of ﬁrm effects and AR(1) time effects when N = T = 250.
ρ

f
SEC

βOLS SEW hite

-0.95 0.9979

0.9665

t
SEC SEdouble

0.954 0.6635

0.662

-0.9 0.9971

0.943 0.8865 0.5275

0.52

-0.7 0.9987

0.888 0.5755

0.276

0.219

-0.5 1.0003

0.853 0.3245

0.198

0.107

-0.3 0.9996

0.8235

0.21 0.1745

0.0675

-0.1

0.999

0.7865 0.1815 0.1755

0.053

0 1.0002

0.788 0.1705 0.1635

0.049

0.1 1.0008
0.3 0.9991

0.81

0.179 0.1695

0.0505

0.8225 0.2195 0.1765

0.056

0.5 1.0005

0.811 0.3065

0.7 0.9998

0.892

0.9 1.0004

0.9495

0.95 1.0063

0.184

0.096

0.557 0.2805

0.2205

0.881

0.536

0.5265

0.976 0.9525

0.666

0.6635


Table B.9: Comparing performances of one-way cluster-robust, two-way cluster-robust and DK standard errors in the presence of firm
effects and AR(1) time effects when N = T = 50 and N = T = 250. No firm dummies.

Cluster-robust standard errors, and SE^r_double by value of b:
N,T   ρ    SE_C^f  SE_C^t  SE_double | b=.1   .2     .3     .4     .5     .6     .7     .8
50    .0   .174    .186    .071      | .123   .188   .253   .325   .399   .485   .536   .622
50    .3   .224    .179    .084      | .126   .206   .283   .348   .419   .472   .545   .647
50    .6   .374    .245    .151      | .150   .229   .297   .355   .422   .470   .551   .653
50    .9   .772    .558    .525      | .290   .363   .443   .491   .544   .613   .681   .781
250   .0   .171    .172    .059      | .103   .166   .236   .314   .397   .465   .530   .606
250   .3   .198    .164    .060      | .097   .184   .247   .311   .381   .441   .516   .589
250   .6   .402    .229    .159      | .140   .216   .277   .319   .373   .423   .476   .546
250   .9   .848    .520    .509      | .171   .246   .316   .351   .400   .443   .509   .599

SE_DK using usual fixed-b critical values, by value of b:
N,T   ρ    b=.1   .2     .3     .4     .5     .6     .7     .8
50    .0   .160   .148   .137   .135   .132   .130   .133   .131
50    .3   .138   .131   .126   .125   .121   .124   .120   .122
50    .6   .125   .108   .100   .092   .091   .094   .095   .096
50    .9   .310   .221   .190   .183   .180   .180   .179   .181
250   .0   .155   .147   .143   .138   .138   .137   .136   .136
250   .3   .121   .116   .105   .104   .106   .110   .106   .106
250   .6   .087   .084   .080   .082   .075   .075   .073   .076
250   .9   .105   .086   .076   .073   .073   .072   .072   .070

Table B.10: Comparing performances of one-way cluster-robust, two-way cluster-robust and DK standard errors in the presence of firm
effects and AR(1) time effects when N = T = 50 and N = T = 250. Firm dummies.

Cluster-robust standard errors, and SE^r_double by value of b:
N,T   ρ    SE_C^f  SE_C^t  SE_double | b=.1   .2     .3     .4     .5     .6     .7     .8
50    .0   .631    .075    .082      | .182   .259   .302   .316   .361   .402   .445   .537
50    .3   .674    .091    .102      | .184   .250   .310   .328   .376   .420   .469   .552
50    .6   .786    .191    .196      | .203   .287   .334   .378   .417   .458   .520   .603
50    .9   .933    .516    .525      | .328   .439   .496   .526   .565   .587   .648   .727
250   .0   .830    .049    .050      | .152   .217   .255   .292   .337   .368   .412   .500
250   .3   .840    .072    .073      | .150   .219   .253   .305   .334   .368   .408   .478
250   .6   .906    .190    .192      | .163   .242   .293   .329   .362   .410   .454   .530
250   .9   .980    .534    .535      | .202   .289   .341   .394   .420   .479   .542   .625

SE_DK using usual fixed-b critical values, by value of b:
N,T   ρ    b=.1   .2     .3     .4     .5     .6     .7     .8
50    .0   .068   .066   .062   .062   .060   .062   .060   .064
50    .3   .067   .063   .055   .056   .059   .059   .058   .057
50    .6   .103   .090   .087   .082   .082   .085   .083   .083
50    .9   .283   .233   .219   .202   .201   .198   .192   .190
250   .0   .048   .048   .046   .047   .047   .048   .048   .048
250   .3   .048   .051   .050   .050   .048   .050   .049   .050
250   .6   .069   .064   .061   .064   .062   .060   .061   .058
250   .9   .120   .102   .092   .094   .090   .093   .092   .089

Table B.11: Comparing performances of one-way cluster-robust, two-way cluster-robust and DK
standard errors in the presence of firm effects and AR(1) time effects. No firm dummies.
SE_DK using adjusted fixed-b critical values, by value of b.

N,T      ρ    SE_C^f  SE_C^t  SE_double | b=.1   .2     .3     .4     .5     .6     .7     .8     .9     1.0
50,50    .0   .174    .186    .071      | .051   .049   .052   .052   .051   .049   .051   .055   .052   .053
50,50    .3   .224    .179    .084      | .073   .064   .063   .059   .062   .062   .065   .068   .064   .066
50,50    .6   .374    .245    .151      | .100   .085   .079   .072   .068   .074   .071   .073   .074   .075
50,50    .9   .772    .558    .525      | .310   .220   .188   .183   .180   .180   .179   .180   .181   .186
50,100   .0   .128    .264    .066      | .052   .053   .053   .051   .050   .054   .051   .055   .053   .055
50,100   .3   .150    .251    .067      | .049   .049   .050   .047   .047   .047   .049   .048   .047   .048
50,100   .6   .273    .258    .121      | .070   .064   .063   .062   .056   .056   .058   .056   .055   .058
50,100   .9   .756    .547    .505      | .187   .139   .121   .118   .121   .121   .118   .121   .123   .125
50,250   .0   .093    .403    .065      | .043   .048   .046   .044   .044   .047   .049   .047   .044   .046
50,250   .3   .090    .380    .055      | .041   .043   .040   .043   .041   .042   .042   .043   .043   .043
50,250   .6   .188    .350    .121      | .068   .059   .061   .060   .061   .064   .064   .064   .066   .067
50,250   .9   .713    .535    .472      | .102   .089   .080   .081   .077   .080   .081   .084   .082   .083
100,50   .0   .254    .137    .077      | .073   .064   .066   .065   .067   .065   .064   .061   .063   .066
100,50   .3   .288    .141    .083      | .072   .061   .060   .059   .058   .060   .059   .059   .060   .060
100,50   .6   .498    .224    .176      | .094   .082   .080   .074   .076   .080   .077   .080   .080   .082
100,50   .9   .831    .565    .545      | .299   .218   .186   .173   .165   .170   .176   .179   .178   .181
100,100  .0   .179    .180    .063      | .063   .066   .059   .060   .059   .059   .060   .058   .060   .061
100,100  .3   .197    .168    .073      | .056   .056   .054   .054   .051   .051   .050   .055   .054   .056
100,100  .6   .401    .246    .156      | .081   .072   .073   .074   .070   .071   .070   .072   .070   .072
100,100  .9   .828    .555    .532      | .187   .137   .124   .118   .117   .117   .115   .116   .115   .117
100,250  .0   .103    .281    .051      | .042   .042   .047   .044   .047   .049   .046   .047   .045   .047
100,250  .3   .137    .296    .061      | .053   .052   .048   .047   .044   .047   .044   .047   .045   .048
100,250  .6   .266    .273    .117      | .050   .052   .049   .051   .051   .053   .053   .053   .052   .055
100,250  .9   .787    .538    .505      | .097   .077   .072   .070   .071   .070   .067   .066   .068   .069
250,50   .0   .395    .092    .068      | .061   .053   .054   .051   .051   .052   .052   .051   .049   .051
250,50   .3   .446    .102    .071      | .058   .055   .057   .058   .055   .056   .055   .059   .058   .058
250,50   .6   .645    .207    .188      | .098   .085   .076   .070   .070   .070   .070   .071   .072   .073
250,50   .9   .891    .546    .539      | .291   .204   .177   .168   .164   .163   .165   .166   .170   .171
250,100  .0   .299    .106    .055      | .053   .056   .055   .059   .058   .060   .060   .059   .059   .063
250,100  .3   .344    .129    .078      | .066   .066   .064   .061   .059   .059   .060   .057   .058   .059
250,100  .6   .569    .205    .170      | .072   .071   .071   .070   .067   .071   .071   .072   .074   .075
250,100  .9   .878    .545    .535      | .185   .145   .128   .124   .113   .118   .121   .123   .124   .125
250,250  .0   .171    .172    .059      | .060   .053   .051   .050   .051   .053   .051   .054   .053   .055
250,250  .3   .198    .164    .060      | .049   .046   .043   .045   .045   .049   .048   .046   .047   .048
250,250  .6   .401    .229    .159      | .067   .066   .064   .062   .059   .059   .058   .056   .057   .059
250,250  .9   .848    .520    .509      | .103   .086   .075   .072   .073   .072   .071   .068   .072   .073

Table B.12: Comparing performances of one-way cluster-robust, two-way cluster-robust and DK
standard errors in the presence of a firm effect. No firm dummies.
SE_DK using usual fixed-b critical values, by value of b.

N     T     SE_C^f  SE_C^t  SE_double | b=.1   .2     .3     .4     .5     .6     .7     .8     .9     1.0
10    10    .118    .365    .158      | .330   .301   .291   .275   .266   .270   .274   .271   .270   .275
10    25    .112    .525    .135      | .489   .467   .447   .429   .417   .419   .415   .416   .416   .418
10    50    .122    .623    .134      | .598   .572   .558   .553   .546   .539   .540   .537   .541   .542
10    100   .117    .733    .140      | .716   .698   .673   .667   .658   .654   .652   .651   .652   .653
10    250   .114    .826    .133      | .814   .801   .787   .780   .772   .772   .771   .772   .772   .774
25    10    .075    .376    .103      | .344   .319   .296   .284   .278   .279   .280   .279   .281   .284
25    25    .078    .513    .089      | .491   .460   .452   .446   .440   .435   .436   .434   .433   .436
25    50    .073    .623    .082      | .607   .589   .571   .555   .546   .544   .541   .544   .542   .544
25    100   .076    .717    .086      | .705   .679   .659   .648   .635   .633   .632   .628   .626   .630
25    250   .084    .845    .090      | .831   .822   .815   .811   .803   .801   .799   .799   .797   .799
50    10    .059    .370    .077      | .336   .313   .296   .276   .268   .263   .268   .265   .264   .270
50    25    .068    .550    .076      | .521   .495   .473   .458   .446   .437   .438   .439   .437   .442
50    50    .057    .626    .061      | .599   .573   .559   .550   .537   .535   .534   .534   .532   .536
50    100   .069    .739    .073      | .726   .708   .696   .685   .678   .679   .675   .673   .674   .678
50    250   .059    .825    .061      | .816   .800   .796   .791   .784   .778   .775   .777   .775   .778
100   10    .058    .362    .076      | .331   .313   .307   .292   .284   .283   .282   .275   .275   .278
100   25    .063    .526    .069      | .492   .466   .448   .429   .420   .414   .410   .412   .413   .417
100   50    .070    .628    .073      | .612   .596   .575   .561   .548   .545   .543   .541   .540   .542
100   100   .057    .750    .060      | .737   .718   .698   .692   .682   .676   .674   .675   .673   .678
100   250   .059    .824    .060      | .813   .806   .798   .791   .784   .780   .774   .775   .776   .778
250   10    .056    .346    .070      | .311   .294   .271   .268   .257   .264   .253   .252   .251   .255
250   25    .045    .517    .051      | .489   .466   .446   .439   .431   .426   .424   .428   .429   .431
250   50    .046    .642    .048      | .617   .595   .583   .571   .565   .559   .555   .554   .558   .561
250   100   .053    .749    .054      | .723   .709   .695   .693   .681   .676   .672   .672   .672   .673
250   250   .053    .847    .054      | .822   .806   .795   .785   .782   .776   .775   .774   .777   .779

Appendix C
PROOFS IN CHAPTER 2

Proofs of the exact equivalence result, Propositions 2.1 and 2.3, Lemmas 2.2 and 2.4, and Theorems 2.1–2.3
are provided in this Appendix.

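Proof of the exact equivalence result. It is straightforward to obtain
$$ \sum_{t=1}^{T}\underline{DU}_t^{2} = \lambda(1-\lambda)T, $$
$$ \underline{DU}_t\,\underline{DU}_s = DU_tDU_s-(1-\lambda)DU_t-(1-\lambda)DU_s+(1-\lambda)^{2}, $$
$$ \sum_{i=1}^{N}\underline{Treat}_i^{2} = k(1-k)N. $$
Define
$$ \eta = \lambda\sum_{i=1}^{kN}\sum_{t=1}^{T}u_{it}-\sum_{i=1}^{kN}\sum_{t=1}^{\lambda T}u_{it}-k\lambda\sum_{i=1}^{N}\sum_{t=1}^{T}u_{it}+k\sum_{i=1}^{N}\sum_{t=1}^{\lambda T}u_{it}, $$
$$ \xi = k^{2}S_t^{N}S_s^{N}-kS_t^{kN}S_s^{N}-kS_t^{N}S_s^{kN}+S_t^{kN}S_s^{kN} = (S_t^{kN}-kS_t^{N})(S_s^{kN}-kS_s^{N}), $$
$$ S_t^{N} = \sum_{i=1}^{N}u_{it},\qquad S_t^{kN} = \sum_{i=1}^{kN}u_{it}. $$
Recall $t = \dfrac{\hat\beta_3-\beta_3}{s.e.(\hat\beta_3)}$.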

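Consider the individual dummies case. We have
$$ \hat\beta-\beta = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{bmatrix}\underline{DU}_t\\ Treat_i\cdot\underline{DU}_t\end{bmatrix}\left[\underline{DU}_t,\ Treat_i\cdot\underline{DU}_t\right]\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{bmatrix}\underline{DU}_t\\ Treat_i\cdot\underline{DU}_t\end{bmatrix}u_{it}. \tag{C.1} $$
Simple algebra yields
$$ \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{bmatrix}\underline{DU}_t\\ Treat_i\cdot\underline{DU}_t\end{bmatrix}\left[\underline{DU}_t,\ Treat_i\cdot\underline{DU}_t\right]\right)^{-1} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\underline{DU}_t^{2}\begin{bmatrix}1 & Treat_i\\ Treat_i & Treat_i\end{bmatrix}\right)^{-1} = \left(\lambda(1-\lambda)NT\begin{bmatrix}1 & k\\ k & k\end{bmatrix}\right)^{-1} = \frac{1}{\lambda k(1-\lambda)(1-k)NT}\begin{bmatrix}k & -k\\ -k & 1\end{bmatrix}, \tag{C.2} $$
$$ \sum_{i=1}^{N}\sum_{t=1}^{T}\begin{bmatrix}\underline{DU}_t\\ Treat_i\cdot\underline{DU}_t\end{bmatrix}u_{it} = \sum_{i=1}^{N}\begin{bmatrix}1\\ Treat_i\end{bmatrix}\sum_{t=1}^{T}\underline{DU}_tu_{it} = \sum_{i=1}^{N}\begin{bmatrix}1\\ Treat_i\end{bmatrix}\left(\lambda\sum_{t=1}^{T}u_{it}-\sum_{t=1}^{\lambda T}u_{it}\right) = \begin{bmatrix}\lambda\sum_{i=1}^{N}\sum_{t=1}^{T}u_{it}-\sum_{i=1}^{N}\sum_{t=1}^{\lambda T}u_{it}\\ \lambda\sum_{i=1}^{kN}\sum_{t=1}^{T}u_{it}-\sum_{i=1}^{kN}\sum_{t=1}^{\lambda T}u_{it}\end{bmatrix}. \tag{C.3} $$
Plugging (C.2) and (C.3) into (C.1), it directly follows that
$$ \hat\beta-\beta = \frac{1}{\lambda k(1-\lambda)(1-k)NT}\begin{bmatrix}k\left(\lambda\sum_{i>kN}\sum_{t=1}^{T}u_{it}-\sum_{i>kN}\sum_{t=1}^{\lambda T}u_{it}\right)\\ \eta\end{bmatrix}. $$
In particular, we have
$$ \hat\beta_3-\beta_3 = \frac{\eta}{\lambda k(1-\lambda)(1-k)NT}. $$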
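Next, consider the standard error matrix. We know
$$ \hat{\bar v}_t = \sum_{i=1}^{N}\begin{bmatrix}\underline{DU}_t\\ Treat_i\cdot\underline{DU}_t\end{bmatrix}u_{it} = \underline{DU}_t\begin{bmatrix}S_t^{N}\\ S_t^{kN}\end{bmatrix}. \tag{C.4} $$
Therefore,
$$ \hat{\bar\Omega} = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\hat{\bar v}_t\hat{\bar v}_s' = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\underline{DU}_t\underline{DU}_s\begin{bmatrix}S_t^{N}\\ S_t^{kN}\end{bmatrix}\left[S_s^{N},\ S_s^{kN}\right] = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\underline{DU}_t\underline{DU}_s\begin{bmatrix}S_t^{N}S_s^{N} & S_t^{N}S_s^{kN}\\ S_t^{kN}S_s^{N} & S_t^{kN}S_s^{kN}\end{bmatrix}. $$
Using this formula, it follows that
$$
\begin{aligned}
&\left(\frac{1}{T}\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{bmatrix}\underline{DU}_t\\ Treat_i\cdot\underline{DU}_t\end{bmatrix}\left[\underline{DU}_t,\ Treat_i\cdot\underline{DU}_t\right]\right)^{-1}\hat{\bar\Omega}\left(\frac{1}{T}\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{bmatrix}\underline{DU}_t\\ Treat_i\cdot\underline{DU}_t\end{bmatrix}\left[\underline{DU}_t,\ Treat_i\cdot\underline{DU}_t\right]\right)^{-1} \\
&\quad= \left(\frac{1}{\lambda k(1-\lambda)(1-k)N}\right)^{2}\begin{bmatrix}k & -k\\ -k & 1\end{bmatrix}\hat{\bar\Omega}\begin{bmatrix}k & -k\\ -k & 1\end{bmatrix}
= \left(\frac{1}{\lambda k(1-\lambda)(1-k)N}\right)^{2}\begin{bmatrix}* & *\\ * & T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\underline{DU}_t\underline{DU}_s\,\xi\end{bmatrix}.
\end{aligned}
$$
Specifically, we have
$$ s.e.(\hat\beta_3) = \sqrt{\frac{1}{T(\lambda k(1-\lambda)(1-k)N)^{2}}\cdot\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\underline{DU}_t\underline{DU}_s\,\xi}. \tag{C.5} $$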

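Now consider the individual and time dummies case. Similarly, we can derive
$$ \hat\beta_3-\beta_3 = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\underline{Treat}_i\,\underline{DU}_t\right)^{2}\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\underline{Treat}_i\,\underline{DU}_t\,u_{it} = \frac{1}{\lambda k(1-\lambda)(1-k)NT}\sum_{i=1}^{N}\underline{Treat}_i\left(\lambda\sum_{t=1}^{T}u_{it}-\sum_{t=1}^{\lambda T}u_{it}\right) = \frac{\eta}{\lambda k(1-\lambda)(1-k)NT}. $$
For the standard error matrix, it is easy to show that
$$ \hat{\bar v}_t = \sum_{i=1}^{N}\underline{Treat}_i\,\underline{DU}_t\,u_{it} = \underline{DU}_t\left(S_t^{kN}-kS_t^{N}\right), \tag{C.6} $$
and
$$ \hat{\bar\Omega} = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\hat{\bar v}_t\hat{\bar v}_s = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\underline{DU}_t\underline{DU}_s\left(S_t^{kN}-kS_t^{N}\right)\left(S_s^{kN}-kS_s^{N}\right) = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\underline{DU}_t\underline{DU}_s\,\xi. $$
Thus, it follows that
$$ s.e.(\hat\beta_3) = \sqrt{\frac{1}{T}\left(\frac{1}{T}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\underline{Treat}_i\,\underline{DU}_t\right)^{2}\right)^{-2}\hat{\bar\Omega}} = \sqrt{\frac{1}{T(\lambda k(1-\lambda)(1-k)N)^{2}}\cdot\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}K_{ts}\underline{DU}_t\underline{DU}_s\,\xi}. \tag{C.7} $$
From the above, the numerator and the denominator of the t statistic are exactly equal in these two cases.
As a result, the t statistics are exactly equivalent in these cases. By symmetry, it is easy to show that this
exact equivalence result also holds when only time period dummies are included.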
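The numerator part of the equivalence can be checked numerically. The Python sketch below assumes that the first kN units are treated, that DU_t = 1 for t > λT, and that kN and λT are integers, as the sums above require. Taking y_it = u_it (so β = 0), it computes β̂3 − β3 by least squares with individual dummies and with individual and time dummies, and compares both with η/(λk(1 − λ)(1 − k)NT). The function names are hypothetical, and the standard-error part of the argument is not reproduced.

```python
import numpy as np


def design(N, T, k, lam, time_dummies):
    """Regressors for y_it with rows ordered (i, t), t varying fastest."""
    kN, lamT = int(round(k * N)), int(round(lam * T))
    treat = (np.arange(N) < kN).astype(float)     # Treat_i = 1 for i <= kN
    du = (np.arange(T) >= lamT).astype(float)     # DU_t = 1 for t > lambda * T
    unit = np.kron(np.eye(N), np.ones((T, 1)))    # N individual dummies
    inter = np.repeat(treat, T) * np.tile(du, N)  # Treat_i * DU_t
    if time_dummies:
        time = np.tile(np.eye(T)[:, 1:], (N, 1))  # T - 1 time dummies (first dropped)
        return np.column_stack([unit, time, inter])
    return np.column_stack([unit, np.tile(du, N), inter])


def beta3_ols(u, k, lam, time_dummies):
    """OLS coefficient on Treat_i * DU_t when y_it = u_it."""
    N, T = u.shape
    X = design(N, T, k, lam, time_dummies)
    coef, *_ = np.linalg.lstsq(X, u.ravel(), rcond=None)
    return coef[-1]


def beta3_eta(u, k, lam):
    """eta / (lam k (1 - lam)(1 - k) N T), with eta as defined in this proof."""
    N, T = u.shape
    kN, lamT = int(round(k * N)), int(round(lam * T))
    eta = (lam * u[:kN].sum() - u[:kN, :lamT].sum()
           - k * lam * u.sum() + k * u[:, :lamT].sum())
    return eta / (lam * k * (1 - lam) * (1 - k) * N * T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal((20, 30))  # N = 20, T = 30, k = 0.5, lam = 0.4
    print(beta3_ols(u, 0.5, 0.4, False), beta3_ols(u, 0.5, 0.4, True), beta3_eta(u, 0.5, 0.4))
```

In both designs the coefficient reduces to the difference between the treated and control groups' post-minus-pre sample means, which is why the two numerators coincide.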
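Proof of Proposition 2.1. Note that $\sqrt T(\hat\beta-\beta)=\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}\left(T^{-1/2}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}u_{it}\right)$. Using Assumptions 2.1 and 2.2, it can be shown that
$$\begin{aligned}
T^{-1/2}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}u_{it}&=T^{-1/2}\sum_{t=1}^T\left[\tilde x_{1t},\ldots,\tilde x_{Nt}\right]u_t=T^{-1/2}\sum_{t=1}^TA\cdot DU_t\cdot u_t\\
&=A\cdot T^{-1/2}\sum_{t=1}^T\left[DU_t-\left(T^{-1}\sum_{s=1}^TDU_sf(s)'\tau_T\right)\left(T^{-1}\sum_{s=1}^T\tau_Tf(s)f(s)'\tau_T\right)^{-1}\tau_Tf(t)\right]u_t\\
&\Rightarrow A\Lambda\int_0^1\left[1(r>\lambda)-\int_\lambda^1F(s)'ds\left(\int_0^1F(s)F(s)'ds\right)^{-1}F(r)\right]dW(r)=\Lambda^*\int_0^1H^F(r,\lambda)\,dW^*(r),\\
T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'&=T^{-1}\sum_{t=1}^TA\cdot DU_tDU_t\cdot A'=G\cdot T^{-1}\sum_{t=1}^TDU_t^2\\
&=G\cdot T^{-1}\sum_{t=1}^T\left[DU_t-\left(T^{-1}\sum_{s=1}^TDU_sf(s)'\tau_T\right)\left(T^{-1}\sum_{s=1}^T\tau_Tf(s)f(s)'\tau_T\right)^{-1}\tau_Tf(t)\right]^2\\
&\Rightarrow G\int_0^1\left[1(r>\lambda)-\int_\lambda^1F(s)'ds\left(\int_0^1F(s)F(s)'ds\right)^{-1}F(r)\right]^2dr=G\int_0^1H^F(r,\lambda)^2dr.
\end{aligned}$$
Therefore,
$$\sqrt T(\hat\beta-\beta)\Rightarrow\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\cdot\Lambda^*\int_0^1H^F(r,\lambda)\,dW^*(r).$$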
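As a worked special case, the limit in Proposition 2.1 has a closed-form normalization when there is no trend. This assumes the no-trend specification corresponds to $F(r)=1$, so that $\int_\lambda^1F(s)'ds=1-\lambda$ and $\int_0^1F(s)F(s)'ds=1$. Then
$$H^F(r,\lambda)=1(r>\lambda)-(1-\lambda),\qquad\int_0^1H^F(r,\lambda)^2dr=\lambda(1-\lambda)^2+(1-\lambda)\lambda^2=\lambda(1-\lambda),$$
and Proposition 2.1 reduces to $\sqrt T(\hat\beta-\beta)\Rightarrow\left(G\lambda(1-\lambda)\right)^{-1}\Lambda^*\int_0^1\left[1(r>\lambda)-(1-\lambda)\right]dW^*(r)$. This is the same $\lambda(1-\lambda)$ factor that appears in (C.5) and (C.7).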

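Proof of Lemma 2.2. Using Assumptions 2.1 and 2.2 and Proposition 2.1, we obtain
$$\begin{aligned}
T^{-1/2}\hat{\bar S}_{[rT]}&=T^{-1/2}\sum_{t=1}^{[rT]}\hat{\bar v}_t=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\hat u_{it}=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\left[\tilde u_{it}-\tilde x_{it}'(\hat\beta-\beta)\right]\\
&=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}u_{it}-T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\ddot u_{it}-T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\tilde x_{it}'\sqrt T(\hat\beta-\beta)\\
&=A\cdot T^{-1/2}\sum_{t=1}^{[rT]}DU_tu_t-A\cdot T^{-1/2}\sum_{s=1}^Tu_sf(s)'\tau_T\cdot\left(\frac1T\sum_{s=1}^T\tau_Tf(s)f(s)'\tau_T\right)^{-1}\cdot\frac1T\sum_{t=1}^{[rT]}\tau_Tf(t)DU_t-G\cdot T^{-1}\sum_{t=1}^{[rT]}DU_t^2\sqrt T(\hat\beta-\beta)\\
&\Rightarrow\Lambda^*\left[\int_0^rH^F(s,\lambda)\,dW^*(s)-\int_0^1dW(s)F(s)'\left(\int_0^1F(s)F(s)'ds\right)^{-1}\int_0^rF(s)H^F(s,\lambda)\,ds-\int_0^rH^F(s,\lambda)^2ds\left(\int_0^1H^F(s,\lambda)^2ds\right)^{-1}N^F(W^*)\right]\\
&=\Lambda^*Q^F(r,\lambda,W^*)
\end{aligned}$$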

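because
$$\begin{aligned}
T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\ddot u_{it}&=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\left(\sum_{s=1}^Tu_{is}f(s)'\right)\left(\sum_{s=1}^Tf(s)f(s)'\right)^{-1}f(t)\\
&=T^{-1/2}\sum_{t=1}^{[rT]}\left(\sum_{s=1}^T\sum_{i=1}^N\tilde x_{it}u_{is}f(s)'\right)\left(\sum_{s=1}^Tf(s)f(s)'\right)^{-1}f(t)\\
&=T^{-1/2}\sum_{t=1}^{[rT]}\left(\sum_{s=1}^TA\,DU_tu_sf(s)'\right)\left(\sum_{s=1}^Tf(s)f(s)'\right)^{-1}f(t)\\
&=A\cdot T^{-1/2}\sum_{s=1}^Tu_sf(s)'\tau_T\cdot\left(\frac1T\sum_{s=1}^T\tau_Tf(s)f(s)'\tau_T\right)^{-1}\frac1T\sum_{t=1}^{[rT]}\tau_Tf(t)DU_t.
\end{aligned}$$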

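Proof of Proposition 2.3. It follows directly from (2.7), Lemma 2.2 and the continuous mapping theorem that
$$\begin{aligned}
\hat{\bar\Omega}&=\frac2bT^{-1}\sum_{t=1}^{T-1}T^{-1/2}\hat{\bar S}_t\cdot T^{-1/2}\hat{\bar S}_t'-\frac1bT^{-1}\sum_{t=1}^{T-M-1}\left(T^{-1/2}\hat{\bar S}_t\cdot T^{-1/2}\hat{\bar S}_{t+M}'+T^{-1/2}\hat{\bar S}_{t+M}\cdot T^{-1/2}\hat{\bar S}_t'\right)\\
&\Rightarrow\frac2b\int_0^1\Lambda^*Q^F(r,\lambda,W^*)Q^F(r,\lambda,W^*)'\Lambda^{*\prime}dr-\frac1b\int_0^{1-b}\Lambda^*\left[Q^F(r,\lambda,W^*)Q^F(r+b,\lambda,W^*)'+Q^F(r+b,\lambda,W^*)Q^F(r,\lambda,W^*)'\right]\Lambda^{*\prime}dr\\
&=\Lambda^*P^F(b,\lambda,Q^F)\Lambda^{*\prime}
\end{aligned}$$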
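The first line of this proof rewrites the Bartlett-kernel estimator in terms of partial sums, and the continuous mapping theorem is then applied to that representation. A minimal sketch of the identity for a scalar score series follows. It assumes $K_{ts}=\max(0,1-|t-s|/M)$ with integer $M=bT$, and that the scores sum to zero ($\hat{\bar S}_T=0$, as for OLS residual-based scores). Under those assumptions the kernel form $T^{-1}\sum_t\sum_sK_{ts}v_tv_s$ equals the partial-sum form exactly. The function names are hypothetical.

```python
import random


def bartlett_weight(t, s, M):
    """Bartlett kernel K_ts = max(0, 1 - |t - s| / M)."""
    return max(0.0, 1.0 - abs(t - s) / M)


def omega_kernel_form(v, M):
    """T^{-1} sum_t sum_s K_ts v_t v_s for a scalar score series v_1, ..., v_T."""
    T = len(v)
    total = sum(bartlett_weight(t, s, M) * v[t] * v[s]
                for t in range(T) for s in range(T))
    return total / T


def omega_partial_sum_form(v, M):
    """(2/b) T^{-1} sum_{t=1}^{T-1} (T^{-1/2} S_t)^2
       - (1/b) T^{-1} sum_{t=1}^{T-M-1} 2 (T^{-1/2} S_t)(T^{-1/2} S_{t+M}),  b = M/T."""
    T = len(v)
    b = M / T
    S, acc = [], 0.0
    for x in v:
        acc += x
        S.append(acc)  # S[k] holds the partial sum S_{k+1}
    own = sum(S[k] ** 2 for k in range(T - 1)) / T
    cross = sum(2.0 * S[k] * S[k + M] for k in range(T - M - 1)) / T
    return (2.0 / b) * own / T - (1.0 / b) * cross / T


def demeaned_gaussian_scores(T, seed=0):
    """Toy scores that sum to zero, mimicking residual-based v_t (so S_T = 0)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(T)]
    mean = sum(v) / T
    return [x - mean for x in v]
```

The identity follows from summation by parts. For the Bartlett kernel, the second difference $K_{ts}-K_{t+1,s}-K_{t,s+1}+K_{t+1,s+1}$ equals $2/M$ when $t=s$, $-1/M$ when $|t-s|=M$, and zero otherwise. The $O(T^2)$ kernel form is what an implementation would evaluate directly. The partial-sum form is the one whose fixed-$b$ limit is taken above.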
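Proof of Theorem 2.1. Using Proposition 2.3, it follows directly that
$$\begin{aligned}
&R\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}\hat{\bar\Omega}\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}R'\\
&\Rightarrow R\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\Lambda^*P^F(b,\lambda,Q^F)\Lambda^{*\prime}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}R'\\
&=P^F\left(b,\lambda,Q^F\left(r,\lambda,R\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\Lambda^*W^*\right)\right)\\
&=\Lambda_q^{**}P^F\left(b,\lambda,Q^F(r,\lambda,W_q^{**})\right)\Lambda_q^{**\prime}=\Lambda_q^{**}P^F(b,\lambda,Q_q^{F**})\Lambda_q^{**\prime}
\end{aligned}\tag{C.8}$$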

Using Proposition 2.1, we have
$$R\sqrt T(\hat\beta-\beta)\Rightarrow R\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\cdot\Lambda^*\int_0^1H^F(r,\lambda)\,dW^*(r)=\Lambda_q^{**}\int_0^1H^F(r,\lambda)\,dW_q^{**}(r)\tag{C.9}$$

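With (C.8) and (C.9), it follows that
$$\begin{aligned}
Wald&=(R\hat\beta-r)'\left[R\hat VR'\right]^{-1}(R\hat\beta-r)\\
&=\left(R\sqrt T(\hat\beta-\beta)\right)'\left[R\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}\hat{\bar\Omega}\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}R'\right]^{-1}R\sqrt T(\hat\beta-\beta)\\
&\Rightarrow\left(\Lambda_q^{**}\int_0^1H^F(r,\lambda)\,dW_q^{**}(r)\right)'\left[\Lambda_q^{**}P^F(b,\lambda,Q_q^{F**})\Lambda_q^{**\prime}\right]^{-1}\Lambda_q^{**}\int_0^1H^F(r,\lambda)\,dW_q^{**}(r)\\
&=N^F(W_q^{**})'P^F(b,\lambda,Q_q^{F**})^{-1}N^F(W_q^{**}).
\end{aligned}$$
When $q=1$, it follows directly that
$$t\Rightarrow\frac{N^F(W_1^{**})}{\sqrt{P^F(b,\lambda,Q_1^{F**})}}.$$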
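The limit of the t statistic is nonstandard and has no closed form, so critical values such as those tabulated in Appendix D have to be obtained by simulation. The following is a rough Monte Carlo sketch for the no-trend case. It rests on several assumptions. First, $F(r)=1$. Second, errors are iid $N(0,1)$. Third, the DD t statistic then has the same fixed-$b$ limit as the t statistic on $DU_t$ in a time-series regression of $y_t$ on an intercept and $DU_t$, computed with a Bartlett kernel and $M=bT$ (rounded). The sample size and number of replications are kept small for speed, so the output only approximates the tabulated values. All function names are hypothetical.

```python
import math
import random


def bartlett_lrv_from_partial_sums(S, M):
    """sum_t sum_s K_ts v_t v_s for Bartlett weights, written with partial sums (S_T = 0)."""
    T = len(S)
    own = sum(S[k] * S[k] for k in range(T - 1))            # t = 1, ..., T-1
    cross = sum(S[k] * S[k + M] for k in range(T - M - 1))  # t = 1, ..., T-M-1
    return (2.0 / M) * own - (2.0 / M) * cross


def simulate_t_stats(lam, b, T=200, reps=2000, seed=12345):
    """Sorted null draws of the fixed-b t statistic on DU_t (intercept, no trend)."""
    rng = random.Random(seed)
    du = [1.0 if t > lam * T else 0.0 for t in range(1, T + 1)]  # DU_t = 1(t > lambda*T)
    dbar = sum(du) / T
    x = [d - dbar for d in du]        # DU_t with the intercept partialled out
    sxx = sum(xi * xi for xi in x)
    M = max(1, int(round(b * T)))     # Bartlett bandwidth M = bT
    draws = []
    for _ in range(reps):
        u = [rng.gauss(0.0, 1.0) for _ in range(T)]  # y_t = u_t under the null
        ubar = sum(u) / T
        beta = sum(xi * ui for xi, ui in zip(x, u)) / sxx
        S, acc = [], 0.0
        for xi, ui in zip(x, u):
            acc += xi * (ui - ubar - xi * beta)  # score: regressor times OLS residual
            S.append(acc)
        se = math.sqrt(bartlett_lrv_from_partial_sums(S, M)) / sxx
        draws.append(beta / se)
    draws.sort()
    return draws


def upper_quantile(sorted_draws, p):
    """Empirical p-quantile of already sorted draws."""
    idx = min(len(sorted_draws) - 1, int(p * len(sorted_draws)))
    return sorted_draws[idx]
```

For example, `upper_quantile(simulate_t_stats(0.5, 0.1), 0.90)` can be compared with the entry 1.506 for $\lambda=0.5$, $b=0.1$ in Table D.1. Larger `T` and `reps` tighten the approximation at the cost of run time.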

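Proof of Lemma 2.4. Let $\widehat{DU}_t=\left(\sum_{s=1}^TDU_sf(s)'\right)\left(\sum_{s=1}^Tf(s)f(s)'\right)^{-1}f(t)$ denote the fitted trend component of $DU_t$, and let $\widetilde{DU}_t=DU_t-\widehat{DU}_t$. Then
$$T^{-1}\sum_{t=1}^T\widehat{DU}_t\tilde z_{it}=T^{-1}\left(\sum_{s=\lambda T+1}^Tf(s)'\right)\left(\sum_{s=1}^Tf(s)f(s)'\right)^{-1}\sum_{t=1}^Tf(t)\tilde z_{it}=0\tag{C.10}$$
using the fact that $\sum_{t=1}^Tf(t)\tilde z_{it}=0$. Hence, $T^{-1}\sum_{t=1}^{[rT]}\widehat{DU}_t\tilde z_{it}=o_p(1)$. If $r>\lambda$, then
$$\begin{aligned}
T^{-1}\sum_{t=1}^{[rT]}DU_t\tilde z_{it}&=T^{-1}\sum_{t=\lambda T+1}^{[rT]}\tilde z_{it}\\
&=T^{-1}\sum_{t=\lambda T+1}^{[rT]}z_{it}-T^{-1}\sum_{t=\lambda T+1}^{[rT]}f(t)'\tau_T\left(T^{-1}\sum_{s=1}^T\tau_Tf(s)f(s)'\tau_T\right)^{-1}\cdot T^{-1}\sum_{s=1}^T\tau_Tf(s)z_{is}\\
&\xrightarrow{p}(r-\lambda)\mu_i-\int_\lambda^rF(s)'ds\left(\int_0^1F(s)F(s)'ds\right)^{-1}\int_0^1F(s)\,ds\,\mu_i=(r-\lambda)\mu_i-\int_\lambda^rF(s)'ds\,(\mu_i,0,\ldots,0)'\\
&=(r-\lambda)(\mu_i-\mu_i)=0.
\end{aligned}\tag{C.11}$$
If $r\le\lambda$, then
$$T^{-1}\sum_{t=1}^{[rT]}DU_t\tilde z_{it}=0.\tag{C.12}$$
From (C.10), (C.11) and (C.12), it follows directly that
$$T^{-1}\sum_{t=1}^{[rT]}\widetilde{DU}_t\tilde z_{it}=T^{-1}\sum_{t=1}^{[rT]}\left(DU_t-\widehat{DU}_t\right)\tilde z_{it}\xrightarrow{p}0\tag{C.13}$$
and thus
$$T^{-1}\sum_{i=1}^N\sum_{t=1}^{[rT]}\tilde h_{it}\tilde z_{it}'=\sum_{i=1}^N\begin{pmatrix}1\\ Treat_i\end{pmatrix}T^{-1}\sum_{t=1}^{[rT]}\widetilde{DU}_t\tilde z_{it}'\xrightarrow{p}0.$$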
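The step in (C.11) that replaces $\left(\int_0^1F(s)F(s)'ds\right)^{-1}\int_0^1F(s)\,ds$ by $(1,0,\ldots,0)'$ relies only on the first element of $F(s)$ being 1. Then $\int_0^1F(s)F(s)'ds\,e_1=\int_0^1F(s)\,ds$. As an illustration, assume a linear trend $F(s)=(1,s)'$:
$$\int_0^1F(s)F(s)'ds=\begin{pmatrix}1&1/2\\1/2&1/3\end{pmatrix},\qquad\begin{pmatrix}1&1/2\\1/2&1/3\end{pmatrix}^{-1}=\begin{pmatrix}4&-6\\-6&12\end{pmatrix},\qquad\begin{pmatrix}4&-6\\-6&12\end{pmatrix}\begin{pmatrix}1\\1/2\end{pmatrix}=\begin{pmatrix}1\\0\end{pmatrix},$$
so that $\int_\lambda^rF(s)'ds\,(1,0)'\mu_i=(r-\lambda)\mu_i$, which is exactly the term that cancels in (C.11).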

Proof of Theorem 2.3. The $K\times1$ vector $\tilde z_{it}u_{it}$ can be written in terms of the $N(K+1)\times1$ vector $v_t$ as follows:
$$\begin{aligned}
\tilde z_{it}u_{it}&=(z_{it}-\hat b_i'f(t))u_{it}=\left((z_{it}-b_i'f(t))-(\hat b_i'f(t)-b_i'f(t))\right)u_{it}=v_t^{ii}-(\hat b_i-b_i)'f(t)u_{it}\\
&=A_iv_t-\left(\tau_T^{-1}(\hat b_i-b_i)\right)'\tau_Tf(t)u_{it}.
\end{aligned}$$

Using this formula, it is easy to show that
$$\begin{aligned}
T^{-1/2}\sum_{t=1}^{[rT]}\tilde z_{it}u_{it}&=T^{-1/2}\sum_{t=1}^{[rT]}\left(A_iv_t-\left(\tau_T^{-1}(\hat b_i-b_i)\right)'\tau_Tf(t)u_{it}\right)\\
&=A_iT^{-1/2}\sum_{t=1}^{[rT]}v_t-T^{-1/2}\left(\sqrt T\tau_T^{-1}(\hat b_i-b_i)\right)'\cdot T^{-1/2}\sum_{t=1}^{[rT]}\tau_Tf(t)u_{it}\\
&=A_iT^{-1/2}\sum_{t=1}^{[rT]}v_t+T^{-1/2}O_p(1)\cdot O_p(1)=A_iT^{-1/2}\sum_{t=1}^{[rT]}v_t+o_p(1)\\
&\Rightarrow A_i\dot\Lambda W(r)
\end{aligned}\tag{C.14}$$

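using Assumptions 2.1 and 2.4. With Assumptions 2.1, 2.3 and 2.4, Lemma 2.4 and (C.14), simple algebra gives
$$\begin{aligned}
\sqrt T(\hat\beta-\beta)&=\left(\sum_{i=1}^NT^{-1}\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}\left(\sum_{i=1}^NT^{-1/2}\sum_{t=1}^T\tilde x_{it}u_{it}\right)\\
&=\begin{pmatrix}\sum_{i=1}^NT^{-1}\sum_{t=1}^T\tilde h_{it}\tilde h_{it}'&\sum_{i=1}^NT^{-1}\sum_{t=1}^T\tilde h_{it}\tilde z_{it}'\\ \sum_{i=1}^NT^{-1}\sum_{t=1}^T\tilde z_{it}\tilde h_{it}'&\sum_{i=1}^NT^{-1}\sum_{t=1}^T\tilde z_{it}\tilde z_{it}'\end{pmatrix}^{-1}\begin{pmatrix}\sum_{i=1}^NT^{-1/2}\sum_{t=1}^T\tilde h_{it}u_{it}\\ \sum_{i=1}^NT^{-1/2}\sum_{t=1}^T\tilde z_{it}u_{it}\end{pmatrix}\\
&\Rightarrow\begin{pmatrix}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}&0\\0&\bar Q^{-1}\end{pmatrix}\begin{pmatrix}(\tilde A\otimes e_1)\dot\Lambda\int_0^1\left[1(r>\lambda)-\int_\lambda^1F(s)'ds\left(\int_0^1F(s)F(s)'ds\right)^{-1}F(r)\right]dW(r)\\ \left(\sum_{i=1}^NA_i\right)\dot\Lambda W(1)\end{pmatrix}\\
&=\begin{pmatrix}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*\int_0^1H^F(r,\lambda)\,dW^*(r)\\ \bar Q^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda W(1)\end{pmatrix}
\end{aligned}$$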

Let
$$\dot\Lambda^{**}=\begin{pmatrix}\dot\Lambda^*&0\\0&\left(\sum_{i=1}^NA_i\right)\dot\Lambda\end{pmatrix},$$
which is a $(K+2)\times(K+2)$ block diagonal matrix. Using the fact that $\sum_{t=1}^T\tilde z_{it}f(t)'=0$, it follows that
[rT ] N

∑ ∑

t=1 i=1

N [rT ]
zit uit =
˜ ˆ

∑ ∑

T
zit
˜

∑

T
uis f(s)

∑ f(s)f(s)

−1

f(t)

i=1 t=1 s=1
s=1
[rT ]
T
N
−1 T
f(s)f(s)
˜
= ∑ ∑ zit f(t)
∑
∑ uisf(s)
s=1
s=1
i=1 t=1
N
T
T
−1
p
= ∑ o p (1) · ∑ f(s)f(s)
→
∑ uisf(s) − 0
i=1
s=1
s=1

(C.15)

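The limits of the partial sums $\hat{\bar S}_{[rT]}$ are easy to obtain:
$$\begin{aligned}
T^{-1/2}\hat{\bar S}_{[rT]}&=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\tilde u_{it}-\left(T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\tilde x_{it}'\right)\sqrt T(\hat\beta-\beta)\\
&=\begin{pmatrix}T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde h_{it}(u_{it}-\hat u_{it})\\ T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}(u_{it}-\hat u_{it})\end{pmatrix}-\begin{pmatrix}T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde h_{it}\tilde h_{it}'&T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde h_{it}\tilde z_{it}'\\ T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}\tilde h_{it}'&T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}\tilde z_{it}'\end{pmatrix}\sqrt T(\hat\beta-\beta)\\
&\Rightarrow\begin{pmatrix}\dot\Lambda^*\left[\int_0^rH^F(s,\lambda)\,dW^*(s)-\int_0^1dW(s)F(s)'\left(\int_0^1F(s)F(s)'ds\right)^{-1}\int_0^rF(s)H^F(s,\lambda)\,ds\right]\\ \left(\sum_{i=1}^NA_i\right)\dot\Lambda W(r)\end{pmatrix}\\
&\quad-\begin{pmatrix}G\int_0^rH^F(s,\lambda)^2ds&0\\0&r\bar Q\end{pmatrix}\begin{pmatrix}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*\int_0^1H^F(r,\lambda)\,dW^*(r)\\ \bar Q^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda W(1)\end{pmatrix}\\
&=\begin{pmatrix}\dot\Lambda^*Q^F(r,\lambda,W^*)\\ \left(\sum_{i=1}^NA_i\right)\dot\Lambda B(r)\end{pmatrix}=\dot\Lambda^{**}\begin{pmatrix}Q^F(r,\lambda,W^*)\\ B(r)\end{pmatrix}
\end{aligned}$$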

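The limit of $\hat{\bar\Omega}$ can be written as
$$\begin{aligned}
\hat{\bar\Omega}&=\frac2bT^{-1}\sum_{t=1}^{T-1}T^{-1/2}\hat{\bar S}_t\cdot T^{-1/2}\hat{\bar S}_t'-\frac1bT^{-1}\sum_{t=1}^{T-M-1}\left(T^{-1/2}\hat{\bar S}_t\cdot T^{-1/2}\hat{\bar S}_{t+M}'+T^{-1/2}\hat{\bar S}_{t+M}\cdot T^{-1/2}\hat{\bar S}_t'\right)\\
&\Rightarrow\frac2b\int_0^1\dot\Lambda^{**}\begin{pmatrix}Q^F(r,\lambda,W^*)\\ B(r)\end{pmatrix}\begin{pmatrix}Q^F(r,\lambda,W^*)\\ B(r)\end{pmatrix}'\dot\Lambda^{**\prime}dr\\
&\quad-\frac1b\int_0^{1-b}\dot\Lambda^{**}\left[\begin{pmatrix}Q^F(r,\lambda,W^*)\\ B(r)\end{pmatrix}\begin{pmatrix}Q^F(r+b,\lambda,W^*)\\ B(r+b)\end{pmatrix}'+\begin{pmatrix}Q^F(r+b,\lambda,W^*)\\ B(r+b)\end{pmatrix}\begin{pmatrix}Q^F(r,\lambda,W^*)\\ B(r)\end{pmatrix}'\right]\dot\Lambda^{**\prime}dr\\
&=\dot\Lambda^{**}\begin{pmatrix}P^F(b,\lambda,Q^F)&P_{12}(b,\lambda,Q^F,B)\\ P_{21}(b,\lambda,Q^F,B)&P(b,B)\end{pmatrix}\dot\Lambda^{**\prime}
\end{aligned}$$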

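$$\begin{aligned}
&R\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}\hat{\bar\Omega}\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}R'\\
&\Rightarrow\begin{pmatrix}R_{11}&R_{12}\\R_{21}&R_{22}\end{pmatrix}\begin{pmatrix}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}&0\\0&\bar Q^{-1}\end{pmatrix}\dot\Lambda^{**}\begin{pmatrix}P^F(b,\lambda,Q^F)&P_{12}(b,\lambda,Q^F,B)\\P_{21}(b,\lambda,Q^F,B)&P(b,B)\end{pmatrix}\dot\Lambda^{**\prime}\begin{pmatrix}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}&0\\0&\bar Q^{-1}\end{pmatrix}\begin{pmatrix}R_{11}&R_{12}\\R_{21}&R_{22}\end{pmatrix}'\\
&=\begin{pmatrix}R_{11}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*&R_{12}\bar Q^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda\\R_{21}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*&R_{22}\bar Q^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda\end{pmatrix}\begin{pmatrix}P^F(b,\lambda,Q^F)&P_{12}(b,\lambda,Q^F,B)\\P_{21}(b,\lambda,Q^F,B)&P(b,B)\end{pmatrix}\\
&\qquad\cdot\begin{pmatrix}R_{11}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*&R_{12}\bar Q^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda\\R_{21}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*&R_{22}\bar Q^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda\end{pmatrix}'
\end{aligned}\tag{C.16}$$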
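$$R\sqrt T(\hat\beta-\beta)\Rightarrow\begin{pmatrix}R_{11}&R_{12}\\R_{21}&R_{22}\end{pmatrix}\begin{pmatrix}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*\int_0^1H^F(r,\lambda)\,dW^*(r)\\ \bar Q^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda W(1)\end{pmatrix}\tag{C.17}$$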

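If $q_2=0$ and $R_{12}=0$, that is, if we are testing restrictions on the DD estimator only, then $R=[R_{11},0]$ and the limits in (C.16) and (C.17) simplify as follows:
$$R\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}\hat{\bar\Omega}\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}R'\Rightarrow R_{11}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*P^F(b,\lambda,Q^F)\dot\Lambda^{*\prime}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}R_{11}'=\bar\Lambda_1P^F(b,\lambda,Q^F)\bar\Lambda_1'$$
and
$$R\sqrt T(\hat\beta-\beta)\Rightarrow R_{11}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*\int_0^1H^F(r,\lambda)\,dW^*(r)=\bar\Lambda_1\int_0^1H^F(r,\lambda)\,d\bar W(r),$$
where $\bar W(r)$ is a $q_1\times1$ vector of standard Wiener processes and $\bar\Lambda_1$ is the matrix square root of
$$R_{11}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\dot\Lambda^*\dot\Lambda^{*\prime}\left(G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}R_{11}'.$$
It follows directly that
$$\begin{aligned}
Wald&\Rightarrow\left(\bar\Lambda_1\int_0^1H^F(r,\lambda)\,d\bar W(r)\right)'\left(\bar\Lambda_1P^F(b,\lambda,Q^F)\bar\Lambda_1'\right)^{-1}\bar\Lambda_1\int_0^1H^F(r,\lambda)\,d\bar W(r)\\
&=\left(\int_0^1H^F(r,\lambda)\,d\bar W(r)\right)'\left(P^F(b,\lambda,Q^F)\right)^{-1}\int_0^1H^F(r,\lambda)\,d\bar W(r).
\end{aligned}$$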

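If $q_1=0$ and $R_{21}=0$, that is, if we are testing restrictions on the additional regressors only, then $R=[0,R_{22}]$ and the limits in (C.16) and (C.17) simplify as follows:
$$R\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}\hat{\bar\Omega}\left(T^{-1}\sum_{i=1}^N\sum_{t=1}^T\tilde x_{it}\tilde x_{it}'\right)^{-1}R'\Rightarrow R_{22}\left(\sum_{i=1}^NQ_i\right)^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda P(b,B)\dot\Lambda'\left(\sum_{i=1}^NA_i\right)'\left(\sum_{i=1}^NQ_i\right)^{-1}R_{22}'=\bar\Lambda_2P(b,B)\bar\Lambda_2'$$
$$R\sqrt T(\hat\beta-\beta)\Rightarrow R_{22}\left(\sum_{i=1}^NQ_i\right)^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda W(1)=\bar\Lambda_2W_q(1)$$
where $W_q(1)$ is a $q_2\times1$ vector of standard Wiener processes and $\bar\Lambda_2$ is the matrix square root of
$$R_{22}\left(\sum_{i=1}^NQ_i\right)^{-1}\left(\sum_{i=1}^NA_i\right)\dot\Lambda\dot\Lambda'\left(\sum_{i=1}^NA_i\right)'\left(\sum_{i=1}^NQ_i\right)^{-1}R_{22}'.$$
It follows directly that
$$Wald\Rightarrow\left(\bar\Lambda_2W_q(1)\right)'\left(\bar\Lambda_2P(b,B)\bar\Lambda_2'\right)^{-1}\bar\Lambda_2W_q(1)=W_q(1)'P_q(b,B)^{-1}W_q(1).$$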
Proof of Theorem 2.4. The key step is to show that the limits of $\sqrt T(\hat\beta-\beta)$ and $T^{-1/2}\hat{\bar S}_{[rT]}$ take the same form as in Theorem 2.3. Once these results are obtained, the rest of the proof closely follows the proof of Theorem 2.3, and details are omitted. With both trend functions and time-period dummies in the model, it follows that
$$\begin{aligned}
\tilde z_{it}u_{it}&=(z_{it}-\hat b_i'f(t))u_{it}-N^{-1}\sum_{j=1}^N(z_{jt}-\hat b_j'f(t))u_{it}\\
&=\left((z_{it}-b_i'f(t))-(\hat b_i'f(t)-b_i'f(t))\right)u_{it}-N^{-1}\sum_{j=1}^N\left((z_{jt}-b_j'f(t))-(\hat b_j'f(t)-b_j'f(t))\right)u_{it}\\
&=(z_{it}-b_i'f(t))u_{it}-N^{-1}\sum_{j=1}^N(z_{jt}-b_j'f(t))u_{it}-(\hat b_i'f(t)-b_i'f(t))u_{it}+N^{-1}\sum_{j=1}^N(\hat b_j'f(t)-b_j'f(t))u_{it}\\
&=v_t^{ii}-N^{-1}\sum_{j=1}^Nv_t^{ji}-(\hat b_i-b_i)'f(t)u_{it}+N^{-1}\sum_{j=1}^N(\hat b_j-b_j)'f(t)u_{it}\\
&=\left([0,e_i\otimes I_K]-\frac1N[0,\iota\otimes I_K]\right)(e_i\otimes I_{N(K+1)})'v_t^{ex}-(\hat b_i-b_i)'f(t)u_{it}+N^{-1}\sum_{j=1}^N(\hat b_j-b_j)'f(t)u_{it}\\
&=A_i^{ex}v_t^{ex}-\left(\tau_T^{-1}(\hat b_i-b_i)\right)'\tau_Tf(t)u_{it}+N^{-1}\sum_{j=1}^N\left(\tau_T^{-1}(\hat b_j-b_j)\right)'\tau_Tf(t)u_{it}
\end{aligned}$$

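Using this formula, it follows directly that
$$\begin{aligned}
T^{-1/2}\sum_{t=1}^{[rT]}\tilde z_{it}u_{it}&=A_i^{ex}T^{-1/2}\sum_{t=1}^{[rT]}v_t^{ex}-T^{-1/2}\left(\sqrt T\tau_T^{-1}(\hat b_i-b_i)\right)'\cdot T^{-1/2}\sum_{t=1}^{[rT]}\tau_Tf(t)u_{it}\\
&\quad+N^{-1}\sum_{j=1}^NT^{-1/2}\left(\sqrt T\tau_T^{-1}(\hat b_j-b_j)\right)'\cdot T^{-1/2}\sum_{t=1}^{[rT]}\tau_Tf(t)u_{it}\\
&=A_i^{ex}T^{-1/2}\sum_{t=1}^{[rT]}v_t^{ex}+o_p(1)\Rightarrow A_i^{ex}\Lambda^{ex}W^{ex}(r)
\end{aligned}$$
using Assumptions 2.1 and 2.5. Using (C.13), we have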
$$\begin{aligned}
T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^NTreat_iDU_t\cdot\tilde z_{it}&=T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^NTreat_iDU_t\left[z_{it}-\hat z_{it}-\frac1N\sum_{j=1}^N(z_{jt}-\hat z_{jt})\right]\\
&=T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^NTreat_iDU_t(z_{it}-\hat z_{it})-T^{-1}\sum_{t=1}^{[rT]}\frac1N\sum_{j=1}^N(z_{jt}-\hat z_{jt})\cdot\sum_{i=1}^NTreat_iDU_t\\
&=T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^NTreat_iDU_t(z_{it}-\hat z_{it})-T^{-1}\sum_{t=1}^{[rT]}\frac1N\sum_{j=1}^N(z_{jt}-\hat z_{jt})\cdot0\\
&=\sum_{i=1}^NTreat_i\,T^{-1}\sum_{t=1}^{[rT]}DU_t(z_{it}-\hat z_{it})\xrightarrow{p}0
\end{aligned}\tag{C.18}$$
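Using Assumptions 2.1 and 2.3 and (C.18), it follows immediately that
$$\begin{aligned}
\sqrt T(\hat\beta-\beta)&\Rightarrow\begin{pmatrix}\left(\tilde G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}&0\\0&\bar Q^{-1}\end{pmatrix}\begin{pmatrix}(\tilde A\otimes\bar e_1)\Lambda^{ex}\int_0^1\left[1(r>\lambda)-F(r)'\left(\int_0^1F(s)F(s)'ds\right)^{-1}\int_\lambda^1F(s)\,ds\right]dW^{ex}(r)\\ \left(\sum_{i=1}^NA_i^{ex}\right)\Lambda^{ex}W^{ex}(1)\end{pmatrix}\\
&=\begin{pmatrix}\left(\tilde G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\Lambda^{ex*}\int_0^1H^F(r,\lambda)\,dW^{ex*}(r)\\ \bar Q^{-1}\left(\sum_{i=1}^NA_i^{ex}\right)\Lambda^{ex}W^{ex}(1)\end{pmatrix}
\end{aligned}$$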

[rT ] N
−1 ˆ
¯
2S
is given next. From (C.15) we know ∑ ∑ (zit − zit )uit = o p (1).
ˆ ˆ
[rT ]
t=1 i=1
Similarly, it can be shown that

The result for T

T
N [rT ]
−1 T
f(s)f(s)
ˆ
(z jt − z jt )uit = ∑ ∑ (z jt − z jt )f(t)
ˆ ˆ
∑
∑ uisf(s)
∑ ∑
s=1
s=1
t=1 j=1
j=1 t=1
N
T
−1 T
p
= ∑ o p (1) · ∑ f(s)f(s)
→
∑ uisf(s) − 0
j=1
s=1
s=1

[rT ] N

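Direct calculation gives
$$\begin{aligned}
T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}\tilde u_{it}&=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}\left(u_{it}-\hat u_{it}-\frac1N\sum_{j=1}^N(u_{jt}-\hat u_{jt})\right)=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}(u_{it}-\hat u_{it})\\
&=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}u_{it}-T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\left(z_{it}-\hat z_{it}-\frac1N\sum_{j=1}^N(z_{jt}-\hat z_{jt})\right)\hat u_{it}\\
&=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}u_{it}-T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N(z_{it}-\hat z_{it})\hat u_{it}+T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\frac1N\sum_{j=1}^N(z_{jt}-\hat z_{jt})\hat u_{it}\\
&=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}u_{it}+o_p(1)
\end{aligned}$$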

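Therefore,
$$\begin{aligned}
T^{-1/2}\hat{\bar S}_{[rT]}&=T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\tilde u_{it}-\left(T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde x_{it}\tilde x_{it}'\right)\sqrt T(\hat\beta-\beta)\\
&=\begin{pmatrix}T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^NTreat_iDU_t\tilde u_{it}\\ T^{-1/2}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}u_{it}+o_p(1)\end{pmatrix}-\begin{pmatrix}T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N(Treat_iDU_t)^2&T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^NTreat_iDU_t\tilde z_{it}'\\ T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}Treat_iDU_t&T^{-1}\sum_{t=1}^{[rT]}\sum_{i=1}^N\tilde z_{it}\tilde z_{it}'\end{pmatrix}\sqrt T(\hat\beta-\beta)\\
&\Rightarrow\begin{pmatrix}\Lambda^{ex*}\left[\int_0^rH^F(s,\lambda)\,dW^{ex*}(s)-\int_0^1dW(s)F(s)'\left(\int_0^1F(s)F(s)'ds\right)^{-1}\int_0^rF(s)H^F(s,\lambda)\,ds\right]\\ \left(\sum_{i=1}^NA_i^{ex}\right)\Lambda^{ex}W^{ex}(r)\end{pmatrix}\\
&\quad-\begin{pmatrix}\tilde G\int_0^rH^F(s,\lambda)^2ds&0\\0&r\bar Q\end{pmatrix}\begin{pmatrix}\left(\tilde G\int_0^1H^F(r,\lambda)^2dr\right)^{-1}\Lambda^{ex*}\int_0^1H^F(s,\lambda)\,dW^{ex*}(s)\\ \bar Q^{-1}\left(\sum_{i=1}^NA_i^{ex}\right)\Lambda^{ex}W^{ex}(1)\end{pmatrix}\\
&=\begin{pmatrix}\Lambda^{ex*}Q^F(r,\lambda,W^{ex*})\\ \left(\sum_{i=1}^NA_i^{ex}\right)\Lambda^{ex}B^{ex}(r)\end{pmatrix}=\Lambda^{ex**}\begin{pmatrix}Q^F(r,\lambda,W^{ex*})\\ B^{ex}(r)\end{pmatrix}
\end{aligned}$$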

Appendix D
TABLES IN CHAPTER 2

Table D.1: 90% Asymptotic Critical Values for tDD (Bartlett Kernel) Without Trend.
b = 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2

λ = 0.1 1.506 1.728 1.953 2.148 2.325 2.485 2.624 2.744 2.864 2.975
0.2 1.380 1.476 1.571 1.663 1.752 1.843 1.940 2.024 2.107 2.185
0.3 1.335 1.390 1.449 1.506 1.569 1.629 1.689 1.751 1.808 1.873
0.4 1.322 1.360 1.409 1.454 1.499 1.545 1.594 1.645 1.699 1.747
0.5 1.325 1.370 1.415 1.458 1.506 1.547 1.599 1.647 1.697 1.750
0.6 1.326 1.374 1.411 1.457 1.501 1.556 1.606 1.658 1.712 1.768
0.7 1.342 1.402 1.463 1.526 1.586 1.649 1.714 1.774 1.838 1.899
0.8 1.377 1.469 1.570 1.663 1.753 1.845 1.932 2.022 2.107 2.186
0.9 1.505 1.732 1.953 2.143 2.318 2.472 2.611 2.745 2.862 2.970
b = 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4

λ = 0.1 3.076 3.174 3.269 3.357 3.442 3.529 3.605 3.686 3.768 3.847
0.2 2.267 2.345 2.416 2.484 2.554 2.621 2.684 2.751 2.814 2.881
0.3 1.938 2.001 2.064 2.131 2.197 2.253 2.313 2.369 2.420 2.476
0.4 1.805 1.862 1.922 1.978 2.036 2.094 2.147 2.200 2.257 2.313
0.5 1.801 1.857 1.916 1.971 2.026 2.086 2.141 2.194 2.247 2.301
0.6 1.822 1.879 1.934 1.990 2.045 2.105 2.158 2.214 2.272 2.329
0.7 1.962 2.025 2.089 2.155 2.218 2.281 2.338 2.394 2.450 2.505
0.8 2.261 2.337 2.403 2.473 2.540 2.607 2.670 2.737 2.800 2.862
0.9 3.067 3.175 3.274 3.371 3.449 3.534 3.619 3.703 3.788 3.867
b = 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6

λ = 0.1 3.926 3.997 4.087 4.163 4.228 4.303 4.381 4.448 4.517 4.585
0.2 2.946 3.009 3.071 3.122 3.174 3.228 3.287 3.339 3.385 3.443
0.3 2.528 2.578 2.633 2.682 2.739 2.797 2.846 2.898 2.947 2.992
0.4 2.370 2.424 2.482 2.536 2.585 2.635 2.686 2.734 2.781 2.830
0.5 2.361 2.416 2.472 2.528 2.577 2.628 2.674 2.719 2.765 2.812
0.6 2.382 2.440 2.496 2.541 2.589 2.643 2.686 2.733 2.773 2.824
0.7 2.562 2.619 2.670 2.727 2.781 2.837 2.888 2.940 2.986 3.028
0.8 2.916 2.979 3.034 3.096 3.156 3.214 3.271 3.326 3.384 3.432
0.9 3.943 4.034 4.106 4.177 4.250 4.320 4.383 4.450 4.526 4.591
b = 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8

λ = 0.1 4.656 4.725 4.796 4.859 4.923 4.996 5.065 5.130 5.191 5.255
0.2 3.490 3.546 3.602 3.656 3.711 3.756 3.817 3.862 3.911 3.962
0.3 3.038 3.088 3.130 3.174 3.220 3.262 3.308 3.347 3.389 3.426
0.4 2.876 2.914 2.954 2.994 3.033 3.074 3.114 3.152 3.189 3.223
0.5 2.852 2.896 2.942 2.983 3.025 3.062 3.104 3.145 3.182 3.226
0.6 2.869 2.912 2.961 3.000 3.041 3.078 3.116 3.155 3.199 3.237
0.7 3.073 3.120 3.164 3.209 3.252 3.292 3.334 3.381 3.426 3.468
0.8 3.486 3.537 3.588 3.630 3.688 3.742 3.792 3.845 3.899 3.947
0.9 4.661 4.732 4.800 4.873 4.944 5.012 5.070 5.136 5.199 5.262
b = 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.0

λ = 0.1 5.306 5.377 5.436 5.501 5.551 5.607 5.670 5.727 5.789 5.850
0.2 4.004 4.045 4.090 4.139 4.186 4.226 4.277 4.320 4.366 4.411
0.3 3.468 3.500 3.541 3.584 3.625 3.661 3.702 3.742 3.778 3.817
0.4 3.262 3.288 3.323 3.361 3.396 3.434 3.474 3.511 3.547 3.582
0.5 3.267 3.298 3.337 3.376 3.411 3.448 3.484 3.521 3.560 3.597
0.6 3.277 3.318 3.354 3.384 3.420 3.457 3.493 3.531 3.567 3.605
0.7 3.507 3.544 3.584 3.630 3.674 3.712 3.748 3.788 3.827 3.865
0.8 3.989 4.039 4.082 4.124 4.169 4.205 4.259 4.305 4.349 4.393
0.9 5.319 5.388 5.449 5.522 5.590 5.646 5.687 5.754 5.812 5.872


Table D.2: 95% Asymptotic Critical Values for tDD (Bartlett Kernel) Without Trend.
b = 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2

λ = 0.1 1.980 2.313 2.618 2.873 3.104 3.302 3.476 3.642 3.803 3.952
0.2 1.775 1.915 2.052 2.190 2.314 2.448 2.571 2.686 2.794 2.902
0.3 1.720 1.801 1.883 1.975 2.057 2.145 2.236 2.324 2.407 2.498
0.4 1.710 1.773 1.833 1.900 1.971 2.036 2.111 2.188 2.268 2.345
0.5 1.712 1.766 1.831 1.902 1.965 2.032 2.102 2.172 2.253 2.324
0.6 1.704 1.767 1.839 1.904 1.969 2.044 2.120 2.193 2.265 2.346
0.7 1.719 1.810 1.893 1.993 2.070 2.160 2.254 2.345 2.436 2.525
0.8 1.788 1.922 2.066 2.195 2.325 2.456 2.569 2.686 2.797 2.907
0.9 1.983 2.325 2.621 2.890 3.126 3.341 3.522 3.678 3.830 3.986
b = 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4

λ = 0.1 4.093 4.216 4.330 4.460 4.574 4.681 4.791 4.891 5.018 5.124
0.2 3.005 3.102 3.194 3.289 3.379 3.471 3.564 3.644 3.733 3.814
0.3 2.579 2.663 2.750 2.843 2.931 3.009 3.086 3.156 3.246 3.321
0.4 2.423 2.501 2.581 2.662 2.745 2.816 2.893 2.971 3.047 3.125
0.5 2.399 2.470 2.551 2.632 2.710 2.781 2.863 2.936 3.010 3.086
0.6 2.431 2.501 2.574 2.656 2.732 2.808 2.887 2.970 3.053 3.125
0.7 2.617 2.703 2.790 2.874 2.967 3.044 3.124 3.208 3.276 3.350
0.8 3.011 3.114 3.219 3.301 3.391 3.482 3.572 3.656 3.739 3.813
0.9 4.125 4.254 4.376 4.503 4.622 4.737 4.857 4.962 5.076 5.183
b = 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6

λ = 0.1 5.230 5.338 5.455 5.564 5.668 5.762 5.855 5.946 6.050 6.138
0.2 3.899 3.974 4.053 4.134 4.199 4.278 4.355 4.434 4.516 4.581
0.3 3.386 3.457 3.530 3.617 3.694 3.766 3.838 3.902 3.965 4.026
0.4 3.198 3.266 3.329 3.417 3.479 3.547 3.606 3.680 3.737 3.795
0.5 3.161 3.231 3.306 3.380 3.448 3.521 3.588 3.663 3.714 3.780
0.6 3.206 3.273 3.352 3.426 3.496 3.567 3.640 3.696 3.754 3.814
0.7 3.429 3.497 3.570 3.644 3.709 3.783 3.851 3.910 3.974 4.045
0.8 3.893 3.968 4.055 4.143 4.224 4.295 4.362 4.444 4.519 4.596
0.9 5.294 5.399 5.505 5.616 5.714 5.814 5.923 6.007 6.101 6.186
b = 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8

λ = 0.1 6.219 6.327 6.419 6.507 6.593 6.669 6.757 6.849 6.944 7.032
0.2 4.647 4.723 4.787 4.878 4.950 5.017 5.095 5.150 5.214 5.267
0.3 4.084 4.144 4.207 4.264 4.322 4.368 4.417 4.477 4.533 4.577
0.4 3.859 3.917 3.980 4.034 4.094 4.134 4.184 4.226 4.279 4.337
0.5 3.835 3.893 3.945 3.989 4.045 4.096 4.138 4.186 4.235 4.290
0.6 3.870 3.926 3.987 4.052 4.090 4.143 4.190 4.235 4.282 4.332
0.7 4.121 4.187 4.239 4.289 4.353 4.415 4.462 4.511 4.566 4.622
0.8 4.681 4.755 4.814 4.895 4.970 5.041 5.115 5.179 5.244 5.310
0.9 6.281 6.370 6.467 6.551 6.639 6.743 6.840 6.925 7.012 7.106
b = 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.0

λ = 0.1 7.114 7.205 7.279 7.375 7.454 7.535 7.612 7.701 7.780 7.862
0.2 5.328 5.382 5.440 5.503 5.556 5.608 5.667 5.723 5.783 5.840
0.3 4.627 4.684 4.726 4.782 4.842 4.886 4.938 4.985 5.036 5.087
0.4 4.383 4.427 4.481 4.530 4.582 4.630 4.682 4.729 4.774 4.823
0.5 4.337 4.387 4.444 4.489 4.531 4.577 4.635 4.687 4.735 4.781
0.6 4.387 4.442 4.493 4.541 4.587 4.633 4.680 4.721 4.772 4.821
0.7 4.683 4.746 4.786 4.828 4.887 4.941 4.995 5.044 5.099 5.149
0.8 5.389 5.433 5.486 5.535 5.585 5.639 5.707 5.768 5.829 5.886
0.9 7.189 7.274 7.349 7.432 7.502 7.590 7.660 7.732 7.810 7.890


Table D.3: 97.5% Asymptotic Critical Values for tDD (Bartlett Kernel) Without Trend.
b = 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2

λ = 0.1 2.440 2.861 3.245 3.560 3.835 4.075 4.289 4.490 4.674 4.852
0.2 2.132 2.328 2.508 2.684 2.850 2.987 3.137 3.281 3.421 3.555
0.3 2.054 2.165 2.289 2.409 2.532 2.642 2.764 2.873 2.973 3.088
0.4 2.037 2.128 2.220 2.320 2.407 2.499 2.601 2.708 2.799 2.901
0.5 2.056 2.130 2.214 2.286 2.375 2.469 2.566 2.669 2.766 2.865
0.6 2.040 2.120 2.206 2.296 2.401 2.490 2.577 2.675 2.792 2.902
0.7 2.064 2.186 2.300 2.427 2.530 2.641 2.765 2.878 2.982 3.098
0.8 2.140 2.325 2.506 2.687 2.868 3.016 3.163 3.311 3.451 3.594
0.9 2.413 2.862 3.235 3.547 3.835 4.086 4.320 4.530 4.720 4.900
b = 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4

λ = 0.1 5.032 5.195 5.348 5.500 5.651 5.791 5.947 6.083 6.221 6.370
0.2 3.695 3.813 3.935 4.045 4.164 4.274 4.403 4.515 4.619 4.714
0.3 3.199 3.312 3.424 3.536 3.648 3.751 3.855 3.961 4.064 4.153
0.4 3.003 3.095 3.198 3.306 3.397 3.501 3.608 3.713 3.800 3.893
0.5 2.958 3.060 3.163 3.268 3.360 3.456 3.560 3.658 3.744 3.829
0.6 2.994 3.087 3.190 3.293 3.397 3.499 3.590 3.692 3.783 3.883
0.7 3.210 3.317 3.426 3.530 3.641 3.749 3.849 3.946 4.047 4.146
0.8 3.726 3.867 3.985 4.103 4.205 4.341 4.459 4.587 4.685 4.778
0.9 5.090 5.276 5.432 5.584 5.735 5.884 6.035 6.204 6.334 6.454
b = 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6

λ = 0.1 6.509 6.636 6.773 6.896 7.035 7.150 7.256 7.362 7.483 7.610
0.2 4.811 4.909 5.026 5.109 5.200 5.285 5.364 5.474 5.567 5.658
0.3 4.251 4.333 4.414 4.502 4.589 4.671 4.769 4.842 4.941 5.021
0.4 3.988 4.076 4.176 4.259 4.334 4.423 4.502 4.595 4.672 4.756
0.5 3.916 4.006 4.099 4.211 4.302 4.390 4.466 4.546 4.631 4.690
0.6 3.980 4.068 4.149 4.237 4.315 4.395 4.479 4.548 4.638 4.716
0.7 4.238 4.341 4.442 4.542 4.646 4.728 4.812 4.890 4.966 5.047
0.8 4.889 4.997 5.120 5.215 5.300 5.394 5.496 5.590 5.670 5.762
0.9 6.601 6.749 6.880 6.997 7.106 7.237 7.368 7.509 7.645 7.773
b = 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8

λ = 0.1 7.725 7.852 7.981 8.079 8.213 8.338 8.429 8.539 8.648 8.741
0.2 5.749 5.833 5.923 5.992 6.080 6.167 6.246 6.324 6.412 6.480
0.3 5.098 5.191 5.264 5.343 5.420 5.481 5.546 5.595 5.675 5.732
0.4 4.822 4.885 4.942 5.007 5.081 5.147 5.216 5.280 5.348 5.410
0.5 4.770 4.846 4.913 4.972 5.029 5.084 5.159 5.213 5.277 5.350
0.6 4.772 4.854 4.926 5.000 5.069 5.119 5.181 5.226 5.294 5.353
0.7 5.115 5.205 5.277 5.356 5.424 5.475 5.551 5.624 5.684 5.751
0.8 5.859 5.944 6.011 6.111 6.194 6.293 6.368 6.441 6.526 6.608
0.9 7.882 8.009 8.122 8.249 8.368 8.485 8.576 8.692 8.771 8.896
b = 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.0

λ = 0.1 8.869 8.976 9.066 9.151 9.272 9.361 9.440 9.549 9.633 9.729
0.2 6.554 6.635 6.696 6.763 6.841 6.920 6.993 7.068 7.151 7.228
0.3 5.805 5.882 5.944 6.003 6.074 6.123 6.189 6.255 6.316 6.380
0.4 5.483 5.529 5.577 5.640 5.706 5.766 5.829 5.889 5.953 6.014
0.5 5.398 5.461 5.526 5.588 5.647 5.718 5.775 5.837 5.899 5.958
0.6 5.409 5.478 5.541 5.603 5.663 5.728 5.787 5.842 5.898 5.959
0.7 5.826 5.888 5.958 6.032 6.083 6.142 6.213 6.275 6.341 6.405
0.8 6.688 6.759 6.827 6.896 6.968 7.048 7.097 7.171 7.246 7.319
0.9 8.994 9.109 9.215 9.317 9.415 9.506 9.602 9.696 9.790 9.881


Table D.4: 99% Asymptotic Critical Values for tDD (Bartlett Kernel) Without Trend.
b = 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2

λ = 0.1 2.952 3.546 3.998 4.375 4.716 5.015 5.287 5.556 5.809 6.073
0.2 2.586 2.838 3.092 3.305 3.522 3.725 3.906 4.103 4.259 4.424
0.3 2.479 2.649 2.799 2.959 3.117 3.297 3.443 3.604 3.754 3.884
0.4 2.451 2.563 2.690 2.808 2.950 3.066 3.213 3.357 3.492 3.629
0.5 2.446 2.537 2.646 2.747 2.871 3.003 3.144 3.266 3.391 3.505
0.6 2.438 2.546 2.674 2.794 2.924 3.059 3.212 3.324 3.462 3.597
0.7 2.474 2.630 2.793 2.959 3.107 3.243 3.389 3.530 3.689 3.844
0.8 2.588 2.860 3.090 3.312 3.527 3.741 3.936 4.136 4.286 4.472
0.9 2.962 3.569 4.053 4.436 4.782 5.101 5.387 5.629 5.884 6.113
b = 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4

λ = 0.1 6.308 6.506 6.702 6.923 7.099 7.313 7.505 7.681 7.832 7.992
0.2 4.592 4.731 4.907 5.061 5.224 5.382 5.512 5.600 5.740 5.894
0.3 4.031 4.167 4.295 4.444 4.594 4.733 4.842 4.973 5.080 5.202
0.4 3.744 3.876 4.013 4.120 4.244 4.383 4.546 4.675 4.791 4.915
0.5 3.627 3.756 3.895 4.038 4.168 4.304 4.412 4.541 4.666 4.770
0.6 3.719 3.829 3.962 4.071 4.195 4.339 4.465 4.590 4.709 4.831
0.7 3.989 4.141 4.260 4.409 4.536 4.680 4.800 4.927 5.029 5.149
0.8 4.613 4.781 4.938 5.065 5.228 5.377 5.528 5.683 5.823 5.965
0.9 6.344 6.550 6.713 6.927 7.155 7.305 7.481 7.671 7.857 8.026
b = 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6

λ = 0.1 8.148 8.336 8.522 8.659 8.783 8.923 9.078 9.188 9.352 9.508
0.2 6.020 6.144 6.273 6.383 6.522 6.639 6.747 6.857 6.977 7.087
0.3 5.324 5.444 5.559 5.664 5.766 5.863 5.931 6.080 6.169 6.275
0.4 5.034 5.138 5.238 5.347 5.471 5.567 5.665 5.773 5.880 6.029
0.5 4.877 4.971 5.081 5.192 5.302 5.431 5.527 5.631 5.711 5.797
0.6 4.951 5.050 5.191 5.297 5.387 5.472 5.587 5.686 5.777 5.849
0.7 5.261 5.358 5.498 5.626 5.725 5.865 5.977 6.106 6.198 6.265
0.8 6.127 6.249 6.381 6.538 6.667 6.775 6.871 7.011 7.109 7.199
0.9 8.197 8.393 8.575 8.718 8.857 9.021 9.170 9.316 9.456 9.597
b = 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8

λ = 0.1 9.700 9.871 9.996 10.115 10.288 10.405 10.606 10.750 10.861 11.013
0.2 7.204 7.293 7.406 7.515 7.598 7.717 7.818 7.932 8.024 8.128
0.3 6.391 6.467 6.546 6.659 6.780 6.870 6.975 7.039 7.100 7.172
0.4 6.081 6.146 6.222 6.348 6.416 6.499 6.558 6.646 6.723 6.806
0.5 5.873 6.002 6.102 6.182 6.283 6.368 6.428 6.488 6.551 6.621
0.6 5.940 6.019 6.105 6.194 6.280 6.359 6.460 6.538 6.623 6.685
0.7 6.357 6.463 6.548 6.637 6.768 6.848 6.933 7.018 7.089 7.171
0.8 7.308 7.413 7.516 7.628 7.739 7.872 7.985 8.124 8.191 8.264
0.9 9.750 9.872 10.025 10.184 10.305 10.430 10.620 10.755 10.895 11.003
b = 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.0

λ = 0.1 11.118 11.259 11.408 11.569 11.711 11.775 11.905 12.014 12.098 12.221
0.2 8.198 8.305 8.404 8.451 8.537 8.636 8.719 8.808 8.887 8.982
0.3 7.260 7.364 7.461 7.541 7.639 7.723 7.785 7.863 7.944 8.014
0.4 6.889 6.968 7.036 7.099 7.190 7.260 7.324 7.399 7.487 7.560
0.5 6.725 6.813 6.861 6.936 7.010 7.085 7.168 7.237 7.307 7.380
0.6 6.758 6.826 6.879 6.970 7.038 7.120 7.197 7.277 7.353 7.427
0.7 7.249 7.346 7.413 7.507 7.591 7.655 7.730 7.824 7.900 7.976
0.8 8.395 8.458 8.524 8.633 8.733 8.814 8.913 8.998 9.089 9.184
0.9 11.170 11.242 11.364 11.519 11.671 11.737 11.831 11.946 12.111 12.236


Table D.5: 90% Asymptotic Critical Values for tDD (Bartlett Kernel) With A Simple Trend.
b = 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2

λ = 0.1 1.438 1.591 1.736 1.881 2.015 2.136 2.247 2.349 2.446 2.547
0.2 1.362 1.443 1.522 1.608 1.693 1.783 1.870 1.959 2.052 2.142
0.3 1.346 1.422 1.495 1.577 1.660 1.748 1.824 1.904 1.991 2.076
0.4 1.364 1.440 1.520 1.599 1.681 1.757 1.837 1.916 1.988 2.060
0.5 1.358 1.432 1.507 1.586 1.660 1.740 1.819 1.894 1.968 2.043
0.6 1.340 1.413 1.492 1.568 1.644 1.727 1.803 1.890 1.965 2.048
0.7 1.360 1.431 1.513 1.592 1.669 1.762 1.844 1.930 2.008 2.094
0.8 1.366 1.443 1.526 1.616 1.699 1.789 1.881 1.975 2.067 2.158
0.9 1.439 1.600 1.752 1.887 2.018 2.142 2.250 2.356 2.455 2.550
b = 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4

λ = 0.1 2.638 2.724 2.802 2.886 2.977 3.055 3.139 3.203 3.269 3.340
0.2 2.223 2.302 2.376 2.443 2.515 2.581 2.647 2.715 2.769 2.822
0.3 2.156 2.232 2.308 2.377 2.448 2.512 2.575 2.633 2.700 2.753
0.4 2.139 2.213 2.280 2.344 2.395 2.450 2.506 2.558 2.607 2.647
0.5 2.112 2.168 2.223 2.280 2.327 2.383 2.426 2.465 2.505 2.552
0.6 2.115 2.185 2.250 2.308 2.366 2.424 2.475 2.522 2.566 2.607
0.7 2.171 2.250 2.329 2.398 2.462 2.529 2.592 2.651 2.706 2.761
0.8 2.240 2.321 2.401 2.478 2.545 2.616 2.674 2.736 2.793 2.849
0.9 2.651 2.742 2.838 2.920 3.002 3.078 3.152 3.226 3.306 3.375
b = 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6

λ = 0.1 3.407 3.469 3.533 3.600 3.660 3.716 3.778 3.832 3.887 3.943
0.2 2.879 2.935 2.990 3.034 3.085 3.137 3.180 3.228 3.272 3.316
0.3 2.799 2.852 2.897 2.943 2.988 3.030 3.082 3.117 3.170 3.221
0.4 2.686 2.731 2.779 2.826 2.873 2.912 2.964 3.009 3.055 3.096
0.5 2.596 2.632 2.664 2.704 2.739 2.786 2.832 2.879 2.922 2.973
0.6 2.652 2.697 2.746 2.795 2.844 2.886 2.935 2.977 3.019 3.061
0.7 2.810 2.855 2.904 2.950 2.996 3.040 3.083 3.132 3.182 3.226
0.8 2.900 2.948 3.003 3.054 3.103 3.151 3.197 3.249 3.293 3.338
0.9 3.439 3.506 3.575 3.639 3.698 3.761 3.819 3.871 3.921 3.982
b = 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8

λ = 0.1 3.996 4.043 4.095 4.147 4.193 4.251 4.304 4.362 4.410 4.466
0.2 3.360 3.408 3.457 3.501 3.556 3.600 3.657 3.700 3.747 3.800
0.3 3.268 3.313 3.368 3.407 3.452 3.502 3.548 3.602 3.644 3.691
0.4 3.140 3.191 3.233 3.282 3.329 3.372 3.419 3.465 3.511 3.555
0.5 3.017 3.063 3.103 3.153 3.193 3.238 3.281 3.320 3.362 3.400
0.6 3.103 3.144 3.193 3.246 3.285 3.333 3.378 3.424 3.466 3.509
0.7 3.281 3.327 3.374 3.426 3.481 3.532 3.587 3.629 3.678 3.724
0.8 3.381 3.428 3.476 3.524 3.573 3.625 3.674 3.722 3.771 3.816
0.9 4.033 4.084 4.141 4.191 4.243 4.292 4.349 4.398 4.453 4.507
b = 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.0

λ = 0.1 4.524 4.576 4.629 4.685 4.732 4.779 4.842 4.894 4.942 4.992
0.2 3.847 3.894 3.934 3.980 4.025 4.068 4.112 4.158 4.201 4.242
0.3 3.738 3.785 3.834 3.879 3.928 3.972 4.020 4.063 4.104 4.145
0.4 3.596 3.638 3.677 3.719 3.761 3.801 3.841 3.882 3.924 3.965
0.5 3.442 3.483 3.522 3.562 3.599 3.635 3.674 3.717 3.755 3.792
0.6 3.550 3.593 3.636 3.677 3.717 3.757 3.795 3.832 3.874 3.913
0.7 3.771 3.810 3.854 3.905 3.949 3.993 4.036 4.079 4.120 4.163
0.8 3.856 3.908 3.953 3.992 4.043 4.083 4.129 4.176 4.216 4.262
0.9 4.565 4.612 4.663 4.720 4.781 4.824 4.879 4.928 4.976 5.028


Table D.6: 95% Asymptotic Critical Values for tDD (Bartlett Kernel) With A Simple Trend.
b = 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2

λ = 0.1 1.877 2.095 2.306 2.497 2.679 2.840 2.986 3.143 3.279 3.408
0.2 1.755 1.864 1.976 2.102 2.225 2.349 2.467 2.601 2.734 2.859
0.3 1.739 1.843 1.953 2.066 2.181 2.294 2.414 2.526 2.646 2.757
0.4 1.755 1.873 1.980 2.101 2.213 2.325 2.435 2.549 2.661 2.749
0.5 1.745 1.856 1.961 2.081 2.199 2.305 2.409 2.524 2.619 2.713
0.6 1.735 1.844 1.965 2.073 2.190 2.306 2.416 2.533 2.636 2.734
0.7 1.749 1.857 1.971 2.092 2.203 2.323 2.433 2.556 2.675 2.792
0.8 1.748 1.866 1.989 2.115 2.247 2.371 2.497 2.629 2.760 2.873
0.9 1.874 2.091 2.301 2.489 2.665 2.832 2.982 3.142 3.278 3.414
b = 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4

λ = 0.1 3.539 3.671 3.788 3.909 4.013 4.127 4.227 4.321 4.427 4.505
0.2 2.985 3.088 3.196 3.287 3.379 3.466 3.563 3.644 3.724 3.798
0.3 2.861 2.970 3.067 3.166 3.261 3.353 3.436 3.521 3.603 3.685
0.4 2.857 2.945 3.030 3.117 3.195 3.271 3.345 3.413 3.476 3.550
0.5 2.799 2.879 2.957 3.030 3.115 3.173 3.238 3.310 3.371 3.432
0.6 2.832 2.925 3.022 3.108 3.184 3.256 3.328 3.398 3.465 3.540
0.7 2.898 3.003 3.112 3.211 3.298 3.383 3.458 3.542 3.622 3.696
0.8 2.979 3.089 3.186 3.276 3.363 3.444 3.529 3.609 3.690 3.763
0.9 3.555 3.687 3.799 3.921 4.022 4.124 4.228 4.324 4.417 4.511
b = 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6
λ = 0.1 4.601 4.678 4.757 4.842 4.931 5.012 5.081 5.153 5.228 5.301
0.2 3.870 3.929 3.992 4.055 4.119 4.186 4.256 4.328 4.397 4.465
0.3 3.759 3.821 3.877 3.948 4.019 4.083 4.152 4.220 4.282 4.343
0.4 3.613 3.676 3.739 3.800 3.867 3.940 3.996 4.059 4.127 4.196
0.5 3.489 3.547 3.598 3.648 3.706 3.766 3.824 3.893 3.948 4.010
0.6 3.602 3.667 3.721 3.780 3.852 3.911 3.980 4.043 4.107 4.164
0.7 3.767 3.841 3.911 3.977 4.043 4.108 4.168 4.231 4.307 4.371
0.8 3.833 3.909 3.972 4.042 4.118 4.189 4.246 4.304 4.369 4.429
0.9 4.596 4.691 4.776 4.864 4.938 5.001 5.083 5.160 5.232 5.305
b = 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8
λ = 0.1 5.374 5.447 5.525 5.601 5.668 5.738 5.800 5.874 5.952 6.027
0.2 4.534 4.591 4.662 4.722 4.782 4.851 4.914 4.984 5.044 5.114
0.3 4.401 4.470 4.541 4.604 4.679 4.742 4.815 4.886 4.945 5.011
0.4 4.257 4.324 4.377 4.441 4.501 4.560 4.615 4.675 4.733 4.788
0.5 4.072 4.136 4.193 4.249 4.304 4.361 4.410 4.465 4.523 4.577
0.6 4.233 4.301 4.356 4.413 4.471 4.533 4.592 4.648 4.707 4.764
0.7 4.439 4.511 4.582 4.654 4.715 4.785 4.859 4.927 4.985 5.045
0.8 4.505 4.571 4.641 4.701 4.767 4.819 4.890 4.949 5.020 5.087
0.9 5.392 5.468 5.543 5.617 5.686 5.753 5.820 5.892 5.978 6.049
b = 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.0
λ = 0.1 6.114 6.167 6.238 6.314 6.379 6.456 6.513 6.583 6.652 6.723
0.2 5.166 5.231 5.289 5.341 5.405 5.475 5.539 5.594 5.653 5.711
0.3 5.071 5.129 5.186 5.242 5.302 5.368 5.418 5.475 5.535 5.590
0.4 4.841 4.899 4.961 5.019 5.078 5.131 5.188 5.243 5.293 5.349
0.5 4.631 4.683 4.735 4.789 4.838 4.892 4.944 4.995 5.048 5.098
0.6 4.822 4.872 4.926 4.981 5.037 5.089 5.146 5.198 5.256 5.310
0.7 5.112 5.179 5.237 5.298 5.359 5.415 5.480 5.537 5.591 5.649
0.8 5.150 5.208 5.281 5.337 5.396 5.458 5.512 5.567 5.624 5.681
0.9 6.128 6.176 6.261 6.317 6.390 6.468 6.531 6.589 6.651 6.722
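The critical values are tabulated on a grid of b. When a bandwidth ratio falls between two grid points, one simple option is linear interpolation between the neighboring columns. The sketch below does this for the λ = 0.5 row of Table D.6 (b = 0.02 to 0.2). It assumes the usual fixed-b convention b = M/T, with M the Bartlett bandwidth and T the number of time periods. The function names are illustrative, and linear interpolation is an approximation, not a procedure described in this appendix.

```python
from bisect import bisect_left

# Grid of b values and 95% critical values copied from Table D.6, row λ = 0.5.
B_GRID = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20]
CV95_LAMBDA_05 = [1.745, 1.856, 1.961, 2.081, 2.199, 2.305, 2.409, 2.524, 2.619, 2.713]


def bandwidth_ratio(m: int, t: int) -> float:
    """Assumed fixed-b convention: b = M / T for Bartlett bandwidth M and T time periods."""
    if m <= 0 or t <= 0:
        raise ValueError("M and T must be positive")
    return m / t


def interpolate_cv(b: float, grid=B_GRID, values=CV95_LAMBDA_05) -> float:
    """Linearly interpolate a critical value between the two nearest tabulated b values."""
    if not grid[0] <= b <= grid[-1]:
        raise ValueError(f"b = {b} lies outside the tabulated range [{grid[0]}, {grid[-1]}]")
    i = bisect_left(grid, b)  # first index with grid[i] >= b
    if grid[i] == b:
        return values[i]  # b is exactly on the grid
    w = (b - grid[i - 1]) / (grid[i] - grid[i - 1])
    return values[i - 1] + w * (values[i] - values[i - 1])


# Example: T = 50 periods and a hypothetical bandwidth M = 5 give b = 0.1, a grid point.
print(interpolate_cv(bandwidth_ratio(5, 50)))  # 2.199
# b = 0.07 lies halfway between 0.06 and 0.08.
print(round(interpolate_cv(0.07), 3))  # 2.021
```

To cover b up to 1.0, extend `B_GRID` and the value list with the remaining columns of the same row. Values outside the grid raise an error instead of being extrapolated.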

Table D.7: 97.5% Asymptotic Critical Values for tDD (Bartlett Kernel) With A Simple Trend.
b = 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
λ = 0.1 2.289 2.577 2.845 3.074 3.307 3.527 3.714 3.906 4.067 4.240
0.2 2.121 2.269 2.420 2.587 2.756 2.912 3.060 3.227 3.386 3.538
0.3 2.095 2.243 2.378 2.533 2.677 2.827 2.972 3.127 3.279 3.419
0.4 2.117 2.252 2.393 2.551 2.696 2.852 3.009 3.162 3.301 3.421
0.5 2.073 2.227 2.387 2.535 2.676 2.831 2.978 3.124 3.233 3.351
0.6 2.095 2.239 2.388 2.547 2.690 2.837 2.983 3.127 3.259 3.395
0.7 2.104 2.251 2.394 2.541 2.683 2.834 2.971 3.125 3.270 3.414
0.8 2.111 2.267 2.423 2.591 2.752 2.922 3.081 3.229 3.365 3.516
0.9 2.244 2.538 2.798 3.053 3.298 3.500 3.689 3.863 4.031 4.205
b = 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4
λ = 0.1 4.403 4.559 4.714 4.853 5.003 5.132 5.262 5.396 5.534 5.659
0.2 3.666 3.806 3.949 4.067 4.180 4.270 4.378 4.486 4.580 4.666
0.3 3.546 3.688 3.810 3.934 4.059 4.162 4.266 4.360 4.456 4.547
0.4 3.533 3.635 3.758 3.859 3.956 4.062 4.159 4.248 4.325 4.403
0.5 3.455 3.565 3.676 3.771 3.849 3.938 4.014 4.098 4.185 4.257
0.6 3.536 3.644 3.748 3.873 3.975 4.080 4.173 4.263 4.343 4.435
0.7 3.546 3.676 3.804 3.925 4.069 4.166 4.283 4.381 4.480 4.579
0.8 3.679 3.815 3.941 4.068 4.201 4.306 4.398 4.520 4.619 4.720
0.9 4.374 4.526 4.667 4.807 4.946 5.088 5.219 5.367 5.471 5.591
b = 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6
λ = 0.1 5.778 5.872 5.976 6.092 6.187 6.271 6.374 6.467 6.576 6.666
0.2 4.754 4.852 4.942 5.044 5.133 5.219 5.281 5.361 5.454 5.536
0.3 4.648 4.747 4.838 4.911 4.979 5.071 5.158 5.244 5.331 5.401
0.4 4.491 4.584 4.658 4.753 4.844 4.920 5.003 5.082 5.173 5.250
0.5 4.339 4.403 4.474 4.542 4.608 4.688 4.770 4.854 4.929 5.004
0.6 4.506 4.592 4.659 4.750 4.828 4.901 4.978 5.058 5.156 5.241
0.7 4.676 4.762 4.859 4.952 5.032 5.106 5.185 5.251 5.328 5.413
0.8 4.810 4.909 5.002 5.094 5.186 5.277 5.338 5.422 5.502 5.578
0.9 5.695 5.813 5.910 6.011 6.113 6.213 6.297 6.382 6.475 6.570
b = 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8
λ = 0.1 6.748 6.845 6.947 7.041 7.136 7.229 7.336 7.433 7.519 7.618
0.2 5.606 5.679 5.759 5.833 5.932 6.020 6.117 6.204 6.297 6.365
0.3 5.492 5.567 5.655 5.749 5.833 5.917 5.998 6.075 6.143 6.208
0.4 5.338 5.409 5.483 5.555 5.650 5.718 5.806 5.886 5.949 6.018
0.5 5.079 5.153 5.234 5.320 5.393 5.468 5.535 5.611 5.684 5.750
0.6 5.334 5.407 5.466 5.545 5.629 5.689 5.765 5.829 5.900 5.965
0.7 5.509 5.601 5.686 5.769 5.866 5.958 6.037 6.117 6.186 6.270
0.8 5.669 5.756 5.838 5.926 6.018 6.078 6.146 6.220 6.314 6.399
0.9 6.664 6.759 6.850 6.962 7.052 7.128 7.237 7.332 7.421 7.520
b = 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.0
λ = 0.1 7.698 7.786 7.894 7.967 8.062 8.165 8.261 8.327 8.414 8.500
0.2 6.438 6.527 6.602 6.661 6.719 6.788 6.861 6.933 7.005 7.068
0.3 6.288 6.361 6.434 6.508 6.578 6.653 6.720 6.795 6.872 6.941
0.4 6.088 6.154 6.234 6.309 6.376 6.448 6.502 6.577 6.642 6.708
0.5 5.824 5.890 5.946 6.015 6.072 6.141 6.204 6.263 6.327 6.395
0.6 6.039 6.112 6.182 6.255 6.317 6.388 6.453 6.522 6.588 6.657
0.7 6.332 6.417 6.488 6.553 6.634 6.701 6.774 6.848 6.919 6.989
0.8 6.485 6.570 6.618 6.702 6.784 6.854 6.926 6.997 7.060 7.134
0.9 7.586 7.675 7.783 7.888 7.971 8.052 8.116 8.211 8.296 8.382

Table D.8: 99% Asymptotic Critical Values for tDD (Bartlett Kernel) With A Simple Trend.
b = 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
λ = 0.1 2.762 3.157 3.518 3.795 4.071 4.369 4.605 4.852 5.075 5.293
0.2 2.528 2.742 2.932 3.158 3.344 3.555 3.763 3.946 4.157 4.393
0.3 2.529 2.734 2.929 3.125 3.316 3.529 3.731 3.923 4.125 4.282
0.4 2.511 2.704 2.912 3.108 3.286 3.502 3.697 3.895 4.052 4.243
0.5 2.482 2.688 2.882 3.080 3.275 3.454 3.641 3.818 3.964 4.116
0.6 2.532 2.728 2.924 3.133 3.289 3.505 3.705 3.900 4.098 4.230
0.7 2.490 2.664 2.851 3.052 3.261 3.469 3.637 3.835 4.029 4.229
0.8 2.533 2.753 2.969 3.171 3.389 3.587 3.804 4.016 4.211 4.419
0.9 2.746 3.122 3.459 3.774 4.051 4.340 4.591 4.831 5.059 5.254
b = 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4
λ = 0.1 5.533 5.737 5.915 6.041 6.262 6.423 6.589 6.740 6.871 7.052
0.2 4.567 4.726 4.858 5.015 5.132 5.290 5.442 5.567 5.686 5.818
0.3 4.462 4.619 4.773 4.924 5.068 5.230 5.381 5.504 5.586 5.697
0.4 4.399 4.523 4.643 4.790 4.905 5.011 5.122 5.252 5.370 5.498
0.5 4.262 4.398 4.517 4.650 4.770 4.878 5.007 5.102 5.222 5.322
0.6 4.384 4.539 4.682 4.839 4.940 5.062 5.180 5.293 5.396 5.509
0.7 4.408 4.568 4.703 4.868 5.011 5.154 5.305 5.429 5.554 5.662
0.8 4.592 4.764 4.926 5.090 5.236 5.345 5.477 5.603 5.745 5.871
0.9 5.459 5.657 5.835 5.977 6.161 6.320 6.493 6.669 6.818 6.975
b = 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6
λ = 0.1 7.181 7.339 7.503 7.654 7.775 7.914 8.041 8.159 8.290 8.429
0.2 5.908 6.025 6.129 6.239 6.371 6.459 6.584 6.709 6.814 6.928
0.3 5.839 5.944 6.060 6.185 6.288 6.386 6.525 6.655 6.774 6.868
0.4 5.573 5.669 5.788 5.875 5.982 6.086 6.214 6.333 6.440 6.532
0.5 5.427 5.522 5.634 5.733 5.817 5.937 6.026 6.147 6.264 6.362
0.6 5.616 5.737 5.844 5.992 6.098 6.207 6.290 6.399 6.499 6.619
0.7 5.777 5.899 6.020 6.138 6.254 6.360 6.454 6.566 6.668 6.796
0.8 6.016 6.111 6.208 6.311 6.442 6.556 6.690 6.818 6.911 7.036
0.9 7.096 7.230 7.370 7.509 7.647 7.748 7.872 7.989 8.112 8.230
b = 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0.76 0.78 0.8
λ = 0.1 8.581 8.752 8.829 8.933 9.063 9.148 9.280 9.402 9.548 9.642
0.2 7.011 7.092 7.204 7.313 7.419 7.550 7.618 7.713 7.777 7.911
0.3 6.950 7.087 7.186 7.332 7.422 7.559 7.653 7.751 7.827 7.902
0.4 6.623 6.716 6.826 6.927 7.016 7.103 7.213 7.281 7.375 7.465
0.5 6.467 6.586 6.647 6.750 6.826 6.907 7.017 7.103 7.189 7.283
0.6 6.733 6.813 6.929 6.991 7.079 7.196 7.282 7.390 7.485 7.551
0.7 6.908 7.017 7.123 7.237 7.339 7.461 7.552 7.643 7.745 7.849
0.8 7.146 7.228 7.361 7.451 7.561 7.639 7.757 7.860 7.973 8.080
0.9 8.347 8.470 8.591 8.704 8.812 8.930 9.020 9.114 9.243 9.348
b = 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1.0
λ = 0.1 9.731 9.861 9.969 10.072 10.168 10.301 10.360 10.475 10.600 10.714
0.2 8.036 8.144 8.239 8.328 8.431 8.519 8.606 8.691 8.770 8.859
0.3 8.010 8.113 8.197 8.291 8.391 8.495 8.601 8.697 8.787 8.877
0.4 7.543 7.631 7.705 7.807 7.900 7.989 8.075 8.160 8.253 8.338
0.5 7.372 7.458 7.551 7.636 7.711 7.789 7.888 7.956 8.059 8.144
0.6 7.646 7.747 7.841 7.938 8.028 8.111 8.193 8.274 8.355 8.440
0.7 7.938 8.057 8.138 8.217 8.302 8.389 8.485 8.573 8.659 8.748
0.8 8.176 8.261 8.344 8.441 8.571 8.646 8.766 8.826 8.901 8.984
0.9 9.467 9.583 9.741 9.874 9.963 10.051 10.104 10.208 10.311 10.424

Table D.9: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). No trend or additional
regressors. λ = .5, k = .5. AR(1) error. Two-Tailed Test of H0 : β3 = 0.
N,T ρ tclus | N(0, 1) CV: tDK, values of b = .02 .06 .1 .4 .7 1.0 | Adjusted Fixed-b CV: tDK, values of b = .02 .06 .1 .4 .7 1.0

10,10 .0 .102 | .127 .127 .127 .276 .397 .465 | .111 .088 .070 .065 .067 .068
.3 .102 | .221 .221 .221 .367 .470 .539 | .202 .173 .153 .120 .114 .115
.6 .102 | .347 .347 .347 .470 .565 .617 | .328 .298 .260 .189 .184 .178
.9 .102 | .503 .503 .503 .572 .659 .709 | .485 .451 .423 .278 .275 .281

10,50 .0 .105 | .060 .076 .098 .272 .401 .469 | .049 .040 .044 .041 .044 .041
.3 .104 | .167 .127 .133 .300 .422 .488 | .147 .084 .067 .056 .054 .056
.6 .102 | .344 .225 .207 .342 .460 .533 | .326 .172 .127 .088 .090 .082
.9 .101 | .654 .503 .446 .508 .604 .651 | .640 .447 .347 .228 .218 .217

10,250 .0 .093 | .049 .068 .096 .254 .371 .443 | .039 .040 .044 .046 .044 .039
.3 .091 | .070 .078 .104 .262 .381 .448 | .054 .048 .048 .050 .047 .046
.6 .089 | .123 .098 .116 .269 .386 .454 | .104 .060 .066 .060 .054 .051
.9 .087 | .378 .216 .194 .332 .442 .515 | .354 .170 .131 .098 .092 .091

50,10 .0 .056 | .113 .113 .113 .273 .381 .447 | .096 .080 .068 .061 .060 .060
.3 .057 | .213 .213 .213 .354 .472 .537 | .195 .165 .142 .113 .107 .106
.6 .062 | .363 .363 .363 .479 .571 .626 | .342 .304 .267 .185 .186 .181
.9 .056 | .508 .508 .508 .586 .658 .704 | .489 .453 .420 .277 .281 .282

50,50 .0 .060 | .068 .085 .112 .269 .395 .466 | .053 .051 .054 .052 .054 .050
.3 .059 | .176 .136 .146 .294 .416 .488 | .156 .094 .076 .069 .066 .067
.6 .058 | .353 .227 .211 .348 .466 .535 | .330 .181 .137 .098 .092 .093
.9 .057 | .640 .498 .443 .506 .593 .647 | .626 .452 .356 .225 .216 .214

50,250 .0 .056 | .054 .076 .096 .247 .363 .435 | .044 .045 .044 .048 .047 .042
.3 .056 | .079 .085 .103 .253 .366 .440 | .062 .053 .050 .049 .049 .045
.6 .057 | .125 .102 .117 .260 .372 .451 | .108 .066 .058 .057 .058 .054
.9 .056 | .370 .224 .200 .320 .435 .510 | .345 .172 .125 .092 .091 .088

250,10 .0 .053 | .112 .112 .112 .278 .394 .459 | .102 .082 .064 .061 .060 .058
.3 .055 | .216 .216 .216 .356 .464 .530 | .198 .168 .144 .114 .111 .108
.6 .056 | .352 .352 .352 .449 .543 .602 | .330 .295 .266 .195 .194 .192
.9 .050 | .508 .508 .508 .568 .656 .708 | .486 .457 .417 .271 .266 .265

250,50 .0 .057 | .065 .083 .101 .251 .375 .445 | .050 .049 .050 .046 .047 .046
.3 .058 | .164 .126 .135 .278 .390 .473 | .147 .085 .072 .064 .064 .062
.6 .054 | .337 .212 .195 .326 .442 .517 | .316 .168 .127 .092 .090 .092
.9 .051 | .654 .494 .438 .508 .599 .650 | .638 .440 .345 .224 .212 .211

250,250 .0 .048 | .053 .074 .093 .257 .379 .455 | .042 .045 .049 .044 .046 .048
.3 .046 | .071 .081 .097 .264 .386 .459 | .060 .050 .051 .048 .049 .050
.6 .048 | .119 .099 .110 .274 .388 .470 | .103 .063 .064 .052 .054 .054
.9 .047 | .381 .229 .204 .335 .448 .523 | .362 .171 .126 .091 .093 .091
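Every entry in these tables is a simulated rejection frequency, so part of any gap from the nominal .05 is simulation noise. One rough yardstick assumes that each entry is the share of R independent Monte Carlo replications that reject. The binomial standard error is then sqrt(p(1 - p)/R). The number of replications is not reported in this part of the appendix, so `reps` below is a placeholder. The helper names are illustrative and do not come from the dissertation.

```python
import math


def mc_standard_error(p: float, reps: int) -> float:
    """Binomial standard error of a rejection frequency estimated from `reps` replications."""
    return math.sqrt(p * (1.0 - p) / reps)


def within_mc_band(rate: float, nominal: float = 0.05, reps: int = 10_000, z: float = 1.96) -> bool:
    """True if `rate` lies within nominal +/- z standard errors.

    `reps` is a placeholder: the replication count is an assumption, not a value from the tables.
    """
    return abs(rate - nominal) <= z * mc_standard_error(nominal, reps)


if __name__ == "__main__":
    # With a hypothetical R = 10,000 the band is roughly [0.0457, 0.0543].
    for rate in (0.048, 0.056, 0.111):
        print(rate, within_mc_band(rate))
```

Under these placeholder settings, .048 falls inside the band, while .056 and .111 fall outside it. With fewer replications the band widens, so each entry should be judged against the replication count actually used.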

Table D.10: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). No trend or additional
regressors. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of
H0 : β3 = 0.
N,T ρ tclus | N(0, 1) CV: tDK, values of b = .02 .06 .1 .4 .7 1.0 | Adjusted Fixed-b CV: tDK, values of b = .02 .06 .1 .4 .7 1.0

9,10 .0 .339 | .110 .110 .110 .280 .396 .473 | .101 .082 .069 .072 .070 .068
.3 .339 | .221 .221 .221 .368 .471 .537 | .200 .175 .152 .123 .122 .123
.6 .346 | .361 .361 .361 .456 .556 .613 | .341 .310 .276 .202 .199 .197
.9 .334 | .484 .484 .484 .556 .654 .702 | .470 .430 .402 .271 .267 .263

9,50 .0 .340 | .064 .076 .094 .255 .366 .441 | .051 .047 .045 .050 .045 .042
.3 .337 | .165 .118 .127 .274 .392 .461 | .148 .080 .070 .066 .064 .063
.6 .337 | .333 .215 .194 .327 .433 .503 | .310 .170 .127 .093 .089 .091
.9 .342 | .644 .484 .428 .500 .588 .644 | .628 .430 .339 .220 .219 .210

9,250 .0 .368 | .059 .072 .094 .262 .386 .460 | .050 .048 .048 .052 .046 .048
.3 .368 | .078 .081 .099 .270 .390 .467 | .064 .052 .050 .053 .048 .050
.6 .369 | .126 .099 .112 .280 .401 .472 | .111 .063 .060 .056 .053 .054
.9 .366 | .390 .232 .206 .343 .456 .527 | .370 .178 .129 .092 .087 .088

49,10 .0 .577 | .108 .108 .108 .274 .400 .482 | .094 .072 .056 .055 .057 .053
.3 .568 | .219 .219 .219 .370 .489 .553 | .200 .165 .134 .104 .101 .099
.6 .566 | .342 .342 .342 .472 .574 .635 | .318 .288 .258 .186 .181 .180
.9 .562 | .508 .508 .508 .574 .659 .704 | .490 .458 .426 .278 .277 .276

49,50 .0 .565 | .057 .073 .095 .260 .380 .455 | .049 .045 .048 .050 .051 .050
.3 .557 | .162 .114 .130 .292 .409 .478 | .146 .079 .066 .065 .064 .064
.6 .553 | .341 .218 .199 .342 .449 .519 | .319 .171 .122 .094 .092 .092
.9 .566 | .654 .501 .441 .516 .603 .656 | .642 .445 .349 .219 .209 .204

49,250 .0 .578 | .056 .072 .092 .268 .379 .452 | .047 .045 .045 .050 .050 .050
.3 .572 | .077 .081 .098 .273 .384 .458 | .064 .049 .050 .054 .053 .050
.6 .572 | .129 .100 .118 .280 .394 .465 | .113 .064 .060 .060 .058 .057
.9 .580 | .387 .226 .202 .333 .455 .524 | .361 .178 .125 .096 .092 .093

256,10 .0 .612 | .125 .125 .125 .272 .385 .460 | .110 .090 .074 .070 .068 .069
.3 .619 | .222 .222 .222 .350 .465 .528 | .200 .171 .149 .114 .114 .113
.6 .621 | .350 .350 .350 .454 .557 .622 | .328 .296 .262 .190 .185 .182
.9 .636 | .508 .508 .508 .569 .662 .712 | .488 .451 .422 .268 .271 .266

256,50 .0 .639 | .060 .072 .092 .269 .387 .469 | .047 .044 .043 .040 .042 .042
.3 .635 | .174 .123 .129 .302 .415 .489 | .151 .081 .068 .058 .060 .052
.6 .631 | .370 .232 .208 .348 .464 .532 | .344 .178 .131 .096 .092 .092
.9 .635 | .662 .498 .441 .503 .600 .656 | .640 .446 .358 .228 .217 .219

256,250 .0 .623 | .050 .074 .093 .252 .384 .460 | .038 .039 .045 .051 .049 .050
.3 .625 | .071 .082 .100 .259 .387 .464 | .058 .048 .051 .053 .053 .055
.6 .626 | .125 .097 .110 .270 .395 .468 | .104 .062 .059 .059 .058 .058
.9 .624 | .373 .216 .193 .333 .437 .510 | .350 .165 .126 .099 .093 .094

Table D.11: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). No trend or additional
regressors. Time dummies. λ = .5, k = .5. AR(1) error. Two-Tailed Test of H0 : β3 = 0.
N,T ρ tclus | N(0, 1) CV: tDK, values of b = .02 .06 .1 .4 .7 1.0 | Adjusted Fixed-b CV: tDK, values of b = .02 .06 .1 .4 .7 1.0

10,10 .0 .102 | .127 .127 .127 .276 .397 .465 | .111 .088 .070 .065 .067 .068
.3 .102 | .221 .221 .221 .367 .470 .539 | .202 .173 .153 .120 .114 .115
.6 .102 | .347 .347 .347 .470 .565 .617 | .328 .298 .260 .189 .184 .178
.9 .102 | .503 .503 .503 .572 .659 .709 | .485 .451 .423 .278 .275 .281

10,50 .0 .105 | .060 .076 .098 .272 .401 .469 | .049 .040 .044 .041 .044 .041
.3 .104 | .167 .127 .133 .300 .422 .488 | .147 .084 .067 .056 .054 .056
.6 .102 | .344 .225 .207 .342 .460 .533 | .326 .172 .127 .088 .090 .082
.9 .101 | .654 .503 .446 .508 .604 .651 | .640 .447 .347 .228 .218 .217

10,250 .0 .093 | .049 .068 .096 .254 .371 .443 | .039 .040 .044 .046 .044 .039
.3 .091 | .070 .078 .104 .262 .381 .448 | .054 .048 .048 .050 .047 .046
.6 .089 | .123 .098 .116 .269 .386 .454 | .104 .060 .066 .060 .054 .051
.9 .087 | .378 .216 .194 .332 .442 .515 | .354 .170 .131 .098 .092 .091

50,10 .0 .056 | .113 .113 .113 .273 .381 .447 | .096 .080 .068 .061 .060 .060
.3 .057 | .213 .213 .213 .354 .472 .537 | .195 .165 .142 .113 .107 .106
.6 .062 | .363 .363 .363 .479 .571 .626 | .342 .304 .267 .185 .186 .181
.9 .056 | .508 .508 .508 .586 .658 .704 | .489 .453 .420 .277 .281 .282

50,50 .0 .060 | .068 .085 .112 .269 .395 .466 | .053 .051 .054 .052 .054 .050
.3 .059 | .176 .136 .146 .294 .416 .488 | .156 .094 .076 .069 .066 .067
.6 .058 | .353 .227 .211 .348 .466 .535 | .330 .181 .137 .098 .092 .093
.9 .057 | .640 .498 .443 .506 .593 .647 | .626 .452 .356 .225 .216 .214

50,250 .0 .056 | .054 .076 .096 .247 .363 .435 | .044 .045 .044 .048 .047 .042
.3 .056 | .079 .085 .103 .253 .366 .440 | .062 .053 .050 .049 .049 .045
.6 .057 | .125 .102 .117 .260 .372 .451 | .108 .066 .058 .057 .058 .054
.9 .056 | .370 .224 .200 .320 .435 .510 | .345 .172 .125 .092 .091 .088

250,10 .0 .053 | .112 .112 .112 .278 .394 .459 | .102 .082 .064 .061 .060 .058
.3 .055 | .216 .216 .216 .356 .464 .530 | .198 .168 .144 .114 .111 .108
.6 .056 | .352 .352 .352 .449 .543 .602 | .330 .295 .266 .195 .194 .192
.9 .050 | .508 .508 .508 .568 .656 .708 | .486 .457 .417 .271 .266 .265

250,50 .0 .057 | .065 .083 .101 .251 .375 .445 | .050 .049 .050 .046 .047 .046
.3 .058 | .164 .126 .135 .278 .390 .473 | .147 .085 .072 .064 .064 .062
.6 .054 | .337 .212 .195 .326 .442 .517 | .316 .168 .127 .092 .090 .092
.9 .051 | .654 .494 .438 .508 .599 .650 | .638 .440 .345 .224 .212 .211

250,250 .0 .048 | .053 .074 .093 .257 .379 .455 | .042 .045 .049 .044 .046 .048
.3 .046 | .071 .081 .097 .264 .386 .459 | .060 .050 .051 .048 .049 .050
.6 .048 | .119 .099 .110 .274 .388 .470 | .103 .063 .064 .052 .054 .054
.9 .047 | .381 .229 .204 .335 .448 .523 | .362 .171 .126 .091 .093 .091

Table D.12: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend. No additional
regressors. λ = .5, k = .5. AR(1) errors. Two-Tailed Test of H0 : β3 = 0.
N,T ρ tclus | N(0, 1) CV: tDK, values of b = .02 .06 .1 .4 .7 1.0 | Adjusted Fixed-b CV: tDK, values of b = .02 .06 .1 .4 .7 1.0

10,10 .0 .092 | .186 .186 .186 .331 .414 .479 | .168 .122 .091 .081 .078 .079
.3 .098 | .215 .215 .215 .354 .432 .494 | .192 .140 .104 .085 .080 .080
.6 .089 | .212 .212 .212 .351 .426 .489 | .192 .143 .116 .087 .084 .085
.9 .090 | .210 .210 .210 .328 .386 .447 | .188 .144 .105 .070 .069 .070

10,50 .0 .100 | .070 .102 .140 .312 .401 .477 | .056 .057 .056 .049 .051 .053
.3 .100 | .163 .140 .161 .333 .423 .486 | .139 .086 .076 .067 .064 .062
.6 .102 | .315 .211 .212 .365 .444 .512 | .282 .134 .104 .080 .084 .083
.9 .110 | .495 .336 .300 .401 .470 .538 | .471 .243 .152 .098 .095 .094

10,250 .0 .102 | .068 .096 .133 .307 .392 .460 | .056 .048 .050 .050 .052 .052
.3 .102 | .087 .106 .138 .308 .398 .462 | .073 .056 .052 .050 .052 .052
.6 .107 | .135 .124 .155 .322 .408 .476 | .116 .066 .060 .055 .058 .057
.9 .095 | .349 .220 .215 .361 .448 .522 | .326 .135 .100 .083 .081 .082

50,10 .0 .053 | .180 .180 .180 .328 .406 .472 | .160 .119 .094 .074 .064 .065
.3 .054 | .207 .207 .207 .341 .426 .487 | .190 .147 .116 .089 .086 .087
.6 .057 | .220 .220 .220 .340 .417 .476 | .201 .155 .120 .088 .086 .086
.9 .060 | .219 .219 .219 .338 .400 .453 | .196 .149 .112 .077 .072 .072

50,50 .0 .063 | .077 .108 .142 .303 .406 .475 | .066 .057 .060 .059 .058 .059
.3 .063 | .165 .146 .170 .328 .422 .488 | .142 .085 .075 .067 .069 .071
.6 .066 | .314 .226 .222 .364 .450 .515 | .286 .137 .099 .080 .080 .081
.9 .058 | .497 .333 .288 .399 .475 .539 | .472 .238 .146 .098 .096 .097

50,250 .0 .058 | .069 .106 .138 .316 .402 .476 | .055 .055 .054 .048 .051 .048
.3 .058 | .094 .119 .144 .322 .410 .475 | .076 .060 .059 .053 .054 .052
.6 .054 | .143 .131 .156 .330 .419 .484 | .121 .072 .067 .058 .059 .058
.9 .056 | .346 .223 .212 .356 .441 .511 | .324 .138 .098 .078 .073 .075

250,10 .0 .054 | .200 .200 .200 .356 .434 .488 | .177 .131 .098 .085 .083 .082
.3 .057 | .226 .226 .226 .363 .442 .512 | .205 .158 .123 .095 .091 .091
.6 .055 | .228 .228 .228 .360 .439 .502 | .209 .159 .125 .090 .087 .086
.9 .050 | .214 .214 .214 .335 .408 .470 | .189 .144 .112 .078 .073 .070

250,50 .0 .052 | .077 .112 .145 .318 .406 .473 | .062 .060 .052 .055 .056 .053
.3 .055 | .168 .152 .176 .340 .426 .490 | .150 .088 .076 .062 .064 .063
.6 .051 | .312 .212 .214 .365 .450 .526 | .284 .137 .105 .081 .079 .080
.9 .044 | .494 .329 .291 .390 .472 .538 | .468 .228 .146 .097 .095 .095

250,250 .0 .048 | .068 .105 .141 .312 .414 .486 | .053 .055 .055 .057 .052 .052
.3 .051 | .090 .117 .151 .314 .415 .499 | .076 .059 .056 .056 .054 .054
.6 .051 | .146 .136 .169 .320 .422 .499 | .123 .068 .064 .060 .063 .062
.9 .050 | .343 .212 .212 .362 .443 .514 | .318 .135 .099 .081 .081 .084

Table D.13: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend. No additional
regressors. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of
H0 : β3 = 0.
N,T ρ tclus | N(0, 1) CV: tDK, values of b = .02 .06 .1 .4 .7 1.0 | Adjusted Fixed-b CV: tDK, values of b = .02 .06 .1 .4 .7 1.0

9,10 .0 .350 | .185 .185 .185 .341 .435 .504 | .168 .122 .092 .078 .075 .075
.3 .368 | .216 .216 .216 .366 .456 .524 | .194 .146 .113 .088 .084 .084
.6 .364 | .238 .238 .238 .377 .464 .526 | .216 .155 .124 .087 .084 .084
.9 .345 | .230 .230 .230 .343 .421 .484 | .208 .152 .115 .077 .073 .074

9,50 .0 .366 | .072 .104 .147 .322 .424 .495 | .057 .055 .054 .051 .049 .051
.3 .360 | .174 .152 .180 .343 .428 .498 | .150 .088 .076 .062 .060 .058
.6 .354 | .314 .228 .232 .362 .442 .514 | .290 .143 .106 .085 .080 .077
.9 .349 | .473 .325 .284 .386 .461 .528 | .450 .219 .138 .082 .080 .081

9,250 .0 .354 | .082 .114 .145 .314 .410 .491 | .065 .059 .057 .060 .056 .059
.3 .354 | .105 .124 .152 .320 .413 .489 | .089 .070 .069 .062 .062 .063
.6 .350 | .155 .144 .171 .328 .415 .487 | .139 .087 .080 .068 .062 .062
.9 .361 | .362 .240 .240 .373 .453 .518 | .338 .160 .121 .089 .083 .083

49,10 .0 .567 | .179 .179 .179 .345 .433 .504 | .160 .118 .089 .076 .073 .071
.3 .558 | .215 .215 .215 .366 .450 .516 | .196 .147 .106 .086 .082 .082
.6 .560 | .229 .229 .229 .370 .438 .513 | .206 .153 .120 .088 .088 .086
.9 .588 | .222 .222 .222 .351 .424 .488 | .202 .147 .116 .080 .071 .071

49,50 .0 .573 | .068 .101 .135 .307 .402 .480 | .052 .044 .047 .060 .059 .058
.3 .568 | .162 .140 .162 .330 .428 .487 | .138 .076 .064 .067 .068 .070
.6 .548 | .303 .212 .211 .360 .444 .516 | .277 .125 .093 .082 .078 .076
.9 .558 | .462 .304 .265 .376 .447 .520 | .438 .218 .136 .086 .081 .083

49,250 .0 .589 | .071 .112 .144 .325 .413 .486 | .057 .057 .056 .058 .057 .055
.3 .585 | .098 .123 .156 .329 .426 .489 | .078 .064 .058 .062 .060 .059
.6 .582 | .149 .142 .169 .334 .430 .501 | .128 .080 .071 .066 .065 .064
.9 .572 | .364 .233 .231 .367 .454 .533 | .334 .153 .113 .089 .087 .087

256,10 .0 .613 | .199 .199 .199 .337 .425 .491 | .180 .138 .101 .084 .082 .082
.3 .629 | .222 .222 .222 .356 .438 .497 | .204 .157 .123 .092 .086 .088
.6 .632 | .218 .218 .218 .353 .431 .495 | .196 .142 .106 .076 .076 .076
.9 .631 | .206 .206 .206 .316 .390 .462 | .187 .141 .103 .063 .058 .060

256,50 .0 .630 | .068 .102 .139 .314 .403 .478 | .049 .048 .048 .053 .053 .054
.3 .626 | .168 .145 .169 .337 .427 .497 | .146 .076 .067 .063 .060 .063
.6 .633 | .324 .216 .220 .372 .456 .528 | .290 .140 .103 .080 .082 .083
.9 .613 | .477 .314 .276 .391 .472 .530 | .453 .227 .138 .082 .084 .085

256,250 .0 .629 | .068 .100 .135 .325 .414 .491 | .053 .048 .048 .051 .055 .054
.3 .623 | .088 .106 .136 .327 .417 .493 | .073 .057 .054 .056 .054 .052
.6 .630 | .130 .122 .151 .332 .433 .503 | .111 .066 .060 .060 .058 .058
.9 .619 | .365 .224 .217 .374 .460 .525 | .333 .135 .096 .077 .074 .077

Table D.14: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend. Time Dummies.
No additional regressors. λ = .5, k = .5. AR(1) errors. Two-Tailed Test of H0 : β3 = 0.
N,T ρ tclus | N(0, 1) CV: tDK, values of b = .02 .06 .1 .4 .7 1.0 | Adjusted Fixed-b CV: tDK, values of b = .02 .06 .1 .4 .7 1.0

10,10 .0 .092 | .186 .186 .186 .331 .414 .479 | .168 .122 .091 .081 .078 .079
.3 .098 | .215 .215 .215 .354 .432 .494 | .192 .140 .104 .085 .080 .080
.6 .089 | .212 .212 .212 .351 .426 .489 | .192 .143 .116 .087 .084 .085
.9 .090 | .210 .210 .210 .328 .386 .447 | .188 .144 .105 .070 .069 .070

10,50 .0 .100 | .070 .102 .140 .312 .401 .477 | .056 .057 .056 .049 .051 .053
.3 .010 | .163 .140 .161 .333 .423 .486 | .139 .086 .076 .067 .064 .062
.6 .102 | .315 .211 .212 .365 .444 .512 | .282 .134 .104 .080 .084 .083
.9 .110 | .495 .336 .300 .401 .470 .538 | .471 .243 .152 .098 .095 .094

10,250 .0 .102 | .068 .096 .133 .307 .392 .460 | .056 .048 .050 .050 .052 .052
.3 .102 | .087 .106 .138 .308 .398 .462 | .073 .056 .052 .050 .052 .052
.6 .107 | .135 .124 .155 .322 .408 .476 | .116 .066 .060 .055 .058 .057
.9 .095 | .349 .220 .215 .361 .448 .522 | .326 .135 .100 .083 .081 .082

50,10 .0 .053 | .180 .180 .180 .328 .406 .472 | .160 .119 .094 .074 .064 .065
.3 .054 | .207 .207 .207 .341 .426 .487 | .190 .147 .116 .089 .086 .087
.6 .057 | .220 .220 .220 .340 .417 .476 | .201 .155 .120 .088 .086 .086
.9 .060 | .219 .219 .219 .338 .400 .453 | .196 .149 .112 .077 .072 .072

50,50 .0 .063 | .077 .108 .142 .303 .406 .475 | .066 .057 .060 .059 .058 .059
.3 .063 | .165 .146 .170 .328 .422 .488 | .142 .085 .075 .067 .069 .071
.6 .066 | .314 .226 .222 .364 .450 .515 | .286 .137 .099 .080 .080 .081
.9 .058 | .497 .333 .288 .399 .475 .539 | .472 .238 .146 .098 .096 .097

50,250 .0 .058 | .069 .106 .138 .316 .402 .476 | .055 .055 .054 .048 .051 .048
.3 .058 | .094 .119 .144 .322 .410 .475 | .076 .060 .059 .053 .054 .052
.6 .054 | .143 .131 .156 .330 .419 .484 | .121 .072 .067 .058 .059 .058
.9 .056 | .346 .223 .212 .356 .441 .511 | .324 .138 .098 .078 .073 .075

250,10 .0 .054 | .200 .200 .200 .356 .434 .488 | .177 .131 .098 .085 .083 .082
.3 .057 | .226 .226 .226 .363 .442 .512 | .205 .158 .123 .095 .091 .091
.6 .055 | .228 .228 .228 .360 .439 .502 | .209 .159 .125 .090 .087 .086
.9 .050 | .214 .214 .214 .335 .408 .470 | .189 .144 .112 .078 .073 .070

250,50 .0 .052 | .077 .112 .145 .318 .406 .473 | .062 .060 .052 .055 .056 .053
.3 .055 | .168 .152 .176 .340 .426 .490 | .150 .088 .076 .062 .064 .063
.6 .051 | .312 .212 .214 .365 .450 .526 | .284 .137 .105 .081 .079 .080
.9 .044 | .494 .329 .291 .390 .472 .538 | .468 .228 .146 .097 .095 .095

250,250 .0 .048 | .068 .105 .141 .312 .414 .486 | .053 .055 .055 .057 .052 .052
.3 .051 | .090 .117 .151 .314 .415 .499 | .076 .059 .056 .056 .054 .054
.6 .051 | .146 .136 .169 .320 .422 .499 | .123 .068 .064 .060 .063 .062
.9 .050 | .343 .212 .212 .362 .443 .514 | .318 .135 .099 .081 .081 .084

Table D.15: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend. Time Dummies.
No additional regressors. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5.
Two-Tailed Test of H0 : β3 = 0.
N,T ρ tclus | N(0, 1) CV: tDK, values of b = .02 .06 .1 .4 .7 1.0 | Adjusted Fixed-b CV: tDK, values of b = .02 .06 .1 .4 .7 1.0

9,10 .0 .350 | .185 .185 .185 .341 .435 .504 | .168 .122 .092 .078 .075 .075
.3 .368 | .216 .216 .216 .366 .456 .524 | .194 .146 .113 .088 .084 .084
.6 .364 | .238 .238 .238 .377 .464 .526 | .216 .155 .124 .087 .084 .084
.9 .345 | .230 .230 .230 .343 .421 .484 | .208 .152 .115 .077 .073 .074

9,50 .0 .366 | .072 .104 .147 .322 .424 .495 | .057 .055 .054 .051 .049 .051
.3 .360 | .174 .152 .180 .343 .428 .498 | .150 .088 .076 .062 .060 .058
.6 .354 | .314 .228 .232 .362 .442 .514 | .290 .143 .106 .085 .080 .077
.9 .349 | .473 .325 .284 .386 .461 .528 | .450 .219 .138 .082 .080 .081

9,250 .0 .354 | .082 .114 .145 .314 .410 .491 | .065 .059 .057 .060 .056 .059
.3 .354 | .105 .124 .152 .320 .413 .489 | .089 .070 .069 .062 .062 .063
.6 .350 | .155 .144 .171 .328 .415 .487 | .139 .087 .080 .068 .062 .062
.9 .361 | .362 .240 .240 .373 .453 .518 | .338 .160 .121 .089 .083 .083

49,10 .0 .567 | .179 .179 .179 .345 .433 .504 | .160 .118 .089 .076 .073 .071
.3 .558 | .215 .215 .215 .366 .450 .516 | .196 .147 .106 .086 .082 .082
.6 .560 | .229 .229 .229 .370 .438 .513 | .206 .153 .120 .088 .088 .086
.9 .588 | .222 .222 .222 .351 .424 .488 | .202 .147 .116 .080 .071 .071

49,50 .0 .573 | .068 .101 .135 .307 .402 .480 | .052 .044 .047 .060 .059 .058
.3 .568 | .162 .140 .162 .330 .428 .487 | .138 .076 .064 .067 .068 .070
.6 .548 | .303 .212 .211 .360 .444 .516 | .277 .125 .093 .082 .078 .076
.9 .558 | .462 .304 .265 .376 .447 .520 | .438 .218 .136 .086 .081 .083

49,250 .0 .589 | .071 .112 .144 .325 .413 .486 | .057 .057 .056 .058 .057 .055
.3 .585 | .098 .123 .156 .329 .426 .489 | .078 .064 .058 .062 .060 .059
.6 .582 | .149 .142 .169 .334 .430 .501 | .128 .080 .071 .066 .065 .064
.9 .572 | .364 .233 .231 .367 .454 .533 | .334 .153 .113 .089 .087 .087

256,10 .0 .613 | .199 .199 .199 .337 .425 .491 | .180 .138 .101 .084 .082 .082
.3 .629 | .222 .222 .222 .356 .438 .497 | .204 .157 .123 .092 .086 .088
.6 .632 | .218 .218 .218 .353 .431 .495 | .196 .142 .106 .076 .076 .076
.9 .631 | .206 .206 .206 .316 .390 .462 | .187 .141 .103 .063 .058 .060

256,50 .0 .630 | .068 .102 .139 .314 .403 .478 | .049 .048 .048 .053 .053 .054
.3 .626 | .168 .145 .169 .337 .427 .497 | .146 .076 .067 .063 .060 .063
.6 .633 | .324 .216 .220 .372 .456 .528 | .290 .140 .103 .080 .082 .083
.9 .613 | .477 .314 .276 .391 .472 .530 | .453 .227 .138 .082 .084 .085

256,250 .0 .629 | .068 .100 .135 .325 .414 .491 | .053 .048 .048 .051 .055 .054
.3 .623 | .088 .106 .136 .327 .417 .493 | .073 .057 .054 .056 .054 .052
.6 .630 | .130 .122 .151 .332 .433 .503 | .111 .066 .060 .060 .058 .058
.9 .619 | .365 .224 .217 .374 .460 .525 | .333 .135 .096 .077 .074 .077

Table D.16: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). One additional regressor. No trend. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test
of H0 : β3 = 0 and H0 : γ = 0.
N,T ρ | Adjusted Fixed-b CV: tDD, values of b = .02 .06 .1 .4 .7 1.0 | Usual Fixed-b CV: tz, values of b = .02 .06 .1 .4 .7 1.0

9,10 .0 | .103 .085 .070 .066 .060 .060 | .186 .167 .150 .121 .117 .120
.3 | .193 .164 .143 .110 .110 .105 | .202 .184 .164 .130 .125 .130
.6 | .320 .281 .254 .174 .164 .154 | .240 .214 .190 .149 .140 .143
.9 | .449 .419 .387 .244 .241 .236 | .301 .281 .261 .192 .174 .174

9,50 .0 | .048 .049 .052 .049 .044 .046 | .062 .059 .059 .060 .060 .058
.3 | .143 .086 .073 .064 .055 .059 | .081 .070 .068 .064 .062 .063
.6 | .312 .160 .119 .086 .078 .082 | .189 .118 .110 .088 .084 .086
.9 | .605 .416 .328 .205 .194 .188 | .442 .295 .246 .178 .162 .166

9,250 .0 | .047 .044 .048 .047 .044 .042 | .056 .054 .054 .050 .050 .052
.3 | .064 .049 .052 .049 .048 .046 | .061 .056 .056 .052 .050 .050
.6 | .109 .060 .058 .055 .052 .050 | .087 .072 .072 .064 .064 .067
.9 | .359 .172 .123 .089 .086 .086 | .243 .145 .133 .101 .093 .096

49,10 .0 | .097 .074 .059 .054 .057 .054 | .141 .121 .108 .092 .085 .088
.3 | .204 .174 .144 .111 .107 .103 | .160 .142 .125 .112 .105 .108
.6 | .320 .290 .262 .186 .174 .176 | .226 .203 .184 .148 .144 .146
.9 | .474 .442 .410 .274 .271 .272 | .319 .296 .274 .203 .191 .191

49,50 .0 | .054 .050 .046 .049 .048 .045 | .056 .056 .055 .053 .051 .052
.3 | .148 .085 .074 .063 .066 .064 | .080 .069 .061 .056 .055 .058
.6 | .328 .171 .128 .100 .092 .094 | .178 .114 .098 .076 .079 .082
Continued on next page.

137

Table D.16 (cont’d)
Adjusted Fixed-b CV
tDD , values of b
N,T

ρ

.02

Usual Fixed-b CV
tz , values of b

.06

.1

.4

.7

1.0

.02

.06

.1

.4

.7

1.0

.9 .635 .441 .344 .228 .217 .213

.463 .290 .231 .164 .158 .160

.0 .045 .045 .044 .044 .046 .050

.048 .052 .051 .051 .053 .051

.3 .062 .050 .049 .050 .049 .050

.058 .058 .056 .051 .055 .054

.6 .114 .061 .059 .057 .056 .056

.075 .063 .061 .055 .060 .061

.9 .361 .176 .129 .093 .095 .092

.243 .129 .115 .099 .095 .099

.0 .121 .098 .079 .070 .069 .066

.111 .098 .082 .066 .070 .069

.3 .203 .178 .156 .117 .119 .116

.138 .117 .099 .082 .079 .078

.6 .334 .301 .273 .198 .192 .196

.222 .198 .174 .128 .115 .115

.9 .483 .448 .415 .272 .271 .266

.335 .314 .290 .210 .197 .194

.0 .050 .040 .040 .041 .044 .043

.055 .053 .052 .046 .047 .050

.3 .152 .085 .070 .057 .064 .057

.076 .062 .059 .057 .056 .056

.6 .347 .179 .129 .097 .095 .092

.187 .105 .090 .071 .076 .074

.9 .649 .458 .366 .227 .216 .213

.487 .298 .238 .172 .158 .158

256,250 .0 .040 .040 .044 .048 .044 .050

.053 .050 .047 .045 .045 .045

.3 .059 .046 .051 .052 .050 .050

.060 .057 .052 .043 .049 .050

.6 .105 .066 .056 .061 .056 .056

.084 .067 .060 .059 .060 .060

.9 .346 .165 .129 .100 .094 .092

.229 .114 .094 .085 .081 .086

49,250

256,10

256,50

138
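Each entry in the tables of this appendix is an empirical null rejection probability: the share of Monte Carlo replications in which a two-tailed test of a true H0 rejects at the nominal 5% level, so entries well above .05 (for example, many of the ρ = .9 rows) indicate over-rejection. The Python sketch below shows only this final counting step, assuming the simulated t statistics are already available. The function name `null_rejection_probability`, the 20,000 draws, and the N(0,1) critical value 1.96 are illustrative choices, not the settings used in the simulations reported here.

```python
import numpy as np


def null_rejection_probability(t_stats, critical_value):
    """Share of replications in which a two-tailed test rejects a true H0.

    t_stats: t statistics simulated under the null, one per replication.
    critical_value: positive two-tailed critical value, e.g. 1.96 for the
        N(0,1) approximation at the 5% level, or a fixed-b critical value
        that depends on the bandwidth ratio b.
    """
    t_stats = np.asarray(t_stats, dtype=float)
    return float(np.mean(np.abs(t_stats) > critical_value))


# Illustration with t statistics that are exactly N(0,1) under H0, so the
# empirical rejection rate should be close to the nominal 5%.
rng = np.random.default_rng(12345)
t_null = rng.standard_normal(20_000)
print(round(null_rejection_probability(t_null, 1.96), 3))
```

Replacing 1.96 by a critical value that ignores the dependence in the data is exactly what produces rejection rates far above .05 in the tables.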

Table D.17: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). Trend and one additional
regressor. λ = .5, k = .5. MA(2) spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of
H0 : β3 = 0 and H0 : γ = 0.
                Adjusted Fixed-b CV                    Usual Fixed-b CV
                tDD, values of b                       tz, values of b
N,T      ρ      .02   .06   .1    .4    .7    1.0      .02   .06   .1    .4    .7    1.0
9,10     .0     .160  .112  .085  .082  .074  .078     .208  .190  .169  .120  .121  .120
         .3     .189  .140  .105  .082  .081  .083     .225  .200  .180  .137  .136  .139
         .6     .203  .148  .114  .081  .080  .080     .272  .248  .228  .188  .181  .192
         .9     .194  .142  .108  .076  .073  .073     .347  .323  .299  .250  .244  .249
9,50     .0     .060  .057  .058  .054  .060  .057     .063  .064  .061  .062  .055  .058
         .3     .152  .084  .074  .064  .064  .063     .082  .078  .072  .064  .066  .069
         .6     .289  .146  .111  .083  .081  .080     .194  .129  .114  .089  .088  .090
         .9     .444  .213  .140  .086  .089  .087     .442  .313  .274  .219  .216  .224
9,250    .0     .054  .049  .049  .048  .044  .046     .054  .053  .055  .051  .052  .055
         .3     .073  .057  .058  .054  .050  .052     .060  .060  .058  .056  .056  .056
         .6     .122  .073  .068  .061  .058  .056     .087  .075  .070  .062  .060  .064
         .9     .313  .144  .104  .081  .077  .078     .241  .152  .140  .097  .095  .098
49,10    .0     .181  .134  .100  .087  .083  .081     .160  .150  .133  .103  .098  .103
         .3     .201  .154  .112  .094  .089  .086     .186  .168  .152  .118  .117  .119
         .6     .201  .151  .114  .087  .083  .082     .226  .204  .186  .156  .154  .154
         .9     .190  .140  .111  .074  .068  .070     .302  .278  .254  .230  .215  .220
49,50    .0     .066  .057  .055  .058  .060  .058     .058  .056  .055  .051  .052  .052
         .3     .152  .088  .072  .070  .070  .070     .078  .066  .064  .058  .061  .064
         .6     .284  .136  .101  .086  .086  .084     .176  .114  .104  .079  .080  .087
         .9     .452  .216  .139  .089  .086  .088     .446  .302  .259  .191  .183  .187
49,250   .0     .056  .056  .052  .058  .058  .058     .047  .049  .050  .049  .050  .052
         .3     .074  .060  .052  .060  .061  .059     .055  .054  .057  .056  .053  .054
         .6     .118  .072  .064  .067  .067  .066     .080  .067  .066  .059  .059  .063
         .9     .338  .161  .118  .095  .092  .092     .251  .141  .122  .094  .094  .098
256,10   .0     .186  .134  .104  .083  .084  .084     .127  .111  .096  .079  .078  .080
         .3     .208  .164  .134  .098  .094  .094     .146  .126  .111  .097  .095  .097
         .6     .217  .163  .128  .093  .090  .090     .206  .188  .169  .141  .136  .138
         .9     .200  .155  .114  .077  .069  .069     .281  .259  .233  .197  .179  .187
256,50   .0     .051  .050  .048  .056  .054  .055     .060  .056  .054  .055  .051  .054
         .3     .147  .077  .070  .063  .066  .066     .085  .068  .063  .057  .061  .062
         .6     .294  .139  .103  .086  .086  .086     .187  .114  .105  .085  .080  .084
         .9     .448  .231  .140  .088  .086  .084     .442  .285  .243  .189  .170  .174
256,250  .0     .051  .047  .049  .047  .051  .050     .054  .048  .046  .044  .044  .044
         .3     .067  .056  .054  .053  .051  .050     .060  .055  .051  .047  .047  .048
         .6     .113  .066  .063  .060  .057  .060     .082  .066  .066  .059  .058  .059
         .9     .334  .139  .098  .077  .076  .076     .231  .122  .105  .082  .083  .086

Table D.18: Null Rejection Probabilities, 5% level, tDD (Bartlett Kernel). No trend and additional regressors. λ = .5, k = .5. MA(2)
spatial correlation in cross-section. θ = 0.5. Two-Tailed Test of H0 : β3 = 0.
                N(0,1) CV                                  N(0,1) CV                  Adjusted Fixed-b CV
                                t^r_double, values of b    tDK, values of b           tDK, values of b
N,T      ρ      tclus  tdouble  .02   .1    .4    .7       .02   .1    .4    .7       .02   .1    .4    .7
49,50    .0     .565   .063     .093  .223  .405  .466     .057  .095  .260  .380     .049  .048  .050  .051
         .3     .557   .155     .108  .226  .454  .498     .162  .130  .292  .409     .146  .066  .065  .064
         .6     .553   .288     .186  .227  .496  .538     .341  .199  .342  .449     .319  .122  .094  .092
         .9     .566   .479     .381  .327  .632  .666     .654  .441  .516  .603     .642  .349  .219  .209
256,250  .0     .623   .045     .066  .192  .403  .468     .050  .093  .252  .384     .038  .045  .051  .049
         .3     .625   .140     .066  .193  .414  .469     .071  .100  .259  .387     .058  .051  .053  .053
         .6     .626   .289     .073  .194  .433  .481     .125  .110  .270  .395     .104  .059  .059  .058
         .9     .624   .514     .201  .202  .494  .534     .373  .193  .333  .437     .350  .126  .099  .093
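The tDK statistics in Table D.18 are built from the Driscoll and Kraay (1998) covariance estimator with a Bartlett kernel, and, in the fixed-b framework of Kiefer and Vogelsang (2005), the bandwidth is tied to the sample length through M = bT. A minimal NumPy sketch for pooled OLS, assuming the standard DK construction: sum the scores x_it û_it across firms within each period, then apply Bartlett weights k(j/M) = 1 - j/M to the autocovariances of those period sums. The function names and the simulated data are hypothetical; regressors are assumed to be already within-transformed if firm effects are present; and no small-sample scaling or fixed-b critical values are computed here.

```python
import numpy as np


def bartlett_weight(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1 and 0 otherwise."""
    return max(0.0, 1.0 - abs(x))


def driscoll_kraay_vcov(X, u, time_index, b):
    """Bartlett-kernel Driscoll-Kraay style covariance matrix for pooled OLS.

    X: (n_obs, k) regressors; u: (n_obs,) OLS residuals;
    time_index: (n_obs,) integer period labels 0, ..., T-1;
    b: bandwidth ratio, so the Bartlett bandwidth is M = b * T.
    """
    if b <= 0:
        raise ValueError("b must be positive")
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    time_index = np.asarray(time_index)
    T = int(time_index.max()) + 1
    k = X.shape[1]

    # Cross-sectional sums of the scores: v_t = sum_i x_it * u_it.
    v = np.zeros((T, k))
    np.add.at(v, time_index, X * u[:, None])

    M = b * T
    omega = v.T @ v  # lag-0 term
    for j in range(1, T):
        w = bartlett_weight(j / M)
        if w == 0.0:
            break  # all longer lags also get zero weight
        gamma_j = v[j:].T @ v[:-j]  # sum over t of v_t v_{t-j}'
        omega += w * (gamma_j + gamma_j.T)

    bread = np.linalg.inv(X.T @ X)
    return bread @ omega @ bread


# Illustration on simulated data (hypothetical DGP with i.i.d. errors).
rng = np.random.default_rng(0)
N, T = 9, 50
time_index = np.repeat(np.arange(T), N)  # observations stacked period by period
X = np.column_stack([np.ones(N * T), rng.standard_normal(N * T)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(N * T)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ beta_hat
V = driscoll_kraay_vcov(X, u, time_index, b=0.1)
print(np.sqrt(np.diag(V)))  # DK standard errors for the intercept and slope
```

Because the scores are summed across firms before any time weighting, cross-sectional correlation of arbitrary form is absorbed into v_t, and only the time dimension needs a kernel; a small b (for example .02) keeps few lags, while b = 1.0 weights all T - 1 lags.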

Appendix E
FIGURES IN CHAPTER 3

[Graphic: empirical null rejection probabilities plotted against lambda (horizontal axis, 0 to 1). Panel title: N=100, T=250, rho=.3, b=.02. Legend: normal, bootstrap, fixed-b, alpha.]
Figure E.1: Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, N = 100, T = 250, ρ = 0.3, b = 0.02. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

[Graphic: empirical null rejection probabilities plotted against lambda (horizontal axis, 0 to 1). Panel title: N=100, T=250, rho=.3, b=.5. Legend: normal, bootstrap, fixed-b, alpha.]
Figure E.2: Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, N = 100, T = 250, ρ = 0.3, b = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, significance level. Panels (a), (b), (c): N=50, T=50 with rho=0, .3, .9; panels (d), (e), (f): N=50, T=250 with rho=0, .3, .9.]
Figure E.3: Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, N = 50, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, alpha. Panels (a), (b), (c): N=49, T=50 with rho=0, .3, .9; panels (d), (e), (f): N=49, T=250 with rho=0, .3, .9.]
Figure E.4: Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, N = 49, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, significance level. Panels (a), (b), (c): N=250, T=50 with rho=0, .3, .9; panels (d), (e), (f): N=250, T=250 with rho=0, .3, .9.]
Figure E.5: Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, N = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, alpha. Panels (a), (b), (c): N=256, T=50 with rho=0, .3, .9; panels (d), (e), (f): N=256, T=250 with rho=0, .3, .9.]
Figure E.6: Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, N = 256, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, significance level. Panels (a), (b), (c): N=50, T=50 with rho=0, .3, .9; panels (d), (e), (f): N=250, T=50 with rho=0, .3, .9.]
Figure E.7: Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, T = 50, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, alpha. Panels (a), (b), (c): N=49, T=50 with rho=0, .3, .9; panels (d), (e), (f): N=256, T=50 with rho=0, .3, .9.]
Figure E.8: Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, T = 50, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, significance level. Panels (a), (b), (c): N=50, T=250 with rho=0, .3, .9; panels (d), (e), (f): N=250, T=250 with rho=0, .3, .9.]
Figure E.9: Empirical null rejection probabilities, no spatial correlation, Bartlett kernel, T = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, alpha. Panels (a), (b), (c): N=49, T=250 with rho=0, .3, .9; panels (d), (e), (f): N=256, T=250 with rho=0, .3, .9.]
Figure E.10: Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, T = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, alpha. Panels (a), (b), (c): N=49, T=250, block length=25 with rho=0, .3, .9; panels (d), (e), (f): N=49, T=250, block length=1 with rho=0, .3, .9.]
Figure E.11: Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, N = 49, T = 250, λ = 0.5.
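Figures E.11 and E.12 report bootstrap results for block lengths 25 and 1. One common way to bootstrap a panel in the time dimension is a moving blocks scheme (Künsch, 1989; Liu and Singh, 1992; for panel regressions, Gonçalves, 2011) that draws overlapping blocks of consecutive periods and keeps each period's entire cross-section together, so that cross-sectional dependence of unknown form is carried over into every bootstrap sample. The sketch below generates such time indices. It assumes that scheme and data arranged as a T × N array; the function names are hypothetical, and it does not show how the bootstrap t statistic and its critical values are formed in Chapter 3.

```python
import numpy as np


def moving_block_time_indices(T, block_length, rng):
    """Time indices for one moving blocks bootstrap draw.

    Overlapping blocks {s, s+1, ..., s+l-1} of consecutive periods are drawn
    with start s uniform on 0, ..., T-l; ceil(T/l) blocks are concatenated and
    the result is cut to length T. block_length = 1 resamples periods i.i.d.
    """
    if not 1 <= block_length <= T:
        raise ValueError("block_length must be between 1 and T")
    n_blocks = -(-T // block_length)  # ceiling division
    starts = rng.integers(0, T - block_length + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_length)[None, :]).ravel()
    return idx[:T]


def resample_panel(Y, idx):
    """Resample whole cross-sections: Y is (T, N) with rows indexing periods."""
    return Y[idx]


rng = np.random.default_rng(1)
T, N, l = 250, 49, 25  # dimensions as in panels (a) to (c) of Figure E.11
Y = rng.standard_normal((T, N))
idx = moving_block_time_indices(T, l, rng)
Y_star = resample_panel(Y, idx)
print(Y_star.shape)  # (250, 49)
```

With l = 25, serial dependence within each block of 25 periods is preserved; with l = 1 the periods are drawn independently, which destroys any serial correlation in the resampled data.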

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Legend: normal, bootstrap, fixed-b, alpha. Panels (a), (b), (c): N=256, T=250, block length=25 with rho=0, .3, .9; panels (d), (e), (f): N=256, T=250, block length=1 with rho=0, .3, .9.]
Figure E.12: Empirical null rejection probabilities, spatial MA(2), Bartlett kernel, N = 256, T = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Panels (a), (b), (c): N=49, T=50, DD parameter, rho=0, .3, .9, legend normal_DID, bootstrap_DID, fixed-b_DID, alpha; panels (d), (e), (f): N=49, T=50, z parameter, rho=0, .3, .9, legend normal_z, bootstrap_z, fixed-b_z, alpha.]
Figure E.13: Empirical null rejection probabilities, additional regressor, spatial MA(2), Bartlett kernel, N = 49, T = 50, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Panels (a), (b), (c): N=49, T=250, DD parameter, rho=0, .3, .9, legend normal_DID, bootstrap_DID, fixed-b_DID, alpha; panels (d), (e), (f): N=49, T=250, z parameter, rho=0, .3, .9, legend normal_z, bootstrap_z, fixed-b_z, alpha.]
Figure E.14: Empirical null rejection probabilities, additional regressor, spatial MA(2), Bartlett kernel, N = 49, T = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Panels (a), (b), (c): N=256, T=50, DD parameter, rho=0, .3, .9, legend normal_DID, bootstrap_DID, fixed-b_DID, alpha; panels (d), (e), (f): N=256, T=50, z parameter, rho=0, .3, .9, legend normal_z, bootstrap_z, fixed-b_z, alpha.]
Figure E.15: Empirical null rejection probabilities, additional regressor, spatial MA(2), Bartlett kernel, N = 256, T = 50, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities against bandwidth (horizontal axis, 0 to .8). Panels (a), (b), (c): N=256, T=250, DD parameter, rho=0, .3, .9, legend normal_DID, bootstrap_DID, fixed-b_DID, alpha; panels (d), (e), (f): N=256, T=250, z parameter, rho=0, .3, .9, legend normal_z, bootstrap_z, fixed-b_z, alpha.]
Figure E.16: Empirical null rejection probabilities, additional regressor, spatial MA(2), Bartlett kernel, N = 256, T = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities for the DD parameter against bandwidth (horizontal axis, 0 to .8). Legend: normal_DID, bootstrap_DID, fixed-b_DID, alpha. Panels (a), (b), (c): N=49, T=250, l=25 with rho=0, .3, .9; panels (d), (e), (f): N=49, T=250, l=1 with rho=0, .3, .9.]
Figure E.17: Empirical null rejection probabilities for DD parameter, additional regressor, spatial MA(2), Bartlett kernel, N = 49, T = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities for the DD parameter against bandwidth (horizontal axis, 0 to .8). Legend: normal_DID, bootstrap_DID, fixed-b_DID, alpha. Panels (a), (b), (c): N=256, T=250, l=25 with rho=0, .3, .9; panels (d), (e), (f): N=256, T=250, l=1 with rho=0, .3, .9.]
Figure E.18: Empirical null rejection probabilities for DD parameter, additional regressor, spatial MA(2), Bartlett kernel, N = 256, T = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities for the z parameter against bandwidth (horizontal axis, 0 to .8). Legend: normal_z, bootstrap_z, fixed-b_z, alpha. Panels (a), (b), (c): N=49, T=250, l=25 with rho=0, .3, .9; panels (d), (e), (f): N=49, T=250, l=1 with rho=0, .3, .9.]
Figure E.19: Empirical null rejection probabilities for z parameter, additional regressor, spatial MA(2), Bartlett kernel, N = 49, T = 250, λ = 0.5.

[Graphic: six panels plotting empirical null rejection probabilities for the z parameter against bandwidth (horizontal axis, 0 to .8). Legend: normal_z, bootstrap_z, fixed-b_z, alpha. Panels (a), (b), (c): N=256, T=250, l=25 with rho=0, .3, .9; panels (d), (e), (f): N=256, T=250, l=1 with rho=0, .3, .9.]
Figure E.20: Empirical null rejection probabilities for z parameter, additional regressor, spatial MA(2), Bartlett kernel, N = 256, T = 250, λ = 0.5.

BIBLIOGRAPHY

M. Arellano. Computing robust standard errors for within-groups estimators. Oxford Bulletin of Economics and Statistics, 49(4):431–434, 1987.
M. Bertrand, E. Duflo, and S. Mullainathan. How much should we trust differences-in-differences estimates? Quarterly Journal of Economics, 119:249–275, 2004.
A.C. Bester, T.C. Conley, C.B. Hansen, and T.J. Vogelsang. Fixed-b asymptotics for spatially dependent robust nonparametric covariance matrix estimators. Working Paper, Department of Economics, Michigan State University, 2008.
A.C. Bester, T.C. Conley, and C.B. Hansen. Inference with dependent data using cluster covariance estimators. Journal of Econometrics, 2011. doi:10.1016/j.jeconom.2011.01.007.
H. Bunzel and T. J. Vogelsang. Powerful trend function tests that are robust to strong serial correlation with an application to the Prebisch-Singer hypothesis. Journal of Business and Economic Statistics, 23:381–394, 2005.
A. Cameron, J. Gelbach, and D. Miller. Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 90:414–427, 2008.
A. Cameron, J. Gelbach, and D. Miller. Robust inference with multiway clustering. Journal of Business and Economic Statistics, 29:238–249, 2011.
C. K. Cho. Fixed b inference in a time series regression with a structural break. Working paper, Department of Economics, Michigan State University, 2012.
T. G. Conley. GMM estimation with cross sectional dependence. Journal of Econometrics, 92(1):1–45, 1999.
J.C. Driscoll and A.C. Kraay. Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, 80(4):549–560, 1998.
B. Efron. Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7:1–26, 1979.
E.F. Fama and J.D. MacBeth. Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3):607–636, 1973.
S. Gonçalves. The moving blocks bootstrap for panel linear regression models with individual fixed-effects. Econometric Theory, 27:1048–1082, 2011.
S. Gonçalves and T. J. Vogelsang. Block bootstrap HAC robust tests: The sophistication of the naive bootstrap. Econometric Theory, 27(4):745–791, 2011.
I. Gow, G. Ormazabal, and D. Taylor. Correcting for cross-sectional and time-series dependence in accounting research. The Accounting Review, 85:483–512, 2010.
C.B. Hansen. Asymptotic properties of a robust variance matrix estimator for panel data when T is large. Journal of Econometrics, 141(2):597–620, 2007.
N. Hashimzade and T. J. Vogelsang. Fixed-b asymptotic approximation of the sampling behavior of nonparametric spectral density estimators. Journal of Time Series Analysis, 29:142–162, 2008a.
N. Hashimzade and T. J. Vogelsang. Fixed-b asymptotic approximation of the sampling behaviour of nonparametric spectral density estimators. Journal of Time Series Analysis, 29:142–162, 2008b.
H.H. Kelejian and I.R. Prucha. HAC estimation in a spatial framework. Journal of Econometrics, 140(1):131–154, 2007.
N. M. Kiefer and T. J. Vogelsang. A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory, 21:1130–1164, 2005.
M.S. Kim and Y. Sun. Spatial heteroskedasticity and autocorrelation consistent estimation of covariance matrix. Journal of Econometrics, 160:349–371, 2011a.
M.S. Kim and Y. Sun. Heteroskedasticity and spatiotemporal dependence robust inference for linear panel models with fixed effects. Working paper, Department of Economics, Ryerson University, 2011b.
H. R. Künsch. The jackknife and the bootstrap for general stationary observations. Annals of Statistics, 17:1217–1241, 1989.
R.Y. Liu and K. Singh. Moving blocks jackknife and bootstrap capture weak dependence. In R. LePage and L. Billard, editors, Exploring the Limits of the Bootstrap. Wiley, New York, 1992.
W. K. Newey and K. D. West. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55:703–708, 1987.
M. A. Petersen. Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, 22(1):435–480, 2009.
S. B. Thompson. Simple formulas for standard errors that cluster by both firm and time. Journal of Financial Economics, 99:1–10, 2011.
T. J. Vogelsang. Spectral analysis. In S. N. Durlauf and L. E. Blume, editors, The New Palgrave Dictionary of Economics. Palgrave Macmillan, 2008.
T. J. Vogelsang. Heteroskedasticity, autocorrelation, and spatial correlation robust inference in linear panel models with fixed-effects. Journal of Econometrics, 2012.
H. White. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48:817–838, 1980.
H. White. Asymptotic Theory for Econometricians. Academic Press, New York, 1984.
J. M. Wooldridge. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA, 2002.
J. M. Wooldridge. Cluster-sample methods in applied econometrics. American Economic Review Papers and Proceedings, 93(2):133–138, 2003.
C. F. J. Wu. Jackknife, bootstrap and other resampling methods in regression analysis. Annals of Statistics, 14:1261–1295, 1986.