\u. ..

i {t
«£11

I .9.

.1. 1.3.

Er.
La... um

5:35.)... A
‘ “Va: 1.

.. {Ha " «unnJhan.

. mawww

.a.

 

, V .1 11.1. .11.P.41
.2: it; . . .

Fancy .
.. x. r $.38, w; 9.... .32.:

 

--“-- 1"

MICHIGAIJJIBRARIES
STATE UNIVERSITY
EAST LANSING, MICH 48824-1048

This is to certify that the
dissertation entitled

THREE ESSAYS ON ECONOMETRICS

presented by

MYUNGSUP KIM

has been accepted towards fulﬁllment
of the requirements for the

PhD. degree in Economics

 

 

We SUL

 

Major Professor’s Signature

mm w. 2005

 

Date

MSU is an Afﬁrmative Action/Equal Opportunity Institution

._.- .ﬁ-U-l-I-l-D-.-I-Q-Q-O-.~l-.-I-I-I-.--.-C-.-C-l-.-l-l-l-I-l-I-l.

PLACE IN RETURN BOX to remove this checkout from your record.
7’1 AVOID FINES return on or before date due.

."_..

THREE ESSAYS ON ECONOMETRICS
By

Myungsup Kim

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Economics

2005

ABSTRACT
THREE ESSAYS ON ECONOMETRICS
By

Myungsup Kim

Consider a simple stochastic frontier model explaining the output of a ﬁrm by
y = x’ﬁ + v — u. While 1) represents random shocks outside the control of producers,
u represents technical inefficiency in the production process.

In the ﬁrst chapter, we wish to test whether technical inefﬁciency depends on
observable characteristics of the ﬁrm. It is well known that two-step procedures, in
which the second step is the regression of an inefficiency measure on ﬁrm character-
istics, do not properly estimate the effects of ﬁrm characteristics on inefficiency. In
this chapter we show that this regression also does not lead to a valid test of the
hypothesis of no effect. A valid test of the hypothesis of no effect can be constructed
by using an adjustment to the variance matrix of the estimated coefﬁcients in the
second step regression. Unfortunately the form of this adjustment is not distribution
free. We show that this test is the LM test in the speciﬁc case that technical inef-
ﬁciency is exponential and the alternative is a scaled exponential distribution. We
also consider tests based on nonlinear least squares. These tests do not depend on
a distributional assumption. There are some technical complications involved due
to the non-identiﬁcation of some of the parameters under the null. We perform an
extensive set of simulations to compare the size and power characteristics of these
tests and other similar tests, including the Wald test based on a one-step estimate of
the entire model.

In the second chapter, we study the construction of conﬁdence intervals for efﬁ-
ciency levels of individual ﬁrms in stochastic frontier models with panel data. The
focus is on bootstrapping and related methods. We start with a survey of various

versions of the bootstrap. Then we offer some simple alternatives based on standard

methods when one acts as if the identity of the best ﬁrm is known. Monte Carlo
simulations indicate that these simple alternatives work better than the percentile
bootstrap but perhaps not as well as the bias-adjusted and accelerated bootstrap.
None of the methods yields very accurate conﬁdence intervals except when the time-
series sample size is large enough, or the error variance is small enough, that the
identity of the best ﬁrm is clear. We also present empirical results for two well-known
data sets.

In the last chapter, we consider the problem of testing the null hypothesis that
a series is stationary against the unit root alternative. A standard test for this null
hypothesis is the KPSS test, which is based on cumulations of deviations from the
means of the series. A paper by de Jong, Amsler, and Schmidt (2002) constructs a
“robust” version of the KPSS test by using an indicator of whether the observation
is above or below the sample median. This test, called the indicator KPSS test, is
robust in that it does not require existence of moments of the series, yet the asymptotic
distribution of the indicator KPSS statistic is the same as that of the KPSS statistic.
However, in this chapter we allow a non-zero level for the series under consideration,
but not a deterministic trend. The purpose of this chapter is to extend the indicator
KPSS statistic to the case of a deterministic trend. The relevant indicator in this
setting is whether the residual is positive or negative in a least absolute deviations
regression of the series on a time trend. This chapter shows that, under the null
of trend-stationarity, the indicator KPSS statistic with a time trend has the same

limiting distribution as the KPSS statistic with a time trend.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Professor Peter Schmidt,
who taught my ﬁrst Econometrics course with passion. His willingness to motivate
and support my work has made it possible for me to develop my skills as a researcher.
Needless to say, his wise advice and continuing counsel have been essential for the
course of my studies. Also, I am enormously grateful to Professor Robert M. de
Jong for his patient support and extraordinary encouragement of my learning of
methodological tools. I owe special thanks to Professor Jeffrey M. Wooldridge for
invaluable comments. I would like to thank the other members of my committee:
Professor Christine E. Amsler and Professor Robert J. Myers. I greatly appreciate
their input and time to this dissertation.

I want to thank my wife, Jiyoung, for being my partner and best friend with her
unending love, sacriﬁce and support. I am also greatly indebted to my parents and
my parents-in—law who have always been very supportive of my pursuit of education.

My special thanks also go to my fellow graduate students for their friendships and
the staff in the Department of Economics for their jovial smile and helpful nature

that have always greeted any problem or deadline.

iv

TABLE OF CONTENTS

LIST OF TABLES ..............................

1 Valid Tests of Whether Technical Inefficiency Depends on Firm
Characteristics ..............................
1.1 Introduction ................................
1.2 Two-Step Procedures ...........................
1.3 The Scaled Exponential Case ......................
1.4 A Test Based on Nonlinear Least Squares ................
1.5 Simulations: Experimental Design ....................
1.6 Simulation Results: Size .........................

1.6.1 Base case .............................
1.6.2 Effects of changing a or 6 ....................
1.6.3 Effects of changing N .......................
1.6.4 Effects of changing p .......................
1.6.5 Effects of changing A .......................
1.6.6 Effects of changing 03 ......................
1.7 Simulation Results: Power ........................
1.8 Simulation Results: Robustness .....................
1.8.1 Normal-truncated normal ....................
1.8.2 Normal-gamma ..........................
1.9 Concluding Remarks ...........................
1.10 Output Tables ...............................
1.11 Appendix: LM Test for the Scaled Exponential Case .........
1.12 Appendix: Supplementary Tables ....................

2 On the Accuracy of Bootstrap Conﬁdence Intervals for Emciency
Levels in Stochastic Frontier Models with Panel Data .......
2.1 Introduction ................................
2.2 Fixed-Effects Estimation of the Model .................
2.3 Construction of Conﬁdence Intervals by Bootstrapping ........
2.4 A Simple Alternative to the Bootstrap .................
2.5 Simulations ................................
2.6 Empirical Results .............................

2.6.1 Indonesian Rice Farms ......................
2.6.2 Texas Utilities ...........................
2.7 Conclusions ................................
2.8 Output Tables ...............................

wi—tl-l

8
10
14
18

19
19
2O
20
21
23

26
28
44
51

60
60
61
65
7O
73
82
82
84

88

3 Indicator KPSS with a Time Trend .................. 98

3.1 Introduction ................................ 98
3.2 Asymptotic Theory ............................ 99
3.2.1 Assumptions ............................ 99

3.2.2 Indicator KPSS statistic ..................... 101

3.2.3 Conjectures ............................ 102

3.2.4 The Asymptotic Distributions of the Indicator KPSS Statistic 103

3.3 Concluding remarks ............................ 105
3.4 Appendix: Mathematical Proof ..................... 106
BIBLIOGRAPHY .............................. 135

vi

LIST OF TABLES

1.1(BASECASE)a=5=—.6=0,53=,\=1,p=0.5,N-_-.-200
[E(exp(—u)) = 0.5232] ..........................

1.2 (ChangeofN)N=500,a=5=6=0,53=A=1,p=0.5
[E(exp(—u)) = 0.5232] ..........................

1.3 (ChangeofN)N-_— 1000,a :6 =6 = 0, 5,2, = A = 1,p=0.5
[E(exp(—u)) = 0.5232] ..........................

1.4 (Changeofp)p= —0.5,a =p= 6:0,6-3 = A = 1,N =200
[E(exp(—u)) = 0.5232] ..........................

1.5 (Change of p) p = 0, a = ﬂ = 6 = 0, 0,2, = A : 1, N = 200 [E(exp(—-u)) =
0.5232] ...................................

1.6(Changeofp)p=09026262003=A 1,N=200
[E(exp(—u)) = 0.5232] ..........................

1.7 (ChangeofA)A=3,a=ﬂ=6=0,a,2,=1,p=0.5,N=200
[E(exp(-—u)) = 0.1095] ..........................

1.8 (Changeof03)a,2, =9,a:ﬁ =6=0, A = 1,p=0.5,N= 200
[E(exp(—u)) = 0.5100] ..........................

1.9 (Change of 6) 6 = 0.05, a = r3 = 0, 0,2, = A = 1, p = 0.5, N = 1000
[E(exp(—u)) = 0.5232] ..........................

1.10 (Change of 6) 6 = 0.1, a = B = 0, 63 = A = 1, p = 0.5, N = 1000
[E(exp(—u)) = 0.5232] ..........................

1.11 (Change of 6) 6 = 0.15, a = p = 0, 63 = A = 1, p = 0.5, N = 1000
[E(exp(—u)) = 0.5232] ..........................

1.12 (Change of 6 and p) 6 = 0.1, p = 0.9, a = [3 = 0, 0.3 = A = 1, N 21000
[E(exp(—u)) = 0.5232] ..........................

1.13 (Change of scaling functions to ¢(62,-)/(1 — <I>(6z,~))) 6 = 0.1, a = B = 0,
53 = A = 1, p = 0.5, N = 1000 [E(exp(—u)) = 0.5232] .........

1.14 (Change of the distribution of u: to N(0,7r/2)+) a = ﬂ = 6 = 0, 0,2, =
A = 1, p = 0.5, N = 1000 [E(exp(-u)) = 0.5232] ............

vii

28

29

30

31

32

33

34

35

36

37

38

39

40

41

1.15 (Change of the distribution of u: to gamma(0.5,2)) oz = [3 = 6 = 0,

53 = A = 1, p = 0.5, N = 1000 [E(exp(—u)) = 0.5232] ......... 42
1.16 (Change of the distribution of u: to gamma(2,0.5)) a = B = 6 = 0,

6,2, = A = 1, p .-= 0.5, N = 1000 [E(exp(—u)) = 0.5232] ......... 43
1.17 (Changeofp)p= 025,6. :5 = 6 = 0, 53 = A = 1, N = 200

[E(exp(—u)) = 0.5232] .......................... 51
1.18 (Change ofp) p = 0.75, a = 5 = 6 = 0, 53 = A = 1, N = 200

[E(exp(—u)) = 0.5232] .......................... 52
1.19 (Change of6 and p) 6 = 0.05, p = 0.9, a = ﬂ = 0, 0,2, = A = 1, N = 1000

[E(exp(—u)) = 0.5232] .......................... 53
1.20 (Change of 6 and p) 6 = 0.15, p = 0.9, a = 6 = 0, 63 = A = 1, N = 1000

[E(exp(—u)) = 0.5232] .......................... 54
1.21 (Change of the distribution of u? to N(0,1)+) a = B = 6 = 0, 03 = A = 1,

p = 0.5, N = 1000 [E(exp(-—-u)) = 0.5232] ................ 55
1.22 (Change of the distribution of uf to N(0,7r/(1r - 2))+) or = B = 6 = 0,

6,2, = A = 1, p = 0.5, N = 1000 [E(exp(—u)) = 0.5232] ......... 56
1.23 (Change of the distribution of u: to N(1,1)+)a = B 2 6 = 0, 03 = A =1,

p = 0.5, N = 1000 [E(exp(—u)) = 0.5232] ................ 57
1.24 (Change of the distribution of u: to gamma(0.5, \/2)) a = [3 = 6 = 0,

5,2, = A = 1, p = 0.5, N = 1000 [E(exp(—u)) = 0.5232] ......... 58
1.25 (Change of the distribution of u: to gamma(2,1/\/2)) a = ﬂ = 6 = 0,

53 = A = 1, p -.= 0.5, N = 1000 [E(exp(—u)) = 0.5232] ......... 59
2.1 Biases of Fixed Effects Estimates ...................... 88
2.2 90% Conﬁdence Intervals for Relative Efﬁciency (if) ........... 89
2.3 90% Conﬁdence Intervals for Relative Efﬁciency (if) ........... 90
2.4 Bias Correction in the 300 Bootstrap Intervals .............. 90
2.5 90% Conﬁdence Intervals for Relative Efﬁciency (7'?) ........... 91

2.6 Biases of Fixed Effects Estimates (Case that u,- are ﬁxed over replications) 92

viii

2.7 90% Conﬁdence Intervals for Relative Efﬁciency (if) (Case that u,- are
ﬁxed across replications) ......................... 93

2.8 Estimated Efﬁciencies and 90% Conﬁdence Intervals: Indonesian Rice Farms 94

2.9 90% Conﬁdence Intervals: Indonesian Rice Farms ............. 95
2.10 Estimated Efﬁciencies and 90% Conﬁdence Intervals: Texas Utilities . . . 96
2.11 90% Conﬁdence Intervals: Texas Utilities .................. 97

ix

Chapter 1

Valid Tests of Whether Technical
Inefﬁciency Depends on Firm

Characteristics

1 . 1 Introduction

In this chapter we consider the stochastic frontier model
316 = $23 + ”z” - Hi, “2' Z 0- (1-1)

The frontier is y: = xi-ﬂ + U, and u,- represents technical inefﬁciency. We follow the
literature in assuming that the x,- are “ﬁxed” and the v,- are i.i.d. normal. Now we
ask whether u,- depends on some variables 2,, which could be characteristics of the
ﬁrm or measures of the environment in which it operates. Speciﬁcally, we wish to
test the hypothesis that u,- does not depend on zi.

One way to do this is to assume a speciﬁc model of the alternative hypothesis that

shows how the z,- affect the 11,-. For example, we could assume:
u,- = exp(z,’-6) - uf, (1.2)

where the u: are i.i.d. according to some speciﬁc distribution, like exponential or half-
normal. Now we can estimate 6 by MLE and do a Wald test of the hypothesis that
6 = 0, which corresponds to the hypothesis that 2,- does not affect ui. In the frontiers
literature this would correspond to what is called a “one-step” procedure (e.g., see
Wang and Schmidt (2002)). Models of the form of (1.2) have been considered by Reif-
schneider and Stevenson (1991), Caudill and Ford (1993), Caudill, Ford, and Cropper
(1995), Wang and Schmidt (2002) and Alvarez, Amsler, Orea, and Schmidt (2005),
among others. We will follow the literature and call the multiplicative decomposition
of u, (as a function of 2,- times a random variable that does not depend on 2i) the
“scaling property.”

An objection to this type of procedure is that it depends fundamentally on the
alternative chosen. Under the null the scaling function exp(z£6) really does not exist
and so there are many more or less equally plausible alternatives. Partly for this
reason, one could consider a “two-step procedure” in which Step 1 would be to es-
timate the model ignoring the 2) to obtain efﬁciency measures 121', and Step 2 would
be a regression of fr,- on 2,- (or some function of z,). It is well known (Wang and
Schmidt (2002)) that when 2,- does affect 14,-, there are serious biases in both steps, so
two-step procedures are not recommended. However, under the null that 2,- does not
affect 11,, these biases do not arise, and it is not known whether a two-step procedure
provides a valid test of this null hypothesis. One contribution of this chapter is to
show that a two-step procedure that uses a standard t or F test in the second step
does not yield an asymptotically valid test. However, the test becomes valid if we use

a corrected variance matrix for the second-step coefﬁcients. Unfortunately, the form

of this correction is distribution-speciﬁc.

This raises the question of whether a test based on such a corrected two-step
procedure entails a loss of power. We do not have a full answer to this question. We
do show that, in the case that the alternative is the scaled exponential distribution,
the LM test of 6 = 0 is asymptotically equivalent to the corrected version of the
two-step procedure. Therefore at least in this case the two-step procedure entails no
loss in asymptotic local power.

If we assume the scaling property, as in (1.2) above, the stochastic frontier model
can also be estimated by nonlinear least squares. Testing whether 6 = 0 based on
nonlinear least squares involves some technical difﬁculties, because the mean of u? is
identiﬁed separately from the overall intercept under the alternative but not under
the null. We show how to deal with these difﬁculties and obtain an asymptotically
valid test.

In the last section of the chapter, we report the results of an extensive set of

simulations that investigate the size and power of these tests.

1 .2 Two-Step Procedures

We consider the stochastic frontier model (1.1). As stated in the Introduction, we
treat the 1:, as ﬁxed and we assume that the v, are i.i.d. N (0, 0,2,). We also assume that
the u: are i.i.d. with some speciﬁc distribution, such as exponential or half-normal,
that is known up to some parameters. Finally, the zi variables whose inﬂuence on
u,- we wish to test are independent of v; and uf. For the purposes of this section,
these assumptions could be weakened somewhat, but we would need the stronger set
subsequently, so we simply make them here.

To motivate the tests considered here, suppose that u,- were observed. Then we

could regress u,- on 2, and test the hypothesis that the coefﬁcients equal zero by

standard methods. More precisely, the regression would have to include an intercept
because E(u,-) is not equal to zero, and we would do an F-test on the coefﬁcients other
than the intercept.

Now let 1,6 equal the unknown parameters of the problem. These would be [3, 03
and whatever parameters there are in the distribution of uf. Step 1 of the two-step
procedure results in an estimate 1/3 which should be consistent and asymptotically
normal (subject to the usual regularity conditions). We then obtain an estimate of 11,-,
say 0,-(26). In the stochastic frontier model, fr,- is the expected value of u, conditional
on e,- E v, — 21,-, evaluated at the sample estimates, as suggested by Jondrow, Lovell,
Materov, and Schmidt (1982). It should be noted that, even if 26 were known, 21,-(16)
would be E(u,~|e,-) which is different from u,. However, 6,0,6) is a function of 6,, which
is i.i.d. and independent of 25,-. So, if we regressed 16,-(16) on intercept and 21-, an F-
test of the signiﬁcance of the coefficients of 2,- should be asymptotically valid. The

A

question is whether this is still true when u,(¢) is replaced by ugh/2). Unfortunately,

the answer is no. A valid test must account for the estimation error in 1]).

To show this, we could consider a regression of 0,-(2/3) on intercept and zi. However,
it is simpler to demean the 0,- by switching our attention to (MW) = E(u,-|c,-) -— E(u,-),
with 6,- : 05(26) being the corresponding estimate evaluated at the ﬁrst-step estimates

A

1,6. So now we simply wish to test whether 7 = 0 in the regression:

A

b,- = 2:7 + 12,-. (1.3)

Our test statistic will be ’7’ [WM-1'), where ’7 is the least squares estimate from
(1.3), and this should be asymptotically x2, if Varﬁ) is properly calculated.
This is a “generated dependent variable” problem that can be analyzed by meth-

ods similar to those used for the “generated regressor” problem (e.g., Wooldridge

(2002)lpp- 139-14ll)- We have bi = bill/J) = f (yr, 132', 111) and 56 = 55017) = f (315, 336, If).

By the Mean Value Theorem,

bi— — bi. + V16f(yii$ia ¢)I(¢— 11)) (14)

where 16 is between 16 and 16. Therefore

We Va:

1
N
2: ZiV¢f(yi,xe,16)'\/1V(d3 — 10)] 05)

From the last line of equation (1.5), we can see immediately that the term involv-
ing the estimation error in 16 will be relevant unless E[z,-V¢ f (yi,:r,',16)] = O. (In
this exceptional case, N '1 29;] ziV¢ f (yi,:c,-,16) A 0 and the last term vanishes.
Otherwise it does not.)

To proceed further, we use the same device as in Wooldridge (2002), and assume

that

N
x/N(16— N;ri(16 )+ op(1(1.6)

El”

where Evy-(1,6) = 0. We will be more speciﬁc about the form of ”(16), below. Then

‘=1 (1.7)

It follows from a central limit theorem applied to (1.7) that

x/N’y —-> N(0,B‘1AB“1) (1.8)
where

B = Ezizg, (1.9a)

A = E[(zibi + GTiXZibz' + Guy], (1.91))

G = EziV¢f(y,-,xi,16)’. (1.9c)

Also, all of these quantities can be consistently estimated by the corresponding sample
quantities: B: N“1 2?: 12,2; ,A= N—1 2,1: 1[(z,-b +Gr¢)(z,-bi +Gr,)'], G =
N71 217:1 ZiV¢f(yi,$i,¢) .

The remaining detail is an expansion for ri. The ﬁrst-step MLE 16 satisﬁes
23,-”:1 31(16) = 0, where 3,-(16) is the score function for observation 2'. (That is, 5,-(16)
is the derivative with respect to 1,6 of the ith observation’s contribution to the log

likelihood). Then another Mean Value Theorem expansion yields

N A N N u A
= 2 31(1)) = 23.06 + Evian/1x16 — 6), (1.10)
i=1 i=1

where 16 is between 16 and 16. So

i-l W i-l
N- - (1.11)
1
= —— I°‘1s.(1b)+ o (1)
x/TV' Z3, ”
where
, 1
1° = E86(¢)86(¢)' = —EV1686(¢) = ngnoo NI’ (1-12)
and
I = E(V¢1nL)(V¢lnL)’ = — E vi}, 1nL. (1.13)

I is the information matrix for the ﬁrst-step MLE problem with a log-likelihood
of lnL, and 1° is the limiting information matrix. In terms of the score, I =
Z£1E3i(16)si(16)’ = — zglEV¢si(16). Therefore, in (1.6) and the subsequent
expressions above, 13(16) 2 I°’ls,;(16). In terms of sample quantities, 1",- = T°’lsi(16)
where 2° 2 N’1 2,”; 31(16)s,-(16)’.

We note two things. First, the standard (naive) test of 7 = 0 that ignores the
effect of estimation error in 6,: corresponds to omitting the terms corresponding to
Gr,- in (1.9b). This test will be invalid unless G = 0. Since G = EziV¢f(y¢,x,-, 16),
this condition will hold if z,- is independent of x,- as well as of v,- and 11,. However, it
will generally fail if z, and x,- are correlated. Second, the “correct” test is not difﬁcult.
However, unsurprisingly, the form of the correction depends on the distribution of 11,-,
since that inﬂuences the nature of the ﬁrst-step MLE problem. There is no simple,

distribution-free correction.

1.3 The Scaled Exponential Case

In this section we consider the special case that 11.1 follows a scaled exponential distri-
bution. That is, u; = exp(zz'-6) - 11?, as in (1.2), where u? is distributed as exponential
with parameter A. We will derive the LM test of the hypothesis 6 = 0, and Show that
it is asymptotically equivalent to the (corrected) two-step procedure of the last sec-
tion. This shows that there is at least one case in which the two-step procedure does
not entail any loss of (local) power, compared to the usual Wald-likelihood ratio-LM
trinity of tests.

For the normal-scaled exponential model we consider, the pdf of the composite

error (e,- = v,- — 11,-) is:

 

1 61 012, 51 0v
. = ____. __ . 1-11) _ __
“61) Aexp(z£6) exp(Aexp(zz’.6) + 2A2 exp(2z,’-6)) ( (av + Aexp(zz’-6)))
(1.14)

where <I> is the cumulative distribution function of the standard normal distribution.
Note that under the null of 6 = 0, E(e,~) = — E(u,-) = —A and Var(e,-) = 03+A2. Also,
the distribution of u,- given q is N ( —c,- — 0,, / (A exp(z’-6)),0 3+) where “+” represents
truncation on the left at zero.

From (1.14), it follows that the log-likelihood function lnL(6, 5,03, A2) = lnL(O)

is given by:

lnL(l9 =2—Zln(A exp( 2,- 6) )+ Z_______. lAexp(z’6)

Z 2261:8110- (- #5))

(1.15)

 

The generic form of the LM statistic is

LM = V9lnL(6)'-I"1(6) . v9 lnL(6). (1.15)

Here 6 is the MLE subject to the restriction 6 = 0; 1(6) is the information matrix
evaluated at 6 = 6; and V9 lnL(6) is the score function, V9 lnL(6), evaluated at 6 - 6

If we partition 6 = (6’,16’), where 16 = (6’, 0,2,, Az)’, then

‘7‘“an l’(6)- 1‘” I” . (1.17)

V0 lnL(6) = 1
V1), lnL(6) L)“; 19,19

It IS a standard result that V9 lnL(6) is equal to zero for those elements of 6 that are
unrestricted. That is, V9, lnL(6) = 0. Therefore

' ° [I_1(9~)l66 ' 1761111467)
(1.18)

LM = V5 lnL(6)
= V51DL(6)’ - [i155 — 1.616112161166l—1 - V5 lnL(6)

where Z... stands for the *,* block of I, evaluated at 6 = 6.

A straightforward calculation reveals that

N ~v* ~ ~2
V5lnL(6) =22: (1%: 1___§%_1) (1.19)

where g,- = (16(6‘9/09 +0},/A))(1 - <I>(€,-/0‘v +0"v/A))"1 and 16 is the pdf of the standard

normal distribution. Note that

1 N - ~ - 63 1 N
V6 lnL(6)= :\ -Z 2,; Uvéi — 62' — T — 2; 221:5; (1.20)
i=1

where f),- = (01,51- - é; — fig/A — A) is (E(u,|e,-) — E(u,-)) E 0,- evaluated at 6. (This
follows because E(u,-|c,~) = 0v(£,- — (69/0,, + 0,,/A)) while E(u,-) = A.) Note that apart

from the scalar l/A, V5 lnL(6) equals the numerator of W6 2 (N ‘1 26.1.1 z,z£)‘1
N ’1/2 2,1:1 2,6,. So the LM test must be asymptotically equivalent to a properly
constructed test based on the two-step estimator 6. Some further algebraic details of
this equivalence are given in the Appendix. Basically, the naive test that ignores the
effects of estimation error in 61' would correspond to omitting the terms 151626 91,1?ng
in (1.18). These terms correspond to the same correction as was created by the terms
Gr,- in (1.9b) above.

This section’s result (that the LM test is asymptotically equivalent to a prOperly
constructed test based on a two-step procedure) holds for the case that u,- is expo-
nential with a scaling factor of the form exp(zz'6). So far as we can determine, it does
not hold for the scaled half-normal case. If it does not, then in the half-normal we
would expect the LM test to be better (in the sense of asymptotic local power) than
the two-step test of the last section. An interesting question for further research is

whether we can identify a class of distributions for which a result like the present one

holds.

1.4 A Test Based on Nonlinear Least Squares

In this section we continue to assume that the stochastic frontier model (1.1) is
correct. We further assume that the scaling property (1.2), with an exponential
O

scaling function, holds, so u,- = exp(z,’.6) - 11,-.

However, now we do not make any
speciﬁc distributional assumption about the uf. We simply assume that they are

i.i.d. and independent of 03,-, z,- and 1),.

Let p E E(u:’) = E(u;-’|:c,-, 2,). Then

E(y,-|x,-, 2,) = 3:26 — p - exp(z,'-6), (1.21)

10

or equivalently
111 = $23 - u - exp(zﬁé) + w: (1.22)

where E(w,-|:r,-, 2i) = 0. This model can be estimated consistently by nonlinear least
squares, as has been noted by Simar, Lovell, and Vanden Eeckaut (1994), Wang and
Schmidt (2002) and others. This raises the question of whether we can test the
hypothesis 6 = 0 based on the nonlinear least squares regression.

There is a non-trivial problem because the parameter p is not identiﬁed (separately
from the intercept in the regression) when 6 = 0. To see this clearly, we explicitly
distinguish the intercept from the rest of 213,-: 2:; = (1,122"), 6’ = (a, 6"") so that (1.22)
becomes

*I

y, = a + x, 6* — p - exp(zg6) + 103'. (1.23)
Alternatively we can write this as
111 = (a -u) +1066“ +u(1-exp(z£5))+wi. (124)

From (1.24) it is clear that (oz —— p) is identiﬁed, but 6 is identiﬁed only when 6 ¢ 0.

In cases such as this, in which some parameters (“nuisance parameters”) are not
identiﬁed under the null hypothesis, standard tests like the Wald test or the likelihood
ratio test are not asymptotically valid. A standard reference on this problem is
Hansen (1996). A Wald test in this context would consist of estimating 6 and then
testing whether it is signiﬁcantly different from zero, using a statistic of the form
6’ [Var(6)]—16, where 6 is the NLLS estimate and Var(6) is the asymptotic variance
matrix of 6. Such a test is not valid in this context because the usual Var(6) that

would be valid when 6 79 0 is not valid when 6 = 0, because of the non-identiﬁcation

11

of 11.

It is interesting that for our problem (though not for general problems) an asymp-
totically valid test can be derived from the LM (or score) test principle. We follow
the discussion in Wooldridge (2002)[pp. 363-369]. Let the NLLS criterion function
be

1 N 1 N
= N Z q(wi,0) - —-ﬁ 2(2 — xlﬂ + uexp(2£5))2, (1-25)

where 6 represents 6, p and 6, and mi represents yi, mi and 2,. Then the LM or
score test is based on the quantity V5QN(6), that is, on the derivative of Q N(6) with
respect to 6, evaluated at the restricted estimates 6. We might expect this approach
to fail here because [a is not well deﬁned. However, this turns out not to matter.
Doing the apprOpriate calculation,

View): % (y.-—x213+uexp<zlé>><nexn<zzéla> 1126)

EM?

and therefore (since 6 = 0):

N N
wen/(é) = N 22(16- — (a— 6)— 0’6“) >022.) = 92921212 zit-(621). (1.27)

z: =
Here 6 = ((61 — ii), 6*’)’ is just the coefﬁcient in a regression of y on X, and 11),: 2
22:6. In matrix form, the sum in (1.27) is equal to y'MX(ﬂZ), where MX =
I — X (X ’ X )‘1X ’ is the projection orthogonal to X. Note that if we regressed y on
[X, ﬁZ], the coefﬁcients of iiZ would be [(112 )' M X ([22 )]'I ([12 )' M Xy, so that the sum
in (1.27) is equal to the random (numerator) portion of this coefﬁcient. Therefore the

LM statistic will be equivalent to an F-statistic for the signiﬁcance of the coefficients

12

(say, C) of (ﬂzi) in the regression
y,- = .1326 -l- (fizz-)2 + error,. (1.28)

Now, the essential point is that this F-statistic is invariant to any non-zero value
of ii. That is, ii is just a scale factor for 2,, and changing ﬂ is like changing the units
of measurements of 2,. It does not affect the value of the F-statistic. (If we double [51,
this will cause 6 to be divided by two, and Var(é) to be divided by four, so the scale
factor “two” cancels from the test statistic.) So we can just set [1 = 1, and calculate
the LM statistic as the F-statistic for the signiﬁcance of the coefﬁcients of z,- in a
regression of yi on [$9, z,].

This is an intuitively reasonable result because, under the null hypothesis being
tested, E(y|X, Z) does not depend on Z.

An interesting and relevant fact is that the same test statistic would result if
we replaced the exponential scaling function exp(z,'-6) by any scaling function g(z,’-6),
where g is monotonic and differentiable at zero. The same derivation as above leads
us to a regression of y,- on x,- and pg’(0)z,t, or equivalently a regression of y,- on 1:,- and
2,. This is relevant because it suggests that the OLS-based test may have reasonable
power against a variety of alternatives (different scaling functions), whereas the power
properties of the MLE-based tests when the scaling function is misspeciﬁed are not
at all clear.

We note that, if v,- and 113° are i.i.d., the error in (1.27) is homoskedastic under
the null hypothesis. Nevertheless it is possible to consider a heteroskedasticity-robust
test. We simply have to use the heteroskedasticity—robust variance matrix of White
(1980). See Wooldridge (2002)[pp. 55-58] for details.

Another thing to note is the following. The test above is the F-test for the

signiﬁcance of the coefﬁcients of z in a regression of y on a: and z. This is the

13

same as the F-test for the signiﬁcance of the coefﬁcients of z in a regression of 11; on
:r and z, where as in (1.27) above 171 = y -— 126. It is essential that a: be included in
this regression, even though a: is orthogonal to 11'). If we regressed ti) on z only and

did an F-test, this test would not be valid, even asymptotically.

1.5 Simulations: Experimental Design

We wish to perform simulations to investigate the size and power properties of the
tests derived in the previous sections. The data generating process for our simulations
will be as follows:

3;,- = a + 631+ 11,-- exp(6zi) - u: (129)

=a+6xi—Aexp(6zi)+w,, 1': 1,--- ,N,

where w; _—_-. v,- -— exp(6z,)(ui — A). All random draws are independent over 1'. The
explanatory variables 2:,- and z,- are both scalars, and (15,-, z,)' is standard bivariate
normal with correlation p. The 12,- are distributed as N (0, 0,2,) and the u: are distrib-
uted as exponential with parameter A. The random variables (113,-,zi)’, vi and u: are
mutually independent.

The set of parameters is therefore a, 6, 6, 0,2,, A, p and N. We chose a “base case”

set of parameters as follows:
5:0,6=0,6=0,63=1,A=1,p=0.5,N=200. (1.30)

We will then change these parameter values, as described below, in our experiments.
We consider the following tests.

WALD. For the WALD test we estimate (1.29) by MLE and then test whether 6

14

is signiﬁcantly different from zero. Speciﬁcally, the WALD statistic is given by
WALD = [62 (I55 —I59I;,p1Ig)] (1.31)

where the notation is the same as in Section 1.3. Two different versions of the WALD
statistic are computed. WALD-OPG uses the OPG (outer product of the gradient)
estimate of the information matrix, while WALD-HES uses the negative Hessian
estimate of the information matrix.

LM. This is the LM statistic discussed in Section 1.3. The statistic is given by

N'

NP
LM= £20,217 (155—15¢1;W) 25121 . (1.32)

Air-1 i=1

>"H:

Once again we have different versions, depending on how the information matrix is
estimated. LM-OPG and LM-HES are analogous to WALD-OPG and WALD-HES.
GDV. This is the “generated dependent variable” test discussed in Section 1.2.

More speciﬁcally,

GDV= \/_7[Var(\/—)]1\/_7

=2: biz;

‘1 (1.33)

M2
571
3?

(b.z.+GI 3.61))2

i=1 i=1

I'[:]2

Here If is the negative Hessian form of the information matrix for the ﬁrst-step MLE,
as in (1.11) above. We also consider the test BADGDV, which is the invalid test
based on regression (1.3) above and which ignores the estimation error in 16.

OLS. This is the set of tests discussed in Section 1.4. OLS refers to the standard
F-test for signiﬁcance of the coefﬁcients of z,- in a regression of w on (1, Ii, 21-). This
reduces to a t-test in the present case since 2,; is scalar. We use the critical values

based on the standard normal distribution rather than the t-distribution but for our

15

values of N this makes essentially no difference. OLS-H is the heteroskedasticity-
robust version of the test. BADOLS is the invalid test based on the t-statistic for
the signiﬁcance of the coefﬁcient of I) when (3,- is regressed on 2,- (without intercept
or x,- in the regression), as discussed at the end of Section 1.4. BADOLS-H is the
heteroskedasticity-robust version of BADOLS.

The number of replications in the experiment was 10,000, except for a few cases
noted below.

The outputs of the experiments are as follows. For each of the parameter estimates,
we calculated their mean, standard deviation, and MSE. For the MLE of the full model
(needed for the WALD test calculations), the parameters estimated are a, ﬁ, 6, 0,2,
and A. For the MLE of the model subject to the restriction 6 = 0 (needed for the
LM and GDV test calculations), the parameters estimated are a, 0, 0,2, and A. Note
that, in the output tables, we report the mean, standard deviation, and MSE of the
estimates of A2, not A, for an easier comparison with the estimates of 0,2,. For the
NLLS estimates under the restriction that 6 = 0 (which is just OLS of y,- on I), and
is needed for the OLS test calculations), the parameters estimated are 17 = a — A, 0
and 0,2,, = 03 + A2.

We also calculated the mean, standard deviation and MSE of the technical efﬁ-
ciency estimates for the MLE and the restricted MLE. The technical efﬁciency of ﬁrm
2' is TE,- = exp(—11,.) and the technical efﬁciency estimate (Battese and Coelli (1988))

is

TEz' = E(eXP(-uz')lfi)

_ <I> (—av — «Si/av — cry/(A exp(dzi)» 1122 ' 03 (1.34)
— “*1/01; — av/(A exp(ézz‘») exP ( 2 + 6‘ + Aexp(6z,-)) °

 

Here a, = v,- —u, = y) —a — ﬁx) and TE,- is the expression (1.34) evaluated at the MLE

estimates. By the law of iterated expectations, E(TEi) = Eexp(—u,-). However, for

16

the calculation of MSE we average the squared deviations of TE,- for TE,- = exp(—11,),
not from E exp(—ui). The mean, standard deviation and MSE for TE,- are calculated
by averaging across observations (1' = 1, - -- ,N) as well as across replications. We
also report the correlation of TE, and TE). The is the average across replications of
the correlation coefﬁcient for a given replication.

For the tests, we calculated the proportion of rejections, which is interpreted
as size (if 6 = 0) or power (if 6 79 0). The size (or power) is calculated in four
ways. Sizel uses all 10,000 replications. Size2 drops replications in which there was a
numerical failure in the calculation of the WALD or LM statistics, due to outliers in
the estimates. Outliers are deﬁned as |6| _>_ 16, 6,, g 10'7 or in, 2 37, and A g 10‘7
or A 2 37. Size3 drops observations with negative LM statistics. These may occur
when the maximization algorithm fails to reach the global maximum. Finally, Size4
drops any replication dropped by either the Size2 or the Size3 calculation.

We also report the mean and standard deviations of the test statistics. This
calculation was done over the same set of replications used to calculated Size4.

Many of the replications discarded in Size2 and Size4 are ones in which the variance
parameters (03 and A2) and 6 are poorly estimated. Very small values of A2 tended
to go with very large values of 5, as the likelihood calculation seemed to try to
accommodate the presence of the one-sided error exp(6zi) - 11;? by balancing a small
variance of u: with a large value of exp(6zi). In these cases the variance of cf is also
hard to calculate, and it is just not clear whether or not they constitute evidence
against the null that 6 = 0. Dropping these cases primarily reduces the number of
rejections for the WALD tests. However, except for a few parameter values (e. g. very

large 03,), not enough replications were dropped to make much difference.

17

1.6 Simulation Results: Size

In this section, we investigate the size of the tests. Therefore all of the cases considered
have 6 = 0 so that the null hypothesis is true. All of the tests except BADGDV,
BADOLS and BADOLS-H (which we will call the BAD tests for short) are valid
asymptotically but we are interested in how substantial their size distortions may be

in ﬁnite samples.

1.6.1 Base case

We ﬁrst consider the base case: a = ﬂ = 6 = 0, 0,2, = A = 1, p = 0.5, and N = 200.
The results are given in Table 1.1.

The results for the point estimates are fairly unremarkable. There is little or no
evidence of ﬁnite-sample bias. The restricted MLE’s are better than the unrestricted
MLE’s, in terms of standard deviation and MSE, but the differences are quite small.

The sizes of the various tests differ fairly substantially from each other. All of the
BAD tests are indeed bad, in the sense of size substantially less than 5%. However,
some of the asymptotically valid tests also have sizes that are substantially different
from 5%. The WALD tests are substantially undersized. Conversely, the LM-OPG
test rejects too often. The LM-HES, GDV and OLS tests have size fairly close to 5%,

and the OLS-H test is only slightly worse than those three.

1.6.2 Effects of changing a or 5

Changes in a or B would not be expected to change the results, and this is true in
the following sense. We did one simulation with the same parameters as in the base
case except that a = 1, and another simulation with the same parameters except that
[3 = 1. These changes did not change the size of any of the tests, and the only effect

on the point estimates was to change the mean value of 6: or B by one.

18

1.6.3 Effects of changing N

Next we considered parameter values that were the same as in the base case, except
that we changed N to N = 500 (Table 1.2) and N = 1000 (Table 1.3).

When we increase N, we reduce the standard deviation and MSE of the various
parameter estimates, as expected. However, it is notable that we do not increase the
precision of the technical efﬁciency estimates except perhaps trivially. To understand
why, recall that the technical efﬁciency estimate is the expectation of exp(-u) condi-
tional on (v — 11), evaluated at the estimated values of the parameters. The variance
of this estimate depends on (i)“intrinsic variability,” by which we mean the variance
of exp(—u) conditional on (v - u), which does not depend on N, and (ii) “sampling
error,” by which we mean the variance of the parameter estimates, which does depend
on N. Apparently even for N = 200 sampling error is quite small relative to intrinsic
variability.

As would be expected, increasing N does not reduce the size distortions of the
BAD tests, but it does improve the asymptotically valid tests. For N = 500 we have
the same pattern of size distortions as we observed for N = 200, but they are much
smaller. Also the various types of numerical failures that distinguish Sizel from Size2,
Size3 and Size4 have largely disappeared. For N = 1000 all of the asymptotically valid
tests have reasonably accurate size; the worst is LM-OPG with size of 5.77%. The
good news in this statement is that the tests behave as they should asymptotically.
The bad news is that N = 1000 would be a very large sample size indeed for the type

of efﬁciency measurement exercise that is considered here.

1.6.4 Effects of changing p

Now we consider changes in p, the correlation between a: and z. The question of
interest is whether strong correlation between a: and 2 creates difficulties (akin to

multicollinearity) in estimation and whether this affects the tests. In the base case

19

we had p = 0.5, and now we keep the rest of the base case parameters but consider
p = —0.5 (Table 1.4), p = 0 (Table 1.5) and p = 0.9 (Table 1.6). We also considered
p = 0.25 and p = 0.75, and those results are in a supplementary set of tables.

In terms of the point estimates based on MLE, the value of p makes little difference.
When p = 0.9 the standard deviation and MSE of ﬂ and 6 do increase, but not by
very much. The value of p does not matter very much for any of the asymptotically
valid tests, and in fact the results for the OLS and OLS-H tests do not change at all.
For the BAD tests, it makes more difference, as asymptotic theory would suggest. For
p = 0 the BAD tests are asymptotically valid, and they have approximately correct

size, while for p = 0.9 the BAD tests have size of nearly zero.

1.6.5 Effects of changing A

Next we consider a change in A, the parameter of exponential distribution of the
one-sided error 11°. In Table 1.7 we report the results for A = 3, whereas the base
case had A = 1.

Since the overall error in the model is 1) — u, where v is normal noise, increasing A
effectively decreases the relative importance of the noise, and should make inference
about u or about the effect of 2 on u more reliable. Comparing Table 1.7 to Table 1.1,
we see that this is true. With the larger value of A, the sizes of the asymptotically
valid tests (other than GDV) all become closer to 5%. The effects of this change
on the point estimates were less clear, in part because when A = 3 there were more

outliers.

1.6.6 Effects of changing of,

Now we change 03 to 9, as opposed to its base case value of 1, holding the other
parameters the same. The results are in Table 1.8. This is a pure increase in statistical

noise and it should make all of the estimates and tests worse. Comparing Table 1.8

20

to Table 1.1, that turns out to be true for all of the estimates, and for most of the
tests. Among the asymptotically valid tests, the WALD tests and the GDV test are
very seriously affected. They give very few rejections. There is relatively little effect
of this change on the size of the LM-OPG and LM-HES tests or the OLS and OLS-H
tests, however.

It is also notable that the number of replications dropped in the size calculations
is very large with the higher value of 03. The data are close enough to normal that
the maximization process was difﬁcult. As a curiosity we ran the Schmidt and Lin
(1984) test of the hypothesis of no one-sided error, and we could reject this hypothesis
(at the 5% level) only 1,086 times out of 10,000.

1.7 Simulation Results: Power

In this section we investigate the power of the various tests. We therefore set 6 to
some non-zero value. An immediate problem that arises is that it is not meaningful
to compare the power of tests if their sizes are very different. One possibility is to
consider size-adjusted power, but this has the disadvantage that then we are no longer
investigating the power of a procedure that is feasible outside the simulation setting.
An alternative possibility, which we follow, is to investigate power using a sample size
sufﬁciently large that size distortions are not a serious problem. Therefore for all of
our simulations in this section we will set N = 1000. Our “base case” is therefore
the set of parameters for the simulations reported in Table 1.3, and we now change
6 from 0 to 0.05, 0.10 and 0.15, where these values were chosen to yield power that
moved through a reasonable part of the range between zero and one. These results
are given in Tables 1.9, 1.10 and 1.11.

Changing 6 has very little effect on any of the point estimates, other than the

mean of 6 , and we will not discuss the estimation results further.

21

Power increases as 6 increases, for obvious reasons. If we compare the WALD,
LM and GDV tests their powers are quite similar. Fine distinctions are hard to make
because even with N = 1000 their sizes were slightly different in Table 1.3. These
tests are all asymptotically valid, and they all have the same asymptotic local power,
so it is not surprising that their powers should be similar for N = 1000. A more
interesting comparison is between their power and the power of the OLS-based tests
(OLS and OLS-H). The OLS-based tests do not make use of the assumption that
the u: are exponential, and the failure to exploit this fact ought to make them less
powerful than the WALD, LM and GDV tests. This turns out to be true, with the
difference in power being non-trivial but not huge. For example, for 6 = 0.1, compare
0.51 for OLS to 0.64 for LM-HES.

We also did some additional simulations with p = 0.9, so that the variables I and
z are more highly correlated than in the cases just considered (which had p = 0.5).
Table 1.12 gives the results for 6 = 0.1 and p = 0.9, and the results for 6 = 0.05
and 0.15 are in our supplemental set of tables. Comparing Table 1.12 to Table 1.10,
we can see that the higher value of p results in substantially lower powers for all of
the tests. Among the asymptotically valid tests, the loss in power is much larger for
the OLS-based tests than for the WALD, LM or GDV tests. These differences are
certainly non-trivial. For example, the power of the LM-HES test changes from 0.64
to 0.48 when p changes from 0.5 to 0.9, while the power of the OLS test changes from
0.51 to 0.17.

The low power of the OLS-based tests occurs because of multicollinearity in the
OLS regression when I and z are highly correlated. The coefficient of z is poorly
estimated and it is hard to reject the hypothesis that it is zero. The MLE-based tests
do a better job of exploiting the nonlinearity of the relationship between y, I and z
and suffer less when I and z are highly correlated. How much this matters, in an

empirical setting, obviously will depend on how different the variables in z are from

22

those in I.

Finally, we did some simulations in which the tests are exactly as above, and
are therefore based on the assumption that the true scaling function is exp(6zi),
when in fact this is not the true scaling function. For these simulations, we have
u,- = ¢(6z,-)(1—<I>(6z,~))'1u;-’, where 11) is the standard normal density, <I> is the standard
normal cdf, and 11;? is exponential with parameter A = 1. So, in the data generating
process, the scaling function is the inverse Mill’s ratio, ¢(6zi)(1 — <I>(6z,'))"1.

Under the null, 6 = 0 and u, is exponential with parameter m. So our tests
based on the exponential scaling function correctly encompass the null, and the only
question is power. For the MLE-based tests, their power properties when the scaling
function is misspeciﬁed are certainly not clear. For the OLS-based tests, however, we
saw that the same statistic resulted from the score test principle for any monotonic
differentiable scaling function g(6z,-). As a result, we might expect our OLS test to
have better power properties relative to the MLE-based tests when the MLE-based
tests are based on the wrong scaling function.

Table 1.13 gives the simulation results with 6 = 0.1. These simulations have N =
1000, and are based on 2000 replications. The surprising aspect of these results is the
good performance of the MLE-based methods. The parameter estimates look quite
reasonable, despite the misspeciﬁcation of the model. Similarly the MLE-based tests
are more powerful than the OLS-based tests, despite the arguments of the previous

paragraph. These optimistic results deserve attention in future research.

1.8 Simulation Results: Robustness

In this section we investigate the effects of misspeciﬁcation of the distribution of the
one-sided error term. Speciﬁcally, we will consider the properties of the tests based

on the MLE that assumes an exponential error, when in fact the distribution of 11° is

23

either truncated normal or gamma.

We note at the outset that this issue does not arise with our OLS-based tests.
These do not rely on any distributional assumptions on the errors, and they are
asymptotically valid for any error distribution with ﬁnite variance (so that the central
limit theorem applies).

The MLE-based tests, on the other hand, will generally be invalid when the error
distribution is misspeciﬁed. Fundamentally this is simply because the likelihood is
then misspeciﬁed. To be more speciﬁc, consider the LM test or the GDV test based
on the normal-exponential model, as discussed in Sections 1.2 and 1.3 above. These
fundamentally depend on the quantity 211:1 2,13,: where f),- is an estimate of b,- :-
E(uilei) — E(ui), with e,- = v,- — 11,. The precise form of b, depends on the assumption
that v: is normal and U? is exponential. If in fact 11;? is not exponential, then E(bi) ¢ 0
and we cannot expect the test to be valid. A secondary but still relevant issue is that
the asymptotic variance of 2?; 746,-, which also ﬁgures into the test statistic, also
depends on the distributional assumption for uf being correct. See Section 1.2 above.

We emphasize that the lack of robustness of the MLE-based tests to distribu-
tional misspeciﬁcation is not just a ﬁnite-sample issue. This problem persists even
asymptotically.

The lack of validity of the MLE-based tests should show up in simulations as
incorrect size when the null hypothesis is true. The question then is how serious this
problem is. Greene (1990) has argued that the rankings of estimated inefﬁciencies
are often not sensitive to distributional assumptions on the one-sided error. Also, the
exponential distribution shares same features with other one-sided distributions. The
half-normal distribution, like the exponential, has a mode at zero. The gamma(gl, 92)
distribution with 91 = 1 is exponential, and for 0 < gl < 1 it has a shape similar to
the exponential.

In the simulations of this section we have a = B = 6 = 0, 03 = 1 and N = 1000.

24

The number of replication is 2000.

1.8.1 Normal-truncated normal

Here the distribution of u: is N (p, a2)+, that is, truncated normal. Table 1.14 gives
our results for the case that u = 0 and a2 = 7r / 2. This is the half-normal distribution
with mean equal to one. This choice makes the distribution somewhat comparable
to the exponential distribution with parameter one, as in Table 1.3 above. However,
the truncated normal with p = 0 and 02 = 1r/2 has variance equal to 0.57. (A
truncated normal, unlike an exponential does not have its mean equal to its standard
deviation.) We also considered three other cases: (i) p = 0, 02 = 7r / (7r — 2), for which
the variance equals one but the mean equals 1.32; (ii) p = O, a2 = 1; (iii) [.1 = 1,
02 = 1. The results for these three cases are in our supplemental set of tables.

In Table 1.14 we see that the OLS-based tests appear to have proper size, while the
MLE-based tests exhibit signiﬁcant size distortions. For MLE there are also consid-
erable biases in the parameter estimates. The WALD and GDV tests are undersized,
while the LM-OPG test rejects too often. This same pattern occurs for all four cases
that we considered but the extent of the size distortions varied considerably over
choices of p and 03.

Comparing Table 1.14 to Table 1.3, we also see that there are many more repli-
cations dropped when the distribution is misspeciﬁed. Obviously the data do not

always ﬁt the likelihood well and numerical problems occur.

1.8.2 N ormal—gamma

Now the distribution of u: is gamma(gl, 92). The results in Table 1.15 and Table
1.16 are similar to those in Table 1.14 for the normal-truncated normal case. The
OLS-based tests have more or less proper size, while the MLE-based tests do not. The

LM-OPG test rejects too often, and this is true across all of our (91, gg) values. The

25

WALD, LM-HES and GDV tests also show signiﬁcant size distortions, and sometimes
reject too seldom and sometimes too often, depending the value of (91, 92). The MLE
parameter estimates show clear biases. However, unlike the truncated normal case,
not many replications were dropped here. The exponential model ﬁts the data better
in the normal-gamma case than in the normal-truncated normal case. Interestingly,
that does not mean that it leads to more robust inference in the former case than in

the latter.

1.9 Concluding Remarks

In this chapter we have considered tests of the hypothesis that observable ﬁrm char-
acteristics do not affect technical efﬁciency. We do this in the context of a speciﬁc
model in which the one-sided errors are exponential. Under the null they are i.i.d.
while under the alternative they are scaled by a function exp(zz’.6), where z,- are the
ﬁrm characteristics whose inﬂuence we are testing.

In this context we can estimate the model by MLE and test whether 6 = 0, which
is the WALD test. We can also use an LM test. We show that a simple two-step test
is not valid. (Here step one is to estimate technical efﬁciency for each ﬁrm. Step two
is to regress these estimates on 2,- and test whether the coefﬁcients are zero.) This
test can be made valid by correcting the asymptotic variance matrix for the second-
step estimates. This correction is distribution—speciﬁc. When technical efﬁciency is
exponential, we show that the corrected two-step test is asymptotically equivalent to
the LM test.

We can also derive a valid test from the score test principle applied to the nonlinear
least squares problem. This takes the form of an F-test of the signiﬁcance of the
coeﬁicients of z,- in an OLS regression of output on 2,- and the inputs. This test does

not require a distributional assumption and it would be the same for any scaling

26

function of the form g(z,’-6), where g is monotonic and differentiable at zero. The
OLS-based test therefore has good robustness properties, but it may be expected to
have lower power than the MLE—based tests when the model for MLE is correctly
speciﬁed.

We perform a number of simulations to investigate the size and power properties of
the tests we have suggested. The OLS-based tests do turn out to have good robustness
properties and the MLE-based tests do turn out to be more powerful when the model
is correctly speciﬁed. The loss in power for the OLS-based tests is especially large
when the inputs and the ﬁrm characteristics 2,- are highly correlated. The MLE—based
tests show signiﬁcant differences among themselves when the sample size is not very
large. The WALD tests reject too seldom and the LM-OPG test rejects too often.
The LM test using the Hessian (LM-HES) and the corrected two—step test (GDV)
are generally most reliable. The MLE-based tests perform reasonably well if the
scaling function is misspeciﬁed but they do not have proper size if the distribution of
inefﬁciency is misspeciﬁed.

These results provide some guidance for empirical work. If the researcher’s interest
is not in the inefﬁciencies themselves, but just in testing whether they depend on ﬁrm
characteristics (like ﬁrm size, state versus private ownership, etc.) then the OLS-based
tests would be natural, unless these ﬁrm characteristics are very strongly correlated
with the inputs. However, if the researcher is going to estimate ﬁrm-level efﬁciencies
in any case, then a distributional assumption will ultimately be needed, and MLE-
based tests may as well be used. Among these tests the LM test using the Hessian or

the corrected two-step test would be preferred.

27

1.10 Output Tables

Table 1.1: (BASE CASE) 0 = ﬂ = 6 = 0, 03 =

[E(exp(—u)) = 0.5232]

A=1,p=0.5,N=200

 

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0001 0.1157 0.0134
6 -00342 0.1874 0.0363
8 0.0000 0.1004 0.0101
63 1.0072 0.2279 0.0520
12 0.9594 0.3524 0.1258
IE 0.5148 0.1769 0.0555 0.6131
Restricted MLE(6=0) 5: -0.0214 0.1817 0.0335
,6 0.0001 0.0932 0.0087
53 0.9985 0.2240 0.0502
12 0.9919 0.3461 0.1199
IE 0.5099 0.1767 0.0547 0.6194
Restricted NLLS 1'7 -1.0004 0.0994 0.0099
(OLS on 11,277+ pea-+1.6: 6" 0.0007 0.1004 0.0101
n=—1,p=0,ﬁ,=2) 53,, 2.0019 0.2691 0.0724
STATISTICS; Sizel Size2 Size3 Size4 Mean s.d.
WALD-OPG 0.0211 0.0213 0.0214 0.0215 —0.0027 0.8669
WALD-HES 0.0298 0.0300 0.0296 0.0297 .0.0045 0.9345
LM-OPG 0.0788 0.0783 0.0766 0.0763 1.2115 1.7972
LM-HES 0.0523 0.0518 0.0515 0.0509 1.0561 3.3408*
GDV 0.0466 0.0467 0.0471 0.0471 1.0067 1.3462
BADGDV 0.0363 0.0355 0.0350 0.0344 0.8564 1.2163
OLS 0.0495 0.0490 0.0481 0.0475 -0.0011 0.9973
OLS-H 0.0575 0.0572 0.0561 0.0558 -0.0003 1.0216
BADOLS 0.0237 0.0235 0.0233 0.0230 .0.0010 0.8634
BADOLS-H 0.0266 0.0265 0.0262 0.0260 .0.0014 0.8806
Rep. dropped 0 73 121 171

 

 

 

 

 

 

* due to outliers

28

 

 

 

 

 

 

 

 

 

 

 

 

Table 1.2: (Change of N) N = 500, a = 6 = 6 = 0, 0,2, = A = 1, p = 0.5
[E(exp(-u)) = 0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 25‘ 0.0000 0.0649 0.0042
6. -00125 0.1108 0.0124
B 00005 0.0627 0.0039
63 1.0024 0.1405 0.0197
12 0.9849 0.2171 0.0473
IE 0.5050 0.1790 0.0522 0.6215
Restricted MLE(6=0) 6 -0.0074 0.1089 0.0119
[3' -0.0005 0.0583 0.0034
63 0.9986 0.1390 0.0193
12 0.9985 0.2146 0.0461
C’P‘E‘ 0.5032 0.1792 0.0520 0.6227
Restricted NLLS 77 -10007 0.0632 0.0040
(OLS on y,=n+6e,e+w,-: 6 -00005 0.0629 0.0040
n=—1,B=0.03,=2) 53,, 2.0023 0.1681 0.0283
STATISTICS Sizel Size2_ Size3 Size4 Mean s.d.
WALD-OPG 0.0361 0.0361 0.0361 0.0361 -0.0005 0.9386
WALD-HES 0.0434 0.0434 0.0434 0.0434 0.0003 0.9741
LM-OPG 0.0611 0.0611 0.0611 0.0612 1.0967 1.5831
LM-HES 0.0503 0.0503 0.0502 0.0502 0.9876 1.4005
GDV 0.0502 0.0502 0.0502 0.0502 1.0127 1.3969
BADGDV 0.0355 0.0355 0.0355 0.0355 0.8613 1.2250
OLS 0.0513 0.0513 0.0513 0.0513 0.0005 0.9941
OLS-H 0.0538 0.0538 0.0538 0.0538 0.0008 1.0043
BADOLS 0.0245 0.0245 0.0245 0.0245 0.0003 0.8604
BADOLS-H 0.0254 0.0254 0.0254 0.0254 0.0005 0.8681
Rep.dropped 0 2 8 9

 

 

 

 

 

 

29

 

 

 

 

 

 

 

 

 

 

 

 

Table 1.3: (Change of N) N = 1000, a = ﬂ = 6 = 0, 0,2, = A = 1, p = 0.5
[E(exp(—u)) = 0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 8 0.0006 0.0450 0.0020
6 -00070 0.0757 0.0058
6 0.0007 0.0447 0.0020
63 1.0023 0.0965 0.0093
512 0.9907 0.1509 0.0228
TE 0.5026 0.1796 0.0514 0.6230
Restricted MLE (6: 0) 6 -0.0046 0.0750 0.0057
8' 0.0005 0.0419 0.0018
63 1.0006 0.0961 0.0092
312 0.9971 0.1499 0.0225
CITE" 0.5018 0.1796 0.0514 0.6235
Restricted NLLS 77 -1.0004 0.0448 0.0020
(OLS on y,=n+6x,-+w,: p" 0.0005 0.0449 0.0020
n=—1,,6=0,63=2) 63, 2.0001 0.1196 0.0143
STATISTICS Sizel Sizez' Size3 Size4 Mean s.d.
WALD-OPG 0.0448 0.0448 0.0448 0.0448 0.0141 0.9748
WALD-HES 0.0485 0.0485 0.0485 0.0485 0.0146 0.9916
LM-OPG 0.0577 0.0577 0.0577 0.0577 1.0545 1.5021
LM-HES 0.0513 0.0513 0.0513 0.0513 1.0006 1.4092
GDV 0.0522 0.0522 0.0522 0.0522 1.0136 1.4101
BADGDV 0.0377 0.0377 0.0377 0.0377 0.8789 1.2469
OLS 0.0490 0.0490 0.0490 0.0490 .0.0143 0.9970
OLS-H 0.0488 0.0488 0.0488 0.0488 -0.0145 1.0008
BADOLS 0.0243 0.0243 0.0243 0.0243 -0.0124 0.8636
BADOLS-H 0.0253 0.0253 0.0253 0.0253 -0.0123 0.8661
Rep. dropped 0 0 1 1

 

 

 

 

 

 

30

 

 

 

 

 

 

 

 

 

 

 

 

Table 1.4: (Change of p) p = —0.5, 6 = 6 = 6 = 0, 63 = A = 1, N = 200
[E(exp(—u)) = 0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0012 0.1157 0.0134
6 -0.0338 0.1859 0.0357
8 -0.0003 0.0994 0.0099
63 1.0071 0.2273 0.0517
512 0.9596 0.3518 0.1254
7"?) 0.5147 0.1769 0.0554 0.6133
Restricted MLE (6:0) 6 -00210 0.1808 0.0331
6 0.0001 0.0933 0.0087
63 0.9984 0.2244 0.0504
12 0.9922 0.3458 0.1196
TE 0.5098 0.1767 0.0546 0.6194
NLLS under the null 77' -1.0002 0.0995 0.0099
(OLS on y,=n+6s,-+w,-: ,6 0.0008 0.1004 0.0101
n=—1,6=0,63,=2) 63, 2.0023 0.2689 0.0723
STATISTKTS Sizel Sizez’ Size3 Size4 Mean s.d.
WALD-OPG 0.0175 0.0176 0.0177 0.0178 0.0092 0.8595
WALD-HES 0.0302 0.0304 0.0302 0.0304 0.0093 0.9313
LM-OPG 0.0776 0.0773 0.0758 0.0756 1.2179 1.7921
LM-HES 0.0515 0.0503 0.0515 0.0502 1.0036 1.4506
GDV 0.0467 0.0470 0.0472 0.0474 1.0081 1.3418
BADGDV 0.0367 0.0360 0.0355 0.0349 0.8546 1.2196
OLS 0.0495 0.0490 0.0481 0.0477 -0.0007 0.9973
OLS-H 0.0575 0.0571 0.0560 0.0557 0.0003 1.0217
BADOLS 0.0239 0.0236 0.0234 0.0231 -0.0002 0.8629
BADOLS-H 0.0283 0.0282 0.0276 0.0275 0.0013 0.8808
Rep. dropped 0 82 133 184

 

 

 

 

 

 

 

31

Table 1.5: (Change of p) p = 0, 07 = 6 = 6 = 0, 03 = A = 1, N = 200 [E(exp(—u)) =

 

 

 

 

 

 

 

0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0007 0.1062 0.0113
6 —0.0339 0.1858 0.0357
B 0.0001 0.0934 0.0087
63 1.0072 0.2271 0.0516
.12 0.9605 0.3509 0.1247
TE 0.5145 0.1767 0.0556 0.6150
Restricted MLE(6=0) 64 -0.0214 0.1812 0.0333
6 0.0001 0.0933 0.0087
63 0.9987 0.2243 0.0503
Li? 0.9916 0.3459 0.1197
T‘E’ 0.5099 0.1767 0.0546 0.6194
Restricted NLLS 17 -10003 0.0994 0.0099
(OLS on y,-=n+ 311734-1011 6 0.0008 0.1004 0.0101
n=—1,p=0,63,=2) ”3, 2.0020 0.2690 0.0723
STATISTICS Sizel Size2_ Size3 Size4 Mean s.d.
WALD-OPG 0.0207 0.0208 0.0209 0.0210 0.0034 0.8710
WALD-HES 0.0293 0.0295 0.0293 0.0295 0.0019 0.9376
LM-OPG 0.0770 0.0763 0.0754 0.0748 1.2030 1.7534
LM-HES 0.0515 0.0502 0.0514 0.0500 1.0164 1.6641
GDV 0.0452 0.0453 0.0455 0.0455 1.0128 1.3481
BADGDV 0.0517 0.0507 0.0502 0.0492 0.9824 1.3752
OLS 0.0495 0.0485 0.0483 0.0474 .0.0013 0.9963
OLS-H 0.0575 0.0568 0.0563 0.0556 .0.0002 1.0211
BADOLS 0.0499 0.0489 0.0487 0.0478 —0.0013 0.9963
BADOLS-H 0.0557 0.0550 0.0545 0.0539 .0.0002 1.0159
Rep. dropped 0 70 118 162

 

 

 

 

 

 

 

 

 

 

 

32

Table 1.6: (Change ofp) p = 0.9, a = )6 = 6 = 0, 0,2, =

,\ = 1, N = 200 [E(exp(-U)) =

 

 

 

 

 

 

 

 

 

 

 

 

0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 -00012 0.1414 0.0200
6 -00321 0.1876 0.0362
6” -0.0008 0.1217 0.0148
*3 1.0051 0.2280 0.0520
12 0.9593 0.3544 0.1272
{FE 0.5149 0.1781 0.0560 0.6079
Restricted MLE(6=0) 6 -00200 0.1812 0.0332
8 0.0000 0.0934 0.0087
63 0.9970 0.2234 0.0499
12 0.9946 0.3459 0.1196
TE 0.5094 0.1769 0.0546 0.6195
Restricted NLLS 6 -1.0004 0.0995 0.0099
(OLS on y,=n+6r,e+w,-: 6 0.0007 0.1005 0.0101
n=—1,8=0,63_,=2) 63, 2.0030 0.2695 0.0726
STATISTICS Sizel Size2- Size3 Size4 Mean s.d.
WALD-OPG 0.0188 0.0191 0.0192 0.0195 -0.0132 0.8417
WALD-HES 0.0314 0.0319 0.0316 0.0320 -0.0149 0.9290
LM-OPG 0.0820 0.0815 0.0807 0.0803 1.2490 1.8683
LM—HES 0.0561 0.0558 0.0558 0.0552 1.0876 6.7456*
GDV 0.0405 0.0408 0.0414 0.0415 0.9791 1.2808
BADGDV 0.0108 0.0106 0.0109 0.0108 0.5739 0.8467
OLS 0.0495 0.0491 0.0486 0.0484 0.0027 0.9991
OLS-H 0.0575 0.0571 0.0565 0.0562 0.0037 1.0230
BADOLS 0.0000 0.0000 0.0000 0.0000 0.0015 0.4345
BADOLS-H 0.0000 0.0000 0.0000 0.0000 0.0008 0.4441
Rep. dropped 0 171 217 346

 

 

 

 

 

 

* due to outliers

33

Table 1.7: (Change of A) A = 3, a = 6 = 6 = 0, 03 = 1, p = 0.5, N = 200

[E(exp(-u)) = 0.1095]

 

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0000 0.0768 0.0059
6 -0.0183 0.2100 0.0445
6 -0.0010 0.1462 0.0214
63 0.9873 0.3385 0.1147
,1? 8.9144 1.7316 3.0054
TE 0.2531 0.2234 0.0330 0.7736
Restricted MLE(6=0) (1 -0.0122 0.2502 0.0627
6 -0.0010 0.1411 0.0199
63 0.9884 0.9388 0.8814
12 9.0074 1.7342 3.0072
”FE 0.2521 0.2234 0.0337 0.7740
Restricted NLLS 1’7 -2.9982 0.2227 0.0496
on yi=n+6Ii+wi: 6 0.0003 0.2224 0.0495
n=—3,6=0,a3,=10 63, 9.9914 1.8539 3.4365
STATISTICS Sizel Size2_ Size3 Size4 Mean s.d.
WALD-OPG 0.0397 0.0402 0.0397 0.0402 0.0001 0.9366
WALD-HES 0.0422 0.0427 0.0422 0.0427 -0.0002 0.9757
LM-OPG 0.0629 0.0580 0.0629 0.0580 1.0675 1.4782
LM-HES 0.0493 0.0440 0.0493 0.0441 0.9573 1.3053
GDV 0.0545 0.0499 0.0545 0.0499 1.0083 1.3753
BADGDV 0.0443 0.0386 0.0443 0.0386 0.8949 1.2349
OLS 0.0503 0.0440 0.0503 0.0441 .0.0041 0.9753
OLS-H 0.0565 0.0513 0.0565 0.0513 -0.0027 1.0011
BADOLS 0.0226 0.0183 0.0226 0.0183 .0.0034 0.8449
BADOLS-H 0.0259 0.0228 0.0259 0.0228 -0.0030 0.8664
Rep. dropped 0 124 1 125

 

 

 

 

 

 

34

 

 

 

 

 

 

 

 

 

 

 

 

Table 1.8: (Change of 63) 63 = 9, 6 = = 6 = 0, A = 1, p = 0.5, N = 200
[E(exp(—u)) = 0.5100]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0012 0.2333 0.0544
6 0.0486 0.5434 0.2977
6 0.0035 0.2494 0.0622
63 8.5074 1.3725 2.1263
62 1.3239 1.1434 1.4120
TE 0.5221 0.0982 0.1002 0.2302
Restricted MLE(6=O) 6 0.0713 0.5544 0.3124
6 0.0036 0.2262 0.0512
63 8.5207 1.4162 2.2351
12 1.4226 1.1787 1.5676
TE 0.5138 0.0819 0.0965 0.2771
Restricted NLLS 6 -10015 0.2221 0.0493
(OLS on yi=n+ﬂI¢+wiz 6 0.0040 0.2254 0.0508
6=—1,6=0,63,=10) 63, 10.0157 1.0331 1.0675
STATISTICS Sizel Size2 f Size3 Size4 Mean s.d.
WALD-OPG 0.0011 0.0014 0.0021 0.0024 .0.0040 0.5877
WALD-HES 0.0041 0.0054 0.0040 0.0045 -0.0067 0.7009
LM-OPG 0.0792 0.0834 0.0506 0.0531 0.9325 1.4464
LM-HES 0.0506 0.0530 0.0674 0.0634 1.4862 21.7190*
GDV 0.0177 0.0160 0.0107 0.0114 0.6292 0.8579
BADGDV 0.0317 0.0305 0.0143 0.0133 0.5772 0.8821
OLS 0.0520 0.0503 0.0326 0.0323 0.0122 0.8829
OLS-H 0.0576 0.0558 0.0369 0.0363 0.0148 0.9078
BADOLS 0.0241 0.0234 0.0128 0.0123 0.0107 0.7641
BADOLS-H 0.0287 0.0286 0.0162 0.0163 0.0115 0.7844
Rep. dropped 0 2352 4689 5349

 

 

 

 

 

 

* due to outliers

35

Table 1.9: (Change of 6) 6 = 0.05, a = 6 = 0, 0,2, = A = 1, p = 0.5, N = 1000
[E(exp(-u)) = 0.5232]

 

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0520 0.0450 0.0020
6 -0.0048 0.0753 0.0057
6 0.0023 0.0442 0.0020
*3 1.0004 0.0979 0.0096
.12 0.9935 0.1507 0.0227
TE 0.5023 0.1802 0.0514 0.6240
Restricted MLE(6=0) 6 0.0006 0.0743 0.0055
6 -0.0156 0.0413 0.0020
63 0.9964 0.0971 0.0094
Z\2 1.0087 0.1497 0.0225
TE 0.5001 0.1803 0.0514 0.6240
Restricted NLLS (OLS on 77 -1.0009 0.0466 0.0022
y¢=n+BIi+wizn=—1.0013, 6 -0.0238 0.0447 0.0020
6=—0.0250,63,=2.0075) 63, 2.0091 0.1197 0.0143
STATISTICS [Powerl Power2 Power3 Power4 Mean s.d.
WALD-OPG l0.1960 0.1960 0.1962 0.1962 1.1269 0.9574
WALD-HES 0.2035 0.2035 0.2037 0.2037 1.1489 0.9714
LM—OPG 0.2270 0.2270 0.2272 0.2272 2.4506 2.7265
LM-HES 0.2090 0.2090 0.2092 0.2092 2.3278 2.5553
GDV 0.2115 0.2115 0.2117 0.2117 2.3049 2.4668
BADGDV 0.1710 0.1710 0.1712 0.1712 2.0401 2.2455
OLS 0.1690 0.1690 0.1687 0.1687 -0.9800 1.0053
OLS-H 0.1670 0.1670 0.1672 0.1672 —0.9824 1.0085
BADOLS 0.1000 0.1000 0.1001 0.1001 -0.8492 0.8713
BADOLS-H 0.0985 0.0985 0.0986 0.0986 -0.8496 0.8727
Rep. dropped 0 0 2 2

 

 

 

 

 

 

The number of replication is 2000.

36

Table 1.10; (Change of 6) 6 : 0.1, a : 6 : 0, e3 : ,\ : 1, p : 0.5, N : 1000
[E(exp(——u)) = 0.5232]

 

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.1024 0.0456 0.0021
6 -00047 0.0754 0.0057
6 0.0022 0.0441 0.0020
63 1.0003 0.0979 0.0096
62 0.9935 0.1517 0.0230
TE 0.5023 0.1814 0.0513 0.6268
Restricted MLE (6:0) 6 0.0092 0.0739 0.0055
6 -00329 0.0414 0.0028
63 0.9901 0.0970 0.0095
12 1.0336 0.1515 0.0241
T17: 0.4973 0.1817 0.0515 0.6251
Restricted NLLS (OLS on 77 -1.0047 0.0467 0.0022
y,:6+6x,-+w,~:6:—1.0050, 6 —0.0490 0.0449 0.0020
6:—0.0503,e3,:2.0304) 63, 2.0305 0.1228 0.0151
STATISTICS Powerl Power2 Power3 Power4 Mean s.d.
WALD-OPG 0.6115 0.6118 0.6115 0.6118 2.1879 0.9419
WALD-HES 0.6350 0.6353 0.6350 0.6353 2.2309 0.9407
LM—OPG 0.6580 0.6578 0.6580 0.6578 6.4945 4.8214
LM-HES 0.6410 0.6408 0.6410 0.6408 6.1642 4.5083
GDV 0.6425 0.6423 0.6425 0.6423 5.8979 4.1321
BADGDV 0.5915 0.5913 0.5915 0.5913 5.4355 4.0116
OLS 0.5105 0.5103 0.5105 0.5103 —1.9417 1.0116
OLS-H 0.5060 0.5058 0.5060 0.5058 -1.9357 1.0040
BADOLS 0.3770 0.3767 0.3770 0.3767 -1.6816 0.8769
BADOLS-H 0.3685 0.3682 0.3685 0.3682 -1.6696 0.8681
Rep. dropped 0 1 0 1

 

 

 

 

 

 

The number of replication is 2000.

37

Table 1.11: (Change of 6) 6 : 0.15, a : 6 : 0, e3 : A : 1, p : 0.5, N : 1000
[E(exp(—u)) = 0.5232]

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE ” 0.1528 0.0465 0.0022

0mw 0mm 0mm

0.0021 0.0440 0.0019

63 10000 00979 00096

A2 0%% 0mm 0mn

awn 0mm 0mm 0am
0mn 0mm 0mm
-0M% 0mm 0mg

0%m 0mm 0W%

10744 01556 00297

0.4925 0.1837 0.0516 0.6269
-1mm 0mm 0mm
0mm 0mm 0mm

2mm 0mm 0mm

 

 

7%) Q) 0'1

)

 

Restricted MLE (6 = 0)

 

Restricted NLLS (OLS on
31; = 7) + 61‘; + w): 7) = —1.0113,
6 : —0.0758, 63, : 2.0693)

 

 

quunai g) $31,755. Q: g)

 

 

 

 

 

 

STATISTICS l Powerl Power2 Power3 Power4 Mean s.d.
WALD-OPG 0.9100 0.9114 0.9104 0.9118 3.1972 0.9177
WALD-HES 0.9255 0.9269 0.9259 0.9273 3.2605 0.8930
LM-OPG 0.9350 0.9349 0.9354 0.9353 13.1080 7.0740
LM-HES 0.9310 0.9309 0.9314 0.9313 12.4001 6.5219
GDV 0.9280 0.9279 0.9284 0.9283 11.3511 5.6386
BADGDV 0.9065 0.9064 0.9069 0.9068 11.0093 5.9053
OLS 0.8220 0.8217 0.8228 0.8226 -2.9061 1.0167
OLS-H 0.8230 0.8227 0.8238 0.8236 -2.8711 0.9920
BADOLS 0.7425 0.7421 0.7432 0.7429 -2.5151 0.8813
BADOLS-H 0.7300 0.7296 0.7307 0.7303 -2.4673 0.8566
Rep. dropped 0 3 2 5

 

 

 

 

 

 

The number of replication is 2000.

38

Table 1.12: (Change of6 and p) 6 = 0.1, p = 0.9, a = 6 = 0, 03 = A = 1, N = 1000
[E(exp(—u)) = 0.5232]

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.1032 0.0541 0.0029
6 -0.0048 0.0757 0.0058
6 0.0032 0.0531 0.0028
63 1.0007 0.0980 0.0096
A2 0.9926 0.1525 0.0233
TE 0.5024 0.1815 0.0514 0.6261
Restricted MLE (6:0) 6 0.0052 0.0747 0.0056
6 -0.0607 0.0416 0.0054
63 0.9932 0.0975 0.0095
A2 1.0257 0.1520 0.0238
TE 0.4983 0.1811 0.0516 0.6235
Restricted NLLS (OLS on 6 -10047 0.0468 0.0022
y,:n+6n,-+w,-:6:—1.0050, 6 -0.0890 0.0452 0.0020
6:—0.0905,a3,:2.0304) 63, 2.0252 0.1220 0.0149
STATISTICS Powerl Power2 Power3 Power4 Mean s.d.
WALD-OPG 0.4405 0.4429 0.4405 0.4429 1.8212 0.9378
WALD-HES 0.4655 0.4681 0.4655 0.4681 1.8627 0.9343
LM—OPG 0.4915 0.4902 0.4915 0.4902 4.8166 4.0995
LM-HES 0.4835 0.4816 0.4835 0.4816 4.5987 3.8783
GDV 0.4670 0.4656 0.4670 0.4656 4.2959 3.4242
BADGDV 0.2675 0.2680 0.2675 0.2680 2.7913 2.3952
OLS 0.1715 0.1699 0.1715 0.1699 -0.9813 1.0056
OLS-H 0.1705 0.1689 0.1705 0.1689 -0.9835 1.0083
BADOLS 0.0000 0.0000 0.0000 0.0000 -0.4280 0.4382
BADOLS-H 0.0000 0.0000 0.0000 0.0000 -0.4247 0.4355
Rep. dropped 0 11 0 11

 

 

 

 

 

 

The number of replication is 2000.

39

Table 1.13: (Change of scaling functions to ¢(6z,-)/(1 — <I>(6z,-))) 6 = 0.1, a = 6 = 0,
a3 : A : 1, p : 0.5, N : 1000 [E(exp(—u)) : 0.5232]

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The number of replication is 2000.

40

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
6 0.0827 0.0514 0.0029
6 -0.0065 0.0783 0.0062
6 0.0021 0.0418 0.0018
63 1.0012 0.0938 0.0088
A2 0.6278 0.1206 0.1531
TE 0.5600 0.1574 0.0627 0.5509
Restricted MLE (6 : 0) 6 0.0043 0.0764 0.0059
6 -0.0230 0.0389 0.0020
63 0.9939 0.0928 0.0086
A2 0.6503 0.1187 0.1364
TE 0.5557 0.1577 0.0623 0.5485
Restricted NLLS 77 -0.7987 0.0420 0.0423
(OLS on 6,:6+6s,-+w,:: 6 -00305 0.0406 0.0026
6:—1,6:0,63,:2) 63, 1.6471 0.0897 0.1550
—STATISTICS P6wer4 Mean s.d.
WALD-OPG 0.3255 1.5395 0.9178
WALD-HES 0.3540 1.5824 0.9317
LM-OPG 0.4045 3.8867 73.6256
LM-HES 0.3740 3.6723 3.6913
GDV 0.3745 3.4652 3.0224
BADGDv 0.3170 3.1357 2.9020
OLS 0.2825 -1.3685 1.0079
OLS-H 0.2855 -1.3710 1.0093
BADOLS 0.1855 -1.1856 0.8737
BADOLS-H 0.1885 -1.1848 0.8731
Rep. dropped 0

 

 

 

 

 

 

 

 

 

 

 

 

Table 1.14: (Change of the distribution of u: to N(0,7r/2)+) a = 6 = 6 =
a3:A:1,p:0.5,N:1000[E(exp(—u)):0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 -0.0001 0.0746 0.0056
6 -04192 0.1093 0.1876
6 0.0015 0.0423 0.0018
63 1.2244 0.1108 0.0626
A? 0.3455 0.1110 0.4407
TE 0.6359 0.1128 0.0834 0.5383
Restricted MLE (6:0) 61 -0.4106 0.1087 0.1804
6 0.0013 0.0383 0.0015
63 1.2185 0.1118 0.0602
A2 0.3569 0.1148 0.4268
TE 0.6320 0.1131 0.0815 0.5465
Restricted NLLS 6 -09993 0.0396 0.0016
(OLS on y,:6+6r,-+w,-: 6 0.0015 0.0386 0.0015
6:—1,6:0,63,:2) 63, 1.5730 0.0717 0.1874
STATISTICS Sizel Size2: Size3 Size4 Mean s.d.
WALD-OPG 0.0210 0.0213 0.0221 0.0223 -0.0005 0.8788
WALD-HES 0.0270 0.0274 0.0285 0.0287 .0.0003 0.9174
LM-OPG 0.0635 0.0635 0.0622 0.0622 1.0649 1.5782
LM-HES 0.0635 0.0620 0.0611 0.0595 1.0941 2.0656
GDV 0.0370 0.0371 0.0390 0.0388 0.9229 1.2642
BADGDV 0.0320 0.0320 0.0311 0.0308 0.8265 1.2130
OLS 0.0500 0.0503 0.0496 0.0494 0.0159 0.9915
OLS-H 0.0515 0.0518 0.0506 0.0505 0.0163 0.9963
BADOLS 0.0250 0.0249 0.0253 0.0250 0.0142 0.8591
BADOLS-H 0.0250 0.0249 0.0248 0.0244 0.0145 0.8626
Rep. dropped 0 32 103 118

 

 

 

 

 

 

The number of replication is 2000.

41

 

 

 

 

 

 

 

 

 

 

 

 

Table 1.15: (Change of the distribution of 11;? to gamma(0.5,2)) a = 6 =
63:A:1,p:0.5,N:1000[E(exp(—u)):0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0003 0.0413 0.0017
6 0.4262 0.0651 0.1859
6 0.0010 0.0444 0.0020
63 0.7906 0.0825 0.0507
A2 2.0376 0.2222 1.1259
TE 0.4149 0.2146 0.0886 0.6763
Restricted MLE (6:0) 6 0.4276 0.0658 0.1871
6 0.0010 0.0422 0.0018
63 0.7897 0.0831 0.0511
A2 2.0450 0.2243 1.1424
TE 0.4144 0.2146 0.0888 0.6763
Restricted NLLS 17 -1.0003 0.0550 0.0030
(OLS on y,:6+6r,+w,: 6 0.0013 0.0541 0.0029
6:—1,6:0,1L3,:2) 63, 2.9983 0.2504 1.0594
—STATISTICS S-ize4 Mean s.d.
WALD-OPG 0.0860 0.0059 1.1330
WALD-HES 0.0765 0.0069 1.0860
LM-OPG 0.0630 1.1103 1.5402
LM-HES 0.0765 1.1829 1.6295
GDV 0.0585 1.0788 1.4734
BADGDV 0.0495 0.9787 1.3557
OLS 0.0560 .0.0140 1.0078
OLS-H 0.0540 -0.0143 1.0091
BADOLS 0.0235 -0.0120 0.8725
BADOLS-H 0.0230 -0.0122 0.8722
Rep. dropped 0

 

 

 

The number of replication is 2000.

42

 

 

 

 

 

 

 

 

 

 

 

 

Table 1.16: (Change of the distribution of 11;? to gamma(2,0.5)) 01 = 6 =
63:A:1,p:0.5,N:1000[E(exp(—u)):0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0013 0.0639 0.0041
6 -0.3772 0.0976 0.1518
6 0.0006 0.0413 0.0017
63 1.1015 0.1041 0.0211
A2 0.3945 0.1088 0.3785
TE 0.6187 0.1250 0.0703 0.5223
Restricted MLE (6:0) 6 -0.3695 0.0984 0.1462
6 0.0003 0.0379 0.0014
63 1.0959 0.1050 0.0202
A2 0.4058 0.1132 0.3659
ﬁ 0.6154 0.1255 0.0690 0.5263
Restricted NLLS 6 -0.9997 0.0392 0.0015
(OLS on y,:6+6.r,-+w,-: 6 0.0001 0.0385 0.0015
6:—1,6:0,63,:2) 63, 1.5002 0.0708 0.2548
STATISTICS Sizel Size2_ Size3 Size4 Mean s.d.
WALD-OPG 0.0300 0.0301 0.0304 0.0304 0.0204 0.9101
WALD-HES 0.0320 0.0321 0.0324 0.0325 0.0178 0.9430
LM-OPG 0.0640 0.0637 0.0633 0.0634 1.0674 1.4957
LM-HES 0.0600 0.0591 0.0592 0.0588 1.0155 1.4688
GDV 0.0415 0.0411 0.0415 0.0416 0.9545 1.2671
BADGDV 0.0335 0.0331 0.0329 0.0330 0.8294 1.1569
OLS 0.0480 0.0476 0.0471 0.0472 .0.0240 0.9990
OLS—H 0.0500 0.0496 0.0491 0.0492 -0.0236 1.0057
BADOLS 0.0235 0.0231 0.0228 0.0228 -0.0211 0.8648
BADOLS-H 0.0245 0.0241 0.0238 0.0238 -0.0206 0.8694
Rep. dropped 0 5 25 28

 

 

 

 

 

 

The number of replication is 2000.

43

1.11 Appendix: LM Test for the Scaled Exponen-
tial Case

Recall that the LM statistic of (1.18) is:
LM : V5 lnL(6)’[i'55 — 25¢TJ$T¢5]_1V5lnL(6~)
1 N I 1 N
_ __ ”.. " _“ ”-1“ -1 _ ”..
— A Z; bz 27 [I66 I6¢I¢¢I¢6l :\ 2 0,2,

1 N~ [.12- ~-_1~ '1 1 N~
= ﬁzbizi [N (1156—1-61pr $1.35)] W262i .

i=1

Now we compare LM with x/N ’y’ (Var(\/N’y))’1\/N’y where the asymptotic dis-
tribution of x/Nﬁ is derived in (1.8):

Wi'Warb/‘Niii-lx/Ni
1 N ~ I 1 N —1 l 1 N \_1 1 N ~
: (ﬁgbili) (Ngzzzz) BA B (Ngzizz) (TN £0121)
_ Ligh- ’A—l J—ﬁé +0 (1)
__ mizl 1 1 «TV—3:1 121 P
1 N ~ I 1 N I -1
= (76,233) N§(Zibi +GTi)(Zibi +070)

(1.36)

In the following we prove LM and x/I—V’y’ (Var(\/N 6))-1m '31 have the same as-
ymptotic distribution by showing that the probability limit of (N '1 231:1(2333 +

Gri)(z,-b,~ + Gri)') is [A2(I§5 — 1,3301%;11; 6)] where 1° is the limiting information

44

matrix as deﬁned in (1.12).

From (1.15), the gradient of the log-likelihood, V9 lnL(6) is

 

 

 

 

 

 

 

 

 

 

 

 

l L
(Q6%‘\ (26:13:15) A
61 L
79%— : 26:1 86(5) (137)
01 L — -
(Tang 261:1 31’ (012;)
61 L N . 2
(gr) (2.: 61A ))
Speciﬁcally,
N N 2
, __ 0062' 52' _ 0v _ .
2381(6) — g (Aexp(z ’6) Aexp(z£6) A2 exp(2256) 1) 2,,
N N
{i 1 >
3' 5 = (— - —— I ,
g 1( ) rzzl 0v Aexp(z:6) 1
N N
2 _ ( 1 _ 5i ﬁg;
E5310“) — Z; \2A2 exp(2zg6) 2A exp(zf6)av + 203) ’
N N
2:302) = :1( avg, 6‘ _ <73 _ 1 )
i=1 1 6— 2A3 exp(3z’-6) 2A3 exp(3zg6) 2A4 exp(4z,’-6) 2A2 exp(Zzgé)
(1.38)
where
' A 6
51' Z (Mg/av + 012/( exp(z’- ))) (1.39)

 

1 — @(ei/av + ov/(A exp(z;6)))
Let H (6) denote the Hessian, vglan) _=_ 62 lnL(0)/6066', and If denote H (6)
evaluated at 19 = 6 = (0’, 6’, 63, Az)’. We partition If conformably to (1.17) as

.. F155 1315,),

H : (1.40)

5’46 H66

45

where 11...... stands for the *,* block of H, evaluated at the estimates, and 16 =

(6’, 03, A2)’. Speciﬁcally, each element of If is:

N N
H55 = A—2 Z (Var(u,'|c,-) -— A2) 232;,

H56— — (A011) )lzvar(u1l€1) )331'2 it
1': 1N

H036 : (27163)" 121((—— — —) Vmeo + A)z,’.,

51,2, = (269-12 (Var(u.-lei) — i2) 2;.
1:1

N
H66 — a. 4 Z (Vents-Is) — a.) 4.4;,
i=1
N E-
' 2 1 J— — s v — ’,
H v5 ( 0 ) g ((53 ) ar(u1l€1) 51) $1
a. N N
HA26 = (2A 03)'12Var(ui|e,-)I,,
i=1
" — 45-1 _i___"l .._~2_ 7
H0303 — (4011) 12:; ((61, A ) Var(u,|6,) 6, 200) ,
a. N C 1 N
H3202 = (4A363)_1 Z ((732- — —) Var(u,|c,-) + A),
” i=1 a” ’\
~ ~ N N ~
HA2A2 = (466)—1 Z (Var(ui|€i) — 62) .
i=1

where we use

56.1..) = a. (a — (.3— . mitt-)0
Var(ui|6,-)— — 0,, 2(1 +((:: + Egg-(2767) 5,- - £3).

46

(1.41)

(1.42)

Lemma 1. Let G be deﬁned as in (1.9c). Then, with 6 = 0,

1 N )1 1 N
GZNZZ iV¢f(yi,xi,16 16) +0p(1):N-XZZI:V1/Jf(yiixi)w)’ +0130)
i=1 =1

1
WW

VIM2

A

(1.43)
A 2 /\
= NVW lnL I5=0 + 0p(1) = I—V-H5¢,6:0 + 013(1)
1
= -A (-1—V-H6¢,6:0) + 012(1) = ‘6166,6:0 + 010(1)-
Lemma 2. With 6 = 0,
I 1 N
2 _ . . ’
ATl‘N Eb Z22: :N 12G b '2') (6132') _ N;S'(6)s’(6) [5:0 (144)

: 166,6:0 + 019(1)-

Lemma 3. Consider 221:1 53(16) 2 (211:1 3,-(6)', 211:1 33(03), 221:1 83(A2))' as given
in (1.38). Then, with 6 : 0, N-1 23:, 5,-(16 16,-)6 2’- - A136 5:0 + 6,,(1).

Proof. Note that N‘1 21:1 53(16)b,-zz'- is equal to:

 

 

 

N
(012:; ZEVar(u,|e,)Izz \
N /\ . 1
+ ;(N(§ _. 5%) — W(Euilei - Emu-”1'21
—1 i (£1._1)V ( I )_ ) I
20.3N i=1( 03 A 31' U1, 61; 51, zi
N
A 1 6 66
‘N§('2T_2A6v 26;) '
1 ib2 I
1'78 I

 

 

47

                  

 

 

 

 

 

 

—1
(012)1( ' dwizi \
1 N 2 03 035i ‘73
+:3N§(2av+A—2+ A (—A0 + (1)503“-
: ’1 iv: (fi— l)Var(u |e)— e-zg)
203Ni=1 0,2, A z z
N
A 1 5 e 6
_ N236? - 2AA, + 2:71,)”, i
1 N
Ina—DE"; /

(1.45)

Now, the second terms in the ﬁrst two elements of the above vector converge in

probability to zero:

1 N 04 026' 03 I
0v 2N (202 +—— A — (Adv + A) 52‘) 12,2, = 013(1),

i=1

N
A 1 ﬁt 626' I
— — — . = 1 .
N 21(2A2 2on + 203 z* 0P( )

1:

 

 

     

 

This is because, ﬁrst,

1 o4 026‘ o3
E [32 (203+ A7 + :2- (Am, + 3%) 6i) 2:21;]

104 3 I
=00 —(203" + 73 +9; E(ei) — (Mu + ”7) Boss) E(m)

 

1 2 o4 02 03 0v I
= 3 (20,, + Kg + 7W4) - (My + -/\- 7 E(xizi) = 0,

U

(1.46)

(1.47)

where, with 6 = 0, e,- and 45,- are functions of only the error terms, 22,- and n: which

48

are assumed to be independent of x,- and Zi~ Note that E(g,) = O'v/A since

A = E(uz’) = E[E(ui|5i)l = E [w (Q — (:1. + if)»
A a v (1.48)
= 01) (E(éi) + 0—v " f):
which we solve for E(fi). Secondly,

I {2' 6i€i _ , 2 _
E(2A2 2on 20,3 ”AE31(0v)l6=0—0

 

(1.49)
where 3,-(03) is as deﬁned in (1.38). Then (1.45) is equal to:
-—(03N)"1 25:1 Var(ui|e,')xiz,'- + op(1)
—(203N)’12i’i1((ei/03- l/A) Varese» - at) z: + 0pm
_ N (1.50)
(2A3N) 122:1”?3;
l
= ‘NHW-w + 0pm = A (--1\7H¢6,6=0) + op<I> = Hinze + 0pm.
Cl

Recall that r,- = r,(16) = I°_ls,-(i6) = 122d16=03i(¢) = Izglsi as shown in (1.11).
Now we are ready to evaluate our main expression:

i=1
l N
= N bezzz; (1.51a)
i=1
1 N , ,
+ ’N 2 amp (1-51b)
i=1
1 N I ,
+ 7V- 2 ZzszzG (1'51C)
2'21

49

N
1
+ NZerizg. (1.51d)
We will evaluate these term by term. By Lemma 2,

bfz, z-gz _ —A 21°, + 0,,(1). (1.52)

2I~
1M2

Next, we evaluate (1.51b):

N N
iga We tZena)(rials->(Ittlsn1—vts+0p<1>

i=1
2 1 1 N 1
_ I _.
= A 13111311) “N232“ 13¢ L726 +0141) (1.53)
z:

_ 2 —1 —
— ’\ 13:11:22» 13:11:21» 136 + 019(1)
= Hamlets + opu)

Finally, we evaluate (1.51c):

N N

1 1 _

N E zibz-rgG’ = N E :Zibisglz)¢l(“’\11c66)+ 013(1)
i=1 i=1

1.54
: (W¢)I;;1(“AI;5) + 013(1) (using Lemma 3) ( )

= —AZI§¢I;;1117}6 + 0p(1).

And term (1.51d) is exactly the same as this term.
Inserting (1.52), (1.53) and (1.54) into (1.51), we obtain

N
1 o _
N' 2 1:( zzb + Gri) (z,b,- + Gr,)’ = A2 (13’, -— 1,5,13,11,35) + 0p(1)- (1-55)

Therefore the expressions inside the inverse in equations (1.35) and (1.36), for the

LM test and the GDV test respectively, have the same probability limit.

50

1.12 Appendix: Supplementary Tables

Supplemental Table 1.17: (Change of p) p = 0.25, a = 5 = 6 = 0, 0,2, = A = 1,
N = 200 [E(exp(—u)) = 0.5232]

 

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 8 0.0003 0.1084 0.0118
6 -0.0338 0.1859 0.0357
3 0.0001 0.0951 0.0090
63 1.0069 0.2271 0.0516
R? 0.9606 0.3510 0.1247
TE 0.5145 0.1767 0.0553 0.6145
Restricted MLE(6:0) 6”! -0.0213 0.1811 0.0333
3 0.0001 0.0933 0.0087
63 0.9985 0.2241 0.0502
P 0.9918 0.3459 0.1197
CFE 0.5098 0.1767 0.0546 0.6194
Restricted NLLS 5 4.0003 0.0994 0.0099
(OLS on y,=n+6x,-+w,-: B 0.0008 0.1004 0.0101
n=—1,6=0,a§,=2) 6,3, 2.0020 0.2691 0.0724
STATISTICS; Sizel Size2 ' Size3 Size4 Mean s.d.
WALD-OPG 0.0212 0.0214 0.0215 0.0216 .0.0005 0.8719
WALD-HES 0.0303 0.0305 0.0305 0.0306 .0.0019 0.9385
LM-OPG 0.0787 0.0780 0.0769 0.0762 1.2060 1.7718
LM-HES 0.0516 0.0507 0.0512 0.0503 1.0830 6.7671*
GDV 0.0465 0.0465 0.0471 0.0470 1.0121 1.3502
BADGDV 0.0484 0.0475 0.0469 0.0461 0.9524 1.3354
OLS 0.0495 0.0487 0.0483 0.0475 0.0008 0.9966
OLS-H 0.0575 0.0569 0.0563 0.0558 0.0016 1.0212
BADOLS 0.0432 0.0425 0.0420 0.0413 0.0007 0.9650
BADOLS-H 0.0490 0.0484 0.0479 0.0473 0.0009 0.9839
Rep. dropped 0 73 124 172

 

 

 

 

 

 

* due to outliers

51

 

 

 

 

 

 

 

 

 

 

 

 

Supplemental Table 1.18: (Change of p) p = 0.75, a = ﬂ = 6 = 0, 0,2, = A = 1,
N = 200 [E(exp(—u)) = 0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 8 -0.0006 0.1280 0.0164
6 -00327 0.1869 0.0360
6‘ -00003 0.1109 0.0123
63 1.0055 0.2272 0.0516
,1? 0.9601 0.3527 0.1260
"TE 0.5147 0.1776 0.0557 0.6107
Restricted MLE(6:0) a 0.0199 0.1804 0.0329
6 0.0001 0.0933 0.0087
63 0.9968 0.2228 0.0496
:12 0.9944 0.3455 0.1194
TE 0.5094 0.1769 0.0546 0.6195
Restricted NLLS 77 -1.0004 0.0995 0.0099
(OLS on y,=n+Bx,-+w,-: B 0.0007 0.1005 0.0101
17=—1,/3=0,0,2U=2) 63,, 2.0025 0.2692 0.0725
STATISTICS Sizel Size2 Size3 Size4 Mean s.d.
WALD-OPG 0.0213 0.0216 0.0215 0.0217 -0.006O 0.8548
WALD-HES 0.0308 0.0312 0.0309 0.0312 -0.0072 0.9304
LM-OPG 0.0794 0.0790 0.0775 0.0775 1.2305 1.8417
LM-HES 0.0529 0.0527 0.0524 0.0523 1.0285 2.1901
GDV 0.0442 0.0446 0.0445 0.0448 0.9949 1.3236
BADGDV 0.0226 0.0224 0.0222 0.0220 0.6994 1.0175
OLS 0.0495 0.0491 0.0483 0.0480 -0.0003 0.9993
OLS-H 0.0575 0.0574 0.0563 0.0563 0.0005 1.0234
BADOLS 0.0027 0.0027 0.0027 0.0028 0.0000 0.6601
BADOLS-H 0.0052 0.0053 0.0052 0.0052 -0.0008 0.6740
Rep. dropped 0 123 161 244

 

 

 

 

 

 

52

 

 

 

 

 

 

 

 

 

 

 

 

Supplemental Table 1.19: (Change of 6 and p) 6 = 0.05, p = 0.9, a ﬁ = 0,
03 = A = 1, N = 1000 [E(exp(—u)) = 0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 5 0.0527 0.0534 0.0029
8 -00045 0.0751 0.0057
3 0.0033 0.0530 0.0028
63 1.0004 0.0979 0.0096
12 0.9932 0.1506 0.0227
TE 0.5023 0.1803 0.0515 0.6234
Restricted MLE (6 = O) 61 -0.0003 0.0744 0.0055
1? -0.0295 0.0414 0.0026
6?, 0.9971 0.0975 0.0095
12 1.0069 0.1498 0.0225
:75 0.5006 0.1802 0.0515 0.6236
Restricted NLLS (OLS on f] -1.0009 0.0466 0.0022
yi=n+ﬂxi+wiz n=—1.0013, [3' -0.0438 0.0448 0.0020
6=-0.0451,e§,=2.0075) 53,, 2.0081 0.1194 0.0143
STATISTICS Powerl Power2 _Power3 Power4 Mean s.d.
WALD-OPG 0.1380 0.1384 0.1381 0.1385 0.9407 0.9431
WALD-HES 0.1520 0.1525 0.1521 0.1525 0.9617 0.9560
LM-OPG 0.1795 0.1795 0.1796 0.1796 2.0105 2.4353
LM-HES 0.1610 0.1610 0.1611 0.1611 1.9042 2.2712
GDV 0.1650 0.1650 0.1651 0.1651 1.8615 2.1505
BADGDV 0.0560 0.0562 0.0560 0.0562 1.1487 1.3851
OLS 0.0750 0.0747 0.0750 0.0748 -O.4981 1.0049
OLS-H 0.0750 0.0747 0.0750 0.0748 -0.5001 1.0104
BADOLS 0.0000 0.0000 0.0000 0.0000 -0.2175 0.4377
BADOLS-H 0.0000 0.0000 0.0000 0.0000 -0.2174 0.4383
Rep. dropped 0 6 l 7

 

 

 

 

 

 

The number of replication is 2000.

53

0,

 

 

 

 

 

 

 

 

 

 

 

 

Supplemental Table 1.20: (Change of 6 and p) 6 = 0.15, p = 0.9, a = ﬂ =
0,2, = /\ = 1, N = 1000 [E(exp(—u)) = 0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 8 0.1536 0.0549 0.0030
6 -00045 0.0758 0.0058
6 0.0029 0.0530 0.0028
63 0.9999 0.0980 0.0096
P 0.9931 0.1536 0.0236
TE 0.5023 0.1836 0.0512 0.6308
Restricted MLE (6:0) 6 0.0147 0.0746 0.0058
3 -0.0917 0.0419 0.0102
63 0.9856 0.0977 0.0097
312 1.0580 0.1548 0.0273
2717: 0.4945 0.1828 0.0520 0.6232
Restricted NLLS (OLS on 17 -1.0110 0.0470 0.0022
yi=n+ﬂxi+wizn=—1.0113, 6 -0.1353 0.0459 0.0021
6=—0.1365,63_,,=2.0693) 63, 2.0542 0.1264 0.0162
STATISTI-CS Powerl Power2 "Power3 Power4 Mean s.d.
WALD-OPG 0.7830 0.7865 0.7829 0.7864 2.6717 0.9268
WALD-HES 0.8140 0.8177 0.8139 0.8176 2.7330 0.8988
LM-OPG 0.8360 0.8358 0.8359 0.8357 9.3681 5.8643
LM-HES 0.8260 0.8262 0.8259 0.8261 9.0193 5.6024
GDV 0.8125 0.8122 0.8124 0.8121 7.9604 4.5539
BADGDV 0.6310 0.6308 0.6308 0.6307 5.5233 3.5299
OLS 0.3155 0.3154 0.3157 0.3156 -1.4680 1.0098
OLS-H 0.3155 0.3149 0.3157 0.3151 -1.4676 1.0078
BADOLS 0.0005 0.0005 0.0005 0.0005 -O.6396 0.4401
BADOLS-H 0.0000 0.0000 0.0000 0.0000 -O.6271 0.4319
Rep. dropped 0 1 9 1 10

 

 

 

 

 

 

The number of replication is 2000.

54

 

 

 

 

 

 

 

 

 

 

 

 

Supplemental Table 1.21: (Change of the distribution of u: to N (0, 1)+) a = B
6:0,63=A=1,p=0.5,N=1000[E(exp(—u))=0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 3 0.0010 0.0984 0.0097
6 -O.3498 0.1193 0.1366
6 0.0014 0.0396 0.0016
63 1.1488 0.0995 0.0321
.12 0.2119 0.0933 0.6298
TE 0.6958 0.0885 0.0835 0.4635
Restricted MLE (6 =0) 67 -0.3395 0.1159 0.1286
3 0.0011 0.0356 0.0013
63 1.1434 0.0996 0.0305
i2 0.2218 0.0955 0.6147
7713‘ 0.6901 0.0881 0.0807 0.4832
Restricted NLLS 77 -0.7972 0.0365 0.0425
(OLS on y,=n+6x,-+w,-: 6" 0.0012 0.0357 0.0013
n=—1,6=0,63,=2) 63, 1.3658 0.0612 0.4060
STATISTICS; Sizel Size2 — Size3 Size4 Mean s.d.
WALD-OPG 0.0125 0.0132 0.0149 0.0153 0.0178 0.8294
WALD-HES 0.0205 0.0217 0.0217 0.0223 0.0183 0.8670
LM-OPG 0.0560 0.0565 0.0490 0.0491 1.0036 1.5132
LM—HES 0.0725 0.0713 0.0657 0.0637 2.0072 34.9627*
GDV 0.0255 0.0269 0.0291 0.0300 0.8246 1.1256
BADGDV 0.0320 0.0328 0.0291 0.0293 0.7639 1.1324
OLS 0.0510 0.0486 0.0434 0.0427 -0.0098 0.9697
OLS-H 0.0540 0.0502 0.0453 0.0446 -0.0093 0.9744
BADOLS 0.0260 0.0254 0.0229 0.0229 -0.0082 0.8404
BADOLS-H 0.0240 0.0232 0.0205 0.0204 .0.0079 0.8439
Rep. dropped 0 107 387 431

 

 

 

 

 

 

 

* due to outliers. The number of replication is 2000.

55

Supplemental Table 1.22: (Change of the distribution of u: to N (0,1r/ (7r — 2))+)

a = 6 = 6 = 0, 63 = ,\ = 1, p = 0.5, N = 1000 [E(exp(—u)) = 0.5232]

 

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 3 -00002 0.0548 0.0030
6 -05135 0.1031 0.2744
8 0.0016 0.0460 0.0021
63 1.3480 0.1296 0.1379
R? 0.6616 0.1488 0.1366
{FE 0.5541 0.1456 0.0774 0.6116
Restricted MLE (6 =0) 52 -0.5069 0.1047 0.2679
6 0.0017 0.0423 0.0018
63 1.3420 0.1311 0.1342
.12 0.6744 0.1533 0.1295
T177 0.5518 0.1460 0.0764 0.6140
Restricted NLLS 1? -13225 0.0448 0.1060
(OLS on y,-=n+6:r,:+w,-: 8 0.0019 0.0433 0.0019
n=—1,6=0,63,=2) 63, 2.0018 0.0938 0.0088
STATISTICS Sizel Size2! Size3 Size4 Mean s.d.
WALD-OPG 0.0295 0.0296 0.0296 0.0297 .0.0025 0.9065
WALD-HES 0.0395 0.0396 0.0396 0.0397 -0.0009 0.9490
LM-OPG 0.0605 0.0607 0.0607 0.0608 1.0705 1.5630
LM-HES 0.0535 0.0536 0.0532 0.0533 0.9746 1.4373
GDV 0.0510 0.0511 0.0512 0.0513 0.9959 1.3894
BADGDV 0.0390 0.0391 0.0391 0.0392 0.8596 1.2448
OLS 0.0480 0.0481 0.0481 0.0483 0.0098 1.0040
OLS-H 0.0500 0.0501 0.0502 0.0503 0.0099 1.0085
BADOLS 0.0260 0.0261 0.0261 0.0261 0.0089 0.8698
BADOLS-H 0.0270 0.0271 0.0271 0.0271 0.0091 0.8730
Rep. dropped 0 5 6 11

 

 

 

 

 

 

The number of replication is 2000.

56

 

 

 

 

 

 

 

 

 

 

 

 

Supplemental Table 1.23: (Change of the distribution of u: to N (1, 1)+) a = B
6:0,63=A=1,p=0.5,N=1000[E(exp(—u))=0.5232]
ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 8 0.0000 0.0992 0.0098
6 -0.8053 0.1340 0.6665
8 0.0015 0.0431 0.0019
63 1.3787 0.1197 0.1577
.12 0.2470 0.1141 0.5800
ﬁ 0.6807 0.0910 0.1564 0.5174
Restricted MLE (6:0) 6 0.7959 0.1294 0.6502
6 0.0015 0.0391 0.0015
63 1.3738 0.1193 0.1539
i2 0.2565 0.1144 0.5658
27:5 0.6755 0.0884 0.1471 0.5412
Restricted NLLS 17 -1.2871 0.0405 0.0840
(OLS on yi=n+ﬂxi+wiz 6 0.0015 0.0392 0.0015
n=—1,6=0,63,=2) 63, 1.6312 0.0736 0.1414
STATISTICS Sizel Size2_ Size3 Size4 Mean s.d.
WALD-OPG 0.0055 0.0058 0.0069 0.0070 0.0118 0.7864
WALD-HES 0.0120 0.0127 0.0119 0.0122 0.0148 0.8308
LM-OPG 0.0605 0.0602 0.0469 0.0480 0.9462 1.3146
LM-HES 0.0745 0.0755 0.0695 0.0698 1.2467 4.0045*
GDV 0.0195 0.0195 0.0188 0.0192 0.7776 0.9973
BADGDV 0.0285 0.0280 0.0200 0.0205 0.7140 0.9702
OLS 0.0540 0.0512 0.0438 0.0423 .0.0224 0.9365
OLS-H 0.0550 0.0517 0.0444 0.0423 -0.0228 0.9407
BADOLS 0.0215 0.0211 0.0144 0.0141 -0.0196 0.8114
BADOLS-H 0.0215 0.0211 0.0138 0.0135 .0.0202 0.8146
Rep. dropped 0 106 402 439

 

 

 

 

 

 

* due to outliers. The number of replication is 2000.

57

Supplemental Table 1.24: (Change of the distribution of u: to gamma(0.5, J2»

6 = B = 6 = 0, 63 = A = 1, p = 0.5, N = 1000 [E(exp(—u)) = 0.5232]

 

 

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 8 0.0004 0.0446 0.0020
6 0.3384 0.0667 0.1190
6 0.0009 0.0412 0.0017
63 0.8493 0.0812 0.0293
X2 1.0968 0.1470 0.0309
TE 0.4905 0.1903 0.0837 0.6173
Restricted MLE (6:0) 61 0.3389 0.0660 0.1192
6” 0.0008 0.0389 0.0015
63 0.8494 0.0810 0.0292
512 1.0996 0.1452 0.0310
TE 0.4902 0.1902 0.0838 0.6175
Restricted NLLS 1'? -0.7075 0.0449 0.0876
(OLS on y,-=n+ 66,416,: 6" 0.0009 0.0440 0.0019
n=—1,6=0,63,=2) 63, 1.9974 0.1367 0.0187
—STATISTICS Size4 Mean s.d.
WALD-OPG 0.0710 0.0081 1.0704
WALD-HES 0.0620 0.0089 1.0491
LM-OPG 0.0630 1.0989 1.5310
LM-HES 0.0655 1.1158 1.5362
GDV 0.0525 1.0550 1.4346
BADGDV 0.0425 0.9360 1.3003
OLS 0.0495 -0.0156 0.9961
OLS-H 0.0520 -0.0155 0.9984
BADOLS 0.0225 -0.0132 0.8622
BADOLS-H 0.0215 .0.0132 0.8623
Rep. dropped 0

 

 

The number of replication is 2000.

58

Supplemental Table 1.25: (Change of the distribution of u? to gamma(2,1/\/2))

a = 6 = 6 = 0, 63 = ,\ = 1, p = 0.5, N =1000[E(exp(-u))= 0.5232]

 

 

 

 

 

 

 

 

 

 

 

 

ESTIMATION METHODS Estimates Mean s.d. MSE Corr
MLE 6 0.0007 0.0488 0.0024
6 -O.5069 0.0880 0.2647
6 0.0009 0.0452 0.0020
63 1.1766 0.1170 0.0449
2\2 0.8278 0.1491 0.0519
TE 0.5253 0.1637 0.0684 0.6072
Restricted MLE(6:0) 61 -0.5013 0.0901 0.2594
6 0.0007 0.0422 0.0018
63 1.1716 0.1179 0.0433
.12 0.8403 0.1540 0.0492
TE 0.5235 0.1641 0.0677 0.6084
Restricted NLLS 6 -1.4140 0.0452 0.1735
(OLS on yi=n+ﬁxi+w32 6 0.0005 0.0445 0.0020
n=—1,6=0,63,=2) 63, 2.0009 0.1020 0.0104
—STATISTICS Size4 Mean s.d.
WALD-OPG 0.0395 0.0161 0.9434
WALD-HES 0.0415 0.0153 0.9745
LM-OPG 0.0610 1.0617 1.4787
LM-HES 0.0475 0.9753 1.3569
GDV 0.0535 1.0122 1.3750
BADGDV 0.0335 0.8640 1.2086
OLS 0.0465 -0.0230 1.0065
OLS-H 0.0500 -0.0225 1.0130
BADOLS 0.0235 -0.0202 0.8711
BADOLS-H 0.0240 -0.0197 0.8754
Rep. dropped 0

 

 

 

The number of replication is 2000.

59

Chapter 2

On the Accuracy of Bootstrap
Conﬁdence Intervals for Efﬁciency

Levels in Stochastic Frontier

Models with Panel Data

2.1 Introduction

This chapter is concerned with the construction of conﬁdence intervals for efficiency
levels of individual ﬁrms in stochastic frontier models with panel data. A number
of different techniques have been proposed in this literature to address this problem.
Given a distributional assumption for technical inefﬁciency, maximum likelihood esti-
mation was proposed by Pitt and Lee (1981). Battese and Coelli (1988) showed how
to construct point estimates of technical efﬁciency for each ﬁrm, and Horrace and
Schmidt (1996) showed how to construct confidence intervals for these efﬁciency lev-
els. Without a distributional assumption for technical efﬁciency, Schmidt and Sickles

(1984) proposed ﬁxed effects estimation, and the point estimation problem for efﬁ-

60

ciency levels was discussed by Schmidt and Sickles (1984) and Park and Simar (1994).
Simar (1992) and Hall, Hardle, and Simar (1993) suggested using bootstrapping to
conduct inference on the efﬁciency levels. Horrace and Schmidt (1996) and Horrace
and Schmidt (2000) constructed conﬁdence intervals using the theory of multiple
comparisons with the best, and Kim and Schmidt (1999) suggested a univariate ver-
sion of comparisons with the best. Bayesian methods have been suggested by Koop,
Osiewalski, and Steel (1997) and Osiewalski and Steel (1998).

In this chapter we will focus on bootstrapping and some related procedures. We
provide a survey of various versions of the bootstrap, for construction of conﬁdence
intervals for efﬁciency levels. We also propose a simple alternative to the bootstrap
that uses standard parametric methods, acting as if the identity of the best ﬁrm
is known with certainty, and we propose some new resampling methods that corre-
spond to this parametric procedure. We present Monte Carlo simulation evidence on
the accuracy of the bootstrap and our simple alternative. Finally, we present some

empirical results to indicate how these methods work in practice.

2.2 Fixed-Effects Estimation of the Model

Consider the basic panel data stochastic frontier model of Pitt and Lee (1981) and
Schmidt and Sickles (1984),

yit=a+x2tﬂ+vit—ui, i=1,---,N, t=1,--- ,T, (2.1)

where i indexes ﬁrms or productive units and t indexes time periods. y“ is the scalar
dependent variable representing the logarithm of output for the 7th ﬁrm in period t, a
is a scalar intercept, zit is a K x 1 column vector of inputs (e.g., in logarithms for the
Cobb-Douglas speciﬁcation), 6 is a K x 1 vector of coefﬁcients, and ”it is an i.i.d. error

term with zero mean and ﬁnite variance. The time-invariant 11,- satisfy u,- 2 0, and

61

11,- > 0 is an indication of technical inefﬁciency. For a logarithmic speciﬁcation such
as Cobb-Douglas, the technical efﬁciency of the 1th ﬁrm is deﬁned as ri = exp(—11,),
so technical inefﬁciency is 1 — r3. For small values of 11,-, 11,- is approximately equal to
1 — exp(—11;) = 1 — r3, so that 11,; itself is sometimes used as a measure of technical
inefﬁciency.

Now deﬁne a; = oz -— 111'. With this deﬁnition, (2.1) becomes the standard panel

data model with time-invariant individual effects:
I
3111 = 01 + $115 + ”it (2-2)

Obviously we have 11,- : a — oz,- and 01,: S 01 since 11,- 2 0. The previous discussion
regards zero as the minimal possible value of 111' and a as the maximal possible value
of a,- over any possible sample; that is, essentially, as N —+ 00. It is also useful
to consider the following representation in a given sample size of N. We write the

intercepts oz,- in ranked order, as:
0(1) S 0(2) S. S a(N) (2-3)

so that in particular (N) is the index of the ﬁrm with largest value of 01,- among
N ﬁrms. It is convenient to write the values of 11,: in the opposite ranked order, as
“(N) _<_ S 11(2) 3 11(1), so that a“) = 01 — 11(3). Then obviously 0W) = 01 — “(N),
and ﬁrm (N) has the largest value of a,- or equivalently the smallest value of 11,-
among N ﬁrms. We will call this ﬁrm the best ﬁrm in the sample. In some methods
we measure inefﬁciency relative to the best ﬁrm in the sample, and this corresponds

to considering the relative efﬁciency measures:

:1:

11:- = 11,- — "(N) 2 am) -— 61,-, r3" = exp(—113). (2.4)

62

Fixed effects estimation refers to the estimation of the panel data regression model
(2.2), treating a,- as ﬁxed parameters. Because the a, are treated as parameters, we
do not need to make any distributional assumption about the inefﬁciencies; nor do we
need to assume that they are uncorrelated with the $11 or the 11,-t. We assume strict
exogeneity of the regressors 33,}, in the sense that (1,-1,1652, - - - ,x,T) are independent
of (0,1,v,2, - -« ,11,T). We also assume that the 12,} are i.i.d. with zero mean and
constant variance 0,2,. We do not need to assume a distribution for the v,,.

The ﬁxed effects estimates 6, also called the within estimates, may be calculated
by regressing (ya — 17,-) on (17111 - 13,-), or equivalently by regressing ya on “lit and a set
of N dummy variables for ﬁrms. We then obtain 0?,- = g, — 5:3, or equivalently the
61 are the estimated coefﬁcients of the dummy variables. This leads to the following

expression for 61,-:
6,- = 6,- + 17,- — 6m“ — 6). (2.5)

The ﬁxed effects estimate 6 is consistent as NT -—> co, and its variance is of order

(N (T — 1))‘1. For a given ﬁrm 1', the estimated intercept 61,- is a consistent estimate

of a,- as T —-> 00. Large T is needed for the term 17,- in (2.5) to become negligible.
Schmidt and Sickles (1984) suggested the following estimates of technical inefﬁ-

ciency, based on the ﬁxed effects estimates:
2‘ = 61 — 61,-. (2.6)

Since these estimates clearly measure inefﬁciency relative to the ﬁrm estimated to be
the best in the sample, they are naturally viewed as estimates of a( N) and 112‘, that
is, of relative rather than absolute inefﬁciency.

We deﬁne some further notation. Suppose we write the estimates 61,- in ranked

63

order, as follows:
511 S 512 S S 51[N]- (2-7)

So [N] is the index of the ﬁrm with the largest 131,, whereas (N) was the index of the
ﬁrm with the largest 01,-. These may not be the same; for example, ﬁrm 129 could be
the true best ﬁrm (that is, the one with the biggest 01,-), so that (N) = 129, but ﬁrm
71 could be the estimated best ﬁrm (that is, the one with the biggest 61,-), so that
[N] = 71. Note also that d as deﬁned in (2.6) above is the same as ﬁlm], but it may
not be the same as am), the estimated a for the unknown best ﬁrm.

As T —+ 00 with N ﬁxed, 61 is a consistent estimate of a( N) and 112' is a consistent
estimate of 11;“. However, it is important to note that in ﬁnite samples (for small T)
61 is likely to be biased upward, since 61 2 c‘r( N) and E(&(N)) 2 am). That is, the
“max” operator in (2.6) induces upward bias, since the largest d,- is more likely to
contain positive estimation error than negative error. This bias is larger when N is
larger and when the 61,- are estimated less precisely. The upward bias in ii induces
an upward bias in the 11;“ and a downward bias in 7"; = exp(—1‘12“); we underestimate
efficiency because we overestimate the level of the frontier.

Schmidt and Sickles (1984) argued that 51 and 11'; are consistent estimates of a and
11,- if both N and T approach 00; that is, if both N and T are large, we can regard the
11'; as estimates of absolute and not just relative inefﬁciency. The argument is simple.
As T ——> 00, d and 11: are consistent estimates of a( N) and 11;", as noted above. As
N —-> oo, 11( N) should converge to 0 so that 01( N) converges to a and the 11: should
converge to the corresponding 11,-. A more rigorous treatment of the asymptotics for
this model is given by Park and Simar (1994), who show that, in addition to N —+ co
and T —> 00, we need to require T"1/2 In N ——> 0 in order to ensure the consistency of

61 as an estimate of a. This latter requirement limits the rate at which N can grow

64

relative to T in order to ensure that the upward bias induced by the max operation

disappears asymptotically.

2.3 Construction of Conﬁdence Intervals by Boot-
strapping

We can use bootstrapping to construct conﬁdence intervals for functions of the ﬁxed
effects estimates. The inefﬁciency measures 11;? and the efﬁciency measures 1"“ =
a:

exp(—1‘1-

,) are functions of the ﬁxed effects estimates and so bootstrapping can be

used for inference on these measures.

We begin with a very brief discussion of bootstrapping in the general setting in
which we have a parameter 6, and there is an estimate 6 based on a sample 21, - - - ,2”
of i.i.d. random variables. The estimator 6 is assumed to be regular enough so

that n1/2(6 — 6) is asymptotically normal. The following bootstrap procedure will

be repeated many times, say for b = 1, - -- ,B where B is large. For iteration 0,
construct pseudo data 2?) , - -- ,ng) by sampling randomly with replacement from
the original data 21, - -- ,2". From the pseudo data, construct the estimate 6“”.

The basic result of the bootstrap is that, under fairly general circumstances, the
asymptotic (large 11) distribution of n1/2(6(b) — 6) conditional on the sample is the
same as the (unconditional) asymptotic distribution of n1/2(6 — 6). Thus for large n
the distribution of 6 around the unknown 6 is the same as the bootstrap distribution
of 6“” around 6, which is revealed by a large number (B) of draws.

We now consider the application of the bootstrap to the speciﬁc case of the ﬁxed
effects estimates. Our discussion follows Simar (1992). Let the ﬁxed effects estimates
be ,6 and 61,-, from which we calculate 1‘1; and 1“: (1' = 1, - - - ,N). Let the residuals be
13,-, = y,, — d,- — 263,6 (1' = 1, - ~- ,N, t = 1, - -- ,T). The bootstrap samples will be

drawn by resampling these residuals, because the ”it are the quantities analogous to

65

the 2’s in the previous paragraph, in the sense that they are assumed to be i.i.d., and
the 17,-, are the observable versions of the 1),}. (The sample size 71 above corresponds
to NT). So, for bootstrap iteration b (= 1, -- . ,B) we calculate the bootstrap sample

11(5) and the pseudo data 311(1)“ _ ai+$ 115“”, (5)

.From these data we get the bootstrap
estimates 3(5), 619,) 11:“) ,and r, (b ) ,and the bootstrap distribution of these estimates
is used to make inferences about the parameters.

We note that the estimates 1‘1: and 1“: depend on the quantity max,- 62,-. Since
“max” is not a smooth function, it is not immediately apparent that this quantity
is asymptotically normal, and if it were not the validity of the bootstrap would be
in doubt. A rigorous proof of the validity of the bootstrap for this problem is given
by Hall, Héirdle, and Simar (1995). They prove the equivalence of the following
three statements: (i) max,- 61,- is asymptotically normal. (ii) The bootstrap is valid
as T —) 00 with N ﬁxed. (iii) There are no ties for max,- 01,-: that is, there are a
unique index (N) such that a( N) = max_,- 01,-. There are two important implications
of this result. First, the bootstrap will not be reliable unless T is large. Second, this
is especially true if there are near ties for max,- a,-, in other words, when there is
substantial uncertainty about which ﬁrm is best.

We now turn to speciﬁc bootstrapping procedures, which differ in the way they
draw inferences based on the bootstrap estimates. In each case, suppose that we are
trying to construct a conﬁdence interval for 11’; = max,- aj — 01,-. That is, for a given
conﬁdence level c, we seek lower and upper bounds L,, U,- such that P(L,- 5 11: s
U,) = 1 -- c.

The simplest version of the bootstrap is the percentile bootstrap. Here we simply
take L,- and U,- to be the upper and lower c/ 2 fractiles of the bootstrap distribution
of the 11:“). More formally, let F be the cumulative distribution function (cdf) for 11;“
so that E(s) = P(1‘1:(b) S s) = the fraction of the B bootstrap replications in which
11:0,) 3 3. Then, we take L,- = F‘1(c/2) and U,- = F’1(1 — c/2).

66

The percentile bootstrap intervals are accurate for large T but may be inaccurate
for small to moderate T. This is a general statement, but in the present context there
is a speciﬁc reason to be worried, which is the ﬁnite sample upward bias in max,- 61,- as
an estimate of maxj 01,-. This will be reﬂected in improper centering of the intervals
and therefore inaccurate coverage probabilities. Simulation evidence on the severity
of this problem is given by Hall, Hiirdle, and Simar (1993) and in Section 2.5 of this
chapter.

Several more sophisticated versions of the bootstrap have been suggested to con-
struct conﬁdence intervals with higher coverage probabilities. Hall, Héirdle, and Simar
(1993) and Hall, HérdIe, and Simar (1995) suggested the iterated bootstrap, also called
the double bootstrap, which consists of two stages. The ﬁrst stage is the usual per—
centile bootstrap which constructs, for any given c, a conﬁdence interval that is in-
tended to hold with probability of 1 — c. We will call these “nominal” 1 — c conﬁdence
intervals. The second stage of the bootstrap is used to estimate the true coverage
probability of the nominal 1 — c conﬁdence intervals, as a function of c. That is, if
we deﬁne the function 7r(c) = true coverage probability level of the nominal 1 — c
level conﬁdence interval from the percentile bootstrap, then we attempt to evaluate
the function 7r(c). When we have done so, we ﬁnd 0*, say, such that 1r(c* ) = 1 — c,
and then we use as our conﬁdence interval from the ﬁrst stage percentile bootstrap,
which we “expect” to have a true coverage probability of 1 — c.

The mechanics of the iterated bootstrap are uncomplicated but time-consuming.
For each of the original (ﬁrst stage) bootstrap iterations B, the second stage involves
a set of 32 draws from the bootstrap residuals, construction of pseudo data, and
construction of percentile conﬁdence intervals, which then either do or do not cover
the original estimate 6. The coverage probability function 7r(c), which is the actual
rate at which a nominal c—level interval based on the bootstrap estimates covers the

true parameter 6, is estimated by the rate at which a nominal c-level interval based on

67

the iterated bootstrap estimates covers the original estimate 6. To understand this,
note that data generated from the true 6 yield 6; bootstrap data generated based
on 6 yield the bootstrap estimates 60’); and data based on 60’) yield the iterated
bootstrap estimates, say 6(b’b1). So the iterated bootstrap estimates 6(b’bl) have the
same relationship to 6 as the bootstrap estimates 60’) have to 6.

Generally we take B2 = B, so that the total number of draws has increased from

B to B2. by going to the iterated bootstrap. Theoretically, the error in the percentile

—1/2 1

bootstrap is of order 11 while the error in the iterated bootstrap is of order n“ .
There is no clear connection between this statement and the question of how well
ﬁnite sample bias is handled.

An objection to the iterated bootstrap is that it does not explicitly handle bias.
For example, if the nominal 90% conﬁdence intervals only cover 75% of the bootstrap
estimate in the ﬁrst stage, it simply insists on a higher nominal conﬁdence level, like
98%, so as to get 90% coverage. That is, it just makes the intervals wider when
bias might more reasonably be handled by recentering the intervals. A technique
that does recenter the intervals is the bias-adjusted bootstrap of Efron (1982) and
Efron (1985). As above, let 6 be the parameter of interest, 6 the sample estimate
and 60’) the bootstrap estimate (for b = 1, - -- ,B), and F the bootstrap cdf. For
n large enough that the bootstrap is accurate, we should expect F(6) = 0.5, and
failure of this to occur is a suggestion of bias. Now deﬁne 20 = <I>‘1(F(6)) where (I)
is a standard normal cdf, and where F(6) = 0.5 would imply 20 = 0. Let 26/2 be

the usual normal critical value; e.g. for c = 0.1, 26/2 = 20.05 = 1.645. Then, the

bias-adjusted bootstrap conﬁdence interval is [L,-, U,] with:

A A

L. = F‘1(<I>(2zo — 262)). U.- = F‘1(<I>(2zo + 262)) (2.8)

For example, suppose that there is an upward bias, reﬂected by the fact that 60%

68

of the bootstrap draws are larger than 6, so that F(6) = 0.4. Then .20 = —0.253, and
for c = 0.1 we have <I>(2zo — 26/2) = <I>(-—2.152) = 0.016 and <I>(220 + zc/2) = 0.873.
Thus our conﬁdence interval comes from the lower tail 0.016 fractile and the upper
tail 0.127 fractile, and we have compensated for upward bias by moving the interval
left. This seems intuitively reasonable.

The assumption that justiﬁes the bias-adjusted bootstrap is that, for some monotone
increasing function 9, (9(6) — 9(6)) is distributed as N (—zoa, 02) and (9(6(b)) — 9(6))
is also distributed as N (—200, 02) for some 20, 02. (The ﬁrst distribution is from
the probability law of the sample, and the second is the bootstrap distribution in-
duced by resampling from the given sample.) Thus we have normality, and also equal
biases and variances, for some transformation of 6. The transformation function 9
need not be known. This is an advantage in implementation, but a disadvantage in
trying to decide whether the assumption holds. It is not known whether the bias-
adjusted bootstrap is valid for our speciﬁc problem, but it performs relatively well in
the simulations reported in Section 2.5.

The ﬁnal version of the bootstrap that we will consider is the bias-adjusted and
accelerated bootstrap of Efron and Tibshirani (1993). This is intended to allow for a
possibility that the variances of 6 depends on 6, so that a bias-adjustment also requires
a change in variance. This correction depends on some quantities deﬁned in terms of
the so-called jackknife values of 6. For i = 1, - - - ,n, let 6“) be the value of the estimate
based on all observations other than observation 1; and let é(°) = 11’1 £3le 6(,-) be

the average of these values. Then the “acceleration” factor a is deﬁned by:

?=1 (99) 7 5(1))3
1.5
6 (231 (an) - 6“(1))2)

 

a = (2.9)

69

With 20 and 26/2 deﬁned as above, deﬁne

(20 + 20/2)

(30 “ Zen)
1 _ Gui (20 + 26/») (2.10)

(1—6, (zo—zc/,))'

Then the conﬁdence interval is [L,-, U,] with L,- = 13"1 (<I’(b,-1)) and U,- = F‘1 (<I>(b,-2)).

 

bt'1=20+( 1 b12=Zo+

More discussion can be found in Efron and Tibshirani (1993, chapter 14).

It is important to note that there are cases in which the acceleration factor fails to
be deﬁned. This happens when all the jackknifed estimates are the same, which yields
zero both for the numerator and for the denominator of the acceleration factor. For
example, one ﬁrm could be so dominantly efﬁcient in the industry that jackkniﬁng the
best ﬁrm (in our case, dropping one time dimensional observation) would not change
the efﬁciency rank for the best ﬁrm. Also, with large T, the ﬁrms’ efﬁciency ranking
would not be affected by taking out one time period observation, so that it is more
likely for the acceleration factor not to be deﬁned. However, as N gets large, it is less
likely for the acceleration factor not to be deﬁned since it would be harder to have one
speciﬁc ﬁrm uniformly as the best estimated ﬁrm with more ﬁrms in sample. In the
following sections, when the acceleration factor is not deﬁned, we do not accelerate
the bias-adjusted bootstrap. After all, the bias-adjusted bootstrap is a special case

of the bias-adjusted and accelerated bootstrap with the acceleration factor of zero.

2.4 A Simple Alternative to the Bootstrap

In this section we propose a simple parametric alternative to the bootstrap, and some
related resampling procedures. We begin with the following simple observation. We
wish to construct a conﬁdence interval for 11’: = a( N) — 61,-, or r: = exp(—113‘). If we

knew which ﬁrm was best - that is, if we knew the index (N) - we could construct a

70

parametric conﬁdence interval of the form:
(61( N) — 61,-) i (critical value) :1: (standard error), (2.11)

where “critical value” would be the apprOpriate c/ 2 level critical value of the standard
normal distribution, and “standard error” would be the square root of the quantity:
estimated variance of 61( N) + estimated variance of 61,- - 2*estimated covariance of
(61(N),61,-). This interval would be valid asymptotically as T —> 00 with N ﬁxed.
In fact, if the 12,-, are i.i.d. normal and we use the critical value from the student-t
distribution, this interval would be valid in ﬁnite samples as well.

The conﬁdence interval (2.11) is infeasible because the identity of the best ﬁrm is

unknown. However, we can construct the conﬁdence interval:
(61[ N] — 61,-) :1: (critical value) =1: (standard error), (2.12)

where as before max,- 61,- : am]. That is, we use a conﬁdence interval that would
be apprOpriate if (N) were known, and we simply pretend that [N] = (N). That
is, we pretend that we do know the identity of the best ﬁrm. This is our “simple
parametric” conﬁdence interval.

Two details should be noted. First, in calculating the standard error in (2.12),
we evaluate Var(61[N]) and Cov(6r[N], 61,-) using the standard formulas that ignore the
fact that the index [N] is data-determined. That is, again we pretend that [N] = (N)
is known. Second, although 01(N) — a, Z 0, the lower bound of the conﬁdence interval
in (2.12) can be negative. If it is, set it to zero. This corresponds to setting the upper
bound of the relative efﬁciency measure r; to one.

The asymptotic (T —-+ 00 with N ﬁxed) validity of this procedure follows from
the same argument that Hall, Hiirdle, and Simar (1995) used to show that maxj 61,-

is asymptotically normal. If there are no ties for max,- 01,-, then as T —> 00, P( [N] =

71

(N )) —-> 1. That is, with no ties, in the limit there is no uncertainty about the identity
of the best ﬁrm.

An obvious implication of this argument is the following. For data sets in which
there is substantial uncertainty about the identity of the best ﬁrm, the accuracy of
either bootstrap intervals or our simple parametric intervals is doubtful.

The simple parametric intervals differ from bootstrap intervals in an important
way that goes beyond parametric versus resampling methods. Consider the following
resampling scheme, which could also be used to create a conﬁdence interval for 11'; =
a( N) — 01,-, treating (N) = [N] as known. Create bootstrap samples b = 1, - -- ,B as
above. For sample b, calculate

a:,(11b'1)az—best = ($33] - alb) (2-13)
where [N] is still the index such that lel = max,- 61,- in the original sample. Then

create a percentile-interval from these quantities.

140

um am_ be st differ from the bootstrap quantities

Note that the quantities 11
{,TU’) = max 61gb) — (31(1)), (2.14)

as deﬁned in Section 2.3. For the bootstrap quantities, there is a “max” in the original
data to get 61[ N] and then there is another “max” in each bootstrap sample. That
is, the bootstrap samples are deliberately analyzed in exactly the same way as the
original sample was. In (2.13), there is still a “max” in the original sample, but in
the bootstrap samples we maintain the identity of the “best” ﬁrm in the original
samples. We will call this the “max-best bootstrap,” although actually it is not
really a bootstrap procedure at all. It is just a resampling scheme. Semantic issues
aside, it is the “max-best” bootstrap that should be similar to our simple parametric

procedure. Our motivation for discussing the “max-best” procedure is mostly to make

72

clear why our simple parametric intervals may be expected to be rather different from
percentile bootstrap intervals, when the identity of the best ﬁrms is in doubt.

As noted above, the “max” operator causes c31( N) to be biased upward as an
estimate of a( N): and this causes an upward bias in 112‘ and a downward bias in 1": =
exp(—112‘). The second “max” in (2.14) in the bootstrap samples causes additional
bias. For this reason the percentile bootstrap intervals will tend to be seriously
miscentered. Our simple parametric intervals, or “max-best” bootstrap intervals, do
not contain the second source of bias and may be expected to be more accurate than
percentile bootstrap intervals. Of course, precisely because they do not contain the
second source of bias, the parametric or “max-best” intervals cannot be bias-adjusted.
The bias-adjusted (or bias-adjusted and accelerated) bootstrap intervals described in
the previous section use the bias at the bootstrap stage to correct the bias in the
original estimates. The ability to do this is a potentially signiﬁcant advantage of

bootstrap methods.

2.5 Simulations

In this section we conduct Monte Carlo simulations to investigate the reliability of
conﬁdence intervals based on bootstrapping and on the alternative procedures de-
scribed in the last section. We are interested in the coverage rates of the conﬁdence
intervals and the way that they are related to bias in estimation of efﬁciency levels.
Results for other methods including the MLE can be found in Kim (1999).

The model is the basic panel data stochastic frontier model given in (2.1) above.
However, we consider the model with no regressors so that we can concentrate our
interest on the estimation of efﬁciencies without having to be concerned about the
nature of the regressors. In practical cases, the regression parameters 6 are likely to

be estimated so much more efﬁciently than the other parameters that treating them

73

as known is not likely to make much difference.

Our data generating process is:
9,, =a+v,-, —11,- =c1,-+11,-,, i: 1,--- ,N, t: 1,--- ,T, (2.15)

in which the 12,-, are i.i.d. N (0, 03) and the 11,- are i.i.d. half-normal: that is, let
11,- : |u,| where u,- ~ N (0, 0,2,). Since our point estimates and conﬁdence intervals
are based on the ﬁxed effects estimates of 611,- - - ,aN, the distributional assumptions
on 11,, and 11,- do not enter into the estimation procedure. They just deﬁne the data
generation mechanism.

The parameter space is (01, 03, 03, N, T), but this can be reduced. Without loss
of generality, we can ﬁx a to any number, since a change in the constant term only
shifts the estimated constant term by the same amount, without any effect on the
bias and variance of any of the estimates. For simplicity, we ﬁx the constant term
equal to one.

We need two parameters to characterize the variance structure of model. It is
natural to think in terms of 03 and 03. Alternatively, recognizing that 03 is the
variance of the untruncated normal from which 11 is derived, not the variance of 11, we
can think instead in terms of 03 and Var(11), where Var(u) = 0,2,(7r — 2) / 7r. However,
we obtain more readily interpretable results if we think instead in terms of the size of
total variance and the relative allocation of total variance between 11 and 11. The total
variance is deﬁned as a? = 03 + Var(u). Olson, Schmidt, and Waldman (1980) used
)1 = (Ia/0,, to represent the relative variance structure, so that their parametrization
was in terms of a? and A. Coelli (1995) used 01:2 and either 7 = 03/(03 + 0,2,) or
7* = Var(11)/(0,2, + Var(11)). The choice between these two parameters is a matter of

convenience. We decided to use 7* due to its ease of interpretation, so that we use the

parameters 0:2 and 7*. The reason this is a convenient parametrization (compared to

74

the “obvious” choice of 03 and 0,2,) is that, following Olson, Schmidt, and Waldman
(1980), one can show that comparisons among the various estimators are not affected

by 062. The effect of multiplying or? by a factor of k holding 7* constant, is as follows.

1. constant term: bias change by a factor of x/l; and variance changes by a factor

of k,
2. 03 and 0,2,: bias changes by a factor of k and variance changes by a factor of 162,
3. 7* (or 7 or A): bias and variance are unaffected.

We set a? at 0.25 arbitrarily, so that the only parameters left to consider are (7*, N, T).
We consider three values for 7*, to include a case in which the variance of 11 dom-
inates, a case in which the variance of 11 dominates, and an intermediate case. We
take 7* = 0.1, 0.5, and 0.9 to represent the above three cases. With a? = 0.25, 03,

Var(11), and 0,2, are determined as follows for each value of 7*.
1. 7* = 0.1; 63 = 0.225, Var(u) = 0.025, 63 = 0.069,
2. 7* = 0.5: 63 = 0.125, Var(u) = 0.125, 63 = 0.344,
3. 7* = 0.9: 63 = 0.025, Var(u) = 0.225, 63 = 0.619.

Four values of N and T are considered. In order to investigate the effect of changing
N, we ﬁx T = 10 and consider N =10, 20, 50, and 100. Similarly, T is assigned the
values of 10, 20, 50, and 100 while ﬁxing N = 10. This is done for each different value
of 7*.

For each parameter conﬁguration (7*, N, T), we perform R = 300 replications of

the experiment. For each replication, we calculate the following:
1. The estimate of a, 61 = max]- (31]- : 61W].

2. The infeasible estimate of c1, 5‘(N)-

75

3. The relative efﬁciency estimate, 11: = 61 — 61,-, for each i = 1, 2, . - - ,N.

4. The percentile bootstrap conﬁdence interval for 113‘, for each i.

5. The BCa bootstrap conﬁdence interval for 11;", for each 1'.

6. The simple parametric conﬁdence interval (of Section 2.4) for 112‘, for each 1'.

7. The “max-best” bootstrap conﬁdence interval for 112‘, for each i.

8. The infeasible parametric conﬁdence interval (of Section 2.4) for 112‘, for each i.

The bootstrap results were based on B = 1000 replications. Note that we did not
consider the iterated bootstrap due to its computational demands.

We are primarily interested in the biases of the point estimates and the coverage
rates of the conﬁdence intervals. These biases and coverage rates are reported as
averages over both the N ﬁrms (where relevant) and the R replications. In particular,
the coverage rate of the conﬁdence intervals is just the fraction of times that coverage
occurs.

We begin the discussion of our results with Table 2.1. Three measures of biases are
considered. biasl = E(61 — a) is the bias in the overall constant, bias2 = E(u: — 11,)
is the bias of the estimated relative inefﬁciency compared to true inefﬁciency, and
bias3 = E(61: — 11;“) is the bias of the estimated relative inefficiency compared to true
relative inefﬁciency.

There are two different sources of biasl. These are easily understood in terms of

the identity:

61—c1=(d—a(N))-(a—a(N)). (2.16)

biasl is E(61 — oz). The ﬁrst (and generally most important) source of this bias is

E(61 — “(N)), which is positive. That is, 61 is biased upward as an estimate of c1( N):

76

because of the “max” operation that deﬁnes 61 = max,- 61,-. This bias increases with
N, but decreases when T and/or 7* increase. It disappears as T —> 00 or 7* ——> 1.
The second source of bias is that E(a(N)) < a, resulting in downward bias for 61. This
reﬂects the fact that 61 - a( N) = min, u,- 2 0. This bias disappears as N —) 00. More
generally, it decreases as N increases, and increases with 7*, but does not depend on
T. We see examples of both positive and negative bias in column (1) of Table 2.1. As
expected, the largest positive bias occurs for large N and small T and 7*, whereas
negative bias (absolute value) increases for larger 7* and T and smaller N.

The bias of 112‘ as an estimate of 11,- is given in column (2) of Table 2.1. It is
essentially the same as the bias of the overall constant term:

bias2 = E(112‘ — 11,) = E ((01 —- 61,-) — (c1 — c1,)) = E(61 — a) — E(61,- — 01,-)
(2.17)
= biasl — E(61,- — 61,-)

and E(61,- — (1,) = 0.

The estimate 112‘ is perhaps more naturally viewed as an estimate of 112‘. Column

(3) gives the bias of 112 as an estimate of 112‘:

biasB = E(112‘ — 112‘) 2: E ((61 — 61,-)-(a(1v) — 011)) = E(ét - “(N)) " E(61 — 02')

= E(d — 0(N)) > O
(2.18)

since E(61,- — c1) = 0. Note that bias3 is the ﬁrst source of biasl, as described above
and is always positive. In other words, 112‘ can overestimate or underestimate the
absolute efficiency 11,, but (on average) it overestimates the relative efﬁciency 112‘.
We now turn our attention to question of the accuracy of the various types of
conﬁdence intervals we have discussed. We present results for 90% conﬁdence inter-

vals for r2‘ : exp(—112‘), but the coverage rates would be exactly the same for the

77

corresponding conﬁdenceintervals for 112. We are primarily interested in the coverage
rates of the intervals, and the proportions of observations that fall below the lower
bound and above the upper bound. The reason we present intervals for r2 (rather
than 112‘) is that it is bounded between zero and one, and so the average width of the
intervals is easier to interpret.

Table 2.2 gives the results for the infeasible parametric intervals based on equation
(2.11) of Section 2.4. The coverage rates of these intervals are very close to 0.90, as
they should be. These intervals are infeasible in practice, since they depend on
knowledge of the identity of the best ﬁrm, but they illustrate two points. First, for
obvious reasons, the intervals are narrower when T is large and when 7* is large (that
is, when the variance of inefﬁciency is large relative to the variance of noise). The
number of ﬁrms, N, is not really relevant if we know which one is best. Second,
and more fundamentally, there is no difﬁculty in constructing accurate conﬁdence
intervals for technical efﬁciency if we know which ﬁrm is best. All of the problems
that we will see with the accuracy of feasible intervals are due to not knowing with
certainty which ﬁrm is best.

Table 2.3 gives the results for the percentile bootstrap and BC", bootstrap conﬁ-
dence intervals. Consider ﬁrst the percentile bootstrap. Its coverage rate is virtually
always less than the nominal level of 90%. The problem is that the intervals are not
centered on the true values, due to the bias problem discussed above. (The upward
bias of 61 as an estimate of 61(N) corresponds to an upward bias in 112‘ and a downward
bias in 62‘. Thus too many r2‘ lie above the upper bound of the conﬁdence intervals.)
Theoretically, the intervals should be accurate in the limit (as T —+ 00 with N ﬁxed),
if there are no ties for max,- 01,-, and so the validity of the percentile bootstrap depends
on large T. The bias problem is small when we have large T and 7* and small N, and
the coverage probability reaches almost 0.9 for these cases, but it falls in the opposite

cases where the bias is big. The width of the intervals decreases as T or 7* increases.

78

However, the intervals get narrower with larger N, while the bias increases as N in-
creases. This explains why the coverage probabilities of the percentile intervals fall
rapidly as N increases.

The results in Table 2.3 indicate that the BC,, intervals provide better coverage
rates than the uncorrected percentile intervals, but with the same pattern. They are
more accurate when T and 7* are large and when N is small. When T and 7* is
small or N is large, there are very considerable improvements over the uncorrected
percentile intervals, even though the 80,, intervals do not succeed entirely in yielding
correct coverage rates.

The bias corrected conﬁdence intervals are obtained by shifting the bootstrap
distribution by approximately twice the estimated bias in the bootstrapping stage. If
on average (max,- 612-b) —

expect a properly centered interval with a coverage rate of approximately 0.9 after

max,- (1,) were the same as (max,- a,- — max,- 01,-), we would

the bias is corrected. In our simulations, however, only some part of the bias gets

corrected. Some evidence on this point is given in Table 2.4, which shows the average

of max,- 01,-, max,- 01,-, and max,- 612“) over different values of N, T, and 7*. The fourth

column in the table shows the average bias in the ﬁxed effects estimates of max,- 61,-,

and the last column shows the average bias in the bootstrap estimates. We see that

(.b) _
J

is substantial when 7* is small and N is large. As a result, the bias correction is

(man 61 max, 61,-) is always smaller than (max,- 61, — max,- a,-) and the difference
incomplete especially when 7* is small and N is large. However, the bias correction
is always in the right direction, and this explains why BCa intervals are better than
the percentile intervals.

Table 2.5 gives the results for the feasible parametric intervals based on equation
(2.12) of Section 2.4, and for the “max-best” bootstrap. We expect the feasible
parametric intervals and those from the “max-best” bootstrap to give similar results,

and they do. The parametric intervals have slightly better coverage rates, because

79

they are wider, but the differences are quite small. As a result we will limit our
further discussion to the feasible parametric intervals.

The feasible parametric intervals are clearly more accurate than the percentile
bootstrap intervals. This is especially true in the worst cases. For example, for
N = 100, T = 10 and 7* = 0.1, compare coverage rates of 0.195 for the percentile
bootstrap and 0.663 for the parametric intervals. The parametric intervals are wider
and they are better centered, both of which imply higher coverage rates. To under-
stand the point about better centering, recall the discussion of bias in Section 2.4. The
parametric intervals have one level of bias (61 is a biased estimate of 0‘(N)) whereas
the percentile bootstrap has two (61 is a biased estimate of (1,N), and max,- (3(5) is a
biased “estimator” of 61).

A more interesting comparison is the feasible parametric intervals versus the B0,,
intervals. The feasible parametric intervals generally but not always have better
coverage rates than the ECG intervals. This is because they are wider. The cases in
which the ECG intervals have better coverage rates than the parametric intervals are
cases in which T, N and 7* are all small. These are cases of considerable bias but
not the cases with the most bias (see Table 2.4), which would be cases in which T
and 7* are small but N is big. Overall, it is hard to say whether the parametric or
BCa intervals are better, because there is a conﬂict between our desire for conﬁdence
intervals to cover with correct probability and our desire for them not to be wide.

Our last set of simulations is designed to consider cases in which the identity of
the best ﬁrm is clear. Here we set out one 11,- at the 0.05 quantile of the half normal
distribution, while the other (N — 1) are set at equally spaced points between the 0.75
and 0.95 quantiles, inclusive. These 11,- are then held ﬁxed across replications of the
experiment. The only randomness therefore comes from the stochastic error 11. Since
the identity of the best ﬁrm should be clear, the bias caused by the max operator

should be minimal. Table 2.6 gives the bias of the ﬁxed effects estimate, and is of the

80

same format as Table 2.1. Recall that bias3 is the component of the bias caused by
the max operator (see equation (2.18) above) and should be small when the identity
of the best ﬁrm is clear. We can see that bias3 in Table 2.6 is indeed much smaller
than in Table 2.1.

Correspondingly, we expect the various bootstrap and parametric intervals to be
more accurate in the current cases than in the previous ones. Table 2.7 gives the
results for the percentile bootstrap, the BC", bootstrap, and the feasible parametric
intervals. Clearly the intervals are much more reliable now than they were in the pre-
vious cases for which results were reported in Tables 2.3 and 2.5. Note in particular
that the percentile bootstrap now does pretty well in all cases except the least favor-
able (small T and 7*, and large N). The 30,, bootstrap is now usually worse than
the percentile bootstrap. It is counterproductive to try to correct for bias when there
is little or no bias. The parametric intervals often cover too often, rather than too
seldom, and again this is a reﬂection of the intervals being wider than the bootstrap
intervals.

The overall conclusions we draw from our simulations are straightforward. If it is
clear from the data which ﬁrm is best, all of the methods of constructing conﬁdence
intervals work fairly well. There is no need to consider more complicated procedures
than the percentile bootstrap. The parametric intervals are also reliable, but they
may be wider than necessary. Conversely, if it is not clear from the data which ﬁrm
is best, none of the methods of constructing conﬁdence intervals are very reliable.
The percentile bootstrap is particularly bad. The 80,, bootstrap intervals or the

parametric intervals are probably preferred.

81

2.6 Empirical Results

We now apply the procedures described above to two well-known data sets. These
data sets were chosen to have rather different characteristics. The ﬁrst data set
consists of N = 171 Indonesian rice farms observed for T = 6 growing seasons. For
this data set, the variance of stochastic noise (11) is large relative to the variability
in 11 (Var(11)): that is, 7* = 0.222 with 67? = 0.138. Inference on inefﬁciencies will

* is small and N is large. The second data

be very imprecise because T is small, 7
set consists of N = 10 Texas utilities observed for T = 18 years. For this data
set, 03 is small relative to Var(u): 7* = 0.700 with 6122 = 0.010. In this case we
can estimate ineﬂiciencies much more precisely because T and 7* are larger, and N
is smaller. We will see that the precision of the estimates will differ across these
data sets, and that choice of technique matters more where precision is low. A more

detailed analysis of these data, including Bayesian results and results for multiple and

marginal comparisons with the best, can be found in Kim and Schmidt (1999).

2.6.1 Indonesian Rice Farms

These data are due to Erwidodo (1990) and have been analyzed subsequently by Lee
(1991), Lee and Schmidt (1993), Horrace and Schmidt (1996), Horrace and Schmidt
(2000) and others. There are N = 171 rice farms and T = 6 six-month growing
seasons. Output is rice in kilograms and inputs are land in hectares, labor in hours,
seed in kilograms and two types of fertilizer (urea in kilograms and phosphate in
kilograms). The functional form is Cobb-Douglas with some dummy variables added
for region, seasonality for dry or wet season, the use of pesticide and seed types for
high yield or traditional or mixed. For a complete discussion of the data, see Erwidodo
(1990).

The estimated regression parameters are given in Horrace and Schmidt (1996) and

82

we will not repeat them here. Instead we will give point estimates of efﬁciencies and
90% conﬁdence intervals for these efﬁciencies. There are 171 ﬁrms and so we report
results for the three ﬁrms (164, 118, and 163) that are most efﬁcient; for the ﬁrms
at the 75““ percentile (31), 50““) percentile (15) and 25(th) percentile (16) of the
efﬁciency distribution; and for the two worst ﬁrms (117, 45). All of these rankings
are according to ﬁxed effects estimates.

We begin with Table 2.8. It gives the ﬁxed effects point estimates and the lower
and upper bounds of the 90% parametric conﬁdence intervals. For the purpose of
comparison we also give the point estimates and the lower and upper bound of the 90%
conﬁdence intervals for the MLE based on the assumption that inefﬁciency has a half-
normal distribution. See Horrace and Schmidt (1996) for the details of calculations
for the MLE.

The estimated efﬁciency levels based on the ﬁxed effects estimates are rather low.
They are certainly much smaller than the MLE estimates. This is presumably due
to bias in the ﬁxed effects estimates, as discussed previously. This data set has
characteristics that should make the bias problem severe: N is large; the a,- are
estimated imprecisely because 0,2, is large and T is small; and there are near ties for
max,- a,- because 03 is small.

Table 2.9 gives 90% conﬁdence intervals based on the percentile bootstrap, the
ECG bootstrap, and the iterated bootstrap, as well as the (feasible) parametric inter-
vals and the “max-best” bootstrap intervals. The bootstrap results are based on 1000
replications, and in the case of the iterated bootstrap each second-level bootstrap is
also based on 1000 replications.

There is some similarity between the intervals from different methods, but there
are also some interesting comparisons to make. The percentile bootstrap intervals are

clearly closest to zero (i.e. they would indicate the lowest levels of efﬁciency). This

is presumably a reﬂection of bias. Note, for example, that the midpoints of these

83

intervals are clearly less than the ﬁxed effects estimate (which is itself biased toward
zero). For the reasons given above, we do not regard these intervals as trustworthy for
this data set. The iterated bootstrap intervals are centered similarly to the percentile
bootstrap but are wider. The BCa intervals are an upward shift (in the direction
of higher efﬁciency) of the percentile intervals and might be a good choice for this
data set. The parametric intervals are also an upward shift of the percentile intervals,
though not by as much as the ECG intervals. They are wider than the ECG intervals,
and in fact they are about as wide as the iterated bootstrap intervals. They are
another possible good choice for this data set; in a sense they are conservative choice.
The “max-best” bootstrap intervals are similar to the parametric intervals and are

therefore another possible good choice.

2.6.2 Texas Utilities

In this section, we consider the Texas utility data of Kumbhakar (1996), which was
also analyzed by Horrace and Schmidt (1996) and Horrace and Schmidt (2000). As in
the previous section, we will estimate a Cobb-Douglas production function, whereas
Kumbhakar ( 1996) estimated a cost function. The data contain information on output
and inputs of 10 privately owned Texas electric utilities for 18 years from 1966 to 1983.
Output is electric power generated, and input measures on labor, capital and fuel are
derived from dividing expenditures on each input by its price. For more details on
the data see Kumbhakar (1996).

Table 2.10 gives the ﬁxed effects point estimates, the 90% parametric intervals,
and the MLE point estimates and 90% conﬁdence intervals. The format is the same
as that of Table 2.8, except that now we can report the results for all of the ﬁrms.
Table 2.11 gives the 90% conﬁdence intervals for the same set of procedures as before,
and it is of the same format as Table 2.9, except that results are given for all ﬁrms.

Compared to the previous data set, we estimate the intercepts a,- much more

84

precisely, because T is larger and 0,2, is smaller. For this reason, and also because N
is smaller, we expect there not to be a severe ﬁnite sample bias problem in the ﬁxed
effects estimates, and we expect that the choice of technique will not matter as much.

The MLE estimated efﬁciencies are larger than those based on ﬁxed effects (except
for the “best” ﬁrm), but the difference is not nearly as large as for the previous
data set. Similarly, the MLE conﬁdence intervals are narrower than the parametric
intervals, but not by nearly as much as in Table 2.8. A distributional assumption is
much less valuable in the present case. In fact, the accuracy of the MLE intervals is
now suspect, because we have only 10 ﬁrms, and the asymptotic justiﬁcation for the
MLE requires large N.

In Table 2.11, we can see that the parametric intervals and all of the bootstrapping
intervals are quite similar. The bias problem is apparently negligible for this data set,
and correspondingly our faith in the accuracy of these intervals is relatively strong.

We can compare the features of this data set with the setup of our simulation.
One of the parametric conﬁgurations in our simulation had N = 10, T = 20, and
7* = 0.5, which matches these data quite well. In that case the coverage rates of
the various conﬁdence intervals were in the range of 0.87 to 0.88, which are obviously
close to 0.90.

A technical detail worth noting is that the acceleration factor in the ECG bootstrap
was undeﬁned and was therefore set equal to zero. This is further evidence that there

was very little bias in estimation.

2.7 Conclusions

In this chapter we have provided a survey of the use of bootstrapping to construct
conﬁdence intervals for efﬁciency measures. We discussed several versions of the

bootstrap, including the percentile bootstrap, the iterated bootstrap, and the bias-

85

adjusted and accelerated bootstrap. In stochastic frontier models, these methods
can be applied to the ﬁxed effects estimates, yielding inferences that are correct
asymptotically as T —) 00 with N ﬁxed.

We have proposed a simple parametric method of constructing conﬁdence inter-
vals. It uses standard methods and simply acts as if the identity of the best ﬁrm
is known. We also proposed a resampling scheme, the “max-best” bootstrap, which
ought to yield conﬁdence intervals similar to the parametric intervals. These pro-
cedures are valid under the same conditions that the bootstrap methods are valid,
namely, as T —> 00 with N ﬁxed, and provided that there is a unique best ﬁrm.

The main problem that we encounter is the upward bias in the ﬁxed effects esti-
mate of the frontier, which translates into a downward bias for the estimated efﬁcien-
cies. The bias is large when T is small, N is large, and/or statistical noise is large
relative to the variation in the frontier. These are exactly the same circumstances in
which the identity of the best ﬁrm is uncertain, and so it is fair to say that bias is a
problem when the identity of the best ﬁrm is in question.

Our simulation results show that the percentile bootstrap is seriously inaccurate
when the bias problem exists, that is, when the identity of the best ﬁrm is not
clear. The percentile bootstrap intervals are miscentered because the bias in the
original estimates is compounded by similar “bias” in the bootstrap estimates. Our
parametric intervals, or our “max-best” bootstrap intervals, avoid the second source
of bias, are more reliable than the percentile bootstrap intervals. The bias corrected
and accelerated (BCa) bootstrap makes a bias correction based on the “bias” in the
second round, and these intervals are also more reliable than the percentile bootstrap
intervals. Comparing the parametric intervals and the ECG intervals, neither clearly
dominates the other. The parametric intervals are more conservative.

A negative conclusion of the simulations is that none of the methods of construct-

ing conﬁdence intervals based on the ﬁxed effects estimates is very reliable if the

86

identity of the best ﬁrm is in serious doubt. In such cases it may be worthwhile to
consider assuming a distribution for technical inefﬁciency and using MLE.

We performed an empirical analysis of two data sets, one of which had charac-
teristics very unfavorable to the bootstrap (large N, small T, and large variance of
noise). In this case there was evidence of bias, and the bootstrap intervals were both
unreliable and too wide to be informative. Our other data set had more favorable
characteristics, and the empirical analysis yielded results that were quite precise and
seemingly sensible. Hence, as in the simulations, a major lesson is that the reliability

of inference on efﬁciencies can be judged based on observable features of the data.

87

2.8 Output Tables

Table 2.1: Biases of Fixed Effects Estimates

 

 

 

biasl bias2 bias3

E(61 — a) E(112‘ — 11,) E(112‘ — 112‘)

T 7* N (1) (2) (3)
10 0.1 10 0.103 0.105 0.133
10 0.1 20 0.153 0.155 0.169
10 0.1 50 0.234 0.235 0.241
10 0.1 100 0.276 0.274 0.277
10 0.5 10 -0.009 -0.008 0.055
10 0.5 20 0.045 0.046 0.078
10 0.5 50 0.119 0.119 0.132
10 0.5 100 0.153 0.152 0.159
10 0.9 10 -0.076 -0.075 0.010
10 0.9 20 -0.028 -0.028 0.016
10 0.9 50 0.018 0.018 0.035
10 0.9 100 0.039 0.039 0.049
10 0.1 10 0.103 0.105 0.133
20 0.1 10 0.049 0.046 0.078
50 0.1 10 0.006 0.005 0.035
100 0.1 10 -0.007 -0.007 0.021
10 0.5 10 -0.009 -0.008 0.055
20 0.5 10 -0.038 -0.041 0.030
50 0.5 10 -0.054 -0.054 0.013
100 0.5 10 ~0.058 -0.058 0.004
10 0.9 10 -0.076 -0.075 0.010
20 0.9 10 -0.090 -0.091 0.004
50 0.9 10 -0.087 -0.088 0.002
100 0.9 10 -0.084 -0.085 0.000

 

 

88

Table 2.2: 90% Conﬁdence Intervals for Relative Efﬁciency (r2)

 

Infeasible Parametric

 

 

 

T 7* N Width P( < lb) P(>ub) cover
10 0.1 10 0.551 0.057 0.037 0.905
10 0.1 20 0.564 0.038 0.052 0.910
10 0.1 50 0.599 0.059 0.043 0.898
10 0.1 100 0.594 0.048 0.049 0.903
10 0.5 10 0.326 0.057 0.037 0.905
10 0.5 20 0.335 0.038 0.052 0.910
10 0.5 50 0.352 0.059 0.043 0.898
10 0.5 100 0.351 0.048 0.049 0.903
10 0.9 10 0.127 0.057 0.037 0.905
10 0.9 20 0.131 0.038 0.052 0.910
10 0.9 50 0.136 0.059 0.043 0.898
10 0.9 100 0.137 0.048 0.049 0.903
10 0.1 10 0.551 0.057 0.037 0.905
20 0.1 10 0.379 0.044 0.045 0.910
50 0.1 10 0.236 0.038 0.043 0.919
100 0.1 10 0.167 0.050 0.038 0.912
10 0.5 10 0.326 0.057 0.037 0.905
20 0.5 10 0.228 0.044 0.045 0.910
50 0.5 10 0.143 0.038 0.043 0.919
100 0.5 10 0.101 0.050 0.038 0.912
10 0.9 10 0.127 0.057 0.037 0.905
20 0.9 10 0.090 0.044 0.045 0.910
50 0.9 10 0.057 0.038 0.043 0.919
100 0.9 10 0.040 0.050 0.038 0.912

 

89

at"

 

Table 2.3: 90% Conﬁdence Intervals for Relative Efﬁciency (r2)

 

Percentile Bootstrap

 

BCa Bootstrap

 

 

 

 

 

T 7* N Width P(<lb) P(>ub) COVCI‘ Width P(<lb) P(>ub) cover
10 0.1 10 0.354 0.001 0.289 0.709 0.336 0.015 0.130 0.855
10 0.1 20 0.346 0.000 0.447 0.553 0.328 0.015 0.164 0.821
10 0.1 50 0.323 0.000 0.676 0.324 0.320 0.008 0.275 0.717
10 0.1 100 0.305 0.000 0.805 0.195 0.306 0.007 0.341 0.652
10 0.5 10 0.248 0.015 0.157 0.829 0.252 0.044 0.092 0.864
10 0.5 20 0.245 0.003 0.235 0.762 0.243 0.041 0.108 0.851
10 0.5 50 0.230 0.001 0.448 0.552 0.232 0.023 0.184 0.794
10 0.5 100 0.219 0.000 0.603 0.397 0.221 0.018 0.229 0.753
10 0.9 10 0.111 0.040 0.084 0.876 0.115 0.057 0.081 0.861
10 0.9 20 0.112 0.018 0.116 0.867 0.113 0.061 0.084 0.855
10 0.9 50 0.108 0.005 0.234 0.761 0.108 0.048 0.115 0.837
10 0.9 100 0.105 0.002 0.363 0.636 0.104 0.037 0.150 0.813
10 0.1 10 0.354 0.001 0.289 0.709 0.336 0.015 0.130 0.855
20 0.1 10 0.282 0.002 0.225 0.773 0.267 0.027 0.099 0.874
50 0.1 10 0.197 0.005 0.152 0.843 0.190 0.036 0.079 0.885
100 0.1 10 0.145 0.008 0.131 0.861 0.144 0.034 0.072 0.895
10 0.5 10 0.248 0.015 0.157 0.829 0.252 0.044 0.092 0.864
20 0.5 10 0.192 0.014 0.113 0.872 0.196 0.044 0.088 0.868
50 0.5 10 0.131 0.018 0.085 0.897 0.136 0.044 0.074 0.882
100 0.5 10 0.094 0.028 0.070 0.902 0.097 0.061 0.074 0.866
10 0.9 10 0.111 0.040 0.084 0.876 0.115 0.057 0.081 0.861
20 0.9 10 0.083 0.031 0.068 0.901 0.085 0.059 0.083 0.858
50 0.9 10 0.055 0.031 0.063 0.906 0.056 0.044 0.076 0.880
100 0.9 10 0.039 0.045 0.047 0.908 0.040 0.053 0.069 0.878

 

Table 2.4: Bias Correction in the ECG Bootstrap Intervals

 

 

max,- 01,- man dj max,- (32")
T 7“ N (1) (2) (3) (2)-(1) (3)-(2)
10 0.1 10 0.972 1.103 1.175 0.132 0.072
50 0.1 10 0.970 1.006 1.034 0.036 0.029
10 0.1 50 0.994 1.234 1.342 0.240 0.108
10 0.5 10 0.937 0.991 1.027 0.054 0.037
50 0.5 10 0.933 0.946 0.957 0.013 0.011
10 0.5 50 0.988 1.119 1.183 0.131 0.064
10 0.9 10 0.915 0.924 0.933 0.009 0.008
50 0.9 10 0.910 0.913 0.915 0.003 0.002
10 0.9 50 0.983 1.018 1.039 0.035 0.021

 

 

 

90

Table 2.5: 90% Conﬁdence Intervals for Relative Efﬁciency (if)

 

Feasible Parametric

“Max-best” Bootstrap

 

 

 

 

 

T 7* N width P( < 1131 P(>ub) cover width P( < lb) P(>ub) cover
10 0.1 10 0.463 0.071 0.113 0.816 0.433 0.072 0.138 0.790
10 0.1 20 0.471 0.039 0.144 0.817 0.444 0.039 0.173 0.787
10 0.1 50 0.455 0.018 0.258 0.724 0.429 0.018 0.295 0.687
10 0.1 100 0.445 0.010 0.327 0.663 0.420 0.010 0.374 0.617
10 0.5 10 0.301 0.058 0.070 0.872 0.282 0.060 0.089 0.852
10 0.5 20 0.308 0.033 0.085 0.881 0.290 0.035 0.107 0.858
10 0.5 50 0.301 0.017 0.163 0.820 0.285 0.017 0.190 0.793
10 0.5 100 0.298 0.009 0.215 0.776 0.281 0.009 0.248 0.743
10 0.9 10 0.124 0.055 0.049 0.896 0.117 0.059 0.061 0.880
10 0.9 20 0.129 0.032 0.061 0.907 0.122 0.039 0.075 0.886
10 0.9 50 0.130 0.019 0.096 0.885 0.123 0.021 0.116 0.864
10 0.9 100 0.130 0.010 0.132 0.857 0.123 0.011 0.156 0.833
10 0.1 10 0.463 0.071 0.113 0.816 0.433 0.072 0.138 0.790
20 0.1 10 0.344 0.067 0.090 0.844 0.333 0.067 0.099 0.834
50 0.1 10 0.227 0.053 0.073 0.874 0.224 0.053 0.078 0.869
100 0.1 10 0.162 0.053 0.067 0.880 0.161 0.053 0.068 0.879
10 0.5 10 0.301 0.058 0.070 0.872 0.282 0.060 0.089 0.852
20 0.5 10 0.219 0.051 0.065 0.884 0.212 0.053 0.070 0.877
50 0.5 10 0.141 0.042 0.055 0.904 0.139 0.042 0.058 0.900
100 0.5 10 0.100 0.050 0.049 0.901 0.100 0.052 0.051 0.897
10 0.9 10 0.124 0.055 0.049 0.896 0.117 0.059 0.061 0.880
20 0.9 10 0.089 0.048 0.051 0.901 0.087 0.052 0.056 0.893
50 0.9 10 0.057 0.038 0.048 0.914 0.056 0.041 0.052 0.907
100 0.9 10 0.040 0.052 0.041 0.908 0.040 0.055 0.043 0.901

 

91

Table 2.6: Biases of Fixed Effects Estimates (Case that ui are ﬁxed over replications)

 

 

 

biasl bz'a32 biasB

E(61 — a) E(u’: — ui) E(ﬁ: — 1%“)

T 7* N (1) (2) (3)
10 0.1 10 0.010 0.013 0.029
10 0.1 20 0.006 0.006 0.023
10 0.1 50 0.045 0.046 0.062
10 0.1 100 0.061 0.061 0.078
10 0.5 10 -0.035 —0.032 0.004
10 0.5 20 -0.049 -0.049 —0.012
10 0.5 50 -0.029 -0.028 0.008
10 0.5 100 -0.042 -0.041 -0.005
10 0.9 10 —0.048 -0.047 0.002
10 0.9 20 -0.055 -0.055 -0.006
10 0.9 50 -0.046 -0.046 0.004
10 0.9 100 -0.052 -0.051 -0.002
10 0.1 10 0.010 0.013 0.029
20 0.1 10 -0.021 -0.021 -0.004
50 0.1 10 -0.019 -0.018 -0.001
100 0.1 10 -0.019 -0.019 -0.002
10 0.5 10 -0.035 -0.032 0.004
20 0.5 10 -0.042 -0.042 -0.005
50 0.5 10 -0.039 -0.038 -0.001
100 0.5 10 -0.039 -0.039 -0.002
10 0.9 10 -0.048 -0.047 0.002
20 0.9 10 -0.052 -0.052 -0.002
50 0.9 10 ~0.050 -0.050 0.000
100 0.9 10 -0.050 -0.050 -0.001

 

 

92

“II—asp.” yuan—“w. .1":

 

 

 

3:3 333.3 333.3 333.3 3:33 3333 333.3 333.3 :3 333.3 333.3 333.3 3: 3 33:
333.3 333.3 3333 :333 333.3 333.3 333.3 :333 333.3 333.3 333.3 :333 3: 3 33
:3 333.3 33 333.3 333 3:33 :333 3:33 333 :33 :333 3:33 3: 3 33
3:3 3:33 3:33 333.3 333.3 333.3 333.3 333.3 333.3 333.3 333.3 333.3 3: 3 3:
3:3 333.3 3:33 333.3 3:3 333.3 33.3 333.3 :3 333.3 333.3 333.3 3: 3.3 33:
333.3 333.3 333.3 333.3 333.3 333.3 333.3 :333 333.3 333.3 333.3 :333 3: 3.3 33
:3 333.3 33 333 333.3 33 :333 3.3 333 :33 :333 3.3 3: 3.3 33
3:3 3:33 3:33 3:3.3 333.3 :333 333.3 333 3333 333.3 333.3 333 3: 3.3 3:
3:3 333.3 3:33 333 333.3 333.3 333.3 333 :3 333.3 333.3 333 3: :3 33:
333.3 333.3 333.3 333 333.3 23 333.3 333 333.3 333.3 333.3 :33 3: :3 33
333 333.3 3:33 3:33 333.3 333.3 :33 :333 3:3 333.3 333.3 333.3 3: :3 33
333 3:33 :333 333 333.3 :333 33:.3 333.3 333.3 333.3 333.3 333.3 3: :3 3:
333.3 3:33 333.3 333.3 333.3 333.3 333.3 E33 E33 3333 333.3 333.3 33: 3 3:
333 333.3 33 3.333 333.3 333.3 333.3 3.33.3 3333 333.3 :333 333.3 33 3.3 3:
:3 333.3 333.3 333.3 333.3 33 333.3 :33 333.3 33.3 333.3 :333 33 3 3:
3:3 3:33 33 333.3 333.3 333.3 333.3 333.3 333.3 333.3 333.3 333.3 3: 3 3:
333.3 3:33 333.3 333.3 :333 333.3 333.3 3:33 333.3 3333 333.3 3:33 33: 3.3 3:
333 333.3 33 333.3 :333 333.3 333.3 3:3.3 333.3 333.3 :333 3:3.3 33 3.3 3:
3:33 333.3 333.3 333.3 333.3 333.3 333.3 3:33 333.3 3.3 333.3 3:33 33 3.3 3:
3:3 3:33 3333 3:33 333.3 :333 333.3 333 333.3 333.3 333.3 333 3: 3.3 3:
333 3333 333.3 333 333.3 333.3 3:3 333.3 333.3 3333 :333 3:3.3 33: :3 3:
333 333.3 3:3.3 333 333.3 333.3 33:.3 3333 333 333.3 333.3 333.3 33 3 3:
:33 33.3 333.3 333 :33 333.3 :33 333.3 333.3 33:.3 3:33 333.3 33 :3 3:
333 3:33 :333 333 333.3 :33 33:3 333.3 333.3 333.3 333.3 333.3 3: :3 3:
:28 3.53: 33:3 333:3 5.53 3.50: :33“: 333:3 33.38 3.53: svm: 333:3 z *3 3.

 

253::33m 33:33.3:

 

 

adbmuoom 3.0m

 

 

93.:333oom 25:80.83“

 

 

3:053:93: 339:3 :3me 3:3 3: 3.3:: 330V A TV xocmsﬁm 9:33:33 :8 3:33:35 3:35:00 .3500. ”Fm .3338

93

Table 2.8: Estimated Efﬁciencies and 90% Conﬁdence Intervals: Indonesian Rice
Farms

 

 

 

Fixed Effects MLE
Firm Point Point
No. Estimate LB UB Estimate LB UB
164 1.000 1.000 1.000 0.964 0.903 0.998
118 0.933 0.682 1.000 0.964 0.902 0.998
31 0.620 0.447 0.859 0.924 0.823 0.994
15 0.554 0.403 0.762 0.923 0.792 0.990
16 0.501 ' 0.362 0.694 0.845 0.725 0.969
117 0.380 0.275 0.524 0.773 0.658 0.907
45 0.366 0.266 0.504 0.774 0.659 0.908

 

 

 

94

 

 

 

 

 

 

 

 

 

ddvd dnmd weed eemd uevd nmmd Ndvd ddmd vmvd mend eemd ev
eﬁed ewmd vmed enmd ddvd mead weed dmmd Rind Emd deed 3.:
ﬂed dmmd weed Need deed Bed dated Hmvd weed weed Bed e:
webd :vd mend edvd eded Heed dmnd Evd weed ddwd uweed e:
Edd eevd deed 333d need mdmd dmwd ﬂed dehd evwd dmed He
odd: need odd: Need dddé awed odd: etd odd: :ed emdd e:
odd: odd: dddé dddg dddg meed dddé heed doe: mid odd: we:
m5 m3 m3 m5 m5 m3 m2 m3 m3 m4 3.3m— .oZ
awbmuoom 13535 935330on awbmuoom amsmuoom m: :EE
2.333-332: 2.538333% dougmﬁ 3.0m 25:33.:om

 

38.3.3: 02m 53:30:35: ”3:35.35 83:03:80 Rodd dd 0338

95

Table 2.10: Estimated Efﬁciencies and 90% Conﬁdence Intervals: Texas Utilities

 

 

 

— ___ (‘_‘n.‘ 5'...—

 

Fixed Effects MLE
Firm Point Point
N 0. Estimate LB UB Estimate LB UB
5 1.000 1.000 1.000 0.987 0.971 0.999
3 0.916 0.823 1.000 0.978 0.959 0.996
10 0.861 0.786 0.943 0.908 0.889 0.927
1 0.835 0.784 0.889 0.864 0.846 0.882
8 0.820 0.773 0.869 0.846 0.828 0.864
9 0.806 0.766 0.848 0.826 0.809 0.843
2 0.801 0.749 0.855 0.831 0.814 0.848
7 0.786 0.732 0.844 0.817 0.800 0.834
6 0.785 0.730 0.845 0.820 0.803 0.837
4 0.762 0.719 0.808 0.786 0.770 0.801

 

 

 

96

 

 

 

 

 

 

 

 

 

 

weed dmnd eeed eHnd eend emnd nend eHnd eend omnd mend v
Zed mend eeed dend eeed nend Heed dend Need mend eend e
need eend vved mend need eend deed eend deed eend eend n
Eed eend eeed evnd eeed eend mved eend eeed eend Sed m
eeed eend eeed eend eeed ennd eeed Hnnd zed eend eded e
need nnnd eeed ennd need Hend eeed ennd eeed nnnd dmed e
need eend eeed ﬁend vned mend wned eend nned eend eeed H
weed eend eeed eend deed eend Need eend weed eend Heed dH
ddda need dddA emed ddda eeed dddA eeed dddg need eHed e
dddg dddA ddda dddA ddda need dddg eeed odd; eeed dddg e
m3 m5 m3 m3 m3 m5 m3 m3 m3 m3 .pmm— .oZ

aaﬁﬁoom g msbﬁoom aabﬁoom mesmuoom mm :Em
..._$o£-x.m2: oCumESmm ©3553 ebm mzﬁagom

$5253 @588 ”meme/H35 8:25:00 cede ”EN @368

97

Chapter 3

Indicator KPSS with a Time Trend

3. 1 Introduction

In this chapter, we propose a statistic to test whether a time series is stationary,
and we allow for a time trend. A standard test for stationarity is the KPSS test
by Kwiatkowski, Phillips, Schmidt, and Shin (1992). The KPSS test, 1‘7” uses the
scaled sum of squares of cumulations of demeaned data with a long-run variance
estimate in the denominator. A deterministic trend can be allowed in the test of
trend-stationarity in which the demeaned data in 1?” are replaced by the residuals
from the regression of the series on intercept and trend.

In the construction of the KPSS tests, conditions enough to imply Functional
Central Limit Theorems (FCLT) are assumed. One of these conditions is the ﬁnite
variance assumption. However, when the data have fat-tailed errors such as those from
the Cauchy distribution in which the moments do not exist, the limiting distributions
of the KPSS statistics are functionals of the Lévy process (Amsler and Schmidt 2000),
not a Wiener process. In the paper by de Jong, Amsler, and Schmidt (2002), the
authors relax the moment assumption and propose a modiﬁed version of KPSS test,
1‘7”. They call their test the “indicator KPSS” test which we will label Zp. The sample
data are transformed using an indicator which gives the value of 1, 0, or -1 depending

on whether or not the sample observation is above, on, or below the sample median.

98

 

Under the null of level-stationarity, in is shown to have the same limiting distribution
as the KPSS test, ii”.

In this chapter, we use a similar indicator to transform the data, but allow for a
deterministic trend as well as a non-zero level for the data. Let the indicator KPSS
statistic with a time trend be denoted as £7. We show that the asymptotic distribution
of ET under the null of trend-stationarity is a function of the second-level Brownian

bridge, which is also the limiting distribution of the KPSS statistic with a time trend,

A

771-

3.2 Asymptotic Theory

3.2.1 Assumptions

Let {{:1:Tj};r:1}§9:1 be a triangular array of random variables such that

my = 00 + ﬂog: + 63'. (3.1)

Assumption 1. There exist unique £10,180 such that med(:ch) = (10 + Bo j /T for all
T andj=1,-~ ,T.

Note that this implies that the unique median of ej = ij — a0 — 30 j / T is zero.
The next assumption is a convergence condition on the average variance of the sum

of transformed ej with the ﬁniteness of the long run variance, 02.

2
Assumption 2. Deﬁne 02 = limr_r_,ooE(T"1/2 Z$=1 sgn(ej)) , where the sgn
function takes three different values of 1, 0, or -1 depending the sign of an argu-
ment: sgn(a:) = 1 ifx > 0, sgn(:1:) = 0 ifa: = 0, and sgn(:r) = —1 ifa' < 0. Then,

2

0<a <oo.

The next assumption is about the kernel function, k().

99

 

Assumption 3. k() is continuous at 0 and at all but a ﬁnite number of points.
Ic(a:) = h(—$) for all a: E R. k(0) = 1. |k(x)| S l(:c) where 1(3) is nonincreasing and
f0°° |l(x)|da: < 00. Also, k() satisﬁes [3°00 |w(£)|d§ < 00, where

00

we = (270—1 f k<x>exp<—z‘sx>dz. (3.2)

"'00

The Bartlett, Parzen, Quadratic Spectral, and Tukey-Hanning kernel functions
are some possible choices (de Jong and Davidson 2000). These kernel functions are
designed to lessen the effects of the longer lags smoothly to zero so that the kernel
function such as the uniform or the truncated kernel is excluded. The next set of
assumptions is about the ej, and will be used in deriving the asymptotic distribution

of the indicator KPSS statistic under the null of trend-stationarity.

Assumption 4. The ej are stationary random variables and strong (a—) mixing with
— r .—

mixing coefﬁcients a(m) which satisﬁes a(m) S Cm :2 77 for some ﬁnite r > 2,

some 17 > 0 and a constant C. And ej has a continuous density f (e) in a neighborhood

{—7}, n] of 0 for some n > 0, and inf66[_,m] f(e) > 0.

Assumption 41is different from general conditions on the stationary errors used in
the derivation of the asymptotic distributions of the KPSS statistics (Phillips (1987)
or Phillips and Perron (1988)). The important difference is moment conditions on
ej. For example, in Phillips (1987 ), the moment condition like sup,- E lej|‘9 < 00 for
some 19 > 2 is assumed. However, in this chapter, we do not assume the existence of
moments of 53' under the null. This is made possible by the use of the indicators.

The next assumption is for the alternative of unit root.

Assumption 5. The 63- satisfy T _1/ 25[§T] => AW(§) for some /\ 6 (0,00) andé E

 

1In this chapter, Assumption 4 is stated in terms of 45,-, not 1273 as in de Jong, Amsler, and
Schmidt (2002). This is to emphasize that we are interested in the test of trend-stationarity.
That is, the assumptions for e,- in this chapter (or the detrended series, arr,- — ao — ﬁg j /T)
are the same as the Assumption 2 in de Jong, Amsler, and Schmidt (2002).

100

[0,1], where W() is a Wiener process or Brownian motion.

Note that Assumption 5 also implies that T‘l/zxmm => AW(§) since T’l/2
$T,[gT] = T-1/2(ao + ﬁolETl/T) + T-l/zqgr] = 012(1) + T-l/qur] =¢ AW“)-

3.2.2 Indicator KPSS statistic

Using the least absolute deviations (LAD) estimators 6r, 3 which are solutions to

 

T
. .7
rain: $T] — a — 5T , (3.3)
1:1
we deﬁne the cumulation of the indicator data
‘ «j
STt = :sgn(:ch — 61 — ﬁT). (3.4)
i=1
Then, the indicator KPSS statistic with a time trend, L, is deﬁned as
T
6‘2T‘2 Z 5%,. (3.5)
t=1

2 can be constructed from the “indicator” resid-

A consistent estimator of 02, 6
uals, sgn(a:Tj — o} - ﬂit/T). Using a weighting function, the heteroskedasticity-

autocorrelation consistent (HAC) estimator, 62 is obtained by

T

T . . . .
A2 ---1 2 - J A A z A A J
0 = T 2.2—1:124 k( ”YT )sgn(:rT,- — 01 — ﬂ?) 5811(5’3Tj “ a — 5?), (3'6)

 

where k() is the kernel function. ’YT is the lag truncation parameter which goes to
00 as T —> co and satisﬁes the condition of 7T/T —-) 0.
Note that ET is deﬁned in a similar way as the KPSS statistic, 1‘77. The difference is

that we use the deviations from the median while 1‘77 is based on the deviations from

101

 

the mean of the series. The indicator KPSS is based on the sample median which is
the generalization of the ﬁt from a LAD regression.

As noted in de Jong, Amsler, and Schmidt (2002), the purpose of trimming the
data is to remove the effects of fat tails or make the variance ﬁnite. We use the sgn
function to bypass the problem of how to scale the data so that only the location of
the data is used to transform the data. This is because sgn(a:) = (I(a: Z 0) —I(:r S 0))
and |:c| = a: - (I(a: 2 0) — I(a: S 0)) = a: - sgn(a:), where I() takes the value of one if

the argument is true and zero otherwise.

3.2.3 Conjectures

Before stating theorems for the asymptotic distributions of £7, let us make conjectures

on a and 3 as the proofs for the following claims are only partially done.

Conjecture 1. Under Assumptions 1 and 4, T1/2(& — 00) = 0,,(1) and T1/2(ﬁ’ -—
50) = 0p(1)-

What we want to assert in this conjecture is, for an arbitrarily large K > 0,

limsupP( sup sup Y1T(¢1.¢>2) 2 0)
T-)00 ¢1>K ¢2>K (3 7)

=1imsupP( 811p sup Y2T(¢1.¢2) 2 0) = 0
T—ioo ¢1>K ¢2>K

so that the probability of having solutions (bl, (1)2 outside <I> = {(cbl, (152) 6 1R2 : —K _<_

4513 K,—K§ (152 g K} goes to zeroasT—-+oowhere

T . .
Y1T(¢1.¢2) = ”FA/2 ngn(ij — <10 — 50% — T_1/2(<f>1+ ¢2%)).
jzl (3.8)

T .
Y2T(¢1.¢2) = T4” ngn($:rj - a0 - 50;;- - T—1/2(¢1 + (152%))

2'.
. T'
1:1

However, there are the four possibilities for obtaining large values for 451 and/ or 432:

102

0 case [1]: (b1 > K and $2 > K,

0 case [2]: d1 < —K and (ﬁg < —K,
0 case [3]: qbl < -—K and (b2 > K,
a case [4]: (b1 > K and (b2 < —K.

The proof of (3.7) corresponds to case [1]. The proof for case [1] and case [2] is shown
in an Appendix. However, the proof for case [3] and case [4] remains to be done. Also,
note that the case in which only one of |¢1| and |¢2| is larger than K is a special case
of either case [1] or case [2] and can be proved in a similar way as in the proof for the
ﬁrst two cases.

The following conjecture makes a similar claim as Conjecture 1, but the difference

is that we assume 6]: is an I(1) process.

Conjecture 2. Under Assumptions 1 and 5, T’1/2(& — a0) = 019(1) and T-1/2(ﬁ —
50) = Op(1)~

Here we also have to consider the four possibilities in which |T‘1/2cil and/or
|T‘1/2ﬁ| are greater than K. In the Appendix, we prove two cases when both
IT’l/zdl and IT‘l/zﬁ] are either greater than K or less than K. The two other

cases would be proved in a similar way as in the unsolved cases of Conjecture 1.

3.2.4 The Asymptotic Distributions of the Indicator KPSS
Statistic
Theorem 1. Under Assumptions 1, 2, 3, and 4 and Conjecture 1,
T

I
T_2 23% —d> 02/0 V2(r)2dr, (3.9)
t=l

103

where V2 (r) is the second-level Brownian bridge,
1
V2(r) = W(r) + (2r — 3r2)W(1) + (—6r + 6r2)/ W(r)dr. (3.10)
0
And
62 L 02. (3.11)

The limiting distribution of ET is fol V2(r)2dr, which is also the limiting distribution
of the KPSS test with time trend, 1?, so that the same critical values in the paper
by Kwiatkowski, Phillips, Schmidt, and Shin (1992, p.166) can be used. Under the

alternative in which :rTj is an I(1) process, we have the following result.

Theorem 2. Under Assumptions 1 and 5 and Conjecture 2,

T

1 c 2
T"3 23%, —d+ A2 /0 (f0 sgn (W(g) - é - g5) d5) dc, (3.12)

t=1

where (T-1/2&,T‘1/ZB)’ —d> (A, B)’ for random variables A and B, and 62/771 —a—'—)

2ft?o k(§)d(.

Other than whether the underlying series is stationary or not, the important
difference between the assumptions used in deriving the results of Theorem 1 and
Theorem 2 is the moment condition on 63-. In Theorem 1, we do not impose a
condition for the existence of the moments. However, in Theorem 2, we need a ﬁnite
second moment of ej in order to apply FCLT.

Also, note that the limiting distribution under the alternative of unit root in

Theorem 2 is different from that of the KPSS statistic with a time trend which is

[01 U: W*(s)ds)2da/K fol W*(s)2ds (3.13)

104

where W*(s) = W(s) + (63 — 4) fol W(r)dr + (—123 + 6) fol rW(r)dr. The differences
in the asymptotic distributions will turn into power differences as in de Jong, Amsler,
and Schmidt (2002). Under the alternative of unit root with the fat-tailed errors,
the indicator KPSS test with a time trend would be more powerful than the KPSS
test with trend, and less powerful when the errors are normally distributed. This is

because the indicator is only concerned with the location of the data.

3.3 Concluding remarks

In this chapter, we have extended the indicator KPSS test proposed by de Jong,
Amsler, and Schmidt (2002) to the case in which a time trend as well as non-zero
level is allowed. The indicator KPSS test with a time trend also does not require the
existence of the moments of the series, yet produce the same asymptotic results as
the KPSS test with a time trend, 177-. However, this result depends on our conjectures
on the estimators.

The indicator can be extended to unit root tests such as Dickey-Fuller, Phillips-
Perron, or Schmidt-Phillips tests. We expect that the use of the indicator would
produce more powerful results when the errors have sufﬁciently fat tails, which is
commonly associated with ﬁnancial time series. However, the asymptotic results of
unit root tests with the indicator under the null of unit root might be different from
those without the indicator. As in our chapter and de Jong, Amsler, and Schmidt
(2002), the unit root tests with the indicator might produce the same asymptotic

results as the tests without the indicator under the alternative of stationarity.

105

3.4 Appendix: Mathematical Proof

Here is the outline of proofs. Lemma 1 shows an inequality involving the Lp—norm
which will be used in Lemma 2. Lemma 2 states that GT(1,¢) — EGT(1,¢) =
T-l/2 E?=1(yTj (4)) - EyTj (45)) is stochastically equicontinuous. In Lemma 3, the
uniform convergences of GT(r,¢) and HT(1,7) over corresponding compact sets of
parameter values will be established. A partial proof of Conjecture 1 follows. In
Lemma 4, the asymptotic distributions of the estimators of the regression coefﬁcients
are derived. Then, Theorem 1 proves the asymptotic distribution of the indicator
KPSS statistic along with the consistency of the long-run variance estimator. Con-
jecture 2 is partially proved. Finally, in Theorem 2, we show the limiting distribution

of the statistic when the ij have a unit root.

Lemma 4. For strong (a-) mixing random variables yTj E IR whose a-mixing coef-

— r _-
ﬁcients satisfy a(m) S Cm :2 ’7 for some n > 0,

T T
E :(yTj - EyTj) ~ S E max E(yTj - EyTj) S 0'2 M 3m ”3
j=1 - - '=1 i=1

(3.14)

for constants C, C' .

Proof of Lemma 1. By Theorem 17.5 and Corollary 16.10 of Davidson (1994). El

Lemma 5. Let z = (Li/TY. 4’ = (¢1.¢2)' and 1P = Win/ny- Let yTj(¢) =

sgn(:ch — 010 — Boj/T — T'1/2Z’¢)— sgn(a:Tj - 0:0 — Boj/T). Then, under Assumptions

106

1 and 4, for all K,e > 0,

lim lim sup P sup
57° T-*°° |¢1|SK.I¢2|SK.
|¢1ISK.I¢2|SK.

(|¢1-¢1|+l¢2-¢2I)<5
T

71““2 2 Iowa) — Ema» - (mu) — Eyre-(«pm < e) =1.
'=1

(3.15)

Proof of Lemma 2. For T large enough such that 2K T‘l/ 2 g n

T
SUD T_1/2 IEyT'(¢) - EyT'(¢)I
(Ida-1P1|+|<z52--1/22|)<<S Z J J
T
= 2 sup T‘1/2 Z [F(T-1/2z’¢) — F(T-1/2z’¢)[
(|¢1-¢1l+|¢2-¢2l)<5
T
= 2 sup 1”: f(T‘1/2z’)T"1/2(¢1 — 1P1)
(|¢1—¢1|+|¢2-¢2|)<5T j:1

 

+ f(T"1/2Z'£) -T_1/2(¢>2 - do)?

=2 SUP T4 f(T T1/22'f)[(¢1 -¢1)+(¢2-¢2)i
(|¢1-¢1|+|¢2-¢2|)<5 Z( T

_<_2 - sup (ST-12“?" T1/2Z'£)||¢1- ¢1|+|¢2_¢2||
(|¢1-¢1|+|¢2-¢2|)<5

< arr-12“ WW (:1 +£2— T»
j: —1
S 26 _ sup f(§).
{El-77117]
where F () is the cdf of e and 5 E L(¢, 1b), a line segment from 4) to 1b. This establishes
the equicontinuity of T’l/2 ZleEyTjw) on (P = {(451,432) 6 R2 : —K S 451 s

K, —K S (252 S K} since 6 can be made arbitrarily small. Then, the stochastic

107

equicontinuity of T‘l/2 Z}; yTJ-(d) can be shown as follows. Let ii); = (i6, i5)’ and
i(1+2); = ((i + 2)5. (i + 2)5)'-

T
—1/2
sup T lyTj(¢) - yT '(WI
|¢1lSKJ¢2|SKa 1:2:1 J
WllSKJWISK.

(|¢1-¢1|+|¢2-¢2l)<5

= sup sup
_[%]_15,S[%] (151.1111 €[i6,(i+2)6]n[—K,K]

T
sup T_1/2 9T (¢) - yT ('1’)
¢Z,tb2€[i6,(i+2)6]n[—K,K] 231' J J I
T
S SUP T-1/2 Z (yTj(ii6) - yTj(i(i+2)6))
431-lass]
T
3* sup T_1/2 Z (E yTj (ii6) — EyTj (i(i+2)d))
Taxes—1
T
S SUD T_1/2 IEyT'(¢) - EyT '(II’) -
|¢1ISK.I¢2ISK. £31 3 J I
|¢1|SK.l1/)2|SK.

(|¢1-¢1|+|¢2-¢2l)<5

For the ﬁrst inequality above, note that yTj is nonincreasing so that the maximum
distance between ¢ and 1/1 in sub-intervals of [-—K, K] will give rise to the supremum
of (yTj(i,:5) — yTj (i(,-+2)5)) for each sub-interval. For the last inequality, 43 and p are
now chosen all over the interval. The pointwise convergence for every i holds because,

for T large enough such that 2KT"1/2 S 77, by Lemma 1,

T 2
E (TI/2 2: [(yTj (its) - yTj (i(i+2)6)) _. E (yTj (it'd) ’ yr) (i(i+2)5))l)

i=1

T
g CENT—V2 (yTj (its) - yTj (i(i+2)6)) H3
j=l

108

T
= CT‘1 21]] (yTj (ii6) — yTj (i(i+2)6))|[12-
J:

gCT'IZ supKllysj<¢>Il3

T . .
= CT 1 Z sup || 8311(ij — as — 51%— T‘1/2¢1 — 134%?
j=1]¢1]SK1
|¢2ISK
j 2
- Sgn(-TTj — a0 - 1305:)“

1‘

ii"

3 C’ 1 — 2F(—2T‘1/2K)

 

‘Ilw

= C’ 2T‘1/2K sup f(§)
IEISn

——>0,

 

 

as T —> 00 and for some positive constants C, C’. Note that F() is the cdf of e and
there are two cases to consider for the last inequality since (T’1/2gb1 + T-l/ 2¢2 j /T)
will be either nonpositive or nonnegative. In the below, we prove that the inequality

holds in either case;

. case [1]: T’1/2¢1 + T-l/Zszj/T g 0. This implies -—2T"1/2K g —T-1/2K —
T-l/ZKj/T g T-1/2¢1 + T‘l/qugj/T so that

SUP [I Sgn($Tj - ao - ﬂog.- - T-1/2451 - T'1/2¢2§J:)

 

|¢1ISK.
|¢2ISK
j 2
- 5811(ij - 00 - 50%)”?
2
- su (EI2I(T"1/2 _1/2 1 '— — l )1”),-
— p ¢1+T 452 SICT) a0 50 SO
|¢1ISK. T T
|¢2|SK

)2.
. %
= 4 (E1 (—2T"1/2K g (BTJ' — a0 —[30€l]; S 0))

g (E [21 (—2T"1/2K S JITJ‘ - an — 605],— S 0)

 

109

~ilk:

4 (P (—2T‘1/2K g ij — a0 — egg; 3 0))

22 %[1—2F(— 2T‘1/2K)]%

0 case [2]: 0 _<_ T-1/2q51 + T_1/2¢2j/T. Then,

j _ _ .
sup [lssneTj—ao—ﬂoT—T “211—7“ 1”12%)
|¢1|SK.
|¢2|SK

j 2
— sgn(a:Tj — a0 — BOT)|]1~

‘NN

- r
= sup (E[—2I(0<ij—a0—ﬂo% <T—1/2¢ +T‘1/2¢2— —)[)
|¢1ISK.
l¢2|SK
__ r ._ _ J; —1/2 r F
E El 2l I 031T, 010 ﬁoTSZT K

 

= 22-; (2F (2T‘1/2K) —— 1);.

E]
Lemma 6. Let yTj(¢) be as in Lemma 2, and let
erl T—1/2 ET
:1 W '(7)
cm 1)-—-T1/2Zy:r)(1) 111(11): _1/2 ,3 . ’ . (3.16)
1': T 23:1 TyTjW)
Then, under Assumptions 1 and 4, for any K > 0,
sup sup 16110.1) — E62111)! A o. (3.17)
l¢ll£K1 TE[0,1]
|¢2|SK
SUP IHT(1.7) - EHT(1.'7)| i1 0. (3-18)
|71|SK.
|72|SK

110

Proof of Lemma 3. Let

JT(¢)= SUP IGT(T,¢)-EGT(T.¢)I- (3-19)
r6[0,1]

For each 41 with its elements in [—K, K] and T large enough such that 2K T‘l/ 2 S 17,

2
E (JT(¢))2 = E ( SUP IGT(7‘.¢) - EGT(T.¢)I)

rE[0,l]
2
= E sup
(r€[0,1] )

[TT]
T4” 2 (yTj(¢) - E yTj(¢))
T 2
S 0T4 Elli/TM)”.
j=1

 

 

j=1
(3.20)

‘3lt0

g C’ 1— 2F(—2T-1/2K)

 

‘IIN

= C’ 2T—1/2K sup f(§)
|£|Sn

—>0,

 

 

as T —> co and for some positive constants C, C’. This implies JT(¢) = op(1). JT(¢)

111

is also stochastically equicontinuous because, for d E <I> and 41’ E <I>,

IJT(¢) - JT(¢')I

: sup IGT(T,¢) — EGT(T,¢)] — SUP [GT(T)¢’) — EGT(T)¢I)|

r6[0,1] r€[0,1]
S 33131] [GT(7°1¢) - E GT0. 4’) - GT(7‘1¢’) + E GT0“. ¢')l
W] (3 21)
= SUP T_1/2 Z (yTj(¢) - E yTj(¢) — yTj(¢') + E yTj(¢')) '
T'E[0,I] j-1
W]
S SUP T4” 2 IyTj(¢) - E yTj(¢) - yTj(¢') + E yTj(¢')|
TE[0,I] j=1

T
S T.”2 Z lyTj(¢) - EyTj(¢) - yTj(¢') + EyTj(¢')|-

j=1
Then, by applying Lemma 2 to (3.21) and together with pointwise convergence in
(3.20), uniform convergence of JT(¢) follows, which proves (3.17). Pointwise con-
vergence in probability of GT(r, 41) to EGT(r, 41) in 45 for every possible r has been
proved. In other words, JT(¢) is pointwise convergent in probability to zero and
stochastically equicontinuous. The uniform convergence in probability of JT(¢) to
zero or that of GT(r,¢) to EGT(r, 41) follows because, with compact sets and the
equicontinuity of E GT(r, 41), pointwise convergence with stochastic equicontinuity is
equivalent to uniform convergence. See Newey (1991).

For the uniform convergence of (HT(1,*y) — E HT(1,7)) to zero, note that

011(1),” - E GT(117)

HT(1,'y) — EHT(1.')) = T -
T‘l/2 Zj=1i'(3/Tj(‘7)" EyTj(’Y))

where GT(1,7) — ECT(1,7) is uniformly convergent from (3.17), and T—l/2 Z};

(yTJ-(y) — E 3177(7)) j / T is also uniformly convergent which can be seen by comparing

112

the following expressions to (3.20) and (3.21), respectively:
T j( 2
T” Z T (11,-(v —EyT,-(7))
2
< E E(T_1/2T Z (yTj(’Y) - )EyTj(7)))

1:1

T
S CZHIJTJ'W)
i=1

 

and
T j( T j(
T’”2 Z T (rm-(7 ‘EyTj(7))T1/2Z 5): (WM ‘1')- EyTjW»
i=1 1:1
T 1
S T—1/2231ITI(yTj(7El/Tj(7) - yTj('Y’) + EyTj('Y'))]
T
S TV2 ;I( yrj(7 Eyr) (7) - yrj(7') + Eyrj(7'))l.
thereby proving (3.18). E]

Partial Proof of Conjecture 1. Here, we provide a partial proof of Conjecture 1 under

Assumptions 1 and 2: that is, T1/2(61 — a0) and T1/2(B — BO) are Op(1). Let

T . .
Y1T(¢1.¢2) = T—1/2 ZSgIICETj - 010 - 50% - T_1/2(¢1+ ¢2%)).
j -1 (3.22)

Y2T(<131.<z52)=T1/2 28811073“ - CYo - 50% - T_1/2(¢1+ (1)21 T))

l
3‘: -1 T

In order to show T1/2(d — an) 2 0,,(1) and T1/2(ﬁ — B0) = 0,,(1), four possibilities

for obtaining large values for (151 and / or (132 have to be considered;

0 case [1]: (151 >Kand ¢2>K,

113

0 case [2]: (151 < —K and (£12 < —K,
0 case [3]: d1 < —K and (152 > K,
0 case [4]: $1 > K and (152 < —K.

However, we prove only ﬁrst two cases in which both (151 and ([52 are either greater
than K or less than —K, which are done by showing that the probability of having
such solutions outside the compact set <I> = {(451,452) 6 R2 : —K S (131 S K, —K S
(152 S K} is arbitrarily small. Also, cases when only one of (151 and (252 is outside <I>
can be proved in the same way as in de Jong, Amsler, and Schmidt (2002).

Suppose that 451 > K and (:52 > K. Let’s start with Y1T(¢1,¢2). ForT Z 4K2n"2,

T . .
__ .7 — .7
sup sup 111131.12) = T “2 Zssnarj - as - 30f — T mm + KT»
¢1>K¢2>K j=l

T .
P :r-ll/2 2(1— 2F(T-1/‘-’-(K + 1%)»
-T‘1/2:2(F( —F(T-1/2(K+K;)))

= T-W 221(1)<—T*1/2(K + 11%))
j=1

for some 2: E [0, T‘1/2K( 1 +40] ]C [0,T—1/2K(1 + 1)] Q [0.77]

 

T+1
<—2 f TlK 1+— )=—2K 1+ inf :1:
[315.7me E: :( ( 2T )ITISnf( )

T
< —2K 1 — f —3K 'f
_ ( +2T)linSnf($): [313an

so that

limsup P( sup sup Y1T(¢1,¢2) 2 0) S limsup P(— —3K inf f(x) _>_ 0) = , (3.23)
T->oo ¢>1>K ¢2>K T—ioo lxlsrr

114

since K > 0 and ianTISn f(x) > 0. Next, in case of Y2T(¢1, 412),

SUP SUP T 1/2ZSSH(~TTj—010-,BO—_T 1(2((?51 +¢21T))

J'
¢1>K ¢2>K j_ _1 T

=T 1/2ngn( (sTj—ao—sO—- T“1/2(K+K—

.7
j=1 TDT

T . . .
— -1 2 ._ _ i_ -1 2 l 2.
—T / EH?) 00 ﬁoT T ((K+KT) 20)T

—12 —12 j
—T /TjéusT,—aO—so—— T /(K+KT)<0)T

T‘1/2(K+KT)
:3 d2:

__,T_1/22(/T°_°1/2HK KT) )f(s)da:—/_oo f()

: T-1/2 Z(F(oo) — 2F(T-1/2(K + 11%)) + F(—oo))—%
j=1

T .
—_- :r-l/2 2(1 — 2F(T-1/2(K + K%))

ﬁle.

 

j=1
T
_ —1/2 -1/2 Z.
_T g2f(:c)( —T (K+KT))T forsomexE[0,n]
1 T 1 11
_2T K|£|Iisfnf(x )1; (T + T5)
_ _ T(T + 1) T(T + 1)(2T + 1))
- 1K 1121.111) (T2 613
< —2K inf f(T) (1+1)— " —§K inf f(T )
_ [3'5" 2 3 3 [2;|<n

Therefore,

limsupP( sup sup
T—+oo ¢ >K¢ >K
1 2 (3.24)

Tl/zisgn (3t - 010 — ﬂo— _ T—1/2(¢1 + 4527-1)) 717:0) = 0‘

j: -1

115

The second case to consider is when 451 < —K and (1)2 < —K. The proofs in this

case are similar to the ones just done. Let’s start with Y1T(¢1,¢>2).

' f ' f Y ,
¢11<11_K¢21<11_K 1T(¢1 (152)

T . .
_ —1 2 __ _ _~7_ _ —l 2 _ _ l
_T / Elsgncrfp] ao ﬁOT T / ( K KT))
7]:

T .
Jl+ T'1/2j;1(1 — 2F(T‘1/2(—K — K-%)))

T .
_ _ J
= T 1/2; :2(F(0) — F(T l/2(—K — KT)»

 

=T1/212:2f(x)(T1/2(K+K%)) forsomexE[-n,0]
T+1
_<_ 2 1nfT1K(1 +% K(+1 Inf
I:I=|<nf(x)T 2} 2T )lx|<nf(x x)

T
2<K(1 + ~27.) lar|n£ﬂf(x)— - 3K|iln<f17 f(flI)

so that

limsup P( inf inf Y1T(¢1,¢2)_ < O)_ < lim sup P(3K inf f(a: )_<_ O)_ — 0.
T—>oo ¢1<-K¢2< <-K T—mo |$|<Tl

(3.25)

In case of Y2T(¢1, (1)2),

inf inf T— I/zzsgnm—ao—ao—-T—1/2(¢ +¢2— T»

.7
¢1<—-K¢2<-K j“ -1 T

T . . .

_ —1/2 _ _ 1 —1/2 1 1

_T Elsgncrt 010 ﬁ0T+T (K+KT))T
J:

116

1» T"1/2JZ=:1(1 —— 2F(—T-1/2(K + K%)))%
1/2 T 1/2 J j
=T Emma“ (K+K ”T

T . .2
2 2T'"1K| iln<f f(x) 2 (l + L) for some a: E [—n,0]
3’ _77

5
> -K ' f .
—3 IglISnﬂx)

Therefore,

lim sup P inf inf
T—>oo ¢1<‘K¢2<-K

T . . .
T_1/2 :Sgn (ij - 010 - 50% - T_1/2(¢1 + 452%) “5% S 0) = 0-
i=1

The above two cases imply that lim suquoo P(T1/2(|& — (10] + If} — ﬁol) > K) can

be made arbitrarily small by choosing K large enough. C]

Lemma 7. Under Asaumptions 1 and 4,

 

—1
T1/2(& — a0) _ T T 1;; \
T1/2 * _ _ 2 f(O) T+1 (T+1)(2T+1)
(5 ﬁo) 2 6T / (3'27)
0WT(1) ‘ 0p(1)
X +
oWTm — oT-l 22-11 Warp opu)

117

Proof of Lemma 4. Let2

. Tm“! - 00)
’Y = . -
TW - ﬁo)
Note that

—1/2 1‘ ._ . _ “ '

T‘l/2 2:le {~Esgn(ij — d — 8f.)

: zcr-l/2 2$=1(F(0) — F(é + 85; — ao — 50%))
2T-1/2 )3le gm) — F(c‘r + 35'- — ao — 30%))

= -2T‘1/2 Zimo) — f(O) + f(0))(d + 8% — a0 - 30$)
-2T-1/2 2321 We» — W) + f(0))(d + 3;. — a0 _ 50;)

= —2f<0>T—1/22:}‘=1<é +Bl — ao - 50%)
—2f(0)T‘1/2 ELI {-(é + 3% - <10 — 30%)

+ —2T-1/2::,-T=1(f(ej) - f((»)(a +64} - ao - 50%)
—2:r-1/2 23%;] 71mg) — i<0>><a +3l~ — ao - Bel)

= -919 T Lid Tl/zm - 0°) + o
T 11%; (T+l%(r}2T+l) rIl/zw _ 50)

since max1gng|Ej| S maX1gng I(é — a0) + (3 - ﬂolj/Tl S Id - arol + IB - 50!

 

 

2so that

erl

. _ . «j '
GTUKY) = T ”2 j;(sgn($7‘j - 01 - Bf) - Sgn($Tj - a0 - 50%)),
[VT]

. _. . ~J'
EGT(r,7) = T 1/2j§_:lEsgn(m — a—ﬂi).

118

= 0p(T—l/2) by Conjecture 1, and “133(13ng |f(§j) — f(0)| -—) 0. Then,

oar-“2) = TM 21:35ng — é _ 67;?

T4” 2}; i~ 89(ij — é — M»)
= (HT( (1.7 -EHT(1
T‘ l/2 __ _

Zj=1sgn($rj a0 ﬂoﬂ +EHT(1:’7)
T—l/2 Zj_1 fsgn :L'Tj - ao - ﬂog»)
O'WT(1)

0WT(1)—0T" lzj_1WT( f)

 

_2 __(__0) T 151—1 T1/2(c‘r - 00)
T 1%; (T+123(121T+1) T1/2(3 _ HO)

Therefore, we have

 

T1/2(5I-010)
T1/2(B-ﬁo)
—1
T+1 W 1
~54— T 7; ”‘2 - we
M) T;1 (+351, +2 aWT(1)—aT-1$,-=1WT(%)
-—l
1 1 .1, aW(1)
_._)._.___.
2f(0) % :1; 1)_af01Wrdr
C]

Proof of Theorem 1. First, we derive the asymptotic distribution of Tl/zSTt. Second,

we show the consistency of the long run variance estimator. Put together, we have

119

the limiting distribution of the indicator KPSS statistic with a time trend, ET.

[rT]
Tl/zzsgneTj—a— 3— T)
j: —1
W]
=(GT(r ¢>— Earm- ¢>> +T 1/2jzzﬁlsgnon — a0 —30T)

l
+ (T—l/ZJ zlrlela 53 (E sgn(:ch - Oz- 5%) )) lazaoﬂzﬂo) ((5! _ (10))

 

 

T4” Zszllﬁ (E 8811(1‘Tj - a - [3%) lam,0 ﬁzﬁo (3 - 30)
W]
=(GT(7‘ ¢)- EGT(7' ¢)) + T ”22881“ (IBTj - 00 - ﬂoT)
j: —1
[rT] ,
- 2fwwl/2ZM-ao)(0-2f)Cl’l/221T-(3- 30)
.7= -1 j: —1
[TT]
=(GT<r ¢)— EGT<r ¢>>> + T ”2 ngn( n, — a0 — 3oT>
j: —1
- 2f(0)[-T-.-TT]T1/2(a —ao> — 2f(0 0)ererng + 1)T1/2(3 — 30)
= op(1>+ 0W1"(7")+(#1l — 3W? +1 )T"”2 ngnej)

 

T .
+ (2%] _ dang] + 1))T—1/2Z% Sgn(ej) by (3.27)

 

 

 

 

 

'=1
= (To) + 0WT(r) + (#31 — 3["'T:§:Z] + 1))0WT(1)
+ (9%:51 _ 6mg] + 1))0 TIZWTQF) >+ (W
= op(1)+ aww) + (”fl - 3["T](;;] + 1))0WT(1)
(£13.73; + 6er1<331+ 1))T_1§:10WT(%)
J:

1+ 0(W(r) + (2r — 3r2)W(1) + (-6T + 6T2)/1 W(TW) = 0V2“),
0

120

where the op(1) term is uniform in r. Note that for each j = 1, ..., T,

._ _ j _ _ ~ “ _ _ l
Esgn(a:TJ a ﬁ— T=)a -ao,ﬁ= 30 —1 2F(a+ﬁ ao éoT) (3.28)

=1 — 23(0) — 2<a+3T — a0 — 3oT)f<3+3T — a0 — 3oT>.

ﬁr».

where F is the cdf of ej. When 61 and B are consistent, f (6 + ﬁTv — a0 — BOT-v) would

be asymptotically equal to f (O)

0 J
a; ESEDCUTJ _ a ‘ 'BT)la=ao,ﬂ=BO

__ 3- - i =-

_ 2f(a + 3T (10 ﬁOT)!a=ao,ﬁ:ﬁ0 2f(0), (3 29)
a __Esgn($ 0‘31” .
6? TJ_ T a=aoﬁ=ﬁ0

__1 i_ — i --1

— 2Tf(a+ﬂT ao ﬂOT)la=a0,ﬁ=BO — 2Tf(0).

Also, note that

T t
— —T ”2 gng) T“1 (%)(:’1/2:: ngn(ei)) + 019(1) (3.30)
j=1i=1

121

since

t
Z: Sgn(€i)

i=1

Ms

K).
II
H

= sgn(€1)+(sgn(61)+ ssn(62)) + - ' ' + (8811051) + 8311(62) + ° ' ’ + 58’1“”)
T

(T- j+ 1) Sgn(€j)
:1

T T T
=TZIS sgn( ej) -Z(J’ - sgn(ej)) + ngnkj),
j=1j=1 j:1

K).

and
3 T t
T'Z Z Z sgn(e,)
j=li=1
T T J T
-T—1/2ngn(e )—T 1/2Zngn(e )+T lT ”2289(9)
1:1 j=l 321
T T
— T 1/2Z:sgn(e )- T 1/2§_:%sgn(csj) +op(1)

K)
H
H
K)
II
g—n

Since the sgn function is regular (Park and Phillips 1999, p. 272), we apply
Theorem 3.2 in Park and Phillips (1999) to derive the limiting distribution (3.5):

0271-2 2 STt

—d—>/01 (VI/(r) (2r - 37‘ 2)W(1) + (—6r + 6r2)/01W(r))2d'r

Next, we will prove the consistency of 62. First, note that for t = 1, - -- ,T,

(3.31)

, At
Sgn($t — a — ﬂ?)

= (M7) — Eyt(’7)) + Esgnm " 5‘ ‘ 3%) + Sgnm _ a0 _ 50%)

122

= (My) — mm» + (1 — 23(3 + 3T

- 00 - 30%)) + Sgn($t - Cto - 50%)
= (My) — Eym» + (1 — 23(3 + 3T — ao — 30T)) + sgn<en

= on + th + Ct.

Then,

 

T T
A _ t— s
02 =T l E ,2 :k( 7T )(aTt+th+Ct)(aTs+st+Cs), (332)

which will be shown to be asymptotically equivalent to

 

T422142;

t: 13:1

S)CtC3

so that

 

Then, by Theorem 2.1 of de Jong and Davidson (2000),

 

T T
71122qu T)ctcs——>a2.

t=ls=

What is shown below is that all the cross products in the right hand side of (3.32)
except the term involving ctcs are 019(1) as T -—> 00. First, note that, for T large

enough,

sup Md = sup 1— 2F(a +BT— a0 — ﬂoT)
13th 13th

.t ~t t
= sup 2d+ﬂ——ao-5o—)°f51+ﬁ—‘T010‘30—)
lgth ( T T ( T T

 

123

s 2 sup m2) (Ia—a0: + IB—aol) = 0p(T’1/2),
lesrl
since for large T and consistent d and ,3, f (5: + ét/T - a0 — ﬂot/T) converges
to f(O) uniformly in t so that the above inequality holds for |a:| g 17. Second,
T'l/2 2&1 laTtl = 0p(1) by Lemma 3. Then,

 

 

 

 

 

 

             

 

             

 

T—l Z 2: Mt; S)aTtaTs
t=13=1
=T’IZZ/_oo eXp<i€t; Ts)¢(€)d5'aTtaTs
t=13=1 °°
00 T
=T-1 :aTtZaTseprs“ Dame
°°t= 1
00 T
gT‘l Zlanl ZaTseXP(: Dam) d5
°°t= 1
00 T
ST_1 ZlaTt|Z|aTsl WE)
t: 1 2
4 W(T-uiw)
~00 t=1
=0p(1)
Tdii t—s)
k( )aTths
t=13=1 7T
T
T_l§: —"s')st
t=1
<T <§ZT3 —)T1/2 811 lel
_ t=l l<sI<)T s
3 T 1/2 —1/2
ST 221ml 2‘ :i-lrr ~0p<T >
t=1

124

T

T
=T-1-T‘1/2ZIaTtI- Z

t=l j=-T

 

14%)] -0p<1>

= 0100) - 2} 010(1) ——— 0pm,
T

T1232“;

t==131

— —T‘l 2T3: /_°° exp (TC7 Ts» «mm - aths

t=ls=1°°

= T-1 °° {his exp (is (ti—D «mode

7T

 

s)aTtCs

 

 

 

 

 

<T" [:élanl Zesexp(i€ (JT ))w(od5
= T-1/2glaTt|£:T_1/2 sacs exp (T (t JT 3)) T(é) dr

 

 

=Top<1) 010(1) = ope),

             

           

 

 

       

 

 

 

 

T_ st
t: 13: l
T
ST_IZ
T T t—s
ST-l sup |le SUP lel k( )
lgth t lgsST 3 gal; 7T
T T J
< sup um: supT IbT IT 1 im—
1<t< <T l<3< 3 ,2;ng 7T
< sup |le sup |le )l
1<t< <T tlgs <T sngM] 7—T
_ _ 7T
<0p(T ”2+0 (0T 1/2>-7T=0p(—T—)=op(1),
T-

                   

t==ls

125

 

T
ST—IZ(1 8 Cs)t—3
t=1

T
< sup |an T (Zlcsl 2M

             

 

 

 

 

1<t <T _1

 

 

 

Partial Proof of Conjecture 2. We provide a partial proof for (T‘l/zd,T‘1/2B)’ —d—)
(A, B)’ for some random variables A and B. In order to show that T'l/Zd and
T’l/ZB are 0,,(1), we need to consider four cases as in the proof of Conjecture 1.
However, we only provide two cases in which both T"1/ 261 and T‘l/Zﬁ are either
greater than K or less than —K. First, we show that the probability such that both

T"1/2& and. T‘l/zﬁ are greater than K goes to zero as K —> 00 and T —> 00. Note

that
1 T—1/2 j
sup sup T sgn(T xT- — a — b—)
a>K b>K :31 J T
T j
_ -l -1 2 .
_ T :sgn(T MT] — K — K—T-)
d 1
—+ / sgn(AWoo — K — Koala
0
E T1(K)
and
sup supT lz:sgn(T 1”2:513- —a—b-j-)J—
a>K b>K T T

j: -—1

126

"3|“

T .

_. _. J

=T 12 :sgn(T 1/2ij—K—K?)
i=1

1
—"+ f sgn(AWe) — K - Keats
0

E T2(K),

where the limiting distributions of T‘1 2121ng (T—1/2ij — K — K j /T) and
T‘1 2L1 sgn(T‘1/2ij — K — Kj/T)j/T are obtained by applying Theorem 3.2 of
Park and Phillips (1999) because the sgn function is regular (Park and Phillips 1999,
p. 272).

Then, the probability with which T‘1/2c‘r and T-1/2B are not bounded in the

limit is calculated as follows:

P(T‘l/Zé > K and Two“ > K)

T .
= P( sup sup T‘123gn(T—1/2xTJ-— a — b-J—) 2 O)

G>K b>K j=l T

T .
-1 —1 2 .7 .7
+ P( sup sup T E sgn(T / xTJ- — a — of)? Z 0)

O>K b>K j=1

—* P(T1(K) Z 0) + P(T2(K) 2 0)-

Note that as K —> oo, sgn(AW(€) — K - K5) —-> sgn(—oo) = —1 so that T1(K) —p—->
f01(-1)d€ = _1 and T2(K) .1) f01(-§)d§ = —0.5. This implies that P(T1(K) _ O)
and P(T2(K) Z 0) will go to zero as K —> 00. Therefore,

lim sup lim sup P(T'l/zo‘z > K and T—l/zﬁ > K) = 0. (3.33)

K—+oo T—)oo.

127

Similarly, it can be shown that

limsuplimsup P(T_1/2ci < —K and T—l/zﬂi < —K) = 0.

K—)oo T—>oo

(3.34)

Note that

'f le 1/2-— —b1
aimirix 2,3811” a T)

_ —1 —12 , j
—T éngT /$TJ+K+KT)

d l
—~>/0 sgn(AW(€) + K + K£)d§

E T3(K)

and

‘f le 1/2.— —b1
came 2333“” a T)

HI“

= T“1 ngn(T—1/2TTJ- + K + K%)%

d l
—>/O sgn(AW(€) + K + K§)€d€

E T4(K).

As K —+ 00, T3(K) —p—> 1 since sgn(/\W(£)+K+K£) —+ sgn(oo) = 1 and T4(K) —p—>
0.5 since fol sgn(AW(€) + K + K§)§d§ —> fol gag = 0.5. Then,
P(T"1/2& < —K and TT—l/ZB < —K)

=P( ian inf T Ingn(T T1/2ij—a—bl

<0
a<—K b<—K T)_ )

128

T .
- - -1 —1 2 J J
+ P(aén—fK b<1n_fKT jizlsgn(T / ij — a — bf)? g 0)

—> P(T3(K) S 0) + P(T4(K) S 0).

Since T3(K) 11> 1 and T4(K) 11) 0.5 as K —+ 00, P(T3(K) g 0) —> o and
P(T4(K) S O) —+ 0 which implies (3.34). Therefore, conditional on proof of the

other two cases, T-l/za and T‘1/2B are 0p(1). E]

Proof of Theorem 2. Let

. (3.35)
T-1 23,21 sgn(T-“2m — T'Wd — T—l/W)

T’1 ZT=1sgn(T"l/2$Tj - T’l/Zd ‘ ”Fl/23TH”

where d, = (a, B)’. We rely on Theorem 2.7 of Kim and Pollard (1990) to ensure
that (T’l/Zd,T‘1/23)’ converge to the solutions to the asymptotic version of (3.35),

Q((A, B)’ ) for some random variables A, B where

1s nAW —A-B d
Q<<A,B>'>= f” g ( (5) 5” . (3.36)

* fol sgn(AWe) - A — Bards

As noted in the proof of de J ong, Amsler, and Schmidt (2002), in order to use Theorem
2.7 in Kim and Pollard (1990), IQT(0)| has to go to 00 as |0| —+ 00, which does not
hold. But, this can be ﬁxed by considering \II'1(|QT(-)|) where \I' is the cdf of normal
distribution as |QT()| is bound between zero and one.

Note that for any (a, B)’ E R2,

QT<(a, W) —"-’> QM BY). (33?)

129

Although this does not follow directly from the continuous mapping theorem due to
the discontinuity of the sgn function, a continuous function arbitrarily close to the
sgn function can be used in the place of the sgn function, which is the argument used
in Park and Phillips (1999).

Now, we will prove the stochastic equicontinuity of QT(9) on <I> = {(61, 02)’ E R2 :
—K S 61 S K, —K _<_ 02 S K}, thereby establishing that QT(0) => Q((A, B)’) on <I>.

As the equations below get longer, we define some notation for substitution:

.. — - — j
=1,Tj = sgn(T 1/2131‘3' - T 1”0112‘ — T UzﬂlTT)

_ _ _ J
— sgn(T 1/21'Tj — T ”2le - T 1/2/62TT),

—~ —- _ _ J
=2,Tj = sgn(T 1/233Tj — T 1/2C¥1T - T l/zﬁzTT)

- sgn(T-main - T—1/2012T - T_1/252T%),

-1/233Tj - a2 - 52%)-

A1,Tj = sgn(T
A21]- : sgn(T—”23:17 — a1 — (22%) — sgn(T

First, for 0 E <I> and 0’ E <I>, we prove the stochastic equicontinuity of Q1T(-) by

showing that

(lirrblim supP SUP lQlTlo) — Q1T(9’)l > 6
—-> n—>oo
o,o’:|91—e’1|<5;|92-6§|<5 (3,38)

s 1imlimsupP(6(1 +6) sup |L(1,s)| > a) = 0.
6—>O 11,—+00 sE[—K,K]

130

Note that

Q1T((01T:51T)I) - Q1T((0‘2Ta ﬁle')
= (Q1T((alTnBlT)’) - Q1T((0‘1Ta52T)I))
+ (owuam ﬁm’) — Qwuaw. 5sz (3°39)

T T
= T_1 Z 51,117+ T—1 252,77.
j=1 j=1

—1/2 —1/2

By Conjecture 2, we can replace T alT with a1 6 [—K, K]: similarly, T agT

with (12 e [—K, K], T-1/251T with b1 6 [—K, K], and T'1/2ﬂ2T with by, e [—K, K].
Then, (3.39) becomes

T T
T-l Zth + T4 2?ij
i=1 i=1

< sup sup T‘12A11j (3.40)
b16[—K,K]b2-—|b2 b1|<6 -

+ sup sup T-IZA2,Tj .
01€[-K,K] a2zla2—a1|<6 '

Now, by dividing an interval of [—K, K] into sub-intervals of an equal length of 6,

(3.40) can be written as

sup sup sup T"1 Z A1,Tj
K Kb6'6,°+16b:b—b <6
{34-1547} 1[Z(z )] 2| 2 1|

+ su sup A
—[%-]-1:i<[K] 016[i5,(i+1)5]a2.|a:-I()11I<5 le; 2 Tj

T
S sup sup sup T—1 Z |A1,Tj|
-[§]-13is[§r] b16[i6,(z'+1)6] b2:|b2—b1|<6

131

T
+ sup sup sup T‘1 2 [112,17]

’[lg']-1.<_TS[K] 016[i5,(i+1)5]a2:]a2-all<6
S sup T121<a1+i6% <T1/2xT <a1+(z' +1)6-%)
{Ia-194%] ,= -1
+ sup T‘IZI (i6+b2% g T-1/2xT, g (i+1)6+b2%)

{ii-194i]
= sup ]/011(a1+i6£SWT([€TT ———])<a1 +(z+1)6€)d{

161-194%

1
+ sup / I(i6+b2§<WT([£TT 1) S(z’+1)6+b2§) d5

-[%l-lsz's[%] °

a1+(z'+1)6§ (i+1)6+621,r
su / L(1,s)ds +/ L(1,s)ds
_[K ]_ -l<i< [_Ig] al+i5§ i6+bg€
1

$65 sup |L(,8)|+5 SUP lLllislli
sE[—K, K] sE[-K,K]

which can be made arbitrarily small as 6 —+ 0. This proves (3.38). That is, Q1T(-)
is stochastically equicontinuous on a compact set <I>. In the last equality, we use the
occupation times formula as in Park and Phillips (1999). L(1, s), a local time is a
continuous stochastic process of time spent by the Brownian motion at the spatial
point 3 over the interval [0,1], and sup36[_ K, K] |L(1,s)| is a well-deﬁned random
variable.

The stochastic equicontinuity of Q2T(-) can be proved along the same lines as in

the above.

Q2T((011T, 161T),) - Q2T((012Ti 5H),)
= (Q2T((alT: ﬁlT)') - Q2T((0‘1Ti32T)I))
+ (Q2T((011Ti Ble') - Q2T((012Ti 5%),»

132

|/\

.. j
sup sup T 1 Z |A1,Tj| T
_b16[—K, K] b2 Ib2— b1I<5

T .

- .7

+ Sllp sup T 1 E |A2’le 7f;
016[-K,K] azzla2—01I<5 '=

T
T—l
< sup sup E |A1,Tj|
bIE[—K, K]b2: Ib2— b1l<5 -

T
+ sup sup T”1 Z |A2,Tj|
016[-K,K] a2zla2—a1|<6 -

since (j/T) S 1 for all t = 1, - -- ,T. Then, the remaining lines of proof for the
stochastic equicontinuity of Q2T(-) follow from those in the proof of the stochastic
equicontinuity of Q1T(.).

Next, note that the ﬁnite-dimensional convergence of T"1 23531] sgn(ij — d —
B j /T) for each 5 E [0, 1] holds because of a similar argument in (3.37) so that

[6T] . ,
T-lZsgn(T-l/ZxT,—T-1/2a—T—1/2Bl)—d>/ sgn(AW(§)—A—B§)d§. (3.41)
. T 0

From (3.41) and the stochastic equicontinuity implied by that of QT(-) in the

above, the limiting distribution of T‘3 25;] 5% is as follows:

T t . 2
T-3 J: 1(; sgn (1‘1", - CY - 8%))
T

2
=T 1 (T Ingn (Tl/2m—T-1/26—T-1/23-%))

i=1

133

Liz/01 (focsgn (W({)— é - gs) d§)2d(.

Finally, the estimate of long run variance, 62/77 is equal to

T .
7%1T—1 ngn (2377 — d — Bl)?

 

J=1 7‘
1 1 T j
+2 - T- k —
7T 2 (7T)
]—1

T'j , .z' . -z'+j
>< sgn(xTi-O-ﬂfﬁgﬂxmm)‘0‘5 T )

i=1

T+1 '
= 019(1) + MEI/l k(%)dj

_1 T—j+1 . .
X T A sgn(xmgT] — a — :66)

8(5 + 99)] dt

X _ A _
sgn (xT.[(£+§;1)T1 a T

_T_ _1-
=o,,(1)+2/7T+7T [led—(111])

l
W .L ’YT
1+1-C7 . .
x % (sgn(mnle — a -— ﬂé)

. ~ (VT

X sgn (SET,[(£+ (72: )T] — a -' 3(6 + T)))d€] dc
oo 1

—+ 2/0 k(() ../0 sgn(TT,[€T] — a — B€)2d§d(

=2meoa.

where the substitutions (j /T) = 5 and (j /'yT) = C are made.

134

BIBLIOGRAPHY

Alvarez, A., C. Amsler, L. Orea, and P. Schmidt, 2005, Interpreting and testing
the scaling property in models where inefﬁciency depends on ﬁrm characteristics,
Journal of Productivity Analysis, forthcoming.

Amsler, C., and P. Schmidt, 2000, Tests of short memory with thick tailed errors,
Unpublished Manuscript, Department of Economics, Michigan State University.

Battese, GE, and T.J. Coelli, 1988, Prediction of ﬁrm-level technical efﬁciencies with
a generalized frontier production function and panel data, Journal of Econometrics
38, 387-399.

Caudill, SB, and J .M. Ford, 1993, Biases in frontier estimation due to heteroskedas-
ticity, Economics Letters 41, 17—20.

 

, and D.M. Cropper, 1995, Frontier estimation and ﬁrm-speciﬁc inefﬁciency
measures in the presence of heteroskedasticity, Journal of Business and Economic
Statistics 13, 105-111.

Coelli, T.J., 1995, Estimators and hypothesis tests for a stochastic frontier function:
A Monte Carlo analysis, Journal of Productivity Analysis 6, 247—268.

Davidson, J ., 1994, Stochastic Limit Theory (Oxford University Press: Oxford).

de Jong, R.M., C. Amsler, and P. Schmidt, 2002, A robust version of the KPSS test,
based on indicators, Unpublished Manuscript, Department of Economics, Michigan
State University.

de Jong, R.M., and J. Davidson, 2000, Consistency of kernel estimators of het-
eroscedastic and autocorrelated covariance matrices, Econometrica 68, 407—423.

Efron, B., 1982, The Jackknife, the Bootstrap and Other Resampling Plans (Philadel-
phia: Society for Industrial and Applied Mathematics).

, 1985, Bootstrap conﬁdence intervals for a class of parametric problems,
Biometrika 72, 45—58.

 

 

, and R.J. Tibshirani, 1993, An Introduction to the Bootstrap (New York:
Chapman and Hall).

Erwidodo, 1990, Panel data analysis on farm-level efﬁciency, input demand and out-
put supply of rice farming in West Java, Indonesia, Ph.D. dissertation, Department
of Agricultural Economics, Michigan State University.

Greene, W.H., 1990, A gamma-distributed stochastic frontier model, Journal of
Econometrics 46, 141—163.

135

Hall, P., W. Hardle, and L. Simar, 1993, On the inconsistency of bootstrap distribu-
tion estimators, Computational Statistics and Data Analysis 16, 11—18.

 

, 1995, Iterated bootstrap with applications to frontier models, Journal of
Productivity Analysis 6, 63—76.

Hansen, BE, 1996, Inference when a nuisance parameter is not identiﬁed under the
null hypothesis, Econometrica 64, 413—430.

Horrace, W.C., and P. Schmidt, 1996, Conﬁdence statements for efﬁciency estimates
from stochastic frontier models, Journal of Productivity Analysis 7, 257—282.

 

, 2000, Multiple comparisons with the best, with economic applications, Jour-
nal of Applied Econometrics 15, 1—26.

Jondrow, J ., C.A.K. Lovell, I.S. Materov, and P. Schmidt, 1982, On the estimation of
technical efﬁciency in stochastic frontier production model, Journal of Economet-
rics 19, 233—238.

Kim, J ., and D. Pollard, 1990, Cube root asymptotics, Annals of Statistics 18, 191—
219. .

Kim, Y., 1999, A study in estimation and inference on ﬁrm efﬁciency, Ph.D. disser-
tation, Department of Economics, Michigan State University.

 

, and P. Schmidt, 1999, Marginal comparisons with the best and the efﬁciency
measurement problem, Unpublished Manuscript, Department of Economics, Michi-
gan State University.

Koop, G., J. Osiewalski, and M.F. Steel, 1997, Bayesian efﬁciency analysis through
individual effects: Hospital cost frontiers, Journal of Econometrics 76, 77—106.

Kumbhakar, SC, 1996, Estimation of cost efﬁciency with heteroscedasticity: An
application to electric utilities, Journal of the Royal Statistical Society, Series D
(The Statistician) 45, 319—335.

Kwiatkowski, D., P.C.B. Phillips, P. Schmidt, and Y. Shin, 1992, Testing the null
hypothesis of stationarity against the alternative of a unit root: How sure are we
that economic time series have a unit root?, Journal of Econometrics 54, 159—178.

Lee, Y.H., 1991, Panel data models with multiplicative individual and time effects:
Application to compensation and frontier production functions, Ph.D. dissertation,
Department of Economics, Michigan State University.

, and P. Schmidt, 1993, A Production Frontier Model with Flexible Temporal
Variation in Technical Efﬁciency, In H.O. Fried, C.A.K. Lovell, and SS. Schmidt
(eds.), The Measurement of Productive Eﬁiciency (New York: Oxford University
Press).

 

136

Newey, W.K., 1991, Uniform convergence in probability and stochastic equicontinuity,
Econometrica 59, 1161—1167.

Olson, J .A., P. Schmidt, and D. Waldman, 1980, A Monte Carlo study of estimators
of stochastic frontier models, Journal of Econometrics 13, 67-82.

Osiewalski, J ., and M. Steel, 1998, Numerical tools for the Bayesian analysis of sto-
chastic frontier models, Journal of Productivity Analysis 10, 103—117.

Park, B.U., and L. Simar, 1994, Efﬁcient semiparametric estimation in a stochastic
frontier model, Journal of the American Statistical Association 89, 929—936.

Park, J .Y., and PCB. Phillips, 1999, Asymptotics for nonlinear transformations of
integrated time series, Econometric Theory 15, 269-298.

Phillips, P.C.B., 1987, Time series regression with a unit root, Econometrica 55,
277—301.

 

, and P. Perron, 1988, Testing for a unit root in time series regression, Bio-
metrika 75, 335—346.

Pitt, M.M., and LP. Lee, 1981, The measurement and sources of technical inefﬁciency
in the Indonesian weaving industry, Journal of Development Economics 9, 43—64.

Reifschneider, D., and R. Stevenson, 1991, Systematic departures from the frontier:
A framework for the analysis of ﬁrm ineﬁiciency, International Economic Review
32, 715—723.

Schmidt, P., and TR Lin, 1984, Simple tests of alternative speciﬁcations in stochastic
frontier models, Journal of Econometrics 24, 349—361.

Schmidt, P., and RC. Sickles, 1984, Production frontiers and panel data, Journal of
Business and Economic Statistics 2, 367—374.

Simar, L., 1992, Estimating efﬁciencies from frontier models with panel data: A com-
parison of parametric, non-parametric and semi-parametric methods with boot-
strapping, Journal of Productivity Analysis 3, 171—203.

, C.A.K. Lovell, and P. Vanden Eeckaut, 1994, Stochastic frontiers incorpo-
rating exogenous inﬁuences on efﬁciency, Discussion Papers No. 9403, Institutde
Statistique, Université Catholique de Louvain.

 

Wang, H.J., and P. Schmidt, 2002, One-step and two-step estimation of the effects of
exogenous variables on technical efﬁciency levels, Journal of Productivity Analysis
18, 129—144.

White, H., 1980, A heteroskedasticity-consistent covariance matrix estimator and a
direct test for heteroskedasticity, Econometrica 48, 817—838.

Wooldridge, J .M., 2002, Econometric Analysis of Cross Section and Panel Data (The
MIT Press: Cambridge, Massachusetts).

137

   

\l"ﬂllljllﬂijllljﬂﬂljllﬂi”[191]:I