THREE ESSAYS ON ECONOMETRICS

By

Myungsup Kim

A DISSERTATION
Submitted to Michigan State University
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY

Department of Economics

2005

ABSTRACT

THREE ESSAYS ON ECONOMETRICS

By Myungsup Kim

Consider a simple stochastic frontier model explaining the output of a firm by $y = x'\beta + v - u$. While $v$ represents random shocks outside the control of producers, $u$ represents technical inefficiency in the production process. In the first chapter, we wish to test whether technical inefficiency depends on observable characteristics of the firm. It is well known that two-step procedures, in which the second step is the regression of an inefficiency measure on firm characteristics, do not properly estimate the effects of firm characteristics on inefficiency. In this chapter we show that this regression also does not lead to a valid test of the hypothesis of no effect. A valid test of the hypothesis of no effect can be constructed by using an adjustment to the variance matrix of the estimated coefficients in the second-step regression. Unfortunately, the form of this adjustment is not distribution-free. We show that this test is the LM test in the specific case that technical inefficiency is exponential and the alternative is a scaled exponential distribution. We also consider tests based on nonlinear least squares. These tests do not depend on a distributional assumption. There are some technical complications involved, due to the non-identification of some of the parameters under the null. We perform an extensive set of simulations to compare the size and power characteristics of these tests and other similar tests, including the Wald test based on a one-step estimate of the entire model.

In the second chapter, we study the construction of confidence intervals for efficiency levels of individual firms in stochastic frontier models with panel data. The focus is on bootstrapping and related methods. We start with a survey of various versions of the bootstrap. Then we offer some simple alternatives based on standard methods when one acts as if the identity of the best firm is known. Monte Carlo simulations indicate that these simple alternatives work better than the percentile bootstrap but perhaps not as well as the bias-adjusted and accelerated bootstrap. None of the methods yields very accurate confidence intervals except when the time-series sample size is large enough, or the error variance is small enough, that the identity of the best firm is clear. We also present empirical results for two well-known data sets.

In the last chapter, we consider the problem of testing the null hypothesis that a series is stationary against the unit root alternative.
A standard test for this null hypothesis is the KPSS test, which is based on cumulations of deviations from the mean of the series. De Jong, Amsler, and Schmidt (2002) construct a "robust" version of the KPSS test by using an indicator of whether the observation is above or below the sample median. This test, called the indicator KPSS test, is robust in that it does not require existence of moments of the series, yet the asymptotic distribution of the indicator KPSS statistic is the same as that of the KPSS statistic. However, that test allows a non-zero level for the series under consideration, but not a deterministic trend. The purpose of this chapter is to extend the indicator KPSS statistic to the case of a deterministic trend. The relevant indicator in this setting is whether the residual is positive or negative in a least absolute deviations regression of the series on a time trend. This chapter shows that, under the null of trend-stationarity, the indicator KPSS statistic with a time trend has the same limiting distribution as the KPSS statistic with a time trend.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Professor Peter Schmidt, who taught my first econometrics course with passion. His willingness to motivate and support my work has made it possible for me to develop my skills as a researcher. Needless to say, his wise advice and continuing counsel have been essential throughout the course of my studies. I am also enormously grateful to Professor Robert M. de Jong for his patient support and extraordinary encouragement of my learning of methodological tools. I owe special thanks to Professor Jeffrey M. Wooldridge for invaluable comments. I would like to thank the other members of my committee, Professor Christine E. Amsler and Professor Robert J. Myers; I greatly appreciate their input and the time they devoted to this dissertation.

I want to thank my wife, Jiyoung, for being my partner and best friend, with her unending love, sacrifice and support. I am also greatly indebted to my parents and my parents-in-law, who have always been very supportive of my pursuit of education. My special thanks also go to my fellow graduate students for their friendship, and to the staff in the Department of Economics, whose jovial smiles and helpful nature have greeted every problem or deadline.

TABLE OF CONTENTS

LIST OF TABLES

1 Valid Tests of Whether Technical Inefficiency Depends on Firm Characteristics
  1.1 Introduction
  1.2 Two-Step Procedures
  1.3 The Scaled Exponential Case
  1.4 A Test Based on Nonlinear Least Squares
  1.5 Simulations: Experimental Design
  1.6 Simulation Results: Size
    1.6.1 Base case
    1.6.2 Effects of changing $\alpha$ or $\beta$
    1.6.3 Effects of changing $N$
    1.6.4 Effects of changing $\rho$
    1.6.5 Effects of changing $\lambda$
    1.6.6 Effects of changing $\sigma_v^2$
  1.7 Simulation Results: Power
  1.8 Simulation Results: Robustness
    1.8.1 Normal-truncated normal
    1.8.2 Normal-gamma
  1.9 Concluding Remarks
  1.10 Output Tables
  1.11 Appendix: LM Test for the Scaled Exponential Case
  1.12 Appendix: Supplementary Tables

2 On the Accuracy of Bootstrap Confidence Intervals for Efficiency Levels in Stochastic Frontier Models with Panel Data
  2.1 Introduction
  2.2 Fixed-Effects Estimation of the Model
  2.3 Construction of Confidence Intervals by Bootstrapping
  2.4 A Simple Alternative to the Bootstrap
  2.5 Simulations
  2.6 Empirical Results
    2.6.1 Indonesian Rice Farms
    2.6.2 Texas Utilities
  2.7 Conclusions
  2.8 Output Tables

3 Indicator KPSS with a Time Trend
  3.1 Introduction
  3.2 Asymptotic Theory
    3.2.1 Assumptions
    3.2.2 Indicator KPSS statistic
    3.2.3 Conjectures
    3.2.4 The Asymptotic Distributions of the Indicator KPSS Statistic
  3.3 Concluding remarks
  3.4 Appendix: Mathematical Proof

BIBLIOGRAPHY

LIST OF TABLES

1.1 (BASE CASE) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 200$ [$E(\exp(-u)) = 0.5232$]
1.2 (Change of $N$) $N = 500$, $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$ [$E(\exp(-u)) = 0.5232$]
1.3 (Change of $N$) $N = 1000$, $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$ [$E(\exp(-u)) = 0.5232$]
1.4 (Change of $\rho$) $\rho = -0.5$, $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $N = 200$ [$E(\exp(-u)) = 0.5232$]
1.5 (Change of $\rho$) $\rho = 0$, $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $N = 200$ [$E(\exp(-u)) = 0.5232$]
1.6 (Change of $\rho$) $\rho = 0.9$, $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $N = 200$ [$E(\exp(-u)) = 0.5232$]
1.7 (Change of $\lambda$) $\lambda = 3$, $\alpha = \beta = \delta = 0$, $\sigma_v^2 = 1$, $\rho = 0.5$, $N = 200$ [$E(\exp(-u)) = 0.1095$]
1.8 (Change of $\sigma_v^2$) $\sigma_v^2 = 9$, $\alpha = \beta = \delta = 0$, $\lambda = 1$, $\rho = 0.5$, $N = 200$ [$E(\exp(-u)) = 0.5100$]
1.9 (Change of $\delta$) $\delta = 0.05$, $\alpha = \beta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.10 (Change of $\delta$) $\delta = 0.1$, $\alpha = \beta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.11 (Change of $\delta$) $\delta = 0.15$, $\alpha = \beta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.12 (Change of $\delta$ and $\rho$) $\delta = 0.1$, $\rho = 0.9$, $\alpha = \beta = 0$, $\sigma_v^2 = \lambda = 1$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.13 (Change of scaling function to $\phi(\delta z_i)/(1 - \Phi(\delta z_i))$) $\delta = 0.1$, $\alpha = \beta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.14 (Change of the distribution of $u_i^*$ to $N(0, \pi/2)^+$) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.15 (Change of the distribution of $u_i^*$ to gamma(0.5, 2)) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.16 (Change of the distribution of $u_i^*$ to gamma(2, 0.5)) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.17 (Change of $\rho$) $\rho = 0.25$, $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $N = 200$ [$E(\exp(-u)) = 0.5232$]
1.18 (Change of $\rho$) $\rho = 0.75$, $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $N = 200$ [$E(\exp(-u)) = 0.5232$]
1.19 (Change of $\delta$ and $\rho$) $\delta = 0.05$, $\rho = 0.9$, $\alpha = \beta = 0$, $\sigma_v^2 = \lambda = 1$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.20 (Change of $\delta$ and $\rho$) $\delta = 0.15$, $\rho = 0.9$, $\alpha = \beta = 0$, $\sigma_v^2 = \lambda = 1$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.21 (Change of the distribution of $u_i^*$ to $N(0, 1)^+$) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.22 (Change of the distribution of $u_i^*$ to $N(0, \pi/(\pi - 2))^+$) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.23 (Change of the distribution of $u_i^*$ to $N(1, 1)^+$) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.24 (Change of the distribution of $u_i^*$ to gamma(0.5, $\sqrt{2}$)) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
1.25 (Change of the distribution of $u_i^*$ to gamma(2, $1/\sqrt{2}$)) $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, $N = 1000$ [$E(\exp(-u)) = 0.5232$]
2.1 Biases of Fixed Effects Estimates
2.2 90% Confidence Intervals for Relative Efficiency
2.3 90% Confidence Intervals for Relative Efficiency
2.4 Bias Correction in the BCa Bootstrap Intervals
2.5 90% Confidence Intervals for Relative Efficiency
2.6 Biases of Fixed Effects Estimates (Case that $u_i$ are fixed over replications)
2.7 90% Confidence Intervals for Relative Efficiency (Case that $u_i$ are fixed across replications)
2.8 Estimated Efficiencies and 90% Confidence Intervals: Indonesian Rice Farms
2.9 90% Confidence Intervals: Indonesian Rice Farms
2.10 Estimated Efficiencies and 90% Confidence Intervals: Texas Utilities
2.11 90% Confidence Intervals: Texas Utilities

Chapter 1

Valid Tests of Whether Technical Inefficiency Depends on Firm Characteristics

1.1 Introduction

In this chapter we consider the stochastic frontier model

\[ y_i = x_i'\beta + v_i - u_i, \qquad u_i \ge 0. \tag{1.1} \]

The frontier is $x_i'\beta + v_i$, and $u_i$ represents technical inefficiency. We follow the literature in assuming that the $x_i$ are "fixed" and the $v_i$ are i.i.d. normal. Now we ask whether $u_i$ depends on some variables $z_i$, which could be characteristics of the firm or measures of the environment in which it operates. Specifically, we wish to test the hypothesis that $u_i$ does not depend on $z_i$. One way to do this is to assume a specific model of the alternative hypothesis that shows how the $z_i$ affect the $u_i$. For example, we could assume

\[ u_i = \exp(z_i'\delta) \cdot u_i^*, \tag{1.2} \]

where the $u_i^*$ are i.i.d. according to some specific distribution, like exponential or half-normal. Now we can estimate $\delta$ by MLE and do a Wald test of the hypothesis that $\delta = 0$, which corresponds to the hypothesis that $z_i$ does not affect $u_i$. In the frontiers literature this would correspond to what is called a "one-step" procedure (e.g., see Wang and Schmidt (2002)). Models of the form of (1.2) have been considered by Reifschneider and Stevenson (1991), Caudill and Ford (1993), Caudill, Ford, and Gropper (1995), Wang and Schmidt (2002) and Alvarez, Amsler, Orea, and Schmidt (2005), among others. We will follow the literature and call the multiplicative decomposition of $u_i$ (as a function of $z_i$ times a random variable that does not depend on $z_i$) the "scaling property." An objection to this type of procedure is that it depends fundamentally on the alternative chosen.
Under the null the scaling function $\exp(z_i'\delta)$ really does not exist, and so there are many more or less equally plausible alternatives. Partly for this reason, one could consider a "two-step procedure," in which Step 1 is to estimate the model ignoring the $z_i$ to obtain efficiency measures $\hat u_i$, and Step 2 is a regression of $\hat u_i$ on $z_i$ (or some function of $z_i$). It is well known (Wang and Schmidt (2002)) that when $z_i$ does affect $u_i$, there are serious biases in both steps, so two-step procedures are not recommended. However, under the null that $z_i$ does not affect $u_i$, these biases do not arise, and it is not known whether a two-step procedure provides a valid test of this null hypothesis. One contribution of this chapter is to show that a two-step procedure that uses a standard t or F test in the second step does not yield an asymptotically valid test. However, the test becomes valid if we use a corrected variance matrix for the second-step coefficients. Unfortunately, the form of this correction is distribution-specific.

This raises the question of whether a test based on such a corrected two-step procedure entails a loss of power. We do not have a full answer to this question. We do show that, in the case that the alternative is the scaled exponential distribution, the LM test of $\delta = 0$ is asymptotically equivalent to the corrected version of the two-step procedure. Therefore, at least in this case, the two-step procedure entails no loss in asymptotic local power.

If we assume the scaling property, as in (1.2) above, the stochastic frontier model can also be estimated by nonlinear least squares. Testing whether $\delta = 0$ based on nonlinear least squares involves some technical difficulties, because the mean of $u_i^*$ is identified separately from the overall intercept under the alternative but not under the null. We show how to deal with these difficulties and obtain an asymptotically valid test. In the last section of the chapter, we report the results of an extensive set of simulations that investigate the size and power of these tests.

1.2 Two-Step Procedures

We consider the stochastic frontier model (1.1). As stated in the Introduction, we treat the $x_i$ as fixed and we assume that the $v_i$ are i.i.d. $N(0, \sigma_v^2)$. We also assume that the $u_i^*$ are i.i.d. with some specific distribution, such as exponential or half-normal, that is known up to some parameters. Finally, the $z_i$ variables whose influence on $u_i$ we wish to test are independent of $v_i$ and $u_i^*$. For the purposes of this section, these assumptions could be weakened somewhat, but we would need the stronger set subsequently, so we simply make them here.

To motivate the tests considered here, suppose that $u_i$ were observed. Then we could regress $u_i$ on $z_i$ and test the hypothesis that the coefficients equal zero by standard methods. More precisely, the regression would have to include an intercept, because $E(u_i)$ is not equal to zero, and we would do an F-test on the coefficients other than the intercept.

Now let $\psi$ equal the unknown parameters of the problem. These would be $\beta$, $\sigma_v^2$ and whatever parameters there are in the distribution of $u_i^*$. Step 1 of the two-step procedure results in an estimate $\hat\psi$, which should be consistent and asymptotically normal (subject to the usual regularity conditions). We then obtain an estimate of $u_i$, say $\hat u_i(\hat\psi)$.
In the stochastic frontier model, $\hat u_i$ is the expected value of $u_i$ conditional on $\varepsilon_i \equiv v_i - u_i$, evaluated at the sample estimates, as suggested by Jondrow, Lovell, Materov, and Schmidt (1982). It should be noted that, even if $\psi$ were known, $\hat u_i(\psi)$ would be $E(u_i|\varepsilon_i)$, which is different from $u_i$. However, $\hat u_i(\psi)$ is a function of $\varepsilon_i$, which is i.i.d. and independent of $z_i$. So, if we regressed $\hat u_i(\psi)$ on an intercept and $z_i$, an F-test of the significance of the coefficients of $z_i$ should be asymptotically valid. The question is whether this is still true when $\hat u_i(\psi)$ is replaced by $\hat u_i(\hat\psi)$. Unfortunately, the answer is no. A valid test must account for the estimation error in $\hat\psi$.

To show this, we could consider a regression of $\hat u_i(\hat\psi)$ on an intercept and $z_i$. However, it is simpler to demean the $\hat u_i$ by switching our attention to $b_i(\psi) = E(u_i|\varepsilon_i) - E(u_i)$, with $\hat b_i = b_i(\hat\psi)$ being the corresponding estimate evaluated at the first-step estimates $\hat\psi$. So now we simply wish to test whether $\gamma = 0$ in the regression

\[ \hat b_i = z_i'\gamma + \mathrm{error}_i. \tag{1.3} \]

Our test statistic will be $\hat\gamma'[\widehat{\mathrm{Var}}(\hat\gamma)]^{-1}\hat\gamma$, where $\hat\gamma$ is the least squares estimate from (1.3), and this should be asymptotically $\chi^2$ if $\mathrm{Var}(\hat\gamma)$ is properly calculated.

This is a "generated dependent variable" problem that can be analyzed by methods similar to those used for the "generated regressor" problem (e.g., Wooldridge (2002)[pp. 139-141]). We have $\hat b_i = b_i(\hat\psi) = f(y_i, x_i, \hat\psi)$ and $b_i = b_i(\psi) = f(y_i, x_i, \psi)$. By the Mean Value Theorem,

\[ \hat b_i = b_i + \nabla_\psi f(y_i, x_i, \bar\psi)'(\hat\psi - \psi), \tag{1.4} \]

where $\bar\psi$ is between $\hat\psi$ and $\psi$. Therefore

\[ \sqrt N\,\hat\gamma = \Big(N^{-1}\sum_{i=1}^N z_i z_i'\Big)^{-1}\Big[N^{-1/2}\sum_{i=1}^N z_i b_i + N^{-1}\sum_{i=1}^N z_i \nabla_\psi f(y_i, x_i, \bar\psi)'\,\sqrt N(\hat\psi - \psi)\Big]. \tag{1.5} \]

From the last line of equation (1.5), we can see immediately that the term involving the estimation error in $\hat\psi$ will be relevant unless $E[z_i \nabla_\psi f(y_i, x_i, \psi)'] = 0$. (In this exceptional case, $N^{-1}\sum_{i=1}^N z_i \nabla_\psi f(y_i, x_i, \bar\psi)' \to_p 0$ and the last term vanishes. Otherwise it does not.)

To proceed further, we use the same device as in Wooldridge (2002), and assume that

\[ \sqrt N(\hat\psi - \psi) = N^{-1/2}\sum_{i=1}^N r_i(\psi) + o_p(1), \tag{1.6} \]

where $E\,r_i(\psi) = 0$. We will be more specific about the form of $r_i(\psi)$ below. Then

\[ \sqrt N\,\hat\gamma = \Big(N^{-1}\sum_{i=1}^N z_i z_i'\Big)^{-1} N^{-1/2}\sum_{i=1}^N (z_i b_i + G r_i) + o_p(1). \tag{1.7} \]

It follows from a central limit theorem applied to (1.7) that

\[ \sqrt N\,\hat\gamma \to N(0, B^{-1}AB^{-1}), \tag{1.8} \]

where

\[ B = E z_i z_i', \tag{1.9a} \]
\[ A = E[(z_i b_i + G r_i)(z_i b_i + G r_i)'], \tag{1.9b} \]
\[ G = E z_i \nabla_\psi f(y_i, x_i, \psi)'. \tag{1.9c} \]

Also, all of these quantities can be consistently estimated by the corresponding sample quantities: $\hat B = N^{-1}\sum_{i=1}^N z_i z_i'$, $\hat A = N^{-1}\sum_{i=1}^N (z_i \hat b_i + \hat G \hat r_i)(z_i \hat b_i + \hat G \hat r_i)'$, $\hat G = N^{-1}\sum_{i=1}^N z_i \nabla_\psi f(y_i, x_i, \hat\psi)'$.

The remaining detail is an expansion for $r_i$. The first-step MLE $\hat\psi$ satisfies $\sum_{i=1}^N s_i(\hat\psi) = 0$, where $s_i(\psi)$ is the score function for observation $i$. (That is, $s_i(\psi)$ is the derivative with respect to $\psi$ of the $i$th observation's contribution to the log likelihood.) Then another Mean Value Theorem expansion yields

\[ 0 = \sum_{i=1}^N s_i(\hat\psi) = \sum_{i=1}^N s_i(\psi) + \Big[\sum_{i=1}^N \nabla_\psi s_i(\bar\psi)\Big](\hat\psi - \psi), \tag{1.10} \]

where $\bar\psi$ is between $\hat\psi$ and $\psi$. So

\[ \sqrt N(\hat\psi - \psi) = \frac{1}{\sqrt N}\sum_{i=1}^N \mathcal I^{\circ -1} s_i(\psi) + o_p(1), \tag{1.11} \]

where

\[ \mathcal I^\circ = E\,s_i(\psi)s_i(\psi)' = -E\,\nabla_\psi s_i(\psi) = \lim_{N\to\infty} N^{-1}\mathcal I, \tag{1.12} \]

and

\[ \mathcal I = E(\nabla_\psi \ln L)(\nabla_\psi \ln L)' = -E\,\nabla^2_\psi \ln L. \tag{1.13} \]

$\mathcal I$ is the information matrix for the first-step MLE problem with log-likelihood $\ln L$, and $\mathcal I^\circ$ is the limiting information matrix. In terms of the score, $\mathcal I = \sum_{i=1}^N E\,s_i(\psi)s_i(\psi)' = -\sum_{i=1}^N E\,\nabla_\psi s_i(\psi)$. Therefore, in (1.6) and the subsequent expressions above, $r_i(\psi) = \mathcal I^{\circ -1} s_i(\psi)$. In terms of sample quantities, $\hat r_i = \hat{\mathcal I}^{\circ -1} s_i(\hat\psi)$, where $\hat{\mathcal I}^\circ = N^{-1}\sum_{i=1}^N s_i(\hat\psi)s_i(\hat\psi)'$.

We note two things. First, the standard (naive) test of $\gamma = 0$ that ignores the effect of estimation error in $\hat b_i$ corresponds to omitting the terms corresponding to $G r_i$ in (1.9b). This test will be invalid unless $G = 0$. Since $G = E z_i \nabla_\psi f(y_i, x_i, \psi)'$, this condition will hold if $z_i$ is independent of $x_i$ as well as of $v_i$ and $u_i$. However, it will generally fail if $z_i$ and $x_i$ are correlated. Second, the "correct" test is not difficult; a schematic implementation is sketched below. However, unsurprisingly, the form of the correction depends on the distribution of $u_i$, since that influences the nature of the first-step MLE problem. There is no simple, distribution-free correction.
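To make the construction concrete, the following sketch assembles the corrected statistic $\hat\gamma'[\widehat{\mathrm{Var}}(\hat\gamma)]^{-1}\hat\gamma$ with $\widehat{\mathrm{Var}}(\hat\gamma) = \hat B^{-1}\hat A\hat B^{-1}/N$ from the ingredients above. This is our own illustrative code, not part of the original study: the function name is hypothetical, and we assume the user supplies the first-step scores $s_i(\hat\psi)$ and the gradients $\nabla_\psi f(y_i, x_i, \hat\psi)$ (computed analytically or numerically).

```python
import numpy as np
from scipy import stats

def gdv_test(z, b_hat, grad_f, scores):
    """Corrected two-step ("GDV") test of gamma = 0 in regression (1.3).

    z      : (N, q) array of firm characteristics z_i
    b_hat  : (N,)   estimates b_i(psi_hat) = E(u_i | eps_i) - E(u_i)
    grad_f : (N, p) rows are the gradient of f(y_i, x_i, psi) w.r.t. psi
    scores : (N, p) rows are the first-step scores s_i(psi_hat)
    """
    N, q = z.shape
    gamma = np.linalg.solve(z.T @ z, z.T @ b_hat)  # OLS estimate from (1.3)
    B = z.T @ z / N                                # B_hat, eq. (1.9a)
    G = z.T @ grad_f / N                           # G_hat, eq. (1.9c)
    I0 = scores.T @ scores / N                     # OPG estimate of the limiting information
    r = scores @ np.linalg.inv(I0)                 # rows r_i = I0^{-1} s_i (I0 is symmetric)
    m = z * b_hat[:, None] + r @ G.T               # rows z_i b_i + G r_i
    A = m.T @ m / N                                # A_hat, eq. (1.9b)
    Binv = np.linalg.inv(B)
    V = Binv @ A @ Binv / N                        # Var(gamma_hat)
    stat = float(gamma @ np.linalg.solve(V, gamma))
    return stat, stats.chi2.sf(stat, df=q)
```

Under the null the statistic is asymptotically $\chi^2$ with degrees of freedom equal to the dimension of $z_i$; dropping the `r @ G.T` term inside `m` reproduces the naive (invalid) test.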
1.3 The Scaled Exponential Case

In this section we consider the special case that $u_i$ follows a scaled exponential distribution. That is, $u_i = \exp(z_i'\delta) \cdot u_i^*$, as in (1.2), where $u_i^*$ is distributed as exponential with parameter $\lambda$. We will derive the LM test of the hypothesis $\delta = 0$, and show that it is asymptotically equivalent to the (corrected) two-step procedure of the last section. This shows that there is at least one case in which the two-step procedure does not entail any loss of (local) power, compared to the usual Wald-likelihood ratio-LM trinity of tests.

For the normal-scaled exponential model we consider, the pdf of the composite error ($\varepsilon_i = v_i - u_i$) is

\[ f(\varepsilon_i) = \frac{1}{\lambda\exp(z_i'\delta)}\exp\!\Big(\frac{\varepsilon_i}{\lambda\exp(z_i'\delta)} + \frac{\sigma_v^2}{2\lambda^2\exp(2z_i'\delta)}\Big)\,\Phi\!\Big(-\Big(\frac{\varepsilon_i}{\sigma_v} + \frac{\sigma_v}{\lambda\exp(z_i'\delta)}\Big)\Big), \tag{1.14} \]

where $\Phi$ is the cumulative distribution function of the standard normal distribution. Note that under the null of $\delta = 0$, $E(\varepsilon_i) = -E(u_i) = -\lambda$ and $\mathrm{Var}(\varepsilon_i) = \sigma_v^2 + \lambda^2$. Also, the distribution of $u_i$ given $\varepsilon_i$ is $N(-\varepsilon_i - \sigma_v^2/(\lambda\exp(z_i'\delta)), \sigma_v^2)^+$, where "+" represents truncation on the left at zero. From (1.14), it follows that the log-likelihood function $\ln L(\delta, \beta, \sigma_v^2, \lambda^2) = \ln L(\theta)$ is given by

\[ \ln L(\theta) = -\sum_{i=1}^N \ln(\lambda e^{z_i'\delta}) + \sum_{i=1}^N \frac{\varepsilon_i}{\lambda e^{z_i'\delta}} + \sum_{i=1}^N \frac{\sigma_v^2}{2\lambda^2 e^{2z_i'\delta}} + \sum_{i=1}^N \ln\Phi\!\Big(-\Big(\frac{\varepsilon_i}{\sigma_v} + \frac{\sigma_v}{\lambda e^{z_i'\delta}}\Big)\Big). \tag{1.15} \]

The generic form of the LM statistic is

\[ \mathrm{LM} = \nabla_\theta \ln L(\tilde\theta)' \cdot \mathcal I^{-1}(\tilde\theta) \cdot \nabla_\theta \ln L(\tilde\theta). \tag{1.16} \]

Here $\tilde\theta$ is the MLE subject to the restriction $\delta = 0$; $\mathcal I(\tilde\theta)$ is the information matrix evaluated at $\theta = \tilde\theta$; and $\nabla_\theta \ln L(\tilde\theta)$ is the score function $\nabla_\theta \ln L(\theta)$ evaluated at $\theta = \tilde\theta$. If we partition $\theta = (\delta', \psi')'$, where $\psi = (\beta', \sigma_v^2, \lambda^2)'$, then

\[ \nabla_\theta \ln L(\theta) = \begin{bmatrix} \nabla_\delta \ln L(\theta) \\ \nabla_\psi \ln L(\theta) \end{bmatrix}, \qquad \mathcal I(\theta) = \begin{bmatrix} \mathcal I_{\delta\delta} & \mathcal I_{\delta\psi} \\ \mathcal I_{\psi\delta} & \mathcal I_{\psi\psi} \end{bmatrix}. \tag{1.17} \]

It is a standard result that $\nabla_\theta \ln L(\tilde\theta)$ is equal to zero for those elements of $\theta$ that are unrestricted. That is, $\nabla_\psi \ln L(\tilde\theta) = 0$. Therefore

\[ \mathrm{LM} = \nabla_\delta \ln L(\tilde\theta)' \cdot [\mathcal I^{-1}(\tilde\theta)]^{\delta\delta} \cdot \nabla_\delta \ln L(\tilde\theta) = \nabla_\delta \ln L(\tilde\theta)' \cdot [\tilde{\mathcal I}_{\delta\delta} - \tilde{\mathcal I}_{\delta\psi}\tilde{\mathcal I}_{\psi\psi}^{-1}\tilde{\mathcal I}_{\psi\delta}]^{-1} \cdot \nabla_\delta \ln L(\tilde\theta), \tag{1.18} \]

where $\tilde{\mathcal I}_{*,*}$ stands for the $*,*$ block of $\mathcal I$, evaluated at $\theta = \tilde\theta$. A straightforward calculation reveals that

\[ \nabla_\delta \ln L(\tilde\theta) = \sum_{i=1}^N z_i \Big(\frac{\tilde\sigma_v\tilde\xi_i}{\tilde\lambda} - \frac{\tilde\varepsilon_i}{\tilde\lambda} - \frac{\tilde\sigma_v^2}{\tilde\lambda^2} - 1\Big), \tag{1.19} \]

where $\tilde\xi_i = \phi(\tilde\varepsilon_i/\tilde\sigma_v + \tilde\sigma_v/\tilde\lambda)(1 - \Phi(\tilde\varepsilon_i/\tilde\sigma_v + \tilde\sigma_v/\tilde\lambda))^{-1}$ and $\phi$ is the pdf of the standard normal distribution. Note that

\[ \nabla_\delta \ln L(\tilde\theta) = \frac{1}{\tilde\lambda}\sum_{i=1}^N z_i \Big(\tilde\sigma_v\tilde\xi_i - \tilde\varepsilon_i - \frac{\tilde\sigma_v^2}{\tilde\lambda} - \tilde\lambda\Big) = \frac{1}{\tilde\lambda}\sum_{i=1}^N z_i \tilde b_i, \tag{1.20} \]

where $\tilde b_i = \tilde\sigma_v\tilde\xi_i - \tilde\varepsilon_i - \tilde\sigma_v^2/\tilde\lambda - \tilde\lambda$ is $(E(u_i|\varepsilon_i) - E(u_i)) \equiv b_i$ evaluated at $\tilde\theta$. (This follows because $E(u_i|\varepsilon_i) = \sigma_v(\xi_i - (\varepsilon_i/\sigma_v + \sigma_v/\lambda))$ while $E(u_i) = \lambda$.)

Note that, apart from the scalar $1/\tilde\lambda$, $\nabla_\delta \ln L(\tilde\theta)$ equals the numerator of $\sqrt N\hat\gamma = (N^{-1}\sum_{i=1}^N z_i z_i')^{-1} N^{-1/2}\sum_{i=1}^N z_i \tilde b_i$. So the LM test must be asymptotically equivalent to a properly constructed test based on the two-step estimator $\hat\gamma$. Some further algebraic details of this equivalence are given in the Appendix. Basically, the naive test that ignores the effects of estimation error in $\hat\psi$ would correspond to omitting the terms $\tilde{\mathcal I}_{\delta\psi}\tilde{\mathcal I}_{\psi\psi}^{-1}\tilde{\mathcal I}_{\psi\delta}$ in (1.18); these terms correspond to the same correction as was created by the terms $G r_i$ in (1.9b) above.
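For concreteness, here is a minimal numerical sketch (ours, with illustrative names, not part of the original chapter) of the restricted ($\delta = 0$) log-likelihood implied by (1.15), and of the score (1.20). It assumes the normal-exponential specification and works with $\ln\Phi(\cdot)$ on the log scale for numerical stability.

```python
import numpy as np
from scipy.stats import norm

def loglik_exp(params, y, X):
    """Restricted (delta = 0) normal-exponential log-likelihood from (1.15).
    params = (beta, sigma_v, lambda); eps_i = y_i - x_i' beta."""
    k = X.shape[1]
    beta, sv, lam = params[:k], params[k], params[k + 1]
    eps = y - X @ beta
    a = eps / sv + sv / lam
    return np.sum(-np.log(lam) + eps / lam + sv**2 / (2 * lam**2)
                  + norm.logcdf(-a))

def score_delta(params, y, X, z):
    """Score for delta at delta = 0, eq. (1.20): (1/lam) * sum_i z_i * b_i."""
    k = X.shape[1]
    beta, sv, lam = params[:k], params[k], params[k + 1]
    eps = y - X @ beta
    a = eps / sv + sv / lam
    xi = np.exp(norm.logpdf(a) - norm.logcdf(-a))  # phi(a) / (1 - Phi(a))
    b = sv * xi - eps - sv**2 / lam - lam          # b_i = E(u|eps) - E(u)
    return (z * b[:, None]).sum(axis=0) / lam
```

Apart from the factor $1/\tilde\lambda$, the vector returned by `score_delta` is exactly $\sum_i z_i\tilde b_i$, the numerator of the two-step estimator, which is the algebraic heart of the equivalence result.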
This section's result (that the LM test is asymptotically equivalent to a properly constructed test based on a two-step procedure) holds for the case that $u_i$ is exponential with a scaling factor of the form $\exp(z_i'\delta)$. So far as we can determine, it does not hold for the scaled half-normal case. If it does not, then in the half-normal case we would expect the LM test to be better (in the sense of asymptotic local power) than the two-step test of the last section. An interesting question for further research is whether we can identify a class of distributions for which a result like the present one holds.

1.4 A Test Based on Nonlinear Least Squares

In this section we continue to assume that the stochastic frontier model (1.1) is correct. We further assume that the scaling property (1.2), with an exponential scaling function, holds, so $u_i = \exp(z_i'\delta) \cdot u_i^*$. However, now we do not make any specific distributional assumption about the $u_i^*$. We simply assume that they are i.i.d. and independent of $x_i$, $z_i$ and $v_i$. Let $\mu \equiv E(u_i^*) = E(u_i^*|x_i, z_i)$. Then

\[ E(y_i|x_i, z_i) = x_i'\beta - \mu\cdot\exp(z_i'\delta), \tag{1.21} \]

or equivalently

\[ y_i = x_i'\beta - \mu\cdot\exp(z_i'\delta) + w_i, \tag{1.22} \]

where $E(w_i|x_i, z_i) = 0$. This model can be estimated consistently by nonlinear least squares, as has been noted by Simar, Lovell, and Vanden Eeckaut (1994), Wang and Schmidt (2002) and others. This raises the question of whether we can test the hypothesis $\delta = 0$ based on the nonlinear least squares regression. There is a non-trivial problem because the parameter $\mu$ is not identified (separately from the intercept in the regression) when $\delta = 0$. To see this clearly, we explicitly distinguish the intercept from the rest of $x_i$: $x_i' = (1, x_i^{*\prime})$, $\beta' = (\alpha, \beta^{*\prime})$, so that (1.22) becomes

\[ y_i = \alpha + x_i^{*\prime}\beta^* - \mu\cdot\exp(z_i'\delta) + w_i. \tag{1.23} \]

Alternatively we can write this as

\[ y_i = (\alpha - \mu) + x_i^{*\prime}\beta^* + \mu(1 - \exp(z_i'\delta)) + w_i. \tag{1.24} \]

From (1.24) it is clear that $(\alpha - \mu)$ is identified, but $\mu$ is identified only when $\delta \neq 0$. In cases such as this, in which some parameters ("nuisance parameters") are not identified under the null hypothesis, standard tests like the Wald test or the likelihood ratio test are not asymptotically valid. A standard reference on this problem is Hansen (1996). A Wald test in this context would consist of estimating $\delta$ and then testing whether it is significantly different from zero, using a statistic of the form $\hat\delta'[\widehat{\mathrm{Var}}(\hat\delta)]^{-1}\hat\delta$, where $\hat\delta$ is the NLLS estimate and $\mathrm{Var}(\hat\delta)$ is the asymptotic variance matrix of $\hat\delta$. Such a test is not valid in this context because the usual $\mathrm{Var}(\hat\delta)$ that would be valid when $\delta \neq 0$ is not valid when $\delta = 0$, because of the non-identification of $\mu$.

It is interesting that for our problem (though not for general problems) an asymptotically valid test can be derived from the LM (or score) test principle. We follow the discussion in Wooldridge (2002)[pp. 363-369]. Let the NLLS criterion function be

\[ Q_N(\theta) = \frac{1}{N}\sum_{i=1}^N q(w_i, \theta) = \frac{1}{2N}\sum_{i=1}^N (y_i - x_i'\beta + \mu\exp(z_i'\delta))^2, \tag{1.25} \]

where $\theta$ represents $\beta$, $\mu$ and $\delta$, and $w_i$ represents $y_i$, $x_i$ and $z_i$. Then the LM or score test is based on the quantity $\nabla_\delta Q_N(\tilde\theta)$, that is, on the derivative of $Q_N(\theta)$ with respect to $\delta$, evaluated at the restricted estimates $\tilde\theta$. We might expect this approach to fail here because $\tilde\mu$ is not well defined. However, this turns out not to matter. Doing the appropriate calculation,

\[ \nabla_\delta Q_N(\theta) = \frac{1}{N}\sum_{i=1}^N (y_i - x_i'\beta + \mu\exp(z_i'\delta))\,\mu\exp(z_i'\delta)\,z_i', \tag{1.26} \]

and therefore (since $\tilde\delta = 0$),
\[ \nabla_\delta Q_N(\tilde\theta) = \frac{1}{N}\sum_{i=1}^N \big(y_i - (\tilde\alpha - \tilde\mu) - x_i^{*\prime}\tilde\beta^*\big)\,\tilde\mu\,z_i' = \frac{\tilde\mu}{N}\sum_{i=1}^N \tilde w_i z_i'. \tag{1.27} \]

Here $\tilde\theta = ((\tilde\alpha - \tilde\mu), \tilde\beta^{*\prime})'$ is just the coefficient vector in a regression of $y$ on $X$, and $\tilde w_i = y_i - x_i'\tilde\theta$. In matrix form, the sum in (1.27) is equal to $y'M_X(\tilde\mu Z)$, where $M_X = I - X(X'X)^{-1}X'$ is the projection orthogonal to $X$. Note that if we regressed $y$ on $[X, \tilde\mu Z]$, the coefficients of $\tilde\mu Z$ would be $[(\tilde\mu Z)'M_X(\tilde\mu Z)]^{-1}(\tilde\mu Z)'M_X y$, so that the sum in (1.27) is equal to the random (numerator) portion of this coefficient. Therefore the LM statistic will be equivalent to an F-statistic for the significance of the coefficients (say, $\zeta$) of $(\tilde\mu z_i)$ in the regression

\[ y_i = x_i'\beta + (\tilde\mu z_i)'\zeta + \mathrm{error}_i. \tag{1.28} \]

Now, the essential point is that this F-statistic is invariant to any non-zero value of $\tilde\mu$. That is, $\tilde\mu$ is just a scale factor for $z_i$, and changing $\tilde\mu$ is like changing the units of measurement of $z_i$. It does not affect the value of the F-statistic. (If we double $\tilde\mu$, this will cause $\hat\zeta$ to be divided by two, and $\mathrm{Var}(\hat\zeta)$ to be divided by four, so the scale factor "two" cancels from the test statistic.) So we can just set $\tilde\mu = 1$, and calculate the LM statistic as the F-statistic for the significance of the coefficients of $z_i$ in a regression of $y_i$ on $[x_i, z_i]$. This is an intuitively reasonable result because, under the null hypothesis being tested, $E(y|X, Z)$ does not depend on $Z$.

An interesting and relevant fact is that the same test statistic would result if we replaced the exponential scaling function $\exp(z_i'\delta)$ by any scaling function $g(z_i'\delta)$, where $g$ is monotonic and differentiable at zero. The same derivation as above leads us to a regression of $y_i$ on $x_i$ and $\mu g'(0)z_i$, or equivalently a regression of $y_i$ on $x_i$ and $z_i$. This is relevant because it suggests that the OLS-based test may have reasonable power against a variety of alternatives (different scaling functions), whereas the power properties of the MLE-based tests when the scaling function is misspecified are not at all clear.

We note that, if $v_i$ and $u_i^*$ are i.i.d., the error in (1.27) is homoskedastic under the null hypothesis. Nevertheless it is possible to consider a heteroskedasticity-robust test. We simply have to use the heteroskedasticity-robust variance matrix of White (1980); see Wooldridge (2002)[pp. 55-58] for details.

Another thing to note is the following. The test above is the F-test for the significance of the coefficients of $z$ in a regression of $y$ on $x$ and $z$. This is the same as the F-test for the significance of the coefficients of $z$ in a regression of $\tilde w$ on $x$ and $z$, where as in (1.27) above $\tilde w = y - X\tilde\theta$. It is essential that $x$ be included in this regression, even though $x$ is orthogonal to $\tilde w$. If we regressed $\tilde w$ on $z$ only and did an F-test, this test would not be valid, even asymptotically.

1.5 Simulations: Experimental Design

We wish to perform simulations to investigate the size and power properties of the tests derived in the previous sections. The data generating process for our simulations is as follows:

\[ y_i = \alpha + \beta x_i + v_i - \exp(\delta z_i)\,u_i^* = \alpha + \beta x_i - \lambda\exp(\delta z_i) + w_i, \qquad i = 1, \dots, N, \tag{1.29} \]

where $w_i = v_i - \exp(\delta z_i)(u_i^* - \lambda)$. All random draws are independent over $i$. The explanatory variables $x_i$ and $z_i$ are both scalars, and $(x_i, z_i)'$ is standard bivariate normal with correlation $\rho$. The $v_i$ are distributed as $N(0, \sigma_v^2)$ and the $u_i^*$ are distributed as exponential with parameter $\lambda$. The random variables $(x_i, z_i)'$, $v_i$ and $u_i^*$ are mutually independent. The set of parameters is therefore $\alpha$, $\beta$, $\delta$, $\sigma_v^2$, $\lambda$, $\rho$ and $N$. (A sketch of one replication of this design, combined with the OLS-based test of Section 1.4, is given below.)
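As an illustration, the following sketch draws one sample from (1.29) and applies the OLS-based test of Section 1.4 (a t-test on the coefficient of $z_i$ in a regression of $y_i$ on $(1, x_i, z_i)$). The code is ours and purely illustrative; the function name and parameter defaults mirror the base case but are not code from the study.

```python
import numpy as np
from scipy import stats

def simulate_and_ols_test(N=200, alpha=0.0, beta=0.0, delta=0.0,
                          sv2=1.0, lam=1.0, rho=0.5, rng=None):
    """One draw from DGP (1.29) plus the OLS-based score test of Section 1.4."""
    rng = np.random.default_rng() if rng is None else rng
    # (x_i, z_i)' standard bivariate normal with correlation rho
    xz = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=N)
    x, z = xz[:, 0], xz[:, 1]
    v = rng.normal(0.0, np.sqrt(sv2), size=N)
    u0 = rng.exponential(scale=lam, size=N)        # u_i^*, exponential with mean lam
    y = alpha + beta * x + v - np.exp(delta * z) * u0
    # OLS of y on (1, x, z); t-test on the coefficient of z
    W = np.column_stack([np.ones(N), x, z])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    s2 = resid @ resid / (N - W.shape[1])
    var_coef = s2 * np.linalg.inv(W.T @ W)
    t_z = coef[2] / np.sqrt(var_coef[2, 2])
    pval = 2 * stats.norm.sf(abs(t_z))             # normal critical values, as in the text
    return t_z, pval
```

Repeating this over many replications and counting rejections at the 5% level reproduces the kind of size and power calculations reported in the output tables.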
We chose a "base case" set of parameters as follows:

\[ \alpha = 0,\; \beta = 0,\; \delta = 0,\; \sigma_v^2 = 1,\; \lambda = 1,\; \rho = 0.5,\; N = 200. \tag{1.30} \]

We will then change these parameter values, as described below, in our experiments. We consider the following tests.

WALD. For the WALD test we estimate (1.29) by MLE and then test whether $\hat\delta$ is significantly different from zero. Specifically, the WALD statistic is given by

\[ \mathrm{WALD} = \hat\delta^2\,(\hat{\mathcal I}_{\delta\delta} - \hat{\mathcal I}_{\delta\psi}\hat{\mathcal I}_{\psi\psi}^{-1}\hat{\mathcal I}_{\psi\delta}), \tag{1.31} \]

where the notation is the same as in Section 1.3. Two different versions of the WALD statistic are computed: WALD-OPG uses the OPG (outer product of the gradient) estimate of the information matrix, while WALD-HES uses the negative Hessian estimate of the information matrix.

LM. This is the LM statistic discussed in Section 1.3. The statistic is given by

\[ \mathrm{LM} = \frac{1}{\tilde\lambda^2}\Big(\sum_{i=1}^N z_i\tilde b_i\Big)'\big(\tilde{\mathcal I}_{\delta\delta} - \tilde{\mathcal I}_{\delta\psi}\tilde{\mathcal I}_{\psi\psi}^{-1}\tilde{\mathcal I}_{\psi\delta}\big)^{-1}\Big(\sum_{i=1}^N z_i\tilde b_i\Big). \tag{1.32} \]

Once again we have different versions, depending on how the information matrix is estimated: LM-OPG and LM-HES are analogous to WALD-OPG and WALD-HES.

GDV. This is the "generated dependent variable" test discussed in Section 1.2. More specifically,

\[ \mathrm{GDV} = \sqrt N\hat\gamma\,\big[\widehat{\mathrm{Var}}(\sqrt N\hat\gamma)\big]^{-1}\sqrt N\hat\gamma = \Big(\sum_{i=1}^N \tilde b_i z_i\Big)^2 \Big/ \sum_{i=1}^N (\tilde b_i z_i + \tilde G\tilde r_i)^2. \tag{1.33} \]

Here $\tilde r_i = \tilde{\mathcal I}^{-1}s_i(\tilde\psi)$, where $\tilde{\mathcal I}$ is the negative Hessian form of the information matrix for the first-step MLE, as in (1.11) above. We also consider the test BADGDV, which is the invalid test based on regression (1.3) above and which ignores the estimation error in $\hat\psi$.

OLS. This is the set of tests discussed in Section 1.4. OLS refers to the standard F-test for significance of the coefficients of $z_i$ in a regression of $y_i$ on $(1, x_i, z_i)$. This reduces to a t-test in the present case, since $z_i$ is scalar. We use critical values based on the standard normal distribution rather than the t-distribution, but for our values of $N$ this makes essentially no difference. OLS-H is the heteroskedasticity-robust version of the test. BADOLS is the invalid test based on the t-statistic for the significance of the coefficient of $z_i$ when $\tilde w_i$ is regressed on $z_i$ (without intercept or $x_i$ in the regression), as discussed at the end of Section 1.4. BADOLS-H is the heteroskedasticity-robust version of BADOLS.

The number of replications in the experiment was 10,000, except for a few cases noted below. The outputs of the experiments are as follows. For each of the parameter estimates, we calculated the mean, standard deviation, and MSE. For the MLE of the full model (needed for the WALD test calculations), the parameters estimated are $\alpha$, $\beta$, $\delta$, $\sigma_v^2$ and $\lambda$. For the MLE of the model subject to the restriction $\delta = 0$ (needed for the LM and GDV test calculations), the parameters estimated are $\alpha$, $\beta$, $\sigma_v^2$ and $\lambda$. Note that, in the output tables, we report the mean, standard deviation, and MSE of the estimates of $\lambda^2$, not $\lambda$, for an easier comparison with the estimates of $\sigma_v^2$. For the NLLS estimates under the restriction that $\delta = 0$ (which is just OLS of $y_i$ on $x_i$, and is needed for the OLS test calculations), the parameters estimated are $\eta = \alpha - \lambda$, $\beta$ and $\sigma_w^2 = \sigma_v^2 + \lambda^2$.

We also calculated the mean, standard deviation and MSE of the technical efficiency estimates for the MLE and the restricted MLE. The technical efficiency of firm $i$ is $TE_i = \exp(-u_i)$, and the technical efficiency estimate (Battese and Coelli (1988)) is

\[ \widehat{TE}_i = \hat E(\exp(-u_i)|\hat\varepsilon_i) = \frac{\Phi\big(-\hat\sigma_v - \hat\varepsilon_i/\hat\sigma_v - \hat\sigma_v/(\hat\lambda\exp(\hat\delta z_i))\big)}{\Phi\big(-\hat\varepsilon_i/\hat\sigma_v - \hat\sigma_v/(\hat\lambda\exp(\hat\delta z_i))\big)}\exp\Big(\frac{\hat\sigma_v^2}{2} + \hat\varepsilon_i + \frac{\hat\sigma_v^2}{\hat\lambda\exp(\hat\delta z_i)}\Big). \tag{1.34} \]

Here $\varepsilon_i = v_i - u_i = y_i - \alpha - \beta x_i$, and $\widehat{TE}_i$ is the expression (1.34) evaluated at the MLE estimates. (A code transcription of (1.34) is given below.)
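Expression (1.34) transcribes directly into code. The sketch below is ours (illustrative names; `eps` holds the composite residuals $\hat\varepsilon_i$), and works on the log scale to avoid underflow in the ratio of normal cdfs.

```python
import numpy as np
from scipy.stats import norm

def te_estimate(eps, z, sv, lam, delta):
    """Battese-Coelli estimate (1.34): E[exp(-u_i) | eps_i] at the estimates.

    eps : composite residuals eps_i = y_i - alpha - beta * x_i
    s   : mean of the scaled exponential, lam * exp(delta * z_i)
    """
    s = lam * np.exp(delta * z)
    a = eps / sv + sv / s          # u | eps ~ N(-eps - sv^2/s, sv^2), truncated at zero
    log_ratio = norm.logcdf(-sv - a) - norm.logcdf(-a)
    return np.exp(log_ratio + sv**2 / 2 + eps + sv**2 / s)
```

Setting `delta = 0` gives the estimate under the restricted MLE; the values $\widehat{TE}_i$ are then compared with the true $TE_i = \exp(-u_i)$ in the tables below.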
By the law of iterated expectations, $E(\widehat{TE}_i) = E\exp(-u_i)$. However, for the calculation of MSE we average the squared deviations of $\widehat{TE}_i$ from $TE_i = \exp(-u_i)$, not from $E\exp(-u_i)$. The mean, standard deviation and MSE for $\widehat{TE}_i$ are calculated by averaging across observations ($i = 1, \dots, N$) as well as across replications. We also report the correlation of $TE_i$ and $\widehat{TE}_i$; this is the average across replications of the correlation coefficient for a given replication.

For the tests, we calculated the proportion of rejections, which is interpreted as size (if $\delta = 0$) or power (if $\delta \neq 0$). The size (or power) is calculated in four ways. Size1 uses all 10,000 replications. Size2 drops replications in which there was a numerical failure in the calculation of the WALD or LM statistics, due to outliers in the estimates. Outliers are defined as $|\hat\delta| \ge 16$, $\hat\sigma_v \le 10^{-7}$ or $\hat\sigma_v \ge 37$, or $\hat\lambda \le 10^{-7}$ or $\hat\lambda \ge 37$. Size3 drops replications with negative LM statistics; these may occur when the maximization algorithm fails to reach the global maximum. Finally, Size4 drops any replication dropped by either the Size2 or the Size3 calculation. We also report the means and standard deviations of the test statistics; this calculation was done over the same set of replications used to calculate Size4.

Many of the replications discarded in Size2 and Size4 are ones in which the variance parameters ($\sigma_v^2$ and $\lambda^2$) and $\delta$ are poorly estimated. Very small values of $\lambda^2$ tended to go with very large values of $\hat\delta$, as the likelihood calculation seemed to try to accommodate the presence of the one-sided error $\exp(\delta z_i)\,u_i^*$ by balancing a small variance of $u_i^*$ with a large value of $\exp(\delta z_i)$. In these cases the variance of $\hat\delta$ is also hard to calculate, and it is just not clear whether or not they constitute evidence against the null that $\delta = 0$. Dropping these cases primarily reduces the number of rejections for the WALD tests. However, except for a few parameter values (e.g., very large $\sigma_v^2$), not enough replications were dropped to make much difference.

1.6 Simulation Results: Size

In this section we investigate the size of the tests. Therefore all of the cases considered have $\delta = 0$, so that the null hypothesis is true. All of the tests except BADGDV, BADOLS and BADOLS-H (which we will call the BAD tests for short) are valid asymptotically, but we are interested in how substantial their size distortions may be in finite samples.

1.6.1 Base case

We first consider the base case: $\alpha = \beta = \delta = 0$, $\sigma_v^2 = \lambda = 1$, $\rho = 0.5$, and $N = 200$. The results are given in Table 1.1. The results for the point estimates are fairly unremarkable. There is little or no evidence of finite-sample bias. The restricted MLEs are better than the unrestricted MLEs, in terms of standard deviation and MSE, but the differences are quite small.

The sizes of the various tests differ fairly substantially from each other. All of the BAD tests are indeed bad, in the sense of size substantially less than 5%. However, some of the asymptotically valid tests also have sizes that are substantially different from 5%. The WALD tests are substantially undersized. Conversely, the LM-OPG test rejects too often. The LM-HES, GDV and OLS tests have size fairly close to 5%, and the OLS-H test is only slightly worse than those three.

1.6.2 Effects of changing $\alpha$ or $\beta$

Changes in $\alpha$ or $\beta$ would not be expected to change the results, and this is true in the following sense.
We did one simulation with the same parameters as in the base case except that $\alpha = 1$, and another simulation with the same parameters except that $\beta = 1$. These changes did not change the size of any of the tests, and the only effect on the point estimates was to shift the mean value of $\hat\alpha$ or $\hat\beta$ by one.

1.6.3 Effects of changing $N$

Next we considered parameter values that were the same as in the base case, except that we changed $N$ to $N = 500$ (Table 1.2) and $N = 1000$ (Table 1.3). When we increase $N$, we reduce the standard deviation and MSE of the various parameter estimates, as expected. However, it is notable that we do not increase the precision of the technical efficiency estimates, except perhaps trivially. To understand why, recall that the technical efficiency estimate is the expectation of $\exp(-u)$ conditional on $(v - u)$, evaluated at the estimated values of the parameters. The variance of this estimate depends on (i) "intrinsic variability," by which we mean the variance of $\exp(-u)$ conditional on $(v - u)$, which does not depend on $N$, and (ii) "sampling error," by which we mean the variance of the parameter estimates, which does depend on $N$. Apparently, even for $N = 200$, sampling error is quite small relative to intrinsic variability.

As would be expected, increasing $N$ does not reduce the size distortions of the BAD tests, but it does improve the asymptotically valid tests. For $N = 500$ we have the same pattern of size distortions as we observed for $N = 200$, but they are much smaller. Also, the various types of numerical failures that distinguish Size1 from Size2, Size3 and Size4 have largely disappeared. For $N = 1000$ all of the asymptotically valid tests have reasonably accurate size; the worst is LM-OPG, with size of 5.77%. The good news in this statement is that the tests behave as they should asymptotically. The bad news is that $N = 1000$ would be a very large sample size indeed for the type of efficiency measurement exercise that is considered here.

1.6.4 Effects of changing $\rho$

Now we consider changes in $\rho$, the correlation between $x$ and $z$. The question of interest is whether strong correlation between $x$ and $z$ creates difficulties (akin to multicollinearity) in estimation and whether this affects the tests. In the base case we had $\rho = 0.5$, and now we keep the rest of the base-case parameters but consider $\rho = -0.5$ (Table 1.4), $\rho = 0$ (Table 1.5) and $\rho = 0.9$ (Table 1.6). We also considered $\rho = 0.25$ and $\rho = 0.75$, and those results are in a supplementary set of tables. In terms of the point estimates based on MLE, the value of $\rho$ makes little difference. When $\rho = 0.9$ the standard deviation and MSE of $\hat\beta$ and $\hat\delta$ do increase, but not by very much. The value of $\rho$ does not matter very much for any of the asymptotically valid tests, and in fact the results for the OLS and OLS-H tests do not change at all. For the BAD tests, it makes more difference, as asymptotic theory would suggest. For $\rho = 0$ the BAD tests are asymptotically valid, and they have approximately correct size, while for $\rho = 0.9$ the BAD tests have size of nearly zero.

1.6.5 Effects of changing $\lambda$

Next we consider a change in $\lambda$, the parameter of the exponential distribution of the one-sided error $u^*$. In Table 1.7 we report the results for $\lambda = 3$, whereas the base case had $\lambda = 1$. Since the overall error in the model is $v - u$, where $v$ is normal noise, increasing $\lambda$ effectively decreases the relative importance of the noise, and should make inference about $u$, or about the effect of $z$ on $u$, more reliable.
Comparing Table 1.7 to Table 1.1, we see that this is true. With the larger value of $\lambda$, the sizes of the asymptotically valid tests (other than GDV) all become closer to 5%. The effects of this change on the point estimates were less clear, in part because when $\lambda = 3$ there were more outliers.

1.6.6 Effects of changing $\sigma_v^2$

Now we change $\sigma_v^2$ to 9, as opposed to its base-case value of 1, holding the other parameters the same. The results are in Table 1.8. This is a pure increase in statistical noise, and it should make all of the estimates and tests worse. Comparing Table 1.8 to Table 1.1, that turns out to be true for all of the estimates, and for most of the tests. Among the asymptotically valid tests, the WALD tests and the GDV test are very seriously affected: they give very few rejections. There is relatively little effect of this change on the size of the LM-OPG and LM-HES tests or the OLS and OLS-H tests, however. It is also notable that the number of replications dropped in the size calculations is very large with the higher value of $\sigma_v^2$. The data are close enough to normal that the maximization process was difficult. As a curiosity, we ran the Schmidt and Lin (1984) test of the hypothesis of no one-sided error, and we could reject this hypothesis (at the 5% level) only 1,086 times out of 10,000.

1.7 Simulation Results: Power

In this section we investigate the power of the various tests. We therefore set $\delta$ to some non-zero value. An immediate problem that arises is that it is not meaningful to compare the power of tests whose sizes are very different. One possibility is to consider size-adjusted power, but this has the disadvantage that we would then no longer be investigating the power of a procedure that is feasible outside the simulation setting. An alternative possibility, which we follow, is to investigate power using a sample size sufficiently large that size distortions are not a serious problem. Therefore, for all of our simulations in this section, we set $N = 1000$. Our "base case" is therefore the set of parameters for the simulations reported in Table 1.3, and we now change $\delta$ from 0 to 0.05, 0.10 and 0.15, where these values were chosen to yield power that moves through a reasonable part of the range between zero and one. These results are given in Tables 1.9, 1.10 and 1.11. Changing $\delta$ has very little effect on any of the point estimates, other than the mean of $\hat\delta$, and we will not discuss the estimation results further.

Power increases as $\delta$ increases, for obvious reasons. If we compare the WALD, LM and GDV tests, their powers are quite similar. Fine distinctions are hard to make because even with $N = 1000$ their sizes were slightly different in Table 1.3. These tests are all asymptotically valid, and they all have the same asymptotic local power, so it is not surprising that their powers should be similar for $N = 1000$. A more interesting comparison is between their power and the power of the OLS-based tests (OLS and OLS-H). The OLS-based tests do not make use of the assumption that the $u_i^*$ are exponential, and the failure to exploit this fact ought to make them less powerful than the WALD, LM and GDV tests. This turns out to be true, with the difference in power being non-trivial but not huge. For example, for $\delta = 0.1$, compare 0.51 for OLS to 0.64 for LM-HES.

We also did some additional simulations with $\rho = 0.9$, so that the variables $x$ and $z$ are more highly correlated than in the cases just considered (which had $\rho = 0.5$).
Table 1.12 gives the results for $\delta = 0.1$ and $\rho = 0.9$, and the results for $\delta = 0.05$ and 0.15 are in our supplemental set of tables. Comparing Table 1.12 to Table 1.10, we can see that the higher value of $\rho$ results in substantially lower power for all of the tests. Among the asymptotically valid tests, the loss in power is much larger for the OLS-based tests than for the WALD, LM or GDV tests. These differences are certainly non-trivial. For example, the power of the LM-HES test changes from 0.64 to 0.48 when $\rho$ changes from 0.5 to 0.9, while the power of the OLS test changes from 0.51 to 0.17. The low power of the OLS-based tests occurs because of multicollinearity in the OLS regression when $x$ and $z$ are highly correlated: the coefficient of $z$ is poorly estimated, and it is hard to reject the hypothesis that it is zero. The MLE-based tests do a better job of exploiting the nonlinearity of the relationship between $y$, $x$ and $z$, and suffer less when $x$ and $z$ are highly correlated. How much this matters, in an empirical setting, obviously will depend on how different the variables in $z$ are from those in $x$.

Finally, we did some simulations in which the tests are exactly as above, and are therefore based on the assumption that the true scaling function is $\exp(\delta z_i)$, when in fact this is not the true scaling function. For these simulations, we have $u_i = \phi(\delta z_i)(1 - \Phi(\delta z_i))^{-1}u_i^*$, where $\phi$ is the standard normal density, $\Phi$ is the standard normal cdf, and $u_i^*$ is exponential with parameter $\lambda = 1$. So, in the data generating process, the scaling function is the inverse Mills ratio, $\phi(\delta z_i)(1 - \Phi(\delta z_i))^{-1}$. Under the null, $\delta = 0$ and $u_i$ is exponential with parameter $\sqrt{2/\pi}$ (the value of the inverse Mills ratio at zero). So our tests based on the exponential scaling function correctly encompass the null, and the only question is power. For the MLE-based tests, their power properties when the scaling function is misspecified are certainly not clear. For the OLS-based tests, however, we saw that the same statistic resulted from the score test principle for any monotonic differentiable scaling function $g(\delta z_i)$. As a result, we might expect our OLS test to have better power properties relative to the MLE-based tests when the MLE-based tests are based on the wrong scaling function.

Table 1.13 gives the simulation results with $\delta = 0.1$. These simulations have $N = 1000$, and are based on 2,000 replications. The surprising aspect of these results is the good performance of the MLE-based methods. The parameter estimates look quite reasonable, despite the misspecification of the model. Similarly, the MLE-based tests are more powerful than the OLS-based tests, despite the arguments of the previous paragraph. These optimistic results deserve attention in future research.

1.8 Simulation Results: Robustness

In this section we investigate the effects of misspecification of the distribution of the one-sided error term. Specifically, we will consider the properties of the tests based on the MLE that assumes an exponential error, when in fact the distribution of $u_i^*$ is either truncated normal or gamma.

We note at the outset that this issue does not arise with our OLS-based tests. These do not rely on any distributional assumptions on the errors, and they are asymptotically valid for any error distribution with finite variance (so that the central limit theorem applies). The MLE-based tests, on the other hand, will generally be invalid when the error distribution is misspecified. Fundamentally this is simply because the likelihood is then misspecified.
To be more specific, consider the LM test or the GDV test based on the normal-exponential model, as discussed in Sections 1.2 and 1.3 above. These fundamentally depend on the quantity $\sum_{i=1}^N z_i\tilde b_i$, where $\tilde b_i$ is an estimate of $b_i = E(u_i|\varepsilon_i) - E(u_i)$, with $\varepsilon_i = v_i - u_i$. The precise form of $b_i$ depends on the assumption that $v_i$ is normal and $u_i^*$ is exponential. If in fact $u_i^*$ is not exponential, then $E(\tilde b_i) \neq 0$ and we cannot expect the test to be valid. A secondary but still relevant issue is that the asymptotic variance of $\sum_{i=1}^N z_i\tilde b_i$, which also figures into the test statistic, also depends on the distributional assumption for $u_i^*$ being correct. See Section 1.2 above.

We emphasize that the lack of robustness of the MLE-based tests to distributional misspecification is not just a finite-sample issue. This problem persists even asymptotically. The lack of validity of the MLE-based tests should show up in simulations as incorrect size when the null hypothesis is true. The question then is how serious this problem is. Greene (1990) has argued that the rankings of estimated inefficiencies are often not sensitive to distributional assumptions on the one-sided error. Also, the exponential distribution shares some features with other one-sided distributions. The half-normal distribution, like the exponential, has a mode at zero. The gamma($g_1$, $g_2$) distribution with $g_1 = 1$ is exponential, and for $0 < g_1 < 1$ it has a shape similar to the exponential.

In the simulations of this section we have $\alpha = \beta = \delta = 0$, $\sigma_v^2 = 1$ and $N = 1000$. The number of replications is 2,000.

1.8.1 Normal-truncated normal

Here the distribution of $u_i^*$ is $N(\mu, \sigma^2)^+$, that is, truncated normal. Table 1.14 gives our results for the case that $\mu = 0$ and $\sigma^2 = \pi/2$. This is the half-normal distribution with mean equal to one. This choice makes the distribution somewhat comparable to the exponential distribution with parameter one, as in Table 1.3 above. However, the truncated normal with $\mu = 0$ and $\sigma^2 = \pi/2$ has variance equal to 0.57. (A truncated normal, unlike an exponential, does not have its mean equal to its standard deviation.) We also considered three other cases: (i) $\mu = 0$, $\sigma^2 = \pi/(\pi - 2)$, for which the variance equals one but the mean equals 1.32; (ii) $\mu = 0$, $\sigma^2 = 1$; (iii) $\mu = 1$, $\sigma^2 = 1$. The results for these three cases are in our supplemental set of tables.

In Table 1.14 we see that the OLS-based tests appear to have proper size, while the MLE-based tests exhibit significant size distortions. For MLE there are also considerable biases in the parameter estimates. The WALD and GDV tests are undersized, while the LM-OPG test rejects too often. This same pattern occurs for all four cases that we considered, but the extent of the size distortions varied considerably over the choices of $\mu$ and $\sigma^2$.

Comparing Table 1.14 to Table 1.3, we also see that there are many more replications dropped when the distribution is misspecified. Obviously the data do not always fit the likelihood well, and numerical problems occur.

1.8.2 Normal-gamma

Now the distribution of $u_i^*$ is gamma($g_1$, $g_2$). The results in Table 1.15 and Table 1.16 are similar to those in Table 1.14 for the normal-truncated normal case. The OLS-based tests have more or less proper size, while the MLE-based tests do not. The LM-OPG test rejects too often, and this is true across all of our ($g_1$, $g_2$) values. The WALD, LM-HES and GDV tests also show significant size distortions, and sometimes reject too seldom and sometimes too often, depending on the value of ($g_1$, $g_2$).
The MLE parameter estimates show clear biases. However, unlike the truncated normal case, not many replications were dropped here. The exponential model fits the data better in the normal-gamma case than in the normal-truncated normal case. Interestingly, that does not mean that it leads to more robust inference in the former case than in the latter.

1.9 Concluding Remarks

In this chapter we have considered tests of the hypothesis that observable firm characteristics do not affect technical efficiency. We do this in the context of a specific model in which the one-sided errors are exponential. Under the null they are i.i.d., while under the alternative they are scaled by a function $\exp(z_i'\delta)$, where the $z_i$ are the firm characteristics whose influence we are testing. In this context we can estimate the model by MLE and test whether $\delta = 0$, which is the WALD test. We can also use an LM test.

We show that a simple two-step test is not valid. (Here step one is to estimate technical efficiency for each firm; step two is to regress these estimates on $z_i$ and test whether the coefficients are zero.) This test can be made valid by correcting the asymptotic variance matrix for the second-step estimates. This correction is distribution-specific. When technical efficiency is exponential, we show that the corrected two-step test is asymptotically equivalent to the LM test.

We can also derive a valid test from the score test principle applied to the nonlinear least squares problem. This takes the form of an F-test of the significance of the coefficients of $z_i$ in an OLS regression of output on $z_i$ and the inputs. This test does not require a distributional assumption, and it would be the same for any scaling
27 1.10 Output Tables Table 1.1: (BASE CASE) 0 = fl = 6 = 0, 03 = [E(exp(—u)) = 0.5232] A=1,p=0.5,N=200 ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.0001 0.1157 0.0134 6 -00342 0.1874 0.0363 8 0.0000 0.1004 0.0101 63 1.0072 0.2279 0.0520 12 0.9594 0.3524 0.1258 IE 0.5148 0.1769 0.0555 0.6131 Restricted MLE(6=0) 5: -0.0214 0.1817 0.0335 ,6 0.0001 0.0932 0.0087 53 0.9985 0.2240 0.0502 12 0.9919 0.3461 0.1199 IE 0.5099 0.1767 0.0547 0.6194 Restricted NLLS 1'7 -1.0004 0.0994 0.0099 (OLS on 11,277+ pea-+1.6: 6" 0.0007 0.1004 0.0101 n=—1,p=0,fi,=2) 53,, 2.0019 0.2691 0.0724 STATISTICS; Sizel Size2 Size3 Size4 Mean s.d. WALD-OPG 0.0211 0.0213 0.0214 0.0215 —0.0027 0.8669 WALD-HES 0.0298 0.0300 0.0296 0.0297 .0.0045 0.9345 LM-OPG 0.0788 0.0783 0.0766 0.0763 1.2115 1.7972 LM-HES 0.0523 0.0518 0.0515 0.0509 1.0561 3.3408* GDV 0.0466 0.0467 0.0471 0.0471 1.0067 1.3462 BADGDV 0.0363 0.0355 0.0350 0.0344 0.8564 1.2163 OLS 0.0495 0.0490 0.0481 0.0475 -0.0011 0.9973 OLS-H 0.0575 0.0572 0.0561 0.0558 -0.0003 1.0216 BADOLS 0.0237 0.0235 0.0233 0.0230 .0.0010 0.8634 BADOLS-H 0.0266 0.0265 0.0262 0.0260 .0.0014 0.8806 Rep. dropped 0 73 121 171 * due to outliers 28 Table 1.2: (Change of N) N = 500, a = 6 = 6 = 0, 0,2, = A = 1, p = 0.5 [E(exp(-u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 25‘ 0.0000 0.0649 0.0042 6. -00125 0.1108 0.0124 B 00005 0.0627 0.0039 63 1.0024 0.1405 0.0197 12 0.9849 0.2171 0.0473 IE 0.5050 0.1790 0.0522 0.6215 Restricted MLE(6=0) 6 -0.0074 0.1089 0.0119 [3' -0.0005 0.0583 0.0034 63 0.9986 0.1390 0.0193 12 0.9985 0.2146 0.0461 C’P‘E‘ 0.5032 0.1792 0.0520 0.6227 Restricted NLLS 77 -10007 0.0632 0.0040 (OLS on y,=n+6e,e+w,-: 6 -00005 0.0629 0.0040 n=—1,B=0.03,=2) 53,, 2.0023 0.1681 0.0283 STATISTICS Sizel Size2_ Size3 Size4 Mean s.d. WALD-OPG 0.0361 0.0361 0.0361 0.0361 -0.0005 0.9386 WALD-HES 0.0434 0.0434 0.0434 0.0434 0.0003 0.9741 LM-OPG 0.0611 0.0611 0.0611 0.0612 1.0967 1.5831 LM-HES 0.0503 0.0503 0.0502 0.0502 0.9876 1.4005 GDV 0.0502 0.0502 0.0502 0.0502 1.0127 1.3969 BADGDV 0.0355 0.0355 0.0355 0.0355 0.8613 1.2250 OLS 0.0513 0.0513 0.0513 0.0513 0.0005 0.9941 OLS-H 0.0538 0.0538 0.0538 0.0538 0.0008 1.0043 BADOLS 0.0245 0.0245 0.0245 0.0245 0.0003 0.8604 BADOLS-H 0.0254 0.0254 0.0254 0.0254 0.0005 0.8681 Rep.dropped 0 2 8 9 29 Table 1.3: (Change of N) N = 1000, a = fl = 6 = 0, 0,2, = A = 1, p = 0.5 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 8 0.0006 0.0450 0.0020 6 -00070 0.0757 0.0058 6 0.0007 0.0447 0.0020 63 1.0023 0.0965 0.0093 512 0.9907 0.1509 0.0228 TE 0.5026 0.1796 0.0514 0.6230 Restricted MLE (6: 0) 6 -0.0046 0.0750 0.0057 8' 0.0005 0.0419 0.0018 63 1.0006 0.0961 0.0092 312 0.9971 0.1499 0.0225 CITE" 0.5018 0.1796 0.0514 0.6235 Restricted NLLS 77 -1.0004 0.0448 0.0020 (OLS on y,=n+6x,-+w,: p" 0.0005 0.0449 0.0020 n=—1,,6=0,63=2) 63, 2.0001 0.1196 0.0143 STATISTICS Sizel Sizez' Size3 Size4 Mean s.d. WALD-OPG 0.0448 0.0448 0.0448 0.0448 0.0141 0.9748 WALD-HES 0.0485 0.0485 0.0485 0.0485 0.0146 0.9916 LM-OPG 0.0577 0.0577 0.0577 0.0577 1.0545 1.5021 LM-HES 0.0513 0.0513 0.0513 0.0513 1.0006 1.4092 GDV 0.0522 0.0522 0.0522 0.0522 1.0136 1.4101 BADGDV 0.0377 0.0377 0.0377 0.0377 0.8789 1.2469 OLS 0.0490 0.0490 0.0490 0.0490 .0.0143 0.9970 OLS-H 0.0488 0.0488 0.0488 0.0488 -0.0145 1.0008 BADOLS 0.0243 0.0243 0.0243 0.0243 -0.0124 0.8636 BADOLS-H 0.0253 0.0253 0.0253 0.0253 -0.0123 0.8661 Rep. 
dropped 0 0 1 1 30 Table 1.4: (Change of p) p = —0.5, 6 = 6 = 6 = 0, 63 = A = 1, N = 200 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.0012 0.1157 0.0134 6 -0.0338 0.1859 0.0357 8 -0.0003 0.0994 0.0099 63 1.0071 0.2273 0.0517 512 0.9596 0.3518 0.1254 7"?) 0.5147 0.1769 0.0554 0.6133 Restricted MLE (6:0) 6 -00210 0.1808 0.0331 6 0.0001 0.0933 0.0087 63 0.9984 0.2244 0.0504 12 0.9922 0.3458 0.1196 TE 0.5098 0.1767 0.0546 0.6194 NLLS under the null 77' -1.0002 0.0995 0.0099 (OLS on y,=n+6s,-+w,-: ,6 0.0008 0.1004 0.0101 n=—1,6=0,63,=2) 63, 2.0023 0.2689 0.0723 STATISTKTS Sizel Sizez’ Size3 Size4 Mean s.d. WALD-OPG 0.0175 0.0176 0.0177 0.0178 0.0092 0.8595 WALD-HES 0.0302 0.0304 0.0302 0.0304 0.0093 0.9313 LM-OPG 0.0776 0.0773 0.0758 0.0756 1.2179 1.7921 LM-HES 0.0515 0.0503 0.0515 0.0502 1.0036 1.4506 GDV 0.0467 0.0470 0.0472 0.0474 1.0081 1.3418 BADGDV 0.0367 0.0360 0.0355 0.0349 0.8546 1.2196 OLS 0.0495 0.0490 0.0481 0.0477 -0.0007 0.9973 OLS-H 0.0575 0.0571 0.0560 0.0557 0.0003 1.0217 BADOLS 0.0239 0.0236 0.0234 0.0231 -0.0002 0.8629 BADOLS-H 0.0283 0.0282 0.0276 0.0275 0.0013 0.8808 Rep. dropped 0 82 133 184 31 Table 1.5: (Change of p) p = 0, 07 = 6 = 6 = 0, 03 = A = 1, N = 200 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.0007 0.1062 0.0113 6 —0.0339 0.1858 0.0357 B 0.0001 0.0934 0.0087 63 1.0072 0.2271 0.0516 .12 0.9605 0.3509 0.1247 TE 0.5145 0.1767 0.0556 0.6150 Restricted MLE(6=0) 64 -0.0214 0.1812 0.0333 6 0.0001 0.0933 0.0087 63 0.9987 0.2243 0.0503 Li? 0.9916 0.3459 0.1197 T‘E’ 0.5099 0.1767 0.0546 0.6194 Restricted NLLS 17 -10003 0.0994 0.0099 (OLS on y,-=n+ 311734-1011 6 0.0008 0.1004 0.0101 n=—1,p=0,63,=2) ”3, 2.0020 0.2690 0.0723 STATISTICS Sizel Size2_ Size3 Size4 Mean s.d. WALD-OPG 0.0207 0.0208 0.0209 0.0210 0.0034 0.8710 WALD-HES 0.0293 0.0295 0.0293 0.0295 0.0019 0.9376 LM-OPG 0.0770 0.0763 0.0754 0.0748 1.2030 1.7534 LM-HES 0.0515 0.0502 0.0514 0.0500 1.0164 1.6641 GDV 0.0452 0.0453 0.0455 0.0455 1.0128 1.3481 BADGDV 0.0517 0.0507 0.0502 0.0492 0.9824 1.3752 OLS 0.0495 0.0485 0.0483 0.0474 .0.0013 0.9963 OLS-H 0.0575 0.0568 0.0563 0.0556 .0.0002 1.0211 BADOLS 0.0499 0.0489 0.0487 0.0478 —0.0013 0.9963 BADOLS-H 0.0557 0.0550 0.0545 0.0539 .0.0002 1.0159 Rep. dropped 0 70 118 162 32 Table 1.6: (Change ofp) p = 0.9, a = )6 = 6 = 0, 0,2, = ,\ = 1, N = 200 [E(exp(-U)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 -00012 0.1414 0.0200 6 -00321 0.1876 0.0362 6” -0.0008 0.1217 0.0148 *3 1.0051 0.2280 0.0520 12 0.9593 0.3544 0.1272 {FE 0.5149 0.1781 0.0560 0.6079 Restricted MLE(6=0) 6 -00200 0.1812 0.0332 8 0.0000 0.0934 0.0087 63 0.9970 0.2234 0.0499 12 0.9946 0.3459 0.1196 TE 0.5094 0.1769 0.0546 0.6195 Restricted NLLS 6 -1.0004 0.0995 0.0099 (OLS on y,=n+6r,e+w,-: 6 0.0007 0.1005 0.0101 n=—1,8=0,63_,=2) 63, 2.0030 0.2695 0.0726 STATISTICS Sizel Size2- Size3 Size4 Mean s.d. WALD-OPG 0.0188 0.0191 0.0192 0.0195 -0.0132 0.8417 WALD-HES 0.0314 0.0319 0.0316 0.0320 -0.0149 0.9290 LM-OPG 0.0820 0.0815 0.0807 0.0803 1.2490 1.8683 LM—HES 0.0561 0.0558 0.0558 0.0552 1.0876 6.7456* GDV 0.0405 0.0408 0.0414 0.0415 0.9791 1.2808 BADGDV 0.0108 0.0106 0.0109 0.0108 0.5739 0.8467 OLS 0.0495 0.0491 0.0486 0.0484 0.0027 0.9991 OLS-H 0.0575 0.0571 0.0565 0.0562 0.0037 1.0230 BADOLS 0.0000 0.0000 0.0000 0.0000 0.0015 0.4345 BADOLS-H 0.0000 0.0000 0.0000 0.0000 0.0008 0.4441 Rep. 
dropped 0 171 217 346 * due to outliers 33 Table 1.7: (Change of A) A = 3, a = 6 = 6 = 0, 03 = 1, p = 0.5, N = 200 [E(exp(-u)) = 0.1095] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.0000 0.0768 0.0059 6 -0.0183 0.2100 0.0445 6 -0.0010 0.1462 0.0214 63 0.9873 0.3385 0.1147 ,1? 8.9144 1.7316 3.0054 TE 0.2531 0.2234 0.0330 0.7736 Restricted MLE(6=0) (1 -0.0122 0.2502 0.0627 6 -0.0010 0.1411 0.0199 63 0.9884 0.9388 0.8814 12 9.0074 1.7342 3.0072 ”FE 0.2521 0.2234 0.0337 0.7740 Restricted NLLS 1’7 -2.9982 0.2227 0.0496 on yi=n+6Ii+wi: 6 0.0003 0.2224 0.0495 n=—3,6=0,a3,=10 63, 9.9914 1.8539 3.4365 STATISTICS Sizel Size2_ Size3 Size4 Mean s.d. WALD-OPG 0.0397 0.0402 0.0397 0.0402 0.0001 0.9366 WALD-HES 0.0422 0.0427 0.0422 0.0427 -0.0002 0.9757 LM-OPG 0.0629 0.0580 0.0629 0.0580 1.0675 1.4782 LM-HES 0.0493 0.0440 0.0493 0.0441 0.9573 1.3053 GDV 0.0545 0.0499 0.0545 0.0499 1.0083 1.3753 BADGDV 0.0443 0.0386 0.0443 0.0386 0.8949 1.2349 OLS 0.0503 0.0440 0.0503 0.0441 .0.0041 0.9753 OLS-H 0.0565 0.0513 0.0565 0.0513 -0.0027 1.0011 BADOLS 0.0226 0.0183 0.0226 0.0183 .0.0034 0.8449 BADOLS-H 0.0259 0.0228 0.0259 0.0228 -0.0030 0.8664 Rep. dropped 0 124 1 125 34 Table 1.8: (Change of 63) 63 = 9, 6 = = 6 = 0, A = 1, p = 0.5, N = 200 [E(exp(—u)) = 0.5100] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.0012 0.2333 0.0544 6 0.0486 0.5434 0.2977 6 0.0035 0.2494 0.0622 63 8.5074 1.3725 2.1263 62 1.3239 1.1434 1.4120 TE 0.5221 0.0982 0.1002 0.2302 Restricted MLE(6=O) 6 0.0713 0.5544 0.3124 6 0.0036 0.2262 0.0512 63 8.5207 1.4162 2.2351 12 1.4226 1.1787 1.5676 TE 0.5138 0.0819 0.0965 0.2771 Restricted NLLS 6 -10015 0.2221 0.0493 (OLS on yi=n+flI¢+wiz 6 0.0040 0.2254 0.0508 6=—1,6=0,63,=10) 63, 10.0157 1.0331 1.0675 STATISTICS Sizel Size2 f Size3 Size4 Mean s.d. WALD-OPG 0.0011 0.0014 0.0021 0.0024 .0.0040 0.5877 WALD-HES 0.0041 0.0054 0.0040 0.0045 -0.0067 0.7009 LM-OPG 0.0792 0.0834 0.0506 0.0531 0.9325 1.4464 LM-HES 0.0506 0.0530 0.0674 0.0634 1.4862 21.7190* GDV 0.0177 0.0160 0.0107 0.0114 0.6292 0.8579 BADGDV 0.0317 0.0305 0.0143 0.0133 0.5772 0.8821 OLS 0.0520 0.0503 0.0326 0.0323 0.0122 0.8829 OLS-H 0.0576 0.0558 0.0369 0.0363 0.0148 0.9078 BADOLS 0.0241 0.0234 0.0128 0.0123 0.0107 0.7641 BADOLS-H 0.0287 0.0286 0.0162 0.0163 0.0115 0.7844 Rep. dropped 0 2352 4689 5349 * due to outliers 35 Table 1.9: (Change of 6) 6 = 0.05, a = 6 = 0, 0,2, = A = 1, p = 0.5, N = 1000 [E(exp(-u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.0520 0.0450 0.0020 6 -0.0048 0.0753 0.0057 6 0.0023 0.0442 0.0020 *3 1.0004 0.0979 0.0096 .12 0.9935 0.1507 0.0227 TE 0.5023 0.1802 0.0514 0.6240 Restricted MLE(6=0) 6 0.0006 0.0743 0.0055 6 -0.0156 0.0413 0.0020 63 0.9964 0.0971 0.0094 Z\2 1.0087 0.1497 0.0225 TE 0.5001 0.1803 0.0514 0.6240 Restricted NLLS (OLS on 77 -1.0009 0.0466 0.0022 y¢=n+BIi+wizn=—1.0013, 6 -0.0238 0.0447 0.0020 6=—0.0250,63,=2.0075) 63, 2.0091 0.1197 0.0143 STATISTICS [Powerl Power2 Power3 Power4 Mean s.d. WALD-OPG l0.1960 0.1960 0.1962 0.1962 1.1269 0.9574 WALD-HES 0.2035 0.2035 0.2037 0.2037 1.1489 0.9714 LM—OPG 0.2270 0.2270 0.2272 0.2272 2.4506 2.7265 LM-HES 0.2090 0.2090 0.2092 0.2092 2.3278 2.5553 GDV 0.2115 0.2115 0.2117 0.2117 2.3049 2.4668 BADGDV 0.1710 0.1710 0.1712 0.1712 2.0401 2.2455 OLS 0.1690 0.1690 0.1687 0.1687 -0.9800 1.0053 OLS-H 0.1670 0.1670 0.1672 0.1672 —0.9824 1.0085 BADOLS 0.1000 0.1000 0.1001 0.1001 -0.8492 0.8713 BADOLS-H 0.0985 0.0985 0.0986 0.0986 -0.8496 0.8727 Rep. dropped 0 0 2 2 The number of replication is 2000. 
36 Table 1.10; (Change of 6) 6 : 0.1, a : 6 : 0, e3 : ,\ : 1, p : 0.5, N : 1000 [E(exp(——u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.1024 0.0456 0.0021 6 -00047 0.0754 0.0057 6 0.0022 0.0441 0.0020 63 1.0003 0.0979 0.0096 62 0.9935 0.1517 0.0230 TE 0.5023 0.1814 0.0513 0.6268 Restricted MLE (6:0) 6 0.0092 0.0739 0.0055 6 -00329 0.0414 0.0028 63 0.9901 0.0970 0.0095 12 1.0336 0.1515 0.0241 T17: 0.4973 0.1817 0.0515 0.6251 Restricted NLLS (OLS on 77 -1.0047 0.0467 0.0022 y,:6+6x,-+w,~:6:—1.0050, 6 —0.0490 0.0449 0.0020 6:—0.0503,e3,:2.0304) 63, 2.0305 0.1228 0.0151 STATISTICS Powerl Power2 Power3 Power4 Mean s.d. WALD-OPG 0.6115 0.6118 0.6115 0.6118 2.1879 0.9419 WALD-HES 0.6350 0.6353 0.6350 0.6353 2.2309 0.9407 LM—OPG 0.6580 0.6578 0.6580 0.6578 6.4945 4.8214 LM-HES 0.6410 0.6408 0.6410 0.6408 6.1642 4.5083 GDV 0.6425 0.6423 0.6425 0.6423 5.8979 4.1321 BADGDV 0.5915 0.5913 0.5915 0.5913 5.4355 4.0116 OLS 0.5105 0.5103 0.5105 0.5103 —1.9417 1.0116 OLS-H 0.5060 0.5058 0.5060 0.5058 -1.9357 1.0040 BADOLS 0.3770 0.3767 0.3770 0.3767 -1.6816 0.8769 BADOLS-H 0.3685 0.3682 0.3685 0.3682 -1.6696 0.8681 Rep. dropped 0 1 0 1 The number of replication is 2000. 37 Table 1.11: (Change of 6) 6 : 0.15, a : 6 : 0, e3 : A : 1, p : 0.5, N : 1000 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE ” 0.1528 0.0465 0.0022 0mw 0mm 0mm 0.0021 0.0440 0.0019 63 10000 00979 00096 A2 0%% 0mm 0mn awn 0mm 0mm 0am 0mn 0mm 0mm -0M% 0mm 0mg 0%m 0mm 0W% 10744 01556 00297 0.4925 0.1837 0.0516 0.6269 -1mm 0mm 0mm 0mm 0mm 0mm 2mm 0mm 0mm 7%) Q) 0'1 ) Restricted MLE (6 = 0) Restricted NLLS (OLS on 31; = 7) + 61‘; + w): 7) = —1.0113, 6 : —0.0758, 63, : 2.0693) quunai g) $31,755. Q: g) STATISTICS l Powerl Power2 Power3 Power4 Mean s.d. WALD-OPG 0.9100 0.9114 0.9104 0.9118 3.1972 0.9177 WALD-HES 0.9255 0.9269 0.9259 0.9273 3.2605 0.8930 LM-OPG 0.9350 0.9349 0.9354 0.9353 13.1080 7.0740 LM-HES 0.9310 0.9309 0.9314 0.9313 12.4001 6.5219 GDV 0.9280 0.9279 0.9284 0.9283 11.3511 5.6386 BADGDV 0.9065 0.9064 0.9069 0.9068 11.0093 5.9053 OLS 0.8220 0.8217 0.8228 0.8226 -2.9061 1.0167 OLS-H 0.8230 0.8227 0.8238 0.8236 -2.8711 0.9920 BADOLS 0.7425 0.7421 0.7432 0.7429 -2.5151 0.8813 BADOLS-H 0.7300 0.7296 0.7307 0.7303 -2.4673 0.8566 Rep. dropped 0 3 2 5 The number of replication is 2000. 38 Table 1.12: (Change of6 and p) 6 = 0.1, p = 0.9, a = 6 = 0, 03 = A = 1, N = 1000 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.1032 0.0541 0.0029 6 -0.0048 0.0757 0.0058 6 0.0032 0.0531 0.0028 63 1.0007 0.0980 0.0096 A2 0.9926 0.1525 0.0233 TE 0.5024 0.1815 0.0514 0.6261 Restricted MLE (6:0) 6 0.0052 0.0747 0.0056 6 -0.0607 0.0416 0.0054 63 0.9932 0.0975 0.0095 A2 1.0257 0.1520 0.0238 TE 0.4983 0.1811 0.0516 0.6235 Restricted NLLS (OLS on 6 -10047 0.0468 0.0022 y,:n+6n,-+w,-:6:—1.0050, 6 -0.0890 0.0452 0.0020 6:—0.0905,a3,:2.0304) 63, 2.0252 0.1220 0.0149 STATISTICS Powerl Power2 Power3 Power4 Mean s.d. WALD-OPG 0.4405 0.4429 0.4405 0.4429 1.8212 0.9378 WALD-HES 0.4655 0.4681 0.4655 0.4681 1.8627 0.9343 LM—OPG 0.4915 0.4902 0.4915 0.4902 4.8166 4.0995 LM-HES 0.4835 0.4816 0.4835 0.4816 4.5987 3.8783 GDV 0.4670 0.4656 0.4670 0.4656 4.2959 3.4242 BADGDV 0.2675 0.2680 0.2675 0.2680 2.7913 2.3952 OLS 0.1715 0.1699 0.1715 0.1699 -0.9813 1.0056 OLS-H 0.1705 0.1689 0.1705 0.1689 -0.9835 1.0083 BADOLS 0.0000 0.0000 0.0000 0.0000 -0.4280 0.4382 BADOLS-H 0.0000 0.0000 0.0000 0.0000 -0.4247 0.4355 Rep. dropped 0 11 0 11 The number of replication is 2000. 
39 Table 1.13: (Change of scaling functions to ¢(6z,-)/(1 — (6z,-))) 6 = 0.1, a = 6 = 0, a3 : A : 1, p : 0.5, N : 1000 [E(exp(—u)) : 0.5232] The number of replication is 2000. 40 ESTIMATION METHODS Estimates Mean s.d. MSE Corr 6 0.0827 0.0514 0.0029 6 -0.0065 0.0783 0.0062 6 0.0021 0.0418 0.0018 63 1.0012 0.0938 0.0088 A2 0.6278 0.1206 0.1531 TE 0.5600 0.1574 0.0627 0.5509 Restricted MLE (6 : 0) 6 0.0043 0.0764 0.0059 6 -0.0230 0.0389 0.0020 63 0.9939 0.0928 0.0086 A2 0.6503 0.1187 0.1364 TE 0.5557 0.1577 0.0623 0.5485 Restricted NLLS 77 -0.7987 0.0420 0.0423 (OLS on 6,:6+6s,-+w,:: 6 -00305 0.0406 0.0026 6:—1,6:0,63,:2) 63, 1.6471 0.0897 0.1550 —STATISTICS P6wer4 Mean s.d. WALD-OPG 0.3255 1.5395 0.9178 WALD-HES 0.3540 1.5824 0.9317 LM-OPG 0.4045 3.8867 73.6256 LM-HES 0.3740 3.6723 3.6913 GDV 0.3745 3.4652 3.0224 BADGDv 0.3170 3.1357 2.9020 OLS 0.2825 -1.3685 1.0079 OLS-H 0.2855 -1.3710 1.0093 BADOLS 0.1855 -1.1856 0.8737 BADOLS-H 0.1885 -1.1848 0.8731 Rep. dropped 0 Table 1.14: (Change of the distribution of u: to N(0,7r/2)+) a = 6 = 6 = a3:A:1,p:0.5,N:1000[E(exp(—u)):0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 -0.0001 0.0746 0.0056 6 -04192 0.1093 0.1876 6 0.0015 0.0423 0.0018 63 1.2244 0.1108 0.0626 A? 0.3455 0.1110 0.4407 TE 0.6359 0.1128 0.0834 0.5383 Restricted MLE (6:0) 61 -0.4106 0.1087 0.1804 6 0.0013 0.0383 0.0015 63 1.2185 0.1118 0.0602 A2 0.3569 0.1148 0.4268 TE 0.6320 0.1131 0.0815 0.5465 Restricted NLLS 6 -09993 0.0396 0.0016 (OLS on y,:6+6r,-+w,-: 6 0.0015 0.0386 0.0015 6:—1,6:0,63,:2) 63, 1.5730 0.0717 0.1874 STATISTICS Sizel Size2: Size3 Size4 Mean s.d. WALD-OPG 0.0210 0.0213 0.0221 0.0223 -0.0005 0.8788 WALD-HES 0.0270 0.0274 0.0285 0.0287 .0.0003 0.9174 LM-OPG 0.0635 0.0635 0.0622 0.0622 1.0649 1.5782 LM-HES 0.0635 0.0620 0.0611 0.0595 1.0941 2.0656 GDV 0.0370 0.0371 0.0390 0.0388 0.9229 1.2642 BADGDV 0.0320 0.0320 0.0311 0.0308 0.8265 1.2130 OLS 0.0500 0.0503 0.0496 0.0494 0.0159 0.9915 OLS-H 0.0515 0.0518 0.0506 0.0505 0.0163 0.9963 BADOLS 0.0250 0.0249 0.0253 0.0250 0.0142 0.8591 BADOLS-H 0.0250 0.0249 0.0248 0.0244 0.0145 0.8626 Rep. dropped 0 32 103 118 The number of replication is 2000. 41 Table 1.15: (Change of the distribution of 11;? to gamma(0.5,2)) a = 6 = 63:A:1,p:0.5,N:1000[E(exp(—u)):0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.0003 0.0413 0.0017 6 0.4262 0.0651 0.1859 6 0.0010 0.0444 0.0020 63 0.7906 0.0825 0.0507 A2 2.0376 0.2222 1.1259 TE 0.4149 0.2146 0.0886 0.6763 Restricted MLE (6:0) 6 0.4276 0.0658 0.1871 6 0.0010 0.0422 0.0018 63 0.7897 0.0831 0.0511 A2 2.0450 0.2243 1.1424 TE 0.4144 0.2146 0.0888 0.6763 Restricted NLLS 17 -1.0003 0.0550 0.0030 (OLS on y,:6+6r,+w,: 6 0.0013 0.0541 0.0029 6:—1,6:0,1L3,:2) 63, 2.9983 0.2504 1.0594 —STATISTICS S-ize4 Mean s.d. WALD-OPG 0.0860 0.0059 1.1330 WALD-HES 0.0765 0.0069 1.0860 LM-OPG 0.0630 1.1103 1.5402 LM-HES 0.0765 1.1829 1.6295 GDV 0.0585 1.0788 1.4734 BADGDV 0.0495 0.9787 1.3557 OLS 0.0560 .0.0140 1.0078 OLS-H 0.0540 -0.0143 1.0091 BADOLS 0.0235 -0.0120 0.8725 BADOLS-H 0.0230 -0.0122 0.8722 Rep. dropped 0 The number of replication is 2000. 42 Table 1.16: (Change of the distribution of 11;? to gamma(2,0.5)) 01 = 6 = 63:A:1,p:0.5,N:1000[E(exp(—u)):0.5232] ESTIMATION METHODS Estimates Mean s.d. 
MSE Corr MLE 6 0.0013 0.0639 0.0041 6 -0.3772 0.0976 0.1518 6 0.0006 0.0413 0.0017 63 1.1015 0.1041 0.0211 A2 0.3945 0.1088 0.3785 TE 0.6187 0.1250 0.0703 0.5223 Restricted MLE (6:0) 6 -0.3695 0.0984 0.1462 6 0.0003 0.0379 0.0014 63 1.0959 0.1050 0.0202 A2 0.4058 0.1132 0.3659 fi 0.6154 0.1255 0.0690 0.5263 Restricted NLLS 6 -0.9997 0.0392 0.0015 (OLS on y,:6+6.r,-+w,-: 6 0.0001 0.0385 0.0015 6:—1,6:0,63,:2) 63, 1.5002 0.0708 0.2548 STATISTICS Sizel Size2_ Size3 Size4 Mean s.d. WALD-OPG 0.0300 0.0301 0.0304 0.0304 0.0204 0.9101 WALD-HES 0.0320 0.0321 0.0324 0.0325 0.0178 0.9430 LM-OPG 0.0640 0.0637 0.0633 0.0634 1.0674 1.4957 LM-HES 0.0600 0.0591 0.0592 0.0588 1.0155 1.4688 GDV 0.0415 0.0411 0.0415 0.0416 0.9545 1.2671 BADGDV 0.0335 0.0331 0.0329 0.0330 0.8294 1.1569 OLS 0.0480 0.0476 0.0471 0.0472 .0.0240 0.9990 OLS—H 0.0500 0.0496 0.0491 0.0492 -0.0236 1.0057 BADOLS 0.0235 0.0231 0.0228 0.0228 -0.0211 0.8648 BADOLS-H 0.0245 0.0241 0.0238 0.0238 -0.0206 0.8694 Rep. dropped 0 5 25 28 The number of replication is 2000. 43 1.11 Appendix: LM Test for the Scaled Exponen- tial Case Recall that the LM statistic of (1.18) is: LM : V5 lnL(6)’[i'55 — 25¢TJ$T¢5]_1V5lnL(6~) 1 N I 1 N _ __ ”.. " _“ ”-1“ -1 _ ”.. — A Z; bz 27 [I66 I6¢I¢¢I¢6l :\ 2 0,2, 1 N~ [.12- ~-_1~ '1 1 N~ = fizbizi [N (1156—1-61pr $1.35)] W262i . i=1 Now we compare LM with x/N ’y’ (Var(\/N’y))’1\/N’y where the asymptotic dis- tribution of x/Nfi is derived in (1.8): Wi'Warb/‘Niii-lx/Ni 1 N ~ I 1 N —1 l 1 N \_1 1 N ~ : (figbili) (Ngzzzz) BA B (Ngzizz) (TN £0121) _ Ligh- ’A—l J—fié +0 (1) __ mizl 1 1 «TV—3:1 121 P 1 N ~ I 1 N I -1 = (76,233) N§(Zibi +GTi)(Zibi +070) (1.36) In the following we prove LM and x/I—V’y’ (Var(\/N 6))-1m '31 have the same as- ymptotic distribution by showing that the probability limit of (N '1 231:1(2333 + Gri)(z,-b,~ + Gri)') is [A2(I§5 — 1,3301%;11; 6)] where 1° is the limiting information 44 matrix as defined in (1.12). From (1.15), the gradient of the log-likelihood, V9 lnL(6) is l L (Q6%‘\ (26:13:15) A 61 L 79%— : 26:1 86(5) (137) 01 L — - (Tang 261:1 31’ (012;) 61 L N . 2 (gr) (2.: 61A )) Specifically, N N 2 , __ 0062' 52' _ 0v _ . 2381(6) — g (Aexp(z ’6) Aexp(z£6) A2 exp(2256) 1) 2,, N N {i 1 > 3' 5 = (— - —— I , g 1( ) rzzl 0v Aexp(z:6) 1 N N 2 _ ( 1 _ 5i fig; E5310“) — Z; \2A2 exp(2zg6) 2A exp(zf6)av + 203) ’ N N 2:302) = :1( avg, 6‘ _ <73 _ 1 ) i=1 1 6— 2A3 exp(3z’-6) 2A3 exp(3zg6) 2A4 exp(4z,’-6) 2A2 exp(Zzgé) (1.38) where ' A 6 51' Z (Mg/av + 012/( exp(z’- ))) (1.39) 1 — @(ei/av + ov/(A exp(z;6))) Let H (6) denote the Hessian, vglan) _=_ 62 lnL(0)/6066', and If denote H (6) evaluated at 19 = 6 = (0’, 6’, 63, Az)’. We partition If conformably to (1.17) as .. F155 1315,), H : (1.40) 5’46 H66 45 where 11...... stands for the *,* block of H, evaluated at the estimates, and 16 = (6’, 03, A2)’. Specifically, each element of If is: N N H55 = A—2 Z (Var(u,'|c,-) -— A2) 232;, H56— — (A011) )lzvar(u1l€1) )331'2 it 1': 1N H036 : (27163)" 121((—— — —) Vmeo + A)z,’., 51,2, = (269-12 (Var(u.-lei) — i2) 2;. 1:1 N H66 — a. 4 Z (Vents-Is) — a.) 4.4;, i=1 N E- ' 2 1 J— — s v — ’, H v5 ( 0 ) g ((53 ) ar(u1l€1) 51) $1 a. N N HA26 = (2A 03)'12Var(ui|e,-)I,, i=1 " — 45-1 _i___"l .._~2_ 7 H0303 — (4011) 12:; ((61, A ) Var(u,|6,) 6, 200) , a. N C 1 N H3202 = (4A363)_1 Z ((732- — —) Var(u,|c,-) + A), ” i=1 a” ’\ ~ ~ N N ~ HA2A2 = (466)—1 Z (Var(ui|€i) — 62) . i=1 where we use 56.1..) = a. (a — (.3— . mitt-)0 Var(ui|6,-)— — 0,, 2(1 +((:: + Egg-(2767) 5,- - £3). 46 (1.41) (1.42) Lemma 1. Let G be defined as in (1.9c). 
Then, with 6 = 0, 1 N )1 1 N GZNZZ iV¢f(yi,xi,16 16) +0p(1):N-XZZI:V1/Jf(yiixi)w)’ +0130) i=1 =1 1 WW VIM2 A (1.43) A 2 /\ = NVW lnL I5=0 + 0p(1) = I—V-H5¢,6:0 + 013(1) 1 = -A (-1—V-H6¢,6:0) + 012(1) = ‘6166,6:0 + 010(1)- Lemma 2. With 6 = 0, I 1 N 2 _ . . ’ ATl‘N Eb Z22: :N 12G b '2') (6132') _ N;S'(6)s’(6) [5:0 (144) : 166,6:0 + 019(1)- Lemma 3. Consider 221:1 53(16) 2 (211:1 3,-(6)', 211:1 33(03), 221:1 83(A2))' as given in (1.38). Then, with 6 : 0, N-1 23:, 5,-(16 16,-)6 2’- - A136 5:0 + 6,,(1). Proof. Note that N‘1 21:1 53(16)b,-zz'- is equal to: N (012:; ZEVar(u,|e,)Izz \ N /\ . 1 + ;(N(§ _. 5%) — W(Euilei - Emu-”1'21 —1 i (£1._1)V ( I )_ ) I 20.3N i=1( 03 A 31' U1, 61; 51, zi N A 1 6 66 ‘N§('2T_2A6v 26;) ' 1 ib2 I 1'78 I 47 —1 (012)1( ' dwizi \ 1 N 2 03 035i ‘73 +:3N§(2av+A—2+ A (—A0 + (1)503“- : ’1 iv: (fi— l)Var(u |e)— e-zg) 203Ni=1 0,2, A z z N A 1 5 e 6 _ N236? - 2AA, + 2:71,)”, i 1 N Ina—DE"; / (1.45) Now, the second terms in the first two elements of the above vector converge in probability to zero: 1 N 04 026' 03 I 0v 2N (202 +—— A — (Adv + A) 52‘) 12,2, = 013(1), i=1 N A 1 fit 626' I — — — . = 1 . N 21(2A2 2on + 203 z* 0P( ) 1: This is because, first, 1 o4 026‘ o3 E [32 (203+ A7 + :2- (Am, + 3%) 6i) 2:21;] 104 3 I =00 —(203" + 73 +9; E(ei) — (Mu + ”7) Boss) E(m) 1 2 o4 02 03 0v I = 3 (20,, + Kg + 7W4) - (My + -/\- 7 E(xizi) = 0, U (1.46) (1.47) where, with 6 = 0, e,- and 45,- are functions of only the error terms, 22,- and n: which 48 are assumed to be independent of x,- and Zi~ Note that E(g,) = O'v/A since A = E(uz’) = E[E(ui|5i)l = E [w (Q — (:1. + if)» A a v (1.48) = 01) (E(éi) + 0—v " f): which we solve for E(fi). Secondly, I {2' 6i€i _ , 2 _ E(2A2 2on 20,3 ”AE31(0v)l6=0—0 (1.49) where 3,-(03) is as defined in (1.38). Then (1.45) is equal to: -—(03N)"1 25:1 Var(ui|e,')xiz,'- + op(1) —(203N)’12i’i1((ei/03- l/A) Varese» - at) z: + 0pm _ N (1.50) (2A3N) 122:1”?3; l = ‘NHW-w + 0pm = A (--1\7H¢6,6=0) + op = Hinze + 0pm. Cl Recall that r,- = r,(16) = I°_ls,-(i6) = 122d16=03i(¢) = Izglsi as shown in (1.11). Now we are ready to evaluate our main expression: i=1 l N = N bezzz; (1.51a) i=1 1 N , , + ’N 2 amp (1-51b) i=1 1 N I , + 7V- 2 ZzszzG (1'51C) 2'21 49 N 1 + NZerizg. (1.51d) We will evaluate these term by term. By Lemma 2, bfz, z-gz _ —A 21°, + 0,,(1). (1.52) 2I~ 1M2 Next, we evaluate (1.51b): N N iga We tZena)(rials->(Ittlsn1—vts+0p<1> i=1 2 1 1 N 1 _ I _. = A 13111311) “N232“ 13¢ L726 +0141) (1.53) z: _ 2 —1 — — ’\ 13:11:22» 13:11:21» 136 + 019(1) = Hamlets + opu) Finally, we evaluate (1.51c): N N 1 1 _ N E zibz-rgG’ = N E :Zibisglz)¢l(“’\11c66)+ 013(1) i=1 i=1 1.54 : (W¢)I;;1(“AI;5) + 013(1) (using Lemma 3) ( ) = —AZI§¢I;;1117}6 + 0p(1). And term (1.51d) is exactly the same as this term. Inserting (1.52), (1.53) and (1.54) into (1.51), we obtain N 1 o _ N' 2 1:( zzb + Gri) (z,b,- + Gr,)’ = A2 (13’, -— 1,5,13,11,35) + 0p(1)- (1-55) Therefore the expressions inside the inverse in equations (1.35) and (1.36), for the LM test and the GDV test respectively, have the same probability limit. 50 1.12 Appendix: Supplementary Tables Supplemental Table 1.17: (Change of p) p = 0.25, a = 5 = 6 = 0, 0,2, = A = 1, N = 200 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 8 0.0003 0.1084 0.0118 6 -0.0338 0.1859 0.0357 3 0.0001 0.0951 0.0090 63 1.0069 0.2271 0.0516 R? 0.9606 0.3510 0.1247 TE 0.5145 0.1767 0.0553 0.6145 Restricted MLE(6:0) 6”! 
-0.0213 0.1811 0.0333 3 0.0001 0.0933 0.0087 63 0.9985 0.2241 0.0502 P 0.9918 0.3459 0.1197 CFE 0.5098 0.1767 0.0546 0.6194 Restricted NLLS 5 4.0003 0.0994 0.0099 (OLS on y,=n+6x,-+w,-: B 0.0008 0.1004 0.0101 n=—1,6=0,a§,=2) 6,3, 2.0020 0.2691 0.0724 STATISTICS; Sizel Size2 ' Size3 Size4 Mean s.d. WALD-OPG 0.0212 0.0214 0.0215 0.0216 .0.0005 0.8719 WALD-HES 0.0303 0.0305 0.0305 0.0306 .0.0019 0.9385 LM-OPG 0.0787 0.0780 0.0769 0.0762 1.2060 1.7718 LM-HES 0.0516 0.0507 0.0512 0.0503 1.0830 6.7671* GDV 0.0465 0.0465 0.0471 0.0470 1.0121 1.3502 BADGDV 0.0484 0.0475 0.0469 0.0461 0.9524 1.3354 OLS 0.0495 0.0487 0.0483 0.0475 0.0008 0.9966 OLS-H 0.0575 0.0569 0.0563 0.0558 0.0016 1.0212 BADOLS 0.0432 0.0425 0.0420 0.0413 0.0007 0.9650 BADOLS-H 0.0490 0.0484 0.0479 0.0473 0.0009 0.9839 Rep. dropped 0 73 124 172 * due to outliers 51 Supplemental Table 1.18: (Change of p) p = 0.75, a = fl = 6 = 0, 0,2, = A = 1, N = 200 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 8 -0.0006 0.1280 0.0164 6 -00327 0.1869 0.0360 6‘ -00003 0.1109 0.0123 63 1.0055 0.2272 0.0516 ,1? 0.9601 0.3527 0.1260 "TE 0.5147 0.1776 0.0557 0.6107 Restricted MLE(6:0) a 0.0199 0.1804 0.0329 6 0.0001 0.0933 0.0087 63 0.9968 0.2228 0.0496 :12 0.9944 0.3455 0.1194 TE 0.5094 0.1769 0.0546 0.6195 Restricted NLLS 77 -1.0004 0.0995 0.0099 (OLS on y,=n+Bx,-+w,-: B 0.0007 0.1005 0.0101 17=—1,/3=0,0,2U=2) 63,, 2.0025 0.2692 0.0725 STATISTICS Sizel Size2 Size3 Size4 Mean s.d. WALD-OPG 0.0213 0.0216 0.0215 0.0217 -0.006O 0.8548 WALD-HES 0.0308 0.0312 0.0309 0.0312 -0.0072 0.9304 LM-OPG 0.0794 0.0790 0.0775 0.0775 1.2305 1.8417 LM-HES 0.0529 0.0527 0.0524 0.0523 1.0285 2.1901 GDV 0.0442 0.0446 0.0445 0.0448 0.9949 1.3236 BADGDV 0.0226 0.0224 0.0222 0.0220 0.6994 1.0175 OLS 0.0495 0.0491 0.0483 0.0480 -0.0003 0.9993 OLS-H 0.0575 0.0574 0.0563 0.0563 0.0005 1.0234 BADOLS 0.0027 0.0027 0.0027 0.0028 0.0000 0.6601 BADOLS-H 0.0052 0.0053 0.0052 0.0052 -0.0008 0.6740 Rep. dropped 0 123 161 244 52 Supplemental Table 1.19: (Change of 6 and p) 6 = 0.05, p = 0.9, a fi = 0, 03 = A = 1, N = 1000 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 5 0.0527 0.0534 0.0029 8 -00045 0.0751 0.0057 3 0.0033 0.0530 0.0028 63 1.0004 0.0979 0.0096 12 0.9932 0.1506 0.0227 TE 0.5023 0.1803 0.0515 0.6234 Restricted MLE (6 = O) 61 -0.0003 0.0744 0.0055 1? -0.0295 0.0414 0.0026 6?, 0.9971 0.0975 0.0095 12 1.0069 0.1498 0.0225 :75 0.5006 0.1802 0.0515 0.6236 Restricted NLLS (OLS on f] -1.0009 0.0466 0.0022 yi=n+flxi+wiz n=—1.0013, [3' -0.0438 0.0448 0.0020 6=-0.0451,e§,=2.0075) 53,, 2.0081 0.1194 0.0143 STATISTICS Powerl Power2 _Power3 Power4 Mean s.d. WALD-OPG 0.1380 0.1384 0.1381 0.1385 0.9407 0.9431 WALD-HES 0.1520 0.1525 0.1521 0.1525 0.9617 0.9560 LM-OPG 0.1795 0.1795 0.1796 0.1796 2.0105 2.4353 LM-HES 0.1610 0.1610 0.1611 0.1611 1.9042 2.2712 GDV 0.1650 0.1650 0.1651 0.1651 1.8615 2.1505 BADGDV 0.0560 0.0562 0.0560 0.0562 1.1487 1.3851 OLS 0.0750 0.0747 0.0750 0.0748 -O.4981 1.0049 OLS-H 0.0750 0.0747 0.0750 0.0748 -0.5001 1.0104 BADOLS 0.0000 0.0000 0.0000 0.0000 -0.2175 0.4377 BADOLS-H 0.0000 0.0000 0.0000 0.0000 -0.2174 0.4383 Rep. dropped 0 6 l 7 The number of replication is 2000. 53 0, Supplemental Table 1.20: (Change of 6 and p) 6 = 0.15, p = 0.9, a = fl = 0,2, = /\ = 1, N = 1000 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. 
MSE Corr MLE 8 0.1536 0.0549 0.0030 6 -00045 0.0758 0.0058 6 0.0029 0.0530 0.0028 63 0.9999 0.0980 0.0096 P 0.9931 0.1536 0.0236 TE 0.5023 0.1836 0.0512 0.6308 Restricted MLE (6:0) 6 0.0147 0.0746 0.0058 3 -0.0917 0.0419 0.0102 63 0.9856 0.0977 0.0097 312 1.0580 0.1548 0.0273 2717: 0.4945 0.1828 0.0520 0.6232 Restricted NLLS (OLS on 17 -1.0110 0.0470 0.0022 yi=n+flxi+wizn=—1.0113, 6 -0.1353 0.0459 0.0021 6=—0.1365,63_,,=2.0693) 63, 2.0542 0.1264 0.0162 STATISTI-CS Powerl Power2 "Power3 Power4 Mean s.d. WALD-OPG 0.7830 0.7865 0.7829 0.7864 2.6717 0.9268 WALD-HES 0.8140 0.8177 0.8139 0.8176 2.7330 0.8988 LM-OPG 0.8360 0.8358 0.8359 0.8357 9.3681 5.8643 LM-HES 0.8260 0.8262 0.8259 0.8261 9.0193 5.6024 GDV 0.8125 0.8122 0.8124 0.8121 7.9604 4.5539 BADGDV 0.6310 0.6308 0.6308 0.6307 5.5233 3.5299 OLS 0.3155 0.3154 0.3157 0.3156 -1.4680 1.0098 OLS-H 0.3155 0.3149 0.3157 0.3151 -1.4676 1.0078 BADOLS 0.0005 0.0005 0.0005 0.0005 -O.6396 0.4401 BADOLS-H 0.0000 0.0000 0.0000 0.0000 -O.6271 0.4319 Rep. dropped 0 1 9 1 10 The number of replication is 2000. 54 Supplemental Table 1.21: (Change of the distribution of u: to N (0, 1)+) a = B 6:0,63=A=1,p=0.5,N=1000[E(exp(—u))=0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 3 0.0010 0.0984 0.0097 6 -O.3498 0.1193 0.1366 6 0.0014 0.0396 0.0016 63 1.1488 0.0995 0.0321 .12 0.2119 0.0933 0.6298 TE 0.6958 0.0885 0.0835 0.4635 Restricted MLE (6 =0) 67 -0.3395 0.1159 0.1286 3 0.0011 0.0356 0.0013 63 1.1434 0.0996 0.0305 i2 0.2218 0.0955 0.6147 7713‘ 0.6901 0.0881 0.0807 0.4832 Restricted NLLS 77 -0.7972 0.0365 0.0425 (OLS on y,=n+6x,-+w,-: 6" 0.0012 0.0357 0.0013 n=—1,6=0,63,=2) 63, 1.3658 0.0612 0.4060 STATISTICS; Sizel Size2 — Size3 Size4 Mean s.d. WALD-OPG 0.0125 0.0132 0.0149 0.0153 0.0178 0.8294 WALD-HES 0.0205 0.0217 0.0217 0.0223 0.0183 0.8670 LM-OPG 0.0560 0.0565 0.0490 0.0491 1.0036 1.5132 LM—HES 0.0725 0.0713 0.0657 0.0637 2.0072 34.9627* GDV 0.0255 0.0269 0.0291 0.0300 0.8246 1.1256 BADGDV 0.0320 0.0328 0.0291 0.0293 0.7639 1.1324 OLS 0.0510 0.0486 0.0434 0.0427 -0.0098 0.9697 OLS-H 0.0540 0.0502 0.0453 0.0446 -0.0093 0.9744 BADOLS 0.0260 0.0254 0.0229 0.0229 -0.0082 0.8404 BADOLS-H 0.0240 0.0232 0.0205 0.0204 .0.0079 0.8439 Rep. dropped 0 107 387 431 * due to outliers. The number of replication is 2000. 55 Supplemental Table 1.22: (Change of the distribution of u: to N (0,1r/ (7r — 2))+) a = 6 = 6 = 0, 63 = ,\ = 1, p = 0.5, N = 1000 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 3 -00002 0.0548 0.0030 6 -05135 0.1031 0.2744 8 0.0016 0.0460 0.0021 63 1.3480 0.1296 0.1379 R? 0.6616 0.1488 0.1366 {FE 0.5541 0.1456 0.0774 0.6116 Restricted MLE (6 =0) 52 -0.5069 0.1047 0.2679 6 0.0017 0.0423 0.0018 63 1.3420 0.1311 0.1342 .12 0.6744 0.1533 0.1295 T177 0.5518 0.1460 0.0764 0.6140 Restricted NLLS 1? -13225 0.0448 0.1060 (OLS on y,-=n+6:r,:+w,-: 8 0.0019 0.0433 0.0019 n=—1,6=0,63,=2) 63, 2.0018 0.0938 0.0088 STATISTICS Sizel Size2! Size3 Size4 Mean s.d. WALD-OPG 0.0295 0.0296 0.0296 0.0297 .0.0025 0.9065 WALD-HES 0.0395 0.0396 0.0396 0.0397 -0.0009 0.9490 LM-OPG 0.0605 0.0607 0.0607 0.0608 1.0705 1.5630 LM-HES 0.0535 0.0536 0.0532 0.0533 0.9746 1.4373 GDV 0.0510 0.0511 0.0512 0.0513 0.9959 1.3894 BADGDV 0.0390 0.0391 0.0391 0.0392 0.8596 1.2448 OLS 0.0480 0.0481 0.0481 0.0483 0.0098 1.0040 OLS-H 0.0500 0.0501 0.0502 0.0503 0.0099 1.0085 BADOLS 0.0260 0.0261 0.0261 0.0261 0.0089 0.8698 BADOLS-H 0.0270 0.0271 0.0271 0.0271 0.0091 0.8730 Rep. dropped 0 5 6 11 The number of replication is 2000. 
56 Supplemental Table 1.23: (Change of the distribution of u: to N (1, 1)+) a = B 6:0,63=A=1,p=0.5,N=1000[E(exp(—u))=0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 8 0.0000 0.0992 0.0098 6 -0.8053 0.1340 0.6665 8 0.0015 0.0431 0.0019 63 1.3787 0.1197 0.1577 .12 0.2470 0.1141 0.5800 fi 0.6807 0.0910 0.1564 0.5174 Restricted MLE (6:0) 6 0.7959 0.1294 0.6502 6 0.0015 0.0391 0.0015 63 1.3738 0.1193 0.1539 i2 0.2565 0.1144 0.5658 27:5 0.6755 0.0884 0.1471 0.5412 Restricted NLLS 17 -1.2871 0.0405 0.0840 (OLS on yi=n+flxi+wiz 6 0.0015 0.0392 0.0015 n=—1,6=0,63,=2) 63, 1.6312 0.0736 0.1414 STATISTICS Sizel Size2_ Size3 Size4 Mean s.d. WALD-OPG 0.0055 0.0058 0.0069 0.0070 0.0118 0.7864 WALD-HES 0.0120 0.0127 0.0119 0.0122 0.0148 0.8308 LM-OPG 0.0605 0.0602 0.0469 0.0480 0.9462 1.3146 LM-HES 0.0745 0.0755 0.0695 0.0698 1.2467 4.0045* GDV 0.0195 0.0195 0.0188 0.0192 0.7776 0.9973 BADGDV 0.0285 0.0280 0.0200 0.0205 0.7140 0.9702 OLS 0.0540 0.0512 0.0438 0.0423 .0.0224 0.9365 OLS-H 0.0550 0.0517 0.0444 0.0423 -0.0228 0.9407 BADOLS 0.0215 0.0211 0.0144 0.0141 -0.0196 0.8114 BADOLS-H 0.0215 0.0211 0.0138 0.0135 .0.0202 0.8146 Rep. dropped 0 106 402 439 * due to outliers. The number of replication is 2000. 57 Supplemental Table 1.24: (Change of the distribution of u: to gamma(0.5, J2» 6 = B = 6 = 0, 63 = A = 1, p = 0.5, N = 1000 [E(exp(—u)) = 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 8 0.0004 0.0446 0.0020 6 0.3384 0.0667 0.1190 6 0.0009 0.0412 0.0017 63 0.8493 0.0812 0.0293 X2 1.0968 0.1470 0.0309 TE 0.4905 0.1903 0.0837 0.6173 Restricted MLE (6:0) 61 0.3389 0.0660 0.1192 6” 0.0008 0.0389 0.0015 63 0.8494 0.0810 0.0292 512 1.0996 0.1452 0.0310 TE 0.4902 0.1902 0.0838 0.6175 Restricted NLLS 1'? -0.7075 0.0449 0.0876 (OLS on y,-=n+ 66,416,: 6" 0.0009 0.0440 0.0019 n=—1,6=0,63,=2) 63, 1.9974 0.1367 0.0187 —STATISTICS Size4 Mean s.d. WALD-OPG 0.0710 0.0081 1.0704 WALD-HES 0.0620 0.0089 1.0491 LM-OPG 0.0630 1.0989 1.5310 LM-HES 0.0655 1.1158 1.5362 GDV 0.0525 1.0550 1.4346 BADGDV 0.0425 0.9360 1.3003 OLS 0.0495 -0.0156 0.9961 OLS-H 0.0520 -0.0155 0.9984 BADOLS 0.0225 -0.0132 0.8622 BADOLS-H 0.0215 .0.0132 0.8623 Rep. dropped 0 The number of replication is 2000. 58 Supplemental Table 1.25: (Change of the distribution of u? to gamma(2,1/\/2)) a = 6 = 6 = 0, 63 = ,\ = 1, p = 0.5, N =1000[E(exp(-u))= 0.5232] ESTIMATION METHODS Estimates Mean s.d. MSE Corr MLE 6 0.0007 0.0488 0.0024 6 -O.5069 0.0880 0.2647 6 0.0009 0.0452 0.0020 63 1.1766 0.1170 0.0449 2\2 0.8278 0.1491 0.0519 TE 0.5253 0.1637 0.0684 0.6072 Restricted MLE(6:0) 61 -0.5013 0.0901 0.2594 6 0.0007 0.0422 0.0018 63 1.1716 0.1179 0.0433 .12 0.8403 0.1540 0.0492 TE 0.5235 0.1641 0.0677 0.6084 Restricted NLLS 6 -1.4140 0.0452 0.1735 (OLS on yi=n+fixi+w32 6 0.0005 0.0445 0.0020 n=—1,6=0,63,=2) 63, 2.0009 0.1020 0.0104 —STATISTICS Size4 Mean s.d. WALD-OPG 0.0395 0.0161 0.9434 WALD-HES 0.0415 0.0153 0.9745 LM-OPG 0.0610 1.0617 1.4787 LM-HES 0.0475 0.9753 1.3569 GDV 0.0535 1.0122 1.3750 BADGDV 0.0335 0.8640 1.2086 OLS 0.0465 -0.0230 1.0065 OLS-H 0.0500 -0.0225 1.0130 BADOLS 0.0235 -0.0202 0.8711 BADOLS-H 0.0240 -0.0197 0.8754 Rep. dropped 0 The number of replication is 2000. 59 Chapter 2 On the Accuracy of Bootstrap Confidence Intervals for Efficiency Levels in Stochastic Frontier Models with Panel Data 2.1 Introduction This chapter is concerned with the construction of confidence intervals for efficiency levels of individual firms in stochastic frontier models with panel data. 
A number of different techniques have been proposed in this literature to address this problem. Given a distributional assumption for technical inefficiency, maximum likelihood estimation was proposed by Pitt and Lee (1981). Battese and Coelli (1988) showed how to construct point estimates of technical efficiency for each firm, and Horrace and Schmidt (1996) showed how to construct confidence intervals for these efficiency levels. Without a distributional assumption for technical efficiency, Schmidt and Sickles (1984) proposed fixed effects estimation, and the point estimation problem for efficiency levels was discussed by Schmidt and Sickles (1984) and Park and Simar (1994). Simar (1992) and Hall, Härdle, and Simar (1993) suggested using bootstrapping to conduct inference on the efficiency levels. Horrace and Schmidt (1996) and Horrace and Schmidt (2000) constructed confidence intervals using the theory of multiple comparisons with the best, and Kim and Schmidt (1999) suggested a univariate version of comparisons with the best. Bayesian methods have been suggested by Koop, Osiewalski, and Steel (1997) and Osiewalski and Steel (1998).

In this chapter we will focus on bootstrapping and some related procedures. We provide a survey of various versions of the bootstrap, for construction of confidence intervals for efficiency levels. We also propose a simple alternative to the bootstrap that uses standard parametric methods, acting as if the identity of the best firm is known with certainty, and we propose some new resampling methods that correspond to this parametric procedure. We present Monte Carlo simulation evidence on the accuracy of the bootstrap and our simple alternative. Finally, we present some empirical results to indicate how these methods work in practice.

2.2 Fixed-Effects Estimation of the Model

Consider the basic panel data stochastic frontier model of Pitt and Lee (1981) and Schmidt and Sickles (1984),

$y_{it} = \alpha + x_{it}'\beta + v_{it} - u_i, \quad i = 1, \dots, N, \ t = 1, \dots, T,$ (2.1)

where $i$ indexes firms or productive units and $t$ indexes time periods. $y_{it}$ is the scalar dependent variable representing the logarithm of output for the $i$th firm in period $t$, $\alpha$ is a scalar intercept, $x_{it}$ is a $K \times 1$ column vector of inputs (e.g., in logarithms for the Cobb-Douglas specification), $\beta$ is a $K \times 1$ vector of coefficients, and $v_{it}$ is an i.i.d. error term with zero mean and finite variance. The time-invariant $u_i$ satisfy $u_i \ge 0$, and
It is convenient to write the values of 11,: in the opposite ranked order, as “(N) _<_ S 11(2) 3 11(1), so that a“) = 01 — 11(3). Then obviously 0W) = 01 — “(N), and firm (N) has the largest value of a,- or equivalently the smallest value of 11,- among N firms. We will call this firm the best firm in the sample. In some methods we measure inefficiency relative to the best firm in the sample, and this corresponds to considering the relative efficiency measures: :1: 11:- = 11,- — "(N) 2 am) -— 61,-, r3" = exp(—113). (2.4) 62 Fixed effects estimation refers to the estimation of the panel data regression model (2.2), treating a,- as fixed parameters. Because the a, are treated as parameters, we do not need to make any distributional assumption about the inefficiencies; nor do we need to assume that they are uncorrelated with the $11 or the 11,-t. We assume strict exogeneity of the regressors 33,}, in the sense that (1,-1,1652, - - - ,x,T) are independent of (0,1,v,2, - -« ,11,T). We also assume that the 12,} are i.i.d. with zero mean and constant variance 0,2,. We do not need to assume a distribution for the v,,. The fixed effects estimates 6, also called the within estimates, may be calculated by regressing (ya — 17,-) on (17111 - 13,-), or equivalently by regressing ya on “lit and a set of N dummy variables for firms. We then obtain 0?,- = g, — 5:3, or equivalently the 61 are the estimated coefficients of the dummy variables. This leads to the following expression for 61,-: 6,- = 6,- + 17,- — 6m“ — 6). (2.5) The fixed effects estimate 6 is consistent as NT -—> co, and its variance is of order (N (T — 1))‘1. For a given firm 1', the estimated intercept 61,- is a consistent estimate of a,- as T —-> 00. Large T is needed for the term 17,- in (2.5) to become negligible. Schmidt and Sickles (1984) suggested the following estimates of technical ineffi- ciency, based on the fixed effects estimates: 2‘ = 61 — 61,-. (2.6) Since these estimates clearly measure inefficiency relative to the firm estimated to be the best in the sample, they are naturally viewed as estimates of a( N) and 112‘, that is, of relative rather than absolute inefficiency. We define some further notation. Suppose we write the estimates 61,- in ranked 63 order, as follows: 511 S 512 S S 51[N]- (2-7) So [N] is the index of the firm with the largest 131,, whereas (N) was the index of the firm with the largest 01,-. These may not be the same; for example, firm 129 could be the true best firm (that is, the one with the biggest 01,-), so that (N) = 129, but firm 71 could be the estimated best firm (that is, the one with the biggest 61,-), so that [N] = 71. Note also that d as defined in (2.6) above is the same as film], but it may not be the same as am), the estimated a for the unknown best firm. As T —+ 00 with N fixed, 61 is a consistent estimate of a( N) and 112' is a consistent estimate of 11;“. However, it is important to note that in finite samples (for small T) 61 is likely to be biased upward, since 61 2 c‘r( N) and E(&(N)) 2 am). That is, the “max” operator in (2.6) induces upward bias, since the largest d,- is more likely to contain positive estimation error than negative error. This bias is larger when N is larger and when the 61,- are estimated less precisely. The upward bias in ii induces an upward bias in the 11;“ and a downward bias in 7"; = exp(—1‘12“); we underestimate efficiency because we overestimate the level of the frontier. 
Schmidt and Sickles (1984) argued that 51 and 11'; are consistent estimates of a and 11,- if both N and T approach 00; that is, if both N and T are large, we can regard the 11'; as estimates of absolute and not just relative inefficiency. The argument is simple. As T ——> 00, d and 11: are consistent estimates of a( N) and 11;", as noted above. As N —-> oo, 11( N) should converge to 0 so that 01( N) converges to a and the 11: should converge to the corresponding 11,-. A more rigorous treatment of the asymptotics for this model is given by Park and Simar (1994), who show that, in addition to N —+ co and T —> 00, we need to require T"1/2 In N ——> 0 in order to ensure the consistency of 61 as an estimate of a. This latter requirement limits the rate at which N can grow 64 relative to T in order to ensure that the upward bias induced by the max operation disappears asymptotically. 2.3 Construction of Confidence Intervals by Boot- strapping We can use bootstrapping to construct confidence intervals for functions of the fixed effects estimates. The inefficiency measures 11;? and the efficiency measures 1"“ = a: exp(—1‘1- ,) are functions of the fixed effects estimates and so bootstrapping can be used for inference on these measures. We begin with a very brief discussion of bootstrapping in the general setting in which we have a parameter 6, and there is an estimate 6 based on a sample 21, - - - ,2” of i.i.d. random variables. The estimator 6 is assumed to be regular enough so that n1/2(6 — 6) is asymptotically normal. The following bootstrap procedure will be repeated many times, say for b = 1, - -- ,B where B is large. For iteration 0, construct pseudo data 2?) , - -- ,ng) by sampling randomly with replacement from the original data 21, - -- ,2". From the pseudo data, construct the estimate 6“”. The basic result of the bootstrap is that, under fairly general circumstances, the asymptotic (large 11) distribution of n1/2(6(b) — 6) conditional on the sample is the same as the (unconditional) asymptotic distribution of n1/2(6 — 6). Thus for large n the distribution of 6 around the unknown 6 is the same as the bootstrap distribution of 6“” around 6, which is revealed by a large number (B) of draws. We now consider the application of the bootstrap to the specific case of the fixed effects estimates. Our discussion follows Simar (1992). Let the fixed effects estimates be ,6 and 61,-, from which we calculate 1‘1; and 1“: (1' = 1, - - - ,N). Let the residuals be 13,-, = y,, — d,- — 263,6 (1' = 1, - ~- ,N, t = 1, - -- ,T). The bootstrap samples will be drawn by resampling these residuals, because the ”it are the quantities analogous to 65 the 2’s in the previous paragraph, in the sense that they are assumed to be i.i.d., and the 17,-, are the observable versions of the 1),}. (The sample size 71 above corresponds to NT). So, for bootstrap iteration b (= 1, -- . ,B) we calculate the bootstrap sample 11(5) and the pseudo data 311(1)“ _ ai+$ 115“”, (5) .From these data we get the bootstrap estimates 3(5), 619,) 11:“) ,and r, (b ) ,and the bootstrap distribution of these estimates is used to make inferences about the parameters. We note that the estimates 1‘1: and 1“: depend on the quantity max,- 62,-. Since “max” is not a smooth function, it is not immediately apparent that this quantity is asymptotically normal, and if it were not the validity of the bootstrap would be in doubt. A rigorous proof of the validity of the bootstrap for this problem is given by Hall, Héirdle, and Simar (1995). 
They prove the equivalence of the following three statements: (i) max,- 61,- is asymptotically normal. (ii) The bootstrap is valid as T —) 00 with N fixed. (iii) There are no ties for max,- 01,-: that is, there are a unique index (N) such that a( N) = max_,- 01,-. There are two important implications of this result. First, the bootstrap will not be reliable unless T is large. Second, this is especially true if there are near ties for max,- a,-, in other words, when there is substantial uncertainty about which firm is best. We now turn to specific bootstrapping procedures, which differ in the way they draw inferences based on the bootstrap estimates. In each case, suppose that we are trying to construct a confidence interval for 11’; = max,- aj — 01,-. That is, for a given confidence level c, we seek lower and upper bounds L,, U,- such that P(L,- 5 11: s U,) = 1 -- c. The simplest version of the bootstrap is the percentile bootstrap. Here we simply take L,- and U,- to be the upper and lower c/ 2 fractiles of the bootstrap distribution of the 11:“). More formally, let F be the cumulative distribution function (cdf) for 11;“ so that E(s) = P(1‘1:(b) S s) = the fraction of the B bootstrap replications in which 11:0,) 3 3. Then, we take L,- = F‘1(c/2) and U,- = F’1(1 — c/2). 66 The percentile bootstrap intervals are accurate for large T but may be inaccurate for small to moderate T. This is a general statement, but in the present context there is a specific reason to be worried, which is the finite sample upward bias in max,- 61,- as an estimate of maxj 01,-. This will be reflected in improper centering of the intervals and therefore inaccurate coverage probabilities. Simulation evidence on the severity of this problem is given by Hall, Hiirdle, and Simar (1993) and in Section 2.5 of this chapter. Several more sophisticated versions of the bootstrap have been suggested to con- struct confidence intervals with higher coverage probabilities. Hall, Héirdle, and Simar (1993) and Hall, HérdIe, and Simar (1995) suggested the iterated bootstrap, also called the double bootstrap, which consists of two stages. The first stage is the usual per— centile bootstrap which constructs, for any given c, a confidence interval that is in- tended to hold with probability of 1 — c. We will call these “nominal” 1 — c confidence intervals. The second stage of the bootstrap is used to estimate the true coverage probability of the nominal 1 — c confidence intervals, as a function of c. That is, if we define the function 7r(c) = true coverage probability level of the nominal 1 — c level confidence interval from the percentile bootstrap, then we attempt to evaluate the function 7r(c). When we have done so, we find 0*, say, such that 1r(c* ) = 1 — c, and then we use as our confidence interval from the first stage percentile bootstrap, which we “expect” to have a true coverage probability of 1 — c. The mechanics of the iterated bootstrap are uncomplicated but time-consuming. For each of the original (first stage) bootstrap iterations B, the second stage involves a set of 32 draws from the bootstrap residuals, construction of pseudo data, and construction of percentile confidence intervals, which then either do or do not cover the original estimate 6. 
The coverage probability function 7r(c), which is the actual rate at which a nominal c—level interval based on the bootstrap estimates covers the true parameter 6, is estimated by the rate at which a nominal c-level interval based on 67 the iterated bootstrap estimates covers the original estimate 6. To understand this, note that data generated from the true 6 yield 6; bootstrap data generated based on 6 yield the bootstrap estimates 60’); and data based on 60’) yield the iterated bootstrap estimates, say 6(b’b1). So the iterated bootstrap estimates 6(b’bl) have the same relationship to 6 as the bootstrap estimates 60’) have to 6. Generally we take B2 = B, so that the total number of draws has increased from B to B2. by going to the iterated bootstrap. Theoretically, the error in the percentile —1/2 1 bootstrap is of order 11 while the error in the iterated bootstrap is of order n“ . There is no clear connection between this statement and the question of how well finite sample bias is handled. An objection to the iterated bootstrap is that it does not explicitly handle bias. For example, if the nominal 90% confidence intervals only cover 75% of the bootstrap estimate in the first stage, it simply insists on a higher nominal confidence level, like 98%, so as to get 90% coverage. That is, it just makes the intervals wider when bias might more reasonably be handled by recentering the intervals. A technique that does recenter the intervals is the bias-adjusted bootstrap of Efron (1982) and Efron (1985). As above, let 6 be the parameter of interest, 6 the sample estimate and 60’) the bootstrap estimate (for b = 1, - -- ,B), and F the bootstrap cdf. For n large enough that the bootstrap is accurate, we should expect F(6) = 0.5, and failure of this to occur is a suggestion of bias. Now define 20 = ‘1(F(6)) where (I) is a standard normal cdf, and where F(6) = 0.5 would imply 20 = 0. Let 26/2 be the usual normal critical value; e.g. for c = 0.1, 26/2 = 20.05 = 1.645. Then, the bias-adjusted bootstrap confidence interval is [L,-, U,] with: A A L. = F‘1((2zo — 262)). U.- = F‘1((2zo + 262)) (2.8) For example, suppose that there is an upward bias, reflected by the fact that 60% 68 of the bootstrap draws are larger than 6, so that F(6) = 0.4. Then .20 = —0.253, and for c = 0.1 we have (2zo — 26/2) = (-—2.152) = 0.016 and (220 + zc/2) = 0.873. Thus our confidence interval comes from the lower tail 0.016 fractile and the upper tail 0.127 fractile, and we have compensated for upward bias by moving the interval left. This seems intuitively reasonable. The assumption that justifies the bias-adjusted bootstrap is that, for some monotone increasing function 9, (9(6) — 9(6)) is distributed as N (—zoa, 02) and (9(6(b)) — 9(6)) is also distributed as N (—200, 02) for some 20, 02. (The first distribution is from the probability law of the sample, and the second is the bootstrap distribution in- duced by resampling from the given sample.) Thus we have normality, and also equal biases and variances, for some transformation of 6. The transformation function 9 need not be known. This is an advantage in implementation, but a disadvantage in trying to decide whether the assumption holds. It is not known whether the bias- adjusted bootstrap is valid for our specific problem, but it performs relatively well in the simulations reported in Section 2.5. The final version of the bootstrap that we will consider is the bias-adjusted and accelerated bootstrap of Efron and Tibshirani (1993). 
This is intended to allow for a possibility that the variances of 6 depends on 6, so that a bias-adjustment also requires a change in variance. This correction depends on some quantities defined in terms of the so-called jackknife values of 6. For i = 1, - - - ,n, let 6“) be the value of the estimate based on all observations other than observation 1; and let é(°) = 11’1 £3le 6(,-) be the average of these values. Then the “acceleration” factor a is defined by: ?=1 (99) 7 5(1))3 1.5 6 (231 (an) - 6“(1))2) a = (2.9) 69 With 20 and 26/2 defined as above, define (20 + 20/2) (30 “ Zen) 1 _ Gui (20 + 26/») (2.10) (1—6, (zo—zc/,))' Then the confidence interval is [L,-, U,] with L,- = 13"1 ((b,-2)). bt'1=20+( 1 b12=Zo+ More discussion can be found in Efron and Tibshirani (1993, chapter 14). It is important to note that there are cases in which the acceleration factor fails to be defined. This happens when all the jackknifed estimates are the same, which yields zero both for the numerator and for the denominator of the acceleration factor. For example, one firm could be so dominantly efficient in the industry that jackknifing the best firm (in our case, dropping one time dimensional observation) would not change the efficiency rank for the best firm. Also, with large T, the firms’ efficiency ranking would not be affected by taking out one time period observation, so that it is more likely for the acceleration factor not to be defined. However, as N gets large, it is less likely for the acceleration factor not to be defined since it would be harder to have one specific firm uniformly as the best estimated firm with more firms in sample. In the following sections, when the acceleration factor is not defined, we do not accelerate the bias-adjusted bootstrap. After all, the bias-adjusted bootstrap is a special case of the bias-adjusted and accelerated bootstrap with the acceleration factor of zero. 2.4 A Simple Alternative to the Bootstrap In this section we propose a simple parametric alternative to the bootstrap, and some related resampling procedures. We begin with the following simple observation. We wish to construct a confidence interval for 11’: = a( N) — 61,-, or r: = exp(—113‘). If we knew which firm was best - that is, if we knew the index (N) - we could construct a 70 parametric confidence interval of the form: (61( N) — 61,-) i (critical value) :1: (standard error), (2.11) where “critical value” would be the apprOpriate c/ 2 level critical value of the standard normal distribution, and “standard error” would be the square root of the quantity: estimated variance of 61( N) + estimated variance of 61,- - 2*estimated covariance of (61(N),61,-). This interval would be valid asymptotically as T —> 00 with N fixed. In fact, if the 12,-, are i.i.d. normal and we use the critical value from the student-t distribution, this interval would be valid in finite samples as well. The confidence interval (2.11) is infeasible because the identity of the best firm is unknown. However, we can construct the confidence interval: (61[ N] — 61,-) :1: (critical value) =1: (standard error), (2.12) where as before max,- 61,- : am]. That is, we use a confidence interval that would be apprOpriate if (N) were known, and we simply pretend that [N] = (N). That is, we pretend that we do know the identity of the best firm. This is our “simple parametric” confidence interval. Two details should be noted. 
First, in calculating the standard error in (2.12), we evaluate Var(61[N]) and Cov(6r[N], 61,-) using the standard formulas that ignore the fact that the index [N] is data-determined. That is, again we pretend that [N] = (N) is known. Second, although 01(N) — a, Z 0, the lower bound of the confidence interval in (2.12) can be negative. If it is, set it to zero. This corresponds to setting the upper bound of the relative efficiency measure r; to one. The asymptotic (T —-+ 00 with N fixed) validity of this procedure follows from the same argument that Hall, Hiirdle, and Simar (1995) used to show that maxj 61,- is asymptotically normal. If there are no ties for max,- 01,-, then as T —> 00, P( [N] = 71 (N )) —-> 1. That is, with no ties, in the limit there is no uncertainty about the identity of the best firm. An obvious implication of this argument is the following. For data sets in which there is substantial uncertainty about the identity of the best firm, the accuracy of either bootstrap intervals or our simple parametric intervals is doubtful. The simple parametric intervals differ from bootstrap intervals in an important way that goes beyond parametric versus resampling methods. Consider the following resampling scheme, which could also be used to create a confidence interval for 11'; = a( N) — 01,-, treating (N) = [N] as known. Create bootstrap samples b = 1, - -- ,B as above. For sample b, calculate a:,(11b'1)az—best = ($33] - alb) (2-13) where [N] is still the index such that lel = max,- 61,- in the original sample. Then create a percentile-interval from these quantities. 140 um am_ be st differ from the bootstrap quantities Note that the quantities 11 {,TU’) = max 61gb) — (31(1)), (2.14) as defined in Section 2.3. For the bootstrap quantities, there is a “max” in the original data to get 61[ N] and then there is another “max” in each bootstrap sample. That is, the bootstrap samples are deliberately analyzed in exactly the same way as the original sample was. In (2.13), there is still a “max” in the original sample, but in the bootstrap samples we maintain the identity of the “best” firm in the original samples. We will call this the “max-best bootstrap,” although actually it is not really a bootstrap procedure at all. It is just a resampling scheme. Semantic issues aside, it is the “max-best” bootstrap that should be similar to our simple parametric procedure. Our motivation for discussing the “max-best” procedure is mostly to make 72 clear why our simple parametric intervals may be expected to be rather different from percentile bootstrap intervals, when the identity of the best firms is in doubt. As noted above, the “max” operator causes c31( N) to be biased upward as an estimate of a( N): and this causes an upward bias in 112‘ and a downward bias in 1": = exp(—112‘). The second “max” in (2.14) in the bootstrap samples causes additional bias. For this reason the percentile bootstrap intervals will tend to be seriously miscentered. Our simple parametric intervals, or “max-best” bootstrap intervals, do not contain the second source of bias and may be expected to be more accurate than percentile bootstrap intervals. Of course, precisely because they do not contain the second source of bias, the parametric or “max-best” intervals cannot be bias-adjusted. The bias-adjusted (or bias-adjusted and accelerated) bootstrap intervals described in the previous section use the bias at the bootstrap stage to correct the bias in the original estimates. 
The ability to do this is a potentially significant advantage of bootstrap methods. 2.5 Simulations In this section we conduct Monte Carlo simulations to investigate the reliability of confidence intervals based on bootstrapping and on the alternative procedures de- scribed in the last section. We are interested in the coverage rates of the confidence intervals and the way that they are related to bias in estimation of efficiency levels. Results for other methods including the MLE can be found in Kim (1999). The model is the basic panel data stochastic frontier model given in (2.1) above. However, we consider the model with no regressors so that we can concentrate our interest on the estimation of efficiencies without having to be concerned about the nature of the regressors. In practical cases, the regression parameters 6 are likely to be estimated so much more efficiently than the other parameters that treating them 73 as known is not likely to make much difference. Our data generating process is: 9,, =a+v,-, —11,- =c1,-+11,-,, i: 1,--- ,N, t: 1,--- ,T, (2.15) in which the 12,-, are i.i.d. N (0, 03) and the 11,- are i.i.d. half-normal: that is, let 11,- : |u,| where u,- ~ N (0, 0,2,). Since our point estimates and confidence intervals are based on the fixed effects estimates of 611,- - - ,aN, the distributional assumptions on 11,, and 11,- do not enter into the estimation procedure. They just define the data generation mechanism. The parameter space is (01, 03, 03, N, T), but this can be reduced. Without loss of generality, we can fix a to any number, since a change in the constant term only shifts the estimated constant term by the same amount, without any effect on the bias and variance of any of the estimates. For simplicity, we fix the constant term equal to one. We need two parameters to characterize the variance structure of model. It is natural to think in terms of 03 and 03. Alternatively, recognizing that 03 is the variance of the untruncated normal from which 11 is derived, not the variance of 11, we can think instead in terms of 03 and Var(11), where Var(u) = 0,2,(7r — 2) / 7r. However, we obtain more readily interpretable results if we think instead in terms of the size of total variance and the relative allocation of total variance between 11 and 11. The total variance is defined as a? = 03 + Var(u). Olson, Schmidt, and Waldman (1980) used )1 = (Ia/0,, to represent the relative variance structure, so that their parametrization was in terms of a? and A. Coelli (1995) used 01:2 and either 7 = 03/(03 + 0,2,) or 7* = Var(11)/(0,2, + Var(11)). The choice between these two parameters is a matter of convenience. We decided to use 7* due to its ease of interpretation, so that we use the parameters 0:2 and 7*. The reason this is a convenient parametrization (compared to 74 the “obvious” choice of 03 and 0,2,) is that, following Olson, Schmidt, and Waldman (1980), one can show that comparisons among the various estimators are not affected by 062. The effect of multiplying or? by a factor of k holding 7* constant, is as follows. 1. constant term: bias change by a factor of x/l; and variance changes by a factor of k, 2. 03 and 0,2,: bias changes by a factor of k and variance changes by a factor of 162, 3. 7* (or 7 or A): bias and variance are unaffected. We set a? at 0.25 arbitrarily, so that the only parameters left to consider are (7*, N, T). 
We consider three values of $\gamma^*$, to include a case in which the variance of $v$ dominates, a case in which the variance of $u$ dominates, and an intermediate case. We take $\gamma^* = 0.1$, 0.5, and 0.9 to represent these three cases. With $\sigma^2 = 0.25$, the quantities $\sigma_v^2$, $\mathrm{Var}(u)$, and $\sigma_u^2$ are determined as follows for each value of $\gamma^*$:

1. $\gamma^* = 0.1$: $\sigma_v^2 = 0.225$, $\mathrm{Var}(u) = 0.025$, $\sigma_u^2 = 0.069$;
2. $\gamma^* = 0.5$: $\sigma_v^2 = 0.125$, $\mathrm{Var}(u) = 0.125$, $\sigma_u^2 = 0.344$;
3. $\gamma^* = 0.9$: $\sigma_v^2 = 0.025$, $\mathrm{Var}(u) = 0.225$, $\sigma_u^2 = 0.619$.

Four values of $N$ and $T$ are considered. In order to investigate the effect of changing $N$, we fix $T = 10$ and consider $N = 10$, 20, 50, and 100. Similarly, $T$ is assigned the values 10, 20, 50, and 100 while fixing $N = 10$. This is done for each value of $\gamma^*$.

For each parameter configuration $(\gamma^*, N, T)$, we perform $R = 300$ replications of the experiment. For each replication, we calculate the following:

1. the estimate of $\alpha$, $\hat\alpha = \max_j \hat\alpha_j = \hat\alpha_{[N]}$;
2. the infeasible estimate of $\alpha$, $\hat\alpha_{(N)}$;
3. the relative efficiency estimate $\hat u_i^* = \hat\alpha - \hat\alpha_i$, for each $i = 1, 2, \dots, N$;
4. the percentile bootstrap confidence interval for $u_i^*$, for each $i$;
5. the $BC_a$ bootstrap confidence interval for $u_i^*$, for each $i$;
6. the simple parametric confidence interval (of Section 2.4) for $u_i^*$, for each $i$;
7. the "max-best" bootstrap confidence interval for $u_i^*$, for each $i$;
8. the infeasible parametric confidence interval (of Section 2.4) for $u_i^*$, for each $i$.

The bootstrap results were based on $B = 1000$ replications. Note that we did not consider the iterated bootstrap, due to its computational demands.

We are primarily interested in the biases of the point estimates and the coverage rates of the confidence intervals. These biases and coverage rates are reported as averages over both the $N$ firms (where relevant) and the $R$ replications. In particular, the coverage rate of the confidence intervals is just the fraction of times that coverage occurs.
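The replication loop behind these coverage rates is simple; the sketch below (ours) reuses `make_panel` from the sketch in the previous subsection, and `make_interval` is a hypothetical placeholder for any of the interval constructions in items 4-8 above.

```python
import numpy as np

def coverage_rate(make_interval, gamma_star, N, T, R=300, rng=None):
    """Fraction of (replication, firm) pairs whose interval covers the true u_i*.
    make_interval(y) must return (lb, ub) arrays of length N for u_i*."""
    rng = rng or np.random.default_rng(1)
    hits = 0.0
    for _ in range(R):
        y, alpha_i = make_panel(gamma_star, N, T, rng=rng)  # DGP sketch from Section 2.5
        u_star = alpha_i.max() - alpha_i                    # true relative inefficiency
        lb, ub = make_interval(y)
        hits += np.mean((lb <= u_star) & (u_star <= ub))    # average over the N firms
    return hits / R                                         # average over replications
```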
We begin the discussion of our results with Table 2.1. Three measures of bias are considered: $\mathrm{bias1} = E(\hat\alpha - \alpha)$ is the bias in the overall constant; $\mathrm{bias2} = E(\hat u_i^* - u_i)$ is the bias of the estimated relative inefficiency compared to true inefficiency; and $\mathrm{bias3} = E(\hat u_i^* - u_i^*)$ is the bias of the estimated relative inefficiency compared to true relative inefficiency.

There are two different sources of bias1. These are easily understood in terms of the identity:

$\hat\alpha - \alpha = (\hat\alpha - \alpha_{(N)}) - (\alpha - \alpha_{(N)}).$   (2.16)

The first (and generally most important) source of this bias is $E(\hat\alpha - \alpha_{(N)})$, which is positive. That is, $\hat\alpha$ is biased upward as an estimate of $\alpha_{(N)}$, because of the "max" operation that defines $\hat\alpha = \max_j \hat\alpha_j$. This bias increases with $N$, but decreases when $T$ and/or $\gamma^*$ increase. It disappears as $T \to \infty$ or $\gamma^* \to 1$. The second source of bias is that $E(\alpha_{(N)}) < \alpha$, resulting in downward bias for $\hat\alpha$. This reflects the fact that $\alpha - \alpha_{(N)} = \min_i u_i \ge 0$. This bias disappears as $N \to \infty$. More generally, it decreases as $N$ increases, and increases with $\gamma^*$, but does not depend on $T$. We see examples of both positive and negative bias in column (1) of Table 2.1. As expected, the largest positive bias occurs for large $N$ and small $T$ and $\gamma^*$, whereas the negative bias increases in absolute value for larger $\gamma^*$ and $T$ and smaller $N$.

The bias of $\hat u_i^*$ as an estimate of $u_i$ is given in column (2) of Table 2.1. It is essentially the same as the bias of the overall constant term:

$\mathrm{bias2} = E(\hat u_i^* - u_i) = E\big( (\hat\alpha - \hat\alpha_i) - (\alpha - \alpha_i) \big) = E(\hat\alpha - \alpha) - E(\hat\alpha_i - \alpha_i) = \mathrm{bias1} - E(\hat\alpha_i - \alpha_i),$   (2.17)

and $E(\hat\alpha_i - \alpha_i) = 0$.

The estimate $\hat u_i^*$ is perhaps more naturally viewed as an estimate of $u_i^*$. Column (3) gives the bias of $\hat u_i^*$ as an estimate of $u_i^*$:

$\mathrm{bias3} = E(\hat u_i^* - u_i^*) = E\big( (\hat\alpha - \hat\alpha_i) - (\alpha_{(N)} - \alpha_i) \big) = E(\hat\alpha - \alpha_{(N)}) - E(\hat\alpha_i - \alpha_i) = E(\hat\alpha - \alpha_{(N)}) > 0$   (2.18)

since $E(\hat\alpha_i - \alpha_i) = 0$. Note that bias3 is the first source of bias1, as described above, and is always positive. In other words, $\hat u_i^*$ can overestimate or underestimate the absolute inefficiency $u_i$, but (on average) it overestimates the relative inefficiency $u_i^*$.

We now turn our attention to the question of the accuracy of the various types of confidence intervals we have discussed. We present results for 90% confidence intervals for $r_i^* = \exp(-u_i^*)$, but the coverage rates would be exactly the same for the corresponding confidence intervals for $u_i^*$. We are primarily interested in the coverage rates of the intervals, and in the proportions of observations that fall below the lower bound and above the upper bound. The reason we present intervals for $r_i^*$ (rather than $u_i^*$) is that $r_i^*$ is bounded between zero and one, so that the average width of the intervals is easier to interpret.

Table 2.2 gives the results for the infeasible parametric intervals based on equation (2.11) of Section 2.4. The coverage rates of these intervals are very close to 0.90, as they should be. These intervals are infeasible in practice, since they depend on knowledge of the identity of the best firm, but they illustrate two points. First, for obvious reasons, the intervals are narrower when $T$ is large and when $\gamma^*$ is large (that is, when the variance of inefficiency is large relative to the variance of noise). The number of firms, $N$, is not really relevant if we know which one is best. Second, and more fundamentally, there is no difficulty in constructing accurate confidence intervals for technical efficiency if we know which firm is best. All of the problems that we will see with the accuracy of feasible intervals are due to not knowing with certainty which firm is best.

Table 2.3 gives the results for the percentile bootstrap and $BC_a$ bootstrap confidence intervals. Consider first the percentile bootstrap. Its coverage rate is virtually always less than the nominal level of 90%. The problem is that the intervals are not centered on the true values, due to the bias problem discussed above. (The upward bias of $\hat\alpha$ as an estimate of $\alpha_{(N)}$ corresponds to an upward bias in $\hat u_i^*$ and a downward bias in $\hat r_i^*$. Thus too many $r_i^*$ lie above the upper bounds of the confidence intervals.) Theoretically, the intervals should be accurate in the limit (as $T \to \infty$ with $N$ fixed) if there are no ties for $\max_j \alpha_j$, so the validity of the percentile bootstrap depends on large $T$. The bias problem is small when we have large $T$ and $\gamma^*$ and small $N$, and the coverage probability reaches almost 0.9 in these cases, but it falls in the opposite cases, where the bias is big. The width of the intervals decreases as $T$ or $\gamma^*$ increases.
However, the intervals get narrower with larger $N$, while the bias increases as $N$ increases. This explains why the coverage probabilities of the percentile intervals fall rapidly as $N$ increases.

The results in Table 2.3 indicate that the $BC_a$ intervals provide better coverage rates than the uncorrected percentile intervals, but with the same pattern. They are more accurate when $T$ and $\gamma^*$ are large and when $N$ is small. When $T$ and $\gamma^*$ are small or $N$ is large, there are very considerable improvements over the uncorrected percentile intervals, even though the $BC_a$ intervals do not succeed entirely in yielding correct coverage rates.

The bias-corrected confidence intervals are obtained by shifting the bootstrap distribution by approximately twice the estimated bias in the bootstrapping stage. If, on average, $(\max_j \hat\alpha_j^{(b)} - \max_j \hat\alpha_j)$ were the same as $(\max_j \hat\alpha_j - \max_j \alpha_j)$, we would expect a properly centered interval with a coverage rate of approximately 0.9 after the bias is corrected. In our simulations, however, only some part of the bias gets corrected. Some evidence on this point is given in Table 2.4, which shows the averages of $\max_j \alpha_j$, $\max_j \hat\alpha_j$, and $\max_j \hat\alpha_j^{(b)}$ over different values of $N$, $T$, and $\gamma^*$. The fourth column of the table shows the average bias in the fixed effects estimate of $\max_j \alpha_j$, and the last column shows the average bias in the bootstrap estimates. We see that $(\max_j \hat\alpha_j^{(b)} - \max_j \hat\alpha_j)$ is always smaller than $(\max_j \hat\alpha_j - \max_j \alpha_j)$, and the difference is substantial when $\gamma^*$ is small and $N$ is large. As a result, the bias correction is incomplete, especially when $\gamma^*$ is small and $N$ is large. However, the bias correction is always in the right direction, and this explains why the $BC_a$ intervals are better than the percentile intervals.

Table 2.5 gives the results for the feasible parametric intervals based on equation (2.12) of Section 2.4, and for the "max-best" bootstrap. We expect the feasible parametric intervals and those from the "max-best" bootstrap to give similar results, and they do. The parametric intervals have slightly better coverage rates, because they are wider, but the differences are quite small. As a result we will limit our further discussion to the feasible parametric intervals.

The feasible parametric intervals are clearly more accurate than the percentile bootstrap intervals. This is especially true in the worst cases. For example, for $N = 100$, $T = 10$ and $\gamma^* = 0.1$, compare coverage rates of 0.195 for the percentile bootstrap and 0.663 for the parametric intervals. The parametric intervals are wider and they are better centered, both of which imply higher coverage rates. To understand the point about better centering, recall the discussion of bias in Section 2.4. The parametric intervals have one level of bias ($\hat\alpha$ is a biased estimate of $\alpha_{(N)}$), whereas the percentile bootstrap has two ($\hat\alpha$ is a biased estimate of $\alpha_{(N)}$, and $\max_j \hat\alpha_j^{(b)}$ is a biased "estimator" of $\hat\alpha$).

A more interesting comparison is the feasible parametric intervals versus the $BC_a$ intervals. The feasible parametric intervals generally, but not always, have better coverage rates than the $BC_a$ intervals. This is because they are wider. The cases in which the $BC_a$ intervals have better coverage rates than the parametric intervals are cases in which $T$, $N$ and $\gamma^*$ are all small. These are cases of considerable bias, but not the cases with the most bias (see Table 2.4), which would be cases in which $T$ and $\gamma^*$ are small but $N$ is big. Overall, it is hard to say whether the parametric or $BC_a$ intervals are better, because there is a conflict between our desire for confidence intervals to cover with correct probability and our desire for them not to be wide.
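As an aside to these comparisons, the mechanics of the $BC_a$ endpoints can be sketched as follows. This is our minimal Python sketch of the standard Efron-style $BC_a$ construction (bias-correction factor from the bootstrap distribution, acceleration factor from the jackknife); the chapter's own implementation, defined in Section 2.3, may differ in details such as how the jackknife is taken over time periods.

```python
import numpy as np
from scipy.stats import norm

def bca_interval(theta_hat, theta_boot, theta_jack, level=0.90):
    """BC_a interval from bootstrap replicates theta_boot and jackknife values theta_jack."""
    alpha = (1.0 - level) / 2.0
    # bias-correction factor z0: measures how the bootstrap distribution is shifted
    z0 = norm.ppf(np.mean(theta_boot < theta_hat))
    # acceleration factor a from the jackknife skewness; can be undefined (0/0) when
    # there is no variation, in which case setting a = 0 recovers the BC interval
    d = theta_jack.mean() - theta_jack
    denom = 6.0 * ((d**2).sum()) ** 1.5
    a = (d**3).sum() / denom if denom > 0 else 0.0
    # map the nominal quantiles through the BC_a adjustment
    def adj(q):
        z = z0 + norm.ppf(q)
        return norm.cdf(z0 + z / (1.0 - a * z))
    return (np.quantile(theta_boot, adj(alpha)),
            np.quantile(theta_boot, adj(1.0 - alpha)))
```

Setting the acceleration to zero when its denominator vanishes matches the empirical detail noted in Section 2.6.2 below.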
Our last set of simulations is designed to consider cases in which the identity of the best firm is clear. Here we set one $u_i$ at the 0.05 quantile of the half-normal distribution, while the other $(N - 1)$ are set at equally spaced points between the 0.75 and 0.95 quantiles, inclusive. These $u_i$ are then held fixed across replications of the experiment. The only randomness therefore comes from the stochastic error $v$. Since the identity of the best firm should be clear, the bias caused by the max operator should be minimal.
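In code, this design reads as follows (our sketch; `halfnorm` with scale $\sigma_u$ is the distribution of $|N(0, \sigma_u^2)|$, and the function name is ours).

```python
import numpy as np
from scipy.stats import halfnorm

def fixed_u_design(N, sigma_u):
    """One firm at the 0.05 quantile of the half-normal; the other N-1 firms at
    equally spaced points between the 0.75 and 0.95 quantiles, inclusive."""
    u_best = halfnorm.ppf(0.05, scale=sigma_u)
    u_rest = halfnorm.ppf(np.linspace(0.75, 0.95, N - 1), scale=sigma_u)
    return np.concatenate(([u_best], u_rest))  # held fixed across replications
```

Table 2.6 gives the bias of the fixed effects estimates under this design, and is of the same format as Table 2.1. Recall that bias3 is the component of the bias caused by the max operator (see equation (2.18) above) and should be small when the identity of the best firm is clear. We can see that bias3 in Table 2.6 is indeed much smaller than in Table 2.1.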
Correspondingly, we expect the various bootstrap and parametric intervals to be more accurate in the current cases than in the previous ones. Table 2.7 gives the results for the percentile bootstrap, the $BC_a$ bootstrap, and the feasible parametric intervals. Clearly the intervals are much more reliable now than they were in the previous cases, for which results were reported in Tables 2.3 and 2.5. Note in particular that the percentile bootstrap now does pretty well in all cases except the least favorable (small $T$ and $\gamma^*$, and large $N$). The $BC_a$ bootstrap is now usually worse than the percentile bootstrap: it is counterproductive to try to correct for bias when there is little or no bias. The parametric intervals often cover too often, rather than too seldom, and again this is a reflection of the intervals being wider than the bootstrap intervals.

The overall conclusions we draw from our simulations are straightforward. If it is clear from the data which firm is best, all of the methods of constructing confidence intervals work fairly well. There is no need to consider more complicated procedures than the percentile bootstrap. The parametric intervals are also reliable, but they may be wider than necessary. Conversely, if it is not clear from the data which firm is best, none of the methods of constructing confidence intervals is very reliable. The percentile bootstrap is particularly bad. The $BC_a$ bootstrap intervals or the parametric intervals are probably preferred.

2.6 Empirical Results

We now apply the procedures described above to two well-known data sets. These data sets were chosen to have rather different characteristics. The first data set consists of $N = 171$ Indonesian rice farms observed for $T = 6$ growing seasons. For this data set, the variance of stochastic noise ($v$) is large relative to the variability in $u$: that is, $\hat\gamma^* = 0.222$ with $\hat\sigma^2 = 0.138$. Inference on inefficiencies will be very imprecise because $T$ is small, $\gamma^*$ is small, and $N$ is large. The second data set consists of $N = 10$ Texas utilities observed for $T = 18$ years. For this data set, $\sigma_v^2$ is small relative to $\mathrm{Var}(u)$: $\hat\gamma^* = 0.700$ with $\hat\sigma^2 = 0.010$. In this case we can estimate inefficiencies much more precisely, because $T$ and $\gamma^*$ are larger and $N$ is smaller. We will see that the precision of the estimates differs across these data sets, and that the choice of technique matters more where precision is low. A more detailed analysis of these data, including Bayesian results and results for multiple and marginal comparisons with the best, can be found in Kim and Schmidt (1999).

2.6.1 Indonesian Rice Farms

These data are due to Erwidodo (1990) and have been analyzed subsequently by Lee (1991), Lee and Schmidt (1993), Horrace and Schmidt (1996), Horrace and Schmidt (2000), and others. There are $N = 171$ rice farms and $T = 6$ six-month growing seasons. Output is rice in kilograms, and inputs are land in hectares, labor in hours, seed in kilograms, and two types of fertilizer (urea in kilograms and phosphate in kilograms). The functional form is Cobb-Douglas, with dummy variables added for region, seasonality (dry or wet season), the use of pesticide, and seed type (high-yield, traditional, or mixed). For a complete discussion of the data, see Erwidodo (1990).

The estimated regression parameters are given in Horrace and Schmidt (1996), and we will not repeat them here. Instead we will give point estimates of efficiencies and 90% confidence intervals for these efficiencies. There are 171 firms, and so we report results for the three firms (164, 118, and 163) that are most efficient; for the firms at the 75th percentile (31), 50th percentile (15), and 25th percentile (16) of the efficiency distribution; and for the two worst firms (117, 45). All of these rankings are according to the fixed effects estimates.

We begin with Table 2.8. It gives the fixed effects point estimates and the lower and upper bounds of the 90% parametric confidence intervals. For purposes of comparison we also give the point estimates and the lower and upper bounds of the 90% confidence intervals for the MLE, based on the assumption that inefficiency has a half-normal distribution. See Horrace and Schmidt (1996) for the details of the calculations for the MLE.

The estimated efficiency levels based on the fixed effects estimates are rather low. They are certainly much smaller than the MLE estimates. This is presumably due to bias in the fixed effects estimates, as discussed previously. This data set has characteristics that should make the bias problem severe: $N$ is large; the $\alpha_i$ are estimated imprecisely because $\sigma_v^2$ is large and $T$ is small; and there are near ties for $\max_j \alpha_j$ because $\sigma_u^2$ is small.

Table 2.9 gives 90% confidence intervals based on the percentile bootstrap, the $BC_a$ bootstrap, and the iterated bootstrap, as well as the (feasible) parametric intervals and the "max-best" bootstrap intervals. The bootstrap results are based on 1000 replications, and in the case of the iterated bootstrap each second-level bootstrap is also based on 1000 replications.

There is some similarity between the intervals from the different methods, but there are also some interesting comparisons to make. The percentile bootstrap intervals are clearly closest to zero (i.e., they would indicate the lowest levels of efficiency). This is presumably a reflection of bias. Note, for example, that the midpoints of these intervals are clearly less than the fixed effects estimates (which are themselves biased toward zero). For the reasons given above, we do not regard these intervals as trustworthy for this data set. The iterated bootstrap intervals are centered similarly to the percentile bootstrap intervals but are wider. The $BC_a$ intervals are an upward shift (in the direction of higher efficiency) of the percentile intervals, and might be a good choice for this data set. The parametric intervals are also an upward shift of the percentile intervals, though not by as much as the $BC_a$ intervals. They are wider than the $BC_a$ intervals, and in fact they are about as wide as the iterated bootstrap intervals.
They are another possible good choice for this data set; in a sense they are the conservative choice. The "max-best" bootstrap intervals are similar to the parametric intervals and are therefore another possible good choice.

2.6.2 Texas Utilities

In this section we consider the Texas utility data of Kumbhakar (1996), which were also analyzed by Horrace and Schmidt (1996) and Horrace and Schmidt (2000). As in the previous section, we estimate a Cobb-Douglas production function, whereas Kumbhakar (1996) estimated a cost function. The data contain information on output and inputs of 10 privately owned Texas electric utilities for 18 years, from 1966 to 1983. Output is electric power generated, and input measures of labor, capital, and fuel are derived by dividing expenditures on each input by its price. For more details on the data see Kumbhakar (1996).

Table 2.10 gives the fixed effects point estimates, the 90% parametric intervals, and the MLE point estimates and 90% confidence intervals. The format is the same as that of Table 2.8, except that now we can report the results for all of the firms. Table 2.11 gives the 90% confidence intervals for the same set of procedures as before, and it is of the same format as Table 2.9, except that results are given for all firms.

Compared to the previous data set, we estimate the intercepts $\alpha_i$ much more precisely, because $T$ is larger and $\sigma_v^2$ is smaller. For this reason, and also because $N$ is smaller, we expect there not to be a severe finite sample bias problem in the fixed effects estimates, and we expect that the choice of technique will not matter as much.

The MLE estimated efficiencies are larger than those based on fixed effects (except for the "best" firm), but the difference is not nearly as large as for the previous data set. Similarly, the MLE confidence intervals are narrower than the parametric intervals, but not by nearly as much as in Table 2.8. A distributional assumption is much less valuable in the present case. In fact, the accuracy of the MLE intervals is now suspect, because we have only 10 firms, and the asymptotic justification for the MLE requires large $N$.

In Table 2.11, we can see that the parametric intervals and all of the bootstrapping intervals are quite similar. The bias problem is apparently negligible for this data set, and correspondingly our faith in the accuracy of these intervals is relatively strong. We can compare the features of this data set with the setup of our simulations. One of the parameter configurations in our simulations had $N = 10$, $T = 20$, and $\gamma^* = 0.5$, which matches these data quite well. In that case the coverage rates of the various confidence intervals were in the range of 0.87 to 0.88, which is obviously close to 0.90. A technical detail worth noting is that the acceleration factor in the $BC_a$ bootstrap was undefined and was therefore set equal to zero. This is further evidence that there was very little bias in estimation.
2.7 Conclusions

In this chapter we have provided a survey of the use of bootstrapping to construct confidence intervals for efficiency measures. We discussed several versions of the bootstrap, including the percentile bootstrap, the iterated bootstrap, and the bias-adjusted and accelerated bootstrap. In stochastic frontier models, these methods can be applied to the fixed effects estimates, yielding inferences that are correct asymptotically as $T \to \infty$ with $N$ fixed.

We have proposed a simple parametric method of constructing confidence intervals. It uses standard methods and simply acts as if the identity of the best firm is known. We also proposed a resampling scheme, the "max-best" bootstrap, which ought to yield confidence intervals similar to the parametric intervals. These procedures are valid under the same conditions under which the bootstrap methods are valid, namely, as $T \to \infty$ with $N$ fixed, and provided that there is a unique best firm.

The main problem that we encounter is the upward bias in the fixed effects estimate of the frontier, which translates into a downward bias in the estimated efficiencies. The bias is large when $T$ is small, $N$ is large, and/or statistical noise is large relative to the variation in the frontier. These are exactly the circumstances in which the identity of the best firm is uncertain, and so it is fair to say that bias is a problem when the identity of the best firm is in question.

Our simulation results show that the percentile bootstrap is seriously inaccurate when the bias problem exists, that is, when the identity of the best firm is not clear. The percentile bootstrap intervals are miscentered because the bias in the original estimates is compounded by similar "bias" in the bootstrap estimates. Our parametric intervals, or our "max-best" bootstrap intervals, avoid the second source of bias and are more reliable than the percentile bootstrap intervals. The bias-corrected and accelerated ($BC_a$) bootstrap makes a bias correction based on the "bias" in the second round, and these intervals are also more reliable than the percentile bootstrap intervals. Comparing the parametric intervals and the $BC_a$ intervals, neither clearly dominates the other. The parametric intervals are more conservative.

A negative conclusion of the simulations is that none of the methods of constructing confidence intervals based on the fixed effects estimates is very reliable if the identity of the best firm is in serious doubt. In such cases it may be worthwhile to consider assuming a distribution for technical inefficiency and using the MLE.

We performed an empirical analysis of two data sets, one of which had characteristics very unfavorable to the bootstrap (large $N$, small $T$, and large variance of noise). In this case there was evidence of bias, and the bootstrap intervals were both unreliable and too wide to be informative. Our other data set had more favorable characteristics, and the empirical analysis yielded results that were quite precise and seemingly sensible. Hence, as in the simulations, a major lesson is that the reliability of inference on efficiencies can be judged from observable features of the data.
2.8 Output Tables

Table 2.1: Biases of Fixed Effects Estimates

                  bias1        bias2          bias3
                  E(α̂ − α)    E(û*_i − u_i)  E(û*_i − u*_i)
   T    γ*    N     (1)          (2)            (3)
  10   0.1   10    0.103        0.105          0.133
  10   0.1   20    0.153        0.155          0.169
  10   0.1   50    0.234        0.235          0.241
  10   0.1  100    0.276        0.274          0.277
  10   0.5   10   -0.009       -0.008          0.055
  10   0.5   20    0.045        0.046          0.078
  10   0.5   50    0.119        0.119          0.132
  10   0.5  100    0.153        0.152          0.159
  10   0.9   10   -0.076       -0.075          0.010
  10   0.9   20   -0.028       -0.028          0.016
  10   0.9   50    0.018        0.018          0.035
  10   0.9  100    0.039        0.039          0.049
  10   0.1   10    0.103        0.105          0.133
  20   0.1   10    0.049        0.046          0.078
  50   0.1   10    0.006        0.005          0.035
 100   0.1   10   -0.007       -0.007          0.021
  10   0.5   10   -0.009       -0.008          0.055
  20   0.5   10   -0.038       -0.041          0.030
  50   0.5   10   -0.054       -0.054          0.013
 100   0.5   10   -0.058       -0.058          0.004
  10   0.9   10   -0.076       -0.075          0.010
  20   0.9   10   -0.090       -0.091          0.004
  50   0.9   10   -0.087       -0.088          0.002
 100   0.9   10   -0.084       -0.085          0.000

Table 2.2: 90% Confidence Intervals for Relative Efficiency (r*_i), Infeasible Parametric

   T    γ*    N   Width   P(<lb)  P(>ub)  cover
  10   0.1   10   0.551   0.057   0.037   0.905
  10   0.1   20   0.564   0.038   0.052   0.910
  10   0.1   50   0.599   0.059   0.043   0.898
  10   0.1  100   0.594   0.048   0.049   0.903
  10   0.5   10   0.326   0.057   0.037   0.905
  10   0.5   20   0.335   0.038   0.052   0.910
  10   0.5   50   0.352   0.059   0.043   0.898
  10   0.5  100   0.351   0.048   0.049   0.903
  10   0.9   10   0.127   0.057   0.037   0.905
  10   0.9   20   0.131   0.038   0.052   0.910
  10   0.9   50   0.136   0.059   0.043   0.898
  10   0.9  100   0.137   0.048   0.049   0.903
  10   0.1   10   0.551   0.057   0.037   0.905
  20   0.1   10   0.379   0.044   0.045   0.910
  50   0.1   10   0.236   0.038   0.043   0.919
 100   0.1   10   0.167   0.050   0.038   0.912
  10   0.5   10   0.326   0.057   0.037   0.905
  20   0.5   10   0.228   0.044   0.045   0.910
  50   0.5   10   0.143   0.038   0.043   0.919
 100   0.5   10   0.101   0.050   0.038   0.912
  10   0.9   10   0.127   0.057   0.037   0.905
  20   0.9   10   0.090   0.044   0.045   0.910
  50   0.9   10   0.057   0.038   0.043   0.919
 100   0.9   10   0.040   0.050   0.038   0.912

Table 2.3: 90% Confidence Intervals for Relative Efficiency (r*_i)

                       Percentile Bootstrap                BC_a Bootstrap
   T    γ*    N   Width   P(<lb)  P(>ub)  cover     Width   P(<lb)  P(>ub)  cover
  10   0.1   10   0.354   0.001   0.289   0.709     0.336   0.015   0.130   0.855
  10   0.1   20   0.346   0.000   0.447   0.553     0.328   0.015   0.164   0.821
  10   0.1   50   0.323   0.000   0.676   0.324     0.320   0.008   0.275   0.717
  10   0.1  100   0.305   0.000   0.805   0.195     0.306   0.007   0.341   0.652
  10   0.5   10   0.248   0.015   0.157   0.829     0.252   0.044   0.092   0.864
  10   0.5   20   0.245   0.003   0.235   0.762     0.243   0.041   0.108   0.851
  10   0.5   50   0.230   0.001   0.448   0.552     0.232   0.023   0.184   0.794
  10   0.5  100   0.219   0.000   0.603   0.397     0.221   0.018   0.229   0.753
  10   0.9   10   0.111   0.040   0.084   0.876     0.115   0.057   0.081   0.861
  10   0.9   20   0.112   0.018   0.116   0.867     0.113   0.061   0.084   0.855
  10   0.9   50   0.108   0.005   0.234   0.761     0.108   0.048   0.115   0.837
  10   0.9  100   0.105   0.002   0.363   0.636     0.104   0.037   0.150   0.813
  10   0.1   10   0.354   0.001   0.289   0.709     0.336   0.015   0.130   0.855
  20   0.1   10   0.282   0.002   0.225   0.773     0.267   0.027   0.099   0.874
  50   0.1   10   0.197   0.005   0.152   0.843     0.190   0.036   0.079   0.885
 100   0.1   10   0.145   0.008   0.131   0.861     0.144   0.034   0.072   0.895
  10   0.5   10   0.248   0.015   0.157   0.829     0.252   0.044   0.092   0.864
  20   0.5   10   0.192   0.014   0.113   0.872     0.196   0.044   0.088   0.868
  50   0.5   10   0.131   0.018   0.085   0.897     0.136   0.044   0.074   0.882
 100   0.5   10   0.094   0.028   0.070   0.902     0.097   0.061   0.074   0.866
  10   0.9   10   0.111   0.040   0.084   0.876     0.115   0.057   0.081   0.861
  20   0.9   10   0.083   0.031   0.068   0.901     0.085   0.059   0.083   0.858
  50   0.9   10   0.055   0.031   0.063   0.906     0.056   0.044   0.076   0.880
 100   0.9   10   0.039   0.045   0.047   0.908     0.040   0.053   0.069   0.878

Table 2.4: Bias Correction in the BC_a Bootstrap Intervals

                  max_j α_j   max_j α̂_j   max_j α̂_j^(b)
   T    γ*    N      (1)         (2)          (3)        (2)-(1)   (3)-(2)
  10   0.1   10     0.972       1.103        1.175        0.132     0.072
  50   0.1   10     0.970       1.006        1.034        0.036     0.029
  10   0.1   50     0.994       1.234        1.342        0.240     0.108
  10   0.5   10     0.937       0.991        1.027        0.054     0.037
  50   0.5   10     0.933       0.946        0.957        0.013     0.011
  10   0.5   50     0.988       1.119        1.183        0.131     0.064
  10   0.9   10     0.915       0.924        0.933        0.009     0.008
  50   0.9   10     0.910       0.913        0.915        0.003     0.002
  10   0.9   50     0.983       1.018        1.039        0.035     0.021
Table 2.5: 90% Confidence Intervals for Relative Efficiency (r*_i)

                       Feasible Parametric                "Max-best" Bootstrap
   T    γ*    N   Width   P(<lb)  P(>ub)  cover     Width   P(<lb)  P(>ub)  cover
  10   0.1   10   0.463   0.071   0.113   0.816     0.433   0.072   0.138   0.790
  10   0.1   20   0.471   0.039   0.144   0.817     0.444   0.039   0.173   0.787
  10   0.1   50   0.455   0.018   0.258   0.724     0.429   0.018   0.295   0.687
  10   0.1  100   0.445   0.010   0.327   0.663     0.420   0.010   0.374   0.617
  10   0.5   10   0.301   0.058   0.070   0.872     0.282   0.060   0.089   0.852
  10   0.5   20   0.308   0.033   0.085   0.881     0.290   0.035   0.107   0.858
  10   0.5   50   0.301   0.017   0.163   0.820     0.285   0.017   0.190   0.793
  10   0.5  100   0.298   0.009   0.215   0.776     0.281   0.009   0.248   0.743
  10   0.9   10   0.124   0.055   0.049   0.896     0.117   0.059   0.061   0.880
  10   0.9   20   0.129   0.032   0.061   0.907     0.122   0.039   0.075   0.886
  10   0.9   50   0.130   0.019   0.096   0.885     0.123   0.021   0.116   0.864
  10   0.9  100   0.130   0.010   0.132   0.857     0.123   0.011   0.156   0.833
  10   0.1   10   0.463   0.071   0.113   0.816     0.433   0.072   0.138   0.790
  20   0.1   10   0.344   0.067   0.090   0.844     0.333   0.067   0.099   0.834
  50   0.1   10   0.227   0.053   0.073   0.874     0.224   0.053   0.078   0.869
 100   0.1   10   0.162   0.053   0.067   0.880     0.161   0.053   0.068   0.879
  10   0.5   10   0.301   0.058   0.070   0.872     0.282   0.060   0.089   0.852
  20   0.5   10   0.219   0.051   0.065   0.884     0.212   0.053   0.070   0.877
  50   0.5   10   0.141   0.042   0.055   0.904     0.139   0.042   0.058   0.900
 100   0.5   10   0.100   0.050   0.049   0.901     0.100   0.052   0.051   0.897
  10   0.9   10   0.124   0.055   0.049   0.896     0.117   0.059   0.061   0.880
  20   0.9   10   0.089   0.048   0.051   0.901     0.087   0.052   0.056   0.893
  50   0.9   10   0.057   0.038   0.048   0.914     0.056   0.041   0.052   0.907
 100   0.9   10   0.040   0.052   0.041   0.908     0.040   0.055   0.043   0.901

Table 2.6: Biases of Fixed Effects Estimates (Case that the u_i are Fixed over Replications)

                  bias1        bias2          bias3
                  E(α̂ − α)    E(û*_i − u_i)  E(û*_i − u*_i)
   T    γ*    N     (1)          (2)            (3)
  10   0.1   10    0.010        0.013          0.029
  10   0.1   20    0.006        0.006          0.023
  10   0.1   50    0.045        0.046          0.062
  10   0.1  100    0.061        0.061          0.078
  10   0.5   10   -0.035       -0.032          0.004
  10   0.5   20   -0.049       -0.049         -0.012
  10   0.5   50   -0.029       -0.028          0.008
  10   0.5  100   -0.042       -0.041         -0.005
  10   0.9   10   -0.048       -0.047          0.002
  10   0.9   20   -0.055       -0.055         -0.006
  10   0.9   50   -0.046       -0.046          0.004
  10   0.9  100   -0.052       -0.051         -0.002
  10   0.1   10    0.010        0.013          0.029
  20   0.1   10   -0.021       -0.021         -0.004
  50   0.1   10   -0.019       -0.018         -0.001
 100   0.1   10   -0.019       -0.019         -0.002
  10   0.5   10   -0.035       -0.032          0.004
  20   0.5   10   -0.042       -0.042         -0.005
  50   0.5   10   -0.039       -0.038         -0.001
 100   0.5   10   -0.039       -0.039         -0.002
  10   0.9   10   -0.048       -0.047          0.002
  20   0.9   10   -0.052       -0.052         -0.002
  50   0.9   10   -0.050       -0.050          0.000
 100   0.9   10   -0.050       -0.050         -0.001
Table 2.7: 90% Confidence Intervals for Relative Efficiency (r*_i), Case that the u_i are Fixed over Replications — percentile bootstrap, BC_a bootstrap, and feasible parametric intervals, in the same format as Tables 2.3 and 2.5.

Table 2.8: Estimated Efficiencies and 90% Confidence Intervals: Indonesian Rice Farms

                 Fixed Effects                    MLE
 Firm     Point                          Point
  No.   Estimate     LB       UB       Estimate     LB       UB
  164    1.000     1.000    1.000       0.964     0.903    0.998
  118    0.933     0.682    1.000       0.964     0.902    0.998
   31    0.620     0.447    0.859       0.924     0.823    0.994
   15    0.554     0.403    0.762       0.923     0.792    0.990
   16    0.501     0.362    0.694       0.845     0.725    0.969
  117    0.380     0.275    0.524       0.773     0.658    0.907
   45    0.366     0.266    0.504       0.774     0.659    0.908

Table 2.9: 90% Confidence Intervals: Indonesian Rice Farms — percentile bootstrap, BC_a bootstrap, iterated bootstrap, parametric, and "max-best" bootstrap intervals for the firms listed in Table 2.8.
Table 2.10: Estimated Efficiencies and 90% Confidence Intervals: Texas Utilities

                 Fixed Effects                    MLE
 Firm     Point                          Point
  No.   Estimate     LB       UB       Estimate     LB       UB
    5    1.000     1.000    1.000       0.987     0.971    0.999
    3    0.916     0.823    1.000       0.978     0.959    0.996
   10    0.861     0.786    0.943       0.908     0.889    0.927
    1    0.835     0.784    0.889       0.864     0.846    0.882
    8    0.820     0.773    0.869       0.846     0.828    0.864
    9    0.806     0.766    0.848       0.826     0.809    0.843
    2    0.801     0.749    0.855       0.831     0.814    0.848
    7    0.786     0.732    0.844       0.817     0.800    0.834
    6    0.785     0.730    0.845       0.820     0.803    0.837
    4    0.762     0.719    0.808       0.786     0.770    0.801

Table 2.11: 90% Confidence Intervals: Texas Utilities — same procedures and format as Table 2.9, with results given for all ten firms.

Chapter 3

Indicator KPSS with a Time Trend

3.1 Introduction

In this chapter, we propose a statistic to test whether a time series is stationary, and we allow for a time trend. A standard test for stationarity is the KPSS test of Kwiatkowski, Phillips, Schmidt, and Shin (1992). The KPSS statistic, $\hat\eta_\mu$, uses the scaled sum of squares of cumulations of demeaned data, with a long-run variance estimate in the denominator. A deterministic trend can be allowed in the test of trend-stationarity, in which the demeaned data in $\hat\eta_\mu$ are replaced by the residuals from the regression of the series on an intercept and trend.

In the construction of the KPSS tests, conditions strong enough to imply a Functional Central Limit Theorem (FCLT) are assumed. One of these conditions is the finite variance assumption. However, when the data have fat-tailed errors, such as those from the Cauchy distribution, for which the moments do not exist, the limiting distributions of the KPSS statistics are functionals of a Lévy process (Amsler and Schmidt 2000), not of a Wiener process.

In the paper by de Jong, Amsler, and Schmidt (2002), the authors relax the moment assumption and propose a modified version of the KPSS test $\hat\eta_\mu$. They call their test the "indicator KPSS" test, which we will label $\hat\iota_\mu$. The sample data are transformed using an indicator which takes the value 1, 0, or −1 depending on whether the sample observation is above, on, or below the sample median. Under the null of level-stationarity, $\hat\iota_\mu$ is shown to have the same limiting distribution as the KPSS statistic $\hat\eta_\mu$.

In this chapter, we use a similar indicator to transform the data, but we allow for a deterministic trend as well as a non-zero level for the data. Let the indicator KPSS statistic with a time trend be denoted $\hat\iota_\tau$. We show that the asymptotic distribution of $\hat\iota_\tau$ under the null of trend-stationarity is a functional of the second-level Brownian bridge, which is also the limiting distribution of the KPSS statistic with a time trend, $\hat\eta_\tau$.
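The robustness claim in this introduction can be seen in two lines of code. The following illustration is ours, not from the chapter: however heavy-tailed the series, its sign pattern around the median is bounded, so partial sums of the transformed series can obey an FCLT under weak conditions even when the series itself has no moments.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(500)      # fat-tailed series: no finite moments at all

s = np.sign(x - np.median(x))     # indicator: +1 / 0 / -1 around the sample median
print(np.var(x))                  # huge and unstable across draws
print(np.var(s))                  # close to 1 regardless of the tails of x
```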
3.2 Asymptotic Theory

3.2.1 Assumptions

Let $\{\{x_{Tj}\}_{j=1}^{T}\}_{T=1}^{\infty}$ be a triangular array of random variables such that

$x_{Tj} = \alpha_0 + \beta_0 \frac{j}{T} + \epsilon_j.$   (3.1)

Assumption 1. There exist unique $\alpha_0, \beta_0$ such that $\mathrm{med}(x_{Tj}) = \alpha_0 + \beta_0 j/T$ for all $T$ and $j = 1, \dots, T$.

Note that this implies that the unique median of $\epsilon_j = x_{Tj} - \alpha_0 - \beta_0 j/T$ is zero. The next assumption is a convergence condition on the average variance of the sum of the transformed $\epsilon_j$, together with finiteness of the long-run variance $\sigma^2$.

Assumption 2. Define $\sigma^2 = \lim_{T \to \infty} E\big( T^{-1/2} \sum_{j=1}^{T} \mathrm{sgn}(\epsilon_j) \big)^2$, where the sgn function takes the three values 1, 0, or −1 depending on the sign of its argument: $\mathrm{sgn}(x) = 1$ if $x > 0$, $\mathrm{sgn}(x) = 0$ if $x = 0$, and $\mathrm{sgn}(x) = -1$ if $x < 0$. Then $0 < \sigma^2 < \infty$.

Assumption 3. The kernel function $k(\cdot)$ satisfies $k(0) = 1$, $k(\cdot)$ is continuous at zero and at all but a finite number of other points, and $\int_{-\infty}^{\infty} |\psi(s)|\, ds < \infty$, where

$\psi(s) = (2\pi)^{-1} \int_{-\infty}^{\infty} k(x) \exp(-isx)\, dx.$   (3.2)

The Bartlett, Parzen, Quadratic Spectral, and Tukey-Hanning kernel functions are some possible choices (de Jong and Davidson 2000). These kernel functions are designed to damp the effects of the longer lags smoothly to zero, so that kernel functions such as the uniform or the truncated kernel are excluded. The next assumption concerns the $\epsilon_j$, and will be used in deriving the asymptotic distribution of the indicator KPSS statistic under the null of trend-stationarity.

Assumption 4. The $\epsilon_j$ are stationary random variables and strong ($\alpha$-) mixing with mixing coefficients $\alpha(m)$ satisfying $\alpha(m) \le C m^{-r/(r-2) - \eta}$ for some finite $r > 2$, some $\eta > 0$, and a constant $C$. In addition, $\epsilon_j$ has a continuous density $f(e)$ in a neighborhood $[-\bar\eta, \bar\eta]$ of 0 for some $\bar\eta > 0$, and $\inf_{e \in [-\bar\eta, \bar\eta]} f(e) > 0$.

Assumption 4 is different from the general conditions on the stationary errors used in deriving the asymptotic distributions of the KPSS statistics (Phillips (1987) or Phillips and Perron (1988)). (Assumption 4 is stated here in terms of $\epsilon_j$, not $x_{Tj}$ as in de Jong, Amsler, and Schmidt (2002), to emphasize that we are interested in the test of trend-stationarity; that is, the conditions on $\epsilon_j$, or equivalently on the detrended series $x_{Tj} - \alpha_0 - \beta_0 j/T$, are the same as Assumption 2 of de Jong, Amsler, and Schmidt (2002).) The important difference is the moment condition on $\epsilon_j$. For example, in Phillips (1987), a moment condition such as $\sup_j E|\epsilon_j|^\vartheta < \infty$ for some $\vartheta > 2$ is assumed. However, in this chapter we do not assume the existence of moments of $\epsilon_j$ under the null. This is made possible by the use of the indicators. The next assumption is for the alternative of a unit root.

Assumption 5. The $\epsilon_j$ satisfy $T^{-1/2} \epsilon_{[\xi T]} \Rightarrow \lambda W(\xi)$ for some $\lambda \in (0, \infty)$ and $\xi \in [0, 1]$, where $W(\cdot)$ is a Wiener process, or Brownian motion.

Note that Assumption 5 also implies that $T^{-1/2} x_{T,[\xi T]} \Rightarrow \lambda W(\xi)$, since

$T^{-1/2} x_{T,[\xi T]} = T^{-1/2}\big( \alpha_0 + \beta_0 [\xi T]/T \big) + T^{-1/2} \epsilon_{[\xi T]} = o(1) + T^{-1/2} \epsilon_{[\xi T]} \Rightarrow \lambda W(\xi).$

3.2.2 Indicator KPSS Statistic

Using the least absolute deviations (LAD) estimators $\hat\alpha, \hat\beta$, which are solutions to

$\min_{\alpha, \beta} \sum_{j=1}^{T} \Big| x_{Tj} - \alpha - \beta \frac{j}{T} \Big|,$   (3.3)

we define the cumulation of the indicator data

$S_{Tt} = \sum_{j=1}^{t} \mathrm{sgn}\Big( x_{Tj} - \hat\alpha - \hat\beta \frac{j}{T} \Big).$   (3.4)

Then the indicator KPSS statistic with a time trend, $\hat\iota_\tau$, is defined as

$\hat\iota_\tau = \hat\sigma^{-2} T^{-2} \sum_{t=1}^{T} S_{Tt}^2.$   (3.5)

A consistent estimator $\hat\sigma^2$ of $\sigma^2$ can be constructed from the "indicator" residuals $\mathrm{sgn}(x_{Tj} - \hat\alpha - \hat\beta j/T)$. Using a weighting function, the heteroskedasticity- and autocorrelation-consistent (HAC) estimator $\hat\sigma^2$ is obtained as

$\hat\sigma^2 = T^{-1} \sum_{i=1}^{T} \sum_{j=1}^{T} k\Big( \frac{i - j}{\gamma_T} \Big)\, \mathrm{sgn}\Big( x_{Ti} - \hat\alpha - \hat\beta \frac{i}{T} \Big)\, \mathrm{sgn}\Big( x_{Tj} - \hat\alpha - \hat\beta \frac{j}{T} \Big),$   (3.6)

where $k(\cdot)$ is the kernel function and $\gamma_T$ is the lag truncation parameter, which goes to $\infty$ as $T \to \infty$ and satisfies the condition $\gamma_T / T \to 0$.

Note that $\hat\iota_\tau$ is defined in a similar way as the KPSS statistic $\hat\eta_\tau$. The difference is that we use deviations from the median, while $\hat\eta_\tau$ is based on deviations from the mean of the series. The indicator KPSS statistic is based on the sample median, generalized here to the fit from a LAD regression.
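Steps (3.3)-(3.6) translate directly into code. The following is a minimal Python sketch (ours, not from the dissertation): the LAD fit is obtained as a median (q = 0.5) quantile regression, and we use the Bartlett kernel with a common rule-of-thumb bandwidth; both the bandwidth default and the Bartlett weighting convention 1 − m/(γ_T + 1) are our choices, not the chapter's.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def indicator_kpss_trend(x, gamma_T=None):
    """Indicator KPSS statistic with a time trend, following (3.3)-(3.6)."""
    T = len(x)
    X = np.column_stack([np.ones(T), np.arange(1, T + 1) / T])
    # (3.3): LAD fit = median quantile regression on intercept and trend
    res = QuantReg(x, X).fit(q=0.5)
    s = np.sign(x - X @ res.params)      # indicator residuals (+1 / 0 / -1)
    S = np.cumsum(s)                     # (3.4): cumulations S_Tt
    if gamma_T is None:
        gamma_T = int(np.floor(4 * (T / 100) ** 0.25))  # rule-of-thumb bandwidth
    # (3.6) with the Bartlett kernel, written as a weighted sum of autocovariances
    sigma2 = s @ s / T
    for m in range(1, gamma_T + 1):
        w = 1.0 - m / (gamma_T + 1.0)
        sigma2 += 2.0 * w * (s[m:] @ s[:-m]) / T
    return (S @ S) / (T**2 * sigma2)     # (3.5)
```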
As noted in de Jong, Amsler, and Schmidt (2002), the purpose of trimming the data is to remove the effects of fat tails, or to make the variance finite. We use the sgn function to bypass the problem of how to scale the data, so that only the location of the data is used in the transformation. This is because $\mathrm{sgn}(x) = I(x \ge 0) - I(x \le 0)$ and $|x| = x \cdot (I(x \ge 0) - I(x \le 0)) = x \cdot \mathrm{sgn}(x)$, where $I(\cdot)$ takes the value one if its argument is true and zero otherwise.

3.2.3 Conjectures

Before stating theorems on the asymptotic distribution of $\hat\iota_\tau$, let us make conjectures on $\hat\alpha$ and $\hat\beta$, as the proofs of the following claims are only partially done.

Conjecture 1. Under Assumptions 1 and 4, $T^{1/2}(\hat\alpha - \alpha_0) = O_p(1)$ and $T^{1/2}(\hat\beta - \beta_0) = O_p(1)$.

What we want to assert in this conjecture is that, for an arbitrarily large $K > 0$,

$\limsup_{T \to \infty} P\Big( \sup_{\phi_1 > K} \sup_{\phi_2 > K} Y_{1T}(\phi_1, \phi_2) \ge 0 \Big) = \limsup_{T \to \infty} P\Big( \sup_{\phi_1 > K} \sup_{\phi_2 > K} Y_{2T}(\phi_1, \phi_2) \ge 0 \Big) = 0,$   (3.7)

so that the probability of having solutions $(\phi_1, \phi_2)$ outside $\Phi = \{(\phi_1, \phi_2) \in \mathbb{R}^2 : -K \le \phi_1 \le K, -K \le \phi_2 \le K\}$ goes to zero as $T \to \infty$, where

$Y_{1T}(\phi_1, \phi_2) = T^{-1/2} \sum_{j=1}^{T} \mathrm{sgn}\Big( x_{Tj} - \alpha_0 - \beta_0 \frac{j}{T} - T^{-1/2}\big(\phi_1 + \phi_2 \frac{j}{T}\big) \Big),$
$Y_{2T}(\phi_1, \phi_2) = T^{-1/2} \sum_{j=1}^{T} \mathrm{sgn}\Big( x_{Tj} - \alpha_0 - \beta_0 \frac{j}{T} - T^{-1/2}\big(\phi_1 + \phi_2 \frac{j}{T}\big) \Big) \frac{j}{T}.$   (3.8)

However, there are four possibilities for obtaining large values of $\phi_1$ and/or $\phi_2$:

- case [1]: $\phi_1 > K$ and $\phi_2 > K$;
- case [2]: $\phi_1 < -K$ and $\phi_2 < -K$;
- case [3]: $\phi_1 < -K$ and $\phi_2 > K$;
- case [4]: $\phi_1 > K$ and $\phi_2 < -K$.

The statement (3.7) corresponds to case [1]. The proof for case [1] and case [2] is given in the Appendix; the proof for case [3] and case [4] remains to be done. Also, note that the case in which only one of $|\phi_1|$ and $|\phi_2|$ is larger than $K$ is a special case of either case [1] or case [2] and can be proved in a similar way as the first two cases.

The following conjecture makes a similar claim as Conjecture 1, but the difference is that we assume $\epsilon_j$ is an I(1) process.

Conjecture 2. Under Assumptions 1 and 5, $T^{-1/2}(\hat\alpha - \alpha_0) = O_p(1)$ and $T^{-1/2}(\hat\beta - \beta_0) = O_p(1)$.

Here we also have to consider the four possibilities in which $|T^{-1/2}\hat\alpha|$ and/or $|T^{-1/2}\hat\beta|$ are greater than $K$. In the Appendix, we prove the two cases in which $T^{-1/2}\hat\alpha$ and $T^{-1/2}\hat\beta$ are both greater than $K$ or both less than $-K$. The two other cases would be proved in a similar way as the unsolved cases of Conjecture 1.

3.2.4 The Asymptotic Distributions of the Indicator KPSS Statistic

Theorem 1. Under Assumptions 1, 2, 3, and 4 and Conjecture 1,

$T^{-2} \sum_{t=1}^{T} S_{Tt}^2 \overset{d}{\to} \sigma^2 \int_0^1 V_2(r)^2\, dr,$   (3.9)

where $V_2(r)$ is the second-level Brownian bridge,

$V_2(r) = W(r) + (2r - 3r^2) W(1) + (-6r + 6r^2) \int_0^1 W(s)\, ds,$   (3.10)

and

$\hat\sigma^2 \overset{p}{\to} \sigma^2.$   (3.11)

The limiting distribution of $\hat\iota_\tau$ is therefore $\int_0^1 V_2(r)^2\, dr$, which is also the limiting distribution of the KPSS test with a time trend, $\hat\eta_\tau$, so that the same critical values, given in Kwiatkowski, Phillips, Schmidt, and Shin (1992, p. 166), can be used.
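A quick numerical check of this equivalence is to simulate the functional $\int_0^1 V_2(r)^2\, dr$ directly from (3.10) and compare the quantiles with the tabulated $\hat\eta_\tau$ critical values of 0.119, 0.146, and 0.216 at the 10%, 5%, and 1% levels. The sketch below (ours) discretizes the Wiener process on a grid; grid size and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 10000
r = np.arange(1, n + 1) / n

draws = np.empty(reps)
for i in range(reps):
    W = np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))           # discretized W(r)
    intW = W.mean()                                               # approximates ∫ W(s) ds
    V2 = W + (2*r - 3*r**2) * W[-1] + (-6*r + 6*r**2) * intW      # (3.10)
    draws[i] = np.mean(V2**2)                                     # approximates ∫ V2(r)^2 dr
print(np.quantile(draws, [0.90, 0.95, 0.99]))                     # ≈ 0.119, 0.146, 0.216
```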
Under the alternative, in which $x_{Tj}$ is an I(1) process, we have the following result.

Theorem 2. Under Assumptions 1 and 5 and Conjecture 2,

$T^{-3} \sum_{t=1}^{T} S_{Tt}^2 \overset{d}{\to} \int_0^1 \Big( \int_0^r \mathrm{sgn}\big( \lambda W(\xi) - A - B\xi \big)\, d\xi \Big)^2 dr,$   (3.12)

where $(T^{-1/2}\hat\alpha, T^{-1/2}\hat\beta)' \overset{d}{\to} (A, B)'$ for random variables $A$ and $B$, and $\hat\sigma^2 / \gamma_T \overset{d}{\to} 2 \int_0^\infty k(\xi)\, d\xi$. Since $T^{-3} \sum_t S_{Tt}^2$ and $\hat\sigma^2 / \gamma_T$ both converge, $\hat\iota_\tau$ diverges at the rate $T/\gamma_T$ under the alternative, so the test is consistent.

Other than whether the underlying series is stationary or not, the important difference between the assumptions used in deriving Theorem 1 and Theorem 2 is the moment condition on $\epsilon_j$. In Theorem 1, we do not impose a condition for the existence of moments. However, in Theorem 2, we need a finite second moment of $\epsilon_j$ in order to apply the FCLT. Also, note that the limiting distribution under the alternative of a unit root in Theorem 2 is different from that of the KPSS statistic with a time trend, which is

$\int_0^1 \Big( \int_0^a W^*(s)\, ds \Big)^2 da \Big/ \kappa \int_0^1 W^*(s)^2\, ds,$   (3.13)

where $W^*(s) = W(s) + (6s - 4) \int_0^1 W(r)\, dr + (-12s + 6) \int_0^1 r W(r)\, dr$ and $\kappa$ is a kernel-dependent constant. The differences in the asymptotic distributions will turn into power differences, as in de Jong, Amsler, and Schmidt (2002). Under the alternative of a unit root with fat-tailed errors, the indicator KPSS test with a time trend should be more powerful than the KPSS test with trend, and less powerful when the errors are normally distributed. This is because the indicator is concerned only with the location of the data.

3.3 Concluding Remarks

In this chapter, we have extended the indicator KPSS test proposed by de Jong, Amsler, and Schmidt (2002) to the case in which a time trend, as well as a non-zero level, is allowed. The indicator KPSS test with a time trend also does not require the existence of moments of the series, yet it produces the same asymptotic results as the KPSS test with a time trend, $\hat\eta_\tau$. However, this result depends on our conjectures on the estimators.

The indicator can be extended to unit root tests such as the Dickey-Fuller, Phillips-Perron, or Schmidt-Phillips tests. We expect that the use of the indicator would produce more powerful tests when the errors have sufficiently fat tails, which is commonly associated with financial time series. However, the asymptotic results of unit root tests with the indicator under the null of a unit root might be different from those without the indicator. As in this chapter and in de Jong, Amsler, and Schmidt (2002), the unit root tests with the indicator might produce the same asymptotic results as the tests without the indicator under the alternative of stationarity.

3.4 Appendix: Mathematical Proofs

Here is the outline of the proofs. Lemma 1 gives an inequality involving the $L_r$-norm which is used in Lemma 2. Lemma 2 states that $G_T(1, \phi) - E G_T(1, \phi) = T^{-1/2} \sum_{j=1}^{T} (y_{Tj}(\phi) - E y_{Tj}(\phi))$ is stochastically equicontinuous. In Lemma 3, the uniform convergence of $G_T(r, \phi)$ and $H_T(1, \gamma)$ over corresponding compact sets of parameter values is established. A partial proof of Conjecture 1 follows. In Lemma 4, the asymptotic distributions of the estimators of the regression coefficients are derived. Then Theorem 1 establishes the asymptotic distribution of the indicator KPSS statistic, along with the consistency of the long-run variance estimator. Conjecture 2 is then partially proved. Finally, in Theorem 2, we derive the limiting distribution of the statistic when the $x_{Tj}$ have a unit root.

Lemma 1. For strong ($\alpha$-) mixing random variables $y_{Tj} \in \mathbb{R}$ whose $\alpha$-mixing coefficients satisfy $\alpha(m) \le C m^{-r/(r-2) - \eta}$ for some $\eta > 0$,

$E\Big( \sum_{j=1}^{T} (y_{Tj} - E y_{Tj}) \Big)^2 \le C' \sum_{j=1}^{T} \| y_{Tj} - E y_{Tj} \|_r^2$   (3.14)

for a constant $C'$.

Proof of Lemma 1. By Theorem 17.5 and Corollary 16.10 of Davidson (1994). □

Lemma 2. Let $z = (1, j/T)'$, $\phi = (\phi_1, \phi_2)'$ and $\psi = (\psi_1, \psi_2)'$. Let $y_{Tj}(\phi) = \mathrm{sgn}(x_{Tj} - \alpha_0 - \beta_0 j/T - T^{-1/2} z'\phi) - \mathrm{sgn}(x_{Tj} - \alpha_0 - \beta_0 j/T)$. Then, under Assumptions 1 and 4, for all $K, \varepsilon > 0$,

$\lim_{\delta \to 0} \limsup_{T \to \infty} P\Big( \sup_{\substack{|\phi_1| \le K, |\phi_2| \le K, |\psi_1| \le K, |\psi_2| \le K, \\ (|\phi_1 - \psi_1| + |\phi_2 - \psi_2|) < \delta}} \Big| T^{-1/2} \sum_{j=1}^{T} \big( (y_{Tj}(\phi) - E y_{Tj}(\phi)) - (y_{Tj}(\psi) - E y_{Tj}(\psi)) \big) \Big| < \varepsilon \Big) = 1.$   (3.15)
Proof of Lemma 2. For $T$ large enough that $2K T^{-1/2} \le \bar\eta$,

$\sup_{(|\phi_1 - \psi_1| + |\phi_2 - \psi_2|) < \delta} T^{-1/2} \sum_{j=1}^{T} | E y_{Tj}(\phi) - E y_{Tj}(\psi) |$
$\quad = 2 \sup_{(|\phi_1 - \psi_1| + |\phi_2 - \psi_2|) < \delta} T^{-1} \sum_{j=1}^{T} f(T^{-1/2} z'\bar\xi) \Big| (\phi_1 - \psi_1) + (\phi_2 - \psi_2)\frac{j}{T} \Big|$
$\quad \le 2 \sup_{(|\phi_1 - \psi_1| + |\phi_2 - \psi_2|) < \delta} \Big( \sup_j f(T^{-1/2} z'\bar\xi) \Big) \big( |\phi_1 - \psi_1| + |\phi_2 - \psi_2| \big) \le 2\delta \sup_{|\xi| \le \bar\eta} f(\xi),$

where $F(\cdot)$ is the cdf of $\epsilon$ and $\bar\xi \in L(\phi, \psi)$, a line segment from $\phi$ to $\psi$. This establishes the equicontinuity of $T^{-1/2} \sum_{j=1}^{T} E y_{Tj}(\phi)$ on $\Phi = \{(\phi_1, \phi_2) \in \mathbb{R}^2 : -K \le \phi_1 \le K, -K \le \phi_2 \le K\}$, since $\delta$ can be made arbitrarily small.

The stochastic equicontinuity of $T^{-1/2} \sum_{j=1}^{T} y_{Tj}(\phi)$ can then be shown as follows. Let $\psi_i = (i\delta, i\delta)'$ and $\psi_{(i+2)} = ((i+2)\delta, (i+2)\delta)'$. Then

$\sup_{\substack{|\phi_1| \le K, |\phi_2| \le K, |\psi_1| \le K, |\psi_2| \le K, \\ (|\phi_1 - \psi_1| + |\phi_2 - \psi_2|) < \delta}} T^{-1/2} \sum_{j=1}^{T} | y_{Tj}(\phi) - y_{Tj}(\psi) |$
$\quad = \sup_{-[K/\delta]-1 \le i \le [K/\delta]} \; \sup_{\phi_1, \psi_1, \phi_2, \psi_2 \in [i\delta, (i+2)\delta] \cap [-K, K]} T^{-1/2} \sum_{j=1}^{T} | y_{Tj}(\phi) - y_{Tj}(\psi) |$
$\quad \le \sup_{-[K/\delta]-1 \le i \le [K/\delta]} \Big| T^{-1/2} \sum_{j=1}^{T} \big( y_{Tj}(\psi_i) - y_{Tj}(\psi_{(i+2)}) \big) \Big| + \sup_{-[K/\delta]-1 \le i \le [K/\delta]} \Big| T^{-1/2} \sum_{j=1}^{T} \big( E y_{Tj}(\psi_i) - E y_{Tj}(\psi_{(i+2)}) \big) \Big|$
$\qquad + \sup_{\substack{|\phi_1| \le K, |\phi_2| \le K, |\psi_1| \le K, |\psi_2| \le K, \\ (|\phi_1 - \psi_1| + |\phi_2 - \psi_2|) < \delta}} T^{-1/2} \sum_{j=1}^{T} | E y_{Tj}(\phi) - E y_{Tj}(\psi) |.$

For the first inequality, note that $y_{Tj}$ is nonincreasing, so that the maximum distance between $\phi$ and $\psi$ within each sub-interval of $[-K, K]$ gives rise to the supremum of $(y_{Tj}(\psi_i) - y_{Tj}(\psi_{(i+2)}))$ for that sub-interval; for the last term, $\phi$ and $\psi$ are chosen over the whole interval. Pointwise convergence for every $i$ holds because, for $T$ large enough that $2K T^{-1/2} \le \bar\eta$, by Lemma 1,

$E\Big( T^{-1/2} \sum_{j=1}^{T} \big[ (y_{Tj}(\psi_i) - y_{Tj}(\psi_{(i+2)})) - E(y_{Tj}(\psi_i) - y_{Tj}(\psi_{(i+2)})) \big] \Big)^2 \le C \sum_{j=1}^{T} \big\| T^{-1/2} (y_{Tj}(\psi_i) - y_{Tj}(\psi_{(i+2)})) \big\|_r^2$
$\quad = C T^{-1} \sum_{j=1}^{T} \big\| y_{Tj}(\psi_i) - y_{Tj}(\psi_{(i+2)}) \big\|_r^2 \le C T^{-1} \sum_{j=1}^{T} \sup_{|\phi_1| \le K, |\phi_2| \le K} \Big\| \mathrm{sgn}\Big( x_{Tj} - \alpha_0 - \beta_0 \frac{j}{T} - T^{-1/2}\phi_1 - T^{-1/2}\phi_2 \frac{j}{T} \Big) - \mathrm{sgn}\Big( x_{Tj} - \alpha_0 - \beta_0 \frac{j}{T} \Big) \Big\|_r^2$
$\quad \le C' \big[ 1 - 2F(-2T^{-1/2}K) \big]^{2/r} \to 0$

as $T \to \infty$, for some positive constants $C, C'$, since $1 - 2F(-2T^{-1/2}K) = 2\big[ F(0) - F(-2T^{-1/2}K) \big] \le 4 T^{-1/2} K \sup_{|\xi| \le \bar\eta} f(\xi) \to 0$.

There are two cases to consider for the last inequality, since $(T^{-1/2}\phi_1 + T^{-1/2}\phi_2\, j/T)$ will be either nonpositive or nonnegative; we verify that the inequality holds in either case.

Case [1]: $T^{-1/2}\phi_1 + T^{-1/2}\phi_2\, j/T \le 0$. This implies $-2T^{-1/2}K \le -T^{-1/2}K - T^{-1/2}K\, j/T \le T^{-1/2}\phi_1 + T^{-1/2}\phi_2\, j/T$, so that

$\sup_{|\phi_1| \le K, |\phi_2| \le K} \| \cdot \|_r^2 = \sup_{|\phi_1| \le K, |\phi_2| \le K} \Big( E \Big| 2\, I\big( T^{-1/2}\phi_1 + T^{-1/2}\phi_2 \tfrac{j}{T} \le x_{Tj} - \alpha_0 - \beta_0 \tfrac{j}{T} \le 0 \big) \Big|^r \Big)^{2/r}$
$\quad \le 4 \Big( P\big( -2T^{-1/2}K \le x_{Tj} - \alpha_0 - \beta_0 \tfrac{j}{T} \le 0 \big) \Big)^{2/r} = 4 \cdot 2^{-2/r} \big[ 1 - 2F(-2T^{-1/2}K) \big]^{2/r}.$

Case [2]: $0 \le T^{-1/2}\phi_1 + T^{-1/2}\phi_2\, j/T$. Then, symmetrically,

$\sup_{|\phi_1| \le K, |\phi_2| \le K} \| \cdot \|_r^2 \le 4 \Big( P\big( 0 \le x_{Tj} - \alpha_0 - \beta_0 \tfrac{j}{T} \le 2T^{-1/2}K \big) \Big)^{2/r} = 4 \cdot 2^{-2/r} \big[ 2F(2T^{-1/2}K) - 1 \big]^{2/r},$

and $2F(2T^{-1/2}K) - 1 = 2\big[ F(2T^{-1/2}K) - F(0) \big] \le 4T^{-1/2}K \sup_{|\xi| \le \bar\eta} f(\xi) \to 0$ as well. □

Lemma 3. Define $G_T(r, \phi) = T^{-1/2} \sum_{j=1}^{[rT]} y_{Tj}(\phi)$ and $H_T(1, \gamma) = T^{-1/2} \sum_{j=1}^{T} \frac{j}{T}\, y_{Tj}(\gamma)$. Then, under Assumptions 1 and 4, for all $K > 0$,

$\sup_{|\phi_1| \le K, |\phi_2| \le K} \; \sup_{r \in [0,1]} | G_T(r, \phi) - E G_T(r, \phi) | \overset{p}{\to} 0,$   (3.17)

$\sup_{|\gamma_1| \le K, |\gamma_2| \le K} | H_T(1, \gamma) - E H_T(1, \gamma) | \overset{p}{\to} 0.$   (3.18)
Proof of Lemma 3. Let

$J_T(\phi) = \sup_{r \in [0,1]} | G_T(r, \phi) - E G_T(r, \phi) |.$   (3.19)

For each $\phi$ with elements in $[-K, K]$, and $T$ large enough that $2K T^{-1/2} \le \bar\eta$,

$E (J_T(\phi))^2 = E \Big( \sup_{r \in [0,1]} | G_T(r, \phi) - E G_T(r, \phi) | \Big)^2 = E \sup_{r \in [0,1]} \Big| T^{-1/2} \sum_{j=1}^{[rT]} (y_{Tj}(\phi) - E y_{Tj}(\phi)) \Big|^2$
$\quad \le C T^{-1} \sum_{j=1}^{T} \| y_{Tj}(\phi) \|_r^2 \le C' \big[ 1 - 2F(-2T^{-1/2}K) \big]^{2/r} \to 0$   (3.20)

as $T \to \infty$, for some positive constants $C, C'$, by the same bound as in the proof of Lemma 2. This implies $J_T(\phi) = o_p(1)$. $J_T(\phi)$ is also stochastically equicontinuous because, for $\phi \in \Phi$ and $\phi' \in \Phi$,

$| J_T(\phi) - J_T(\phi') | = \Big| \sup_{r \in [0,1]} | G_T(r, \phi) - E G_T(r, \phi) | - \sup_{r \in [0,1]} | G_T(r, \phi') - E G_T(r, \phi') | \Big|$
$\quad \le \sup_{r \in [0,1]} | G_T(r, \phi) - E G_T(r, \phi) - G_T(r, \phi') + E G_T(r, \phi') |$
$\quad = \sup_{r \in [0,1]} \Big| T^{-1/2} \sum_{j=1}^{[rT]} \big( y_{Tj}(\phi) - E y_{Tj}(\phi) - y_{Tj}(\phi') + E y_{Tj}(\phi') \big) \Big|$   (3.21)
$\quad \le \sup_{r \in [0,1]} T^{-1/2} \sum_{j=1}^{[rT]} | y_{Tj}(\phi) - E y_{Tj}(\phi) - y_{Tj}(\phi') + E y_{Tj}(\phi') | \le T^{-1/2} \sum_{j=1}^{T} | y_{Tj}(\phi) - E y_{Tj}(\phi) - y_{Tj}(\phi') + E y_{Tj}(\phi') |.$
_T g2f(:c)( —T (K+KT))T forsomexE[0,n] 1 T 1 11 _2T K|£|Iisfnf(x )1; (T + T5) _ _ T(T + 1) T(T + 1)(2T + 1)) - 1K 1121.111) (T2 613 < —2K inf f(T) (1+1)— " —§K inf f(T ) _ [3'5" 2 3 3 [2;|K¢ >K 1 2 (3.24) Tl/zisgn (3t - 010 — flo— _ T—1/2(¢1 + 4527-1)) 717:0) = 0‘ j: -1 115 The second case to consider is when 451 < —K and (1)2 < —K. The proofs in this case are similar to the ones just done. Let’s start with Y1T(¢1,¢>2). ' f ' f Y , ¢11<11_K¢21<11_K 1T(¢1 (152) T . . _ —1 2 __ _ _~7_ _ —l 2 _ _ l _T / Elsgncrfp] ao fiOT T / ( K KT)) 7]: T . Jl+ T'1/2j;1(1 — 2F(T‘1/2(—K — K-%))) T . _ _ J = T 1/2; :2(F(0) — F(T l/2(—K — KT)» =T1/212:2f(x)(T1/2(K+K%)) forsomexE[-n,0] T+1 _<_ 2 1nfT1K(1 +% K(+1 Inf I:I=|oo ¢1<-K¢2< <-K T—mo |$| -K ' f . —3 IglISnflx) Therefore, lim sup P inf inf T—>oo ¢1<‘K¢2<-K T . . . T_1/2 :Sgn (ij - 010 - 50% - T_1/2(¢1 + 452%) “5% S 0) = 0- i=1 The above two cases imply that lim suquoo P(T1/2(|& — (10] + If} — fiol) > K) can be made arbitrarily small by choosing K large enough. C] Lemma 7. Under Asaumptions 1 and 4, —1 T1/2(& — a0) _ T T 1;; \ T1/2 * _ _ 2 f(O) T+1 (T+1)(2T+1) (5 fio) 2 6T / (3'27) 0WT(1) ‘ 0p(1) X + oWTm — oT-l 22-11 Warp opu) 117 Proof of Lemma 4. Let2 . Tm“! - 00) ’Y = . - TW - fio) Note that —1/2 1‘ ._ . _ “ ' T‘l/2 2:le {~Esgn(ij — d — 8f.) : zcr-l/2 2$=1(F(0) — F(é + 85; — ao — 50%)) 2T-1/2 )3le gm) — F(c‘r + 35'- — ao — 30%)) = -2T‘1/2 Zimo) — f(O) + f(0))(d + 8% — a0 - 30$) -2T-1/2 2321 We» — W) + f(0))(d + 3;. — a0 _ 50;) = —2f<0>T—1/22:}‘=1<é +Bl — ao - 50%) —2f(0)T‘1/2 ELI {-(é + 3% - <10 — 30%) + —2T-1/2::,-T=1(f(ej) - f((»)(a +64} - ao - 50%) —2:r-1/2 23%;] 71mg) — i<0>>— Earm- ¢>> +T 1/2jzzfilsgnon — a0 —30T) l + (T—l/ZJ zlrlela 53 (E sgn(:ch - Oz- 5%) )) lazaoflzflo) ((5! _ (10)) T4” Zszllfi (E 8811(1‘Tj - a - [3%) lam,0 fizfio (3 - 30) W] =(GT(7‘ ¢)- EGT(7' ¢)) + T ”22881“ (IBTj - 00 - floT) j: —1 [rT] , - 2fwwl/2ZM-ao)(0-2f)Cl’l/221T-(3- 30) .7= -1 j: —1 [TT] =(GT>> + T ”2 ngn( n, — a0 — 3oT> j: —1 - 2f(0)[-T-.-TT]T1/2(a —ao> — 2f(0 0)ererng + 1)T1/2(3 — 30) = op(1>+ 0W1"(7")+(#1l — 3W? +1 )T"”2 ngnej) T . + (2%] _ dang] + 1))T—1/2Z% Sgn(ej) by (3.27) '=1 = (To) + 0WT(r) + (#31 — 3["'T:§:Z] + 1))0WT(1) + (9%:51 _ 6mg] + 1))0 TIZWTQF) >+ (W = op(1)+ aww) + (”fl - 3["T](;;] + 1))0WT(1) (£13.73; + 6er1<331+ 1))T_1§:10WT(%) J: 1+ 0(W(r) + (2r — 3r2)W(1) + (-6T + 6T2)/1 W(TW) = 0V2“), 0 120 where the op(1) term is uniform in r. Note that for each j = 1, ..., T, ._ _ j _ _ ~ “ _ _ l Esgn(a:TJ a fi— T=)a -ao,fi= 30 —1 2F(a+fi ao éoT) (3.28) =1 — 23(0) — 2. fir». where F is the cdf of ej. When 61 and B are consistent, f (6 + fiTv — a0 — BOT-v) would be asymptotically equal to f (O) 0 J a; ESEDCUTJ _ a ‘ 'BT)la=ao,fl=BO __ 3- - i =- _ 2f(a + 3T (10 fiOT)!a=ao,fi:fi0 2f(0), (3 29) a __Esgn($ 0‘31” . 6? TJ_ T a=aofi=fi0 __1 i_ — i --1 — 2Tf(a+flT ao flOT)la=a0,fi=BO — 2Tf(0). Also, note that T t — —T ”2 gng) T“1 (%)(:’1/2:: ngn(ei)) + 019(1) (3.30) j=1i=1 121 since t Z: Sgn(€i) i=1 Ms K). II H = sgn(€1)+(sgn(61)+ ssn(62)) + - ' ' + (8811051) + 8311(62) + ° ' ’ + 58’1“”) T (T- j+ 1) Sgn(€j) :1 T T T =TZIS sgn( ej) -Z(J’ - sgn(ej)) + ngnkj), j=1j=1 j:1 K). and 3 T t T'Z Z Z sgn(e,) j=li=1 T T J T -T—1/2ngn(e )—T 1/2Zngn(e )+T lT ”2289(9) 1:1 j=l 321 T T — T 1/2Z:sgn(e )- T 1/2§_:%sgn(csj) +op(1) K) H H K) II g—n Since the sgn function is regular (Park and Phillips 1999, p. 
Since the sgn function is regular (Park and Phillips 1999, p. 272), we apply Theorem 3.2 in Park and Phillips (1999) to derive the limiting distribution (3.31):

$$\hat\sigma^{-2}\,T^{-2}\sum_{t=1}^{T} S_{Tt}^{2} \xrightarrow{d} \int_0^1\Big(W(r)+(2r-3r^2)W(1)+(-6r+6r^2)\int_0^1 W(s)\,ds\Big)^2 dr. \tag{3.31}$$

Next, we prove the consistency of $\hat\sigma^2$. First, note that for $t = 1,\dots,T$,

$$\operatorname{sgn}\big(x_t-\hat\alpha-\hat\beta\tfrac{t}{T}\big) = \big(y_t(\hat\gamma)-E\,y_t(\hat\gamma)\big) + \Big(1-2F\big(\hat\alpha+\hat\beta\tfrac{t}{T}-\alpha_0-\beta_0\tfrac{t}{T}\big)\Big) + \operatorname{sgn}\big(x_t-\alpha_0-\beta_0\tfrac{t}{T}\big) \equiv b_{Tt}+a_{Tt}+c_t,$$

so that

$$\hat\sigma^2 = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} k\big(\tfrac{t-s}{\gamma_T}\big)\big(b_{Tt}+a_{Tt}+c_t\big)\big(b_{Ts}+a_{Ts}+c_s\big). \tag{3.32}$$

What is shown below is that all the cross products on the right hand side of (3.32), except the term involving $c_t c_s$, are $o_p(1)$ as $T\to\infty$. First, note that, for $T$ large enough,

$$\sup_{1\le t\le T}|a_{Tt}| = \sup_{1\le t\le T}\Big|1-2F\big(\hat\alpha+\hat\beta\tfrac{t}{T}-\alpha_0-\beta_0\tfrac{t}{T}\big)\Big| = \sup_{1\le t\le T} 2\Big|\hat\alpha-\alpha_0+(\hat\beta-\beta_0)\tfrac{t}{T}\Big| f(\tilde x_t) \le 2\sup_{|x|\le\eta} f(x)\big(|\hat\alpha-\alpha_0|+|\hat\beta-\beta_0|\big) = O_p(T^{-1/2}),$$

since for large $T$ and consistent $\hat\alpha$ and $\hat\beta$, $f\big(\hat\alpha+\hat\beta\tfrac{t}{T}-\alpha_0-\beta_0\tfrac{t}{T}\big)$ converges to $f(0)$ uniformly in $t$, so that the above inequality holds with the supremum taken over $|x|\le\eta$. Second, $T^{-1/2}\sum_{t=1}^{T}|a_{Tt}| = O_p(1)$ by Lemma 3. Then

$$T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} k\big(\tfrac{t-s}{\gamma_T}\big)a_{Tt}a_{Ts} \le T^{-1}\cdot\sup_{1\le s\le T}|a_{Ts}|\cdot\sum_{t=1}^{T}|a_{Tt}|\cdot\sum_{j=-T}^{T}\Big|k\big(\tfrac{j}{\gamma_T}\big)\Big| = O_p\big(\tfrac{\gamma_T}{T}\big) = o_p(1),$$

and, using the representation of the kernel through its Fourier transform as in de Jong and Davidson (2000),

$$T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} k\big(\tfrac{t-s}{\gamma_T}\big)a_{Tt}c_s \le \sup_{1\le t\le T}|a_{Tt}|\cdot T^{-1}\sum_{s=1}^{T}|c_s|\cdot\sum_{j=-T}^{T}\Big|k\big(\tfrac{j}{\gamma_T}\big)\Big| = O_p\big(\gamma_T T^{-1/2}\big) = o_p(1).$$

The remaining cross products, those involving $b_{Tt}$, are $o_p(1)$ by the same type of argument, using the uniform convergence results above; therefore $\hat\sigma^2$ has the same probability limit as $T^{-1}\sum_t\sum_s k\big(\tfrac{t-s}{\gamma_T}\big)c_t c_s$, which converges to $\sigma^2$ as $\gamma_T\to\infty$ and $T\to\infty$.

Note that

$$\sup_{a>K}\sup_{b>K} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-a-b\tfrac{j}{T}\big) = T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-K-K\tfrac{j}{T}\big) \xrightarrow{d} \int_0^1\operatorname{sgn}\big(\lambda W(\xi)-K-K\xi\big)\,d\xi \equiv T_1(K)$$

and

$$\sup_{a>K}\sup_{b>K} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-a-b\tfrac{j}{T}\big)\tfrac{j}{T} = T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-K-K\tfrac{j}{T}\big)\tfrac{j}{T} \xrightarrow{d} \int_0^1\operatorname{sgn}\big(\lambda W(\xi)-K-K\xi\big)\xi\,d\xi \equiv T_2(K),$$

where the limiting distributions of $T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-K-K\tfrac{j}{T}\big)$ and $T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-K-K\tfrac{j}{T}\big)\tfrac{j}{T}$ are obtained by applying Theorem 3.2 of Park and Phillips (1999), because the sgn function is regular (Park and Phillips 1999, p. 272). Then the probability with which $T^{-1/2}\hat\alpha$ and $T^{-1/2}\hat\beta$ are not bounded in the limit is bounded as follows:

$$P\big(T^{-1/2}\hat\alpha>K \text{ and } T^{-1/2}\hat\beta>K\big) \le P\Big(\sup_{a>K}\sup_{b>K} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-a-b\tfrac{j}{T}\big)\ge 0\Big) + P\Big(\sup_{a>K}\sup_{b>K} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-a-b\tfrac{j}{T}\big)\tfrac{j}{T}\ge 0\Big)$$
$$\to P\big(T_1(K)\ge 0\big) + P\big(T_2(K)\ge 0\big).$$

Note that as $K\to\infty$, $\operatorname{sgn}(\lambda W(\xi)-K-K\xi)\to\operatorname{sgn}(-\infty) = -1$, so that $T_1(K)\xrightarrow{p}\int_0^1(-1)\,d\xi = -1$ and $T_2(K)\xrightarrow{p}\int_0^1(-\xi)\,d\xi = -0.5$. This implies that $P(T_1(K)\ge 0)$ and $P(T_2(K)\ge 0)$ go to zero as $K\to\infty$. Therefore,

$$\limsup_{K\to\infty}\limsup_{T\to\infty} P\big(T^{-1/2}\hat\alpha>K \text{ and } T^{-1/2}\hat\beta>K\big) = 0. \tag{3.33}$$

Similarly, it can be shown that

$$\limsup_{K\to\infty}\limsup_{T\to\infty} P\big(T^{-1/2}\hat\alpha<-K \text{ and } T^{-1/2}\hat\beta<-K\big) = 0. \tag{3.34}$$

Note that

$$\inf_{a<-K}\inf_{b<-K} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-a-b\tfrac{j}{T}\big) = T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}+K+K\tfrac{j}{T}\big) \xrightarrow{d} \int_0^1\operatorname{sgn}\big(\lambda W(\xi)+K+K\xi\big)\,d\xi \equiv T_3(K)$$

and

$$\inf_{a<-K}\inf_{b<-K} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-a-b\tfrac{j}{T}\big)\tfrac{j}{T} = T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}+K+K\tfrac{j}{T}\big)\tfrac{j}{T} \xrightarrow{d} \int_0^1\operatorname{sgn}\big(\lambda W(\xi)+K+K\xi\big)\xi\,d\xi \equiv T_4(K).$$

As $K\to\infty$, $T_3(K)\xrightarrow{p} 1$ since $\operatorname{sgn}(\lambda W(\xi)+K+K\xi)\to\operatorname{sgn}(\infty) = 1$, and $T_4(K)\xrightarrow{p} 0.5$ since $\int_0^1\operatorname{sgn}(\lambda W(\xi)+K+K\xi)\xi\,d\xi\to\int_0^1\xi\,d\xi = 0.5$. Then

$$P\big(T^{-1/2}\hat\alpha<-K \text{ and } T^{-1/2}\hat\beta<-K\big) \le P\Big(\inf_{a<-K}\inf_{b<-K} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-a-b\tfrac{j}{T}\big)\le 0\Big) + P\Big(\inf_{a<-K}\inf_{b<-K} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-a-b\tfrac{j}{T}\big)\tfrac{j}{T}\le 0\Big)$$
$$\to P\big(T_3(K)\le 0\big) + P\big(T_4(K)\le 0\big).$$

Since $T_3(K)\xrightarrow{p}1$ and $T_4(K)\xrightarrow{p}0.5$ as $K\to\infty$, $P(T_3(K)\le 0)\to 0$ and $P(T_4(K)\le 0)\to 0$, which implies (3.34). Therefore, conditional on proof of the other two cases, $T^{-1/2}\hat\alpha$ and $T^{-1/2}\hat\beta$ are $O_p(1)$. $\Box$
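The limiting distribution in (3.31) is a functional of Brownian motion free of nuisance parameters, so its critical values can be tabulated by simulation. The sketch below is an illustration under a grid approximation of $W$ (the grid and replication counts are arbitrary choices of ours); the upper quantiles it prints can be compared with the trend-case critical values in Kwiatkowski, Phillips, Schmidt, and Shin (1992).

```python
import numpy as np

# Simulate the limit in (3.31):
#   int_0^1 ( W(r) + (2r - 3r^2) W(1) + (-6r + 6r^2) int_0^1 W(s) ds )^2 dr,
# approximating W by a scaled random walk on an n-point grid.
rng = np.random.default_rng(2)
n, reps = 1000, 20000
r = np.arange(1, n + 1) / n
draws = np.empty(reps)
for i in range(reps):
    W = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)    # W(r) on the grid
    intW = W.mean()                                       # int_0^1 W(s) ds
    V2 = W + (2 * r - 3 * r ** 2) * W[-1] + (-6 * r + 6 * r ** 2) * intW
    draws[i] = np.mean(V2 ** 2)                           # int_0^1 V2(r)^2 dr
print(np.quantile(draws, [0.90, 0.95, 0.99]))
```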
Proof of Theorem 2. Let

$$Q_T(\delta) = \begin{pmatrix} T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-T^{-1/2}\alpha-T^{-1/2}\beta\tfrac{j}{T}\big)\\ T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-T^{-1/2}\alpha-T^{-1/2}\beta\tfrac{j}{T}\big)\tfrac{j}{T}\end{pmatrix}, \qquad \delta = (\alpha,\beta)'. \tag{3.35}$$

We rely on Theorem 2.7 of Kim and Pollard (1990) to ensure that $(T^{-1/2}\hat\alpha,\ T^{-1/2}\hat\beta)'$ converges to the solution of the asymptotic version of (3.35), $Q((A,B)')$, for some random variables $A$ and $B$, where

$$Q((A,B)') = \begin{pmatrix}\int_0^1\operatorname{sgn}\big(\lambda W(\xi)-A-B\xi\big)\,d\xi\\ \int_0^1\operatorname{sgn}\big(\lambda W(\xi)-A-B\xi\big)\xi\,d\xi\end{pmatrix}. \tag{3.36}$$

As noted in the proof of de Jong, Amsler, and Schmidt (2002), in order to use Theorem 2.7 in Kim and Pollard (1990), $|Q_T(\theta)|$ has to go to $\infty$ as $|\theta|\to\infty$, which does not hold here. But this can be fixed by considering $\Psi^{-1}(|Q_T(\cdot)|)$, where $\Psi$ is the cdf of the normal distribution, since $|Q_T(\cdot)|$ is bounded between zero and one. Note that for any $(a,b)'\in\mathbb{R}^2$,

$$Q_T((a,b)') \xrightarrow{d} Q((a,b)'). \tag{3.37}$$

Although this does not follow directly from the continuous mapping theorem, because of the discontinuity of the sgn function, a continuous function arbitrarily close to the sgn function can be used in place of the sgn function, which is the argument used in Park and Phillips (1999).

Now we prove the stochastic equicontinuity of $Q_T(\theta)$ on $\Theta_K = \{(\theta_1,\theta_2)'\in\mathbb{R}^2 : -K\le\theta_1\le K,\ -K\le\theta_2\le K\}$, thereby establishing that $Q_T(\cdot)\Rightarrow Q(\cdot)$ on $\Theta_K$. As the equations below get longer, we define some notation for substitution:

$$\Xi_{1,Tj} = \operatorname{sgn}\big(T^{-1/2}x_{Tj}-T^{-1/2}\alpha_{1T}-T^{-1/2}\beta_{1T}\tfrac{j}{T}\big) - \operatorname{sgn}\big(T^{-1/2}x_{Tj}-T^{-1/2}\alpha_{1T}-T^{-1/2}\beta_{2T}\tfrac{j}{T}\big),$$
$$\Xi_{2,Tj} = \operatorname{sgn}\big(T^{-1/2}x_{Tj}-T^{-1/2}\alpha_{1T}-T^{-1/2}\beta_{2T}\tfrac{j}{T}\big) - \operatorname{sgn}\big(T^{-1/2}x_{Tj}-T^{-1/2}\alpha_{2T}-T^{-1/2}\beta_{2T}\tfrac{j}{T}\big),$$
$$\Delta_{1,Tj} = \operatorname{sgn}\big(T^{-1/2}x_{Tj}-a_1-b_1\tfrac{j}{T}\big) - \operatorname{sgn}\big(T^{-1/2}x_{Tj}-a_1-b_2\tfrac{j}{T}\big),$$
$$\Delta_{2,Tj} = \operatorname{sgn}\big(T^{-1/2}x_{Tj}-a_1-b_2\tfrac{j}{T}\big) - \operatorname{sgn}\big(T^{-1/2}x_{Tj}-a_2-b_2\tfrac{j}{T}\big).$$

First, for $\theta\in\Theta_K$ and $\theta'\in\Theta_K$, we prove the stochastic equicontinuity of $Q_{1T}(\cdot)$ by showing that

$$\lim_{\delta\to 0}\limsup_{T\to\infty} P\Big(\sup_{\theta,\theta':|\theta_1-\theta_1'|<\delta,\ |\theta_2-\theta_2'|<\delta}\big|Q_{1T}(\theta)-Q_{1T}(\theta')\big| > \epsilon\Big) \le \lim_{\delta\to 0}\limsup_{T\to\infty} P\Big(\delta(1+\delta)\sup_{s\in[-K,K]}|L(1,s)| > \epsilon\Big) = 0. \tag{3.38}$$

Note that

$$Q_{1T}((\alpha_{1T},\beta_{1T})') - Q_{1T}((\alpha_{2T},\beta_{2T})') = \big(Q_{1T}((\alpha_{1T},\beta_{1T})') - Q_{1T}((\alpha_{1T},\beta_{2T})')\big) + \big(Q_{1T}((\alpha_{1T},\beta_{2T})') - Q_{1T}((\alpha_{2T},\beta_{2T})')\big) = T^{-1}\sum_{j=1}^{T}\Xi_{1,Tj} + T^{-1}\sum_{j=1}^{T}\Xi_{2,Tj}. \tag{3.39}$$

By Conjecture 2, we can replace $T^{-1/2}\alpha_{1T}$ with $a_1\in[-K,K]$; similarly, $T^{-1/2}\alpha_{2T}$ with $a_2\in[-K,K]$, $T^{-1/2}\beta_{1T}$ with $b_1\in[-K,K]$, and $T^{-1/2}\beta_{2T}$ with $b_2\in[-K,K]$. Then (3.39) becomes

$$\Big|T^{-1}\sum_{j=1}^{T}\Delta_{1,Tj} + T^{-1}\sum_{j=1}^{T}\Delta_{2,Tj}\Big| \le \sup_{b_1\in[-K,K]}\sup_{b_2:|b_2-b_1|<\delta}\Big|T^{-1}\sum_{j=1}^{T}\Delta_{1,Tj}\Big| + \sup_{a_1\in[-K,K]}\sup_{a_2:|a_2-a_1|<\delta}\Big|T^{-1}\sum_{j=1}^{T}\Delta_{2,Tj}\Big|. \tag{3.40}$$

Now, by dividing the interval $[-K,K]$ into sub-intervals of equal length $\delta$, (3.40) can be bounded by

$$\sup_{-[K/\delta]-1\le i\le [K/\delta]}\ \sup_{b_1\in[i\delta,(i+1)\delta]}\ \sup_{b_2:|b_2-b_1|<\delta} T^{-1}\sum_{j=1}^{T}|\Delta_{1,Tj}| + \sup_{-[K/\delta]-1\le i\le [K/\delta]}\ \sup_{a_1\in[i\delta,(i+1)\delta]}\ \sup_{a_2:|a_2-a_1|<\delta} T^{-1}\sum_{j=1}^{T}|\Delta_{2,Tj}|.$$

Each $|\Delta_{1,Tj}|$ is nonzero only when $T^{-1/2}x_{Tj}$ lies within a band of width at most $2\delta$ around $a_1 + b_1 j/T$, so each supremum above is bounded by a normalized occupation time of the process $T^{-1/2}x_{T,[\cdot]}$ in such a band; in the last step we use the occupation times formula as in Park and Phillips (1999). Here $L(1,s)$, a local time, is a continuous stochastic process of the time spent by the Brownian motion at the spatial point $s$ over the interval $[0,1]$, and $\sup_{s\in[-K,K]}|L(1,s)|$ is a well-defined random variable. This yields the bound in (3.38).

The stochastic equicontinuity of $Q_{2T}(\cdot)$ can be proved along the same lines:

$$Q_{2T}((\alpha_{1T},\beta_{1T})') - Q_{2T}((\alpha_{2T},\beta_{2T})') = \big(Q_{2T}((\alpha_{1T},\beta_{1T})') - Q_{2T}((\alpha_{1T},\beta_{2T})')\big) + \big(Q_{2T}((\alpha_{1T},\beta_{2T})') - Q_{2T}((\alpha_{2T},\beta_{2T})')\big)$$
$$\le \sup_{b_1\in[-K,K]}\sup_{b_2:|b_2-b_1|<\delta} T^{-1}\sum_{j=1}^{T}|\Delta_{1,Tj}|\tfrac{j}{T} + \sup_{a_1\in[-K,K]}\sup_{a_2:|a_2-a_1|<\delta} T^{-1}\sum_{j=1}^{T}|\Delta_{2,Tj}|\tfrac{j}{T}$$
$$\le \sup_{b_1\in[-K,K]}\sup_{b_2:|b_2-b_1|<\delta} T^{-1}\sum_{j=1}^{T}|\Delta_{1,Tj}| + \sup_{a_1\in[-K,K]}\sup_{a_2:|a_2-a_1|<\delta} T^{-1}\sum_{j=1}^{T}|\Delta_{2,Tj}|,$$

since $j/T\le 1$ for all $j = 1,\dots,T$. The remaining lines of the proof of the stochastic equicontinuity of $Q_{2T}(\cdot)$ then follow from those in the proof for $Q_{1T}(\cdot)$.
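The solution $(A,B)$ of (3.36) is the zero of the asymptotic LAD first-order conditions, i.e. the LAD regression of the path $\lambda W(\xi)$ on an intercept and trend. The sketch below is illustrative: the grid size and the linear-programming formulation are choices of ours, not taken from the text. It draws one realization of $(A,B)$ by solving that LAD problem on a discretized Brownian path.

```python
import numpy as np
from scipy.optimize import linprog

# One draw from the limit law of (A, B) in (3.36): LAD-regress a discretized
# Brownian path lambda*W(xi) on an intercept and trend.  LAD as a linear
# program: min sum(u+ + u-)  s.t.  A + B*xi + u+ - u- = lambda*W(xi), u+/- >= 0.
rng = np.random.default_rng(3)
n, lam = 500, 1.0
xi = np.arange(1, n + 1) / n
W = lam * np.cumsum(rng.standard_normal(n)) / np.sqrt(n)

c = np.concatenate([[0.0, 0.0], np.ones(2 * n)])   # objective: sum of u+ and u-
A_eq = np.hstack([np.ones((n, 1)), xi[:, None], np.eye(n), -np.eye(n)])
res = linprog(c, A_eq=A_eq, b_eq=W,
              bounds=[(None, None), (None, None)] + [(0, None)] * (2 * n),
              method="highs")
A_hat, B_hat = res.x[0], res.x[1]
print(A_hat, B_hat)
```

Repeating this over many simulated paths traces out the joint distribution of $(A,B)$ that enters the limits in (3.41) and below.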
Next, note that the finite-dimensional convergence of $T^{-1}\sum_{j=1}^{[\zeta T]}\operatorname{sgn}\big(x_{Tj}-\hat\alpha-\hat\beta\tfrac{j}{T}\big)$ for each $\zeta\in[0,1]$ holds by an argument similar to (3.37), so that

$$T^{-1}\sum_{j=1}^{[\zeta T]}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-T^{-1/2}\hat\alpha-T^{-1/2}\hat\beta\tfrac{j}{T}\big) \xrightarrow{d} \int_0^{\zeta}\operatorname{sgn}\big(\lambda W(\xi)-A-B\xi\big)\,d\xi. \tag{3.41}$$

From (3.41) and the stochastic equicontinuity implied by that of $Q_T(\cdot)$ above, the limiting distribution of $T^{-3}\sum_{t=1}^{T} S_{Tt}^2$ is as follows:

$$T^{-3}\sum_{t=1}^{T}\Big(\sum_{j=1}^{t}\operatorname{sgn}\big(x_{Tj}-\hat\alpha-\hat\beta\tfrac{j}{T}\big)\Big)^2 = T^{-1}\sum_{t=1}^{T}\Big(T^{-1}\sum_{j=1}^{t}\operatorname{sgn}\big(T^{-1/2}x_{Tj}-T^{-1/2}\hat\alpha-T^{-1/2}\hat\beta\tfrac{j}{T}\big)\Big)^2$$
$$\xrightarrow{d} \int_0^1\Big(\int_0^{\zeta}\operatorname{sgn}\big(\lambda W(\xi)-A-B\xi\big)\,d\xi\Big)^2 d\zeta.$$

Finally, the estimate of the long run variance, scaled as $\hat\sigma^2/\gamma_T$, is equal to

$$\gamma_T^{-1}T^{-1}\sum_{j=1}^{T}\operatorname{sgn}\big(x_{Tj}-\hat\alpha-\hat\beta\tfrac{j}{T}\big)^2 + 2\gamma_T^{-1}T^{-1}\sum_{j=1}^{T-1} k\big(\tfrac{j}{\gamma_T}\big)\sum_{i=1}^{T-j}\operatorname{sgn}\big(x_{Ti}-\hat\alpha-\hat\beta\tfrac{i}{T}\big)\operatorname{sgn}\big(x_{T,i+j}-\hat\alpha-\hat\beta\tfrac{i+j}{T}\big)$$
$$= o_p(1) + 2\int_0^{(T+1)/\gamma_T} k(\zeta)\Big[\int_0^{1-\zeta\gamma_T/T}\operatorname{sgn}\big(x_{T,[\xi T]}-\hat\alpha-\hat\beta\xi\big)\operatorname{sgn}\big(x_{T,[(\xi+\zeta\gamma_T/T)T]}-\hat\alpha-\hat\beta(\xi+\zeta\tfrac{\gamma_T}{T})\big)\,d\xi\Big]d\zeta$$
$$\xrightarrow{d} 2\int_0^{\infty} k(\zeta)\int_0^1\operatorname{sgn}\big(\lambda W(\xi)-A-B\xi\big)^2\,d\xi\,d\zeta = 2\int_0^{\infty} k(\zeta)\,d\zeta,$$

where the substitutions $j/T = \xi$ and $j/\gamma_T = \zeta$ are made, and the last equality holds because $\operatorname{sgn}(\cdot)^2 = 1$.
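Putting the pieces of this chapter together, the statistic itself is straightforward to compute. The sketch below is a minimal implementation under our own choices: statsmodels' median regression for the LAD fit and a Bartlett kernel with a conventional bandwidth for the long run variance; neither choice is prescribed by the text.

```python
import numpy as np
import statsmodels.api as sm

def indicator_kpss_trend(x, bandwidth=None):
    """Trend-version indicator KPSS statistic: signs of LAD residuals from a
    regression of x on an intercept and time trend, cumulated and normalized
    by a Bartlett-kernel long run variance of the sign series."""
    T = x.shape[0]
    X = sm.add_constant(np.arange(1, T + 1) / T)
    resid = x - sm.QuantReg(x, X).fit(q=0.5).predict(X)   # LAD (median) residuals
    s = np.sign(resid)                                    # indicator series
    S = np.cumsum(s)                                      # partial sums S_t
    ell = int(4 * (T / 100) ** 0.25) if bandwidth is None else bandwidth
    lrv = (s @ s) / T + 2 * sum((1 - j / (ell + 1)) * (s[:-j] @ s[j:]) / T
                                for j in range(1, ell + 1))
    return (S @ S) / (T ** 2 * lrv)

# Example: a trend-stationary series with Cauchy noise, where moments of x
# do not exist but the indicator statistic remains well behaved.
rng = np.random.default_rng(4)
t = np.arange(1, 501)
x = 0.5 + 0.1 * t + rng.standard_cauchy(500)
print(indicator_kpss_trend(x))
```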
BIBLIOGRAPHY

Alvarez, A., C. Amsler, L. Orea, and P. Schmidt, 2005, Interpreting and testing the scaling property in models where inefficiency depends on firm characteristics, Journal of Productivity Analysis, forthcoming.

Amsler, C., and P. Schmidt, 2000, Tests of short memory with thick tailed errors, Unpublished manuscript, Department of Economics, Michigan State University.

Battese, G.E., and T.J. Coelli, 1988, Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data, Journal of Econometrics 38, 387–399.

Caudill, S.B., and J.M. Ford, 1993, Biases in frontier estimation due to heteroskedasticity, Economics Letters 41, 17–20.

———, and D.M. Gropper, 1995, Frontier estimation and firm-specific inefficiency measures in the presence of heteroskedasticity, Journal of Business and Economic Statistics 13, 105–111.

Coelli, T.J., 1995, Estimators and hypothesis tests for a stochastic frontier function: A Monte Carlo analysis, Journal of Productivity Analysis 6, 247–268.

Davidson, J., 1994, Stochastic Limit Theory (Oxford University Press: Oxford).

de Jong, R.M., C. Amsler, and P. Schmidt, 2002, A robust version of the KPSS test, based on indicators, Unpublished manuscript, Department of Economics, Michigan State University.

de Jong, R.M., and J. Davidson, 2000, Consistency of kernel estimators of heteroscedastic and autocorrelated covariance matrices, Econometrica 68, 407–423.

Efron, B., 1982, The Jackknife, the Bootstrap and Other Resampling Plans (Philadelphia: Society for Industrial and Applied Mathematics).

———, 1985, Bootstrap confidence intervals for a class of parametric problems, Biometrika 72, 45–58.

———, and R.J. Tibshirani, 1993, An Introduction to the Bootstrap (New York: Chapman and Hall).

Erwidodo, 1990, Panel data analysis on farm-level efficiency, input demand and output supply of rice farming in West Java, Indonesia, Ph.D. dissertation, Department of Agricultural Economics, Michigan State University.

Greene, W.H., 1990, A gamma-distributed stochastic frontier model, Journal of Econometrics 46, 141–163.

Hall, P., W. Härdle, and L. Simar, 1993, On the inconsistency of bootstrap distribution estimators, Computational Statistics and Data Analysis 16, 11–18.

———, 1995, Iterated bootstrap with applications to frontier models, Journal of Productivity Analysis 6, 63–76.

Hansen, B.E., 1996, Inference when a nuisance parameter is not identified under the null hypothesis, Econometrica 64, 413–430.

Horrace, W.C., and P. Schmidt, 1996, Confidence statements for efficiency estimates from stochastic frontier models, Journal of Productivity Analysis 7, 257–282.

———, 2000, Multiple comparisons with the best, with economic applications, Journal of Applied Econometrics 15, 1–26.

Jondrow, J., C.A.K. Lovell, I.S. Materov, and P. Schmidt, 1982, On the estimation of technical inefficiency in the stochastic frontier production function model, Journal of Econometrics 19, 233–238.

Kim, J., and D. Pollard, 1990, Cube root asymptotics, Annals of Statistics 18, 191–219.

Kim, Y., 1999, A study in estimation and inference on firm efficiency, Ph.D. dissertation, Department of Economics, Michigan State University.

———, and P. Schmidt, 1999, Marginal comparisons with the best and the efficiency measurement problem, Unpublished manuscript, Department of Economics, Michigan State University.

Koop, G., J. Osiewalski, and M.F. Steel, 1997, Bayesian efficiency analysis through individual effects: Hospital cost frontiers, Journal of Econometrics 76, 77–106.

Kumbhakar, S.C., 1996, Estimation of cost efficiency with heteroscedasticity: An application to electric utilities, Journal of the Royal Statistical Society, Series D (The Statistician) 45, 319–335.

Kwiatkowski, D., P.C.B. Phillips, P. Schmidt, and Y. Shin, 1992, Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?, Journal of Econometrics 54, 159–178.

Lee, Y.H., 1991, Panel data models with multiplicative individual and time effects: Application to compensation and frontier production functions, Ph.D. dissertation, Department of Economics, Michigan State University.

———, and P. Schmidt, 1993, A production frontier model with flexible temporal variation in technical efficiency, in H.O. Fried, C.A.K. Lovell, and S.S. Schmidt (eds.), The Measurement of Productive Efficiency (New York: Oxford University Press).

Newey, W.K., 1991, Uniform convergence in probability and stochastic equicontinuity, Econometrica 59, 1161–1167.

Olson, J.A., P. Schmidt, and D. Waldman, 1980, A Monte Carlo study of estimators of stochastic frontier models, Journal of Econometrics 13, 67–82.

Osiewalski, J., and M. Steel, 1998, Numerical tools for the Bayesian analysis of stochastic frontier models, Journal of Productivity Analysis 10, 103–117.

Park, B.U., and L. Simar, 1994, Efficient semiparametric estimation in a stochastic frontier model, Journal of the American Statistical Association 89, 929–936.

Park, J.Y., and P.C.B. Phillips, 1999, Asymptotics for nonlinear transformations of integrated time series, Econometric Theory 15, 269–298.

Phillips, P.C.B., 1987, Time series regression with a unit root, Econometrica 55, 277–301.

———, and P. Perron, 1988, Testing for a unit root in time series regression, Biometrika 75, 335–346.

Pitt, M.M., and L.F. Lee, 1981, The measurement and sources of technical inefficiency in the Indonesian weaving industry, Journal of Development Economics 9, 43–64.

Reifschneider, D., and R. Stevenson, 1991, Systematic departures from the frontier: A framework for the analysis of firm inefficiency, International Economic Review 32, 715–723.

Schmidt, P., and T.F. Lin, 1984, Simple tests of alternative specifications in stochastic frontier models, Journal of Econometrics 24, 349–361.
Schmidt, P., and R.C. Sickles, 1984, Production frontiers and panel data, Journal of Business and Economic Statistics 2, 367–374.

Simar, L., 1992, Estimating efficiencies from frontier models with panel data: A comparison of parametric, non-parametric and semi-parametric methods with bootstrapping, Journal of Productivity Analysis 3, 171–203.

———, C.A.K. Lovell, and P. Vanden Eeckaut, 1994, Stochastic frontiers incorporating exogenous influences on efficiency, Discussion Paper No. 9403, Institut de Statistique, Université Catholique de Louvain.

Wang, H.J., and P. Schmidt, 2002, One-step and two-step estimation of the effects of exogenous variables on technical efficiency levels, Journal of Productivity Analysis 18, 129–144.

White, H., 1980, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817–838.

Wooldridge, J.M., 2002, Econometric Analysis of Cross Section and Panel Data (The MIT Press: Cambridge, Massachusetts).