THREE ESSAYS ON SHARE CONTRACTS, LABOR SUPPLY, AND THE ESTIMATION OF MODELS FOR DYNAMIC PANEL DATA

By Seung Chan Ahn

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Department of Economics
1990

ABSTRACT

This dissertation deals with three topics: share contracts, labor supply, and the estimation of models for dynamic panel data.

Chapter 1 proposes a model which predicts, under generally acceptable assumptions, fixed wages across different economic states and lay-offs in bad states, and shows that a share contract exists that Pareto-dominates the fixed-wage contract and has no less employment.

Chapter 2 considers joint estimation of the determinants of the employment status of married women, their labor-force participation decisions, and their market wages. The empirical results imply that recognizing frictions in the labor market is important for explaining the determinants of individuals' employment status in a concrete and correct way. The estimation procedure that includes the wage equation generates more significant and reasonably signed estimates.

Chapter 3 considers a dynamic model using panel data which include a large number of cross-section observations, but only over a short period of time. This chapter proposes an estimator that is efficient under general circumstances.

TABLE OF CONTENTS

Introduction

Chapter 1 A Share Economy as a Work Incentive Device
I. Introduction
II. Wage Contract Model
III. Share Contract Model
IV. Conclusion
Appendix
References

Chapter 2 The Joint Estimation of a Model of Labor Force and Employment Decisions and Market Wages
I. Introduction
II. Model
III. Data
IV. Empirical Result
V. Conclusion
Appendix A
Appendix B
Appendix C
Appendix D
References

Chapter 3 Efficient Estimation of Models for Dynamic Panel Data
I. Introduction
II. Conventional IV Methods
III. Derivation of Moment Conditions
IV. Estimation
V. Estimation with Exogenous Variables
VI. Conclusion
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
References

INTRODUCTION

This dissertation deals with three topics: share contracts, labor supply, and the estimation of models for dynamic panel data. Since each topic is independent of the others, each of the following chapters focuses on one topic and contains its own introduction and conclusion sections.

Chapter 1 attempts to provide a theoretical basis for share contracts. There are many studies comparing share and fixed-wage contracts from the point of view of welfare, employment, or both.
However, their methods of comparison are arbitrary, in the sense that they simply assume the fixed-wage contract to be optimal among wage contracts. A better comparison could be made by investigating the conditions which generate fixed wages across different economic states, and examining whether a share contract could perform better than a fixed-wage contract under those conditions. For this reason, I propose a microeconomic model which predicts, under generally acceptable assumptions, fixed wages across different economic states and lay-offs in bad states. I then show that a share contract exists that Pareto-dominates the fixed-wage contract and has no less employment. This result implies that share contracts could not only improve every economic agent's well-being, but also stabilize the employment level in the economy.

Chapter 2 considers joint estimation of the determinants of the employment status of married women, their labor-force participation decisions, and their market wages. Many of the previous studies of labor supply assume that the employment status of an individual is determined solely by his/her desire to work. Those studies treat the unemployed and non-participants as behaviorally equivalent, ignoring frictions in the labor market. In this chapter, the unemployed are regarded as willing to work but not successful in their job search, and therefore they are treated as behaviorally different from non-participants. Accordingly, the model considered in this chapter consists of two equations describing employment and labor-force decisions, and also a wage equation. The empirical results given in this chapter imply that recognizing frictions in the labor market is important for explaining the determinants of individuals' employment status in a concrete and correct way, and that the traditional labor-supply model generates biased estimates of the determinants of willingness to work.
Furthermore, compared to other methods for joint estimation of labor-force and employment decisions, the estimation procedure that includes the wage equation generates more significant and reasonably signed estimates. Significant sample selection biases generated by employment and participation decisions are also detected in the distribution of observed wage rates, and they are successfully corrected by the joint estimation procedure.

Chapter 3 considers a dynamic model using panel data which include a large number of cross-section observations, but only over a short period of time. This chapter proposes an estimator that is efficient under general circumstances. Several authors have proposed simple but consistent instrumental-variable (IV) estimators, which are identical to generalized-method-of-moments (GMM) estimators based on some of the available moment conditions. GMM estimators are efficient in general circumstances if all that is known is a certain set of moment restrictions. This chapter adopts standard assumptions for the dynamic panel data model, and characterizes all of the moment conditions that these assumptions imply. It turns out that previous studies do not impose all of the available moment conditions, which reveals the inefficiency of the previous IV estimators. The estimator proposed in this chapter is efficient because it exploits all useful information contained in the standard assumptions.

Chapter 1
A Share Economy as a Work Incentive Device

I. Introduction

Stagflation during the last two decades has put an end to the golden era of the Keynesian doctrine. If the Phillips curve is downward sloping, then a trade-off between unemployment and inflation must exist, and the government can choose a desirable combination of them. However, the lesson we have learned during the last 20 years is that the long-run Phillips curve seems to be vertical. High unemployment and high inflation do not alternate; rather, they frequently occur simultaneously.
Government is no longer able to buy employment at the cost of inflation. In a Keynesian framework, an expansionary policy successfully reduces the unemployment rate, since inflation drives down real wages. However, the public's expectation of inflation seems to catch up to actual inflation so quickly that nominal wages increase at approximately the same rate as the general price level. Therefore, the government's expansionary policies often create high inflation without affecting employment, even during recessions. This means that real as well as nominal wages seem to be rigid.

Through a series of publications, M. Weitzman argues that stagflation is tied to wage rigidity. He argues that "stagflation is just an unfortunate consequence of the wage-payment system."1 His basic view is identical to the Keynesians' in that he believes that all problems have their origins in wage rigidity. However, his prescription differs from the Keynesians' in that he emphasizes the necessity of making wages flexible. His suggestion is to tie compensation to "an appropriate index of the firm's performance, say a share of its revenues or profits."2 For simplicity, imagine a payment system in which the wage rate is determined by two components: some fixed compensation and a portion depending on the firm's total revenue. In this case, wages will move in the same direction as the firm's performance. By a simple algebraic equation, Weitzman shows that firms are always characterized by an excess demand for labor. Therefore, in a share economy, firms behave like a vacuum cleaner, constantly searching for employees and eagerly sucking up all the unemployed. In a share economy, capitalism guarantees not only Consumer Sovereignty, but also Worker Sovereignty.3 With this belief, Weitzman suggests that the government should offer tax incentives in order to get firms to adopt share contracts.

There are two main criticisms of Weitzman's analysis. His ideas can be summarized in two propositions.
First, in a share economy compensation is no longer rigid, and it adjusts in a manner that leads the economy to full employment even in the short run. Second, the share economy improves the welfare of economic agents.

Nordhaus [1988] and John [1987] have criticized Weitzman's first proposition. Nordhaus argues that Weitzman's analysis of the short-run behavior of a share economy omits a detailed specification of labor supply, and shows that the excess-demand proposition no longer holds when labor supply constraints are introduced. John shows that a share economy may actually lead to greater employment fluctuation, depending on the specification of labor supply curves. Share contracts will have less employment fluctuation only if the share parameters are determined by correct information about the demand for and supply of labor.

Cooper (see Nordhaus and John [1986]), amongst others, has criticized the second proposition. Implicit contract and efficiency wage theories provide some intuition on the rigid wage phenomenon. The former suggests that risk-averse workers would prefer rigid wages. The latter argues that wages are rigid downward in order to prevent workers from shirking. If wage rigidity comes from the self-interest of economic agents, then the payment system will not allow wages to fluctuate. Weitzman does not deny that fixed wages would be optimal at a microeconomic level. His basic assertion is that wage contracts are not optimal at a macroeconomic level. In order to support this claim, Weitzman [1985] shows that share contracts could increase employment while at the same time offering approximately the same compensation as under fixed-wage contracts. This result is based on a macroeconomic model. In response, Cooper [1988] shows that an injection of share contracts into one sector of a two-sector economy will yield a Pareto-improving resource allocation only in a special case.
That is, a share economy, in which all firms adopt share contracts, may be superior to a wage economy, but share contracts adopted by a subset of sectors may not help the whole economy.

This paper is an attempt to provide support for the share economy, which Weitzman's model fails to do. If there exists a share contract that Pareto-dominates a wage contract and has no less employment in a microeconomic model, then the two main criticisms described above will no longer be valid. The Pareto-dominating share contract will be able to help the whole economy without suspending any group's self-interest. Employment will also fluctuate less in a share economy than in a wage economy.

I demonstrate this in two steps. Some implicit contract models predict constant compensation across states of nature under fairly strong assumptions. My first focus is on whether constant compensation could be observed under more general conditions, in particular, when it is necessary to monitor labor's effort on the job. I assume only risk-neutral firms and risk-averse workers. Workers may have an incentive to shirk on the job once they are employed. Shapiro and Stiglitz's [1984] efficiency wage model shows how the cost of monitoring workers could bring about downward wage rigidity and unemployment. My key point is the introduction of a no-shirking condition (NSC) into the implicit contract framework as an incentive-compatible mechanism.

My model predicts fixed compensation across states with lay-offs in bad states. If there is no monitoring cost, firms maintain a higher employment level even in bad states to insure risk-averse workers. If there is a monitoring cost, firms have an incentive to decrease the employment level to save on monitoring. They cannot easily cut compensation for workers, because lower wages give the employed an incentive to shirk. This intuition partly explains my results.

As a second step, I compare share contracts with the optimal wage contract.
Not surprisingly, there always exists a share contract which Pareto-dominates the optimal wage contract. The reason is quite simple. Under a wage contract, workers receive the agreed compensation regardless of whether or not they shirk. By contrast, under a share contract, shirkers themselves suffer from their shirking, because the firm's total output and revenue decrease when shirking occurs. Therefore, firms can reduce the monitoring cost per worker, owing to workers' decreased incentive to shirk. Furthermore, I show the existence of some forms of share contracts which have no less employment than the wage contract.

My model has a very different implication from Weitzman's. The share contracts in Weitzman's model Pareto-dominate the wage contracts at a macroeconomic level, not at a microeconomic level. The share contracts have a positive effect on the economy-wide employment level, which in turn increases aggregate demand, improving all firms' market conditions. The workers who are already employed will initially suffer from lower wages. However, their firms' improved profitability under a share economy will finally compensate their suffering with higher wages. This reasoning is correct only when sufficiently many firms adopt share contracts. My model instead predicts Pareto-dominance of share contracts over wage contracts by a different mechanism. A share contract imposes a cost on shirkers. Monitoring cost, a pure social welfare loss, will decrease. We can also choose a share contract that has no less employment than the wage contract. Higher welfare and employment will result. This is possible even at a micro level.

Section II shows that the optimal wage contract has fixed wages across the states. Section III proves that a share contract exists that Pareto-dominates the optimal wage contract and has no less employment. Section IV summarizes some conclusions.

II.
Wage Contract Model

First, consider a labor contract between a firm and $N$ homogeneous workers. For simplicity, assume that the firm uses only labor to produce a single commodity. Output depends on total employment ($E$) and each worker's level of effort ($e$), which are perfect substitutes in the production process. Define the revenue function by $s f(eE)$, where $s$ denotes an unpredictable product-demand shock. Here, $f(\cdot)$ is strictly increasing and strictly concave, i.e., $f' > 0$, $f'' < 0$. For simplicity, I assume that each employee's working hours are fixed for technological reasons. We can relax this assumption without changing the results in this section. (For details, see APPENDIX.) Then, we have the profit function:

(1) $\pi(s) = s f(e(s)E(s)) - w^e(s)E(s) - w^u(s)(N - E(s))$

where $w^e$ is the wage paid to each employed worker and $w^u$ is the severance pay for the unemployed. Each worker has the same concave utility function:

(2) $U = U(Y, e)$

where $Y$ denotes consumption. Assume that $U_Y > 0$ and $U_e < 0$. For employed workers, consumption in state $s$ is given by $w^e(s)$, and for the unemployed, by $w^u(s)$. Assume that the firm is risk-neutral and workers are risk-averse. Then, $\delta_w$ describes a wage contract with

$\delta_w = \{E(s), e(s), w^e(s), w^u(s)\}$

The optimal contract, $\delta_w^*$, can be characterized by the solution to the problem:

(C.1) $\max_{\delta} E_s \pi(s)$
subject to
$E_s\{(E(s)/N)U(w^e(s), e(s)) + (1 - E(s)/N)U(w^u(s), 0)\} \ge U_0$
$0 \le E/N \le 1$
$e \ge 0$, for all $s$

where $U_0$ is the utility level a worker obtains in the worker's next best alternative. The first constraint will be binding, because otherwise the firm could lower $w^u$ and increase profit. (See Cooper [1987].) This formulation is very close to Cooper's [1987] basic implicit contract model. The only difference is that I use $e$ rather than the worker's hours as in Cooper's model. This difference makes my model similar to the principal-agent model [1978].
Basically, principal-agent models are designed to show how a firm could improve its workers' productivity with some specific compensation scheme. As efficiency wage models suggest, worker productivity will be related to the wage rate. The model specified above offers a mechanism generating correlation between productivity and the wage in an implicit contract framework.

The first interesting result that arises from (C.1) is that for any $s$, the firm employs all workers: there is full employment. We can summarize this as follows.

PROPOSITION 1. In any optimal contract, $E(s) = N$ for all $s$.

Proof. Suppose not, i.e., in $\delta_w^*$, there exists a state, $s_1$, with $E(s_1) < N$. Given $s_1$, a worker's expected utility is given by

$(E(s_1)/N)U(w^e(s_1), e(s_1)) + (1 - E(s_1)/N)U(w^u(s_1), 0)$

Consider another contract, $\underline{\delta}_w$, such that $\underline{\delta}_w$ is identical to $\delta_w^*$ for all states other than $s_1$, and at $s_1$ there is full employment with

$\underline{e}(s_1)N = e(s_1)E(s_1)$
$\underline{w}^e(s_1) = (E(s_1)/N)w^e(s_1) + (1 - E(s_1)/N)w^u(s_1)$
$\underline{w}^u(s_1) = 0$

Then,

$\underline{\pi}(s_1) = s_1 f(\underline{e}(s_1)N) - \underline{w}^e(s_1)N$
$= s_1 f(e(s_1)E(s_1)) - E(s_1)w^e(s_1) - (N - E(s_1))w^u(s_1)$
$= \pi^*(s_1)$

Therefore, the firm is indifferent between $\underline{\delta}_w$ and $\delta_w^*$. Now, compare the worker's expected utility under the two contracts:

$(E(s_1)/N)U(w^e(s_1), e(s_1)) + (1 - E(s_1)/N)U(w^u(s_1), 0)$
$< U((E(s_1)/N)w^e(s_1) + (1 - E(s_1)/N)w^u(s_1),\ E(s_1)e(s_1)/N)$
$= U(\underline{w}^e(s_1), \underline{e}(s_1))$

The inequality is due to the assumption that workers are risk-averse. Thus, workers prefer $\underline{\delta}_w$ to $\delta_w^*$, so that $\delta_w^*$ is not an optimal contract. QED

(C.1) implicitly assumes that the contract can be enforced voluntarily. However, this assumption is quite unrealistic. Even when a worker shirks, he still gets the agreed compensation. Since all workers are identical, no one will work. Therefore, the firm has to monitor workers in order to sort out those who shirk on the job. Assume that the firm bears some cost ($C$) when monitoring a worker. Let $m$ be the probability of catching a given shirker.
Assume

(3) $C = C(m)$; $C' > 0$

Now, each employed worker decides whether or not to shirk. If an employed worker does not shirk, he gets utility of $U(w^e, e)$. Otherwise he gets expected utility of $(1 - m)U(w^e, 0) + mU(0, 0)$, because shirkers, once caught, are fired immediately. To prevent workers from shirking, the following condition (the no-shirking condition, NSC) must be satisfied:

(4) $U(w^e, e) \ge (1 - m)U(w^e, 0) + mU(0, 0)$ for all $s$.

In this case, each firm faces the following profit function:

(5) $\pi(s) = s f(e(s)E(s)) - w^e(s)E(s) - w^u(s)(N - E(s)) - C(m(s))E(s)$

Denote a contract by

$\delta_w = \{E(s), e(s), w^e(s), w^u(s), m(s)\}$

Then, the optimal contract, $\delta_w^*$, solves

(C.2) $\max_{\delta} E_s \pi(s)$
subject to
(C.2-1) $E_s\{(E(s)/N)U(w^e(s), e(s)) + (1 - E(s)/N)U(w^u(s), 0)\} \ge U_0$
(C.2-2) $U(w^e(s), e(s)) \ge (1 - m(s))U(w^e(s), 0) + m(s)U(0, 0)$
(C.2-3) $0 \le E(s)/N \le 1$

The second constraint will bind, since the firm could otherwise decrease $m$ and save on monitoring costs.

If there is no monitoring cost, i.e., if $C(m) = 0$ for any $m$, (C.2) will be identical to (C.1). The reason is quite simple. The firm can perfectly monitor workers without incurring any cost. That is, the optimal choice of $m$ must be 1. Since workers' expected utility does not depend on $m$, nonshirking workers will not resist the firm's perfect monitoring. If $m = 1$, (C.2-2) becomes

$U(w^e, e) \ge U(0, 0)$

Obviously, this condition must hold even for the solution of (C.1), because otherwise no one will work. Also, the firm's profit function in (C.2) is exactly identical to that in (C.1). This means that when the monitoring cost is arbitrarily small, there is no significant difference between (C.1) and (C.2). Actually, (C.1) is a special case of (C.2), which can be obtained under the assumption of zero monitoring cost.

Assuming that the solution always satisfies (C.2-3), (C.2) predicts fixed wage compensation across states and lay-offs in bad states.

PROPOSITION 2.
The solution to (C.2) satisfies the following:

$w^u(s) = \underline{w}^u$
$w^e(s) = \underline{w}^e$
$e(s) = \underline{e}$
$m(s) = \underline{m}$ for all $s$, and
$dE/ds \ge 0$

Proof. To solve (C.2), we can construct the Lagrangean:

$L = s f(eE) - w^e E - w^u(N - E) - C(m)E + \theta\{(E/N)U(w^e, e) + (1 - E/N)U(w^u, 0)\} + \phi G(w^e, e, m)$

where $G(w^e, e, m) = U(w^e, e) - (1 - m)U(w^e, 0) - mU(0, 0)$. Note that $\theta$ is independent of $s$, while $\phi$ is a function of $s$. From the first order conditions, we have

(6) $U^u_w - N/\theta = 0$
(7) $U^e_w(\theta/N) + \phi G_w/E - 1 = 0$
(8) $s f' + U^e_e(\theta/N) + \phi G_e/E = 0$
(9) $C' - \phi G_m/E = 0$
(10) $s f' e - w^e + w^u - C + (U^e - U^u)(\theta/N) = 0$
(11) $G = 0$

where each subscript denotes the derivative with respect to the variable it represents, and $U^e = U(w^e, e)$, $U^u = U(w^u, 0)$. Since $\theta$ is independent of $s$, (6) implies that $w^u$ is also independent of $s$. This means that $w^u$ is a constant ($w^u = \underline{w}^u$). By substituting (9) into (7), (8), (10), and (11), we can rewrite the last five equations as follows:

(12) $U^e_w(\theta/N) + C' G_w/G_m - 1 = 0$
(13) $s f' + U^e_e(\theta/N) + C' G_e/G_m = 0$
(14) $C' - \phi G_m/E = 0$
(15) $s f' e - w^e + w^u - C + (U^e - U^u)(\theta/N) = 0$
(16) $G(w^e, e, m) = 0$

Total differentiation of these equations and application of Cramer's rule yields4

$dw^e/ds = de/ds = dm/ds = 0$
$dE/ds = -f'/(f''\underline{e}) > 0$

QED

It is very hard to provide a clear-cut explanation for these results, because all of the variables are interrelated in a complex manner. However, some partial intuition follows.

The existence of monitoring costs gives the firm an incentive to decrease its level of employment. As we saw in (C.1), if there is no monitoring cost, the optimal contract is characterized by full employment in all states (to provide insurance to the risk-averse workers). This is possible because work effort is perfectly substitutable for employment in the production process. Workers are willing to accept lower wages in order to guarantee employment. However, monitoring costs can be regarded as a fixed cost of employment. Therefore, in bad states, the firm would prefer lay-offs to save on the fixed cost of employment.
Furthermore, since the employment level has no effect on the NSC (see (C.2-2)), the firm has more discretion in choosing the employment level. This explains the employment fluctuation result. In this case, the firm must compensate workers with higher wages even in the bad states to make them bear the risk of being unemployed. This causes the wage profile across states to be flatter.

Another interesting result is that the optimal contract fixes the level of work effort. This is consistent with the observation that unions usually try to predetermine workers' on-the-job duties in labor contracts.5 There is a conventional explanation for this phenomenon. If the firm has discretion in using workers, labor productivity could be increased, because the firm could deploy its employees efficiently. Higher labor productivity will allow the firm to produce the given quantity of output with a lower level of employment. Therefore, the firm will have a smaller incentive to increase employment if there is some fixed cost of employment, and unions resist increasing labor productivity. This interpretation is supported by my model. In bad states, the firm would prefer to increase the workers' level of effort with higher wages, decreasing total employment to save on monitoring cost. However, since risk-averse workers put a higher value on employment than on wages, they will resist this strategy. Also, the profitability to the firm of adopting this strategy is limited. Higher work effort decreases nonshirking workers' utility. Higher wages increase shirkers' expected utility as well as nonshirkers' utility. Therefore, the firm has to increase monitoring intensity and cost. In contrast to the common explanation, (C.2) suggests that job descriptions serve not only workers' interests but also those of firms.

III. Share Contract Model

Section II provided a model which explains lay-offs and fixed wage compensation during a contract period.
In this section, I will show the Pareto-dominance of share contracts over wage contracts. The wage compensation per employee under a share contract can be defined as follows:

(17) $w^e = v + a s f(eE)/E$

where $v$ is a fixed component of compensation, and $a$ is a share parameter that specifies the variable component of compensation. A share contract, $\delta_s$, is defined by

$\delta_s = \{E(s), e(s), v(s), a(s), m(s)\}$

Begin by assuming that the contract agreement is enforced voluntarily, so that we may ignore the NSC. In this case, no share contract can Pareto-dominate the optimal wage contract. In fact, the optimal share contract is identical to the optimal wage contract. The reason is quite simple. Under a share contract, wage compensation is decomposed into two parameters, $v$ and $a$. However, it is impossible to determine $v$ and $a$ separately, because the first order conditions for $v$ and $a$ are identical. Without the NSC, the optimal share contract, $\delta_s^*$, solves

(C.3) $\max_{\delta} E_s\{(1 - a)s f(e(s)E(s)) - v(s)E(s) - C(m(s))E(s) - w^u(s)(N - E(s))\}$
subject to
(C.3-1) $E_s\{(E(s)/N)U(a s f(e(s)E(s))/E(s) + v(s), e(s)) + (1 - E(s)/N)U(w^u(s), 0)\} \ge U_0$
(C.3-2) $0 \le E/N \le 1$

The first order conditions for $v$ and $a$ are given by:

$-E + (\theta/N)E U^e_w = 0$
$-s f(eE) + (\theta/N)s f(eE)U^e_w = 0$

Both equations reduce to

$U^e_w = N/\theta$

Therefore, we cannot determine $v$ and $a$ separately. Instead we can derive only an optimal combination of $v$ and $a$. This result implies an interesting characteristic of the optimal share contract, as summarized in the following proposition.

PROPOSITION 3. When workers have no incentive to shirk, the optimal share contract is identical to the optimal wage contract.

Proof. Define

$w^e = a s f(eE)/E + v$

Substitute $w^e$ into (C.3), and find the solution. This must be the solution to (C.1). From $w^e(s)$, we can obtain an optimal combination of $v$ and $a$.
QED

Note that the solution to (C.1) predicts full employment in all states. Basically, a contract between the firm and its workers provides insurance that cannot be obtained in the market, due to the nontransferable character of human capital. If both parties have perfect information about each other, the contract will be Pareto optimal. This result is nothing more than the optimal resource allocation in an Arrow-Debreu world of uncertainty. Therefore, no reform of the wage scheme could make both the firm and workers better off concurrently.

However, insurance markets usually suffer from the moral hazard problem, caused by insurance companies' imperfect information on customers' behaviour. As we saw in Section II, the workers' incentive to shirk leads to unemployment under the optimal wage contract. To prevent workers from shirking, the firm wastes its resources in monitoring workers. In this case, a wage compensation scheme which can suppress the incentive to shirk may improve the performance of the economy. This turns out to be true.

Consider the NSC in (C.2-2). Under a wage contract, shirkers do not suffer from their own shirking, since they still receive the same compensation as nonshirkers do. Under a share contract, however, shirkers do suffer from their own shirking. If a worker shirks, the total output actually produced will be smaller than the amount agreed to be produced. Workers get lower compensation, because some portion of wages is related to total revenue. The wage under a share contract is given by:

$a s f(e(E - S))/E + v$

where $S$ is the number of shirkers. When a worker decides to shirk, his expected utility is given by:

(18) $(1 - m)U(a s f(e(E - 1))/E + v, 0) + mU(0, 0)$

Therefore, the NSC under a share contract can be expressed as

(C.3-3) $U(a s f(Ee)/E + v, e) \ge (1 - m)U(a s f(e(E - 1))/E + v, 0) + mU(0, 0)$

Comparing (C.3-3) with (C.2-2), we can easily see that shirkers have lower expected utility under a share contract. Hence, the firm could reduce $m$, and thereby $C(m)$.
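The comparison between the two no-shirking conditions can be illustrated numerically. The following Python sketch computes the smallest monitoring probability $m$ at which each condition holds with equality, holding the compensation of a nonshirking worker fixed. The functional forms (square-root revenue, square-root utility of income with a quadratic effort cost), the parameter values, and the function names are assumptions chosen purely for illustration; none of them come from the model above.

```python
import math

def revenue(labor_input):
    # Assumed concave technology f(x) = sqrt(x); illustrative only.
    return math.sqrt(labor_input)

def utility(income, effort, effort_cost=0.5):
    # Assumed utility U(Y, e) = sqrt(Y) - effort_cost * e**2:
    # increasing in income, decreasing in effort, with U(0, 0) = 0.
    return math.sqrt(income) - effort_cost * effort ** 2

def min_monitoring(pay_if_working, pay_if_shirking, effort):
    # Solve U(w, e) = (1 - m) * U(w_shirk, 0) + m * U(0, 0) for m,
    # i.e. the monitoring probability at which the NSC just binds.
    u_work = utility(pay_if_working, effort)
    u_shirk = utility(pay_if_shirking, 0.0)
    u_fired = utility(0.0, 0.0)
    return (u_shirk - u_work) / (u_shirk - u_fired)

# Hypothetical state and contract parameters (v = 0, pure share contract).
s, E, e, a, v = 10.0, 10.0, 1.0, 0.5, 0.0

# Compensation of a nonshirking worker, identical under both contracts.
wage = a * s * revenue(e * E) / E + v
# Under the share contract a single shirker lowers total revenue, as in (18).
shirk_pay_share = a * s * revenue(e * (E - 1)) / E + v

m_wage = min_monitoring(wage, wage, e)              # NSC (C.2-2)
m_share = min_monitoring(wage, shirk_pay_share, e)  # NSC (C.3-3)
print(f"wage contract m: {m_wage:.4f}, share contract m: {m_share:.4f}")
```

Under these assumed forms the share contract needs a strictly lower monitoring probability than the wage contract that pays the same amount, which is the channel through which $C(m)$ falls.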
This implies that the firm and workers could be better off under a share contract. This is stated in the following proposition.

PROPOSITION 4. When workers have an incentive to shirk, there exists a share contract which Pareto-dominates the optimal wage contract.

PROOF. The optimal wage contract is characterized by

$\delta_w^* = \{\underline{w}^e, \underline{w}^u, \underline{e}, E^*(s), \underline{m}\}$

Consider the following share contract,

$\delta_s = \{a(s), v(s), w^u(s), e(s), E(s), m(s)\}$

with
$E(s) = E^*(s)$,
$e(s) = \underline{e}$,
$a(s)s f(e(s)E(s))/E(s) + v(s) = \underline{w}^e$,
$w^u(s) = \underline{w}^u$, and
$m(s) = \underline{m}$ for all $s$.

For all states, both contracts yield the same level of profit for the firm and the same level of expected utility for the workers. However,

$U(a s f(\underline{e}E(s))/E(s) + v(s), 0) > U(a s f(\underline{e}(E(s) - 1))/E(s) + v(s), 0)$

Therefore,

$U(a s f(\underline{e}E(s))/E(s) + v(s), e(s)) = U(\underline{w}^e, \underline{e})$
$= (1 - \underline{m})U(\underline{w}^e, 0) + \underline{m}U(0, 0)$
$= (1 - \underline{m})U(a s f(\underline{e}E(s))/E(s) + v(s), 0) + \underline{m}U(0, 0)$
$> (1 - \underline{m})U(a s f(\underline{e}(E(s) - 1))/E(s) + v(s), 0) + \underline{m}U(0, 0)$

The NSC is not binding under the share contract. Therefore, the firm can reduce $m$, thereby decreasing $C(m)$ and increasing profits. QED

For any share contract, the bigger the portion of compensation related to total revenue, the higher the cost a shirker bears. Therefore, the best way to reduce the workers' incentive to shirk is to increase the share parameter, $a$, as much as possible. This gives us the following result.

PROPOSITION 5. For any share contract, the optimal combination of $a$ and $v$ requires $v = 0$.

PROOF. Suppose not. Consider a share contract specifying $\delta_s = \{a, v, e, E, m\}$. I will suppress $s$ for notational convenience. Suppose $v \ne 0$ at some $s_0$. Choose $\underline{a}$ such that

$\underline{a} s f(Ee)/E = a s f(eE)/E + v$.

Consider a contract, $\underline{\delta}_s$, which is identical to $\delta_s$ except that $\underline{a}$ replaces $a$ and the fixed component is set to zero. Then the firm's profits and the workers' utility are the same under both contracts.
However,

$a s f(e(E - 1))/E + v$
$= \{a s f(eE)/E + v\}\{f(e(E - 1))/f(eE)\} + v\{1 - f(e(E - 1))/f(eE)\}$
$= \{\underline{a} s f(eE)/E\}\{f(e(E - 1))/f(eE)\} + v\{1 - f(e(E - 1))/f(eE)\}$
$= \underline{a} s f(e(E - 1))/E + v\{1 - f(e(E - 1))/f(eE)\}$
$> \underline{a} s f(e(E - 1))/E$

Therefore,

$U(\underline{a} s f(eE)/E, e) > (1 - m)U(\underline{a} s f(e(E - 1))/E, 0) + mU(0, 0)$

The firm can decrease $m$, thereby increasing profits at $s_0$. This is a contradiction. QED

A share contract may cause (on average) more unemployment than a wage contract, even though all agents could be better off under that share contract. Higher unemployment in an economy will generate a contraction in aggregate demand, and thereby worsen all firms' economic positions. This implies a downward shift in the distribution of $s$ which firms confront. If this is true, that kind of share contract may not be desirable at a macroeconomic level. Therefore, my next question is whether a share contract could guarantee higher employment. The answer is affirmative, as summarized in the following proposition.

PROPOSITION 6. There exists a share contract which is Pareto-superior to, and has no less employment than, the optimal wage contract specified in (C.2).

PROOF. The optimal wage contract is denoted by

$\delta_w^* = \{\underline{w}^e, \underline{w}^u, \underline{e}, E^*(s), \underline{m}\}$.

Consider a share contract,

$\delta_s = \{a(s), w^u(s), e(s), E(s), m(s)\}$

satisfying

(19) $e(s) = \underline{e}$,
(20) $a(s)s f(\underline{e}E(s))/E(s) = \underline{w}^e$,
(21) $w^u = \underline{w}^u$,
(22) $U(a(s)s f(e(s)E(s))/E(s), e(s)) = (1 - m(s))U(a(s)s f(e(s)(E(s) - 1))/E(s), 0) + m(s)U(0, 0)$

for all states. Note that $a s f(\underline{e}(E - 1))/E < \underline{w}^e$ for any $E$, as long as (20) holds. Consider the case in which $E(s) = E^*(s)$. Since $U(\underline{w}^e, \underline{e}) = (1 - \underline{m})U(\underline{w}^e, 0) + \underline{m}U(0, 0)$, we have $m(s) < \underline{m}$ for any $s$. Therefore, $\delta_s$ must Pareto-dominate $\delta_w^*$ when $E(s) = E^*(s)$. Then the optimal form, $\delta_s^*$, of the $\delta_s$'s must also Pareto-dominate $\delta_w^*$. This means that both the firm and the workers are better off under $\delta_s^*$. By Bellman's principle, for any state, both are better off under $\delta_s^*$. Let $\delta_s^* = \{a^*(s), \underline{w}^u, \underline{e}, E^{**}(s), m^*(s)\}$. Suppose that $E^{**}(s) < E^*(s)$ for some $s_0$.
Then, at $s_0$,

$(E^{**}/N)U(\underline{w}^e, \underline{e}) + (1 - E^{**}/N)U(\underline{w}^u, 0) < (E^*/N)U(\underline{w}^e, \underline{e}) + (1 - E^*/N)U(\underline{w}^u, 0)$.

This shows that workers have lower expected utility under $\delta_s^*$ at $s_0$. This is a contradiction. QED

Now, in addition to Pareto-dominating wage contracts, a share contract generates less employment fluctuation than the optimal wage contract. If workers are more likely to be employed, there will be an increase in aggregate consumption, which shifts up the distribution of $s$ and improves firms' profitability. An interesting implication of Proposition 6 is that this is possible even when the wages actually paid to workers are constant across states. Therefore, share contracts will Pareto-dominate even at the macro level. One shortcoming of Weitzman's analysis is that a firm has excess demand for labor only if workers already employed are willing to accept lower wages. (See Nordhaus [1988].) My model avoids this problem.

IV. Conclusion

In contrast to Weitzman's work, my model, which is based on a framework of implicit contract theory, allows us to make a complete welfare comparison between two different compensation schemes: fixed-wage and share compensation. Fixed wages across the states are predicted rather than assumed in an ad hoc way. Therefore, the welfare comparison given in this paper is less open to criticism than the comparisons given by other studies.

Wage contracts can be regarded as a special form of share contracts. A share contract predetermines some fixed portion of wage compensation, with the remaining portion tied to the firm's performance in the product market. A fixed-wage contract, which is the optimal wage contract in my model, is a share contract with no variable portion of compensation. This suggests that fixed-wage contracts would be a suboptimal choice among share contracts. My model shows that this is the case. If workers have no incentive to shirk on their job, the wage and the optimal share contracts are identical.
Wage contracts are always characterized by full employment in order to insure risk-averse workers. However, if workers have some incentive to shirk, the optimal wage contract specifies fixed wages across the states of nature, generating lay-offs in bad states. In this case, there exists a share contract which not only Pareto-dominates the optimal wage contract, but also has no less employment in any state. This is possible because workers' incentive to shirk decreases under share contracts.

In Weitzman's model, firms could have excess demand for labor only when their employed workers are willing to accept lower wages. The employed workers will accept only if the share contracts they accept generate sufficient macroeconomic externalities on the whole economy that their suspended self-interests are ultimately compensated. For this to be true, a substantially large portion of the sectors in the economy must adopt share contracts, and the share parameters should be based on exact information on the economy, all of which seems to be practically difficult. However, in my model, no one suffers from a share contract even at a micro level. No one has to wait until the macroeconomic externalities compensate his suspended self-interest. Furthermore, share contracts will increase the economy-wide employment level, even when the share sectors are small. Exact information on the economy is not required to find this form of share contract, as suggested by PROPOSITION 6. Therefore, the implementation of share contracts will not be very costly.

However, even though my model provides an argument in favor of the share economy, it is too early to draw policy conclusions. Unfortunately, my model fails to provide a clear-cut answer to a different question: Why does an economy resist converting from a wage to a share economy, if share contracts are really superior to wage contracts? I can offer only some partial intuition.
Under a share contract, each worker's wage depends on other agents' work effort. If there is a shirker, all workers suffer from lower wages. Therefore, some burden of monitoring should fall on the workers themselves -- each worker becomes more sensitive to the other workers' behavior. This generates some kind of psychological cost to workers. In this situation, workers may prefer wage contracts.

From the standpoint of the firms, a share contract will reduce management's discretion over production. A share contract can be successful only if workers and the firm share exact information concerning the firm's real market situation and its true total revenue. In other words, a credibility problem arises in a share economy. This means that labor's participation in management is a necessary condition for a successful share contract. In this case, management cannot efficiently cope with an abrupt change in the firm's market situation, because any decision of the management must wait for its workers' approval. A longer decision process will reduce the firm's profitability. Therefore, firms might also prefer wage contracts.

These are just some possible reasons why an economy might be characterized by wage contracts. Clearly, this question requires further study.

ENDNOTES

1. Weitzman [1985], p. 3.
2. Ibid., p. 3.
3. Ibid., pp 118-122.
4. The total differentiation of (13), (14), (15), (16), and (17) yields a linear system relating the differentials $(d\phi, dw^e, de, dE, dm)$ to $ds$. [The coefficient matrix of this system is not legible in the source.]
5. See Balfour [1987], pp 300-328.

APPENDIX

In Section II, I assumed that each worker's hours on the job are fixed. This assumption is not required to obtain a fixed wage under the optimal wage contract.
To show this, redefine $e$ as the hourly effort level of a worker, and $w^e$ as the hourly wage rate. I assume that working hours ($h$) are a perfect substitute for hourly working effort ($e$) in the production process. First, suppose that workers have no incentive to shirk on the job. Let $\delta_w = \{E, e, h, w^e, w^u\}$. For simplicity, I suppress $s$. The optimal contract, $\delta_w^*$, solves

(A.1) $\max_{\delta_w} E_s \Pi = E_s\{s f(ehE) - w^e hE - w^u(N-E)\}$

subject to

(A.1-1) $E_s\{(E/N)U(w^e h, eh) + (1 - E/N)U(w^u, 0)\} \geq U_0$.

We can easily show that PROPOSITION 1 still holds for (A.1). I omit the proof, since it is similar to that of PROPOSITION 1.

Now, consider the case in which workers have an incentive to shirk. Describe a wage contract by $\delta_w = \{E, e, h, w^e, w^u, m\}$. The optimal contract, $\delta_w^*$, solves

(A.2) $\max_{\delta_w} E_s\{s f(ehE) - w^e hE - w^u(N-E) - C(m)E\}$

subject to

(A.2-1) $E_s\{(E/N)U(w^e h, eh) + (1 - E/N)U(w^u, 0)\} \geq U_0$
(A.2-2) $U(w^e h, eh) \geq (1-m)U(w^e h, 0) + mU(0,0)$.

Consider the first-order conditions with respect to $e$, $w^e$, and $h$:

(1) $s f' hE + \theta(E/N)U_e h + \phi U_e h = 0$
(2) $-E + \theta(E/N)U_w + \phi(U_w - (1-m)U^s_w) = 0$
(3) $s f' eE - w^e E + \theta((E/N)U_w w^e + (E/N)U_e e) + \phi(U_w w^e + U_e e - (1-m)U^s_w w^e) = 0$

where $\theta$ and $\phi$ are the multipliers on (A.2-1) and (A.2-2), and $U^s$ denotes the utility of shirkers, $U(w^e h, 0)$. Observe that one of (1), (2), and (3) is a linear combination of the others: (3) equals $(e/h)$ times (1) plus $w^e$ times (2). This means that we cannot determine $e$, $w^e$, and $h$ separately. To avoid this problem, let $e^o \equiv eh$ and $w \equiv w^e h$. Substitution of $e^o$ and $w$ into (A.2) gives us (C.2). Therefore, the optimal choice, across the states of nature, must fix $e^o$ and $w$ at $\underline{e}^o$ and $\underline{w}$, respectively. Total wage compensation for each worker ($w$) and total working effort per worker ($e^o$) are independent of the states. The firm can arbitrarily choose fixed working hours before a contract. Then, as in PROPOSITION 2, we have a fixed hourly wage and effort level under the optimal contract.

(A.2) contains another interesting implication. One may consider the case in which the firm predetermines $e$ at some level regardless of the states. Suppose that $e = 1$.
Then, (A.2) is still identical to (C.2) except that $h$ replaces $e$, and that $w$ ($= w^e h$) takes the role of $w^e$ in (C.2). Substituting $w$ and $h$ into (C.2) and solving it will generate constant $w$ and $h$. That is, when $e$ is predetermined, the optimal contract has a constant $w^e$ during the contract period. This result is surprising, because conventional implicit contract models usually fail to predict fixed wage rates when working hours are allowed to vary. If the effort level is fixed, and if the NSC (A.2-2) is ignored, (A.2) becomes a standard contract model. (See p. 8 in Cooper [1987].) Without the NSC, (A.2) fails to generate a fixed wage rate, unless there are some fairly strong assumptions on the form of workers' utility function and on variations in hours. Therefore, the NSC in (A.2) has a crucial role in generating fixed wages across the states of nature under the optimal contract. By this reasoning, it is safe to say that we do not have to assume fixed working hours.

REFERENCES

Balfour, A. (1987), Union-Management Relations in a Changing Economy, Prentice-Hall, Inc., Englewood Cliffs.
Cooper, R. (1987), Wage and Employment Patterns in Labor Contracts: Microfoundations and Macroeconomic Implications, Harwood Academic Publishers, New York.
Cooper, R. (1988), "Will Share Contracts Increase Economic Welfare?," The American Economic Review, Vol. 78, pp 139-154.
Cooper, R. and John, A. (1988), "Coordinating Coordination Failures in Keynesian Models," Quarterly Journal of Economics, August, pp 441-463.
John, A. (1987), "Employment Fluctuations in a 'Share Economy'," Working Paper, Michigan State University.
Miller, R.E. (1979), Dynamic Optimization and Economic Applications, McGraw-Hill Inc.
Harris, M. and Raviv, A. (1978), "Some Results on Incentive Contracts with Applications to Education and Employment, Health Insurance, and Law Enforcement," The American Economic Review, Vol. 68, pp 20-30.
Nordhaus, W. (1988), "Can the Share Economy Conquer Stagflation?" Quarterly Journal of Economics, February, pp 201-223.
Nordhaus, W.
and John, A., eds. (1986), "The Share Economy: A Symposium," Journal of Comparative Economics, Vol. 10, pp 414-473.
Shapiro, C. and Stiglitz, J.E. (1984), "Equilibrium Unemployment as a Worker Discipline Device," The American Economic Review, Vol. 74, pp 433-444.
Weitzman, M.L. (1984), The Share Economy, Harvard University Press, Cambridge.
Weitzman, M.L. (1984), "Profit Sharing as Macroeconomic Policy," The American Economic Review, Vol. 75, pp 41-45.
Weitzman, M.L. (1985), "The Simple Macroeconomics of Profit Sharing," The American Economic Review, Vol. 75, pp 937-953.
Weitzman, M.L. (1987), "Steady State Unemployment Under Profit Sharing," The Economic Journal, Vol. 97, pp 86-105.

Chapter 2
The Joint Estimation of a Model of Labor Force and Employment Decisions and Market Wages

I. Introduction

Much of the empirical literature on the labor supply decision simply assumes that individuals can obtain jobs once they decide to enter the labor market. That is, the employment status of an individual is determined by only one selection criterion -- the individual's decision on whether to work. In many models used to explain employment status, individuals are categorized into two groups: employed and nonemployed. The unemployed and non-participants are treated as behaviorally identical in their decision process, and both are regarded as one single group -- the nonemployed. These models implicitly ignore frictions in the labor market.

It is well known that unemployment is not explained simply by individuals' work incentives. Under the single selection criterion based on employment and nonemployment, both the unemployed and people not in the labor force (NLF) choose not to work because their reservation wage rates are greater than their market wage rates. Therefore, the unemployment status of an individual is purely voluntary. In this sense, I call these traditional models No-Friction models.
The aim of this paper is to provide estimates of the determinants of the employment status of married women and of their market wages by using two different selection criteria -- preferences for work and ability to become employed. Some studies show the importance of accounting for the unemployed in a given data set. Flinn and Heckman [1983], applying a duration model to young men selected from the National Longitudinal Survey, reject the hypothesis that the classifications unemployed and NLF are behaviorally equivalent. Ham [1982], using a sample of prime-aged males taken from the University of Michigan's Panel Study of Income Dynamics (PSID), shows that the estimates of parameters in an equation for work hours are biased if the unemployed or underemployed workers are ignored. Also, Blundell, Ham, and Meghir [1987] reject the Tobit model based on the traditional No-Friction model, using a sample of married women drawn from the UK Family Expenditure Survey of 1981.

We categorize individuals into three different groups: employed, unemployed, and non-participants. A married woman is assumed to enter the labor market if her reservation wage is less than the prevailing market wage. However, not all individuals who decide to enter the labor market get jobs immediately. A woman is employed only when she matches with an employer who is willing to hire her. An individual who is better able to find potential employers will have a higher probability of being employed. In this sense, I call the model in this paper a Friction model. For this model, we may construct a job-match equation which can distinguish between the employed and the unemployed in a probit framework. Labor-force and employment decisions can be jointly explained by a bivariate probit model with partial observability. (See Meng and Schmidt [1985], and Farber [1983].)

This paper also estimates the parameters in the wage equation by the maximum-likelihood estimation (MLE) method.
We observe only the currently employed workers' wages. That is, the observed distribution of wages depends not only on individuals' decisions about labor-force participation, but also on their ability to find jobs. The data collected, therefore, will have two types of selection bias. If we estimate labor-force and employment decisions and the wage equation jointly, the parameters in the equations for labor-force and employment decisions will be estimated more efficiently. At the same time, the conventional likelihood ratio (LR) test is applicable to the hypothesis of no selection bias.

An extension of Heckman's simple two-stage estimation method is usually used in other studies for cases where two selection rules generate the sample. (See Fishe, Trost, and Lurie [1981], and Ham [1982].) The selectivity regressors used in the ordinary least squares (OLS) estimation of the wage equation are generated by a bivariate probit model. We can easily apply this extended two-stage estimation method to the case where employed, unemployed, and NLF people are observed separately. Other studies usually use an F-statistic to test the joint significance of the selectivity regressors. In Heckman's simple selectivity model, the standard t-statistic for the selectivity regressor has been used to test for no selection bias. Melino [1982] (also, Lin [1982]) shows that this t-statistic is asymptotically equivalent to the Lagrange Multiplier (LM) test statistic. Likewise, this paper shows that the F-statistic for the hypothesis of no selection bias in the model with two selection rules is asymptotically equivalent to the LM test. This means that the F-statistic has good power properties, at least asymptotically.

The empirical results in this paper reveal that the Friction model explains a married woman's labor status better than the No-Friction model. The joint estimation of labor-force status, employment status, and the wage rate generates more reliable estimates.
There is significant evidence of selection bias from ignoring labor-force and employment status. The correction for sample selection bias in the extended two-stage method turns out to be not quite as satisfactory as that by the MLE method.

This paper is organized in the following way. Section II describes the basic model for frictions in the labor market. Section III summarizes the data, and describes the explanatory variables used for the empirical study. Section IV presents the empirical results. Some concluding remarks follow in Section V.

II. Model

This section explains the basic model based on the assumption of frictions in the labor market. Notice that not all married women who enter the labor market get jobs. We may assume that if an individual has a higher market wage than her reservation wage, she enters the labor market. However, unless she matches an employer who is willing to hire her, she remains unemployed. The Current Population Survey (CPS), the National Longitudinal Survey (NLS), and the Panel Study of Income Dynamics (PSID) provide data on three different groups of married women: the employed, the unemployed, and the non-participants. In this case, we need another employment criterion to consider the unemployed separately from the NLF people. We may imagine that each individual has her own job-match skill. Those who have better job-match ability will have a higher probability of being employed.

Let $M_i^*$ be the index for the i'th individual's job-match skill; $w_i$, the market wage; $w_i^R$, the reservation wage. If $w_i \geq w_i^R$, the i'th individual participates in the labor market. If, in addition, $M_i^* \geq 0$, she matches an employer and gets a job which pays her $w_i$. If $w_i < w_i^R$, she retains her NLF status. When $w_i \geq w_i^R$ and $M_i^* < 0$, she is in the labor market and remains unemployed. $w_i$ is observed if and only if the i'th individual is employed.
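The two selection rules just described can be written as a small classification routine. This is only an illustrative sketch: the function names `labor_status` and `observed_wage` are hypothetical and do not appear in the original text, and the inputs stand for $w_i$, $w_i^R$, and $M_i^*$, of which only $w_i$ (for the employed) is observed in practice.

```python
def labor_status(market_wage, reservation_wage, match_index):
    """Classify one woman under the two selection rules of the Friction model."""
    # Rule 1: she participates only if the market wage covers her reservation wage.
    if market_wage < reservation_wage:
        return "NLF"
    # Rule 2: a participant is employed only if her job-match index is non-negative.
    if match_index >= 0:
        return "employed"
    return "unemployed"


def observed_wage(market_wage, reservation_wage, match_index):
    """Return the wage the analyst sees: only employed women report one."""
    status = labor_status(market_wage, reservation_wage, match_index)
    return market_wage if status == "employed" else None
```

A No-Friction model would collapse the "unemployed" and "NLF" outcomes into a single nonemployed group; the second rule is what keeps them apart here.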
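Therefore, individuals' behavior in the labor market is determined by three variables -- market wage rates, reservation wage rates, and the job-match index. These are summarized by Model I.

Model I. The Structural Model

(1.1) $w_i^R = z_{1i}\delta_1 + \epsilon_{1i}$
(1.2) $M_i^* = z_{2i}\delta_2 + \epsilon_{2i}$
(1.3) $w_i = z_{3i}\delta_3 + \epsilon_{3i}$

$(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})' \sim N(0, \Sigma)$, $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{12} & 1 & \Sigma_{23} \\ \Sigma_{13} & \Sigma_{23} & \Sigma_{33} \end{pmatrix}$, $i = 1, 2, \ldots, N$.

The $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})'$ are independently and identically distributed. The $z_{ji}$ are the observed $1 \times k_j$ vectors of explanatory variables.

$y_{1i} = 1$ if $w_i \geq w_i^R$; 0, otherwise.
$y_{2i} = 1$ if $M_i^* \geq 0$; 0, otherwise.

$y_{2i}$ is observed if and only if $y_{1i} = 1$. $w_i^R$ and $M_i^*$ are not observed. $w_i$ is observed if and only if $y_{1i}y_{2i} = 1$. $y_{1i}$ denotes labor-force status (LF), and $y_{2i}$ denotes ability to find a job. Therefore an individual is employed if $y_{1i}y_{2i} = 1$.

For a simple estimation procedure for this model, let

(2.1) $e_{1i} = (\epsilon_{3i} - \epsilon_{1i})/(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2}$
(2.2) $e_{2i} = \epsilon_{2i}$
(2.3) $e_{3i} = \epsilon_{3i}$

Then, $(e_{1i}, e_{2i}, e_{3i})' \sim N(0, \Omega)$, where

$\Omega = \begin{pmatrix} 1 & (\Sigma_{23} - \Sigma_{12})/(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2} & (\Sigma_{33} - \Sigma_{13})/(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2} \\ \cdot & 1 & \Sigma_{23} \\ \cdot & \cdot & \Sigma_{33} \end{pmatrix}$.

Let

(3.1) $y_{1i}^* = (w_i - w_i^R)/(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2} = (z_{3i}\delta_3 - z_{1i}\delta_1)/(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2} + e_{1i} \equiv x_{1i}\beta_1 + e_{1i}$
(3.2) $y_{2i}^* = M_i^* = z_{2i}\delta_2 + \epsilon_{2i} \equiv x_{2i}\beta_2 + e_{2i}$
(3.3) $y_{3i} = w_i = z_{3i}\delta_3 + \epsilon_{3i} \equiv x_{3i}\beta_3 + e_{3i}$

Now, we can rewrite Model I as follows.

Model II. Reduced-Form Model

(4.1) $y_{1i}^* = x_{1i}\beta_1 + e_{1i}$
(4.2) $y_{2i}^* = x_{2i}\beta_2 + e_{2i}$
(4.3) $y_{3i} = x_{3i}\beta_3 + e_{3i}$

$(e_{1i}, e_{2i}, e_{3i})' \sim N(0, \Omega)$, $i = 1, 2, \ldots, N$, where

$\Omega = \begin{pmatrix} 1 & \rho & \sigma_{13} \\ \cdot & 1 & \sigma_{23} \\ \cdot & \cdot & \sigma_{33} \end{pmatrix}$.

$y_{1i} = 1$ if $y_{1i}^* \geq 0$; 0, otherwise.
$y_{2i} = 1$ if $y_{2i}^* \geq 0$; 0, otherwise.

$y_{2i}$ is observed if and only if $y_{1i} = 1$. $y_{1i}^*$ and $y_{2i}^*$ are not observed; $y_{3i}$ is observed if and only if $y_{1i}y_{2i} = 1$.

If our interest is just in the joint estimation of labor-force and employment decisions, the parameters in those processes can be estimated by a bivariate probit method with partial observability. (Specifically, this is the "censored probit" model of Farber [1983] and Meng and Schmidt [1985].)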
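To make the observability pattern of the reduced-form model concrete, the following sketch draws errors with covariance matrix $\Omega$ and applies the two censoring rules. It is a minimal illustration under stated assumptions, not the estimation procedure of this paper: the function names are hypothetical, the indices $x_{1i}\beta_1$, $x_{2i}\beta_2$, $x_{3i}\beta_3$ are held constant across individuals for brevity, and the parameter values in any call are arbitrary.

```python
import math
import random


def cholesky3(omega):
    """Lower-triangular Cholesky factor of a 3x3 positive-definite matrix."""
    factor = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            partial = sum(factor[i][k] * factor[j][k] for k in range(j))
            if i == j:
                factor[i][j] = math.sqrt(omega[i][i] - partial)
            else:
                factor[i][j] = (omega[i][j] - partial) / factor[j][j]
    return factor


def simulate_model2(n, index1, index2, index3, rho, s13, s23, s33, seed=0):
    """Draw (y1, y2, y3) with y2 seen only if y1 = 1 and y3 only if y1*y2 = 1."""
    rng = random.Random(seed)
    omega = [[1.0, rho, s13], [rho, 1.0, s23], [s13, s23, s33]]
    factor = cholesky3(omega)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Correlated errors e = factor * z, so that Cov(e) = omega.
        e = [sum(factor[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        y1 = 1 if index1 + e[0] >= 0 else 0
        y2 = (1 if index2 + e[1] >= 0 else 0) if y1 == 1 else None
        y3 = index3 + e[2] if (y1 == 1 and y2 == 1) else None
        rows.append((y1, y2, y3))
    return rows
```

Rows with `y1 == 0` correspond to NLF women, rows with `y1 == 1` and `y2 == 0` to the unemployed, and only the remaining rows carry a wage.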
Here we may distinguish two cases, depending on what is observed. First, there is the case of partial observability in the sense of Poirier [1980], in which we do not observe $y_{1i}$ or $y_{2i}$ for anyone, but we observe the product $y_{1i}y_{2i}$. This corresponds to observing only employment status (employed or not), but not labor-force participation status for individuals who are not employed. Interestingly, we can still estimate the labor-force participation and employment equations separately, using Poirier's model.

Case I. Partial Observability in the Sense of Poirier

$y_{1i} = 1$ if $y_{1i}^* \geq 0$; 0, otherwise.
$y_{2i} = 1$ if $y_{2i}^* \geq 0$; 0, otherwise.
$i = 1, 2, \ldots, N$.

Only $y_{1i}y_{2i}$ is observed. $y_{1i}$ is one if the i'th person is in the labor force, and $y_{2i}$ is one if the i'th person is able to find a job. Thus $y_{1i}y_{2i}$ is one if the i'th person is employed.

In this case, the maximum likelihood (ML) estimators of $\beta_1$, $\beta_2$, and $\rho$ are derived by maximizing the following log-likelihood function with respect to $\beta_1$, $\beta_2$, and $\rho$:

(5) $\ln L_P(\beta_1, \beta_2, \rho) = \sum_{i=1}^{N} \{ y_{1i}y_{2i} \ln[F(x_{1i}\beta_1, x_{2i}\beta_2; \rho)] + (1 - y_{1i}y_{2i}) \ln[1 - F(x_{1i}\beta_1, x_{2i}\beta_2; \rho)] \}$

where $F(\cdot)$ is the bivariate standard normal distribution function.

This method can allow for frictions in the labor market. However, the problem with this model is that the information about who is unemployed is wasted. This model basically categorizes the individuals in a given data set into just two groups: the employed and the nonemployed. The latter group consists of those who are not willing to work (NLF people) and those who want to work but cannot find jobs (the unemployed). This model does not identify who is who among the nonemployed. Therefore, even though the estimates from this model would be consistent, these estimates are generally inefficient if information about who is unemployed is available. (See Meng and Schmidt [1985].)

The second case we consider is the case in which we observe $y_{2i}$ when $y_{1i} = 1$ (though not when $y_{1i} = 0$).
Thus we can observe an individual's success or failure in job search only when she is in the labor market. The unemployed are identified as those who are willing to work (being in the labor market) but are not successful in their job search. Since this model can distinguish the unemployed from the NLF people, we can get more efficient estimates than we would get from Poirier's model. In this case, the parameters can be estimated by the censored probit method.

Case II. Censored Probit

$y_{1i} = 1$ if $y_{1i}^* \geq 0$; 0, otherwise.
$y_{2i} = 1$ if $y_{2i}^* \geq 0$; 0, otherwise.

$y_{1i}$ is observed. $y_{2i}$ is observed if and only if $y_{1i} = 1$.

In this case, the ML estimators of $\beta_1$, $\beta_2$, and $\rho$ are derived by maximizing the following log-likelihood function with respect to $\beta_1$, $\beta_2$, and $\rho$:

(6) $\ln L_C(\beta_1, \beta_2, \rho) = \sum_{i=1}^{N} \{ y_{1i}y_{2i} \ln[F(x_{1i}\beta_1, x_{2i}\beta_2; \rho)] + y_{1i}(1 - y_{2i}) \ln[\Phi(x_{1i}\beta_1) - F(x_{1i}\beta_1, x_{2i}\beta_2; \rho)] + (1 - y_{1i}) \ln[1 - \Phi(x_{1i}\beta_1)] \}$

where $\Phi(\cdot)$ is the standard normal distribution function. This paper uses the censored probit model to estimate LF and employment decisions.

The extension of Heckman's simple two-stage method is usually used when two different types of selection bias exist in the data set. The wage equation given in Model II could also be estimated by the extended two-stage method. The conditional mean of $e_{3i}$ is given by

(7) $E(e_{3i} \mid y_{1i}^* \geq 0, y_{2i}^* \geq 0) = E(e_{3i} \mid e_{1i} \geq -x_{1i}\beta_1, e_{2i} \geq -x_{2i}\beta_2)$
$= \sigma_{13} \dfrac{\phi(x_{1i}\beta_1)\Phi[(x_{2i}\beta_2 - \rho x_{1i}\beta_1)/(1-\rho^2)^{1/2}]}{F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)} + \sigma_{23} \dfrac{\phi(x_{2i}\beta_2)\Phi[(x_{1i}\beta_1 - \rho x_{2i}\beta_2)/(1-\rho^2)^{1/2}]}{F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)}$

where

$f(t_1, t_2, \rho) = \dfrac{1}{2\pi(1-\rho^2)^{1/2}} \exp\left[-\dfrac{1}{2(1-\rho^2)}(t_1^2 - 2\rho t_1 t_2 + t_2^2)\right]$

and

$F(h, k, \rho) = \int_{-\infty}^{h} \int_{-\infty}^{k} f(t_1, t_2, \rho)\, dt_2\, dt_1$,

and where $\phi(\cdot)$ is the standard normal density function. We can find this formula in Fishe, Trost, and Lurie [1981], Ham [1981], Maddala [1987], and Poirier [1980]. (Also, see APPENDIX A.)
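The two log-likelihoods, (5) for Poirier's partial observability and the censored probit counterpart, together with the two selectivity terms in (7), can be evaluated with a few lines of code. The sketch below is illustrative only and rests on assumptions not found in the original text: all function names are hypothetical, the bivariate normal distribution function is approximated by Simpson's rule rather than by whatever routine was used for the results reported here, $|\rho| < 1$ is assumed, and each observation is summarized by its indices $a = x_{1i}\beta_1$ and $b = x_{2i}\beta_2$.

```python
import math


def norm_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def bvn_cdf(h, k, rho, steps=400):
    """F(h, k, rho) by Simpson's rule over the first argument (illustrative accuracy)."""
    lower = -8.0  # the normal density is negligible below this point
    if h <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(t):
        # Density of e1 at t times P(e2 <= k | e1 = t).
        return norm_pdf(t) * norm_cdf((k - rho * t) / scale)

    width = (h - lower) / steps
    total = integrand(lower) + integrand(h)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * integrand(lower + j * width)
    return total * width / 3.0


def poirier_loglik(rows, rho):
    """Case I: rows hold (y1*y2, a, b); only employment status is observed."""
    value = 0.0
    for employed, a, b in rows:
        p = bvn_cdf(a, b, rho)
        value += math.log(p) if employed else math.log(1.0 - p)
    return value


def censored_loglik(rows, rho):
    """Case II: rows hold (y1, y2, a, b); y2 is ignored whenever y1 == 0."""
    value = 0.0
    for y1, y2, a, b in rows:
        joint = bvn_cdf(a, b, rho)
        if y1 and y2:
            value += math.log(joint)                # employed
        elif y1:
            value += math.log(norm_cdf(a) - joint)  # unemployed
        else:
            value += math.log(1.0 - norm_cdf(a))    # not in the labor force
    return value


def selectivity_terms(a, b, rho):
    """The two ratios multiplying sigma_13 and sigma_23 in (7)."""
    scale = math.sqrt(1.0 - rho * rho)
    joint = bvn_cdf(a, b, rho)
    lam1 = norm_pdf(a) * norm_cdf((b - rho * a) / scale) / joint
    lam2 = norm_pdf(b) * norm_cdf((a - rho * b) / scale) / joint
    return lam1, lam2
```

In the censored probit function the nonemployed split into two cells with different probabilities, which is exactly the information Poirier's likelihood discards. With $\rho = 0$ the first selectivity term reduces to the familiar inverse Mills ratio $\phi(a)/\Phi(a)$ of the single-selection model.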
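Now, we can rewrite (4.3) as

(4.3') $y_{3i} = x_{3i}\beta_3 + \sigma_{13}\lambda_{1i} + \sigma_{23}\lambda_{2i} + v_{3i}$

where $E(v_{3i} \mid y_{1i}^* \geq 0, y_{2i}^* \geq 0) = 0$, and

$\lambda_{1i} = \dfrac{\phi(x_{1i}\hat\beta_1)\Phi[(x_{2i}\hat\beta_2 - \hat\rho x_{1i}\hat\beta_1)/(1-\hat\rho^2)^{1/2}]}{F(x_{1i}\hat\beta_1, x_{2i}\hat\beta_2, \hat\rho)}$, $\lambda_{2i} = \dfrac{\phi(x_{2i}\hat\beta_2)\Phi[(x_{1i}\hat\beta_1 - \hat\rho x_{2i}\hat\beta_2)/(1-\hat\rho^2)^{1/2}]}{F(x_{1i}\hat\beta_1, x_{2i}\hat\beta_2, \hat\rho)}$

where $\hat\beta_1$, $\hat\beta_2$, $\hat\rho$ are the ML estimates of $\beta_1$, $\beta_2$, and $\rho$ generated by the censored probit model. Let $N_1$ denote the number of the observed $y_{3i}$. For simplicity, assume that plim $N_1/N = k$, [...]

(17) $\ln L = \sum_{i=1}^{N} \{ [\ldots] + y_{1i}(1 - y_{2i}) \ln(\Phi_{1i} - F_{1i}) + (1 - y_{1i}) \ln(1 - \Phi_{1i}) \}$

where

$\rho = (\Sigma_{23} - \Sigma_{12})/(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2}$
$\sigma_{13} = (\Sigma_{33} - \Sigma_{13})/(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2}$
$H_i = F\left[ \dfrac{x_{1i}\beta_1 + (\sigma_{13}/\Sigma_{33})(w_i - z_{3i}\delta_3)}{((\Sigma_{33} - \sigma_{13}^2)/\Sigma_{33})^{1/2}}, \dfrac{z_{2i}\delta_2 + (\Sigma_{23}/\Sigma_{33})(w_i - z_{3i}\delta_3)}{((\Sigma_{33} - \Sigma_{23}^2)/\Sigma_{33})^{1/2}}, \dfrac{\rho\Sigma_{33} - \sigma_{13}\Sigma_{23}}{(\Sigma_{33} - \sigma_{13}^2)^{1/2}(\Sigma_{33} - \Sigma_{23}^2)^{1/2}} \right]$
$F_{1i} = F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)$
$\Phi_{1i} = \Phi(x_{1i}\beta_1)$
$x_{1i}\beta_1 = (z_{3i}\delta_3 - z_{1i}\delta_1)/(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2}$

Maximizing (17) with respect to $\delta_1$, $\delta_2$, $\delta_3$, and $\Sigma$ generates the ML estimates of the parameters. For identification, at least one variable in $z_{3i}$ must not be included in $z_{1i}$.

III. Data

The sample of married women is taken from the University of Michigan's Panel Study of Income Dynamics (PSID) for 1981. These women are between the ages of 18 and 60 years. The sample excludes wives who are in the agricultural sector, self-employed, retired, disabled, students, or not in the continental U.S. Wives whose total family money income in 1980 is less than $5,000 are also excluded from the sample. Blacks, Whites, and Hispanics are included in the sample, but people of other races are eliminated. Some respondents' answers to various question items are inconsistent. For example, some wives are reported as working in 1981, but are recorded as having zero hourly wage rates. Those observations with unreliable answers are also excluded. After this process, the sample contains 1962 observations. Of these, 923 people are recorded as working at the time the survey was taken in 1981, and are therefore categorized as the employed. 956 women are reported as housewives, and they belong to the NLF group.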
This means that 48.7% of women in the data are out of the labor force. 83 women, or 8.2% of those in the labor force, are looking for jobs or temporarily laid off. These women are regarded as the unemployed.

The definitions, means, and standard deviations of the variables used in the analysis are shown in Table 1. The mean hourly wage rate shown in Table 1 is quite low. This is misleading because more than half of the women in the sample are not currently employed and their "wage" is recorded as zero. If we consider only the employed, the average hourly wage is $5.87.

Other earned and unearned income (OFINC) could affect a wife's labor-force participation and ability to get acceptable job offers. This variable is obtained by subtracting the wife's labor income from total family money income. Since the logs of the reservation wage and the market wage are used in the model, I choose the log of OFINC (LOFINC) as a regressor. Some families in the data have zero OFINC. Therefore, one is added to OFINC to calculate LOFINC; i.e., LOFINC = ln(OFINC+1).

Regional effects are captured by city size and area of residence. The dummy variables URB and REGS represent residency in an SMSA and in the South, respectively. Regional effects could be investigated in more detail if more dummy variables for regions were created. However, in this case, a proportionate increase in computational cost would follow.

Demographic variables, such as years of education (ED), age (AGE), the number of children below the age of 6 years (KIDS), and a dummy variable for race (MINOR), are also used. Blacks and Hispanics are grouped as a single minority. Work experience could affect the market wage an individual can earn once employed, or her job-match skill. The actual number of years worked since the age of 18 (EXP) is used to capture this effect. Finally, this study includes the local unemployment rate in order to capture differing demand conditions across areas.
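The sample shares quoted for the three groups, and the LOFINC transformation, follow from simple arithmetic. The sketch below reproduces them from the counts given in the text; the variable and function names are hypothetical and are introduced only for illustration.

```python
import math

# Group counts as reported for the 1981 PSID sample of married women.
counts = {"employed": 923, "unemployed": 83, "NLF": 956}

total = sum(counts.values())                           # 1962 observations
labor_force = counts["employed"] + counts["unemployed"]

nlf_share = counts["NLF"] / total                      # share out of the labor force
unemployment_rate = counts["unemployed"] / labor_force  # share of the labor force
unemployed_share = counts["unemployed"] / total         # share of the whole sample


def lofinc(ofinc):
    """LOFINC = ln(OFINC + 1), so families with zero OFINC map to zero."""
    return math.log(ofinc + 1.0)
```

Here `nlf_share` is about 0.487, `unemployment_rate` is about 0.0825 (reported in the text as 8.2%), and `unemployed_share` is about 0.042. Adding one before taking logs keeps the zero-income families in the sample, at the cost of a slight distortion for very small positive incomes.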
The PSID reports the unemployment rate in the respondent's county. This variable (UNEMPR) is used as the local unemployment rate.

The explanatory variables in the equation for labor-force decisions are the constant; ED; URB; MINOR; REGS; UNEMPR; LOFINC; KIDS; AGE, and AGE squared divided by 1,000 (AGE2); EXP, and EXP squared divided by 1,000 (EXP2). The vector of explanatory variables in the job-match equation includes the same variables as in the labor-force decision except AGE and AGE2. The explanatory variables in the wage equation are the same variables as in the labor-force decision except LOFINC and KIDS.

IV. Empirical Results

The first column of Table 2 reports the estimates of a simple probit model for the No-Friction model. The last two columns describe the results of the simple probit models for the Friction model, assuming zero correlation between the error terms in the labor-force and employment decisions. Both the first and second columns are about willingness to work. However, the results in these two columns are derived from different treatments of the unemployed. The model with no friction does not regard the unemployed as having a desire to work, while the model with friction does. In spite of these differences, the estimates in the second column are generally similar to those in the first column, and the sizes of the effects and their signs satisfy our expectations. This similarity may come from the fact that only 4.2% of women in the data are unemployed, so that their treatment will not change the results dramatically.

Some differences exist, though, between the results in the first and the second columns. First, the effect of race is four times as large in the second column as in the first column. In fact, MINOR is insignificant when the unemployed are considered as preferring not to work.
On the contrary, as we see in the second column, when we interpret the unemployed as willing to work, MINOR becomes significant at the 10% level. This means that the model with no friction understates the effect of race on the preference to work. Second, the coefficient of UNEMPR is -3.77 (significant at the 1% level) under the assumption of no friction, while it is -2.38 (significant at the 10% level) under the assumption of frictions in the labor market. Therefore, the No-Friction model seems to exaggerate the effect of the local unemployment rate on the willingness to work.

The third column in Table 2 shows the estimates of the parameters for the employment status equation based on the Friction model. MINOR has a significantly negative effect on employment status. This confirms our expectation. Blacks and other minorities are more willing to enter the labor force, but either their job-search ability is less than that of whites, or there is discrimination in employment. This may also explain why the No-Friction model underestimates the effect of race on the preference for work. Under the Friction model, MINOR has two opposite effects on employment status. Minorities have higher probabilities of being employed because they are more likely to be in the labor force. At the same time, they are more likely to be unemployed because of their poorer job-search skills. These two opposite effects are captured by one equation under the No-Friction model, and therefore they cancel out. The Friction model can capture these different effects of race separately through two different equations.

The other notable result in the third column is that the local unemployment rate has a huge effect on employment status. This implies that the demand side of the labor market is a major factor determining an individual's employment status. The effects of the local unemployment rate on employment status are twofold.
First, as we see in the second column, a higher unemployment rate decreases the probability of an individual being in the labor force. Second, once she enters the labor market, a woman has a lower probability of being employed. These two different negative effects of the unemployment rate are captured by the single equation for preference for work under the model with no friction. This may explain why the No-Friction model generates the exaggerated effect of UNEMPR on the preference to work.

Even though the Friction model explains an individual's behavior more completely than the No-Friction model, there is no direct method for discriminating between the two models, because one is not nested in the other. One roundabout way is to compare the goodness-of-fit of the different models. In probabilistic-choice models, the proportion of successful predictions of the choices made is widely used as a measure of goodness-of-fit. (See Maddala [1987], pp 76-77.) Table 3 describes the frequencies of predicted outcomes for the No-Friction model, while Tables 4-A and 4-B show those for the Friction model. In Table 3, 69.7% of predicted outcomes are correct. According to Tables 4-A and 4-B, the Friction model correctly predicts 76.1% of the total outcomes. The Friction model shows better predictive power than the No-Friction model.

In short, as we see in Table 2, the Friction model explains an individual's labor-force status in a more complete way than the No-Friction model. Some variables have opposite effects on employment status, leading the No-Friction model to understate the effect of those variables on the willingness to work.
On the contrary, if some variables affect employment directly and indirectly through the labor-force decision but in the same direction, the No-Friction model overestimates their effects on the willingness to work. These results imply that recognizing friction in the labor market can provide a more reliable explanatory mechanism for the supply approach to analyzing employment status.

Until now, we have assumed a zero correlation coefficient between labor-force and employment decisions. Table 5 reports results for the censored probit model, allowing a non-zero correlation coefficient. A test of zero correlation yields an LR statistic of 4.89, larger than the critical $\chi^2(1)$ value of 3.84 at the 5% level. Also, the conventional t-test shows significance of $\rho$ at the 1% level. In spite of this fact, the results in Table 5 are generally close to those in the last two columns of Table 2. This is not surprising, because the estimates in Table 2 obtained under the assumption of zero correlation are still consistent.

There are also some small differences. If we allow $\rho$ to be different from zero, the signs of some of the coefficients in the employment decision equation become more reasonable. For example, compared with those in the third column of Table 2, the estimated coefficients of ED and LOFINC become positive and negative, respectively, following our expectation. MINOR has an insignificant coefficient in Table 5, but the sign of the coefficient is still negative. The number of years of work experience has no significant explanatory power for employment status when zero correlation is assumed. However, allowing non-zero correlation reveals significant effects of work experience and the expected inverted-U shape.

Finally, we can test the hypothesis that the coefficients entering in both labor-force and employment decisions have the same sizes of effects on those decisions.
Under this null hypothesis, the restricted log-likelihood function (results not presented) has the value of -1421.76. Then the value of the LR statistic is 60.50, which is considerably larger than the critical $\chi^2(10)$ value of 23.21 at the 1% level. This implies that the explanatory variables affect labor-force and employment status separately in significantly different ways.

Jointly estimating labor-force and employment decisions and the wage equation could generate more efficient estimates of the parameters describing labor-force and employment decisions. This could be done based on the Reduced-Form model given by (4.1)-(4.3). Table 6 shows the ML estimation results for this joint estimation. Compared with the results in Table 5 derived by the censored probit model, many estimates of the parameters describing employment status become significant, or more significant. For example, ED, URB, MINOR, and LOFINC have no significant effects on employment status when they are estimated by the censored probit model. (See Table 5.) They become significant at the 5% level when the joint model is estimated, as we see in Table 6. This is due to the increased efficiency of MLE for the joint model as compared to the censored probit model.

The estimates in the first and second columns of Table 6 are generally close to their counterparts in Table 5. MINOR has an insignificant effect on labor-force status in Table 6 while having a significant effect in Table 5. However, the sizes of the effect of MINOR described in the two tables are not substantially different.

Almost all the estimates in Table 6 have the signs we would expect. More educated people and residents of bigger cities have a higher probability of being in the labor force, and are more likely to be employed. They also obtain higher wage rates once employed. Minorities are more willing to participate, but their probability of being unemployed is higher.
Even when they find jobs, their wage rates are lower than those of whites. Age has no significant effect on labor-force status or the market wage rate. (This is because experience is included in the equations.) The local unemployment rate seems not to affect labor-force status significantly. However, the local unemployment rate has a large significant effect on employment status. A higher unemployment rate generates a greater likelihood of an individual being unemployed, and substantially decreases the hourly wage rate.

The income earned by other family members decreases a married woman's willingness to work and her probability of being employed. This negative effect of other family members' income on a woman's employment status is not surprising in light of economic search models and common sense. A woman with higher income due to others' earnings will be more selective in choosing jobs. The number of children has very significant negative effects on both labor-force and employment decisions, as we would expect. This effect could also be explained by a search framework. A woman with more children will have more burdens of housework, which will lessen her intensity of job search. Years of work experience have quadratic effects on labor-force and employment status and the market wage rate. According to the estimates in Table 6, more experienced women have a greater desire to work, a higher likelihood of being employed, and a higher wage rate.

Table 6 also shows that the error terms for labor-force and employment decisions are significantly correlated with the error term for the market wage rate. The covariances between errors in the labor-force decision and the wage rate, and between errors in employment status and the wage rate, are positive and significant at the 1% level. The same unobserved individual characteristics that make a woman more likely to enter the labor force and to find a job also tend to lead to an unexpectedly higher wage.
This result also implies the existence of sample selection bias in the estimation of an equation for observed wage rates. Table 7 shows the results of the restricted MLE under the null hypothesis of no selection bias. The restricted MLE of the parameters in the wage equation is just OLS applied to the observations on the employed women, while the restricted MLEs for labor-force and employment decisions are identical to the estimates obtained by the censored probit model. Therefore, the results of Table 5 are repeated in the first and second columns of Table 7. A test of no selection bias yields an LR statistic of 24.40, considerably larger than the critical $\chi^2(2)$ value of 9.21 at the 1% level. This suggests that the OLS estimates could be seriously biased.

Comparing the third column of Table 6 with that of Table 7, we can see that OLS underestimates the effect of the local unemployment rate on the wage rate. Under the null hypothesis of no selection bias, the estimated coefficient of UNEMPR is insignificant, while the unrestricted MLE shows a significant and large negative effect of UNEMPR on the wage rate. The OLS estimates also understate the effect of work experience. The absolute values of the coefficients of EXP and EXP2 from the unrestricted MLE are almost twice as big as those obtained by OLS.

Sample selection biases can also be corrected by the two-stage estimation method. Table 8 reports the results. The estimated covariance between the labor-force decision and the wage rate is significant and has a positive sign, while the covariance between employment status and the wage rate is insignificant and has a negative sign. The F-statistic, which is asymptotically equivalent to the LM statistic, has a value of 8.46, greater than the critical F value of 4.61. Again, the null hypothesis of no selection bias is rejected.
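The likelihood-ratio comparisons reported in this section can be checked directly from the log-likelihood values quoted in the text and in Tables 5 to 7. The following is only a minimal sketch: the function and dictionary names are illustrative, the critical values are the ones quoted in the text, and treating the restricted log-likelihood of Table 7 as the sum of its two reported values (-1391.51 and -285.55) is an assumption that happens to reproduce the reported statistic of 24.40.

```python
# Sketch of the likelihood-ratio (LR) tests discussed in the text.
# All log-likelihood values are copied from the text and Tables 5-7;
# the helper names below are hypothetical, not part of the original study.

# Chi-square critical values exactly as quoted in the text: (df, level) -> value.
CHI2_CRITICAL = {(1, 0.05): 3.84, (2, 0.01): 9.21, (10, 0.01): 23.21}


def lr_statistic(loglik_unrestricted, loglik_restricted):
    """LR = 2 * (log L_unrestricted - log L_restricted)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def lr_test(loglik_unrestricted, loglik_restricted, df, level):
    """Return the LR statistic and whether the null is rejected."""
    stat = lr_statistic(loglik_unrestricted, loglik_restricted)
    return stat, stat > CHI2_CRITICAL[(df, level)]


# Equal coefficients in the labor-force and employment equations:
# unrestricted censored probit (Table 5) vs. the restricted value in the text.
equal_coef_stat, equal_coef_reject = lr_test(-1391.51, -1421.76, df=10, level=0.01)

# No selection bias: joint model (Table 6) vs. restricted model (Table 7).
# Assumption: the restricted log-likelihood is the sum of the censored-probit
# part and the OLS wage-equation part reported in Table 7.
restricted_loglik = -1391.51 + (-285.55)
selection_stat, selection_reject = lr_test(-1664.86, restricted_loglik, df=2, level=0.01)

print(f"equal coefficients: LR = {equal_coef_stat:.2f}, reject = {equal_coef_reject}")
print(f"no selection bias:  LR = {selection_stat:.2f}, reject = {selection_reject}")
```

Both nulls are rejected at the 1% level under these inputs, in line with the discussion above.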
The two-stage estimation method corrects the biases of the OLS estimates for EXP and EXP2 quite well, and the estimated effects of EXP are quite close to those obtained using the unrestricted MLE. However, some estimates have an unexpected sign. Even though AGE and AGE2 are not significant in either the two-stage or the MLE estimates, in the two-stage estimates older women are predicted to receive lower wage rates. This result is implausible. The estimated coefficient of UNEMPR is also insignificant and has an unexpected positive sign. Therefore, the two-stage estimates seem to fail to eliminate the biases in the OLS estimates. Also, the two-stage method produces an insignificant and negative covariance between errors in employment status and the wage rate.

Compared to the unrestricted MLE results on the effect of race, the two-stage method underestimates the effect on the wage rate. The estimated coefficient of MINOR is not significantly different from zero. Considering the many studies showing wage discrimination, we expect a significant and negative effect of minority status on wage rates, which we obtain by MLE but not by the two-stage method. On this point, the MLE result seems to be more reliable.

In short, the two-stage estimation method, as well as the MLE method, confirms the presence of sample selection biases. However, compared with the MLE, the two-stage method does not successfully eliminate the biases of the OLS estimates, and in some cases generates perversely signed coefficients. These results imply the superiority of MLE over the two-stage method.

As a final step, Table 9 describes the result of the joint estimation of the employment decision and the reservation and market wage rates, which is obtained by applying unrestricted MLE to the Structural model given in (1.1)-(1.3). UNEMPR, EXP, and EXP2 are excluded from the reservation wage equation. This is done for a computational reason.
For the identification of the equation for the reservation wage, at least one explanatory variable in the wage equation must be excluded from the set of explanatory variables of the reservation-wage equation. The reservation wage could be interpreted as a woman's value of leisure or the nonmarket value of housework. In this sense, there is no reason why UNEMPR, EXP, and EXP2 should affect the reservation wage. Also, the exclusion of these variables can be justified by their insignificance in the explanation of reservation wages. If we include the variables, MLE on the Structural model amounts to that on the Reduced-Form model. The log-likelihood value given in Table 9 is not significantly different from that in Table 6, rejecting the hypothesis of significant effects of UNEMPR, EXP, and EXP2 on the reservation wage.

The estimated coefficients of the explanatory variables in the employment decision and the market wage rate equation are almost identical to their counterparts in Table 6. Error terms in the reservation wage rate equation are significantly and positively correlated with those in the market wage rate equation (see $\Sigma_{13}$ in Table 9), but are not significantly correlated with those in the employment decision equation. A woman with an unexpectedly high reservation wage rate tends to get an unexpectedly high wage rate when she is employed, while the unexplained part of her reservation wage rate does not affect the probability of her being employed.

The results for the reservation wage equation are shown in the first column of Table 9. More educated people have a higher reservation wage rate. This, however, does not mean that more educated people are less willing to work, as an extra year of education increases the market wage more than the reservation wage. The high labor-force participation rate of minorities can also be explained by the results in the first column showing their lower reservation wage.
The effect of age on the reservation wage is insignificant but positive. As we expected, the more other income in a family, and the more children the family has, the higher the wife's reservation wage rate. The elasticity of the reservation wage rate with respect to other income is 0.15. Table 9 also shows that residents in SMSAs or the South have higher reservation wages. There is no theory which explains the effects of region on married women's reservation wages. However, the LR test of no regional effects yields a $\chi^2$ statistic of 117.95, considerably above the $\chi^2(2)$ critical value of 9.21 at the 1% level. This is quite an interesting result, and further examination seems to be required.

V. Conclusion

This paper presents a joint estimation method for labor-force and employment decisions and the market wage. This is based on a Full Information Maximum Likelihood procedure as well as on a two-step method. Frictions in the labor market are assumed, and therefore the unemployed are recognized as behaviorally different from non-participants. The information about unemployed workers present in a given sample can be used to estimate the parameters describing the probability of a particular individual's being employed. This joint estimation method provides an explanatory mechanism showing the different and separate effects a variable could have on labor-force participation and employment decisions, as well as improving the estimates of parameters in the wage equation.

The traditional labor supply model, which assumes no friction in the labor market, does not explain married women's employment status in a satisfactory way. Some variables could affect employment decisions directly and indirectly through preference for work.
The No-Friction model, assuming that a person's employment status depends only on that person's willingness to work, cannot discriminate between those two different effects, so that it usually generates biased estimates of parameters describing preferences for work.

Compared to other methods for estimating labor-force and employment decisions jointly (for example, bivariate probit methods), the estimation procedure including the wage equation generates more significant and reasonably signed estimates. A given data set usually contains a relatively small number of unemployed people, and therefore the information on them may not be enough to generate much more efficient estimates of parameters in employment decisions. Jointly considering the wage rate will be helpful for more efficient estimation of parameters in labor-force and employment decisions. These two decision rules can censor the observed distribution of wage rates, if error terms in the two equations describing labor-force and employment decisions are correlated with those in the wage equation. Therefore, as this paper shows, the information on wage rates can improve the efficiency of estimates of parameters explaining labor-force and employment decisions.

Table 1. Means, Standard Deviations and Definitions of Variables

Variable   Definition                                    Mean        S.D.
EMP        Employed=1                                    0.4704      0.4993
NLF        NLF=1                                         0.4873      0.5000
WRATE      Hourly wage rate ($)                          2.7615      3.3966
LRATE      Log of WRATE                                  0.8002      0.8758
ED         Years of education                            12.205      2.2735
URB        Resident in SMSA=1                            0.6865      0.4640
MINOR      Nonwhite=1                                    0.2717      0.4449
AGE        Years of age                                  37.485      11.298
AGE2       AGE^2/1000                                    1.5327      0.8973
REGS       South=1                                       0.3435      0.4750
UNEMPR     Unemployment rate in the resident's           0.0743      0.0242
           county in 1980
OFINC      Other family members' income in 1980 ($)      24850.7     20372.2
LOFINC     Log of (OFINC+1)                              9.9052      0.6822
KIDS       Number of children ≤ 5 years of age           0.5076      0.7745
EXP        Number of years worked since age 18           8.3578      6.9111
EXP2       EXP^2/1000                                    0.1176      0.1912

Table 2.
Simple Probit Models for Labor-Force Participation and Employment

Model           No Friction     Friction        Friction
Dep. Var.       EMP             LF              EMP

CONSTANT         1.9064***       2.8483***       0.7660
                (0.6564)        (0.6720)        (0.8787)
ED               0.0688***       0.0797***      -0.0099
                (0.0151)        (0.0156)        (0.0299)
URB              0.1793***       0.1403**        0.2031
                (0.0693)        (0.0699)        (0.1369)
MINOR            0.0403          0.1615**       -0.3846**
                (0.0776)        (0.0781)        (0.1700)
AGE              0.0022         -0.0252          -
                (0.0262)        (0.0265)
AGE2            -0.4820         -0.2201          -
                (0.3187)        (0.3219)
REGS             0.1117          0.1008          0.1750
                (0.0729)        (0.0730)        (0.1827)
UNEMPR          -3.7654***      -2.3757*        -7.7640***
                (1.3442)        (1.3645)        (2.6920)
LOFINC          -0.2713***      -0.3224***       0.1147
                (0.0575)        (0.0603)        (0.0885)
KIDS            -0.4954***      -0.5131***      -0.2054**
                (0.0383)        (0.0386)        (0.0941)
EXP              0.1371***       0.1543***       0.0332
                (0.0176)        (0.0177)        (0.0334)
EXP2            -2.3910***      -2.8442***      -0.4396
                (0.5982)        (0.6012)        (1.4115)

N                1962            1962            1006
Log L           -1148.8         -1128.9         -265.01

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
standard errors in parentheses

Table 3. Frequencies of Actual and Predicted Outcomes for the Probit Model of No-Friction

Actual\Predicted    EMP=0     EMP=1     Total
EMP = 0             761       278       1039
EMP = 1             316       607       923
Total               1077      885       1962

Table 4-A. Frequencies of Actual and Predicted Outcomes for the Probit Model of Labor-Force Status Based on the Assumption of Frictions in Labor Market

Actual\Predicted    LF=0      LF=1      Total
LF = 0              636       320       956
LF = 1              276       730       1006
Total               912       1050      1962

Table 4-B. Frequencies of Actual and Predicted Outcomes for the Probit Model of EMP Status Based on the Assumption of Frictions in Labor Market

Actual\Predicted    EMP=0     EMP=1     Total
EMP = 0             0         83        83
EMP = 1             0         923       923
Total               0         1006      1006

Table 5. Censored Probit Estimates of the Friction Model

Dep. Var.
                 LF              EMP

CONSTANT         2.9638***       1.0227***
                (0.6659)        (0.8859)
ED               0.0793***       0.0363***
                (0.0157)        (0.0321)
URB              0.1437**        0.2240**
                (0.0703)        (0.1232)
MINOR            0.1645*        -0.2469
                (0.0788)        (0.1600)
AGE             -0.0356          -
                (0.0259)
AGE2            -0.0941          -
                (0.3134)
REGS             0.1014          0.1876
                (0.0735)        (0.1585)
UNEMPR          -2.2564*        -7.7352***
                (1.3726)        (2.3775)
LOFINC          -0.3161***      -0.0465
                (0.0602)        (0.1180)
KIDS            -0.5164***      -0.3668***
                (0.0393)        (0.0947)
EXP              0.1566***       0.0640**
                (0.0177)        (0.0301)
EXP2            -2.9305***      -1.1999***
                (0.6015)        (1.2245)
$\rho$           0.6159***
                (0.2100)

N                1962            1006
Log L           -1391.51

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
standard errors in parentheses

Table 6. Joint Estimation of Labor-Force, Employment Decisions and the Wage Equation (the Reduced-Form model)

Dep. Var.        LF              EMP             LWAGE

CONSTANT         3.5160***       2.2994***       0.1640
                (0.6600)        (0.8264)        (0.1949)
ED               0.0884***       0.0698**        0.0764***
                (0.0158)        (0.0285)        (0.0490)
URB              0.1547**        0.2559**        0.1910***
                (0.0699)        (0.1163)        (0.0289)
MINOR            0.1057         -0.3364**       -0.0959***
                (0.0802)        (0.1592)        (0.0289)
AGE             -0.0213          -               0.0079
                (0.0258)                        (0.0104)
AGE2            -0.2192          -              -0.1931
                (0.3132)                        (0.1250)
REGS             0.0746          0.0967         -0.0629**
                (0.0742)        (0.1476)        (0.0305)
UNEMPR          -2.1860         -7.1815***      -1.1938**
                (1.3617)        (2.2748)        (0.4973)
LOFINC          -0.4172***      -0.2473**        -
                (0.0597)        (0.1066)
KIDS            -0.4565***      -0.2583***       -
                (0.0391)        (0.0877)
EXP              0.1530***       0.0783***       0.0493***
                (0.0180)        (0.0279)        (0.0080)
EXP2            -2.8690***      -1.2782         -0.8524***
                (0.6213)        (1.1404)        (0.2055)
$\rho$           0.6858***
                (0.1833)
$\sigma_{13}, \sigma_{23}, \sigma_{33}$
                 0.2215***       0.2922***       0.1442***
                (0.0515)        (0.0353)        (0.0127)

N                1962            1006            923
Log L           -1664.86

** significant at the 5% level
*** significant at the 1% level
standard errors in parentheses

Table 7. Restricted MLE Estimates of the Three-Equation System ($\sigma_{13} = \sigma_{23} = 0$)

Dep. Var.
                 LF              EMP             LWAGE

CONSTANT         2.9638***       1.0227***       0.3834**
                (0.6659)        (0.8859)        (0.1899)
ED               0.0793***       0.0363***       0.0694***
                (0.0157)        (0.0321)        (0.0053)
URB              0.1437**        0.2240**        0.1696***
                (0.0703)        (0.1232)        (0.0253)
MINOR            0.1645**       -0.2469         -0.0928***
                (0.0788)        (0.1600)        (0.0276)
AGE             -0.0356          -               0.0134
                (0.0259)                        (0.0097)
AGE2            -0.0941          -              -0.2147*
                (0.3134)                        (0.1203)
REGS             0.1014          0.1876         -0.0809***
                (0.0735)        (0.1585)        (0.0260)
UNEMPR          -2.2564*        -7.7352***      -0.5296
                (1.3726)        (2.3775)        (0.4771)
LOFINC          -0.3161***      -0.0465          -
                (0.0602)        (0.1180)
KIDS            -0.5164***      -0.3668***       -
                (0.0393)        (0.0947)
EXP              0.1566***       0.0640**        0.0273***
                (0.0177)        (0.0301)        (0.0062)
EXP2            -2.9305***      -1.1999***      -0.4301*
                (0.6015)        (1.2245)        (0.1888)
$\rho$           0.6159***
                (0.2100)

N                1962                            923
Log L           -1391.51                        -285.55

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
standard errors in parentheses

Table 8. Results for Two-Stage Estimation of the Wage Equation

Dependent variable: LWAGE

Variables
CONSTANT         0.4133***
                (0.2475)
ED               0.0769***
                (0.0070)
URB              0.1504***
                (0.0375)
MINOR           -0.0178
                (0.0516)
AGE             -0.0020
                (0.0119)
AGE2            -0.1283
                (0.1382)
REGS            -0.0894**
                (0.0369)
UNEMPR           0.2323
                (0.8870)
EXP              0.0505***
                (0.0106)
EXP2            -0.8612***
                (0.2792)
$\sigma_{13}$    0.2553***
                (0.0949)
$\sigma_{23}$    …
                (0.2556)
$\sigma_{33}$    0.2042

R^2              0.3321
N                923

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
corrected standard errors in parentheses

Table 9. Unrestricted Joint Estimation of Employment Decisions and Reservation and Market Wage Rates (the Structural Model)

Dependent Variable:      Unobserved
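                 Log of Reservation
                 Wage Rate           EMP             LWAGE
Variable

CONSTANT        -1.5060***           2.2593***       0.1743
                (0.2770)            (0.8332)        (0.1930)
ED               0.0451***           0.0679**        0.0764***
                (0.0075)            (0.0289)        (0.0049)
URB              0.1376***           0.2594**        0.1918***
                (0.0326)            (0.1164)        (0.0289)
MINOR           -0.1336***          -0.3410**       -0.0965***
                (0.0382)            (0.1601)        (0.0314)
AGE              0.0116              ---             0.0056
                (0.0112)                            (0.0101)
AGE2            -0.0710              ---            -0.1668
                (0.1416)                            (0.1217)
REGS            -0.0853**            0.1032         -0.0610**
                 …                  (0.1485)        (0.0303)
UNEMPR           ---                -7.0411***      -0.9428**
                                    (2.2617)        (0.3994)
LOFINC           0.1473***          -0.2409**        ---
                (0.0284)            (0.1080)
KIDS             0.1599***          -0.2568***       ---
                (0.0220)            (0.0882)
EXP              ---                 0.0786***       0.0518***
                                    (0.0278)        (0.0075)
EXP2             ---                -1.3000         -0.9316***
                                    (1.1410)        (0.1838)
$\Sigma_{11}, \Sigma_{12}, \Sigma_{13}$
                 0.1093***           0.0510          0.0652***
                (0.0210)            (0.0745)        (0.0191)
$\Sigma_{23}, \Sigma_{33}$
                                     0.2890***       0.1446***
                                    (0.0366)        (0.0127)

N                1962
Log L           -1665.33

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
standard errors in parentheses

APPENDICES

APPENDIX A

In Model II,

(A.1)  $E(e_3 \mid e_1 \ge -x_1\beta_1,\ e_2 \ge -x_2\beta_2) = \sigma_{13}\mu_1 + \sigma_{23}\mu_2$

(A.2)  $E(e_3^2 \mid e_1 \ge -x_1\beta_1,\ e_2 \ge -x_2\beta_2) = \sigma_{33} - \sigma_{13}^2 x_1\beta_1\mu_1 - \sigma_{23}^2 x_2\beta_2\mu_2 - (\rho\sigma_{13}^2 - 2\sigma_{13}\sigma_{23} + \rho\sigma_{23}^2)\mu_3$

(For simplicity, the subscript i is suppressed.)

Proof. Let $f(e_1,e_2,e_3)$ be the trivariate normal density function of $(e_1,e_2,e_3)'$, where $(e_1,e_2,e_3)'$ has zero mean and covariance matrix $\Omega$ given below; and let $f(e_1,e_2,\rho)$ be the standard bivariate normal density function with correlation coefficient $\rho$. Let

$\Omega = \begin{pmatrix} 1 & \rho & \sigma_{13} \\ \rho & 1 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{pmatrix}; \qquad \Omega^{-1} = \begin{pmatrix} \omega^{11} & \omega^{12} & \omega^{13} \\ \omega^{12} & \omega^{22} & \omega^{23} \\ \omega^{13} & \omega^{23} & \omega^{33} \end{pmatrix}$

We can easily show that

(A.3)  $E(e_3 \mid e_1 \ge -x_1\beta_1,\ e_2 \ge -x_2\beta_2) = \dfrac{1}{F(x_1\beta_1, x_2\beta_2, \rho)} \displaystyle\int_{-x_1\beta_1}^{\infty}\int_{-x_2\beta_2}^{\infty}\int_{-\infty}^{\infty} e_3 f(e_1,e_2,e_3)\, de_3\, de_2\, de_1$

Note that

$f(e_1,e_2,e_3) = \dfrac{1}{(2\pi)^{3/2}|\Omega|^{1/2}} \exp\{-\tfrac{1}{2}(\omega^{11}e_1^2 + \omega^{22}e_2^2 + \omega^{33}e_3^2 + 2\omega^{12}e_1e_2 + 2\omega^{13}e_1e_3 + 2\omega^{23}e_2e_3)\}$

$\quad = \dfrac{1}{(2\pi)^{3/2}|\Omega|^{1/2}} \exp\Big\{-\dfrac{\omega^{33}}{2}\Big(e_3 + \dfrac{\omega^{13}e_1 + \omega^{23}e_2}{\omega^{33}}\Big)^2\Big\} \times \exp\Big[-\dfrac{1}{2}\Big\{\Big(\dfrac{\omega^{11}\omega^{33} - (\omega^{13})^2}{\omega^{33}}\Big)e_1^2 - 2\Big(\dfrac{\omega^{13}\omega^{23} - \omega^{12}\omega^{33}}{\omega^{33}}\Big)e_1e_2 + \Big(\dfrac{\omega^{22}\omega^{33} - (\omega^{23})^2}{\omega^{33}}\Big)e_2^2\Big\}\Big]$

$\quad = \dfrac{(\omega^{33})^{1/2}}{(2\pi)^{1/2}} \exp\Big\{-\dfrac{\omega^{33}}{2}\Big(e_3 + \dfrac{\omega^{13}e_1 + \omega^{23}e_2}{\omega^{33}}\Big)^2\Big\} \times f(e_1,e_2,\rho)$

where $\omega^{33} = (1-\rho^2)/|\Omega|$.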
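Then,

(A.4)  $\displaystyle\int_{-\infty}^{\infty} e_3 f(e_1,e_2,e_3)\, de_3 = \dfrac{\sigma_{13} - \rho\sigma_{23}}{1-\rho^2}\, e_1 f(e_1,e_2,\rho) + \dfrac{\sigma_{23} - \rho\sigma_{13}}{1-\rho^2}\, e_2 f(e_1,e_2,\rho)$

Substituting (A.4) into (A.3) and using Rosenbaum's theorem (See Johnson and Kotz [1972]) gives us (A.1).

To show (A.2) to hold, note that

(A.5)  $E(e_3^2 \mid e_1 \ge -x_1\beta_1,\ e_2 \ge -x_2\beta_2) = \dfrac{1}{F(x_1\beta_1, x_2\beta_2, \rho)} \displaystyle\int_{-x_1\beta_1}^{\infty}\int_{-x_2\beta_2}^{\infty}\int_{-\infty}^{\infty} e_3^2 f(e_1,e_2,e_3)\, de_3\, de_2\, de_1$

It can be easily shown that

(A.6)  $\displaystyle\int_{-\infty}^{\infty} e_3^2 f(e_1,e_2,e_3)\, de_3 = \dfrac{|\Omega|}{1-\rho^2} f(e_1,e_2,\rho) + \dfrac{(\rho\sigma_{23} - \sigma_{13})^2}{(1-\rho^2)^2}\, e_1^2 f(e_1,e_2,\rho) + \dfrac{2(\rho\sigma_{23} - \sigma_{13})(\rho\sigma_{13} - \sigma_{23})}{(1-\rho^2)^2}\, e_1e_2 f(e_1,e_2,\rho) + \dfrac{(\rho\sigma_{13} - \sigma_{23})^2}{(1-\rho^2)^2}\, e_2^2 f(e_1,e_2,\rho)$

Substituting (A.6) into (A.5), and using Rosenbaum's theorem again gives us (A.2).

(A.2) provides a consistent estimator of $\sigma_{33}$ in the extended two-stage estimation method. As we see in (4.3'), $v_{3i} = e_{3i} - \sigma_{13}\mu_{1i} - \sigma_{23}\mu_{2i}$. Then using (A.2), we can show that

(A.7)  $E(v_{3i}^2 \mid e_{1i} \ge -x_{1i}\beta_1,\ e_{2i} \ge -x_{2i}\beta_2) = \sigma_{33} - \sigma_{13}^2(x_{1i}\beta_1\mu_{1i} + \mu_{1i}^2 + \rho\mu_{3i}) - \sigma_{23}^2(x_{2i}\beta_2\mu_{2i} + \mu_{2i}^2 + \rho\mu_{3i}) + 2\sigma_{13}\sigma_{23}(\mu_{3i} - \mu_{1i}\mu_{2i})$

This shows the consistency of $\hat\sigma_{33}$ given in (10).

Appendix B

The extended two-stage estimators of $\beta_3$ and $c$ are given by:

(B.1)  $\begin{pmatrix} \hat\beta_3 \\ \hat c \end{pmatrix} = \begin{pmatrix} \Sigma_i X_{3i}'X_{3i} & \Sigma_i X_{3i}'\hat\mu_i \\ \Sigma_i \hat\mu_i'X_{3i} & \Sigma_i \hat\mu_i'\hat\mu_i \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_i X_{3i}'y_{3i} \\ \Sigma_i \hat\mu_i'y_{3i} \end{pmatrix} = \begin{pmatrix} \beta_3 \\ c \end{pmatrix} + \begin{pmatrix} \Sigma_i X_{3i}'X_{3i} & \Sigma_i X_{3i}'\hat\mu_i \\ \Sigma_i \hat\mu_i'X_{3i} & \Sigma_i \hat\mu_i'\hat\mu_i \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_i X_{3i}'\{(\mu_i - \hat\mu_i)c + v_{3i}\} \\ \Sigma_i \hat\mu_i'\{(\mu_i - \hat\mu_i)c + v_{3i}\} \end{pmatrix}$

where $\Sigma_i$ means summation over i from 1 to $N_1$.

First of all, we need to know the asymptotic distribution of $\hat\mu_i$. By Taylor's expansion at the true parameter, $\Gamma = (\beta_1', \beta_2', \rho)'$,

$\hat\mu_i' \approx \mu_i' + \dfrac{\partial \mu_i'}{\partial \Gamma'}(\hat\Gamma - \Gamma) = \mu_i' + A_i(\hat\Gamma - \Gamma)$

where $A_i$ is defined in (9). Therefore,

$\sqrt{N}(\hat\mu_i' - \mu_i') \approx A_i(\sqrt{N}(\hat\Gamma - \Gamma))$.

Hence, we have

(B.2)  $\sqrt{N}(\hat\mu_i' - \mu_i') \to N(0, A_i W A_i')$

which shows the consistency of $\hat\mu_i$. Note that the total number (N) of observations increases as the number ($N_1$) of the observed $y_{3i}$'s does; $N \to \infty$ as $N_1 \to \infty$.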
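For simplicity, we assume that

(B.3)  $\operatorname{plim} N_1/N = k$, 0 … $e_1 > -x$ and $e_2 > -y$ is:

(D.1)  $f(e_3 \mid e_1 \ge -x,\ e_2 \ge -y) = \dfrac{1}{(2\pi)^{1/2}(\sigma_{33})^{1/2}} \exp\Big(-\dfrac{1}{2\sigma_{33}} e_3^2\Big) \cdot F\Big[\dfrac{x + (\sigma_{13}/\sigma_{33})e_3}{(1 - \sigma_{13}^2/\sigma_{33})^{1/2}},\ \dfrac{y + (\sigma_{23}/\sigma_{33})e_3}{(1 - \sigma_{23}^2/\sigma_{33})^{1/2}},\ \dfrac{\rho - \sigma_{13}\sigma_{23}/\sigma_{33}}{(1 - \sigma_{13}^2/\sigma_{33})^{1/2}(1 - \sigma_{23}^2/\sigma_{33})^{1/2}}\Big] \div F(x,y,\rho)$

Proof. It can be easily shown that

(D.2)  $f(e_1,e_2 \mid e_3) = \dfrac{1}{2\pi(|\Omega|/\sigma_{33})^{1/2}} \exp\Big[-\dfrac{1}{2|\Omega|}\Big\{(\sigma_{33} - \sigma_{23}^2)\Big(e_1 - \dfrac{\sigma_{13}}{\sigma_{33}}e_3\Big)^2 - 2(\rho\sigma_{33} - \sigma_{23}\sigma_{13})\Big(e_1 - \dfrac{\sigma_{13}}{\sigma_{33}}e_3\Big)\Big(e_2 - \dfrac{\sigma_{23}}{\sigma_{33}}e_3\Big) + (\sigma_{33} - \sigma_{13}^2)\Big(e_2 - \dfrac{\sigma_{23}}{\sigma_{33}}e_3\Big)^2\Big\}\Big]$

The conditional distribution of $e_3$ given $e_1 > -x$ and $e_2 > -y$ is described by

(D.3)  $f(e_3 \mid e_1 > -x,\ e_2 > -y) = \dfrac{\int_{-x}^{\infty}\int_{-y}^{\infty} f(e_1,e_2,e_3)\, de_2\, de_1}{F(x,y,\rho)} = \dfrac{f(e_3)}{F(x,y,\rho)} \displaystyle\int_{-x}^{\infty}\int_{-y}^{\infty} f(e_1,e_2 \mid e_3)\, de_2\, de_1$

where

$f(e_3) = \dfrac{1}{(2\pi)^{1/2}(\sigma_{33})^{1/2}} \exp\Big(-\dfrac{e_3^2}{2\sigma_{33}}\Big)$

Let

(D.4)  $t_1 = \dfrac{e_1 - (\sigma_{13}/\sigma_{33})e_3}{(1 - \sigma_{13}^2/\sigma_{33})^{1/2}}; \qquad t_2 = \dfrac{e_2 - (\sigma_{23}/\sigma_{33})e_3}{(1 - \sigma_{23}^2/\sigma_{33})^{1/2}}$

Then, the Jacobian of the transformation is

(D.5)  $\dfrac{\partial(e_1,e_2)}{\partial(t_1,t_2)'} = \dfrac{(\sigma_{33} - \sigma_{13}^2)^{1/2}(\sigma_{33} - \sigma_{23}^2)^{1/2}}{\sigma_{33}}$

Using (D.4) and (D.5), we have

(D.6)  $\displaystyle\int_{-x}^{\infty}\int_{-y}^{\infty} f(e_1,e_2 \mid e_3)\, de_2\, de_1 = \int_{h}^{\infty}\int_{k}^{\infty} g(t_1,t_2)\, dt_2\, dt_1$

where

$h = -(x + (\sigma_{13}/\sigma_{33})e_3)/(1 - \sigma_{13}^2/\sigma_{33})^{1/2}$

$k = -(y + (\sigma_{23}/\sigma_{33})e_3)/(1 - \sigma_{23}^2/\sigma_{33})^{1/2}$

$\ell = (\rho\sigma_{33} - \sigma_{13}\sigma_{23})/\{(\sigma_{33} - \sigma_{13}^2)^{1/2}(\sigma_{33} - \sigma_{23}^2)^{1/2}\}$

$g(t_1,t_2) = \dfrac{(\sigma_{33} - \sigma_{13}^2)^{1/2}(\sigma_{33} - \sigma_{23}^2)^{1/2}}{2\pi(|\Omega|\sigma_{33})^{1/2}} \exp\Big[-\dfrac{(\sigma_{33} - \sigma_{13}^2)(\sigma_{33} - \sigma_{23}^2)}{2|\Omega|\sigma_{33}}\Big\{t_1^2 + t_2^2 - \dfrac{2(\rho\sigma_{33} - \sigma_{13}\sigma_{23})}{(\sigma_{33} - \sigma_{13}^2)^{1/2}(\sigma_{33} - \sigma_{23}^2)^{1/2}}\, t_1t_2\Big\}\Big]$

Note that

$1 - \ell^2 = \dfrac{|\Omega|\sigma_{33}}{(\sigma_{33} - \sigma_{13}^2)(\sigma_{33} - \sigma_{23}^2)}$

This shows that $g(t_1,t_2)$ is the standard bivariate normal density function with the correlation coefficient $\ell$. Therefore, by the symmetry of this density, we have

(D.7)  $\displaystyle\int_{-x}^{\infty}\int_{-y}^{\infty} f(e_1,e_2 \mid e_3)\, de_2\, de_1 = F(-h,-k,\ell)$

Substituting (D.7) into (D.3) gives us (D.1).

REFERENCES

Abowd, J.M., and Farber, H.S. (1982), "Job Queues and the Union Status of Workers", Industrial and Labor Relations Review, Vol. 35, pp 354 - 368.

Amemiya, T. (1973), "Regression Analysis When the Dependent Variable Is Truncated Normal", Econometrica, Vol. 41, pp 997 - 1016.

Amemiya, T. (1985), Advanced Econometrics, Harvard Press.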
Blundell, R., Ham, J., and Meghir, C. (1987), "Unemployment and Female Labor Supply", The Economic Journal, Vol. 97 (Conference Papers), pp 44 - 64.

Farber, H.S. (1983), "Worker Preference for Union Representation", Research in Labor Economics, Supplement 2, pp 171 - 205.

Fishe, R., Trost, R.P., and Lurie, P.M. (1981), "Labor Force Earnings and College Choice of Young Women: An Examination of Selectivity Bias and Comparative Advantage", Economics of Education Review, Vol. 1, pp 169 - 191.

Flinn, C.J., and Heckman, J.J. (1983), "Are Unemployment and Out of the Labor Force Behaviorally Distinct Labor Force States?", Journal of Labor Economics, Vol. 1, pp 169 - 191.

Ham, J.C. (1982), "Estimation of a Labour Supply Model with Censoring Due to Unemployment and Underemployment", Review of Economic Studies, Vol. 49, pp 335 - 354.

Heckman, J. (1974), "Shadow Prices, Market Wages, and Labor Supply", Econometrica, Vol. 42, pp 670 - 694.

Heckman, J. (1979), "Sample Selection Bias As a Specification Error", Econometrica, Vol. 47, pp 153 - 161.

Johnson, N.L., and Kotz, S. (1972), Distributions in Statistics: Continuous Multivariate Distributions, Vol. 4, John Wiley and Sons.

Lin, T. (1982), Some Applications of the Lagrangean Multiplier Test in Econometrics, Dissertation for the Ph.D degree, Michigan State University.

Maddala, G.S. (1987), Limited-dependent and Qualitative Variables in Econometrics, Econometric Society Monograph, No. 3, Cambridge University Press.

Melino, A. (1982), "Testing for Sample Selection Bias", Review of Economic Studies, Vol. XLIX, pp 151 - 153.

Meng, C., and Schmidt, P. (1985), "On the Cost of Partial Observability in the Bivariate Probit Model", International Economic Review, Vol. 26, pp 71 - 85.

Poirier, D.J. (1980), "Partial Observability in Bivariate Probit Models", Journal of Econometrics, Vol. 12, pp 209 - 217.

Chapter 3

Efficient Estimation of Models for Dynamic Panel Data

I.
Introduction

This paper considers a dynamic model with panel data which include a large number of cross-section observations, but only over a short period of time. A typical problem in using panel data is that the error terms in the model contain unobservable and time-invariant individual effects. To allow for these effects, "random effects" models are widely used in the literature on dynamic panel data. In these models, the individual effects are treated as being generated from an independently identically distributed (iid) stochastic process. This paper develops a generalized-method-of-moments (GMM) estimator for the dynamic model with random effects which is efficient under general circumstances.

In the case of the static model, the simple fixed effects (within) treatment generates a consistent estimator. There are also a number of studies which develop efficient estimation methods for the static model with random effects. When no explanatory variables are correlated with the individual effects, the generalized least squares (GLS) estimator is consistent and efficient in finite samples; see Hsiao [1986]. When some explanatory variables are correlated with the individual effects, we can efficiently estimate the model using some available instrumental variables; see Hausman and Taylor [1981], Amemiya and MaCurdy [1986], and Breusch, Mizon, and Schmidt [1989].

Several problems arise in the dynamic model that do not arise in the static model. First, the conventional within estimator is inconsistent unless there are a large number of time-series observations; see Hsiao [1986]. Second, even though the maximum-likelihood (ML) method is available, the form of the ML estimator depends crucially on assumptions about the initial observations and the distribution of the individual effect; see Anderson and Hsiao [1981], or Hsiao [1982].
To avoid these problems, Anderson and Hsiao [1981], Holtz-Eakin [1988], and Arellano and Bond [1988] investigate instrumental variables estimation techniques. (From now on I call their methods the conventional instrumental-variable (IV) methods.) To get a consistent estimator, they first-difference the original equation to eliminate the individual effects, and then they use lagged dependent variables as instruments. These instruments are legitimate in the usual sense that they are uncorrelated with the differenced error terms.

In the framework of Hansen's [1982] GMM, the conventional IV estimators can be regarded as GMM estimators which use some available linear orthogonality conditions. The GMM estimators are efficient in general circumstances, if all that is known is that the data-generating process satisfies certain moment restrictions; see Chamberlain [1987]. The GMM method may be preferred to the ML methods in the dynamic model using panel data, because the GMM estimators do not rely on assumptions about the initial observations and the distribution of the individual effects.

However, the conventional IV estimators are not efficient in the sense that they fail to use all the available moment conditions. This is due to their lack of a systematic treatment in counting the number of available restrictions. Furthermore, those IV estimators could be inconsistent if we relax some behavioral assumptions about the error terms. For example, lagged dependent variables are generally not legitimate instruments if the error terms are autocorrelated.

The main goal of this paper is to offer a systematic analysis which properly counts all the available moment conditions under given assumptions. Under alternative sets of assumptions, I demonstrate how many moment conditions there may be, and I show how to write them in a convenient form.
Under the usual assumptions, we can find some linear and nonlinear orthogonality conditions, which the conventional IV approaches do not exploit. I then derive the GMM estimator, and a linearized GMM estimator that is equally asymptotically efficient. The GMM estimator based on all the available moment conditions must be at least as efficient as other GMM estimators based on only a subset of the moment conditions. In this sense, the GMM estimator presented in this paper could be said to be efficient when the distributions of the initial observations and of the individual effects are not known.

The plan of this paper is as follows. Sections II, III, and IV consider a simple dynamic model which includes only a one-period lagged dependent variable as the explanatory variable. Section II briefly summarizes the conventional IV approaches and demonstrates why they miss some available moment restrictions. Section III shows the proper way to derive all the legitimate moment conditions under given assumptions. Section IV investigates the estimation procedure and the asymptotic performance of our GMM estimator under the usual assumptions. Section V extends our approach to the dynamic model which includes exogenous variables. Section VI gives some concluding remarks.

II. Conventional IV Methods

To explain the conventional IV methods as simply as possible, I consider the following simple dynamic model:

(II.1-1)  $y_{it} = \delta y_{i,t-1} + \alpha_i + \epsilon_{it} = \delta y_{i,t-1} + u_{it}$  (for $i = 1,2,\ldots,N$; $t = 1,2,\ldots,T$)

where $u_{it} = \alpha_i + \epsilon_{it}$. The subscript i denotes the ith cross-section unit, and t designates time periods. Here y is the dependent variable, $\alpha$ is the individual effect, and $\epsilon$ is the error term. Let $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$; $y_{i,-1} = (y_{i0}, y_{i1}, \ldots, y_{i,T-1})'$; $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$. Then we can rewrite (II.1-1) as

(II.1-2)  $y_i = \delta y_{i,-1} + u_i$,  $i = 1,2,\ldots,N$

Let $y = (y_1', y_2', \ldots, y_N')'$; $y_{-1} = (y_{1,-1}', y_{2,-1}', \ldots, y_{N,-1}')'$; $u = (u_1', u_2', \ldots, u_N')'$. Then (II.1-2) becomes

(II.1-3)  $y = \delta y_{-1} + u$.
I assume that the $\alpha$'s and $\varepsilon$'s have zero means. More assumptions about them will be made shortly. For mathematical convenience and future use, I also define some notation:

$P_A = A(A'A)^{-1}A'$;  $M_A = I_m - P_A$;
$P = (1/T)e_Te_T'$;  $Q = I_T - P$;
$P_v = I_N \otimes P$;  $Q_v = I_N \otimes Q = I_{NT} - P_v$,

where $e_T$ is a $T$-dimensional vector of ones, and $A$ is any $m \times k$ matrix.

In the conventional literature on the dynamic panel data model, there are four common assumptions about $y_0$, $\alpha$, and the $\varepsilon$'s (for simplicity, the subscript "i" is suppressed):

(SA.1) The $\varepsilon$'s are independent of $y_0$; i.e., $E(y_0\varepsilon_t) = 0$ for any $t$.
(SA.2) The $\varepsilon$'s are independent of $\alpha$; i.e., $E(\alpha\varepsilon_t) = 0$ for any $t$.
(SA.3) The $\varepsilon$'s are homoskedastic; i.e., $E(\varepsilon_t^2) = \sigma_\varepsilon^2$ for any $t$.
(SA.4) The $\varepsilon$'s are mutually independent; i.e., $E(\varepsilon_s\varepsilon_t) = 0$ for any $t \ne s$.

The conventional IV approaches first-difference (II.1):

(II.2)  $y_{it} - y_{i,t-1} = \delta(y_{i,t-1} - y_{i,t-2}) + (u_{it} - u_{i,t-1})$.

Since $u_{it} - u_{i,t-1}$ ($= \varepsilon_{it} - \varepsilon_{i,t-1}$) does not include $\alpha_i$, some lagged $y_{it}$'s can be used as instruments for the estimation of $\delta$. For example, $y_{i,t-2} - y_{i,t-3}$ is a legitimate instrumental variable in the sense that it is uncorrelated with $u_{it} - u_{i,t-1}$ but correlated with $y_{i,t-1} - y_{i,t-2}$; see Anderson and Hsiao [1981].

Arellano and Bond [1988] and Holtz-Eakin [1988] find and use all of the moment conditions based on lagged $y$'s being uncorrelated with $u_{it} - u_{i,t-1}$. Consider the $(T-1)$ first-differenced equations separately:

(II.3)
$y_{i2} - y_{i1} = \delta(y_{i1} - y_{i0}) + (u_{i2} - u_{i1})$
$y_{i3} - y_{i2} = \delta(y_{i2} - y_{i1}) + (u_{i3} - u_{i2})$
$\vdots$
$y_{iT} - y_{i,T-1} = \delta(y_{i,T-1} - y_{i,T-2}) + (u_{iT} - u_{i,T-1})$

The system (II.3) can be regarded as a simultaneous system of equations with the cross-equation restriction that the coefficients are the same everywhere. Arellano and Bond's method is akin to three-stage least squares (3SLS) with different instruments for the different equations: $y_{i0}, y_{i1}, \ldots, y_{i,j-1}$ for the $j$th equation of (II.3). This approach is based on the following $(1/2)T(T-1)$ orthogonality conditions:

(II.4)  $E(y_{it}(u_{i,s+1} - u_{is})) = 0$;  $t = 0,1,\ldots,T-2$;  $s = t+1,\ldots,T-1$,

which hold under the assumptions given in (SA).
Even though Anderson and Hsiao [1981], Holtz-Eakin [1988], and Arellano and Bond [1988] adopt somewhat different instrumental-variable treatments, fundamentally all of their methods are based on the conditions given in (II.4). To clarify this point, let us try a slightly different approach. Define

$A_i = \begin{bmatrix}
-y_{i0} & 0 & 0 & \cdots & 0 & \cdots & 0\\
y_{i0} & -y_{i0} & -y_{i1} & \cdots & 0 & \cdots & 0\\
0 & y_{i0} & y_{i1} & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots & & \vdots\\
0 & 0 & 0 & \cdots & -y_{i0} & \cdots & -y_{i,T-2}\\
0 & 0 & 0 & \cdots & y_{i0} & \cdots & y_{i,T-2}
\end{bmatrix}$

where $A_i$ is a $T \times (1/2)T(T-1)$ matrix. Also, define $A = (A_1', A_2', \ldots, A_N')'$. Then (II.4) can be compactly expressed by

(II.5)  $E(A_i'u_i) = 0$  or  $E(A'u) = 0$.

One merit of (II.5) is that we do not have to first-difference the equation given in (II.1). That is, we can directly apply the instrumental-variable treatment to the equation in levels. All the conventional IV approaches can be interpreted as using some, if not all, orthogonality conditions which are just linear combinations of those in (II.5). Note that the instruments in $A$ are legitimate because $\mathrm{plim}(1/N)A'u = 0$ and $\mathrm{plim}(1/N)A'y_{-1} \ne 0$. It can be easily shown that Anderson and Hsiao use as an instrumental variable a linear combination of only some columns of $A$, while Arellano and Bond use $A$ itself. The IV estimator which uses $A$ for (II.1-3) takes the following form:

(II.6)  $\hat\delta_A = (y_{-1}'P_Ay_{-1})^{-1}y_{-1}'P_Ay$.

It can also be shown that this estimator is asymptotically identical to the GMM estimator based on (II.5) with the assumptions in (SA). (See APPENDIX A.) The asymptotic distribution of $\hat\delta_A$ is given by:

(II.7)  $\sqrt{N}(\hat\delta_A - \delta) \to N\big(0,\ \sigma_\varepsilon^2\,\mathrm{plim}[(1/N)y_{-1}'P_Ay_{-1}]^{-1}\big)$.

If all the available orthogonality conditions are those in (II.5), $\hat\delta_A$ must be efficient among the class of estimators which can be derived using this set of information. However, the assumptions in (SA) imply more moment restrictions than those in (II.5).
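As an illustration of (II.5) and (II.6), the following Python sketch builds $A_i$ for each unit and computes $\hat\delta_A$ on simulated data. It is only a sketch under stated assumptions: the function names (`simulate_panel`, `build_A_i`, `iv_estimator`), the normal errors, and the stationary draw of $y_{i0}$ are hypothetical choices made for the example, not part of the original analysis.

```python
import numpy as np

def simulate_panel(N, T, delta, sigma_alpha=1.0, sigma_eps=1.0, seed=0):
    """Simulate y_it = delta*y_{i,t-1} + alpha_i + eps_it.
    Returns an N x (T+1) array whose columns are y_i0, ..., y_iT."""
    rng = np.random.default_rng(seed)
    alpha = sigma_alpha * rng.standard_normal(N)
    y = np.empty((N, T + 1))
    # Assumption for illustration: y_i0 is drawn from the stationary distribution.
    y[:, 0] = alpha / (1 - delta) + sigma_eps / np.sqrt(1 - delta**2) * rng.standard_normal(N)
    for t in range(1, T + 1):
        y[:, t] = delta * y[:, t - 1] + alpha + sigma_eps * rng.standard_normal(N)
    return y

def build_A_i(y_i):
    """T x T(T-1)/2 instrument matrix A_i for one unit; y_i = (y_i0, ..., y_iT)."""
    T = len(y_i) - 1
    cols = []
    for s in range(1, T):        # differenced error u_{i,s+1} - u_{is}
        for t in range(0, s):    # instruments y_i0, ..., y_{i,s-1}
            col = np.zeros(T)
            col[s - 1] = -y_i[t]  # row s multiplies u_{is}
            col[s] = y_i[t]       # row s+1 multiplies u_{i,s+1}
            cols.append(col)
    return np.column_stack(cols)

def iv_estimator(y):
    """delta_hat_A = (y_{-1}' P_A y_{-1})^{-1} y_{-1}' P_A y, accumulated unit by unit."""
    N, T = y.shape[0], y.shape[1] - 1
    m = T * (T - 1) // 2
    AtA, Aty1, Aty = np.zeros((m, m)), np.zeros(m), np.zeros(m)
    for i in range(N):
        A = build_A_i(y[i])
        AtA += A.T @ A
        Aty1 += A.T @ y[i, :-1]   # A_i' y_{i,-1}
        Aty += A.T @ y[i, 1:]     # A_i' y_i
    w = np.linalg.solve(AtA, Aty1)
    return (w @ Aty) / (w @ Aty1)

if __name__ == "__main__":
    y = simulate_panel(N=5000, T=4, delta=0.5, seed=1)
    print("delta_hat_A =", iv_estimator(y))
```

Each column of `build_A_i` sums to zero, which is the levels counterpart of first-differencing: the individual effect drops out of $A_i'u_i$ without transforming the equation.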
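For example, consider the following second-differenced equation:

(II.8)  $y_{i3} - y_{i1} = \delta(y_{i2} - y_{i0}) + (u_{i3} - u_{i1})$.

Clearly, under the assumptions in (SA),

(II.9-1)  $E[u_{i2}(u_{i3} - u_{i1})] = 0$, and
(II.9-2)  $E[u_{i2}(y_{i2} - y_{i0})] \ne 0$.

Therefore, $u_{i2}$ can be regarded as a nonlinear instrument for equation (II.8) in the sense of Amemiya [1974], and the restriction (II.9-1) is not implied by (II.5). This example implies that (II.5) does not incorporate all the available moment conditions enforceable under the assumptions in (SA), and therefore that $\hat\delta_A$ is not efficient. In the next section I will categorize all the restrictions implied by (SA), and also show how we can relax some assumptions.

III. Derivation of Moment Conditions

In this section, I demonstrate an appropriate way to derive all the available moment restrictions which can be exploited from the usual set of assumptions given in (SA), and I also apply this method to several cases in which some assumptions are relaxed. To do so, first define the covariance matrix of $y_0$, $\alpha$, and the $\varepsilon$'s¹ (only the upper triangle of each symmetric matrix is shown):

(III.1)  $\Sigma = \mathrm{Cov}\begin{bmatrix} y_0\\ \alpha\\ \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_T\end{bmatrix} = \begin{bmatrix}
\sigma_0^2 & \sigma_{0\alpha} & \sigma_{01} & \sigma_{02} & \cdots & \sigma_{0T}\\
 & \sigma_\alpha^2 & \sigma_{\alpha 1} & \sigma_{\alpha 2} & \cdots & \sigma_{\alpha T}\\
 & & \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1T}\\
 & & & \sigma_{22} & \cdots & \sigma_{2T}\\
 & & & & \ddots & \vdots\\
 & & & & & \sigma_{TT}
\end{bmatrix}$

where $i$ is suppressed. Basically, any assumption which may be imposed on the dynamic model can be expressed as a restriction on $\Sigma$. Under the usual assumptions in (SA), $\Sigma$ takes the following form:

(III.2)  $\Sigma = \begin{bmatrix}
\sigma_0^2 & \sigma_{0\alpha} & 0 & 0 & \cdots & 0\\
 & \sigma_\alpha^2 & 0 & 0 & \cdots & 0\\
 & & \sigma_\varepsilon^2 & 0 & \cdots & 0\\
 & & & \sigma_\varepsilon^2 & \cdots & 0\\
 & & & & \ddots & \vdots\\
 & & & & & \sigma_\varepsilon^2
\end{bmatrix}$

The vector of which $\Sigma$ is the covariance matrix is not observable. The vector of observables, meaning things that can be written in terms of data and parameters, is $(y_0, u_1, u_2, \ldots, u_T)'$, which has the following covariance matrix:

(III.3)  $\Omega = \mathrm{Cov}\begin{bmatrix} y_0\\ u_1\\ u_2\\ \vdots\\ u_T\end{bmatrix} = \begin{bmatrix}
\Omega_{00} & \Omega_{01} & \Omega_{02} & \cdots & \Omega_{0T}\\
 & \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1T}\\
 & & \Omega_{22} & \cdots & \Omega_{2T}\\
 & & & \ddots & \vdots\\
 & & & & \Omega_{TT}
\end{bmatrix}$

$= \begin{bmatrix}
\sigma_0^2 & \sigma_{0\alpha}+\sigma_{01} & \sigma_{0\alpha}+\sigma_{02} & \cdots & \sigma_{0\alpha}+\sigma_{0T}\\
 & \sigma_\alpha^2+\sigma_{11}+2\sigma_{\alpha 1} & \sigma_\alpha^2+\sigma_{\alpha 1}+\sigma_{\alpha 2}+\sigma_{12} & \cdots & \sigma_\alpha^2+\sigma_{\alpha 1}+\sigma_{\alpha T}+\sigma_{1T}\\
 & & \sigma_\alpha^2+\sigma_{22}+2\sigma_{\alpha 2} & \cdots & \sigma_\alpha^2+\sigma_{\alpha 2}+\sigma_{\alpha T}+\sigma_{2T}\\
 & & & \ddots & \vdots\\
 & & & & \sigma_\alpha^2+\sigma_{TT}+2\sigma_{\alpha T}
\end{bmatrix}$

By comparing $\Omega$ and $\Sigma$, we can easily see that the form of $\Omega$ depends on that of $\Sigma$, because each element of $\Omega$ is a linear combination of the elements of $\Sigma$. Under the usual assumptions (SA),

(III.4)  $\Omega = \begin{bmatrix}
\sigma_0^2 & \sigma_{0\alpha} & \sigma_{0\alpha} & \cdots & \sigma_{0\alpha}\\
 & \sigma_\alpha^2+\sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2\\
 & & \sigma_\alpha^2+\sigma_\varepsilon^2 & \cdots & \sigma_\alpha^2\\
 & & & \ddots & \vdots\\
 & & & & \sigma_\alpha^2+\sigma_\varepsilon^2
\end{bmatrix}$

An investigation of the elements in (III.4) provides us with three types of moment restrictions:

(III.5-1) Type I restrictions: $\Omega_{01} = \Omega_{02} = \Omega_{03} = \cdots = \Omega_{0T}$
(III.5-2) Type II restrictions: $\Omega_{11} = \Omega_{22} = \Omega_{33} = \cdots = \Omega_{TT}$
(III.5-3) Type III restrictions: $\Omega_{12} = \Omega_{13} = \cdots = \Omega_{1T} = \Omega_{23} = \Omega_{24} = \cdots = \Omega_{T-1,T}$

(III.5-1) implies $(T-1)$ restrictions; (III.5-2), $(T-1)$; (III.5-3), $(1/2)T(T-1)-1$. Therefore, there are in total $(1/2)T(T-1)+(2T-3)$ available moment conditions, which are more than those used in the conventional IV approaches. Specifically, we have $(2T-3)$ extra moment conditions.

Furthermore, the conditions described in (III.5) can be obtained even under somewhat relaxed assumptions. First of all, consider Type I conditions. By observing (III.3), we can see that Type I restrictions hold as long as $y_0$ and $\varepsilon_t$ have the same covariance for any $t$. The stochastic independence between $y_0$ and the $\varepsilon$'s is not required. Now, consider Type II conditions. These require that $\sigma_{ss}+\sigma_\alpha^2+2\sigma_{\alpha s} = \sigma_{tt}+\sigma_\alpha^2+2\sigma_{\alpha t}$ for any $s$ and $t$. For this, all we need is the homoskedasticity of the $\varepsilon$'s and the equicovariance of $\alpha$ with the $\varepsilon$'s. Finally, consider Type III restrictions. These require $\sigma_{st}+\sigma_{\alpha t}+\sigma_{\alpha s}$ to be the same for any $t$ and $s$. Obviously, the equicovariance of the $\varepsilon$'s and the equicovariance of $\alpha$ with the $\varepsilon$'s can justify those restrictions. In short, all the moment conditions in (III.5) are present under the following modified assumptions (MA):

(MA.1) The covariance of $y_0$ and $\varepsilon_t$ is the same for any $t$.
(MA.2) The covariance of $\alpha$ and $\varepsilon_t$ is the same for any $t$.
(MA.3) The $\varepsilon$'s are homoskedastic.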
(MA.4) The covariance of $\varepsilon_s$ and $\varepsilon_t$ is the same for any $s$ and $t$ ($s \ne t$).

A comparison of (SA) with (MA) shows that the assumptions in (SA), except the homoskedasticity of the $\varepsilon$'s, are stronger than necessary.

To derive explicit forms of the moment conditions which are convenient for GMM estimation, let us rearrange the conditions in (III.5) as follows:

(III.6-1)
$\Omega_{01} = \Omega_{02} = \Omega_{03} = \cdots = \Omega_{0,T-1} = \Omega_{0T}$
$\Omega_{12} = \Omega_{13} = \cdots = \Omega_{1,T-1} = \Omega_{1T}$
$\Omega_{23} = \cdots = \Omega_{2,T-1} = \Omega_{2T}$
$\vdots$
$\Omega_{T-2,T-1} = \Omega_{T-2,T}$

(III.6-2)  $\Omega_{11}-\Omega_{12} = \Omega_{22}-\Omega_{23} = \cdots = \Omega_{T-1,T-1}-\Omega_{T-1,T}$

(III.6-3)  $\Omega_{11} = \Omega_{22} = \Omega_{33} = \cdots = \Omega_{T-1,T-1} = \Omega_{TT}$

All these conditions in (III.6) can be expressed by:

(III.7-1)  $E[y_{it}(u_{i,s+1}-u_{is})] = 0$,  $0 \le t \le T-2$,  $s > t$
(III.7-2)  $E[y_{it}(u_{i,t+1}-u_{it}) - y_{i,t+1}(u_{i,t+2}-u_{i,t+1})] = 0$,  $1 \le t \le T-2$
(III.7-3)  $E[\bar u_i(u_{i,t+1}-u_{it})] = 0$,  $1 \le t \le T-1$

where $\bar u_i = (1/T)e_T'u_i$. For the derivation of (III.7), see APPENDIX B. (III.7-1) implies $(1/2)T(T-1)$ restrictions which are exactly identical to those used in the conventional IV approaches. (III.7-2) shows $(T-2)$ missing linear moment conditions which are not used in the former studies. (III.7-3) shows $(T-1)$ restrictions that are nonlinear in the sense that $\delta$ appears in both $\bar u_i$ ($= (1/T)e_T'(y_i-\delta y_{i,-1})$) and $u_{i,t+1}-u_{it}$ ($= (y_{i,t+1}-y_{it})-\delta(y_{it}-y_{i,t-1})$). Again, there are in total $(1/2)T(T-1)+(2T-3)$ restrictions. Table 10 summarizes the total number of moment conditions implied by (SA).

Table 10
Types of Conditions

T    (III.7-1)      (III.7-2)   (III.7-3)   Total
2    1              0           1           2
3    3              1           2           6
4    6              2           3           11
5    10             3           4           17
T    (1/2)T(T-1)    T-2         T-1         (1/2)T(T-1)+(2T-3)

Referring to Table 10, we see that the total number of all the available moment conditions is almost twice as big as that of the usual conditions when T is small. This means that the GMM estimator based on (III.7) could have a substantial efficiency gain over the conventional IV estimators if T is relatively small.
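The counts in Table 10 and the three types of restrictions on $\Omega$ can be checked mechanically. The sketch below is an illustration only; the helper names `moment_counts` and `omega_from_sigma` and the numerical values of the variances are hypothetical. It maps a covariance matrix $\Sigma$ of $(y_0, \alpha, \varepsilon_1, \ldots, \varepsilon_T)'$ into the covariance matrix $\Omega$ of $(y_0, u_1, \ldots, u_T)'$ using $u_t = \alpha + \varepsilon_t$.

```python
import numpy as np

def moment_counts(T):
    """Numbers of conditions in (III.7-1), (III.7-2), (III.7-3) and their total."""
    conventional = T * (T - 1) // 2
    extra_linear = T - 2
    nonlinear = T - 1
    return conventional, extra_linear, nonlinear, conventional + extra_linear + nonlinear

def omega_from_sigma(Sigma):
    """Omega = M Sigma M', where (y0, u_1..u_T)' = M (y0, alpha, eps_1..eps_T)'."""
    T = Sigma.shape[0] - 2
    M = np.zeros((T + 1, T + 2))
    M[0, 0] = 1.0            # y0 maps to itself
    M[1:, 1] = 1.0           # each u_t contains alpha
    M[1:, 2:] = np.eye(T)    # ... plus its own eps_t
    return M @ Sigma @ M.T

if __name__ == "__main__":
    for T in (2, 3, 4, 5):
        print(T, moment_counts(T))

    # Sigma under (SA) with hypothetical values for the variances.
    T, s00, s0a, saa, see = 4, 2.0, 0.7, 1.5, 1.0
    Sigma = np.zeros((T + 2, T + 2))
    Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1] = s00, s0a, s0a, saa
    Sigma[2:, 2:] = see * np.eye(T)
    print(omega_from_sigma(Sigma))
```

Replacing the block `Sigma[2:, 2:]` by a diagonal matrix with unequal entries reproduces CASE I: the diagonal of $\Omega$ is then no longer constant, while the Type I and Type III equalities survive.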
Until now, we have investigated the moment restrictions under the usual assumptions. Here, we consider three cases in which some assumptions are relaxed.

CASE I. Keep all the assumptions except (SA.3)

In this case, the heteroskedasticity of the $\varepsilon$'s is allowed. One might think that the moment restrictions given in (III.7-1) would be all we can have, because (III.7-2) and (III.7-3) are based on the homoskedasticity of the $\varepsilon$'s. However, this is not the case, because we still have some nonlinear restrictions. To see why, observe the form of $\Omega$ in CASE I:

(III.8)  $\Omega = \begin{bmatrix}
\Omega_{00} & \Omega_{01} & \Omega_{02} & \cdots & \Omega_{0T}\\
 & \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1T}\\
 & & \Omega_{22} & \cdots & \Omega_{2T}\\
 & & & \ddots & \vdots\\
 & & & & \Omega_{TT}
\end{bmatrix} = \begin{bmatrix}
\sigma_0^2 & \sigma_{0\alpha} & \sigma_{0\alpha} & \cdots & \sigma_{0\alpha}\\
 & \sigma_\alpha^2+\sigma_{11} & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2\\
 & & \sigma_\alpha^2+\sigma_{22} & \cdots & \sigma_\alpha^2\\
 & & & \ddots & \vdots\\
 & & & & \sigma_\alpha^2+\sigma_{TT}
\end{bmatrix}$

This shows that Type I and III restrictions are still available. The total number of conditions must be $(1/2)T(T-1)+(T-2)$. That is, we have an extra $(T-2)$ conditions in comparison with those adopted in the conventional IV approaches. To show the explicit form of the restrictions, let us rearrange them in a slightly different manner:

(III.9-1)
$\Omega_{01} = \Omega_{02} = \Omega_{03} = \cdots = \Omega_{0,T-1} = \Omega_{0T}$
$\Omega_{12} = \Omega_{13} = \cdots = \Omega_{1,T-1} = \Omega_{1T}$
$\Omega_{23} = \cdots = \Omega_{2,T-1} = \Omega_{2T}$
$\vdots$
$\Omega_{T-2,T-1} = \Omega_{T-2,T}$

(III.9-2)  $\Omega_{12} = \Omega_{23} = \Omega_{34} = \cdots = \Omega_{T-1,T}$

(III.9-1) is equivalent to (III.7-1), and therefore it generates the conventional moment restrictions given in (II.5). (III.9-2) can be expressed by:

(III.10)  $E[u_{i2}(u_{i3}-u_{i1})] = E[u_{i3}(u_{i4}-u_{i2})] = \cdots = E[u_{i,T-1}(u_{iT}-u_{i,T-2})] = 0$

which are $(T-2)$ nonlinear restrictions. If the $\varepsilon$'s are heteroskedastic, we do not have any extra linear restrictions. However, the existence of the nonlinear conditions given in (III.10) again prevents the conventional IV estimators from being efficient.

CASE II. Keep (SA.2) and (SA.3)

Here, we allow the $\varepsilon$'s to be correlated with each other. For example, the $\varepsilon$'s could follow an AR and/or MA process. In this situation, it would be unreasonable to assume that $y_{i0}$ is independent of the $\varepsilon$'s.
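The conventional IV estimators cannot be consistent, because they use invalid instrumental variables, in the sense that $E[y_{it}(u_{i,s+1}-u_{is})] \ne 0$ for $s > t$. However, we can find some available nonlinear moment conditions. To see this, observe the form of $\Omega$ under CASE II:

(III.10)  $\Omega = \begin{bmatrix}
\Omega_{00} & \Omega_{01} & \Omega_{02} & \cdots & \Omega_{0T}\\
 & \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1T}\\
 & & \Omega_{22} & \cdots & \Omega_{2T}\\
 & & & \ddots & \vdots\\
 & & & & \Omega_{TT}
\end{bmatrix} = \begin{bmatrix}
\sigma_0^2 & \sigma_{0\alpha}+\sigma_{01} & \sigma_{0\alpha}+\sigma_{02} & \cdots & \sigma_{0\alpha}+\sigma_{0T}\\
 & \sigma_\alpha^2+\sigma_\varepsilon^2 & \sigma_\alpha^2+\sigma_{12} & \cdots & \sigma_\alpha^2+\sigma_{1T}\\
 & & \sigma_\alpha^2+\sigma_\varepsilon^2 & \cdots & \sigma_\alpha^2+\sigma_{2T}\\
 & & & \ddots & \vdots\\
 & & & & \sigma_\alpha^2+\sigma_\varepsilon^2
\end{bmatrix}$

As we see in (III.10), Type II restrictions are still available. These $(T-1)$ conditions can be written as follows:

$E[(u_{i,t-1}+u_{it})(u_{i,t-1}-u_{it})] = 0$,  $t = 2,\ldots,T$.

These restrictions are all we can have under the intertemporal correlations among the $\varepsilon$'s. However, if we assume the stationarity of the $\varepsilon$'s, that is, if $E(\varepsilon_{is}\varepsilon_{i,s+j}) = E(\varepsilon_{it}\varepsilon_{i,t+j})$ for any $t$, $s$, and $j$, we can have more restrictions. Under the stationarity of the $\varepsilon$'s, $\sigma_{s,s+j} = \sigma_{t,t+j}$. Applying this fact to $\Omega$ gives us the following extra $(1/2)(T-1)(T-2)$ conditions:

(III.12)
$\Omega_{12} = \Omega_{23} = \Omega_{34} = \cdots = \Omega_{T-2,T-1} = \Omega_{T-1,T}$
$\Omega_{13} = \Omega_{24} = \cdots = \Omega_{T-3,T-1} = \Omega_{T-2,T}$
$\vdots$
$\Omega_{1,T-1} = \Omega_{2T}$

Therefore, under the stationarity condition about the $\varepsilon$'s, we can have in total $(1/2)T(T-1)$ restrictions available for GMM estimation.

CASE III. Keep (SA.3) and (SA.4)

Since $y_{i0}$ could be correlated with the $\varepsilon$'s, no lagged value of $y_{it}$ can be used as a legitimate instrument for the differenced equations, because $E[y_{it}(u_{i,s+1}-u_{is})] \ne 0$ for $s > t$ if neither $y_{i0}$ nor $\alpha_i$ has equicovariances with $\varepsilon_{it}$ for any $t$. However, we can have many nonlinear restrictions. For CASE III, $\Omega$ takes the following form:

(III.13)  $\Omega = \mathrm{Cov}\begin{bmatrix} y_0\\ u_1\\ u_2\\ \vdots\\ u_T\end{bmatrix} = \begin{bmatrix}
\sigma_0^2 & \sigma_{0\alpha}+\sigma_{01} & \sigma_{0\alpha}+\sigma_{02} & \cdots & \sigma_{0\alpha}+\sigma_{0T}\\
 & \sigma_\alpha^2+\sigma_\varepsilon^2+2\sigma_{\alpha 1} & \sigma_\alpha^2+\sigma_{\alpha 1}+\sigma_{\alpha 2} & \cdots & \sigma_\alpha^2+\sigma_{\alpha 1}+\sigma_{\alpha T}\\
 & & \sigma_\alpha^2+\sigma_\varepsilon^2+2\sigma_{\alpha 2} & \cdots & \sigma_\alpha^2+\sigma_{\alpha 2}+\sigma_{\alpha T}\\
 & & & \ddots & \vdots\\
 & & & & \sigma_\alpha^2+\sigma_\varepsilon^2+2\sigma_{\alpha T}
\end{bmatrix}$

Note that $\Omega_{ss}+\Omega_{tt}-2\Omega_{st} = 2\sigma_\varepsilon^2$. Explicitly, this means that $E[(u_{is}-u_{it})^2]$ is the same for any $t$ and $s$. Therefore, there are $(1/2)T(T-1)-1$ restrictions. Effectively, these represent the fact that $\mathrm{Var}(u_{it}-\bar u_i) = [(T-1)/T]\sigma_\varepsilon^2$ for any $t$, and $\mathrm{Cov}(u_{it}-\bar u_i,\ u_{is}-\bar u_i) = -\sigma_\varepsilon^2/T$ for $s \ne t$. To clarify this point, consider the transformed error term:

(III.14)  $Qu_i = (u_{i1}-\bar u_i,\ u_{i2}-\bar u_i,\ \ldots,\ u_{iT}-\bar u_i)'$.

$Qu_i$ has the covariance matrix

(III.15)  $\mathrm{Cov}(Qu_i) = \begin{bmatrix}
[(T-1)/T]\sigma_\varepsilon^2 & -\sigma_\varepsilon^2/T & -\sigma_\varepsilon^2/T & \cdots & -\sigma_\varepsilon^2/T\\
 & [(T-1)/T]\sigma_\varepsilon^2 & -\sigma_\varepsilon^2/T & \cdots & -\sigma_\varepsilon^2/T\\
 & & \ddots & & \vdots\\
 & & & [(T-1)/T]\sigma_\varepsilon^2 & -\sigma_\varepsilon^2/T\\
 & & & & [(T-1)/T]\sigma_\varepsilon^2
\end{bmatrix}$

Note that the rank of $\mathrm{Cov}(Qu_i)$ is $(T-1)$. Therefore, for the derivation of linearly independent moment conditions only, the last row and column must be ignored. The restrictions implied in (III.15) can be expressed by

(III.16)
$\mathrm{Var}(u_{i1}-\bar u_i) = \mathrm{Var}(u_{i2}-\bar u_i) = \cdots = \mathrm{Var}(u_{i,T-1}-\bar u_i)$
$= -(T-1)\mathrm{Cov}(u_{i1}-\bar u_i, u_{i2}-\bar u_i) = \cdots = -(T-1)\mathrm{Cov}(u_{i1}-\bar u_i, u_{i,T-1}-\bar u_i)$
$= \cdots = -(T-1)\mathrm{Cov}(u_{i,T-2}-\bar u_i, u_{i,T-1}-\bar u_i)$.

The number of restrictions in (III.16) is again $(1/2)T(T-1)-1$.

IV. Estimation

In Section III, I have introduced a systematic method to derive all the imposable moment conditions from a given set of assumptions. In this section, I examine the GMM estimation procedure based on those restrictions. Even though using all the moment conditions will no doubt improve the efficiency of the GMM estimator, the existence of the nonlinear restrictions makes a simple IV treatment impossible. One of the greatest advantages of the conventional IV methods is that the estimators can be obtained using standard IV (2SLS) software. Therefore, one may be reluctant to use nonlinear GMM methods. However, we can avoid this problem by the linearized GMM procedure; see Newey [1985].

Define the following expressions:

$B_{1i} = \begin{bmatrix}
-y_{i1} & 0 & \cdots & 0\\
y_{i1}+y_{i2} & -y_{i2} & \cdots & 0\\
-y_{i2} & y_{i2}+y_{i3} & \cdots & 0\\
0 & -y_{i3} & \cdots & \vdots\\
\vdots & \vdots & \ddots & -y_{i,T-2}\\
0 & 0 & \cdots & y_{i,T-2}+y_{i,T-1}\\
0 & 0 & \cdots & -y_{i,T-1}
\end{bmatrix}$;  $B_{2i} = \begin{bmatrix}
-\bar u_i & 0 & \cdots & 0\\
\bar u_i & -\bar u_i & \cdots & 0\\
0 & \bar u_i & \ddots & \vdots\\
\vdots & \vdots & \ddots & -\bar u_i\\
0 & 0 & \cdots & \bar u_i
\end{bmatrix}$;

$D_{2i} = \begin{bmatrix}
(u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T\\
(u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T\\
\vdots & \vdots & & \vdots\\
(u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T
\end{bmatrix}$;

$B_1 = (B_{11}', B_{12}', \ldots, B_{1N}')'$;  $B_2 = (B_{21}', B_{22}', \ldots, B_{2N}')'$;  $D_2 = (D_{21}', D_{22}', \ldots, D_{2N}')'$;

$H = (A,\ B_1,\ B_2)$;  $D = (0,\ 0,\ D_2)$;  $H'u = \sum_{i=1}^{N} H_i'u_i$.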
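where $B_{1i}$ is a $T \times (T-2)$ matrix; $B_{2i}$ and $D_{2i}$ are $T \times (T-1)$ matrices. Then, the moment conditions given in (III.7) can be expressed by

(IV.1)  $E(H_i'u_i) = 0$, or $E(H'u) = 0$.

A consistent estimator of $C = \mathrm{Cov}(H'u)$ is required to derive the GMM estimator. For this, we can use $\hat\delta_A$ to evaluate $H_i$ and $u_i$. (Let $\hat H_i$ and $\hat u_i$ be $H_i$ and $u_i$ evaluated at $\hat\delta_A$.) Then, $\hat C = \sum_{i=1}^{N}\hat H_i'\hat u_i\hat u_i'\hat H_i$ is consistent for $C$. One may think of $\hat H'\hat H$ as a consistent estimator of $C$ in the same sense that $A'A$ is consistent for $\mathrm{Cov}(A'u)$. However, $\hat H'\hat H$ is inconsistent unless $T = 2$, because $\mathrm{Cov}(H'u)$ includes the fourth moments of $\varepsilon_{it}$. When the $\varepsilon_{it}$'s are normally distributed, APPENDIX C shows

(IV.2)  $\mathrm{plim}(1/N)\mathrm{Cov}(H'u) = \sigma_\varepsilon^2\,\mathrm{plim}(1/N)H'H + \sigma_\varepsilon^2J^*$

where $J^*$ is a matrix of the form

$J^* = \begin{bmatrix} 0 & 0 & 0\\ 0 & J & 0\\ 0 & 0 & 0\end{bmatrix}$

and $J$ is a $(T-2) \times (T-2)$ matrix of the form

$J = \sigma_\varepsilon^2\begin{bmatrix}
2 & -1 & 0 & \cdots & 0\\
-1 & 2 & -1 & \cdots & 0\\
0 & -1 & 2 & \ddots & \vdots\\
\vdots & & \ddots & \ddots & -1\\
0 & 0 & \cdots & -1 & 2
\end{bmatrix}$

Therefore, $\hat H'\hat H + N\hat J^*$, not $\hat H'\hat H$, could be used for $C$.² In spite of this fact, I believe that $\hat C$ is a better estimate for $C$, because $\hat H'\hat H + N\hat J^*$ is inconsistent under our modified assumptions in (MA).

The GMM estimator, $\hat\delta_{GMM}$, minimizes

(IV.2)  $(u'H)\hat C^{-1}(H'u) = [(y-\delta y_{-1})'H(\delta)]\hat C^{-1}[H(\delta)'(y-\delta y_{-1})]$

where $H(\delta)$ means that $H$ is a function of $\delta$. It has the following asymptotic distribution:

(IV.3)  $\sqrt{N}(\hat\delta_{GMM}-\delta) \to N\big(0,\ \mathrm{plim}[(1/N)(\partial(u'H)/\partial\delta)C^{-1}(\partial(H'u)/\partial\delta)]^{-1}\big)$.

It can be easily shown that

(IV.4)  $\partial(H'u)/\partial\delta = -(H+D)'y_{-1}$.

Therefore, the asymptotic distribution of $\hat\delta_{GMM}$ is given by:

(IV.5)  $\sqrt{N}(\hat\delta_{GMM}-\delta) \to N\big(0,\ \mathrm{plim}[(1/N)y_{-1}'(H+D)C^{-1}(H+D)'y_{-1}]^{-1}\big)$

and the asymptotic covariance matrix of $\hat\delta_{GMM}$ is evaluated as

(IV.6)  $[y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'y_{-1}]^{-1}$

where $\hat H$ and $\hat D$ are evaluated at $\hat\delta_{GMM}$.

To examine the characteristics of $\hat\delta_{GMM}$, consider the first-order condition for minimization of (IV.2):

(IV.7)  $y_{-1}'(H+D)\hat C^{-1}H'(y-\delta y_{-1}) = 0$.

Then, through some algebraic operations, we can show

(IV.8)  $\hat\delta_{GMM} = [y_{-1}'(\hat H+\hat D)\hat C^{-1}\hat H'y_{-1}]^{-1}y_{-1}'(\hat H+\hat D)\hat C^{-1}\hat H'y$
$\qquad = [y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'y_{-1}]^{-1}y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'(y-P_v\hat u)$

where $\hat H$, $\hat D$, and $\hat u$ are evaluated at $\hat\delta_{GMM}$. (IV.8) has an interesting implication.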
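It suggests that an iterative procedure with any initial consistent estimator of $\delta$ may generate the GMM estimator. This turns out to be true. Furthermore, only one iteration is needed to get an estimator which has the same asymptotic distribution as $\hat\delta_{GMM}$. To see this, consider a new estimator, $\tilde\delta$, which replaces $H$, $D$, and $u$ in (IV.8) by $\hat H$, $\hat D$, and $\hat u$ (evaluated, say, at $\hat\delta_A$). Write $\Psi = [y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'y_{-1}]^{-1}y_{-1}'(\hat H+\hat D)\hat C^{-1}$ for the factor common to all terms. Then,

(IV.9)  $\tilde\delta = \Psi(\hat H+\hat D)'(y-P_v\hat u)$
$\qquad = \delta + \Psi(\hat H+\hat D)'u - \Psi\hat D'\hat u$
$\qquad = \delta + \Psi H'u + \Psi(\hat H-H)'u - \Psi\hat D'(\hat u-u)$

because $P_v\hat H = 0$ and $P_v\hat D = \hat D$. Observe that $\hat{\bar u}_i = \bar u_i - (\hat\delta_A-\delta)(1/T)e_T'y_{i,-1}$. Therefore,

(IV.10)  $\hat B_{2i} - B_{2i} = -(\hat\delta_A-\delta)\begin{bmatrix}
-(1/T)e_T'y_{i,-1} & 0 & \cdots & 0\\
(1/T)e_T'y_{i,-1} & -(1/T)e_T'y_{i,-1} & \cdots & 0\\
0 & (1/T)e_T'y_{i,-1} & \ddots & \vdots\\
\vdots & \vdots & \ddots & -(1/T)e_T'y_{i,-1}\\
0 & 0 & \cdots & (1/T)e_T'y_{i,-1}
\end{bmatrix}$

Note that

(IV.11)  $\begin{bmatrix}
-(1/T)e_T'y_{i,-1} & 0 & \cdots & 0\\
(1/T)e_T'y_{i,-1} & -(1/T)e_T'y_{i,-1} & \cdots & 0\\
0 & (1/T)e_T'y_{i,-1} & \ddots & \vdots\\
\vdots & \vdots & \ddots & -(1/T)e_T'y_{i,-1}\\
0 & 0 & \cdots & (1/T)e_T'y_{i,-1}
\end{bmatrix}'\begin{bmatrix} u_{i1}\\ u_{i2}\\ u_{i3}\\ \vdots\\ u_{iT}\end{bmatrix}
= \begin{bmatrix}
(u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T\\
(u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T\\
\vdots & \vdots & & \vdots\\
(u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T
\end{bmatrix}'\begin{bmatrix} y_{i0}\\ y_{i1}\\ \vdots\\ y_{i,T-1}\end{bmatrix} = D_{2i}'y_{i,-1}$

Using (IV.10) and (IV.11), we have

(IV.12)  $(\hat B_{2i}-B_{2i})'u_i = -(\hat\delta_A-\delta)D_{2i}'y_{i,-1}$  and  $(\hat H-H)'u = -(\hat\delta_A-\delta)D'y_{-1}$.

Substituting (IV.12) into (IV.9) gives

(IV.13)  $\tilde\delta - \delta = \Psi H'u + (\hat\delta_A-\delta)\Psi(\hat D-D)'y_{-1}$

because $\hat D'(\hat u-u) = -(\hat\delta_A-\delta)\hat D'y_{-1}$. The second term on the right side of (IV.13) is asymptotically negligible: $\sqrt{N}(\hat\delta_A-\delta)$ is bounded in probability and $\mathrm{plim}(1/N)(\hat D-D)'y_{-1} = 0$, so that

$\mathrm{plim}(1/\sqrt{N})(\hat D-D)'y_{-1}(\hat\delta_A-\delta) = 0$.

Note that

$\sqrt{N}\,[y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'y_{-1}]^{-1}y_{-1}'(\hat H+\hat D)\hat C^{-1}H'u \to N\big(0,\ \mathrm{plim}[(1/N)y_{-1}'(H+D)C^{-1}(H+D)'y_{-1}]^{-1}\big)$.

This means that $\tilde\delta$ is asymptotically identical to $\hat\delta_{GMM}$. This result is actually not surprising. Newey [1985] develops a simple linearized GMM estimator which has the same asymptotic distribution as the nonlinear GMM estimator. His method is applicable with any initial consistent estimator.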
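The one-step estimator in (IV.9) can be sketched numerically as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: all function names are hypothetical, the data are simulated with normal homoskedastic errors, and the initial consistent estimator is a simple Anderson-Hsiao-type IV that uses $y_{i,t-2}$ as the instrument for the first-differenced equation (an assumption made here for brevity; $\hat\delta_A$ would serve equally well).

```python
import numpy as np

def simulate_panel(N, T, delta, seed=0):
    """y_it = delta*y_{i,t-1} + alpha_i + eps_it with unit variances; N x (T+1) array."""
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(N)
    y = np.empty((N, T + 1))
    y[:, 0] = alpha / (1 - delta) + rng.standard_normal(N) / np.sqrt(1 - delta**2)
    for t in range(1, T + 1):
        y[:, t] = delta * y[:, t - 1] + alpha + rng.standard_normal(N)
    return y

def initial_iv(y):
    """Anderson-Hsiao-type IV: instrument y_{t-2} for (y_{t-1} - y_{t-2})."""
    num = np.sum(y[:, :-2] * (y[:, 2:] - y[:, 1:-1]))
    den = np.sum(y[:, :-2] * (y[:, 1:-1] - y[:, :-2]))
    return num / den

def build_H_D(y_i, delta):
    """H_i = (A_i, B_1i, B_2i) and D_i = (0, 0, D_2i) at a trial value of delta."""
    T = len(y_i) - 1
    u = y_i[1:] - delta * y_i[:-1]            # u_i1, ..., u_iT
    ubar = u.mean()
    A = np.zeros((T, T * (T - 1) // 2))
    c = 0
    for s in range(1, T):                     # conditions (III.7-1)
        for t in range(s):
            A[s - 1, c], A[s, c] = -y_i[t], y_i[t]
            c += 1
    B1 = np.zeros((T, T - 2))
    for t in range(1, T - 1):                 # conditions (III.7-2)
        B1[t - 1, t - 1] = -y_i[t]
        B1[t, t - 1] = y_i[t] + y_i[t + 1]
        B1[t + 1, t - 1] = -y_i[t + 1]
    B2 = np.zeros((T, T - 1))
    for t in range(1, T):                     # conditions (III.7-3)
        B2[t - 1, t - 1], B2[t, t - 1] = -ubar, ubar
    D2 = np.tile(np.diff(u) / T, (T, 1))      # every row is (u_{t+1}-u_t)/T
    H = np.hstack([A, B1, B2])
    D = np.hstack([np.zeros_like(A), np.zeros_like(B1), D2])
    return H, D, u

def linearized_gmm(y, delta0):
    """One-step estimator (IV.9), with H, D, u and C evaluated at delta0."""
    N, T = y.shape[0], y.shape[1] - 1
    m = T * (T - 1) // 2 + (T - 2) + (T - 1)
    g, r, C = np.zeros(m), np.zeros(m), np.zeros((m, m))
    for i in range(N):
        H, D, u = build_H_D(y[i], delta0)
        h = H.T @ u
        C += np.outer(h, h)                   # sum of H_i'u_i u_i'H_i
        g += (H + D).T @ y[i, :-1]            # (H+D)'y_{-1}
        r += (H + D).T @ (y[i, 1:] - u.mean())  # (H+D)'(y - P_v u)
    w = np.linalg.solve(C, g)
    return (w @ r) / (w @ g)

if __name__ == "__main__":
    y = simulate_panel(N=10000, T=3, delta=0.5, seed=2)
    d0 = initial_iv(y)
    print("initial:", d0, " one-step GMM:", linearized_gmm(y, d0))
```

The sketch relies on the two facts used in the derivation above: the columns of $H_i$ sum to zero, so $P_vH = 0$, and all rows of $D_{2i}$ are identical, so $P_vD = D$.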
Applying Newey's formula [1985, p. 238] to our model generates

(IV.14)  $\tilde\delta = \hat\delta_A + [y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'y_{-1}]^{-1}y_{-1}'(\hat H+\hat D)\hat C^{-1}\hat H'\hat u$

which can be shown to be algebraically identical to $\tilde\delta$ in (IV.9). Therefore, it turns out that $\tilde\delta$ is nothing but Newey's linearized GMM estimator.

Even though the efficiency of $\hat\delta_{GMM}$ is not questionable, one may argue that unless $\hat\delta_{GMM}$ has a significant efficiency gain, $\hat\delta_A$ would be preferred because of its simplicity. Therefore, it would be worth demonstrating explicitly the efficiency comparison between $\hat\delta_{GMM}$ and $\hat\delta_A$, and seeing in what cases the efficiency gain of $\hat\delta_{GMM}$ over $\hat\delta_A$ is greatest. For simplicity, consider the case in which $T = 2$. There are only two moment conditions available:

(IV.15)  $E\begin{bmatrix} y_{i0}(u_{i2}-u_{i1})\\ \bar u_i(u_{i2}-u_{i1})\end{bmatrix} = 0$

where $\bar u_i = (1/2)(u_{i2}+u_{i1})$. Define [...]

By observing (IV.18), we can see that the size of the efficiency gain of $\hat\delta_{GMM}$ over $\hat\delta_A$ depends on $\sigma_\varepsilon^2$, $\sigma_\alpha^2$, and $\rho_{0\alpha}$. The greater $\sigma_\alpha^2$ is and the smaller $\rho_{0\alpha}^2$ and $\sigma_\varepsilon^2$ are, the greater the efficiency gain of $\hat\delta_{GMM}$.

Another possible problem in our estimation method is that $\hat\delta_{GMM}$ loses its consistency under the heteroskedasticity of the $\varepsilon$'s while $\hat\delta_A$ does not. Therefore, an appropriate test procedure may be required. Since $\hat\delta_{GMM}$ is efficient under the null hypothesis (homoskedasticity of the $\varepsilon$'s), and $\hat\delta_A$ is consistent under both the null and alternative hypotheses, Hausman's [1978] misspecification test is available. Under the null hypothesis, the test statistic,

(IV.19)  $(\hat\delta_{GMM}-\hat\delta_A)'(M_2-M_1)^{-1}(\hat\delta_{GMM}-\hat\delta_A)$,

converges in distribution to $\chi^2(T-1)$, where $M_1$ and $M_2$ are the asymptotic variances of $\hat\delta_{GMM}$ and $\hat\delta_A$, respectively. Note that $\hat\delta_A$ is not efficient even when the null hypothesis is rejected. This is because $A'A$ is no longer consistent for $\mathrm{Cov}(A'u)$ and because, as CASE I demonstrates in Section III, we still have extra $(T-2)$ moment conditions; see (III.9). To obtain an efficient estimator, we would need to construct the GMM estimator based on the moment conditions in (III.9).
Again, Newey's method could be used to generate the linearized GMM estimator.

V. Estimation with Exogenous Variables

Until now, we have investigated the GMM estimation method for the simple dynamic model with a one-period lagged dependent variable as the only explanatory variable. In this section, we extend the results obtained in the previous sections to the general model which includes exogenous variables. The most surprising finding is that there exists an enormously large number of moment conditions available for the dynamic model. Many former studies for the static model assume strong exogeneity conditions, in the sense that all the exogenous variables are uncorrelated with the $\varepsilon$'s at all leads and lags. These conditions, however, do not generate extra instruments for the static model. On the contrary, all these strong exogeneity conditions must be considered in the dynamic model. I will explain this first, and later include the moment conditions obtained in the previous sections.

The model can be written as

(V.1-1)  $y_{it} = \delta y_{i,t-1} + x_{it}\beta + z_i\gamma + (\alpha_i+\varepsilon_{it}) = \delta y_{i,t-1} + x_{it}\beta + z_i\gamma + u_{it}$,  $i = 1,2,\ldots,N$; $t = 1,2,\ldots,T$

where $x_{it}$ is a $1 \times k$ row vector of time-varying exogenous variables, and $z_i$ is a $1 \times g$ row vector of time-invariant exogenous variables. That is, $x_{it}$ and $z_i$ are stochastically independent of $\varepsilon_{it}$. For the $i$th cross-section unit, (V.1-1) can be rewritten as

(V.1-2)  $y_i = \delta y_{i,-1} + X_i\beta + Z_i\gamma + u_i$.

For all the observations, the model can be expressed in matrix form as:

(V.1-3)  $y = \delta y_{-1} + X\beta + Z\gamma + u = (y_{-1}, X, Z)(\delta, \beta', \gamma')' + u = W\theta + u$.

For the static model (no $y_{-1}$ present), Hausman and Taylor [1981], Amemiya and MaCurdy [1986], and Breusch, Mizon, and Schmidt [1989] develop simple IV estimation methods, each of which provides a consistent and efficient estimator under a given set of exogeneity conditions between $x$, $z$ and $\alpha$.
To see those exogeneity restrictions, it is useful to partition $X$ and $Z$ as follows:

(V.2-1)  $X = (X_1, X_2)$
(V.2-2)  $Z = (Z_1, Z_2)$

where $X_1$ and $Z_1$ are uncorrelated with $\alpha$ while $X_2$ and $Z_2$ are correlated with $\alpha$. For notational convenience, define an $NT \times Tm$ matrix, $S^*$, as follows:

(V.3)  $S = \begin{bmatrix} S_{11}\\ S_{12}\\ \vdots\\ S_{1T}\\ \vdots\\ S_{N1}\\ S_{N2}\\ \vdots\\ S_{NT}\end{bmatrix}$;  $S^* = \begin{bmatrix}
S_{11} & S_{12} & \cdots & S_{1T}\\
S_{11} & S_{12} & \cdots & S_{1T}\\
\vdots & \vdots & & \vdots\\
S_{11} & S_{12} & \cdots & S_{1T}\\
\vdots & \vdots & & \vdots\\
S_{N1} & S_{N2} & \cdots & S_{NT}\\
S_{N1} & S_{N2} & \cdots & S_{NT}\\
\vdots & \vdots & & \vdots\\
S_{N1} & S_{N2} & \cdots & S_{NT}
\end{bmatrix}$

where $m$ is the number of columns of $S$. Note that $S^*$ is a time-invariant matrix, in the sense that each of its columns is time-invariant. If we follow the interpretation provided by Breusch, Mizon, and Schmidt, the exogeneity conditions imposed by these authors can be compactly described by

(V.4)  $\mathrm{plim}(1/N)G'u = 0$

where $G = (Q_vX, P_vR)$. $R$ is defined as follows:

(V.5)
$R = (X_1, Z_1)$ for Hausman and Taylor
$R = (X_1, Z_1, X_1^*)$ for Amemiya and MaCurdy
$R = (X_1, Z_1, X_1^*, (Q_vX_2)^*)$ for Breusch, Mizon, and Schmidt

Since the comparison of these three approaches is not the task of this paper, we will use $R$ without preference for any particular approach. A simple two-stage least squares (2SLS) treatment using $G$ as instrumental variables will provide a consistent estimator of $\theta$. Note that $E(uu') = \sigma_\varepsilon^2\Gamma = \sigma_\varepsilon^2(Q_v+\phi^2P_v)$ where $\phi^2 = (\sigma_\varepsilon^2+T\sigma_\alpha^2)/\sigma_\varepsilon^2$. The simple 2SLS estimator, $\hat\theta_G$, is the IV estimator applied to the following transformed equation:

(V.6)  $\Gamma^{-1/2}y = \Gamma^{-1/2}W\theta + \Gamma^{-1/2}u$.

Then, we have

$\hat\theta_G = (W'\Gamma^{-1/2}G(G'G)^{-1}G'\Gamma^{-1/2}W)^{-1}W'\Gamma^{-1/2}G(G'G)^{-1}G'\Gamma^{-1/2}y$.

Through some algebraic operations, it can also be shown that³

(V.7)  $\hat\theta_G = (W'G(G'\Gamma G)^{-1}G'W)^{-1}W'G(G'\Gamma G)^{-1}G'y$.

The asymptotic distribution of $\hat\theta_G$ is given by:

(V.8)  $\sqrt{N}(\hat\theta_G-\theta) \to N\big(0,\ \sigma_\varepsilon^2\,\mathrm{plim}[(1/N)W'G(G'\Gamma G)^{-1}G'W]^{-1}\big)$.

$\hat\theta_G$ is also a GMM estimator. To see why, note that $\mathrm{Cov}(G'u) = \sigma_\varepsilon^2G'\Gamma G$. Then, the GMM estimator based on the exogeneity conditions in (V.4) minimizes

$(y-W\theta)'G(G'\Gamma G)^{-1}G'(y-W\theta)$

and it is exactly identical to $\hat\theta_G$.
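The equivalence between the 2SLS estimator on the transformed equation (V.6) and the GMM form (V.7) can be verified numerically. The sketch below is illustrative only: the data are random placeholders, $W$ is an arbitrary stand-in for $(y_{-1}, X, Z)$, the value of $\phi^2$ is hypothetical and treated as known, and the function names are not from the original text. It uses $\Gamma^{-1/2} = Q_v + \phi^{-1}P_v$, which follows from $Q_v$ and $P_v$ being orthogonal idempotent matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 30, 3, 2
NT = N * T

P = np.full((T, T), 1.0 / T)            # P = (1/T) e_T e_T'
Q = np.eye(T) - P
Pv = np.kron(np.eye(N), P)              # P_v = I_N (x) P
Qv = np.kron(np.eye(N), Q)              # Q_v = I_N (x) Q

phi2 = 4.0                              # hypothetical value of phi^2
Gamma = Qv + phi2 * Pv
Gamma_mhalf = Qv + Pv / np.sqrt(phi2)   # Gamma^{-1/2}

# Placeholder data, stacked unit by unit (T consecutive rows per unit).
X = rng.standard_normal((NT, k))
Z1 = np.repeat(rng.standard_normal((N, 1)), T, axis=0)   # time-invariant column
R = np.hstack([X[:, :1], Z1])           # R = (X_1, Z_1), the Hausman-Taylor choice
G = np.hstack([Qv @ X, Pv @ R])         # G = (Q_v X, P_v R)
W = rng.standard_normal((NT, 3))        # stand-in regressor matrix
y = rng.standard_normal(NT)

def two_sls_transformed(W, y, G):
    """2SLS with instruments G applied to the Gamma^{-1/2}-transformed equation."""
    Wt, yt = Gamma_mhalf @ W, Gamma_mhalf @ y
    PG_W = G @ np.linalg.solve(G.T @ G, G.T @ Wt)   # projection of Wt on G
    return np.linalg.solve(PG_W.T @ Wt, PG_W.T @ yt)

def gmm_form(W, y, G):
    """(W'G (G'Gamma G)^{-1} G'W)^{-1} W'G (G'Gamma G)^{-1} G'y, as in (V.7)."""
    GW = G.T @ W
    S = np.linalg.solve(G.T @ Gamma @ G, GW)
    return np.linalg.solve(GW.T @ S, S.T @ (G.T @ y))

if __name__ == "__main__":
    print(two_sls_transformed(W, y, G))
    print(gmm_form(W, y, G))
```

The two functions agree up to rounding because $\Gamma^{-1/2}G$ merely rescales the $P_vR$ block of $G$, so the column space of the instruments is unchanged.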
Even though (V.4) could incorporate all the exogeneity conditions on $x$ and $z$ in relation to $\alpha$, it does not exploit the strong exogeneity conditions about $\varepsilon$. To clarify this point, consider the following strong exogeneity restrictions:

(V.9-1)  $E(x_{it}'\varepsilon_{is}) = 0$
(V.9-2)  $E(z_i'\varepsilon_{is}) = 0$

where $t,s = 1,2,\ldots,T$. In terms of $u_{is}$, (V.9) implies:

(V.10-1)  $E(x_{it}'(u_{i,s+1}-u_{is})) = 0$
(V.10-2)  $E(z_i'(u_{i,s+1}-u_{is})) = 0$

where $s = 1,2,\ldots,T-1$. Define

$X_i^{**} = \begin{bmatrix}
-x_{i1} & \cdots & -x_{iT} & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
x_{i1} & \cdots & x_{iT} & -x_{i1} & \cdots & -x_{iT} & \cdots & 0 & \cdots & 0\\
0 & \cdots & 0 & x_{i1} & \cdots & x_{iT} & \cdots & 0 & \cdots & 0\\
\vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots\\
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & -x_{i1} & \cdots & -x_{iT}\\
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & x_{i1} & \cdots & x_{iT}
\end{bmatrix}$;  $Z_i^{**} = \begin{bmatrix}
-z_i & 0 & \cdots & 0\\
z_i & -z_i & \cdots & 0\\
0 & z_i & \ddots & \vdots\\
\vdots & \vdots & \ddots & -z_i\\
0 & 0 & \cdots & z_i
\end{bmatrix}$

where $X_i^{**}$ is a $T \times (kT(T-1))$ matrix and $Z_i^{**}$ is a $T \times (g(T-1))$ matrix. Let

$X^{**} = (X_1^{**\prime}, \ldots, X_N^{**\prime})'$;  $Z^{**} = (Z_1^{**\prime}, \ldots, Z_N^{**\prime})'$;  $W^{**} = (X^{**}, Z^{**})$.

Then, we may use $W^{**}$ as instruments, because $\mathrm{plim}(1/N)W^{**\prime}u = 0$. Define $F = (W^{**}, G) = (X^{**}, Z^{**}, Q_vX, P_vR)$. Then, all the available exogeneity conditions can be compactly described by

(V.11)  $\mathrm{plim}(1/N)F'u = E(F'u) = 0$.

Note that not all the moment restrictions in (V.11) are linearly independent. To see why, we can rewrite the restrictions in (V.11) separately as follows:

(V.12-1)  $E(X^{**\prime}u) = 0$
(V.12-2)  $E(Z^{**\prime}u) = 0$
(V.12-3)  $E(X'Q_vu) = 0$
(V.12-4)  $E(R'P_vu) = 0$

(V.9-1) implies $E[x_{it}'(\varepsilon_{it}-\bar\varepsilon_i)] = E[x_{it}'(u_{it}-\bar u_i)] = 0$, which in turn shows (V.12-3). Since (V.12-1) is derived from (V.9-1), (V.12-1) also implies (V.12-3). That is, $k$ restrictions among those in (V.12-1) are already incorporated by (V.12-3). Therefore, (V.12-1) contains an extra $[(T^2-T-1)k]$, not $[T(T-1)k]$, conditions which are not captured by (V.12-3). Since (V.12-2) implies $[(T-1)g]$ restrictions, (V.11) includes in total an extra $[(T^2-T-1)k+(T-1)g]$ conditions compared with (V.4). Following the same reasoning, we can easily show that the rank of $(X^{**}, Q_vX)$ is $T(T-1)k$, while the number of columns of $(X^{**}, Q_vX)$ is $(T^2-T+1)k$. That is, $F$ does not have full column rank.
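The rank deficiency of $(X^{**}, Q_vX)$ is easy to confirm numerically. In the sketch below (an illustration with random placeholder regressors; the function names are hypothetical), $X_i^{**}$ is built block by block from (V.10-1): the block for the differenced error $u_{i,s+1}-u_{is}$ carries $-(x_{i1},\ldots,x_{iT})$ in row $s$ and $+(x_{i1},\ldots,x_{iT})$ in row $s+1$.

```python
import numpy as np

def build_X_starstar(x_i):
    """T x kT(T-1) matrix X_i** for one unit; x_i is a T x k array."""
    T, k = x_i.shape
    xs = x_i.reshape(-1)                        # (x_i1, ..., x_iT), length kT
    out = np.zeros((T, k * T * (T - 1)))
    for s in range(1, T):                       # differenced error u_{i,s+1} - u_{is}
        block = slice((s - 1) * k * T, s * k * T)
        out[s - 1, block] = -xs                 # row s
        out[s, block] = xs                      # row s+1
    return out

def stacked_instruments(x):
    """x: N x T x k array. Returns X** and Q_v X, both stacked unit by unit."""
    N, T, k = x.shape
    Xss = np.vstack([build_X_starstar(x[i]) for i in range(N)])
    QvX = (x - x.mean(axis=1, keepdims=True)).reshape(N * T, k)
    return Xss, QvX

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, T, k = 40, 3, 2
    Xss, QvX = stacked_instruments(rng.standard_normal((N, T, k)))
    both = np.hstack([Xss, QvX])
    print("columns:", both.shape[1], " rank:", np.linalg.matrix_rank(both))
```

With $T = 3$ and $k = 2$ the combined matrix has $(T^2-T+1)k = 14$ columns but rank $T(T-1)k = 12$, since each column of $Q_vX$ is a linear combination of columns of $X^{**}$.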
Therefore, we have to exclude $Q_vX$, or, as I assume here, the last $k$ columns of $X^{**}$, from $F$ when we proceed with the 2SLS estimation based on (V.11). Then, the 2SLS estimator, $\hat\theta_F$, has the same form as $\hat\theta_G$ except that $F$ replaces $G$:

(V.13)  $\hat\theta_F = (W'\Gamma^{-1/2}F(F'F)^{-1}F'\Gamma^{-1/2}W)^{-1}W'\Gamma^{-1/2}F(F'F)^{-1}F'\Gamma^{-1/2}y$
$\qquad = (W'F(F'\Gamma F)^{-1}F'W)^{-1}W'F(F'\Gamma F)^{-1}F'y$.

The asymptotic distribution of $\hat\theta_F$ is given by:

(V.14)  $\sqrt{N}(\hat\theta_F-\theta) \to N\big(0,\ \sigma_\varepsilon^2\,\mathrm{plim}[(1/N)W'F(F'\Gamma F)^{-1}F'W]^{-1}\big)$.

$\hat\theta_F$ is also the GMM estimator based on all the available exogeneity conditions given in (V.11).

However, $W^{**}$ is irrelevant for the static model. To see why, note that Hausman and Taylor, and Amemiya and MaCurdy, use $(Q_v, R)$ as instruments, not $G = (Q_vX, P_vR)$. It actually does not matter whether we choose $(Q_vX, P_vR)$ or $(Q_v, R)$, because both generate the same estimator; see Breusch, Mizon, and Schmidt [1989]. By the same reasoning, the IV treatment with $F = (W^{**}, Q_vX, P_vR)$ on the static model amounts to that with $(W^{**}, Q_v, R)$. Now, since the columns of $W^{**}$ are linear combinations of the columns of $Q_v$, in the sense that $W^{**} = Q_vW^{**}$, the projection of a variable onto $(W^{**}, Q_v, R)$ must be equal to that onto $(Q_v, R)$. This means that $(W^{**}, Q_v, R)$ provides the same estimator as $(Q_v, R)$, and therefore as $(Q_vX, P_vR)$. In other words, the inclusion of $W^{**}$ in the set of instrumental variables is irrelevant for the estimation of the static model.

However, the situation is different for the dynamic model, because $Q_v$ is no longer a legitimate instrument. (This is why the within estimator is inconsistent.) The existence of the lagged dependent variable as an explanatory variable makes $(Q_v, R)$ substantially different from $G = (Q_vX, P_vR)$. These instrument sets no longer generate the same estimator. This is because $Q_vX$, not $Q_v$ alone, is a legitimate instrument. This fact has a powerful implication.
Since $W^{**}$ and $Q_vX$ are linearly independent, the inclusion of $W^{**}$ in the set of instrumental variables will improve the efficiency of the 2SLS estimator for the dynamic model; i.e., $\hat\theta_F$ is more efficient than $\hat\theta_G$ when $y_{-1}$ is present. Furthermore, when the dimension of $W^{**}$ ($(T^2-T-1)k+(T-1)g$) is considered, the efficiency gain from using $W^{**}$ may be very large. This is an extraordinary case for the general regression model, because the number of instrumental variables based on the exogeneity conditions usually does not change whether the given model is dynamic or static. Therefore, this suggests that the moment conditions under a model using panel data should be carefully counted.

Now, let us include the moment conditions used in the conventional IV approaches for the dynamic model. For this, define $S = (A, F)$. Then, the orthogonality conditions can be expressed by

(V.15)  $\mathrm{plim}(1/N)S'u = 0$.

Since $\mathrm{Cov}(S'u) = \sigma_\varepsilon^2E(S'\Gamma S)$, $S'\Gamma S$ can be used as a consistent estimator of $\mathrm{Cov}(S'u)$. The GMM estimator based on (V.15) is given by:

(V.16)  $\hat\theta_S = (W'S(S'\Gamma S)^{-1}S'W)^{-1}W'S(S'\Gamma S)^{-1}S'y$

where

(V.17)  $\sqrt{N}(\hat\theta_S-\theta) \to N\big(0,\ \sigma_\varepsilon^2\,\mathrm{plim}[(1/N)W'S(S'\Gamma S)^{-1}S'W]^{-1}\big)$.

This is also algebraically identical to the IV estimator using $S$ as the instruments on the transformed equation given in (V.6). APPENDIX B shows

(V.18)  $W'S(S'\Gamma S)^{-1}S'W = W'F(F'\Gamma F)^{-1}F'W + K_1$

where $K_1$ is a positive semidefinite matrix. This means that $\hat\theta_S$ is more efficient than $\hat\theta_F$.

As a final step, let us include the whole set of moment conditions available. Define

$L(\theta) = (H(\theta), F)$.

The notation $L(\theta)$ indicates the fact that, unlike $S$, $L$ is a function of $\theta$. Then,

(V.19)  $\mathrm{plim}(1/N)L(\theta)'u = 0$.

Note that $C = \mathrm{Cov}(L'u) = \sum_{i=1}^{N}E(L_i'u_iu_i'L_i)$ where $L_i = (H_i, F_i)$. Evaluate $L_i$ and $u_i$ using any consistent estimator, $\tilde\theta$, and denote them by $\hat L_i$ and $\hat u_i$. Then, $\hat C = \sum_{i=1}^{N}\hat L_i'\hat u_i\hat u_i'\hat L_i$ is a consistent estimator of $C$.
The GMM estimator, $\hat\theta_L$, minimizes

$(y-W\theta)'L(\theta)\hat C^{-1}L(\theta)'(y-W\theta)$

and its asymptotic distribution is given by:

(V.20)  $\sqrt{N}(\hat\theta_L-\theta) \to N\big(0,\ \mathrm{plim}[(1/N)W'(L+L_D)C^{-1}(L+L_D)'W]^{-1}\big)$

where $L_D = (D, 0, 0)$.

For an efficiency comparison, assume that the $\varepsilon$'s are normally distributed. Then, using LEMMA 1 in APPENDIX A, we can easily show

(V.21)  $\mathrm{plim}(1/N)\mathrm{Cov}(L'u) = \sigma_\varepsilon^2\,\mathrm{plim}(1/N)\begin{bmatrix} H'H & H'\Xi & 0\\ \Xi'H & \Xi'\Xi & 0\\ 0 & 0 & \phi^2R'P_vR\end{bmatrix} + \sigma_\varepsilon^2\begin{bmatrix} J^* & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}$
$\qquad = \sigma_\varepsilon^2\,\mathrm{plim}(1/N)L'\Gamma L + \sigma_\varepsilon^2J^*$

where $\Xi = (W^{**}, Q_vX)$, and where $J^*$ in the last expression is understood to be bordered by zero blocks conformably with $L$. This implies that $L'\Gamma L+NJ^*$ can be used as a consistent estimator of $C$. If we replace $\hat C$ by $L'\Gamma L+NJ^*$, as shown in APPENDIX F, the inverse of the asymptotic covariance matrix of $\hat\theta_L$ is given by:

(V.22)  $W'S(S'\Gamma S)^{-1}S'W + K_2$

where $K_2$ is a positive semidefinite matrix. Therefore, $\hat\theta_L$ is more efficient than $\hat\theta_S$. Actually, $\hat\theta_L$ can be said to be efficient, because it exploits all the available moment conditions.

The following linearized GMM estimator can also be used:

(V.23)  $\tilde\theta_L = (W'(\hat L+\hat L_D)\hat C^{-1}(\hat L+\hat L_D)'W)^{-1}W'(\hat L+\hat L_D)\hat C^{-1}(\hat L+\hat L_D)'(y-P_v\hat u)$

where "^" means "evaluated at a consistent estimator of $\theta$." Again, $\tilde\theta_L$ has the same asymptotic distribution as $\hat\theta_L$.

VI. Conclusion

In this paper, I have adopted standard assumptions for the dynamic panel data model, and I have characterized all of the moment conditions that these assumptions imply. I showed that previous IV estimators do not impose all of the available linear moment conditions, and also that there are some nonlinear moment conditions that they do not incorporate. This reveals the inefficiency of the conventional IV estimators. I propose an efficient GMM estimator. Since the GMM estimator is nonlinear, I also considered an efficient linearized version of the GMM estimator.

I also extended this approach to the dynamic model that includes exogenous variables. I showed that the existing treatments of the static model incorporate some but not all of the restrictions that are implied by strong exogeneity of the exogenous variables.
The extra restrictions implied by strong exogeneity are irrelevant in the static model, but they are relevant in the dynamic model. This fact generates a large number of additional instruments, all of which can be exploited either by a linear IV estimator or by a nonlinear GMM estimator. These extra moment conditions may result in a large gain in efficiency in the dynamic model.

ENDNOTES

This assumption implies that the $y_{i0}$'s are stochastic. Since the $\alpha$'s are random effects, and since $y_{i0}$ may include $\alpha_i$, this seems reasonable. Also, the assumption $E(y_{i0})=0$ is adopted for simplicity. Actually, it is enough to assume that $\Sigma$ is the second-moment matrix: $\Sigma = E[(y_0, \alpha, \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T)'(y_0, \alpha, \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T)]$.

$J$ contains $\sigma_\varepsilon^2$. Therefore, $J^*$ must be estimated using any consistent estimator of $\sigma_\varepsilon^2$.

We assume here that $\phi^2$ is known. For the estimation of $\phi^2$, see Hausman and Taylor [1981].

APPENDICES

APPENDIX A

To derive the GMM estimator based on the restrictions in (II.5), we need to know the covariance matrix of $A_i'u_i$ under (SA) given in Section III. For this, the following lemmas are useful.

LEMMA 1
Suppose that $\xi_1$, $\xi_2$, and $\xi_3$ are random variables. Assume that $\xi_1$ is independent of $\xi_3$, and $E(\xi_1)=0$. Also assume that $\xi_2 = a\eta_1 + b\eta_2$, where $\eta_1$ is independent of $\xi_3$, $\eta_2$ is independent of $\xi_1$, and $a$ and $b$ are some constants. Then,
$E(\xi_1\xi_2\xi_3) = E(\xi_1\xi_2)E(\xi_3)$.

Proof.
$E(\xi_1\xi_2\xi_3) = aE(\xi_1\eta_1\xi_3) + bE(\xi_1\eta_2\xi_3)$
$= aE(\xi_1\eta_1)E(\xi_3) + bE(\xi_1)E(\eta_2\xi_3)$
$= aE(\xi_1\eta_1)E(\xi_3)$
$= E[(a\eta_1 + b\eta_2)\xi_1]E(\xi_3)$
$= E(\xi_1\xi_2)E(\xi_3)$
QED

LEMMA 2
$E[y_{it}y_{ih}(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})] = E(y_{it}y_{ih})\,E[(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})]$, for $t \le h$, [...]. Then the typical element of $\operatorname{Cov}(A_i'u_i) = E(A_i'u_iu_i'A_i)$ must be $E(y_{it}y_{ih}(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik}))$, where $t \le h$, $s > t$, and $k > h$. By LEMMA 2,
$E(y_{it}y_{ih}(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})) = E(y_{it}y_{ih})\,E[(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})]$.
Using this fact, we can show
$E(A_i'u_iu_i'A_i) = E[A_i'E(u_iu_i')A_i] = \sigma_\varepsilon^2E[A_i'\Gamma_iA_i]$
where $\sigma_\varepsilon^2\Gamma_i = E(u_iu_i')$.
It is a well-known fact that $\Gamma_i = Q + \phi^2P$, where $\phi^2 = (\sigma_\varepsilon^2 + T\sigma_\alpha^2)/\sigma_\varepsilon^2$; see Hausman and Taylor [1981]. Therefore, we have
$E(A_i'u_iu_i'A_i) = \sigma_\varepsilon^2E[A_i'A_i]$.
QED

To derive the GMM estimator, we note that
$\operatorname{plim}\,(1/N)\operatorname{Cov}(A'u) = \operatorname{plim}\,(1/N)\sum_{i=1}^{N}\operatorname{Cov}(A_i'u_i)$
$= \operatorname{plim}\,(1/N)\sum_{i=1}^{N}E(A_i'u_iu_i'A_i)$
$= \sigma_\varepsilon^2\cdot\operatorname{plim}\,(1/N)\sum_{i=1}^{N}E[A_i'A_i]$
$= \sigma_\varepsilon^2\cdot\operatorname{plim}\,(1/N)E(A'A)$
$= \sigma_\varepsilon^2\cdot\operatorname{plim}\,(1/N)A'A$.
This shows that $(A'A)$ could be used as a consistent estimator of $\operatorname{Cov}(A'u)$. (Actually it must be $\hat\sigma_\varepsilon^2(A'A)$, where $\hat\sigma_\varepsilon^2$ is a consistent estimator of $\sigma_\varepsilon^2$. However, $\hat\sigma_\varepsilon^2$ would cancel and therefore is irrelevant for the derivation of the GMM estimator.) By minimizing
$(u'A)(A'A)^{-1}(A'u) = \{(y-\delta y_{-1})'A\}(A'A)^{-1}\{A'(y-\delta y_{-1})\}$,
we obtain the GMM estimator of $\delta$, which is exactly identical to the IV estimator given in (II.6).

APPENDIX B

From (III.6-1), we have (with $i$ suppressed):

(A.B-1)
$E(y_0u_1) = E(y_0u_2) = E(y_0u_3) = \cdots = E(y_0u_{T-1}) = E(y_0u_T)$
$E(u_1u_2) = E(u_1u_3) = \cdots = E(u_1u_{T-1}) = E(u_1u_T)$
$E(u_2u_3) = \cdots = E(u_2u_{T-1}) = E(u_2u_T)$
$\vdots$
$E(u_{T-2}u_{T-1}) = E(u_{T-2}u_T)$

This set of equations can be rewritten as follows:

(A.B-2)
$E[y_0(u_2-u_1)] = E[y_0(u_3-u_2)] = \cdots = E[y_0(u_T-u_{T-1})] = 0$
$E[u_1(u_3-u_2)] = \cdots = E[u_1(u_T-u_{T-1})] = 0$
$\vdots$
$E[u_{T-2}(u_T-u_{T-1})] = 0$

Multiplying the equations in the first row of (A.B-2) by $\delta$ and adding the equations in the second row gives:

(A.B-3)  $E(y_1(u_3-u_2)) = \cdots = E(y_1(u_T-u_{T-1})) = 0$

The same procedure for the other equations generates all the moment conditions given in (III.7-1).

Now, we can rewrite (III.6-2) as follows:

(A.B-4)  $E[u_1(u_2-u_1)] = E[u_2(u_3-u_2)] = \cdots = E[u_{T-1}(u_T-u_{T-1})]$

These restrictions can be equivalently expressed by

(A.B-5)  $E[u_1(u_2-u_1)-u_2(u_3-u_2)] = E[u_2(u_3-u_2)-u_3(u_4-u_3)] = \cdots = E[u_{T-2}(u_{T-1}-u_{T-2})-u_{T-1}(u_T-u_{T-1})] = 0$

Since $E[y_0(u_2-u_1)] = E[y_1(u_3-u_2)] = 0$ by (A.B-2) and (A.B-3),
$E[u_1(u_2-u_1)-u_2(u_3-u_2)] + \delta E[y_0(u_2-u_1)] - \delta E[y_1(u_3-u_2)] = E[y_1(u_2-u_1)-y_2(u_3-u_2)] = 0$.
Similarly, we can show that
$E[y_t(u_{t+1}-u_t)-y_{t+1}(u_{t+2}-u_{t+1})] = 0$
for any $1 \le t \le T-2$.
Finally, (111.6-3) implies: (A.B-6) E[(u2+u1)(u2-u1)] = E[(u3+u2)(u3-u2)] = ~°°° = E[(uT+uT_1)(uT—ur_1)] = 0 Since 023=024=~-=02T and nl3=n14=---=nu by (111.6-1) and (111.6-2), Therefore, E[ (1124-111) (‘12-‘11) +113 (112-111) +1.14 (Uz‘ul) + ° ‘ ' ° +uT(u2-u1)] = 0 E(fi(u2-u1)) = o _ T where u=(1/T)tE1ut. In essentially the same way, we can show that all the restrictions in (III.7-3) are justified. 132 APPENDIX C The following lemmas and theorems will help to derive the covariance matrix of Hi'ui under (SA). LEMMA 3 All the assumptions in (SA) are satisfied. Then, (with i suppressed) E(YtYh(‘e+1"€e) (eh+1'6h) )=E(YtYh) 1“ (€e+1"€e) (5h+1"€h) ) where t=0,1,°'-,T-2, h=1,2,-°°,T-1, and tt2h, 68+I-68 is independent of Y£Yh, and E(es+1-ea)=0. If h+1=s, eh+1 is independent of ytyh and 5h is independent of 63+I-68. If h+1t, s,k=1,2,-°-,T-1, and 6:13:16“. Proof. Let ea+1=élet-(ea+1+ea); then E=(68+1+68)/T+e3+1/T. Therefore, we have (A.c-1) E(ytz(€s+l-es) (‘k+1‘€k)] = E[Yt(es+l+€s) (63+1-es) (5k+1"€k) l/T + E[Ytee+1(‘a+1"e) (‘k+1"k) ”T = a + b Note that yt is independent of 55+1 and £8. Obviously, E(yt)=0. If k>t, 5114-1“): is independent of yt. If k=t, ck” is independent of yt, and ER is independent of (63+1+6s)(es+1-€s)° If ks+1 or kt+1 or hh+1 or k(ek.1-ek)1 + tEIe‘) - 30.41/T. If k=h-1, (A.c-14) EtehEIeh.1-eh)<(‘T-2) . d = E(e4)-3oe4. Proof. A typical element of E(Bn'uiui'Bu) is given by: 0’“ (3'15) E[{Yt(“t+1'ut) ‘Yt+1(ut+2’ut+1) }a(uk+1'uk) 1 E[{Yt(‘t+1'€t) ”Yt+1(€t+2’et+1) } (0+2) (6k+1-6k)] EtytIa+E> (em-ct) (em-em ' E[Yt+1(a+z)(€t+2'5t+1)(€k+1‘5k)] By LEMMA 5 and 10, we have (A.c-17) E(yt(a+2) (eta-ct) (€k+1-€k)] = EEYt(a+3)]E[(€t+1-6t)(ska-61.)] + g... 
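(A.C-18)  $E[y_{t+1}(\alpha+\bar\varepsilon)(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{k+1}-\varepsilon_k)] = E[y_{t+1}(\alpha+\bar\varepsilon)]\,E[(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{k+1}-\varepsilon_k)] + g_{t+1,k}$

Substituting (A.C-17) and (A.C-18) gives

(A.C-19)  $E[\{y_t(u_{t+1}-u_t)-y_{t+1}(u_{t+2}-u_{t+1})\}\bar u(u_{k+1}-u_k)]$
$= E[y_t(\alpha+\bar\varepsilon)]\,E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] - E[y_{t+1}(\alpha+\bar\varepsilon)]\,E[(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{k+1}-\varepsilon_k)] + b_{tk}$
$= E(y_t\bar u)\,E[(u_{t+1}-u_t)(u_{k+1}-u_k)] - E(y_{t+1}\bar u)\,E[(u_{t+2}-u_{t+1})(u_{k+1}-u_k)] + b_{tk}$

where $b_{tk} = g_{tk} - g_{t+1,k}$. We can easily show that

(A.C-20)  $b_{tk} = (2/T)d$, if $k=t$; $\ -(1/T)d$, if $k=t+1$ or $k=t-1$; $\ 0$, otherwise.

(A.C-19) and (A.C-20) imply the result.
QED

The following lemmas help us derive the covariance matrix of $B_{2i}'u_i$.

LEMMA 11
$E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)^2] = E(\bar\varepsilon^2)\,E[(\varepsilon_{t+1}-\varepsilon_t)^2] + 2[E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2$.

Proof.
$E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)^2] = E[\bar\varepsilon^2(\varepsilon_{t+1}^2 - 2\varepsilon_{t+1}\varepsilon_t + \varepsilon_t^2)]$
$= E(\bar\varepsilon^2\varepsilon_{t+1}^2) - 2E(\bar\varepsilon^2\varepsilon_{t+1}\varepsilon_t) + E(\bar\varepsilon^2\varepsilon_t^2)$
$= [E(\varepsilon^4)+(T-1)\sigma_\varepsilon^4 - 4\sigma_\varepsilon^4 + E(\varepsilon^4)+(T-1)\sigma_\varepsilon^4]/T^2$
$= 2\sigma_\varepsilon^4/T + 2[E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2$
$= E(\bar\varepsilon^2)\,E[(\varepsilon_{t+1}-\varepsilon_t)^2] + 2[E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2$
QED

LEMMA 12
$E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+2}-\varepsilon_{t+1})] = E(\bar\varepsilon^2)\,E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+2}-\varepsilon_{t+1})] - [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2$.

Proof.
$E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+2}-\varepsilon_{t+1})]$
$= E(\bar\varepsilon^2\varepsilon_{t+1}\varepsilon_{t+2}) + E(\bar\varepsilon^2\varepsilon_t\varepsilon_{t+1}) - E(\bar\varepsilon^2\varepsilon_{t+1}^2) - E(\bar\varepsilon^2\varepsilon_t\varepsilon_{t+2})$
$= [2\sigma_\varepsilon^4 + 2\sigma_\varepsilon^4 - E(\varepsilon^4) - (T-1)\sigma_\varepsilon^4 - 2\sigma_\varepsilon^4]/T^2$
$= -\sigma_\varepsilon^4/T - [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2$
$= E(\bar\varepsilon^2)\,E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+2}-\varepsilon_{t+1})] - [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2$
QED

LEMMA 13
$E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+j+1}-\varepsilon_{t+j})] = E(\bar\varepsilon^2)\,E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+j+1}-\varepsilon_{t+j})]$, for $j \ge 2$.

Proof.
$E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+j+1}-\varepsilon_{t+j})]$
$= E(\bar\varepsilon^2\varepsilon_{t+1}\varepsilon_{t+j+1}) + E(\bar\varepsilon^2\varepsilon_t\varepsilon_{t+j}) - E(\bar\varepsilon^2\varepsilon_{t+1}\varepsilon_{t+j}) - E(\bar\varepsilon^2\varepsilon_t\varepsilon_{t+j+1})$
$= [2\sigma_\varepsilon^4 + 2\sigma_\varepsilon^4 - 2\sigma_\varepsilon^4 - 2\sigma_\varepsilon^4]/T^2 = 0$
$= E(\bar\varepsilon^2)\,E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+j+1}-\varepsilon_{t+j})]$
QED

Now we can derive the covariance matrix of $B_{2i}'u_i$.

THEOREM 6
$E(B_{2i}'u_iu_i'B_{2i}) = \sigma_\varepsilon^2E(B_{2i}'B_{2i}) + (d/T^2)J_3$
where
$J_3 = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & -1 \\ 0 & 0 & 0 & -1 & 2 \end{bmatrix}_{(T-1)\times(T-1)}$

Proof. A typical element of $E(B_{2i}'u_iu_i'B_{2i})$ takes the form of

(A.C-21)  $E[\bar u^2(u_{t+1}-u_t)(u_{k+1}-u_k)]$
$= E[(\alpha+\bar\varepsilon)^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]$
$= E[\alpha^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] + 2E[\alpha\bar\varepsilon(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] + E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]$
$= E(\alpha^2)\,E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] + E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]$

By LEMMA 11, 12, and 13, we have

(A.C-22)  $E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] = E(\bar\varepsilon^2)\,E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] + d_{tk}$

where $d_{tk} = (2/T^2)d$, if $k=t$; $\ -(1/T^2)d$, if $k=t+1$ or $k=t-1$; $\ 0$, otherwise.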
Substituting (A.C-22) in (A.C-21) gives (A.c-23) E[Gz(ut+l'ut)(uk+l'uk)] = E[(a+z)2]E[(€t+l'€t)(ek+l-ek)]+dtk = E(62)E[(ut+1'ut) (um-uh) 1w...- (A.C-23) implies the result. QED Since H=[A B1 B2] , and H1=[A1 B11 B21], Theorem 1-6 can be used to derive the covariance matrix of H'u. (A.C-24) (1/N)Cov(H'u) (1/N)E(H'uu'H) N N 2 * 145 2 N ' 2 * ae (1/N) iE‘1E(Hi Hi) + 05 J 062(l/N)E(H'H) + ag2J* where o o l o J* = o (d/a:+d§)J1 (1/T)(d/o:)J2 0 (1m (ca/aim, (1/T2) (d/afin3 If e is normally distributed, d=0. Therefore, under normality of e, .*=[ where J=a¢52J1 . COO OQO COO I__J 146 APPENDIX D Here, I show the explicit forms of $1 and #2 given in (IV.16) and (IV.17), respectively. The e's are assumed to be normally distributed. Then observe N (A.D-l) plim(1/N)H'H p1im(1/N)1§1H1'Hi N = plim(1/N) 21E(H1'Hi) 1.: 2 2 -2 2yio 2Yioui i=1 N = plim(1/N) E E[ _ _ 2 [ 00 00a ] = 2 2 2 o aa+a€/2 0a (A.D-2) plim(1/N)H'y_f plim(1/N) g E Yio(Y11‘Yio’ i=1 “i‘Yii'Yio’ 2 = [ (6-1)oo+00a ] 2 2 (6 1)00a+0a+ae/2 (A.D-3) plim(1/N)D'y_1= plim(1/N) g E[ 0 ] 1:1 (Yio+yi1)(‘iz"i1) _ o [ -o§/2 ] Using (A.D-1), (A.D-2), and (A.D-3), we have (A.D-4) plim(1/N)y_1'D(H'H)’1H'y_1 = -aez/4 (A.D-S) plim(1/N)y_1'D(H'H)'1D'y_1 = mews/{of(1-po.’-I+o.2/2I where p0a=00a/(aoaa). 
Substituting (A.D-4) and (A.D—S) into (IV.16) gives 147 (A.D-6) t1 = plimII/NIy;1'Pay-I+(1/8)0.4/{oa2(1-p0a2)+o.2/2}-(1/2)a.2 Also, using (A.D—1), (A.D-2), and (A.D-3), we can show (A.D-7) plim(1/N)y;1'MAB2 = plim(1/N)y;1'B2 - plim(1/N)y_1A(A'A)'1A'82 (002002-00a2)/002 + 062/2 = aa2(1-p0a2) + 052/2 (A.D-8) p1im(1/N)BZ'MABZ plim(l/N)BZ'BZ - p11m(1/N)132'A(A'A)‘1A'B2 = 2{ (oa2002-00a2) #50244162 /2} = 2[oa2(1-p0a2)+a£2/2] Therefore, (A.D-9) plim(1/N)y_1'MABz(BZ'MAB2)'1BZ'MAy_1 = (1/2) { (oazaoz-aof) /aoz+ae2/2} = (1/2)[0a2(1-90a2)+0.2/2] Since plim(1/N)y;1'ny_1== plim(1/N)y_1'PHy_1 -plim(1/N)y_1‘MAB2 (B2 'MABZ) ‘132 'MAy_1 , (A.D-10) t2 = plim(1/N)y;1'PAy_1 = plim(1/N)y_1'PHy_1 -(1/2)[0a2(1-90a2)+0.2/2] 148 APPENDIX E Here, we show the efficiency gain of as defined in (v.16) over 8, in (v.13). Consider the inverse of the asymptotic covariance matrix of 81,; W'F(F'I‘F)'1F'W. For simplicity , let E=[w** QVX]. E' 2 E'E 0 (A.E-l) F'FF = [ R'Pv ](Qv+¢ Pv)(E PVR) = [ o ¢2R,PVR ] [ E'y_1 E'X E'Z ] EI (A.E-Z) F'W = [ ](y_1 x Z) = . ' ' R va_1 R va R pvz I R PV Using (A.E-l) and (A.E-2), some straightforward algebra shows I I _1 y-1 PEy-l y-1 PEX ° (A.E-3) W'F(F'FF) F'W = X'PEy X'PEX o o o o _ I I I . y-1 PPvRy-l Y—i PPVRX Y—i ppvRz 2 I I I + e x PPvRy x PPva x vaRz Z'P y Z'P x 2'? z PvR PVR PvR . Now, consider the inverse of the asymptotic covariance matrix of as; W'S(S'FS)'18'W AI A'A A'E o (A.E-4) SIrs= E' (Qv+¢2Pv)(A E F)= E'A E'E o RIPv o o ¢2R'PVR A' AIy_1 A'X o (A.E-S) S'W = E' (y_1 x 2) = E'y E'X o I I I I R pv R va_1 R pvx R pvz Then, after some matrix operations, we have 149 (A.E-S) WIS(SIrS)’1SIw y_1'A y_1'E AIA A'E ’ AIy_1 A'x : o I = [ [ x'A x'E ][E'A E'E] [E'y_1 E'x] : 0 ] """"""""'6 """"""""" T"6 - I I I . y-1 PP RY-i y-1 PP RX Y—i PP Rz 2 v IP vx X'P vZ + ¢ X P y_ X PvR 1 PVR PVR ZIP y_ ZIP x ZIP Z PvR 1 PVR PvR . Note that -1 A'A A'E 1 _1 _1 (A.E-7) [ ] = [ ](A'MEA) (1, -A'E(E'E) ) E'A E'E -(EIE)’1EIA o o + [ o (1:I1:)'1 ] A'y_1 A'x] '1 (A'E-8) (I! 
“A'E(E'E) )[ I I E y_1 E X —_- I I (A MEy_1, A MEX) = (A'MEY—ll 0) Therefore, y_1'A y_1'E A'A A'E '1 AIy_1 A'x (A'E'9) [ X'A X'E ][ E'A E'E ] [ E'y_1 E'X ] y_1'A y_1'E 0 0 A'y_1 A'x -1 [ X'A A'E ][ 0 E(E'E) ][ E'y_1 E'X ] Y 'M +[ E“ o ]A'MEA(A'MEy_1, 0) = [ y-1'PEY-1 Y—i'P X ] + [ K11 ° ] I I x PEy_1 x PEX o o where Kn=y_1IMEA(AIMEA)‘1AIMEy-1. Substituting (A.E-9) into 150 (A.E-6) gives (A.E-lO) WIS(SIPS)’1SIw = W'F(F'I‘F)'1F'W + I<1 where 151 APPENDIX E If the e's are normally distributed, the inverse of the asymptotic covariance matrix of 8L is given by: (A.F-1)‘WI(L+LD)¢‘1(L+LD)IW * -1 I I I I I Y_1 (H+D) y_1 E y_1 PVR H H+J H E o = X'(H+D) X'E X'PVR E'H E'E o 2 I g ' z D 0 Z PVR o o e R PvR (H+D)y_1 (H+D)'X (H+D)'X E'y_1 E'X 0 I I I R P y__1 R PvX R Pvz X'(H+D) X'E E'H E'E ----------------------------: ------------------- I- ( Z,D o )[H'H+J* H'E] 1[(H+D)y_1 (H+D)'x] E'H E'E E'y_1 E'x ”[y_1'(H+D) Y-1.E] [H'H+J* H'E] -1[(H+D)y_1 (H+D) 'X] I I I -§-_§li§in----Zl§ ..... Elfi------l-_------9--- ' H'H+J H'E D'z I< Z'D 0 )[ E'H E'E J E o I . y-1 PP RY-1 Y—l PPVRX y--1 PPVBZ 2 + ¢ x P y_ X'P x X'P z PvB 1 Pv PvB ZIP y ZIP x ZIP Z PvB 1 PVB PvB _ Note that HIH+J* H'E "1 1 * _1 (A.F-2)[ ] = [ _1 ](H'P H+J )(1 -HE(E'E) ) E'H E'E -E(E'E) H I '1 ] + o (E'E) 152 (H+D)y_1 (H+D)'X (A.F-3) ( 1 -HIE(EIE)'1 )[ _1 D'z (A.F-4) ( I -H'E(E'E) )[ ] = D'z 0 Substituting (A.F-2), (A.F-3), and (A.F-4) into (A.F-S) gives (A.F-S) WI(L+LD)¢'1(L+LD)IW = W'F(F'F)'1F'W 'M H 'D y-l E y--1 * + X'D (H'MEH+J )(H'MEy +DIy_1 D'X D'Z) Z'D = W'F(F'F)'1F'W * I: It I I I I y_l MEA y_1 ME(B +D ) A MBA A MEB + 0 X'D* * * * * 2 0 Z'D B 'M A B 'M B +0 J** E E e A'M y_1 0 ° * e [(3 +0 )‘MEy_1 D*Ix D*IZ ] where B*=(B1 B2); D*=(0 D2); J 0 J** = 02 [ ] 0 0 Note that I: I I A MEA A MEB ] (A F.6)I:B*'M 3*IM B*+ 2J** EA “e -1 E [(A'MEAY-1 A'MEB*] * (B -I ' * ** * -1 — M B +J -B 'MEA(A'MEA) A'M 3* E E ) 153 ' . _ _ (A'MEA)-1 o °(B* MEA(A MBA) 1 1) + [ 0 o ] = K3 . 
[ “a“ Z ]

Substitute (A.F-6) into (A.F-5); then, we have

(A.F-7)  $W'(L+L_D)C^{-1}(L+L_D)'W = W'F(F'\Gamma F)^{-1}F'W + K_1 + K_2 = W'S(S'\Gamma S)^{-1}S'W + K_2$

where

$K_2 = \begin{bmatrix} y_{-1}'M_EA & y_{-1}'M_E(B^*+D^*) \\ 0 & X'D^* \\ 0 & Z'D^* \end{bmatrix} K_3 \begin{bmatrix} A'M_Ey_{-1} & 0 & 0 \\ (B^*+D^*)'M_Ey_{-1} & D^{*\prime}X & D^{*\prime}Z \end{bmatrix}$

which is a positive semidefinite matrix. This shows that $\hat\theta_L$ is more efficient than $\hat\theta_S$.

REFERENCES

Amemiya, T. (1974), "The Nonlinear Two-stage Least-squares Estimator," Journal of Econometrics, Vol. 2, pp 105-110.
Amemiya, T., and MaCurdy, T. (1986), "Instrumental-variable Estimation of An Error-components Model," Econometrica, Vol. 54, pp 869-880.
Anderson, T., and Hsiao, C. (1981), "Estimation of Dynamic Model with Error Components," Journal of the American Statistical Association, Vol. 76, pp 598-606.
Arellano, M., and Bond, S. (1988), "Some Tests of Specification for Panel Data: Monte Carlo Evidence and Application to Employment Equation," Working Paper No. 88/4, Institute for Fiscal Studies, London.
Balestra, P., and Nerlove, M. (1966), "Pooling Cross Section and Time Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas," Econometrica, Vol. 34, pp 585-612.
Breusch, T., Mizon, G., and Schmidt, P. (1989), "Efficient Estimation Using Panel Data," Econometrica, Vol. 57, pp 695-701.
Chamberlain, G. (1987), "Asymptotic Efficiency in Estimation with Conditional Moment Restrictions," Journal of Econometrics, Vol. 34, pp 305-334.
Hausman, J. (1978), "Specification Tests in Econometrics," Econometrica, Vol. 46, pp 1251-1271.
Hausman, J., and Taylor, W. (1981), "Panel Data and Unobservable Individual Effects," Econometrica, Vol. 49, pp 1377-1399.
Holtz-Eakin, D. (1988), "Testing for Individual Effects in Autoregressive Models," Journal of Econometrics, Vol. 39, pp 297-307.
Hsiao, C. (1982), "Formulation and Estimation of Dynamic Models Using Panel Data," Journal of Econometrics, Annals of Applied Econometrics, Vol. 18, pp 47-82.
Hsiao, C. (1986), Analysis of Panel Data, New York: Cambridge University Press.
Kiefer, N. (1980), "Estimation of Fixed Effect Models for Time Series of Cross-sections with Arbitrary Intertemporal Covariances," Journal of Econometrics, Vol. 14, pp 195-202.
Newey, W. (1985), "Generalized Method of Moments Specification Testing," Journal of Econometrics, Vol. 29, pp 229-256.