THREE ESSAYS ON SHARE CONTRACTS,
LABOR SUPPLY, AND THE ESTIMATION OF
MODELS FOR DYNAMIC PANEL DATA

BY
Seung Chan Ahn

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

1990

ABSTRACT
THREE ESSAYS ON SHARE CONTRACTS,

LABOR SUPPLY, AND THE ESTIMATION OF
MODELS FOR DYNAMIC PANEL DATA

BY
Seung Chan Ahn

This dissertation deals with three topics: share
contracts, labor supply, and the estimation of models for
dynamic panel data. Chapter 1 proposes a model which predicts,
under generally acceptable assumptions, fixed wages across
different economic states and lay-offs for bad states, and
shows that a share contract exists that Pareto-dominates and
has no less employment than the fixed-wage contract. Chapter
2 considers joint estimation of the determinants of the
employment status of married women, their labor-force
participation decisions, and their market wages. The
empirical results imply that recognizing frictions in the
labor market is important to explain the determinants of
individuals' employment status in a concrete and correct way.
The estimation procedure including the wage equation generates
more significant and reasonably signed estimates. Chapter 3
considers a dynamic model using panel data which include a
large number of cross-section observations, but only over a
short period of time. This chapter proposes an estimator that
is efficient under general circumstances.

TABLE OF CONTENTS

Introduction

Chapter 1
A Share Economy as a Work Incentive Device

I. Introduction
II. Wage Contract Model
III. Share Contract Model
IV. Conclusion

Appendix

References

Chapter 2
The Joint Estimation of a Model of Labor
Force and Employment Decisions and Market Wages

I. Introduction
II. Model
III. Data
IV. Empirical Result
V. Conclusion

Appendix A
Appendix B
Appendix C
Appendix D

References

Chapter 3
Efficient Estimation of Models for Dynamic
Panel Data

I. Introduction
II. Conventional IV Methods
III. Derivation of Moment Conditions
IV. Estimation
V. Estimation with Exogenous Variables
VI. Conclusion

Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F

References

INTRODUCTION

This dissertation deals with three topics: share
contracts, labor supply, and the estimation of models for
dynamic panel data. Since each topic is independent of the
others, each of the following chapters focuses on one topic and
contains its own introduction and conclusion sections.

Chapter 1 attempts to provide a theoretical basis for
share contracts. There are many studies comparing share and
fixed-wage contracts from the point of view of welfare and/or
employment. However, their methods of comparison are
arbitrary, in the sense that they simply assume the fixed-wage
contract to be optimal among wage contracts. A better
comparison could be done by investigating the conditions which
generate fixed wages across different economic states, and
examining whether a share contract could perform better than
a fixed-wage contract under those conditions. For this
reason, I propose a microeconomic model which predicts, under
generally acceptable assumptions, fixed wages across different
economic states and lay-offs for bad states. I then show that
a share contract exists that Pareto-dominates and has no less
employment than the fixed-wage contract. This result implies
that share contracts could not only improve every economic
agent's well-being, but also stabilize the employment level in
the economy.

Chapter 2 considers joint estimation of the determinants
of the employment status of married women, their labor-force
participation decisions, and their market wages.
Many of the previous studies of labor supply assume that the
employment status of an individual is determined solely by
his/her desire to work. Those studies treat the unemployed
and non-participants as behaviorally equivalent, ignoring
frictions in the labor market. In this chapter, the
unemployed are regarded as willing to work but not successful
in their job search, and so they are treated as
behaviorally different from non-participants. Therefore, the
model considered in this chapter consists of two equations
describing employment and labor-force decisions, and also a
wage equation. The empirical results given in this chapter
imply that recognizing frictions in the labor market is
important to explain the determinants of individuals'
employment status in a concrete and correct way, and that the
traditional labor-supply model generates biased estimates of
the determinants of willingness to work. Furthermore,
compared to other methods for joint estimation of labor-force
and employment decisions, the estimation procedure including
the wage equation generates more significant and reasonably
signed estimates. Significant sample selection biases
generated by employment and participation decisions are also
detected in the distribution of observed wage rates, and they
are successfully corrected by the joint estimation procedure.

Chapter 3 considers a dynamic model using panel data
which include a large number of cross-section observations,
but only over a short period of time. This chapter proposes
an estimator that is efficient under general circumstances.
Several authors have proposed simple but consistent
instrumental-variable (IV) estimators, which are identical to
generalized-method-of-moments (GMM) estimators based on some
available moment conditions. GMM estimators are efficient in
general circumstances if all that is known is a certain set of moment
restrictions. This chapter adopts standard assumptions for
the dynamic panel data model, and characterizes all of the
moment conditions that these assumptions imply. It turns out
that previous studies do not impose all of the available
moment conditions, which reveals the inefficiency of the
previous IV estimators. The estimator proposed in this
chapter is efficient because it is obtained by exploiting all
useful information from the standard assumptions.

Chapter 1

A Share Economy as a Work Incentive Device

I. Introduction

Stagflation during the last two decades has put an end to
the golden era of the Keynesian doctrine. If the Phillips
curve is downward sloping, then a trade-off between
unemployment and inflation must exist, and the government can
choose a desirable combination of them. However, the lesson
we have learned during the last 20 years is that the long-run
Phillips curve seems to be vertical. High unemployment and
inflation do not alternate; they rather frequently occur
simultaneously. Government is no longer able to buy
employment at the cost of inflation. In a Keynesian
framework, an expansionary policy successfully reduces the
unemployment rate, since inflation drives down real wages.
However, the public's expectation of inflation seems to catch
up to actual inflation so quickly that nominal wages increase
at approximately the same rate as the general price level.
Therefore, government's expansionary policies often create
high inflation without affecting employment even during
recessions. This means that real as well as nominal wages
seem to be rigid.

Through a series of publications, M. Weitzman argues that
stagflation is tied to wage rigidity. He argues that
"stagflation is just an unfortunate consequence of the wage-
payment system."1 His basic view is identical to the
Keynesians' in that he believes that all problems have their
origins in wage rigidity. However, his prescription differs
from the Keynesians' in that he emphasizes the necessity of
making wages flexible. His suggestion is to tie compensation
to "an appropriate index of the firm's performance, say a
share of its revenues or profits."2 For simplicity, imagine
a payment system in which the wage rate is determined by two
components -- some fixed compensation and a portion depending
on the firm's total revenue. In this case, wages will move in the
same direction as the firm's performance. By a simple
algebraic equation, Weitzman shows that firms are always
characterized by an excess demand for labor. Therefore, in a
share economy, firms behave like vacuum cleaners, constantly
searching for employees and eagerly sucking up all the
unemployed. In a share economy, capitalism guarantees not only
Consumer Sovereignty, but also Worker Sovereignty.3
With this belief, Weitzman suggests that the government should
offer tax incentives in order to get firms to adopt share
contracts.

There are two main criticisms of Weitzman's analysis.
His ideas can be summarized in two propositions. First, in a
share economy compensation is no longer rigid, and it adjusts
in a manner that leads the economy to full employment even in
the short run. Second, the share economy improves the welfare
of economic agents. Nordhaus [1988] and John [1987] have
criticized Weitzman's first proposition. Nordhaus argues that
Weitzman's analysis of the short-run behavior of a share
economy omits a detailed specification of labor supply, and
shows that the excess-demand proposition no longer holds when
labor supply constraints are introduced. John shows that a
share economy may actually lead to greater employment
fluctuation depending on the specification of labor supply
curves. Share contracts will have less employment fluctuation
only if the share parameters are determined by correct
information about the demand for and supply of labor.

Cooper (see Nordhaus and John [1986]), amongst others,
has criticized the second proposition. Implicit contract and
efficiency wage theories provide some intuition on the rigid
wage phenomena. The former suggests that risk-averse workers
would prefer rigid wages. The latter argues that wages are
rigid downward in order to prevent workers from shirking. If
wage rigidity comes from the self-interests of economic
agents, then the payment system will not allow wages to
fluctuate. Weitzman does not deny that fixed wages would be
optimal at a microeconomic level. His basic assertion is that
wage contracts are not optimal at a macroeconomic level.
In order to support this claim, Weitzman [1985] shows that
share contracts could increase employment while at the same
time offering approximately the same compensation as under
fixed-wage contracts. This result is based on a macroeconomic
model. In response, Cooper [1988] shows that an injection of
share contracts into one sector of a two-sector economy will
yield a Pareto-improving resource allocation only in a special
case. That is, a share economy, in which all firms adopt
share contracts, may be superior to a wage economy, but share
contracts adopted by a subset of sectors may not help the
whole economy.

This paper is an attempt to provide support for the share
economy, which Weitzman's model fails to do. If there exists
a share contract that Pareto-dominates, and has no less
employment than, a wage contract in a microeconomic model, then
the two main criticisms described above will no longer be
valid. The Pareto-dominating share contract will be able to
help the whole economy without sacrificing some group's self-
interest. Employment will also fluctuate less in a share
economy than in a wage economy. I demonstrate this in two
steps.

Some implicit contract models predict constant
compensation across states of nature under fairly strong
assumptions. My first focus is on whether constant
compensation could be observed under more general conditions,
in particular, when it is necessary to monitor labor's effort
on the job. I assume only risk-neutral firms and risk-averse
workers. Workers may have an incentive to shirk on the job,
once they are employed. Shapiro and Stiglitz's [1984]
efficiency wage model shows how the monitoring cost on workers
could bring about downward wage rigidity and unemployment. My
key point is the introduction of a no-shirking condition (NSC)
to the implicit contract framework as an incentive compatible
mechanism. My model predicts fixed compensation across states
with lay-offs in bad states. If there is no monitoring cost,
firms maintain a higher employment level even in bad states to
insure risk-averse workers. If there is a monitoring cost,
firms have an incentive to decrease the employment level to
save on monitoring. They cannot easily cut compensation for
workers, because lower wages give the employed an incentive to
shirk. This intuition partly explains my results.

As a second step, I compare share contracts with the
optimal wage contract. Not surprisingly, there always exists
a share contract which Pareto-dominates the optimal wage
contract. The reason is quite simple. Under a wage contract,
workers still get the agreed compensation regardless of
whether or not they shirk. In contrast, under a share
contract, shirkers themselves suffer from their shirking,
because the firm's total output and revenue decrease when
shirking occurs. Therefore, firms can reduce the monitoring
cost per worker due to workers' decreased incentive to shirk.
Furthermore, I show the existence of some forms of share
contracts which have no less employment than the wage
contract.

My model has a very different implication from
Weitzman's. The share contracts in Weitzman's model Pareto-
dominate the wage contracts at a macroeconomic level, not at
a microeconomic level. The share contracts have a positive
effect on the economy-wide employment level, which in turn
increases aggregate demand, improving all firms' market
conditions. The workers who are already employed will
initially suffer from lower wages. However, their firms'
improved profitability under a share economy will eventually
compensate them with higher wages. This reasoning
is correct only when sufficiently many firms adopt share
contracts. Instead, my model predicts Pareto-dominance of
share contracts over wage contracts by a different mechanism.
A share contract imposes a cost on shirkers. Monitoring
cost, a pure social welfare loss, will decrease. We can also
choose a share contract that has no less employment than the
wage contract has. Higher welfare and employment will result.
This is possible even at a micro level.

Section II shows that the optimal wage contract has fixed
wages across the states. Section III proves that a share
contract exists that Pareto-dominates the optimal wage
contract, and has no less employment. Section IV summarizes
some conclusions.

II. Wage Contract Model

First, consider a labor contract between a firm and N
homogeneous workers. For simplicity, assume that the firm
uses only labor to produce a single commodity. Output depends
on total employment (E) and each worker's level of effort (e),
which are perfect substitutes in the production process.
Define the revenue function by $s f(eE)$, where s denotes an
unpredictable product-demand shock. Here, $f(\cdot)$ is strictly
increasing and strictly concave, i.e., $f' > 0$, $f'' < 0$. For
simplicity, I assume that each employee's working hours are
fixed for technological reasons. We can relax this assumption
without changing the results in this section. (For details,
see the Appendix.) Then, we have the profit function:
(1)  $\pi(s) = s f(e(s)E(s)) - w^e(s)E(s) - w^u(s)(N - E(s))$
where $w^e$ is the wage paid to each employed worker and $w^u$ is
the severance pay for the unemployed.

Each worker has the same concave utility function:
(2)  $U = U(Y, e)$
where Y denotes consumption. Assume that $U_Y > 0$, $U_e < 0$. For
employed workers, consumption in state s is given by $w^e(s)$,
and for the unemployed, by $w^u(s)$. Assume that the firm is risk-
neutral, and workers are risk-averse. Then, $\delta_w$ describes a
wage contract with
$\delta_w = \{E(s), e(s), w^e(s), w^u(s)\}$

The optimal contract, $\delta_w^*$, can be characterized by the
solution to the problem:
(C.1)  $\max_{\delta_w} E_s\, \pi(s)$
subject to
$E_s\{(E(s)/N)\,U(w^e(s), e(s)) + (1 - E(s)/N)\,U(w^u(s), 0)\} \ge U_0$
$0 \le E/N \le 1$
$e \ge 0$, for all s
where $U_0$ is the utility level a worker obtains in the
worker's next best alternative. The first constraint will be
binding, because otherwise the firm could lower $w^u$ and
increase profit. (See Cooper [1987].) This formulation is
very close to Cooper's [1987] basic implicit contract model.
The only difference is that I use e rather than the worker's
hours as in Cooper's model. This difference makes my model
similar to the principal-agent model [1978]. Basically,
principal-agent models are designed to show how a firm could
improve its workers' productivity with some specific
compensation scheme. As efficiency wage models suggest,
worker productivity will be related to the wage rate. The
model specified above offers the mechanism generating
correlation between productivity and the wage in an implicit
contract framework.

The first interesting result that arises from (C.1) is
that for any s, the firm employs all workers; that is, there is full
employment. We can summarize this as follows.

PROPOSITION 1. In any optimal contract, E(s)=N for all s.

Proof. Suppose not, i.e., in $\delta_w^*$, there exists a state, $s_1$,
with $E(s_1) < N$. Given $s_1$, a worker's expected utility is
given by

$(E(s_1)/N)\,U(w^e(s_1), e(s_1)) + (1 - E(s_1)/N)\,U(w^u(s_1), 0)$

Consider another contract, $\underline{\delta}_w$, such that $\underline{\delta}_w$ is identical to
$\delta_w^*$ for all states other than $s_1$, and at $s_1$ there is full
employment with

$\underline{e}(s_1) N = e(s_1)E(s_1)$
$\underline{w}^e(s_1) = (E(s_1)/N)\,w^e(s_1) + (1 - E(s_1)/N)\,w^u(s_1)$
$\underline{w}^u(s_1) = 0$

Then,

$\underline{\pi}(s_1) = s_1 f(\underline{e}(s_1) N) - \underline{w}^e(s_1) N$
$= s_1 f(e(s_1)E(s_1)) - E(s_1)\,w^e(s_1) - (N - E(s_1))\,w^u(s_1)$
$= \pi^*(s_1)$

Therefore, the firm is indifferent between $\underline{\delta}_w$ and $\delta_w^*$. Now,
compare the worker's expected utility under the two contracts:

$(E(s_1)/N)\,U(w^e(s_1), e(s_1)) + (1 - E(s_1)/N)\,U(w^u(s_1), 0)$
$< U((E(s_1)/N)\,w^e(s_1) + (1 - E(s_1)/N)\,w^u(s_1),\ E(s_1)e(s_1)/N)$
$= U(\underline{w}^e(s_1), \underline{e}(s_1))$

The inequality is due to the assumption that workers are risk-
averse. Thus, workers prefer $\underline{\delta}_w$ to $\delta_w^*$, so that $\delta_w^*$ is not
an optimal contract.

QED
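
The two steps of this proof can be checked numerically for one parameter choice. The sketch below is only an illustration: the functional forms U(Y, e) = sqrt(Y) - e^2 and f(x) = sqrt(x), and every number in it, are assumptions made for the example and do not come from the model.

```python
import math

# Illustrative functional forms (assumptions, not from the text):
# U is strictly concave in (Y, e); f is increasing and strictly concave.
def U(Y, e):
    return math.sqrt(Y) - e ** 2

def f(x):
    return math.sqrt(x)

# Hypothetical state s1 in which the candidate contract lays off workers.
N, E = 10.0, 8.0
s1, e, w_e, w_u = 20.0, 0.5, 1.0, 0.25

# Full-employment alternative constructed as in the proof:
# same total effort, and a wage equal to the average payment per worker.
e_alt = e * E / N
w_alt = (E / N) * w_e + (1 - E / N) * w_u

profit_orig = s1 * f(e * E) - w_e * E - w_u * (N - E)
profit_alt = s1 * f(e_alt * N) - w_alt * N

utility_orig = (E / N) * U(w_e, e) + (1 - E / N) * U(w_u, 0.0)
utility_alt = U(w_alt, e_alt)

print("profit:", profit_orig, profit_alt)
print("expected utility:", utility_orig, utility_alt)
```

In this example the firm's profit is unchanged while the worker's expected utility rises, which is the pattern the proof relies on.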

(C.1) implicitly assumes that the contract can be
enforced voluntarily. However, this assumption is quite
unrealistic. Even when a worker shirks, he still gets the
agreed compensation. Since all workers are identical, no one
will work. Therefore, the firm has to monitor workers in
order to sort out those who shirk on the job. Assume that the
firm bears some cost (C) when monitoring a worker. Let m be
the probability of catching a given shirker. Assume
(3)  $C = C(m)$; $C' > 0$
Now, each employed worker decides whether or not to shirk. If
an employed worker does not shirk, he gets utility of $U(w^e, e)$.
Otherwise he gets expected utility of $(1-m)\,U(w^e, 0) + m\,U(0, 0)$,
because shirkers, once caught, are fired immediately. To
prevent workers from shirking, the following condition (the no-
shirking condition, NSC) must be satisfied:
(4)  $U(w^e, e) \ge (1-m)\,U(w^e, 0) + m\,U(0, 0)$
for all s. In this case, each firm faces the following profit
function:
(5)  $\pi(s) = s f(e(s)E(s)) - w^e(s)E(s) - w^u(s)(N - E(s)) - C(m(s))E(s)$

Denote a contract by
$\delta_w = \{E(s), e(s), w^e(s), w^u(s), m(s)\}$
Then, the optimal contract, $\delta_w^*$, solves
(C.2)  $\max_{\delta_w} E_s\, \pi(s)$
subject to
(C.2-1)  $E_s\{(E(s)/N)\,U(w^e(s), e(s)) + (1 - E(s)/N)\,U(w^u(s), 0)\} \ge U_0$
(C.2-2)  $U(w^e(s), e(s)) \ge (1 - m(s))\,U(w^e(s), 0) + m(s)\,U(0, 0)$
(C.2-3)  $0 \le E(s)/N \le 1$
The second constraint will bind, since the firm could
otherwise decrease m, and save on monitoring costs.
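
Because the NSC binds, it pins down the monitoring probability once the wage and effort are given. The following sketch solves the binding NSC for m; the utility function U(Y, e) = sqrt(Y) - e^2, the helper name `min_monitoring`, and the numbers are assumptions chosen only for illustration.

```python
import math

def U(Y, e):
    # Illustrative utility (an assumption): increasing in Y, decreasing in e.
    return math.sqrt(Y) - e ** 2

def min_monitoring(w_e, e):
    """Smallest m with U(w_e, e) >= (1 - m) U(w_e, 0) + m U(0, 0).

    Rearranging the NSC gives
        m >= [U(w_e, 0) - U(w_e, e)] / [U(w_e, 0) - U(0, 0)],
    which requires U(w_e, 0) > U(0, 0), i.e. w_e > 0 here.
    The result is clipped to [0, 1]; a value clipped to 1 means that
    even certain detection may not deter shirking.
    """
    gain_from_shirking = U(w_e, 0.0) - U(w_e, e)
    loss_if_caught = U(w_e, 0.0) - U(0.0, 0.0)
    m = gain_from_shirking / loss_if_caught
    return min(max(m, 0.0), 1.0)

m_low_wage = min_monitoring(w_e=1.0, e=0.5)
m_high_wage = min_monitoring(w_e=4.0, e=0.5)
print(m_low_wage, m_high_wage)
```

Under these assumed forms, a higher wage lowers the monitoring probability needed to deter shirking, which is why the firm cannot cut wages without raising monitoring.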

If there is no monitoring cost, i.e., if C(m) = 0 for any
m, (C.2) will be identical to (C.1). The reason is quite
simple. The firm can perfectly monitor workers without
incurring any cost. That is, the optimal choice of m must be
1. Since workers' expected utility does not depend on m,
nonshirking workers will not resist the firm's perfect
monitoring. If m = 1, (C.2-2) becomes
$U(w^e, e) \ge U(0, 0)$
Obviously, this condition must hold even for the solution of
(C.1), because otherwise no one will work. Also, the firm's
profit function in (C.2) is exactly identical to that in
(C.1). This means that when monitoring cost is arbitrarily
small, there is no significant difference between (C.1) and
(C.2). Actually, (C.1) is a special case of (C.2), which can
be obtained under the assumption of zero monitoring cost.

Assuming that the solution always satisfies (C.2-3),
(C.2) predicts fixed wage compensation across states and lay-
offs in bad states.

PROPOSITION 2. The solution to (C.2) satisfies the following:
$w^u(s) = \underline{w}^u$
$w^e(s) = \underline{w}^e$
$e(s) = \underline{e}$
$m(s) = \underline{m}$ for all s, and
$dE/ds \ge 0$

Proof. To solve (C.2), we can construct the Lagrangean:
$L = s f(eE) - w^e E - w^u (N - E) - C(m)E$
$\quad + \theta\{(E/N)\,U(w^e, e) + (1 - E/N)\,U(w^u, 0)\} + \phi\, G(w^e, e, m)$
where $G(w^e, e, m) = U(w^e, e) - (1-m)\,U(w^e, 0) - m\,U(0, 0)$. Note that
$\theta$ is independent of s, while $\phi$ is a function of s. From the
first order conditions, we have
(6)  $U^u_w - N/\theta = 0$
(7)  $U^e_w(\theta/N) + \phi G_w/E - 1 = 0$
(8)  $s f' + U^e_e(\theta/N) + \phi G_e/E = 0$
(9)  $C' - \phi G_m/E = 0$
(10)  $s f' e - w^e + w^u - C + (U^e - U^u)(\theta/N) = 0$
(11)  $G = 0$
where each subscript denotes the derivative with respect to
the variable it represents, and $U^e = U(w^e, e)$, $U^u = U(w^u, 0)$.
Since $\theta$ is independent of s, $w^u$ is also independent of s.
This means that $w^u$ is a constant ($w^u = \underline{w}^u$). By substituting
(9) into (7), (8), (10), and (11), we can rewrite the last
five equations as follows:
(12)  $U^e_w(\theta/N) + C' G_w/G_m - 1 = 0$
(13)  $s f' + U^e_e(\theta/N) + C' G_e/G_m = 0$
(14)  $C' - \phi G_m/E = 0$
(15)  $s f' e - w^e + w^u - C + (U^e - U^u)(\theta/N) = 0$
(16)  $G(w^e, e, m) = 0$
Total differentiation of these equations and application of
Cramer's rule yields4
$dw^e/ds = de/ds = dm/ds = 0$
$dE/ds = -f'/(s f'' \underline{e}) > 0$  QED
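
The employment response can be illustrated numerically. If wages, effort, and monitoring do not vary with s, condition (13) requires s f'(eE) to stay constant across states, and differentiating that identity gives dE/ds = -f'/(s f'' e). The sketch below compares this expression with a finite-difference slope. The form f(x) = sqrt(x), the constant `k`, and all numbers are assumptions made for the example.

```python
import math

e = 0.5   # fixed effort level (hypothetical value)
k = 2.0   # constant value of s * f'(e E) across states (hypothetical value)

def f_prime(x):
    # f(x) = sqrt(x)  =>  f'(x) = 1 / (2 sqrt(x))
    return 0.5 / math.sqrt(x)

def f_double_prime(x):
    # f''(x) = -1 / (4 x^(3/2))
    return -0.25 * x ** -1.5

def employment(s):
    # Solve s * f'(e E) = k for E when f(x) = sqrt(x):
    # s / (2 sqrt(e E)) = k  =>  E = s^2 / (4 k^2 e)
    return s ** 2 / (4 * k ** 2 * e)

s = 20.0
E = employment(s)

# Expression obtained by differentiating s * f'(e E) = k with respect to s.
analytic = -f_prime(e * E) / (s * f_double_prime(e * E) * e)

# Central finite difference of the closed-form employment schedule.
h = 1e-6
numeric = (employment(s + h) - employment(s - h)) / (2 * h)

print(E, analytic, numeric)
```

Both slopes are positive in this example, so employment rises with the demand shock while compensation stays fixed.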

It is very hard to provide a clear-cut explanation for
these results, because all of the variables are interrelated
in a complex manner. However, some partial intuition follows.
The existence of monitoring costs gives the firm an incentive
to decrease its level of employment. As we saw in (C.1), if
there is no monitoring cost, the optimal contract is
characterized by full employment in all states (to provide
insurance to the risk-averse workers). This is possible
because work effort is perfectly substitutable for employment
in the production process. Workers are willing to accept lower
wages in order to guarantee employment. However, monitoring
costs can be regarded as a fixed cost of employment.
Therefore, in bad states, the firm would prefer lay-offs to
save on the fixed cost of employment. Furthermore, since the
employment level has no effect on NSC (see (C.2-2)), the firm
has more discretion in choosing the employment level. This
explains the employment fluctuation result. In this case, the
firm must compensate workers with higher wages even in the bad
states to make them bear the risk of being unemployed. This
causes the wage profile across states to be flatter.

Another interesting result is that the optimal contract
fixes the level of work effort. This is consistent with the
observation that unions usually try to predetermine workers'
on-the-job duties in labor contracts.5 There is a
conventional explanation for this phenomenon. If the firm
has discretion in using workers, labor productivity could be
increased, because the firm could deploy its employees
efficiently. Higher labor productivity will allow the firm to
produce the given quantity of output with a lower level of
employment. Therefore, the firm will have a smaller incentive
to increase employment, if there is some fixed cost of
employment. Therefore, unions resist increasing labor
productivity. This interpretation is supported by my model.
In bad states, the firm would prefer to increase the workers'
level of effort with higher wages, decreasing total employment
to save on monitoring cost. However, since risk-averse
workers put higher value on employment than on wages, they
will resist this strategy. Also, the firm's gain from
adopting this strategy is limited. Higher work effort
decreases nonshirking workers' utility. Higher wages increase
shirkers' expected utility as well as nonshirkers' utility.
Therefore, the firm has to increase monitoring intensity and
cost. In contrast to the common explanation, (C.2) suggests
that job descriptions serve not only workers' interests but
also those of firms.

III. Share Contract Model

Section II provided a model which explains lay-offs and
fixed wage compensation during a contract period. In this
section, I will show the Pareto-dominance of share contracts
over wage contracts. The wage compensation per employee under
a share contract can be defined as follows:
(17)  $w^e = v + a s f(eE)/E$
where v is a fixed component of compensation, and a is a share
parameter that specifies the variable component of
compensation. A share contract, $\delta_s$, is defined by
$\delta_s = \{E(s), e(s), v(s), a(s), m(s)\}$
Begin by assuming that the contract agreement is enforced
voluntarily, so that we may ignore the NSC. In this case, no
share contract can Pareto-dominate the optimal wage contract.
In fact, the optimal share contract is identical to that of
the wage contract. The reason is quite simple. Under a share
contract, wage compensation is decomposed into two parameters,
v and a. However, it is impossible to determine v and a
separately, because the first order conditions for v and a are
identical. Without the NSC, the optimal share contract, $\delta_s^*$,
solves
(C.3)  $\max_{\delta_s} E_s\{(1-a)\,s f(e(s)E(s)) - v(s)E(s) - C(m(s))E(s) - w^u(s)(N - E(s))\}$
subject to
(C.3-1)  $E_s\{(E(s)/N)\,U(a s f(e(s)E(s))/E(s) + v(s), e(s)) + (1 - E(s)/N)\,U(w^u(s), 0)\} \ge U_0$
(C.3-2)  $0 \le E/N \le 1$
The first order conditions for v and a are given by:
$-E + (\theta/N)\,E\,U^e_w = 0$
$-s f(eE) + (\theta/N)\,s f(eE)\,U^e_w = 0$
Both equations reduce to
$U^e_w = N/\theta$
Therefore, we cannot determine v and a separately. Instead we
can derive only an optimal combination of v and a. This result
implies an interesting characteristic of the optimal share
contract, as summarized in the following proposition.

PROPOSITION 3. When workers have no incentive to shirk, the
optimal share contract is identical to the optimal wage
contract.

Proof. Define

$w^e = a s f(eE)/E + v$

Substitute $w^e$ into (C.3), and find the solution. This must be
the solution to (C.1). From $w^e(s)$, we can obtain an optimal
combination of v and a. QED

Note that the solution to (C.1) predicts full employment
in all states. Basically, a contract between the firm and its
workers provides insurance that cannot be obtained in the market,
due to the nontransferable characteristic of human capital.
If both parties have perfect information about each other, the
contract will be Pareto optimal. This result is nothing more
than the optimal resource allocation under Debreu-Arrow's
world of uncertainty. Therefore, no reform of the wage scheme
could make both the firm and workers better off concurrently.
However, insurance markets usually suffer from the Moral
Hazard problem, caused by insurance companies' imperfect
information on customers' behaviour. As we saw in Section II,
the workers' incentive to shirk leads to unemployment under
the optimal wage contract. To prevent workers from shirking,
the firm wastes its resources in monitoring workers. In this
case, a wage compensation scheme which can suppress the
incentive to shirk may improve the performance of the economy.
This turns out to be true.

Consider the NSC in (C.2-2). Under a wage contract,
shirkers do not suffer from their own shirking, since they
still receive the same compensation as nonshirkers do. Under
a share contract, however, shirkers do suffer from their own
shirking. If a worker shirks, the total output actually
produced will be smaller than the amount agreed to be
produced. Workers get lower compensation, because some
portion of wages is related to total revenue. The wage under
a share contract is given by:
$a s f(e(E - S))/E + v$
where S is the number of shirkers. When a worker decides to
shirk, his expected utility is given by:
(18)  $(1-m)\,U(a s f(e(E-1))/E + v, 0) + m\,U(0, 0)$
Therefore, the NSC under a share contract can be expressed as
(C.3-3)  $U(a s f(eE)/E + v, e) \ge (1-m)\,U(a s f(e(E-1))/E + v, 0) + m\,U(0, 0)$
Comparing (C.3-3) with (C.2-2), we can easily see that
shirkers have lower expected utility under a share contract.
Hence, the firm could reduce m, and C(m). This implies that
the firm and workers could be better off under a share contract.
This is stated in the following proposition.

PROPOSITION 4. When workers have an incentive to shirk, there
exists a share contract which Pareto-dominates the optimal
wage contract.

PROOF. The optimal wage contract is characterized by
$\delta_w^* = \{\underline{w}^e, \underline{w}^u, \underline{e}, E^*(s), \underline{m}\}$

Consider the following share contract,

$\delta_s = \{a(s), v(s), w^u(s), e(s), E(s), m(s)\}$

with

$E(s) = E^*(s)$,
$e(s) = \underline{e}$,
$a(s) s f(e(s)E(s))/E(s) + v(s) = \underline{w}^e$,
$w^u(s) = \underline{w}^u$, and
$m(s) = \underline{m}$ for all s.

For all states, both contracts yield the same level of profit
for the firm and the same level of expected utility for the
workers. However,
$U(a(s) s f(\underline{e}E(s))/E(s) + v(s), 0) > U(a(s) s f(\underline{e}(E(s)-1))/E(s) + v(s), 0)$
Therefore,
$U(a(s) s f(\underline{e}E(s))/E(s) + v(s), e(s))$
$= U(\underline{w}^e, e(s))$
$= (1-\underline{m})\,U(\underline{w}^e, 0) + \underline{m}\,U(0, 0)$
$= (1-\underline{m})\,U(a(s) s f(\underline{e}E(s))/E(s) + v(s), 0) + \underline{m}\,U(0, 0)$
$> (1-\underline{m})\,U(a(s) s f(\underline{e}(E(s)-1))/E(s) + v(s), 0) + \underline{m}\,U(0, 0)$
The NSC is not binding under the share contract. Therefore,
the firm can reduce m, thereby decreasing C(m), and increasing
profits. QED
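
The slack in the NSC under a share contract can be seen in a small numerical sketch. It compares the monitoring probability needed under a fixed wage with the one needed under a share contract that pays the same amount when nobody shirks. The forms U(Y, e) = sqrt(Y) - e^2 and f(x) = sqrt(x), the helper `required_m`, and all parameter values are assumptions made for the example.

```python
import math

def U(Y, e):
    # Illustrative utility (an assumption).
    return math.sqrt(Y) - e ** 2

def f(x):
    # Illustrative revenue technology (an assumption).
    return math.sqrt(x)

def required_m(pay_if_working, pay_if_shirking, e):
    """Smallest m with
    U(pay_if_working, e) >= (1 - m) U(pay_if_shirking, 0) + m U(0, 0),
    clipped to the interval [0, 1]."""
    gain = U(pay_if_shirking, 0.0) - U(pay_if_working, e)
    loss = U(pay_if_shirking, 0.0) - U(0.0, 0.0)
    return min(max(gain / loss, 0.0), 1.0)

s, E, e = 20.0, 10.0, 0.5   # hypothetical state, employment, effort
wage = 1.0                  # fixed wage under the wage contract
v = 0.5                     # fixed component under the share contract

# Share parameter chosen so that total pay equals the fixed wage
# when every employed worker supplies effort e.
a = (wage - v) * E / (s * f(e * E))

pay_share = v + a * s * f(e * E) / E
# One shirker lowers revenue, and therefore his own pay.
pay_share_shirk = v + a * s * f(e * (E - 1)) / E

m_wage = required_m(wage, wage, e)
m_share = required_m(pay_share, pay_share_shirk, e)
print(m_wage, m_share)
```

With identical pay for nonshirkers, the share contract needs a lower monitoring probability in this example, so the firm saves on C(m) without making workers worse off.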

For any share contract, the bigger the portion of
compensation related to total revenue, the higher the cost a
shirker bears. Therefore, the best way to reduce the workers'
incentive to shirk is to increase the share parameter, a, as
much as possible. This gives us the following result.

PROPOSITION 5. For any share contract, the optimal
combination of a and v requires v = 0.

PROOF. Suppose not. Consider a share contract specifying $\delta_s
= \{a, v, e, E, m\}$. I will suppress s for notational convenience.
Suppose $v \neq 0$ at some $s_0$. Choose $\underline{a}$ such that

$\underline{a} s f(eE)/E = a s f(eE)/E + v$.

Consider a contract, $\underline{\delta}_s$, which is identical to $\delta_s$ except that
$\underline{a}$ replaces a and the fixed component is set to zero. Then, the
firm's profits and workers' utility are the same under both
contracts. However,
$a s f(e(E-1))/E + v$
$= \{a s f(eE)/E + v\}\{f(e(E-1))/f(eE)\} + v\{1 - f(e(E-1))/f(eE)\}$
$= \{\underline{a} s f(eE)/E\}\{f(e(E-1))/f(eE)\} + v\{1 - f(e(E-1))/f(eE)\}$
$= \underline{a} s f(e(E-1))/E + v\{1 - f(e(E-1))/f(eE)\}$
$> \underline{a} s f(e(E-1))/E$
Therefore,
$U(\underline{a} s f(eE)/E, e) > (1-m)\,U(\underline{a} s f(e(E-1))/E, 0) + m\,U(0, 0)$
The firm can decrease m, thereby increasing profits at $s_0$. This
is a contradiction. QED
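
The role of the fixed component can also be illustrated numerically. The sketch below holds total pay per worker constant and shifts it from the fixed component v to the revenue share, then reports the monitoring probability that makes the NSC bind. The forms U(Y, e) = sqrt(Y) - e^2 and f(x) = sqrt(x), the helper names, and all numbers are assumptions made for the example.

```python
import math

def U(Y, e):
    # Illustrative utility (an assumption).
    return math.sqrt(Y) - e ** 2

def f(x):
    # Illustrative revenue technology (an assumption).
    return math.sqrt(x)

def required_m(pay_if_working, pay_if_shirking, e):
    """Smallest m for which the no-shirking condition holds, clipped to [0, 1]."""
    gain = U(pay_if_shirking, 0.0) - U(pay_if_working, e)
    loss = U(pay_if_shirking, 0.0) - U(0.0, 0.0)
    return min(max(gain / loss, 0.0), 1.0)

s, E, e = 20.0, 10.0, 0.5   # hypothetical state, employment, effort
total_pay = 1.0             # pay per worker when nobody shirks

def m_for_fixed_component(v):
    # Share parameter that keeps total pay constant for the given v.
    a = (total_pay - v) * E / (s * f(e * E))
    # Pay received by a single shirker: revenue falls, the fixed part does not.
    shirk_pay = v + a * s * f(e * (E - 1)) / E
    return required_m(total_pay, shirk_pay, e)

fixed_components = (1.0, 0.5, 0.0)
ms = [m_for_fixed_component(v) for v in fixed_components]
for v, m in zip(fixed_components, ms):
    print(v, m)
```

In this example the required monitoring probability falls as v falls and is lowest at v = 0, in line with the proposition.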

A share contract may cause (on average) more unemployment
than a wage contract, even though all agents could be better
off under that share contract. Higher unemployment in an
economy will generate a contraction in aggregate demand, and
thereby worsen all firms' economic positions. This implies a
downward shift in the distribution of s, which firms confront.
If this is true, that kind of share contract may not be
desirable at a macroeconomic level. Therefore, my next
question is whether a share contract could guarantee higher
employment. The answer is affirmative, as summarized in
the following proposition.

PROPOSITION 6. There exists a share contract which is Pareto-superior
to, and has no less employment than, the optimal wage
contract specified in (C.2).

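PROOF. The optimal wage contract is denoted by
$$\delta_w^* = \{\underline{w}^e, \underline{w}^u, \underline{e}, E^*(s), \underline{m}\}.$$
Consider a share contract, $\delta_s = \{\alpha(s), w^u(s), e(s), E(s), m(s)\}$,
satisfying
(19) $e(s) = \underline{e}$,
(20) $\alpha(s) s f(\underline{e}E(s))/E(s) = \underline{w}^e$,
(21) $w^u = \underline{w}^u$,
(22) $U\big(\alpha(s) s f(e(s)E(s))/E(s),\, e(s)\big) = (1-m(s))\,U\big(\alpha(s) s f(e(s)(E(s)-1))/E(s),\, 0\big) + m(s)\,U(0,0)$
for all states. Note that $\alpha s f(\underline{e}(E-1))/E < \underline{w}^e$ for any $E$,
as long as (20) holds. Consider the case in which $E(s) = E^*(s)$.
Since $U(\underline{w}^e, \underline{e}) = (1-\underline{m})U(\underline{w}^e, 0) + \underline{m}U(0,0)$, $m(s) < \underline{m}$ for any $s$.
Therefore, $\delta_s$ must Pareto-dominate $\delta_w^*$ when $E(s) = E^*(s)$.
Then the optimal form, $\delta_s^*$, of the $\delta_s$'s also must Pareto-dominate
$\delta_w^*$. This means that both the firm and the workers are better
off under $\delta_s^*$. By Bellman's principle, for any state, both
are better off under $\delta_s^*$. Let $\delta_s^* = \{\alpha^*(s), \underline{w}^u, \underline{e}, E^{**}(s), m^*(s)\}$.
Suppose that $E^{**}(s) < E^*(s)$ for some $s_0$. Then,
$$\begin{aligned}
&(E^{**}/N)\,U\big(\alpha^* s f(\underline{e}E^{**})/E^{**},\, \underline{e}\big) + (1 - E^{**}/N)\,U(\underline{w}^u, 0) \\
&\quad= (E^{**}/N)\,U(\underline{w}^e, \underline{e}) + (1 - E^{**}/N)\,U(\underline{w}^u, 0) \\
&\quad< (E^{*}/N)\,U(\underline{w}^e, \underline{e}) + (1 - E^{*}/N)\,U(\underline{w}^u, 0).
\end{aligned}$$
This shows that workers have lower expected utility under $\delta_s^*$
at $s_0$. This is a contradiction. QED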

Now, in addition to Pareto dominating wage contracts, a
share contract generates less employment fluctuation than the
optimal wage contract. If workers are more likely to be
employed, there will be an increase in aggregate consumption,
which shifts up distributions of s, and improves firms'
profitability. An interesting implication of Proposition 6 is
that this is possible even when wages actually given to
workers are constant across states. Therefore, share
contracts will Pareto-dominate even at the macro level. One
shortcoming of Weitzman's analysis is that a firm has excess
demand for labor only if workers already employed are willing
to accept lower wages. (See Nordhaus [1988].) My model avoids

this problem.

IV. Conclusion

In contrast to Weitzman's work, my model, which is based
on a framework of implicit contract theory, allows us to make
a complete welfare comparison between two different wage
compensation schemes -- fixed-wage and share compensations.
Fixed wages across the states are predicted rather than
assumed in an ad hoc way. Therefore, the welfare comparison
given in this paper is less open to criticism than comparisons
given by other studies.

Wage contracts can be regarded as a special form of share
contracts. A share contract predetermines some fixed portion
of wage compensation, with the remaining portion tied to the
firm's performance in the product market. A fixed-wage
contract, which is the optimal wage contract in my model, is
a share contract with no variable portion of compensation.
This suggests that fixed-wage contracts would be a suboptimal
choice among share contracts. My model shows that this is the
case. If workers have no incentive to shirk on their job,
the wage and the optimal share contracts are identical. Wage
contracts are always characterized by full employment in order
to insure risk-averse workers. However, if workers have some
incentive to shirk, the optimal wage contract specifies fixed
wages across the states of nature, generating lay-offs in bad
states. In this case, there exists a share contract which not
only Pareto-dominates the optimal wage contract, but also has
no less employment for any state. This is possible because
workers' incentive to shirk decreases under share contracts.

In Weitzman's model, firms could have excess demand for
labor only when their employed workers are willing to accept
lower wages. The employed workers will accept only if the
share contracts they accept generate sufficient macroeconomic
externalities on the whole economy that their suspended
self-interests are ultimately compensated. For this to be true,
a substantially large portion of sectors in the economy must
adopt the share contracts, and the share parameters should be
based on exact information on the economy, all of which
seem to be practically difficult. However, in my model, no
one suffers from a share contract even at a micro level. No
one has to wait until the macroeconomic externalities
compensate his suspended self-interest. Furthermore, the
share contracts will increase the economy-wide employment
level, even when the share sectors are small. Exact
information on the economy is not required to find this form
of share contracts, as suggested by PROPOSITION 6. Therefore,
the implementation of the share contracts will not be very
costly.

However, even though my model provides an argument in
favor of the share economy, it is too early to draw policy
conclusions. Unfortunately, my model fails to provide a
clear-cut answer to a different question: Why does an economy
resist converting from a wage to a share economy, if share
contracts are really superior to wage contracts? I can offer
only some partial intuition. Under a share contract, each
worker's wage depends on other agents' work effort. If there
is a shirker, all workers suffer from lower wages. Therefore,
some burden of monitoring should fall on the workers
themselves -- each worker becomes more sensitive to the other
workers' behavior. This generates some kind of psychological
cost to workers. In this situation, workers may prefer wage
contracts. From the standpoint of the firms, a share contract
will reduce management's discretion over production. A share
contract can be successful only if workers and the firm share
exact information concerning the firm's real market
situation and its true total revenue. In other words, a
credibility problem arises in a share economy. This means
labor's participation in management is a necessary condition
for a successful share contract. In this case, management
cannot efficiently cope with an abrupt change in the firm's
market situation, because any decision of the management
must wait for its workers' approval. A longer decision
process will reduce the firm's profitability. Therefore,
firms might also prefer wage contracts. These are just some
possible reasons why an economy might be characterized by wage

contracts. Clearly, this question requires further study.


ENDNOTES

1. Weitzman [1985], p. 3.

2. Ibid., p. 3.

3. Ibid., pp 118-122.

4. The total differentiation of (13), (14), (15), (16), and
(17) yields:

 

- o U:w(6/N)+C'wa/Gm-C'Gw smw/s: U:e(9/N)+C'Gwe /Gm
o U:w(6/N)+C'Gew/Gm-C'Gw Gmw/s; sf"E+U:e(8/N)+C'Gee/Gm

-Gm/E -¢Gmw/E o
0 U: (6/N)-1 sf"eE+sf'+U:(0/N)

_ 0 GW Ge

c"sw/sm+c'swm/Gm o - r d¢ . - o 1

C"Ge/Gm sf"e dwe -f'

c" ¢Gm/E2 de = 0 ds

-C' sf"e2 dE -f'e

Gm O . _ dm d _ 0 _

 

 

 

 

 

5. See Balfour [1987], pp 300-328.

APPENDIX


In Section II, I assumed that each worker's hours on the
job are fixed. This assumption is not required to obtain a
fixed wage under the optimal wage contract. To show this,
redefine $e$ as the hourly effort level of a worker, and $w^e$ as
the hourly wage rate. I assume that working hours ($h$) are a
perfect substitute for hourly working effort ($e$) in the
production process. First, suppose that workers have no
incentive to shirk on the job. Let
$$\delta_w = \{E, e, h, w^e, w^u\}.$$
For simplicity, I suppress $s$. The optimal contract, $\delta_w^*$,
solves

(A.1) $\max_{\delta_w} E_s \Pi = E_s\{ s f(ehE) - w^e h E - w^u (N - E) \}$

subject to

(A.1-1) $E_s\{ (E/N)\,U(w^e h, eh) + (1 - E/N)\,U(w^u, 0) \} \geq U_0$

We can easily show that PROPOSITION 1 still holds for (A.1).
I omit the proof, since it is similar to that of PROPOSITION
1.

Now, consider the case in which workers have an incentive to
shirk. Describe a wage contract by
$$\delta_w = \{E, e, h, w^e, w^u, m\}.$$
The optimal contract, $\delta_w^*$, solves

(A.2) $\max_{\delta_w} E_s\{ s f(ehE) - w^e h E - w^u (N - E) - C(m) E \}$

subject to

(A.2-1) $E_s\{ (E/N)\,U(w^e h, eh) + (1 - E/N)\,U(w^u, 0) \} \geq U_0$

(A.2-2) $U(w^e h, eh) \geq (1 - m)\,U(w^e h, 0) + m\,U(0,0)$

Consider the first order conditions with respect to $e$, $w^e$, and
$h$:
(1) $s f' h E + \theta (E/N) U^e_e h + \phi U^e_e h = 0$
(2) $-E + \theta (E/N) U^e_w + \phi\,(U^e_w - (1-m) U^s_w) = 0$
(3) $s f' e E - w^e E + \theta\,((E/N) U^e_w w^e + (E/N) U^e_e e) + \phi\,(U^e_w w^e + U^e_e e - (1-m) U^s_w w^e) = 0$
where $U^s$ denotes the utility of shirkers, $U(w^e h, 0)$, and $\theta$ and
$\phi$ are the multipliers on (A.2-1) and (A.2-2). Observe
that one of (1), (2), and (3) is a linear combination of the
others. This means that we cannot determine $e$, $w^e$, and $h$
separately. To avoid this problem, let
$$e^e \equiv eh, \qquad w \equiv w^e h.$$
Substitution of $e^e$ and $w$ into (A.2) gives us (C.2).
Therefore, the optimal choice, across the states of nature,
must fix $e^e$ and $w$ at $\underline{e}^e$ and $\underline{w}$, respectively. Total wage
compensation for each worker ($w$) and total working effort per
worker ($e^e$) are independent of the states. The firm can
arbitrarily choose fixed working hours before a contract.
Then, as in PROPOSITION 2, we have a fixed hourly wage and
effort level under the optimal contract.
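
The linear dependence among the three first order conditions can be checked directly. The following is a sketch under a labeling that is assumed here for illustration: $\theta$ and $\phi$ are taken to be the multipliers on (A.2-1) and (A.2-2), $U^e_w$ and $U^e_e$ are the partial derivatives of a non-shirker's utility $U(w^e h, eh)$, and $U^s_w$ is a shirker's marginal utility of income.

```latex
\begin{aligned}
\tfrac{e}{h}\times(1):\quad
  & s f' e E + \theta (E/N)\, U^e_e\, e + \phi\, U^e_e\, e = 0, \\
w^e\times(2):\quad
  & -w^e E + \theta (E/N)\, U^e_w\, w^e + \phi\,\bigl(U^e_w - (1-m)\, U^s_w\bigr)\, w^e = 0, \\
\text{sum}:\quad
  & s f' e E - w^e E + \theta (E/N)\bigl(U^e_w w^e + U^e_e e\bigr)
    + \phi\,\bigl(U^e_w w^e + U^e_e e - (1-m)\, U^s_w w^e\bigr) = 0 .
\end{aligned}
```

The sum is condition (3), so only two of the three conditions are independent. This is why only the products $eh$ and $w^e h$, rather than $e$, $w^e$, and $h$ individually, can be determined.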

(A.2) contains another interesting implication. One may
consider the case in which the firm predetermines $e$ at some level
regardless of the states. Suppose that $e=1$. Then, (A.2) is
still identical to (C.2) except that $h$ replaces $e$, and that $w$
($= w^e h$) takes the role of $w^e$ in (C.2). Substituting $w$ and $h$
into (C.2) and solving it will generate constant $w$ and $h$. That
is, when $e$ is predetermined, the optimal contract has a
constant $w^e$ during a contract period. This result is
surprising, because conventional implicit contract models
usually fail to predict fixed wage rates when working hours
are allowed to vary. If effort level is fixed, and if NSC
(A.2-2) is ignored, (A.2) becomes a standard contract model.
(See p. 8 in Cooper [1987].) Without NSC, (A.2) fails to
generate fixed wage rate, unless there are some fairly strong
assumptions on the form of workers' utility function and on
variations in hours. Therefore, NSC in (A.2) has a crucial
role in generating fixed wages across the states of nature
under the optimal contract. By this reasoning, it is safe to

say that we do not have to assume fixed working hours.

REFERENCES

Balfour, A. (1987), Union-Management Relations in a Changing
Economy, Prentice-Hall, Inc., Englewood Cliffs.

Cooper, R. (1987), Wage and Employment Patterns in Labor
Contracts: Microfoundations and Macroeconomic
Implications, Harwood Academic Publishers, New York.

Cooper, R. (1988), "Will Share Contracts Increase Economic
Welfare?," The American Economic Review, Vol. 78,
pp 139 - 154.

Cooper, R. and John, A. (1988), "Coordinating Coordination
Failure in the Keynesian Models," Quarterly Journal of
Economics, August, pp 441 - 463.

John, A. (1987), "Employment Fluctuations in a 'Share
Economy'," Working Paper, Michigan State University.

Miller, R.E. (1979), Dynamic Optimization and Economic
Applications, McGraw Hill Inc.

Milton, H. and Artur, R. (1978), "Some Results on Incentive
Contracts with Applications to Education and Employment,
Health Insurance, and Law Enforcement," The American
Economic Review, Vol. 68, pp 20 - 30.

Nordhaus, W. (1988), "Can the Share Economy Conquer
Stagflation?" Quarterly Journal of Economics, February,
pp 201 - 223.

Nordhaus, W. and John, A., eds. (1986), "The Share Economy: A
Symposium," Journal of Comparative Economics, Vol. 10,
pp 414 - 473.

Shapiro, C. and Stiglitz, J.E. (1984), "Equilibrium
Unemployment as a Worker Discipline Device," The American
Economic Review, Vol. 74, pp 433 - 444.

Weitzman, M.L. (1984), The Share Economy, Harvard University
Press, Cambridge.

Weitzman, M.L. (1984), "Profit Sharing as Macroeconomic
Policy," The American Economic Review, Vol. 75, pp 41 - 45.

Weitzman, M.L. (1985), "The Simple Macroeconomics of Profit
Sharing," The American Economic Review, Vol. 75,
pp 937 - 953.

Weitzman, M.L. (1987), "Steady State Unemployment Under Profit
Sharing," The Economic Journal, Vol. 97, pp 86 - 105.

Chapter 2

The Joint Estimation of a Model of Labor Force
and Employment Decisions and Market Wages

I. Introduction

Much of the empirical literature on the labor supply
decision simply assumes that individuals can obtain jobs once
they decide to enter the labor market. That is, the
employment status of an individual is determined by only one
selection criterion -- the individual's decision on whether to
work. In many models used to explain employment status,
individuals are categorized into two groups: employed and
nonemployed. The unemployed and non-participants are treated
as behaviorally identical in their decision process, and both
are regarded as one single group -- the nonemployed. These
models implicitly ignore frictions in the labor market. It is
a well-known fact that unemployment is not simply explained by
individuals' work incentives. According to the single
selectivity criterion based on employment and nonemployment,
both the unemployed and people not in the labor force (NLF)
choose not to work because their reservation wage rates are
greater than their market wage rates. Therefore, the
unemployment status of an individual is purely voluntary. In
this sense, I call these traditional models No-Friction
models. The aim of this paper is to provide estimates of the
determinants of the employment status of married women and
their market wages by using two different selection criteria
-- preferences for work and ability to become employed.

Some studies show the importance of the existence of the
unemployed in a given data set. Flinn and Heckman [1983],
applying a duration model to young men selected from the
National Longitudinal Survey, reject the hypothesis that the
classifications unemployed and NLF are behaviorally
equivalent. Ham [1982], using a sample of prime aged males
taken from the University of Michigan's Panel Study of Income
Dynamics (PSID), shows that the estimates of parameters in an
equation for work hours are biased if the unemployed or
underemployed workers are ignored. Also, Blundell, Ham, and
Meghir [1987] reject the Tobit model based on the traditional
No-Friction model, using a sample of married women drawn from
the UK Family Expenditure Survey of 1981.

We categorize individuals into three different groups:
employed, unemployed, and non-participants. A married woman
is assumed to enter the labor market if her reservation wage
is less than the prevailing market wage. However, not all
individuals who decide to enter the labor market get jobs
immediately. A woman is employed only when she matches with
an employer who is willing to hire her. An individual who is
better able to find potential employers will have a higher
probability of being employed. In this sense, I call the
model in this paper a Friction model. For this model, we may
construct a job-match equation which can distinguish between
the employed and the unemployed in a probit framework. Labor-force
and employment decisions can be jointly explained by a
bivariate probit model with partial observability. (See Meng
and Schmidt [1985], and Farber [1983].)

This paper also estimates the parameters in the wage
equation by the maximum-likelihood estimation (MLE) method.
We observe only the currently employed workers' wages. That
is, the observed distribution of wages depends not only on
individuals' decisions about labor-force participation, but
also on their ability to find jobs. The data collected,
therefore, will have two types of selection biases. If we
estimate labor-force and employment decisions and the wage
equation jointly, the parameters in equations for labor-force
and employment decisions will be estimated more efficiently.
At the same time, the conventional likelihood ratio (LR)
test is applicable for the hypothesis of no selection bias.

The extension of Heckman's simple two-stage estimation
method is usually used in other studies for cases where two
selection rules generate the sample. (See Fishe, Trost, and
Lurie [1981], and Ham [1982].) The selectivity regressors
used in the least squares (OLS) estimation of the wage
equation are generated by a bivariate probit model. We can
easily apply this extended two-stage estimation method to the
case where employed, unemployed, and NLF people are observed
separately. Other studies usually use an F-statistic for the
test of the joint significance of the selectivity regressors.
In Heckman's simple selectivity model, the standard t-statistic
for the selectivity regressor has been used for the
test of no selection bias. Melino [1982] (also, Lin [1982])
shows that this t-statistic is asymptotically equivalent to
the Lagrange Multiplier (LM) test statistic. Likewise, this
paper shows that the F-statistic for the hypothesis of no
selection bias for the model with two selection rules is
asymptotically equivalent to the LM test. This means that the
F-statistic has good power properties, at least
asymptotically.

The empirical results in this paper reveal that the
Friction model explains a married woman's labor status better
than the No-Friction model. The joint estimation of labor-
force, employment status and the wage rate generates more
reliable estimates. There is significant evidence of
selection bias from ignoring labor-force and employment
status. The correction for sample selection bias in the
extended two-stage method turns out to be not quite as
satisfactory as that by the MLE method.

This paper is organized in the following way. Section II
describes the basic model for frictions in the labor market.
Section III summarizes the data, and describes the explanatory
variables used for the empirical study. Section IV
demonstrates the empirical results. Some concluding remarks

follow in Section V.

II. Model
This section explains the basic model based on the
assumption of frictions in the labor market. Notice that not
all married women who enter the labor market get jobs. We may
assume that if an individual has a higher market wage than her
reservation wage, she enters the labor market. However,
unless she matches an employer who is willing to hire her, she
remains unemployed. The Current Population Survey (CPS), the
National Longitudinal Survey (NLS), and the Panel Study of
Income Dynamics (PSID) provide the data on three different
groups of married women: the employed, the unemployed, and the
non-participants. In this case, we need another employment
criterion to consider the unemployed people separately from
the NLF people. We may imagine that each individual has her
own job-match skill. Those who have better job-match ability
will have a higher probability of being employed. Let $M_i^*$ be
the index for the i'th individual's job-match skill; $w_i$, the
market wage; $w_i^R$, the reservation wage. If $w_i \geq w_i^R$, the i'th
individual participates in the labor market. If $M_i^* \geq 0$, she
matches an employer, and gets a job which pays her $w_i$. If $w_i < w_i^R$,
she retains her NLF status. When $w_i \geq w_i^R$ and $M_i^* < 0$,
she is in the labor market, and remains unemployed. $w_i$ is
observed if and only if the i'th individual is employed.
Therefore, individuals' behavior in the labor market is
determined by the three variables -- market wage rates,
reservation wage rates, and the job-match index. These are
summarized by Model I.

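Model I. The Structural Model

(1.1) $w_i^R = z_{1i}\delta_1 + \epsilon_{1i}$
(1.2) $M_i^* = z_{2i}\delta_2 + \epsilon_{2i}$
(1.3) $w_i = z_{3i}\delta_3 + \epsilon_{3i}$

$$\begin{bmatrix}\epsilon_{1i}\\ \epsilon_{2i}\\ \epsilon_{3i}\end{bmatrix} \sim N(0, \Sigma), \qquad
\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12} & \Sigma_{13}\\ \Sigma_{12} & 1 & \Sigma_{23}\\ \Sigma_{13} & \Sigma_{23} & \Sigma_{33}\end{bmatrix}, \qquad i = 1,2,\ldots,N.$$

$(\epsilon_{1i},\epsilon_{2i},\epsilon_{3i})'$ are independently and identically distributed.
The $z_{ji}$ are the observed $1 \times k_j$ vectors of explanatory
variables.

$$y_{1i} = \begin{cases} 1, & \text{if } w_i \geq w_i^R \\ 0, & \text{otherwise,} \end{cases}
\qquad
y_{2i} = \begin{cases} 1, & \text{if } M_i^* \geq 0 \\ 0, & \text{otherwise.} \end{cases}$$

$y_{2i}$ is observed if and only if $y_{1i}=1$. $w_i^R$ and $M_i^*$ are not
observed. $w_i$ is observed if and only if $y_{1i}y_{2i}=1$. $y_{1i}$
denotes labor-force status (LF), and $y_{2i}$ denotes ability to
find a job. Therefore an individual is employed if $y_{1i}y_{2i}=1$.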
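
The two selection rules of Model I can be sketched in a few lines of Python. This is only an illustration: the function name `labor_status` and its argument names are hypothetical, and the reservation wage and the job-match index are passed in directly even though they are latent (unobserved) in the actual data.

```python
def labor_status(market_wage, reservation_wage, match_index):
    """Classify one individual using the two selection rules of Model I.

    y1 = 1 when the market wage is at least the reservation wage
    (labor-force participation); y2 = 1 when the job-match index is
    non-negative (she finds an employer). y2 matters only when y1 = 1.
    """
    in_labor_force = market_wage >= reservation_wage   # y1
    matched = match_index >= 0                         # y2
    if not in_labor_force:
        return "NLF"
    return "employed" if matched else "unemployed"


# Hypothetical values, chosen only to exercise the three branches.
print(labor_status(5.0, 4.0, 0.3))    # participates and matches
print(labor_status(5.0, 4.0, -0.2))   # participates, no match
print(labor_status(3.0, 4.0, 1.5))    # does not participate
```

Note that the third individual has a positive match index but is still classified as NLF, which is the sense in which the job-match outcome is observed only for participants.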

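For a simple estimation procedure for this model, let
(2.1) $e_{1i} = (\epsilon_{3i} - \epsilon_{1i}) / (\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2}$
(2.2) $e_{2i} = \epsilon_{2i}$
(2.3) $e_{3i} = \epsilon_{3i}$
Then,
$$(e_{1i}, e_{2i}, e_{3i})' \sim N(0, \Omega)$$
where
$$\Omega = \begin{bmatrix} 1 & \rho & \sigma_{13}\\ \cdot & 1 & \sigma_{23}\\ \cdot & \cdot & \sigma_{33}\end{bmatrix}
= \begin{bmatrix} 1 & \dfrac{\Sigma_{23}-\Sigma_{12}}{(\Sigma_{11}+\Sigma_{33}-2\Sigma_{13})^{1/2}} & \dfrac{\Sigma_{33}-\Sigma_{13}}{(\Sigma_{11}+\Sigma_{33}-2\Sigma_{13})^{1/2}}\\ \cdot & 1 & \Sigma_{23}\\ \cdot & \cdot & \Sigma_{33}\end{bmatrix}.$$

Let

(3.1) $y_{1i}^* = (w_i - w_i^R)/(\Sigma_{11}+\Sigma_{33}-2\Sigma_{13})^{1/2} = (z_{3i}\delta_3 - z_{1i}\delta_1)/(\Sigma_{11}+\Sigma_{33}-2\Sigma_{13})^{1/2} + e_{1i} \equiv x_{1i}\beta_1 + e_{1i}$

(3.2) $y_{2i}^* = M_i^* = z_{2i}\delta_2 + \epsilon_{2i} \equiv x_{2i}\beta_2 + e_{2i}$

(3.3) $y_{3i} = w_i = z_{3i}\delta_3 + \epsilon_{3i} \equiv x_{3i}\beta_3 + e_{3i}$

Now, we can rewrite Model I as follows.

Model II. Reduced-Form Model.

(4.1) $y_{1i}^* = x_{1i}\beta_1 + e_{1i}$
(4.2) $y_{2i}^* = x_{2i}\beta_2 + e_{2i}$
(4.3) $y_{3i} = x_{3i}\beta_3 + e_{3i}$

$$(e_{1i}, e_{2i}, e_{3i})' \sim N(0, \Omega), \qquad i = 1,2,\ldots,N,$$
where
$$\Omega = \begin{bmatrix} 1 & \rho & \sigma_{13}\\ \cdot & 1 & \sigma_{23}\\ \cdot & \cdot & \sigma_{33}\end{bmatrix},$$
$$y_{1i} = \begin{cases} 1, & \text{if } y_{1i}^* \geq 0 \\ 0, & \text{otherwise,} \end{cases}
\qquad
y_{2i} = \begin{cases} 1, & \text{if } y_{2i}^* \geq 0 \\ 0, & \text{otherwise.} \end{cases}$$

$y_{2i}$ is observed if and only if $y_{1i}=1$. $y_{1i}^*$ and $y_{2i}^*$ are not
observed; $y_{3i}$ is observed if and only if $y_{1i}y_{2i}=1$.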
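
The mapping from the structural covariance matrix of Model I to the reduced-form covariance matrix of Model II is simple arithmetic. A minimal sketch follows; the function name `reduced_form_cov` and the numerical values are hypothetical, chosen only so that the scale factor is a round number.

```python
import math


def reduced_form_cov(S11, S12, S13, S23, S33):
    """Map structural covariances (Model I) to reduced-form ones (Model II).

    e1 = (eps3 - eps1) / scale, e2 = eps2, e3 = eps3, where
    scale = sqrt(Var(eps3 - eps1)) = sqrt(S11 + S33 - 2*S13).
    Var(eps2) is normalized to one in the structural model.
    """
    scale = math.sqrt(S11 + S33 - 2.0 * S13)
    rho = (S23 - S12) / scale        # Cov(e1, e2)
    sigma13 = (S33 - S13) / scale    # Cov(e1, e3)
    sigma23 = S23                    # Cov(e2, e3), unchanged
    sigma33 = S33                    # Var(e3), unchanged
    return {"scale": scale, "rho": rho, "sigma13": sigma13,
            "sigma23": sigma23, "sigma33": sigma33}


# Hypothetical structural values: S11 + S33 - 2*S13 = 4, so scale = 2.
omega = reduced_form_cov(S11=2.0, S12=0.1, S13=0.5, S23=0.5, S33=3.0)
print(omega)
```

The same `scale` also divides the index in (3.1), so the coefficients of the participation equation are identified only up to this normalization.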

If our interest is just in the joint estimation of labor-force
and employment decisions, the parameters in those
processes can be estimated by a bivariate probit method with
partial observability. (Specifically, this is the "censored
probit" model of Farber [1983] and Meng and Schmidt [1985].)
Here we may distinguish two cases, depending on what is
observed. First, there is the case of partial observability
in the sense of Poirier [1980], in which we do not observe $y_{1i}$
or $y_{2i}$ for anyone, but we observe $y_{1i}y_{2i}$. This corresponds
to observing only employment status (employed or not), but not
labor-force participation status for individuals who are not
employed. Interestingly, we can still estimate separately the
labor-force participation and employment equations, using
Poirier's model.

Case I. Partial Observability in the sense of Poirier
$$y_{1i} = \begin{cases} 1, & \text{if } y_{1i}^* \geq 0 \\ 0, & \text{otherwise,} \end{cases}
\qquad
y_{2i} = \begin{cases} 1, & \text{if } y_{2i}^* \geq 0 \\ 0, & \text{otherwise,} \end{cases}
\qquad i = 1,2,\ldots,N.$$
Only $y_{1i}y_{2i}$ is observed. $y_{1i}$ is one if the i'th person is in the
labor force, and $y_{2i}$ is one if the i'th person is able to find
a job. Thus $y_{1i}y_{2i}$ is one if the i'th person is employed.

In this case, the maximum likelihood (ML) estimators of
$\beta_1$, $\beta_2$, and $\rho$ are derived by maximizing the following
log-likelihood function with respect to $\beta_1$, $\beta_2$, and $\rho$:

(5) $\ln L_P(\beta_1,\beta_2,\rho) = \sum_{i=1}^{N} \{ y_{1i}y_{2i} \ln[F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)] + (1 - y_{1i}y_{2i}) \ln[1 - F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)] \}$

where $F(\cdot)$ is the bivariate standard normal distribution
function. This method can allow for frictions in the labor
market. However, the problem with this model is that the
information about who is unemployed is wasted. This model
basically categorizes the individuals in a given data set into
just two groups: the employed and the nonemployed. The latter
group consists of those who are not willing to work (NLF
people) and those who want to work but cannot find jobs
(unemployed). This model does not identify who is who among
the nonemployed. Therefore, even though the estimates from
this model would be consistent, these estimates are generally
inefficient if information about who is unemployed is
available. (See Meng and Schmidt [1985].)

The second case we consider is the case in which we
observe $y_{2i}$ when $y_{1i}=1$ (though not when $y_{1i}=0$). Thus we can
observe an individual's success or failure in job search only
when she is in the labor market. The unemployed people are
identified as those who are willing to work (being in the
labor market) but are not successful in their job search.
Since this model can distinguish the unemployed from the NLF
people, we can get more efficient estimates than we would get
from Poirier's model. In this case, the parameters can be
estimated by the censored probit method.

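Case II. Censored probit.
$$y_{1i} = \begin{cases} 1, & \text{if } y_{1i}^* \geq 0 \\ 0, & \text{otherwise,} \end{cases}
\qquad
y_{2i} = \begin{cases} 1, & \text{if } y_{2i}^* \geq 0 \\ 0, & \text{otherwise.} \end{cases}$$
$y_{1i}$ is observed. $y_{2i}$ is observed if and only if $y_{1i} = 1$.

In this case, the ML estimators of $\beta_1$, $\beta_2$, and $\rho$ are
derived by maximizing the following log-likelihood function
with respect to $\beta_1$, $\beta_2$, and $\rho$:

(6) $\ln L_C(\beta_1,\beta_2,\rho) = \sum_{i=1}^{N} \{ y_{1i}y_{2i} \ln[F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)] + y_{1i}(1-y_{2i}) \ln[\Phi(x_{1i}\beta_1) - F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)] + (1-y_{1i}) \ln[1 - \Phi(x_{1i}\beta_1)] \}$

where $\Phi(\cdot)$ is the standard normal distribution function. This
paper uses the censored probit model to estimate LF and
employment decisions.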
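
The Case I and Case II log-likelihoods differ only in how the nonemployed are treated, which a short sketch makes concrete. The code below is an illustration under stated assumptions, not the estimation code used in this paper: the function names are hypothetical, the inputs are the index values $x_{1i}\beta_1$ and $x_{2i}\beta_2$ rather than data and coefficients, and the bivariate normal probability is approximated by Simpson's rule using only the Python standard library.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is the density, STD.cdf the cdf


def bvn_cdf(h, k, rho, steps=2000):
    """Approximate F(h, k, rho) = P(T1 <= h, T2 <= k), standard bivariate normal.

    Uses F = integral over t <= h of pdf(t) * cdf((k - rho*t)/sqrt(1-rho^2)),
    evaluated by Simpson's rule on [-8, h]. Assumes |rho| < 1, even `steps`.
    """
    lower = -8.0
    if h <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(t):
        return STD.pdf(t) * STD.cdf((k - rho * t) / scale)

    width = (h - lower) / steps
    total = integrand(lower) + integrand(h)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(lower + j * width)
    return total * width / 3.0


def loglik_poirier(xb1, xb2, rho, y1, y2):
    """Case I: only the product y1*y2 (employed or not) enters."""
    ll = 0.0
    for a, b, d1, d2 in zip(xb1, xb2, y1, y2):
        p_emp = bvn_cdf(a, b, rho)
        ll += math.log(p_emp) if (d1 and d2) else math.log(1.0 - p_emp)
    return ll


def loglik_censored(xb1, xb2, rho, y1, y2):
    """Case II: y1 is always observed, y2 only when y1 == 1."""
    ll = 0.0
    for a, b, d1, d2 in zip(xb1, xb2, y1, y2):
        p_emp = bvn_cdf(a, b, rho)
        if d1 and d2:      # employed
            ll += math.log(p_emp)
        elif d1:           # in the labor force but unemployed
            ll += math.log(STD.cdf(a) - p_emp)
        else:              # not in the labor force
            ll += math.log(1.0 - STD.cdf(a))
    return ll
```

In practice either function would be handed to a numerical optimizer over $(\beta_1, \beta_2, \rho)$. The censored version splits the nonemployed term of the Poirier likelihood into its unemployed and NLF parts, which is where the efficiency gain comes from.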

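The extension of Heckman's simple two-stage method is
usually used when two different types of selection biases
exist in the data set. The wage equation given in Model II
also could be estimated by the extended two-stage method. The
conditional mean of $e_{3i}$ will be given by

(7) $E(e_{3i} \mid y_{1i}^* \geq 0, y_{2i}^* \geq 0) = E(e_{3i} \mid e_{1i} \geq -x_{1i}\beta_1, e_{2i} \geq -x_{2i}\beta_2)$
$$= \sigma_{13}\,\frac{\phi(x_{1i}\beta_1)\,\Phi[(x_{2i}\beta_2 - \rho x_{1i}\beta_1)/(1-\rho^2)^{1/2}]}{F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)}
+ \sigma_{23}\,\frac{\phi(x_{2i}\beta_2)\,\Phi[(x_{1i}\beta_1 - \rho x_{2i}\beta_2)/(1-\rho^2)^{1/2}]}{F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)}$$
where
$$f(t_1,t_2,\rho) = \frac{1}{2\pi(1-\rho^2)^{1/2}} \exp\left[-\frac{1}{2(1-\rho^2)}\,(t_1^2 - 2\rho t_1 t_2 + t_2^2)\right]$$
and
$$F(h,k,\rho) = \int_{-\infty}^{h}\int_{-\infty}^{k} f(t_1,t_2,\rho)\,dt_2\,dt_1,$$
where $\phi(\cdot)$ is the standard normal density function and $\Phi(\cdot)$
is the standard normal distribution function. We
can find this formula in Fishe, Trost, and Lurie [1981], Ham
[1981], Maddala [1987], and Poirier [1980]. (Also, see
APPENDIX A.) Now, we can rewrite (4.3) as

(4.3') $y_{3i} = x_{3i}\beta_3 + \sigma_{13}\mu_{1i} + \sigma_{23}\mu_{2i} + v_{3i}$

where $E(v_{3i} \mid y_{1i}^* \geq 0, y_{2i}^* \geq 0) = 0$, and
$$\mu_{1i} = \frac{\phi(x_{1i}\beta_1)\,\Phi[(x_{2i}\beta_2 - \rho x_{1i}\beta_1)/(1-\rho^2)^{1/2}]}{F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)}, \qquad
\mu_{2i} = \frac{\phi(x_{2i}\beta_2)\,\Phi[(x_{1i}\beta_1 - \rho x_{2i}\beta_2)/(1-\rho^2)^{1/2}]}{F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)}.$$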
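
The two selectivity regressors defined above are ratios of standard normal densities and distribution functions to the bivariate normal probability of being employed. A minimal sketch of their computation follows, assuming the index values $a = x_{1i}\beta_1$ and $b = x_{2i}\beta_2$ are already available; the function names are hypothetical and the bivariate probability is again approximated numerically with the standard library only.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal density (pdf) and distribution (cdf)


def bvn_cdf(h, k, rho, steps=2000):
    """Simpson's-rule approximation of P(T1 <= h, T2 <= k), |rho| < 1."""
    lower = -8.0
    if h <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(t):
        return STD.pdf(t) * STD.cdf((k - rho * t) / scale)

    width = (h - lower) / steps
    total = integrand(lower) + integrand(h)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(lower + j * width)
    return total * width / 3.0


def selectivity_regressors(a, b, rho):
    """Return (mu1, mu2) for index values a = x1*beta1 and b = x2*beta2."""
    scale = math.sqrt(1.0 - rho * rho)
    prob_employed = bvn_cdf(a, b, rho)
    mu1 = STD.pdf(a) * STD.cdf((b - rho * a) / scale) / prob_employed
    mu2 = STD.pdf(b) * STD.cdf((a - rho * b) / scale) / prob_employed
    return mu1, mu2


# With rho = 0 the first regressor collapses to the usual inverse Mills
# ratio pdf(a)/cdf(a) of the single-selection model.
mu1, mu2 = selectivity_regressors(0.3, -0.7, 0.0)
print(mu1, STD.pdf(0.3) / STD.cdf(0.3))
```

The special case in the usage lines is a convenient sanity check: when the two selection rules are independent, each regressor reduces to the single-rule correction term.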

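The extended two-stage estimator can be obtained by the
following steps. First, estimate $\beta_1$, $\beta_2$, and $\rho$ by the
censored probit model, and estimate $\mu_{1i}$ and $\mu_{2i}$ by
using the estimated $\beta_1$, $\beta_2$, and $\rho$. We then have the estimates
of $\beta_1$, $\beta_2$, $\rho$, $\mu_{1i}$, and $\mu_{2i}$: $\hat\beta_1$, $\hat\beta_2$, $\hat\rho$, $\hat\mu_{1i}$, and $\hat\mu_{2i}$. Second, regress
$y_{3i}$ on $x_{3i}$, $\hat\mu_{1i}$, and $\hat\mu_{2i}$ by OLS, using only the observed $y_{3i}$.
In order to find the asymptotic distribution of the extended
two-stage estimators, we just assume for now that

(8) $\sqrt{N}(\hat\gamma - \gamma) = \sqrt{N}\begin{bmatrix}\hat\beta_1 - \beta_1\\ \hat\beta_2 - \beta_2\\ \hat\rho - \rho\end{bmatrix} \to N(0, \Psi)$

where $\hat\beta_1$, $\hat\beta_2$, $\hat\rho$ are the ML estimates of $\beta_1$, $\beta_2$, and $\rho$ generated
by the censored probit model. Let $N_1$ denote the number of the
observed $y_{3i}$. For simplicity, assume that plim $N_1/N = k$,
$0<k<1$. Then, we can show (See APPENDIX B.) that

(9) $\sqrt{N_1}\begin{bmatrix}\hat\beta_3 - \beta_3\\ \hat{c} - c\end{bmatrix} \to N(0, DBD')$

where
$$\mu_i = (\mu_{1i}\ \ \mu_{2i}), \qquad c = (\sigma_{13}\ \ \sigma_{23})',$$
$$A_{1i} = \partial\mu_{1i}/\partial\gamma' = \Big[(-x_{1i}\beta_1\mu_{1i} - \mu_{1i}^2 - \rho\mu_{3i})\,x_{1i},\ \ (\mu_{3i} - \mu_{1i}\mu_{2i})\,x_{2i},\ \ -\big((x_{1i}\beta_1 - \rho x_{2i}\beta_2)/(1-\rho^2)\big)\mu_{3i} - \mu_{1i}\mu_{3i}\Big],$$
$$A_{2i} = \partial\mu_{2i}/\partial\gamma' = \Big[(\mu_{3i} - \mu_{1i}\mu_{2i})\,x_{1i},\ \ (-x_{2i}\beta_2\mu_{2i} - \mu_{2i}^2 - \rho\mu_{3i})\,x_{2i},\ \ -\big((x_{2i}\beta_2 - \rho x_{1i}\beta_1)/(1-\rho^2)\big)\mu_{3i} - \mu_{2i}\mu_{3i}\Big],$$
$$A_i = \partial\mu_i'/\partial\gamma' = [A_{1i}'\ \ A_{2i}']', \qquad Q_{ij} = c'A_i\Psi A_j'c,$$
$$\eta_i = 1 - (\sigma_{13}^2/\sigma_{33})(x_{1i}\beta_1\mu_{1i} + \mu_{1i}^2 + \rho\mu_{3i}) - (\sigma_{23}^2/\sigma_{33})(x_{2i}\beta_2\mu_{2i} + \mu_{2i}^2 + \rho\mu_{3i}) + 2\sigma_{13}\sigma_{23}(\mu_{3i} - \mu_{1i}\mu_{2i})/\sigma_{33},$$
$$D = \text{plim}\left[\frac{1}{N_1}\begin{bmatrix}\Sigma_i x_{3i}'x_{3i} & \Sigma_i x_{3i}'\mu_i\\ \Sigma_i \mu_i'x_{3i} & \Sigma_i \mu_i'\mu_i\end{bmatrix}\right]^{-1},$$
$$B = \sigma_{33}\,\text{plim}\,\frac{1}{N_1}\begin{bmatrix}\Sigma_i \eta_i x_{3i}'x_{3i} & \Sigma_i \eta_i x_{3i}'\mu_i\\ \Sigma_i \eta_i \mu_i'x_{3i} & \Sigma_i \eta_i \mu_i'\mu_i\end{bmatrix}
+ \text{plim}\,\frac{N_1}{N}\,\frac{1}{N_1^2}\begin{bmatrix}\Sigma_i\Sigma_j Q_{ij} x_{3i}'x_{3j} & \Sigma_i\Sigma_j Q_{ij} x_{3i}'\mu_j\\ \Sigma_i\Sigma_j Q_{ij} \mu_i'x_{3j} & \Sigma_i\Sigma_j Q_{ij} \mu_i'\mu_j\end{bmatrix},$$
where $\Sigma_i$ denotes the summation over i from 1 to $N_1$. If
$\sigma_{13}=\sigma_{23}=0$, then $\eta_i=1$ and $Q_{ij}=0$. In this case,
$$DBD' = \sigma_{33}\,\text{plim}\left[\frac{1}{N_1}\begin{bmatrix}\Sigma_i x_{3i}'x_{3i} & \Sigma_i x_{3i}'\mu_i\\ \Sigma_i \mu_i'x_{3i} & \Sigma_i \mu_i'\mu_i\end{bmatrix}\right]^{-1}$$
which is the standard covariance matrix for the OLS estimator.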

The more efficient the estimators of $\beta_1$, $\beta_2$, and $\rho$
that are used, the more efficient the estimators of $\beta_3$ and $c$
will be. Suppose $\hat\gamma_1$ and $\hat\gamma_2$ are consistent estimators of $\gamma$
with corresponding asymptotic covariance matrices, $\Psi_1$ and $\Psi_2$,
respectively. Consider the second matrix in B:
$$\begin{bmatrix}\Sigma_i\Sigma_j Q_{ij} x_{3i}'x_{3j} & \Sigma_i\Sigma_j Q_{ij} x_{3i}'\mu_j\\ \Sigma_i\Sigma_j Q_{ij} \mu_i'x_{3j} & \Sigma_i\Sigma_j Q_{ij} \mu_i'\mu_j\end{bmatrix}
= \begin{bmatrix}\Sigma_i x_{3i}'c'A_i\\ \Sigma_i \mu_i'c'A_i\end{bmatrix}\Psi\begin{bmatrix}\Sigma_i x_{3i}'c'A_i\\ \Sigma_i \mu_i'c'A_i\end{bmatrix}' = G\Psi G'.$$
If $\Psi_1 - \Psi_2$ is positive semidefinite, so is $G\Psi_1G' - G\Psi_2G'$. This
means that we can get more efficient estimators of $\beta_3$ and $c$
using $\hat\gamma_2$ than $\hat\gamma_1$. Meng and Schmidt [1985] provide the
asymptotic covariance matrices of the estimates of $\beta_1$, $\beta_2$, and
$\rho$ for Case I and Case II. Their comparison of asymptotic
efficiencies shows that Case II generates more efficient
estimators of $\beta_1$, $\beta_2$, and $\rho$ than Case I. This means that Case
II is more desirable for the extended two-stage estimation
than Case I.

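This extended two-stage method does not give us estimates
of all the parameters. We still have to estimate $\sigma_{33}$. For
this, denote the residuals for the first $N_1$ observations
obtained from the second stage as $\hat{v}_{3i}$, $i=1,2,\ldots,N_1$. Then an
estimate of $\sigma_{33}$ is

(10) $\hat\sigma_{33} = (1/N_1)\,\Sigma_i \{ \hat{v}_{3i}^2 + \hat\sigma_{13}^2[(x_{1i}\hat\beta_1)\hat\mu_{1i} + \hat\mu_{1i}^2 + \hat\rho\hat\mu_{3i}] + \hat\sigma_{23}^2[(x_{2i}\hat\beta_2)\hat\mu_{2i} + \hat\mu_{2i}^2 + \hat\rho\hat\mu_{3i}] - 2\hat\sigma_{13}\hat\sigma_{23}(\hat\mu_{3i} - \hat\mu_{1i}\hat\mu_{2i}) \}$

where $\hat\mu_{3i} = f(x_{1i}\hat\beta_1, x_{2i}\hat\beta_2, \hat\rho)/F(x_{1i}\hat\beta_1, x_{2i}\hat\beta_2, \hat\rho)$. We can show that
plim $\hat\sigma_{33} = \sigma_{33}$. (See APPENDIX A.) This completes the
estimation of all parameters by the extended two-stage method
for Model II.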
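
The second-stage estimate of the wage-equation variance adds back to the mean squared residual the variance that selection removes. A minimal sketch follows, assuming the second-stage residuals, the index values, and the three $\mu$ terms have already been computed for the employed observations; the function name `sigma33_hat` and its argument names are hypothetical.

```python
def sigma33_hat(resid, xb1, xb2, mu1, mu2, mu3, s13, s23, rho):
    """Estimate sigma_33 from second-stage quantities (employed sample only).

    resid     : second-stage OLS residuals
    xb1, xb2  : estimated index values x1*beta1 and x2*beta2
    mu1, mu2  : estimated selectivity regressors
    mu3       : estimated ratio f/F of bivariate density to probability
    s13, s23  : estimated coefficients on mu1 and mu2
    rho       : estimated correlation from the censored probit
    """
    total = 0.0
    for v, a, b, m1, m2, m3 in zip(resid, xb1, xb2, mu1, mu2, mu3):
        total += (
            v * v
            + s13 ** 2 * (a * m1 + m1 * m1 + rho * m3)
            + s23 ** 2 * (b * m2 + m2 * m2 + rho * m3)
            - 2.0 * s13 * s23 * (m3 - m1 * m2)
        )
    return total / len(resid)


# With no selectivity (s13 = s23 = 0) the estimate is the mean squared residual.
print(sigma33_hat([1.0, -1.0, 2.0], [0.0] * 3, [0.0] * 3,
                  [0.5] * 3, [0.5] * 3, [0.1] * 3, 0.0, 0.0, 0.3))
```

The usage line illustrates the no-selectivity case, in which the correction terms vanish and the ordinary residual variance is recovered.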

In Heckman's simple selectivity model, the test for
sample selectivity is usually done by the t-statistic for the
estimated coefficient of the selectivity regressor. Melino
[1982] and Lin [1982] show that this t-test is the same as the
LM test of the hypothesis of no selectivity bias, so it has
good asymptotic power properties. In the same way, we can
perform an F-test for sample selectivity in Model II. If $\sigma_{13}
= \sigma_{23} = 0$, then
$$E(e_{3i} \mid e_{1i} \geq -x_{1i}\beta_1,\ e_{2i} \geq -x_{2i}\beta_2) = 0.$$
Therefore, sample selection bias is not produced by applying
OLS to (4.3) directly. Furthermore, as we see later, the
conditional distribution of $e_{3i}$ given $e_{1i} \geq -x_{1i}\beta_1$ and $e_{2i} \geq -x_{2i}\beta_2$
is normal and homoscedastic for all $i = 1,2,\ldots,N_1$. Hence, we
can use the conventional F-statistic to test for sample
selectivity. For notational convenience, let
$$X_3 = [x_{31}'\ \ x_{32}'\ \cdots\ x_{3N_1}']', \qquad \hat\mu = [\hat\mu_1'\ \ \hat\mu_2'\ \cdots\ \hat\mu_{N_1}']',$$
$$Z = [X_3\ \ \hat\mu], \qquad \theta = [\beta_3'\ \ c']', \qquad y_3 = [y_{31}\ \ y_{32}\ \cdots\ y_{3N_1}]',$$
$$SSE_0 = (y_3 - Z\hat\theta)'(y_3 - Z\hat\theta), \qquad M_x = I - X_3(X_3'X_3)^{-1}X_3'.$$
Then, under the null hypothesis that $c=0$, the test statistic,

(11) $F = \dfrac{\hat{c}'\hat\mu' M_x \hat\mu\hat{c}/2}{SSE_0/(N_1 - k_3)}$

is treated as if it were distributed as $F(2, N_1-k_3)$. It can be
shown that this F-test is asymptotically identical to the LM
test. (See APPENDIX C.)

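Model II allows us to estimate equations for labor-force
and employment decisions and the wage rate jointly. The
log-likelihood function for Model II can be derived, after
some straightforward but messy algebra. The conditional
distribution of the $e_{3i}$ is given, for $i = 1,2,\ldots,N_1$, as

(12) $g(e_{3i} \mid e_{1i} \geq -x_{1i}\beta_1,\ e_{2i} \geq -x_{2i}\beta_2)$
$$= \frac{1}{(2\pi)^{1/2}\sigma_{33}^{1/2}} \exp\left[-\frac{1}{2\sigma_{33}}\,e_{3i}^2\right]
\cdot F\left[\frac{x_{1i}\beta_1 + (\sigma_{13}/\sigma_{33})e_{3i}}{((\sigma_{33}-\sigma_{13}^2)/\sigma_{33})^{1/2}},\ \frac{x_{2i}\beta_2 + (\sigma_{23}/\sigma_{33})e_{3i}}{((\sigma_{33}-\sigma_{23}^2)/\sigma_{33})^{1/2}},\ \frac{\rho\sigma_{33} - \sigma_{13}\sigma_{23}}{(\sigma_{33}-\sigma_{13}^2)^{1/2}(\sigma_{33}-\sigma_{23}^2)^{1/2}}\right]
\Big/ F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)$$
(See APPENDIX D.) Then, the conditional distribution of $y_{3i}$
given its observability is:

(13) $h(y_{3i} \mid y_{1i}^* \geq 0,\ y_{2i}^* \geq 0)$
$$= \frac{1}{(2\pi)^{1/2}\sigma_{33}^{1/2}} \exp\left[-\frac{1}{2\sigma_{33}}\,(y_{3i} - x_{3i}\beta_3)^2\right]
\cdot F\left[\frac{x_{1i}\beta_1 + (\sigma_{13}/\sigma_{33})(y_{3i}-x_{3i}\beta_3)}{((\sigma_{33}-\sigma_{13}^2)/\sigma_{33})^{1/2}},\ \frac{x_{2i}\beta_2 + (\sigma_{23}/\sigma_{33})(y_{3i}-x_{3i}\beta_3)}{((\sigma_{33}-\sigma_{23}^2)/\sigma_{33})^{1/2}},\ \frac{\rho\sigma_{33} - \sigma_{13}\sigma_{23}}{(\sigma_{33}-\sigma_{13}^2)^{1/2}(\sigma_{33}-\sigma_{23}^2)^{1/2}}\right]
\Big/ F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)$$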

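The log-likelihood function is:

(14) $\ln L_{II}(\beta_1,\beta_2,\beta_3,\rho,\sigma_{13},\sigma_{23},\sigma_{33})$
$$= \sum_{i=1}^{N} \{ y_{1i}y_{2i}\ln[h(y_{3i} \mid y_{1i}^* \geq 0, y_{2i}^* \geq 0)\Pr(y_{1i}^* \geq 0, y_{2i}^* \geq 0)] + y_{1i}(1-y_{2i})\ln[\Pr(y_{1i}^* \geq 0, y_{2i}^* < 0)] + (1-y_{1i})\ln[\Pr(y_{1i}^* < 0)] \}$$
$$= \sum_{i=1}^{N} \{ y_{1i}y_{2i}[-(1/2)\ln 2\pi - (1/2)\ln\sigma_{33} - (1/2\sigma_{33})(y_{3i} - x_{3i}\beta_3)^2 + \ln(H_i)] + y_{1i}(1-y_{2i})\ln(\Phi_{1i} - F_i) + (1-y_{1i})\ln(1-\Phi_{1i}) \}$$
where,
$$H_i = F\left[\frac{x_{1i}\beta_1 + (\sigma_{13}/\sigma_{33})(y_{3i}-x_{3i}\beta_3)}{((\sigma_{33}-\sigma_{13}^2)/\sigma_{33})^{1/2}},\ \frac{x_{2i}\beta_2 + (\sigma_{23}/\sigma_{33})(y_{3i}-x_{3i}\beta_3)}{((\sigma_{33}-\sigma_{23}^2)/\sigma_{33})^{1/2}},\ \frac{\rho\sigma_{33} - \sigma_{13}\sigma_{23}}{(\sigma_{33}-\sigma_{13}^2)^{1/2}(\sigma_{33}-\sigma_{23}^2)^{1/2}}\right],$$
$$F_i = F(x_{1i}\beta_1, x_{2i}\beta_2, \rho), \qquad \Phi_{1i} = \Phi(x_{1i}\beta_1).$$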

If $\sigma_{13}=\sigma_{23}=0$, the log-likelihood function given in (14)
becomes:

(15) $\ln L(\beta_1,\beta_2,\beta_3,\rho,\sigma_{33}) = -(N_1/2)\ln 2\pi - (N_1/2)\ln\sigma_{33} - (1/2\sigma_{33})\sum_{i=1}^{N_1}(y_{3i} - x_{3i}\beta_3)^2 + \ln L_C(\beta_1,\beta_2,\rho)$

where $\ln L_C$ is given in (6). (15) implies that $(\beta_3',\sigma_{33})'$ and
$(\beta_1',\beta_2',\rho)'$ can be estimated separately. $(\beta_3',\sigma_{33})'$ can be
efficiently estimated by applying OLS to (4.3), and
maximizing (6) provides the ML estimator of $(\beta_1',\beta_2',\rho)'$. As
we see in (12), if $\sigma_{13} = \sigma_{23} = 0$,

(16) $g(e_{3i} \mid e_{1i} \geq -x_{1i}\beta_1,\ e_{2i} \geq -x_{2i}\beta_2) = \dfrac{1}{(2\pi)^{1/2}\sigma_{33}^{1/2}} \exp\left[-\dfrac{1}{2\sigma_{33}}\,e_{3i}^2\right]$

This shows that the conditional distribution of $e_{3i}$ is normal,
and that all $e_{3i}$'s for $i=1,2,\ldots,N_1$ are homoscedastic. That
is why the OLS estimator of $(\beta_3',\sigma_{33})'$ is efficient if
$\sigma_{13}=\sigma_{23}=0$.

One thing is noteworthy. In Model II (the Reduced-Form
model) there is no sample selection bias if $\sigma_{13}=\sigma_{23}=0$. In
Model I (the Structural model) this means that $\Sigma_{33}-\Sigma_{13}=\Sigma_{23}=0$.
If $\Sigma_{33}=0$, the wage equation given in (1.3) becomes
deterministic, which is difficult to believe. Also, it may
not be usual that $\Sigma_{33}=\Sigma_{13}$. Therefore, Model I almost surely
has a selectivity problem.

All the structural parameters in Model I also can be
directly estimated by the ML estimation method. We can

easily derive the log-likelihood function for Model I:

ll—

50
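\[
\begin{aligned}
\ln L_I(\delta_1, \delta_2, \delta_3, \Sigma)
 &= \sum_{i=1}^{N} \Big\{ y_{1i} y_{2i} \big[ -\tfrac12\ln 2\pi - \tfrac12\ln\sigma_{33}
    - \tfrac{1}{2\sigma_{33}} (w_i - z_{3i}\delta_3)^2 + \ln(H_i) \big] \\
 &\qquad + y_{1i}(1-y_{2i})\ln(\Phi_{1i} - F_i) + (1-y_{1i})\ln(1-\Phi_{1i}) \Big\}
\end{aligned}
\tag{17}
\]

where

\[
\rho = \frac{\Sigma_{23} - \Sigma_{12}}{(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2}}, \qquad
\sigma_{13} = \frac{\Sigma_{33} - \Sigma_{13}}{(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2}},
\]

\[
H_i = F\!\left[
 \frac{x_{1i}\beta_1 + (\sigma_{13}/\Sigma_{33})(w_i - z_{3i}\delta_3)}{\big((\Sigma_{33} - \sigma_{13}^2)/\Sigma_{33}\big)^{1/2}},\;
 \frac{z_{2i}\delta_2 + (\Sigma_{23}/\Sigma_{33})(w_i - z_{3i}\delta_3)}{\big((\Sigma_{33} - \Sigma_{23}^2)/\Sigma_{33}\big)^{1/2}},\;
 \frac{\rho\Sigma_{33} - \sigma_{13}\Sigma_{23}}{(\Sigma_{33} - \sigma_{13}^2)^{1/2}(\Sigma_{33} - \Sigma_{23}^2)^{1/2}}
\right],
\]

\[
F_i = F(x_{1i}\beta_1,\, x_{2i}\beta_2,\, \rho), \qquad
\Phi_{1i} = \Phi(x_{1i}\beta_1), \qquad
x_{1i}\beta_1 = \frac{z_{3i}\delta_3 - z_{1i}\delta_1}{(\Sigma_{11} + \Sigma_{33} - 2\Sigma_{13})^{1/2}} .
\]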

Maximizing (17) with respect to $\delta_1$, $\delta_2$, $\delta_3$, and $\Sigma$ generates the
ML estimates of the parameters. For identification at least
one variable in $z_{3i}$ must not be included in $z_{1i}$.

III. Data

The sample of married women is taken from the University
of Michigan's Panel Study of Income Dynamics (PSID) for 1981.
These women are between the ages of 18 and 60 years. The
sample excludes wives who are in the agricultural sector;
self-employed; retired; disabled; students; or not in the
continental U.S. Wives whose total family money income in
1980 is less than $5,000 are also excluded from the sample.
Blacks, Whites, and Hispanics are included in the sample, but
people of other races are eliminated. Some respondents'
answers to various question items are inconsistent. For
example, some wives are reported as working in 1981, but are
recorded as having zero hourly wage rates. Those observations
with unreliable answers are also excluded. After this
process, the sample contains 1962 observations. Of these, 923
people are recorded as working at the time the survey was
taken in 1981, and are therefore categorized as the employed.
956 women are reported as housewives, and they belong to the
NLF group. This means that 48.7% of women in the data are out
of the labor force. 83 women, or 8.2% of those in the labor
force, are looking for jobs or temporarily laid off. These
women are regarded as the unemployed.
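
The sample shares quoted above follow from the three group counts. The short sketch below only redoes that arithmetic; the variable names are illustrative and not part of the original analysis.

```python
# Group counts reported in the text (PSID, 1981).
employed = 923
not_in_labor_force = 956
unemployed = 83

total = employed + not_in_labor_force + unemployed
labor_force = employed + unemployed

# Share of all women who are out of the labor force.
nlf_share = not_in_labor_force / total
# Unemployed women as a share of the labor force.
unemployment_rate = unemployed / labor_force
# Unemployed women as a share of the whole sample.
unemployed_share = unemployed / total

print(total, labor_force)
print(f"{nlf_share:.4f} {unemployment_rate:.4f} {unemployed_share:.4f}")
```

The two unemployment figures differ only in the denominator: the first uses the 1006 women in the labor force, the second the full sample of 1962.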

The definitions, means, and standard deviations of the
variables used in the analysis are shown in Table 1. The mean
hourly wage rate shown in Table 1 is quite low. This is
misleading because more than half of the women in the sample
are not currently employed and their "wage" is recorded as zero.
If we consider only the employed, the average hourly wage is
$5.87. Other earned and unearned income (OFINC) could affect
a wife's labor-force participation and ability to get
acceptable job offers. This variable is obtained by
subtracting the wife's labor income from total family money
income. Since the logs of reservation wage and market wage
are used in the model, I choose the log of OFINC (LOFINC) as
a regressor. Some families in the data have zero OFINC.
Therefore, one is added to OFINC to calculate LOFINC; i.e.,
LOFINC = ln(OFINC+1).
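
Two of the figures in this paragraph can be checked with a few lines. The sketch below assumes that non-employed women enter the Table 1 wage mean with a wage of zero, as the text states; the function name `lofinc` is illustrative.

```python
import math

def lofinc(ofinc: float) -> float:
    """Log of other family income plus one, so that OFINC = 0 maps to 0."""
    return math.log1p(ofinc)

# Mean hourly wage over the whole sample (Table 1) and the group sizes.
n_total, n_employed = 1962, 923
mean_wage_all = 2.7615

# Zero wages for the non-employed mean the overall mean is the employed
# mean scaled by the employed share of the sample.
mean_wage_employed = mean_wage_all * n_total / n_employed

print(round(mean_wage_employed, 2))
print(lofinc(0.0))
```

Adding one before taking logs keeps families with zero other income in the sample at LOFINC = 0, and it changes the value only negligibly for families with incomes in the thousands of dollars.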

Regional effects are captured by city size and area of
residence. The dummy variables URB and REGS represent
residency in an SMSA and in the South, respectively. Regional
effects could be investigated in more detail if more dummy
variables for regions were created. However, in this case, a
proportionate increase in computational cost would follow.
Demographic variables, such as years of education (ED), age
(AGE), the number of children below the age of 6 years (KIDS),
and a dummy variable for race (MINOR), are also used. Blacks
and Hispanics are grouped as a single minority. Work
experience could affect the market wage an individual can earn
once employed or her job-match skill. The actual number of
years worked since the age of 18 (EXP) is used to capture this
effect. Finally, this study includes the local unemployment
rate in order to capture differing demand conditions across
areas. The PSID reports the unemployment rate in the
respondent's county. This variable (UNEMPR) is used as the
local unemployment rate.

The explanatory variables in the equation for labor-force
decisions are the constant; ED; URB; MINOR; REGS; UNEMPR;
LOFINC; KIDS; AGE, and AGE squared divided by 1,000 (AGE2);
EXP, and EXP squared divided by 1,000 (EXP2). The vector of
explanatory variables in the job-match equation includes the
same variables as the labor-force decision except AGE and
AGE2. The explanatory variables in the wage equation are all
the same variables as in the labor-force decision except
LOFINC and KIDS.

IV. Empirical Results

The first column of Table 2 reports the estimates of a
simple probit model for the No-Friction model. The last two
columns describe the results of the simple probit models for
the Friction model, assuming zero correlation between error
terms in the labor-force and employment decisions. Both the
first and second columns are about willingness to work.
However, the results in these two columns are derived by
different treatment of the unemployed. The model for no
friction does not regard those unemployed people as having a
desire to work, while the model for friction does. In spite
of these differences, the estimates in the second column are
generally similar to those in the first column and the sizes
of effects and their signs satisfy our expectation. This
similarity may come from the fact that only 4.2% of women in
the data are unemployed, so that their treatment will not
change the results dramatically.
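
To illustrate how the probit estimates are read, the sketch below evaluates the No-Friction index at the sample means and converts it to a probability. It is an illustration only: the coefficients are copied from the first column of Table 2 and the means from Table 1, and the probability at the means is not the same thing as the average predicted probability over the sample.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# No-Friction probit coefficients (Table 2, first column).
coef = {
    "CONSTANT": 1.9064, "ED": 0.0688, "URB": 0.1793, "MINOR": 0.0403,
    "AGE": 0.0022, "AGE2": -0.4820, "REGS": 0.1117, "UNEMPR": -3.7654,
    "LOFINC": -0.2713, "KIDS": -0.4954, "EXP": 0.1371, "EXP2": -2.3910,
}

# Sample means (Table 1); the constant enters with value 1.
mean = {
    "CONSTANT": 1.0, "ED": 12.205, "URB": 0.6865, "MINOR": 0.2717,
    "AGE": 37.485, "AGE2": 1.5327, "REGS": 0.3435, "UNEMPR": 0.0743,
    "LOFINC": 9.9052, "KIDS": 0.5076, "EXP": 8.3578, "EXP2": 0.1176,
}

# Probit index x'b at the means, then P(EMP = 1) = Phi(x'b).
index_at_means = sum(coef[name] * mean[name] for name in coef)
prob_at_means = std_normal_cdf(index_at_means)

print(round(index_at_means, 3), round(prob_at_means, 3))
```

The index at the means is slightly negative, so the implied employment probability falls a little below one half, which is in line with the sample employment share of 0.4704 in Table 1.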

Some differences exist, though, between the results in
the first and the second columns. First, the effect of race
is four times as large in the second column compared to that
in the first column. In fact, MINOR is insignificant when the
unemployed are considered as preferring not to work. On the
contrary, as we see in the second column, when we interpret
the unemployed as willing to work, MINOR becomes significant
at the 5% level. This means that the model for no friction
understates the effect of race on the preference to work.
Second, the coefficient of UNEMPR is -3.77 (significant at the
1% level) under the assumption of no friction while it is
-2.38 (significant at the 10% level) under the assumption of
frictions in the labor market. Therefore, the No-Friction model
seems to exaggerate the effect of the local unemployment rate
on the willingness to work.

The third column in Table 2 shows the estimates of
parameters for the employment status equation based on the
Friction model. MINOR has a significantly negative effect on
employment status. This confirms our expectation. Blacks and
other minorities are more willing to enter the labor force,
but either their job-search ability is less than that of
whites, or there is discrimination in employment. This may
also explain why the No-Friction model underestimates the
effect of race on the preference for work. Under the Friction
model, MINOR has two opposite effects on employment status.
Minorities have higher probabilities of being employed because
they are more likely to be in the labor force. At the same
time, they are more likely to be unemployed because of their
poor job-search skills. These two opposite effects are
captured by one equation under the No-Friction model, and
therefore, these opposite effects cancel out. The Friction
model can capture those different effects of race separately
by two different equations.

The other notable result in the third column is that the
local unemployment rate has a huge effect on employment
status. This implies that the demand side of the labor market
is a major factor determining an individual's employment
status. The effects on employment status of the local
unemployment rate are twofold. First, as we see in the second
column, a higher unemployment rate decreases the probability
of an individual being in the labor force. Second, once she
enters the labor market, a woman has a lower probability of
being employed. These two different negative effects of the
unemployment rate are captured by the single equation for
preference for work under the model for no friction. This may
explain why the No-Friction model generates the exaggerated
effect of UNEMPR on preference to work.

Even though the Friction Model explains an individual's
behavior more completely than the No-Friction Model, there is
no direct method for discriminating between the two models,
because one is not nested in the other. One roundabout way is
to compare the goodness-of-fit of the different models. In
probabilistic-choice models, the proportion of successful
predictions of the choices made is widely used as the measure
of goodness-of-fit. (See Maddala [1987], pp. 76-77.) Table
3 describes the frequencies of predicted outcomes for the
No-Friction Model, while Tables 4-A and 4-B show those for the
Friction Model. In Table 3, 69.7% of predicted outcomes are
correct. According to Tables 4-A and 4-B, the Friction Model
correctly predicts 76.1% of the total outcomes. The Friction
Model shows better predictive power than the No-Friction
Model.
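
The proportion of successful predictions is the share of observations on the diagonal of the actual-by-predicted table. As a minimal sketch, the helper below (the name `hit_rate` is illustrative) reproduces the Table 3 figure from its four cells.

```python
def hit_rate(confusion):
    """Share of observations whose predicted outcome equals the actual one.

    `confusion[i][j]` counts observations with actual outcome i and
    predicted outcome j, so correct predictions lie on the diagonal.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Table 3: rows are actual EMP = 0, 1; columns are predicted EMP = 0, 1.
table3 = [[761, 278],
          [316, 607]]

print(f"{100 * hit_rate(table3):.1f}%")
```

Here 761 + 607 = 1368 of the 1962 observations are predicted correctly.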

In short, as we see in Table 2, the Friction model
explains an individual's labor-force status in a more complete
way than the No-Friction model. Some variables have opposite
effects on employment status, leading the No-Friction model to
understate the effect of those variables on the willingness to
work. On the contrary, if some variables affect employment
directly and indirectly through the labor-force decision but
in the same direction, the No-Friction model overestimates
their effects on the willingness to work. These results imply
that recognizing friction in the labor market can provide a
more reliable explanatory mechanism for the supply approach to
analyzing employment status.

Until now, we have assumed a zero correlation coefficient
between labor-force and employment decisions. Table 5 reports
results for the censored probit model, allowing a non-zero
correlation coefficient. A test of zero correlation yields
the LR statistic of 4.89, larger than the critical $\chi^2(1)$ value
of 3.84 at the 5% level. Also, the conventional t-test shows
significance of $\rho$ at the 1% level. In spite of this fact, the
results in Table 5 are generally close to those in the last
two columns of Table 2. This is not surprising, because the
estimates in Table 2 obtained under the assumption of zero
correlation are still consistent. Some small differences also
follow. If we allow $\rho$ to be different from zero, the signs of
some of the coefficients in the employment decision equation
become more reasonable. For example, compared with those in
the third column of Table 2, the estimated coefficients of ED
and LOFINC become positive and negative, respectively,
following our expectation. MINOR has an insignificant
coefficient in Table 5, but the sign of the coefficient is
still negative. The number of years of work experience has no
significant explanatory power for employment status when zero
correlation is assumed. However, allowing non-zero
correlation reveals significant effects of work experience and
the expected inverted-U shape. Finally, we can test the
hypothesis that the coefficients entering in both labor-force
and employment decisions have the same sizes of effects on
those decisions. Under this null hypothesis, the restricted
log-likelihood function (results not presented) has the value
of -1421.76. Then the value of the LR statistic is 60.50,
which is considerably larger than the critical $\chi^2(10)$ value of
23.21 at the 1% level. This implies that the explanatory
variables affect labor-force and employment status separately
in significantly different ways.
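
The likelihood-ratio statistic used here is twice the gap between the unrestricted and restricted log-likelihoods. The sketch below recomputes the test of equal coefficients from the two log-likelihood values quoted in the text and in Table 5; the function name is illustrative and the critical value is the one given above rather than one computed by the code.

```python
def lr_statistic(loglik_unrestricted: float, loglik_restricted: float) -> float:
    """Likelihood-ratio statistic: 2 * (ln L_unrestricted - ln L_restricted)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

# Censored probit (Table 5) against the model with equal coefficients
# in the labor-force and employment equations.
lr = lr_statistic(-1391.51, -1421.76)

# Critical chi-square value with 10 degrees of freedom at the 1% level,
# as quoted in the text.
CRITICAL_CHI2_10_AT_1PCT = 23.21
reject_null = lr > CRITICAL_CHI2_10_AT_1PCT

print(round(lr, 2), reject_null)
```

The ten degrees of freedom correspond to the ten coefficients that appear in both equations of Table 5 (the constant, ED, URB, MINOR, REGS, UNEMPR, LOFINC, KIDS, EXP, and EXP2).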

Jointly estimating labor-force and employment decisions
and the wage equation could generate more efficient estimates
of the parameters describing labor-force and employment
decisions. This could be done based on the Reduced-Form model
given by (4.1)-(4.3). Table 6 shows the ML estimation results
for this joint estimation. Compared with the results in Table
5 derived by the censored probit model, many estimates of the
parameters describing employment status become significant, or
more significant. For example, ED, URB, MINOR, and LOFINC
have no significant effects on employment status when they are
estimated by the censored probit model. (See Table 5.) They
become significant at the 5% level when the joint model is
estimated, as we see in Table 6. This is due to the increased
efficiency of MLE for the joint model as compared to the
censored probit model. The estimates in the first and the
second columns of Table 6 are generally close to their
counterparts in Table 5. MINOR has an insignificant effect on
labor-force status in Table 6 while having a significant
effect in Table 5. However, the sizes of the effect of MINOR
described in the two tables are not substantially different.

Almost all the estimates in Table 6 have the signs we
would expect. More educated people and residents of bigger
cities have a higher probability of being in the labor force,
and are more likely to be employed. They also obtain higher
wage rates once employed. Minorities are more willing to
participate, but their probability of being unemployed is
higher. Even when they find jobs, their wage rates are lower
than those of whites. Age has no significant effect on
labor-force status or the market wage rate. (This is because
experience is included in the equations.) The local
unemployment rate seems not to affect labor-force status
significantly. However, the local unemployment rate has a
large significant effect on employment status. A higher
unemployment rate generates a greater likelihood of an
individual being unemployed, and substantially decreases the
hourly wage rate. The income earned by other family members
decreases a married woman's willingness to work and her
probability of being employed. This negative effect of other
family members' income on a woman's employment status is not
surprising in light of economic search models and common sense.
A woman with higher income due to others' earnings will be
more selective in choosing jobs.

The number of children has very significant negative
effects on both labor-force and employment decisions, as we
would expect. This effect also could be explained by a search
framework. A woman with more children will have more burdens
of housework, which will lessen her intensity of job search.
Years of work experience have quadratic effects on LF and
employment status and the market wage rate. According to the
estimates in Table 6, more experienced women have a greater
desire to work, a higher likelihood of being employed, and a
higher wage rate.

Table 6 also shows that the error terms for labor-force
and employment decisions are significantly correlated with the
error term for the market wage rate. The covariances between
errors in the labor-force decision and the wage rate, and
between errors in employment status and the wage rate, are
positive and significant at the 1% level. The same unobserved
individual characteristics that make a woman more likely to
enter the labor force and to find a job also tend to lead to
an expectedly higher wage.

This result also implies the existence of sample
selection bias in the estimation of an equation for observed
wage rates. Table 7 shows the results of the restricted MLE
under the null hypothesis of no selection bias. The
restricted MLE of the parameters in the wage equation are just
OLS applied to the observations on the employed women, while
the restricted MLE for labor-force and employment decisions
are identical to the estimates obtained by the censored probit
model. Therefore, the results of Table 5 are repeated in the
first and the second columns of Table 7. A test of no
selection bias yields an LR statistic of 24.40, considerably
larger than the critical value for $\chi^2(2)$ of 9.21 at the 1%
level. This suggests that the OLS estimates could be
seriously biased. Comparing the third column of Table 6 with
that of Table 7, we can see that OLS underestimates the effect
of the local unemployment rate on the wage rate. Under the
null hypothesis of no selection bias, the estimated
coefficient of UNEMPR is insignificant, while the unrestricted
MLE shows a significant and large negative effect of UNEMPR on
the wage rate. The OLS estimates also understate the effect
of work experience. The absolute values of the coefficients
of EXP and EXP2 from the unrestricted MLE are almost twice as
big as those obtained by OLS.
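
The LR statistic for the test of no selection bias can be traced back to the reported log-likelihood values. As a worked check, assuming the restricted log-likelihood is the sum of the two values listed in Table 7 (the censored probit part and the OLS part, which separate under the null) and the unrestricted value is the one in Table 6:

\[
LR = 2\,[\ln L_U - \ln L_R]
   = 2\,[-1664.86 - (-1391.51 - 285.55)]
   = 2\,(12.20) = 24.40 \;>\; 9.21 .
\]

The two degrees of freedom correspond to the two covariance restrictions imposed under the null hypothesis.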

Sample selection biases can be corrected by the two-stage
estimation method. Table 8 reports the results. The
estimated covariance between the labor-force decision and the
wage rate is significant and has a positive sign, while the
covariance between employment status and the wage rate is
insignificant and has a negative sign. The F-statistic, which
is asymptotically equivalent to the LM statistic, has a value
of 8.46, greater than the critical F value of 4.61. Again,
the null hypothesis of no selection bias is rejected. The
two-stage estimation method corrects the biases of the OLS
estimates for EXP and EXP2 quite well, and the estimated
effects of EXP are quite close to those obtained using the
unrestricted MLE. However, some estimates have an unexpected
sign. Even though AGE and AGE2 are not significant in either
the two-stage or the MLE estimates, in the two-stage estimates
older women are predicted to receive lower wage rates. This
is an unbelievable result. The estimated coefficient of
UNEMPR is also insignificant and has an unexpected positive
sign. Therefore, the two-stage estimates seem to fail in
successfully eliminating biases in the OLS estimates. Also,
the two-stage method produces an insignificant and negative
covariance between errors in employment status and the wage
rate.
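
The idea behind a two-stage correction is easiest to see in the simpler case of a single selection rule, as in Heckman [1979]. The sketch below is that single-equation analogue, not the extended two-equation correction used for Table 8: with one selection equation, the conditional mean of the wage error given selection is the covariance times the inverse Mills ratio, which is then added as a regressor in the second stage. The function names are illustrative.

```python
import math

def std_normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(index: float) -> float:
    """phi(index) / Phi(index): E[e1 | e1 >= -index] for standard normal e1."""
    return std_normal_pdf(index) / std_normal_cdf(index)

# The correction term shrinks as the selection index grows, that is,
# as selection into the observed sample becomes close to certain.
for index in (-1.0, 0.0, 1.0, 2.0):
    print(index, round(inverse_mills(index), 4))
```

In the extended method the two correction terms play the same role, but they are built from bivariate normal terms because selection depends on both the labor-force and the employment decisions.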

Compared to the unrestricted MLE results on the effect of
race, the two-stage method underestimates the effect on the
wage rate. The estimated coefficient of MINOR is not
significantly different from zero. Considering the many studies
showing wage discrimination, we expect a significant and
negative effect of minority status on wage rates, which we get
by MLE but not by the two-stage method. On this point, the MLE
result seems to be more reliable.

In short, the two-stage estimation method, as well as the
MLE method, confirms the presence of sample selection biases.
However, compared with the MLE, the two-stage method does not
successfully eliminate the biases of the OLS estimates, and in
some cases generates perversely signed coefficients. These
results imply the superiority of MLE over the two-stage
method.

As a final step, Table 9 describes the result of the
joint estimation of the employment decision and the
reservation and market wage rates, which is obtained by
applying unrestricted MLE to the Structural model given in
(1.1)-(1.3). UNEMPR, EXP, and EXP2 are excluded from the
reservation wage equation. This is done for a computational
reason. For the identification of the equation for the
reservation wage, at least one explanatory variable in the
wage equation must be excluded from the set of explanatory
variables of the reservation-wage equation. The reservation
wage could be interpreted as a woman's value of leisure or the
nonmarket value of housework. In this sense, there is no
reason why UNEMPR, EXP, and EXP2 should affect the reservation
wage. Also, the exclusion of these variables can be justified
by their insignificance in the explanation of reservation wages.
If we include the variables, MLE on the Structural model
amounts to that on the Reduced-Form model. The log-likelihood
value given in Table 9 is not significantly different from
that in Table 6, rejecting the hypothesis of significant
effects of UNEMPR, EXP, and EXP2 on the reservation wage.

The estimated coefficients of the explanatory variables
in the employment decision and the market wage rate equation
are almost identical to their counterparts in Table 6. Error
terms in the reservation wage rate equation are significantly
and positively correlated with those in the market wage rate
equation (see $\Sigma_{13}$ in Table 9), but are not significantly
correlated with those in the employment decision equation. A
woman with an unexpectedly high reservation wage rate tends to
get an unexpectedly high wage rate when she is employed, while
the unexplained part of her reservation wage rate does not
affect the probability of her being employed.

The results for the reservation wage equation are shown
in the first column of Table 9. More educated people have a
higher reservation wage rate. This, however, does not mean
that more educated people are less willing to work, as an
extra year of education increases the market wage more than
the reservation wage. The high labor-force participation rate
of minorities can also be explained by the results in the
first column showing their lower reservation wage. The effect
of age on reservation wage is insignificant but positive. As
we expected, the more other income in a family, and the more
children the family has, the higher the wife's reservation
wage rate. The elasticity of the reservation wage rate with
respect to other income is 0.15. Table 9 also shows that
residents in SMSAs or the South have higher reservation wages.
There is no theory which explains the effects of region on
married women's reservation wages. However, the LR test of no
regional effects yields a $\chi^2$ statistic of 117.95, considerably
above the $\chi^2(2)$ critical value of 9.21 at the 1% level. This
is a quite interesting result, and further examination seems
to be required.
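
The elasticity and the education comparison quoted above can be read directly off Table 9. As a sketch, writing $w^r$ for the reservation wage and $w$ for the market wage (symbols assumed here for illustration), and noting that both wages and other income enter in logs:

\[
\frac{\partial \ln w^r}{\partial \ln(\mathrm{OFINC}+1)} = 0.1473 \approx 0.15,
\qquad
\frac{\partial \ln w}{\partial \mathrm{ED}} - \frac{\partial \ln w^r}{\partial \mathrm{ED}}
 = 0.0764 - 0.0451 = 0.0313 > 0 .
\]

The second expression is the sense in which an extra year of education raises the market wage by more than the reservation wage, so that education increases the gap that drives participation.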

V. Conclusion

This paper presents a joint estimation method for
labor-force and employment decisions and the market wage. This
is based on a Full Information Maximum Likelihood procedure as
well as on a two-step method. Frictions in the labor market are
assumed, and therefore the unemployed are recognized as
behaviorally different from non-participants. The information
about unemployed workers present in a given sample can be used
to estimate the parameters describing the probability of a
particular individual's being employed. This joint estimation
method provides an explanatory mechanism showing the different
and separate effects a variable could have on labor-force
participation and employment decisions, as well as improving
the estimates of parameters in the wage equation.

The traditional labor supply model, which assumes no
friction in the labor market, does not explain married women's
employment status in a satisfactory way. Some variables could
affect employment decisions directly and indirectly through
preference for work. The No-Friction model, assuming that a
person's employment status depends only on that person's
willingness to work, cannot discriminate between those two
different effects, so that it usually generates biased
estimates of parameters describing preferences for work.

Compared to other methods for estimating labor-force and
employment decisions jointly (for example, bivariate probit
methods), the estimation procedure including the wage equation
generates more significant and reasonably signed estimates.
A given data set usually contains a relatively small number of
unemployed people, and therefore the information on them may
not be enough to generate much more efficient estimates of
parameters in employment decisions. Jointly considering the
wage rate will be helpful for more efficient estimation of
parameters in labor-force and employment decisions. These two
decision rules can censor the observed distribution of wage
rates, if error terms in the two equations describing
labor-force and employment decisions are correlated with those
in the wage equation. Therefore, as this paper shows, the
information on wage rates can improve the efficiency of
estimates of parameters explaining labor-force and employment
decisions.

Table 1. Means, Standard Deviations and Definitions of Variables

Variable  Definition                                   Mean      S.D.
EMP       Employed = 1                                 0.4704    0.4993
NLF       NLF = 1                                      0.4873    0.5000
WRATE     Hourly wage rate ($)                         2.7615    3.3966
LRATE     Log of WRATE                                 0.8002    0.8758
ED        Years of education                           12.205    2.2735
URB       Resident in SMSA = 1                         0.6865    0.4640
MINOR     Nonwhite = 1                                 0.2717    0.4449
AGE       Years of age                                 37.485    11.298
AGE2      AGE^2/1000                                   1.5327    0.8973
REGS      South = 1                                    0.3435    0.4750
UNEMPR    Unemployment rate in the resident's          0.0743    0.0242
          county in 1980
OFINC     Other family members' income in 1980 ($)     24850.7   20372.2
LOFINC    Log of (OFINC+1)                             9.9052    0.6822
KIDS      Number of children <= 5 years of age         0.5076    0.7745
EXP       Number of years worked since age 18          8.3578    6.9111
EXP2      EXP^2/1000                                   0.1176    0.1912

Table 2. Simple Probit Models for Labor-Force Participation
and Employment
Model No Friction Friction

Dep. Var. EMP LF EMP

CONSTANT 1.9064*** 2.8483*** 0.7660
(0.6564) (0.6720) (0.8787)

ED 0.0688*** 0.0797*** -0.0099
(0.0151) (0.0156) (0.0299)

URB 0.1793*** 0.1403** 0.2031
(0.0693) (0.0699) (0.1369)

MINOR 0.0403 0.1615** -0.3846**
(0.0776) (0.0781) (0.1700)

AGE 0.0022 -0.0252 -
(0.0262) (0.0265)

AGE2 -0.4820 -0.2201 -
(0.3187) (0.3219)

REGS 0.1117 0.1008 0.1750
(0.0729) (0.0730) (0.1827)

UNEMPR -3.7654*** -2.3757* -7.7640***
(1.3442) (1.3645) (2.6920)

LOFINC -0.2713*** -0.3224*** 0.1147
(0.0575) (0.0603) (0.0885)

KIDS -0.4954*** -0.5131*** -0.2054**
(0.0383) (0.0386) (0.0941)

EXP 0.1371*** 0.1543*** 0.0332
(0.0176) (0.0177) (0.0334)

EXP2 -2.3910*** -2.8442*** -0.4396
(0.5982) (0.6012) (1.4115)

N 1962 1962 1006

Log L -1148.8 -1128.9 -265.01

 

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
standard errors in parenthesis


Table 3. Frequencies of Actual and Predicted Outcomes for
         the Probit Model of No-Friction

Actual\Predicted    EMP=0    EMP=1    Total
EMP = 0               761      278     1039
EMP = 1               316      607      923
Total                1077      885     1962

Table 4-A. Frequencies of Actual and Predicted Outcomes for
           the Probit Model of Labor-Force Status Based on
           the Assumption of Frictions in Labor Market

Actual\Predicted     LF=0     LF=1    Total
LF = 0                636      320      956
LF = 1                276      730     1006
Total                 912     1050     1962

Table 4-B. Frequencies of Actual and Predicted Outcomes for
           the Probit Model of EMP Status Based on the
           Assumption of Frictions in Labor Market

Actual\Predicted    EMP=0    EMP=1    Total
EMP = 0                 0       83       83
EMP = 1                 0      923      923
Total                   0     1006     1006

Table 5. Censored Probit Estimates of the Friction Model

Dep. Var. LF EMP

CONSTANT 2.9638*** 1.0227***
(0.6659) (0.8859)

ED 0.0793*** 0.0363***
(0.0157) (0.0321)

URB 0.1437** 0.2240**
(0.0703) (0.1232)

MINOR 0.1645* -0.2469
(0.0788) (0.1600)

AGE -0.0356 -
(0.0259)

AGE2 -0.0941 -
(0.3134)

REGS 0.1014 0.1876
(0.0735) (0.1585)

UNEMPR -2.2564* -7.7352***
(1.3726) (2.3775)

LOFINC -0.3161*** -0.0465
(0.0602) (0.1180)

KIDS -0.5164*** -0.3668***
(0.0393) (0.0947)

EXP 0.1566*** 0.0640**
(0.0177) (0.0301)

EXP2 -2.9305*** -1.1999***
(0.6015) (1.2245)

$\rho$ 0.6159***
(0.2100)
N 1962 1006
Log L -1391.51

 

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
standard errors in parenthesis

Table 6. Joint Estimation of Labor-Force, Employment Decisions
and the Wage Equation (the Reduced-Form model)

Dep. Var. LF EMP LWAGE

CONSTANT 3.5160*** 2.2994*** 0.1640
(0.6600) (0.8264) (0.1949)

ED 0.0884*** 0.0698** 0.0764***
(0.0158) (0.0285) (0.0490)

URB 0.1547** 0.2559** 0.1910***
(0.0699) (0.1163) (0.0289)

MINOR 0.1057 -0.3364** -0.0959***
(0.0802) (0.1592) (0.0289)

AGE -0.0213 - 0.0079
(0.0258) (0.0104)

AGE2 -0.2192 - -0.1931
(0.3132) (0.1250)

REGS 0.0746 0.0967 -0.0629**
(0.0742) (0.1476) (0.0305)

UNEMPR -2.1860 -7.1815*** -1.1938**
(1.3617) (2.2748) (0.4973)

LOFINC -0.4172*** -0.2473** -
(0.0597) (0.1066)

KIDS -0.4565*** -0.2583*** -
(0.0391) (0.0877)

EXP 0.1530*** 0.0783*** 0.0493***
(0.0180) (0.0279) (0.0080)

EXP2 -2.8690*** -1.2782 -0.8524***
(0.6213) (1.1404) (0.2055)

$\rho$ 0.6858***
(0.1833)

$\sigma_{13}$,$\sigma_{23}$,$\sigma_{33}$ 0.2215*** 0.2922*** 0.1442***
(0.0515) (0.0353) (0.0127)

N 1962 1006 923

Log L -1664.86

 

** significant at the 5% level
*** significant at the 1% level
standard errors in parenthesis

Table 7. Restricted MLE Estimates of the Three-Equation
System ($\sigma_{13} = \sigma_{23} = 0$)

Dep. Var. LF EMP LWAGE

CONSTANT 2.9638*** 1.0227*** 0.3834**
(0.6659) (0.8859) (0.1899)

ED 0.0793*** 0.0363*** 0.0694***
(0.0157) (0.0321) (0.0053)

URB 0.1437** 0.2240** 0.1696***
(0.0703) (0.1232) (0.0253)

MINOR 0.1645** -0.2469 -0.0928***
(0.0788) (0.1600) (0.0276)

AGE -0.0356 - 0.0134
(0.0259) (0.0097)

AGE2 -0.0941 - -0.2147*
(0.3134) (0.1203)

REGS 0.1014 0.1876 -0.0809***
(0.0735) (0.1585) (0.0260)

UNEMPR -2.2564* -7.7352*** -0.5296
(1.3726) (2.3775) (0.4771)

LOFINC -0.3161*** -0.0465 -
(0.0602) (0.1180)

KIDS -0.5164*** -0.3668*** -
(0.0393) (0.0947)

EXP 0.1566*** 0.0640** 0.0273***
(0.0177) (0.0301) (0.0062)

EXP2 -2.9305*** -1.1999*** -0.4301*
(0.6015) (1.2245) (0.1888)

$\rho$ 0.6159***
(0.2100)
N 1962 923
Log L -1391.51 -285.55

 

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
standard errors in parenthesis

Table 8. Results for Two-Stage Estimation of the Wage Equation

Dependent variable LWAGE
Variables
CONSTANT 0.4133***
(0.2475)
ED 0.0769***
(0.0070)
URB 0.1504***
(0.0375)
MINOR -0.0178
(0.0516)
AGE -0.0020
(0.0119)
AGE2 -0.1283
(0.1382)
REGS -0.0894**
(0.0369)
UNEMPR 0.2323
(0.8870)
EXP 0.0505***
(0.0106)
EXP2 -0.8612***
(0.2792)
$\sigma_{13}$ 0.2553***
(0.0949)
(0.2556)
$\sigma_{33}$ 0.2042
R^2 0.3321
N 923

 

* significant at the 10% level
** significant at the 5% level
*** Significant at the 1% level
correct standard errors in parenthesis.

Table 9. Unrestricted Joint Estimation of Employment
Decisions and Reservation and Market Wage Rates (the
Structural Model)

Dependent Variable   Unobserved Log of        EMP    LWAGE
                     Reservation Wage Rate
Variable
CONSTANT -1.5060*** 2.2593*** 0.1743
(0.2770) (0.8332) (0.1930)
ED 0.0451*** 0.0679** 0.0764***
(0.0075) (0.0289) (0.0049)
URB 0.1376*** 0.2594** 0.1918***
(0.0326) (0.1164) (0.0289)
MINOR -0.1336*** -0.3410** -0.0965***
(0.0382) (0.1601) (0.0314)
AGE 0.0116 --- 0.0056
(0.0112) (0.0101)
AGE2 -0.0710 --- -0.1668
(0.1416) (0.1217)
REGS -0.0853** 0.1032 -0.0610**
(0.1485) (0.0303)
UNEMPR --- -7.0411*** -0.9428**
(2.2617) (0.3994)
LOFINC 0.1473*** -0.2409** ---
(0.0284) (0.1080)
KIDS 0.1599*** -0.2568*** ---
(0.0220) (0.0882)
EXP --- 0.0786*** 0.0518***
(0.0278) (0.0075)
EXP2 --- -1.3000 -0.9316***
(1.1410) (0.1838)
$\Sigma_{11}$,$\Sigma_{12}$,$\Sigma_{13}$ 0.1093*** 0.0510 0.0652***
(0.0210) (0.0745) (0.0191)
$\Sigma_{23}$,$\Sigma_{33}$ 0.2890*** 0.1446***
(0.0366) (0.0127)
N 1962
Log L -1665.33

 

* significant at the 10% level
** significant at the 5% level
*** significant at the 1% level
standard errors in parenthesis

APPENDICES

APPENDIX A

In Model II,
0"“ E(eale12'xiﬁiv e2242/32) = 013“1+°23“2
(A'z) E('323I'312""151r ezz'xzﬁz) = 033‘0213H1'0223F‘2
“(90213‘2013023+P°223)“3

(For simplicity, the subscript i is suppressed.)

Proof. Let f(e1,e2,e3) be the trivariate normal distribution
function of (e1,e2,e3)' where (e1,e2,e3)' has zero mean and
covariance matrx 0 given below; and let f (e1,e2,p) be the
standard bivariate normal distribution function with

correlation coefficient, p. Let

n = . 1 023 ; n = n 233

O O 033
We can easily show that

1

 

(A.3) E(e3le12-Xlﬁl, ezz-Xzﬁz)

Q CD 00
I e3f(e1,e2,e3)de3de

de1
“X13 'X23 ’w

2

Note that

1
(28>3/2lnl

1 11 2 22
1/2 exp{ 5 (n e +0 e

 

2
f(ei'ez'e3) 1 1

22 2 33 2 12 13 23
+0 e2+n e3+ 20 e1e2+zn e1e3+20 e2e3)}

1 33 0136 +023e

1/2 eXP{' 92‘ (e3+ 33 ) }

 

3/2

(2n) Inl

76

 

 

 

 

 

 

11 33 13 2 13 23 12 33
1 n n - n n n -n n
x eXP[’ 2 {( i3 ) )e1'2( 33 )eiez
n n
+( 022033-(023)2 )e2}]
33 2
n
1 33 013e1+023e2 2
= eXP{' -—- (e + ) }
(2”)1/2 (n33)2 2 3 033
X f(e1,e2,p)
where n31=(1-p2)/|n|. Then,
w a -po
13 23
(A.4)] e f(e ,e ,e )de = e f(e ,e ,p)
.4» 3 1 2 3 3 1_p2 1 1 2
a -po
23 13
+ -I:;§——— e2f(e1,e2,p)

Substituting (A.4) into (A.3) and using Rosenbaum's theorem
(See Johnson and Kotz [1972]) gives us (A.1).
To show (A.2) to hold, note that

1

 

2 _

00 CD 00
2
x I I I e3f(e1,e2,e3)de3de2de1
"x131 "x252 ’”

It can be easily shown that

no
2 _ Iﬂl
(A.6) {m e3f(e1, e2, e3)de3— 2 f(e1,e2,p)

1‘9

2
(pa -0 )
23 13 2
+ e f(e .e ,p)
1_p2 1 1 2

 

2(""23"’13)("013-023)
<1-p2)2

 

+ e1e2f(e1,e2,p)

77

 

2
(pa -0 )
13 23 2
+ e f(e .e ,p)
2 2 2 1 2
(l-p)

Substituting (A.6) into (A.5), and using Rosenbaum's theorem
again gives us (A.2).

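(A.2) provides a consistent estimator of $\sigma_{33}$ in the
extended two-stage estimation method. As we see in (4.3'),
$v_{3i} = e_{3i} - \sigma_{13}\mu_{1i} - \sigma_{23}\mu_{2i}$. Then using (A.2), we can show that

(A.7)  $E(v_{3i}^2 \mid e_{1i} \ge -x_{1i}\beta_1,\ e_{2i} \ge -x_{2i}\beta_2) = \sigma_{33} - \sigma_{13}^2 (x_{1i}\beta_1\mu_{1i} + \mu_{1i}^2 + \rho\mu_{3i}) - \sigma_{23}^2 (x_{2i}\beta_2\mu_{2i} + \mu_{2i}^2 + \rho\mu_{3i}) + 2\sigma_{13}\sigma_{23}(\mu_{3i} - \mu_{1i}\mu_{2i})$

This shows the consistency of $\hat\sigma_{33}$ given in (10).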

Appendix B

The extended two-stage estimators of B3 and c are given

by:
A I I A -1 2 I

(B.1) a = 53 = ziXBiXBi 31*31“1 1x31yzi
A A| A|A AI
° ziuiXBi Siuiui 31"1Y31
2: ' E 'A '1 E ' -‘ +

= a + 1x31 "31 1X31“i ix3i{(“i “1) V31}
A I I A! A

31“1X31 ziuiui zi“i{“1‘"i’e+vai}

where $\Sigma_i$ means summation over i from 1 to $N_1$.
First of all, we need to know the asymptotic distribution
of $\hat\mu_i$. By Taylor's expansion at the true parameter,
$r = (\beta_1', \beta_2', \rho)'$,

$\hat\mu_i \approx \mu_i + \frac{\partial \mu_i}{\partial r'}(\hat r - r) = \mu_i + A_i(\hat r - r)$

where $A_i$ is defined in (9). Therefore,

$\sqrt{N}(\hat\mu_i - \mu_i) \approx A_i \sqrt{N}(\hat r - r)$.

Hence, we have

(B.2)  $\sqrt{N}(\hat\mu_i - \mu_i) \to N(0,\ A_i W A_i')$

which shows the consistency of $\hat\mu_i$. Note that the total number
(N) of observations increases as the number ($N_1$) of the
observed $y_{3i}$'s does; $N \to \infty$ as $N_1 \to \infty$. For simplicity, we
assume that

(B.3)  $\mathrm{plim}\ N_1/N = k$,  $0 < k < 1$

(See Heckman [1979].) Using (B.2) and (B.3), we can easily
show that
(B 4) lim $—-2 x ' A 1= lim l—Pz. x’. x 2 x' ﬁ=D-1
° P N1H1 31 3131X31“1 P N1 1 31 31 1 31“1
A| AIA I I
21“1"31 21“1“1_ _21“1X31 z1“1“1

 

 

 

}

p

. l
(8.5) plim ﬁ121x31{(ui-ui)c+v33i

II
C

 

 

A. A
L 21#1{(#1-u1)C+V31}1
(B.3) and (B.4) guarantee the consistency of $\hat a$. In order to
derive the asymptotic covariance of $\sqrt{N_1}(\hat a - a)$, note that
-1

121x31{(#1-u1)0+v31} ._1_ 21x31{(u1-u1)C+v31}

(B 6)
er 21u1{(u1-u1)0+v31} VN1 E1u1{(u1-u1)0+v31}

= 1 2' lzj nijx 31X 33 2 izj nijx Biuj
N A'A
1 21 j "11“1x31 2121 "11“ 1“1
where 7111 = [(u1-ﬁ1)c+v31]2. Now, using (8.2) and (8.3) we

have

I 1
(8.7) pl1m ﬁlzl Pjnijx 3ix 3j

_ 1 1 I -A I -A I
- plim ﬁ— 2-21C (#1 #1) (#1 #1)cx31x31

1 1
+ lim 3— z. ( )cx
p N1 12 3' v31 “3 “3' 31X 3j
1' 1 '

1

80

N z 2. c A. lwAl. ox
= plim _1( i jg 3ix 3j)

2
N N1

1.2.
+ plim N1 21V31x31x31

N 2.123Q.
1 ij x31X 33 1
( ) + plim N Bin ii x31x31
N N1

 

= plim

Here 1111 and Q11 are given in (9) . The last equality in (8.7)
is derived from (11.7). The limiting values of the other
components in (8.6) can. be derived. by similar' methods.

Therefore, we have

2. an X n -
(8.8) plim %_ 2:i ij 3iX 3j zj ij x31“ 3
1 AIA
2.12jn1ju1x3j212jnijuiuj
2.1”. 2. '“i
_ . 1_ iix 3ix 3j in ii x31" 31
— a33p11m N1
Xi” 11ﬂ1X 3j ziﬂiiuiui

N ﬁszinBix 3j 2i szijx 3i “j
+ plim (—$)-l§
N N12 1 EjQijpiXBj ziszijniuj 1

 

 

Substituting (8.4) and (8.8) gives us

(3.9) VN(&-a) 3 N( o, 080' )

81

Appendix c

Here, we will derive the LM test statistic for no
selection bias in Model II. To construct the LM test
statistic, we need the restricted ML estimators. Let
$a_1 = \sigma_{13}/\sigma_{33}$, $a_2 = \sigma_{23}/\sigma_{33}$.

Note that $\sigma_{13} = \sigma_{23} = 0$ if and only if $a_1 = a_2 = 0$. Therefore,
the null hypothesis of no sample selection bias is that
$a_1 = a_2 = 0$. We can rewrite the log-likelihood function for Model
II as

(C.1) 1n LII(ﬁ1rﬁzlplﬁ3lalla2Io33)

M2

1 1
[y -y -{- —1na - ———— (y --x .ﬁ )
i=1 11 21 2 33 2033 31 31 3

 

 

x1151+a1(y31'x3153) X2132+32(Y31’x3133)
+ 1“ F 2 1/2 ' 2 1/2 '
(l-alo (1-a a

33) 2 33)

 

p"‘3'338‘151‘2
2 1 2 2 1 2
(1-a1033) / (1-a2033) / ] ]

1nF<anuxuappn + 1nL°<31.32.p)

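where $\ln L^C(\beta_1,\beta_2,\rho)$ is given in (6). Let the restricted ML
estimates of $\beta_1$, $\beta_2$, $\rho$, $\beta_3$ and $\sigma_{33}$ be $\tilde\beta_1$, $\tilde\beta_2$, $\tilde\rho$, $\tilde\beta_3$, and $\tilde\sigma_{33}$.
Under the null hypothesis, (C.1) is reduced to

(C.2)  $\ln L_{II}(\beta_1, \beta_2, \rho, \beta_3, 0, 0, \sigma_{33}) = \sum_{i=1}^{N} \left[ y_{1i}y_{2i} \left\{ -\tfrac{1}{2}\ln 2\pi - \tfrac{1}{2}\ln\sigma_{33} - \tfrac{1}{2\sigma_{33}}(y_{3i} - x_{3i}\beta_3)^2 \right\} \right] + \ln L^C(\beta_1,\beta_2,\rho)$

This shows that $\tilde\beta_3$ and $\tilde\sigma_{33}$ can be derived from the usual OLS
procedure for the observed $y_{3i}$'s, and that $\tilde\beta_1$, $\tilde\beta_2$, $\tilde\rho$ are just
the ML estimators derived by maximizing $\ln L^C(\beta_1,\beta_2,\rho)$.
Therefore, $\tilde\beta_1$, $\tilde\beta_2$, and $\tilde\rho$ are exactly identical to the
estimates $\hat\beta_1$, $\hat\beta_2$, and $\hat\rho$ used for the 2-stage estimation.
Let $\theta = (\beta_1', \beta_2', \rho, \beta_3', a_1, a_2, \sigma_{33})'$. Then the
restricted ML estimator of $\theta$ is given by:

$\tilde\theta = (\tilde\beta_1', \tilde\beta_2', \tilde\rho, \tilde\beta_3', 0, 0, \tilde\sigma_{33})' = (\hat\beta_1', \hat\beta_2', \hat\rho, \tilde\beta_3', 0, 0, \tilde\sigma_{33})'$

where $\tilde\sigma_{33} = \Sigma_i (y_{3i} - x_{3i}\tilde\beta_3)^2 / N_1$. Then, we can show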

(C.3)

II N N
alnL -_ 1 ‘ _ - 1 ‘ _ - '
-—3§_— 9‘(Ororoaori:1 “11(Y3i X3iﬂ3)’i:1 ”21(Y3i X3153),0)

Some tedious operations generate the information matrix

evaluated at 5:

(c.4) 1(9) = E( -§-Jé%’e—l— )
r- 2 C 2 C 2 C
E(- a lnL' 5) E(— 6 lnL' 5) E(- a lnL '5
631681 631632 6816p
2 C 2 C 2 C
E(- a lnL' 5) E(- a lnL' §) E(- a lnL l5
apzaa1 632632 6326p
2 C 2 C 2 C
= E(- a lnL.|§) E(- a lnL'la) E(- a In? '5)
apaﬁl apaa 2 6p
0 O 0
O 0 0
0 0 0

 

83

 

 

o o o o q
o o o o
o o o o
N A , N A A , N A A , 0
(1/033) 2 F"‘31x31 .E F1“11X31 .E F1“21x31
1-1 1-1
N A A _ N A A2 _ N A A A
i:1F1”11x31 0331:1F1“11 0331:1F1“11“21 °
N A A _ N A A A _ N A A2
i:1F1“21"31 “33i21F1“11“21 “331:1 F1“21 °
_ N A
o o o (033/2)i=1Fi _
Now, using (C.3) and (C.4), we have the LM statistic:
(c.5) LM = (312211 9) 1(9) lcﬁlggii 5)
N1 3 N1 .
= [0' E “11(Y31 X3133) 1:1 “21(Y31 x3133)]
' _ N A , N A A , N A A , I
(1’033121F1x31X31 iElFi’ﬁixsi iEIFi"21x31
NAA _ NA. _ NA...
' i:1F1“11x31 U33i§1F1“11 0331:1F1“11“21
N A A _ N A A A _ N A A2
i:1F1“21x31 0331:1F1“11“21 "331:1 F1"21 3
. o .
N1 A _
1:1 “11(Y31'X313)
. N . -
1 1:1 “21(Y31‘X3iﬁ) .

 

 

84

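By theorem 4.2.3 in Amemiya [1985] (p. 118),

(C.6)  $\mathrm{plim}\ \frac{1}{N}\sum_{i=1}^{N} (y_{1i}y_{2i} - F_i) H_i = 0$

where $H_i$ is a nonstochastic variable. Since $\hat\beta_1$, $\hat\beta_2$, and $\hat\rho$ are
consistent,

(C.7)  $\mathrm{plim}\ F(x_{1i}\hat\beta_1, x_{2i}\hat\beta_2, \hat\rho) = F(x_{1i}\beta_1, x_{2i}\beta_2, \rho)$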

Using (C.6) and (C.7), we can show that

 

 

 

(C.8) - _ N A , N A A, , N A A , -
(1/o ) 2 F.x .x . 2 F. u .x . z F.u .x
33 i=1 1 31 31 i=1 1 11 31 i=1 1 21 31
1 NAA _ NAAZ _ NAA A
plim— EFu x . a Z F.u . 0 2F.” .11
N i=1 1 11 31 33i=1 1 11 33i=1 1 11 21
NA]; _ NAA A _ NAA
2 F u .x . a 2 F.u .u a 2 F.“ .
_i=1 i 21 31 33i=1 1 11 21 33i=1 1 21 .
' ._ N . N . .
(1’033).§ Y11Y21X31X31 .E y11y21“11x31
1-1 1—1
1 N A - N A2
= 911” N ._ y11Y21“11X31 O33.§ y11Y21"11
1—1 1-1
N A _ N A A
.E y11Y21“21X31 ”33 E Y11Y21“11“21
_ 1-1 1-1
N A , '
2 y .y .u .x .
1=1 11 21 21 31
- N A
O33.§ y11Y21"11 “21
1—1
_ N A2
a 2 y .y .p .
33i=1 11 21 21 J

 

85

 

 

(1/01)12 x 2 u .x . 2 u x
N N N
1 l ‘ - 1 ‘2 - l ‘ “
=p1im — 2 u a 2 p . a 2 p .u .
N i=1 11x31 331=1 11 33i=1 11 21
N N N
1 ‘ - 1 ‘ ‘ - 1 ‘2
.2 u .x a z u .p . a E u

Using (C.8) and matrix notation defined in Section II, we
have:

(C-9) LM z (Y3'X353) 'M(533#'#'333#'X3(X3'X3)-1X3I-‘)-1N'(Y3'x353)

A|A| AA
c p M uc
x 2
= _ a X (2)
“33

 

using the facts;

_ - I -1 U
ﬂ3 ‘ (x3x3) X3Y3

Mx(y3:x3§3) = Y3'x353

__ '1"

0)

Now we can see that the LM statistic is identical to the
F statistic given in Section II except for the difference
between $SSE_0/(N_1 - k_3)$ and $\tilde\sigma_{33}$. Note that
AIA' AA
c p quc/z 2( )/
F = - -9 x 2 2
SSEo/(N1 RB)

and $SSE_0/(N_1 - k_3)$ and $\tilde\sigma_{33}$ have no substantial asymptotic
difference. Therefore, the F-test is asymptotically equivalent to the LM
test, and has good asymptotic power properties.

Appendix D

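Let $(e_1, e_2, e_3)'$ have the trivariate normal distribution
with zero mean and covariance matrix

\[
\Omega = \begin{bmatrix} 1 & \rho & \sigma_{13} \\ \rho & 1 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{bmatrix}.
\]

Then, the conditional distribution of $e_3$ given $e_1 > -x$ and
$e_2 > -y$ is:

(D.1)
\[
f(e_3 \mid e_1 \ge -x,\ e_2 \ge -y) = \frac{1}{(2\pi)^{1/2}\sigma_{33}^{1/2}} \exp\left(-\frac{e_3^2}{2\sigma_{33}}\right)
\cdot F\left[ \frac{x + (\sigma_{13}/\sigma_{33})e_3}{(1 - \sigma_{13}^2/\sigma_{33})^{1/2}},\ \frac{y + (\sigma_{23}/\sigma_{33})e_3}{(1 - \sigma_{23}^2/\sigma_{33})^{1/2}},\ \frac{\rho - \sigma_{13}\sigma_{23}/\sigma_{33}}{(1 - \sigma_{13}^2/\sigma_{33})^{1/2}(1 - \sigma_{23}^2/\sigma_{33})^{1/2}} \right] \div F(x, y, \rho)
\]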

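Proof. It can be easily shown that

(D.2)
\[
f(e_1, e_2 \mid e_3) = \frac{1}{2\pi(|\Omega|/\sigma_{33})^{1/2}} \exp\left[ -\frac{1}{2|\Omega|} \left\{ (\sigma_{33} - \sigma_{23}^2)\left(e_1 - \frac{\sigma_{13}}{\sigma_{33}}e_3\right)^2 - 2(\rho\sigma_{33} - \sigma_{23}\sigma_{13})\left(e_1 - \frac{\sigma_{13}}{\sigma_{33}}e_3\right)\left(e_2 - \frac{\sigma_{23}}{\sigma_{33}}e_3\right) + (\sigma_{33} - \sigma_{13}^2)\left(e_2 - \frac{\sigma_{23}}{\sigma_{33}}e_3\right)^2 \right\} \right]
\]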

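The conditional distribution of $e_3$ given $e_1 > -x$ and $e_2 > -y$
is described by

(D.3)
\[
f(e_3 \mid e_1 > -x,\ e_2 > -y) = \int_{-x}^{\infty}\int_{-y}^{\infty} \frac{f(e_1, e_2, e_3)}{F(x, y, \rho)}\, de_2\, de_1 = \frac{f(e_3)}{F(x, y, \rho)} \int_{-x}^{\infty}\int_{-y}^{\infty} f(e_1, e_2 \mid e_3)\, de_2\, de_1
\]

where

\[
f(e_3) = \frac{1}{(2\pi)^{1/2}\sigma_{33}^{1/2}} \exp\left(-\frac{e_3^2}{2\sigma_{33}}\right).
\]

Let

(D.4)  $t_1 = \dfrac{e_1 - (\sigma_{13}/\sigma_{33})e_3}{(1 - \sigma_{13}^2/\sigma_{33})^{1/2}}$,  $t_2 = \dfrac{e_2 - (\sigma_{23}/\sigma_{33})e_3}{(1 - \sigma_{23}^2/\sigma_{33})^{1/2}}$

Then, the Jacobian of the transformation is

(D.5)  $\dfrac{\partial(e_1, e_2)}{\partial(t_1, t_2)'} = \dfrac{(\sigma_{33} - \sigma_{13}^2)^{1/2}(\sigma_{33} - \sigma_{23}^2)^{1/2}}{\sigma_{33}}$

Using (D.4) and (D.5), we have

(D.6)  $\int_{-x}^{\infty}\int_{-y}^{\infty} f(e_1, e_2 \mid e_3)\, de_2\, de_1 = \int_{h}^{\infty}\int_{k}^{\infty} g(t_1, t_2)\, dt_2\, dt_1$

where

$h = -\dfrac{x + (\sigma_{13}/\sigma_{33})e_3}{(1 - \sigma_{13}^2/\sigma_{33})^{1/2}}$,

$k = -\dfrac{y + (\sigma_{23}/\sigma_{33})e_3}{(1 - \sigma_{23}^2/\sigma_{33})^{1/2}}$,

$\ell = \dfrac{\rho\sigma_{33} - \sigma_{13}\sigma_{23}}{(\sigma_{33} - \sigma_{13}^2)^{1/2}(\sigma_{33} - \sigma_{23}^2)^{1/2}}$,

and

\[
g(t_1, t_2) = \frac{(\sigma_{33} - \sigma_{13}^2)^{1/2}(\sigma_{33} - \sigma_{23}^2)^{1/2}}{2\pi(|\Omega|\sigma_{33})^{1/2}} \exp\left[ -\frac{(\sigma_{33} - \sigma_{13}^2)(\sigma_{33} - \sigma_{23}^2)}{2|\Omega|\sigma_{33}} \left\{ t_1^2 + t_2^2 - \frac{2(\rho\sigma_{33} - \sigma_{13}\sigma_{23})}{(\sigma_{33} - \sigma_{13}^2)^{1/2}(\sigma_{33} - \sigma_{23}^2)^{1/2}}\, t_1 t_2 \right\} \right].
\]

Note that

\[
1 - \ell^2 = \frac{|\Omega|\sigma_{33}}{(\sigma_{33} - \sigma_{13}^2)(\sigma_{33} - \sigma_{23}^2)}.
\]

This shows that $g(t_1, t_2)$ is the standard bivariate normal
density function with correlation coefficient $\ell$.
Therefore, we have

(D.7)  $\int_{-x}^{\infty}\int_{-y}^{\infty} f(e_1, e_2 \mid e_3)\, de_2\, de_1 = F(-h, -k, \ell)$

Substituting (D.7) into (D.3) gives us (D.1).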
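
The determinant identity used in the proof, $1 - \ell^2 = |\Omega|\sigma_{33} / \{(\sigma_{33} - \sigma_{13}^2)(\sigma_{33} - \sigma_{23}^2)\}$, can be checked numerically. The following is a minimal sketch, not part of the original derivation; the function names and the parameter values are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def conditional_correlation(rho, s13, s23, s33):
    """Correlation of (e1, e2) given e3, i.e. the quantity called ell above."""
    return (rho * s33 - s13 * s23) / (
        (s33 - s13 ** 2) ** 0.5 * (s33 - s23 ** 2) ** 0.5
    )


# Arbitrary illustrative values (assumed, not taken from the text).
rho, s13, s23, s33 = 0.3, 0.4, -0.5, 2.0
omega = [[1.0, rho, s13], [rho, 1.0, s23], [s13, s23, s33]]

ell = conditional_correlation(rho, s13, s23, s33)
lhs = 1.0 - ell ** 2
rhs = det3(omega) * s33 / ((s33 - s13 ** 2) * (s33 - s23 ** 2))
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding for any positive definite choice of $\Omega$.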

 

REFERENCES


Abowd, J.M., and Farber, H.S. (1982), "Job Queues and the Union
Status of Workers", Industrial and Labor Relations
Review, Vol. 35, pp 354 - 368.

Amemiya, T. (1973), "Regression Analysis When the Dependent
Variable Is Truncated Normal", Econometrica, Vol. 41, pp
997 - 1016.

Amemiya, T. (1985), Advanced Econometrics, Harvard University Press.

Blundell, R., Ham, J., and Meghir, C. (1987), "Unemployment
and Female Labor Supply", The Economic Journal, Vol. 97
(Conference Papers), pp 44 - 64.

Farber, H.S. (1983), "Worker Preference for Union
Representation", Research in Labor Economics, Supplement
2, pp 171 - 205.

Fishe, R., Trost, R.P., and Lurie, P.M. (1981), "Labor Force
Earnings and College Choice of Young Women: An
Examination of Selectivity Bias and Comparative
Advantage", Economics of Education Review, Vol. 1, pp 169
- 191.

Flinn, C.J., and Heckman, J.J. (1983), "Are Unemployment and
Out of the Labor Force Behaviorally Distinct Labor Force
States?", Journal of Labor Economics, Vol. 1, pp 169 -
191.

Ham, J.C. (1982), "Estimation of a Labour Supply Model with
Censoring Due to Unemployment and Underemployment",
Review of Economic Studies, Vol. 49, pp 335 - 354.

Heckman, J. (1974), "Shadow Prices, Market Wages, and Labor
Supply", Econometrica, Vol. 42, pp 670 - 694.

Heckman, J. (1979), "Sample Selection Bias As a Specification
Error", Econometrica, Vol. 47, pp 153 - 161.

Johnson, N.L., and Kotz, S. (1972), Distributions in
Statistics: Continuous Multivariate Distributions, Vol.
4, John Wiley and Sons.

Lin, T. (1982), Some Applications of the Lagrangean
Multiplier Test in Econometrics, Dissertation for the
Ph.D. degree, Michigan State University.

Maddala, G.S. (1987), Limited-dependent And Qualitative
Variables in Econometrics, Econometric Society Monograph,
No. 3, Cambridge University Press.

Melino, A. (1982), "Testing for Sample Selection Bias", Review
of Economic Studies, Vol. XLIX, pp 151 - 153.

Meng, C., and Schmidt, P. (1985), "On the Cost of Partial
Observability in the Bivariate Probit Model",
International Economic Review, Vol. 26, pp 71 - 85.

Poirier, D.J. (1980), "Partial Observability in Bivariate
Probit Models", Journal of Econometrics, Vol. 12, pp 209
- 217.

Chapter 3

Efficient Estimation of Models for Dynamic Panel Data

I. Introduction

This paper considers a dynamic model with panel data
which include a large number of cross-section observations,
but only over a short period of time. A typical problem in
using panel data is that the error terms in the model contain
unobservable and time-invariant individual effects. To allow
for these effects, "random effects" models are widely used in
the literature on dynamic panel data. In these models, the
individual effects are treated as being generated from an
independently and identically distributed (iid) stochastic
process. This paper develops a generalized-method-of-moments
(GMM) estimator for the dynamic model with random effects
which is efficient under general circumstances.

In the case of the static model, the simple fixed effects
(within) treatment generates a consistent estimator. There
are also a number of studies which develop efficient
estimation methods for the static model with random effects.
When no explanatory variables are correlated with the
individual effects, the generalized least squares (GLS)
estimator is consistent and efficient in finite samples; see
Hsiao [1986]. When some explanatory variables are correlated
with the individual effects, we can efficiently estimate the
model using some available instrumental variables; see Hausman
and Taylor [1981], Amemiya and MaCurdy [1986], and Breusch,
Mizon, and Schmidt [1989].

Several problems arise in the dynamic model that do not
arise in the static model. First, the conventional within
estimator is inconsistent unless there are a large number of
time-series observations; see Hsiao [1986]. Second, even
though the maximum-likelihood (ML) method is available, the
form of the ML estimator depends crucially on assumptions
about the initial observations and the distribution of the
individual effect; see Anderson and Hsiao [1981], or Hsiao
[1982]. To avoid these problems, Anderson and Hsiao [1981],
Holtz-Eakin [1988], and Arellano and Bond [1988] investigate
instrumental variables estimation techniques. (From now on I
call their methods the conventional instrumental-variable (IV)
methods.) To get a consistent estimator, they
first-difference the original equation to eliminate the
individual effects; and then they use lagged dependent
variables as instruments. These instruments are legitimate in
the usual sense that they are uncorrelated with the
differenced error terms.

In the framework of Hansen's [1982] GMM, the conventional
IV estimators can be regarded as GMM estimators which use some
available linear orthogonality conditions. The GMM estimators
are efficient in general circumstances, if all that is known is that
the data-generating process satisfies certain moment
restrictions; see Chamberlain [1987]. The GMM method may be
preferred to the ML methods in the dynamic model using panel
data, because the GMM estimators do not rely on assumptions
about the initial observations and the distribution of the
individual effects. However, the conventional IV estimators
are not efficient in the sense that they fail to use all the
available moment conditions. This is due to their lack of a
systematic treatment in counting the number of available
restrictions. Furthermore, those IV estimators could be
inconsistent if we relax some behavioral assumptions about the
error terms. For example, lagged dependent variables are
generally not legitimate instruments if the error terms are
autocorrelated.

The main goal of this paper is to offer a systematic
analysis which properly counts all the available moment
conditions under given assumptions. Under alternative sets of
assumptions, I demonstrate how many moment conditions there
may be, and I show how to write them in a convenient form.
Under the usual assumptions, we can find some linear and
nonlinear orthogonality conditions, which the conventional IV
approaches do not exploit. I then derive the GMM estimator,
and a linearized GMM estimator that is equally asymptotically
efficient. The GMM estimator based on all the available
moment conditions must be more efficient than other GMM
estimators based on only a subset of moment conditions. In
this sense, the GMM estimator presented in this paper could be
said to be efficient when the distributions of the initial
observations and of the individual effects are not known.

The plan of this paper is as follows. Sections II, III,
and IV consider a simple dynamic model which includes only a
one-period lagged dependent variable as the explanatory
variable. Section II briefly summarizes the conventional IV
approaches and demonstrates why they miss some available
moment restrictions. Section III shows the proper way to
derive all the legitimate moment conditions under given
assumptions. Section IV investigates the estimation procedure
and the asymptotic performance of our GMM estimator under the
usual assumptions. Section V extends our approach to the
dynamic model which includes exogenous variables. Section VI
offers some conclusions.

II. Conventional IV Methods

To explain the conventional IV methods as simply as
possible, I consider the following simple dynamic model:

(II.1-1)  $y_{it} = \delta y_{i,t-1} + \alpha_i + \epsilon_{it} = \delta y_{i,t-1} + u_{it}$
          (for $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$)

where $u_{it} = \alpha_i + \epsilon_{it}$. The subscript i denotes the ith
cross-section unit, and t designates time periods. Here y is
the dependent variable, $\alpha$ is the individual effect, and $\epsilon$ is
the error term. Let $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$; $y_{i,-1} = (y_{i0}, y_{i1}, \ldots, y_{i,T-1})'$;
$u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$. Then, we can
rewrite (II.1-1) as

(II.1-2)  $y_i = \delta y_{i,-1} + u_i$,  $i = 1, 2, \ldots, N$

Let $y = (y_1', y_2', \ldots, y_N')'$; $y_{-1} = (y_{1,-1}', y_{2,-1}', \ldots, y_{N,-1}')'$;
$u = (u_1', u_2', \ldots, u_N')'$. Then (II.1-2) becomes

(II.1-3)  $y = \delta y_{-1} + u$.

I assume that the $\alpha$'s and $\epsilon$'s have zero means. More
assumptions about them will be made shortly.

For mathematical convenience and future use, I also
define some notation:

$P_A = A(A'A)^{-1}A'$
$M_A = I_m - P_A$
$P = (1/T)\, e_T e_T'$
$Q = I_T - P$
$P_v = I_N \otimes P$
$Q_v = I_N \otimes Q = I_{NT} - P_v$

where $e_T$ is a T-dimensional vector of ones, and A is any $m \times k$
matrix.

In the conventional literature on the dynamic panel data
model, there are four common assumptions about $y_0$, $\alpha$, and the $\epsilon$'s
(for simplicity, the subscript "i" is suppressed):

(SA.1) $\epsilon$'s are independent of $y_0$; i.e., $E(y_0\epsilon_t) = 0$ for any t.

(SA.2) $\epsilon$'s are independent of $\alpha$; i.e., $E(\alpha\epsilon_t) = 0$ for any t.

(SA.3) $\epsilon$'s are homoskedastic; i.e., $E(\epsilon_t^2) = \sigma_\epsilon^2$ for any t.

(SA.4) $\epsilon$'s are mutually independent; i.e., $E(\epsilon_s\epsilon_t) = 0$ for any
$t \neq s$.

The conventional IV approaches first-difference (II.1):

(II.2)  $y_{it} - y_{i,t-1} = \delta(y_{i,t-1} - y_{i,t-2}) + (u_{it} - u_{i,t-1})$

Since $u_{it} - u_{i,t-1}$ ($= \epsilon_{it} - \epsilon_{i,t-1}$) does not include $\alpha_i$, some lagged
$y_{it}$'s can be used as instruments for the estimation of $\delta$. For
example, $y_{i,t-2} - y_{i,t-3}$ is a legitimate instrumental variable in
the sense that it is uncorrelated with $u_{it} - u_{i,t-1}$, but
correlated with $y_{i,t-1} - y_{i,t-2}$; see Anderson and Hsiao [1981].
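
To make the first-difference IV idea concrete, the sketch below simulates model (II.1-1) and compares a simple instrumental-variable estimate of $\delta$ from the differenced equation with a naive least-squares fit in levels. It is only an illustration under assumed conditions: the data-generating process (standard normal $\alpha_i$ and $\epsilon_{it}$, and $y_{i0} = \alpha_i/(1-\delta)$ plus noise), the function names, and the sample size are all hypothetical choices, not taken from the text.

```python
import random


def simulate_panel(n, delta, seed=0):
    """Draw n units of (y0, y1, y2, y3) from y_t = delta*y_{t-1} + alpha + eps_t.

    Assumed design (not from the text): alpha, eps_t and the start-up noise
    are independent N(0, 1), and y0 = alpha/(1 - delta) + noise.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        y = [alpha / (1.0 - delta) + rng.gauss(0.0, 1.0)]
        for _t in range(3):
            y.append(delta * y[-1] + alpha + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


def first_difference_iv(panel):
    """IV estimate of delta from the differenced equation for t = 3,
    using y1 - y0 as the instrument for y2 - y1."""
    num = sum((y[1] - y[0]) * (y[3] - y[2]) for y in panel)
    den = sum((y[1] - y[0]) * (y[2] - y[1]) for y in panel)
    return num / den


def pooled_ols_levels(panel):
    """Naive OLS of y_t on y_{t-1} in levels, which ignores alpha."""
    num = sum(y[t] * y[t - 1] for y in panel for t in (1, 2, 3))
    den = sum(y[t - 1] ** 2 for y in panel for t in (1, 2, 3))
    return num / den


panel = simulate_panel(100_000, delta=0.5, seed=1)
delta_iv = first_difference_iv(panel)
delta_ols = pooled_ols_levels(panel)
print(round(delta_iv, 3), round(delta_ols, 3))
```

In this assumed design the IV estimate should land near the true value of 0.5, while the levels regression is pushed upward because $y_{i,t-1}$ is correlated with $\alpha_i$.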

Arellano and Bond [1988] and Holtz-Eakin [1988] find and
use all of the moment conditions based on lagged y's being
uncorrelated with $u_{it} - u_{i,t-1}$. Consider the (T-1) first-
differenced equations separately:

(II.3)  $y_{i2} - y_{i1} = \delta(y_{i1} - y_{i0}) + (u_{i2} - u_{i1})$
        $y_{i3} - y_{i2} = \delta(y_{i2} - y_{i1}) + (u_{i3} - u_{i2})$
        $\vdots$
        $y_{iT} - y_{i,T-1} = \delta(y_{i,T-1} - y_{i,T-2}) + (u_{iT} - u_{i,T-1})$

The system (II.3) could be regarded as a simultaneous system
of equations with the cross-equation restriction that the
coefficients are the same everywhere. Arellano and Bond's
method is akin to three stage least squares (3SLS) with
different instruments for the different equations:
$y_{i0}, y_{i1}, \ldots, y_{i,j-1}$ for the jth equation of (II.3). This
approach is based on the following $(1/2)T(T-1)$ orthogonality
conditions:

(II.4)  $E(y_{it}(u_{i,s+1} - u_{is})) = 0$;  $t = 0, 1, 2, \ldots, T-2$;  $t < s \le T-1$

which hold under the assumptions given in (SA).

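Even though Anderson and Hsiao [1981], Holtz-Eakin
[1988], and Arellano and Bond [1988] adopt somewhat different
instrumental variable treatments, fundamentally all of their
methods are based on the conditions given in (II.4). To
clarify this point, let us try a slightly different approach.
Define

\[
A_i = \left[\begin{array}{cccc|ccc|c|c}
-y_{i0} & 0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & 0 \\
y_{i0} & -y_{i0} & \cdots & 0 & -y_{i1} & \cdots & 0 & \cdots & 0 \\
0 & y_{i0} & \cdots & 0 & y_{i1} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & \cdots & -y_{i0} & 0 & \cdots & -y_{i1} & \cdots & -y_{i,T-2} \\
0 & 0 & \cdots & y_{i0} & 0 & \cdots & y_{i1} & \cdots & y_{i,T-2}
\end{array}\right]
\]

where $A_i$ is a $T \times (1/2)T(T-1)$ matrix. Also, define
$A = (A_1', A_2', \ldots, A_N')'$.
Then, (II.4) can be compactly expressed by
(II.5)  $E(A_i'u_i) = 0$, or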

E(A'u) = 0
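
The structure of the instrument matrix can be made explicit with a short sketch. The code below builds $A_i$ for one unit from its lagged values and confirms that $A_i'u_i$ stacks the products $y_{it}(u_{i,s+1} - u_{is})$ of (II.4). The helper names `build_A` and `moments`, the column ordering (by t, then by s), and the numbers are illustrative assumptions.

```python
def build_A(y_lags):
    """Return (A_i, pairs) given y_lags = [y_i0, ..., y_{i,T-1}].

    Columns are ordered by t = 0..T-2 and, within t, by s = t+1..T-1.
    Column (t, s) holds -y_it in the row of u_is and +y_it in the row of
    u_{i,s+1}, so that A_i' u_i stacks y_it * (u_{i,s+1} - u_is).
    """
    T = len(y_lags)
    pairs = [(t, s) for t in range(T - 1) for s in range(t + 1, T)]
    A = [[0.0] * len(pairs) for _ in range(T)]
    for col, (t, s) in enumerate(pairs):
        A[s - 1][col] = -y_lags[t]   # row index s-1 corresponds to u_is
        A[s][col] = y_lags[t]        # row index s corresponds to u_{i,s+1}
    return A, pairs


def moments(A, u):
    """Compute A_i' u_i for u = [u_i1, ..., u_iT]."""
    return [sum(A[r][c] * u[r] for r in range(len(u))) for c in range(len(A[0]))]


y_lags = [1.0, 2.0, 4.0, 8.0]   # y_i0..y_i3, so T = 4 (made-up numbers)
u = [0.5, -1.0, 2.0, 0.25]      # u_i1..u_i4 (made-up numbers)

A, pairs = build_A(y_lags)
stacked = moments(A, u)
direct = [y_lags[t] * (u[s] - u[s - 1]) for (t, s) in pairs]
print(len(A), len(A[0]))
```

For T = 4 the matrix has 4 rows and $(1/2)T(T-1) = 6$ columns, and `stacked` coincides with `direct`, which is the levels-form statement of the first-difference orthogonality conditions.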
One merit of (II.5) is that we do not have to first-difference
the equation given in (II.1). That is, we can directly apply
the instrumental-variable treatment to the equation in levels.
All the conventional IV approaches can be interpreted as using
some if not all orthogonality conditions which are just linear
combinations of those in (II.5). Note that the instruments in
A are legitimate because $\mathrm{plim}(1/N)A'u = 0$ and $\mathrm{plim}(1/N)A'y_{-1} \neq 0$.
It can be easily shown that Anderson and Hsiao use as an
instrumental variable a linear combination of only some
columns of A, while Arellano and Bond use A itself. The IV
estimator which uses A for (II.1-3) takes the following form:
(II.6)  $\hat\delta_A = (y_{-1}'P_A y_{-1})^{-1} y_{-1}'P_A y$
It can also be shown that this estimator is asymptotically
identical to the GMM estimator based on (II.5) with
assumptions in (SA). (See APPENDIX A.) The asymptotic
distribution of $\hat\delta_A$ is given by:
(II.7)  $\sqrt{N}(\hat\delta_A - \delta) \to N(0,\ \sigma_\epsilon^2\, \mathrm{plim}[(1/N)\, y_{-1}'P_A y_{-1}]^{-1})$
If all the available orthogonality conditions are those in
(II.5), $\hat\delta_A$ must be efficient among the class of estimators
which can be derived using this set of information.

However, the assumptions in (SA) imply more moment
restrictions than those in (II.5). For an example, consider
the following second-differenced equation:

(II.8)  $y_{i3} - y_{i1} = \delta(y_{i2} - y_{i0}) + (u_{i3} - u_{i1})$

Clearly, under the assumptions in (SA),

(II.9-1)  $E[u_{i2}(u_{i3} - u_{i1})] = 0$, and

(II.9-2)  $E[u_{i2}(y_{i2} - y_{i1})] \neq 0$.

Therefore, $u_{i2}$ can be regarded as a nonlinear instrument for
equation (II.8) in the sense of Amemiya [1974], and this
restriction (II.9-1) is not implied by (II.5). This example
implies that (II.5) does not incorporate all the available
moment conditions enforceable under the assumptions in (SA),
and therefore that $\hat\delta_A$ is not efficient. In the next section
I will categorize all the restrictions implied by (SA), and
also show how we can relax some assumptions.
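
The extra restriction can be illustrated by simulation. The sketch below, under an assumed data-generating process that satisfies (SA), estimates the sample analogue of $E[u_{i2}(u_{i3} - u_{i1})]$ and, as a relevance check, the sample covariance of $u_{i2}$ with the regressor $y_{i2} - y_{i0}$ of the second-differenced equation. The distributions, the start-up rule for $y_{i0}$, and the function name are illustrative assumptions.

```python
import random


def simulate_moments(n, delta=0.5, seed=0):
    """Monte Carlo averages of u_i2*(u_i3 - u_i1) and u_i2*(y_i2 - y_i0).

    Assumed design (illustrative only): alpha, eps_t and the start-up noise
    are independent N(0, 1) draws and y_i0 = alpha/(1 - delta) + noise,
    so the eps_t are uncorrelated with y_i0 and alpha, homoskedastic and
    mutually uncorrelated.
    """
    rng = random.Random(seed)
    validity = 0.0
    relevance = 0.0
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        y0 = alpha / (1.0 - delta) + rng.gauss(0.0, 1.0)
        u1, u2, u3 = (alpha + rng.gauss(0.0, 1.0) for _ in range(3))
        y1 = delta * y0 + u1
        y2 = delta * y1 + u2
        validity += u2 * (u3 - u1)    # sample analogue of the orthogonality condition
        relevance += u2 * (y2 - y0)   # covariance with the regressor of the second difference
    return validity / n, relevance / n


validity, relevance = simulate_moments(200_000, seed=42)
print(round(validity, 3), round(relevance, 3))
```

In this design the first average should be close to zero and the second clearly away from zero, which is what makes $u_{i2}$ usable as an instrument even though it depends on $\delta$.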

III. Derivation of Moment Conditions

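In this section, I demonstrate an appropriate way to
derive all the available moment restrictions which can be
exploited from the usual set of assumptions given in (SA), and
I also apply this method to several cases in which some
assumptions are relaxed. To do so, first define the
covariance matrix of $y_0$, $\alpha$, and the $\epsilon$'s:¹

(III.1)
\[
\Sigma = \mathrm{Cov}\begin{bmatrix} y_0 \\ \alpha \\ \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_T \end{bmatrix}
= \begin{bmatrix}
\sigma_{00} & \sigma_{0\alpha} & \sigma_{01} & \sigma_{02} & \cdots & \sigma_{0T} \\
 & \sigma_\alpha^2 & \sigma_{\alpha 1} & \sigma_{\alpha 2} & \cdots & \sigma_{\alpha T} \\
 & & \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1T} \\
 & & & \sigma_{22} & \cdots & \sigma_{2T} \\
 & & & & \ddots & \vdots \\
 & & & & & \sigma_{TT}
\end{bmatrix}
\]

where i is suppressed. Basically, any assumption which may be
imposed on the dynamic model can be expressed as a
restriction on $\Sigma$. Under the usual assumptions in (SA), $\Sigma$
takes the following form:

(III.2)
\[
\Sigma = \begin{bmatrix}
\sigma_{00} & \sigma_{0\alpha} & 0 & 0 & \cdots & 0 \\
 & \sigma_\alpha^2 & 0 & 0 & \cdots & 0 \\
 & & \sigma_\epsilon^2 & 0 & \cdots & 0 \\
 & & & \sigma_\epsilon^2 & \cdots & 0 \\
 & & & & \ddots & \vdots \\
 & & & & & \sigma_\epsilon^2
\end{bmatrix}
\]

The vector of which $\Sigma$ is the covariance matrix is not
observable. The vector of observables, meaning things that
can be written in terms of data and parameters, is $(y_0, u_1, u_2, \ldots, u_T)'$,
which has the following covariance matrix:

(III.3)
\[
\Omega = \mathrm{Cov}\begin{bmatrix} y_0 \\ u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}
= \begin{bmatrix}
\Omega_{00} & \Omega_{01} & \Omega_{02} & \cdots & \Omega_{0T} \\
 & \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1T} \\
 & & \Omega_{22} & \cdots & \Omega_{2T} \\
 & & & \ddots & \vdots \\
 & & & & \Omega_{TT}
\end{bmatrix}
= \begin{bmatrix}
\sigma_{00} & \sigma_{0\alpha} + \sigma_{01} & \sigma_{0\alpha} + \sigma_{02} & \cdots & \sigma_{0\alpha} + \sigma_{0T} \\
 & \sigma_\alpha^2 + \sigma_{11} + 2\sigma_{\alpha 1} & \sigma_\alpha^2 + \sigma_{\alpha 1} + \sigma_{\alpha 2} + \sigma_{12} & \cdots & \sigma_\alpha^2 + \sigma_{\alpha 1} + \sigma_{\alpha T} + \sigma_{1T} \\
 & & \sigma_\alpha^2 + \sigma_{22} + 2\sigma_{\alpha 2} & \cdots & \sigma_\alpha^2 + \sigma_{\alpha 2} + \sigma_{\alpha T} + \sigma_{2T} \\
 & & & \ddots & \vdots \\
 & & & & \sigma_\alpha^2 + \sigma_{TT} + 2\sigma_{\alpha T}
\end{bmatrix}
\]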

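By comparing $\Omega$ and $\Sigma$, we can easily see that the form of
$\Omega$ depends on that of $\Sigma$, because each element of $\Omega$ is a linear
combination of the elements of $\Sigma$. Under the usual assumptions
(SA),

(III.4)
\[
\Omega = \begin{bmatrix}
\Omega_{00} & \Omega_{01} & \Omega_{02} & \cdots & \Omega_{0T} \\
 & \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1T} \\
 & & \Omega_{22} & \cdots & \Omega_{2T} \\
 & & & \ddots & \vdots \\
 & & & & \Omega_{TT}
\end{bmatrix}
= \begin{bmatrix}
\sigma_{00} & \sigma_{0\alpha} & \sigma_{0\alpha} & \cdots & \sigma_{0\alpha} \\
 & \sigma_\alpha^2 + \sigma_\epsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\
 & & \sigma_\alpha^2 + \sigma_\epsilon^2 & \cdots & \sigma_\alpha^2 \\
 & & & \ddots & \vdots \\
 & & & & \sigma_\alpha^2 + \sigma_\epsilon^2
\end{bmatrix}
\]

An investigation of the elements in (III.4) provides us with
three types of moment restrictions:

(III.5-1) Type I restrictions:
          $\Omega_{01} = \Omega_{02} = \Omega_{03} = \cdots = \Omega_{0T}$

(III.5-2) Type II restrictions:
          $\Omega_{11} = \Omega_{22} = \Omega_{33} = \cdots = \Omega_{TT}$

(III.5-3) Type III restrictions:
          $\Omega_{12} = \Omega_{13} = \cdots = \Omega_{1T} = \Omega_{23} = \Omega_{24} = \cdots = \Omega_{T-1,T}$

(III.5-1) implies (T-1) restrictions; (III.5-2), (T-1);
(III.5-3), $(1/2)T(T-1) - 1$. Therefore, there are in total
$(1/2)T(T-1) + (2T-3)$ available moment conditions, which are more
than those used in the conventional IV approaches.
Specifically, we have (2T-3) extra moment conditions.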
Furthermore, the conditions described in (III.5) can be
obtained even under somewhat relaxed assumptions. First of
all, consider Type I conditions. By observing (III.3), we can
see that Type I restrictions hold as long as $y_0$ and $\epsilon_t$ have
the same covariance for any t. The stochastic independence
between $y_0$ and the $\epsilon$'s is not required. Now, consider Type
II conditions. These require that $\sigma_{ss} + \sigma_\alpha^2 + 2\sigma_{\alpha s} = \sigma_{tt} + \sigma_\alpha^2 + 2\sigma_{\alpha t}$
for any s and t. For this, all we need is the
homoskedasticity of the $\epsilon$'s and the equicovariance of $\alpha$ with
the $\epsilon$'s. Finally, consider Type III restrictions. These
require $\sigma_{st} + \sigma_{\alpha t} + \sigma_{\alpha s}$ to be the same for any t and s with $s \neq t$.
Obviously, the equicovariance of the $\epsilon$'s and the
equicovariance of $\alpha$ with the $\epsilon$'s can justify those
restrictions. In short, all the moment conditions in (III.5)
are present under the following modified assumptions (MA):
(MA-1) The covariance of $y_0$ and $\epsilon_t$ is the same for any t.
(MA-2) The covariance of $\alpha$ and $\epsilon_t$ is the same for any t.
(MA-3) $\epsilon$'s are homoskedastic.
(MA-4) The covariance of $\epsilon_s$ and $\epsilon_t$ is the same for any s and
t with $s \neq t$.
A comparison of (SA) with (MA) shows that the assumptions in
(SA) except the homoskedasticity of the $\epsilon$'s are stronger than
necessary.
To derive explicit forms of the moment conditions which
are convenient for GMM estimation, let us rearrange the
conditions in (III.5) as follows:

(III.6-1)  $\Omega_{01} = \Omega_{02} = \Omega_{03} = \cdots = \Omega_{0,T-1} = \Omega_{0T}$
           $\Omega_{12} = \Omega_{13} = \cdots = \Omega_{1,T-1} = \Omega_{1T}$
           $\Omega_{23} = \cdots = \Omega_{2,T-1} = \Omega_{2T}$
           $\vdots$
           $\Omega_{T-2,T-1} = \Omega_{T-2,T}$
(III.6-2)  $\Omega_{11} - \Omega_{12} = \Omega_{22} - \Omega_{23} = \cdots = \Omega_{T-1,T-1} - \Omega_{T-1,T}$
(III.6-3)  $\Omega_{11} = \Omega_{22} = \Omega_{33} = \cdots = \Omega_{T-1,T-1} = \Omega_{TT}$

All these conditions in (III.6) can be expressed by:

(III.7-1)  $E[y_{it}(u_{i,s+1} - u_{is})] = 0$,  $0 \le t \le T-2$, $s > t$
(III.7-2)  $E[y_{it}(u_{i,t+1} - u_{it}) - y_{i,t+1}(u_{i,t+2} - u_{i,t+1})] = 0$,  $1 \le t \le T-2$
(III.7-3)  $E[\bar u_i(u_{i,t+1} - u_{it})] = 0$,  $1 \le t \le T-1$

where $\bar u_i = (1/T)e_T'u_i$. For the derivation of (III.7), see
APPENDIX B. (III.7-1) implies $(1/2)T(T-1)$ restrictions which
are exactly identical to those used in the conventional IV
approaches. (III.7-2) shows (T-2) missing linear moment
conditions which are not used in the former studies.
(III.7-3) shows (T-1) restrictions that are nonlinear in the
sense that $\delta$ appears in both $\bar u_i$ ($= (1/T)e_T'(y_i - \delta y_{i,-1})$) and
$u_{i,t+1} - u_{it}$ ($= (y_{i,t+1} - y_{it}) - \delta(y_{it} - y_{i,t-1})$). Again, there are in
total $(1/2)T(T-1) + (2T-3)$ restrictions. Table 10 summarizes
the total number of moment conditions implied by (SA).

Table 10

Types of Conditions   (III.7-1)      (III.7-2)   (III.7-3)   Total

T = 2                 1              0           1           2
T = 3                 3              1           2           6
T = 4                 6              2           3           11
T = 5                 10             3           4           17
T                     (1/2)T(T-1)    T-2         T-1         (1/2)T(T-1)+(2T-3)

Referring to Table 10, we see that the total number of all the
available moment conditions is almost twice as big as that of
the usual conditions, when T is small. This means that the
GMM estimator based on (III.7) could have a substantial
efficiency gain over the conventional IV estimators if T is
relatively small.
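
The counts in Table 10 follow directly from the three formulas, and a few lines of code reproduce them and show how the ratio of all conditions to the conventional ones shrinks as T grows. This is a small illustrative sketch; the function name is an assumption.

```python
def moment_counts(T):
    """Numbers of moment conditions of types (III.7-1), (III.7-2), (III.7-3) under (SA)."""
    conventional = T * (T - 1) // 2   # (III.7-1): used by the conventional IV methods
    extra_linear = T - 2              # (III.7-2): additional linear conditions
    nonlinear = T - 1                 # (III.7-3): nonlinear conditions
    total = conventional + extra_linear + nonlinear
    return conventional, extra_linear, nonlinear, total


for T in (2, 3, 4, 5, 10, 20):
    conv, lin, nonlin, total = moment_counts(T)
    print(T, conv, lin, nonlin, total, round(total / conv, 2))
```

The ratio in the last column is 2 for T = 2 and T = 3 and falls toward 1 for larger T, which is the sense in which the potential efficiency gain is concentrated in short panels.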

Until now, we have investigated the moment restrictions
under the usual assumptions. Here, we consider three cases in
which some assumptions are relaxed.

CASE I. Keep all the assumptions except (SA.3)

In this case, the heteroskedasticity of the $\epsilon$'s is
allowed. One might think that the moment restrictions given
in (III.7-1) would be all we can have because (III.7-2) and
(III.7-3) are based on the homoskedasticity of the $\epsilon$'s.
However, this is not the case, because we still have some
nonlinear restrictions. To see why, observe the form of $\Omega$ in
CASE I:

(III.8)
\[
\Omega = \begin{bmatrix}
\Omega_{00} & \Omega_{01} & \Omega_{02} & \cdots & \Omega_{0T} \\
 & \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1T} \\
 & & \Omega_{22} & \cdots & \Omega_{2T} \\
 & & & \ddots & \vdots \\
 & & & & \Omega_{TT}
\end{bmatrix}
= \begin{bmatrix}
\sigma_{00} & \sigma_{0\alpha} & \sigma_{0\alpha} & \cdots & \sigma_{0\alpha} \\
 & \sigma_\alpha^2 + \sigma_{11} & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\
 & & \sigma_\alpha^2 + \sigma_{22} & \cdots & \sigma_\alpha^2 \\
 & & & \ddots & \vdots \\
 & & & & \sigma_\alpha^2 + \sigma_{TT}
\end{bmatrix}
\]

This shows that Type I and III restrictions are still
available. The total number of conditions must be
$(1/2)T(T-1) + (T-2)$. That is, we have an extra (T-2) conditions
in comparison with those adopted in the conventional IV
approaches. To show the explicit form of the restrictions,
let us rearrange them in a slightly different manner:

(III.9-1)  $\Omega_{01} = \Omega_{02} = \Omega_{03} = \cdots = \Omega_{0,T-1} = \Omega_{0T}$
           $\Omega_{12} = \Omega_{13} = \cdots = \Omega_{1,T-1} = \Omega_{1T}$
           $\Omega_{23} = \cdots = \Omega_{2,T-1} = \Omega_{2T}$
           $\vdots$
           $\Omega_{T-2,T-1} = \Omega_{T-2,T}$
(III.9-2)  $\Omega_{12} = \Omega_{23} = \Omega_{34} = \cdots = \Omega_{T-1,T}$

(III.9-1) is equivalent to (III.7-1), and therefore it
generates the conventional moment restrictions given in
(II.5). (III.9-2) can be expressed by:

(III.10)  $E[u_{i2}(u_{i3} - u_{i1})] = E[u_{i3}(u_{i4} - u_{i2})] = \cdots = E[u_{i,T-1}(u_{iT} - u_{i,T-2})] = 0$

which are (T-2) nonlinear restrictions. If the $\epsilon$'s are
heteroskedastic, we do not have any extra linear restrictions.
However, the existence of the nonlinear conditions given in
(III.10) again prevents the conventional IV estimators from
being efficient.

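CASE II. Keep (SA.2) and (SA.3)

Here, we allow the $\epsilon$'s to be correlated with each other.
For example, the $\epsilon$'s could follow an AR and/or MA process. In
this situation, it would be unreasonable to assume that $y_{i0}$ is
independent of the $\epsilon$'s. The conventional IV estimators cannot
be consistent, because they use invalid instrumental
variables, in the sense that $E[y_{it}(u_{i,s+1} - u_{is})] \neq 0$ for $s > t$.
However, we can find some available nonlinear moment
conditions. To see this, observe the form of $\Omega$ under CASE II:

(III.10)
\[
\Omega = \begin{bmatrix}
\Omega_{00} & \Omega_{01} & \Omega_{02} & \cdots & \Omega_{0T} \\
 & \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1T} \\
 & & \Omega_{22} & \cdots & \Omega_{2T} \\
 & & & \ddots & \vdots \\
 & & & & \Omega_{TT}
\end{bmatrix}
= \begin{bmatrix}
\sigma_{00} & \sigma_{0\alpha} + \sigma_{01} & \sigma_{0\alpha} + \sigma_{02} & \cdots & \sigma_{0\alpha} + \sigma_{0T} \\
 & \sigma_\alpha^2 + \sigma_\epsilon^2 & \sigma_\alpha^2 + \sigma_{12} & \cdots & \sigma_\alpha^2 + \sigma_{1T} \\
 & & \sigma_\alpha^2 + \sigma_\epsilon^2 & \cdots & \sigma_\alpha^2 + \sigma_{2T} \\
 & & & \ddots & \vdots \\
 & & & & \sigma_\alpha^2 + \sigma_\epsilon^2
\end{bmatrix}
\]

As we see in (III.10), Type II restrictions are still
available. These (T-1) conditions can be written as follows:

$E[(u_{i1} + u_{i2})(u_{i1} - u_{i2})] = \cdots = E[(u_{i,T-1} + u_{iT})(u_{i,T-1} - u_{iT})] = 0$

These restrictions are all we can have under the intertemporal
correlations among the $\epsilon$'s. However, if we assume the
stationarity of the $\epsilon$'s, that is, if $E(\epsilon_{is}\epsilon_{i,s+j}) = E(\epsilon_{it}\epsilon_{i,t+j})$
for any t, s, and j, we can have more restrictions. Under the
stationarity of the $\epsilon$'s, $\sigma_{s,s+j} = \sigma_{t,t+j}$. Applying this fact
to $\Omega$ gives us the following extra $(1/2)(T-1)(T-2)$ conditions:

(III.12)  $\Omega_{12} = \Omega_{23} = \Omega_{34} = \cdots = \Omega_{T-2,T-1} = \Omega_{T-1,T}$
          $\Omega_{13} = \Omega_{24} = \cdots = \Omega_{T-3,T-1} = \Omega_{T-2,T}$
          $\vdots$
          $\Omega_{1,T-1} = \Omega_{2T}$

Therefore, under the stationarity condition about the $\epsilon$'s, we
can have in total $(1/2)T(T-1)$ restrictions available for GMM
estimation.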

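CASE III. Keep (SA.3) and (SA.4)

Since $y_{i0}$ could be correlated with the $\epsilon$'s, any lagged value
of $y_{it}$ cannot be used as a legitimate instrument for the
differenced equations, because $E[y_{it}(u_{i,s+1} - u_{is})] \neq 0$ for $s > t$ if
neither $y_{i0}$ nor $\alpha_i$ have equicovariances with $\epsilon_{it}$ for any t.
However, we can have many nonlinear restrictions. For CASE
III, $\Omega$ takes the following form:

(III.13)
\[
\Omega = \mathrm{Cov}\begin{bmatrix} y_0 \\ u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}
= \begin{bmatrix}
\Omega_{00} & \Omega_{01} & \Omega_{02} & \cdots & \Omega_{0T} \\
 & \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1T} \\
 & & \Omega_{22} & \cdots & \Omega_{2T} \\
 & & & \ddots & \vdots \\
 & & & & \Omega_{TT}
\end{bmatrix}
= \begin{bmatrix}
\sigma_{00} & \sigma_{0\alpha} + \sigma_{01} & \sigma_{0\alpha} + \sigma_{02} & \cdots & \sigma_{0\alpha} + \sigma_{0T} \\
 & \sigma_\alpha^2 + \sigma_\epsilon^2 + 2\sigma_{\alpha 1} & \sigma_\alpha^2 + \sigma_{\alpha 1} + \sigma_{\alpha 2} & \cdots & \sigma_\alpha^2 + \sigma_{\alpha 1} + \sigma_{\alpha T} \\
 & & \sigma_\alpha^2 + \sigma_\epsilon^2 + 2\sigma_{\alpha 2} & \cdots & \sigma_\alpha^2 + \sigma_{\alpha 2} + \sigma_{\alpha T} \\
 & & & \ddots & \vdots \\
 & & & & \sigma_\alpha^2 + \sigma_\epsilon^2 + 2\sigma_{\alpha T}
\end{bmatrix}
\]

Note that $\Omega_{ss} + \Omega_{tt} - 2\Omega_{st} = 2\sigma_\epsilon^2$. Explicitly, this means that
$E[(u_{is} - u_{it})^2]$ is the same for any t and s. Therefore, there
are $(1/2)T(T-1) - 1$ restrictions. Effectively, these represent
the fact that $\mathrm{Var}(u_{it} - \bar u_i) = [(T-1)/T]\sigma_\epsilon^2$ for any t, and
$\mathrm{Cov}(u_{it} - \bar u_i,\ u_{is} - \bar u_i) = -\sigma_\epsilon^2/T$ for $s \neq t$. To clarify this point,
consider the transformed error term:

(III.14)  $Qu_i = (u_{i1} - \bar u_i,\ u_{i2} - \bar u_i,\ \ldots,\ u_{iT} - \bar u_i)'$

$Qu_i$ has the covariance matrix

(III.15)
\[
\mathrm{Cov}(Qu_i) = \begin{bmatrix}
[(T-1)/T]\sigma_\epsilon^2 & -\sigma_\epsilon^2/T & -\sigma_\epsilon^2/T & \cdots & -\sigma_\epsilon^2/T \\
 & [(T-1)/T]\sigma_\epsilon^2 & -\sigma_\epsilon^2/T & \cdots & -\sigma_\epsilon^2/T \\
 & & [(T-1)/T]\sigma_\epsilon^2 & \cdots & -\sigma_\epsilon^2/T \\
 & & & \ddots & \vdots \\
 & & & & [(T-1)/T]\sigma_\epsilon^2
\end{bmatrix}
\]

Note that the rank of $\mathrm{Cov}(Qu_i)$ is (T-1). Therefore, for the
derivation of linearly independent moment conditions only, the
last row and column must be ignored. The restrictions implied
in (III.15) can be expressed by

(III.16)  $\mathrm{Var}(u_{i1} - \bar u_i) = \mathrm{Var}(u_{i2} - \bar u_i) = \cdots = \mathrm{Var}(u_{i,T-1} - \bar u_i)$
          $= -(T-1)\mathrm{Cov}(u_{i1} - \bar u_i,\ u_{i2} - \bar u_i) = \cdots = -(T-1)\mathrm{Cov}(u_{i1} - \bar u_i,\ u_{i,T-1} - \bar u_i)$
          $= \cdots = -(T-1)\mathrm{Cov}(u_{i,T-2} - \bar u_i,\ u_{i,T-1} - \bar u_i)$

The number of restrictions in (III.16) is again $(1/2)T(T-1) - 1$.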
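
The form of the covariance matrix of the demeaned errors can be verified numerically. The sketch below is an illustration under assumed values: it builds $\mathrm{Cov}(u_i)$ for homoskedastic, mutually uncorrelated $\epsilon$'s whose covariance with $\alpha$ differs across t, applies $Q = I_T - (1/T)e_Te_T'$ on both sides, and inspects the result. The function names and the numbers are hypothetical.

```python
def q_matrix(T):
    """Q = I_T - (1/T) e_T e_T', the deviation-from-individual-mean operator."""
    return [[(1.0 if r == c else 0.0) - 1.0 / T for c in range(T)] for r in range(T)]


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cov_u_case3(T, var_alpha, var_eps, cov_alpha_eps):
    """Cov(u_i) when the eps_t are homoskedastic and mutually uncorrelated but
    Cov(alpha, eps_t) = cov_alpha_eps[t] may differ across t."""
    return [[var_alpha + cov_alpha_eps[s] + cov_alpha_eps[t] + (var_eps if s == t else 0.0)
             for t in range(T)] for s in range(T)]


T, var_eps = 4, 2.0
cov_u = cov_u_case3(T, var_alpha=1.0, var_eps=var_eps,
                    cov_alpha_eps=[0.1, -0.2, 0.3, 0.0])   # assumed values
Q = q_matrix(T)
cov_Qu = matmul(matmul(Q, cov_u), Q)   # Q is symmetric, so Q' = Q
for row in cov_Qu:
    print([round(v, 6) for v in row])
```

Every diagonal entry equals $[(T-1)/T]\sigma_\epsilon^2$ and every off-diagonal entry equals $-\sigma_\epsilon^2/T$, whatever the covariances between $\alpha$ and the $\epsilon$'s. Each row sums to zero, which reflects the rank deficiency noted in the text.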

IV. Estimation

In Section III, I have introduced a systematic method to
derive all the imposable moment conditions from a given set of
assumptions. In this section, I examine the GMM estimation
procedure based on those restrictions. Even though using all
the moment conditions will no doubt improve the efficiency of
the GMM estimator, the existence of the nonlinear restrictions
makes a simple IV treatment impossible. One of the greatest
advantages of the conventional IV methods is that the
estimators could be obtained using standard IV (2SLS)
software. Therefore, one may be reluctant to use nonlinear
GMM methods. However, we can avoid this problem by the
linearized GMM procedure; see Newey [1985].

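Define the following expressions:

$B_{1i} = \begin{bmatrix} -y_{i1} & 0 & \cdots & 0 \\ y_{i1}+y_{i2} & -y_{i2} & \cdots & 0 \\ -y_{i2} & y_{i2}+y_{i3} & \cdots & 0 \\ 0 & -y_{i3} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & -y_{i,T-2} \\ 0 & 0 & \cdots & y_{i,T-2}+y_{i,T-1} \\ 0 & 0 & \cdots & -y_{i,T-1} \end{bmatrix}$

$B_{2i} = \begin{bmatrix} -\bar{u}_i & 0 & \cdots & 0 \\ \bar{u}_i & -\bar{u}_i & \cdots & 0 \\ 0 & \bar{u}_i & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & -\bar{u}_i \\ 0 & 0 & \cdots & \bar{u}_i \end{bmatrix}$

$D_{2i} = \begin{bmatrix} (u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T \\ (u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T \\ \vdots & \vdots & & \vdots \\ (u_{i2}-u_{i1})/T & (u_{i3}-u_{i2})/T & \cdots & (u_{iT}-u_{i,T-1})/T \end{bmatrix}$

$B_1 = \begin{bmatrix} B_{11} \\ B_{12} \\ \vdots \\ B_{1N} \end{bmatrix}$;  $B_2 = \begin{bmatrix} B_{21} \\ B_{22} \\ \vdots \\ B_{2N} \end{bmatrix}$;  $D_2 = \begin{bmatrix} D_{21} \\ D_{22} \\ \vdots \\ D_{2N} \end{bmatrix}$;

$H = (A,\ B_1,\ B_2)$;  $D = (0,\ 0,\ D_2)$;  $H'u = \sum_{i=1}^{N} H_i'u_i$,

where $B_{1i}$ is a $T\times(T-2)$ matrix; $B_{2i}$ and $D_{2i}$ are $T\times(T-1)$
matrices. Then, the moment conditions given in (III.7) can be
expressed by

(IV.1)  $E(H_i'u_i) = 0$, or $E(H'u) = 0$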
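As an illustration of these definitions, the sketch below builds $B_{1i}$, $B_{2i}$, and $D_{2i}$ for one cross-section unit and confirms that $B_{1i}'u_i$ and $B_{2i}'u_i$ reproduce the intended moment functions. The helper names (`diff_col`, `build_B1`, `build_B2`, `build_D2`) and the random data are hypothetical and serve only to make the layout concrete.

```python
import numpy as np

def diff_col(T, t):
    """T-vector c such that c'u = u_{t+1} - u_t (periods numbered 1..T)."""
    c = np.zeros(T)
    c[t - 1], c[t] = -1.0, 1.0
    return c

def build_B1(y, T):
    # y = (y_0, ..., y_T); column t gives y_t(u_{t+1}-u_t) - y_{t+1}(u_{t+2}-u_{t+1})
    return np.column_stack([y[t] * diff_col(T, t) - y[t + 1] * diff_col(T, t + 1)
                            for t in range(1, T - 1)])

def build_B2(u, T):
    # column t gives ubar * (u_{t+1} - u_t)
    return np.column_stack([u.mean() * diff_col(T, t) for t in range(1, T)])

def build_D2(u, T):
    # every row equals ((u_2-u_1)/T, ..., (u_T-u_{T-1})/T)
    return np.tile(np.diff(u) / T, (T, 1))

rng = np.random.default_rng(0)
T, delta = 5, 0.4                      # illustrative values
y = rng.normal(size=T + 1)             # y_0, ..., y_T for a single unit
u = y[1:] - delta * y[:-1]             # u_1, ..., u_T
B1, B2, D2 = build_B1(y, T), build_B2(u, T), build_D2(u, T)

expected_b1 = np.array([y[t] * (u[t] - u[t - 1]) - y[t + 1] * (u[t + 1] - u[t])
                        for t in range(1, T - 1)])
print(B1.shape, B2.shape, D2.shape)
```

Every column of $B_{1i}$ and $B_{2i}$ sums to zero, which is the property $P_vH = 0$ used later in the text, while $D_{2i}$ has identical rows.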
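A consistent estimator of $C = \mathrm{Cov}(H'u)$ is required to derive the
GMM estimator. For this, we can use $\hat\delta_A$ to evaluate $H_i$ and $u_i$.
(Let $\hat H_i$ and $\hat u_i$ be $H_i$ and $u_i$ evaluated at $\hat\delta_A$.) Then,
$\hat C = \sum_{i=1}^{N}\hat H_i'\hat u_i\hat u_i'\hat H_i$ is consistent for $C$. One may think of $\hat H'\hat H$ as a
consistent estimator of $C$ in the same sense that $A'A$ is
consistent for $\mathrm{Cov}(A'u)$. However, $\hat H'\hat H$ is inconsistent unless
$T=2$, because $\mathrm{Cov}(H'u)$ includes the fourth moments of $\varepsilon_{it}$.
When the $\varepsilon_{it}$'s are normally distributed, APPENDIX C shows

(IV.2)  $\mathrm{plim}(1/N)\mathrm{Cov}(H'u) = \sigma_\varepsilon^2\,\mathrm{plim}(1/N)H'H + \sigma_\varepsilon^2 J^*$

where $J^*$ is a matrix of the form

$J^* = \begin{bmatrix} 0 & 0 & 0 \\ 0 & J & 0 \\ 0 & 0 & 0 \end{bmatrix}$

and $J$ is a $(T-2)\times(T-2)$ matrix of the form

$J = \sigma_\varepsilon^2 \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \cdots & 0 \\ \vdots & & \ddots & \ddots & -1 \\ 0 & 0 & \cdots & -1 & 2 \end{bmatrix}$

Therefore, $\hat H'\hat H + N\hat J^*$, not $\hat H'\hat H$, could be used for $\hat C$.² In spite
of this fact, I believe that $\hat C$ is a better estimate for $C$,
because $\hat H'\hat H + N\hat J^*$ is inconsistent under our modified assumptions
in (MA).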

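The GMM estimator, $\hat\delta_{GMM}$, minimizes

(IV.2)  $(u'H)\hat C^{-1}(H'u) = [(y-\delta y_{-1})'H(\delta)]\,\hat C^{-1}\,[H(\delta)'(y-\delta y_{-1})]$

where $H(\delta)$ means that $H$ is a function of $\delta$. It has the
following asymptotic distribution:

(IV.3)  $\sqrt{N}(\hat\delta_{GMM}-\delta) \to N\big(0,\ \mathrm{plim}[(1/N)(\partial(u'H)/\partial\delta)\,C^{-1}(\partial(H'u)/\partial\delta)]^{-1}\big)$

It can be easily shown that

(IV.4)  $\partial(H'u)/\partial\delta = -(H+D)'y_{-1}$

Therefore, the asymptotic distribution of $\hat\delta_{GMM}$ is given by:

(IV.5)  $\sqrt{N}(\hat\delta_{GMM}-\delta) \to N\big(0,\ \mathrm{plim}[(1/N)\,y_{-1}'(H+D)C^{-1}(H+D)'y_{-1}]^{-1}\big)$

and the asymptotic covariance matrix of $\hat\delta_{GMM}$ is evaluated as

(IV.6)  $[y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'y_{-1}]^{-1}$

where $\hat H$ and $\hat D$ are evaluated at $\hat\delta_{GMM}$.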

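To examine the characteristics of $\hat\delta_{GMM}$, consider the
first-order condition for minimization of (IV.2):

(IV.7)  $y_{-1}'(H+D)\hat C^{-1}H'(y-\delta y_{-1}) = 0$

Then, through some algebraic operations, we can show

(IV.8)  $\hat\delta_{GMM} = [y_{-1}'(\hat H+\hat D)\hat C^{-1}\hat H'y_{-1}]^{-1}\,y_{-1}'(\hat H+\hat D)\hat C^{-1}\hat H'y$
        $= [y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'y_{-1}]^{-1}\,y_{-1}'(\hat H+\hat D)\hat C^{-1}(\hat H+\hat D)'(y-P_v\hat u)$

where $\hat H$, $\hat D$, and $\hat u$ are evaluated at $\hat\delta_{GMM}$. (IV.8) has an interesting
implication. It suggests that an iterative procedure with any
initial consistent estimator of $\delta$ may generate the GMM
estimator. This turns out to be true. Furthermore, only one
iteration is needed to get an estimator which has the same
asymptotic distribution as $\hat\delta_{GMM}$. To see this, consider a new
estimator, $\tilde\delta$, which replaces $\hat H$, $\hat D$, and $\hat u$ in (IV.8) by $\tilde H$, $\tilde D$, and $\tilde u$
(evaluated, say, at $\hat\delta_A$). Then,

(IV.9)  $\tilde\delta = [y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'(y-P_v\tilde u)$
        $= \delta + [y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'u$
        $\quad - [y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}\tilde D'\tilde u$
        $= \delta + [y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}H'u$
        $\quad + [y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H-H)'u$
        $\quad - [y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}\tilde D'(\tilde u-u)$

because $P_v\tilde H = 0$ and $P_v\tilde D = \tilde D$. Observe that the mean of $\tilde u_i$ over $t$ equals $\bar u_i - (\hat\delta_A-\delta)(1/T)e_T'y_{i,-1}$.
Therefore,

(IV.10)  $\tilde B_{2i} - B_{2i} = -(\hat\delta_A-\delta)\,\frac{1}{T}(e_T'y_{i,-1}) \begin{bmatrix} -1 & 0 & \cdots & 0 \\ 1 & -1 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & -1 \\ 0 & 0 & \cdots & 1 \end{bmatrix}$

Note that

(IV.11)  $\frac{1}{T}(e_T'y_{i,-1}) \begin{bmatrix} -1 & 0 & \cdots & 0 \\ 1 & -1 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & -1 \\ 0 & 0 & \cdots & 1 \end{bmatrix}' \begin{bmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \end{bmatrix} = \frac{1}{T}(e_T'y_{i,-1}) \begin{bmatrix} u_{i2}-u_{i1} \\ u_{i3}-u_{i2} \\ \vdots \\ u_{iT}-u_{i,T-1} \end{bmatrix} = D_{2i}'y_{i,-1}$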

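Using (IV.10) and (IV.11), we have

(IV.12)  $(\tilde B_{2i}-B_{2i})'u_i = -(\hat\delta_A-\delta)D_{2i}'y_{i,-1}$  and
         $(\tilde H-H)'u = -(\hat\delta_A-\delta)D'y_{-1}$

Substituting (IV.12) into (IV.9) gives

(IV.13)  $\tilde\delta - \delta = [y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}H'u$
         $\quad + (\hat\delta_A-\delta)\,[y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde D-D)'y_{-1}$

because $\tilde D'(\tilde u-u) = -(\hat\delta_A-\delta)\tilde D'y_{-1}$. The second term on the right
side of (IV.13) is asymptotically negligible:

$\mathrm{plim}(1/\sqrt{N})(\tilde D-D)'y_{-1}(\hat\delta_A-\delta) = \mathrm{plim}\sqrt{N}(\hat\delta_A-\delta)\cdot\mathrm{plim}(1/N)(\tilde D-D)'y_{-1} = 0$

Note that, since $\hat C$ is a sum over the $N$ cross-section units,

$\sqrt{N}\,[y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}H'u \to N\big(0,\ \mathrm{plim}[(1/N)\,y_{-1}'(H+D)C^{-1}(H+D)'y_{-1}]^{-1}\big)$

This means that $\tilde\delta$ is asymptotically identical to $\hat\delta_{GMM}$.

This result is actually not surprising. Newey [1985]
develops a simple linearized GMM estimator which has the same
asymptotic distribution as the nonlinear GMM estimator. His
method is applicable with any initial consistent estimator.
Applying Newey's formula [1985, p. 238] to our model generates

(IV.14)  $\hat\delta_A + [y_{-1}'(\tilde H+\tilde D)\hat C^{-1}(\tilde H+\tilde D)'y_{-1}]^{-1}\,y_{-1}'(\tilde H+\tilde D)\hat C^{-1}\tilde H'\tilde u$

which can be shown to be algebraically identical to $\tilde\delta$. Therefore,
it turns out that $\tilde\delta$ is nothing but Newey's linearized GMM estimator.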
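The one-step procedure can be sketched in code. What follows is an illustration under stated assumptions, not the author's implementation: the data are simulated from $y_{it} = \delta y_{i,t-1} + \alpha_i + \varepsilon_{it}$ with standard normal $\alpha_i$ and $\varepsilon_{it}$, the instrument block $A_i$ is assumed to stack $y_{is}(u_{i,t+1}-u_{it})$ for $s < t$, the initial estimator is the IV estimator based on $A$ alone, and the function names (`unit_blocks`, `one_step_gmm`) are hypothetical.

```python
import numpy as np

def diff_col(T, t):
    """T-vector c such that c'u = u_{t+1} - u_t (periods numbered 1..T)."""
    c = np.zeros(T)
    c[t - 1], c[t] = -1.0, 1.0
    return c

def unit_blocks(y, delta):
    """Return A_i, H_i, D_i, u_i for one unit, with y = (y_0, ..., y_T)."""
    T = len(y) - 1
    u = y[1:] - delta * y[:-1]
    A = np.column_stack([y[s] * diff_col(T, t) for t in range(1, T) for s in range(t)])
    B1 = np.column_stack([y[t] * diff_col(T, t) - y[t + 1] * diff_col(T, t + 1)
                          for t in range(1, T - 1)])
    B2 = np.column_stack([u.mean() * diff_col(T, t) for t in range(1, T)])
    D2 = np.tile(np.diff(u) / T, (T, 1))
    H = np.hstack([A, B1, B2])
    D = np.hstack([np.zeros_like(A), np.zeros_like(B1), D2])
    return A, H, D, u

def one_step_gmm(Y):
    """Y is N x (T+1). Returns (initial IV estimate, one-step linearized GMM estimate)."""
    # Step 1: conventional IV estimator using the instruments in A only.
    AA = Aylag = Ay = 0
    for y in Y:
        A, _, _, _ = unit_blocks(y, 0.0)          # A does not depend on delta
        AA = AA + A.T @ A
        Aylag = Aylag + A.T @ y[:-1]
        Ay = Ay + A.T @ y[1:]
    w = np.linalg.solve(AA, Aylag)
    delta_a = (w @ Ay) / (w @ Aylag)
    # Step 2: a single linearized step of the form in (IV.14).
    g = S = C = 0
    for y in Y:
        _, H, D, u = unit_blocks(y, delta_a)
        gi = H.T @ u                              # H_i' u_i at the initial estimate
        g = g + gi
        S = S + (H + D).T @ y[:-1]                # (H_i + D_i)' y_{i,-1}
        C = C + np.outer(gi, gi)                  # sum of H_i' u_i u_i' H_i
    step = (S @ np.linalg.solve(C, g)) / (S @ np.linalg.solve(C, S))
    return delta_a, delta_a + step

rng = np.random.default_rng(0)
N, T, delta = 4000, 4, 0.5                        # illustrative design
alpha = rng.normal(size=N)
Y = np.empty((N, T + 1))
Y[:, 0] = alpha / (1 - delta) + rng.normal(size=N)
for t in range(1, T + 1):
    Y[:, t] = delta * Y[:, t - 1] + alpha + rng.normal(size=N)

delta_a, delta_tilde = one_step_gmm(Y)
print(delta_a, delta_tilde)
```

Only sums of small per-unit matrices are accumulated, so the full $NT$-row matrices $H$ and $D$ never need to be formed.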

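Even though the efficiency of $\hat\delta_{GMM}$ is not questionable,
one may argue that unless $\hat\delta_{GMM}$ has a significant efficiency
gain, $\hat\delta_A$ would be preferred because of its simplicity.
Therefore, it would be worth demonstrating explicitly the
efficiency comparison between $\hat\delta_{GMM}$ and $\hat\delta_A$, and seeing in what
cases the efficiency gain of $\hat\delta_{GMM}$ over $\hat\delta_A$ is greatest. For
simplicity, consider the case in which $T=2$. There are only
two moment conditions available:

(IV.15)  $E\begin{bmatrix} y_{i0}(u_{i2}-u_{i1}) \\ \bar{u}_i(u_{i2}-u_{i1}) \end{bmatrix} = 0$

where $\bar{u}_i = \tfrac{1}{2}(u_{i2}+u_{i1})$. Define

$A_i = \begin{bmatrix} -y_{i0} \\ y_{i0} \end{bmatrix}$;  $B_{2i} = \begin{bmatrix} -\bar{u}_i \\ \bar{u}_i \end{bmatrix}$;  $D_{2i} = \begin{bmatrix} \tfrac{1}{2}(u_{i2}-u_{i1}) \\ \tfrac{1}{2}(u_{i2}-u_{i1}) \end{bmatrix}$;

$A = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_N \end{bmatrix}$;  $B_2 = \begin{bmatrix} B_{21} \\ B_{22} \\ \vdots \\ B_{2N} \end{bmatrix}$;  $D_2 = \begin{bmatrix} D_{21} \\ D_{22} \\ \vdots \\ D_{2N} \end{bmatrix}$;  $H = (A,\ B_2)$;  $D = (0,\ D_2)$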

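If the $\varepsilon$'s are normally distributed with variance $\sigma_\varepsilon^2$,
$\mathrm{Cov}(H'u) = \sigma_\varepsilon^2E(H'H)$. This is true because $J^*$ is irrelevant
in (IV.2) when $T=2$. Then,

(IV.16)  $\sqrt{N}(\hat\delta_{GMM}-\delta) \to N(0,\ \sigma_\varepsilon^2\psi_1^{-1})$

where $\psi_1 = \mathrm{plim}[(1/N)\,y_{-1}'(H+D)(H'H)^{-1}(H+D)'y_{-1}]$. APPENDIX D
shows

$\psi_1 = \mathrm{plim}(1/N)\,y_{-1}'P_Hy_{-1} + a$

where $a = \tfrac{1}{8}\sigma_\varepsilon^4/[\sigma_\alpha^2(1-\rho_{0\alpha}^2)+\sigma_\varepsilon^2/2] - \tfrac{1}{2}\sigma_\varepsilon^2 < 0$. $\rho_{0\alpha}$
denotes the correlation coefficient of $y_0$ and $\alpha$, i.e.,
$\rho_{0\alpha} = \sigma_{0\alpha}/(\sigma_0\sigma_\alpha)$. On the other hand, the conventional IV
estimator, $\hat\delta_A$, has the following asymptotic distribution:

(IV.17)  $\sqrt{N}(\hat\delta_A-\delta) \to N(0,\ \sigma_\varepsilon^2\psi_2^{-1})$

where

$\psi_2 = \mathrm{plim}(1/N)\,y_{-1}'P_Ay_{-1}$
$\quad = \mathrm{plim}(1/N)\,y_{-1}'P_Hy_{-1} - \mathrm{plim}(1/N)\,y_{-1}'M_AB_2(B_2'M_AB_2)^{-1}B_2'M_Ay_{-1}$.

APPENDIX D also shows

$\mathrm{plim}(1/N)\,y_{-1}'M_AB_2(B_2'M_AB_2)^{-1}B_2'M_Ay_{-1} = \tfrac{1}{2}[\sigma_\alpha^2(1-\rho_{0\alpha}^2)+\sigma_\varepsilon^2/2]$

Then,

(IV.18)  $\psi_1-\psi_2 = \tfrac{1}{2}\sigma_\alpha^4(1-\rho_{0\alpha}^2)^2/[\sigma_\alpha^2(1-\rho_{0\alpha}^2)+\sigma_\varepsilon^2/2] > 0$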

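By observing (IV.18), we can see that the size of the efficiency
gain of $\hat\delta_{GMM}$ over $\hat\delta_A$ depends on $\sigma_\varepsilon^2$, $\sigma_\alpha^2$, and $\rho_{0\alpha}$. The greater
$\sigma_\alpha^2$ is and the smaller $\rho_{0\alpha}^2$ and $\sigma_\varepsilon^2$ are, the greater the efficiency
gain of $\hat\delta_{GMM}$.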
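These comparative statics can be illustrated by evaluating the expression in (IV.18) directly. The sketch below assumes the reading $\psi_1-\psi_2 = \tfrac{1}{2}x^2/(x+\sigma_\varepsilon^2/2)$ with $x = \sigma_\alpha^2(1-\rho_{0\alpha}^2)$; the function name `efficiency_gain` and the parameter values are illustrative only.

```python
def efficiency_gain(sigma_a2, sigma_e2, rho2):
    """psi_1 - psi_2 as read from (IV.18), for T = 2 and normal errors."""
    x = sigma_a2 * (1.0 - rho2)        # sigma_alpha^2 * (1 - rho_{0 alpha}^2)
    return 0.5 * x ** 2 / (x + sigma_e2 / 2.0)

base = efficiency_gain(1.0, 1.0, 0.0)
more_alpha_variance = efficiency_gain(2.0, 1.0, 0.0)
more_correlation = efficiency_gain(1.0, 1.0, 0.5)
more_noise = efficiency_gain(1.0, 2.0, 0.0)
print(base, more_alpha_variance, more_correlation, more_noise)
```

The gain rises with $\sigma_\alpha^2$ and falls with both $\rho_{0\alpha}^2$ and $\sigma_\varepsilon^2$, matching the verbal statement.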

Another possible problem in our estimation method is that
$\hat\delta_{GMM}$ loses its consistency under heteroskedasticity of the $\varepsilon$'s
while $\hat\delta_A$ does not. Therefore, an appropriate test procedure
may be required. Since $\hat\delta_{GMM}$ is efficient under the null
hypothesis (homoskedasticity of the $\varepsilon$'s), and $\hat\delta_A$ is consistent
under both the null and alternative hypotheses, Hausman's [1978]
misspecification test is available. Under the null
hypothesis, the test statistic,

(IV.19)  $(\hat\delta_{GMM}-\hat\delta_A)'(M_2-M_1)^{-1}(\hat\delta_{GMM}-\hat\delta_A)$

converges in distribution to $\chi^2(T-1)$, where $M_1$ and $M_2$ are the
asymptotic variances of $\hat\delta_{GMM}$ and $\hat\delta_A$, respectively. Note that
$\hat\delta_A$ is not efficient even when the null hypothesis is
rejected. This is because $A'A$ is no longer consistent for
$\mathrm{Cov}(A'u)$ and because, as CASE I demonstrates in Section III,
we still have extra $(T-2)$ moment conditions; see (III.9). To
obtain an efficient estimator, we would need to construct the
GMM estimator based on the moment conditions in (III.9).
Again, Newey's method could be used to generate the linearized
GMM estimator.
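The statistic in (IV.19) is a quadratic form in the difference between the two estimates. A minimal sketch follows, assuming that estimated variances are supplied for both estimators; the function name `hausman_statistic` and the numerical inputs are hypothetical, and the comparison with a critical value is left to the user.

```python
import numpy as np

def hausman_statistic(b_eff, b_cons, v_eff, v_cons):
    """(b_eff - b_cons)' (v_cons - v_eff)^{-1} (b_eff - b_cons).

    b_eff, v_eff:   estimate and variance of the estimator efficient under the null
    b_cons, v_cons: estimate and variance of the estimator consistent under both hypotheses
    """
    diff = np.atleast_1d(b_eff) - np.atleast_1d(b_cons)
    v_diff = np.atleast_2d(v_cons) - np.atleast_2d(v_eff)
    return float(diff @ np.linalg.solve(v_diff, diff))

# Hypothetical numbers: the GMM estimate with variance 0.01 against the IV estimate with variance 0.02.
stat = hausman_statistic(0.50, 0.60, 0.01, 0.02)
print(stat)
```

In finite samples the estimated variance difference need not be positive definite, in which case the statistic is not well defined; the sketch does not guard against that case.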

V. Estimation with Exogenous Variables
Until now, we have investigated the GMM estimation method
for the simple dynamic model with a one-period lagged
dependent variable as the only explanatory variable. In this
section, we extend the results obtained in the previous
sections to the general model which includes exogenous
variables. The most surprising finding is that there exists an
enormously large number of moment conditions available for the
dynamic model. Many former studies for the static model
assume strong exogeneity conditions, in the sense that all the
exogenous variables are uncorrelated with the $\varepsilon$'s at all leads
and lags. These conditions, however, do not generate extra
instruments for the static model. On the contrary, all these
strong exogeneity conditions must be considered in the dynamic
model. I will explain this first, and later include the
moment conditions obtained in the previous sections.

The model can be written as

(V.1-1)  $y_{it} = \delta y_{i,t-1} + x_{it}\beta + z_i\gamma + (\alpha_i+\varepsilon_{it})$
         $\quad = \delta y_{i,t-1} + x_{it}\beta + z_i\gamma + u_{it}$,   $i=1,2,\ldots,N$; $t=1,2,\ldots,T$

where $x_{it}$ is a $1\times k$ row vector of time-varying exogenous
variables, and $z_i$ is a $1\times g$ row vector of time-invariant
exogenous variables. That is, $x_{it}$ and $z_i$ are stochastically
independent of $\varepsilon_{it}$. For the $i$th cross-section, (V.1-1) can
be rewritten as

(V.1-2)  $y_i = \delta y_{i,-1} + X_i\beta + Z_i\gamma + u_i$

For all the observations, the model can be expressed in matrix
form as:

(V.1-3)  $y = \delta y_{-1} + X\beta + Z\gamma + u$
         $\quad = (y_{-1},\ X,\ Z)(\delta,\ \beta',\ \gamma')' + u$
         $\quad = W\theta + u$

For the static model (no $y_{-1}$ present), Hausman and Taylor
[1981], Amemiya and MaCurdy [1986], and Breusch, Mizon, and
Schmidt [1989] develop simple IV estimation methods, each of
which provides a consistent and efficient estimator under a
given set of exogeneity conditions between $X$, $Z$ and $\alpha$. To see
those exogeneity restrictions, it would be useful to partition
$X$ and $Z$ as follows:

(V.2-1)  $X = (X_1,\ X_2)$
(V.2-2)  $Z = (Z_1,\ Z_2)$

where $X_1$ and $Z_1$ are uncorrelated with $\alpha$ while $X_2$ and $Z_2$ are
correlated with $\alpha$. For notational convenience, define an
$NT\times Tm$ matrix, $S^*$, as follows:

(V.3)  $S = \begin{bmatrix} S_{11} \\ S_{12} \\ \vdots \\ S_{1T} \\ \vdots \\ S_{N1} \\ S_{N2} \\ \vdots \\ S_{NT} \end{bmatrix}$;  $S^* = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1T} \\ S_{11} & S_{12} & \cdots & S_{1T} \\ \vdots & \vdots & & \vdots \\ S_{11} & S_{12} & \cdots & S_{1T} \\ \vdots & \vdots & & \vdots \\ S_{N1} & S_{N2} & \cdots & S_{NT} \\ S_{N1} & S_{N2} & \cdots & S_{NT} \\ \vdots & \vdots & & \vdots \\ S_{N1} & S_{N2} & \cdots & S_{NT} \end{bmatrix}$

where $m$ is the number of columns of $S$. Note that $S^*$ is a
time-invariant matrix, in the sense that each of its columns is
time-invariant. If we follow the interpretation provided by
Breusch, Mizon, and Schmidt, the exogeneity conditions imposed
by these authors can be compactly described by

(V.4)  $\mathrm{plim}(1/N)G'u = 0$

where $G = (Q_vX,\ P_vR)$. $R$ is defined as follows:

(V.5)  $R = (X_1,\ Z_1)$ for Hausman and Taylor
       $R = (X_1,\ Z_1,\ X_1^*)$ for Amemiya and MaCurdy
       $R = (X_1,\ Z_1,\ X_1^*,\ (Q_vX_2)^*)$ for Breusch, Mizon, and Schmidt

Since the comparison of these three approaches is not the
task of this paper, we will use $R$ without preference for any
particular approach. A simple two-stage least squares (2SLS)
treatment using $G$ as instrumental variables will provide a
consistent estimator of $\theta$. Note that $E(uu') = \sigma_\varepsilon^2\Gamma = \sigma_\varepsilon^2(Q_v+\phi^2P_v)$
where $\phi^2 = (\sigma_\varepsilon^2+T\sigma_\alpha^2)/\sigma_\varepsilon^2$. The simple 2SLS estimator, $\hat\theta_G$, is the
IV estimator applied to the following transformed equation:

(V.6)  $\Gamma^{-1/2}y = \Gamma^{-1/2}W\theta + \Gamma^{-1/2}u$

Then, we have

$\hat\theta_G = (W'\Gamma^{-1/2}G(G'G)^{-1}G'\Gamma^{-1/2}W)^{-1}\,W'\Gamma^{-1/2}G(G'G)^{-1}G'\Gamma^{-1/2}y$

Through some algebraic operation, it can also be shown
that³

(V.7)  $\hat\theta_G = (W'G(G'\Gamma G)^{-1}G'W)^{-1}\,W'G(G'\Gamma G)^{-1}G'y$

The asymptotic distribution of $\hat\theta_G$ is given:

(V.8)  $\sqrt{N}(\hat\theta_G-\theta) \to N\big(0,\ \sigma_\varepsilon^2\,\mathrm{plim}[(1/N)W'G(G'\Gamma G)^{-1}G'W]^{-1}\big)$.

$\hat\theta_G$ is also a GMM estimator. To see why, note that $\mathrm{Cov}(G'u) = \sigma_\varepsilon^2G'\Gamma G$. Then, the GMM estimator based on the exogeneity
conditions in (V.4) minimizes

$(y-W\theta)'G(G'\Gamma G)^{-1}G'(y-W\theta)$

and it is exactly identical to $\hat\theta_G$.
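The equality between the 2SLS form on the transformed equation and the GMM form in (V.7) can be checked numerically. The sketch below uses arbitrary random matrices in place of real data; the dimensions, the value of $\phi^2$, and the helper name `iv` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, phi2 = 8, 3, 4.0                               # illustrative sizes and phi^2
P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))     # P_v: replaces values by unit means
Q = np.eye(N * T) - P                                # Q_v: deviations from unit means
Gamma = Q + phi2 * P
Gamma_inv_half = Q + P / np.sqrt(phi2)               # valid because Q and P are orthogonal projections

X = rng.normal(size=(N * T, 2))
R = rng.normal(size=(N * T, 2))
W = rng.normal(size=(N * T, 3))
y = rng.normal(size=N * T)
G = np.hstack([Q @ X, P @ R])                        # instruments G = (Q_v X, P_v R)

def iv(Wt, yt, Z, weight):
    """(W'Z weight^{-1} Z'W)^{-1} W'Z weight^{-1} Z'y."""
    ZW, Zy = Z.T @ Wt, Z.T @ yt
    lhs = ZW.T @ np.linalg.solve(weight, ZW)
    rhs = ZW.T @ np.linalg.solve(weight, Zy)
    return np.linalg.solve(lhs, rhs)

theta_transformed = iv(Gamma_inv_half @ W, Gamma_inv_half @ y, G, G.T @ G)
theta_gmm = iv(W, y, G, G.T @ Gamma @ G)
print(theta_transformed, theta_gmm)
```

The two forms coincide because $Q_vX$ and $P_vR$ lie in orthogonal subspaces on which $\Gamma$ acts as a scalar multiple (1 and $\phi^2$ respectively).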

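Even though (V.4) could incorporate all the exogeneity
conditions on $X$ and $Z$ in relation to $\alpha$, it does not exploit
the strong exogeneity conditions about $\varepsilon$. To clarify this
point, consider the following strong exogeneity restrictions:

(V.9-1)  $E(x_{it}'\varepsilon_{is}) = 0$
(V.9-2)  $E(z_i'\varepsilon_{is}) = 0$

where $t,s = 1,2,\ldots,T$. In terms of $u_{is}$, (V.9) implies:

(V.10-1)  $E(x_{it}'(u_{i,s+1}-u_{is})) = 0$
(V.10-2)  $E(z_i'(u_{i,s+1}-u_{is})) = 0$

where $s = 1,2,\ldots,T-1$. Define

$X_i^{**} = \left[\begin{array}{cccc|c|cccc} -x_{i1} & 0 & \cdots & 0 & \cdots & -x_{iT} & 0 & \cdots & 0 \\ x_{i1} & -x_{i1} & \cdots & 0 & \cdots & x_{iT} & -x_{iT} & \cdots & 0 \\ 0 & x_{i1} & \cdots & 0 & \cdots & 0 & x_{iT} & \cdots & 0 \\ \vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & -x_{i1} & \cdots & 0 & 0 & \cdots & -x_{iT} \\ 0 & 0 & \cdots & x_{i1} & \cdots & 0 & 0 & \cdots & x_{iT} \end{array}\right]$

$Z_i^{**} = \begin{bmatrix} -z_i & 0 & \cdots & 0 \\ z_i & -z_i & \cdots & 0 \\ 0 & z_i & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & -z_i \\ 0 & 0 & \cdots & z_i \end{bmatrix}$

where $X_i^{**}$ is a $T\times(kT(T-1))$ matrix and $Z_i^{**}$ is a $T\times(g(T-1))$
matrix. Let

$X^{**} = \begin{bmatrix} X_1^{**} \\ \vdots \\ X_N^{**} \end{bmatrix}$;  $Z^{**} = \begin{bmatrix} Z_1^{**} \\ \vdots \\ Z_N^{**} \end{bmatrix}$;  $W^{**} = (X^{**},\ Z^{**})$

Then, we may use $W^{**}$ as instruments, because $\mathrm{plim}(1/N)W^{**\prime}u = 0$.
Define $F = (W^{**},\ G) = (X^{**},\ Z^{**},\ Q_vX,\ P_vR)$. Then, all the available
exogeneity conditions can be compactly described by

(V.11)  $\mathrm{plim}(1/N)F'u = E(F'u) = 0$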
Note that not all the moment restrictions in (V.11) are
linearly independent. To see why, we can rewrite the
restrictions in (V.11) separately as follows:

(V.12-1)  $E(X^{**\prime}u) = 0$
(V.12-2)  $E(Z^{**\prime}u) = 0$
(V.12-3)  $E(X'Q_vu) = 0$
(V.12-4)  $E(R'P_vu) = 0$

(V.9-1) implies $E[x_{it}'(\varepsilon_{it}-\bar\varepsilon_i)] = E[x_{it}'(u_{it}-\bar{u}_i)] = 0$, which in turn
shows (V.12-3). Since (V.12-1) is derived from (V.9-1),
(V.12-1) also implies (V.12-3). That is, $k$ restrictions among
those in (V.12-1) are already incorporated by (V.12-3).
Therefore, (V.12-1) contains extra $[(T^2-T-1)k]$, not
$[T(T-1)k]$, conditions which are not captured by (V.12-3).
Since (V.12-2) implies $[(T-1)g]$ restrictions, (V.11) includes
in total extra $[(T^2-T-1)k+(T-1)g]$ conditions compared with
(V.4).

Following the same reasoning, we can easily show that the
rank of $(X^{**},\ Q_vX)$ is $T(T-1)k$, while the number of columns of
$(X^{**},\ Q_vX)$ is $(T^2-T+1)k$. That is, $F$ does not have full column
rank. Therefore, we have to exclude $Q_vX$, or, as I assume
here, the last $k$ columns of $X^{**}$ from $F$, when we proceed with the
2SLS estimation based on (V.11). Then, the 2SLS estimator,
$\hat\theta_F$, has the same form as $\hat\theta_G$ except that $F$ replaces $G$:

(V.13)  $\hat\theta_F = (W'\Gamma^{-1/2}F(F'F)^{-1}F'\Gamma^{-1/2}W)^{-1}\,W'\Gamma^{-1/2}F(F'F)^{-1}F'\Gamma^{-1/2}y$
        $\quad = (W'F(F'\Gamma F)^{-1}F'W)^{-1}\,W'F(F'\Gamma F)^{-1}F'y$

The asymptotic distribution of $\hat\theta_F$ is given:

(V.14)  $\sqrt{N}(\hat\theta_F-\theta) \to N\big(0,\ \sigma_\varepsilon^2\,\mathrm{plim}[(1/N)W'F(F'\Gamma F)^{-1}F'W]^{-1}\big)$

$\hat\theta_F$ is also the GMM estimator based on all the available
exogeneity conditions given in (V.11).
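The rank deficiency of $(X^{**},\ Q_vX)$ can be confirmed numerically. The sketch below assumes the block layout of $X_i^{**}$ in which each $x_{it}$ multiplies every first-difference column; the helper names `within_block` and `x_star_star` and the random regressors are hypothetical.

```python
import numpy as np

def within_block(T):
    """T x (T-1) matrix with -1 on the diagonal and +1 immediately below it."""
    D = np.zeros((T, T - 1))
    for s in range(T - 1):
        D[s, s], D[s + 1, s] = -1.0, 1.0
    return D

def x_star_star(Xi):
    """Xi is T x k with rows x_it. Returns the T x kT(T-1) matrix X_i**."""
    T = Xi.shape[0]
    D = within_block(T)
    return np.hstack([np.kron(D, Xi[t:t + 1, :]) for t in range(T)])

rng = np.random.default_rng(3)
N, T, k = 40, 3, 2                                         # illustrative sizes
X = rng.normal(size=(N, T, k))
Q = np.eye(T) - np.full((T, T), 1.0 / T)

Xss = np.vstack([x_star_star(X[i]) for i in range(N)])     # NT x kT(T-1)
QX = np.vstack([Q @ X[i] for i in range(N)])               # NT x k
stacked = np.hstack([Xss, QX])

n_cols = stacked.shape[1]
rank = np.linalg.matrix_rank(stacked)
print(n_cols, rank)
```

With $T=3$ and $k=2$ there are $(T^2-T+1)k = 14$ columns but the rank is $T(T-1)k = 12$, because each column of $Q_vX$ is a fixed linear combination of columns of $X^{**}$.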

However, $W^{**}$ is irrelevant for the static model. To see
why, note that Hausman and Taylor, and Amemiya and MaCurdy,
use $(Q_v,\ R)$ as instruments, not $G = (Q_vX,\ P_vR)$. It actually does
not matter whether we choose $(Q_vX,\ P_vR)$ or $(Q_v,\ R)$, because both
generate the same estimator; see Breusch, Mizon and Schmidt
[1989]. By the same reasoning, the IV treatment with
$F = (W^{**},\ Q_vX,\ P_vR)$ on the static model amounts to that with
$(W^{**},\ Q_v,\ R)$. Now, since the columns of $W^{**}$ are linear
combinations of the columns of $Q_v$ in the sense that $W^{**} = Q_vW^{**}$,
the projection of a variable onto $(W^{**},\ Q_v,\ R)$ must be equal to
that onto $(Q_v,\ R)$. This means that $(W^{**},\ Q_v,\ R)$ provides the same
estimator as $(Q_v,\ R)$, and therefore $(Q_vX,\ P_vR)$. In other words,
the inclusion of $W^{**}$ into the set of instrumental variables is
irrelevant for the estimation of the static model.

However, the situation is different for the dynamic
model, because $Q_v$ is no longer a legitimate instrument. (This
is why the within estimator is inconsistent.) The existence
of the lagged dependent variable as an explanatory variable
makes $(Q_v,\ R)$ substantially different from $G = (Q_vX,\ P_vR)$. These
instrument sets no longer generate the same estimator. This
is because $Q_vX$, not $Q_v$ alone, is a legitimate instrument.
This fact has a powerful implication. Since $W^{**}$ and $Q_vX$ are
linearly independent, the inclusion of $W^{**}$ into the set of
instrumental variables will improve the efficiency of the 2SLS
estimator for the dynamic model; i.e., $\hat\theta_F$ is more efficient
than $\hat\theta_G$ when $y_{-1}$ is present. Furthermore, when the dimension
of $W^{**}$, $(T^2-T-1)k+(T-1)g$, is considered, the efficiency gain
from using $W^{**}$ may be very large. This is an
extraordinary case for the general regression model, because
the number of instrumental variables based on the exogeneity
conditions usually does not change whether the given model is
dynamic or static. Therefore, this suggests that the moment
conditions under a model using panel data should be carefully
counted.

Now, let us include the moment conditions used in the
conventional IV approaches for the dynamic model. For this,
define

$S = (A,\ F)$

Then, the orthogonality conditions can be expressed by

(V.15)  $\mathrm{plim}(1/N)S'u = 0$

Since $\mathrm{Cov}(S'u) = \sigma_\varepsilon^2E(S'\Gamma S)$, $S'\Gamma S$ can be used as a consistent
estimator of $\mathrm{Cov}(S'u)$. The GMM estimator based on (V.15) is
given by:

(V.16)  $\hat\theta_S = (W'S(S'\Gamma S)^{-1}S'W)^{-1}\,W'S(S'\Gamma S)^{-1}S'y$

where

(V.17)  $\sqrt{N}(\hat\theta_S-\theta) \to N\big(0,\ \sigma_\varepsilon^2\,\mathrm{plim}[(1/N)W'S(S'\Gamma S)^{-1}S'W]^{-1}\big)$

This is also algebraically identical to the IV estimator using
$S$ as the instruments on the transformed equation given in
(V.6). APPENDIX E shows

(V.18)  $W'S(S'\Gamma S)^{-1}S'W = W'F(F'\Gamma F)^{-1}F'W + K_1$

where $K_1$ is a positive semidefinite matrix. This means that
$\hat\theta_S$ is more efficient than $\hat\theta_F$.

As a final step, let us include the whole set of moment
conditions available. Define

$L(\theta) = (H(\theta),\ F)$

The notation, $L(\theta)$, indicates the fact that, unlike $S$, $L$ is a
function of $\theta$. Then,

(V.19)  $\mathrm{plim}(1/N)L(\theta)'u = 0$

Note that $C = \mathrm{Cov}(L'u) = \sum_{i=1}^{N}E(L_i'u_iu_i'L_i)$ where $L_i = (H_i,\ F_i)$.
Evaluate $L_i$ and $u_i$ using any consistent estimator, $\tilde\theta$, and
denote them by $\tilde L_i$ and $\tilde u_i$. Then, $\hat C = \sum_{i=1}^{N}\tilde L_i'\tilde u_i\tilde u_i'\tilde L_i$ is a
consistent estimator of $C$. The GMM estimator, $\hat\theta_L$, minimizes

$(y-W\theta)'L(\theta)\hat C^{-1}L(\theta)'(y-W\theta)$

and its asymptotic distribution is given:

(V.20)  $\sqrt{N}(\hat\theta_L-\theta) \to N\big(0,\ \mathrm{plim}[(1/N)W'(L+L_D)C^{-1}(L+L_D)'W]^{-1}\big)$

where $L_D = (D,\ 0,\ 0)$.

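For an efficiency comparison, assume that the $\varepsilon$'s are
normally distributed. Then, using LEMMA 1 in APPENDIX A, we
easily show

(V.21)  $\mathrm{plim}(1/N)\mathrm{Cov}(L'u) = \sigma_\varepsilon^2\,\mathrm{plim}(1/N)\begin{bmatrix} H'H & H'E & 0 \\ E'H & E'E & 0 \\ 0 & 0 & \phi^2R'P_vR \end{bmatrix} + \sigma_\varepsilon^2\begin{bmatrix} J^* & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
        $\quad = \sigma_\varepsilon^2\,\mathrm{plim}(1/N)L'\Gamma L + \sigma_\varepsilon^2J^*$

where $E = (W^{**},\ Q_vX)$. This implies that $L'\Gamma L+NJ^*$ can be used as
a consistent estimator of $C$. If we replace $\hat C$ by $L'\Gamma L+NJ^*$, as
shown in APPENDIX F, the inverse of the asymptotic covariance
matrix of $\hat\theta_L$ is given by:

(V.22)  $W'S(S'\Gamma S)^{-1}S'W + K_2$

where $K_2$ is a positive semidefinite matrix. Therefore, $\hat\theta_L$ is
more efficient than $\hat\theta_S$. Actually, $\hat\theta_L$ can be said to be
efficient, because it exploits all the available moment
conditions. The following linearized GMM estimator can also
be used:

(V.23)  $\tilde\theta_L = (W'(\tilde L+\tilde L_D)\tilde C^{-1}(\tilde L+\tilde L_D)'W)^{-1}\,W'(\tilde L+\tilde L_D)\tilde C^{-1}(\tilde L+\tilde L_D)'(y-P_v\tilde u)$

where "~" means "evaluated at a consistent estimator of $\theta$."
Again, $\tilde\theta_L$ has the same asymptotic distribution as $\hat\theta_L$.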

VI. Conclusion

In this paper, I have adopted standard assumptions for
the dynamic panel data model, and I have characterized all of
the moment conditions that these assumptions imply. I showed
that previous IV estimators do not impose all of the available
linear moment conditions, and also that there are some
nonlinear moment conditions that they do not incorporate.
This reveals the inefficiency of the conventional IV
estimators. I proposed an efficient GMM estimator. Since the
GMM estimator is nonlinear, I also considered an efficient
linearized version of the GMM estimator.

I also extended this approach to the dynamic model that
includes exogenous variables. I showed that the existing
treatments of the static model incorporate some but not all of
the restrictions that are implied by strong exogeneity of the
exogenous variables. The extra restrictions implied by
strong exogeneity are irrelevant in the static model, but they
are relevant in the dynamic model. This fact generates a
large number of additional instruments, all of which can be
exploited either by a linear IV estimator or by a nonlinear
GMM estimator. These extra moment conditions may result in a
large gain in efficiency in the dynamic model.


ENDNOTES

1. This assumption implies that the $y_{i0}$'s are stochastic. Since
the $\alpha$'s are random effects, and since $y_{i0}$ may include $\alpha_i$,
this seems reasonable. Also, the assumption, $E(y_{i0})=0$,
is adopted for simplicity. Actually, it is enough to
assume that $\Sigma$ is the second-moment matrix:

$\Sigma = E[(y_0,\alpha,\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_T)'(y_0,\alpha,\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_T)]$

2. $J$ contains $\sigma_\varepsilon^2$. Therefore, $J^*$ must be estimated using any
consistent estimator of $\sigma_\varepsilon^2$.

3. We assume here that $\phi^2$ is known. For the estimation of
$\phi^2$, see Hausman and Taylor [1981].

APPENDICES


APPENDIX A

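To derive the GMM estimator based on the restrictions in
(II.5), we need to know the covariance matrix of $A_i'u_i$ under
(SA) given in Section III. For this, the following lemmas are
useful.

LEMMA 1

Suppose that $\xi_1$, $\xi_2$, and $\xi_3$ are random variables. Assume that
$\xi_1$ is independent of $\xi_3$, and $E(\xi_1)=0$. Also assume that

$\xi_2 = a\eta_1 + b\eta_2$,

where $\eta_1$ is independent of $\xi_3$; and $\eta_2$ is independent of $\xi_1$, and
$a$ and $b$ are some constants. Then,

$E(\xi_1\xi_2\xi_3) = E(\xi_1\xi_2)E(\xi_3)$.

Proof.
$E(\xi_1\xi_2\xi_3) = aE(\xi_1\eta_1\xi_3) + bE(\xi_1\eta_2\xi_3)$
$\quad = aE(\xi_1\eta_1)E(\xi_3) + bE(\xi_1)E(\eta_2\xi_3)$
$\quad = aE(\xi_1\eta_1)E(\xi_3)$
$\quad = E[(a\eta_1+b\eta_2)\xi_1]E(\xi_3)$
$\quad = E(\xi_1\xi_2)E(\xi_3)$  QED

LEMMA 2

$E[y_{it}y_{ih}(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})] = E(y_{it}y_{ih})E[(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})]$,
for $t \le h$ and $t < s,k$.

Proof.
Since $t < s$ and $t < k$, $y_{it}$ is independent of $(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})$.
Obviously, $E(y_{it})=0$. Furthermore, we can decompose $y_{ih}$ into
a part that depends on $y_{it}$ and $\alpha_i$, which is independent of
$(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})$, and a part that depends on
$\varepsilon_{i,t+1},\ldots,\varepsilon_{ih}$, which is independent of $y_{it}$. Then LEMMA 1
applies and implies the result. QED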

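THEOREM 1

$\mathrm{Cov}(A_i'u_i) = \sigma_\varepsilon^2E(A_i'\Gamma_iA_i) = \sigma_\varepsilon^2E(A_i'A_i)$

where $\Gamma_i = Q+\phi^2P$ and $\phi^2 = (\sigma_\varepsilon^2+T\sigma_\alpha^2)/\sigma_\varepsilon^2$.

Proof.
A typical element of $A_i'u_i$ is $y_{it}(\varepsilon_{i,s+1}-\varepsilon_{is})$ where $t = 0, 1, 2, \ldots, T-2$, and $s > t$. Then the typical element of $\mathrm{Cov}(A_i'u_i) = E(A_i'u_iu_i'A_i)$ must be $E(y_{it}y_{ih}(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik}))$, where
$t \le h$, $s > t$, and $k > h$. By LEMMA 2,

$E(y_{it}y_{ih}(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})) = E(y_{it}y_{ih})E[(\varepsilon_{i,s+1}-\varepsilon_{is})(\varepsilon_{i,k+1}-\varepsilon_{ik})]$

Using this fact, we can show

$E(A_i'u_iu_i'A_i) = E[A_i'E(u_iu_i')A_i] = \sigma_\varepsilon^2E[A_i'\Gamma_iA_i]$

where $\sigma_\varepsilon^2\Gamma_i = E(u_iu_i')$. It is a well-known fact that
$\Gamma_i = Q+\phi^2P$ where $\phi^2 = (\sigma_\varepsilon^2+T\sigma_\alpha^2)/\sigma_\varepsilon^2$; see Hausman and Taylor
[1981]. Since every column of $A_i$ sums to zero, $PA_i = 0$ and $A_i'\Gamma_iA_i = A_i'QA_i = A_i'A_i$. Therefore, we have

$E(A_i'u_iu_i'A_i) = \sigma_\varepsilon^2E[A_i'A_i]$. QED

To derive the GMM estimator, we note that

$\mathrm{plim}(1/N)\mathrm{Cov}(A'u) = \mathrm{plim}(1/N)\sum_{i=1}^{N}\mathrm{Cov}(A_i'u_i)$
$\quad = \mathrm{plim}(1/N)\sum_{i=1}^{N}E(A_i'u_iu_i'A_i)$
$\quad = \sigma_\varepsilon^2\cdot\mathrm{plim}(1/N)\sum_{i=1}^{N}E[A_i'A_i]$
$\quad = \sigma_\varepsilon^2\cdot\mathrm{plim}(1/N)E(A'A)$
$\quad = \sigma_\varepsilon^2\cdot\mathrm{plim}(1/N)A'A$

This shows that $(A'A)$ could be used as a consistent estimator
of $\mathrm{Cov}(A'u)$. (Actually it must be $\hat\sigma_\varepsilon^2(A'A)$ where $\hat\sigma_\varepsilon^2$ is a
consistent estimator of $\sigma_\varepsilon^2$. However, $\hat\sigma_\varepsilon^2$ would cancel and
therefore is irrelevant for the derivation of the GMM
estimator.) By minimizing

$(u'A)(A'A)^{-1}(A'u) = \{(y-\delta y_{-1})'A\}(A'A)^{-1}\{A'(y-\delta y_{-1})\}$,

we obtain the GMM estimator of $\delta$, which is exactly identical
to the IV estimator given in (II.6).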

APPENDIX B

From (III.6-1), we have (with $i$ suppressed):

(A.B-1)  $E(y_0u_1) = E(y_0u_2) = E(y_0u_3) = \cdots = E(y_0u_{T-1}) = E(y_0u_T)$
         $E(u_1u_2) = E(u_1u_3) = \cdots = E(u_1u_{T-1}) = E(u_1u_T)$
         $\vdots$
         $E(u_{T-2}u_{T-1}) = E(u_{T-2}u_T)$

This set of equations can be rewritten as follows:

(A.B-2)  $E[y_0(u_2-u_1)] = E[y_0(u_3-u_2)] = \cdots = E[y_0(u_T-u_{T-1})] = 0$
         $E[u_1(u_3-u_2)] = \cdots = E[u_1(u_T-u_{T-1})] = 0$
         $\vdots$
         $E[u_{T-2}(u_T-u_{T-1})] = 0$

Multiplying the equations in the first row of (A.B-2) by $\delta$ and
adding the equations in the second row gives:

(A.B-3)  $E(y_1(u_3-u_2)) = \cdots = E(y_1(u_T-u_{T-1})) = 0$

The same procedure for the other equations generates all the
moment conditions given in (III.7-1).

Now, we can rewrite (III.6-2) as follows:

(A.B-4)  $E[u_1(u_2-u_1)] = E[u_2(u_3-u_2)] = \cdots = E[u_{T-1}(u_T-u_{T-1})]$

These restrictions can be equally expressed by

(A.B-5)  $E[u_1(u_2-u_1)-u_2(u_3-u_2)] = E[u_2(u_3-u_2)-u_3(u_4-u_3)] = \cdots$
         $\quad = E[u_{T-2}(u_{T-1}-u_{T-2})-u_{T-1}(u_T-u_{T-1})] = 0$

Since $E[y_0(u_2-u_1)] = E[y_1(u_3-u_2)] = 0$ by (A.B-2) and (A.B-3),

$E[u_1(u_2-u_1)-u_2(u_3-u_2)] + \delta E[y_0(u_2-u_1)] - \delta E[y_1(u_3-u_2)]$
$\quad = E[y_1(u_2-u_1)-y_2(u_3-u_2)] = 0$

Similarly, we can show that $E[y_t(u_{t+1}-u_t)-y_{t+1}(u_{t+2}-u_{t+1})] = 0$
for any $1 \le t \le T-2$.

Finally, (III.6-3) implies:

(A.B-6)  $E[(u_2+u_1)(u_2-u_1)] = E[(u_3+u_2)(u_3-u_2)] = \cdots$
         $\quad = E[(u_T+u_{T-1})(u_T-u_{T-1})] = 0$

Since $E(u_2u_3) = E(u_2u_4) = \cdots = E(u_2u_T)$ and $E(u_1u_3) = E(u_1u_4) = \cdots = E(u_1u_T)$ by (III.6-1) and
(III.6-2),

$E[(u_2+u_1)(u_2-u_1)+u_3(u_2-u_1)+u_4(u_2-u_1)+\cdots+u_T(u_2-u_1)] = 0$

Therefore,

$E(\bar{u}(u_2-u_1)) = 0$

where $\bar{u} = (1/T)\sum_{t=1}^{T}u_t$. In essentially the same way, we can show
that all the restrictions in (III.7-3) are justified.

APPENDIX C

The following lemmas and theorems will help to derive the
covariance matrix of $H_i'u_i$ under (SA).

LEMMA 3

Suppose that all the assumptions in (SA) are satisfied. Then (with $i$
suppressed)

$E(y_ty_h(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{h+1}-\varepsilon_h)) = E(y_ty_h)E((\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{h+1}-\varepsilon_h))$

where $t = 0,1,\ldots,T-2$, $h = 1,2,\ldots,T-1$, and $t < s$.

Proof.
If $t < h$, LEMMA 2 directly shows the result. Consider the case
where $h \le t$. Since $s > t \ge h$, $\varepsilon_{s+1}-\varepsilon_s$ is independent of $y_ty_h$, and
$E(\varepsilon_{s+1}-\varepsilon_s) = 0$. If $h+1 = s$, $\varepsilon_{h+1}$ is independent of $y_ty_h$ and $\varepsilon_h$ is
independent of $\varepsilon_{s+1}-\varepsilon_s$. If $h+1 < s$, $\varepsilon_{h+1}-\varepsilon_h$ is independent of
$\varepsilon_{s+1}-\varepsilon_s$. Therefore, $\varepsilon_{h+1}-\varepsilon_h$ can be decomposed in the same way
that LEMMA 1 describes. LEMMA 1 again applies and implies the
result. QED

LEMMA 3 directly implies the following theorem.

THEOREM 2

$E(A_i'u_iu_i'B_{1i}) = \sigma_\varepsilon^2E(A_i'\Gamma_iB_{1i}) = \sigma_\varepsilon^2E(A_i'B_{1i})$.

The following lemma is helpful to investigate the form of
$E(A_i'u_iu_i'B_{2i})$.

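LEMMA 4

Under SA,

$E[y_t\bar\varepsilon(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)] = E(y_t\bar\varepsilon)E[(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]$

where $s > t$, $s,k = 1,2,\ldots,T-1$, and $\bar\varepsilon = (1/T)\sum_{t=1}^{T}\varepsilon_t$.

Proof.
Let $e_{s+1} = \sum_{t=1}^{T}\varepsilon_t-(\varepsilon_{s+1}+\varepsilon_s)$; then $\bar\varepsilon = (\varepsilon_{s+1}+\varepsilon_s)/T+e_{s+1}/T$. Therefore,
we have

(A.C-1)  $E[y_t\bar\varepsilon(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)] = E[y_t(\varepsilon_{s+1}+\varepsilon_s)(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]/T$
         $\quad + E[y_te_{s+1}(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]/T$
         $\quad = a + b$

Note that $y_t$ is independent of $\varepsilon_{s+1}$ and $\varepsilon_s$. Obviously,
$E(y_t) = 0$. If $k > t$, $\varepsilon_{k+1}-\varepsilon_k$ is independent of $y_t$. If $k = t$, $\varepsilon_{k+1}$
is independent of $y_t$, and $\varepsilon_k$ is independent of
$(\varepsilon_{s+1}+\varepsilon_s)(\varepsilon_{s+1}-\varepsilon_s)$. If $k < t$, $\varepsilon_{k+1}-\varepsilon_k$ is independent of
$(\varepsilon_{s+1}+\varepsilon_s)(\varepsilon_{s+1}-\varepsilon_s)$. That is, $\varepsilon_{k+1}-\varepsilon_k$ can be decomposed in the
same way that LEMMA 1 assumes. Therefore, we have

(A.C-2)  $a = E[y_t(\varepsilon_{k+1}-\varepsilon_k)]E[(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{s+1}+\varepsilon_s)]/T$
         $\quad = 0$
         $\quad = E[y_t(\varepsilon_{s+1}+\varepsilon_s)/T]E[(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]$

Furthermore, we can easily see that $\varepsilon_{s+1}-\varepsilon_s$ is independent
of $y_t$ and $e_{s+1}$, and that $E[(\varepsilon_{s+1}-\varepsilon_s)] = 0$. If $k > s+1$ or $k < s-1$,
$\varepsilon_{k+1}-\varepsilon_k$ is independent of $\varepsilon_{s+1}-\varepsilon_s$. If $k = s+1$, $\varepsilon_{k+1}$ is
independent of $(\varepsilon_{s+1}-\varepsilon_s)$, and $\varepsilon_k$ is independent of $y_t$ and $e_{s+1}$.
If $k = s$, $\varepsilon_{k+1}-\varepsilon_k$ is independent of $y_te_{s+1}$. Finally, if $k = s-1$,
$\varepsilon_{k+1}$ is independent of $y_te_{s+1}$, and $\varepsilon_k$ is independent of
$(\varepsilon_{s+1}-\varepsilon_s)$. Then LEMMA 1 applies and we have

(A.C-3)  $b = E(y_te_{s+1}/T)E[(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]$

Substituting (A.C-2) and (A.C-3) into (A.C-1) gives the
result. QED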

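LEMMA 5

$E[y_t\alpha(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)] = E(y_t\alpha)E[(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]$,
for any $s,t,k$.

Proof.
$\alpha$ is independent of $(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)$ and $E(\alpha) = 0$. Note that

(A.C-4)  $y_t = [\delta^ty_0+\alpha(1+\delta+\cdots+\delta^{t-1})] + (\varepsilon_t+\delta\varepsilon_{t-1}+\cdots+\delta^{t-1}\varepsilon_1)$.

The first term, $[\delta^ty_0+\alpha(1+\delta+\cdots+\delta^{t-1})]$, is independent of
$(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)$. The second term, $(\varepsilon_t+\delta\varepsilon_{t-1}+\cdots+\delta^{t-1}\varepsilon_1)$, is
independent of $\alpha$. Therefore, LEMMA 1 applies and implies the
result. QED

The following theorem shows the form of $E(A_i'u_iu_i'B_{2i})$.

THEOREM 3

$E(A_i'u_iu_i'B_{2i}) = \sigma_\varepsilon^2E(A_i'\Gamma_iB_{2i}) = \sigma_\varepsilon^2E(A_i'B_{2i})$.

Proof.
$A_i'u_iu_i'B_{2i}$ has the typical element $y_t\bar{u}(u_{s+1}-u_s)(u_{k+1}-u_k)$,
where $t < s$, which is identical to $y_t(\alpha+\bar\varepsilon)(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)$.
Then, LEMMA 4 and 5 imply

(A.C-5)  $E[y_t\bar{u}(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)] = E[y_t\alpha(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]$
         $\quad + E[y_t\bar\varepsilon(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]$
         $\quad = E(y_t\alpha)E[(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]$
         $\quad + E(y_t\bar\varepsilon)E[(\varepsilon_{s+1}-\varepsilon_s)(\varepsilon_{k+1}-\varepsilon_k)]$
         $\quad = E(y_t\bar{u})E[(u_{s+1}-u_s)(u_{k+1}-u_k)]$.

Using (A.C-5), we can easily show the result. QED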

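The following lemmas help us prove Theorem 4, which
describes the form of $E(B_{1i}'u_iu_i'B_{1i})$.

LEMMA 6

$E[y_t^2(\varepsilon_{t+1}-\varepsilon_t)^2] = E(y_t^2)E[(\varepsilon_{t+1}-\varepsilon_t)^2] + [E(\varepsilon^4)-\sigma_\varepsilon^4]$

Proof.
$E[y_t^2(\varepsilon_{t+1}-\varepsilon_t)^2] = E(y_t^2\varepsilon_{t+1}^2) + E(y_t^2\varepsilon_t^2)$
$\quad = E(y_t^2)E(\varepsilon_{t+1}^2) + E[(\delta y_{t-1}+\alpha+\varepsilon_t)^2\varepsilon_t^2]$
$\quad = E(y_t^2)E(\varepsilon_{t+1}^2) + E[\{(\delta y_{t-1}+\alpha)^2+\varepsilon_t^2+2(\delta y_{t-1}+\alpha)\varepsilon_t\}\varepsilon_t^2]$
$\quad = E(y_t^2)E(\varepsilon_{t+1}^2) + E[(\delta y_{t-1}+\alpha)^2\varepsilon_t^2] + E(\varepsilon_t^4)$
$\quad = E(y_t^2)E(\varepsilon_{t+1}^2) + E[(\delta y_{t-1}+\alpha+\varepsilon_t)^2]E(\varepsilon_t^2) - E(\varepsilon_t^2)E(\varepsilon_t^2) + E(\varepsilon_t^4)$
$\quad = E(y_t^2)E(\varepsilon_{t+1}^2) + E(y_t^2)E(\varepsilon_t^2) + [E(\varepsilon_t^4)-\sigma_\varepsilon^4]$
$\quad = E(y_t^2)E[(\varepsilon_{t+1}-\varepsilon_t)^2] + [E(\varepsilon_t^4)-\sigma_\varepsilon^4]$. QED

LEMMA 7

$E[y_ty_{t+h}(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+h+1}-\varepsilon_{t+h})] = E(y_ty_{t+h})E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+h+1}-\varepsilon_{t+h})] + \sigma_\varepsilon^4$,  $h \ge 1$.

Proof.
(A.C-6)  $E[y_ty_{t+h}(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+h+1}-\varepsilon_{t+h})]$
         $\quad = E[y_ty_{t+h}(\varepsilon_{t+1}-\varepsilon_t)\varepsilon_{t+h+1}] - E[y_ty_{t+h}(\varepsilon_{t+1}-\varepsilon_t)\varepsilon_{t+h}]$
         $\quad = -E[y_ty_{t+h}(\varepsilon_{t+1}-\varepsilon_t)\varepsilon_{t+h}]$
         $\quad = E(y_ty_{t+h}\varepsilon_t\varepsilon_{t+h}) - E(y_ty_{t+h}\varepsilon_{t+1}\varepsilon_{t+h})$

Note that

(A.C-7)  $E(y_ty_{t+h}\varepsilon_t\varepsilon_{t+h}) = E(y_t\varepsilon_t)E(y_{t+h}\varepsilon_{t+h}) = \sigma_\varepsilon^4$

(A.C-8)  $E(y_ty_{t+h}\varepsilon_{t+1}\varepsilon_{t+h})$
         $\quad = E[y_t\{\delta^hy_t+\alpha(1+\delta+\cdots+\delta^{h-1})+(\varepsilon_{t+h}+\cdots+\delta^{h-1}\varepsilon_{t+1})\}\varepsilon_{t+1}\varepsilon_{t+h}]$
         $\quad = E[y_t\{\delta^hy_t+\alpha(1+\delta+\cdots+\delta^{h-1})\}\varepsilon_{t+1}\varepsilon_{t+h}] + E[y_t(\varepsilon_{t+h}+\cdots+\delta^{h-1}\varepsilon_{t+1})\varepsilon_{t+1}\varepsilon_{t+h}]$
         $\quad = E[y_t\{\delta^hy_t+\alpha(1+\delta+\cdots+\delta^{h-1})\}]E(\varepsilon_{t+1}\varepsilon_{t+h})$
         $\quad = E[y_t\{\delta^hy_t+\alpha(1+\delta+\cdots+\delta^{h-1})+(\varepsilon_{t+h}+\cdots+\delta^{h-1}\varepsilon_{t+1})\}]E(\varepsilon_{t+1}\varepsilon_{t+h})$
         $\quad = E(y_ty_{t+h})E(\varepsilon_{t+1}\varepsilon_{t+h})$
         $\quad = -E(y_ty_{t+h})E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+h+1}-\varepsilon_{t+h})]$

Substituting (A.C-7) and (A.C-8) into (A.C-6) gives the
result. QED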

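We can combine LEMMA 6 and 7 into the following lemma.

LEMMA 8

$E[y_ty_h(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{h+1}-\varepsilon_h)] = E(y_ty_h)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{h+1}-\varepsilon_h)] + a_{th}$

where $a_{th} = E(\varepsilon^4)-\sigma_\varepsilon^4$ if $h = t$; $a_{th} = \sigma_\varepsilon^4$ elsewhere.

Now we can evaluate $E(B_{1i}'u_iu_i'B_{1i})$.

THEOREM 4

$E(B_{1i}'u_iu_i'B_{1i}) = \sigma_\varepsilon^2E(B_{1i}'\Gamma_iB_{1i}) + (d+\sigma_\varepsilon^4)J_1$
$\quad = \sigma_\varepsilon^2E(B_{1i}'B_{1i}) + (d+\sigma_\varepsilon^4)J_1$

where

$J_1 = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \end{bmatrix}_{(T-2)\times(T-2)}$,

$d = E(\varepsilon^4)-3\sigma_\varepsilon^4$.

Proof.
A typical element of $E(B_{1i}'u_iu_i'B_{1i})$ is given by

(A.C-9)  $E[\{y_t(u_{t+1}-u_t)-y_{t+1}(u_{t+2}-u_{t+1})\}\{y_h(u_{h+1}-u_h)-y_{h+1}(u_{h+2}-u_{h+1})\}]$
         $\quad = E[\{y_t(\varepsilon_{t+1}-\varepsilon_t)-y_{t+1}(\varepsilon_{t+2}-\varepsilon_{t+1})\}\{y_h(\varepsilon_{h+1}-\varepsilon_h)-y_{h+1}(\varepsilon_{h+2}-\varepsilon_{h+1})\}]$
         $\quad = E[y_ty_h(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{h+1}-\varepsilon_h)] - E[y_ty_{h+1}(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{h+2}-\varepsilon_{h+1})]$
         $\quad\quad - E[y_{t+1}y_h(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{h+1}-\varepsilon_h)] + E[y_{t+1}y_{h+1}(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{h+2}-\varepsilon_{h+1})]$
         $\quad = E(y_ty_h)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{h+1}-\varepsilon_h)] - E(y_ty_{h+1})E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{h+2}-\varepsilon_{h+1})]$
         $\quad\quad - E(y_{t+1}y_h)E[(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{h+1}-\varepsilon_h)] + E(y_{t+1}y_{h+1})E[(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{h+2}-\varepsilon_{h+1})] + d_{th}$
         $\quad = E(y_ty_h)E[(u_{t+1}-u_t)(u_{h+1}-u_h)] - E(y_ty_{h+1})E[(u_{t+1}-u_t)(u_{h+2}-u_{h+1})]$
         $\quad\quad - E(y_{t+1}y_h)E[(u_{t+2}-u_{t+1})(u_{h+1}-u_h)] + E(y_{t+1}y_{h+1})E[(u_{t+2}-u_{t+1})(u_{h+2}-u_{h+1})] + d_{th}$

where $d_{th} = a_{th}-a_{t,h+1}-a_{t+1,h}+a_{t+1,h+1}$. The third equality comes
from LEMMA 8. Then, we can easily show

(A.C-10)  $d_{th} = 2[E(\varepsilon^4)-2\sigma_\varepsilon^4]$, if $h = t$
          $d_{th} = -[E(\varepsilon^4)-2\sigma_\varepsilon^4]$, if $h = t+1$ or $h = t-1$
          $d_{th} = 0$, if $h > t+1$ or $h < t-1$.

(A.C-9) and (A.C-10) imply the result, since $E(\varepsilon^4)-2\sigma_\varepsilon^4 = d+\sigma_\varepsilon^4$. QED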

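In order to derive the covariance between $B_{1i}'u_i$ and $B_{2i}'u_i$,
we need the following lemmas.

LEMMA 9

$E[\varepsilon_h\bar\varepsilon(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)] = E(\varepsilon_h\bar\varepsilon)E[(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)] + g_{hk}$,

where $g_{hk} = [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T$, if $k = h$
      $g_{hk} = -[E(\varepsilon^4)-3\sigma_\varepsilon^4]/T$, if $k = h-1$
      $g_{hk} = 0$, otherwise.

Proof.
If $k > h+1$ or $k < h-1$,

(A.C-11)  $E[\varepsilon_h\bar\varepsilon(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]$
          $\quad = E[\varepsilon_h\{\varepsilon_{h+1}+\varepsilon_h+\varepsilon_{k+1}+\varepsilon_k+(\sum_{j=1}^{T}\varepsilon_j-\varepsilon_{h+1}-\varepsilon_h-\varepsilon_{k+1}-\varepsilon_k)\}(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]/T$
          $\quad = E[\varepsilon_h(\varepsilon_{h+1}+\varepsilon_h+\varepsilon_{k+1}+\varepsilon_k)(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]/T$
          $\quad\quad + E[\varepsilon_h(\sum_{j=1}^{T}\varepsilon_j-\varepsilon_{h+1}-\varepsilon_h-\varepsilon_{k+1}-\varepsilon_k)(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]/T$
          $\quad = E[\varepsilon_h(\varepsilon_{h+1}+\varepsilon_h+\varepsilon_{k+1}+\varepsilon_k)(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]/T$
          $\quad = E[\varepsilon_h(\varepsilon_{h+1}+\varepsilon_h)(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]/T + E[\varepsilon_h(\varepsilon_{k+1}+\varepsilon_k)(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]/T$
          $\quad = 0$
          $\quad = E(\varepsilon_h\bar\varepsilon)E[(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]$.

If $k = h+1$,

(A.C-12)  $E[\varepsilon_h\bar\varepsilon(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]$
          $\quad = E[\varepsilon_h\bar\varepsilon(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{h+2}-\varepsilon_{h+1})]$
          $\quad = E[\varepsilon_h\{\varepsilon_{h+2}+\varepsilon_{h+1}+\varepsilon_h+(\sum_{j=1}^{T}\varepsilon_j-\varepsilon_{h+2}-\varepsilon_{h+1}-\varepsilon_h)\}(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{h+2}-\varepsilon_{h+1})]/T$
          $\quad = E[\varepsilon_h(\varepsilon_{h+2}+\varepsilon_{h+1}+\varepsilon_h)(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{h+2}-\varepsilon_{h+1})]/T$
          $\quad = E[\varepsilon_h\varepsilon_{h+2}(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{h+2}-\varepsilon_{h+1})]/T + E[\varepsilon_h\varepsilon_{h+1}(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{h+2}-\varepsilon_{h+1})]/T$
          $\quad\quad + E[\varepsilon_h^2(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{h+2}-\varepsilon_{h+1})]/T$
          $\quad = -E(\varepsilon_h^2\varepsilon_{h+2}^2)/T + E(\varepsilon_h^2\varepsilon_{h+1}^2)/T - E(\varepsilon_h^2\varepsilon_{h+1}^2)/T$
          $\quad = -\sigma_\varepsilon^4/T$
          $\quad = E(\varepsilon_h\bar\varepsilon)E[(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{h+2}-\varepsilon_{h+1})]$
          $\quad = E(\varepsilon_h\bar\varepsilon)E[(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]$.

If $k = h$,

(A.C-13)  $E[\varepsilon_h\bar\varepsilon(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]$
          $\quad = E[\varepsilon_h\bar\varepsilon(\varepsilon_{h+1}-\varepsilon_h)^2]$
          $\quad = E[\varepsilon_h\{\varepsilon_{h+1}+\varepsilon_h+(\sum_{j=1}^{T}\varepsilon_j-\varepsilon_{h+1}-\varepsilon_h)\}(\varepsilon_{h+1}-\varepsilon_h)^2]/T$
          $\quad = E[\varepsilon_h(\varepsilon_{h+1}+\varepsilon_h)(\varepsilon_{h+1}-\varepsilon_h)^2]/T$
          $\quad = E[\varepsilon_h\varepsilon_{h+1}(\varepsilon_{h+1}-\varepsilon_h)^2]/T + E[\varepsilon_h^2(\varepsilon_{h+1}-\varepsilon_h)^2]/T$
          $\quad = -2E(\varepsilon_h^2\varepsilon_{h+1}^2)/T + E(\varepsilon_h^2\varepsilon_{h+1}^2)/T + E(\varepsilon_h^4)/T$
          $\quad = E(\varepsilon^4)/T - \sigma_\varepsilon^4/T$
          $\quad = E(\varepsilon_h\bar\varepsilon)E[(\varepsilon_{h+1}-\varepsilon_h)^2] + [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T$
          $\quad = E(\varepsilon_h\bar\varepsilon)E[(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)] + [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T$.

If $k = h-1$,

(A.C-14)  $E[\varepsilon_h\bar\varepsilon(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_{k+1}-\varepsilon_k)]$
          $\quad = E[\varepsilon_h\{\varepsilon_{h+1}+\varepsilon_h+\varepsilon_{h-1}+(\sum_{j=1}^{T}\varepsilon_j-\varepsilon_{h+1}-\varepsilon_h-\varepsilon_{h-1})\}(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_h-\varepsilon_{h-1})]/T$
          $\quad = E[\varepsilon_h(\varepsilon_{h+1}+\varepsilon_h+\varepsilon_{h-1})(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_h-\varepsilon_{h-1})]/T$
          $\quad = E[\varepsilon_h\varepsilon_{h+1}(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_h-\varepsilon_{h-1})]/T + E[\varepsilon_h^2(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_h-\varepsilon_{h-1})]/T$
          $\quad\quad + E[\varepsilon_h\varepsilon_{h-1}(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_h-\varepsilon_{h-1})]/T$
          $\quad = E(\varepsilon_h^2\varepsilon_{h+1}^2)/T - E(\varepsilon_h^4)/T + E(\varepsilon_h^2\varepsilon_{h-1}^2)/T$
          $\quad = -[E(\varepsilon^4)-2\sigma_\varepsilon^4]/T$
          $\quad = E(\varepsilon_h\bar\varepsilon)E[(\varepsilon_{h+1}-\varepsilon_h)(\varepsilon_h-\varepsilon_{h-1})] - [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T$

This completes the proof. QED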

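LEMMA 10

$E[y_t\bar\varepsilon(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] = E(y_t\bar\varepsilon)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] + g_{tk}$,

where $g_{tk} = [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T$, if $k = t$
      $g_{tk} = -[E(\varepsilon^4)-3\sigma_\varepsilon^4]/T$, if $k = t-1$
      $g_{tk} = 0$, otherwise.

Proof.
(A.C-15)  $E[y_t\bar\varepsilon(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]$
          $\quad = \delta E[y_{t-1}\bar\varepsilon(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]$
          $\quad\quad + E[(\alpha\bar\varepsilon)(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]$
          $\quad\quad + E[(\varepsilon_t\bar\varepsilon)(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]$

Therefore, applying LEMMA 4, 5, and 9 gives the result. QED

The following theorem describes the form of
$E(B_{1i}'u_iu_i'B_{2i})$.

THEOREM 5

$E(B_{1i}'u_iu_i'B_{2i}) = \sigma_\varepsilon^2E(B_{1i}'B_{2i}) + (1/T)d\cdot J_2$

where

$J_2 = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 & 0 \\ 0 & 0 & 0 & \cdots & -1 & 2 & -1 \end{bmatrix}_{(T-2)\times(T-1)}$,

$d = E(\varepsilon^4)-3\sigma_\varepsilon^4$.

Proof.
A typical element of $E(B_{1i}'u_iu_i'B_{2i})$ is given by:

(A.C-16)  $E[\{y_t(u_{t+1}-u_t)-y_{t+1}(u_{t+2}-u_{t+1})\}\bar{u}(u_{k+1}-u_k)]$
          $\quad = E[\{y_t(\varepsilon_{t+1}-\varepsilon_t)-y_{t+1}(\varepsilon_{t+2}-\varepsilon_{t+1})\}(\alpha+\bar\varepsilon)(\varepsilon_{k+1}-\varepsilon_k)]$
          $\quad = E[y_t(\alpha+\bar\varepsilon)(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] - E[y_{t+1}(\alpha+\bar\varepsilon)(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{k+1}-\varepsilon_k)]$

By LEMMA 5 and 10, we have

(A.C-17)  $E[y_t(\alpha+\bar\varepsilon)(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] = E[y_t(\alpha+\bar\varepsilon)]E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] + g_{tk}$

(A.C-18)  $E[y_{t+1}(\alpha+\bar\varepsilon)(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{k+1}-\varepsilon_k)] = E[y_{t+1}(\alpha+\bar\varepsilon)]E[(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{k+1}-\varepsilon_k)] + g_{t+1,k}$

Substituting (A.C-17) and (A.C-18) into (A.C-16) gives

(A.C-19)  $E[\{y_t(u_{t+1}-u_t)-y_{t+1}(u_{t+2}-u_{t+1})\}\bar{u}(u_{k+1}-u_k)]$
          $\quad = E[y_t(\alpha+\bar\varepsilon)]E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] - E[y_{t+1}(\alpha+\bar\varepsilon)]E[(\varepsilon_{t+2}-\varepsilon_{t+1})(\varepsilon_{k+1}-\varepsilon_k)] + b_{tk}$
          $\quad = E(y_t\bar{u})E[(u_{t+1}-u_t)(u_{k+1}-u_k)] - E(y_{t+1}\bar{u})E[(u_{t+2}-u_{t+1})(u_{k+1}-u_k)] + b_{tk}$

where $b_{tk} = g_{tk}-g_{t+1,k}$. We can easily show that

(A.C-20)  $b_{tk} = (2/T)d$, if $k = t$
          $b_{tk} = -(1/T)d$, if $k = t+1$ or $k = t-1$
          $b_{tk} = 0$, otherwise.

(A.C-19) and (A.C-20) imply the result. QED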

The following lemmas help us derive the covariance matrix of $B_{2i}'u_i$.

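LEMMA 11
$$
E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)^2] = E(\bar\varepsilon^2)E[(\varepsilon_{t+1}-\varepsilon_t)^2] + 2[E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2.
$$
Proof.
$$
\begin{aligned}
E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)^2]
&= E[\bar\varepsilon^2(\varepsilon_{t+1}^2-2\varepsilon_{t+1}\varepsilon_t+\varepsilon_t^2)] \\
&= E(\bar\varepsilon^2\varepsilon_{t+1}^2) - 2E(\bar\varepsilon^2\varepsilon_{t+1}\varepsilon_t) + E(\bar\varepsilon^2\varepsilon_t^2) \\
&= [E(\varepsilon^4)+(T-1)\sigma_\varepsilon^4-4\sigma_\varepsilon^4+E(\varepsilon^4)+(T-1)\sigma_\varepsilon^4]/T^2 \\
&= 2\sigma_\varepsilon^4/T + 2[E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2 \\
&= E(\bar\varepsilon^2)E[(\varepsilon_{t+1}-\varepsilon_t)^2] + 2[E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2. \quad\text{QED}
\end{aligned}
$$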

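LEMMA 12
$$
E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+2}-\varepsilon_{t+1})] = E(\bar\varepsilon^2)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+2}-\varepsilon_{t+1})] - [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2.
$$
Proof.
$$
\begin{aligned}
E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+2}-\varepsilon_{t+1})]
&= E(\bar\varepsilon^2\varepsilon_{t+1}\varepsilon_{t+2}) + E(\bar\varepsilon^2\varepsilon_t\varepsilon_{t+1}) - E(\bar\varepsilon^2\varepsilon_{t+1}^2) - E(\bar\varepsilon^2\varepsilon_t\varepsilon_{t+2}) \\
&= [2\sigma_\varepsilon^4+2\sigma_\varepsilon^4-E(\varepsilon^4)-(T-1)\sigma_\varepsilon^4-2\sigma_\varepsilon^4]/T^2 \\
&= -\sigma_\varepsilon^4/T - [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2 \\
&= E(\bar\varepsilon^2)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+2}-\varepsilon_{t+1})] - [E(\varepsilon^4)-3\sigma_\varepsilon^4]/T^2. \quad\text{QED}
\end{aligned}
$$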

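LEMMA 13
$$
E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+j+1}-\varepsilon_{t+j})] = E(\bar\varepsilon^2)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+j+1}-\varepsilon_{t+j})], \quad j\ge 2.
$$
Proof.
$$
\begin{aligned}
E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+j+1}-\varepsilon_{t+j})]
&= E(\bar\varepsilon^2\varepsilon_{t+1}\varepsilon_{t+j+1}) + E(\bar\varepsilon^2\varepsilon_t\varepsilon_{t+j}) - E(\bar\varepsilon^2\varepsilon_{t+1}\varepsilon_{t+j}) - E(\bar\varepsilon^2\varepsilon_t\varepsilon_{t+j+1}) \\
&= [2\sigma_\varepsilon^4+2\sigma_\varepsilon^4-2\sigma_\varepsilon^4-2\sigma_\varepsilon^4]/T^2 \\
&= 0 \\
&= E(\bar\varepsilon^2)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{t+j+1}-\varepsilon_{t+j})]. \quad\text{QED}
\end{aligned}
$$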
Now we can derive the covariance matrix of $B_{2i}'u_i$.

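THEOREM 6
$$
E(B_{2i}'u_iu_i'B_{2i}) = \sigma_\varepsilon^2E(B_{2i}'B_{2i}) + (1/T^2)\,d\,J_3,
$$
where
$$
J_3 = \begin{bmatrix}
2 & -1 & 0 & & \\
-1 & 2 & -1 & & \\
0 & -1 & 2 & & \\
0 & 0 & -1 & \ddots & \\
\vdots & \vdots & \vdots & 2 & -1 \\
0 & 0 & 0 & -1 & 2
\end{bmatrix}_{(T-1)\times(T-1)}.
$$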
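Proof.
A typical element of $E(B_{2i}'u_iu_i'B_{2i})$ takes the form of
(A.C-21)
$$
\begin{aligned}
E[\bar u^2(u_{t+1}-u_t)(u_{k+1}-u_k)]
&= E[(\alpha+\bar\varepsilon)^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] \\
&= E[\alpha^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]
 + 2E[\alpha\bar\varepsilon(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] \\
&\quad + E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] \\
&= E(\alpha^2)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)]
 + E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)].
\end{aligned}
$$
By Lemmas 11, 12, and 13, we have
(A.C-22)
$$
E[\bar\varepsilon^2(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] = E(\bar\varepsilon^2)E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] + d_{tk},
$$
where
$$
d_{tk} = \begin{cases}
(2/T^2)d, & \text{if } k=t \\
-(1/T^2)d, & \text{if } k=t+1 \text{ or } k=t-1 \\
0, & \text{otherwise.}
\end{cases}
$$
Substituting (A.C-22) in (A.C-21) gives
(A.C-23)
$$
\begin{aligned}
E[\bar u^2(u_{t+1}-u_t)(u_{k+1}-u_k)]
&= E[(\alpha+\bar\varepsilon)^2]E[(\varepsilon_{t+1}-\varepsilon_t)(\varepsilon_{k+1}-\varepsilon_k)] + d_{tk} \\
&= E(\bar u^2)E[(u_{t+1}-u_t)(u_{k+1}-u_k)] + d_{tk}.
\end{aligned}
$$

(A.C-23) implies the result. QED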
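The correction terms delivered by Lemmas 11 to 13 can be collected into a matrix and compared with the tridiagonal pattern of J3. The following is a minimal sketch under stated assumptions: i.i.d. zero-mean errors on a small finite support, 0-based time indices, and hypothetical helper names (`mean_square_gap_matrix`, `tridiagonal`) that do not appear in the original text.

```python
from fractions import Fraction
from itertools import product


def mean_square_gap_matrix(T, support, probs):
    """gap[t][k] = E[ebar^2 * de_t * de_k] - E[ebar^2] * E[de_t * de_k],
    where de_t = e_{t+1} - e_t, computed exactly by enumeration."""
    n = T - 1
    joint = [[Fraction(0)] * n for _ in range(n)]
    cross = [[Fraction(0)] * n for _ in range(n)]
    m2 = Fraction(0)
    for draw in product(range(len(support)), repeat=T):
        p = Fraction(1)
        for i in draw:
            p *= probs[i]                       # probability of this outcome
        e = [support[i] for i in draw]
        ebar_sq = Fraction(sum(e), T) ** 2      # squared time average
        de = [e[t + 1] - e[t] for t in range(n)]
        m2 += p * ebar_sq
        for t in range(n):
            for k in range(n):
                joint[t][k] += p * ebar_sq * de[t] * de[k]
                cross[t][k] += p * de[t] * de[k]
    return [[joint[t][k] - m2 * cross[t][k] for k in range(n)] for t in range(n)]


def tridiagonal(n):
    """n x n matrix with 2 on the diagonal and -1 on the adjacent diagonals."""
    return [[2 if t == k else -1 if abs(t - k) == 1 else 0 for k in range(n)]
            for t in range(n)]


# Assumed example: errors uniform on {-1, 0, 1}, T = 4 periods.
support = [-1, 0, 1]
probs = [Fraction(1, 3)] * 3
T = 4
var = sum(p * x**2 for x, p in zip(support, probs))
d = sum(p * x**4 for x, p in zip(support, probs)) - 3 * var**2
gap = mean_square_gap_matrix(T, support, probs)
J3 = tridiagonal(T - 1)
```

In this example every entry of `gap` equals the corresponding entry of `J3` multiplied by d/T^2, as Lemmas 11 to 13 imply.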

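Since $H=[A\ B_1\ B_2]$ and $H_i=[A_i\ B_{1i}\ B_{2i}]$, Theorems 1-6 can be used to derive the covariance matrix of $H'u$.

(A.C-24)
$$
\begin{aligned}
(1/N)\,\mathrm{Cov}(H'u) &= (1/N)E(H'uu'H) \\
&= (1/N)\sum_{i=1}^{N}E(H_i'u_iu_i'H_i) \\
&= \sigma_\varepsilon^2(1/N)\sum_{i=1}^{N}E(H_i'H_i) + \sigma_\varepsilon^2J^* \\
&= \sigma_\varepsilon^2(1/N)E(H'H) + \sigma_\varepsilon^2J^*
\end{aligned}
$$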

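where
$$
J^* = \begin{bmatrix}
0 & 0 & 0 \\
0 & (d/\sigma_\varepsilon^2+\sigma_\varepsilon^2)J_1 & (1/T)(d/\sigma_\varepsilon^2)J_2 \\
0 & (1/T)(d/\sigma_\varepsilon^2)J_2' & (1/T^2)(d/\sigma_\varepsilon^2)J_3
\end{bmatrix}.
$$

If $\varepsilon$ is normally distributed, $d=0$. Therefore, under normality of $\varepsilon$,
$$
J^* = \begin{bmatrix} 0 & 0 & 0 \\ 0 & J & 0 \\ 0 & 0 & 0 \end{bmatrix},
$$
where $J=\sigma_\varepsilon^2J_1$.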


APPENDIX D

Here, I show the explicit forms of $\psi_1$ and $\psi_2$ given in (IV.16) and (IV.17), respectively. The $\varepsilon$'s are assumed to be normally distributed. Then observe

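(A.D-1)
$$
\begin{aligned}
\mathrm{plim}(1/N)H'H &= \mathrm{plim}(1/N)\sum_{i=1}^{N}H_i'H_i \\
&= \mathrm{plim}(1/N)\sum_{i=1}^{N}E(H_i'H_i) \\
&= \mathrm{plim}(1/N)\sum_{i=1}^{N}E\begin{bmatrix} 2y_{i0}^2 & 2y_{i0}\bar u_i \\ 2y_{i0}\bar u_i & 2\bar u_i^2 \end{bmatrix} \\
&= 2\begin{bmatrix} \sigma_0^2 & \sigma_{0\alpha} \\ \sigma_{0\alpha} & \sigma_\alpha^2+\sigma_\varepsilon^2/2 \end{bmatrix}
\end{aligned}
$$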

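(A.D-2)
$$
\mathrm{plim}(1/N)H'y_{-1} = \mathrm{plim}(1/N)\sum_{i=1}^{N}E\begin{bmatrix} y_{i0}(y_{i1}-y_{i0}) \\ \bar u_i(y_{i1}-y_{i0}) \end{bmatrix}
= \begin{bmatrix} (\delta-1)\sigma_0^2+\sigma_{0\alpha} \\ (\delta-1)\sigma_{0\alpha}+\sigma_\alpha^2+\sigma_\varepsilon^2/2 \end{bmatrix}
$$

(A.D-3)
$$
\mathrm{plim}(1/N)D'y_{-1} = \mathrm{plim}(1/N)\sum_{i=1}^{N}E\begin{bmatrix} 0 \\ (y_{i0}+y_{i1})(\varepsilon_{i2}-\varepsilon_{i1}) \end{bmatrix}
= \begin{bmatrix} 0 \\ -\sigma_\varepsilon^2/2 \end{bmatrix}
$$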

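Using (A.D-1), (A.D-2), and (A.D-3), we have
(A.D-4)
$$
\mathrm{plim}(1/N)y_{-1}'D(H'H)^{-1}H'y_{-1} = -\sigma_\varepsilon^2/4
$$
(A.D-5)
$$
\mathrm{plim}(1/N)y_{-1}'D(H'H)^{-1}D'y_{-1} = (1/8)\sigma_\varepsilon^4/\{\sigma_\alpha^2(1-\rho_{0\alpha}^2)+\sigma_\varepsilon^2/2\}
$$
where $\rho_{0\alpha}=\sigma_{0\alpha}/(\sigma_0\sigma_\alpha)$. Substituting (A.D-4) and (A.D-5) into (IV.16) gives
(A.D-6)
$$
\psi_1 = \mathrm{plim}(1/N)y_{-1}'P_Hy_{-1} + (1/8)\sigma_\varepsilon^4/\{\sigma_\alpha^2(1-\rho_{0\alpha}^2)+\sigma_\varepsilon^2/2\} - (1/2)\sigma_\varepsilon^2
$$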
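The step from the probability limits (A.D-1) to (A.D-3) to the closed forms (A.D-4) and (A.D-5) is a 2x2 matrix calculation, which can be reproduced numerically. The sketch below is an illustration only: the function name `appendix_d_limits`, its argument names, and the parameter values are assumptions chosen for the example, and the limits are taken as stated in (A.D-1) to (A.D-3).

```python
from fractions import Fraction


def appendix_d_limits(delta, s00, s0a, saa, see):
    """Evaluate the quadratic forms in (A.D-4) and (A.D-5).

    s00 = var(y_i0), s0a = cov(y_i0, alpha_i), saa = var(alpha_i),
    see = var(eps). Inputs are converted to Fraction for exact arithmetic."""
    delta, s00, s0a, saa, see = (Fraction(v) for v in (delta, s00, s0a, saa, see))
    s = saa + see / 2
    # plim (1/N) H'H, as stated in (A.D-1)
    hh = [[2 * s00, 2 * s0a], [2 * s0a, 2 * s]]
    det = hh[0][0] * hh[1][1] - hh[0][1] * hh[1][0]
    hh_inv = [[hh[1][1] / det, -hh[0][1] / det],
              [-hh[1][0] / det, hh[0][0] / det]]
    hy = [(delta - 1) * s00 + s0a, (delta - 1) * s0a + s]   # (A.D-2)
    dy = [Fraction(0), -see / 2]                            # (A.D-3)

    def quad(a, b):
        # a' (H'H)^{-1} b for 2-vectors a and b
        return sum(a[i] * hh_inv[i][j] * b[j] for i in range(2) for j in range(2))

    return quad(dy, hy), quad(dy, dy)


# Assumed parameter values, for illustration only.
delta, s00, s0a, saa, see = Fraction(1, 2), 2, Fraction(1, 2), 1, 1
ad4, ad5 = appendix_d_limits(delta, s00, s0a, saa, see)
rho_sq = Fraction(s0a) ** 2 / (Fraction(s00) * Fraction(saa))
closed_form_ad5 = Fraction(see) ** 2 / 8 / (saa * (1 - rho_sq) + Fraction(see, 2))
```

For these values `ad4` equals minus one quarter of the error variance and `ad5` matches the closed form in (A.D-5); the lag coefficient drops out of both expressions.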
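Also, using (A.D-1), (A.D-2), and (A.D-3), we can show
(A.D-7)
$$
\begin{aligned}
\mathrm{plim}(1/N)y_{-1}'M_AB_2 &= \mathrm{plim}(1/N)y_{-1}'B_2 - \mathrm{plim}(1/N)y_{-1}'A(A'A)^{-1}A'B_2 \\
&= (\sigma_\alpha^2\sigma_0^2-\sigma_{0\alpha}^2)/\sigma_0^2 + \sigma_\varepsilon^2/2 \\
&= \sigma_\alpha^2(1-\rho_{0\alpha}^2) + \sigma_\varepsilon^2/2
\end{aligned}
$$
(A.D-8)
$$
\begin{aligned}
\mathrm{plim}(1/N)B_2'M_AB_2 &= \mathrm{plim}(1/N)B_2'B_2 - \mathrm{plim}(1/N)B_2'A(A'A)^{-1}A'B_2 \\
&= 2\{(\sigma_\alpha^2\sigma_0^2-\sigma_{0\alpha}^2)/\sigma_0^2+\sigma_\varepsilon^2/2\} \\
&= 2[\sigma_\alpha^2(1-\rho_{0\alpha}^2)+\sigma_\varepsilon^2/2]
\end{aligned}
$$
Therefore,
(A.D-9)
$$
\begin{aligned}
\mathrm{plim}(1/N)y_{-1}'M_AB_2(B_2'M_AB_2)^{-1}B_2'M_Ay_{-1}
&= (1/2)\{(\sigma_\alpha^2\sigma_0^2-\sigma_{0\alpha}^2)/\sigma_0^2+\sigma_\varepsilon^2/2\} \\
&= (1/2)[\sigma_\alpha^2(1-\rho_{0\alpha}^2)+\sigma_\varepsilon^2/2]
\end{aligned}
$$
Since
$$
\mathrm{plim}(1/N)y_{-1}'P_Ay_{-1} = \mathrm{plim}(1/N)y_{-1}'P_Hy_{-1} - \mathrm{plim}(1/N)y_{-1}'M_AB_2(B_2'M_AB_2)^{-1}B_2'M_Ay_{-1},
$$
(A.D-10)
$$
\psi_2 = \mathrm{plim}(1/N)y_{-1}'P_Ay_{-1} = \mathrm{plim}(1/N)y_{-1}'P_Hy_{-1} - (1/2)[\sigma_\alpha^2(1-\rho_{0\alpha}^2)+\sigma_\varepsilon^2/2]
$$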


APPENDIX E

Here, we show the efficiency gain of $\delta_S$ defined in (V.16) over $\delta_F$ in (V.13). Consider the inverse of the asymptotic covariance matrix of $\delta_F$: $W'F(F'\Gamma F)^{-1}F'W$. For simplicity, let $E=[W^{**}\ Q_VX]$.

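(A.E-1)
$$
F'\Gamma F = \begin{bmatrix} E' \\ R'P_V \end{bmatrix}(Q_V+\phi^2P_V)(E\ \ P_VR)
= \begin{bmatrix} E'E & 0 \\ 0 & \phi^2R'P_VR \end{bmatrix}
$$
(A.E-2)
$$
F'W = \begin{bmatrix} E' \\ R'P_V \end{bmatrix}(y_{-1}\ \ X\ \ Z)
= \begin{bmatrix} E'y_{-1} & E'X & E'Z \\ R'P_Vy_{-1} & R'P_VX & R'P_VZ \end{bmatrix}
$$

Using (A.E-1) and (A.E-2), some straightforward algebra shows
(A.E-3)
$$
W'F(F'\Gamma F)^{-1}F'W =
\begin{bmatrix}
y_{-1}'P_Ey_{-1} & y_{-1}'P_EX & 0 \\
X'P_Ey_{-1} & X'P_EX & 0 \\
0 & 0 & 0
\end{bmatrix}
+ \phi^{-2}
\begin{bmatrix}
y_{-1}'P_{P_VR}y_{-1} & y_{-1}'P_{P_VR}X & y_{-1}'P_{P_VR}Z \\
X'P_{P_VR}y_{-1} & X'P_{P_VR}X & X'P_{P_VR}Z \\
Z'P_{P_VR}y_{-1} & Z'P_{P_VR}X & Z'P_{P_VR}Z
\end{bmatrix}.
$$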

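Now, consider the inverse of the asymptotic covariance matrix of $\delta_S$: $W'S(S'\Gamma S)^{-1}S'W$.

(A.E-4)
$$
S'\Gamma S = \begin{bmatrix} A' \\ E' \\ R'P_V \end{bmatrix}(Q_V+\phi^2P_V)(A\ \ E\ \ P_VR)
= \begin{bmatrix} A'A & A'E & 0 \\ E'A & E'E & 0 \\ 0 & 0 & \phi^2R'P_VR \end{bmatrix}
$$
(A.E-5)
$$
S'W = \begin{bmatrix} A' \\ E' \\ R'P_V \end{bmatrix}(y_{-1}\ \ X\ \ Z)
= \begin{bmatrix} A'y_{-1} & A'X & 0 \\ E'y_{-1} & E'X & 0 \\ R'P_Vy_{-1} & R'P_VX & R'P_VZ \end{bmatrix}
$$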
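Then, after some matrix operations, we have

(A.E-6)
$$
W'S(S'\Gamma S)^{-1}S'W =
\begin{bmatrix}
\begin{bmatrix} y_{-1}'A & y_{-1}'E \\ X'A & X'E \end{bmatrix}
\begin{bmatrix} A'A & A'E \\ E'A & E'E \end{bmatrix}^{-1}
\begin{bmatrix} A'y_{-1} & A'X \\ E'y_{-1} & E'X \end{bmatrix} & 0 \\
0 & 0
\end{bmatrix}
+ \phi^{-2}
\begin{bmatrix}
y_{-1}'P_{P_VR}y_{-1} & y_{-1}'P_{P_VR}X & y_{-1}'P_{P_VR}Z \\
X'P_{P_VR}y_{-1} & X'P_{P_VR}X & X'P_{P_VR}Z \\
Z'P_{P_VR}y_{-1} & Z'P_{P_VR}X & Z'P_{P_VR}Z
\end{bmatrix}.
$$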
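Note that
(A.E-7)
$$
\begin{bmatrix} A'A & A'E \\ E'A & E'E \end{bmatrix}^{-1}
= \begin{bmatrix} I \\ -(E'E)^{-1}E'A \end{bmatrix}(A'M_EA)^{-1}\big(I,\ -A'E(E'E)^{-1}\big)
+ \begin{bmatrix} 0 & 0 \\ 0 & (E'E)^{-1} \end{bmatrix}
$$
(A.E-8)
$$
\big(I,\ -A'E(E'E)^{-1}\big)\begin{bmatrix} A'y_{-1} & A'X \\ E'y_{-1} & E'X \end{bmatrix}
= (A'M_Ey_{-1},\ A'M_EX) = (A'M_Ey_{-1},\ 0)
$$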
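The partitioned-inverse identity in (A.E-7) can be illustrated in the simplest case where A and E are each a single column, so that every block is a scalar. This is a sketch under that simplifying assumption; the helper names (`dot`, `partitioned_inverse`, `direct_inverse`) and the example vectors are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product with exact rational arithmetic."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def partitioned_inverse(a, e):
    """Right-hand side of (A.E-7) when A and E are single columns a and e."""
    aa, ae, ee = dot(a, a), dot(a, e), dot(e, e)
    a_me_a = aa - ae * ae / ee          # a'M_E a, with M_E = I - e(e'e)^{-1}e'
    left = [Fraction(1), -ae / ee]      # stacked (1, -(e'e)^{-1} e'a)'
    rhs = [[left[i] * left[j] / a_me_a for j in range(2)] for i in range(2)]
    rhs[1][1] += 1 / ee                 # add the block diag(0, (e'e)^{-1})
    return rhs


def direct_inverse(a, e):
    """Inverse of [[a'a, a'e], [e'a, e'e]] from the 2x2 adjugate formula."""
    aa, ae, ee = dot(a, a), dot(a, e), dot(e, e)
    det = aa * ee - ae * ae
    return [[ee / det, -ae / det], [-ae / det, aa / det]]


# Assumed example columns (linearly independent).
a = [1, 2, 0]
e = [1, 0, 1]
lhs = direct_inverse(a, e)
rhs = partitioned_inverse(a, e)
```

Both routes give the same 2x2 matrix, which is the scalar-block version of the decomposition used to separate the term in E alone from the extra term involving $A'M_EA$.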
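Therefore,
(A.E-9)
$$
\begin{aligned}
&\begin{bmatrix} y_{-1}'A & y_{-1}'E \\ X'A & X'E \end{bmatrix}
\begin{bmatrix} A'A & A'E \\ E'A & E'E \end{bmatrix}^{-1}
\begin{bmatrix} A'y_{-1} & A'X \\ E'y_{-1} & E'X \end{bmatrix} \\
&\quad= \begin{bmatrix} y_{-1}'A & y_{-1}'E \\ X'A & X'E \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ 0 & (E'E)^{-1} \end{bmatrix}
\begin{bmatrix} A'y_{-1} & A'X \\ E'y_{-1} & E'X \end{bmatrix}
+ \begin{bmatrix} y_{-1}'M_EA \\ 0 \end{bmatrix}(A'M_EA)^{-1}(A'M_Ey_{-1},\ 0) \\
&\quad= \begin{bmatrix} y_{-1}'P_Ey_{-1} & y_{-1}'P_EX \\ X'P_Ey_{-1} & X'P_EX \end{bmatrix}
+ \begin{bmatrix} K_{11} & 0 \\ 0 & 0 \end{bmatrix}
\end{aligned}
$$
where $K_{11}=y_{-1}'M_EA(A'M_EA)^{-1}A'M_Ey_{-1}$. Substituting (A.E-9) into (A.E-6) gives
(A.E-10)
$$
W'S(S'\Gamma S)^{-1}S'W = W'F(F'\Gamma F)^{-1}F'W + K_1
$$
where
$$
K_1 = \begin{bmatrix} K_{11} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
$$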

APPENDIX F

If the $\varepsilon$'s are normally distributed, the inverse of the asymptotic covariance matrix of $\delta_L$ is given by:

(A.F-1)
$$
\begin{aligned}
&W'(L+L_D)\Phi^{-1}(L+L_D)'W \\
&\quad= \begin{bmatrix}
y_{-1}'(H+D) & y_{-1}'E & y_{-1}'P_VR \\
X'(H+D) & X'E & X'P_VR \\
Z'D & 0 & Z'P_VR
\end{bmatrix}
\begin{bmatrix}
H'H+J^* & H'E & 0 \\
E'H & E'E & 0 \\
0 & 0 & \phi^2R'P_VR
\end{bmatrix}^{-1}
\begin{bmatrix}
(H+D)'y_{-1} & (H+D)'X & D'Z \\
E'y_{-1} & E'X & 0 \\
R'P_Vy_{-1} & R'P_VX & R'P_VZ
\end{bmatrix} \\
&\quad= \begin{bmatrix}
y_{-1}'(H+D) & y_{-1}'E \\
X'(H+D) & X'E \\
Z'D & 0
\end{bmatrix}
\begin{bmatrix} H'H+J^* & H'E \\ E'H & E'E \end{bmatrix}^{-1}
\begin{bmatrix}
(H+D)'y_{-1} & (H+D)'X & D'Z \\
E'y_{-1} & E'X & 0
\end{bmatrix} \\
&\qquad + \phi^{-2}
\begin{bmatrix}
y_{-1}'P_{P_VR}y_{-1} & y_{-1}'P_{P_VR}X & y_{-1}'P_{P_VR}Z \\
X'P_{P_VR}y_{-1} & X'P_{P_VR}X & X'P_{P_VR}Z \\
Z'P_{P_VR}y_{-1} & Z'P_{P_VR}X & Z'P_{P_VR}Z
\end{bmatrix}.
\end{aligned}
$$
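Note that
(A.F-2)
$$
\begin{bmatrix} H'H+J^* & H'E \\ E'H & E'E \end{bmatrix}^{-1}
= \begin{bmatrix} I \\ -(E'E)^{-1}E'H \end{bmatrix}(H'M_EH+J^*)^{-1}\big(I,\ -H'E(E'E)^{-1}\big)
+ \begin{bmatrix} 0 & 0 \\ 0 & (E'E)^{-1} \end{bmatrix}
$$
(A.F-3)
$$
\big(I,\ -H'E(E'E)^{-1}\big)\begin{bmatrix} (H+D)'y_{-1} & (H+D)'X \\ E'y_{-1} & E'X \end{bmatrix}
= (H'M_Ey_{-1}+D'y_{-1},\ D'X)
$$
(A.F-4)
$$
\big(I,\ -H'E(E'E)^{-1}\big)\begin{bmatrix} D'Z \\ 0 \end{bmatrix} = D'Z
$$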

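Substituting (A.F-2), (A.F-3), and (A.F-4) into (A.F-1) gives
(A.F-5)
$$
\begin{aligned}
&W'(L+L_D)\Phi^{-1}(L+L_D)'W \\
&\quad= W'F(F'\Gamma F)^{-1}F'W
+ \begin{bmatrix} y_{-1}'M_EH+y_{-1}'D \\ X'D \\ Z'D \end{bmatrix}
(H'M_EH+J^*)^{-1}(H'M_Ey_{-1}+D'y_{-1},\ D'X,\ D'Z) \\
&\quad= W'F(F'\Gamma F)^{-1}F'W \\
&\qquad+ \begin{bmatrix}
y_{-1}'M_EA & y_{-1}'M_E(B^*+D^*) \\
0 & X'D^* \\
0 & Z'D^*
\end{bmatrix}
\begin{bmatrix}
A'M_EA & A'M_EB^* \\
B^{*\prime}M_EA & B^{*\prime}M_EB^*+\sigma_\varepsilon^2J^{**}
\end{bmatrix}^{-1}
\begin{bmatrix}
A'M_Ey_{-1} & 0 & 0 \\
(B^*+D^*)'M_Ey_{-1} & D^{*\prime}X & D^{*\prime}Z
\end{bmatrix}
\end{aligned}
$$
where $B^*=(B_1\ B_2)$; $D^*=(0\ D_2)$;
$$
J^{**} = \sigma_\varepsilon^{-2}\begin{bmatrix} J & 0 \\ 0 & 0 \end{bmatrix}.
$$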
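Note that
(A.F-6)
$$
\begin{aligned}
&\begin{bmatrix}
A'M_EA & A'M_EB^* \\
B^{*\prime}M_EA & B^{*\prime}M_EB^*+\sigma_\varepsilon^2J^{**}
\end{bmatrix}^{-1} \\
&\quad= \begin{bmatrix} -(A'M_EA)^{-1}A'M_EB^* \\ I \end{bmatrix}
\big(B^{*\prime}M_EB^*+\sigma_\varepsilon^2J^{**}-B^{*\prime}M_EA(A'M_EA)^{-1}A'M_EB^*\big)^{-1}
\big(-B^{*\prime}M_EA(A'M_EA)^{-1},\ I\big)
+ \begin{bmatrix} (A'M_EA)^{-1} & 0 \\ 0 & 0 \end{bmatrix} \\
&\quad= K_3 + \begin{bmatrix} (A'M_EA)^{-1} & 0 \\ 0 & 0 \end{bmatrix}
\end{aligned}
$$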

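Substitute (A.F-6) into (A.F-5); then, we have

(A.F-7)
$$
\begin{aligned}
W'(L+L_D)\Phi^{-1}(L+L_D)'W &= W'F(F'\Gamma F)^{-1}F'W + K_1 + K_2 \\
&= W'S(S'\Gamma S)^{-1}S'W + K_2
\end{aligned}
$$
where
$$
K_2 = \begin{bmatrix}
y_{-1}'M_EA & y_{-1}'M_E(B^*+D^*) \\
0 & X'D^* \\
0 & Z'D^*
\end{bmatrix}
K_3
\begin{bmatrix}
A'M_Ey_{-1} & 0 & 0 \\
(B^*+D^*)'M_Ey_{-1} & D^{*\prime}X & D^{*\prime}Z
\end{bmatrix},
$$
which is a positive semidefinite matrix. This shows that $\delta_L$ is more efficient than $\delta_S$.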

REFERENCES


Amemiya, T. (1974), "The Nonlinear Two-stage Least-squares
Estimator," Journal of Econometrics, Vol. 2, pp 105-110.

Amemiya, T., and MaCurdy, T. (1986), "Instrumental-variable
Estimation of An Error-components Model," Econometrica,
Vol. 54, pp 869-880.

Anderson, T., and Hsiao, C. (1981), "Estimation of Dynamic
Models with Error Components," Journal of the American
Statistical Association, Vol. 76, pp 598-606.

Arellano, M., and Bond, S. (1988), "Some Tests of
Specification for Panel Data: Monte Carlo Evidence and
Application to Employment Equation," Working Paper no.
88/4, Institute for Fiscal Studies, London.

Balestra, P., and Nerlove, M. (1966), "Pooling Cross Section
and Time Series Data in the Estimation of a Dynamic
Model: The Demand for Natural Gas," Econometrica, Vol. 34,
pp 585-612.

Breusch, T., Mizon, G., and Schmidt, P. (1989), "Efficient
Estimation Using Panel Data," Econometrica, Vol. 57, pp
695-701.

Chamberlain, G. (1987), "Asymptotic Efficiency in Estimation
with Conditional Moment Restrictions," Journal of
Econometrics, Vol. 34, pp 305-334.

Hausman, J. (1978), "Specification Tests in Econometrics,"
Econometrica, Vol. 46, pp 1251-1271.

Hausman, J., and Taylor, W. (1981), "Panel Data and
Unobservable Individual Effects," Econometrica, Vol. 49,
pp 1377-1399.

Holtz-Eakin, D. (1988), "Testing for Individual Effects in
Autoregressive Models," Journal of Econometrics, Vol. 39,
pp 297-307.

Hsiao, C. (1982), "Formulation and Estimation of Dynamic
Models Using Panel Data," Journal of Econometrics, Annals
of Applied Econometrics, Vol. 18, pp 47-82.

Hsiao, C. (1986), Analysis of Panel Data, New York: Cambridge
University Press.

Kiefer, N. (1980), "Estimation of Fixed Effect Models for Time
Series of Cross-sections with Arbitrary Intertemporal
Covariances," Journal of Econometrics, Vol. 14, pp
195-202.

Newey, W. (1985), "Generalized Method of Moments Specification
Testing," Journal of Econometrics, Vol. 29, pp 229-256.
