This is to certify that the dissertation entitled

Three Essays on Generalized Method of Moments

presented by Artem B. Prokhorov has been accepted towards fulfillment of the requirements for the Ph.D. degree in Economics.

Major Professor's Signature
Date

THREE ESSAYS ON GENERALIZED METHOD OF MOMENTS

By

Artem B. Prokhorov

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

2006

ABSTRACT

THREE ESSAYS ON GENERALIZED METHOD OF MOMENTS

By Artem B. Prokhorov

Generalized Method of Moments (GMM) is a powerful estimation method based on orthogonality conditions known to hold in the population of interest. GMM is sufficiently general to incorporate most of the extremum and minimum distance estimators in econometrics, including (Q)MLE, M-estimators, and weighted and nonlinear LS. By taking advantage of GMM's universality, my thesis seeks to contribute to three areas of (micro)econometric research: modelling processes with missing observations (e.g., attrition and self-selection in panel data sets, counterfactual outcomes for treatment and control groups), modelling likelihood using copulas (e.g., PROBIT, LOGIT, selectivity models), and modelling covariance structures (e.g., LISREL, fixed effects, factor analysis).
The first essay, "GMM Redundancy Results for General Missing Data Problem," considers alternative GMM estimators of a parameter vector that enters into one set of moment equations along with another vector that also enters into an additional set of moment conditions and may be known. Alternative estimators are ranked in terms of relative efficiency, and conditions for no efficiency gains are derived. The results are applied to a general missing data problem. Conditions for the counterintuitive result of the missing data literature that estimating selection probabilities is better than knowing them arise naturally in the general problem. Efficiency gains from using both weighted and unweighted moment equations under exogenous sampling are considered.

The second essay, "Robustness, Redundancy, and Validity of Copulas in Likelihood Models," considers likelihood-based estimation of multivariate models in which only marginal distributions are correctly specified. The unknown joint distribution is modelled with a copula function, which may be misspecified. In a GMM framework, we study robustness and efficiency of the resulting estimators, propose improvements to existing estimators, and discuss tests of copula validity. It is shown that radially symmetric copulas are robust against misspecification in problems about sample means if the true joint density is also radially symmetric. Efficiency results suggest that knowledge of the true copula is redundant if and only if the covariance matrix for the relevant moment conditions is singular. A simple simulation supports the theoretical result about robustness of the Frank, Farlie-Gumbel-Morgenstern and Ali-Mikhail-Haq copula families.

The third essay, "Modelling Covariance Structures: First and Second Order Asymptotics," considers estimation of covariance structure models by quasi maximum likelihood (QMLE), generalized method of moments (GMM) and empirical likelihood (EL).
A general condition is derived under which the GMM (and EL) estimators do not dominate the normal QMLE in terms of first-order efficiency. The condition is formulated in terms of the fourth order moments of the true distribution. The second-order asymptotic bias of the QMLE is derived, and a formal proof is presented of the intuitive result that, under normality, this bias is the same as that of EL.

To the memory of my father

ACKNOWLEDGEMENTS

This dissertation raises more questions than it answers. But the questions it raises and answers would not have been answered (perhaps not even raised) if it had not been for the interaction with my Doktorvater, University Distinguished Professor Peter Schmidt. Professor Schmidt's wisdom and creativity, his ability to put things in perspective and prioritize, his talent for succinct statements and lively examples, his patience, approachability and close interaction with his students, his generous support of their endeavors and his understanding of their concerns — all this makes him a great mentor and this dissertation worth reading. I have found immensely enriching both my TA-ing for Professor Schmidt and attending the fabulous TA appreciation dinners Professors Peter Schmidt and Christine Amsler sponsor each semester for their TAs.

It is impossible to overestimate the support I have received from the other members of my dissertation committee. Professor Jeffrey Wooldridge has been prompt and thorough in reading drafted parts of the dissertation as they appeared and providing feedback. He also generously supported my travel to Australia to present the results of the first essay. Professor Richard Baillie gave me valuable insights into the world of financial econometrics during my RA-ship for him. Professor Hira Koul has helped add statistical rigor to the dissertation and to my graduate training in econometrics.
MSU graduate travel grants enabled me to attend the following meetings, where parts of this dissertation were presented: the 2006 AEA/ASSA meetings in Boston, the 33rd Annual Australian Conference of Economists in Sydney (October 2004), the 5th Villa Mondragone Workshop on Economic Theory and Econometrics in Rome (July 2005) and the 2004 Empirical Research Summer School on Experimental Economics and Econometrics in Mannheim. The final stages of the research were supported by a Dissertation Completion Fellowship from the Graduate School of MSU.

Emma Iglesias of MSU, Ivana Komunjer of UCSD and Rustam Ibragimov of Harvard provided helpful discussions of some of the results. So did the participants of the above-mentioned conferences and of the econometrics seminars at Michigan State, Bates White LLC, Concordia, Florida State, New South Wales, Massey, Emory University and Central Michigan University. Finally and most importantly, Irina Agafonova is the person who made this all worthwhile. I am very grateful to these people.

Table of Contents

List of Tables
List of Figures

1 GMM Redundancy Results for General Missing Data Problem
  1.1 Introduction
  1.2 Efficiency and redundancy results for the general estimation problem
    1.2.1 Preliminaries
    1.2.2 The general estimation problem
    1.2.3 Efficiency and redundancy results
  1.3 Application to missing data problem
    1.3.1 The population problem
    1.3.2 Motivation and definitions
    1.3.3 Relative efficiency results under ignorable selection
    1.3.4 Relative efficiency results under exogenous selection
  1.4 Concluding remarks
  Bibliography
  Appendix
2 Robustness, Redundancy, and Validity of Copulas in Likelihood Models
  2.1 Introduction
  2.2 Preliminaries
  2.3 The GMM representation
  2.4 Robustness of copula terms
    2.4.1 A theoretical result
    2.4.2 An illustrative simulation
  2.5 Redundancy of copula terms
    2.5.1 Redundancy with correct copula
    2.5.2 Redundancy with misspecified copula
    2.5.3 Examples
  2.6 Validity of copula terms
    2.6.1 Theoretical results
  2.7 Concluding remarks
  Bibliography
  Appendix A
  Appendix B
  Appendix C

3 Modelling Covariance Structures: First and Second Order Asymptotics
  3.1 Introduction
  3.2 Preliminaries
    3.2.1 Setup and assumptions
    3.2.2 An example
    3.2.3 Estimators
      3.2.3.1 Normal (Q)MLE
      3.2.3.2 GMM
      3.2.3.3 EL
  3.3 First order analysis
    3.3.1 The first order conditions
    3.3.2 Relative efficiency to the first order
  3.4 Second order analysis
    3.4.1 Stochastic expansions to the second order
    3.4.2 Second order bias of QMLE
    3.4.3 Comparison to GMM and EL
  3.5 Concluding remarks
  Bibliography
  Appendix

List of Tables

2.1 The true values of Kendall's τ and ρ used in the simulation
2.2 Relative robustness measures for selected copulas, their standard errors, and the estimated Pearson correlation coefficient for three sample sizes

List of Figures

2.1 δ̄(ρ) for no-parameter copulas: (a) Independence copula; (b) Logistic copula.
2.2 δ̄(μ, ρ) and δ̄_ρ(μ, ρ) for one-parameter copulas: (1) Farlie-Gumbel-Morgenstern.
2.3 δ̄(μ, ρ) and δ̄_ρ(μ, ρ) for one-parameter copulas: (2) Joe.
2.4 δ̄(μ, ρ) and δ̄_ρ(μ, ρ) for one-parameter copulas: (3) Ali-Mikhail-Haq.
2.5 δ̄(μ, ρ) and δ̄_ρ(μ, ρ) for one-parameter copulas: (4) Clayton.
2.6 δ̄(μ, ρ) and δ̄_ρ(μ, ρ) for one-parameter copulas: (5) Gumbel.
2.7 δ̄(μ, ρ) and δ̄_ρ(μ, ρ) for one-parameter copulas: (6) Normal.
2.8 δ̄(μ, ρ) and δ̄_ρ(μ, ρ) for one-parameter copulas: (7) Frank.

Essay 1

GMM Redundancy Results for General Missing Data Problem

1.1 Introduction

There are many models that can be formulated as two sets of moment conditions with two parameter vectors, one of which enters only one of these sets and the other enters both. For example, Newey (1984) shows that multi-step estimators that employ estimates of an additional parameter vector in estimation of the primary parameter vector of interest can be represented in such a generalized method of moments (GMM) framework with exact identification of the parameters. Generated regressors models of Pagan (1984), latent variable models of Zellner (1970) and Goldberger (1972) and many others are two-step cases of this formulation. However, the primary focus of and the motivation for this essay are the missing data (or selectivity) models. Selectivity models deal with samples in which some observations are omitted (we call such samples "selected").
The missing data problem arises when using selected samples in an estimation procedure results in a biased estimator. For example, if we were to conduct a survey of young mothers to study the effect of a mother's smoking on the weight of the newborn, the survey would typically have missing data due to non-response. It is likely that non-response is associated with heavy smoking and poor birth weight. If the missing data were ignored, the effect of smoking would be underestimated. In such cases it is common to construct a probabilistic model for the missing data generating process (we call this model a "selection model") and then to appropriately adjust the primary model of interest for the effect of selection into the sample.

This paper is motivated by a puzzle in the selectivity literature. Consider the setting of a GMM problem in which we have a set of moment conditions, with some parameters θ₁ (the "parameters of interest"), and these moment conditions hold in the unselected sample. However, we also have a selection mechanism such that the moment conditions do not hold in the selected sample. Under certain assumptions given below (typically referred to as "ignorability" or "selection on observables"), weighting the original moment conditions by the inverse of the probability of selection yields a modified set of moment conditions that do hold in the selected sample. We will follow Wooldridge (2002b, 2005) in calling the estimator based on these weighted moment conditions the "inverse probability weighting" (IPW) estimator.

Unless the probability of selection is known for each selected observation, implementation of the IPW estimator will require a model that permits the estimation of the probability of selection. Let θ₂ be the parameters (the "selection parameters") in the moment conditions derived from this model. Typically these moment conditions will be based on the score function from the likelihood function for the selection process.
A two-step IPW procedure can be considered, in which the first step is the estimation of θ₂ from the selection model, and the second step is the estimation of θ₁ by GMM on the weighted moment conditions, where the weighting is done using the estimated probabilities of selection. In this setting, the puzzle is that it is better to estimate the selection probabilities than to use the true selection probabilities, even if the latter are known. In other words, in terms of the augmented model described above, we get a better estimator of θ₁ when we use the estimated θ₂ in the second step than if we used the true θ₂. This phenomenon has been discussed by Wooldridge (1999, 2001, 2002b, 2005), and it has also been noted in a number of previous works, including Rosenbaum (1987); Imbens (1992); Robins and Rotnitzky (1995); Crepon et al. (1997), and Hirano et al. (2003). This is puzzling because knowledge of θ₂, if properly exploited, cannot be harmful.

To resolve this puzzle, we follow Newey and McFadden (1994) in setting up an augmented set of moment conditions, where the first subset are the weighted original moment conditions, which now contain both θ₁ and θ₂, and the second subset are the moment conditions from the selection model, which contain only θ₂. We show that the second set of moment conditions is useful (non-redundant), even when θ₂ is known. This is true because the second set of moment conditions is correlated with the first set in the selected sample (even though it is not in the full sample). So the inefficiency of the estimator based on known θ₂ and the first set of moment conditions only is due to its failure to exploit the information in the second set of moment conditions; whereas, when θ₂ is not known, there is no choice but to include the second set of moment conditions.
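The two-step IPW procedure just described can be sketched numerically. The sketch below is only an illustration, not the estimator of the essay: it estimates a population mean E[Y] under ignorable selection, fitting a logit selection model by Newton-Raphson in the first step and solving the inverse-probability-weighted moment condition in the second. The variable names and the data-generating process are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20_000

# Population model: E[Y] = 1; selection depends on Z, which is correlated
# with Y, so the observed-sample mean is biased (selection on observables).
z = rng.standard_normal(N)
y = 1.0 + z + rng.standard_normal(N)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + z)))        # P(S=1|z), a logit model
s = (rng.uniform(size=N) < p_true).astype(float)

# Step 1: estimate the selection parameters theta2 = (a, b) by logit MLE,
# i.e. Newton-Raphson on the score of the Bernoulli likelihood.
X = np.column_stack([np.ones(N), z])
theta2 = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ theta2))
    score = X.T @ (s - p)
    hess = -(X * (p * (1 - p))[:, None]).T @ X
    theta2 -= np.linalg.solve(hess, score)
p_hat = 1.0 / (1.0 + np.exp(-X @ theta2))

# Step 2: solve the weighted moment condition E[(S / p(Z)) (Y - mu)] = 0
# using only the selected observations.
mu_ipw = np.sum(s * y / p_hat) / np.sum(s / p_hat)
mu_naive = y[s == 1].mean()       # ignores selection: biased upward here

print(mu_naive, mu_ipw)
```

With this design the naive selected-sample mean overshoots E[Y] = 1 by roughly E[Z | S = 1] > 0, while the IPW estimate recovers the population mean.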
This raises the question of whether, when θ₂ is known, we can improve on the two-step estimator (which uses estimated θ₂ in the second step) by using a GMM estimator based on both sets of moment conditions, but where only θ₁ is estimated. After all, this GMM estimator cannot be worse than the two-step estimator of θ₁. The answer to this question is a bit complicated. In the case that the original GMM problem (the one that contains the parameter of interest) is overidentified, the two-step estimator is dominated by a one-step estimator that estimates θ₁ and θ₂ jointly in the augmented GMM model. However, we show that, in the augmented GMM model, knowledge of θ₂ is redundant (does not improve the precision of estimation of θ₁). So, while it can never hurt to know more, if that knowledge is used properly, in this case it does not help either.

The result just quoted is given in Section 1.3 of the paper. In Section 1.2, we set the stage by giving a number of results on efficiency and redundancy of estimation in a general GMM setting, when one set of moment conditions depends on θ₁ and θ₂, while a second set of moment conditions depends only on θ₂. Some of these results are original and interesting in their own right. We consider "m-redundancy", which is redundancy of moment conditions in the sense of Breusch et al. (1999), and we also consider "p-redundancy", which is redundancy of the knowledge of some of the parameters for estimation of the other parameters. One of our results gives an interesting connection between these two concepts: the first set of moment conditions with θ₁ known is m-redundant for estimation of θ₂ if and only if knowledge of θ₂ is p-redundant for estimation of θ₁. This is in fact the key result in establishing our subsequent results for the selectivity model.
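The puzzle itself is easy to reproduce in a small Monte Carlo experiment. The design below is invented for illustration (it is not from the dissertation): for a simple mean with logit selection, the IPW estimator that uses estimated selection probabilities has smaller sampling variance than the one that plugs in the true probabilities.

```python
import numpy as np

rng = np.random.default_rng(2)

def one_rep(n=2000):
    # Design: E[Y] = 1; selection probability P(S=1|z) is a logit in z.
    z = rng.standard_normal(n)
    y = 1.0 + z + rng.standard_normal(n)
    p_true = 1.0 / (1.0 + np.exp(-(0.5 + z)))
    s = (rng.uniform(size=n) < p_true).astype(float)

    # First step: logit MLE for the selection probabilities (Newton-Raphson).
    X = np.column_stack([np.ones(n), z])
    b = np.zeros(2)
    for _ in range(25):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (s - p))
    p_hat = 1.0 / (1.0 + np.exp(-X @ b))

    # IPW estimates of E[Y]: estimated vs. true selection probabilities.
    mu_est = np.sum(s * y / p_hat) / np.sum(s / p_hat)
    mu_known = np.sum(s * y / p_true) / np.sum(s / p_true)
    return mu_est, mu_known

draws = np.array([one_rep() for _ in range(400)])
var_est, var_known = draws.var(axis=0)
print(var_est, var_known)   # estimating the probabilities gives the smaller variance
```

Both estimators are centered at E[Y] = 1, but the one using estimated probabilities is noticeably less variable, which is exactly the phenomenon the augmented-moment analysis below explains.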
In Section 1.3 we also consider the selectivity model under a stronger "exogeneity of selection" assumption, under which both the unweighted moment conditions and the weighted moment conditions hold in the selected population. Wooldridge (2001) has shown that in this circumstance it is better to use the unweighted moment conditions than the weighted moment conditions. However, this does not rule out the possibility that it would be better to use both. We show that in this circumstance the weighted moment conditions are m-redundant for estimation of θ₁, so that using both sets is no better than using just the unweighted moment conditions. Thus when we do not have to weight for reasons of consistency, we also do not have to weight for reasons of efficiency.

GMM is sufficiently general to accommodate most of the extremum and minimum distance estimators in econometrics (see, e.g., Newey and McFadden, 1994, p. 2118). The arguments we present can be applied, for example, to (Q)MLE, M-estimation, WLS, and NLS. They also extend to the asymptotic equivalents of GMM such as empirical likelihood and exponential tilting estimators. Our results relate to the treatment effect estimation literature (e.g., Rosenbaum and Rubin, 1983; Hirano et al., 2003; Heckman et al., 1998; Hahn, 1998), to the stratified-sampling literature (e.g., Manski and Lerman, 1977; Manski and McFadden, 1981; Cosslett, 1981a,b; Imbens, 1992; Tripathi, 2003) and to other similarly-structured problems (e.g., Hellerstein and Imbens, 1999; Nevo, 2002, 2003; Imbens, 1992; Crepon et al., 1997).

1.2 Efficiency and redundancy results for the general estimation problem

1.2.1 Preliminaries

Consider a family of distributions {P_θ, θ ∈ Θ = Θ₁ × Θ₂ ⊂ ℝ^{p₁} × ℝ^{p₂}, Θ compact}, a random vector W* ∈ 𝒲* ⊂ ℝ^{dim(W*)} from P_θ₀, θ₀ ∈ Θ, and a real valued, measurable function h : 𝒲* × Θ → ℝ^m such that

  E_θ₀[h(W*, θ)] = 0, if and only if θ = θ₀.   (1.1)

The expectation is with respect to the distribution of W* indexed by θ₀.
In the sequel we suppress the subscript. Let ‖·‖ denote the Euclidean norm, let N(θ, δ) ⊂ Θ denote an open (p₁+p₂)-ball of radius δ with center at θ, let ∇_θ h(·, θ) denote the m × (p₁+p₂) Jacobian of h(·, θ) with respect to θ, and let "w.p.1" stand for "with probability one".

Assumption 1.2.1 Assume that the moment function in (1.1) satisfies the following conditions: (ii) h(W*, θ) is continuous at each θ ∈ Θ, w.p.1; (iii) h(W*, θ) is (once) continuously differentiable on N(θ₀, δ), for some δ > 0, w.p.1; (iv) E{sup_{θ∈Θ} ‖h(W*, θ)‖²} < ∞; (v) E{sup_{θ∈N(θ₀,δ)} ‖∇_θ h(W*, θ)‖} < ∞, for some δ > 0; (vi) E[∇_θ h(W*, θ)|_{θ=θ₀}] is of full column rank.

For simplicity, we assume here that W*_i, i = 1, …, N, are i.i.d. draws from P_θ₀. The generalized method of moments (GMM) estimator of θ₀ is the solution to the following minimization problem

  min_{θ∈Θ} h̄(θ)′ W h̄(θ),   (1.2)

where

  h̄(θ) = (1/N) Σ_{i=1}^{N} h(W*_i, θ)

is the sample analogue of the population moment condition, which is zero at θ₀, and W is a positive semi-definite weighting matrix (see, e.g., Hansen, 1982). In the GMM framework, the choice of the weighting matrix may depend on θ₀. In such cases, a preliminary consistent estimate of θ₀ is used to construct an estimate of W used in the above definition of the GMM estimator. We will comment on this point again later.

Theorem 1.2.1 (see, e.g., Newey and McFadden, 1994, Theorems 2.6 and 3.4) Under Assumption 1.2.1, the GMM estimator of θ₀ is consistent and asymptotically normal (CAN).

Proofs: See the Appendix for proofs of all theorems and corollaries. □

1.2.2 The general estimation problem

Let θ = (θ₁′, θ₂′)′ and

  h(W*; θ) = ( h₁(W*; θ₁, θ₂)′, h₂(W*; θ₂)′ )′,

where θ₁ ∈ Θ₁, θ₂ ∈ Θ₂, and h₁(·) and h₂(·) are m₁- and m₂-vectors of known functions (m = m₁ + m₂). Then if we suppress W* we can write (1.1) as

  (A) E[h₁(θ₀₁, θ₀₂)] = 0,
  (B) E[h₂(θ₀₂)] = 0.   (1.3)

We consider the general case of overidentification, i.e., m₁ ≥ p₁ and m₂ ≥ p₂.
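The minimization in (1.2), with an identity weighting matrix in a first step and an estimated optimal weight in a second step, can be sketched as follows. This is a generic illustration under an assumed data-generating process (an overidentified problem: two parameters, the mean and variance of a normal sample, with three moment conditions); it is not an example from the dissertation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
w = rng.normal(loc=2.0, scale=1.5, size=5_000)   # i.i.d. draws W_i

def h(theta, w):
    """Moment functions h(W_i, theta), one row per observation.

    For theta = (mu, sigma2) and normal data: E[W - mu] = 0,
    E[(W - mu)^2 - sigma2] = 0, and E[(W - mu)^3] = 0 (symmetry).
    Three conditions, two parameters: overidentified."""
    mu, sigma2 = theta
    e = w - mu
    return np.column_stack([e, e**2 - sigma2, e**3])

def gmm_objective(theta, w, W_mat):
    hbar = h(theta, w).mean(axis=0)              # sample moment h-bar(theta)
    return hbar @ W_mat @ hbar

# Step 1: identity weighting gives a preliminary consistent estimate.
step1 = minimize(gmm_objective, x0=[0.0, 1.0], args=(w, np.eye(3)),
                 method="Nelder-Mead")

# Step 2: re-estimate with the estimated optimal weight, C-hat inverse,
# where C-hat is the sample covariance of the moments at the step-1 estimate.
C_hat = np.cov(h(step1.x, w), rowvar=False)
step2 = minimize(gmm_objective, x0=step1.x, args=(w, np.linalg.inv(C_hat)),
                 method="Nelder-Mead")
mu_hat, sigma2_hat = step2.x
print(mu_hat, sigma2_hat)
```

The second step is exactly the "preliminary consistent estimate of θ₀ used to construct an estimate of W" described above.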
The optimal weighting matrix for GMM will be the inverse of the following covariance matrix or its components:

  C = V[h(θ₀)] = [ C₁₁  C₁₂ ]
                 [ C₂₁  C₂₂ ] ,   (1.4)

where the variance is with respect to P_θ₀ as before. Note that C is a function of θ₀ and is generally unknown. In defining alternative GMM estimators and deriving their asymptotic variance matrices, we will behave as if we knew θ₀ and thus knew C. In practice, if we wish to use C in the weighting matrix of the GMM estimator, we would typically first obtain an estimate of C based on a preliminary consistent estimate of θ₀. Such a preliminary estimate of θ₀ can be the GMM estimator that uses the identity matrix for weighting. We assume that C is finite and nonsingular so its inverse exists. Let

  C⁻¹ = [ C¹¹  C¹² ]
        [ C²¹  C²² ] .

Define the (m₁+m₂) × (p₁+p₂) matrix of expected derivatives

  D = E[∂h(θ)/∂θ′]|_{θ=θ₀} = [ D₁₁  D₁₂ ]
                              [ 0    D₂₂ ] .   (1.5)

We assume that D₁₁ and D₂₂ are of full column rank so that h₂ alone identifies θ₂ and h₁ identifies θ₁ given θ₂. Similar to C, D depends on θ₀. In deriving the GMM asymptotic variance matrices, we will treat D as known. Consistent estimates of D (and C) can be obtained using consistent estimates of θ₀ in practice.

We now define four different GMM estimators that differ in which moment conditions are used and/or whether θ₂ is treated as known. For each of these estimators we treat C as known. We will comment on this point once again in the next subsection.

Definition 1.2.1 Call the estimator of θ that minimizes (1.2) with the optimal weighting matrix W = C⁻¹ the ONE-STEP estimator.

This is the usual GMM estimator that uses both moment conditions (1.3A) and (1.3B) jointly to estimate θ₀₁ and θ₀₂.
Definition 1.2.2 Call the estimator of θ obtained in the following two-step procedure the TWO-STEP estimator: (i) the estimator θ̂₂ is obtained by minimizing (1.2), where h(θ) = h₂(θ₂) and W = C₂₂⁻¹; (ii) the estimator of θ₁ is obtained by minimizing (1.2), where h(θ) = h₁(θ₁, θ̂₂), W = C₁₁⁻¹, and θ₂ = θ̂₂ is treated as known.

This estimator uses the orthogonality condition (1.3B) first to obtain a consistent estimator of the unknown parameter subvector θ₀₂ and then uses the moment condition (1.3A) to obtain the estimator of θ₀₁. Estimators considered in Wooldridge (2003), Newey (1984), Newey and McFadden (1994, pp. 2176-2184) and many others are TWO-STEP estimators with m₁ = p₁, m₂ = p₂.

Definition 1.2.3 Call the estimator of θ₁ obtained by minimizing (1.2), where h(θ) = h₁(θ₁, θ₂), W = C₁₁⁻¹, and θ₂ is treated as known, the KNOW-θ₂ estimator.

Here, equation (B) in (1.3) is ignored. However, the results of Section 1.3 of the paper all derive from understanding that (B) is potentially informative even though θ₀₂ is known, because it imposes additional restrictions on the population.

Definition 1.2.4 Call the estimator of θ₁ obtained by minimizing (1.2), where h(·) contains both h₁(·) and h₂(·), W = C⁻¹, and θ₂ is treated as known, the KNOW-θ₂-JOINT estimator.

This is the augmented GMM estimator of θ₀₁ of the form considered in Qian and Schmidt (1999). Here, the information in (1.3B) is kept even though θ₀₂ is assumed known.

Under Assumption 1.2.1 all four estimators are CAN.

Theorem 1.2.2 Let V_ONE-STEP, V_TWO-STEP, V_KNOW-θ₂, and V_KNOW-θ₂-JOINT denote the asymptotic variances of the ONE-STEP, TWO-STEP, KNOW-θ₂, and KNOW-θ₂-JOINT estimators, respectively. Then,

  V_ONE-STEP = (D′C⁻¹D)⁻¹,   (1.6)
  V_TWO-STEP = BCB′,   (1.7)
  V_KNOW-θ₂ = (D₁₁′C₁₁⁻¹D₁₁)⁻¹,   (1.8)
  V_KNOW-θ₂-JOINT = (D̄₁′C⁻¹D̄₁)⁻¹,   (1.9)

where B is defined in equation (1.31) of the Appendix and D̄₁ = (D₁₁′, 0)′ denotes the first block column of D.
In the above expressions, we use the standard notation that "the asymptotic variance of θ̂ is V" means "√N(θ̂ − θ₀) converges in distribution to N(0, V)".

1.2.3 Efficiency and redundancy results

We can now state several asymptotic relative efficiency results (noting that a known parameter is always more efficient than its estimator).

Theorem 1.2.3 For the estimators defined in Definitions 1.2.1-1.2.4 with asymptotic variances given in (1.6)-(1.9), respectively, the following statements hold:

1. KNOW-θ₂-JOINT is no less asymptotically efficient than KNOW-θ₂.

2. KNOW-θ₂-JOINT is no less asymptotically efficient than ONE-STEP.

3. ONE-STEP is no less asymptotically efficient than TWO-STEP.

4. If C₁₂ = 0 then KNOW-θ₂-JOINT and KNOW-θ₂ are equally asymptotically efficient [M-redundancy].

5. If D₁₂ = 0 then TWO-STEP and KNOW-θ₂ are equally asymptotically efficient for θ₁.

6. If C₁₂ = 0 and D₁₂ = 0 then ONE-STEP, TWO-STEP, KNOW-θ₂-JOINT and KNOW-θ₂ are all equally asymptotically efficient for θ₁; ONE-STEP and TWO-STEP are equally asymptotically efficient for θ₂, too [M/P-redundancy].

7. If m₁ = p₁ then the ONE-STEP and TWO-STEP estimates of θ₂ are equal.

8. If m₁ = p₁ and m₂ = p₂ then the ONE-STEP and TWO-STEP estimates are equal (for both θ₁ and θ₂).

9. If m₁ = p₁ and C₁₂ = 0 then the ONE-STEP and TWO-STEP estimates are equally efficient (for both θ₁ and θ₂).

10. If D₁₂ = C₁₂C₂₂⁻¹D₂₂ then KNOW-θ₂-JOINT and ONE-STEP are equally asymptotically efficient for θ₁ [P-redundancy].

11. If D₁₂ = C₁₂C₂₂⁻¹D₂₂ then ONE-STEP, TWO-STEP and KNOW-θ₂-JOINT are no less asymptotically efficient for θ₁ than KNOW-θ₂.

As noted above, we have defined our estimators as depending on known C. In practice, C is replaced by an initial consistent estimate. This has no effect on the asymptotic variance of the estimates and so it does not affect our efficiency comparisons.
For Statements 7 and 8, which do not involve asymptotic arguments, we would need to require that the same initial consistent estimate is used.

Statements 1-3 state the obvious fact that KNOW-θ₂-JOINT dominates KNOW-θ₂, ONE-STEP and TWO-STEP. The known value of θ₀₂ is at least as efficient as any estimate of θ₀₂, and the KNOW-θ₂-JOINT estimate of θ₀₁ is the efficient GMM estimate of θ₀₁ based on the full set of available moment conditions.

Statement 4 is essentially the result of Qian and Schmidt (1999). With θ₀₂ known, the second set of moment conditions contains no unknown parameters, and Qian and Schmidt show that using these conditions in addition to the first set of moment conditions improves efficiency except in the special case that C₁₂ = 0. We call this type of redundancy the knowledge-of-moment redundancy (m-redundancy). Also, if we combine Statements 2, 3 and 4, we have the corollary that if C₁₂ = 0, KNOW-θ₂ is at least as efficient as ONE-STEP and TWO-STEP.

Statement 5 is essentially the result of Newey and McFadden (1994, pp. 2179-2180) for the condition under which first stage estimation of a nuisance parameter (θ₀₂) does not affect the asymptotic variance of the second stage estimate of the parameter of interest (θ₀₁). See also Wooldridge (2002a, pp. 353-356). However, our version treats the overidentified case as well.

Statement 6 combines the conditions of Statements 4 and 5. Therefore the equal efficiency of TWO-STEP, KNOW-θ₂ and KNOW-θ₂-JOINT follows from those statements. The fact that ONE-STEP is also equally efficient is an additional result. This statement provides conditions for redundancy of both the knowledge of θ₀₂ and of the extra moment conditions in (B) for estimating θ₀₁ (m/p-redundancy). One case when the conditions hold is when θ₀₂ does not enter (A) and the two moment conditions are uncorrelated. This statement can also be viewed as a special case of Theorem 7 of Breusch et al.
(1999) that deals with partial redundancy of moment conditions.

Statement 7 is the GMM separability result of Ahn and Schmidt (1995), which says that the GMM estimate of θ₂ is unaffected if an equal number of parameters and moment conditions is added, because the additional conditions only determine θ₁ in terms of θ₂. Further, it can be shown (see the Appendix of Ahn and Schmidt, 1995) that if D₁₁ is nonsingular (which is true since D₁₁ is of full column rank) the ONE-STEP estimator of θ₀₁ is expressed in terms of the ONE-STEP estimator of θ₀₂ using the equation h̄₁(θ₁, θ₂) = C₁₂C₂₂⁻¹h̄₂(θ₂). Thus, ONE-STEP for θ₀₁ is derived from the same equation as TWO-STEP for θ₀₁ as long as h̄₂(θ₂) = 0 (which holds under exact identification of θ₂) or C₁₂ is zero asymptotically. The former condition implies equivalence of the estimators (Statement 8); the latter implies their equal efficiency asymptotically (Statement 9).

Statements 10 and 11 are novel and interesting. They discuss implications of the condition that D₁₂ = C₁₂C₂₂⁻¹D₂₂. This is the condition for redundancy of h₁, given h₂, for estimation of θ₀₂ when θ₀₁ is known (see Breusch et al., 1999, p. 94), which is an m-redundancy result. Under this condition, Statement 10 says that KNOW-θ₂-JOINT and ONE-STEP are equally efficient. This means that knowledge of θ₀₂ does not help efficiency of estimation of θ₀₁ (from the set of all moment conditions) under this condition, which is a p-redundancy result. This link between m-redundancy and p-redundancy (the first set of moment conditions with θ₀₁ known is m-redundant for estimation of θ₀₂ if and only if knowledge of θ₀₂ is p-redundant for estimation of θ₀₁) is quite interesting and (so far as we know) original. Under the same condition, Statement 11 says that KNOW-θ₂ is dominated by the other three estimators. This is because knowledge of θ₀₂ is not useful, and the KNOW-θ₂ estimator fails to use the second set of moment conditions, which is useful unless C₁₂ = 0.
Note, however, that although the TWO-STEP estimator dominates the KNOW-θ₂ estimator under this condition, the TWO-STEP estimator is still not as efficient as the ONE-STEP or KNOW-θ₂-JOINT estimators unless m₁ = p₁ (the first equation is exactly identified for θ₁, given θ₂). This condition is also important because it implies that conservative inference can be made using the asymptotic standard errors obtained from exactly identified estimations that neglect the first step (Statement 11).

The condition of Statements 10 and 11 will often hold when h₂(θ₂) is the score of a log-likelihood function that depends on θ₂ but not θ₁. In this case the estimate of θ₀₂ based on h₂ will be efficient, and another moment condition based on h₁(θ₁, θ₂) with θ₀₁ known should be m-redundant. More precisely, the generalized information equality (GIME) implies that the expectation of the derivative of h₁ (with respect to θ₂), evaluated at θ₀, equals minus its covariance with the score, so that D₁₂ = −C₁₂, and the usual information equality implies that D₂₂ = −C₂₂, so that D₁₂ = C₁₂C₂₂⁻¹D₂₂ holds. Indeed this is exactly what occurs in the selectivity model of the next section.

Example 1.2.1 A sufficient condition for Statements 6, 10, and 11 to hold is that h₁(θ₁, θ₂) = ∇_θ₁ ln f(w*|θ₁, θ₂) and h₂(θ₂) = ∇_θ₂ ln f(w*|θ₁, θ₂), where f(w*|θ₁, θ₂) is the density of W*. Then, the asymptotic variance matrix of the estimator of θ₀₂ can be equivalently written as C₂₂⁻¹ and as C²². This implies that the information matrix for θ₁ and θ₂ is block diagonal, i.e. D₁₂ = −C₁₂ = 0. Thus by Theorem 1.2.3 we can claim more than Statements 10 and 11 in this case: it does not make any difference for the efficiency of the estimate of θ₁ whether θ₂ is estimated or known, and in fact all four estimators are equally efficient (Statement 6). □

We now apply these results to the missing data problem.
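Before moving to the application, the variance formulas (1.6), (1.8) and (1.9) and Statements 1, 2, 4 and 10 of Theorem 1.2.3 can be checked by direct linear algebra on randomly generated C and D matrices. This is an illustrative check, not part of the dissertation; the TWO-STEP variance is omitted because the matrix B of equation (1.31) is defined in the appendix, which is not excerpted here.

```python
import numpy as np

rng = np.random.default_rng(3)
p1, p2, m1, m2 = 2, 2, 3, 2          # overidentified first set: m1 > p1

def variances(C, D11, D12, D22):
    """Theta1 asymptotic variances from (1.6), (1.8) and (1.9)."""
    k1 = D11.shape[0]
    D = np.block([[D11, D12], [np.zeros((D22.shape[0], D11.shape[1])), D22]])
    Ci = np.linalg.inv(C)
    V_one = np.linalg.inv(D.T @ Ci @ D)[:D11.shape[1], :D11.shape[1]]   # (1.6)
    V_know = np.linalg.inv(D11.T @ np.linalg.inv(C[:k1, :k1]) @ D11)    # (1.8)
    Dbar1 = np.vstack([D11, np.zeros((D22.shape[0], D11.shape[1]))])
    V_joint = np.linalg.inv(Dbar1.T @ Ci @ Dbar1)                       # (1.9)
    return V_one, V_know, V_joint

def psd(A, tol=1e-8):
    return np.all(np.linalg.eigvalsh((A + A.T) / 2) >= -tol)

# Generic case: KNOW-theta2-JOINT weakly dominates (Statements 1 and 2).
R = rng.standard_normal((m1 + m2, m1 + m2))
C = R @ R.T + np.eye(m1 + m2)
D11, D12, D22 = (rng.standard_normal(s) for s in [(m1, p1), (m1, p2), (m2, p2)])
V_one, V_know, V_joint = variances(C, D11, D12, D22)
assert psd(V_know - V_joint) and psd(V_one - V_joint)

# Statement 4: if C12 = 0, KNOW-theta2-JOINT and KNOW-theta2 coincide.
C0 = C.copy(); C0[:m1, m1:] = 0; C0[m1:, :m1] = 0
_, V_know0, V_joint0 = variances(C0, D11, D12, D22)
assert np.allclose(V_know0, V_joint0)

# Statement 10: if D12 = C12 C22^{-1} D22, knowing theta2 is p-redundant:
# ONE-STEP and KNOW-theta2-JOINT have the same theta1 variance.
D12_r = C[:m1, m1:] @ np.linalg.solve(C[m1:, m1:], D22)
V_one_r, _, V_joint_r = variances(C, D11, D12_r, D22)
assert np.allclose(V_one_r, V_joint_r)
print("Statements 1, 2, 4 and 10 verified on this example")
```

The Statement 10 check makes the m-redundancy/p-redundancy link concrete: imposing the m-redundancy condition on D₁₂ makes the off-diagonal block of D′C⁻¹D vanish, so knowing θ₂ buys nothing for θ₁.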
1.3 Application to the missing data problem

1.3.1 The population problem

Consider again a random vector $W^* \in \mathcal{W}^*$ from the distribution $P_{\theta_0}$, $\theta_0 \in \Theta = \Theta_1 \times \Theta_2 \subset \mathbb{R}^{p_1} \times \mathbb{R}^{p_2}$, $\Theta$ compact. Let $W^*$ contain a random vector $W \in \mathcal{W} \subset \mathbb{R}^{\dim(W)}$. Consider a real-valued measurable function $g : \mathcal{W} \times \Theta_1 \to \mathbb{R}^{m_1}$ ($m_1 \ge p_1$) such that

$$E[g(W, \theta_1)] = 0 \quad \text{if and only if} \quad \theta_1 = \theta_{01}. \qquad (1.10)$$

As before, the expectation is with respect to $P_{\theta_0}$. Assume that the moment function in (1.10) satisfies Assumption 1.2.1.

We are interested in estimating $\theta_{01}$. The parameter $\theta_{01}$ usually describes some feature of the distribution of $W$, such as the conditional mean, the conditional variance, the conditional quantiles, etc. The vector $W$ is often partitioned into $(X, Y) \in \mathcal{X} \times \mathcal{Y}$, and $E(Y|x)$ is often the feature of interest (see Example 1.3.1).

Example 1.3.1 Consider M-estimation of the parameter $\theta_{01}$ in a general nonlinear least squares model for $E(Y|x) = m(x, \theta_{01})$. This is one of the examples considered in Wooldridge (2003). We assume that the model is correctly specified. Let the identifying moment functions be the first-order conditions for optimization of $q(x, y; \theta_1) = (y - m(x, \theta_1))^2$. Then $W = (X, Y)$, $m_1 = p_1$, and $g(W, \theta_1) = -(Y - m(X, \theta_1))[\nabla_{\theta_1} m(X, \theta_1)]'$. Note that a stronger condition than (1.10) holds in this case, namely $E[g(W, \theta_{01})|x] = 0$. $\square$

Example 1.3.2 Consider maximum likelihood estimation of a LOGIT model, where $Y$ is a binary outcome variable, $X$ is a vector of regressors, and the conditional probability $p(y|x, \theta_{01})$ is modelled as $G(x'\theta_{01})^y (1 - G(x'\theta_{01}))^{1-y}$, where $G(\cdot)$ is the logistic cdf. The likelihood equations can be used to construct the GMM estimator based on the expectation of the score function, $E[\nabla_{\theta_1} \ln f(X, Y; \theta_1)|_{\theta_1 = \theta_{01}}] = 0$, where $f(x, y; \theta_{01})$ is the joint density of $X$ and $Y$. If the distribution of $X$ does not depend on $\theta_1$, then $f(x, y; \theta_1) = p(y|x, \theta_1) f(x)$, where $f(x)$ is the unknown pdf of $X$.
Then the identifying moment condition can be rewritten as $E\{\nabla_{\theta_1}[\ln p(Y|x;\theta_1) + \ln f(X)]|_{\theta_1=\theta_{01}}\} = E[\nabla_{\theta_1} \ln p(Y|x;\theta_1)|_{\theta_1=\theta_{01}}]$, and ML estimation is equivalent to conditional ML estimation. For this example, $W = (X, Y)$, $m_1 = p_1$, and

$$g(W, \theta_1) = \frac{X'(Y - G(X'\theta_1))}{G(X'\theta_1)(1 - G(X'\theta_1))}\, g(X'\theta_1),$$

where $g(\cdot)$ is the logistic pdf. $\square$

Example 1.3.3 Consider estimation of the population averages $\mu_0$ and $\mu_1$ under control and treatment. Suppose a random sample is available of each unit's outcome under both control and treatment. Let $Y(0)$ denote the outcome under control and $Y(1)$ the outcome under treatment. The identifying moment restriction for each group is $E(Y(t) - \mu_{0t}) = 0$, $t = 0, 1$. So for this example $W = Y(t)$, $m_1 = p_1 = 1$, and $g(W, \theta) = Y(t) - \mu_t$, $t = 0, 1$. We can also consider the average treatment effect $\tau = \mu_1 - \mu_0$. $\square$

The above model (1.10) holds in the entire (unselected) population. Now we consider the selected population, defined by a random variable $S \in \{0, 1\}$ such that $W$ is observed if and only if $S = 1$. We assume that the probability of selection depends on some additional variables $Z$, where $Z \in \mathcal{Z} \subset \mathbb{R}^{\dim(Z)}$ is always observed. Some or all of $Z$ may be in $W$; that is, some of $W$ may always be observed, but all of $W$ is observed only when $S = 1$. Define

$$P(z, \theta_{02}) = P(S = 1|z), \qquad (1.11)$$

where $P(z, \theta_2)$ is a parametric model for the probability of selection and is known up to the parameter vector $\theta_2 \in \Theta_2 \subset \mathbb{R}^{p_2}$. Again, in many problems, the joint density of $\{S, Z\}$ can be written as the product $P(s|z, \theta_2) r(z)$, where $r(z)$ is the pdf of $Z$. Assume $\{S, Z\}$ is a subvector of $W^*$ from $P_{\theta_0}$. Suppose there exists a real-valued measurable function $u : \{0,1\} \times \mathcal{Z} \times \Theta_2 \to \mathbb{R}^{m_2}$ ($m_2 \ge p_2$) such that

$$E[u(S, Z; \theta_2)] = 0 \quad \text{if and only if} \quad \theta_2 = \theta_{02}. \qquad (1.12)$$

(The expectation is with respect to $P_{\theta_0}$.) Assume that (1.12) satisfies Assumption 1.2.1. We call moment condition (1.12) the "selection moment condition". Examples 1.3.4-1.3.6 in the next section show how (1.12) can be obtained from (1.11).
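The moment function of Example 1.3.2 can be checked numerically. The following sketch is illustrative and not part of the text: the simulation design (sample size, true parameter) and the helper names are assumptions. It verifies that the sample analogue of $E[g(W, \theta_1)]$ is close to zero at $\theta_{01}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(v):
    """Logistic cdf."""
    return 1.0 / (1.0 + np.exp(-v))

def g_logit(y, x, theta1):
    """Moment function of Example 1.3.2. For the logistic cdf the pdf equals
    G(1 - G), so the ratio in g(W, theta1) cancels and the score reduces to
    X * (Y - G(X'theta1))."""
    return x * (y - G(x @ theta1))[:, None]   # one row of moments per observation

# Simulate from the LOGIT model at a known theta01 and evaluate the moments there.
theta01 = np.array([0.5, -1.0])
n = 200_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = (rng.uniform(size=n) < G(X @ theta01)).astype(float)

moments = g_logit(Y, X, theta01).mean(axis=0)
print(moments)   # both components should be near zero
```

At any $\theta_1 \ne \theta_{01}$ the same sample moments would drift away from zero, which is what identifies the parameter.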
The GMM estimator based on (1.10), but with missing data, in effect makes the empirical moments $N^{-1}\sum_{i=1}^N S_i\, g(W_i, \theta_1)$ close to zero. These empirical moments are the random sample analogues of the population moments of the form

$$E[S\, g(W, \theta_1)] = 0. \qquad (1.13)$$

We call these moment conditions the "unweighted selected population moments" to emphasize that they hold in the selected rather than the target population and to distinguish them from the weighted selected population moments that we will define shortly. The selectivity problem is that the unweighted selected population moment conditions (1.13) may not hold at $\theta_{01}$; more precisely, the value $\theta_{01}$ that solves (1.10) may not solve (1.13). We also consider the "weighted selected population moments" that weight the moment function in (1.13) by the inverse of the selection probability (see, e.g., Horvitz and Thompson, 1952):

$$E\left[\frac{S}{P(Z, \theta_{02})}\, g(W, \theta_1)\right] = 0. \qquad (1.14)$$

The weighted selected population moments also may not hold. Indeed, it is intuitively clear that whether (1.13) or (1.14) holds must depend on what is assumed about the relationship between the selection mechanism and $W$.

1.3.2 Motivation and definitions

We follow Wooldridge (2002b, 2005) in making the following "ignorability" (or "selection on observables") assumption.

Assumption 1.3.1 (ignorability of selection) $P(S = 1|w, z) = P(S = 1|z) = P(z, \theta_{02})$.

Assumption 1.3.1 says that, conditional on $Z$, $S$ and $W$ are independent. This is commonly written as $S \perp W \mid Z$. In some cases, ignorability is true by construction. An example would be the case in which $Z$ is an indicator of stratum and selection is random within stratum. In other cases it is a substantial behavioral assumption. As Wooldridge notes, this assumption does not imply that the unweighted selected population moment conditions (1.13) hold at $\theta_{01}$.
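Numerically, the failure of (1.13) and the validity of (1.14) under ignorable selection look as follows. This is an illustrative sketch, not from the text; the design (a scalar mean problem with a logistic selection probability) is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Target population: estimate theta01 = E(Y); moment g(W, theta1) = Y - theta1.
Z = rng.normal(size=n)
Y = Z + rng.normal(size=n)       # Y depends on Z, so selection on Z matters
theta01 = 0.0                    # true population mean of Y

# Ignorable selection: P(S=1|z) depends only on Z (Assumption 1.3.1).
P = 1.0 / (1.0 + np.exp(-Z))     # logistic selection probability
S = (rng.uniform(size=n) < P).astype(float)

g = Y - theta01
unweighted = np.mean(S * g)      # sample analogue of (1.13)
weighted = np.mean(S / P * g)    # Horvitz-Thompson analogue of (1.14)

print(unweighted, weighted)      # first is biased away from 0, second is near 0
```

The unweighted moment fails because $E[S\,g] = E\{P(Z)\,E[g|Z]\}$ is nonzero when $E[g|Z] \ne 0$, exactly as derived in (1.15) below; the inverse-probability weight removes the $P(Z)$ factor.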
This can be seen as follows:

$$E[S\, g(W, \theta_{01})] = E\{E[S\, g(W, \theta_{01})\,|\,Z]\}, \quad \text{using the LIE},$$
$$= E\{E(S|Z)\, E[g(W, \theta_{01})\,|\,Z]\}, \quad \text{using ignorability}, \qquad (1.15)$$
$$= E\{P(Z, \theta_{02})\, E[g(W, \theta_{01})\,|\,Z]\}$$

(where LIE means law of iterated expectations), and our assumptions do not in general imply that $E[g(W, \theta_{01})|z] = 0$. However, the weighted selected moment conditions (1.14) do hold at $\theta_0$, since

$$E\left[\frac{S}{P(Z,\theta_{02})}\, g(W, \theta_{01})\right] = E\left\{E\left[\frac{S}{P(Z,\theta_{02})}\, g(W, \theta_{01})\,\Big|\,Z\right]\right\} = E\left\{\frac{E(S|Z)}{P(Z,\theta_{02})}\, E[g(W, \theta_{01})|Z]\right\} = E\{E[g(W, \theta_{01})|Z]\} = E[g(W, \theta_{01})] = 0. \qquad (1.16)$$

The simplest assumption under which the unweighted moment condition (1.13) holds in the selected sample is the following.

Assumption 1.3.2 $P(S = 1|w) = P(S = 1)$. That is, $S$ is independent of $W$.

This assumption is easy to understand and clearly implies that (1.13) holds, since $S$ is independent of $g(W, \theta_1)$. This condition is sometimes referred to as "missing completely at random" (see, e.g., Little and Rubin, 2002), but we will not use this terminology further, since there seems to be some inconsistency in the literature in the use of these words. It should be noted that this assumption is neither stronger nor weaker than the assumption of ignorability (Assumption 1.3.1). That is, "$S$ independent of $W$" does not imply, and is not implied by, "$S$ independent of $W$ conditional on $Z$". It is perhaps intuitive that the first condition is stronger than the second, but in fact that intuition is not correct.¹

¹The intuition referred to here is based on the fact that, for general $Y$, $X_1$, $X_2$, $E(Y|x_1, x_2) = 0$ does imply that $E(Y|x_1) = 0$ by the law of iterated expectations. But there is no comparable law for conditional independence.

The simplest assumption under which both the unweighted and the weighted moment conditions hold is the following.

Assumption 1.3.3 (independence of selection) $(S, Z)$ is independent of $W$.

This assumption is also easy to understand, but it would appear to be too strong to apply in practical cases.
We now consider an exogeneity condition that is weaker than Assumption 1.3.3 and which does imply that both the weighted and unweighted moment conditions hold.

Assumption 1.3.4 (exogeneity of selection) (i) Assumption 1.3.1 (ignorability of selection) holds. (ii) $E[g(W, \theta_{01})|z] = 0$.

This is essentially the same definition of exogeneity as in Wooldridge (2005). Under Assumption 1.3.4, selection is both ignorable and exogenous with respect to the primary problem of interest. For example, if $W = (Y, X)$ and $Z \subseteq X$, then having $X$ in the conditioning set in the original problem is sufficient for the assumption to hold. If selection is based on covariates other than $X$, i.e. $X \subseteq Z$, then $g(Y, X; \theta_1)$ has to be uncorrelated with any function of $X^* \equiv Z \setminus X$ given $X$.

We now show that under Assumption 1.3.4, both the weighted and unweighted moment conditions hold. We first state without proof the following basic result.

Lemma 1.3.1 Suppose Assumption 1.3.1 holds. Then $f(w|z, s) = f(w|z)$. (Here $f(\cdot)$ is generic notation for a probability density.)

Then it is easy to see that the following result is true.

Theorem 1.3.1 Suppose Assumption 1.3.4 (exogeneity) holds. Then

$$E[g(W, \theta_{01})|z, s] = 0. \qquad (1.17)$$

This is a much simpler and stronger result than Wooldridge obtained. It immediately implies that any function of $Z$ and $S$ is uncorrelated with $g(W, \theta_{01})$, and therefore that the unweighted moment condition (1.13) and the weighted moment condition (1.14) both hold in the selected sample. In fact, this is true whether or not the weights are correct (in the sense that they do in fact represent $P(S = 1|z)$). All that is required is that the weights be a function of $Z$ and $S$.
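Theorem 1.3.1's implication that only the functional form of the weights matters, not their correctness, is easy to illustrate. The sketch below is not from the text; the design, including the deliberately misspecified weight function, is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

theta01 = 2.0
Z = rng.normal(size=n)
eps = rng.normal(size=n)         # independent of Z, so E[g|z] = 0 (exogeneity)
Y = theta01 + eps
g = Y - theta01

P = 1.0 / (1.0 + np.exp(-Z))     # ignorable selection on Z
S = (rng.uniform(size=n) < P).astype(float)

# Under Assumption 1.3.4, ANY weight that is a function of Z (and S) keeps the
# selected moment valid; correct inverse probabilities are not required.
true_w     = np.mean(S / P * g)                # correct inverse-probability weight
wrong_w    = np.mean(S / (0.2 + 0.6 * P) * g)  # deliberately misspecified weight
unweighted = np.mean(S * g)                    # unit weight, also a function of Z, S

print(true_w, wrong_w, unweighted)             # all three are near 0
```

This is the robustness-to-misspecification point made again in the concluding remarks: under exogenous selection, weighting is a matter of efficiency, not consistency.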
We conclude that under ignorable selection,

$$E\begin{bmatrix} \dfrac{S\, g(W, \theta_{01})}{P(Z, \theta_{02})} \\[1ex] u(S, Z; \theta_{02}) \end{bmatrix} = 0 \qquad (1.18)$$

and, under independent or exogenous selection,

$$E\begin{bmatrix} S\, g(W, \theta_{01}) \\[0.5ex] u(S, Z; \theta_{02}) \end{bmatrix} = 0, \qquad (1.19)$$

$$E\begin{bmatrix} \dfrac{S\, g(W, \theta_{01})}{P(Z, \theta_{02})} \\[1ex] u(S, Z; \theta_{02}) \end{bmatrix} = 0, \qquad (1.20)$$

$$E\begin{bmatrix} S\, g(W, \theta_{01}) \\[0.5ex] \dfrac{S\, g(W, \theta_{01})}{P(Z, \theta_{02})} \\[1ex] u(S, Z; \theta_{02}) \end{bmatrix} = 0. \qquad (1.21)$$

Example 1.3.4 Suppose that sampling in Example 1.3.1 is nonrandom and the selection mechanism can be modelled as a PROBIT. Then $P(z, \theta_2) = \Phi(z'\theta_2)$, where $\Phi(\cdot)$ is the standard normal cdf. Then the selection moment conditions for this problem contain the likelihood equations for the log-likelihood $l(\theta_2|s, z) \equiv s \ln \Phi(z'\theta_2) + (1 - s)\ln(1 - \Phi(z'\theta_2))$. Thus, $m_2 = p_2$ and

$$u(S, Z; \theta_2) = \frac{Z'(S - \Phi(Z'\theta_2))}{\Phi(Z'\theta_2)(1 - \Phi(Z'\theta_2))}\, \phi(Z'\theta_2),$$

where $\phi(\cdot)$ is the standard normal pdf. Note that we not only have $E[u(S, Z; \theta_{02})] = 0$ but also $E[u(S, Z; \theta_{02})|z] = 0$. Under the ignorability of selection assumption, we can use the moment condition $E\left[\frac{S}{\Phi(Z'\theta_{02})}\, g(X, Y; \theta_{01})\right] = 0$, where $g(\cdot)$ is defined in Example 1.3.1. $\square$

Example 1.3.5 (Variable Probability Sampling) Suppose that sampling in Example 1.3.2 is stratified. Let the sample space $\mathcal{W}$ be partitioned into $J$ nonempty and disjoint strata $\mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_J$. If an observation lies in stratum $\mathcal{W}_j$, it is retained with probability $P_{0j}$, which is usually known. So the selection predictor $Z$ can be defined by the vector $(Z_1, \ldots, Z_J)'$ with $Z_j = \mathbb{1}\{W \in \mathcal{W}_j\}$, $j = 1, 2, \ldots, J$, where $\mathbb{1}\{\cdot\}$ is the indicator function, and the probability model is $P(z, \theta_{02}) = \sum_{j=1}^J P_{0j} z_j$, where $\theta_{02} = (P_{01}, \ldots, P_{0J})'$. The ignorability assumption is satisfied by design. We have $P(s|z, \theta_2) = \prod_{j=1}^J [P_j^s (1 - P_j)^{1-s}]^{z_j}$. Hence, the selection moment function for this problem contains the likelihood equations of the log-likelihood function $l(\theta_2|s, z) = \sum_{j=1}^J z_j [s \ln P_j + (1 - s)\ln(1 - P_j)]$. Thus, $m_2 = p_2 = J$ and

$$u(S, Z; \theta_2) = \left( \frac{Z_1(S - P_1)}{P_1(1 - P_1)}, \ldots, \frac{Z_J(S - P_J)}{P_J(1 - P_J)} \right)'.$$

The weighted selected population moment condition contains $\frac{S}{\sum_{j=1}^J P_j Z_j}\, g(X, Y; \theta_1)$, where $g(\cdot)$ is defined in Example 1.3.2.
If, in addition, stratification is based on exogenous variables, i.e. the exogeneity of selection assumption holds, the unweighted moment conditions (1.13) can also be used. $\square$

Example 1.3.6 (Average Treatment Effect) Suppose that the sample in Example 1.3.3 is not entirely observed. Instead, we observe $Y(0)$ only for the units that are in the control group and $Y(1)$ only for those that are in the treatment group. Understandably, the counterfactual data are missing. If $Z$ are treatment predictors, the selection model for the treatment group is $P(S = 1|z) = P(z; \theta_{02})$ and for the control group $P(S = 0|z) = 1 - P(z; \theta_{02})$, where $P(z; \theta_{02})$ is the probability of receiving treatment. The ignorability of selection assumption implies in this case that $P(S = 1|y(0), z) = P(S = 1|z)$ and $P(S = 0|y(1), z) = P(S = 0|z)$. The selection moment condition for this example is the same as in Example 1.3.4. The weighted population moment condition will contain $\frac{S}{P(Z;\theta_{02})}(Y(1) - \mu_{01})$ for the treatment group and $\frac{1 - S}{1 - P(Z;\theta_{02})}(Y(0) - \mu_{00})$ for the control group. The average treatment effect can be identified using $E\left[\frac{S}{P(Z;\theta_{02})} Y(1) - \frac{1 - S}{1 - P(Z;\theta_{02})} Y(0) - \tau_0\right] = 0$. $\square$

1.3.3 Relative efficiency results under ignorable selection

First consider estimation based on (1.18), under ignorable selection. Following the notation of Section 1.2, we write the weighted selected population moment condition as $E[h_1(W^*, \theta_{01}, \theta_{02})] = 0$, where $W^*$ contains $W$, $S$ and $Z$, and where

$$h_1(W^*, \theta_{01}, \theta_{02}) = \frac{S}{P(Z, \theta_{02})}\, g(W, \theta_{01}). \qquad (1.22)$$

Wooldridge (2005) discusses estimation based on (1.22) for the exactly identified case. He compares the estimator of $\theta_{01}$ when $\theta_{02}$ is known to the estimator of $\theta_{01}$ when $\theta_{02}$ is replaced by some consistent estimate $\hat\theta_2$. In order to analyze this or other related issues, we have to say something about how $\theta_{02}$ is estimated. In general terms, it is estimated by GMM based on a moment condition $E[h_2(S, Z; \theta_{02})] = 0$, which puts the analysis into the framework of Section 1.2.
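In the exactly identified scalar-mean case, the two-step procedure (first-stage ML for $\theta_{02}$, then the weighted moment (1.22)) can be sketched as follows. This is illustrative only: a logit selection model stands in for the probit of Example 1.3.4 purely to keep the sketch dependency-free, and the data-generating design is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Design: theta01 = E(Y) = 0, selection on Z through a logit model.
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
theta02 = np.array([0.2, 1.0])
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
P0 = sigmoid(Z @ theta02)
S = (rng.uniform(size=n) < P0).astype(float)
Y = Z[:, 1] + rng.normal(size=n)          # E(Y) = 0 = theta01

# Step 1: ML for theta02. For the logit, the score is h2 = Z'(S - P(Z, theta2));
# solve h2 = 0 by Newton-Raphson.
t2 = np.zeros(2)
for _ in range(25):
    p = sigmoid(Z @ t2)
    score = Z.T @ (S - p)
    hess = -(Z * (p * (1 - p))[:, None]).T @ Z
    t2 = t2 - np.linalg.solve(hess, score)

# Step 2: plug the estimated probabilities into the weighted moment (1.22).
# With g(W, theta1) = Y - theta1, solving the sample moment exactly gives a
# ratio of weighted sums.
w = S / sigmoid(Z @ t2)
theta1_hat = np.sum(w * Y) / np.sum(w)
print(theta1_hat)                         # close to the true value 0
```

The same skeleton covers the ONE-STEP and KNOW-$\theta_2$ variants: stack $h_1$ and $h_2$ and solve jointly, or replace the fitted probabilities with the true $P_0$.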
However, following Wooldridge, we make the specific assumption that $\theta_{02}$ is estimated by MLE based on the model $P(S = 1|z) = P(z, \theta_{02})$. That is, $h_2(S, Z; \theta_{02})$ is the score function corresponding to the likelihood for this model. Specifically,

$$h_2(S, Z; \theta_{02}) = u(S, Z; \theta_{02}) = \nabla_{\theta_2} P(z, \theta_2)\big|_{\theta_2=\theta_{02}}' \, \frac{S - P(z, \theta_{02})}{P(z, \theta_{02})[1 - P(z, \theta_{02})]}. \qquad (1.23)$$

Under these assumptions, we have the puzzle referred to in the Introduction; namely, the TWO-STEP estimator of $\theta_{01}$ that uses $\hat\theta_2$ in (1.22) is better than the KNOW-$\theta_2$ estimator that uses the true value of $\theta_{02}$ in (1.22). We will verify that this result holds also in the case that (1.22) is overidentified, and also provide our explanation of the puzzle, using the results of Section 1.2. To apply these results we need to do some calculations involving the following:

$$C_{12} = E[h_1(W^*, \theta_0)\, h_2(S, Z; \theta_{02})'], \qquad C_{22} = E[h_2(S, Z; \theta_{02})\, h_2(S, Z; \theta_{02})'],$$
$$D_{12} = E[\nabla_{\theta_2} h_1(W^*, \theta_1, \theta_2)]\big|_{\theta=\theta_0}, \qquad D_{22} = E[\nabla_{\theta_2} h_2(S, Z; \theta_2)]\big|_{\theta_2=\theta_{02}}. \qquad (1.24)$$

Theorem 1.3.2 Under the ignorability of selection assumption,

(a) $C_{12} = E\left[\dfrac{g(W, \theta_{01})}{P(Z, \theta_{02})}\, \nabla_{\theta_2} P(Z, \theta_2)\big|_{\theta_2=\theta_{02}}\right]$, which is (in general) not equal to zero;

(b) $D_{12} = -C_{12}$, $D_{22} = -C_{22}$, and so $D_{12} = C_{12}C_{22}^{-1}D_{22}$.

To understand Theorem 1.3.2, note first that in the unselected population, $\tilde C_{12} \equiv E[g(W, \theta_{01}) \cdot h_2(S, Z; \theta_{02})'] = 0$. That is, the original moment condition $g(W, \theta_{01})$ is uncorrelated with the score function $h_2(S, Z; \theta_{02})$ by the generalized information matrix equality. However, in the selected sample, $C_{12} \ne 0$. That is, $h_1(W^*, \theta_{01}, \theta_{02})$ and $h_2(S, Z; \theta_{02})$ are correlated. This correlation makes $h_2(S, Z; \theta_{02})$ relevant for estimation of $\theta_{01}$ even if $\theta_{02}$ is known, and the inefficiency of the KNOW-$\theta_2$ estimator is due to its failure to capture the information in the moment condition based on $h_2(S, Z; \theta_{02})$. Although we do not pursue this point, it would appear that the inefficiency of the KNOW-$\theta_2$ estimator (at least relative to the KNOW-$\theta_2$-JOINT estimator) would hold even if $h_2(S, Z; \theta_2)$ were not a score function.
It depends only on $C_{12} \ne 0$, not on the particular form of $C_{12}$.

Part (b) of Theorem 1.3.2 gives a number of information equalities which do depend on $h_2(S, Z; \theta_2)$ being a score function. They establish that $D_{12} = C_{12}C_{22}^{-1}D_{22}$, which is the condition for Statements 10 and 11 of Theorem 1.2.3. Statement 11 of Theorem 1.2.3 says that the KNOW-$\theta_2$ estimator is inefficient relative to the ONE-STEP, TWO-STEP and KNOW-$\theta_2$-JOINT estimators. This extends the previously cited (but, we hope, no longer puzzling!) result, namely that KNOW-$\theta_2$ is inefficient relative to TWO-STEP, to a larger set of other estimators, and also to the case that the GMM problem for the parameters of interest is overidentified. Statement 10 of Theorem 1.2.3 says further that $\theta_{02}$ is p-redundant, so that the ONE-STEP and KNOW-$\theta_2$-JOINT estimators are equally efficient. So long as one includes the score function $h_2(S, Z; \theta_{02})$ in the estimation problem, it does not matter (in terms of efficiency of estimation of $\theta_{01}$) whether $\theta_{02}$ is known or not.

Another note is that, although the TWO-STEP estimator is better than the KNOW-$\theta_2$ estimator, it is not necessarily efficient. In the exactly identified case, it is efficient because it equals the ONE-STEP estimator (Statement 7 of Theorem 1.2.3), but in the overidentified case it is generally less efficient than the KNOW-$\theta_2$-JOINT and ONE-STEP estimators.

Example 1.3.7 Continuing Example 1.3.4 under ignorable selection with the ML estimate of $\theta_{02}$, $E[S \cdot u(S, z; \theta_{02})'|z]$ can be written as

$$E\left[S\, \frac{S - P(z, \theta_{02})}{P(z, \theta_{02})(1 - P(z, \theta_{02}))}\,\Big|\,z\right] \nabla_{\theta_2} P(z, \theta_2)\big|_{\theta_2=\theta_{02}} = \frac{E(S^2|z) - E(S|z)\, P(z, \theta_{02})}{P(z, \theta_{02})(1 - P(z, \theta_{02}))}\, \nabla_{\theta_2} P(z, \theta_2)\big|_{\theta_2=\theta_{02}} = \nabla_{\theta_2} P(z, \theta_2)\big|_{\theta_2=\theta_{02}},$$

where the last equality follows because $E(S^2|z) = E(S|z)$ and $E(S|z) = P(z, \theta_{02})$. This is non-zero. Also, $E[g(W; \theta_1)|z] \ne 0$ unless there is also exogeneity. Thus $C_{12}$, which can be expressed by the law of iterated expectations as

$$E\left\{\frac{1}{P(z, \theta_{02})}\, E[g(W, \theta_{01})|z]\, E[S\, u(S, z; \theta_{02})'|z]\right\},$$

is generally non-zero.
In fact, $C_{12} = E\left[\frac{g(W, \theta_{01})}{P(Z, \theta_{02})}\, \nabla_{\theta_2} P(Z, \theta_2)\big|_{\theta_2=\theta_{02}}\right]$. We cannot therefore claim m-redundancy under ignorability of selection: using orthogonality conditions from the selection process helps in estimating $\theta_{01}$ even if the weighting probabilities are known. However, we can claim p-redundancy by Theorem 1.3.2: using known selection probabilities with the additional moment conditions for selection is as efficient as estimating the probabilities in a one-step or two-step procedure. Each of the three alternatives is equally preferred to only using the original problem with known probabilities. $\square$

Example 1.3.8 Continuing Example 1.3.5 under ignorability with the ML estimates of $P_{0j}$, $E[S \cdot u(S, z; \theta_{02})'|z]$ contains elements of the form $\frac{z_j [E(S^2|z) - P_{0j} E(S|z)]}{P_{0j}(1 - P_{0j})}$, $j = 1, \ldots, J$. Since $E(S^2|z) = E(S|z) = \sum_{j=1}^J P_{0j} z_j$, the elements simplify to $z_j$, $j = 1, \ldots, J$. Thus $C_{12}$, which can be expressed by the law of iterated expectations as $E\left\{\frac{1}{P(z, \theta_{02})}\, E[g(W, \theta_{01})|z]\, E[S\, u(S, z; \theta_{02})'|z]\right\}$, can be simplified to

$$E\left[\frac{g(W, \theta_{01})\mathbb{1}\{W \in \mathcal{W}_1\}}{P_{01}}, \ldots, \frac{g(W, \theta_{01})\mathbb{1}\{W \in \mathcal{W}_J\}}{P_{0J}}\right].$$

This is nonzero, unless there is also the exogeneity or independence assumption. Similarly to Example 1.3.7, under ignorable selection, using the selection moment conditions increases the precision of estimating $\theta_{01}$. Also, if knowledge of the selection probabilities is available, it provides the same precision for $\theta_1$ as the one-step or two-step procedures, as long as all $m_1 + m_2$ moment conditions are used. $\square$

Example 1.3.9 Continuing Example 1.3.6, with ML estimates of the treatment probabilities $P(z; \theta_2)$ from a PROBIT, the correlation matrix between the moment conditions that identify $\mu_{01}$ and the likelihood equations from the PROBIT for $\theta_{02}$ is

$$E\left[\frac{S(Y(1) - \mu_{01})}{P(Z; \theta_{02})} \times \frac{(S - P(Z; \theta_{02}))\, \nabla_{\theta_2} P(Z; \theta_2)\big|_{\theta_2=\theta_{02}}}{P(Z; \theta_{02})(1 - P(Z; \theta_{02}))}\right].$$
This, under ignorability, can be rewritten as

$$E\left[\frac{(Y(1) - \mu_{01})\, \nabla_{\theta_2} P(Z; \theta_2)\big|_{\theta_2=\theta_{02}}}{P(Z; \theta_{02})}\right],$$

which is non-zero unless $Y(1) \perp Z$, and equal to minus the expected derivative of the weighted moment equation for $\mu_{01}$ with respect to $\theta_2$. A similar argument is valid for estimating $\mu_{00}$ and, consequently, $\tau_0$. Hence, in average treatment effect estimation, m-redundancy cannot be claimed: knowledge about the treatment assignment process should be included in the estimation. There is p-redundancy, however: it does not matter asymptotically whether the parameters of the assignment process are known or estimated, as long as all available moments are used. $\square$

1.3.4 Relative efficiency results under exogenous selection

Consider now estimation based on (1.19)-(1.21), under exogenous selection. Wooldridge (2002b, Theorem 5.2) shows, under the exogenous selection assumption, that the IPW M-estimator that uses known selection probabilities is as efficient as a two-step estimator that employs initial ML estimates of the selection probabilities. The results of Section 1.2 allow us to restate this result for other estimators and for the cases of overidentification in the primary problem of interest. Using definitions (1.22)-(1.24), it is easy to verify that, for the GMM estimator based on (1.20), the following is true.

Theorem 1.3.3 Under the exogeneity of selection assumption: (a) $C_{12} = 0$; (b) $D_{12} = 0$.

So, by Theorem 1.2.3 (Statement 6), we have m-redundancy of the selection moment condition and p-redundancy of $\theta_{02}$. The ONE-STEP, TWO-STEP, KNOW-$\theta_2$ and KNOW-$\theta_2$-JOINT estimators of $\theta_{01}$ are equally efficient asymptotically.

Wooldridge (2005, Theorem 4.3) shows, under exogeneity and the further assumption that the original moment conditions satisfy the conditional information matrix equality, that the estimator based on the unweighted moment conditions is more efficient than the estimator based on the weighted moment conditions.
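The IPW construction of Examples 1.3.6 and 1.3.9 can be sketched end to end. Again this is illustrative only: a logit assignment model replaces the probit for self-containment, and the simulation design is assumed.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300_000

# Potential outcomes with true average treatment effect tau0 = 1.
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
Y0 = Z[:, 1] + rng.normal(size=n)          # mu00 = 0
Y1 = Z[:, 1] + 1.0 + rng.normal(size=n)    # mu01 = 1, so tau0 = mu01 - mu00 = 1

# Treatment assignment depends on Z only (ignorability of selection).
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
P0 = sigmoid(Z @ np.array([0.0, 0.8]))
S = (rng.uniform(size=n) < P0).astype(float)

# First stage: estimate the assignment model by ML (Newton on the logit score).
t2 = np.zeros(2)
for _ in range(25):
    p = sigmoid(Z @ t2)
    t2 -= np.linalg.solve(-(Z * (p * (1 - p))[:, None]).T @ Z, Z.T @ (S - p))
p_hat = sigmoid(Z @ t2)

# IPW moments of Example 1.3.6: S Y(1)/P identifies mu01, (1-S) Y(0)/(1-P) mu00.
mu1_hat = np.mean(S * Y1 / p_hat)
mu0_hat = np.mean((1 - S) * Y0 / (1 - p_hat))
tau_hat = mu1_hat - mu0_hat
print(tau_hat)                              # close to tau0 = 1
```

Here $Y(1)$ and $Y(0)$ are correlated with $Z$, so the unweighted group means would be biased; the inverse-probability weights restore the population moments.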
This is fine as far as it goes, but it does not rule out the possibility that using both could be more efficient than using either. Our next result does rule out this possibility.

Theorem 1.3.4 Suppose Assumption 1.3.4 holds. Then the optimal moment conditions in the selected population are the same as in the unselected population.

To see why this result is true, note that the optimal moment conditions in the unselected population are the following:

$$E[D(Z)'C(Z)^{-1} g(W, \theta_{01})] = 0, \qquad (1.25)$$

where $D(z) = E[\nabla_{\theta_1} g(W, \theta_1)|z]\big|_{\theta_1=\theta_{01}}$ and $C(z) = E[g(W, \theta_{01})\, g(W, \theta_{01})'|z]$. The optimal moment conditions in the selected population are:

$$E[S\, D(Z, S=1)'C(Z, S=1)^{-1} g(W, \theta_{01})] = 0, \qquad (1.26)$$

where $D(z, S=1) = E\{\nabla_{\theta_1} g(W, \theta_1)\big|_{\theta_1=\theta_{01}} \mid z, S=1\}$ and $C(z, S=1) = E\{g(W, \theta_{01})\, g(W, \theta_{01})' \mid z, S=1\}$. But $D(z, S=1) = D(z)$ by the ignorability assumption, and similarly $C(z, S=1) = C(z)$.

An implication of this result is that the weighted moment conditions are m-redundant for the estimation of $\theta_{01}$. More precisely, assuming that weighting was not part of the efficient estimation problem in the unselected population, it also plays no role in the efficient problem in the selected population. Thus in this circumstance we do not have to weight for reasons of consistency, and we also do not have to weight for reasons of efficiency.

Theorem 1.3.4 is a useful result, but it falls short of being the final word on efficiency. The question is whether the moment conditions in equation (1.17) capture all of the information in the exogeneity assumption. The first part of the exogeneity assumption is that $E[g(W, \theta_{01})|z] = 0$, and the efficient GMM estimator under this conditional moment restriction (with full observability) is well understood. The second part of the exogeneity assumption is the ignorability condition, and Theorem 1.3.1 shows that this makes the original conditional moment restriction valid in the selected sample as well.
More precisely, we then have $E[g(W, \theta_{01})|z, s] = 0$, and Theorem 1.3.4 gives the form of the efficient estimator under this conditional moment restriction. However, what is not clear is whether all of the information in the ignorability condition is captured by the extension of the original moment conditions to the selected population. We defined $P(z, \theta_{02}) = E(S|z)$, so that $E[S - P(z, \theta_{02})|z] = 0$. However, under ignorability, we have the stronger condition that $E[S - P(z, \theta_{02})|z, w] = 0$. The score function for estimation of $\theta_{02}$, as given in (1.23) above, will not be useful for estimation of $\theta_{01}$, because it is a function of $Z$ and $S$ only, and we have already used the optimal functions of $Z$ and $S$ in (1.26) above. The question is whether the fact that $E[S - P(z, \theta_{02})|w] = 0$ adds anything.

This question is complicated by the fact that $W$ is only observed when $S = 1$. If no part of $W$ (other than $Z$, if $Z$ is a subset of $W$) is always observed, we do not see any way to make use of the condition that $E[S - P(z, \theta_{02})|w] = 0$. However, now suppose that some subset of $W$ is always observed. Let $W_0$ be the part of $W$ which (i) is always observed, and (ii) is not part of $Z$. Then we can consider moment conditions of the form

$$E[k(W_0)(S - P(z, \theta_{02}))] = 0. \qquad (1.27)$$

These moment conditions are not useful for estimation of $\theta_{02}$, but they may be useful for estimation of $\theta_{01}$, if they are correlated with the original moment conditions. It is easy to see that they are not correlated with the unselected original moment conditions:

$$E[g(W, \theta_{01})\, k(W_0)'(S - P(Z, \theta_{02}))] = E\{E[S - P(Z, \theta_{02})|Z]\, E[g(W, \theta_{01})\, k(W_0)'|Z]\} = 0.$$

However, they are correlated with the selected original moment conditions:

$$E[S\, g(W, \theta_{01})\, k(W_0)'(S - P(Z, \theta_{02}))] = E\{E[S(S - P(Z, \theta_{02}))|Z]\, E[g(W, \theta_{01})\, k(W_0)'|Z]\} = E\{P(Z, \theta_{02})(1 - P(Z, \theta_{02}))\, E[g(W, \theta_{01})\, k(W_0)'|Z]\} \ne 0.$$

Thus the moment conditions in (1.27) may possibly be useful in estimation of $\theta_{01}$. We leave further exploration of this point for future work.
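A quick simulation illustrates the last two displays: the moment $k(W_0)[S - P(Z, \theta_{02})]$ is uncorrelated with the unselected original moments but correlated with the selected ones. The design below (scalar $Z$, $W_0$, $Y$, with $k$ taken to be the identity) is assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# W = (Z, W0, Y): Z drives selection; W0 is always observed and not part of Z.
Z = rng.normal(size=n)
W0 = rng.normal(size=n)
Y = Z + W0 + rng.normal(size=n)        # so E[g(W) k(W0)' | z] != 0 for g = Y - E(Y)

g = Y - 0.0                            # original moment at the truth, E g = 0
P = 1.0 / (1.0 + np.exp(-Z))           # P(z, theta02) = E(S|z)
S = (rng.uniform(size=n) < P).astype(float)

k = W0                                 # illustrative choice of k(W0) in (1.27)
extra = k * (S - P)                    # moment function E k(W0)[S - P(Z)] = 0

cov_unselected = np.mean(g * extra)    # ~ 0, by E[S - P | z, w] = 0
cov_selected = np.mean(S * g * extra)  # E{P(1-P) E[g k | Z]}, nonzero here
print(cov_unselected, cov_selected)
```

The nonzero selected-sample covariance is exactly what makes (1.27) a candidate source of extra efficiency for $\theta_{01}$.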
1.4 Concluding remarks

We summarize relative efficiency results for four alternative GMM estimators of a parameter vector that enters into one set of moment conditions along with another vector that also enters into an additional set of conditions and may be known. We provide formal statements and proofs of the efficiency claims and spell out conditions under which some knowledge may be redundant. If the two sets of moment conditions are uncorrelated and the expected derivative of the first set with respect to the additional parameter vector is zero, both the additional moment conditions and the knowledge of the additional parameters are redundant. These are the strongest sufficient conditions we consider. The weaker condition of moment uncorrelatedness is sufficient for redundancy of the extra moment conditions when the additional parameters are known, and for equal efficiency of the multi-step and one-step estimators under exact identification of the original set of moment conditions. The condition of zero expected derivative of the original set of moments with respect to the additional parameter vector turns out to be sufficient for no influence of the first-step estimation on the second-step standard errors in very general settings. We provide a sufficient condition for equal relative efficiency of the estimator that treats the additional parameters as known using the full set of moment conditions and the estimator that involves estimating both parameter vectors.

We apply these results to a general missing data problem after showing that the weighted and unweighted GMM estimators on the selected sample preserve the desired asymptotic properties under reasonable assumptions. We explain the counterintuitive result that estimating selection probabilities dominates using known probabilities if this knowledge is available. It turns out that this is an outcome of ignoring the moment conditions that characterize the selection process.
Interestingly, however, a proper use of such knowledge along with known selection probabilities turns out to be as good as estimating the probabilities using the same moment conditions. Redundancy of the parameter knowledge applies. We show that this redundancy result is driven by two factors: the ignorability assumption on selection and the use of the score function in estimation of the selection probabilities. The ignorability condition says that the first-stage score function for the conditional likelihood $f(s|z)$ is in fact the score function for the conditional likelihood $f(s|z, w)$, and thus the GCIME can be applied, producing the sufficient condition for parameter knowledge redundancy.

When selection is based on variables that are exogenous with respect to a correctly specified feature of the conditional distribution, any function of the exogenous variables can be used as a weight in the weighted GMM estimation. This implies two interesting results. First, weighted GMM estimation on the selected sample is robust to selection model misspecification. Second, using both weighted and unweighted moment conditions dominates using only one of them, unless the original moment function incorporates the optimal weights in the first place. No efficiency improvements are possible in that case.

Besides the examples we give, the following specific missing data problems can be studied in the framework of Section 1.2: using auxiliary data to estimate probabilities of selection (see Hellerstein and Imbens, 1999; Nevo, 2002, 2003), weighting by nonparametric estimates of propensity scores in estimation of average treatment effects (see Hirano et al., 2003), estimating weights for choice-based samples in pseudo-MLE settings (see Manski and Lerman, 1977; Manski and McFadden, 1981; Cosslett, 1981a,b; Imbens, 1992), and EL and GMM estimation for stratified samples with possibly known sampling or population frequencies (see Tripathi, 2003).

Bibliography

AHN, S. AND P.
SCHMIDT (1995): "A separability result for GMM estimation, with applications to GLS prediction and conditional moment tests," Econometric Reviews, 14, 19-34.

BREUSCH, T., H. QIAN, P. SCHMIDT, AND D. WYHOWSKI (1999): "Redundancy of moment conditions," Journal of Econometrics, 91, 89-111.

COSSLETT, S. R. (1981a): "Efficient estimation of discrete-choice models," in Structural Analysis of Discrete Data with Econometric Applications, ed. by C. F. Manski and D. L. McFadden, Cambridge: The MIT Press, 51-111.

(1981b): "Maximum likelihood estimator for choice-based samples," Econometrica, 49, 1289-1316.

CREPON, B., F. KRAMARZ, AND A. TROGNON (1997): "Parameters of interest, nuisance parameters and orthogonality conditions: An application to autoregressive error component models," Journal of Econometrics, 82, 135-156.

GOLDBERGER, A. (1972): "Maximum likelihood estimation of regressions containing unobservable independent variables," International Economic Review, 13, 1-15.

HAHN, J. (1998): "On the role of the propensity score in efficient semiparametric estimation of average treatment effects," Econometrica, 66, 315-331.

HANSEN, L. (1982): "Large sample properties of generalized method of moments estimators," Econometrica, 50, 1029-1054.

HECKMAN, J. J., H. ICHIMURA, AND P. TODD (1998): "Matching as an econometric evaluation estimator," The Review of Economic Studies, 65, 261-294.

HELLERSTEIN, J. K. AND G. W. IMBENS (1999): "Imposing moment restrictions from auxiliary data by weighting," The Review of Economics and Statistics, 81, 1-14.

HIRANO, K., G. IMBENS, AND G. RIDDER (2003): "Efficient estimation of average treatment effects using the estimated propensity score," Econometrica, 71, 1161-1189.

HORVITZ, D. AND D. THOMPSON (1952): "A generalization of sampling without replacement from a finite universe," Journal of the American Statistical Association, 47, 663-685.

IMBENS, G. W.
(1992): "An efficient method of moments estimator for discrete choice models with choice-based sampling," Econometrica, 60, 1187-1214.

LITTLE, R. J. A. AND D. B. RUBIN (2002): Statistical Analysis with Missing Data, Wiley Series in Probability and Statistics, Wiley-Interscience, 2nd ed.

MANSKI, C. F. AND S. R. LERMAN (1977): "The estimation of choice probabilities from choice based samples," Econometrica, 45, 1977-1988.

MANSKI, C. F. AND D. L. MCFADDEN (1981): "Alternative estimators and sample designs for discrete choice analysis," in Structural Analysis of Discrete Data with Econometric Applications, ed. by C. F. Manski and D. L. McFadden, The MIT Press, 2-50.

NEVO, A. (2002): "Sample selection and information-theoretic alternatives to GMM," Journal of Econometrics, 107, 149-157.

(2003): "Using weights to adjust for sample selection when auxiliary information is available," Journal of Business and Economic Statistics, 21, 43-53.

NEWEY, W. (1984): "A method of moments interpretation of sequential estimators," Economics Letters, 14, 201-206.

NEWEY, W. AND D. MCFADDEN (1994): "Large sample estimation and hypothesis testing," in Handbook of Econometrics, ed. by R. Engle and D. McFadden, vol. IV, 2113-2241.

PAGAN, A. (1984): "Econometric issues in the analysis of regressions with generated regressors," International Economic Review, 25, 221-247.

QIAN, H. AND P. SCHMIDT (1999): "Improved instrumental variables and generalized method of moments estimators," Journal of Econometrics, 91, 145-169.

ROBINS, J. M. AND A. ROTNITZKY (1995): "Semiparametric efficiency in multivariate regression models with missing data," Journal of the American Statistical Association, 90, 122-129.

ROSENBAUM, P. R. (1987): "Model-based direct adjustment," Journal of the American Statistical Association, 82, 387-394.

ROSENBAUM, P. R. AND D. B. RUBIN (1983): "The central role of the propensity score in observational studies for causal effects," Biometrika, 70, 41-55.

TRIPATHI, G.
(2003): "GMM and empirical likelihood with stratified data," Working Paper, University of Wisconsin.

WOOLDRIDGE, J. (1999): "Asymptotic properties of weighted M-estimators for variable probability samples," Econometrica, 67, 1385–1406.

(2001): "Asymptotic properties of weighted M-estimators for standard stratified samples," Econometric Theory, 17, 451–470.

(2002a): Econometric Analysis of Cross Section and Panel Data, Cambridge, Mass.: MIT Press.

(2002b): "Inverse probability weighted M-estimators for sample selection, attrition and stratification," Portuguese Economic Journal, 1, 117–139.

(2003): "Inverse probability weighted estimation for general missing data problems," Working Paper, Michigan State University, www.msu.edu/~ec/faculty/wooldridge/current%20research/wght2r6.pdf.

(2005): "Inverse probability weighted estimation for general missing data problems," Working Paper, Michigan State University, www.msu.edu/~ec/faculty/wooldridge/current%20research/wght2r7.pdf.

ZELLNER, A. (1970): "Estimation of regression relationships containing unobservable independent variables," International Economic Review, 11, 441–454.

Appendix: Proofs

PROOF OF THEOREM 1.2.1: Proofs are given, e.g., in Theorems 2.6 and 3.4 of Newey and McFadden (1994). Also, see Hansen (1982). Condition (i) is the identification assumption. Conditions (ii) and (iv) are needed for consistency, and conditions (iii)-(v) are needed for asymptotic normality: conditions (iv) and (v) ensure that the objective function in (1.2) and its first derivative, respectively, converge uniformly to their population analogues, and condition (vi) provides for invertibility of a part of the mean-value expansion. Some of the conditions can be relaxed at the expense of complicating the proofs. □

PROOF OF THEOREM 1.2.2: Equations (1.6), (1.8), and (1.9) follow from the standard asymptotic variance derivation for GMM estimation using the optimal weighting matrix (see, e.g., p.
2148 of Newey and McFadden, 1994; Hansen, 1982, Theorems 3.1 and 3.2). Equation (1.7) is obtained similarly, but we separately expand the first order conditions corresponding to (A) and (B).

The TWO-STEP estimator of θ02 minimizes h2(θ2)'C22^{-1}h2(θ2). The first order conditions that the estimator solves are D22'C22^{-1}h2(θ̂2) = 0. Expanding around θ02 gives

θ̂2 − θ02 = −(D22'C22^{-1}D22)^{-1} D22'C22^{-1} h̄2(θ02) + op(N^{-1/2}).   (1.28)

The TWO-STEP estimator of θ01 minimizes h1(θ1, θ̂2)'C11^{-1}h1(θ1, θ̂2). The first order conditions that the estimator solves are D11'C11^{-1}h1(θ̂1, θ̂2) = 0. Expanding around θ01 and using (1.28) gives

θ̂1 − θ01 = −(D11'C11^{-1}D11)^{-1} D11'C11^{-1} h̄1(θ01, θ02)
          + (D11'C11^{-1}D11)^{-1} D11'C11^{-1} D12 (D22'C22^{-1}D22)^{-1} D22'C22^{-1} h̄2(θ02) + op(N^{-1/2}).   (1.29)

On multiplying by √N and combining (1.28)-(1.29), we get

V_TWO-STEP = B C B',   (1.30)

where C is defined in (1.4) and

B = [ B11  B12
       0   B22 ]   (1.31)

with

B11 = −(D11'C11^{-1}D11)^{-1} D11'C11^{-1},
B12 = (D11'C11^{-1}D11)^{-1} D11'C11^{-1} D12 (D22'C22^{-1}D22)^{-1} D22'C22^{-1},   (1.32)
B22 = −(D22'C22^{-1}D22)^{-1} D22'C22^{-1}.  □

PROOF OF THEOREM 1.2.3: Statements 1 and 4 are proved on p. 148 of Qian and Schmidt (1999), where it is shown that there is no gain in efficiency if and only if D11'C11^{-1}C12 = 0. When the original problem is exactly identified (m1 = p1) and D11 is non-singular (by assumption), this holds if and only if C12 = 0. If the original problem is overidentified (m1 > p1), then the condition C12 = 0 is sufficient for no gain in efficiency.

To prove Statement 3, first note that BD = −I, where I is the (p1 + p2)-dimensional identity matrix. Then,

V_TWO-STEP − V_ONE-STEP = BCB' − (D'C^{-1}D)^{-1}   (1.33)
  = BCB' − BD(D'C^{-1}D)^{-1}D'B'
  = BC^{1/2} [I − C^{-1/2}D(D'C^{-1}D)^{-1}D'C^{-1/2}] C^{1/2}B'.

The matrix in brackets is the projection orthogonal to C^{-1/2}D, which is positive semidefinite.

V_ONE-STEP for θ1 is of the form (D11'C^{11}D11 − M12 M22^{-1} M21)^{-1}, where superscripts on C denote the corresponding blocks of C^{-1}, M12 = M21' = D11'C^{11}D12 + D11'C^{12}D22, and M22 is the lower right p2-block of D'C^{-1}D, which is positive semidefinite.
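The identity BD = −I invoked for Statement 3 follows mechanically from the block formulas (1.31)-(1.32); a small numerical sketch confirms it (the dimensions and the randomly drawn C and D blocks here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, m1, m2 = 2, 3, 4, 5  # illustrative block dimensions (over-identified)

# Random full-rank derivative blocks and positive definite weighting blocks
D11 = rng.normal(size=(m1, p1))
D12 = rng.normal(size=(m1, p2))
D22 = rng.normal(size=(m2, p2))
A1 = rng.normal(size=(m1, m1)); C11 = A1 @ A1.T + m1 * np.eye(m1)
A2 = rng.normal(size=(m2, m2)); C22 = A2 @ A2.T + m2 * np.eye(m2)

inv = np.linalg.inv
H1 = inv(D11.T @ inv(C11) @ D11) @ D11.T @ inv(C11)  # (D11'C11^-1 D11)^-1 D11'C11^-1
H2 = inv(D22.T @ inv(C22) @ D22) @ D22.T @ inv(C22)

# Blocks of B as in (1.31)-(1.32)
B11, B12, B22 = -H1, H1 @ D12 @ H2, -H2
B = np.block([[B11, B12], [np.zeros((p2, m1)), B22]])
D = np.block([[D11, D12], [np.zeros((m2, p1)), D22]])

# BD = -I, the key step in the proof of Statement 3
assert np.allclose(B @ D, -np.eye(p1 + p2))
```

The check works for any conformable blocks because H1 D11 = I and H2 D22 = I, so the off-diagonal contributions −H1 D12 and H1 D12 H2 D22 cancel exactly.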
Hence, the inverse of V_ONE-STEP for θ1 minus the inverse of V_KNOW-θ2-JOINT is negative semidefinite. Thus, V_ONE-STEP for θ1 is no smaller than V_KNOW-θ2-JOINT in the positive definite sense, which proves Statement 2. Further, KNOW-θ2-JOINT and ONE-STEP are equally efficient if M12 = 0, and M12 = D11'[C^{11}D12 + C^{12}D22]. This fact, along with the fact that (C^{11})^{-1}C^{12} = −C12 C22^{-1}, implies that if D12 = C12 C22^{-1} D22 then M12 = 0, which proves Statement 10.

Statement 11 can be proved in two parts. First, since M12 = 0, the inverse of V_ONE-STEP for θ1 is simply D11'C^{11}D11, which is generally greater than V_KNOW-θ2^{-1} = D11'C11^{-1}D11 in the positive definite sense, since C^{11} − C11^{-1} is positive semidefinite. This, along with Statement 10, implies that ONE-STEP and KNOW-θ2-JOINT are no less efficient for θ1 than KNOW-θ2. Second, to prove that TWO-STEP is no less efficient for θ1 than KNOW-θ2, note that, by (1.30)-(1.32), V_TWO-STEP for θ1 is equal to B11C11B11' + B12C21B11' + B11C12B12' + B12C22B12'. Also note that B11C11B11' = (D11'C11^{-1}D11)^{-1} and that, under D12 = C12C22^{-1}D22, the symmetric positive semidefinite matrices −B12C21B11' and −B11C12B12' are equal to B12C22B12'. V_TWO-STEP for θ1 therefore reduces to V_KNOW-θ2 minus a positive semidefinite matrix, which completes the second part of the proof.

Statements 7–9 follow from Theorem 1 of Ahn and Schmidt (1995) and the subsequent discussion (pp. 21–22). Statement 5 holds since if D12 = 0 then (1.7) reduces to (D11'C11^{-1}D11)^{-1}, which is equal to (1.8). Statement 6 follows from Statements 4 and 10 and a trivial comparison of the variances in (1.7) and (1.6) under the given conditions. □

PROOF OF THEOREM 1.3.1: Follows trivially from Lemma 1.3.1 and part (ii) of Assumption 1.3.4. □

PROOF OF THEOREM 1.3.2: (a) First, note that, by ignorability and (1.23), E[S·h2(S, z; θ02)'|z] can be written as

E[ S(S − P(z, θ02)) / (P(z, θ02)(1 − P(z, θ02))) | z ] ∇θ2 P(z, θ2)|θ2=θ02 = ∇θ2 P(z, θ2)|θ2=θ02,

since E(S²|z) = E(S|z) and E(S|z) = P(z, θ02). This is nonzero in general. Second, E[g(W; θ01)|z] ≠ 0 in general. Finally,

C12 = E h1(W*; θ01, θ02) h2(S, z; θ02)'
    = E{ E[g(W; θ01)|z] E[S·h2(S, z; θ02)'|z] / P(z, θ02) },  by ignorability   (1.34)
    = E[ E(g(W; θ01)|z) / P(z, θ02) · ∇θ2 P(z, θ2)|θ2=θ02 ],  by the LIE,

which is generally non-zero.

(b) Follows by the (generalized) information matrix equality, where h2(·) is the score, D22 is the expected Hessian, C22 is the expected outer product of the score, D12 is the expected derivative of h1 with respect to θ2 evaluated at θ02, and C12 is the covariance of h1 with the score. One may also write

D12 = E{ ∇θ2 [S/P(z, θ2)]|θ2=θ02 · g(W; θ01)' },  by (1.22)
    = −E[ S g(W; θ01) / P(z, θ02)² · ∇θ2 P(z, θ2)|θ2=θ02 ]
    = −E[ E(S|z) E(g(W; θ01)|z) / P(z, θ02)² · ∇θ2 P(z, θ2)|θ2=θ02 ],  by the LIE
    = −E[ E(g(W; θ01)|z) / P(z, θ02) · ∇θ2 P(z, θ2)|θ2=θ02 ],  since E(S|z) = P(z, θ02)
    = −C12,  by (1.34).  □

PROOF OF THEOREM 1.3.3: (a) By the LIE and exogeneity,

C12 = E[ S/P(z, θ02) · g(Y, X; θ01) u(S, z; θ02)' ]
    = E E[ S/P(z, θ02) · g(Y, X; θ01) u(S, z; θ02)' | z ]
    = E{ E[g(Y, X; θ01)|z] E[ S/P(z, θ02) · u(S, z; θ02)' | z ] } = 0.

(b) By the LIE and exogeneity,

D12 = E[ ∇θ2 (S/P(z, θ2))|θ2=θ02 · g(Y, X; θ01) ]
    = E{ E[g(Y, X; θ01)|z] E[ ∇θ2 (S/P(z, θ2))|θ2=θ02 | z ] } = 0.  □

Essay 2

Robustness, Redundancy, and Validity of Copulas in Likelihood Models

2.1 Introduction

In multivariate economic models, one is often ready to assume marginal distributions but is reluctant to impose a joint distribution. For example, in a panel setting, economists often use a specific likelihood for each cross section separately (e.g., PROBIT or LOGIT) but avoid modelling the joint distribution of the cross-sections over time. Similarly, in selectivity models, it is often desirable to allow for unrestricted dependence between the disturbances in the primary and the selection models, each of which has a well-defined likelihood.
The usual way to handle the indeterminacy of the joint distribution is to assume independence of the marginal distributions and employ quasi-MLE, or to assume joint normality and employ pseudo-MLE (e.g., White, 1982; Gourieroux et al., 1984). In certain cases these approaches result in consistent estimation, while a "sandwich" covariance matrix may be used for valid inference.

However, these approaches suffer from major weaknesses. First, there are important cases when using a pseudo-likelihood does not result in consistent estimates. Greene (2002, Section 17.9) and Wooldridge (2002, Chapter 13) discuss such cases. Second, as we show below, there are estimators that dominate the traditional QMLE under non-independence.

The copula approach used here allows one to replace normality or independence with an alternative assumption about the joint distribution. Clearly, such a replacement is only warranted if the new distribution possesses some useful properties, such as ease of computation, robustness to misspecification, and improved efficiency. Arguably, copulas (or at least some of their families) have such properties in certain econometric models. The copula approach also incorporates multivariate normality and independence as special cases.

The copula approach is relatively new to econometrics. A note by Lee (1983) appears to be the earliest application of this approach in econometrics. Copulas have recently received a lot of attention in the finance literature. They are used to model dependence in financial time series (e.g., Patton, 2001; Breymann et al., 2003) and in risk management applications (e.g., Embrechts et al., 2002, 2003). Bouyé et al. (2000) provide an extensive discussion of the prospects for copulas in finance. Use of copulas in other subfields of econometrics still appears rather limited. Smith (2003) incorporates a copula in selectivity models and provides applications to labor supply and duration of hospitalization; Cameron et al. (2004) use a copula to develop a bivariate count data model with an application to the number of doctor visits.

We start by presenting some basics on copulas. This is done in Section 2.2. Section 2.3 introduces the GMM representation of the likelihood-based models used in the sequel. We show that imposing a joint distribution amounts to adding moment conditions. Imposing moment conditions makes consistency of the resulting estimator conditional on the moment validity. Moreover, there are infinitely many alternative multivariate distributions that can be used. Section 2.4 shows that estimation of means remains robust against copula misspecification as long as the copula used and the true joint density share a symmetry property. A simple simulation employs the most commonly used copula families to study their robustness properties. It is well known that additional moment conditions cannot reduce asymptotic efficiency if properly used. However, sometimes the additional moments do not help even if properly used, i.e., they are redundant in the sense of Breusch et al. (1999). In Section 2.5 we develop conditions for such redundancy. Section 2.6 proposes tests of copula validity that can help in deciding on the copula. Section 2.7 concludes.

2.2 Preliminaries

Definition 2.2.1 (Nelsen, 1999, p. 40) An M-dimensional copula is a function C : [0,1]^M → [0,1] that has the following properties:

i. C(u1, ..., u_{m−1}, 0, u_{m+1}, ..., uM) = 0, m = 1, ..., M.

ii. C(1, ..., 1, um, 1, ..., 1) = um, m = 1, ..., M.

iii. C is M-increasing: for every M-box B = [a1, b1] × [a2, b2] × ··· × [aM, bM] whose 2^M vertices (c1, ..., cM) are in [0,1]^M, the C-volume of B, defined by

V_C(B) = Σ_{i1=1}^{2} ··· Σ_{iM=1}^{2} (−1)^{i1+···+iM} C(c_{1 i1}, ..., c_{M iM}),

where c_{j1} = aj and c_{j2} = bj for all j ∈ {1, ..., M}, satisfies V_C(B) ≥ 0.

Property (iii) implies for M = 2 that C(a1, a2) − C(a1, b2) − C(b1, a2) + C(b1, b2) ≥ 0 for any vectors (a1, a2), (b1, b2) ∈ [0,1]² such that am ≤ bm, m = 1, 2, i.e., C(a, b) is non-decreasing in (a, b).
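The three defining properties are easy to check numerically for a concrete family. The sketch below is a hypothetical illustration (not part of the essay's own simulations) using the Farlie-Gumbel-Morgenstern family, which appears later in the essay; its standard closed form C(u, v; ρ) = uv[1 + ρ(1 − u)(1 − v)] is assumed here:

```python
import numpy as np

def fgm(u, v, rho=0.5):
    # Farlie-Gumbel-Morgenstern copula; a valid copula for rho in [-1, 1]
    return u * v * (1.0 + rho * (1.0 - u) * (1.0 - v))

rng = np.random.default_rng(1)
u = rng.uniform(size=1000)

# Property (i): grounded -- C is zero when any argument is zero
assert np.allclose(fgm(u, 0.0), 0.0) and np.allclose(fgm(0.0, u), 0.0)

# Property (ii): uniform margins -- C(u, 1) = u and C(1, v) = v
assert np.allclose(fgm(u, 1.0), u) and np.allclose(fgm(1.0, u), u)

# Property (iii): 2-increasing -- the C-volume of every box is non-negative
a1, b1 = np.sort(rng.uniform(size=(2, 100000)), axis=0)  # a1 <= b1 elementwise
a2, b2 = np.sort(rng.uniform(size=(2, 100000)), axis=0)
vol = fgm(b1, b2) - fgm(b1, a2) - fgm(a1, b2) + fgm(a1, a2)
assert (vol >= -1e-12).all()
```

For FGM the 2-increasing property holds because the copula density 1 + ρ(1 − 2u)(1 − 2v) is bounded below by 1 − |ρ| ≥ 0.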
It follows from the definition that an M-dimensional copula C is an M-dimensional cdf whose M univariate marginals are uniform on [0,1]. One may also note that for any M-dimensional copula C, M ≥ 3, each m-marginal of C, 2 ≤ m < M, is an m-dimensional copula.

The following well-known theorem establishes the existence of such a function for any joint distribution function of random variables. We restate it without proof.

Theorem 2.2.1 (Sklar, 1959, pp. 229–230) Let H be an M-dimensional distribution function with marginals F1, ..., FM. Then there exists an M-dimensional copula C such that for all xm ∈ R, m = 1, ..., M,

H(x1, ..., xM) = C(F1(x1), ..., FM(xM)).   (2.1)

If F1, ..., FM are continuous, then C is unique. Conversely, if C is an M-dimensional copula and F1, ..., FM are distribution functions, then the function H in (2.1) is an M-dimensional distribution function with marginals F1, ..., FM.

Thus, a copula is a multivariate distribution function that connects two or more marginal distributions to exactly form the joint distribution. A copula thus completely parameterizes the entire dependence structure between two or more random variables. It is important to note that a given joint distribution function H defines a unique set of marginal distribution functions Fm, m = 1, ..., M, whereas given marginal distributions do not determine a unique joint distribution (and the implied copula).

To connect copulas to likelihood-based models, let h and c be the densities of the distribution functions H and C, respectively, and let fm be the densities of the marginal distribution functions Fm, m = 1, ..., M. Then,

h(x1, ..., xM) = ∂^M H(x1, ..., xM)/∂x1···∂xM = ∂^M C(F1(x1), ..., FM(xM))/∂x1···∂xM = c(F1(x1), ..., FM(xM)) Π_{m=1}^{M} fm(xm).   (2.2)

In the bivariate case with marginal densities f1(·; θ) and f2(·; θ) and copula density c(·, ·; ρ), the moment conditions in (2.3) comprise the marginal score conditions (A) and (B) and the copula score conditions

(C) E ∂/∂θ ln c(F1(X1; θ0), F2(X2; θ0); ρ0) = 0,
(D) E ∂/∂ρ ln c(F1(X1; θ0), F2(X2; θ0); ρ0) = 0.

By the WLLN, the sample analogue δ̄(θ0, ρ0) = (1/T) Σ_{t=1}^{T} ∂/∂(θ, ρ) ln k_t →p 0, since (C) and (D) hold in the population. Moreover, by the WLLN, for any misspecified copula for which (2.4) holds, δ̄(θ0, ρ0^k) →p 0. However, for non-robust copulas, the probability limit may be non-zero.
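Sklar's theorem in density form, equation (2.2), can be illustrated for the Gaussian copula with standard Normal margins, where the right-hand side must reproduce the bivariate Normal density exactly. A hypothetical numerical check (the grid and ρ = 0.3 are arbitrary choices):

```python
import numpy as np
from scipy import stats

rho = 0.3
biv = stats.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])

def gaussian_copula_density(u, v):
    # c(u, v; rho) = phi_2(q1, q2; rho) / (phi(q1) * phi(q2)), q_i = Phi^{-1}(u_i)
    q1, q2 = stats.norm.ppf(u), stats.norm.ppf(v)
    return biv.pdf(np.column_stack([q1, q2])) / (stats.norm.pdf(q1) * stats.norm.pdf(q2))

x1 = np.linspace(-2.0, 2.0, 41)
x2 = 0.7 * np.ones_like(x1)              # an arbitrary slice of the plane
h = biv.pdf(np.column_stack([x1, x2]))   # the true joint density
factorized = (gaussian_copula_density(stats.norm.cdf(x1), stats.norm.cdf(x2))
              * stats.norm.pdf(x1) * stats.norm.pdf(x2))
assert np.allclose(h, factorized)        # h = c(F1, F2) * f1 * f2, as in (2.2)
```

The identity holds up to floating-point error at every grid point, since the Gaussian copula is exactly the copula implied by the bivariate Normal with Normal margins.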
In order to be able to compare copulas, we define a common measure of dependence. There are very many such measures (see Nelsen, 1999, Section 5). We pick one that has a simple copula representation.

Definition 2.4.3 For any two continuous random variables U and V whose copula is K, Kendall's τ measure of concordance is given by

τ = 4 ∫∫_{[0,1]²} K(u, v; ρ) dK(u, v; ρ) − 1.   (2.13)

It follows from (2.13) that

τ = 4 ∫∫_{[0,1]²} K(u, v; ρ) k(u, v; ρ) du dv − 1 = 4 E K(U, V; ρ) − 1.   (2.14)

For two random variables, Kendall's τ can be viewed as the probability that "large" ("small") values of one are associated with "large" ("small") values of the other (the probability of concordance) minus the probability that "large" ("small") values of one are associated with "small" ("large") values of the other (the probability of discordance).

Importantly, various copulas cover unequal ranges of dependence as measured by Kendall's τ (see Appendix A). We therefore control for τ in all one-parameter copulas. In the simulation, we use the fact (see, e.g., Kendall, 1949) that for the Normal copula with Normal margins, Pearson's correlation coefficient r is related to τ by

r = sin(πτ/2).   (2.15)

This allows us to derive the value of Kendall's τ that corresponds to the true value of Pearson's correlation coefficient r employed in simulating the joint Normal distribution.

We employ the following procedure:

Step 1. Generate T realizations of (X1, X2)' ~ N( (μ, μ)', [1 r; r 1] ) by generating Z ~ N( (0, 0)', I2 ) and using the Cholesky decomposition.

Step 2. For each realization t, calculate
  u_it(μ) = Φ(X_it − μ), i = 1, 2, where Φ(·) is the Standard Normal c.d.f.;
  k_t(μ, ρ) ≡ k(u_1t(μ), u_2t(μ); ρ);
  δ^μ_t(μ, ρ) ≡ ∂/∂μ ln k_t(μ, ρ) and δ^ρ_t(μ, ρ) ≡ ∂/∂ρ ln k_t(μ, ρ).

Step 3. Calculate the sample averages

δ̄(μ, ρ) ≡ (1/T) Σ_{t=1}^{T} δ_t(μ, ρ).

Step 4. Plot the resultant functions δ̄^μ(μ, ρ) and δ̄^ρ(μ, ρ) over a relevant range of μ and ρ.

Step 5. Evaluate the sample means δ̄^μ and δ̄^ρ and the sample standard errors se(δ̄^μ) = s^μ/√T and se(δ̄^ρ) = s^ρ/√T at the true parameter values μ = μ0 and ρ = ρ0^k, where

s² = Σ_{t=1}^{T} ( δ_t(μ0, ρ0^k) − δ̄(μ0, ρ0^k) )² / (T − 1).

Table 2.1: The true values for Kendall's τ and ρ used in the simulation

  Copula                            ρ0         τ0
  Independence                      —          0
  Logistic                          —          1/3
  Farlie-Gumbel-Morgenstern (FGM)   0.872880   0.193973
  Joe                               1.426845   0.193973
  Ali-Mikhail-Haq (AMH)             0.697058   0.193973
  Clayton                           0.481321   0.193973
  Gumbel                            1.240654   0.193973
  Frank                             1.801160   0.193973
  Normal with Normal margins        0.3        0.193973

The true parameter values in Step 1 are μ0 = 0 and r0 = 0.3. We use (2.15) to calculate the true τ and then we use (2.14) to derive the value of ρ corresponding to the true value of τ for each copula. We consider the independence, Logistic, Farlie-Gumbel-Morgenstern, Joe, Ali-Mikhail-Haq, Clayton, Gumbel, Frank and Normal copulas. For some of these copulas it is possible to obtain an analytical solution for ρ in terms of τ using (2.14) (see Appendix A); otherwise, we use numerical methods to approximate the true value of ρ with the desired accuracy. Note that the independence, Farlie-Gumbel-Morgenstern, Frank and Normal families are radially symmetric. Table 2.1 contains the true values of τ and ρ for the considered families of copulas. We choose r0 = 0.3 because the corresponding τ0 lies within the coverage of all the one-parameter copula families we consider. Note that the two parameter-free copulas, independence and Logistic, imply dependence measures that are different from the true one.

Figures 2.1 through 2.8 of Appendix C contain the plots of δ̄^μ(μ, ρ) and δ̄^ρ(μ, ρ) obtained in Step 4. The sample size used for the plots is 200. According to Figure 2.1, the independence copula is robust: the copula term is identically zero even though the marginal terms are not independent. The copula term for the Logistic copula is zero for a value of μ around 0.33.
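The mapping between τ and the dependence parameters in Table 2.1 can be reproduced in a few lines. This sketch (the sample size and seed are arbitrary; scipy's `kendalltau` is used for the sample estimate) checks relation (2.15) for the Normal copula: (2/π) arcsin(0.3) gives the τ0 = 0.193973 reported in Table 2.1, and the sample Kendall's τ of simulated joint Normal data agrees:

```python
import numpy as np
from scipy import stats

r = 0.3                                   # Pearson correlation used in Step 1
tau_theory = 2.0 / np.pi * np.arcsin(r)   # inverse of (2.15): tau = (2/pi) arcsin(r)
print(round(tau_theory, 6))               # 0.193973, the tau_0 of Table 2.1

rng = np.random.default_rng(42)
n = 20000
z = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n)
tau_hat, _ = stats.kendalltau(z[:, 0], z[:, 1])
assert abs(tau_hat - tau_theory) < 0.02   # sample tau matches the theoretical value
```

The same approach, with (2.14) evaluated by quadrature or simulation, recovers the ρ0 column of Table 2.1 for families without a closed-form τ(ρ).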
Figures 2.2–2.8 illustrate how the one-parameter copulas compare in terms of robustness. Note that all the surfaces appear to intersect the zero plane at around the true values of the parameters, which suggests general robustness. As we show below, however, one cannot accept the hypothesis of zero δ for all copula families. The benchmark for comparisons is the Normal copula (Figure 2.7). Interestingly, the sample analogue of the Normal copula moment (C) is close to zero at the true value of μ for any value of ρ, and at ρ = 0 for any value of μ (panel 6a). The FGM, AMH and Frank families display a similar feature (panels 1a, 3a and 7a). Clearly, when ρ = 0, these four families of copulas reduce to the independence copula, which is known to be robust. When ρ ≠ 0, δ̄^μ is still close to zero at the true μ0. This observation suggests robustness of the FGM, AMH and Frank families. With these copulas, one can use the copula moment (C) with any assumed ρ and obtain a consistent estimate of μ. The other families do not exhibit this advantage.

Of course, the FGM and Frank families of copulas are RS. The observed robustness of these families is clearly a consequence of the theoretical result in the previous section. However, the AMH family is not RS. Why is the AMH copula robust? To answer this question, write the AMH copula as an infinite sum of a geometric sequence:

C(u, v) = uv / [1 − ρ(1 − u)(1 − v)] = uv Σ_{k=0}^{∞} [ρ(1 − u)(1 − v)]^k.   (2.16)

The FGM copula is then the first-order approximation to the AMH family, which explains the similar robustness.

To test the features illustrated in the figures, in Step 5 we calculate δ̄^μ and δ̄^ρ at the true parameter values μ = μ0 = 0 and ρ = ρ0 and evaluate standard errors for these averages. Table 2.2 shows these values along with the estimated Pearson's correlation coefficient r̂ as the sample size grows from 200 to 30,000. The ratio of the sample average to the standard error in parentheses is a test statistic.
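Two of the claims made here for the FGM family can be checked directly: that it is the first-order truncation of the AMH expansion (2.16), and that the sample analogue of the copula moment (C) is centered at zero at the true μ even under a misspecified dependence parameter. A minimal sketch (a finite-difference derivative stands in for the analytic score δ^μ; the sample size, seed, and tolerances are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# --- 1. FGM is the first-order approximation to AMH, as in (2.16) ---
def amh(u, v, rho):
    return u * v / (1.0 - rho * (1.0 - u) * (1.0 - v))

def fgm_cdf(u, v, rho):
    return u * v * (1.0 + rho * (1.0 - u) * (1.0 - v))

u, v = rng.uniform(size=(2, 1000))
rho_small = 0.1  # for small rho the first-order term dominates
assert np.max(np.abs(amh(u, v, rho_small) - fgm_cdf(u, v, rho_small))) < rho_small**2

# --- 2. Sample analogue of copula moment (C) for the FGM copula ---
# FGM log-density: ln k = ln(1 + rho*(1-2u1)(1-2u2)); data are joint Normal,
# rho^k = 0.87288 is the FGM value from Table 2.1 (deliberately misspecified
# relative to the Normal dependence r = 0.3).
T, mu0, r = 200000, 0.0, 0.3
x = rng.multivariate_normal([mu0, mu0], [[1.0, r], [r, 1.0]], size=T)

def mean_log_copula(m, rho):
    u1, u2 = stats.norm.cdf(x[:, 0] - m), stats.norm.cdf(x[:, 1] - m)
    return np.mean(np.log1p(rho * (1 - 2*u1) * (1 - 2*u2)))

eps = 1e-4  # two-sided numerical derivative with respect to mu
delta_mu = (mean_log_copula(mu0 + eps, 0.87288)
            - mean_log_copula(mu0 - eps, 0.87288)) / (2 * eps)
assert abs(delta_mu) < 0.01  # the RS copula score is centered at the true mu
```

The second check illustrates the robustness result: the FGM copula moment has mean zero at μ0 for any assumed ρ because both the copula and the true joint density are radially symmetric.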
Under H0 : δ = 0, it is asymptotically standard Normal. The table entries for the Logistic copula are significantly different from zero. This copula is not RS, and it implies a different measure of dependence (τ = 1/3). This suggests running the same simulation with a common τ = 1/3 for all copulas. However, this value falls outside the coverage range for several one-parameter copula families (see Appendix A), making a general comparison infeasible. As expected, the entries for the Normal copula are insignificantly different from zero for all sample sizes. For the two RS copula families, FGM and Frank, one cannot reject the null either. The AMH family is fairly robust, too. For the Joe, Clayton and Gumbel families, the sample averages are significantly different from zero for at least one sample size, which confirms the observation that these non-RS copulas are not robust in this setting.

[Table 2.2: Robustness statistics δ̄^μ and δ̄^ρ for the selected copulas, with their standard errors, and the estimated Pearson correlation coefficient r̂, for sample sizes from T = 200 to T = 30,000. The body of the table is illegible in the source; only the caption and layout are recoverable.]

Among the one-parameter copula families, several entries in the table stand out.
First, the Frank family sample averages are at least as close to zero as the Normal benchmark for all sample sizes. Second, the FGM family sample averages are closer to zero for T = 200 than the Normal family average; for the other sample sizes, the sample averages of the two families are comparable. Third, the AMH family also performs well in the sense that its sample averages are insignificantly different from zero. In particular, δ̄^μ for this family is not significantly different from zero for all sample sizes. Finally, the Clayton family averages are close to zero for the smaller sample sizes but not for the larger ones.

In the previous section, it was noted that (D') does not generally have to hold in the population for RS copulas. An interesting observation from Table 2.2 is that the sample analogues of (D') are insignificantly different from zero for RS copulas and significantly different from zero for the others. This does not follow from Theorem 2.4.1.

2.5 Redundancy of copula terms

We now turn to the question of redundancy of copula moments. We assume that we either have the true copula moments (2.3C–D) or robust misspecified copula moments (2.4C'–D') that hold at the true value of θ. We would like to study conditions under which using valid copula moments (either the true or the misspecified ones) does not result in efficiency gains in the estimation of θ.

2.5.1 Redundancy with correct copula

We first prove a lemma that reveals the structure of the variance and derivative matrices of the moment functions in (2.3). Recall that correct specification of the copula is assumed in (2.3).

Lemma 2.5.1 Denote the covariance matrix of the moment functions in (2.3) by C and their expected derivative matrix with respect to (θ, ρ) by D. Then,

C = [  A    G   −G    0
       G'   B   −G'   0
      −G'  −G    J    E
       0    0    E'   F ]   (2.17)

and

D = [ −A            0
      −B            0
      G + G' − J   −E
      −E'          −F ],   (2.18)

where A, B, E, F, G, J are matrix-functions of (θ, ρ) defined in Appendix B.

Several important observations immediately follow from the lemma. First, (A) and (B) are uncorrelated with (C) if and only if (A) and (B) are uncorrelated with each other (G = 0). Second, the optimal GMM based on (2.3) is identical to the ML estimation in (2.5), as claimed in Section 2.3. To see this explicitly, note that the optimal GMM on (2.3) does not change if (2.3) is pre-multiplied by a matrix W such that W = D'C^{-1}, if C is nonsingular. But, by Lemma 2.5.1,

W = D'C^{-1} = −[ I  I  I  0
                  0  0  0  I ],

where I denotes the identity matrix of the relevant dimension. Clearly, this reproduces the MLE first order conditions (2.5). Not surprisingly, estimators that use the same first order conditions yield the same asymptotic variance matrices. In particular, for non-singular C, the asymptotic variance matrix of the optimal GMM estimator of (θ, ρ) based on (2.3) can be written as

V_GMM = (D'C^{-1}D)^{-1}.   (2.19)

(We use the standard notation according to which "V is the asymptotic variance of an estimator θ̂" means that "√N(θ̂ − θ0) converges in distribution to N(0, V)." It is implicit that D and C in the asymptotic variance formulas are evaluated at the true values θ0 and ρ0.) By Lemma 2.5.1, this is identical to the asymptotic variance matrix of the MLE estimator of (θ, ρ):

V_MLE = −( [ I I I 0; 0 0 0 I ] D )^{-1} = ( [ I I I 0; 0 0 0 I ] C [ I I I 0; 0 0 0 I ]' )^{-1}.   (2.20)

In contrast to V_GMM, V_MLE is defined even if C is singular. In fact, the last representation in (2.20) involves the outer-product-of-the-score form of the information matrix, while the one before the last involves the expected-Hessian form of the information matrix. Both are non-singular under regularity conditions.

By a similar argument, it follows from Lemma 2.5.1 that the marginal moments (2.7) are not equivalent to the QMLE first order conditions (2.6). To see this explicitly, partition C and D as follows:

C = [ C11  C12
      C21  C22 ],   D = [ D11   0
                          D21  D22 ],   (2.21)

where C11, C12, C21, C22, D11, D21, D22 correspond to the blocks separated by the dotted lines in (2.17)-(2.18).
The optimal GMM based on (2.7) does not change if the moment conditions (2.7) are pre-multiplied by a matrix W11 such that W11 = D11'C11^{-1}, if C11 is nonsingular. Now, using Lemma 2.5.1,

W11 = D11'C11^{-1} = −[ I  I ] − [ −G'  −G ] C11^{-1}.

The last term is what distinguishes the optimal GMM based on the stacked marginal moments (2.7) from the summation (2.6) employed by QMLE. Call the GMM estimator based on (2.7) the Improved QML estimator (IQMLE). Schmidt (2004) shows that correlation between the marginal scores used in the optimal weighting matrix results in efficiency gains over summation and that there are interesting cases when the two estimation methods are equally efficient. A trivial such case is when there is no correlation between the marginal scores, i.e., G = 0. We provide a formal statement and a proof of this relative efficiency result in the following theorem. The logic of the proof will be used again when we compare PMLE and IPMLE.

Theorem 2.5.1 (Schmidt, 2004) Let V_IQMLE and V_QMLE denote the asymptotic variance matrices of the IQMLE and QMLE of θ0, respectively. Then, V_QMLE − V_IQMLE is positive semi-definite.

Proof. Define A = [I I]. Then, (2.6) can be rewritten as (2.7) pre-multiplied by A. Correspondingly, the variance matrix of the moment functions in (2.6) can be expressed as A C11 A', where C11 is the variance matrix of the moment functions in (2.7), defined in (2.21). Similarly, the expected derivative matrix for the moment conditions in (2.6) can be expressed in terms of the relevant matrix for (2.7) as A D11. Then,

V_QMLE = [(A D11)'(A C11 A')^{-1}(A D11)]^{-1},   (2.22)

while

V_IQMLE = [D11'C11^{-1}D11]^{-1}.   (2.23)

But V_QMLE − V_IQMLE is positive semi-definite (PSD) if and only if V_IQMLE^{-1} − V_QMLE^{-1} = D11'C11^{-1}D11 − D11'A'(A C11 A')^{-1}A D11 is PSD. The last expression can be rewritten as

D11'C11^{-1/2} [ I − C11^{1/2}A'(A C11^{1/2}C11^{1/2}A')^{-1}A C11^{1/2} ] C11^{-1/2}D11.

This is PSD because the matrix in brackets is the PSD projection matrix orthogonal to C11^{1/2}A'. □
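The projection argument in the proof can be confirmed numerically: for an arbitrary positive definite C11 and derivative matrix D11, the difference between (2.22) and (2.23) has no negative eigenvalues. A sketch with randomly generated matrices (the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3
# C11: covariance of the stacked marginal scores (2p x 2p, positive definite)
M = rng.normal(size=(2*p, 2*p))
C11 = M @ M.T + 2*p * np.eye(2*p)
D11 = rng.normal(size=(2*p, p))           # expected derivative of the stacked scores
A = np.hstack([np.eye(p), np.eye(p)])     # the summation matrix used by QMLE

inv = np.linalg.inv
V_iqmle = inv(D11.T @ inv(C11) @ D11)                 # optimal weighting, as in (2.23)
AD, ACA = A @ D11, A @ C11 @ A.T
V_qmle = inv(AD.T @ inv(ACA) @ AD)                    # summed scores, as in (2.22)

# V_QMLE - V_IQMLE is positive semi-definite (Theorem 2.5.1)
assert np.all(np.linalg.eigvalsh(V_qmle - V_iqmle) >= -1e-10)
```

Rerunning with different seeds never produces a negative eigenvalue, consistent with the fact that the bracketed matrix in the proof is a projection and hence PSD regardless of the particular C11 and D11.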
Conditions under which the copula moments do not help in terms of efficiency for θ can be derived by comparing V_IQMLE with the upper left p × p block of V_MLE. When C is non-singular, the comparison can equivalently be made with the upper left p × p block of V_GMM.

Breusch et al. (1999) (henceforth, BQSW) developed a very useful toolbox for analyzing redundancy of a set of moment conditions given another set of moment conditions. However, their analysis assumes a nonsingular C. For this reason, we do not employ their results here but compare V_IQMLE with the relevant block of V_MLE directly.

Theorem 2.5.2 V_MLE for θ and V_IQMLE are equal if and only if

J − C21 C11^{-1} C12 − E F^{-1} E' = 0,   (2.24)

where C21 = C12' = [−G' −G].

The cumbersome expression in (2.24) has a simple interpretation in terms of singularity of C. It states that the linear projection of moment condition (C) on moment conditions (A), (B) and (D) is uncorrelated with moment condition (C). More specifically, (2.24) can be rewritten as

J − Ω21 Ω11^{-1} Ω21' = 0,

where

Ω21 = [ −G'  −G  E ],   Ω11 = [ A   G   0
                                G'  B   0
                                0   0   F ],

and the arguments of the moment functions have been suppressed for brevity. In other words, (C) has to be a linear combination of (A), (B) and (D) for the copula information to be redundant in terms of the asymptotic efficiency of estimation of θ. Thus C has to be singular.

Since V_MLE = V_GMM for non-singular C, and V_IQMLE is equal to V_MLE for θ if and only if C is singular, equality of V_IQMLE and V_GMM for θ is impossible unless (C) is a linear combination of (A), (B) and (D).

Corollary 2.5.1 If (C) is a linear combination of (A) and (B) with ρ known, then

1. E = 0;
2. J − C21 C11^{-1} C12 = 0;
3. IQMLE is efficient.

We therefore have two cases when the copula knowledge in (C) and (D) is redundant given the knowledge of the marginals in (A) and (B). One case is when the copula moment (C) is a linear combination of (A) and (B).
The other case is when (C) is not a linear combination of (A) and (B) but is a linear combination of (A), (B) and (D). In both cases, C is singular. Examples at the end of this section illustrate how one can apply the redundancy results in practice.

2.5.2 Redundancy with misspecified copula

Now suppose incorrect but zero-mean copula terms (2.4C') and (2.4D') are used in estimation. When is such knowledge redundant in terms of efficient estimation of θ?

Lemma 2.5.2 Denote the covariance matrix of the moment functions in (2.3) that employ the copula moments (2.4C') and (2.4D') instead of (2.3C) and (2.3D), respectively, by C^k, and their expected derivative matrix with respect to (θ, ρ^k) by D^k. Then,

C^k = [  A    G   −K   −P
         G'   B   −L   −Q
        −K'  −L'   N    V
        −P'  −Q'   V'   W ]

and

D^k = [ −A            0
        −B            0
        K' + L − M   −S
        −S'          −T ],

where A, B, G are as in Lemma 2.5.1 and K, L, M, N, P, Q, S, T, V, W are matrix-functions of (θ, ρ^k) defined in Appendix B.

Lemma 2.5.2 can be used to make the following observation. The optimal GMM estimator using (2.3A-B)-(2.4C'-D') is not identical to the PML estimator. This is in contrast with Lemma 2.5.1, in which MLE coincided with GMM using (2.3A-D) because we had knowledge of the correct copula. More specifically, the optimal GMM estimator based on (2.3A-B)-(2.4C'-D') is unchanged if (2.3A-B)-(2.4C'-D') are pre-multiplied by the matrix W^k = D^k'(C^k)^{-1}, if C^k is non-singular. Using Lemma 2.5.2, it can be shown that

D^k'(C^k)^{-1} = −[ I  I  I  0
                    0  0  0  I ] + Z(C^k)^{-1},

where Z contains G' − K', G − L, N − M', P', Q, V' − S', W − T'. Clearly, Lemma 2.5.2 becomes Lemma 2.5.1 if k = c. In this case, Z = 0, W^k = W, the optimal weighting retrieves (2.5), and PMLE is equivalent to MLE. For k ≠ c, correlation patterns impossible in Lemma 2.5.1 now provide potential efficiency gains over PMLE. We call the GMM estimator using (2.3A-B)-(2.4C'-D') the Improved PML estimator (IPMLE).
Theorem 2.5.3 Let VIPMLE and VPMLE denote the asymptotic variance matri- ces of the IPMLE and PMLE of (00, pg), respectively. Then, VpMLE — VIPMLE is positive semi-definite. Proof. Define II II II 0 0001 Then, (2.8) can be rewritten as (2.3) pre—multiplied by A. Correspondingly, the variance matrix of the moment functions in (2.8) can be expressed as ACll‘lA’. Similarly, the expected derivative matrix for the moment conditions in (2.8) can be expressed as ADl‘. Then, VPMLE = [(AD“)’X1X2-p2(—1(U)."l(v);p) p 6 (—-1,1) 2 r = E arcsinp E (—1, 1) Note: * denotes Archimedean copulas, i.e. copulas generated as C(U, v) = 0 Vt 6 (0,1) is called the generator function. It can be shown (see, e.g., Nelsen, 1999, p.130) that for Archimedean copulas, Kendall’s 1 r=1+4/ 90(t)dt. 0 t 92 Appendix B: Proofs PROOF OF THEOREM 2.4.1: We show that 132593111 lC(F10t1 + X1). F2012 + X2);p”) = 0. where u = 011.112)’, holds for any RS K. “) By the chain rule, 5% 1n k(F1(u1 + x1), F2012 + 1:2); p contains terms of the form 1 X (918071011 + $1). F2012 + $2);pk) [C(F10t1 +$1),F2(u2 +332);pk) 33011 +151) >< film + Ii). (2.31) i = 1, 2. Due ’00 MS 0f (X1.X2) and R5 of K, film + 152') = film - Ii) and k(F1(/11 + $1). F2(/12+$2)) = k(1-F1(#1+I1).1-F2(u2+$2)) = k(51011-561). F2(u2-$2))- So the first term in (2.31) is the same whether evaluated at (1:1, :52) or (-x1, —:r2). Similarly, the last term is the same whether evaluated at 2:,- or —:I:,-. Furthermore, 0k(F1(#1+31).F2(u2+x2);pk) = 8k(1—F1(p1+$1),1_F2(p2+x2);pk) 6172' (Hi+$i) 6(1—Fz‘ (Hi-xi» k = _ak(F1(#1-$1)»F2(u2-$2);p ) MAM-xi) ‘ Thus, 363111 k(F1(u1+:1:1),F2(u2+:r2);pk)= -38;1n k(F1(#l-$1)a 155012-232); Pk)- Denote 9051.332) 5 33; 1nk(F1(u1 + $1), F2(#2 + $2);p“) 41011 + $140 + trap)- From the above, it follows with RS that g(—a:1,—a:2) = —g(.r1,:c2). 93 We thus have EgaglnHFN/ti+X1).F2(#2+X2);pk) L00 L00 9($1,$2)d$1d$2 Loo L00 9( $1. $2)d$1d$2 L900 fooo g(z1,x2)dx1dx2 fooo f0009( (371,332)d$1d$2 fooo f. 
We thus have

$$\begin{aligned} \mathbb{E}\,\frac{\partial}{\partial\mu_i}\ln k\big(F_1(\mu_1+X_1), F_2(\mu_2+X_2); \rho^k\big) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x_1, x_2)\,dx_1\,dx_2 \\ &= \int_0^{\infty}\!\int_0^{\infty} \big[g(x_1, x_2) + g(-x_1, -x_2)\big]\,dx_1\,dx_2 \\ &\quad + \int_0^{\infty}\!\int_{-\infty}^{0} \big[g(x_1, x_2) + g(-x_1, -x_2)\big]\,dx_2\,dx_1 \;=\; 0, \end{aligned}$$

where the second equality splits the plane into its four quadrants and substitutes $(x_1, x_2) \mapsto (-x_1, -x_2)$ in two of them, and the last equality uses $g(-x_1, -x_2) = -g(x_1, x_2)$. $\square$

PROOF OF LEMMA 2.5.1: By the information matrix equality (IME),

$$A \equiv \mathbb{E}\Big\{\frac{\partial}{\partial\theta}\ln f_1(X_1;\theta)\,\frac{\partial}{\partial\theta'}\ln f_1(X_1;\theta)\Big\} = -\mathbb{E}\,\frac{\partial^2}{\partial\theta\,\partial\theta'}\ln f_1(X_1;\theta). \tag{2.32}$$

Similarly for $B$, $F$. By the generalized IME (GIME),

$$E \equiv \mathbb{E}\Big\{\frac{\partial}{\partial\theta}\ln c\big(F_1(X_1;\theta), F_2(X_2;\theta); \rho\big)\,\frac{\partial}{\partial\rho'}\ln c\big(F_1(X_1;\theta), F_2(X_2;\theta); \rho\big)\Big\} = -\mathbb{E}\,\frac{\partial^2}{\partial\theta\,\partial\rho'}\ln c\big(F_1(X_1;\theta), F_2(X_2;\theta); \rho\big) \tag{2.33}$$

and, for $i = 1, 2$,

$$\mathbb{E}\Big\{\frac{\partial}{\partial\theta}\ln f_i(X_i;\theta)\,\frac{\partial}{\partial\theta'}\big[\ln f_1(X_1;\theta) + \ln f_2(X_2;\theta) + \ln c\big(F_1(X_1;\theta), F_2(X_2;\theta); \rho\big)\big]\Big\} = -\mathbb{E}\,\frac{\partial^2}{\partial\theta\,\partial\theta'}\ln f_i(X_i;\theta),$$

which, along with (2.32), implies that

$$G \equiv \mathbb{E}\Big\{\frac{\partial}{\partial\theta}\ln f_1(X_1;\theta)\,\frac{\partial}{\partial\theta'}\ln f_2(X_2;\theta)\Big\} = -\mathbb{E}\Big\{\frac{\partial}{\partial\theta}\ln f_1(X_1;\theta)\,\frac{\partial}{\partial\theta'}\ln c\big(F_1(X_1;\theta), F_2(X_2;\theta); \rho\big)\Big\}.$$

The implied covariance matrix of $Z = (Y', X')'$ in the LISREL model is

$$\Sigma = \begin{bmatrix} \Lambda_y B^{-1}(\Gamma\Phi\Gamma' + \Psi)B^{-1\prime}\Lambda_y' + \Theta_\varepsilon^2 & \Lambda_y B^{-1}\Gamma\Phi\Lambda_x' \\ \Lambda_x\Phi\Gamma'B^{-1\prime}\Lambda_y' & \Lambda_x\Phi\Lambda_x' + \Theta_\delta^2 \end{bmatrix},$$

where $\Phi = \mathbb{E}(\xi\xi')$ and $\Psi = \mathbb{E}(\zeta\zeta')$. If we let $\theta$ denote the vector of all distinct parameters in $\Lambda_y$, $\Lambda_x$, $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\varepsilon^2$, $\Theta_\delta^2$ and let $Z = (Y', X')'$, we obtain the setup of Section 3.2.1.

By imposing appropriate restrictions, the LISREL model reduces to many well-known models (see, e.g., Aigner et al., 1984). For example, equation (3.9) reduces to a FA model if one imposes sufficient restrictions to retain only the upper-left block in the form $\Gamma\Phi\Gamma' + \Psi$. From (3.6)-(3.8), SEM can be obtained by restricting $B = I$, $\Theta_\delta^2 = \Theta_\varepsilon^2 = 0$. To obtain a model for the conditional expectation of $Y|X$, one can restrict $\Lambda_x$ to $I$, $\Theta_\delta^2$ to $0$, and $\Phi$ to the sample covariance matrix of $X$. See Jöreskog (1970) for other special cases. A well-known special case of LISREL, the multiple-indicator multiple-cause (MIMIC) model, is obtained from (3.6)-(3.8) by setting $\Lambda_x = I$, $B = I$ and $\Theta_\delta^2 = 0$ (see, e.g., Jöreskog and Goldberger, 1975).

3.2.3 Estimators

3.2.3.1 Normal (Q)MLE

The normal QML estimator is

$$\hat\theta_{QMLE} = \arg\max_{\theta\in\Theta} \sum_{i=1}^N \ln f(z_i;\theta), \tag{3.10}$$

where

$$f(z;\theta) = \frac{1}{(2\pi)^{q/2}|\Sigma|^{1/2}}\exp\Big(-\frac12 z'\Sigma^{-1}z\Big).$$

It is easy to see that the problem in (3.10) can be equivalently written as

$$\hat\theta_{QMLE} = \arg\min_{\theta\in\Theta} F_{MLE}(\theta),$$

where

$$F_{MLE}(\theta) = \log|\Sigma| + \operatorname{tr}(S\Sigma^{-1}). \tag{3.11}$$

Thus QMLE amounts to finding the value of $\theta$ that minimizes the distance (3.11) between the sample covariance matrix $S$ and the covariance matrix $\Sigma$ implied by the model.
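The discrepancy interpretation of (3.11) can be sketched numerically. The toy structure $\Sigma(\theta) = \theta I_2$ used below is hypothetical, chosen because the minimizer of $F_{MLE}$ then has the closed form $\theta^* = \operatorname{tr}(S)/2$ (set the derivative $2/\theta - \operatorname{tr}(S)/\theta^2$ to zero).

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((500, 2)) * 1.5   # i.i.d. data; true variance is 2.25
S = Z.T @ Z / Z.shape[0]                  # sample covariance (mean known zero)

def F_mle(theta, S):
    """F(theta) = log|Sigma(theta)| + tr(S Sigma(theta)^{-1}), Sigma = theta*I."""
    Sigma = theta * np.eye(2)
    return np.log(np.linalg.det(Sigma)) + np.trace(S @ np.linalg.inv(Sigma))

# Crude grid minimization of the discrepancy.
grid = np.linspace(0.5, 5.0, 2001)
theta_hat = grid[np.argmin([F_mle(t, S) for t in grid])]

assert abs(theta_hat - np.trace(S) / 2) < 0.01  # matches the analytic minimizer
```

In practice one would minimize (3.11) with a derivative-based optimizer rather than a grid, but the fitted value agreeing with $\operatorname{tr}(S)/2$ confirms the distance-minimization reading of the QMLE.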
It is a standard result (see, e.g., Chamberlain, 1984, p. 1289) that, under Assumption 3.2.1, the normal QMLE of $\theta_0$ is consistent and asymptotically normal.

3.2.3.2 GMM

The optimal GMM estimator of $\theta_0$ is based on the distinct elements of (3.1), i.e. on the moment conditions

$$\mathbb{E}[m(Z_i;\theta_0)] = 0, \tag{3.12}$$

where $m(Z_i;\theta) = \operatorname{vech}(S_i) - \operatorname{vech}(\Sigma)$ and $\operatorname{vech}$ denotes vertical vectorization of the lower triangle of a matrix. Thus $m$ is a $\frac12 q(q+1)$-vector. The optimal GMM estimator of $\theta_0$ is obtained as the solution to the following problem:

$$\hat\theta_{GMM} = \arg\min_{\theta\in\Theta} F_{GMM}(\theta),$$

where

$$F_{GMM}(\theta) = m_N(\theta)'\,\hat W\,m_N(\theta), \tag{3.13}$$

$$m_N(\theta) = \frac1N\sum_{i=1}^N m(Z_i;\theta) = \operatorname{vech}(S) - \operatorname{vech}(\Sigma),$$

and $\hat W$ is the appropriate (optimal) weighting matrix. The optimal weighting matrix is

$$W_0 = \big\{\mathbb{E}[m(Z_i;\theta_0)m(Z_i;\theta_0)']\big\}^{-1}. \tag{3.14}$$

But in (3.13), one would typically use the following consistent estimator of $W_0$ based on a preliminary consistent estimate $\tilde\theta$ of $\theta_0$:

$$\hat W = \Big[\frac1N\sum_{i=1}^N m(z_i;\tilde\theta)\,m(z_i;\tilde\theta)'\Big]^{-1}.$$

Note that there is a connection between $W_0$ in (3.14) and $\Lambda$ in (3.4). To show the connection we need to define matrices that transform $\operatorname{vech}$ into $\operatorname{vec}$ and vice versa. Magnus and Neudecker (1988, p. 49) show that, for a symmetric $k \times k$ matrix $A$, there exists a unique $k^2 \times \frac12 k(k+1)$ duplication matrix $H_k$ such that $H_k \operatorname{vech}(A) = \operatorname{vec}(A)$. Thus $H_k$ transforms $\operatorname{vech}$ into $\operatorname{vec}$, while the Moore-Penrose inverse of $H_k$, $H_k^+ = (H_k'H_k)^{-1}H_k'$, transforms $\operatorname{vec}$ into $\operatorname{vech}$. The matrices $H_k$ and $H_k^+$ have the following properties:

(i) $H_k^+ H_k = I_{\frac12 k(k+1)}$;
(ii) $K_{k^2} H_k = H_k$, where $K_{k^2}$ is the commutation matrix defined above;
(iii) $H_k H_k^+ = \frac12(I_{k^2} + K_{k^2})$;
(iv) $(I_{k^2} + K_{k^2})H_k = 2H_k$ and $H_k^+(I_{k^2} + K_{k^2}) = 2H_k^+$.

Thus, omitting the dimensionality subscript, we can write

$$\Lambda = V[\operatorname{vec}(S_i)] = V[H\operatorname{vech}(S_i)] = H\,V[\operatorname{vech}(S_i)]\,H'.$$

But $V[\operatorname{vech}(S_i)] = \mathbb{E}[m(Z_i;\theta_0)m(Z_i;\theta_0)']$. We can therefore write the optimal weighting matrix in (3.14) as $[H^+\Lambda_0 H^{+\prime}]^{-1}$.
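The duplication- and commutation-matrix identities above are easy to verify numerically. The sketch below constructs $H_k$ and $K_{k^2}$ from their definitions for $k = 3$ (column-major $\operatorname{vec}$, lower-triangle $\operatorname{vech}$) and checks properties (i)-(iv); it is a generic illustration, not code from the thesis.

```python
import numpy as np

k = 3
# Duplication matrix H: H @ vech(A) = vec(A) for symmetric A.
H = np.zeros((k * k, k * (k + 1) // 2))
col = 0
for j in range(k):
    for i in range(j, k):          # vech stacks the lower triangle by column
        H[j * k + i, col] = 1.0    # position of A[i, j] in column-major vec(A)
        H[i * k + j, col] = 1.0    # position of A[j, i] in column-major vec(A)
        col += 1
# Commutation matrix K: K @ vec(A) = vec(A').
K = np.zeros((k * k, k * k))
for i in range(k):
    for j in range(k):
        K[i * k + j, j * k + i] = 1.0

Hplus = np.linalg.inv(H.T @ H) @ H.T      # Moore-Penrose inverse of H
I2 = np.eye(k * k)

A = np.arange(1.0, k * k + 1).reshape(k, k)
A = A + A.T                                # arbitrary symmetric matrix
vech_A = np.concatenate([A[j:, j] for j in range(k)])

assert np.allclose(H @ vech_A, A.flatten(order="F"))     # H vech(A) = vec(A)
assert np.allclose(Hplus @ H, np.eye(k * (k + 1) // 2))  # property (i)
assert np.allclose(K @ H, H)                             # property (ii)
assert np.allclose(H @ Hplus, 0.5 * (I2 + K))            # property (iii)
assert np.allclose((I2 + K) @ H, 2 * H)                  # property (iv)
assert np.allclose(Hplus @ (I2 + K), 2 * Hplus)          # property (iv)
```

Property (iii) is the statement that $HH^+$ is the orthogonal projector onto the subspace of vectorized symmetric matrices, which is why it equals $\frac12(I + K)$.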
It is easy to verify that, under Assumption 3.2.1 and with $\hat W \stackrel{p}{\to} W_0$, the standard conditions for consistency and asymptotic normality of the GMM estimator of $\theta_0$ hold (see, e.g., Newey and McFadden, 1994, Theorems 2.6 and 3.4).

3.2.3.3 EL

The EL estimator of $\theta_0$ is obtained as follows:

$$\hat\theta_{EL} = \arg\max_{\theta\in\Theta} \sum_{i=1}^N \ln\pi_i$$

subject to

$$\sum_{i=1}^N \pi_i\,m(z_i;\theta) = 0 \qquad\text{and}\qquad \sum_{i=1}^N \pi_i = 1.$$

It can also be shown that Assumption 3.2.1 is sufficiently strong to satisfy the conditions for consistency and asymptotic normality of $\hat\theta_{EL}$ (see, e.g., Kitamura, 1997; Owen, 2001).

3.3 First order analysis

3.3.1 The first order conditions

Let $G(\theta)$ denote the Jacobian matrix of the moment functions in (3.12). Then

$$G \equiv G(\theta) = \frac{\partial m(z_i;\theta)}{\partial\theta'} = -\frac{\partial \operatorname{vech}(\Sigma)}{\partial\theta'}.$$

The following lemmas are used in the derivation of the main results of the paper. They are well known and thus given without proof (see, e.g., Chamberlain, 1984; Hansen, 1982; Qin and Lawless, 1994, for some relevant proofs).

Lemma 3.3.1 Under Assumption 3.2.1, the first order condition for $\hat\theta_{QMLE}$ is

$$G'H'(\Sigma\otimes\Sigma)^{-1}H\big[\operatorname{vech}(S) - \operatorname{vech}(\Sigma)\big] = 0. \tag{3.15}$$

Lemma 3.3.2 Under Assumption 3.2.1, the first order condition for $\hat\theta_{GMM}$ is

$$G'\hat W\big[\operatorname{vech}(S) - \operatorname{vech}(\Sigma)\big] = 0. \tag{3.16}$$

Lemma 3.3.3 Under Assumption 3.2.1, the first order condition for $\hat\theta_{EL}$ is

$$G'\Big[\sum_{i=1}^N \pi_i m_i m_i'\Big]^{-1}\big[\operatorname{vech}(S) - \operatorname{vech}(\Sigma)\big] = 0, \tag{3.17}$$

where $m_i = m(Z_i;\theta)$.

In Section 3.4, we will use an alternative way of writing the first order conditions that circumvents the need to operate with the inverse. Define $\lambda = -[\Sigma(\theta)\otimes\Sigma(\theta)]^{-1}H m_N(\theta)$. Then the QMLE first order condition can be written as

$$s_N(\beta) = -\begin{bmatrix} G(\theta)'H'\lambda \\ H m_N(\theta) + [\Sigma(\theta)\otimes\Sigma(\theta)]\lambda \end{bmatrix} = 0,$$

and we now have a $p+q^2$-vector of parameters $\beta = (\theta', \lambda')'$. A similar representation of the GMM and EL first order conditions was used, for example, by Newey and Smith (2004).

It is clear from (3.15)-(3.17) that the only thing that distinguishes the three estimators is the way in which the empirical moments $m_N(\theta)$ are weighted.
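For a fixed $\theta$, the inner EL maximization over the $\pi_i$ has the well-known closed form $\pi_i = 1/[N(1 + \lambda' m_i)]$, where the multiplier $\lambda$ solves $\sum_i m_i/(1 + \lambda' m_i) = 0$. The sketch below solves this for a scalar moment $m_i = z_i - \theta$ (a hypothetical mean restriction, not the covariance-structure moments of this chapter), using bisection on the dual equation so that every $1 + \lambda m_i$ stays positive.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal(200) + 0.3
theta = 0.0                 # trial parameter value at which to form the weights
m = z - theta               # scalar moment function m_i = z_i - theta

# Solve sum(m_i / (1 + lam * m_i)) = 0 by bisection; the bracket keeps
# 1 + lam * m_i > 0 for every observation (assumes m takes both signs).
lo, hi = -0.99 / m.max(), -0.99 / m.min()
for _ in range(200):
    lam = 0.5 * (lo + hi)
    if np.sum(m / (1 + lam * m)) > 0:
        lo = lam
    else:
        hi = lam

pi = 1.0 / (len(m) * (1 + lam * m))   # implied EL probabilities

assert abs(pi.sum() - 1) < 1e-6       # weights sum to one
assert abs((pi * m).sum()) < 1e-6     # weighted moment restriction holds
assert (pi > 0).all()
```

The outer maximization over $\theta$ then profiles this inner problem, which is the computational burden alluded to in the concluding remarks.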
One way to compare the first order variances of GMM and QMLE is to note that $\hat\theta_{QMLE}$ solves a GMM problem that employs the suboptimal weighting matrix $H'(\Sigma\otimes\Sigma)^{-1}H$ and is therefore weakly inferior to $\hat\theta_{GMM}$ in terms of first order asymptotic efficiency. However, that argument cannot be used to derive the equal efficiency condition.

3.3.2 Relative efficiency to the first order

Theorem 3.3.1 Suppose Assumption 3.2.1 holds. Let $V$ denote the first order asymptotic variance matrix of the relevant estimator, i.e. $V = \operatorname{Avar}[N^{1/2}(\hat\theta - \theta_0)]$. Then,

$$V_{QMLE} = \big[G_0'H'(\Sigma_0\otimes\Sigma_0)^{-1}HG_0\big]^{-1}\,G_0'H'(\Sigma_0\otimes\Sigma_0)^{-1}\Lambda_0(\Sigma_0\otimes\Sigma_0)^{-1}HG_0\,\big[G_0'H'(\Sigma_0\otimes\Sigma_0)^{-1}HG_0\big]^{-1}, \tag{3.18}$$

$$V_{GMM} = V_{EL} = \big[G_0'(H^+\Lambda_0 H^{+\prime})^{-1}G_0\big]^{-1}.$$

Theorem 3.3.2 Suppose Assumption 3.2.1 holds. Then $V_{QMLE} = V_{GMM} = V_{EL}$ if the following equivalent conditions hold:

(i) $\Lambda_0(\Sigma_0\otimes\Sigma_0)^{-1}HG_0 = (I + K)HG_0$;
(ii) $H^+\Lambda_0(\Sigma_0\otimes\Sigma_0)^{-1}HG_0 = 2G_0$.

Proof. Since $S_i$ is symmetric, $K\Lambda_0 = \Lambda_0 K = \Lambda_0$. If (i) holds, pre-multiplying by $H^+$ and using the properties of $H^+$ gives $H^+\Lambda_0(\Sigma_0\otimes\Sigma_0)^{-1}HG_0 = H^+(I+K)HG_0 = 2H^+HG_0 = 2G_0$, which is (ii); conversely, pre-multiplying (ii) by $H$ and using $HH^+ = \frac12(I+K)$ together with $\frac12(I+K)\Lambda_0 = \Lambda_0$ gives (i). Under (i), $(\Sigma_0\otimes\Sigma_0)^{-1}\Lambda_0(\Sigma_0\otimes\Sigma_0)^{-1}HG_0 = 2(\Sigma_0\otimes\Sigma_0)^{-1}HG_0$, so (3.18) collapses to $V_{QMLE} = 2[G_0'H'(\Sigma_0\otimes\Sigma_0)^{-1}HG_0]^{-1}$. Under (ii),

$$(H^+\Lambda_0 H^{+\prime})\,H'(\Sigma_0\otimes\Sigma_0)^{-1}HG_0 = H^+\Lambda_0(\Sigma_0\otimes\Sigma_0)^{-1}\tfrac12(I+K)HG_0 = H^+\Lambda_0(\Sigma_0\otimes\Sigma_0)^{-1}HG_0 = 2G_0,$$

so that $V_{GMM} = [G_0'(H^+\Lambda_0H^{+\prime})^{-1}G_0]^{-1} = 2[G_0'H'(\Sigma_0\otimes\Sigma_0)^{-1}HG_0]^{-1} = V_{QMLE}$. This proves both (i) and (ii). $\square$

Theorem 3.3.2 is novel in that it states the first order efficiency properties of QMLE, GMM and EL explicitly in terms of the fourth moments $\Lambda$ of the distribution. It is clear from the theorem that GMM and EL dominate QMLE because they make efficient use of the second moment information without imposing restrictions on the fourth moments. Ahn and Schmidt (1995, Appendix 2) showed that the GMM estimator of covariance structures reaches the semiparametric efficiency bound of Newey (1990). Theorem 3.3.2 provides an explicit expression for the gain attained by GMM over QMLE.

Not surprisingly, the conditions of Theorem 3.3.2 hold for the multivariate normal distribution. Using (3.5), one can write

$$H^+\Lambda_0(\Sigma_0\otimes\Sigma_0)^{-1}HG_0 = H^+(I + K)HG_0 = 2H^+HG_0 = 2G_0,$$

so condition (ii) trivially holds. However, there may conceivably exist other distributions that satisfy the equal first order asymptotic efficiency conditions of Theorem 3.3.2. We leave further exploration of this point for future work.

3.4 Second order analysis

3.4.1 Stochastic expansions to the second order

Higher order stochastic expansions are based on the Taylor approximation of the first order conditions at the true value.
The expansions have the following form:

$$\sqrt N(\hat\beta - \beta_0) = \mu + \frac{\tau}{\sqrt N} + O_p(N^{-1}), \tag{3.20}$$

where $\mu$ and $\tau$ are $O_p(1)$ random vectors. It is well known that the first order bias can be obtained by taking the expectation of the first term. Since QMLE, GMM, and EL are $\sqrt N$-consistent, their first order bias is zero. Similarly, the first order variances can be obtained as the expectation of the outer product of the first term. The second order bias is based on the expectation of the first two terms in (3.20). Alternatively, the second order bias can be obtained using the Edgeworth approximation to the distribution as in Rothenberg (1984) and McCullagh (1987).

General expressions for $\mu$ and $\tau$ for extremum and minimum distance estimators with many examples can be found in Rilstone et al. (1996), Bao and Ullah (2003), Ullah (2004) and Kim (2005). Specialized expressions for GMM and (generalized) EL can be found in Newey et al. (2003) and Newey and Smith (2004). Derivation of higher order stochastic expansions involves higher order derivatives of the objective functions. Rilstone et al. (1996) use a recursive definition of derivatives which is useful in general settings. In our derivation we follow Newey and Smith (2004) in using the usual definition because we do not go to orders higher than two and because we wish to compare the QMLE bias to the GMM and EL bias expressions they derive.

Define

$$s_i(\beta) = -\begin{bmatrix} G'H'\lambda \\ Hm_i + (\Sigma\otimes\Sigma)\lambda \end{bmatrix}, \qquad M_j = \mathbb{E}\,\frac{\partial^2 s_i(\beta_0)}{\partial\beta'\,\partial\beta_j}, \quad j = 1,\dots,p+q^2, \quad\text{where } \beta_0 = (\theta_0', 0')',$$

$$R = \big[G'H'(\Sigma\otimes\Sigma)^{-1}HG\big]^{-1}, \qquad Q = RG'H'(\Sigma\otimes\Sigma)^{-1}, \qquad P = (\Sigma\otimes\Sigma)^{-1} - (\Sigma\otimes\Sigma)^{-1}HGQ.$$

Theorem 3.4.1 Under Assumption 3.2.1, the estimator $\hat\beta_{QMLE}$ satisfies (3.20) with

$$\mu = \begin{bmatrix} Q_0 \\ P_0 \end{bmatrix} H\,\frac{1}{\sqrt N}\sum_{i=1}^N \big[\operatorname{vech}(S_i) - \operatorname{vech}(\Sigma_0)\big], \tag{3.21}$$

$$\tau = \frac12\begin{bmatrix} -R_0 & Q_0 \\ Q_0' & P_0 \end{bmatrix}\sum_{j=1}^{p+q^2} \mu_j M_j \mu,$$

where $\mu_j$ is the $j$-th element of $\mu$.

Proof. See Appendix B.
$\square$

Note that $\mathbb{E}\mu = 0$, and the first order variance of $\hat\beta_{QMLE}$ based on (3.21) can be written as

$$\mathbb{E}\mu\mu' = \begin{bmatrix} Q_0 \\ P_0 \end{bmatrix} H\,\mathbb{E}\big[m(Z_i;\theta_0)m(Z_i;\theta_0)'\big]\,H'\begin{bmatrix} Q_0 \\ P_0 \end{bmatrix}' = \begin{bmatrix} Q_0\Lambda_0 Q_0' & Q_0\Lambda_0 P_0' \\ P_0\Lambda_0 Q_0' & P_0\Lambda_0 P_0' \end{bmatrix}, \tag{3.22}$$

where the upper left $p \times p$ block of (3.22) represents the first order asymptotic variance of $\hat\theta_{QMLE}$ in (3.18).

Interestingly, the matrix in (3.22) is not in general block diagonal, unlike its EL and GMM analogues (see, e.g., Qin and Lawless, 1994, Theorem 1). However, in the case of multivariate normality, the blocks of (3.22) can be simplified as follows:

$$Q\Lambda Q' = 2R, \qquad Q\Lambda P' = 0, \qquad P\Lambda P' = (I + K)P. \tag{3.23}$$

Thus $\hat\theta_{QMLE}$ and $\hat\lambda_{QMLE}$ are in this case asymptotically uncorrelated.

3.4.2 Second order bias of QMLE

Let $B$ denote the second order bias of the relevant estimator. Using (3.20), the bias can be written in terms of the expected value of $\tau$ as $B = \mathbb{E}\tau/N$. Thus, an explicit form of the QMLE bias contains $\mathbb{E}\mu_j M_j\mu$, $j = 1,\dots,p+q^2$. But $M_j$ can be written as

$$M_j = \mathbb{E}\,\frac{\partial^2}{\partial\beta'\,\partial\beta_j}\,s_i(\beta)\bigg|_{\theta=\theta_0,\,\lambda=0} = \begin{cases} -\begin{bmatrix} 0 & G_j'H' \\ HG_j & \Omega_j \end{bmatrix}, & j = 1,\dots,p, \\[2ex] -\begin{bmatrix} G_{0j} & 0 \\ \Omega_{0j} & 0 \end{bmatrix}, & j = 1,\dots,q^2, \end{cases}$$

where $G_j = \frac{\partial G}{\partial\theta_j}$, $G_{0j} = \frac{\partial}{\partial\theta'}[G_0'H']_{\cdot j}$, $\Omega_j = \frac{\partial}{\partial\theta_j}(\Sigma_0\otimes\Sigma_0)$ and $\Omega_{0j} = \frac{\partial}{\partial\theta'}[\Sigma_0\otimes\Sigma_0]_{\cdot j}$. Therefore $M_j$ is non-random and we can write

$$\mathbb{E}\,\mu_j M_j \mu = \begin{cases} -\begin{bmatrix} 0 & G_j'H' \\ HG_j & \Omega_j \end{bmatrix}\mathbb{E}\mu\mu'\,e_j, & j = 1,\dots,p, \\[2ex] -\begin{bmatrix} G_{0j} & 0 \\ \Omega_{0j} & 0 \end{bmatrix}\mathbb{E}\mu\mu'\,e_{p+j}, & j = 1,\dots,q^2, \end{cases} \tag{3.24}$$

where $e_k$ is a $p+q^2$-vector of zeros with the $k$-th element equal to 1. Substituting (3.22) into (3.24) and simplifying yields the result of the following theorem.

Theorem 3.4.2 Under Assumption 3.2.1, the second order bias of $\hat\beta_{QMLE}$ can be written as follows:

$$B_{QMLE} = -\frac{1}{2N}\begin{bmatrix} -R_0 & Q_0 \\ Q_0' & P_0 \end{bmatrix}\left(\sum_{j=1}^{p}\begin{bmatrix} 0 & G_j'H' \\ HG_j & \Omega_j \end{bmatrix}\begin{bmatrix} Q_0\Lambda_0 Q_0' \\ P_0\Lambda_0 Q_0' \end{bmatrix}e_j + \sum_{j=1}^{q^2}\begin{bmatrix} G_{0j} \\ \Omega_{0j} \end{bmatrix}Q_0\Lambda_0 P_0'\,e_j\right), \tag{3.25}$$

where $e_k$ is the zero vector of relevant dimension in which the $k$-th element is 1.

McCullagh (1987) and Linton (1997) give expressions for the second order bias of QMLE in terms of cumulants; we use the higher-moment representation to enable comparison with the second order biases derived in Newey and Smith (2004).
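The normal-case simplifications in (3.23) can be checked numerically. The sketch below uses a hypothetical two-parameter structure $\Sigma(\theta) = \operatorname{diag}(\theta_1, \theta_2)$ with $q = 2$ and the normal fourth-moment matrix $\Lambda_0 = (I + K)(\Sigma_0\otimes\Sigma_0)$; only the identities themselves come from the text.

```python
import numpy as np

q = 2
Sigma = np.diag([2.0, 0.5])
Omega = np.kron(Sigma, Sigma)              # Sigma_0 x Sigma_0

# Duplication matrix H (vech -> vec) and commutation matrix K for q = 2.
H = np.zeros((q * q, q * (q + 1) // 2))
c = 0
for j in range(q):
    for i in range(j, q):
        H[j * q + i, c] = H[i * q + j, c] = 1.0
        c += 1
K = np.zeros((q * q, q * q))
for i in range(q):
    for j in range(q):
        K[i * q + j, j * q + i] = 1.0

# G = -d vech(Sigma)/d theta' for Sigma(theta) = diag(theta_1, theta_2),
# with vech ordering (s11, s21, s22).
G = -np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

Oinv = np.linalg.inv(Omega)
R = np.linalg.inv(G.T @ H.T @ Oinv @ H @ G)
Q = R @ G.T @ H.T @ Oinv
P = Oinv - Oinv @ H @ G @ Q
Lam = (np.eye(q * q) + K) @ Omega          # fourth moments under normality

assert np.allclose(Q @ Lam @ Q.T, 2 * R)                     # Q Lam Q' = 2R
assert np.allclose(Q @ Lam @ P.T, 0)                         # Q Lam P' = 0
assert np.allclose(P @ Lam @ P.T, (np.eye(q * q) + K) @ P)   # P Lam P' = (I+K)P
```

The zero off-diagonal block confirms that, under normality, $\hat\theta_{QMLE}$ and $\hat\lambda_{QMLE}$ are asymptotically uncorrelated.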
Based on (3.23), the following simplification applies in the multivariate normal case:

$$B_{QMLE} = -\frac{1}{2N}\begin{bmatrix} -R_0 & Q_0 \\ Q_0' & P_0 \end{bmatrix}\sum_{j=1}^{p}\begin{bmatrix} 0 & G_j'H' \\ HG_j & \Omega_j \end{bmatrix}\begin{bmatrix} 2R_0 \\ 0 \end{bmatrix}e_j = -\frac{1}{2N}\begin{bmatrix} -R_0 & Q_0 \\ Q_0' & P_0 \end{bmatrix}\begin{bmatrix} 0 \\ 2\sum_{j=1}^{p}HG_jR_0e_j \end{bmatrix} = -\frac1N\begin{bmatrix} Q_0H\sum_{j=1}^{p}G_jR_0e_j \\ P_0H\sum_{j=1}^{p}G_jR_0e_j \end{bmatrix}. \tag{3.26}$$

3.4.3 Comparison to GMM and EL

Newey and Smith's (2004, Theorems 4.1 and 4.6) second order biases of the GMM and EL estimators of $\theta_0$ are, in our notation,

$$B_{EL}(\theta) = -\frac{1}{2N}\sum_{j=1}^{p} Q_{EL}\,G_j\,R_{EL}\,e_j, \tag{3.27}$$

$$B_{GMM}(\theta) = B_{EL} + \frac1N Q_{EL}\,\zeta_0,$$

where

$$Q_{EL} = R_{EL}G'\big[\mathbb{E}m_im_i'\big]^{-1}, \qquad R_{EL} = \big(G'[\mathbb{E}m_im_i']^{-1}G\big)^{-1}, \qquad \zeta_0 \equiv \mathbb{E}\big[m_im_i'Pm_i\big].$$

It is not clear how these compare to $B_{QMLE}(\theta)$ in general. However, when $Z$ is multivariate normal, it is easy to show that the upper block of $B_{QMLE}$ is equal to (3.27) since, under normality,

$$R_{EL} = \big\{G'[2H^+(\Sigma\otimes\Sigma)H^{+\prime}]^{-1}G\big\}^{-1} = 2\big[G'H'(\Sigma\otimes\Sigma)^{-1}HG\big]^{-1} = 2R,$$

$$Q_{EL} = R_{EL}G'\big[2H^+(\Sigma\otimes\Sigma)H^{+\prime}\big]^{-1} = RG'H'(\Sigma\otimes\Sigma)^{-1}H = QH.$$

3.5 Concluding remarks

The paper examined the estimation methods available for covariance structure models in terms of their first and second order asymptotic properties. The results suggest the following strategy for estimating models of covariance structure.

First, if we have large samples, so that the first order asymptotic results can be applied, we should prefer GMM or EL to quasi-MLE. Due to the increased computational difficulty of EL, the GMM estimator would be preferable. If efficiency is not an issue and we are ready to sacrifice efficiency for a simpler and yet consistent estimation technique, we may prefer the traditional normal QMLE approach.

Second, if we have small samples, EL would be the preferred method of estimation. If the data are normal, normal QMLE will have the same second order bias as EL. The bias can be estimated using (3.27) and the bias-adjusted estimator can be constructed. If the data are not normal and we still use the QMLE, construction of the bias-adjusted estimator may be more complicated but is still possible using (3.25).
Interesting related questions are how different the alternative estimates are in applications and whether the equal efficiency and equal bias results can be shown for other distributions.

Bibliography

AHN, S. C. AND P. SCHMIDT (1995): "Efficient estimation of models for dynamic panel data," Journal of Econometrics, 68, 5–27.

AIGNER, D. J., C. HSIAO, A. KAPTEYN, AND T. WANSBEEK (1984): "Latent variable models in econometrics," in Handbook of Econometrics, ed. by Z. Griliches and M. D. Intriligator, vol. II, 1323–1393.

BAO, Y. AND A. ULLAH (2003): "The Second-Order Bias and Mean Squared Error of Estimators in Time Series Models," Working Paper, University of California, Riverside, http://www.economics.ucr.edu/papers/papers03/03-08.pdf.

CHAMBERLAIN, G. (1982): "Multivariate regression models for panel data," Journal of Econometrics, 18, 5–46.

——— (1984): "Panel data," in Handbook of Econometrics, ed. by Z. Griliches and M. D. Intriligator, vol. II, 1248–1313.

HANSEN, L. (1982): "Large sample properties of generalized method of moments estimators," Econometrica, 50, 1029–1054.

JÖRESKOG, K. G. (1970): "A general method for analysis of covariance structures," Biometrika, 57, 239–251.

JÖRESKOG, K. G. AND A. S. GOLDBERGER (1975): "Estimation of a model with multiple indicators and multiple causes of a single latent variable," Journal of the American Statistical Association, 70, 631–639.

JÖRESKOG, K. G. AND D. SÖRBOM (1977): "Statistical models and methods for analysis of longitudinal data," in Latent Variables in Socio-Economic Models, ed. by D. J. Aigner and A. S. Goldberger, Amsterdam: North-Holland Publishing Company, Contributions to Economic Analysis, 285–325.

——— (1996): LISREL 8 User's Reference Guide, SSI Scientific Software.

KIM, K.-I. (2005): "Higher order bias correcting moment equation for M-estimation," UCLA Working Paper.

KITAMURA, Y. (1997): "Empirical likelihood methods with weakly dependent processes," The Annals of Statistics, 25, 2084–2102.

LINTON, O.
(1997): "An asymptotic expansion in the GARCH(1,1) model," Econometric Theory, 13, 558–581.

MAGNUS, J. R. AND H. NEUDECKER (1988): Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley Series in Probability and Statistics, Chichester: John Wiley and Sons.

MCCULLAGH, P. (1987): Tensor Methods in Statistics, Monographs on Statistics and Applied Probability, London: Chapman and Hall.

NEWEY, W. (1990): "Semiparametric efficiency bounds," Journal of Applied Econometrics, 5, 99–135.

NEWEY, W. AND D. MCFADDEN (1994): "Large sample estimation and hypothesis testing," in Handbook of Econometrics, ed. by R. Engle and D. McFadden, vol. IV, 2113–2241.

NEWEY, W. K., J. S. RAMALHO, AND R. J. SMITH (2003): "Asymptotic bias for GMM and GEL estimators with estimated nuisance parameters," CEMMAP working paper CWP05/03.

NEWEY, W. K. AND R. J. SMITH (2004): "Higher order properties of GMM and Generalized Empirical Likelihood estimators," Econometrica, 72, 219–255.

OWEN, A. B. (2001): Empirical Likelihood, Monographs on Statistics and Applied Probability 92, Boca Raton, FL: Chapman and Hall.

QIN, J. AND J. LAWLESS (1994): "Empirical likelihood and general estimating equations," The Annals of Statistics, 22, 300–325.

RILSTONE, P., V. SRIVASTAVA, AND A. ULLAH (1996): "The second-order bias and mean squared error of nonlinear estimators," Journal of Econometrics, 75, 369–395.

ROTHENBERG, T. (1984): "Approximating the distributions of econometric estimators and test statistics," in Handbook of Econometrics, ed. by Z. Griliches and M. D. Intriligator, vol. II, 881–935.

ULLAH, A. (2004): Finite Sample Econometrics, Advanced Texts in Econometrics, Oxford: Oxford University Press.

Appendix: Proofs

PROOF OF THEOREM 3.4.1: Let

$$\hat M(\beta) = \frac1N\sum_{i=1}^N \frac{\partial s_i(\beta)}{\partial\beta'}, \qquad M(\beta) = \mathbb{E}\,\frac{\partial s_i(\beta)}{\partial\beta'}, \qquad \hat M_j(\beta) = \frac1N\sum_{i=1}^N \frac{\partial^2 s_i(\beta)}{\partial\beta'\,\partial\beta_j},$$

and let $\bar\beta$ be between $\hat\beta$ and $\beta_0$. By the second-order Taylor expansion of (3.15) around $\beta_0$, we have

$$0 = s_N(\hat\beta) = s_N(\beta_0) + \hat M(\beta_0)(\hat\beta - \beta_0) + \frac12\sum_{j=1}^{p+q^2}(\hat\beta_j - \beta_{0j})\,\hat M_j(\bar\beta)(\hat\beta - \beta_0)$$
$$\begin{aligned} &= s_N(\beta_0) + M(\beta_0)(\hat\beta - \beta_0) + \big[\hat M(\beta_0) - M(\beta_0)\big](\hat\beta - \beta_0) \\ &\quad + \frac12\sum_{j=1}^{p+q^2}(\hat\beta_j - \beta_{0j})\,M_j(\beta_0)(\hat\beta - \beta_0) + \frac12\sum_{j=1}^{p+q^2}(\hat\beta_j - \beta_{0j})\big[\hat M_j(\bar\beta) - M_j(\beta_0)\big](\hat\beta - \beta_0). \end{aligned}$$

Note that $\hat M(\beta_0) = M(\beta_0)$, so that the third term in the last equation is zero. Also note that the last term is $O_p(N^{-3/2})$. Assume that $M(\beta_0)$ is not singular. Then,

$$\hat\beta - \beta_0 = -M(\beta_0)^{-1}\Big[s_N(\beta_0) + \frac12\sum_{j=1}^{p+q^2}(\hat\beta_j - \beta_{0j})\,M_j(\beta_0)(\hat\beta - \beta_0)\Big] + O_p(N^{-3/2}). \tag{3.28}$$

But

$$M(\beta_0) = -\begin{bmatrix} 0 & G_0'H' \\ HG_0 & \Sigma_0\otimes\Sigma_0 \end{bmatrix}, \qquad s_N(\beta_0) = -\begin{bmatrix} 0 \\ H m_N(\theta_0) \end{bmatrix},$$

and the second term is $O_p(N^{-1})$. We thus have

$$\hat\beta - \beta_0 = \frac{1}{\sqrt N}\begin{bmatrix} Q_0 \\ P_0 \end{bmatrix} H\,\frac{1}{\sqrt N}\sum_{i=1}^N\big[\operatorname{vech}(S_i) - \operatorname{vech}(\Sigma_0)\big] + O_p(N^{-1}) = \frac{1}{\sqrt N}\mu + O_p(N^{-1}). \tag{3.29}$$

Substituting (3.29) into (3.28), multiplying by $\sqrt N$ and collecting terms of the same order yields the result. $\square$

DEPARTMENT OF ECONOMICS, MICHIGAN STATE UNIVERSITY, EAST LANSING, MI 48824

Email address: prohoronmsu.edu