ESSAYS IN MULTIPLE FRACTIONAL RESPONSES WITH ENDOGENOUS
EXPLANATORY VARIABLES
By
Suhyeon Nam

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Economics – Doctor of Philosophy
2014

ABSTRACT
ESSAYS IN MULTIPLE FRACTIONAL RESPONSES WITH ENDOGENOUS
EXPLANATORY VARIABLES
By
Suhyeon Nam
This dissertation consists of three chapters. The first and second chapters develop new
estimation methods for multiple fractional response variables with endogeneity. Multiple
fractional response variables have two features. Each response is between zero and one,
and the sum of the responses for a unit is one. The first chapter proposes an estimation
method accounting for these features when there is a continuous endogenous explanatory
variable (EEV). It is a two step estimation method combining a control function approach.
The first step generates a control function using a linear regression, and the second step
maximizes a multinomial log likelihood function with a multinomial logit conditional
mean which depends on the control function generated in the first step. Monte Carlo
simulations examine the performance of the estimation method when the conditional
mean in the second step is misspecified. The simulation results provide evidence that
the method’s average partial effect (APE) estimates approximate well true APEs as long
as an instrument is not weak and that the method’s approximation outperforms an alternative linear method’s. We apply the proposed two step estimation method to Michigan’s
fourth grade math test data to estimate the average partial effects of spending on student
performance.
The second chapter develops and evaluates an estimation method allowing for the
discrete nature of an EEV. We modify the two step estimation method proposed in the
first chapter by following Wooldridge (2014); instead of unstandardized residual, we use
the generalized residuals as control functions The Monte Carlo simulation demonstrate
that although the two step estimation method cannot provide consistent estimators for the
mean parameters and average partial effects under the conditional mean misspecification,

it yields a decent approximation to average partial effects.
In the third chapter, we clarify some issues in computing average partial (or marginal)
effects in models that have been estimated using control function or correlated random
effects approaches (or some combination). In particular, we show that a common method
of estimating average partial effects, where the averaging is done across all variables and
across the entire sample, estimates an interesting parameter. Nevertheless, the method
differs from averaging across the observed covariates the partial effects obtained via the
average structural function. In the special case where unobservables are independent of
the observed covariates the two methods are identical.

Copyright by
SUHYEON NAM
2014

To my husband, Sungsam Chung,
my family,
and God.

v

ACKNOWLEDGMENTS

I would not have been able to accomplish my doctoral study without the help and
support from many people.
First, I am deeply grateful to my advisor, Professor Jeffrey Wooldridge for his invaluable guidance and continuous support during these long years. I would also like to thank
my dissertation committee members, Professor Leslie Papke, Professor Peter Schmidt,
and Dr. Kenneth Frank for their insightful comments and encouragements.
I want to express my special thanks to Hajin Kim and Eunsil Lee, who have been my
family in Michigan.
My special thanks also go to my family. Especially, I am truly grateful to my parents
and parents-in-law for their unconditional love and support. And I thank my husband,
Sungsam Chung, who shared with me every moment of this journey.
Finally, I thank God, who has always been with me.

vi

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
MULTIPLE FRACTIONAL RESPONSE VARIABLES WITH CONTINUOUS ENDOGENOUS EXPLANATORY VARIABLES . . . .
1.1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2 THE MODEL AND ESTIMATION WITH ENDOGENEITY . . . . . . . . . .
1.3 MONTE CARLO SIMULATIONS . . . . . . . . . . . . . . . . . . . . . . . . .
1.3.1 The Quantities of Interest . . . . . . . . . . . . . . . . . . . . . . . . .
1.3.2 Data Generating Process . . . . . . . . . . . . . . . . . . . . . . . . . .
1.3.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.4 APPLICATION: MICHIGAN EDUCATIONAL ASSESSMENT PROGRAM
MATH TEST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

x

CHAPTER 1

MULTIPLE FRACTIONAL RESPONSE VARIABLES WITH A
BINARY ENDOGENOUS EXPLANATORY VARIABLE . . . . .
2.1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 THE MODIFIED TWO STEP ESTIMATION . . . . . . . . . . . . . . . . . .
2.3 MONTE CARLO SIMULATION . . . . . . . . . . . . . . . . . . . . . . . .
2.3.1 Data Generating Process . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1
1
4
11
11
12
14
60
66
67
78

CHAPTER 2

ON COMPUTING AVERAGE PARTIAL EFFECTS IN MODELS
WITH ENDOGENEITY OR HETEROGENEITY . . . . . . . . .
3.1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 THE AVERAGE STRUCTURAL FUNCTION AND AVERAGE PARTIAL
EFFECTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 APES ACROSS THE ENTIRE POPULATION . . . . . . . . . . . . . . . . .
3.4 EMPIRICAL EXAMPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.

81
81
83
89
89
90
115
116

CHAPTER 3

vii

. 118
. 118
.
.
.
.
.

121
128
134
136
138

LIST OF TABLES

Table 1.1

Average APEs under Condition 1 . . . . . . . . . . . . . . . . . . . . . 16

Table 1.2

Percentile APEs under Condition 1 and Normal distribution . . . . . 17

Table 1.3

Percentile APEs under Condition 1 and χ23 distribution . . . . . . . . . 19

Table 1.4

F statistics of the 1st step (H0 : π2 = 0) . . . . . . . . . . . . . . . . . . 33

Table 1.5

Average APEs under Condition 2 and Normal distribution . . . . . . 34

Table 1.6

Mean Squared Errors of Average APE estimates under Condition 2 . . 44

Table 1.7

Percentile APEs under Condition 2, π2 = 0.1, and Normal distribution 46

Table 1.8

Percentile APEs under Condition 2, π2 = 0.2, and Normal distribution 48

Table 1.9

Percentile APEs under Condition 2, π2 = 0.5, and Normal distribution 50

Table 1.10 Percentile APEs under Condition 2, π2 = 0.1, and χ23 distribution . . . 52
Table 1.11 Percentile APEs under Condition 2, π2 = 0.2, and χ23 distribution . . . 54
Table 1.12 Percentile APEs under Condition 2, π2 = 0.5, and χ23 distribution . . . 56
Table 1.13 Mean Squared Errors of Percentile APE estimates under Condition 2 . 58
Table 1.14 Four levels of MEAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Table 1.15 Summary statistics of the dependent variables . . . . . . . . . . . . . . 61
Table 1.16 Summary statistics of the data . . . . . . . . . . . . . . . . . . . . . . . 62
Table 1.17 The first step estimation result . . . . . . . . . . . . . . . . . . . . . . . 63
Table 1.18 Average APE estimates of spending on the fourth grade MEAP math
test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Table 1.19 Percentile APE estimates of spending on the fourth grade MEAP
math test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

viii

Table 1.20 (yi1 , yi2 , yi3 ) generated across the simulations . . . . . . . . . . . . . . . 77
Table 2.1

APEs with r = ρu + e, π2 = 1, and e|v ∼ Normal (0, 1) . . . . . . . . . 92

Table 2.2

APEs with r = ρu + e, π2 = 1, and e|v ∼ χ23

Table 2.3

MSEs of APEs with r = ρu + e and π2 = 1 . . . . . . . . . . . . . . . . 94

Table 2.4

F and Wald statistics of the first stage/step (H0 : π2 = 0) . . . . . . . . 95

Table 2.5

APEs with w = 1[z2 π2 + u > 0], ρ = 1, and e|v ∼ Normal (0, 1) . . . . 97

Table 2.6

APEs with w = 1[z2 π2 + u > 0], ρ = 1, and e|v ∼ χ23 . . . . . . . . . . 98

Table 2.7

MSEs of APE estimates with w = 1[z2 π2 + u > 0] and ρ = 1 . . . . . . 99

Table 2.8

Rejection Frequencies for α = 0.05 test in Case 1 with varying ρ . . . . 101

Table 2.9

Rejection Frequencies for α = 0.05 test in Case 1 with varying π2 . . . 104

. . . . . . . . . . . . . . . 93

Table 2.10 APEs with r = ρu + e and π2 = 1 in Case 2 . . . . . . . . . . . . . . . . 107
Table 2.11 MSEs of APE estimates with r = ρu + e and π2 = 1 in Case 2 . . . . . 109
Table 2.12 APEs with w = 1[z2 π2 + u > 0] and ρ = 1 in Case 2 . . . . . . . . . . . 110
Table 2.13 MSEs of APE estimates with w = 1[z2 π2 + u > 0] and ρ = 1 in Case 2 111
Table 2.14 Rejection Frequencies for α = 0.05 test in Case 2 with varying ρ . . . . 113
Table 2.15 Rejection Frequencies for α = 0.05 test in Case 2 with varying π2 . . . 113
Table 3.1

APE estimates in Papke and Wooldridge (2008) . . . . . . . . . . . . . 135

Table 3.2

APE estimates in Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . 136

ix

LIST OF FIGURES

Figure 1.1 Empirical distributions of Average APE estimates under Condition 1
and Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Figure 1.2 Empirical distributions of Average APE estimates under Condition 1
and Logistic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Figure 1.3 Empirical distributions of Average APE estimates under Condition 1
and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 1.4 Empirical distributions of Percentile APE estimates including w2 under Condition 1 and Normal distribution . . . . . . . . . . . . . . . . . 24
Figure 1.5 Empirical distributions of Percentile APE estimates including w2 under Condition 1 and Logistic distribution . . . . . . . . . . . . . . . . . 27
Figure 1.6 Empirical distribution of Percentile APEs under Condition 1 and χ23
distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Figure 1.7 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.1, and Normal distribution . . . . . . . . . . . . . . . . . . . 35
Figure 1.8 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.2, and Normal distribution . . . . . . . . . . . . . . . . . . . 36
Figure 1.9 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.5, and Normal distribution . . . . . . . . . . . . . . . . . . . 37
Figure 1.10 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.1, and Logistic distribution . . . . . . . . . . . . . . . . . . . 38
Figure 1.11 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.2, and Logistic distribution . . . . . . . . . . . . . . . . . . . 39
Figure 1.12 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.5, and Logistic distribution . . . . . . . . . . . . . . . . . . . 40
Figure 1.13 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.1, and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . 41

x

Figure 1.14 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.2, and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . 42
Figure 1.15 Empirical distributions of Average APE estimates under Condition
2, π2 = 0.5, and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . 43
Figure 2.1 Rejection Frequencies for Normal(0,1) with varying ρ . . . . . . . . . . 102
Figure 2.2 Rejection Frequencies for χ23 with varying ρ . . . . . . . . . . . . . . . 102
Figure 2.3 Rejection Frequencies for Normal(0,1) with varying π2 , (π1 = τ = 0) . 105
Figure 2.4 Rejection Frequencies for χ23 with varying π2 , (π1 = τ = 0) . . . . . . 105
Figure 2.5 Rejection Frequencies for Normal 0, 1 + 21 v2 with varying π2 , (π1 =
τ = 0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Figure 2.6 Rejection Frequencies for Normal 0, 1 + 12 v2 with varying π2 , (π1 =
τ = 0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

xi

CHAPTER 1
MULTIPLE FRACTIONAL RESPONSE
VARIABLES WITH CONTINUOUS
ENDOGENOUS EXPLANATORY
VARIABLES
1.1

INTRODUCTION

Fractional responses have interesting functional form issues that have been attracting
econometricians’ attentions. The research began with a single fractional response, a fractional scalar yi , which has a salient feature - the bounded nature: 0 ≤ yi ≤ 1. Then it has
moved to two kinds of systems of fractional responses. One is panel data setting in which
a cross sectional unit has relatively smaller time periods. The other is multiple responses
in which a cross sectional unit has a set of several choices.
For a single fractional response, an OLS estimator or an IV estimator of a linear model
are consistent for the parameters in the linear projection. They, however, do not guarantee
that their fitted values lie within the unit interval nor that their partial effects estimates
y
, is a
for regressors’ extreme values are good.1 The log-odds transformation, log
1−y
traditional solution to recognize the bounded nature. But it requires the response to be
strictly inside the unit interval. Papke and Wooldridge (1996) introduce a quasi maximum likelihood estimation (QMLE), which is applicable even when the response takes
the boundary values. Their nonlinear estimation method directly models the conditional
mean of the response as an appropriate function.
1

Theses are the same drawbacks as the linear probability model for a binary response has.

1

Papke and Wooldridge (2008) extend their single fractional response discussion to a
panel data setting with allowing for endogeneity. They allow time invariant unobserved
effect to be correlated with explanatory variables and develop another QMLE method
employing a control function approach to account for endogeneity.
Multiple fractional responses have one additional feature as well as the bounded nature: an adding-up constraint. The sum of a cross sectional unit’s multiple responses is
one. For example, a researcher studies student performance for a test in a state. Suppose
that she is interested in how the public school spending affects the test outcome and her
response variable is a set of students’ pass and fail shares in a district i, (yi,pass , yi,fail ) where
yi,pass + yi,fail = 1. This example can fall into the single fractional response category since
there are only two shares. However, if the test outcomes are graded into the student’s
level of proficiency from Level 1 to Level 4 and district-level data she can access contain
the four shares (yi,level1 , yi,level2 , yi,level3 , yi,level4 ) where ∑41 yi,levelg = 1 instead of the pass and
fail rates, the types of behaviors described by the single fractional response analysis are
limited. Hence, an alternative estimation method is required to exploit all of the available
information when there are more than two shares.
Such an estimation method is proposed by Sivakumar and Bhat (2002). It is a method
of QMLE with the multinomial distribution and the multinomial logit conditional mean
specification. It is a multivariate generalization of the method proposed by Papke and
Wooldridge (1996). Mullahy (2010) studies this method with more detail. Buis (2008) has
written a STATA

module of this QMLE, and dubs it as “fractional multinomial logit

(fmlogit).” In this chapter, we also refer this QMLE as fractional multinomial logit or
fmlogit.
Although these studies develop a new estimation method, which can consistently estimate the parameters in the mean as long as the mean specification is correct, they do
not address endogeneity. In empirical works, however, endogeneity often arises. In the
student performance example, school spending could be endogenous because it is likely

2

to be correlated with some unobserved district effects. This endogeneity issue may lead
to inconsistency of the fractional multinomial logit estimation.
In this chapter, we develop an estimation method for multiple fractional responses
with endogenous explanatory variables. In the model, we allow a continuous endogenous explanatory variable to be correlated with an unobserved omitted variable. Then
we propose a two step estimation method employing a control function approach to deal
with the endogeneity. The first step generates a control function and the second step applies fractional multinomial logit with including the control function as an extra regressor
in the conditional mean. The method can provide a consistent estimator of the conditional
mean parameters provided that the conditional mean specification in the second step is
correct.
A distinct feature of this method is that although the multinomial logit specification
in the second step is sensible as a multiple fractional responses’ conditional mean, it is
not underpinned by usual structural assumptions. The functional form of the conditional
mean in the second step is determined by the two structural components. One is the
functional form of the conditional mean depending on the unobserved omitted variable
(structural conditional mean). The other is the distributional assumption of the error,
which appears when the control function approach is combined. However, there are no
closed forms for them to allow the second step conditional mean to be multinomial logit.
Thus we suggest directly specifying the conditional mean of the second step as multinomial logit without usual assumptions about them.
But we would like to examine how the two step estimation method works if the multinomial logit specification is wrong by conducting Monte Carlo simulations. The simulations focus on whether or not the estimates by the proposed two step estimation method
can approximate well the average partial effects of the endogenous explanatory variable,
which is the partial effects of the endogenous explanatory variable on the conditional
mean averaged across the population distribution of the endogenous explanatory vari-

3

able. Further, we compare the method’s approximation ability with an alternative linear
model’s.
The simulation results provide evidence that even though the conditional mean is
misspecified, the two step estimation method with a strong instrument yields a good approximation. Although a weak instrument deteriorates its approximation performance,
it still outperforms an alternative linear approach.
The rest of the chapter is organized as follows. Section 1.2 describes the model and the
two step estimation method.Section 1.3 presents a Monte Carlo simulation design and
results where the conditional mean of the two step estimation method is misspecified.
Section 1.4 includes an application of the method to examine the relationship between
public school spending and the fourth grade math test outcome for Michigan. And Section 1.5 concludes the chapter.

1.2

THE MODEL AND ESTIMATION WITH ENDOGENEITY

We assume that a random sampling across the cross section is available, and each cross
sectional unit i has G choices where the sum of i’s responses is one. The dependent variable for i is expressed as



yi1
 
 .. 
 . 
 
 

yi = 
 yig 
 
 .. 
 . 
 
yiG

(1.1)

G ×1

where
0 ≤ yig ≤ 1,

g = 1, · · · , G,

4

(1.2)

and
G

∑ yig = 1.

(1.3)

g

(1.2) and (1.3) represent the bounded nature and the adding-up constraint, respectively.
To represent endogeneity in the model, we assume that there is a continuous endogenous explanatory variable wig , and that it is correlated with an unobserved omitted
variable rig . To simplify the exposition, wig and rig are assumed to be invariant across
choices: ∀ g, wig = wi and rig = ri . Then, for a set of explanatory variables in all choices,
Xi = (xi1 , · · · , xiG ), we assume
E(yig |Xi , ri ) = E(yig |Zi , wi , ri ) = Gg (Zi1 , wi , ri ; β),

g = 1, · · · , G,

(1.4)

where
0 < Gg (·) < 1

(1.5)

and
G

∑ Gg (·) = 1.

(1.6)

g

Zi ≡ (zi1 , · · · , ziG ) is a set of exogenous variables in all choices where zig = (zi1g zi2g )
indicates exogenous variables for choice g and Zi1 ≡ (zi11 , · · · , zi1G ) is a set of zi1g , ∀ g.
(1.5) ensures that the fitted value will lie between zero and one. The adding-up constraint
(1.3) leads to (1.6). Any function satisfying both (1.5) and (1.6) can be specified for G (·).
wi can appear very flexibly in (1.4); for example, we can add wi2 to allow for the
quadratic effect of w. If wi and wi2 appear in the specification, plug-in methods are subject
to the “forbidden regression” problem as Wooldridge (2010) discusses.
To deal with the endogeneity, we employ a control function approach. It includes extra
regressors in the estimating equation so that the remaining variation in the endogenous
explanatory variable would not be correlated with the unobservables. Since the approach

5

requires an exclusion restriction, a part of Zi appears in Gg (·). We further assume

wi = f ( Z i ; π ) + v i
ri = ρvi + ei

(1.7)
(1.8)

and

(ri , vi ) is independent of Zi .

(1.9)

(1.7) models the endogenous variable wi as a function of Zi where π is the parameter
vector. It includes the exogenous variables excluded from (1.4) so that the instruments
could be allowed to be correlated with w. (1.8) models the omitted variable ri as a linear
function of the reduced form error vi , which plays a role of the control function - the
extra regressor - in this study, where ei is independent of wi . (1.8) is for simplicity; it can
be allowed to be more flexible by modeling it with polynomial functions of vi as well
as vi . (1.8) reveals that if there is any correlation between wi and ri , it can only come
through vi . So ρ shows how much wi is correlated with ri , and consequently tells whether
wi is endogenous or not. Due to (1.9), wi cannot have discreteness. The independence
assumption implies
D ( ei | Z i , v i ) = D ( ei ).

(1.10)

(1.8) and (1.9) ensure that a single control function, vi , can correct the endogeneity of wi
even when flexible functional forms for wi appear in (1.4).
If we assume a parametric model for the distribution in (1.10), then one could derive
the mean function conditional on Xi as
E(yig |Xi ) = K g (Zi1 , wi , vi ; θ)

(1.11)

0 < K g (·) < 1

(1.12)

where

6

and
G

∑ Kg (·) = 1.

(1.13)

g

If we know the functional form of K g (·) and vi is observed, θ can be estimated by
nonlinear least squares or a QMLE using multinomial distribution by specifying K g (·)
as a proper functional form satisfying (1.12) and (1.13). However, since vi is unobserved
here, a simple way to estimate θ is to replace vi with a consistent estimator of vi and apply
one of those estimation methods.
In general, it is difficult to start with function Gg (·) and a distribution for ei and obtain
K g (·) as a simple function. Instead, the proposal is to directly model K g (·) parametrically.
A natural choice for a proper functional form of K g (·) is multinomial logit,

K g (hi ; θ) =

exp hi θg
∑hG exp (hi θh )

(1.14)

where hi = (xi1 vi ) = (zi1 wi vi ) is a 1 × p vector, θ = (θ1 . . . θG ) is a pG × 1 parameter
vector, θg is a p × 1 vector, g = 2, · · · , G, and θ1 = 0.2 In the basic multinomial logit
model, a set of explanatory variables change by unit i but not by choice g. Its coefficient
parameters change by choice g, instead.3 In accordance with this choice, we rewrite (1.7)
as a linear function of zi = (zi1 zi2 )1× M :

wi = zi π + vi = zi1 π 1 + zi2 π 2 + vi

(1.15)

where π = (π 1 π 2 ) is a M × 1 parameter vector and the constant is subsumed in zi1 .
The transformation of w should be carefully chosen to yield (1.15) where vi is arguably
independent of zi . Plus, we can add zi in a flexible way. (1.15) is to simplify the notation.
Then we propose the following procedure for θ:
2

The first choice is a reference.
Hence this specification is appropriate for problems where the characteristics of choices are unimportant or are not of interest.
3

7

Procedure 1.2.1
Step 1. Obtain the OLS residual vi from the regression of wi on zi .
Step 2. Apply fractional multinomial logit of (yi1 , yi2 , · · · , yiG ) on zi1 , wi and vi to estimate
θ. This is a QMLE with (1.14) and the following log likelihood for i replacing vi with
vi :

G

i ( θ)

= ∑ yig log K g (hi ; θ).

(1.16)

g

Procedure 1.2.1 yields a consistent estimator of θ under (1.14). Its consistency does
not hinge on whether or not (1.16) is true. It is because the multinomial distribution is
a member of the linear exponential family (LEF). Gourieroux et al. (1984) show that a
QMLE with a distribution in the LEF provides a consistent estimate of the parameters in
a correctly specified conditional mean even when the rest of distribution is misspecified.
Furthermore, Procedure 1.2.1 is able to provide a very useful estimator for the quantity
regarding the structural conditional mean Gg (·). Dropping the cross-sectional unit i, the
partial effect of interest for a continuous explanatory variable x1j , the jth element of x1 is
∂Gg (x1 , r; β)
∂E(y g |x, r )
=
,
∂x1j
∂x1j

∀ g,

(1.17)

where x = (z, w) and x1 = (z1 , w). However, (1.17) is not identified because r is unobserved. Thus the quantity of more interest is the average partial effect (APE), which can
be identified by averaging (1.17) over the distribution of r:

Er

∂Gg (x01 , r; β)
∂x1j

,

∀g

(1.18)

where the APEs are evaluated at x01 , a set of fixed values of the covariates. From Wooldridge

8

(2010, section 2.2.5)

Er

∂Gg (x01 , r; β)
∂x1j

= Ev

∂K g (x01 , v; θ)
∂x1j

(1.19)

under (1.9) and (1.15). Hence Procedure 1.2.1 can estimate the APE on the structural
conditional mean Gg (·) even though it does not estimate β, the structural parameters.
The asymptotic variances of θ and the APE estimator need to consider the additional
variation from the first step of the procedure. The appendix derives their valid asymptotic
variances.
Notice that the two step estimation method does not assume anything about a functional form of Gg (·) and a distribution of D (e) although they determine the functional
form of K g (·). If a combination of their specific forms were able to obtain a certain explicit functional form for K g (·) satisfying (1.12) and (1.13), we would assume those specific forms and maximize the multinomial distribution with a derived K g (·) from those
assumptions. However, there are no closed forms of Gg (·) and D (e) to generate an explicitly known form satisfying (1.12) and (1.13). Considering that Gg (·) should satisfy (1.5)
and (1.6), a natural choice for Gg (·) is also multinomial logit. Yet, it cannot derive a closed
form of K g (·) whatever D (e) is. If Gg (·) is specified as (1.20), it can derive an explicit form
by assuming that e is normally distributed:
Gg (Xi ; β) = Φ(xig β),
G −1

GG ( X i ; β ) = 1 −

∑

g = 1, · · · , G − 1,

Φ(xig β)

(1.20)

g

where Φ(·) is the standard normal cumulative distribution function. Based on the mixing
property of the normal distribution, the derived function is similar as (1.20). However,
GG (·) and the derived function for choice G are not necessarily between zero and one,
which violate (1.5) and (1.12). It is the same drawback that the linear models have. So
(1.20) is not appropriate, either.
9

Alternatively, the two step estimation method suggests directly specifying K g (·) as
multinomial logit. That is, instead of making usual assumptions about Gg (·) and D (e), it
implicitly assumes that their combination derives the multinomial logit functional form
of K g (·).4 To be cautious about this, we conduct Monte Carlo simulations to investigate
how it works as an approximation when the specification is wrong.
Some researchers may be inclined to use a linear model rather than a nonlinear model,
in which case it would be natural to drop one of the G equations and apply a linear
method, say a linear control function (LCF) approach to remaining equations. Then, for
choice g, w’s coefficient parameter estimate by the linear method is comparable to (1.23).
Procedure 1.2.2 summarizes the LCF approach:
Procedure 1.2.2
Step 1. Obtain the OLS residual vi from the regression of wi on zi .
This is the same as Step 1 of Procedure 1.2.1.
Step 2. For each g = 2, · · · , G,5 regress yig on zi1 , wi and vi to estimate γ g , where γ g =

(γzg γwg γvg ) is a 4 × 1 coefficient parameter vector for choice g. Obtain γ1 based
on γ1 = e1 − γ2 − γ3 , where e1 is a 4 × 1 unit vector.6
The asymptotic variance of γ g also needs the adjustment taking the extra variation from
the first step into account; see the appendix.
The simulations compare the approximation by the two step estimation method with
the misspecified conditional mean and one by this LCF approach.
4

The approach reflects the manner in which Petrin and Train (2010) employ a control function approach
when their dependent variable is a multinomial choice. They divide the structural error in their consumer
utility into two parts to generate a mixed logit. Without a distributional assumption of the structural error,
one divided part is assumed to be normal and the other is assumed to be type 1 extreme value.
5 The first choice is dropped as the reference choice.
6 The coefficients of a variable across choices sum to be 0 and those of the constant sum to be 1 because
of (1.3).

10

1.3

MONTE CARLO SIMULATIONS

1.3.1

The Quantities of Interest

The quantity of interest in the simulations is the APE of the endogenous explanatory
variable w,
Er

∂Gg (x01 , r; β)
∂w

= Er

∂Gg (z01 , w0 , r; β)
∂w

,

∀ g.

(1.21)

Since (1.21) depends on where it is evaluated, the simulations use two approaches to
obtain a single number. One is averaging (1.21) out across the sample again and the other
is evaluating (1.21) at a certain set of values, (z1 , w p ) where z1 is the mean of z1 and w p
stands for the pth percentile of w’s distribution. We call the former “average APE” and
the latter “percentile APE.”
If the two step estimation method’s mean specification (1.14) is correct, (1.21) is obtained by estimating

Ev

∂K g (x01 , v; θ)
∂w

= Ev

K g (x1◦ , v; θ) ·

∑hG θwh exp (x1◦ θxh + θvh v)
θwg −
∑hG exp x1◦ θxh + θvh v

(1.22)

where θxh = (θzh θwh ) . Since the distribution of v is not assumed, (1.22) can be estimated
by averaging out vi across the sample, instead:
1
N

N





∑ Kg (x01, vi ; θ) · θwg −
i

∑hG θwh exp x1◦ θxh + θvh vi
∑hG

exp

x1◦ θxh

+ θvh vi




(1.23)

where θ is obtained from Procedure 1.2.1.
The simulations let (1.14) be misspecified, and so we examine how close (1.23) is to
(1.21), if it is closer than the estimates by the LCF approach.
Some simulations allow for w’s quadratic effect by including w2 in the model. These
simulations add v2i and v3i in the two procedures’ second steps to see if it improves their
11

approximations.

1.3.2

Data Generating Process

For the simulations, the number of observation N and choice G are 500 and 3, respectively.
We use 1000 replications.
The covariates For each replication, we generate 500 observations of zi , wi , ri , vi and ei
as following.
• zi = (zi1 zi2 ) = (1 zi1 zi2 )1×3
where zi1 = (1 zi1 ) and






 



zi1 
 0   1 τ 
  ∼ MV Normal   , 
 ,
zi2
0
τ 1

τ ∈ {0, −0.5}.

There are one included exogenous variable and one excluded exogenous variable
where they are drawn from the multivariate normal distribution. A simulation allows them to be correlated: τ = −0.5.
• D (e) is one of the three distributions:
(a) ei ∼ Normal (0, 1)
(b) ei ∼ Logistic(0, 1)
(c) ei ∼ χ23
To study various misspecifications, three distributions of e are in use: two symmetric
distributions and one asymmetric distribution.
• vi ∼ Normal (0, σ2 )7
7

σ2 is adjusted for the variance of wi to be invariant across the simulations.

12

• wi = π1 zi1 + π2 zi2 + vi
The endogenous variable is generated based on (1.15). The coefficient parameter for
the constant is set to be zero.
• ri = ρvi + ei
The omitted variable is generated based on (1.8).
The structural conditional mean Gg (·) specification: We specify Gg (·) as multinomial
logit because it satisfies (1.5) and (1.6):

E(yig |xi , ri ) = Gg (zi1 , wi , ri ; β) =

exp zi1 βzg + wi β wg + ri β rg
∑3h exp (zi1 βzh + wi β wh + ri β rh )

(1.24)

where β = ( β1 β2 β3 ) is a 12 × 1 parameter vector, β g = ( βzg β wg β rg ) is a 4 × 1
parameter vector for g = 2, 3, and β1 = 0 since the first choice is chosen as a reference.
Other parameters are set to be 1: β g = (1 1 1 1)

for g = 2, 3.8 Note that (1.14) is

misspecified under (1.24) and any of the three distributions for e.
The multiple fractional dependent variables y: The multiple fractional dependent variables for each observation i are generated by the following process.
(1) Calculate the response probabilities Gi1 , Gi2 , and Gi3 by using (1.24) and the covariates
generated above.9
(2) Draw 100 multinomial outcomes among 1, 2, and 3 based on the calculated response
probabilities.
8

For the simulations including w2 ,

Gg (zi1 , wi , ri ; β) =

exp zi1 βzg + wi β wg + wi2 β w2 g + ri β rg
∑3h exp zi1 βzh + wi β wh + wi2 β w2 h + ri β rh

where β is a 15 × 1 parameter vector, β g = ( βzg β wg β w2 g β rg ) = (1 1 1 − 0.1 1) for g = 2, 3.
9 (1.25) is used for the model including w2 .

13

(1.25)

(3) Count the frequencies and obtain the proportion for each outcome.
For instances, if 1 is drawn 50 times, 2 is drawn 30 times, and 3 is drawn 20 times for
an observation i, then yi1 = 0.5, yi2 = 0.3, and yi3 = 0.2.10 Appendix includes a table
showing the summary of (yi1 , yi2 , yi3 ) generated by this process.

1.3.3

Simulation Results

The first columns of the simulation result tables show whether or not the quadratic effect
of w is included in the model: w indicates that it is not, and w2 indicates that it is. The
tables report the mean of (1.21) over the 1000 replications (True) along with the results
of the two step estimation method and the LCF approach - the means of the estimates
(Mean), their standard deviations (SD), and the means of their adjusted standard errors
(SE). For the model including the quadratic effect, there are two additional estimation
results allowing for the control function in a flexible way: Two step (flexible) and LCF
(flexible). They includ v, v2 ,and v3 in the second stage; We would like to examine if it
helps the approximations.
Condition 1: π2 = 1, ρ = 1
In generating the data for Condition 1, we allow the instrument and the endogeneity to
be strong: wi = zi2 + vi and ri = vi + ei .11,12
The simulation results under Condition 1 demonstrate that while both the two step estimation method and the LCF approach provide good approximations to average APEs,
the two step estimation method provides better percentile APE estimates. In Table 1.1,
the average APE estimates by the two methods are quite similar to true APEs. However,
10

Through this process, the upper corner 1 is generated only for the reference choice while the lower
corner 0 is generated for all three choices because of the multinomial logit response probabilities structure.
11 z has no effect on w : π = 0
1
i1
i
12 The simulations allowing z to affect w (w = 0.5z + z + v , τ = −0.5, ρ = 1) provide similar
i1
i
i
i1
i2
i
results as those under Condition 1.

14

Table 1.2 and Table 1.3 illustrate that the percentile APE estimates by the two step estimation method have less biased without any sign distortions across the percentiles of w’s
distribution than those by the LCF approach.13 The percentile estimates by the LCF approach under the χ23 distribution have the opposite directions to the true APEs at the 90th
percentile.
The results also present that allowing for the flexible forms of vi does not help the approximations when w’s quadratic effect is included in the models. In Table 1.1 through 1.3,
the estimates with allowing for the flexible forms of vi are similar to those without it.
The empirical distributions of APE estimates in Figure 1.1 through 1.6, confirm these
results.

13 The results of percentile APEs under the logistic distribution are similar to those under the normal
distribution in general.

15

Table 1.1: Average APEs under Condition 1
D (e)
g
True
Two Step

w

LCF

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)
1.
2.

Mean
Mean
SD
SE
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

Normal
1
2
-0.117 0.059
-0.117 0.059
0.008 0.005
-0.109 0.055
0.009 0.005
0.009 0.005
-0.127 0.063
-0.127 0.063
0.009 0.005
-0.126 0.063
0.009 0.005
-0.118 0.059
0.009 0.005
0.008 0.005
-0.118 0.059
0.008 0.005
0.008 0.005

3
0.059
0.058
0.005
0.054
0.005
0.005
0.063
0.063
0.005
0.063
0.005
0.059
0.005
0.005
0.059
0.005
0.005

Logistic
1
2
-0.108 0.054
-0.108 0.054
0.011 0.006
-0.102 0.051
0.011 0.006
0.011 0.006
-0.116 0.058
-0.115 0.058
0.011 0.006
-0.115 0.057
0.011 0.006
-0.108 0.054
0.011 0.006
0.011 0.006
-0.109 0.054
0.011 0.006
0.011 0.006

3
0.054
0.054
0.006
0.051
0.006
0.006
0.058
0.058
0.006
0.057
0.006
0.054
0.006
0.006
0.054
0.006
0.006

1
-0.049
-0.048
0.007
-0.052
0.008
0.008
-0.061
-0.060
0.008
-0.058
0.008
-0.064
0.008
0.008
-0.064
0.008
0.008

χ23
2
0.025
0.024
0.004
0.026
0.005
0.005
0.031
0.030
0.004
0.029
0.004
0.032
0.005
0.004
0.032
0.005
0.004

3
0.025
0.024
0.004
0.026
0.005
0.005
0.031
0.030
0.005
0.029
0.004
0.032
0.005
0.004
0.032
0.005
0.004

π1 = 0, π2 = 1, ρ = 1
We cannot obtain the standard errors of average APE estimates by the two step estimation
method; the process in STATA to calculate them takes too much time to complete.

16

Table 1.2: Percentile APEs under Condition 1 and Normal distribution
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)
1.
2.

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

1
-0.181
-0.184
0.016
0.016
-0.241
-0.240
0.022
0.021
-0.240
0.022
0.021
-0.206
0.018
0.016
-0.190
0.018
0.016

10th
2
0.090
0.092
0.008
0.008
0.121
0.120
0.011
0.011
0.120
0.011
0.011
0.103
0.009
0.008
0.095
0.009
0.008

3
0.090
0.092
0.008
0.008
0.121
0.120
0.011
0.011
0.120
0.011
0.011
0.103
0.009
0.008
0.095
0.009
0.009

1
-0.171
-0.173
0.018
0.018
-0.206
-0.208
0.021
0.020
-0.208
0.021
0.020
-0.164
0.013
0.012
-0.156
0.013
0.012

25th
2
0.085
0.087
0.009
0.009
0.103
0.104
0.011
0.010
0.104
0.011
0.010
0.082
0.007
0.006
0.078
0.007
0.006

APEs at (z1 , w p ).
π1 = 0, π2 = 1, ρ = 1.

17

3
0.085
0.087
0.009
0.009
0.103
0.104
0.011
0.010
0.104
0.011
0.010
0.082
0.007
0.006
0.078
0.006
0.006

1
-0.132
-0.131
0.012
0.011
-0.133
-0.133
0.014
0.011
-0.132
0.014
0.011
-0.118
0.009
0.008
-0.118
0.009
0.008

50th
2
0.066
0.066
0.006
0.006
0.066
0.067
0.007
0.006
0.066
0.008
0.006
0.059
0.005
0.005
0.059
0.005
0.005

3
0.066
0.066
0.006
0.006
0.066
0.066
0.007
0.006
0.066
0.007
0.006
0.059
0.005
0.005
0.059
0.005
0.005

1
-0.085
-0.082
0.007
0.004
-0.073
-0.072
0.009
0.007
-0.071
0.010
0.008
-0.071
0.010
0.010
-0.080
0.010
0.009

75th
2
0.043
0.041
0.004
0.003
0.036
0.036
0.005
0.005
0.036
0.006
0.005
0.036
0.006
0.006
0.040
0.005
0.005

3
0.043
0.041
0.004
0.003
0.036
0.036
0.006
0.005
0.036
0.006
0.005
0.035
0.006
0.006
0.040
0.006
0.005

Table 1.2: (cont’d)
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

18

1
-0.050
-0.048
0.006
0.003
-0.040
-0.039
0.007
0.005
-0.039
0.007
0.006
-0.029
0.016
0.014
-0.045
0.014
0.012

90th
2
0.025
0.024
0.004
0.003
0.020
0.020
0.005
0.005
0.019
0.006
0.005
0.014
0.008
0.008
0.023
0.008
0.007

3
0.025
0.024
0.004
0.003
0.020
0.019
0.006
0.005
0.019
0.006
0.005
0.014
0.009
0.008
0.023
0.008
0.007

Table 1.3: Percentile APEs under Condition 1 and χ23 distribution
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

1
-0.100
-0.096
0.022
0.020
-0.153
-0.134
0.033
0.029
-0.133
0.032
0.028
-0.162
0.019
0.017
-0.143
0.022
0.020

10th
2
0.050
0.048
0.011
0.010
0.076
0.067
0.017
0.015
0.067
0.016
0.015
0.081
0.010
0.009
0.072
0.011
0.010

3
0.050
0.048
0.011
0.010
0.076
0.067
0.017
0.015
0.066
0.016
0.015
0.081
0.010
0.009
0.071
0.011
0.010

1
-0.068
-0.061
0.011
0.010
-0.085
-0.081
0.013
0.012
-0.078
0.013
0.012
-0.116
0.013
0.012
-0.106
0.015
0.014

25th
2
0.034
0.031
0.006
0.005
0.043
0.041
0.007
0.007
0.039
0.007
0.007
0.058
0.007
0.007
0.053
0.008
0.007

1.

3
0.034
0.030
0.006
0.005
0.043
0.040
0.007
0.007
0.039
0.007
0.007
0.058
0.007
0.007
0.053
0.008
0.007

1
-0.038
-0.033
0.004
0.003
-0.038
-0.040
0.007
0.006
-0.039
0.007
0.006
-0.064
0.009
0.008
-0.064
0.008
0.008

50th
2
0.019
0.017
0.003
0.003
0.019
0.020
0.004
0.004
0.019
0.004
0.005
0.032
0.005
0.004
0.032
0.005
0.004

3
0.019
0.017
0.003
0.003
0.019
0.020
0.004
0.004
0.019
0.004
0.005
0.032
0.005
0.005
0.032
0.005
0.004

1
-0.018
-0.017
0.002
0.002
-0.016
-0.018
0.004
0.003
-0.018
0.004
0.004
-0.013
0.008
0.007
-0.023
0.007
0.006

75th
2
0.009
0.009
0.003
0.003
0.008
0.009
0.003
0.003
0.009
0.003
0.005
0.006
0.005
0.005
0.011
0.004
0.004

3
0.009
0.009
0.003
0.003
0.008
0.009
0.003
0.003
0.009
0.004
0.005
0.006
0.005
0.005
0.011
0.005
0.004

APEs at (z1 , w p ).
π1 = 0, π2 = 1, ρ = 1.
3. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction
to its true APE.
2.

19

Table 1.3: (cont’d)
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

20

1
-0.009
-0.009
0.002
0.001
-0.007
-0.008
0.002
0.002
-0.008
0.002
0.002
0.034
0.012
0.011
0.015
0.012
0.010

90th
2
3
0.004 0.004
0.005 0.005
0.003 0.003
0.003 0.003
0.004 0.004
0.004 0.004
0.004 0.004
0.004 0.004
0.004 0.004
0.004 0.005
0.006 0.006
-0.017 -0.017
0.007 0.007
0.006 0.006
-0.007 -0.007
0.007 0.007
0.006 0.006

−.14

−.12
ape1

−.08

100
.04

.05

.06
ape2
2Step

.07

.08

−.12
ape1

2Step
2Step(flexible)

−.1
LCF
LCF(flexible)

−.08

.06
ape3

.07

.08

LCF

60
0

20

density
40

60
20
0
−.14

.05

2Step

density
40

60
density
40
20
0
−.16

.04

LCF

80

LCF

80

2Step

−.1

80

−.16

0

20

density
40
60

80

100
80
density
40
60
20
0

0

20

density
40
60

80

100

Figure 1.1: Empirical distributions of Average APE estimates under Condition 1 and Normal distribution

.04

.05

.06
ape2

2Step
2Step(flexible)

1. The

.07
LCF
LCF(flexible)

.08

.04

.05

.06
ape3

2Step
2Step(flexible)

.07

.08

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

21

−.12

−.1
ape1

−.08

80
.03

.04

.05
ape2
2Step

.06

.07

.04

LCF

−.1

−.08

ape1
2Step
2Step(flexible)

LCF
LCF(flexible)

.07
LCF

60
0

20

density
40

60
20
0
−.12

.06

2Step

density
40

60
density
40
20
0

−.14

.05
ape3

80

LCF

80

2Step

−.06

80

−.14

0

20

density
40

60

80
60
density
40
20
0

0

20

density
40

60

80

Figure 1.2: Empirical distributions of Average APE estimates under Condition 1 and Logistic distribution

.04

.05

.06
ape2

2Step
2Step(flexible)

1. The

.07

.08

.03

.04

.05

.06

.07

.08

ape3
LCF
LCF(flexible)

2Step
2Step(flexible)

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

22

−.07

−.06

−.05

−.04

−.03

100
0

20

density
40
60

80

100
80
density
40
60
20
0

0

20

density
40
60

80

100

Figure 1.3: Empirical distributions of Average APE estimates under Condition 1 and χ23 distribution

−.02

.01

.02

ape1

.04
LCF

−.06
ape1

2Step
2Step(flexible)

−.05

−.04

LCF
LCF(flexible)

.04
LCF

60
0

20

density
40

60
20
0
−.07

.03
2Step

density
40

60
density
40
20

−.08

.02

80

2Step

0
−.09

.01

ape3

80

LCF

80

2Step

.03
ape2

.01

.02

.03
ape2

2Step
2Step(flexible)

1. The

.04
LCF
LCF(flexible)

.05

.01

.02

.03
ape3

2Step
2Step(flexible)

.04

.05

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

23

−.3

−.25

−.2

−.15

50
0

10

density
20
30

40

50
40
density
20
30
10
0

0

10

density
20
30

40

50

Figure 1.4: Empirical distributions of Percentile APE estimates including w2 under Condition 1 and Normal distribution

.08

.1

ape1_p10

.16

.08

LCF
LCF(flexible)

−.25

−.2

−.15

ape1_p25
2Step
2Step(flexible)

LCF
LCF(flexible)

.12
ape3_p10

.14

2Step
2Step(flexible)

LCF
LCF(flexible)

.08

.12

.16

40
0

20

density

40
0

20

density

40
density
20
0
−.3

.1

60

2Step
2Step(flexible)

.14

60

LCF
LCF(flexible)

60

2Step
2Step(flexible)

.12
ape2_p10

.06

.08

.1
ape2_p25

2Step
2Step(flexible)

24

.12
LCF
LCF(flexible)

.14

.06

.1
ape3_p25

2Step
2Step(flexible)

LCF
LCF(flexible)

.14

−.14
ape1_p50

−.12

−.1

LCF
LCF(flexible)

−.1

−.06

80
.05

.06

.07
ape2_p50

.08

.09

2Step
2Step(flexible)

LCF
LCF(flexible)

.03

.05

.04

−.08
ape1_p75

2Step
2Step(flexible)

LCF
LCF(flexible)

−.04

.06
.07
ape3_p50

.08

.09

LCF
LCF(flexible)

60
0

20

density
40

60
0

20

density
40

60
density
40
20
0
−.12

.05

2Step
2Step(flexible)

80

2Step
2Step(flexible)

.04

80

−.16

80

−.18

0

20

density
40

60

80
60
density
40
20
0

0

20

density
40

60

80

Figure 1.4: (cont’d)

.02

.04
ape2_p75

2Step
2Step(flexible)

25

LCF
LCF(flexible)

.06

.02

.03

.04
ape3_p75

2Step
2Step(flexible)

.05
LCF
LCF(flexible)

.06

−.06

−.04
ape1_p90
2Step
2Step(flexible)

−.02
LCF
LCF(flexible)

0

80
0

20

density
40

60

80
60
density
40
20
0

0

20

density
40

60

80

Figure 1.4: (cont’d)

0

.01

.02
ape2_p90

2Step
2Step(flexible)

26

.03
LCF
LCF(flexible)

.04

0

.01

.02
ape3_p90

2Step
2Step(flexible)

.03
LCF
LCF(flexible)

.04

−.25

−.2
ape1_p10

−.1

40
.04

.06

.08

.1
ape2_p10

.12

2Step
2Step(flexible)

.14

.04

−.2

−.15
ape1_p25

2Step
2Step(flexible)

LCF
LCF(flexible)

−.1

.08

.1
ape3_p10

2Step
2Step(flexible)

.12

.14

LCF
LCF(flexible)

40
0

20

density

40
0

20

density

40
density
20
0
−.25

.06

LCF
LCF(flexible)

60

LCF
LCF(flexible)

60

2Step
2Step(flexible)

−.15

60

−.3

0

10

density
20

30

40
30
density
20
10
0

0

10

density
20

30

40

Figure 1.5: Empirical distributions of Percentile APE estimates including w2 under Condition 1 and Logistic distribution

.04

.06

.08
ape2_p25

2Step
2Step(flexible)

27

.1

.12
LCF
LCF(flexible)

.06

.08

.1
ape3_p25

2Step
2Step(flexible)

.12
LCF
LCF(flexible)

.14

−.2

−.15

−.1

−.05

80
0

20

density
40

60

80
60
density
40
20
0

0

20

density
40

60

80

Figure 1.5: (cont’d)

.02

.04

.06
ape2_p50

ape1_p50

2Step
2Step(flexible)

−.04
LCF
LCF(flexible)

−.02

.08

.1

LCF
LCF(flexible)

60
0

20

density
40

60
20
−.08
−.06
ape1_p75

.06
ape3_p50

2Step
2Step(flexible)

0
−.1

.04

LCF
LCF(flexible)

density
40

60
density
40
20
0
−.12

.02

80

2Step
2Step(flexible)

.1

80

LCF
LCF(flexible)

80

2Step
2Step(flexible)

.08

.02

.03

.04
ape2_p75

2Step
2Step(flexible)

28

.05
LCF
LCF(flexible)

.06

.01

.02

.03

.04
ape3_p75

2Step
2Step(flexible)

.05

.06

LCF
LCF(flexible)

−.08

−.06

−.04
ape1_p90

2Step
2Step(flexible)

−.02

0

LCF
LCF(flexible)

60
0

20

density

40

60
40
density
20
0

0

20

density

40

60

Figure 1.5: (cont’d)

−.01

0

.01

.02
ape2_p90

2Step
2Step(flexible)

29

.03

.04

LCF
LCF(flexible)

−.02

0

.02
ape3_p90

2Step
2Step(flexible)

.04
LCF
LCF(flexible)

.06

−.2

−.15
ape1_p10

−.05

40
0

.05

.1

.15

.02

2Step
2Step(flexible)

LCF
LCF(flexible)

−.1
−.08
ape1_p25

2Step
2Step(flexible)

−.06
LCF
LCF(flexible)

−.04

.08
ape3_p10

.1

.12

LCF
LCF(flexible)

40
0

20

density

40
0

20

density

40
density
20

−.12

.06

2Step
2Step(flexible)

60

LCF
LCF(flexible)

0
−.14

.04

ape2_p10

60

2Step
2Step(flexible)

−.1

60

−.25

0

10

density
20

30

40
30
density
20
10
0

0

10

density
20

30

40

Figure 1.6: Empirical distribution of Percentile APEs under Condition 1 and χ23 distribution

.02

.03

.04
ape2_p25

2Step
2Step(flexible)

30

.05

.06

LCF
LCF(flexible)

.02

.03

.04
.05
ape3_p25

2Step
2Step(flexible)

.06
LCF
LCF(flexible)

.07

−.06

−.05
−.04
ape1_p50

−.02

100
0

.01

.02
ape2_p50

2Step
2Step(flexible)

.03

.04

0

−.02
ape1_p75
2Step
2Step(flexible)

−.01
LCF
LCF(flexible)

.02
ape3_p50

2Step
2Step(flexible)

.03

.04

LCF
LCF(flexible)

100
0

50

density

100
0

50

density

100
density
50
0

−.03

.01

LCF
LCF(flexible)

150

LCF
LCF(flexible)

150

2Step
2Step(flexible)

−.03

150

−.07

0

20

density
40
60

80

100
80
density
40
60
20
0

0

20

density
40
60

80

100

Figure 1.6: (cont’d)

0

.005

.01
ape2_p75

2Step
2Step(flexible)

31

.015

.02

LCF
LCF(flexible)

−.005

0

.005
.01
ape3_p75
2Step
2Step(flexible)

.015

.02

LCF
LCF(flexible)

−.02

−.015

−.01
ape1_p90

2Step
2Step(flexible)

−.005

0

200
0

50

density
100

150

200
150
density
100
50
0

0

50

density
100

150

200

Figure 1.6: (cont’d)

−.01

0

.01

.02

ape2_p90
LCF
LCF(flexible)

2Step
2Step(flexible)

32

−.01

0

.01

.02

ape3_p90
LCF
LCF(flexible)

2Step
2Step(flexible)

LCF
LCF(flexible)

Condition 2: π2 < 1, ρ = 1
To see if the above results are dependent on the instrument’s strong predictive power, we
consider a condition where its predictive power is weaker than Condition 1 by generating
the data with wi = π2 zi2 + vi where π2 ∈ {0.1, 0.2, 0.5} and ri = vi + ei .
Staiger and Stock (1997) suggest a guideline for dividing weak and strong instruments
by using the first step’s F statistic, which tests the hypothesis that the instruments are not
correlated with the endogenous regressor; a threshold that they suggest is a F statistic of
10. Table 1.4 shows that the mean of F statistics testing the null hypothesis of π2 being 0
is not larger than 10 until π2 = 0.2. It also shows that about a half of 1000 replications has
the F statistic which is larger than 10 as π2 = 0.2 and every replication has it as π2 = 0.5.
Thus, according to the Staiger and Stock’s discussion, the instrument is weak as π2 = 0.1
and strong as π2 = 0.5. When π2 = 0.2, it barely manages to have properties as a strong
instrument.
Table 1.4: F statistics of the 1st step (H0 : π2 = 0)
π2
F statistics

1.

Mean
SD
( F > 10)

0.1
3.379
(3.436)
0.053

0.2
10.942
(6.665)
0.485

0.5
71.888
(19.440)
1.000

1
507.208
(80.357)
1.000

( F > 10) stands for the proportion of the F statistics being
greater than 10 among the 1000 replications.

Table 1.5 and Figure 1.7 to 1.9 illustrate that under the normal distribution, the average APE estimates by both the two methods are more biased and more volatile as π2 decreases; the weak instrument makes their approximations worse.14 But the mean squared
errors (MSEs) in Table 1.6 suggest that, for all three distributions, the LCF approach is
worse than the two step estimation method as a weak instrument is in use.

14

Figure 1.10 through 1.15 show that the results under the other distributions are similar.

33

Table 1.5: Average APEs under Condition 2 and Normal distribution
π2
g
w

True
Two Step

LCF

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

1
-0.113
-0.087
0.110
-0.008
2.509
36.96
-0.121
-0.091
0.113
-0.085
0.109
0.035
2.954
43.42
0.032
3.301
47.57

0.1
2
3
1
0.056 0.056 -0.113
0.045 0.042 -0.096
0.059 0.067 0.061
0.033 -0.025 -0.085
1.511 1.318 0.105
27.92 19.25 0.088
0.061 0.061 -0.121
0.045 0.046 -0.102
0.063 0.065 0.064
0.042 0.043 -0.097
0.062 0.064 0.063
-0.036 0.001 -0.091
1.624 1.732 0.094
29.42 26.22 0.082
-0.033 0.000 -0.092
1.482 2.076 0.099
22.03 36.71 0.082

1.

0.2
2
0.057
0.048
0.032
0.043
0.052
0.045
0.061
0.051
0.033
0.049
0.034
0.045
0.044
0.041
0.046
0.046
0.041

3
0.057
0.048
0.033
0.042
0.055
0.047
0.061
0.051
0.034
0.049
0.034
0.046
0.054
0.045
0.046
0.056
0.045

1
-0.114
-0.111
0.020
-0.098
0.021
0.021
-0.122
-0.119
0.021
-0.117
0.022
-0.106
0.020
0.020
-0.106
0.020
0.019

0.5
2
0.057
0.055
0.011
0.049
0.011
0.011
0.061
0.059
0.011
0.058
0.011
0.053
0.011
0.011
0.053
0.010
0.010

3
0.057
0.055
0.011
0.049
0.011
0.011
0.061
0.059
0.011
0.058
0.012
0.053
0.011
0.011
0.053
0.011
0.010

π1 = 0, ρ = 1.
We cannot obtain the standard errors of average APE estimates by the two step estimation
method; the process in STATA to calculate them takes too much time to complete.
3. The grey colored cells indicate that at least one of the APE estimates for three choices has
the opposite direction to its true APE.
2.

34

0

.2
ape1

10

.6

−.2

−.1

0
ape2

.1

.2

2Step

LCF

.2
ape1

2Step
2Step(flexible)

.4

.6

.2
LCF

8
2
0
−.3

−.2

−.1

0

.1

ape2
LCF
LCF(flexible)

0

density
4
6

8
2
0
0

−.2
2Step

density
4
6

8
density
4
6
2
0

−.2

−.4

ape3

10

LCF

10

2Step

.4

10

−.2

0

2

4

density

6

8

10
8
6
density
4
2
0

0

2

4

density

6

8

10

Figure 1.7: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and Normal distribution

2Step
2Step(flexible)

1. The

.2

−.4

−.2

0

.2

ape3
LCF
LCF(flexible)

2Step
2Step(flexible)

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

35

−.3

−.2

−.1

0

.1

.2

15
0

5

density

10

15
10
density
5
0

0

5

density

10

15

Figure 1.8: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and Normal distribution

−.1

−.05

ape1

0

.05

.1

.15

−.1

LCF

2Step

0

.1

ape1
2Step
2Step(flexible)

LCF
LCF(flexible)

.2

LCF

10
0

5

density

10
0

5

density

10
density
5

−.1

.2

15

2Step

0

−.2

.1
ape3

15

LCF

15

2Step

−.3

0

ape2

−.1

−.05

0
ape2

2Step
2Step(flexible)

1. The

.05

.1

−.1

0

.1

.2

ape3
LCF
LCF(flexible)

2Step
2Step(flexible)

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

36

−.15

−.1

−.05

0

40
0

10

density
20

30

40
30
density
20
10
0

0

10

density
20

30

40

Figure 1.9: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and Normal distribution

0

.02

.04
ape2

ape1

2Step
2Step(flexible)

−.05

0

.08

LCF

30
10
0
0

.02

.04

.06

.08

ape2
LCF
LCF(flexible)

.06

density
20

30
10
−.1
ape1

.04
ape3
2Step

0
−.15

.02

LCF

density
20

30
density
20
10
0
−.2

0

40

2Step

.08

40

LCF

40

2Step

.06

2Step
2Step(flexible)

1. The

LCF
LCF(flexible)

.1

0

.02

.04
ape3

2Step
2Step(flexible)

.06

.08

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

37

0

.2
ape1

.4

.8

8
−.2

−.1

0
ape2
2Step

.1

.2

.2
ape1

2Step
2Step(flexible)

.4

.6
LCF
LCF(flexible)

.8

−.2
ape3

0

.2

LCF

6
0

2

density
4

6
2
0
0

−.4

2Step

density
4

6
density
4
2
0
−.2

−.6

LCF

8

LCF

8

2Step

.6

8

−.2

0

2

density
4

6

8
6
density
4
2
0

0

2

density
4

6

8

Figure 1.10: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and Logistic distribution

−.6

−.4

−.2
ape2

2Step
2Step(flexible)

1. The

0

.2

−.4

−.2

0

.2

ape3
LCF
LCF(flexible)

2Step
2Step(flexible)

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

38

−.3

−.2

−.1

0

.1

.2

15
0

5

density

10

15
10
density
5
0

0

5

density

10

15

Figure 1.11: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and Logistic distribution

−.15

−.1

−.05

ape1

0

.05

.1

−.1

LCF

2Step

0

.1

ape1
2Step
2Step(flexible)

.2

.15

LCF

10
5
0
−.1

−.05

0

.05

.1

ape2
LCF
LCF(flexible)

.1

density

10
0

5

density

10
density
5

−.1

.05

15

2Step

0

−.2

0
ape3

15

LCF

15

2Step

−.3

−.05

ape2

2Step
2Step(flexible)

1. The

LCF
LCF(flexible)

.15

−.2

−.1

0
ape3

2Step
2Step(flexible)

.1

.2

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

39

−.2

30
0

10

density

20

30
20
density
10
0

0

10

density

20

30

Figure 1.12: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and Logistic distribution

0

0

.02

.04
ape2

ape1

.08

LCF

2Step
2Step(flexible)

−.05

0

.08

.1

LCF

30
10
0
0

.02

.04

.06

.08

ape2
LCF
LCF(flexible)

.06

density
20

30
10
0
−.1
ape1

.04
2Step

density
20

30
density
20
10

−.15

.02

40

2Step

0
−.2

0

ape3

40

LCF

40

2Step

.06

2Step
2Step(flexible)

1. The

LCF
LCF(flexible)

.1

0

.02

.04
ape3

2Step
2Step(flexible)

.06

.08

.1

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

40

−.5

0

.5

8
0

2

density
4

6

8
6
density
4
2
0

0

2

density
4

6

8

Figure 1.13: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and χ23 distribution

1

−.6

−.4

ape1

0

1

2Step
2Step(flexible)

LCF
LCF(flexible)

0

.2

LCF

8
0

2

density
4
6

8
2
.5
ape1

−.2
ape3
2Step

0
0

−.4

LCF

density
4
6

8
density
4
6
2
0
−.5

−.6

10

2Step

.2

10

LCF

10

2Step

−.2
ape2

−.6

−.4

−.2
ape2

2Step
2Step(flexible)

1. The

0

.2
LCF
LCF(flexible)

−.6

−.4

−.2
ape3

2Step
2Step(flexible)

0

.2

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

41

−.1

0
ape1

.2

20
−.1

0

.1

.2

2Step

2Step
2Step(flexible)

.1
LCF
LCF(flexible)

.2

.05

.1

LCF

15
0

5

density
10

15
5
0
0
ape1

0
ape3
2Step

density
10

15
density
10
5

−.1

−.05

LCF

20

LCF

0
−.2

−.1

ape2

20

2Step

.1

20

−.2

0

5

density
10

15

20
15
density
10
5
0

0

5

density
10

15

20

Figure 1.14: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and χ23 distribution

−.1

−.05

0
ape2

2Step
2Step(flexible)

1. The

.05

.1

−.1

0

.1

.2

ape3
LCF
LCF(flexible)

2Step
2Step(flexible)

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

42

−.1

−.05

0

.05

50
0

10

density
20
30

40

50
40
density
20
30
10
0

0

10

density
20
30

40

50

Figure 1.15: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and χ23 distribution

−.02

0

ape1

.06

2Step
2Step(flexible)

0
LCF
LCF(flexible)

.05

.04

.06

LCF

40
0

10

density
20
30

40
10
−.05
ape1

.02
ape3
2Step

0
−.1

0

LCF

density
20
30

40
density
20
30
10
0
−.15

−.02

50

2Step

.04

50

LCF

50

2Step

.02
ape2

−.02

0

.02
ape2

2Step
2Step(flexible)

1. The

.04
LCF
LCF(flexible)

.06

−.02

0

.02
ape3

2Step
2Step(flexible)

.04

.06

LCF
LCF(flexible)

distributions in the first row are those of the estimates when the model includes only w and those in the second row
are when the model includes w2 .

43

Table 1.6: Mean Squared Errors of Average APE estimates under Condition 2
1

0.1
2

3

Two Step
LCF
Two Step
Two Step (Flexible)
LCF
LCF (Flexible)

0.013
6.305
0.014
0.013
8.752
10.923

0.004
2.285
0.004
0.004
2.647
2.204

0.005
1.743
0.004
0.004
3.003
4.314

Two Step
LCF
Two Step
Two Step (Flexible)
LCF
LCF (Flexible)

0.015
9.419
0.016
0.015
15.493
12.334

0.004
1.312
0.005
0.005
5.048
3.486

0.005
5.220
0.005
0.004
3.587
3.412

Two Step
LCF
Two Step
Two Step (Flexible)
LCF
LCF (Flexible)

0.011
2.386
0.010
0.011
5.328
5.039

0.003
0.855
0.004
0.004
2.061
1.914

0.004
0.538
0.003
0.004
1.312
1.272

π2
g
w
w2

w
w2

w
w2

1.

0.2
1
2
Normal
0.004 0.001
0.012 0.003
0.004 0.001
0.005 0.001
0.010 0.002
0.011 0.002
Logistic
0.005 0.001
0.013 0.003
0.006 0.001
0.006 0.001
0.012 0.003
0.012 0.003
χ23
0.002 0.001
0.007 0.002
0.002 0.001
0.003 0.001
0.004 0.001
0.004 0.001

3

1

0.5
2

0.001
0.003
0.001
0.001
0.003
0.003

0.000
0.001
0.000
0.000
0.001
0.001

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.001
0.003
0.002
0.002
0.003
0.004

0.001
0.001
0.001
0.001
0.001
0.001

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.001
0.002
0.001
0.001
0.001
0.001

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

0.000
0.000
0.000
0.000
0.000
0.000

3

1

1
2

3

The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction
to its true APE.

44

Table 1.7 through 1.1215 show that percentile APE estimates have similar patters with
those in average APE estimates. Plus, the tables presents that the weak instrument causes
huge standard errors at some replications.16 The standard errors significantly decrease
as π2 rises. As in the average APE estimates, Table 1.13 shows that the weak instrument
causes the LCF approach to have much worse approximation than the two step estimation
method.
Under Condition 2, the results does not provide enough evidence that including vi in a
flexible way helps their approximations. Actually, with the weak instrument, it makes the
two methods’ approximations worse; it causes the two step estimation method to have
enormous standard errors at several replications and the two methods to have more sign
distortions.
Hence the results under Condition 2 demonstrate that the quality of an instrument
affects the two methods’ approximation performances, and that the two step estimation
method is less sensitive to a weak instrument than the LCF approach. Plus, it is better not
to include the additional terms of the control function for their approximations if a weak
instrument is used.

15

The logistic distribution has similar results to the normal distribution.
There are big differences between the medians and the means of the standard errors except the two
step estimation.
16

45

Table 1.7: Percentile APEs under Condition 2, π2 = 0.1, and Normal distribution
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)

LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
SE*
Mean
SD
SE
SE*
Mean
SD
SE
SE*

1
-0.164
-0.125
0.122
0.232
-0.219
-0.150
0.125
0.237
-0.146
0.147
8E+04
0.155
-0.063
2.953
43.42
0.106
-0.009
3.467
47.77
0.168

10th
2
0.082
0.064
0.063
0.132
0.110
0.075
0.068
0.156
0.074
0.084
2E+13
0.085
0.013
1.624
29.42
0.058
-0.012
1.563
22.13
0.088

3
0.082
0.061
0.069
0.148
0.110
0.075
0.069
0.158
0.073
0.085
6E+11
0.081
0.050
1.732
26.22
0.057
0.021
2.148
36.81
0.090

1
-0.156
-0.160
0.173
0.485
-0.188
-0.182
0.180
0.367
-0.177
0.180
5E+07
0.180
-0.016
2.954
43.42
0.105
0.011
3.385
47.65
0.130

1.

25th
2
0.078
0.082
0.088
0.259
0.094
0.090
0.096
0.258
0.088
0.099
7E+12
0.094
-0.010
1.624
29.41
0.057
-0.022
1.521
22.07
0.068

3
0.078
0.078
0.097
0.353
0.094
0.092
0.096
0.224
0.089
0.101
3E+11
0.093
0.027
1.732
26.22
0.057
0.011
2.113
36.75
0.069

1
-0.127
-0.112
0.147
0.355
-0.127
-0.112
0.147
0.305
-0.110
0.154
8E+09
0.134
0.035
2.954
43.42
0.105
0.032
3.299
47.57
0.099

50th
2
0.064
0.058
0.078
0.269
0.064
0.055
0.084
0.211
0.054
0.088
3E+12
0.071
-0.036
1.624
29.41
0.057
-0.032
1.481
22.03
0.055

3
0.064
0.054
0.088
0.246
0.064
0.056
0.083
0.200
0.056
0.087
9E+10
0.075
0.001
1.732
26.22
0.057
0.001
2.075
36.71
0.055

APEs at (z1 , w p ).
π1 = 0, ρ = 1.
3. SE* is the median of the standard errors.
4. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite
direction to its true APE.

2.

46

Table 1.7: (cont’d)
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)

LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
SE*
Mean
SD
SE
SE*
Mean
SD
SE
SE*

1
-0.089
-0.044
0.092
0.456
-0.075
-0.035
0.094
0.294
-0.031
0.116
2E+09
0.094
0.087
2.955
43.42
0.105
0.054
3.244
47.58
0.100

75th
2
0.044
0.023
0.056
0.489
0.037
0.016
0.058
0.284
0.015
0.070
1E+12
0.061
-0.062
1.624
29.41
0.057
-0.044
1.457
22.04
0.056

47

3
0.044
0.020
0.063
0.438
0.037
0.018
0.062
0.273
0.016
0.069
5E+10
0.060
-0.025
1.732
26.22
0.057
-0.010
2.050
36.71
0.058

90th
1
2
-0.057 0.028
-0.014 0.008
0.058 0.043
0.213 0.235
-0.044 0.022
-0.003 0.001
0.062 0.044
0.251 0.178
0.010 -0.005
0.098 0.065
2E+14 5E+12
0.087 0.068
0.134 -0.085
2.955 1.625
43.42 29.42
0.106 0.058
0.074 -0.054
3.220 1.449
47.65 22.08
0.127 0.073

3
0.028
0.006
0.045
0.193
0.022
0.003
0.046
0.190
-0.005
0.065
2E+11
0.069
-0.049
1.732
26.22
0.057
-0.020
2.039
36.75
0.074

Table 1.8: Percentile APEs under Condition 2, π2 = 0.2, and Normal distribution
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

1
-0.164
-0.144
0.089
0.088
-0.220
-0.182
0.092
0.090
-0.182
0.104
152.5
-0.189
0.097
0.085
-0.152
0.159
0.120

10th
2
0.082
0.072
0.045
0.045
0.110
0.091
0.046
0.046
0.091
0.055
13.09
0.094
0.046
0.042
0.076
0.078
0.060

3
0.082
0.072
0.046
0.045
0.110
0.091
0.047
0.046
0.091
0.056
217.2
0.095
0.055
0.046
0.076
0.083
0.063

1
-0.157
-0.147
0.101
0.102
-0.188
-0.174
0.108
0.108
-0.174
0.108
43.23
-0.143
0.095
0.083
-0.124
0.123
0.099

25th
2
0.078
0.074
0.051
0.052
0.094
0.087
0.055
0.055
0.087
0.056
0.626
0.071
0.045
0.041
0.062
0.059
0.049

1.

3
0.078
0.073
0.052
0.052
0.094
0.087
0.055
0.055
0.087
0.057
6.530
0.072
0.054
0.045
0.062
0.067
0.052

1
-0.127
-0.111
0.074
0.072
-0.127
-0.112
0.075
0.073
-0.110
0.078
5.656
-0.091
0.094
0.082
-0.092
0.098
0.082

50th
2
0.064
0.056
0.039
0.039
0.064
0.056
0.039
0.039
0.055
0.041
0.089
0.045
0.044
0.041
0.046
0.046
0.041

3
0.064
0.055
0.040
0.039
0.064
0.056
0.041
0.039
0.055
0.042
0.457
0.046
0.054
0.045
0.046
0.056
0.045

1
-0.089
-0.065
0.045
0.039
-0.075
-0.055
0.048
0.043
-0.050
0.062
0.759
-0.040
0.094
0.082
-0.061
0.103
0.078

75th
2
0.044
0.033
0.026
0.027
0.037
0.027
0.027
0.028
0.025
0.034
0.066
0.020
0.044
0.041
0.030
0.050
0.041

3
0.044
0.032
0.028
0.027
0.037
0.028
0.029
0.028
0.025
0.035
0.243
0.020
0.054
0.045
0.031
0.059
0.045

APEs at (z1 , w p ).
π1 = 0, ρ = 1.
3. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction
to its true APE.
2.

48

Table 1.8: (cont’d)
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

49

1
-0.057
-0.037
0.033
0.033
-0.043
-0.024
0.038
0.037
-0.013
0.063
8.476
0.007
0.095
0.084
-0.032
0.130
0.088

90th
2
3
0.028 0.028
0.019 0.018
0.022 0.023
0.024 0.025
0.022 0.022
0.012 0.012
0.024 0.025
0.026 0.025
0.007 0.006
0.037 0.037
0.256 0.690
-0.004 -0.003
0.045 0.054
0.042 0.046
0.016 0.016
0.065 0.072
0.048 0.052

Table 1.9: Percentile APEs under Condition 2, π2 = 0.5, and Normal distribution
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)
1.
2.

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

1
-0.168
-0.164
0.037
0.037
-0.224
-0.213
0.041
0.039
-0.214
0.044
0.042
-0.202
0.028
0.026
-0.168
0.036
0.032

10th
2
0.084
0.082
0.019
0.019
0.112
0.106
0.020
0.020
0.107
0.022
0.022
0.101
0.014
0.014
0.084
0.018
0.017

3
0.084
0.082
0.019
0.019
0.112
0.107
0.020
0.020
0.107
0.022
0.022
0.101
0.015
0.014
0.084
0.018
0.017

1
-0.159
-0.158
0.040
0.039
-0.192
-0.188
0.044
0.043
-0.188
0.044
0.042
-0.157
0.024
0.022
-0.139
0.027
0.025

25th
2
0.080
0.079
0.020
0.020
0.096
0.094
0.022
0.022
0.094
0.023
0.022
0.078
0.012
0.012
0.069
0.014
0.013

APEs at (z1 , w p ).
π1 = 0, ρ = 1.

50

3
0.080
0.079
0.020
0.020
0.096
0.094
0.022
0.022
0.094
0.023
0.022
0.078
0.012
0.012
0.070
0.014
0.013

1
-0.129
-0.125
0.026
0.025
-0.129
-0.126
0.027
0.026
-0.125
0.028
0.026
-0.106
0.021
0.020
-0.106
0.020
0.019

50th
2
0.064
0.062
0.014
0.013
0.064
0.063
0.014
0.014
0.062
0.015
0.014
0.053
0.011
0.011
0.053
0.010
0.010

3
0.064
0.062
0.014
0.013
0.064
0.063
0.015
0.014
0.062
0.015
0.014
0.053
0.011
0.011
0.053
0.011
0.010

1
-0.088
-0.083
0.010
0.009
-0.075
-0.072
0.014
0.013
-0.069
0.018
0.017
-0.056
0.021
0.021
-0.074
0.020
0.018

75th
2
0.044
0.041
0.007
0.007
0.037
0.036
0.009
0.009
0.035
0.011
0.010
0.028
0.011
0.011
0.037
0.011
0.010

3
0.044
0.041
0.008
0.007
0.037
0.036
0.009
0.009
0.035
0.011
0.010
0.028
0.012
0.011
0.037
0.011
0.010

Table 1.9: (cont’d)
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

51

1
-0.055
-0.052
0.007
0.006
-0.043
-0.041
0.010
0.010
-0.037
0.018
0.016
-0.010
0.026
0.024
-0.044
0.026
0.021

90th
2
0.028
0.026
0.007
0.007
0.021
0.021
0.008
0.008
0.019
0.012
0.012
0.005
0.013
0.013
0.022
0.014
0.013

3
0.028
0.026
0.007
0.007
0.021
0.020
0.008
0.008
0.018
0.012
0.011
0.005
0.014
0.013
0.022
0.015
0.013

Table 1.10: Percentile APEs under Condition 2, π2 = 0.1, and χ23 distribution
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)

LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
SE*
Mean
SD
SE
SE*
Mean
SD
SE
SE*

1
-0.097
-0.133
0.160
0.237
-0.145
-0.146
0.164
0.271
-0.136
0.192
4E+25
0.209
-0.058
2.305
41.22
0.090
0.019
2.401
37.35
0.221

10th
2
0.049
0.069
0.083
0.886
0.072
0.073
0.087
3.391
0.070
0.103
5E+21
0.117
0.028
1.434
27.90
0.051
-0.008
1.420
26.65
0.113

3
0.049
0.064
0.089
0.877
0.072
0.072
0.087
3.362
0.067
0.107
1E+20
0.116
0.030
1.144
20.46
0.051
-0.011
1.238
19.09
0.116

1
-0.070
-0.093
0.150
0.268
-0.087
-0.118
0.160
0.458
-0.100
0.155
1E+29
0.136
0.002
2.305
41.22
0.089
0.041
2.302
37.13
0.152

1.

25th
2
0.035
0.049
0.078
0.685
0.044
0.059
0.089
3.292
0.051
0.090
2E+22
0.078
-0.002
1.434
27.90
0.050
-0.019
1.391
26.55
0.079

3
0.035
0.044
0.089
0.612
0.044
0.059
0.086
3.084
0.049
0.087
4E+21
0.079
0.000
1.144
20.46
0.051
-0.021
1.175
18.98
0.082

1
-0.043
-0.030
0.109
0.306
-0.043
-0.044
0.109
0.434
-0.041
0.140
1E+27
0.126
0.069
2.305
41.22
0.089
0.063
2.241
36.96
0.090

50th
2
0.022
0.017
0.065
0.573
0.022
0.023
0.069
1.627
0.021
0.081
1E+31
0.072
-0.035
1.434
27.90
0.050
-0.031
1.383
26.47
0.051

3
0.022
0.013
0.071
0.674
0.022
0.021
0.065
1.667
0.020
0.078
6E+29
0.074
-0.034
1.144
20.46
0.051
-0.032
1.124
18.91
0.050

APEs at (z1 , w p ).
π1 = 0, ρ = 1.
3. SE* is the median of the standard errors.
4. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite
direction to its true APE.

2.

52

Table 1.10: (cont’d)
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)

LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
SE*
Mean
SD
SE
SE*
Mean
SD
SE
SE*

75th
1
2
3
-0.023 0.012 0.012
0.006 -0.001 -0.004
0.081 0.053 0.057
0.503 0.436 0.421
-0.020 0.010 0.010
-0.003 0.002 0.001
0.078 0.056 0.052
0.525 1.332 0.998
0.020 -0.010 -0.010
0.136 0.081 0.076
4E+24 5E+30 2E+29
0.150 0.108 0.113
0.136 -0.069 -0.067
2.304 1.434 1.144
41.22 27.90 20.46
0.089 0.050 0.050
0.087 -0.044 -0.043
2.248 1.400 1.111
37.01 26.50 18.95
0.076 0.049 0.049

53

90th
1
2
3
-0.012 0.006 0.006
0.014 -0.007 -0.007
0.057 0.041 0.044
0.164 0.380 0.339
-0.010 0.005 0.005
0.007 -0.003 -0.004
0.051 0.042 0.041
0.561 0.537 0.947
0.048 -0.025 -0.023
0.122 0.075 0.073
2E+28 2E+30 4E+28
0.189 0.162 0.169
0.196 -0.099 -0.097
2.304 1.434 1.144
41.22 27.90 20.46
0.089 0.051 0.051
0.109 -0.056 -0.053
2.308 1.435 1.131
37.22 26.59 19.05
0.125 0.075 0.073

Table 1.11: Percentile APEs under Condition 2, π2 = 0.2, and χ23 distribution
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

1
-0.097
-0.111
0.113
0.107
-0.145
-0.134
0.124
0.120
-0.130
0.127
0.817
-0.183
0.069
0.062
-0.130
0.185
0.132

10th
2
0.049
0.056
0.057
0.055
0.072
0.067
0.063
0.061
0.065
0.067
15.77
0.091
0.036
0.033
0.065
0.095
0.068

3
0.049
0.056
0.057
0.055
0.072
0.067
0.063
0.062
0.065
0.067
25.78
0.092
0.039
0.035
0.065
0.094
0.069

1
-0.070
-0.066
0.069
0.066
-0.087
-0.091
0.080
0.077
-0.080
0.069
0.312
-0.123
0.067
0.061
-0.096
0.122
0.095

25th
2
0.035
0.033
0.037
0.036
0.044
0.045
0.042
0.041
0.040
0.038
15.56
0.062
0.035
0.032
0.048
0.062
0.049

1.

3
0.035
0.033
0.037
0.036
0.044
0.045
0.042
0.041
0.040
0.039
34.77
0.062
0.038
0.034
0.048
0.062
0.050

1
-0.043
-0.027
0.038
0.031
-0.043
-0.041
0.036
0.029
-0.037
0.060
0.420
-0.057
0.066
0.060
-0.058
0.064
0.059

50th
2
0.021
0.014
0.024
0.022
0.022
0.020
0.022
0.021
0.018
0.033
13.94
0.028
0.034
0.032
0.029
0.033
0.032

3
0.021
0.014
0.023
0.022
0.022
0.021
0.023
0.022
0.019
0.033
49.29
0.029
0.037
0.034
0.029
0.036
0.033

1
-0.023
-0.009
0.032
0.024
-0.020
-0.016
0.027
0.023
0.001
0.071
19.58
0.009
0.065
0.060
-0.019
0.075
0.046

75th
2
3
0.011 0.011
0.005 0.004
0.022 0.020
0.021 0.021
0.010 0.010
0.008 0.008
0.019 0.020
0.021 0.020
-0.001 0.000
0.038 0.039
12.11 65.85
-0.005 -0.004
0.033 0.037
0.032 0.034
0.009 0.010
0.039 0.042
0.028 0.030

APEs at (z1 , w p ).
π1 = 0, ρ = 1.
3. SE* is the median of the standard errors.
4. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to
its true APE.
2.

54

Table 1.11: (cont’d)
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

55

1
-0.012
-0.002
0.029
0.024
-0.010
-0.006
0.023
0.022
0.025
0.085
4143
0.069
0.065
0.061
0.016
0.129
0.070

90th
2
3
0.006 0.006
0.001 0.001
0.021 0.020
0.021 0.021
0.005 0.005
0.003 0.003
0.018 0.018
0.020 0.021
-0.013 -0.012
0.047 0.046
490.5 400.8
-0.035 -0.034
0.033 0.037
0.033 0.035
-0.008 -0.008
0.068 0.069
0.041 0.042

Table 1.12: Percentile APEs under Condition 2, π2 = 0.5, and χ23 distribution
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

1
-0.098
-0.096
0.045
0.042
-0.147
-0.123
0.056
0.052
-0.124
0.054
0.054
-0.184
0.026
0.023
-0.138
0.046
0.040

10th
2
0.049
0.048
0.023
0.021
0.073
0.062
0.028
0.027
0.062
0.028
0.028
0.092
0.014
0.012
0.069
0.024
0.021

3
0.049
0.048
0.023
0.021
0.073
0.062
0.028
0.027
0.062
0.028
0.028
0.092
0.014
0.012
0.069
0.024
0.021

1
-0.070
-0.063
0.023
0.021
-0.087
-0.085
0.028
0.025
-0.079
0.025
0.029
-0.127
0.021
0.019
-0.103
0.032
0.028

25th
2
0.035
0.031
0.012
0.012
0.044
0.043
0.014
0.014
0.039
0.014
0.016
0.063
0.011
0.010
0.051
0.016
0.015

1.

3
0.035
0.031
0.012
0.012
0.044
0.043
0.015
0.014
0.040
0.014
0.015
0.064
0.011
0.010
0.052
0.016
0.015

1
-0.042
-0.036
0.008
0.007
-0.042
-0.047
0.010
0.009
-0.042
0.014
0.025
-0.064
0.018
0.016
-0.064
0.017
0.016

50th
2
0.021
0.018
0.006
0.006
0.021
0.023
0.007
0.007
0.021
0.008
0.013
0.032
0.010
0.009
0.032
0.010
0.009

3
0.021
0.018
0.006
0.006
0.021
0.023
0.007
0.007
0.021
0.009
0.013
0.032
0.010
0.009
0.032
0.010
0.009

1
-0.022
-0.019
0.004
0.003
-0.019
-0.022
0.006
0.005
-0.019
0.009
0.045
-0.001
0.017
0.017
-0.025
0.013
0.010

75th
2
0.011
0.010
0.005
0.005
0.010
0.011
0.006
0.006
0.010
0.007
0.019
0.000
0.010
0.009
0.012
0.009
0.007

3
0.011
0.010
0.005
0.005
0.010
0.011
0.006
0.006
0.010
0.007
0.019
0.000
0.010
0.009
0.012
0.009
0.007

APEs at (z1 , w p ).
π1 = 0, ρ = 1.
3. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction
to its true APE.
2.

56

Table 1.12: (cont’d)
wp
g
w

True
Two Step

w2

True
Two Step

Two Step
(Flexible)
LCF

LCF
(Flexible)

Mean
Mean
SD
SE
Mean
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE
Mean
SD
SE

57

1
-0.011
-0.011
0.003
0.003
-0.009
-0.010
0.004
0.004
-0.008
0.009
0.123
0.056
0.020
0.019
0.011
0.023
0.017

90th
2
3
0.006 0.006
0.006 0.006
0.005 0.005
0.005 0.005
0.005 0.005
0.005 0.005
0.006 0.006
0.006 0.006
0.004 0.004
0.009 0.009
0.040 0.040
-0.028 -0.028
0.011 0.011
0.011 0.011
-0.005 -0.005
0.014 0.014
0.011 0.011

Table 1.13: Mean Squared Errors of Percentile APE estimates under Condition 2

π2

g

1

10th
2

0.1

Two step
Two step (Flexible)
LCF
LCF (Flexible)
Two step
Two step (Flexible)
LCF
LCF (Flexible)
Two step
Two step (Flexible)
LCF
LCF (Flexible)

0.020
0.027
8.747
12.06
0.010
0.012
0.010
0.030
0.002
0.002
0.001
0.004

0.006
0.008
2.646
2.457
0.002
0.003
0.002
0.007
0.000
0.001
0.000
0.001

0.006
0.009
3.002
4.621
0.003
0.003
0.003
0.008
0.000
0.001
0.000
0.001

Two step
Two step (Flexible)
LCF
LCF (Flexible)
Two step
Two step (Flexible)
LCF
LCF (Flexible)
Two step
Two step (Flexible)
LCF
LCF (Flexible)

0.027
0.037
5.321
5.793
0.015
0.016
0.006
0.034
0.004
0.003
0.002
0.002

0.008
0.011
2.060
2.023
0.004
0.004
0.002
0.009
0.001
0.001
0.001
0.001

0.008
0.012
1.310
1.539
0.004
0.005
0.002
0.009
0.001
0.001
0.001
0.001

wp

w2

0.2

0.5

w2

0.1

0.2

0.5

1.
2.

3

25th
1
2
Normal
0.032 0.009
0.033 0.010
8.753 2.648
11.50 2.328
0.012 0.003
0.012 0.003
0.011 0.003
0.019 0.005
0.002 0.001
0.002 0.001
0.002 0.000
0.004 0.001
χ23
0.027 0.008
0.024 0.008
5.320 2.059
5.316 1.938
0.006 0.002
0.005 0.001
0.006 0.002
0.015 0.004
0.001 0.000
0.001 0.000
0.002 0.001
0.001 0.000

3

1

50th
2

3

1

75th
2

3

0.009
0.010
3.003
4.470
0.003
0.003
0.003
0.005
0.001
0.001
0.000
0.001

0.022 0.007
0.024 0.008
8.754 2.648
10.91 2.202
0.006 0.002
0.006 0.002
0.010 0.002
0.011 0.002
0.001 0.000
0.001 0.000
0.001 0.000
0.001 0.000

0.007
0.008
3.003
4.308
0.002
0.002
0.003
0.003
0.000
0.000
0.000
0.000

0.010 0.004 0.004
0.015 0.005 0.005
8.757 2.649 3.004
10.54 2.130 4.206
0.003 0.001 0.001
0.004 0.001 0.001
0.010 0.002 0.003
0.011 0.003 0.004
0.000 0.000 0.000
0.000 0.000 0.000
0.001 0.000 0.000
0.000 0.000 0.000

0.008
0.008
1.310
1.385
0.002
0.002
0.002
0.004
0.000
0.000
0.001
0.000

0.012 0.005
0.020 0.006
5.324 2.060
5.035 1.917
0.001 0.000
0.004 0.001
0.005 0.001
0.004 0.001
0.000 0.000
0.000 0.000
0.001 0.000
0.001 0.000

0.004
0.006
1.311
1.266
0.001
0.001
0.001
0.001
0.000
0.000
0.000
0.000

0.006
0.020
5.335
5.064
0.001
0.005
0.005
0.006
0.000
0.000
0.001
0.000

0.003
0.007
2.063
1.964
0.000
0.002
0.001
0.002
0.000
0.000
0.000
0.000

0.003
0.006
1.314
1.238
0.000
0.002
0.002
0.002
0.000
0.000
0.000
0.000

The Mean Squared Errors are calculated from Table 1.7 through 1.12.
The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its
true APE.
58

Table 1.13: (cont’d)
1

90th
2

3

0.005
0.013
8.765
10.38
0.002
0.005
0.012
0.017
0.000
0.000
0.002
0.001

0.002
0.005
2.651
2.104
0.001
0.002
0.003
0.004
0.000
0.000
0.000
0.000

0.002
0.005
3.006
4.158
0.001
0.002
0.004
0.005
0.000
0.000
0.000
0.000

0.003
0.006
5.353
5.340
0.001
0.008
0.010
0.017
0.000
0.000
0.005
0.001

0.002
0.006
2.067
2.063
0.000
0.003
0.003
0.005
0.000
0.000
0.001
0.000

0.002

wp
π2

g
Normal

w2

0.1

0.2

0.5

w2

0.1

0.2

0.5

Two step
Two step (Flexible)
LCF
LCF (Flexible)
Two step
Two step (Flexible)
LCF
LCF (Flexible)
Two step
Two step (Flexible)
LCF
LCF (Flexible)
χ23
Two step
Two step (Flexible)
LCF
LCF (Flexible)
Two step
Two step (Flexible)
LCF
LCF (Flexible)
Two step
Two step (Flexible)
LCF
LCF (Flexible)

1.318
1.282
0.000
0.002
0.003
0.005
0.000
0.000
0.001
0.000

Condition 3: ρ < 1
We also generate the data by allowing the amount of endogeneity to be smaller than
Conditions 1 and 2: wi = π2 zi2 + vi where π2 ∈ {0.1, 0.2, 0.5, 1} and ri = ρvi + ei where
ρ ∈ {0.1, 0.5} Although a fewer sign distortions are observed than ρ = 1, the previous
results remain almost the same, in general.
In summary, the simulations under Condition 1 through Condition 3 demonstrate that
the two step estimation method with a strong instrument provides a good approxima-

59

tion even though its conditional mean is misspecified, and that it outperforms the LCF
approach regardless of the instrument’s quality and the amount of endogeneity. Furthermore, in the simulations, adding v2i and v3i in the estimation does not improve the two
methods’ approximations.17

1.4

APPLICATION: MICHIGAN EDUCATIONAL ASSESSMENT PROGRAM MATH TEST

We apply the two step estimation method to Michigan Educational Assessment Program
(MEAP) data of the school year 2004/2005 in order to estimate the effects of spending
on the fourth grade math test outcome. The fourth grade MEAP math test is a statewide
assessment test given by the State Board of Education in Michigan. It measures public
school student achievement in relation to Michigan curriculum standards, which groups
of educators, teachers and school administrators set. A student’s outcome is rated at one
of the four levels as described in Table 1.14 and public school districts’ percentage shares
of students for the four levels are available from the Michigan Department of Education
(MDE) website.18
Table 1.14: Four levels of MEAP
Outcome
Level 1
Level 2
Level 3
Level 4
1.

Description
Exceeded Michigan Standards
Met Michigan Standards
demonstrated Basic knowledge and skills of Michigan Standards
Apprentice level, showing little success in meeting Michigan standards

The description is from Michgian Department of Education (2005).

Papke (2005, 2008), Papke and Wooldridge (2008), and Roy (2011) examine the relationship between spending and pass rates of this test by using panel data. During their
data periods, the test had three performance levels (Satisfactory, Moderate, Low) and
17
18

That the simulations model ri as a linear function of vi could be one of the reasons.
❤tt♣✿✴✴✇✇✇✳♠✐❝❤✐❣❛♥✳❣♦✈✴♠❞❡✴

60

their pass rates indicate the percentage of students in the satisfactory level. They find
that there exist significant positive causal effects of spending on the pass rates although
their magnitudes are different. The application in this chapter can help understand how
spending shifts students in the four different levels instead of between pass and fail.
We use district-level data including 512 districts.19 We turn these districts’ percentage
shares into proportions to obtain fractional dependent variables.20 Table 1.15 illustrates
the dependent variables’ summary statistics. While the lower corner 0 appears for all of
the four levels, the upper corner 1 appears only for level 1 like the dependent variables
generated in the simulations. We choose the first level as the reference as in the simulations.
Table 1.15: Summary statistics of the dependent variables
Variable
y1
y2
y3
y4
Total

Mean
0.283
0.463
0.221
0.033
1.000

SD
0.138
0.082
0.098
0.035

Min
0
0
0
0

Max
1.000
0.742
0.643
0.278

Description
fraction of Level 1
fraction of Level 2
fraction of Level 3
fraction of Level 4

The key explanatory variable, spending, is constructed as per pupil general fund expenditure in a logarithmic form, log(per pupil GF expenditure). Although we additionally
control for the fraction of applications for the free and reduced-price lunch program as
a measure of the poverty rate and log(enrollment) as a measure of school district size, we
still suspect that spending is endogenous. There can be unobserved district effects such as
parental involvement, which are correlated with spending and are able to affect student
outcome as well. Thus we need an instrument to achieve more accurate estimation for
the effects of spending on the student test outcome.
To find an instrument, we exploit Michigan’s school funding system reform in 1994,
which is called “Proposal A.” The reform changed Michigan’s school funding sources
19

In the year of 2004/2005, Michigan has 552 public school districts.
The original percentage shares for some districts do not sum to 100 because of rounding errors. Thus
We calculate the proportions not based on 100, but based on total percentage shares of the four levels.
20

61

and started to provide school districts with foundation allowances. It resulted in not
only significantly raising districts spending, but also reducing the spending inequalities
across districts by letting the low spending districts’ foundation allowances increase faster
than the others. The initial foundation allowance for a district, which was awarded in
1994/1995, was determined based on its per pupil spending in 1993/1994 and the dollar
increases for the following years has been decided solely by the state legislature. Therefore per pupil foundation allowance in 2004/2005 meets the requirements of an instrument for spending if spending in 1993/1994 is controlled for.
Letting zi1 = (log(enrollmenti ) freelunchi spending93i ), we apply Procedure 1.2.1 to the
model
K g (hi ; θ) =

exp hi θg
∑4h exp (hi θh )

,

∀ g = 1, · · · , 4

(1.26)

where hi = (zi1 spendingi vi ), freelunchi represents the fraction of applications for the
free and reduced-price lunch program, and spending93i is spending in 1993/1994. The
spending variable’s reduced form is expressed as

spendingi = zi1 π 1 + π2 foundationi + vi

(1.27)

where foundationi = log(per pupil foundation allowancei ). Table 1.16 includes summary
statistics of the data.
Table 1.16: Summary statistics of the data
Variable
enrollment
fraction of applications for free and reduced lunch
per pupil expenditure in 2004/2005
per pupil expenditure in 1994/1995
per pupil foundation allowance
# of districts

Mean (SD)
3127.186 (6972.843)
0.354 (0.176)
8086.428 (1090.264)
4901.967 (943.182)
6979.738 (655.772)
512

Table 1.17 contains the first step estimation result. The foundation variable’s t statistic
shows that spending is correlated with the foundation allowance, netting out the other
62

explanatory variables. In addition, the F statistic suggests that it can be declared as a
strong instrument according to Staiger and Stock (1997).
Table 1.17: The first step estimation result
log(enrollment)
freelunch
spending93
foundation
constant
R2
F ( H0 : foundation = 0)

coefficient
0.004
0.323
0.180
0.738
0.787
0.625
37.28

SE
0.005
0.024
0.053
0.121
0.706

t
0.860
13.230
3.390
6.110
1.110

p-values
0.391
0.000
0.001
0.000
0.266

Table 1.18 reports the average APE estimates by the two methods. The two step estimation method provides statistically significant effects on level 1 and level 3. When the
quadratic effect of spending is not allowed for, a 10% increase in per pupil expenditure
- an increase in spending by 0.1 - causes the fractions of student at level 1 and level 3
to raise 6.8 percentage points and decrease 6.2 percentage points, respectively.21 With
the quadratic effect, the magnitudes of the effects are slightly larger. The LCF approach
yields similar result except that the fraction at level 4 is also statistically significant. The
magnitudes of the effects are bigger than those by the two step estimation method. Without the quadratic effect, its estimated effect of a 10% increase in per pupil expenditure on
the fraction at level 1 is 7.7 percentage points and those on the fractions at level 3 and 4
are -6.3 and -1.0 percentage points, respectively. With the quadratic effect, those effects
become larger. Including the flexible forms of vi causes the estimated effects by both the
methods to be smaller except that on level 4 by the LCF approach.
Table 1.19 presents that the two methods’ percentile APE estimates for level 1 and level
3 by the two methods and those for level 4 by the LCF approach at 75th and 90th percentiles
are statistically significant. All of the estimates for level 1 are positive and those for level
21

Considering that 119,687 students took the MEAP math test in 2004/2005, one percentage point
increase(decrease) in the number of students at a certain level translates into about 1200 student increase(decrease).

63

Table 1.18: Average APE estimates of spending on the fourth grade MEAP math test
w

level
Two Step
LCF

w2

Two Step
Two Step
(Flexible)
LCF
LCF
(Flexible)

1
0.681*
(0.158)
0.768*
(0.213)
0.779*
(0.067)
0.691*
(0.038)
0.856*
(0.218)
0.771*
(0.227)

2
0.017
(0.158 )
-0.039
(0.131)
-0.044
(0.126)
0.014
(0.158)
-0.051
(0.143)
0.019
( 0.146)

3
4
-0.619* -0.079
(0.048) (0.048)
-0.625* -0.104*
(0.165) (0.049)
-0.667* -0.068
(0.095) (0.035)
-0.602* -0.103
(0.174) (0.054)
-0.693* -0.112*
(0.182) (0.056)
-0.626* -0.163*
(0.171) (0.068)

1.

w = spending = log(per pupil GF expenditure).
Standard errors are in parentheses.
3. The two step estimation method’s standard errors
are calculated using 1000 bootstrap replications.
4. * is significant at, or below, 5 percent.
2.

3 and 4 are negative. As the percentile of spending increases, the magnitudes of the
estimated effects for level 1 and 3 by the LCF approach decrease, illustrating that given
the same percentage increase in per pupil expenditure, low spending districts are affected
more than high spending districts. On the other hand, the two step estimation method
has different patterns. Without the quadratic effect, the higher the percentile of spending
is, the bigger its estimated effect for level 1 is and the smaller that for level 3 is. But its
level 1 estimate with the quadratic effect increases from the 10th to the 75th percentile and
then drops at the 90th percentile. Allowing for v2i and v3i in the estimation has the same
result as the average APE estimates.
The effects on level 2 in Table 1.18 and Table 1.19 are not significant. It might be
because the fractions of students who used to be at lower levels move to level 2 and who
used to be at level 2 to level 1 are similar.
In summary, the two methods show that spending affects mainly on level 1 and 3.
Plus, considering the magnitudes of their estimates along with the directions, we can

64

Table 1.19: Percentile APE estimates of spending on the fourth grade MEAP math test

w

wp
level
Two Step

w2

Two Step
Two Step
(Flexible)
LCF
LCF
(Flexible)

w

wp
level
Two Step

w2

Two Step
Two Step
(Flexible)
LCF
LCF
(Flexible)

w

wp
level
Two Step

w2

Two Step
Two Step
(Flexible)
LCF
LCF
(Flexible)

10th
1
0.601*
(0.135)
0.720*
(0.146)
0.628*
(0.163)
0.922*
(0.231)
0.791*
(0.242)

2
0.213
(0.156)
0.163
(0.194)
0.202
(0.178)
-0.060
(0.163)
0.040
(0.169)

3
-0.729*
(0.209)
-0.832*
(0.246)
-0.710*
(0.215)
-0.745*
(0.208)
-0.646*
(0.192)

25th
4
-0.084
(0.074)
-0.051
(0.078)
-0.120
(0.102)
-0.117
(0.066)
-0.185*
(0.081)

1
0.633*
(0.156)
0.755*
(0.171)
0.658*
(0.184)
0.903*
(0.227)
0.785*
(0.237)

50th
1
0.679*
(0.188)
0.798*
(0.206)
0.697*
(0.213)
0.872*
(0.221)
0.776*
(0.230)

2
0.062
(0.125)
-0.012
(0.141)
0.052
(0.140)
-0.053
(0.147)
0.024
(0.151)

3
-0.660*
(0.179)
-0.718*
(0.190)
-0.643*
(0.175)
-0.706*
(0.188)
-0.631*
(0.175)

2
-0.252
(0.179)
-0.273
(0.181)
-0.230
(0.185)
-0.038
(0.132)
-0.011
(0.132)

3
-0.452*
(0.072)
-0.436*
(0.070)
-0.437*
(0.074)
-0.619*
(0.162)
-0.598*
(0.159)

3
-0.706*
(0.200)
-0.792*
(0.227)
-0.688*
(0.202)
-0.730*
(0.200)
-0.641*
(0.185)

4
-0.083
(0.067)
-0.059
(0.071)
-0.115
(0.088)
-0.115
(0.063)
-0.179*
(0.077)

75th
4
-0.080
(0.056)
-0.067
(0.059)
-0.106
(0.067)
-0.113
(0.058)
-0.169*
(0.071)

90th
1
0.764*
(0.242)
0.775*
(0.243)
0.733*
(0.245)
0.761*
(0.214)
0.741*
(0.216)

2
0.157
(0.142)
0.095
(0.169)
0.145
(0.159)
-0.057
(0.156)
0.034
(0.162)

4
-0.060*
(0.019)
-0.065*
(0.022)
-0.066*
(0.018)
-0.104*
(0.049)
-0.132*
(0.058)

1.

w = spending = log(per pupil GF expenditure).
Standard errors are in parentheses.
3. APEs at ( z , spending ).
1
p
4. * is significant at, or below, 5 percent.
2.

65

1
0.728*
(0.222)
0.819*
(0.236)
0.731*
(0.239)
0.828*
(0.215)
0.762*
(0.222)

2
-0.070
(0.132)
-0.141
(0.143)
-0.072
(0.147)
-0.047
(0.137)
0.010
(0.140)

3
-0.584*
(0.140)
-0.607*
(0.138)
-0.568*
(0.134)
-0.672*
(0.174)
-0.618*
(0.166)

4
-0.074
(0.041)
-0.072
(0.042)
-0.091*
(0.043)
-0.109*
(0.053)
-0.154*
(0.064)

conclude that an increase in spending tends to shift students who are rated at a lower
level to an upper level, in general. It is consistent with the results of the four research that
mentioned above.

1.5

CONCLUSION

This chapter develops a two step estimation method for multiple fractional dependent
variables especially when endogenous explanatory variables are continuous. The method
directly specifies the estimating conditional mean rather than the structural one and deals
with the endogeneity by combining a control function approach. Although the method is
not applicable when the characteristics of choices are of interest, it can provide not only a
consistent estimator of the parameters in the estimating conditional mean as long as the
conditional mean specification is correct, but also a useful estimator for the quantities of
the structural conditional mean without estimating the structural mean parameters.
Monte Carlo simulations demonstrate that the method even with a misspecified conditional mean works well as an approximation to true APEs if a strong instrument is
available. The simulations also provide evidence that the two step estimation is preferable to an alternative linear method - a LCF approach; the linear method’s approximation
is more sensitive to the quality of an instrument.
The application to the fourth grade MEAP math test of the year 2004/2005 illustrates
that the two step estimation method and the LCF approach provide similar results in
general: the more a school district spends, the more students are in the exceeded Michigan
Standards level and the fewer students are in the demonstrated basic knowledge and
skills of Michigan Standards level. That is, an increase in spending tends to shift students
from a lower level to a upper level.

66

APPENDIX

67

Appendix for Chapter 1
In this appendix, we show how to obtain the estimators’ standard errors taking the extra
variations from the first steps of Procedure 1.2.1 and Procedure 1.2.2 into account especially when the quadratic effect is not allowed for. The model including w2 and using the
flexible forms of v require modifications. This appendix also includes Table 1.20 showing
how the response variables for three choices are generated across the simulations.
The standard errors of θ First, we obtain the standard errors of θ from Procedure 1.2.1.
The first step of Procedure 1.2.1 applies OLS to (1.15) and so, under the standard regularity conditions,

√

1
N (π − π ) = √
N

N

∑ E(z z)
i

−1

1
z i v i + o p (1) = √
N

N

∑ q i + o p (1)

(1.28)

i

where qi ≡ E(z z)−1 zi vi . The conditional mean of the second step obtaining θ is
E(yig |zi , wi ) = K g (hi ; θ) =

=

exp(θg hi )
1 + ∑hG=2 exp(θh hi )
1
1 + ∑hG=2 exp(θh hi )

,

g = 2, · · · , G,

(1.29)

,

g = 1,

(1.30)

where hi = (xi1 vi ) is a 1 × p vector, vi = wi − zi π, and we redefine θ for the appendix
by dropping θ1 from it:

θ=

θ2 · · · θg · · · θG

p( G −1)×1

where θg = (θz θw θv ) p×1 , for g = 2, · · · G. Then the first order condition is
N

∑ si (θ, π ) = 0
i

68

(1.31)

where



∂ i
 
)
(
 ∂θ2 
s
 .   i2 
 .   .. 
 .   . 
  

 ∂ i   

 
)
si (θ, π ) ≡ (∇θ i ) =  (
 =  sig 
∂θ
g

  
 ..   .. 
 .   . 

  
 ∂ i 
siG
(
)
∂θG
and
sig = (

∂ i
) = hi (yig − K g (hi ; θ)), p × 1 vector.
∂θg

A mean value expansion (MVE) around θ gives
N

∑ si (θ, π ) = ∑ si (θ, π ) +
i

i

∇θ ∑ si (θ¨ ) (θ − θ)

(1.32)

i

1
where θ¨ is on the line segment between θ and θ. By multiplying through by √ and
N
using (1.31) and the weak law of large numbers (WLLN), we rearrange (1.32):

√

1
N (θ − θ) = − √
N

N

∑

A−1 si (θ, π ) + o p (1)

(1.33)

i

where A = − E(Hi (θ)) = − E(∇2θ i (θ)) p(G−1)× p(G−1) ,




hi hi Ki2 Ki3
···
···
hi hi Ki2 KiG
−hi hi Ki2 (1 − Ki2 )





h
h
K
K
−
h
h
K
(
1
−
K
)
·
·
·
·
·
·
h
h
K
K
i3


i i i3 i2
i i i3
i i i2 iG
Hi = 
,
.
.
.


..
..
..




hi hi KiG Ki2
···
· · · hi hi KiG KiG −hi hi KiG (1 − KiG )
and Kig = K g (hi ; θ).
Since ∑i si (θ, π ) still depends on the sample, we can not apply the central limit the-

69

orem (CLT) yet. Using a MVE around π, multiplying through by

√1
N

and using (1.28)

gives
1
√
N

√
N ( π − π ) + o p (1).
s
(
θ,
π
)
+
E
[∇
s
(
θ,
π
)]
π i
∑ i

N

N

1
∑ si (θ, π ) = √ N
i

i
N

1
=√
N

∑ (si + Fqi ) + o p (1)

(1.34)

i

where
∂si2
∂π

F = E[∇π si (θ, π )] p(G−1)× M = E

···

∂sig
∂π

···

∂siG
∂π

and
∂sig
∂h
θ exp(hi θh )
∑
= i (yig − Kig ) + hi zi Kig θvg − h=2 vh
∂π
∂π
1 + ∑h=2 exp(hi θh )

.

By plugging (1.34) into (1.33),

√

N (θ − θ) = A−1

1
−√
N

N

∑ di (θ, π )

+ o p (1)

(1.35)

i

where di ≡ si + Fqi . Therefore,

Avar

√

N (θ − θ) = A−1 DA−1

(1.36)

where D ≡ Var(di ) = Var(si + Fqi ), and so a valid estimator of Avar(θ) is
1 −1
A D A −1
N

(1.37)

where
D≡

1
N

N

∑ di di =
i

1
N

∑(si + Fqi )(si + Fqi ) ,

(1.38)

i

si = si (hi ; θ),

70

(1.39)

F = Fi (hi ; θ),
q=(

1
N

(1.40)

N

∑ z i z i ) −1 z i v i ,

(1.41)

i

and
1
A=−
N

N

∑ Hi (hi ; θ).

(1.42)

i

The square roots of (1.37)’s diagonal elements are the standard errors.
The standard errors of the two step APE estimator Next, we derive the standard errors
of (1.23). Let’s define (1.22) as δg (x1 ; η):

δg (x1 ; η) ≡ Ev K g (x1 , v; θ) ·

where η =

θ

θwg −

∑hG θwh exp (x1 θxh + θvh v)
∑hG exp (x1 θxh + θvh v)

(1.43)

. Since it depends on the value of x1 , we use two ap-

π

( p( G −1)+ M)×1

proaches to obtain a single number. One is the average APEs,
δgAVG = Ex1 δg (x1 ; η)

(1.44)

and the other is the percentile APEs,
δgPCT = δg (x1◦ ; η)

(1.45)

where x1◦ = (z1 , w p ).
(1.44) is estimated as
δgAVG =

1
N

N

∑ δg (x j1; η)
j


where δg (x j1 ; η) =

1
N

(1.46)

∑iN K g (x j1 , vi ; θ) · θwg −

71

∑hG θwh exp x j1 θxh + θvh vi
∑hG

exp x j1 θxh + θvh vi


. Based on

(1.28) and (1.35), we can write


√

N (η − η) =

√





1
 θ−θ 
N
= √
N
π−π

N

A −1 d

i

qi


∑


1
 + o p (1) = √
N

i

Applying a MVE to (1.46) around η, multiplying through

√

N

∑ k i + o p (1).

(1.47)

i

N, using (1.47) and WLLN

gives

√

We subtract

√

√

N (δgAVG

N

1
N δgAVG = √
N

∑

δg (x j1 ; η) + E ∇η δg (x j1 ; η) k j + o p (1).

(1.48)

j

NδgAVG from both sides of (1.48):

− δgAVG )

1
=√
N

N

∑

δg (x j1 ; η) − δgAVG + E ∇η δg (x j1 ; η) k j + o p (1)

(1.49)

j

where E δg (x j1 ; η) − δgAVG + E ∇η δg (x j1 ; η) k = 0. Therefore, based on the CLT,
Avar

√

N (δgAVG − δgAVG ) = Var δg (x j1 ; η) − δgAVG + ∆ g (η)k

(1.50)

where ∆ g (η) = E ∇η δg (x j1 ; η) , the 1 × { p( G − 1) + M} Jacobian of δg (x j1 ; η), and a valid
estimator of (1.50) is
1
N

N

∑

δg (x j1 ; η) − δgAVG + ∆ g (η)k j

δg (x j1 ; η) − δgAVG + ∆ g (η)k j

(1.51)

∇η δg (x j1 ; η) ,

(1.52)

j

where
1
∆ g (η) =
N
and

N

∑
j



kj = 

A −1 d
qj

72


j

.

(1.53)

The percentile APE (1.45) is estimated as
δgPCT

=

δg (x1◦ ; η)

1
=
N


where jg (x1◦ , vi ; η) = K g (x1◦ , vi ; θ) · θwg −

N

∑ jg (x1◦ , vi ; η)

∑hG θwh exp x1◦ θxh + θvh vi
∑hG

(1.54)

i

exp

x1◦ θxh

+ θvh vi


. Through the

same process as in the average APE estimate,

√

1
N (δgPCT − δgPCT ) = √
N

N

∑

jg (x1◦ , vi ; η) − δgPCT + J g (η)ki + o p (1)

(1.55)

i

where J g (η) = E ∇η jg (x1◦ , v; η) , the 1 × { p( G − 1) + M} Jacobian of jg (x1◦ , v; η). Since
E jg (x1◦ , v; η) − δgPCT + J g (η)k = 0,
Avar

√

N (δgPCT − δgPCT ) = Var jg (x1◦ , v; η) − δgPCT + J g (η)k .

(1.56)

Thus, a valid estimator of (1.56) is
1
N

N

∑

jg (x1◦ , vi ; η) − δgPCT + J g (η)ki

jg (x1◦ , vi ; η) − δgPCT + J g (η)ki

(1.57)

j

where
J g (η) =

1
N

N

∑

∇η jg (x1◦ , vi ; η) .

(1.58)

i

Hence, the the asymptotic standard errors of δgAVG and δgPCT are obtained as the square
√
roots of (1.51) and (1.57), divided by N, respectively.
The standard errors of γ For g = 2, · · · , G, the LCF approach models

yig = zi1 γzg + γwg wi + uig

73

(1.59)

and
uig = ρ g vi + eig

(1.60)

with the reduced form of the endogenous variable w, (1.15) and yi1 = ∑G
g=2 yig . Plugging
(1.60) into (1.59),

yig = zi1 γzg + γwg wi + ρ g vi + eig = hi γ g + eig ,

g = 2, · · · , G

(1.61)

where γ g = (γzg γwg ρ g ) = (γzg γwg γvg ) p×1 . Considering that the second step of
Procedure 1.2.2 replaces v with v, the estimating equation of the LCF approach for g =
2, · · · , G is

yig = zi1 γzg + γwg wi + ρ g vi + ρ g (vi − vi ) + eig

= hi γ g + ρ g (vi − vi ) + eig
= hi γ g + (hi − hi )γ g + eig

(1.62)

and the LCF estimator is expressed as
−1

∑ hi hi

γg = γg +

∑ hi

(hi − hi )γ g + eig ,

g = 2, · · · , G

(1.63)

i

i

and
G

∑ γg .

γ1 = e1 −

(1.64)

g =2

Therefore,

√

N (γ g − γ g ) =

=

1
N
1
N

−1

∑ hi hi
i

∑ hi hi
i

1
√
N

−1 G

∑

g =2

∑ hi

(hi − hi )γ g + eig ,

g = 2, · · · , G,

(1.65)

i

1
−√
N

74

∑ hi [(hi − hi )γg + eig ]
i

,

g = 1. (1.66)

A similar reasoning in Wooldridge (2010, Appendix 6A) rewrites (1.65) and (1.66) as

√

N γ g − γ g = C −1

= C −1

1
√
N

N

∑

hi eig − R g B−1 zi vi

+ o p (1),

g = 2, · · · , G, (1.67)

i

1
−√
N

N

G

∑∑

hi eig − R g B−1 zi vi

+ o p (1),

g = 1.

(1.68)

i g =2

where
C = E ( h h ),
R g = E (γ g ⊗ h) ∇π h ,
and
B = E ( z z ).

(1.69)

Since E h e g − R g B−1 z v = 0 ∀ g,

Avar

√

N ( γ g − γ g ) = C −1 M g C −1

(1.70)

where
M g = Var h e g − R g B−1 z v ,

g = 2, · · · , G.

(1.71)

and
G

M1 = Var

∑

h e g − R g B −1 z v

(1.72)

g =2

Therefore, Avar(γ g ) is estimated as
1
C −1 M g C −1
N

(1.73)

where
C=

1
N

N

∑ hi hi ,
i

75

(1.74)

1
Mg =
N

N

∑

hi eig − R g B−1 zi vi

,

g = 2, · · · , G,

(1.75)

i

G

M1 =

hi eig − R g B−1 zi vi
1

∑ Mg + N ∑

g =2

hi eig − R g B−1 zi vi

hi eik − Rk B−1 zi vi

,

(1.76)

g =k

Rg =

1
N

N

∑

γ g ⊗ hi

∇ π x i ( π ),

(1.77)

i

B=

1
N

N

∑ zi zi ,

(1.78)

i

and
eig = yig − hi γ g .

(1.79)

The standard error of γ g is obtained as the square root of (1.73).
(y1 , y2 , y3 ) generated across the simulations In Table 1.20, we report the average outcome in each of the three choices across the simulations, and the fraction of times at least
one choice being below 0.05. The fractions are about 0.25 ∼ 0.45 with the two symmetric
distributions and the χ2 distribution has higher fractions, about 0.70. They suggest that
the dependent variable generating process covers the cases where yi has a set of extreme
values such as (0.95, 0.05,0).

76

Table 1.20: (yi1 , yi2 , yi3 ) generated across the simulations
D (e)
π2
ρ
1

1

0.5

1

0.2

1

0.1

1

1

0.5

0.5

0.5

0.2

0.5

0.1

0.5

1

0.1

0.5

0.1

0.2

0.1

0.1

0.1

y1
0.296
(0.014)
0.289
(0.014)
0.286
(0.013)
0.286
(0.013)
0.277
(0.013)
0.268
(0.012)
0.265
(0.012)
0.264
(0.012)
0.263
(0.012)
0.251
(0.011)
0.248
(0.011)
0.247
(0.011)

Normal
mean
y2
y3
0.352
0.352
(0.007) (0.007)
0.356
0.356
(0.007) (0.007)
0.357
0.357
(0.007) (0.007)
0.357
0.357
(0.007) (0.007)
0.361
0.361
(0.007) (0.007)
0.366
0.366
(0.006) (0.006)
0.368
0.368
(0.006) (0.006)
0.368
0.368
(0.006) (0.006)
0.368
0.368
(0.006) (0.006)
0.374
0.374
(0.006) (0.006)
0.376
0.376
(0.006) (0.006)
0.376
0.376
(0.006) (0.006)

fraction
0.39
(0.02)
0.37
(0.02)
0.36
(0.02)
0.36
(0.02)
0.34
(0.02)
0.31
(0.02)
0.30
(0.02)
0.30
(0.02)
0.30
(0.02)
0.27
(0.02)
0.26
(0.02)
0.25
(0.02)

y1
0.313
(0.015)
0.307
(0.015)
0.305
(0.015)
0.305
(0.015)
0.299
(0.014)
0.291
(0.014)
0.289
(0.014)
0.289
(0.014)
0.289
(0.014)
0.280
(0.013)
0.277
(0.013)
0.277
(0.013)

Logistic
mean
y2
y3
0.344
0.344
(0.008) (0.008)
0.346
0.346
(0.008) (0.008)
0.347
0.347
(0.008) (0.007)
0.347
0.347
(0.008) (0.007)
0.351
0.351
(0.007) (0.007)
0.354
0.354
(0.007) (0.007)
0.355
0.355
(0.007) (0.007)
0.356
0.355
(0.007) (0.007)
0.356
0.356
(0.007) (0.007)
0.360
0.360
(0.007) (0.007)
0.361
0.361
(0.007) (0.007)
0.361
0.361
(0.007) (0.007)

1.

τ = 0.
Standard deviations are in parentheses.
3. These are the results when the quadratic effect of w is not allowed.
2.

77

χ23
fraction
0.44
(0.02)
0.42
(0.02)
0.42
(0.02)
0.42
(0.02)
0.40
(0.02)
0.38
(0.02)
0.37
(0.02)
0.37
(0.02)
0.37
(0.02)
0.35
(0.02)
0.34
(0.02)
0.34
(0.02)

y1
0.103
(0.009)
0.095
(0.008)
0.093
(0.008)
0.093
(0.008)
0.085
(0.008)
0.077
(0.007)
0.074
(0.007)
0.074
(0.006)
0.073
(0.007)
0.065
(0.006)
0.062
(0.005)
0.062
(0.005)

mean
y2
0.449
(0.005)
0.452
(0.005)
0.453
(0.005)
0.453
(0.005)
0.458
(0.004)
0.462
(0.004)
0.463
(0.004)
0.463
(0.004)
0.463
(0.004)
0.468
(0.004)
0.469
(0.003)
0.469
(0.003)

fraction
y3
0.449
(0.005)
0.452
(0.005)
0.453
(0.004)
0.453
(0.004)
0.457
(0.004)
0.462
(0.004)
0.463
(0.004)
0.463
(0.004)
0.463
(0.004)
0.468
(0.004)
0.469
(0.003)
0.469
(0.003)

0.68
(0.02)
0.68
(0.02)
0.68
(0.02)
0.68
(0.02)
0.69
(0.02)
0.69
(0.02)
0.69
(0.02)
0.69
(0.02)
0.69
(0.02)
0.70
(0.02)
0.70
(0.02)
0.70
(0.02)

REFERENCES

78

REFERENCES

Arsen, D., and D. N. Plank., Michigan School Finance Under Proposal A: State Control, Local
Consequences.
Buis, M. L. 2008. “FMLOGIT: Stata module fitting a fractional multinomial logit model
by quasi maximum likelihood.” Statistical Software Components, Boston College Department of Economics, June.
Gourieroux, C., A. Monfort, and A. Trognon. 1984. “Pseudo Maximum Likelihood
Methods: Theory.” Econometrica, 52(3): 681–700.
Greene, W. H. 2008. Econometric Analysis.: Prentice Hall.
Lockwood, A. 2002. School finance reform in Michigan, Proposal A: Retrospective.: Office of
Revenue and Tax Analysis, Michigan Department of Treasury.
Michgian Department of Education. 2005. “Michigan Educational Assessment Program
(MEAP) Frequently Asked Questions – Winter 2005.” State of Michigan, Department of
Education, May.
Mullahy, J. 2010. “Multivariate Fractional Regression Estimation of Econometric Share
Models.” NBER Working Papers 16354, National Bureau of Economic Research, Inc.
Papke, L. E. 2005. “The effects of spending on test pass rates: evidence from Michigan.”
Journal of Public Economics, 89(5-6): 821–839.
. 2008. “The Effects of Changes in Michigan’s School Finance System.” Public
Finance Review, 36(4): 456–474.
, and J. M. Wooldridge. 1996. “Econometric Methods for Fractional Response
Variables with an Application to 401(K) Plan Participation Rates.” Journal of Applied
Econometrics, 11(6): 619–32.
, and
. 2008. “Panel data methods for fractional response variables with
an application to test pass rates.” Journal of Econometrics, 145(1-2): 121–133.
Petrin, A., and K. Train. 2010. “A Control Function Approach to Endogeneity in Consumer Choice Models.” Journal of Marketing Research, 47(1): 3–13.
Roy, J. 2011. “Impact of School Finance Reform on Resource Equalization and Academic
Performance: Evidence from Michigan.” Education Finance and Policy, 6(2): 137–167.
Sivakumar, A., and C. Bhat. 2002. “Fractional Split-Distribution Model for Statewide
Commodity-Flow Analysis.” Transportation Research Record, 1790(1): 80–88.

79

Staiger, D., and J. H. Stock. 1997. “Instrumental Variables Regression with Weak Instruments.” Econometrica, 65(3): 557–586.
Wicksall, B., and M. A. Cleary. 2009. “The Basics of the Foundation Allowance – FY
2006-07.” House Fiscal Agency, January, memorandum.
Wooldridge, J. M. 2003. Solutions Manual and Supplementary Materials for Econometric
Analysis of Cross Section and Panel Data.: MIT Press.
. 2005. “Unobserved Heterogeneity and Estimation of Average Partial Effects.”
In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg.
eds. by D. W. K. Andrews, and J. H. Stock: Cambridge University Press, , Chap. 3.
. 2010. Econometric Analysis of Cross Section and Panel Data.: MIT Press.

80

CHAPTER 2
MULTIPLE FRACTIONAL RESPONSE
VARIABLES WITH A BINARY
ENDOGENOUS EXPLANATORY
VARIABLE
2.1

INTRODUCTION

In this chapter, we extend Chapter 1 by allowing endogenous explanatory variables (EEVs)
to be discrete. In Chapter 1, we developed a two step estimation method for the response
variable. Unlike in a linear model, the response variable’s two features are inherent in
the nonlinear model for the two step estimation method; each response lies in the unit
interval1 and the sum of responses for a cross sectional unit is one. It maximizes a multinomial distribution with specifying the conditional mean, which includes the first step
residuals, as multinomial logit. That is, the method combines fractional multinomial logit
developed by Sivakumar and Bhat (2002) and Mullahy (2010) with a control function (CF)
approach solving the endogeneity problem by including extra regressors, which is called
a control function, in the equation so that the correlation between EEVs and the unobservables would not exist.
The method, however, restricts the probabilistic nature of EEVs. The EEVs should be
continuous since it relies on the assumption that the exogenous variables are independent
of the EEVs’ reduced form errors. So it is not applicable for a research which has discrete
EEVs such as “how much people have their pension funds invested in stocks, bonds, and
1

It can take a corner value, zero or one.

81

other” where only the shares for these three assets are available and the key explanatory
variable is whether or not taking a class of financial education, which could be correlated
with some factors that a researcher cannot observe.
Thus, in this chapter we modify the method in order to take the discrete nature of
EEVs into account. Especially we consider a binary EEV in the model. A control function approach handling discrete EEVs in nonlinear models was discussed by Terza et al.
(2008).2 They suggested ”two-stage residual inclusion” (2SRI), which includes the unstandardized residuals – an estimator of the difference between the EEV and its conditional mean – as additional regressors in the second step. Wooldridge (2014) extended
two-stage residual inclusion and proposed another control function approach. Motivated
by variable addition tests, he suggests using the standardized or generalized residuals as
control functions instead of the unstandardized residuals. We employ the approach in
Wooldridge (2014) to modify the two step estimation method proposed in Chapter 1.
The modified two step estimation method generates the generalized residuals from a
probit regression at the first step. Then, it applies the fractional multinomial logit with
including the generalized residuals in the conditional mean at the second step.
The second step is a quasi maximum likelihood estimation (QMLE) using a distribution belonging to the linear exponential family. Therefore, it provides a consistent estimator of the mean parameter if the multinomial logit conditional mean is correctly specified,
which Gourieroux et al. (1984) describe. In other words, the consistency does not require
other distributional assumptions than the conditional mean specification.
Notice that it is the parameter in the “estimating” conditional mean that the method
consistently estimates; it is not the structural mean parameter. However, without estimating the structural mean parameter, the method can provide a consistent estimator of
average partial effect (APE) on the “structural” conditional mean, which is often more
interesting than the mean parameter itself. For the APE estimator to be consistent, the
2

Actually, they did not restrict the EEVs’ nature.

82

multinomial logit specification needs to be true.
To see how the method works as an approximation when the specification is wrong,
we conduct Monte Carlo simulations. The quantities that we are interested in are the
true APEs of the binary EEV. We evaluate the method’s approximation to these APEs,
compared to several alternative estimation methods’ including two stage least squares
(2SLS), linear control function (LCF) approaches, and forbidden regressions. In addition,
we compare the performance of its test for endogeneity with one from a linear control
function using the same control function – the generalized residuals.
The simulations provide evidence that although the two step estimation method’s approximation to APEs depends on how strong the instrument is, it is generally as good
as or often better than the alternative methods’. In addition, the two step estimation
method’s test for endogeneity have not only about the correct size but also better power.
The remainder of this chapter is structured as follows. In the next section, we describe
the set of assumptions and the modified two step estimation method. Section 2.3 contains
the Monte Carlo simulation design and presents the simulation results where the conditional mean of the method is misspecified. And we conclude the chapter in Section 2.4.

2.2

THE MODIFIED TWO STEP ESTIMATION

Let’s consider a random sample in the cross section where each cross sectional observation has G choices or shares. The dependent response of interest – multiple fractional
response – for observation i is written as,



yi1
 
 .. 
 . 
 
 

yi = 
 yig 
 
 .. 
 . 
 
yiG
83

(2.1)

G ×1

where 0 ≤ yig ≤ 1 and ∑G
g yig = 1.
Dropping the cross sectional observation index i, write the structural conditional mean
of the response for choice g:

E(y g |x, r ) = E(y g |z, w, r ) = Gg (z1 , w, r; β),

g = 1, 2, · · · , G,

(2.2)

where 0 < Gg (·) < 1 and ∑G
g Gg = 1. These two conditions for Gg (·) are required because
of the response variable’s two features: the bounded nature and the adding-up constraint.
x is the set of explanatory variables including a binary endogenous explanatory variable
w and the set of exogenous variables z = (z1 , z2 ) where z1 includes an intercept. r is an
unobserved omitted variable. Note that the covariates do not have are the g subscript in
(2.2). That is, each choice has the same covariates; choice specific covariates are not allowed for. The modified two step estimation method specifies an estimating conditional
mean, derived from (2.2), as multinomial logit.3 The multinomial logit model allows covariates to contain characteristics varying across cross sectional observations, not choices.
Yet, E(y g |x, r ) = E(yh |x, r ) for g = h – is still possible since the model allows the parameters to vary across g. The second equality of (2.2) shows that z2 is redundant in the
structural conditional mean, indicating that there is an exclusion restriction.
To set up endogeneity into the model, we use an omitted variable problem by assuming w is correlated with r in the following fashion;

w = 1 [zπ + u > 0] ,

(2.3)

u ∼ Normal (0, 1),

(2.4)

and we are allowing r to be correlated with u.
3

In (2.2), we are not specifying Gg (·) as multinomial logit. The assumptions regarding Gg (·) are just
the two conditions and the restriction for the covariates. Due to the restriction, this approach is more
appropriate for the cases where the characteristics for choices are not important.

84

Additionally, add the following conditional independence assumption:

D (r |z, w) = D (r |v),

where v is the generalized error

4

(2.5)

of w, which playing a role of a sufficient statistic to

control for w’s endogeneity. Since u ∼ Normal (0, 1),

v ≡ E(u|z, w) = wλ(zπ ) − (1 − w)λ(−zπ )

(2.6)

φ(·)
is the Inverse Mill’s Ratio.
Φ(·)
Based on the assumptions above, the estimating conditional mean

where λ(·) =

E(y g |x) = K g (z1 , w, v; θ).

(2.7)

Its functional form is determined by the functional form of Gg (·) and the distribution of e.
The estimating conditional mean, however, can be specified as any function satisfying 0 <
K g (·) < 1 and ∑G
g K g = 1, which are from the two conditions of Gg (·), because we have
not yet made any assumptions about those. Instead, we assume that their combination
leads to the multinomial logit form for K g (·):

K g (h; θ) =

exp hθg
∑hG exp (hθh )

(2.8)

where h = (z1 w v) is a 1 × p vector, θg is a p × 1 parameter vector for choice g. θ =

(θ1 . . . θG ) is a pG × 1 vector where θ1 = 0 as a reference.5
We can start with specifying Gg (·) as a multinomial logit in (2.2), and assume joint
normality of (r, u); then, in principle, we can find E(y g |x). But it would not be in closed
form. So, instead, we use (2.5) as an approximation, and the model E(y g |x) = E(y g |x, v)
4
5

Gourieroux et al. (1987)
(2.8) satisfies the two conditions for K g (·)

85

directly as a multinomial logit. Neither approach contains the other, as Wooldridge (2014)
discussed; they use different assumptions.
In order to estimate θ, a consistent estimator of generalized error vi should be obtained first since it is not observed. Therefore, the modified two step estimation method
is summarized as follows:
Procedure 2.2
Step 1. Obtain π from the probit regression of wi on zi and compute the generalized
residual vi ,
vi = wλ(zπ ) − (1 − w)λ(−zπ ) =

φ(zi π ) (wi − Φ(zi π ))
Φ(zi π )(1 − Φ(zi π ))

(2.9)

Step 2. Run fractional multinomial logit (fmlogit) of (yi1 , · · · , yiG ) on zi1 , wi and vi , which
maximizes a multinomial log-likelihood:
N

∑
i

N

i ( θ)

=∑
i

G

∑ yig log Kg (hi ; θ)

.

(2.10)

g

where hi = (zi1 wi vi ).
The method provides a convenient test for endogeneity with the null hypothesis that
w is exogenous by obtaining an asymptotic robust t statistic on vi . This is a variable
addition test discussed in Wooldridge (2014).
Furthermore, it is able to estimate average partial effect (APE), which is often the quantity of more interest, without estimating the structural mean parameter β. It is because

86

the assumptions above ensure the following equivalence:
δg (z◦ ) = APEg (z◦ ) ≡ Er [ E(y g |z◦ , w = 1, r ) − E(y g |z◦ , w = 0, r )]

(2.11)

= Er [ Gg (z1◦ , w = 1, r; β) − Gg (z1◦ , w = 0, r; β)]

(2.12)

= Ev [K g (z1◦ , w = 1, v; θ) − K g (z1◦ , w = 0, v; θ)],

(2.13)

where z◦ denotes a fixed value of z.6 As shown in Wooldridge (2010, section 2.2.5), (2.13)
holds under (2.5) and E(y g |x, r, v) = E(y g |x, r ), which can be derived under the assumptions that have been made so far.
In order to have a representative single number summarizing δg (z◦ ), we average it
out across the sample for z:
δg = Ez δg (z)

(2.14)

Thus, δg can be estimated by obtaining
2step

δg

=

1
N

N

∑
j

1
N

N

∑

K g (z j1 , w = 1, vi ; θ) − K g (z j1 , w = 0, vi ; θ)

,

(2.15)

i

where θ is the two step estimator of θ.
2step

One thing to be aware of is that the standard errors of θ and δg

need to take the

additional variation caused by the firs step into consideration by using the delta method
or bootstrapping the two steps.
θ is a QMLE estimator using a distribution, which is one of linear exponential family. Therefore, based on the discussion of Gourieroux, Monfort, and Trognon (1984), its
consistency is ensured only by (2.8). In other words, it is still consistent even though the
distribution specification is completely wrong except the conditional mean.
2step

If (2.8) is wrong, δg

is inconsistent because θ is inconsistent and the wrong func-

tional form is used in (2.15). So we conduct Monte Carlo simulations in the next section
6 (2.11) can be written as ASF ( z◦ , w = 1) − ASF ( z◦ , w = 0), ASF (·) denotes “average structure function” defined in Blundell and Powell (2003) and Wooldridge (2005).

87

in order to examine how well the two step estimation method approximate to δg when the
multinomial logit conditional mean is misspecified. The simulations compare its approximation with several alternative methods that researchers would use. These methods are
1) Two stage least squares (2SLS),7 2) Linear control function approach using the generalized residual (LCF), 3) Linear IV using the fitted probability Φ(·) as an instrument (LIV),
4) Linear plug-in method, and 5) Two step plug-in method.8
The adding-up constraint is not inherent in the alternative linear models. So they
should apply their estimation methods to G − 1 equations by dropping one of G choices.
Then, the dropped one’s parameters are obtained based on the constraint and the estimates for the other choices.9 The coefficient estimates of these linear models are comparable to δg .
The alternative methods include two forbidden regressions – the two plug-in methods.
They substitute wi with its fitted probability Φ(·) at their second steps. Researchers often
attempt to use this kind of approach, believing that it is legitimate because it emulates the
2SLS procedure. However, they are inappropriate especially for nonlinear models.
In addition to 1) through 5), we include the generalized residual – the control function
– in a flexible way10 for the two step estimation method and LCF in order to examine if
it helps their approximations. We call them 6) Two step flexible and 7) LCF flexible, respectively.
Furthermore, we compare the performance of tests for endogeneity done by the two
step method and LCF with significance level α = 0.05 by varying degree of endogeneity
and the instrument’s predictive power; the null hypotheses of the tests are that there is
no endogeneity.11
7

This is the same as a linear control function approach using the residual from the linear regression w

on z.
8

Terza et al. (2008) call this approach “two-stage predictor substitution (2SPS).”
In the simulations, we drop the first choice (g = 1), which is the reference choice for the two step
estimation method, and obtain its estimates using γ1 = e1 − ∑G
g=2 γ g where γ g is the coefficient parameter
for choice g in the linear models and e1 is a unit vector.
10 v2 and v3 are additionally included in the second step.
11 H : θ = 0 for the two step estimation method and H : γ = 0 for LCF.
v
v
0
0
9

88

2.3
2.3.1

MONTE CARLO SIMULATION
Data Generating Process

We use 1000 replications where each replication has 500 cross sectional observations (N =
500) and 3 choices (g = 3).
The data generating process for each replication is as follows:
The covariates
• zi =

zi1 zi2


=

1 zi1 zi2



1×3



 



zi1 
 0   1 τ 
where   ∼ MV Normal   , 
 and τ ∈ {0, −0.5}.
zi2
0
τ 1
• ui ∼ Normal (0, σu2 )12
• wi = 1[wi∗ > 0] = 1[zi π + ui > 0] = 1[zi1 π1 + zi2 π2 + ui > 0] where π1 ∈ {0, 0.5}
and π2 ∈ {0.1, 0.2, 0.5, 1}.
• vi = σu wi λ

zi π
σu

− (1 − wi ) λ −

zi π
σu

• ri = ρui + ei where ρ ∈ {0.1, 0.25, 0.5, 0.75, 0.9, 1} and D (ei |vi ) is one of the three:
(a) ei |vi ∼ Normal (0, 1)
(b) ei |vi ∼ χ23
1
(c) ei |vi ∼ Normal 0, 1 + v2i .
2
Two symmetric distributions and an asymmetric distribution are in use for the distribution of ei |vi With (a) or (b), r is generated to be uncorrelated with w if ρ = 0.
The values of π1 and π2 affect Var (wi ). As (π1 , π2 ) varies, we adjust σu for Var (wi∗ ) to be invariant
instead of Var (wi ): Var (wi∗ ) = 2.
12

89

However, with (c), where there is heteroskedasticity, r still depends on w even
though ρ = 0.
The structural conditional mean Gg (·) specification: We specify the structural conditional mean as multinomial logit as well since it satisfies the two conditions for Gg (·):

E(yig |xi , ri ) = Gg (zi1 , wi , ri ; β) =

exp zi1 βzg + wi β wg + ri β rg
∑3h exp (zi1 βzh + wi β wh + ri β rh )

(2.16)

where β = ( β1 β2 β3 ) is a 12 × 1 vector, β g = ( βzg β wg β rg ) = (1 1 1 1) for g = 2, 3,
and β1 = 0. Under (2.16), neither of the three distributions of e can derives (2.8).
The multiple fractional dependent variables y: The dependent variable generating
process is the same as that in Chapter 1; we first draw 100 multinomial outcomes among
1, 2, and 3 based on (2.16), and then calculate the proportions for the three outcomes.

2.3.2

Simulation Results

Case 1: Endogeneity comes only through u.
Let’s consider the two settings where the data are generated so that w becomes exogenous
when ρ = 0.
Table 2.1 through 2.3 contain simulation results as π2 = 1.13 In Table 2.1, with the
2step

standard normal distribution, δg

is pretty similar to δg regardless of the degree of en-

dogeneity, and the alternative methods also provide good approximations. Even the forbidden regressions’ approximations are good. Including the flexible forms of v does not
improve the approximations for both the two step estimation method and LCF. In addition, allowing for the correlation between z1 and z2 does not change the story.14 Table 2.2
shows that, under the asymmetric distribution, the biases are slightly larger than those in
13
14

The results with other values of ρ are available upon request.
π1 = 0.5 and τ = −0.5

90

Table 2.1. Yet, in considering the efficiency as well as the bias, all of the methods under
the asymmetric distribution also approximate well, as shown in Table 2.3.

91

Table 2.1: APEs with r = ρu + e, π2 = 1, and e|v ∼ Normal (0, 1)
ρ
π1
g
δg
Two Step
Two Step
Flexible
Two Step
Plug-in
LCF
LCF
Flexible
2SLS
LIV
(IV=Φ(·))
Linear
Plug-in

Mean
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD

Two Step
Flexible
Two Step
Plug-in
LCF
LCF
Flexible
2SLS
LIV
(IV=Φ(·))
Linear
Plug-in
1.

3
0.054
0.053
(0.019)
0.052
(0.026)
0.053
(0.020)
0.053
(0.020)
0.053
(0.026)
0.053
(0.020)
0.053
(0.020)
0.053
(0.020)

1
-0.107
-0.108
(0.029)
-0.106
(0.035)
-0.107
(0.030)
-0.107
(0.030)
-0.107
(0.036)
-0.107
(0.031)
-0.107
(0.030)
-0.107
(0.030)

0.5

ρ
π1
g
δg
Two Step

1
-0.107
-0.107
(0.035)
-0.105
(0.046)
-0.106
(0.037)
-0.107
(0.037)
-0.107
(0.047)
-0.106
(0.037)
-0.106
(0.037)
-0.106
(0.037)

1
0.5
2
0.054
0.054
(0.020)
0.053
(0.026)
0.053
(0.021)
0.054
(0.020)
0.054
(0.026)
0.053
(0.021)
0.053
(0.021)
0.053
(0.021)

1
0
2
0.054
0.054
(0.016)
0.053
(0.020)
0.054
(0.017)
0.054
(0.017)
0.054
(0.020)
0.054
(0.017)
0.054
(0.017)
0.054
(0.017)

3
0.054
0.054
(0.016)
0.053
(0.020)
0.053
(0.016)
0.053
(0.016)
0.053
(0.020)
0.053
(0.017)
0.053
(0.016)
0.053
(0.016)

0.1
0

Mean
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD

1
-0.108
-0.108
(0.023)
-0.108
(0.028)
-0.107
(0.024)
-0.107
(0.024)
-0.108
(0.030)
-0.107
(0.025)
-0.107
(0.024)
-0.107
(0.024)

2
0.054
0.054
(0.014)
0.054
(0.017)
0.054
(0.014)
0.054
(0.014)
0.054
(0.017)
0.054
(0.015)
0.054
(0.014)
0.054
(0.014)

3
0.054
0.054
(0.013)
0.054
(0.017)
0.054
(0.014)
0.054
(0.014)
0.054
(0.017)
0.053
(0.014)
0.054
(0.014)
0.053
(0.014)

As π1 = 0.5, τ = −0.5; otherwise, τ = 0.
92

1
-0.108
-0.108
(0.021)
-0.109
(0.025)
-0.108
(0.021)
-0.108
(0.022)
-0.108
(0.027)
-0.108
(0.022)
-0.108
(0.022)
-0.108
(0.022)

2
0.054
0.054
(0.013)
0.055
(0.016)
0.054
(0.013)
0.054
(0.013)
0.054
(0.016)
0.054
(0.014)
0.054
(0.013)
0.054
(0.013)

3
0.054
0.054
(0.012)
0.054
(0.015)
0.054
(0.013)
0.054
(0.013)
0.054
(0.016)
0.054
(0.013)
0.054
(0.013)
0.054
(0.013)

Table 2.2: APEs with r = ρu + e, π2 = 1, and e|v ∼ χ23
ρ
π1
g
δg
Two Step
Two Step
Flexible
Two Step
Plug-in
LCF
LCF
Flexible
2SLS
LIV
(IV=Φ(·))
Linear
Plug-in

Mean
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD

Two Step
Flexible
Two Step
Plug-in
LCF
LCF
Flexible
2SLS
LIV
(IV=Φ(·))
Linear
Plug-in
1.

3
0.017
0.019
(0.011)
0.017
(0.015)
0.014
(0.013)
0.014
(0.013)
0.016
(0.017)
0.014
(0.013)
0.014
(0.013)
0.014
(0.013)

1
-0.032
-0.036
(0.012)
-0.033
(0.014)
-0.029
(0.015)
-0.028
(0.015)
-0.032
(0.018)
-0.029
(0.015)
-0.029
(0.015)
-0.029
(0.015)

0.5

ρ
π1
g
δg
Two Step

1
-0.033
-0.039
(0.015)
-0.035
(0.020)
-0.029
(0.019)
-0.028
(0.020)
-0.034
(0.025)
-0.029
(0.019)
-0.029
(0.019)
-0.029
(0.019)

1
0.5
2
0.017
0.020
(0.012)
0.018
(0.016)
0.015
(0.013)
0.015
(0.013)
0.018
(0.017)
0.015
(0.013)
0.015
(0.013)
0.015
(0.013)

1
0
2
0.016
0.019
(0.010)
0.017
(0.012)
0.015
(0.011)
0.015
(0.011)
0.017
(0.013)
0.015
(0.011)
0.015
(0.011)
0.015
(0.011)

3
0.016
0.018
(0.009)
0.016
(0.012)
0.014
(0.010)
0.014
(0.010)
0.016
(0.013)
0.014
(0.011)
0.014
(0.010)
0.014
(0.010)

0.1
0

Mean
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD

1
-0.028
-0.029
(0.009)
-0.028
(0.011)
-0.028
(0.011)
-0.027
(0.011)
-0.028
(0.013)
-0.027
(0.011)
-0.027
(0.011)
-0.027
(0.011)

2
0.014
0.015
(0.009)
0.015
(0.011)
0.014
(0.009)
0.014
(0.009)
0.015
(0.011)
0.014
(0.009)
0.014
(0.009)
0.014
(0.009)

3
0.014
0.014
(0.008)
0.014
(0.011)
0.013
(0.009)
0.013
(0.009)
0.013
(0.011)
0.013
(0.009)
0.013
(0.009)
0.013
(0.009)

As π1 = 0.5, τ = −0.5; otherwise, τ = 0.

93

1
-0.027
-0.027
(0.008)
-0.027
(0.010)
-0.027
(0.009)
-0.027
(0.009)
-0.027
(0.011)
-0.027
(0.009)
-0.027
(0.009)
-0.027
(0.009)

2
0.013
0.014
(0.009)
0.014
(0.011)
0.014
(0.009)
0.014
(0.009)
0.014
(0.011)
0.014
(0.009)
0.014
(0.009)
0.014
(0.009)

3
0.013
0.013
(0.008)
0.013
(0.011)
0.013
(0.008)
0.013
(0.008)
0.013
(0.011)
0.013
(0.009)
0.013
(0.008)
0.013
(0.008)

Table 2.3: MSEs of APEs with r = ρu + e and π2 = 1
D (e|v)
Normal
(0,1)

χ23

1.
2.

ρ
π1
g
Two Step
Two Step Flexible
Two Step Plug-in
LCF
LCF Flexible
2SLS
LIV (IV=Φ(·))
Linear Plug-in
Two Step
Two Step Flexible
Two Step Plug-in
LCF
LCF Flexible
2SLS
LIV (IV=Φ(·))
Linear Plug-in

1
0.001
0.002
0.001
0.001
0.002
0.001
0.001
0.001
0.000
0.000
0.000
0.000
0.001
0.000
0.000
0.000

1
0.5
2
0.000
0.001
0.000
0.000
0.001
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

1
3
0.000
0.001
0.000
0.000
0.001
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

1
0.001
0.001
0.001
0.001
0.001
0.001
0.001
0.001
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

As π1 = 0.5, τ = −0.5; otherwise, τ = 0.
MSEs are calculated from Table 2.1 and 2.2.

94

2
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

3
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

1
0.001
0.001
0.001
0.001
0.001
0.001
0.001
0.001
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

0.5
0
2
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

0.1
3
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

1
0.000
0.001
0.000
0.000
0.001
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

2
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

3
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

We weaken the predictive power (π2 < 1)15 while keeping the strongest degree of
endogeneity (ρ = 1) in order to examine if these good approximations hinges on the
instrument’s strong predictive power.
To determine if z2 is a strong instrument, we use the rule of thumb suggested by
Staiger and Stock (1997); the first stage F statistic, testing the null hypothesis that instruments are uncorrelated with EEVs, should be larger than 10 for the instruments to
have properties as strong instruments. Since the two step estimation method’s first step
is a probit regression, we apply their rule to the first step Wald statistics testing the same
null hypothesis. Table 2.4 gives the summary of the Wald and the F statistics.16
Table 2.4: F and Wald statistics of the first stage/step (H0 : π2 = 0)
D (e|v)
Normal
(0,1)

χ23

OLS

τ
π1
π2
F statistic

Probit

[F>10]
Wald statistic

1st

step

[Wald>10]
Replication
OLS
F statistic

Probit

[F>10]
Wald statistic

[Wald>10]
Replication

0
0
0.1
2.661
(2.914)
2.6%
2.582
(2.746)
2.5%
995
2.861
(3.215)
4.0%
2.763
(2.988)
3.0%
987

0.2
7.923
(5.733)
28.9%
7.410
(5.025)
26.2%
999
7.992
(5.905)
32.0%
7.464
(5.204)
29.4%
999

0.5
53.215
(19.551)
100.0%
39.589
(10.962)
99.8%
1000
54.203
(19.308)
99.8%
40.174
(10.957)
99.8%
1000

1
443.384
(84.613)
100.0%
133.797
(8.861)
100.0%
1000
443.938
(85.330)
100.0%
134.474
(9.186)
100.0%
1000

-0.5
0.5
1
261.906
(60.292)
100.0%
107.881
(11.291)
100.0%
1000
263.869
(59.632)
100.0%
108.501
(11.390)
100.0%
1000

1.

Standard deviations are in parentheses
[F>10] and [Wald>10] indicate the proportions of the F and Wald statistics being
greater than 10 among the replications, respectively.
3. The tests are robust to heteroskedasticity.
2.

They are not greater than 10 until π2 = 0.5, on average. When π2 = 0.1, only 25
15

π1 = τ = 0.
The first stages of 2SLS and LIV are linear regressions. We report only the former because the latter
are quite similar. (The summary of LIV’s first stage F statistics are available upon request.)
16

95

∼ 40 replications have the statistics being larger than 10. The proportions increase as
π2 increases, and then they are larger than 10 in almost every replication as π2 = 0.5.
Therefore, we consider z2 as a weak instrument if π2 ≤ 0.2 and as a strong one if π2 ≥ 0.5.
Table 2.5, containing the results with the standard normal distribution, show that, in
general, the higher the predictive power is, the less biased and the more precise the estimates are, and that as π2 increases, the nonlinear models more quickly recover their
approximation abilities. When π2 = 0.1, the linear models yield worse approximations
than the three nonlinear models, providing much larger biases and huge standard deviations. When π2 = 0.2, the nonlinear models’ estimates become closer to δg and more
precise whereas the linear models still provide poor approximations. As π2 = 0.5, which
makes z2 a strong instrument, the approximations by all the methods are quite good.
Table 2.7 clearly shows that, among the nonlinear models, the two step estimation
method yields a better approximation in both bias and efficiency criteria, and that it does
not improve the approximations to include v in a flexible way. The flexible way does
not help the estimation to obtain smaller biases but causes bigger standard deviations.
Table 2.6 and 2.7 also present that, in general, the asymmetric distribution have a similar
results as the standard normal distribution.

96

Table 2.5: APEs with w = 1[z2 π2 + u > 0], ρ = 1, and e|v ∼ Normal (0, 1)
π2
g
Mean
Mean
SD
Two Step Mean
SD
Flexible
Two Step Mean
SD
Plug-in
LCF
Mean
SD
LCF
Mean
SD
Flexible
2SLS
Mean
SD
LIV
Mean
(IV=Φ(·))
SD
Linear
Mean
Plug-in
SD
Replications
δg
Two Step

1.

1
-0.106
-0.171
(0.300)
-0.184
(0.427)
-0.147
(0.440)
-0.061
(6.387)
1.427
(15.611)
2.545
(91.074)
0.112
(7.392)
-0.218
(5.287)

0.1
2
0.053
0.092
(0.186)
0.097
(0.283)
0.082
(0.270)
0.061
(3.274)
-0.861
(9.053)
-1.303
(47.430)
-0.068
(4.376)
0.150
(2.830)
995/1000

3
0.053
0.079
(0.170)
0.088
(0.290)
0.064
(0.249)
0.000
(3.165)
-0.566
(7.174)
-1.242
(43.651)
-0.045
(3.079)
0.067
(2.590)

1
-0.106
-0.082
(0.189)
-0.127
(0.350)
-0.053
(0.273)
-0.017
(0.752)
0.108
(2.567)
-0.023
(0.524)
-0.027
(0.496)
-0.019
(0.597)

π1 = τ = 0.

97

0.2
2
0.053
0.042
(0.110)
0.067
(0.210)
0.028
(0.151)
0.010
(0.370)
-0.035
(1.639)
0.014
(0.260)
0.015
(0.259)
0.012
(0.284)
999/1000

3
0.053
0.039
(0.103)
0.060
(0.223)
0.025
(0.147)
0.007
(0.397)
-0.073
(1.504)
0.009
(0.292)
0.013
(0.257)
0.007
(0.337)

0.5
1
2
3
-0.106 0.053
0.053
-0.102 0.051
0.051
(0.071) (0.040) (0.038)
-0.100 0.050
0.050
(0.123) (0.070) (0.069)
-0.101 0.050
0.050
(0.075) (0.041) (0.039)
-0.101 0.051
0.050
(0.075) (0.041) (0.039)
-0.099 0.050
0.049
(0.132) (0.073) (0.073)
-0.101 0.050
0.050
(0.075) (0.041) (0.040)
-0.101 0.050
0.050
(0.075) (0.041) (0.040)
-0.101 0.050
0.050
(0.075) (0.041) (0.040)
1000/1000

Table 2.6: APEs with w = 1[z2 π2 + u > 0], ρ = 1, and e|v ∼ χ23
π2
g
Mean
Mean
SD
Two Step Mean
Flexible
SD
Two Step Mean
SD
Plug-in
LCF
Mean
SD
LCF
Mean
SD
Flexible
2SLS
Mean
SD
LIV
Mean
SD
(IV=Φ(·))
Linear
Mean
Plug-in
SD
Replications
δg
Two Step

1.

1
-0.036
-0.064
(0.275)
0.011
(0.431)
-0.036
(0.428)
-0.046
(3.994)
-1.639
(9.191)
0.057
(2.241)
-0.074
(1.891)
0.077
(2.632)

0.1
2
0.018
0.035
(0.168)
-0.005
(0.279)
0.018
(0.252)
0.021
(2.047)
0.772
(5.188)
-0.044
(1.396)
0.194
(4.725)
-0.055
(1.632)
987/1000

3
0.018
0.028
(0.166)
-0.006
(0.283)
0.018
(0.253)
0.025
(2.060)
0.868
(4.902)
-0.013
(1.118)
-0.120
(3.054)
-0.022
(1.229)

1
-0.036
-0.029
(0.159)
-0.002
(0.345)
-0.012
(0.218)
0.042
(1.484)
-0.464
(4.056)
-0.029
(0.645)
-0.007
(0.411)
-0.043
(1.044)

π1 = τ = 0.

98

0.2
2
0.018
0.018
(0.093)
0.000
(0.215)
0.008
(0.119)
-0.013
(0.654)
0.227
(2.409)
0.018
(0.275)
0.007
(0.195)
0.024
(0.443)
999/1000

3
0.018
0.012
(0.093)
0.002
(0.209)
0.003
(0.126)
-0.029
(0.838)
0.237
(1.797)
0.012
(0.392)
0.000
(0.250)
0.019
(0.614)

0.5
1
2
3
-0.035 0.018
0.018
-0.044 0.023
0.021
(0.031) (0.022) (0.022)
-0.030 0.016
0.013
(0.082) (0.053) (0.051)
-0.027 0.014
0.013
(0.042) (0.026) (0.026)
-0.026 0.014
0.012
(0.041) (0.026) (0.025)
-0.039 0.021
0.018
(0.081) (0.054) (0.051)
-0.026 0.014
0.012
(0.040) (0.025) (0.025)
-0.026 0.014
0.012
(0.040) (0.025) (0.025)
-0.026 0.014
0.012
(0.040) (0.025) (0.025)
1000/1000

Table 2.7: MSEs of APE estimates with w = 1[z2 π2 + u > 0] and ρ = 1
D (e|v)
Normal
(0,1)

χ23

1.
2.

π2
g
Two Step
Two Step Flexible
Two Step Plug-in
LCF
LCF Flexible
2SLS
LIV (IV=Φ(·))
Linear Plug-in
Two Step
Two Step Flexible
Two Step Plug-in
LCF
LCF Flexible
2SLS
LIV (IV=Φ(·))
Linear Plug-in

1
0.094
0.189
0.196
40.802
246.059
8301.533
54.686
27.963
0.076
0.188
0.183
15.953
87.046
5.032
3.579
6.941

0.1
2
0.036
0.082
0.074
10.722
82.795
2251.416
19.166
8.016
0.029
0.078
0.063
4.189
27.488
1.952
22.353
2.670

3
0.030
0.085
0.062
10.023
51.851
1907.082
9.490
6.706
0.028
0.081
0.064
4.244
24.755
1.251
9.343
1.512

π1 = τ = 0.
MSEs are calculated from Table 2.5 and 2.6.

99

1
0.036
0.123
0.078
0.574
6.636
0.281
0.252
0.364
0.025
0.120
0.048
2.208
16.638
0.416
0.170
1.091

0.2
2
3
0.012 0.011
0.044 0.050
0.024 0.022
0.139 0.160
2.693 2.279
0.069 0.087
0.069 0.068
0.082 0.116
0.009 0.009
0.047 0.044
0.014 0.016
0.428 0.705
5.848 3.277
0.075 0.153
0.038 0.063
0.196 0.377

1
0.005
0.015
0.006
0.006
0.017
0.006
0.006
0.006
0.001
0.007
0.002
0.002
0.007
0.002
0.002
0.002

0.5
2
0.002
0.005
0.002
0.002
0.005
0.002
0.002
0.002
0.001
0.003
0.001
0.001
0.003
0.001
0.001
0.001

3
0.001
0.005
0.002
0.002
0.005
0.002
0.002
0.002
0.000
0.003
0.001
0.001
0.003
0.001
0.001
0.001

Table 2.8 compares the rejection frequencies for the different degrees of endogeneity
when the strongest instrument is in use. The tests by both the two step estimation method
and LCF have good size properties; as ρ = 0, their rejection frequencies are quite close to
the nominal value. Allowing for the additional terms of v makes them over-reject. The
two step estimation method, however, has better power than LCF for both the two distributions. Figure 2.1 and 2.217 illustrate a pattern that the two step estimation method’s
rejection frequencies increase faster than LCF’s as ρ > 0. With χ23 distribution, the pattern
is more evident and the flexible way helps the two step estimation method have slightly
better power.

17

The graphs show average rejection frequencies for the two choices at each value of ρ in Table 2.8.

100

Table 2.8: Rejection Frequencies for α = 0.05 test in Case 1 with varying ρ
D (e|v)
Normal
(0,1)

χ23

1.
2.

ρ
g
Two Step
LCF
Two Step flexible
LCF flexible
Two Step
LCF
Two Step flexible
LCF flexible

0
2
0.05
0.05
0.11
0.10
0.06
0.06
0.17
0.11

0.1
3
0.06
0.06
0.11
0.10
0.07
0.04
0.17
0.11

2
0.11
0.09
0.15
0.13
0.09
0.06
0.19
0.12

3
0.12
0.09
0.16
0.12
0.08
0.05
0.18
0.11

0.25
2
0.46
0.30
0.40
0.26
0.22
0.10
0.28
0.13

3
0.46
0.32
0.41
0.25
0.22
0.07
0.29
0.13

0.5
2
0.92
0.79
0.89
0.66
0.54
0.20
0.58
0.18

3
0.92
0.79
0.89
0.68
0.55
0.18
0.58
0.21

0.75
2
1.00
0.97
0.99
0.93
0.78
0.37
0.85
0.32

3
1.00
0.97
0.99
0.94
0.79
0.38
0.85
0.31

π2 = 1, π1 = τ = 0.
The null hypotheses for the flexible models are that the coefficients of v, v2 , and v3 are zeroes.

101

0.9
2
1.00
0.99
1.00
0.98
0.88
0.48
0.94
0.41

3
1.00
0.99
1.00
0.98
0.88
0.51
0.93
0.41

1
2
1.00
1.00
1.00
0.99
0.91
0.56
0.96
0.48

3
1.00
1.00
1.00
0.99
0.90
0.60
0.96
0.49

Figure 2.1: Rejection Frequencies for Normal(0,1) with varying ρ

1.0
0.9
0.8
0.7
0.6

0.5
0.4

0.3
0.2
0.1
0.0

0

0.1

Two Step

0.25
LCF

0.5

0.75

0.9

Two Step flexible

1

ρ

LCF flexible

Figure 2.2: Rejection Frequencies for χ23 with varying ρ

1.0
0.9
0.8
0.7
0.6
0.5

0.4
0.3

0.2
0.1
0.0

0

Two Step

0.1

0.25

LCF

0.5

0.75

Two Step flexible

0.9

1

ρ

LCF flexible

Furthermore, the rejection frequencies in Table 2.9 suggest that the stronger the instru-

102

ment is, the higher the power is, except LCF flexible. Figure 2.3 and 2.418 show that the
two step estimation method has slightly better power with the standard normal distribution and much better power with the χ23 distribution than LCF. Again, adding additional
forms of v in the estimation have the two step estimation method have slightly higher
rejection frequencies for χ23 distribution. LCF flexible has a different pattern, however; it
has a U-shape graph whereas the other three are monotonically increasing, suggesting
LCF could have higher power by including v in a flexible way if a weak instrument is in
use.

18 The graphs show average rejection frequencies for the two choices at each value of π where π =
2
1
τ = 0 in Table 2.9.

103

Table 2.9: Rejection Frequencies for α = 0.05 test in Case 1 with varying π2

D (e|v)
Normal
(0,1)

χ23

1.
2.

τ
π1
π2
g
Two Step
LCF
Two Step flexible
LCF flexible
Replication
Two Step
LCF
Two Step flexible
LCF flexible
Replication

-0.5
0.5
1

0
0
1

0.5

0.2

0.1

2
3
1.00 1.00
1.00 1.00
1.00 1.00
0.98 0.99
1000
0.88 0.89
0.58 0.60
0.95 0.95
0.48 0.48
1000

2
3
1.00 1.00
1.00 1.00
1.00 1.00
0.99 0.99
1000
0.91 0.90
0.56 0.60
0.96 0.96
0.48 0.49
1000

2
3
0.97 0.97
0.92 0.92
0.98 0.97
0.86 0.87
1000
0.61 0.61
0.35 0.36
0.70 0.70
0.31 0.28
1000

2
3
0.43 0.43
0.32 0.33
0.40 0.39
0.60 0.58
999
0.17 0.16
0.10 0.10
0.23 0.24
0.35 0.33
999

2
3
0.15 0.14
0.14 0.12
0.16 0.15
0.69 0.68
995
0.09 0.09
0.08 0.07
0.16 0.16
0.53 0.52
987

ρ=1
The null hypotheses for the flexible models are that the coefficients of v, v2 , and v3 are zeroes.

104

Figure 2.3: Rejection Frequencies for Normal(0,1) with varying π2 , (π1 = τ = 0)

1.0
0.9
0.8
0.7
0.6
0.5

0.4
0.3

0.2
0.1
0.0

0.1

Two Step

0.2

LCF

0.5

Two Step flexible

1

π2

LCF flexible

Figure 2.4: Rejection Frequencies for χ23 with varying π2 , (π1 = τ = 0)

1.0
0.9
0.8
0.7
0.6

0.5
0.4

0.3
0.2
0.1
0.0

0.1
Two Step

0.2
LCF

0.5
Two Step flexible

1

π2

LCF flexible

Therefore, the simulations with the data generated with the two distributions demonstrate that when the instrument is strong, the two step estimation’s approximation to
105

APEs is as good as the alternative methods. Plus, although a weak instrument deteriorates its approximation, it makes much worse the approximations by the alternative linear
models. In addition, the two step estimation’s test for endogeneity show good size and
better power than LCF’s.
Case 2: Endogeneity comes through e as well as u.
Now we consider a setting where D (e|v) is a heteroskedastic normal, in which w is not
exogenous even though ρ = 0. For this case, we include two additional estimation methods treating that all covariates are exogenous: Fmlogit and OLS. We would like to examine
how they work as ρ = 0.
When z2 is a strong instrument,19 Table 2.10 shows that the estimates have bigger biases than those with the standard normal distribution, and that the larger ρ is, the larger
their biases are.20 Among the estimation methods, LCF flexible is the best with regard to
bias. Yet, in considering the mean squared errors (MSEs) in Table 2.11, the two step estimation method and the other alternative methods also provide good approximations.21
Interestingly, as ρ = 0, the estimates of Fmlogit and OLS are quite similar to δg . So
these estimates suggest that the dependence w and r due to the heteroskedasticity of e
does not cause any distortion in estimating APEs.
With the strong degree of endogeneity (ρ = 1), although the biases in Table 2.12 are
not monotonically decreasing as π2 is larger, the MSEs in Table 2.13 are. Note that when
a weak instrument is used, LCF Flexible provides more biased estimates than Fmlogit and
OLS, and the MSEs of the methods taking the endogeneity into account are not smaller
than them except the two step estimation method with π2 = 0.2. The results also suggest
that the flexible methods are not helpful at all.
19

The first stage F statistics and the first step Wald statistics in this case have similar summary statistics
as in Table 2.4.
20 The results with other values of ρ are available upon request.
21 When τ = −0.5 and ( π , π ) = (0.5, 1), the results are not much different from those as π = 1 in
2
2
1
Table 2.10 and2.11.

106

Table 2.10: APEs with r = ρu + e and π2 = 1 in Case 2
ρ
g
δg
Two Step
Two Step
Flexible
Two Step
Plug-in
Fmlogit
LCF
LCF
Flexible
2SLS
LIV
(IV=Φ(·))
Linear
Plug-in
OLS

Mean
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD

1
-0.106
-0.115
(0.031)
-0.112
(0.038)
-0.114
(0.032)
-0.224
(0.018)
-0.115
(0.032)
-0.106
(0.039)
-0.113
(0.032)
-0.114
(0.032)
-0.114
(0.032)
-0.224
(0.018)

1.

e|v ∼ Normal 0, 1 + 21 v2

2.

π1 = τ = 0.

1
2
0.053
0.058
(0.018)
0.056
(0.021)
0.057
(0.018)
0.112
(0.010)
0.058
(0.018)
0.053
(0.021)
0.057
(0.018)
0.057
(0.018)
0.057
(0.018)
0.112
(0.010)

107

3
0.053
0.057
(0.017)
0.056
(0.021)
0.057
(0.017)
0.112
(0.010)
0.058
(0.018)
0.053
(0.021)
0.056
(0.018)
0.057
(0.017)
0.057
(0.017)
0.112
(0.010)

1
-0.107
-0.112
(0.028)
-0.115
(0.033)
-0.111
(0.027)
-0.167
(0.016)
-0.111
(0.029)
-0.107
(0.034)
-0.110
(0.028)
-0.110
(0.028)
-0.110
(0.028)
-0.167
(0.016)

0.5
2
0.053
0.056
(0.016)
0.058
(0.019)
0.055
(0.016)
0.083
(0.009)
0.056
(0.016)
0.053
(0.019)
0.055
(0.016)
0.055
(0.016)
0.055
(0.016)
0.083
(0.009)

3
0.053
0.056
(0.015)
0.057
(0.019)
0.055
(0.015)
0.083
(0.009)
0.055
(0.016)
0.053
(0.019)
0.055
(0.016)
0.0554
(0.015)
0.055
(0.015)
0.083
(0.009)

Table 2.10: (cont’d)
ρ
g
δg
Two Step
Two Step
Flexible
Two Step
Plug-in
Fmlogit
LCF
LCF
Flexible
2SLS
LIV
(IV=Φ(·))
Linear
Plug-in
OLS

Mean
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD
Mean
SD

1
-0.107
-0.109
(0.026)
-0.115
(0.031)
-0.108
(0.025)
-0.119
(0.015)
-0.108
(0.027)
-0.107
(0.031)
-0.108
(0.026)
-0.108
(0.026)
-0.108
(0.026)
-0.119
(0.016)

0.1
2
0.054
0.055
(0.015)
0.058
(0.018)
0.054
(0.015)
0.059
(0.009)
0.054
(0.015)
0.054
(0.018)
0.054
(0.015)
0.054
(0.015)
0.054
(0.015)
0.059
(0.009)

108

3
0.054
0.054
(0.015)
0.057
(0.018)
0.054
(0.015)
0.059
(0.009)
0.054
(0.015)
0.053
(0.018)
0.054
(0.015)
0.054
(0.015)
0.054
(0.015)
0.059
(0.009)

1
-0.107
-0.108
(0.026)
-0.115
(0.031)
-0.108
(0.025)
-0.107
(0.015)
-0.108
(0.027)
-0.107
(0.031)
-0.108
(0.026)
-0.108
(0.025)
-0.108
(0.026)
-0.107
(0.016)

0
2
0.054
0.054
(0.015)
0.057
(0.017)
0.054
(0.015)
0.053
(0.009)
0.054
(0.015)
0.054
(0.018)
0.054
(0.015)
0.054
(0.015)
0.054
(0.015)
0.053
(0.009)

3
0.054
0.054
(0.015)
0.057
(0.018)
0.054
(0.014)
0.053
(0.009)
0.054
(0.015)
0.054
(0.018)
0.054
(0.015)
0.054
(0.015)
0.054
(0.015)
0.053
(0.009)

Table 2.11: MSEs of APE estimates with r = ρu + e and π2 = 1 in Case 2
ρ
g
Two Step
Two Step Flexible
Two Step Plug-in
Fmlogit
LCF
LCF Flexible
2SLS
LIV (IV=Φ(·))
Linear Plug-in
OLS

1
0.001
0.002
0.001
0.014
0.001
0.001
0.001
0.001
0.001
0.014

1
2
0.000
0.000
0.000
0.004
0.000
0.000
0.000
0.000
0.000
0.004

3
0.000
0.000
0.000
0.004
0.000
0.000
0.000
0.000
0.000
0.004

1.

e|v ∼ Normal 0, 1 + 12 v2

2.

MSEs are calculated from Table 2.10.

1
0.001
0.001
0.001
0.004
0.001
0.001
0.001
0.001
0.001
0.004

0.5
2
0.000
0.000
0.000
0.001
0.000
0.000
0.000
0.000
0.000
0.001

109

3
0.000
0.000
0.000
0.001
0.000
0.000
0.000
0.000
0.000
0.001

1
0.001
0.001
0.001
0.000
0.001
0.001
0.001
0.001
0.001
0.000

0.1
2
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

3
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

1
0.001
0.001
0.001
0.000
0.001
0.001
0.001
0.001
0.001
0.000

0
2
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

3
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000

Table 2.12: APEs with w = 1[z2 π2 + u > 0] and ρ = 1 in Case 2
π2
g
Mean
Mean
SD
Two Step Mean
SD
Flexible
Two Step Mean
SD
Plug-in
Fmlogit
Mean
SD
LCF
Mean
SD
LCF
Mean
SD
Flexible
2SLS
Mean
SD
LIV
Mean
(IV=Φ(·))
SD
Linear
Mean
Plug-in
SD
OLS
Mean
SD
Replications
δg
Two Step

0.1
1
2
-0.102
0.051
-0.175
0.093
(0.324) (0.192)
-0.198
0.097
(0.440) (0.286)
-0.138
0.073
(0.483) (0.280)
-0.333
0.167
(0.020) (0.010)
-0.205
0.112
(2.644) (1.284)
0.884
-0.446
(11.653) (7.151)
-0.194
0.108
(2.241) (1.314)
-0.221
0.142
(2.618) (1.842)
-0.197
0.112
(2.213) (1.268)
-0.333
0.167
(0.020) (0.010)
997/1000

1.

e|v ∼ Normal 0, 1 + 12 v2

2.

π1 = τ = 0.

3
0.051
0.082
(0.192)
0.101
(0.293)
0.064
(0.280)
0.167
(0.011)
0.093
(1.509)
-0.438
(5.858)
0.085
(1.189)
0.079
(1.314)
0.086
(1.203)
0.167
(0.011)

0.2
1
2
3
-0.102 0.051
0.051
-0.109 0.057
0.052
(0.212) (0.116) (0.117)
-0.160 0.081
0.079
(0.378) (0.223) (0.227)
-0.090 0.047
0.042
(0.271) (0.146) (0.148)
-0.330 0.165
0.165
(0.020) (0.010) (0.011)
-0.103 0.056
0.047
(0.782) (0.469) (0.336)
0.111
-0.060 -0.051
(3.230) (2.032) (1.919)
-0.089 0.051
0.038
(0.575) (0.398) (0.232)
-0.079 0.043
0.037
(0.397) (0.207) (0.219)
-0.091 0.052
0.039
(0.628) (0.430) (0.246)
-0.330 0.165
0.165
(0.020) (0.010) (0.011)
1000/1000

110

0.5
1
2
3
-0.103 0.051
0.051
-0.125 0.063
0.062
(0.081) (0.044) (0.043)
-0.120 0.060
0.060
(0.149) (0.079) (0.081)
-0.126 0.063
0.063
(0.080) (0.043) (0.042)
-0.306 0.153
0.153
(0.020) (0.010) (0.011)
-0.128 0.064
0.064
(0.082) (0.044) (0.043)
-0.094 0.047
0.047
(0.156) (0.082) (0.085)
-0.126 0.063
0.063
(0.081) (0.044) (0.043)
-0.127 0.064
0.063
(0.081) (0.044) (0.043)
-0.127 0.064
0.063
(0.081) (0.044) (0.043)
-0.306 0.153
0.153
(0.020) (0.010) (0.011)
1000/1000

Table 2.13: MSEs of APE estimates with w = 1[z2 π2 + u > 0] and ρ = 1 in Case 2
π2
g
Two Step
Two Step Flexible
Two Step Plug-in
Fmlogit
LCF
LCF Flexible
2SLS
LIV (IV=Φ(·))
Linear Plug-in
OLS

1
0.110
0.203
0.234
0.054
7.004
136.772
5.031
6.867
4.908
0.054

0.1
2
0.039
0.084
0.079
0.013
1.652
51.391
1.730
3.403
1.611
0.013

3
0.038
0.088
0.079
0.014
2.278
34.559
1.415
1.727
1.449
0.014

1.

e|v ∼ Normal 0, 1 + 12 v2

2.

π1 = τ = 0.
MSEs are calculated from Table 2.12.

3.

111

1
0.045
0.146
0.073
0.052
0.611
10.480
0.330
0.158
0.395
0.053

0.2
2
3
0.014 0.014
0.051 0.052
0.021 0.022
0.013 0.013
0.220 0.113
4.141 3.695
0.159 0.054
0.043 0.048
0.185 0.060
0.013 0.013

1
0.007
0.023
0.007
0.042
0.007
0.024
0.007
0.007
0.007
0.042

0.5
2
0.002
0.006
0.002
0.010
0.002
0.007
0.002
0.002
0.002
0.010

3
0.002
0.007
0.002
0.010
0.002
0.007
0.002
0.002
0.002
0.010

The rejection frequencies in Table 2.14 and 2.15 tell that the tests by the two step estimation method and LCF cannot detect the endogeneity due to the heteroskedasticity of
e; as ρ = 0, their rejection frequencies are still close to 0.05. Yet, those by their flexible
methods can detect it. The form of heteroskedasticity and the flexible methods contain
v2 and v2 , respectively. We suspect that it might be related with that the flexible methods
yield higher rejection frequencies.
As ρ > 0, the patters in the rejection frequencies are similar as those with the standard
normal distribution although the rejection frequencies themselves are generally smaller.
It is more obviously shown in Figure 2.5 and 2.6.22
Overall, we find that the results with the heteroskedastic normal distribution are pretty
similar to those of the standard normal distribution regarding approximations to APEs
and a test of endogeneity. The two additional estimation methods provide evidence that
the endogeneity caused only by heteroskedasticity does not matter to obtain an approximation to APEs. Plus, although including the higher polynomial v ensure the two step
estimation method and LCF to have tests detecting this kind of endogeneity, it does not
helps their approximations to APEs.
In summary, the simulations demonstrate that the two step estimation method with
a misspecified conditional mean works well as an approximation if the instrument is
strong. Even with a weak instrument, it yields a better approximation than the alternative methods. In addition, its test for endogeneity has about the correct size and better
power than LCF’s.

22 The graphs are average rejection frequencies for the two choices at each value of ρ and π in Table 2.14
2
and 2.15, respectively.

112

Table 2.14: Rejection Frequencies for α = 0.05 test in Case 2 with varying ρ
D (e|v)
Normal
0, 1 + 12 v2

1.
2.

0

0.1

0.25

0.5

0.75

0.9

2
0.08

3
0.07

2
0.08

3
0.08

2
0.20

3
0.20

2
0.53

3
0.53

2
0.84

3
0.85

2
0.94

3
0.94

2
0.96

3
0.96

LCF
Two Step flexible
LCF flexible

0.07
0.43
0.31

0.07
0.43
0.30

0.07
0.45
0.31

0.07
0.45
0.32

0.16
0.50
0.39

0.17
0.50
0.37

0.46
0.70
0.58

0.47
0.70
0.59

0.77
0.88
0.80

0.77
0.89
0.81

0.90
0.95
0.89

0.90
0.94
0.89

0.94
0.97
0.93

0.94
0.97
0.92

π2 = 1, π1 = τ = 0.
The null hypotheses for the flexible models are that the coefficients of v, v2 , and v3 are zeroes.
Table 2.15: Rejection Frequencies for α = 0.05 test in Case 2 with varying π2

D (e|v)
Normal
0, 1 + 12 v2

1.
2.

1

ρ
g
Two Step

-0.5
0.5
1

τ
π1
π2
g
Two Step

2
0.94

LCF
Two Step flexible
LCF flexible
Replication

0.91 0.91
0.94 0.94
0.90 0.92
1000

3
0.94

0
0
1
2
0.96

0.5
3
0.96

0.94 0.94
0.97 0.97
0.93 0.92
1000

2
0.69

3
0.70

0.62 0.65
0.66 0.66
0.63 0.62
1000

0.2
2
0.21

3
0.21

0.18 0.18
0.18 0.18
0.39 0.39
1000

0.1
2
0.09

3
0.10

0.09 0.09
0.10 0.10
0.51 0.52
997

ρ=1
The null hypotheses for the flexible models are that the coefficients of v, v2 , and v3 are zeroes.

113

Figure 2.5: Rejection Frequencies for Normal 0, 1 + 12 v2 with varying π2 , (π1 = τ = 0)

1.0
0.9
0.8
0.7
0.6
0.5

0.4
0.3

0.2
0.1
0.0

0

0.1

Two Step

0.25

LCF

0.5

0.75

Two Step flexible

0.9

1

ρ

LCF flexible

Figure 2.6: Rejection Frequencies for Normal 0, 1 + 12 v2 with varying π2 , (π1 = τ = 0)

1.0
0.9
0.8
0.7
0.6
0.5

0.4
0.3

0.2
0.1
0.0

0.1

Two Step

0.2

LCF

0.5

Two Step flexible

114

1

LCF flexible

π2

2.4

CONCLUSION

This chapter studies a two step estimation method for multiple fractional dependent variables especially when there is a binary endogenous explanatory variable. By employing a
control function approach suggested by Wooldridge (2014), this method is an extension of
a two step estimation method developed in Chapter 1 where the endogenous explanatory
variable is continuous.
With the assumption that the conditional mean, conditional on only observed variables and the generalized residual, is multinomial logit, it applies a QMLE to obtain consistent estimator of the parameters in the conditional mean. Although it is not able to
estimate the mean parameter in the structural conditional mean, it is able to provide a
consistent estimator of APE, which is often more interesting.
By conducting Monte Carlo simulations, this chapter provides evidence that even
when the conditional mean is misspecified, the two step estimation method yields a
decent approximation to average partial effect. Its approximation is as good as or often better than the alternative methods including a linear control function approach, the
standard two stage least squares, and plug-in methods. This results tell a consistent story
with Chapter 1. In addition, the simulations demonstrate that the two step estimation
method’s test for endogeneity outperforms the linear control function approach in power
although both of the two methods have approximately the correct size.

115

REFERENCES

116

REFERENCES

Blundell, R., and J. L. Powell. 2003. “Endogeneity in Nonparametric and Semiparametric Regression Models.” In Advances in Economics and Econometrics. eds. by L. P. H.
Dewatripont, Mathias, and S. J. Turnovsky, 2: Cambridge: Cambridge University Press,
, 1st edition 312–357.
Gourieroux, C., A. Monfort, E. Renault, and A. Trognon. 1987. “Generalised residuals.”
Journal of Econometrics, 34(1-2): 5–32.
,
, and A. Trognon. 1984. “Pseudo Maximum Likelihood Methods: Theory.” Econometrica, 52(3): 681–700.
Mullahy, J. 2010. “Multivariate Fractional Regression Estimation of Econometric Share
Models.” NBER Working Papers 16354, National Bureau of Economic Research, Inc.
Nam, S. 2014. “Multiple fractional response variables with continuous endogenous explanatory variables.” June.
Sivakumar, A., and C. Bhat. 2002. “Fractional Split-Distribution Model for Statewide
Commodity-Flow Analysis.” Transportation Research Record, 1790(1): 80–88.
Staiger, D., and J. H. Stock. 1997. “Instrumental Variables Regression with Weak Instruments.” Econometrica, 65(3): 557–586.
Terza, J. V., A. Basu, and P. J. Rathouz. 2008. “Two-stage residual inclusion estimation:
Addressing endogeneity in health econometric modeling.” Journal of Health Economics,
27(3): 531–543.
Wooldridge, J. M. 2005. “Unobserved Heterogeneity and Estimation of Average Partial
Effects.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas
Rothenberg. eds. by D. W. K. Andrews, and J. H. Stock: Cambridge University Press, ,
Chap. 3.
. 2010. Econometric Analysis of Cross Section and Panel Data.: MIT Press.
Wooldridge, J. M. 2014. “Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables.” Journal of Econometrics.

117

CHAPTER 3
ON COMPUTING AVERAGE PARTIAL
EFFECTS IN MODELS WITH
ENDOGENEITY OR HETEROGENEITY
3.1

INTRODUCTION

It is now widely recognized that magnitudes of partial effects are important for determining the importance of policy interventions and for understanding the strengths of
relationships posited by economic theory. In models where unobservables are assumed
to be independent of observed covariates (or, in some cases, a weaker assumption, such
as mean independence), there is little controversy about how one should compute partial
effects – at least once one has decided whether the conditional mean or some other feature of a conditional distribution is the focus. Assuming, as we do in this chapter, that
the conditional mean is the focus, one typically computes partial derivatives or discrete
changes of a conditional mean function with respect to the explanatory variables of interest. To obtain a single number summarizing the effect of a covariate, one often averages
the partial effects across the distribution of the covariates. This average leads to the notion of an “average partial effect” (or APE), and APEs are typically straightforward to
estimate in parametric, semiparametric, and even nonparametric models with exogenous
explanatory variables.1 Other possibilities for summarizing the effect of a covariate is
to insert specific values of the covariates, such as means or medians, into the estimated
1 We

prefer the name “partial effect” to “marginal effect” because “marginal” connotes a small change.
Average partial effects can be computed using derivatives or discrete changes, and the discrete changes can
be of any magnitude.

118

partial effects, but these seems less desirable and is less popular.
When unobservables are correlated with one or more covariates – so we have some
form of “endogeneity” – it is not clear how one should summarize partial effects. Blundell and Powell (2003) propose the notion of an “average structural function” (ASF).
The ASF is defined as a function of covariates after the unobservables have been averaged out. More precisely, suppose a response Y is determined as Y = G (X, U) for
observed covariates X and unobservables U. We obtain the ASF as a function of x by
inserting x into G (·, ·) and then averaging across the unobservables. More precisely,
ASF (x) ≡ EU [ G (x, U)]. Partial effects are then obtained by taking partial derivatives
or differences of ASF (x). In general, the partial effects defined in this way depend on
x. Blundell and Powell (2003) showed how to estimate the ASF very generally in models
with endogenous explanatory variables, provided, of course, one has sufficient instrumental variables. Wooldridge (2005b) focused on the partial effects, which in the continuous case are defined as ∂ASF (x)/∂x j , in a similar setting, but with parametric models.2
Part of the appeal of the ASF is that its definition is the same regardless of whether U
and X are dependent. Because the ASF is a function of x, one can see how partial effects
on the ASF change as elements of x change, and the differences across different values
of x can be of significant interest. Even so, one often wants to compare estimates from
nonlinear models with estimates from simple linear models – which are often estimated
by two stage least squares or, in the case of panel data, standard fixed effects methods.
The question is: How should one summarize the partial effects of observed covariates
in nonlinear models to make them comparable to linear estimates? There are two possibilities. First, we can obtain the partial effects from the ASF and then average across
the distribution of the observed covariates, X. A second possibility is to compute partial
effects with respect to X from Y = G (X, U) and then average across (X, U). As we will
see in Section 3.3, these two methods are not generally the same, and they can actually be
2 In

Wooldridge (2010)and the first edition, the notion of an APE is used regularly for models with
endogeneity or correlated random effects.

119

very different.
Wooldridge (2010) discusses APEs based on the average structural function and those
based directly on a conditional mean specification such as E(Y |X, U) = G (X, U). But he
makes no systematic attempt to compare these APEs, and does not even mention that they
can be different. This chapter has two goals. The first is to clarify the relationship between
the two kinds of partial effects and demonstrate that they generally differ. We will not
resolve which partial effect is “better” because it is mainly a matter of test. Second, we
show that the kind of control function/correlated random effects approaches discussed
in Blundell and Powell (2003), Altonji and Matzkin (2005), and Wooldridge (2005b, 2010)
can be used to consistently estimate both types of partial effects.
The rest of the chapter is organized as follows. Section 3.2 defines the average structural function, slightly extending the definition in Blundell and Powell (2003). We then
show that the key Blundell and Powell result about identifying the ASF when suitable
“proxies” are available carries through with the more general definition. We also note
that, assuming derivatives can be passed under integrals, the ASF identifies average partial effects.
Section 3.3 offers a different way to define a single average partial effect and shows
that it is generally different from a definition based on averaging the observed covariates out of the average structural function. We also show that if the heterogeneity and
covariates are assumed to be independent, the two ways of computing APEs coincide.
Section 3.4 illustrates how to compute the two types of partial effects in two empirical
examples using Michigan Educational Assessment Program data, and Section 3.5 summarizes and concludes.

120

3.2

THE AVERAGE STRUCTURAL FUNCTION AND AVERAGE PARTIAL EFFECTS

In what follows, it is helpful to use a notation that clearly distinguishes random vectors from particular outcomes of those vectors. We use the traditional convention from
probability that upper case letters are random variables or vectors and the lower case
counterparts are specific possible values.
We are interested in the conditional mean of a response variable, Y, conditional on a
vector of observed covariates, X, and a vector of unosbervables, U:

E(Y |X, U) = G (X, U),

(3.1)

where (Y, X, U) has a joint distribution in a population. We can also write

E(Y |X = x, U = u) = G (x, u).

(3.2)

This setup subsumes that in Blundell and Powell (2003), who assume that Y is a deterministic function of (X, U). Because conditional probabilities can be written as conditional
means we cover partial effects for probabilities as a special case.
In many applications, G (·, u) is continuously differentiable on the the support of X,

X , which we would then assume to be an open set. But we are also interested in cases
where G (·, ·) is not differentiable in either argument. For example, if Y is binary and we
assume
Y = 1[α + Xβ + U > 0]
for a K × 1 vector β, G (·, ·) is not differentiable. In this case we define partial effects
as changes. For example, in moving the K th variable from x0K to x1K , holding the other

121

elements of x, say x(K ) fixed, the partial effect is
1[α + x(K ) β(K ) + β K x1K + u > 0] − 1[α + x(K ) β(K ) + β K x0K + u > 0]
More generally, the effect of changing x from x0 to x1 is
G ( x1 , u ) − G ( x0 , u ).

When G (·, u) is continuously differentiable in x j we can define the partial effect with
respect to x j as
g j (x, u) =

∂G (x, u)
.
∂x j

(3.3)

Wooldridge (2005b, 2010) defines the average partial effect as

APEj (x) = EU [ g j (x, U)],

(3.4)

so that the unobservables are averaged out. It is important to see that the APE is obtained
by inserting a fixed value for x and then averaging across the unconditional distribution of
U. We are not using the conditional distribution, D (U|X), in the averaging, and we are not
restricting this conditional distribution. In the special case that U and X are independent,
D (U|X) = D (U) and then the distinction is irrelevant.
As discussed in Wooldridge (2005b, 2010, Section 2.2.5), APEj (x) is closely related to
the notion of an average structural function (Blundell and Powell, 2003):

ASF (x) = EU [ G (x, U)].

(3.5)

Actually, this definition is somewhat more general than that used by Blundell and Powell,
who effectively write Y = G (X, U). The more general definition is important in situations
where Y is discrete and we model endogeneity as correlation between one or more omit-

122

ted variables, in U, and one or more observed covariates, in X.
The ASF is a very useful function because one can take derivatives or changes with respect to elements of x. Assuming that the partial derivative can be taken through the sum
or integral defining EU [ G (x, U)] – see, for example, Bartle (1966) for general conditions –
we have
APEj (x) =

∂ASF (x)
.
∂x j

(3.6)

As mentioned earlier, the definition of the ASF takes no stand on D (U|X). However,
if U and X are independent then the ASF is the same as the conditional expectation. This
follows immediately from the the law of iterated expectations and the general Lebesgue
representation of conditional expectation. Let

E(Y |X = x) = E[ E(Y |X, U)|X = x] =

=

G (x, u) Q(du|x)

G (x, u) Q(du) = EU [ G (x, U)] ≡ ASF (x),

(3.7)

where Q(·) is an appropriate σ-finite measure.
When U and X are dependent, we generally cannot obtain interesting partial effects
by estimating E(Y |X). There are exceptions, of course. For example, if Y = α + Xfi + U
with E(U ) = 0, E(X U ) = 0, but E(U |X) = E(U ), OLS using a random sample is generally consistent for (α, β), which indexes the ASF: ASF (x) = α + xfi. However, simple
extensions with correlated random slopes do permit OLS to consistently estimate the ASF.
And in nonlinear models, directly estimating E(Y |X) rarely leads to interesting quantities
unless U and X are independent.
Blundell and Powell (2003) and Wooldridge (2005b) show how to identify the ASF
when “proxy” variables for U, say V, are available. Sometimes we assume that we observe suitable proxies, such as standardized test scores to proxy for cognitive ability. In
the context of BP (2003), V is a vector of reduced form errors for endogenous elements of
X. When V is a vector of reduced form errors we need exogenous variables from outside

123

the equation to serve as instruments. Suitable proxies are also available in a panel data
context, where V can be a vector of functions of a time series of covariates {X1 , X2 , ..., X T }.
A leading case is V = T −1 ∑tT=1 Xt . See, for example, Altonji and Matzkin (2005) and
Wooldridge (2005a, 2010).
Two key restrictions on V suffice to identify average partial effects. The first is that V
is redundant in the “structural” conditional expectation. The second is that V is a good
enough proxy for U so that U and X are independent conditional on V. Formally, we state
these assumptions as

E(Y |X, U, V) = E(Y |X, U) (redundancy of V)
D (U|X, V) = D (U|V) (conditional independence)

(3.8)
(3.9)

Sometimes assumption (3.8) is called a “conditional mean independence” assumption,
because the mean of Y is independent of V once we also condition on (X, U).
Under assumptions (3.8) and (3.9), we have an important identification result, which
was discovered independently in different settings by Blundell and Powell (2003), Altonji and Matzkin (2005), and Wooldridge (2002, 2005b). In what follows we assume that
conditional means exist, along with standard moment conditions.
Proposition 1: Let (Y, X, U, V) be a random vector such that (3.8) and (3.9) hold. Define
H (x, v) ≡ E(Y |X = x, V = v).

(3.10)

ASF (x) = EV [ H (x, V)]

(3.11)

Then

124

Proof: By the law of iterated expectations and the redundancy condition (3.8),

E(Y |X = x, V = v) = E[ E(Y |X = x, U, V = v)|X = x, V = v]

= E[ E(Y |X = x, U)|X = x, V = v]
= E[ G (x, U)|X = x, V = v]

Next, by conditional independence (3.9),

E[ G (x, U)|X = x, V = v] = E[ G (x, U)|V = v],

and so we have established the key relationship

H (x, v) = E[ G (x, U)|V = v].

(3.12)

Now integrate (in the measure theoretic sense) both sides with respect to the distribution
of V and use iterated expectations on the right:

EV [ H (x, V)] = EV { E[ G (x, U)|V]} = EU [ G (x, U)] = ASF (x).

We can use Proposition 1 to obtain useful formulas for partial effects based on discrete
changes. The “structural” quantity of interest is

ASF (x1 ) − ASF (x0 ) = EU [ G (x1 , U) − G (x0 , U)]

and Proposition 1 shows that this is the same as

EV [ H (x1 , V) − H (x0 , V)].

An important special case, suppose XK is a binary treatment indicator – perhaps indicat-

125

ing participation in a program. Generally, we can estimate the average treatment effect
of the program with the observables set at particular values. In particular, under the
conditions of Proposition 1,

EV [ H (x(K ) , 1, V) − H (x(K ) , 0, V)] = APEK (x(K ) ) = EU [ G (x(K ) , 1, U) − G (x(K ) , 0, U)],
(3.13)
where x(K ) denotes x without xK .
For partial effects defined as derivatives, we have the following.
Proposition 2: Under the same assumptions in Proposition 1, assume in addition that
G (·, u) is continuously differentiable and the partial derivative with to x j can be passed
through the integrals defining EV [ H (x, V)] and EU [ G (x, U)] then

EV

∂H (x, V)
∂x j

= EU

∂G (x, U)
∂x j

(3.14)

and so the APEs can be gotten from E(Y |X = x, V = v) by taking derivatives (or changes)
with respect to x j and then averaging out V.
The conclusion in equation (3.11) is very powerful, especially considering that there
are general conditions under which H (x, v) is nonparametrically identified. Blundell and
Powell (2003) study the case where some elements of X are correlated with U, but exogenous variables Z are available such that endogenous variables X2 can be represented
as
X2 = F ( Z ) + V

(3.15)

where (U, V) is independent of Z. Provided there are enough elements Z2 , where Z =

(X1 , Z2 ), the function H (x, v) is nonparametrically identified. Of course, one can use
parametric models or semiparametric models to estimate the functions F(z) and H (x, v)
– either as the true models or as convenient approximations.
We can turn the population formulas into estimators very generally. Given a random

126

sample {(Yi , Xi , Vi ) : i = 1, ..., N } and a consistent estimator Hˆ (x, v) of E(Y |X = x, V =
v), a generally consistent estimator of ASF (x) is
N

ASF (x) = N −1 ∑ Hˆ (x, Vi ),

(3.16)

i =1

and the previous analysis shows that we can take derivatives or changes with respect to
elements in x to estimate average partial effects. In some cases, including in the Blundell
ˆ i , which depends on parameters or
and Powell (2003) setup, Vi must be replaced with V
functions that are consistently estimated in a first stage.
In many studies one wants to report a single number that measures the effect of, say,
x j on ASF (x). We might do this by evaluating APEj (x) at a central values of the elements
of x, such as the means or medians. In the spirit of the average treatment effect literature,
it is probably more appealing to average APEj (X) across the distribution of X:

δj = EX [ APEj (X)].

(3.17)

A consistent estimate of δj is immediate under standard regularity conditions, provided
APE j (x) is consistent for each x:
N

δˆj = N −1 ∑ APE j (Xi ).

(3.18)

i =1

It turns out that the quantity in (3.18) is not the most commonly reported in empirical
studies, and it is a bit cumbersome to compute. Plus, obtaining a standard error via
analytical methods is a bit tricky. In the next section we discuss APEs where we average
jointly across the distribution of (X, U).

127

3.3

APES ACROSS THE ENTIRE POPULATION

As in the previous section, we assume interest centers on E(Y |X = x, U = u) = G (x, u),
but now we wish to average the partial effects across the joint distribution of the observables and unobservables. For example, in moving xK from x0K to x1K , the average partial
effect across the entire population is
E[X(K) ,U] G (X(K ) , x1K , U) − G (X(K ) , x0K , U) .
For a continuous variable X j we may define the partial effect as

g j (x, u) =

∂G (x, u)
∂x j

(3.19)

and then the parameter of interest is

η j = E(X,U) g j (X, U) .

(3.20)

Notice that X and U are treated symmetrically in equation (3.20), as in the treatment effects literature [for example, Imbens and Wooldridge (2009)]. In studying identification of
(3.20) it makes no sense to start with the ASF because E(X,U) [ G (X, U)] = E(Y ) by iterated
expectations, and so the expected value of the ASF with respect (X, U) conveys no useful
information about how X j affects Y.
Focusing for now on the the case where X j is continuous, the partial effect defined by
(3.20) generally differs from δj in (3.17). In other words,

EX

∂ASF (X)
∂x j

= E(X,U) g j (X, U) .

(3.21)

To see why, it is helpful to consider a simple example, where X and U are both scalars.

128

Assume the conditional mean function is
E(Y | X, U ) = β 0 + β 1 X + β 2 U + β 3 X 2 U,

(3.22)

where E(U ) = 0. Then
ASF ( x ) = EU ( β 0 + β 1 x + β 2 U + β 3 x2 U ) = β 0 + β 1 x

and so
∂ASF ( x )
= β1.
∂x

(3.23)

No further averaging is needed to obtain a single effect because APE( x ) is constant.
By contrast, the partial effect of x on E(Y | X = x, U = u) is

PE( x, u) =

∂E(Y | X = x, U = u)
= β 1 + 2β 3 xu
∂x

(3.24)

and so
η ≡ E(X,U ) [ PE( X, U )] = β 1 + 2β 3 Cov( X, U ).

(3.25)

Only if U and X are uncorrelated does (3.25) equal the partial derivative of the ASF. With
substantial correlation between X and U or if β 3 is large in magnitude, the difference
between (3.25) and (3.23) can be substantial.
When one wants to study how the partial effects differ across a range of values of the
explanatory variables, the ASF seems to be natural quantity of interest. But it is less clear
that (3.17) is the best definition of the average effect across the population. If we follow
the approach from the average treatment effects literature then (3.20) is preferred.
As in the previous section, we first state a result that can be applied to partial effects
defined by differences.
Proposition 3: Under the same assumptions as Proposition 1, let xK be a fixed value.

129

Then
E[X(K) ,V] H (X(K ) , xK , V) = E[X(K) ,U] G (X(K ) , xK , U) .
Proof: From equation (3.12) we can write H (x(K ) , xK , v) = E[ G (x(K ) , xK , U)|V = v].
Now use conditional independence again:

E[ G (x(K ) , xK , U)|V = v] = E[ G (x(K ) , xK , U)|X(K ) = x(K ) , V = v]

= E[ G (X(K) , xK , U)|X(K) = x(K) , V = v],
which we can write in terms of random variables as

H (X(K ) , xK , V) = E[ G (X(K ) , xK , U)|X(K ) , V].
The proof is finished by taking the expected value with respect to (X(K ) , V) and using
iterated expectations.
For the continuous case, we have the following. Define

h j (x, v) =

∂H (x, v)
,
∂x j

(3.26)

the partial effect of E(Y |X = x, V = v) with respect to x j .
Proposition 4: Define η j as in equation (3.20). Under the same assumptions as Proposition 2,
η j = E(X,V) h j (X, V) .

(3.27)

Proof: From equation (3.12), H (x, v) = E[ G (x, U)|V = v], and assuming the partial
derivative with respect to x j can be passed through the integrals,

h j (x, v) = E[ g j (x, U)|V = v].

130

Now use the conditional independence assumption again:

h j (x, v) = E[ g j (x, U)|V = v] = E[ g j (x, U)|X = x, V = v]

= E[ g j (X, U)|X = x, V = v],
which we can write in terms of random variables as

h j (X, V) = E[ g j (X, U)|X, V].

Now use iterated expectations to get

E(X,V) h j (X, V) = E(X,U) g j (X, U) .

(3.28)

The four propositions stated in this and the previous section show that, under the
same set of “control function” assumptions, average partial effects obtained by averaging
X out of the ASF and those obtained by averaging the partial effects of E(Y |X, U) are
generally identified. For better or worse, these partial effects are not generally the same,
and there is unlikely to be concensus on which measure is “best.” The differences can
be economically important. For example, in equation (3.22), the partial effect of the ASF,
averaged across X, is simply β 1 . The average partial effect across ( X, U ) is β 1 + 2β 3 σXU .
In principal, these need not even have the same sign, let alone similar magnitudes. (In
practice, β 3 might be small because it is the coefficient on the interaction X 2 U.)
The example in equation (3.22) can also be used to illustrate the main result of this
section, namely, that the η j in (3.20) are identified if we have access to a suitable proxy
variable, V. Assume V – which is observed or depends on parameters that we can consistently estimate – has a zero mean. Make the redundancy assumption along with a

131

linearity assumption on E(U |V ):

E(U | X, V ) = E(U |V ) = θV

(3.29)

Then
E(Y | X, V ) = β 0 + β 1 X + β 2 E(U | X, V ) + β 3 X 2 E(U | X, V )

= β 0 + β 1 X + β 2 θV + β 3 X 2 θV
≡ h( X, V )

The partial derivative of h( x, v) with respect to x is
∂h( x, v)
= β 1 + 2γ3 xv
∂x
where γ3 = β 3 θ. Now

E(X,V ) ( β 1 + 2γ3 XV ) = β 1 + 2γ3 E( XV )

= β 1 + 2β 3 E( XθV )
and E( XθV ) = E[ XE(U |V )] = E[ E( XU | X, V )] = E( XU ) by iterated expectations. This
shows that estimating E(Y | X, V ), taking the partial effect with respect to X, and then averaging across ( X, V ) is the same as starting with E(Y | X, U ) and performing the same operations. This same analysis shows that β 1 = ∂ASF ( x )/∂x is also identified by E(Y | X, V ).
As mentioned earlier, there is one important case where the different definitions of the
average partial effects are the same. We already showed that when U and X are independent then ASF (x) = E(Y |X = x). We now can say more. Namely, basing the average
partial effect on the ASF, or first taking the partial derivative of the original structural
function, give the same answer after all random variables are averaged out.

132

Proposition 5: Under the assumptions of Proposition 4, assume that U and X are
independent. Then
E(X,U) g j (X, U) = EX

∂ASF (X)
.
∂x j

Proof: By the law of iterated expectations,

E(X,U) g j (X, U) = EX { E[ g j (X, U)|X]}.
Now, we can write

E[ g j (X, U)|X = x] =

g j (x, U) Q(du|x) =

∂
∂g
(x, u) Q(du) =
∂x j

g(x, u) Q(du)
∂ASF (x)
=
,
∂x j
∂x j

where the second equality uses independence and the third uses the technical assumption
of interchanging the derivative and the integral. It follows that

EX { E[ g j (X, U)|X]} = EX

and this completes the proof.

133

∂ASF (X)
∂x j

3.4

EMPIRICAL EXAMPLE

In this section, we illustrate how to obtain the estimates of (3.17) and (3.20) with two
empirical examples using Michigan Educational Assessment Program (MEAP) math test
outcome for the fourth graders.
Papke and Wooldridge (2008) apply a control function approach to estimate the effects
of spending on student’s performance for this test using 501 school districts from 1992
through 2001. Their dependent variable is a pass rate3 and they use foundation allowance
as the instrument variable for spending.
For this example, (3.17) and (3.20) are estimated as

δ = β·

1
NT

N T

∑∑

1
NT

1
NT

N T

j

s

N T

∑∑
i

φ(ψ + x js β + hi ξ + ρvit )

,

(3.30)

t

and
η= β·

∑∑
i

φ(ψ + xit β + hi ξ + ρvit )

,

(3.31)

t

where xit includes spendingit , freelunchit , log(enrollment)it , spendingi,1994 , time dummies
and the interactions between spendingi,1994 and time dummies, and hi contains time averages of freelunchit and log(enrollment)it .4,5 vit is the residual obtained by the first step of,
and β, ψ, ξ and ρ are the estimates obtained by the second step of Procedure 4.1 in Papke
and Wooldridge (2008), respectively.
Table 3.1 presents δ and η for spendingit , freelunchit , and log(enrollment)it . The estimates
η are the same as those reported in Papke and Wooldridge (2008).6 We see that δ is more
3 In

their data period, the test outcome is graded as one of Satisfactory, Moderate, or Low. The pass rate
is a fraction of students rated at the Satisfactory level.
4 In Papke and Wooldridge (2008), spending is constructed as the average of real expenditures per pupil
it
for the recent four years in logarithmic form, freelunchit is the fraction of students who are eligible for free
and reduced-price lunch program.
5 They allow the correlation between the district individual heterogeneity and the time averages of
freelunchit and log(enrollment)it .
6 Table 5 in Papke and Wooldridge (2008).

134

precisely estimated than η but they are similar.7
Table 3.1: APE estimates in Papke and Wooldridge (2008)
APE
variables
spending
freelunch
log(lenrollment)
Scale factor

η
0.583
(0.256)
-0.100
(0.069)
0.096
(0.073)
0.337

δ
0.558
(0.210)
-0.096
(0.065)
0.092
(0.061)
0.323

1.

The standard errors, in parentheses, are obtained by using
1000 bootstrap replications.
2. Scale factor indicates the average sum of φ(·) in (3.30) and
(3.31).
We now calculate the two APE estimates of the empirical application in Chapter 1,
which also examines how spending affects the MEAP test outcomes using the 2004 school
year data with multiple fractional response variables. In this application, the dependent
variable is a set of student fractions of the four levels for a district where the sum of a
district’s four fractions becomes one.8
The two APE of spending on these four levels are estimated by

δg =

1
N

N


1

N


 exp x j θxg + θvg vi

∑N ∑
j

i

∑4h exp



x j θxh + θvh vi

· θwg −

∑4h θwh exp x j θxh + θvh vi
∑4h exp x j θxh + θvh vi



 

(3.32)

7 Their

difference comes from the difference between their scale factors. Considering that the scale
factors are normal density functions raging from zero to 0.4, the difference could not be large unless the
coefficient estimate β is huge.
8 In 2004, there were four categories.

135

, and

ηg =

N

1
N



∑
i



exp xi θxg + θvg vi
∑4h exp

xi θxh + θvh vi

· θwg −

∑4h θwh exp xi θxh + θvh vi
∑4h exp

xi θxh + θvh vi


 ,

(3.33)

where xi contains spendingi , freelunchi , log(enrollmenti ), and spending93i ,9 , vi is the OLS
residual in the first step, and θ is the fmlogit estimates in the second step.
Table 3.2 reports the two estimates. As in the above illustration, they are similar in
general.
Table 3.2: APE estimates in Chapter 1
spending

v

η
δ

spending2

v only

η
δ

v, v3 , v3

η
δ

level 1
0.699
(0.223)
0.681
(0.205)
0.806
(0.239)
0.779
(0.216)
0.709
(0.248)
0.691
(0.229)

level 2
0.016
(0.139)
0.017
(0.135)
-0.053
(0.153)
-0.044
(0.150)
0.012
(0.158)
0.014
(0.154)

level 3
-0.634
(0.190)
-0.619
(0.172)
-0.679
(0.199)
-0.667
(0.183)
-0.618
(0.188)
-0.602
(0.170)

level 4
-0.081
(0.054)
-0.079
(0.057)
-0.074
(0.055)
-0.068
(0.056)
-0.103
(0.061)
-0.103
(0.068)

1.

The standard errors, in parentheses, are obtained by using
1000 bootstrap replications.
2. spending2 indicates, the model includes spending2 as well
as spending.
2. v, v3 , v3 indicates that the estimation includes them.

3.5

CONCLUSION

In this chapter, we have examined two types of a single APE that summarizes the APEs
of observed covariates. One is obtained by calculating ASF – averaging the conditional
9 spending

i

= log(per pupil GF expenditure), and spending93i is spending in 1993/1994.

136

mean over the distribution of unobservables – and then averaging its partial derivatives
or discrete changes across the observed covariates. The other, which is more commonly
used in empirical studies, is obtained by averaging the partial effects of conditional mean
over the joint distribution of observed covariates and unobservables.
Through the propositions, we have shown that the two APEs are identified in general
as long as there are suitable proxy variables satisfying the redundancy and the conditional independence assumptions. Furthermore, they are not generally the same unless
the unobservables and observed covariates are independent.
We have also illustrated how the two APEs are estimated in the empirical examples
using MEAP math test outcomes. In these examples, the two types of APE estimates are
similar.

137

REFERENCES

138

REFERENCES

Altonji, J. G., and R. L. Matzkin. 2005. “Cross section and panel data estimators for
nonseparable models with endogenous regressors.” Econometrica, 73(4): 1053–1102.
Bartle, R. G. 1966. The Elements of Integration.: New York: Wiley.
Blundell, R., and J. L. Powell. 2003. “Endogeneity in Nonparametric and Semiparametric Regression Models.” In Advances in Economics and Econometrics. eds. by L. P. H.
Dewatripont, Mathias, and S. J. Turnovsky, 2: Cambridge: Cambridge University Press,
, 1st edition 312–357.
Imbens, G. W., and J. M. Wooldridge. 2009. “Recent Developments in the Econometrics
of Program Evaluation.” Journal of Economic Literature, 47(1): 5–86.
Nam, S. 2014. “Multiple fractional response variables with continuous endogenous explanatory variables.” June.
Papke, L. E., and J. M. Wooldridge. 2008. “Panel data methods for fractional response
variables with an application to test pass rates.” Journal of Econometrics, 145(1-2): 121–
133.
Wooldridge, J. M. 2002. Econometric analysis of cross section and panel data.: MIT press.
. 2005a. “Fixed-effects and related estimators for correlated random-coefficient
and treatment-effect panel data models.” Review of Economics and Statistics, 87(2): 385–
390.
Wooldridge, J. M. 2005b. “Unobserved Heterogeneity and Estimation of Average Partial
Effects.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas
Rothenberg. eds. by D. W. K. Andrews, and J. H. Stock: Cambridge University Press, ,
Chap. 3.
. 2010. Econometric Analysis of Cross Section and Panel Data,. MIT Press Books:
MIT Press, , 2nd edition.

139