This is to certify that the
dissertation entitled

SOCIAL LEARNING AND PARAMETER UNCERTAINTY
IN IRREVERSIBLE INVESTMENTS

AND

PARTIAL MAXIMUM LIKELIHOOD ESTIMATION OF A
SPATIAL PROBIT MODEL

presented by
HONGLIN WANG

has been accepted towards fulﬁllment
of the requirements for the

Ph.D. degree in Agricultural Economics
and Economics

 

 

Major Professor’s Signature

Date

 

 

MSU is an Affirmative Action/Equal Opportunity Employer


SOCIAL LEARNING AND PARAMETER UNCERTAINTY IN
IRREVERSIBLE INVESTMENTS
AND
PARTIAL MAXIMUM LIKELIHOOD ESTIMATION OF A
SPATIAL PROBIT MODEL

BY

HONGLIN WANG

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Agricultural Economics and Economics

2009

ABSTRACT

SOCIAL LEARNING AND PARAMETER UNCERTAINTY IN IRREVERSIBLE
INVESTMENTS
AND
PARTIAL MAXIMUM LIKELIHOOD ESTIMATION OF A SPATIAL PROBIT
MODEL

BY
HONGLIN WANG

The dissertation is composed of two essays.

The first paper discusses social learning and parameter uncertainty in irreversible
investments. The adoption of new technology usually involves irreversible investments
where the future payoff is uncertain. In addition, investors often have to contend with a
limited understanding of the technology itself, which can be modeled as uncertainty
regarding the parameters of the stochastic process describing the future payoff. It is
hypothesized that social learning (having previous adopters in the farmer’s social network)
increases the probability of the farmer adopting the new technology. The hypothesis rests
on theory: social learning reduces parameter uncertainty, and thus the overall level
of risk facing the farmer-investor, which in turn induces investment. The paper tests this
hypothesis using Chinese farm household data on the adoption of greenhouses. These greenhouses are
of the “intermediate technology” type, made of clay walls, a plastic-sheet roof, and a
straw mat roll-out awning for cold nights. The empirical findings of this paper support
the hypothesis. It is also found that market volatility discourages adoption.

The second paper analyzes a spatial Probit model for cross-sectionally dependent data
in a binary choice context. Observations are divided into pairwise groups, and bivariate
normal distributions are specified within each group. Partial maximum likelihood
estimators are introduced and shown to be consistent and asymptotically normal
under some regularity conditions. Consistent covariance matrix estimators are also
provided. Finally, a simulation study shows the advantages of the new estimation
procedure in this setting. The proposed partial maximum likelihood estimators are shown
to be more efficient than their generalized method of moments counterparts.

ACKNOWLEDGMENT

I owe my gratitude to all those people who have made this dissertation possible and
because of whom my graduate experience has been one that I will cherish forever.

My deepest gratitude is to my major advisor in agricultural economics, Dr. Thomas
Reardon, and my major advisor in economics, Dr. Jeffrey Wooldridge. I would like to
sincerely thank Dr. Reardon for his guidance, understanding, and strong support whenever
and for whatever I needed. He encouraged me to pursue the dual degree program, which gave
me a wider view of my long-term career path. His extensive international experience,
enthusiasm for global development, encouragement, and insightful vision of human
society and culture showed me an exciting world to explore further. I have been
amazingly fortunate to have Dr. Wooldridge as my major professor in economics. His
wisdom, effective guidance, great teaching, patience, and constructive comments helped
me enrich and focus my ideas at different stages. I am grateful to him for holding me to a
high research standard. Inspired by Dr. Wooldridge, I found learning econometrics a very
enjoyable part of my life. I hope that one day I can be as great a teacher to my students as
Dr. Wooldridge.

I would like to extend my special thanks to my committee members, Dr. Songqing Jin,
Dr. Emma Iglesias and Dr. John Giles. They always spent a lot of time discussing my work with me
and gave very helpful advice. I would also like to express my sincere thanks to Dr. Fan Yu. His
insightful critiques, guidance, and careful reviews of my paper helped me overcome difficulties
and finish my research. My special thanks also go to Dr. Robert Myers and Dr. Zhengfei
Guan, who always offered their help when I had difficulties in my research.

Grateful and sincere thanks also go to Dr. Jikun Huang, Dr. Scott Rozelle and Dr.
Linxiu Zhang. I am indebted to them for their guidance on designing the survey questionnaire,
support for the field surveys and the trip in China, and comments on the research paper. They
have been advising me since I pursued my M.S. in China, and I am thankful for their
continuous encouragement, support and inspiration.

I am grateful to my graduate colleagues Jinxia Wang, Chengfang Liu, Xiaoxia Dong,
Zijun Wang, Haiqing Zhang and Ruijian Chen in China, who worked very hard in assisting
with surveys, data collection, validation and data entry. I am grateful for their hard work, shared
thoughts, and friendship. I would like to express gratitude to my graduate fellows Wei
Zhang, Yanyan Liu, Zhiying Xu, Feng Song, Fang Xie, Feng Wu, Lili Gao, Wolfgang
Pejuan, Kirimi Sindi, Vandana Yadav, Ricardo Hernandez, Kang-Hung Chang and Panutat
Satchachai in both the Department of Agricultural Economics and the Department of Economics, for
the valuable discussions, ‘happy hours’, and their care. I highly value such friendships.
They made my stay at MSU a pleasant and unforgettable experience.

Finally, and most importantly, I would like to thank my wife, Qing Xiang. None of this
would have been possible without her love and patience. My wife and my
parents, Ruixia Chen and Jinyu Wang, have been a constant source of love, concern,
encouragement and strength all these years.

TABLE OF CONTENTS

LIST OF TABLES ....................................................................................... viii
LIST OF FIGURES ........................................................................................ ix
Chapter 1: Social Learning and Parameter Uncertainty in Irreversible
Investments: Evidence from Greenhouse Adoption in Northern China ........ 1
1.1 Introduction ............................................................................................... 1
1.2 The Theoretical Model Framework .......................................................... 6
1.3 Greenhouse Intermediate-Technology in Northern China ..................... 13
1.4 Data Description ..................................................................................... 16
1.4.1 Sample Selection .............................................................................................. 16
1.4.2 Social Learning ................................................................................................ 18
1.4.3 Other Household Characteristics ..................................................................... 20
1.5 Empirical Methodology .......................................................................... 23
1.6 Empirical Results .................................................................................... 28
1.6.1 Identiﬁcation Strategy ...................................................................................... 28
1.6.2 Linear Probability Model ................................................................................. 31
1.7 Conclusion ............................................................................................... 35
BIBLIOGRAPHY ............................................................................................ 43
Chapter 2: Partial Maximum Likelihood Estimation of a Spatial Probit Model
....................................................................................................................... 46
2.1 Introduction ............................................................................................. 46
2.2 Discrete Choice Models with Spatial Dependence ................................ 51
2.2.1 Probit Model without Dependence .................................................................. 51
2.2.2 A Probit Model with Spatial Error Correlation ................................................ 52
2.2.3 Probit Models with Other Forms of Spatial Correlation .................................. 54
2.3 Using Partial MLEs to Estimate General Spatial Probit Models ........... 56
2.3.1 Univariate Probit Partial MLE ......................................................................... 57
2.3.2 Bivariate Probit Partial MLE ........................................................................... 58
2.4 Partial Maximum Likelihood Estimation ............................................... 64
2.4.1 Consistency of Bivariate Probit Estimation ..................................................... 66
2.4.2 Asymptotic Normality ...................................................................................... 68
2.4.3 Estimation of Variance-covariance Matrices ................................................... 70
2.5 Simulation Study ..................................................................................... 74
2.5.1 Simulation Design and Results ........................................................................ 74
2.6 Conclusions ............................................................................................. 78
APPENDIX I ................................................................................................. 79

A.1 Proofs to Theorems ............................................................................................ 79
A.2 Technical Lemmas ............................................................................................. 90
APPENDIX II ............................................................................................. 111
BIBLIOGRAPHY ....................................................................................... 113

LIST OF TABLES

Table 1.1 Descriptive Statistics: Household Level Data ................................................... 38
Table 1.2 Greenhouse Adoption and Social Learning: LPM Estimated by 2SLS ............ 39
Table 1.3 Greenhouse Adoption and Social Learning: First Stage 2SLS Results ............ 40

Table 1.4 Greenhouse Adoption and Social Learning: LPM with Interaction Terms ...... 41
Table 1.5 Distance to Neighborhood and Characteristics of Household .......................... 42

TABLE 2.1: Simulation Results of Different Estimators of lambda in the Context of the
Bivariate Spatial Probit Model ........................................................................................ 111

TABLE 2.2: Simulation Results of Different Estimators of betas in the Context of the
Bivariate Spatial Probit Model ........................................................................................ 112


LIST OF FIGURES

Figure 1.1 Greenhouse Diffusion Curve at the Household Level ..................................... 37

Figure 2.1 N pairwise groups of 2n observations based on Euclidean Distance ...... 59


Chapter 1: Social Learning and Parameter Uncertainty in Irreversible
Investments: Evidence from Greenhouse Adoption in Northern China

1.1 Introduction

Risk and uncertainty have been important themes in the agricultural technology
adoption literature since the 1970s. They were included in studies of green revolution
technology adoption to explain lagged or partial adoption or even disadoption. Examples
include Roumasset (1976) and Feder (1980). This can be seen as part of a wider strand of
literature on the economics of risk and uncertainty, and their constraining effects on
investment (Newbery and Stiglitz, 1981).

Building on this initial inclusion of risk and uncertainty in agricultural technology
adoption analysis, the literature has drawn distinctions along two dimensions that are of
particular interest here. The first dimension is the modeling of various forms of “information
capital” as part of the vector of capital assets in the adoption function. The earliest forms
modeled were public information in the form of farmers’ education and access to extension
services. Then, and of most interest to us here, came the introduction of personal
experience with a technology (“learning by doing”) and observation of neighbors’
experience with the technology (“learning from neighbors”). These were introduced for
example in Besley and Case (1994) and Foster and Rosenzweig (1995).

The modeling of “learning from neighbors” has been further refined in recent papers
that model “social learning,” such as: (1) Conley and Udry (2001) in their modeling of
Ghanaian farmers’ adoption of fertilizer in pineapple production, conditioned by their
incomplete information and communication networks with neighbors; (2) Bandiera and
Rasul (2006) in their modeling of Mozambican farmers’ adoption of sunflowers,
conditioned by their social network (neighbors and friends who have adopted); and (3)
Munshi (2004) in his modeling of Indian farmers’ adoption of HYV of rice and wheat,
conditioned by their neighbors’ experiences but differentiated over rice and wheat areas
due to the influence of population heterogeneity. This body of work has demonstrated the
effects of social learning on technology adoption. In most cases the effect of social learning
on adoption is interpreted as increasing the capacity of the farmer to adopt as well as
reducing the farmer’s uncertainty and perception of risk in adoption.

The second dimension is the modeling of irreversible investments in capital embodying
technology, such as tube wells, greenhouses, and so on. This distinction, between
reversible investments (such as adoption of an annual crop, a hybrid seed, fertilizer, or a new
planting technique) and irreversible investments (where the salvage value of the asset is
negligible or the asset cannot be transferred or sold), is important in the analysis of risk and
uncertainty in technology adoption.

Because of incomplete information with respect to the performance, reliability, and
appropriateness of agricultural equipment, irreversibility entails substantial risk for the
investor (Dixit and Pindyck, 1994, and Sunding and Zilberman, 2000). McDonald and
Siegel (1986) and Dixit and Pindyck (1994) show that the ability to delay an irreversible
investment can be considered as a real option; a higher level of uncertainty regarding future
beneﬁts raises the option value and causes the investment decision to deviate from the
classical NPV rule. Specifically, investors may rationally delay investment to gain
additional information, reduce the level of uncertainty, and increase discounted expected
payoffs. This has been modeled in two strands of literature.

On the one hand, delayed investment to gain additional information in the face of
uncertainty has been studied in the economics literature, inspired by McDonald and Siegel (1986)
and Dixit and Pindyck (1994). Examples include Olmstead and Rhode (1993), Zilberman et al.
(2004), Hassett and Metcalf (1995), and Nelson and Amegbeto (1998), inter alia. These
studies have tended to assume that all parameters of the dynamic process are known to
agents, and the only uncertainty in the model comes from the future value of the dynamic
process.

On the other hand, investment under parameter uncertainty has been examined in the
ﬁnance literature. Merton (1980) shows that while the variance of the return can be
estimated precisely from continuous observations on a ﬁnite interval, the estimator of mean
return does not converge unless the length of the interval becomes large. Gennotte (1986)
studies portfolio choice under incomplete information about the stock return process. He
uses tools of nonlinear filtering from Liptser and Shiryaev (1978) to derive the optimal drift
estimator as agents continuously observe the returns. Brennan (1998) and Xia (2001)
construct similar models to examine how learning about unknown parameters and
unknown predictability affects portfolio choice. More recently, Abasov (2005) modeled
irreversible investment under parameter uncertainty, and Huang and Liu (2007) modeled
learning from discrete noisy signals about the true drift in their study of periodic news on
portfolio selection. Note that much of the finance literature is theoretical, with
few empirical applications and none in the domain of investment in agricultural capital as
an embodiment of agricultural technology adoption.

The present paper addresses a particular, and particularly important, gap left by the two
dimensions discussed. That is, while the literature on social learning and technology
adoption has modeled the effect of social learning as a means of reducing uncertainty, that
literature has not treated the issue of irreversibility of the investment per se, and thus has
not modeled the effect of social learning in a real options context. Moreover, while the
literature on irreversible investment and uncertainty has indeed modeled investment in a
real options framework, it has not examined uncertainty-reduction measures taken by
adopters, in particular, social learning.

There is thus a gap in the literature, both theoretical and empirical: no analysis of
irreversible investment under parameter uncertainty models the effect of social learning.
The contribution of the present paper is to address that gap.

We address the gap empirically by modeling greenhouse investments with primary data
from Shandong province in China. The data are multi-year: we observe the adopters’
characteristics, including their social networks of prior adopters, in the year before their
adoption. This allows us to capture the causality between social learning and adoption,
which is new to this literature.

We address it theoretically by presenting a model of these links that is new to the literature.
Following McDonald and Siegel (1986), we assume that a farmer is considering an
investment project, whose value follows a geometric Brownian motion. Departing from the
standard framework, we assume that the true drift of the Brownian motion is unobservable
to the farmer (we call this parameter uncertainty). In essence, the farmer is imperfectly
informed as to the expected rate of return of his investment. He must make an inference
about the true expected return based on his information and, at the same time, determine the
optimal timing for investing in the project. The farmer can learn about the unknown
parameter in two ways. First, he extracts information on the true drift from a continuous
observation of past realized returns on the project value. This captures the process of
continuous learning from public information about the project. Second, he obtains discrete
noisy signals of the true drift. This represents the process of social learning from early
adopters in his social network, who might possess information about the project that the
public does not have. In our model, parameter uncertainty adds to the overall risk that the
farmer faces; this raises the threshold project value needed to induce the farmer to invest. In
contrast, social learning reduces parameter uncertainty, which decreases the overall level of
uncertainty and reduces the investment threshold, thereby increasing the likelihood of
adoption. In our model, social learning also causes the farmer’s belief about the expected
return to converge to the average belief of his social network; the higher the average belief,
the higher is the investment threshold, and the less likely the farmer will adopt the
technology.

The rest of the paper is organized as follows: In Section 2, we present the theoretical
model. In Section 3, we provide background information about the greenhouse technology
in northern China. In Section 4, we outline our sample selection and summarize the data. In
Section 5, we explain our empirical methodology. In Section 6, we present the empirical
findings using linear probability models. We conclude in Section 7.

1.2 The Theoretical Model Framework

In this section, we use a real options model to articulate the effect of parameter
uncertainty and social learning on technology adoption. We begin with a model of
continuous learning, which is essentially that of Abasov (2005). Speciﬁcally, a farmer is
considering whether to pay a sunk cost of $I$ for an agricultural technology, whose value $V$
evolves according to:

$$dV_t = V_t\left(\mu\,dt + \sigma\,dZ_t\right),$$

where $Z$ is a Brownian motion.

Motivated by Merton (1980), we assume that the farmer can observe $V$ continuously
and knows its volatility $\sigma$; however, he only knows that the drift $\mu$ is a normal random
variable with mean $m_0$ and variance $\gamma_0$ in the beginning. According to Liptser and Shiryaev
(1978), the conditional mean of the drift given the farmer’s information set,
$m_t = E(\mu \mid \mathcal{F}_t^V)$, follows:

$$dm_t = \frac{\gamma_t}{\sigma}\,d\bar{Z}_t,$$

where $\gamma_t = E\left[(\mu - m_t)^2 \mid \mathcal{F}_t^V\right]$ is the conditional variance of the drift, satisfying:

$$d\gamma_t = -\frac{\gamma_t^2}{\sigma^2}\,dt, \tag{1.1}$$

and $\bar{Z}$ is a new Brownian motion related to the original Brownian motion through:

$$d\bar{Z}_t = dZ_t + \frac{\mu - m_t}{\sigma}\,dt.$$

We can solve equation (1.1) for $\gamma_t$:

$$\gamma_t = \frac{\gamma_0\,\sigma^2}{\gamma_0 t + \sigma^2}.$$

This result shows that continuous learning decreases the conditional variance of the
unknown parameter. Thus the longer the farmer observes the value process, the less
uncertain he is about the drift. This is consistent with Merton’s (1980) result: the
uncertainty about the drift depends not on the number of observations but on
the length of the observation period. However, the conditional mean of the drift can
fluctuate up or down, depending on new observations of the Brownian motion $Z_t$.
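
As a numerical illustration of this result, the following minimal Python sketch compares the closed-form conditional variance with a forward-Euler integration of equation (1.1). The function names and the parameter values (a prior variance of 0.04 and a volatility of 0.3) are hypothetical choices made for illustration only; they are not taken from the data.

```python
def gamma_closed_form(gamma0: float, sigma: float, t: float) -> float:
    """Conditional variance of the drift after observing V for a period of length t."""
    return gamma0 * sigma**2 / (gamma0 * t + sigma**2)


def gamma_euler(gamma0: float, sigma: float, t: float, steps: int = 100_000) -> float:
    """Integrate d(gamma)/dt = -gamma^2 / sigma^2 with a forward-Euler scheme."""
    dt = t / steps
    g = gamma0
    for _ in range(steps):
        g -= g * g / sigma**2 * dt
    return g


if __name__ == "__main__":
    gamma0, sigma = 0.04, 0.3  # hypothetical prior variance and volatility
    for t in (0.0, 1.0, 5.0, 20.0):
        print(t, gamma_closed_form(gamma0, sigma, t), gamma_euler(gamma0, sigma, t))
```

In this sketch the variance depends only on the elapsed time `t`, not on how finely the interval is sampled, which is the point attributed to Merton (1980) above.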

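According to Gennotte (1986), the farmer’s decision can be separated into two
problems: the inference of the unknown parameter given $\{V_s\}_{0 \le s \le t}$, and the optimal
stopping decision based on the current state variables $(m_t, \gamma_t, V_t)$ and the dynamics
of $(m, \gamma, V)$. Putting everything together, we can characterize the farmer’s problem using
observable processes:

$$
\begin{aligned}
J(m_0, \gamma_0, V_0) = \max_{\tau \in \mathcal{F}^V}\; & E\left[e^{-\rho\tau}\left(V_\tau - I\right)\right], \\
\text{s.t.}\quad dV_t &= V_t\left(m_t\,dt + \sigma\,d\bar{Z}_t\right), \\
dm_t &= \frac{\gamma_t}{\sigma}\,d\bar{Z}_t, \\
d\gamma_t &= -\frac{\gamma_t^2}{\sigma^2}\,dt.
\end{aligned}
\tag{1.2}
$$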

Here, $\rho$ is the farmer’s discount rate, and $\tau$ has to be an $\mathcal{F}^V$-stopping time, reflecting
that the farmer must make a decision based on his information set. The stopping rule takes
the form of:

$$\tau = \inf\{t \ge 0 : V_t \ge V^*(m_t, \gamma_t)\},$$

where $V^*(m, \gamma)$ is the trigger value of investing, which depends on the state variables.1

Abasov (2005) derives the Hamilton-Jacobi-Bellman equation for the optimal stopping
problem (1.2) and transforms it into a linear complementarity problem, which he solves
with the finite difference method. His numerical results demonstrate that the trigger value
of investing, $V^*(m_0, \gamma_0)$, obtained as a part of the solution, increases with $\gamma_0$. This result
is sensible given that the trigger value in the McDonald and Siegel (1986) and Dixit and
Pindyck (1994) model increases with $\sigma$, and parameter uncertainty contributes to the total
uncertainty in our model. In addition, Abasov shows that $V^*$ increases with $m_0$; this is also
consistent with the traditional real options model without parameter uncertainty.
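
For intuition about these comparative statics, the sketch below evaluates the standard closed-form trigger value of the McDonald and Siegel (1986) and Dixit and Pindyck (1994) model when the drift and volatility are known. It is only the known-parameter benchmark, not Abasov’s (2005) finite-difference solution of problem (1.2), and it assumes that the discount rate exceeds the drift. The function name and all numerical values are hypothetical.

```python
import math


def trigger_value(mu: float, sigma: float, rho: float, cost: float) -> float:
    """Known-parameter real options trigger V* = beta / (beta - 1) * cost.

    beta is the positive root of 0.5*sigma^2*b*(b - 1) + mu*b - rho = 0.
    Assumes rho > mu, so that beta > 1 and waiting has finite value.
    """
    if rho <= mu:
        raise ValueError("the discount rate must exceed the drift")
    a = mu / sigma**2
    beta = 0.5 - a + math.sqrt((a - 0.5) ** 2 + 2.0 * rho / sigma**2)
    return beta / (beta - 1.0) * cost


if __name__ == "__main__":
    rho, cost = 0.10, 1.0  # hypothetical discount rate and sunk cost
    for mu, sigma in [(0.02, 0.2), (0.02, 0.3), (0.04, 0.2)]:
        print(mu, sigma, trigger_value(mu, sigma, rho, cost))
```

With these illustrative inputs the trigger exceeds the sunk cost and rises with both the volatility and the drift, matching the direction of the effects described above.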

1 Since $\gamma$ is a deterministic function of $t$, we can equivalently formulate the problem in terms of state
variables $(m, t)$.

In developed countries, there are public economic forecasts and newsletters informing
investors. Therefore, agents can make inferences based on past realized returns. However,
in rural China, information is more likely to come from local private sources. As in
Huang and Liu (2007), we allow farmers to obtain direct signals of the drift from early
adopters in their social networks. These signals are noisy, reflecting the fact that even early
adopters are unlikely to learn everything about the technology from their own experience.
Unlike Huang and Liu (2007), we assume that the signals are costless. However,
the number of signals to which a farmer has access is limited by the scope of his social
network, which we take as exogenous. For simplicity, we also assume that these signals are
received at time 0, just as the farmer begins to consider his adoption decision. Since
discrete signals are much more effective than continuous learning in changing the farmer’s
belief, it seems reasonable to assume that he would seek out these signals at the very
beginning of his decision-making process. This implies that discrete updating affects the
farmer’s optimal stopping problem only insofar as it changes his initial belief; discrete
updating plays no role in the dynamics of the conditional mean and conditional volatility.
Let signal $i$ be given by:

$$\mu_i = \mu + \varepsilon_i, \tag{1.3}$$

where $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$ is independently and identically distributed. After receiving $n$

such signals, it can be shown that the conditional mean and variance of the drift are given
by:

$$m_0' = \frac{\sigma_\varepsilon^2}{n\gamma_0 + \sigma_\varepsilon^2}\,m_0 + \frac{n\gamma_0}{n\gamma_0 + \sigma_\varepsilon^2}\,\bar{\mu}, \tag{1.4}$$

$$\gamma_0' = \frac{\gamma_0\,\sigma_\varepsilon^2}{n\gamma_0 + \sigma_\varepsilon^2}, \tag{1.5}$$

where $\bar{\mu} = \frac{1}{n}\sum_{i=1}^{n}\mu_i$. Equation (1.5) shows that the conditional variance is decreasing
in the number of signals, which can be taken as the scope of social learning. Therefore,
social learning reduces parameter uncertainty. Using Abasov’s numerical results, this
implies that social learning decreases the trigger value for adoption, making it more likely
that the farmer would adopt the technology.
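
The updating rule for the conditional mean and variance after $n$ signals can be sketched in a few lines of Python. This is a minimal illustration under the normal prior and i.i.d. normal noise assumed above; the function name `update_with_signals` and the numerical inputs are hypothetical.

```python
from typing import Sequence, Tuple


def update_with_signals(
    m0: float, gamma0: float, signals: Sequence[float], sigma_eps: float
) -> Tuple[float, float]:
    """Posterior mean and variance of the drift after n i.i.d. noisy signals."""
    n = len(signals)
    if n == 0:
        return m0, gamma0
    mu_bar = sum(signals) / n                      # average signal from the network
    denom = n * gamma0 + sigma_eps**2
    m_post = sigma_eps**2 / denom * m0 + n * gamma0 / denom * mu_bar
    gamma_post = gamma0 * sigma_eps**2 / denom
    return m_post, gamma_post


if __name__ == "__main__":
    # Hypothetical prior belief and three signals from early adopters.
    m, g = update_with_signals(0.12, 0.04, [0.10, 0.06, 0.08], sigma_eps=0.2)
    print(m, g)
```

Processing the signals one at a time gives the same posterior as processing them in a single batch, so only the number of signals and their average matter in this sketch.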

Considering the conditional mean equation (1.4), we find that as the number of signals
increases, $m_0'$ tends to move away from $m_0$ and approach $\bar{\mu}$. This indicates that social
learning causes the farmer’s belief about the drift to converge to the average belief in the
farmer’s social network. The net effect depends on the relation between $m_0$ and $\bar{\mu}$. If
$m_0 > \bar{\mu}$, the farmer is initially too optimistic; social learning causes him to lower his
expectation about the project’s return. This, in turn, lowers the trigger value and facilitates
adoption. If the farmer is, on average, unbiased in his initial belief, then social learning is
unlikely to change the probability of adoption through its effect on the conditional mean
return.

If we generalize this model to allow the dynamics of social learning to enter the
farmer’s decision making, then we can write down the following optimal stopping problem,
where we combine continuous ﬁltering with discrete updating:

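$$
\begin{aligned}
J(m_0, \gamma_0, V_0) = \max_{\tau \in \mathcal{F}^{V} \vee \mathcal{F}^{\mu}}\; & E\left[e^{-\rho\tau}\left(V_\tau - I\right)\right], \\
\text{s.t.}\quad dV_t &= V_t\left(m_t\,dt + \sigma\,d\bar{Z}_t\right), \\
dm_t &= \frac{\gamma_t}{\sigma}\,d\bar{Z}_t + \frac{\gamma_{t-}}{\gamma_{t-} + \sigma_\varepsilon^2}\left(\mu(t) - m_{t-}\right)dN_t, \\
d\gamma_t &= -\frac{\gamma_t^2}{\sigma^2}\,dt - \frac{\gamma_{t-}^2}{\gamma_{t-} + \sigma_\varepsilon^2}\,dN_t.
\end{aligned}
\tag{1.6}
$$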

Here, $\mu(t)$ refers to the independently and identically distributed noisy signals
described in equation (1.3), and $N_t$ is a counting process that counts the number of signals
that the farmer has received up to time $t$. It can be periodic and deterministic as in Huang
and Liu (2007), or stochastic, as in the case of a Poisson process with arrival rate $\lambda$, which
describes social interaction as a random phenomenon. In all cases, however, the first part of
the dynamic equations for $(m, \gamma)$ captures the effect of continuous updating as the farmer
learns from the past history of $V$. The second part represents a jump in the conditional
mean and variance when the farmer receives a noisy signal of the drift. Because $\gamma$ and $N$
are deterministically related through the conditional variance relation, we have suppressed
the dependence of the value function on $N$. Similarly, we can write the trigger value as
$V^*(m_t, \gamma_t)$, with the understanding that the effect of $N_t$ is already reflected in the conditional
variance $\gamma_t$.
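
The combined dynamics can be illustrated with a short simulation. The sketch below discretizes the belief equations with an Euler scheme and draws signal arrivals from a Poisson process; it only tracks the farmer’s beliefs and does not solve the optimal stopping problem. The function names, the step size, and all parameter values are hypothetical. The helper `gamma_from_precision` uses the fact, implied by the continuous and discrete variance updates, that the precision $1/\gamma$ grows by $t/\sigma^2$ through continuous learning and by $1/\sigma_\varepsilon^2$ with each signal.

```python
import math
import random


def simulate_beliefs(mu, sigma, m0, gamma0, sigma_eps, lam, horizon, steps, seed=0):
    """Euler simulation of the conditional mean and variance with Poisson signals."""
    rng = random.Random(seed)
    dt = horizon / steps
    m, gamma, n_signals = m0, gamma0, 0
    for _ in range(steps):
        # Continuous filtering: the farmer observes the realized return dV/V.
        realized = mu * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        innovation = (realized - m * dt) / sigma
        m += gamma / sigma * innovation
        gamma -= gamma**2 / sigma**2 * dt
        # Discrete updating: a noisy signal arrives with probability lam * dt.
        if rng.random() < lam * dt:
            signal = mu + rng.gauss(0.0, sigma_eps)
            m += gamma / (gamma + sigma_eps**2) * (signal - m)
            gamma -= gamma**2 / (gamma + sigma_eps**2)
            n_signals += 1
    return m, gamma, n_signals


def gamma_from_precision(gamma0, sigma, sigma_eps, t, n):
    """Conditional variance implied by elapsed time t and n received signals."""
    return 1.0 / (1.0 / gamma0 + t / sigma**2 + n / sigma_eps**2)


if __name__ == "__main__":
    m, g, n = simulate_beliefs(0.08, 0.3, 0.12, 0.04, 0.2, 1.0, 5.0, 50_000, seed=1)
    print(m, g, n, gamma_from_precision(0.04, 0.3, 0.2, 5.0, n))
```

Because the simulated variance depends only on elapsed time and the signal count, the two printed variance values should agree up to discretization error, which is why the value function does not need $N$ as a separate argument.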

Generally, the optimal stopping problem (1.6) must be solved numerically. The
adoption decision is related to the amount of social learning that the farmer has experienced.
According to the above model, this is measured by $N_t$. As the conditional variance equation
shows, a larger $N$ (more social learning) always reduces $\gamma$. We conjecture that the trigger
value is increasing in $\gamma$, regardless of whether farmers are cognizant or ignorant of future
social learning.2 This implies that social learning can lower the trigger level for adoption.
Summarizing the various models, the classical real options analysis of McDonald and
Siegel (1986) predicts that the trigger value for investment increases with the uncertainty of
the project value. We show that this result also extends to parameter uncertainty. Building
on recent work on social learning and technology diffusion (such as Bandiera and Rasul,
2006), we argue that social learning can facilitate adoption by reducing parameter
uncertainty. In rural China, where public extension information is not easily accessible to
small farmers, information from social learning could play an important role in their
adoption decisions. The rest of our paper is dedicated to testing this hypothesis.

2 One can conceive of cases in which knowledge of the social learning dynamics can actually delay adoption.
For example, if the farmer knows that parameter uncertainty will be fully resolved tomorrow, he is unlikely to
invest today.

1.3 Greenhouse Intermediate-Technology in Northern China

Before economic reforms, China gave ﬁrst priority to the development of heavy
industry. In agriculture, China emphasized the importance of self-sufﬁciency for grains -
the “iron rice bowl policy.” After the “household responsibility system” reform started in
1981, the shortage of grain supply was relieved by a signiﬁcant increase in grain production.
This made it possible for China to diversify into horticulture and livestock husbandry.
Meanwhile, rapid income growth in the 1980s and 1990s created an increasing demand for
high-value horticultural products. However, poor infrastructure and high energy costs
prevented the transportation of perishable products from southern China to northern China,
and affordable fresh vegetables were still unavailable in the 1990s to consumers during the
winter season in northern China.

The huge demand for cheap fresh vegetables led to the development and widespread
diffusion of an affordable greenhouse technology for northern Chinese farmers. Rather
than the modern, expensive type made of a steel frame and plastic or glass walls and ceilings,
and requiring energy-using heating and cooling mechanisms (promoted in China in the 1970s
but seeing very little adoption because of the cost; Wan 2000), the greenhouse adopted
in the 1990s in northern China was of the “intermediate technology” type, made of simple
clay walls, a bamboo frame, a plastic-sheet roof, and a straw mat roll-out awning for cold
nights. The sun warms the interior, and the greenhouse is oriented to
maximize sunlight capture. These greenhouses changed not only the food consumption
pattern for hundreds of millions of consumers, but also the face of farming in northern
China. These greenhouses helped to transform China from a modest global player to the
volume leader in horticulture, growing one third of the fruits and vegetables on the planet
by 2003. By 2004, China grew 47 percent of the vegetable volume in the world
(Weinberger and Lumpkin 2005). The vegetable greenhouse area in China reached 150,000
hectares in 2004 (Chinese Agriculture Yearbook 2006), and at least half a million farmers
were by that year using the intermediate-technology greenhouse.

Greenhouse yields exceed open-ﬁeld cropping: for example, the tomato yield is 200
tons/hectare/year in the greenhouse, versus 40 tons in an open ﬁeld. Several factors,
including labor intensive production, contribute to this high yield. For example, the popular
greenhouse size in Shandong province is only about 60 meters long and 10 meters wide, but
it usually employs two full-time workers. Greenhouse production usually lasts more than
eight months, because the temperature inside the greenhouse is high enough during the
winter months to sustain production. Moreover, high quality crop varieties and intensive
use of organic fertilizers are common in greenhouse production. Nutrient replacement is
important due to the intensive and continuous use of the land under the greenhouse.

The intermediate-technology greenhouse is far cheaper than a modern type, but is still a
major investment for the very small farmers of Shandong. The construction cost of
intermediate-technology greenhouses is roughly four dollars per square meter, much
cheaper than modern greenhouses of glass or plastic which cost about 80 dollars per square
meter to construct. Yet even four dollars per square meter is a large investment for very
small farmers. For example, if a greenhouse is 60 meters long and 10 meters wide, the
construction cost would be about $2,400, while the average Chinese farmer earned less
than $500 in 2005. Moreover, the labor time involved in building the greenhouse is
substantial: the farmer spends months creating the main component, the rear wall of the
greenhouse, which is usually made of pounded clay bricks.
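The cost comparison above is simple arithmetic, and the short sketch below reproduces it from the figures quoted in the text. The $500 income figure is treated as an upper bound, since the text says the average farmer earned less than that; all variable names are illustrative.

```python
# Figures quoted in the text; variable names are illustrative only.
length_m, width_m = 60, 10
cost_per_m2_intermediate = 4   # US dollars per square meter
cost_per_m2_modern = 80        # US dollars per square meter
income_upper_bound = 500       # the average farmer earned less than this in 2005

area_m2 = length_m * width_m
cost_intermediate = area_m2 * cost_per_m2_intermediate
cost_modern = area_m2 * cost_per_m2_modern

# Lower bound on the years of average income needed to cover construction.
years_of_income = cost_intermediate / income_upper_bound

print(area_m2, cost_intermediate, cost_modern, years_of_income)
```

Under these figures the intermediate-technology greenhouse costs at least 4.8 years of average income, while a modern greenhouse of the same footprint would cost twenty times as much.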

Moreover, the investment is “irreversible,” in the sense of Bertola and Caballero (1994),
as the structure can only be used in immediate production, has little to no salvage value,
and cannot be sold or transferred because it is not movable. The bricks cannot be reused or
sold; if the farmer decides to demolish the greenhouse, the bricks would be broken into dirt
clods, and the old straw awning and old bamboo beams would be worth little in salvage.

1.4 Data Description

1.4.1 Sample Selection

Our survey area is in Shandong province, the leading horticulture province in China. It
has seven percent of China’s cropland, but 12 percent of China’s horticultural land in 2004.
The latter share has been steadily rising over time. The number of greenhouses and the
level of commercialization as well as yields in Shandong are higher than in the rest of
China.

In Shandong, we conducted two coordinated community and household level surveys
in 2005 and 2006, respectively. The ﬁrst one, the Shandong village survey, provided a
representative sample of tomato and cucumber growing villages in Shandong. During the
ﬁrst step of the survey, we created sampling frames of county-level tomato and cucumber
production in order to select ﬁve sample counties per crop. Speciﬁcally, with knowledge of
county production of each crop, we ranked counties by the output per capita of that crop.
For each crop in our sample, one high production county was randomly selected from the
counties in the top quintile; the other high production county was randomly selected from
the second quintile. The two medium production counties were randomly chosen from the
third and fourth quintiles, respectively. After eliminating ﬁve percent of the counties with
the lowest production, the low production county was randomly chosen from the lowest
quintile. In the end, there were two counties in the high production set, two counties in the
medium production set, and one county in the low production set.
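The county selection just described can be sketched as follows. This is a minimal illustration, not the survey team's actual procedure: the sampling frame, the county names, and the function `select_counties` are hypothetical, and the sketch assumes that "five percent of the counties" refers to five percent of all ranked counties.

```python
import random

def select_counties(output_per_capita, rng):
    """Pick one county at random from each quintile of the per-capita output ranking."""
    ranked = sorted(output_per_capita, key=output_per_capita.get, reverse=True)
    n = len(ranked)
    # Split the ranking into five consecutive blocks of (nearly) equal size.
    bounds = [k * n // 5 for k in range(6)]
    quintiles = [ranked[bounds[k]:bounds[k + 1]] for k in range(5)]
    # Drop the five percent of counties with the lowest output before the last draw.
    n_drop = n // 20
    if n_drop:
        quintiles[4] = quintiles[4][:-n_drop]
    labels = ["high", "high", "medium", "medium", "low"]
    return [(label, rng.choice(q)) for label, q in zip(labels, quintiles)]

# Hypothetical frame of 40 counties with made-up per-capita output values.
frame = {f"county_{i:02d}": 100.0 - i for i in range(40)}
sample = select_counties(frame, random.Random(0))
print(sample)
```

The result is two high production counties (top and second quintile), two medium production counties (third and fourth quintile), and one low production county, matching the counts stated in the text.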

After the sample counties were chosen, a similar process was used to select sample
townships and villages. For each crop, the survey teams visited a total of ten townships.
Moreover, for each crop (among the five counties and ten townships), we interviewed
respondents in 35 villages (22 in high production counties, 10 in medium production
counties, and 3 in low production counties). Since we collected area data on all villages,
townships, and counties in the sample, we were able to construct area-based weights in
order to create point estimates of our variables that are provincially representative.

Having selected the villages, the enumeration team visited each community and
undertook data collection. Speciﬁcally, the enumerator conducted a two-hour interview
with three village leaders for the village survey. In each village, we divided all households
into two groups. For the cucumber sample, they are non-cucumber households and
cucumber households. We randomly sampled seven cucumber farmers and three
non-cucumber farmers. As a result, we obtained 350 households from cucumber growing
villages.3 With knowledge of the distribution of cucumber farmers and non-cucumber
farmers, plus the distribution of greenhouse adopters in each village, we calculated the
weights to adjust for selection bias. Following this procedure, we also obtained 350
households from tomato growing villages.
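The text does not spell out how the household weights were computed. One common construction, shown here purely as an assumption, is inverse-probability weighting within each village: every sampled household stands for (households in its group) / (households sampled from that group). The village counts and adoption counts below are hypothetical; only the sample sizes of seven and three come from the text.

```python
def stratum_weights(population, sampled):
    """Inverse-probability weight per group: population size / sample size."""
    return {g: population[g] / sampled[g] for g in sampled}

def weighted_share(adopters_in_sample, weights, population):
    """Weighted estimate of the village adoption share."""
    total = sum(population.values())
    return sum(adopters_in_sample[g] * weights[g] for g in weights) / total

population = {"cucumber": 210, "non_cucumber": 45}   # hypothetical village counts
sampled = {"cucumber": 7, "non_cucumber": 3}         # sample sizes stated in the text
adopters = {"cucumber": 4, "non_cucumber": 1}        # hypothetical adopters in the sample

weights = stratum_weights(population, sampled)
print(weights, weighted_share(adopters, weights, population))
```

In this made-up village the unweighted sample share of adopters is 5/10, whereas the weighted share gives more influence to the under-sampled cucumber households.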

After data cleaning, we retained 638 valid household observations. Among this
sample, 204 (64 percent) out of 317 households from tomato growing villages were found
to have adopted greenhouses, while 158 (49 percent) out of 321 households from cucumber
growing villages were found to have adopted greenhouses. That a higher share of tomato
growers adopted greenhouses is apparently due to the fact that in cucumber production, a
shading shed is a substitute for a greenhouse, while in tomato production there is no
substitute for a greenhouse, and the options are only growing in the open field or in a
greenhouse.

3 The reason why we did not directly stratify on greenhouse use is that our survey is part of a large horticulture
production survey, which required stratified sampling of cucumber/tomato and non-cucumber/tomato
households.

Shandong farmers did not adopt greenhouses all at once, but rather, in a process typical
of diffusion of new technology, over years. The greenhouse diffusion process can be
roughly divided into three stages: early stage, take-off stage, and slow-down stage. Figure
1.1 shows that the diffusion process is relatively slow in the early stage before 1990; only a
few farmers adopted the technology. Between 1990 and 1995, many more farmers adopted.
The diffusion process reached its peak between 1996 and 2000, after which the trend began
to slow down. This diffusion curve is similar to the “S-curve” observed by Griliches (1957)
for the adoption of hybrid maize in the US, and subsequently documented in many other
settings.
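An S-shaped diffusion path of this kind is often summarized by a logistic curve, the functional form Griliches fitted to hybrid corn. The sketch below is illustrative only: the ceiling, midpoint, and rate are hypothetical values chosen to mimic a slow start, a take-off, and a slow-down, and are not estimated from the Shandong data.

```python
import math

def logistic_share(year, ceiling, midpoint, rate):
    """Cumulative adoption share along a logistic diffusion curve."""
    return ceiling / (1.0 + math.exp(-rate * (year - midpoint)))

# Hypothetical parameters, not estimates from the survey.
params = dict(ceiling=0.6, midpoint=1998.0, rate=0.4)

for year in (1985, 1990, 1995, 2000, 2005):
    print(year, round(logistic_share(year, **params), 3))
```

New adoptions per year are largest around the midpoint of the curve and taper off on either side, which is the pattern of early stage, take-off, and slow-down described above.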

1.4.2 Social Learning

We are interested in the effect of social learning on farmers’ adoption of greenhouses.
Our theoretical model predicts that social learning helps to reduce parameter uncertainty,
thus facilitating adoption. Empirically, however, social learning could be one of many
factors affecting adoption. For example, farmers may have other options such as off-farm
jobs. Alternatively, farmers may be credit-constrained because greenhouse adoption is a
major investment. To disentangle the effect of social learning from other determinants, we
need to ﬁnd appropriate empirical proxies for social learning and control for other factors
that might inﬂuence farmers’ decisions.

Social learning is a key variable in our study. We measure social learning in a way
similar to the approach of Bandiera and Rasul (2006). Specifically, we asked the farmers
who adopted, “How many people do you know who adopted greenhouses before you
adopted in your village?” We asked the non-adopters how many adopters they knew at the
time of the survey. We control for year with year dummy variables. We then asked, “How
many of these people are your relatives and friends?” (We did not include neighbors as a
separate category because Chinese farmers usually consider neighbors among friends.) The
answer to the second question is taken as our empirical proxy for social learning. Differing
from Bandiera and Rasul (2006) (who asked about the social network at the time of the
survey, not before adoption), we obtained the size of the farmer’s social network of
adopters before his adoption, so that we can infer causality.

There are several reasons why our measure of social network of adopters is an
appropriate measure of social learning before adoption. First, the number of earlier
adopters among relatives and friends is likely to be positively correlated with the number of
different sources of information on greenhouse adoption that the farmer accessed before
adoption, which corresponds to the number of discrete signals in our theoretical model.
Second, village membership, kinship, and friends are the defining elements of a farmer’s
social network, or a group of people with whom the farmer has close contact, and from
whom information can be most easily obtained. By concentrating on the number of earlier
adopters among relatives and friends, we also mitigate the concern about ex post social
network formation. While this is obvious for kin adopters, we noticed during our survey
that Shandong farmers tended to define friendship based on long-term relationships, such as
those with classmates, neighbors, and people who served with them in the army. Typically,
they consider a friend someone from whom they can borrow money in case of illness; they
would not consider passing acquaintances as friends. Third, we found that farmers were
easily able to remember the number of adopters they knew before they adopted; we surmise
that this is because a greenhouse is a big investment for local farmers and hence easily
observable.

The ﬁrst two rows of Table 1.1 provide the means and standard errors of our social
learning measures by adoption status. In the last column, tests of equality of the means are
provided to examine whether the differences between adopters and non-adopters are
significant. The first row indicates that, on average, adopters know about 6.9 earlier
adopters among relatives and friends in their own village, while non-adopters only know
about 4.7 earlier adopters in their social network. The result of the t-test shows that this
difference is significant. This implies that there is more social learning for adopters than for
non-adopters. When we extend the scope of the social network to include earlier adopters
among relatives and friends in nearby villages (the second row), the findings are similar.
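A test of equality of two group means can be computed directly from the reported means and standard errors. The sketch below assumes independent samples and unequal variances; the text does not state which variant was used for Table 1.1. The two means are those quoted above, while the standard errors are hypothetical placeholders because the table values are not reproduced here.

```python
import math

def t_from_summary(mean_a, se_a, mean_b, se_b):
    """t-statistic for equality of two independent means, from means and standard errors."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Means are from the text; the standard errors (0.3 and 0.4) are hypothetical.
t_stat = t_from_summary(6.9, 0.3, 4.7, 0.4)
print(round(t_stat, 2))
```

With these placeholder standard errors the statistic is about 4.4, well above conventional critical values; the actual test result is the one reported in the last column of Table 1.1.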

1.4.3 Other Household Characteristics

Table 1.1 presents other household characteristics by adoption status. There are several
salient points.

(1) Demographics differ between adopters and non-adopters. The family size of
adopters is significantly larger than that of non-adopters, while the amount of farm labor is
significantly smaller for adopters than for non-adopters. This is because adopters have more
dependent family members (either young children or old parents) than non-adopters. For
such households, greenhouse adoption could be a good choice because it allows the adults
to work close to home, so that they can care for dependent family members. Non-adopters
are, on average, substantially older than adopters, a point consistent with younger
farmers having more young children and old parents to care for.

(2) Off-farm employment and income are signiﬁcantly larger for non-adopters than
for adopters, which suggests that greenhouse labor and off-farm jobs are substitutes.

(3) There is no signiﬁcant difference in education between adopters and
non-adopters in our sample. This suggests that education is not the main determinant of
greenhouse adoption when the main source of information for the technology is social
learning.

(4) The farm size of adopters is larger than that of non-adopters, which indicates
that farmers with more land are more dependent on agricultural income, and farmers with
less land are more likely to favor off-farm jobs.

(5) Irrigation is of course important to greenhouse farming, and 89 percent of the
adopters have access to irrigation. However, 80 percent of the non-adopters also have
access to irrigation, showing that there is not much variation in irrigation access among
farmers in this well-irrigated region.

(6) Adopters have greater land tenure security than non-adopters. This is a sensible
result given the long-term nature of greenhouse investment. We proxy land tenure security
by the number of land reallocations undertaken by village leaders every few years to ensure
relative land distribution equality in the village.

(7) Adopters and non-adopters have no signiﬁcant difference in grain land share,
which suggests that both groups have a similar agricultural production pattern except that
adopters use greenhouses to produce vegetables and non-adopters produce vegetables in
the open ﬁeld.

(8) The presence of a credit constraint would in theory undermine an important
investment such as greenhouses, all else equal. However, it is difficult to measure a credit
constraint facing a farmer, as this is equivalent to examining whether a farmer can borrow
as much as he would like at the going market interest rate (Banerjee and Duﬂo, 2002).
Since we are focusing on greenhouse adoption rather than testing whether the farmer has
invested in a greenhouse of optimal scale, we only need to know whether a farmer is
capable of building a greenhouse by borrowing money or using his savings. Therefore, we
observed the house value as a proxy for household wealth. We also collected the
household’s credit history (maximum borrowing and maximum lending) before adoption
as an indicator of how much credit/savings is available. Our data shows that non-adopters
are signiﬁcantly wealthier than are adopters before the latter’s adoption; non-adopters have
a mean house value of 8,773 yuan vs. 4,294 yuan for adopters. Similarly, non-adopters
have signiﬁcantly greater credit/savings than adopters. The maximum borrowing is 1,352
yuan for non-adopters vs. 925 yuan for adopters, and the maximum lending is 862 yuan for
non-adopters vs. 368 yuan for adopters. Given that non-adopters are both wealthier and
have more access to credit, credit constraints are unlikely to play an important role in
greenhouse adoption in Shandong.

1.5 Empirical Methodology

In this section, we illustrate the connection between our theoretical model and the
empirical framework. According to our real option model of greenhouse adoption, the
farmer decides to adopt or to wait based on a comparison between the current value of the
technology and the trigger value. Therefore, we can define the farmer’s adoption status at
time t as:

$$Y_t = 1 \ (\text{adopt}), \quad \text{if } Y_t^* = V_t - V_t^* > 0, \tag{1.7}$$

$$Y_t = 0 \ (\text{non-adopt}), \quad \text{if } Y_t^* = V_t - V_t^* \le 0,$$

where $V_t$ is the discounted expected value of all future cash flow from greenhouse
vegetable production, and $V_t^*$ is the trigger value.

McDonald and Siegel’s (1986) model, in which the drift $\mu$ is known, shows the trigger
value $V^*$ as a function of the parameters $(\rho, \mu, I, \sigma)$. However, the drift $\mu$ is unknown in
our model. Thus, the trigger value also depends on the conditional mean and variance of the
drift, $(m_t, \gamma_t)$. According to the dynamics of $(m_t, \gamma_t)$ in equation (1.6), we can substitute
$(m_t, \gamma_t)$ with functions of $(m_0, \gamma_0, Z_t, N_t, \sigma, \sigma_\varepsilon, \bar{\mu}, t)$.4 Therefore, we can express the
trigger value $V_t^*$ as:

$$V_t^* = g(\rho, I, \sigma, m_0, \gamma_0, Z_t, N_t, \sigma_\varepsilon, \bar{\mu}, t). \tag{1.8}$$

 

4 This is only a simplified representation; strictly speaking, the solution of $(m_t, \gamma_t)$ according to equation
(1.6) depends on the paths of Z and N, as well as the history of the signals up to time t.

Following similar reasoning, the current project value $V_t$ can be written as a function of the
same group of variables. Therefore, we can express $Y_t^* = V_t - V_t^*$ as:

$$Y_t^* = h(\rho, I, \sigma, m_0, \gamma_0, Z_t, N_t, \sigma_\varepsilon, \bar{\mu}, t). \tag{1.9}$$
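As a point of reference for the trigger value that enters rule (1.7), the known-drift benchmark has a familiar closed form when the project value follows a geometric Brownian motion: $V^* = \frac{\beta}{\beta - 1} I$, where $\beta > 1$ is the positive root of $\tfrac{1}{2}\sigma^2\beta(\beta - 1) + \mu\beta - \rho = 0$. The sketch below implements only this textbook benchmark, assuming $\mu < \rho$; it does not implement the learning model of this chapter, in which the trigger also depends on $(m_t, \gamma_t)$. The function names and parameter values are illustrative.

```python
import math

def trigger_value(rho, mu, sigma, invest_cost):
    """Known-drift trigger V* = beta / (beta - 1) * I (requires mu < rho and sigma > 0)."""
    if not (mu < rho and sigma > 0):
        raise ValueError("need mu < rho and sigma > 0")
    a = mu / sigma ** 2
    # Positive root of 0.5*sigma^2*beta*(beta-1) + mu*beta - rho = 0.
    beta = 0.5 - a + math.sqrt((a - 0.5) ** 2 + 2.0 * rho / sigma ** 2)
    return beta / (beta - 1.0) * invest_cost

def adopt(current_value, trigger):
    """Decision rule (1.7): adopt (1) if V_t - V_t* > 0, otherwise wait (0)."""
    return 1 if current_value - trigger > 0 else 0

# Illustrative parameters: zero drift, 4 percent discount rate, 20 percent volatility.
v_star = trigger_value(rho=0.04, mu=0.0, sigma=0.2, invest_cost=2400.0)
print(round(v_star, 2), adopt(5000.0, v_star), adopt(4000.0, v_star))
```

With these illustrative parameters the trigger is twice the investment cost, so a farmer following the rule waits even when the project value exceeds the construction cost, and a higher volatility raises the trigger further.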
To motivate the empirical proxies for the variables in equation (1.9), we first note that
$Z_t$ represents the stochastic change in the project value. A good proxy for $Z_t$ is the
observed profitability of greenhouse production in the current period. We proxy that
profitability by the ratio of the output price to the input price. Because historical data are
not available on vegetable prices in Shandong, we use the ratio of the vegetable price index
to the input price index at the national level as a proxy for the profitability of greenhouse
production over the years. For the investment cost $I$, we use the greenhouse construction
cost (real value) for each adopter. For non-adopters, we use the average construction cost
for adopters in their village or nearby villages as the proxy.

Continuing with the interpretation of equation (1.9), $\sigma$ is the volatility of the project
value, which we measure as the standard deviation of the national vegetable price index
over the three years prior to the farmer’s adoption. $\bar{\mu}$ represents the average signal
received by the farmer from his social network, the proxy for which is the vegetable price
index growth rate over the three years preceding the farmer’s adoption. This is a reasonable
assumption if the expected return of the project is close to the average return in the
economy. The time $t$ in our model is equated with the amount of time the agent spent in
continuous learning. We use the number of years that the farmer had been aware of the
technology before adoption to represent the continuous learning effect. As noted above, $N_t$
is the key variable in our study. We measure it by the number of earlier adopters in a
farmer’s social network, which includes relatives and friends in his own village and nearby
villages.

Besides these theoretically motivated variables, there may be other factors that affect
greenhouse adoption in practice, such as land tenure security, off-farm employment, and
household wealth. These factors were discussed in the preceding section. In addition, we do
not have compelling empirical proxies for farmers’ discount factor $\rho$, their initial values of
the conditional mean and variance $(m_0, \gamma_0)$ before any learning had taken place, and the
standard deviation of their signals $\sigma_\varepsilon$. These parameters, however, are likely correlated
with household characteristics such as age, family size, and education, which we include in
our empirical analysis to capture potential omitted factors.

Our theoretical model is based on observables; with knowledge of these observables,
the model predicts adoption with certainty. In reality, however, we do not observe all
information relevant for determining adoption. Therefore, our empirical model must allow
for the presence of unobserved determinants.

In brief, our empirical model can be written as:

$$Y_i^* = f(X_i, Z_i, N_i, D_1, D_2) + e_i, \tag{1.10}$$

where $i$ denotes a household, $Y_i^*$ is the adoption criterion in year $t$ according to
equation (1.7), and $X_i$ are household characteristics before adoption (year $t-1$), which
include age, education of household head, family size, farm size, off-farm employment and
income, family labor, irrigation conditions, family wealth, years of awareness of the
technology, and greenhouse construction costs. $Z_i$ are institutional and market variables at
$t-1$, which include the number of land reallocations, the ratio of the output price index to
the input price index, the volatility of the vegetable price index, and the average growth rate
of the vegetable price index. $N_i$ is the number of earlier adopters in the farmer’s social
network at $t-1$. $D_1$ and $D_2$ are, respectively, year and county dummies that control for
heterogeneity in farmers’ adoption across different years and counties. Finally, $e_i$
represents the effect of unobservable determinants of adoption. According to equation (1.10),
the probability of adoption is:

$$P(Y_i^* > 0) = P\big(e_i > -f(X_i, Z_i, N_i, D_1, D_2)\big). \tag{1.11}$$

In our empirical analysis, we estimate a linear probability model (LPM), which
specifies the above probability as a linear function of the explanatory variables. The LPM has
its strengths and weaknesses. (1) It is a linear model, which offers convenience in
estimation. For example, OLS provides consistent and even unbiased estimators, and
heteroskedasticity is easily handled with heteroskedasticity-robust standard errors and
t-statistics. (2) However, the coefficients in the linear model treat the effect of the
explanatory variables on the response probability as constant, so fitted probabilities can fall
outside the unit interval. Unless the range of the explanatory variables is severely
restricted, the LPM cannot be a good description of the population response probability.
The hope is that the linear specification approximates the response probability for common
values of the covariates; fortunately, this often turns out to be the case (Wooldridge 2002).
(3) The LPM allows us to use year dummies to control for heterogeneities over time, which
is important to this empirical study given the structure of our data set (in which different
farmers adopted greenhouses in different years). Therefore, even with some weaknesses,
the LPM often provides good estimates of the partial effects on the response probability
near the center of the distribution of the explanatory variables.
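The strengths and weaknesses listed above can be seen in a very small example. The sketch below fits an LPM with a single regressor by OLS and computes a heteroskedasticity-robust (HC0) standard error for the slope. It is a simplified illustration with made-up data and a hypothetical function name; the chapter's actual specification includes many controls, dummies, and instruments.

```python
import math

def lpm_simple(x, y):
    """OLS of a 0/1 outcome y on one regressor x; returns (intercept, slope, robust SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # HC0 variance of the slope: sum((x - x_bar)^2 * u^2) / sxx^2.
    robust_var = sum((xi - x_bar) ** 2 * ui ** 2 for xi, ui in zip(x, resid)) / sxx ** 2
    return intercept, slope, math.sqrt(robust_var)

# Made-up data: x = earlier adopters known, y = 1 if the household adopted.
x = [0, 1, 2, 3]
y = [0, 0, 1, 1]
intercept, slope, robust_se = lpm_simple(x, y)
fitted = [intercept + slope * xi for xi in x]
print(round(slope, 3), round(robust_se, 3), [round(f, 2) for f in fitted])
```

In this toy example each additional known adopter raises the fitted adoption probability by 0.4, and the fitted values at the extremes fall below zero and above one, which is the weakness noted in point (2).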

1.6 Empirical Results

1.6.1 Identiﬁcation Strategy

In this section we focus on the potential endogeneity of the social learning effect and
our identiﬁcation strategy. The endogeneity problem is one of the most formidable
problems in empirical studies. In order to ﬁnd an appropriate identiﬁcation strategy for this
study, it is crucial to understand the reasons why we could face the problem.

Manski (1993) uses the reﬂection problem to describe the tendency for people in the
same social network to behave in similar ways. He identifies two possibilities: (1) an
endogenous effect, wherein the propensity of an individual to behave in certain ways varies
with the prevalence of the behavior in the group; (2) a correlated effect, wherein common
environment and personal characteristics produce similar behavior.

In this paper, we attempt to show that farmers’ adoption decision is inﬂuenced by social
learning. Therefore, we need to empirically distinguish the social learning effect from the
endogenous effect and the correlated effect.

In our context, the endogenous effect is essentially the social pressure problem.
Psychologists often use social pressure as a way of explaining herd behavior. For
greenhouse adoption, adopters are usually the minority in most villages. From this
observation one can infer that it would be rare for farmers to choose greenhouse adoption
because of social pressure.

In our context, the correlated effect poses a more serious challenge. An endogeneity
problem could arise from the simultaneous determination of adoption and network
formation: for example, a farmer could know more adopters because he adopted the
greenhouse. In other words, the adoption could affect social learning instead of social
learning affecting adoption (endogeneity from simultaneous determination). To mitigate
this problem, we collected household and institutional information for the year before the
adoption for adopters. For non-adopters, we collected the information in the year before the
survey occurred (2005).

Moreover, farmers who are entrepreneurial in spirit are likely to know more people
(hence more adopters). At the same time, they are more likely to try out new things (thus
more likely to adopt). Therefore, a farmer’s adoption could be explained by his personality,
rather than by learning from others in his social network. Thus, a key problem is how to
identify social learning from unobservable error terms such as similar personalities in the
social network. We need to ﬁnd at least one instrumental variable which is (1) correlated
with social learning after we control for other factors, but that is (2) not correlated with the
error terms. We can test the ﬁrst condition. We cannot test the second condition directly
because the error terms are not observable.

Fortunately, we have an appropriate instrument in this study: the walking time from the
farm to a farmer’s neighborhood. More specifically, we asked farmers the following question
in the field survey: “How many minutes does it take to walk by your 20 closest neighbors?”
The logic of this question is that social learning could be negatively correlated with the
walking time. For example, if a farmer lives in a mountainous area, it could take two hours
or even more to walk by his 20 closest neighbors. By contrast, it takes only 10 minutes
for a farmer to walk by his 20 closest neighbors if people live close together. We surmise that
farmers in the second case are more likely to have access to social learning. We test this
hypothesis with data after controlling for other factors: we find that walking time is
significantly negatively correlated with social learning (first row of Table 1.3 for both
social learning measures). This result indicates that the walking time variable satisfies
the first condition for a valid instrument.
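The relevance check described above amounts to a first-stage regression of the social learning measure on the instrument. The sketch below is a stripped-down version with one instrument, no controls, and classical standard errors, whereas the estimates in Table 1.3 control for other factors. The data and the function name are hypothetical.

```python
import math

def first_stage(z, x):
    """OLS of the endogenous regressor x on the instrument z; returns (slope, t-statistic)."""
    n = len(z)
    z_bar, x_bar = sum(z) / n, sum(x) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / szz
    intercept = x_bar - slope * z_bar
    ssr = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    se = math.sqrt(ssr / (n - 2) / szz)   # classical (homoskedastic) standard error
    return slope, slope / se

walking_minutes = [10, 20, 30, 40]   # hypothetical instrument values
adopters_known = [8, 6, 5, 1]        # hypothetical social learning measure
slope, t_stat = first_stage(walking_minutes, adopters_known)
print(round(slope, 3), round(t_stat, 2))
```

A negative and statistically significant first-stage slope is what the relevance condition requires; the exogeneity condition cannot be checked in this way.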

For an analysis of whether this instrument meets the second condition (lack of
correlation with the error term in the adoption equation), the following three-step
discussion provides further justiﬁcation for the validity of the instrument.

First, we use a heuristic explanation to justify the instrument. In rural China, it is not
unusual for a family to live in the same place for decades. A well-functioning real estate
market does not exist in rural China for several reasons: (1) a farmer could own his house,
but not the land on which his house is built because all land is owned by the village
collective; (2) it is illegal to buy a house in a village if the buyer is not a member of the
village; (3) it is also illegal for a household to buy an additional house from another villager
because Chinese law forbids any household to occupy two pieces of land for housing in a
village; (4) if a farmer wants to change his house location, either he has to obtain a new
piece of land from the village collective under very strict conditions due to land scarcity in
Shandong, or he can ﬁnd another household in the village that is willing to give up its
housing land, which is very rare. In addition, in both cases the farmer has to give up his old
housing land. Based on these observations, it appears very difﬁcult, if not impossible, for a
household to change its location. In other words, the farmer’s housing location in rural
China can be considered as ﬁxed in most cases. From this we infer that the walking time to
the neighborhood is ﬁxed and exogenous to greenhouse adoption.

Second, we constructed interaction terms between the IV (distance to neighborhood)
and year dummies. We used the Hansen-J over-identification test to examine the validity of
the IV given that we believe the other instruments (the interaction terms) to be truly
exogenous. The C-statistic from the Hansen-J test (the last row of Table 1.4) indicates that
the distance to neighborhood variable passes the validity test for both social learning
measures. We must be cautious not to over-emphasize this result, as the Hansen-J test is
informative only if the other instruments are exogenous. However, this is the
best test we can do to check the validity of an instrumental variable.

Finally, we tabulate the distance to neighborhood by household characteristics such as
education, age, and wealth. These simple but reliable summary statistics can tell us whether
the distance to neighborhood is correlated with typical household characteristics. If the
distance to neighborhood is truly exogenous due to the ﬁxed housing location in rural
China, we would not expect to see a signiﬁcant correlation with household characteristics.
Indeed, the results in Table 1.5 indicate that the distance to neighborhood does not show
any robust correlation with the education and age of the household head, or the real value of
the house. These findings lend support to our working hypothesis that the distance to
neighborhood is exogenous to greenhouse adoption.

As a result of these discussions, we are fairly conﬁdent that the IV (distance to
neighborhood) is exogenous to greenhouse adoption, and therefore it allows us to obtain
consistent estimators given that social learning is shown to be endogenous by the
Durbin-Wu-Hausman test (last row of Table 1.2).

1.6.2 Linear Probability Model

Table 1.2 presents the estimation results for the linear probability model estimated by
2SLS with cluster-robust standard errors using distance to neighborhood as the instrument.
The first two columns report the results using a measure of social learning within the
farmer’s own village; the next two columns report the results using a measure of social
learning that also includes the farmer’s nearby villages. Generally speaking, the two sets of
results are very similar, suggesting that village boundaries are not crucial to how social
learning affects greenhouse adoption.
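The mechanics of 2SLS can be sketched in the simplest case of one endogenous regressor and one instrument. The example below is an illustration under strong simplifying assumptions (no controls, no dummies, made-up data, hypothetical function names), not a reproduction of the estimates in Table 1.2. It runs the two stages explicitly and returns point estimates only.

```python
def ols(x, y):
    """Simple OLS with an intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope

def two_stage_least_squares(z, x, y):
    """2SLS point estimates with one instrument z for one endogenous regressor x."""
    a0, a1 = ols(z, x)                      # first stage: x on z
    x_hat = [a0 + a1 * zi for zi in z]      # fitted values of x
    # Second stage: y on fitted x. Standard errors from this naive second
    # stage would be incorrect, so only the coefficients are returned.
    return ols(x_hat, y)

z = [10, 20, 30, 40]   # hypothetical walking time (minutes)
x = [8, 6, 5, 1]       # hypothetical number of earlier adopters known
y = [1, 1, 0, 0]       # hypothetical adoption indicator

b0, b1 = two_stage_least_squares(z, x, y)
naive_slope = ols(x, y)[1]
print(round(b1, 4), round(naive_slope, 4))
```

With a single instrument the 2SLS slope equals the ratio of the covariance between instrument and outcome to the covariance between instrument and regressor. In practice an estimation package would be used so that controls, additional instruments, and cluster-robust standard errors are handled correctly.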

We will focus on the first two columns for a detailed discussion of our results. The first
row confirms the key result for our study: social learning has a significantly positive impact
on greenhouse adoption. Specifically, one more adopter in a farmer’s social network
increases the probability of his adoption by 1.9 percentage points after controlling for other
factors. In other words, if there are currently 10 earlier adopters in the farmer’s social
network, his adoption probability in the next year will increase by about 19 percentage
points. Given that the greenhouse adoption rate is still low in rural China, an increase of
this size is economically significant.

The third row shows how adoption is affected by the conditional mean return to the
greenhouse technology. From our theoretical model, we know that the farmer’s belief about
the mean return will converge to the average belief of his social network as a result of social
learning. Because we cannot observe farmers’ expectations, we use the vegetable price
index (national level) growth rate before adoption to approximate the average belief of
project return in the social network. The coefﬁcient is not signiﬁcant; however, the sign is
consistent with the prediction of our theoretical model, namely, higher expected return
results in a higher trigger value for investment and a lower probability for adoption. It is
also possible that the price index growth rate is acting as a proxy for farmers’ outside
opportunities; however, we have already included off-farm income in our regression
specification.

We use the market volatility of vegetable prices before adoption to represent the
uncertainty in the stochastic project value in our theoretical model. Our result indicates that
this source of uncertainty discourages adoption. This ﬁnding is consistent with theory,
which predicts that the option value of waiting to invest is larger when the future
investment value is more uncertain.

We use the number of years that the farmer had been aware of the technology before
adoption to represent the continuous learning effect. However, it is not signiﬁcant
according to our estimation. It could be that farmers in rural China simply did not have
continuous access to information about the greenhouse technology and its returns. It is also
possible that the main source of information about the greenhouse technology is discrete
social learning.

Our proxy for the current proﬁtability of the greenhouse technology is the ratio of the
output price index to the input price index: the higher is the stochastic project value, the
higher is the probability of adoption. Our result conﬁrms this prediction.

Among the included household characteristics, only the age of the family head is
statistically significant. However, the effects of most household characteristics are
consistent with our discussion in section 1.4.3. The $R^2$ of this regression is 0.83, which
suggests that we have included most of the factors that could affect the adoption decision.
It also reinforces the idea that our irreversible investment model is an appropriate choice
for describing the greenhouse adoption behavior.

In Table 1.4, the interaction terms between the distance to neighborhood and the year

dummies are included as extra instruments in the regression. The results are very similar to


the results in Table 1.2, which suggests that the results are robust. Moreover, the extra
instruments allow us to use the Hansen-J test to test the validity of the IV (distance to

neighborhood).
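
The instrumental-variable logic behind Tables 1.2 to 1.4 can be sketched on simulated data. The following is a minimal illustration, not the estimation code used for the tables: the data-generating process, the variable names, and the helper `two_stage_least_squares` are all assumptions made for the example, and the outcome is continuous rather than binary for simplicity.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS: project X on Z (first stage), then regress y on the fitted X."""
    # First stage: fitted values of the regressors given the instruments.
    first_stage_coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage_coef
    # Second stage: beta = (X_hat' X)^{-1} X_hat' y.
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)                        # hypothetical instrument (e.g., distance)
v = rng.normal(size=n)                        # unobservable that creates endogeneity
x = 1.0 - 0.8 * z + v + rng.normal(size=n)    # endogenous regressor
y = 0.5 + 0.3 * x + v + rng.normal(size=n)    # outcome; the true slope is 0.3

const = np.ones(n)
X = np.column_stack([const, x])
Z = np.column_stack([const, z])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # biased upward by v
beta_iv = two_stage_least_squares(y, X, Z)        # uses only variation driven by z
print("OLS slope:", beta_ols[1], "2SLS slope:", beta_iv[1])
```

In this simulated design the OLS slope is pulled away from 0.3 because x and the error share the component v, while the 2SLS slope relies only on the part of x explained by the instrument.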


1.7 Conclusion

In technology adoption with irreversible investment, agents commonly face two
sources of uncertainty. First, the future value of the investment is uncertain. Second, agents
have incomplete information regarding the parameters of the process describing the future
investment value. In this paper, we model social learning as a way of reducing parameter
uncertainty, thus facilitating technology adoption with irreversible investment. We use
household-level data from intermediate-technology greenhouse adoption in northern China
to test the predictions, with the following main results.

(1) Social learning has a signiﬁcantly positive impact on greenhouse adoption. Ten
more adopters in the farmer’s social network increase the probability of adoption by 19
percent, which is an economically signiﬁcant effect.

(2) The empirical data conﬁrms what we know from the conventional theory of
irreversible investment: higher uncertainty about the future investment value results in less
adoption.

(3) Social learning could also affect technology adoption through its inﬂuence on
the farmer’s belief about the expected return on the technology. The empirical data offers
some support for this hypothesis.

Our paper also provides an answer to the following question: how could small farmers
in developing countries deal with the risk from irreversible investment and incomplete
information? Our results suggest that social learning can be an effective solution. Therefore,
the policy implication from this paper is clear: when small farmers face technology

adoptions such as investing in tube wells or machinery, helping several farmers adopt


successfully may be the best way to induce more adoption in their village.


Figure 1.1 Greenhouse Diffusion Curve at the Household Level

[Figure not reproduced. Legend: the number of adopters.]


 

Table 1.1 Descriptive Statistics: Household Level Data

This table contains the basic household characteristics used in our study. The mean value
for each variable is presented with the associated standard error in parentheses. For
adopters, all variables are measured in the year before adoption. For non-adopters, all
variables are measured in the year before the survey. *** denotes signiﬁcance at
one-percent, ** ﬁve-percent, and * ten-percent level.

 

 

Basic characteristics                     Non-adopter   Adopter     Test of equality of
                                                                    the means (p-value)
Social learning within village            4.7           6.9         0.027**
                                          (0.7)         (0.67)
Social learning within village and        5.8           8.45        0.018**
  nearby villages                         (0.8)         (0.76)
Family size                               3.7           3.9         0.016**
                                          (0.07)        (0.06)
Farm labor                                2.92          2.46        0.01***
                                          (0.07)        (0.043)
Off-farm employment                       0.8           0.24        0.01***
                                          (0.054)       (0.022)
Age of family head                        46.4          35          0.01***
                                          (0.6)         (0.46)
Education of family head                  7.0           7.24        0.25
                                          (0.17)        (0.14)
Off-farm income (yuan)                    8420          1643        0.01***
                                          (649)         (182)
Farm size (mu)                            5.6           6.01        0.09*
                                          (0.19)        (0.16)
Irrigation ratio                          0.80          0.89        0.01***
                                          (0.019)       (0.013)
Major land reallocations since 1980       1.44          0.79        0.01***
                                          (0.067)       (0.05)
Minor land reallocations since 1980       4.29          3.19        0.01***
                                          (0.26)        (0.19)
House value (yuan)                        8773          4294        0.01***
                                          (539)         (413)
Grain Land Share (percent)                0.579         0.577       0.92
                                          (0.282)       (0.252)
Maximum lend                              862           368         0.01***
                                          (104)         (66)
Maximum borrow                            1352          925         0.01**
                                          (146)         (102)

 


Table 1.2 Greenhouse Adoption and Social Learning: LPM Estimated by 2SLS

This table contains a 2SLS estimation of the linear probability model for farmers’ adoption
decision. The instrumental variable for social learning is distance to neighborhood
(measured by the walking time to the 20 closest neighbors). The dependent variable is 1 for
adopters and 0 for non-adopters. *** denotes signiﬁcance at one-percent, ** ﬁve-percent,

and * ten-percent level.

 

 

 

 

 

 

 

Explanatory variables                     Coefficient   Robust        Coefficient   Robust
                                                        std error                   std error
Social Learning
Social learning within village            0.019         0.01**
Social learning within village and                                    0.017         0.009**
  nearby villages
Conditional mean of market return         -0.41         0.34          -0.37         0.31
Market volatility                         -0.0017       0.0006***     -0.0017       0.0006***
Years of awareness of the technology      -0.009        0.0073        -0.01         0.008
Output price/input price                  0.83          0.21***       0.86          0.22***
Household Characteristics
Family size                               0.020         0.017         0.021         0.016
Age of family head                        -0.0034       0.0017**      -0.0037       0.0015**
Education of family head                  0.0012        0.0044        -0.0003       0.004
Off-farm income                           -0.0068       0.0065        -0.007        0.006
Farm size                                 0.006         0.006         0.0075        0.0054
Irrigation ratio                          0.058         0.045         0.060         0.038
House value                               -0.0017       0.0032        -0.0022       0.0031
Greenhouse construction cost              0.0073        0.015         0.006         0.14
Times of major reallocations              -0.017        0.03          -0.025        0.031
Times of minor reallocations              -0.001        0.008         0.0005        0.008
Grain share                               0.158         0.095         0.144         0.089
Dummies and constant terms
Crop dummy                                0.0043        0.038         0.061         0.044
County dummies                            Yes                         Yes
Year dummies                              Yes                         Yes
Constant terms                            -0.687        0.26**        -0.69         0.26**
Observations                              626                         626
Adjusted R-squared                        0.83                        0.84
Durbin-Wu-Hausman Test for Endogeneity    p-value 0.014               p-value 0.013

 


Table 1.3 Greenhouse Adoption and Social Learning: First Stage 2SLS Results

This table contains the first stage results of a 2SLS estimation of the linear probability
model for farmers’ adoption decision. The dependent variable is social learning within
village or social learning within village and nearby villages. *** denotes significance at one
percent, ** five percent, and * ten percent level.

 

                                          Social learning within      Social learning within
                                          village                     village and nearby villages
Explanatory variables                     Coefficient   Robust        Coefficient   Robust
                                                        std error                   std error
Walking time to 20 closest neighbors      -0.088        0.044**       -0.093        0.046**
Conditional mean of market return         1.02          2.535         -1.375        2.585
Market volatility                         0.005         0.012         0.007         0.015
Years of awareness of the technology      0.657         0.30**        0.771         0.304**
Output price/input price                  14.82         16.2          14.09         16.94
Household Characteristics
Family size                               0.132         0.743         0.086         0.784
Age of family head                        -0.062        0.080         -0.046        0.089
Education of family head                  -0.132        0.286         -0.052        0.308
Off-farm income                           -0.187        0.338         -0.185        0.346
Farm size                                 -0.030        0.358         -0.107        0.351
Irrigation ratio                          1.729         4.076         1.700         3.656
House value                               -0.035        0.143         -0.010        0.143
Greenhouse construction cost              -1.169        0.605*        -1.168        0.580**
Times of major reallocations              1.416         0.974         1.946         0.970**
Times of minor reallocations              -0.610        0.521         -0.727        0.526
Grain share                               0.847         4.289
Dummies and constant terms
Crop dummy                                -1.898        2.092         -3.11         3.05
County dummies                            Yes                         Yes
Year dummies                              Yes                         Yes
Constant terms                            -5.996        19.69         -6.21         20.62
Observations                              626                         626
Adjusted R-squared                        0.267                       0.293

 


Table 1.4 Greenhouse Adoption and Social Learning: LPM with Interaction Terms

This table contains a 2SLS estimation of the linear probability model for farmers’ adoption
decision. The instrumental variables for social learning include distance to neighborhood
(measured by the walking time to the 20 closest neighbors) and its interaction with year
dummies. The dependent variable is 1 for adopters and 0 for non-adopters. *** denotes
signiﬁcance at one-percent, ** ﬁve-percent, and * ten-percent level.

 

 

 

 

 

 

 

 

Explanatory variables                     Coefficient   Robust        Coefficient   Robust
                                                        std error                   std error
Social Learning
Social learning within village            0.019         0.009**
Social learning within village and                                    0.018         0.008**
  nearby villages
Conditional mean of market return         -0.415        0.357         -0.37         0.33
Market volatility                         -0.0017       0.0006***     -0.0017       0.0005***
Years of awareness of the technology      -0.0094       0.0073        -0.01         0.0073
Output price/input price                  0.82          0.211***      0.85          0.22***
Household Characteristics
Family size                               0.020         0.017         0.021         0.016
Age of family head                        -0.0034       0.0017**      -0.0037       0.0015**
Education of family head                  0.0013        0.0044        -0.0003       0.004
Off-farm income                           -0.0068       0.0067        -0.007        0.006
Farm size                                 0.006         0.006         0.0076        0.0057
Irrigation ratio                          0.057         0.047         0.059         0.039
House value                               -0.0017       0.0032        -0.0022       0.0031
Greenhouse construction cost              0.0082        0.015         0.007         0.13
Times of major reallocations              -0.018        0.03          -0.026        0.032
Times of minor reallocations              -0.0004       0.0082        0.0009        0.008
Grain share                               0.157         0.097         0.143         0.09
Dummies and constant terms
Crop dummy                                0.0044        0.039         0.064         0.044
County dummies                            Yes                         Yes
Year dummies                              Yes                         Yes
Interaction terms                         Yes                         Yes
Constant terms                            -0.681        0.26**        -0.684        0.27**
Observations                              626                         626
Adjusted R-squared                        0.82                        0.83
Over-Identification Test: Hansen J        p-value 0.20                p-value 0.21
  C-Statistics

 


Table 1.5 Distance to Neighborhood and Characteristics of Household

This table summarizes the walking time to the 20 closest neighbors for households
categorized by their education, wealth, and age levels.

 

Education of     Distance to    Real value of    Distance to    Age of head     Distance to
family head      20 closest     house            20 closest     of household    20 closest
(school year)    neighbors      (10,000 yuan)    neighbors      (year)          neighbors
                 (minute)                        (minute)                       (minute)
0                14             <0.2             14             <20             18
1                21             0.2~0.5          25             20~25           16
2                13             0.5~1            16             25~30           14
3                15             1~2              16             30~35           17
4                21             2~3              16             35~40           16
5                19             3~4              16             40~45           16
6                17             4~5              16             45~50           15
7                13             5~6              13             50~55           17
8                15             6~7              15             55~60           16
9                16             7~8              17             >60             15
10               15             8~9              13
>11              13             9~10             16
                                >10              16

 


BIBLIOGRAPHY

Abasov, T. M. (2005). Dynamic learning effect in corporate finance and risk management.
Ph.D. Dissertation. University of California, Irvine.

Banerjee, A., and Duﬂo, E. (2002). Do ﬁrms want to borrow more? Testing credit
constraints using a directed lending program. MIT Department of Economics, Working
Paper No. 02-25.

Bandiera, O., and Rasul, I. (2006). Social network and technology adoption in Northern
Mozambique. Economic Journal: 116, 869-902.

Bertola, G., and Caballero, R. (1994). Irreversibility and aggregate investment. Review of
Economic Studies: 61, 223-246.

Besley, T., and Case, A. (1994). Diffusion as a learning process. Evidence from HYV
cotton. mimeo, Princeton University.

Brennan, M. J. (1998). The role of learning in dynamic portfolio decisions. European
Economic Review: 1, 295-306.

Chinese Agricultural Yearbook (2006). Chinese Agricultural Press.

Conley, T., and Udry, C. (2001). Learning about a new technology: pineapple in Ghana.
American Journal of Agricultural Economics: 83, 668-673.

Dixit, A. K., and Pindyck, R. S. (1994). Investment under Uncertainty. Princeton
University Press.

Feder, G. (1980). Farm size, risk aversion and the adoption of new technology under
uncertainty. Oxford Economic Papers, New Series: 32, 2, 263-283.

Foster, A., and Rosenzweig, M. (1995). Learning by doing and learning from others:
human capital and technical change in agriculture. Journal of Political Economy: 103,
1176-1209.

Gennotte, G. (1986). Optimal portfolio choice under incomplete information. Journal of
Finance: 41, 733-746.

Griliches, Z. (1957). Hybrid corn: an exploration in the economics of technological change.
Econometrica: 25, 501-522.

Hassett, K. A., and Metcalf, G. E. (1995). Energy tax credits and residential conservation
investment. NBER Working Paper No. W4020.


Huang, L., and Liu, H. (2007). Rational inattention and portfolio selection. Journal of
Finance: 62, 1999-2040.

Liptser, R., and Shiryaev, A. (2001). Statistics of random processes. Springer-Verlag,
Berlin.

Manski, C. F. (1993). Identiﬁcation of social effects: reﬂection problem. Review of
Economic Studies: 60, 531-542.

McDonald, R., and Siegel, D. (1986). The value of waiting to invest. Quarterly Journal of
Economics: 101, 707-728.

Merton, R. C. (1980). On estimating the expected return on the market. Journal of Financial
Economics: 8, 323-361.

Munshi, K. (2004). Social learning in a heterogeneous population: social learning in the
Indian green revolution. Journal of Development Economics: 73, 185-213.

Nelson, A. W., and Amegbeto, K. (1998). Option values to conservation and agricultural
price policy: application to terrace construction in Kenya. American Journal of
Agricultural Economics: 80, 409-418.

Newbery, D., and Stiglitz, J. (1981). The theory of commodity price stabilization. Oxford:
Clarendon Press.

Olmstead, A. L., and Rhode, P. (1993). Induced innovation in American agriculture: a
reconsideration. Journal of Political Economy: 101, 100-118.

Roumasset, J. (1976). Rice and risk: decision making among low income farmers.
Amsterdam: North Holland.

Sunding, D., and Zilberman, D. (2000). Research and technology adoption in a changing
agricultural sector. Draft for the Handbook of Agricultural Economics.

Wan, X. (2000). The Chinese protection agriculture outlook and trend. Agricultural
Machinery: 2000, 4-6 (in Chinese).

Weinberger, K., and Lumpkin, T. (2005). Horticulture for poverty alleviation: the
unfunded revolution. AVRDC Working Paper 15.

Wooldridge, J. (2002). Econometric analysis of cross section and panel data. MIT Press,
Cambridge.

Xia, Y. (2001). Learning about predictability: the effects of parameter uncertainty on
dynamic asset allocation. Journal of Finance: 56, 205-246.

Zilberman, D., Sunding, D., Howitt, R., Dinar, A., and MacDougall, R. (1994). Water for
California agriculture: lessons from the drought and new water market reform. Choices:
4, 25-28.


Chapter 2: Partial Maximum Likelihood Estimation
of a Spatial Probit Model

2.1 Introduction

Most econometric techniques for cross-section data are based on the assumption of
independence of observations. However, economic activities have become increasingly
correlated over space as communication and transportation have improved. At the same
time, technological advances in communications and geographic information systems
(GIS) have made spatial data more widely available. Spatial correlation among
observations has received growing attention in regional, real estate, agricultural,
environmental, and industrial organization economics (Lee 2004).

Econometricians have paid increasing attention to spatial dependence over the last two
decades, and important advances have been made in both theoretical and empirical
studies^5. Spatial dependence means not only a lack of independence between
observations, but also a spatial structure underlying the correlations (Anselin and
Florax 1995). There are two ways to capture spatial dependence by imposing structure on
a model: one is in the domain of geostatistics, where the spatial index is continuous (Conley
1999); the other treats spatial sites as forming a countable lattice (Lee 2004). Among the
lattice models, there are two types of spatial dependence models, according to whether the
spatial correlation enters through the dependent variable or the error term: the spatial
autoregressive dependent variable model (SAR) and the spatial autoregressive error model
(SAE). In most applications of spatial models, the dependent variables are continuous
(Conley 1999; Lee 2004; Kelejian and Prucha 1999, 2001; among others), and only a few
applications address spatial dependence with discrete choice dependent variables
(exceptions include Case 1991; McMillen 1995; Pinkse and Slade 1998; Lesage 2000;
Beron and Vijverberg 2003). This paper addresses this gap, focusing on the SAE model
with discrete choice dependent variables.

5 Anselin, Florax and Rey (2004) wrote a comprehensive review of econometrics for spatial models.

As the name indicates, there are two aspects in the discrete choice model with spatial
dependence. First, the dependent variable is discrete and the leading cases occur where the
choice is binary. Probit and Logit are the two most popular non-linear models for binary
choice problems. For the sake of brevity, in this study we focus on the Probit model, but the
approach developed here generalizes to other discrete choice models.

In discrete choice models, if the observations are independent, we use maximum
likelihood estimation to get efficient estimators given the correct conditional distribution of
the dependent variable. A convenient property of the maximum likelihood estimator (MLE)
is that, in many situations (panel data or clustering), a pseudo MLE that ignores certain
dependence among observations is still consistent and asymptotically normal, although
inefficient (Poirier and Ruud 1988). However, the nonlinearity of the model causes
computational difficulties in estimation, and these become much worse when dependence
is present, because the likelihood then involves an n-dimensional integral.

Dependence is the other aspect of this problem. General forms of dependence are rarely
allowed for in cross-sectional data, although they are routinely allowed for in time-series
data (Conley 1999). For example, some scholars have discussed discrete choice models with
dependence in time-series data. Robinson (1982) relaxed Amemiya's (1973) assumption of
independence in the Tobit model and proved that the MLE with dependent observations is
strongly consistent and asymptotically normal under some regularity conditions. Poirier
and Ruud (1988) discussed the Probit model with dependence in time-series data and
developed generalized conditional moment (GCM) estimators, which are computationally
attractive and relatively more efficient.

However, dependence in space is more complicated than dependence in time, for
four reasons: first, time is one dimensional whereas space has at least two dimensions;
second, time has a natural order (direction) whereas space has no natural direction; third,
time is regularly divided because of regular astronomical phenomena whereas spatial
observations are attached to geographic properties of the surface of the earth; fourth,
time-series observations are draws from a continuous process whereas, with spatial data, it
is common for the sample and the population to be the same (Pinkse et al. 2007).

Therefore, how to deal with spatial dependence in estimation is the key question for spatial
econometricians. Inspired by work on dependence in time-series data, Conley (1999)
uses metrics of economic distance to characterize dependence among agents, and shows
that the GMM estimator is consistent and asymptotically normal under assumptions
similar to those used for time-series data. He also shows how to obtain a consistent
covariance matrix estimator by an approach similar to Newey-West (1987). Pinkse and
Slade (1998) use GMM in the discrete choice setting with the SAE model, and show that the
GMM estimator remains consistent and asymptotically normal under some regularity
conditions. Although Pinkse and Slade (1998) use generalized residuals from the MLE as
the basis of the GMM estimator, they do not take advantage of information from spatial
correlations among observations, and hence the GMM estimator is much less efficient than
the full ML estimator. Lee (2004) carefully examines the asymptotic properties of the MLE
and quasi-MLE for the linear spatial autoregressive model (SAR), and shows that the rate of
convergence of those estimators may depend on some general features of the spatial
weights matrix of the model. If each unit is influenced by only a few neighboring units,
the estimators may have a $\sqrt{n}$-rate of convergence and asymptotic normality; otherwise,
they may have a lower rate of convergence and could be inconsistent.

In this study, we capture spatial dependence by treating spatial sites as a countable
lattice, and explore a middle-ground approach that trades off efficiency against
computational burden. The idea is to divide spatially dependent observations into
many small groups (clusters) in which adjacent observations belong to one group. The
implicit rationale is that adjacent observations usually account for the most
important spatial correlations between observations. If we can correctly specify the
conditional joint distribution within groups, which allows us to utilize relatively more
information on spatial correlations, estimating the model by partial MLE will give us
consistent and more efficient estimators, which should generally be better than GMM
estimators. However, this approach is subject to biased variance-covariance matrix
estimators because of spatial correlations among groups. To deal with this problem, we
follow the methods proposed by Newey-West (1987) and Conley (1999) to get consistent
variance-covariance matrix estimators. Of course, this middle-ground approach will not
yield the most efficient estimator. However, since information from adjacent observations
usually captures the important spatial correlations in the whole sample, we get a consistent
and relatively efficient estimator, and we avoid some tedious computations at the expense
of a relatively small loss of efficiency.


This paper is organized as follows. First, we review econometric techniques on discrete
choice models. Second, the SAE model with discrete choice dependent variable is
presented and regularity conditions are speciﬁed. Section 3 presents the bivariate spatial
Probit model. In Section 4, we prove consistency and asymptotic normality of partial ML
estimators under regularity assumptions, and discuss how to get consistent covariance
matrix estimators. Section 5 presents a simulation study showing the advantages of our new
estimation procedure in this setting. Finally, Section 6 concludes. The proofs are collected

in Appendix 1, while the results for the simulation study are provided in Appendix 2.


2.2 Discrete Choice Models with Spatial Dependence

2.2.1 Probit Model without Dependence

We first review the standard Probit model without dependence. The underlying
linear latent variable model is

$$Y_i^* = X_i\beta + \varepsilon_i, \qquad (1)$$

where $Y_i^*$ is the scalar latent dependent variable, $X_i$ is a $1 \times K$ vector of regressors,
$\beta$ is a $K \times 1$ parameter vector to be estimated, and $\varepsilon_i$ is a continuous random
variable, independent of $X_i$, that follows a standard normal distribution. However, we
cannot observe $Y_i^*$; we can only observe the indicator $Y_i$, which is related to $Y_i^*$ as follows:

$$Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0, \\ 0 & \text{if } Y_i^* \le 0. \end{cases} \qquad (2)$$

Therefore, the conditional distribution of $Y_i$ given $X_i$ is

$$P(Y_i = 1 \mid X_i) = P(Y_i^* > 0 \mid X_i) = P(\varepsilon_i > -X_i\beta \mid X_i) = \Phi(X_i\beta), \qquad (3)$$

where $\Phi$ denotes the standard normal cumulative distribution function (cdf). It follows that

$$P(Y_i = 0 \mid X_i) = 1 - \Phi(X_i\beta). \qquad (4)$$

Since $Y_i$ is a Bernoulli random variable, we can write the conditional density function of
$Y_i$ given $X_i$ as

$$f(Y_i \mid X_i) = [\Phi(X_i\beta)]^{Y_i}[1 - \Phi(X_i\beta)]^{1 - Y_i}, \quad Y_i = 0, 1. \qquad (5)$$

Also, given the independence assumption, the log likelihood function can be written as

$$\mathrm{Log}(L) = \sum_{i=1}^{n} \left\{ Y_i \log[\Phi(X_i\beta)] + (1 - Y_i)\log[1 - \Phi(X_i\beta)] \right\}, \qquad (6)$$

and the sufficient condition for uniqueness of the global maximum of Log(L) is that the
function is strictly concave (Gourieroux 2000). We can then solve for $\hat{\beta}$ from the first
order condition

$$\frac{\partial \mathrm{Log}(L)}{\partial \beta} = \sum_{i=1}^{n} \frac{Y_i - \Phi(X_i\beta)}{\Phi(X_i\beta)[1 - \Phi(X_i\beta)]}\,\phi(X_i\beta)\,X_i' = 0, \qquad (7)$$

where $\phi$ is the probability density function (pdf) of the standard normal distribution.

However, simple closed-form expressions for the MLE are not available because the
cdf of the normal distribution has no closed-form solution. So the MLE must be computed
using numerical algorithms^6. In general, we can prove that the conditional MLE is
consistent and the most efficient estimator given some regularity conditions^7, such as
a correctly specified parametric model, an identified $\beta$, and a log-likelihood function that is
continuous in $\beta$.
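
As an illustration of how equations (6) and (7) are used in practice, the following sketch maximizes the Probit log likelihood by Fisher scoring, a Newton-type iteration that replaces the Hessian with the expected information. It is a minimal example on simulated data: the function names, the sample size, and the parameter values are assumptions made for the illustration.

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(beta, X, y):
    """Log likelihood (6): sum of y*log(Phi) + (1 - y)*log(1 - Phi)."""
    xb = X @ beta
    # norm.logcdf(-xb) equals log(1 - Phi(xb)) and is numerically stable.
    return np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

def probit_score(beta, X, y):
    """Score (7): sum over i of (y - Phi) * phi / (Phi * (1 - Phi)) * x_i."""
    xb = X @ beta
    Phi, phi = norm.cdf(xb), norm.pdf(xb)
    weight = (y - Phi) * phi / (Phi * (1 - Phi))
    return X.T @ weight

def probit_mle(X, y, tol=1e-8, max_iter=100):
    """Fisher scoring: iterate beta <- beta + information^{-1} * score."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        xb = X @ beta
        Phi, phi = norm.cdf(xb), norm.pdf(xb)
        info_weights = phi**2 / (Phi * (1 - Phi))
        information = (X * info_weights[:, None]).T @ X
        step = np.linalg.solve(information, probit_score(beta, X, y))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data under the latent variable model (1)-(2); values are hypothetical.
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

beta_hat = probit_mle(X, y)
print("estimate:", beta_hat, "log likelihood:", probit_loglik(beta_hat, X, y))
```

At the returned estimate the score in (7) is numerically zero, which is the first order condition described above.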

6 Commonly used numerical solutions are all derived from Newton's method (see Gourieroux 2000 for details).

7 See details in Wooldridge (2001, page 391).

2.2.2 A Probit Model with Spatial Error Correlation

Consider the Probit model with spatial error correlation (SAE), where the underlying
linear latent variable model is

$$Y_i^* = X_i\beta + \varepsilon_i, \qquad (8)$$

$$\varepsilon_i = \lambda \sum_{j=1}^{n} W_{ij}\varepsilon_j + u_i, \qquad (9)$$

where $W_{ij}$ is an element of the spatial weights matrix $W$, which can be defined by different
spatial distances such as the Euclidean distance, $\lambda$ is the spatial autoregressive error
coefficient, and $u_i \sim$ i.i.d. $N(0, 1)$. We can write equations (8) and (9) in matrix form as
follows:

$$Y^* = X\beta + \varepsilon, \qquad (10)$$

$$\varepsilon = (I - \lambda W)^{-1}u, \qquad (11)$$

so that the variance-covariance matrix for the model is

$$\Omega \equiv \mathrm{Var}(\varepsilon \mid X) = [(I - \lambda W)'(I - \lambda W)]^{-1}. \qquad (12)$$

If $Y^*$ were observable, equation (10) would be a linear model, and we could use the
Jacobian transformation of $u$ into $Y^*$ and write the log likelihood function as

$$L(\beta, \lambda) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}(Y^* - X\beta)'A'A(Y^* - X\beta) + \ln|A|, \qquad (13)$$

where $A = I - \lambda W$, and then the estimate of $\beta$ can be solved as
$\hat{\beta} = (X'A'AX)^{-1}X'A'AY^*$.

However, in practice we cannot observe $Y_i^*$; we can only observe $Y_i$, which implies
a non-linear Probit model because of the normal distributional assumption. Moreover, the
errors are correlated, and the full likelihood function becomes

$$L = P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n) = \int_{-\infty}^{a_1}\cdots\int_{-\infty}^{a_n}\phi(u)\,du, \qquad (14)$$

$$\phi(u) = (2\pi)^{-n/2}\,|\Omega|^{-1/2}\exp\left(-\tfrac{1}{2}u'\Omega^{-1}u\right). \qquad (15)$$

Theoretically, taking the first derivatives with respect to $\beta$ and the spatial
coefficient $\lambda$, we obtain

$$\frac{\partial L}{\partial \beta} = \frac{\partial}{\partial \beta}\left\{\int_{-\infty}^{a_1}\cdots\int_{-\infty}^{a_n}(2\pi)^{-n/2}\,|(I - \lambda W)'(I - \lambda W)|^{1/2}\exp\left(-\tfrac{1}{2}u'(I - \lambda W)'(I - \lambda W)u\right)du\right\} = 0, \qquad (16)$$

$$\frac{\partial L}{\partial \lambda} = \frac{\partial}{\partial \lambda}\left\{\int_{-\infty}^{a_1}\cdots\int_{-\infty}^{a_n}(2\pi)^{-n/2}\,|(I - \lambda W)'(I - \lambda W)|^{1/2}\exp\left(-\tfrac{1}{2}u'(I - \lambda W)'(I - \lambda W)u\right)du\right\} = 0. \qquad (17)$$

The expressions for the first derivatives are quite complicated, but if we have sufficient
computational ability and $\beta$ and $\lambda$ are identifiable, we can get consistent and efficient
estimates of $\beta$ and $\lambda$ by using numerical methods. However, in practice, this would be a
formidable computational task even for a moderately sized sample. We propose a more
attractive procedure in the next sections.
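
To make the structure in equations (11) and (12) concrete, the sketch below builds the covariance matrix for a small hypothetical weights matrix and draws binary outcomes from the SAE Probit model. The weights (units on a line, row-normalised), the value of the spatial coefficient, and the function names are assumptions made for the illustration only.

```python
import numpy as np

def sae_covariance(W, lam):
    """Omega = [(I - lam W)'(I - lam W)]^{-1}, as in equation (12)."""
    A = np.eye(W.shape[0]) - lam * W
    return np.linalg.inv(A.T @ A)

def simulate_sae_probit(X, beta, W, lam, rng):
    """Draw binary outcomes with errors eps = (I - lam W)^{-1} u, equation (11)."""
    n = W.shape[0]
    u = rng.normal(size=n)
    eps = np.linalg.solve(np.eye(n) - lam * W, u)
    return (X @ beta + eps > 0).astype(int)

# Hypothetical weights: units on a line, each linked to its immediate
# neighbours, with rows normalised to sum to one.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)

lam = 0.4
Omega = sae_covariance(W, lam)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = simulate_sae_probit(X, np.array([0.2, 1.0]), W, lam, rng)
print(np.round(Omega, 3))
print(y)
```

The diagonal of Omega is not constant across units, so the implied marginal models are heteroskedastic, and the off-diagonal terms are what make the full likelihood in (14) an n-dimensional integral.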

2.2.3 Probit Models with Other Forms of Spatial Correlation

Generally, there is no reason to think that spatial correlation is properly modeled by (9).
Other forms are possible. For example, one might assume that, outside of a certain
geographic radius from a given observation $i$, $\varepsilon_i$ is uncorrelated with shocks to the
outlying regions. So, for example, we might assume a constant correlation with any unit
within a given radius, similar to a random effects structure for unbalanced panel data.

Alternatively, we may prefer more of a moving average structure, such as

$$\varepsilon_i = u_i + \lambda \sum_{h \ne i} W_{ih}u_h, \qquad (18)$$

where the $u_i$ are i.i.d. with unit variance. This formulation is attractive because it is
relatively easy to find variances and pairwise correlations, which we will use in the partial
MLEs described in the next section. For example,

$$\mathrm{Var}(\varepsilon_i \mid W) = 1 + \lambda^2 \sum_{h \ne i} W_{ih}^2. \qquad (19)$$

Clearly, methods that use only the variance in estimation can only identify $\lambda^2$ (but we
almost always think $\lambda > 0$, anyway). Pairwise covariances can also be obtained:

$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid W) = \lambda W_{ij} + \lambda W_{ji} + \lambda^2 \sum_{h \ne i, j} W_{ih}W_{jh}. \qquad (20)$$

Expressions like this for the covariance between different errors are important for
applying grouped partial MLE methods.
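
The variance and covariance expressions in (19) and (20) can be checked numerically against the matrix form of the moving average model, in which the error vector is (I + λW)u and its covariance matrix is (I + λW)(I + λW)'. The sketch below is a small verification on a random weights matrix with a zero diagonal; the matrix, the value of λ, and the function names are hypothetical.

```python
import numpy as np

def ma_variance(W, lam, i):
    """Equation (19): 1 + lam^2 * sum over h != i of W_ih^2."""
    mask = np.arange(W.shape[0]) != i
    return 1.0 + lam**2 * np.sum(W[i, mask] ** 2)

def ma_covariance(W, lam, i, j):
    """Equation (20): lam*W_ij + lam*W_ji + lam^2 * sum over h != i,j of W_ih*W_jh."""
    idx = np.arange(W.shape[0])
    mask = (idx != i) & (idx != j)
    cross = np.sum(W[i, mask] * W[j, mask])
    return lam * W[i, j] + lam * W[j, i] + lam**2 * cross

rng = np.random.default_rng(3)
n = 5
W = rng.uniform(size=(n, n))
np.fill_diagonal(W, 0.0)   # no self-weights, matching the sums over h != i
lam = 0.3

# Matrix form: eps = (I + lam W) u with Var(u) = I.
B = np.eye(n) + lam * W
Omega = B @ B.T

for i in range(n):
    assert np.isclose(ma_variance(W, lam, i), Omega[i, i])
    for j in range(n):
        if i != j:
            assert np.isclose(ma_covariance(W, lam, i, j), Omega[i, j])
```

Because the elementwise formulas agree with the matrix product, either form can be used to supply the variances and covariances needed by the grouped estimators.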


2.3 Using Partial MLEs to Estimate General Spatial Probit Models

Estimating a Probit spatial autocorrelation model by full MLE is a prodigious task,
although several approaches have been applied: the EM algorithm (McMillen
1992), the RIS simulator (Beron and Vijverberg 2003), and the Bayesian Gibbs sampler
(Lesage 2000). But each of these approaches is still computationally burdensome.
Combining such approaches with simulation studies, or quickly estimating a range
of models, is beyond current computational capabilities even for
moderate sample sizes.

To get an estimator that is computationally feasible, Pinkse and Slade (1998) proposed
using the generalized method of moments (GMM) with information on the marginal
distributions of the binary responses. In particular, the generalized residuals from the
marginal probit log likelihood are used to construct moment conditions for the GMM
method. Pinkse and Slade show that, under conditions very similar to those in this paper,
the GMM estimator is consistent and asymptotically normal. A consistent
variance-covariance matrix can also be obtained theoretically without a covariance
stationarity assumption, although Pinkse and Slade (1998) do not discuss estimation of the
asymptotic variance. Therefore, the GMM estimator is close to being practically useful, but
it is fundamentally based on the marginal probit models. Thus, while a GMM estimator can
be obtained that is efficient given the information on the marginal likelihood, the method
throws out much useful information. We describe a simplified version of this approach in
section 2.3.1, which, in effect, uses a heteroskedastic probit model to estimate the $\beta_j$
along with any spatial autocorrelation parameter.

Using only the marginal distribution of $Y_i$, conditional on the covariates and weights,
likely results in a serious loss of information for estimating both $\beta$ and the spatial
autocorrelation parameters. Our key contribution in this paper is to explore the use of
partial maximum likelihood where we group small numbers of nearby observations and
obtain the joint distribution of those observations. Naturally, these distributions are
determined by the fully specified spatial autocorrelation model, just as we must obtain the
implied variance to apply marginal probit methods. Once the covariances between
observations are found as a function of the weights and $\lambda$, we can use that information in
multivariate probit estimation. Section 2.3.2 describes a
bivariate probit approach, with heteroskedasticity and covariance implied by the particular
spatial autocorrelation model. Using a single covariance in addition to the variance seems
likely to improve efficiency of estimation.

2.3.1 Univariate Probit Partial MLE

One way to estimate the coefficients $\beta$ along with spatial correlation parameters is to
derive the marginal distributions, $P(Y_i = 1 \mid X, W)$, as a function of all of the weights (and
the parameters, $\beta$ and $\lambda$, of course). Under the joint normality assumption, the model will
be a form of probit with heteroskedasticity. In particular, given any spatial probit model
such that the variances are well defined, we can find

$$P(Y_i = 1 \mid X, W) = \Phi(X_i\beta/\sigma_i(\lambda)), \qquad (21)$$

where $\sigma_i^2(\lambda) = \mathrm{Var}(\varepsilon_i \mid X, W) = \mathrm{Var}(\varepsilon_i \mid W)$ is a function of all weights, $W$, and the
spatial correlation parameters $\lambda$. As is well known in time series contexts (for example,
Poirier and Ruud (1988) or Robinson (1982)), using probit while ignoring the time series
correlation leads to consistent estimation under standard regularity conditions, provided the
data are weakly dependent. Thus, it is not surprising that pooled probit that accounts for the
heteroskedasticity in the marginal distribution is generally consistent for spatially
correlated data, too, provided, of course, that we limit the amount of spatial correlation.

The log likelihood can be written generically as

$$\mathrm{Log}(L) = \sum_{i=1}^{n} \left\{ Y_i \log[\Phi(X_i\beta/\sigma_i(\lambda))] + (1 - Y_i)\log[1 - \Phi(X_i\beta/\sigma_i(\lambda))] \right\}. \qquad (22)$$

Assuming that $\beta$ and $\lambda$ are identified, and that the conditions in Section 4 hold, the
pooled heteroskedastic probit is generally consistent and $\sqrt{n}$-asymptotically normal. But,
for reasons we discussed above, it is likely to be very inefficient relative to the full MLE.
Further, estimators that use some information on the spatial correlation across observations
seem more promising in terms of increasing precision.
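
The pooled heteroskedastic probit in (22) can be sketched as follows, taking the moving average error structure of section 2.2.3 as the source of the standard deviations. This is an illustration under assumed inputs rather than the procedure used in our simulation study: the ring-shaped weights matrix with unequal link strengths, the parameter values, the starting point, and the function names are all hypothetical, and the BFGS optimizer from SciPy is simply one convenient choice.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def sigma_ma(W, lam):
    """sigma_i(lam) under moving-average errors: square root of equation (19).

    Assumes W has a zero diagonal, so the row sums run over h != i.
    """
    return np.sqrt(1.0 + lam**2 * np.sum(W**2, axis=1))

def het_probit_loglik(params, X, y, W):
    """Pooled log likelihood (22); params stacks beta followed by lam."""
    beta, lam = params[:-1], params[-1]
    index = (X @ beta) / sigma_ma(W, lam)
    return np.sum(y * norm.logcdf(index) + (1 - y) * norm.logcdf(-index))

# Simulated data: a hypothetical ring of n units where unit i is linked to
# unit i+1 with a link strength that varies across units, so that the
# variances differ across observations.
rng = np.random.default_rng(4)
n, lam_true, beta_true = 400, 0.8, np.array([0.3, 1.0])
W = np.zeros((n, n))
link = rng.uniform(0.0, 2.0, size=n)
for i in range(n):
    W[i, (i + 1) % n] = link[i]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = (np.eye(n) + lam_true * W) @ rng.normal(size=n)
y = (X @ beta_true + eps > 0).astype(float)

start = np.array([0.0, 0.5, 0.5])
result = minimize(lambda p: -het_probit_loglik(p, X, y, W), start, method="BFGS")
beta_hat, lam_hat = result.x[:-1], result.x[-1]
print("beta_hat:", beta_hat, "lam_hat:", lam_hat)
```

Since the likelihood depends on λ only through its square, the sign of the estimated λ is not informative here, and with only heteroskedasticity to rely on, its estimate can be imprecise in a sample of this size.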

2.3.2 Bivariate Probit Partial MLE
We now turn to using information on pairs of nearby observations to identify $\beta$ and $\lambda$.

There is nothing special about using pairs; we could use, say, triplets, or even larger groups.
But the bivariate case is easy to illustrate and is computationally quite feasible.

For illustration, assume a sample includes 2n observations, and we divide the 2n
observations into n pairwise groups according to the spatial Euclidean distance between
them (Figure 2.1). In other words, each group includes two observations, with the idea
being that the internal correlation between the two observations is more important than

external correlations with observations in other groups. Within a group, the two

58

 

observations follow a conditional bivariate normal distribution because error terms are

assumed to have a joint normal distribution.
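
The text does not spell out the algorithm used to form the pairs, so the following Python sketch shows just one possible rule, a greedy nearest-pair matching. It is an assumption made for illustration, not the procedure used in this chapter; the function name is hypothetical.

```python
import math


def greedy_nearest_pairs(coords):
    """Pair 2n locations by repeatedly matching the two closest unpaired points.

    coords : list of (x, y) locations, of even length 2n
    Returns a list of n index pairs (i, j) with i < j.
    """
    if len(coords) % 2 != 0:
        raise ValueError("need an even number of observations")
    # All pairwise Euclidean distances; O(n^2) memory, fine for small samples.
    candidates = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            candidates.append((math.dist(coords[i], coords[j]), i, j))
    candidates.sort()
    used, pairs = set(), []
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs
```

A greedy rule does not minimize the total within-pair distance; an optimal matching algorithm could be swapped in if that matters for the application.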

Figure 2.1 n pairwise groups of 2n observations based on Euclidean distance

 

 

 

 

 

 

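In group g, we have

\[ Y^*_{g1} = \beta_1 X_{g11} + \beta_2 X_{g12} + \dots + \beta_K X_{g1K} + \varepsilon_{g1}, \tag{23} \]
\[ Y^*_{g2} = \beta_1 X_{g21} + \beta_2 X_{g22} + \dots + \beta_K X_{g2K} + \varepsilon_{g2}, \qquad g = 1, 2, \dots, n. \tag{24} \]

Rewrite the above equations in matrix form as

\[ Y^*_{g1} = X_{g1}\beta + \varepsilon_{g1}, \tag{25} \]
\[ Y^*_{g2} = X_{g2}\beta + \varepsilon_{g2}, \qquad g = 1, 2, \dots, n, \tag{26} \]

where $X_{g1}$ and $X_{g2}$ are $1 \times K$ vectors of regressors and $\beta$ is a $K \times 1$ vector; $\varepsilon_{g1}$ and $\varepsilon_{g2}$
are scalars. In group g, observation A and observation B are not only correlated with each
other, but also correlated with other observations over space. Therefore, the variances and
covariance of $\varepsilon_{g1}$ and $\varepsilon_{g2}$ depend not only on the weight within the group, but also on the
weights with other observations outside the group, and of course on the parameters, $\lambda$. See, for
example, equation (20).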
It is easy to see that $E(\varepsilon_{g1} \mid X_g, W) = E(\varepsilon_{g2} \mid X_g, W) = 0$, and the
covariance-variance matrix for group g is defined as $\Omega_g \equiv \mathrm{Var}(\varepsilon_g \mid X_g, W)$, where

\[ \mathrm{Var}(\varepsilon_g \mid X_g, W) = \Omega_g(W, \lambda) = \begin{pmatrix} \Omega_{g11} & \Omega_{g12} \\ \Omega_{g21} & \Omega_{g22} \end{pmatrix}, \tag{27} \]

where we suppress the dependence on $W$ and $\lambda$ in what follows for notational simplicity.
Note here that the elements of $\Omega_g$ depend not only on the weight between the two observations in
group g, but also on the weights for every observation in the whole sample, because the two
observations in group g are correlated not only with each other but also with other
observations over space. Since we define two nearby observations as one group, we pick up
the corresponding part ($\Omega_g$) from the whole covariance-variance matrix (equation 20).

Since we cannot observe $Y^*_{g1}$ and $Y^*_{g2}$, as we discussed in the univariate Probit model,
we define

\[ Y_{gi} = \begin{cases} 1 & \text{if } Y^*_{gi} > 0, \\ 0 & \text{if } Y^*_{gi} \le 0, \end{cases} \qquad i = 1, 2. \tag{28} \]

Therefore the conditional bivariate normal distribution of $Y_{g1}$ and $Y_{g2}$ given $X_g$ is given
as

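\[ P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = P(X_{g1}\beta + \varepsilon_{g1} > 0,\; X_{g2}\beta + \varepsilon_{g2} > 0 \mid X_g) \tag{29} \]
\[ = P(\varepsilon_{g1} < X_{g1}\beta,\; \varepsilon_{g2} < X_{g2}\beta \mid X_g) = \Phi_2\left( \frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g \right), \tag{30} \]
\[ \rho_g = \frac{\mathrm{Cov}(\varepsilon_{g1}, \varepsilon_{g2})}{\sqrt{\mathrm{Var}(\varepsilon_{g1})\mathrm{Var}(\varepsilon_{g2})}} = \frac{\Omega_{g12}}{\sqrt{\Omega_{g11}\Omega_{g22}}}, \tag{31} \]

where $\Phi_2$ is the standard bivariate normal distribution function, $\phi_2$ is the standard density
function of the bivariate normal distribution and $\rho_g$ is the standardized covariance between
the two error terms. The second equality in (30) uses the symmetry of the zero-mean normal
distribution.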

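Given that $(\varepsilon_{g1}, \varepsilon_{g2})$ has a joint normal distribution, we can write

\[ \varepsilon_{g1} = \delta_{g1}\varepsilon_{g2} + e_{g1}, \tag{32} \]

where

\[ \delta_{g1} = \frac{\mathrm{Cov}(\varepsilon_{g1}, \varepsilon_{g2})}{\mathrm{Var}(\varepsilon_{g2})}, \tag{33} \]

and $e_{g1}$ is independent of $X_g$ and $\varepsilon_{g2}$.
Because of the joint normality of $(\varepsilon_{g1}, \varepsilon_{g2})$, $e_{g1}$ is also normally distributed with
$E(e_{g1}) = 0$, and

\[ \mathrm{Var}(e_{g1}) = \mathrm{Var}(\varepsilon_{g1}) - \delta_{g1}^2 \mathrm{Var}(\varepsilon_{g2}). \tag{34} \]

Thus, we can write the conditional distribution of $e_{g1}$ as

\[ (e_{g1} \mid X_g, \varepsilon_{g2}) \sim N(0, \mathrm{Var}(e_{g1})). \tag{35} \]

Substitute equation (32) back into $Y^*_{g1} = X_{g1}\beta + \varepsilon_{g1}$, and we get

\[ Y^*_{g1} = X_{g1}\beta + \delta_{g1}\varepsilon_{g2} + e_{g1}. \tag{36} \]

Therefore

\[ P(Y_{g1} = 1 \mid X_g, \varepsilon_{g2}) = \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right). \tag{37} \]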
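The reason we want to find (37) is to retrieve $P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)$. Since

\[ P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = P(Y_{g1} = 1 \mid Y_{g2} = 1, X_g) \times P(Y_{g2} = 1 \mid X_g), \tag{38} \]

it is easy to see that $P(Y_{g2} = 1 \mid X_g) = \Phi\left( X_{g2}\beta / \sqrt{\mathrm{Var}(\varepsilon_{g2})} \right)$, and thus it remains to get
$P(Y_{g1} = 1 \mid Y_{g2} = 1, X_g)$.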

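First, since $Y_{g2} = 1$ if and only if $\varepsilon_{g2} > -X_{g2}\beta$, and $\varepsilon_{g2}$ follows a normal distribution
and is independent of $X_g$, the density of $\varepsilon_{g2}$ given $\varepsilon_{g2} > -X_{g2}\beta$ is

\[ \frac{\phi\left( \varepsilon_{g2} / \sqrt{\mathrm{Var}(\varepsilon_{g2})} \right)}{P(\varepsilon_{g2} > -X_{g2}\beta)} = \frac{\phi\left( \varepsilon_{g2} / \sqrt{\mathrm{Var}(\varepsilon_{g2})} \right)}{\Phi\left( X_{g2}\beta / \sqrt{\mathrm{Var}(\varepsilon_{g2})} \right)}. \tag{39} \]

Therefore,

\[ P(Y_{g1} = 1 \mid Y_{g2} = 1, X_g) = E\left[ P(Y_{g1} = 1 \mid X_g, \varepsilon_{g2}) \mid Y_{g2} = 1, X_g \right] \tag{40} \]
\[ = E\left[ \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \,\Big|\, Y_{g2} = 1, X_g \right] \tag{41} \]
\[ = \frac{1}{\Phi\left( X_{g2}\beta / \sqrt{\mathrm{Var}(\varepsilon_{g2})} \right)} \int_{-X_{g2}\beta}^{\infty} \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \phi\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}, \tag{42} \]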

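and it is easy to see that $P(Y_{g1} = 0 \mid Y_{g2} = 1, X_g) = 1 - P(Y_{g1} = 1 \mid Y_{g2} = 1, X_g)$ because $Y_{g1}$ is
a binary variable.
Similarly, we can get

\[ P(Y_{g1} = 1 \mid Y_{g2} = 0, X_g) = \frac{1}{1 - \Phi\left( X_{g2}\beta / \sqrt{\mathrm{Var}(\varepsilon_{g2})} \right)} \int_{-\infty}^{-X_{g2}\beta} \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \phi\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2} \tag{43} \]

and $P(Y_{g1} = 0 \mid Y_{g2} = 0, X_g) = 1 - P(Y_{g1} = 1 \mid Y_{g2} = 0, X_g)$.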
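Now we are ready to get $P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)$ as follows

\[ P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = \frac{1}{\Phi\left( \frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right)} \int_{-X_{g2}\beta}^{\infty} \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \phi\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2} \times \Phi\left( \frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) \tag{44} \]
\[ = \int_{-X_{g2}\beta}^{\infty} \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \phi\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}, \tag{45} \]

and similarly we finally obtain

\[ P(Y_{g1} = 0, Y_{g2} = 1 \mid X_g) = \Phi\left( \frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) - \int_{-X_{g2}\beta}^{\infty} \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \phi\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}, \tag{46} \]
\[ P(Y_{g1} = 1, Y_{g2} = 0 \mid X_g) = \int_{-\infty}^{-X_{g2}\beta} \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \phi\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}, \tag{47} \]
\[ P(Y_{g1} = 0, Y_{g2} = 0 \mid X_g) = \left[ 1 - \Phi\left( \frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) \right] - \int_{-\infty}^{-X_{g2}\beta} \Phi\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \phi\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}. \tag{48} \]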
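
As a numerical sanity check on (45)-(48), the following minimal Python sketch evaluates the four cell probabilities by one-dimensional Simpson integration. The function names, the truncation of the integration range at ten standard deviations, and the number of Simpson steps are illustrative assumptions, not the code used in this chapter. One further assumption is made explicit in the comments: the integrand uses the full $N(0, \mathrm{Var}(\varepsilon_{g2}))$ density, that is $\phi(\varepsilon_{g2}/\sigma_2)/\sigma_2$, so that the integrals are proper probabilities when $\mathrm{Var}(\varepsilon_{g2}) \neq 1$.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def pair_probs(xb1, xb2, var1, var2, cov12, tail=10.0):
    """Cell probabilities P(Y_g1 = a, Y_g2 = b | X_g) for one group.

    xb1, xb2   : the indexes X_g1*beta and X_g2*beta
    var1, var2 : Var(eps_g1), Var(eps_g2); cov12 : Cov(eps_g1, eps_g2)
    """
    s2 = math.sqrt(var2)
    delta = cov12 / var2                 # equation (33)
    var_e = var1 - delta ** 2 * var2     # equation (34)
    if var_e <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    se = math.sqrt(var_e)

    def integrand(eps2):
        # Phi(.) from (37) times the N(0, var2) density of eps_g2
        # (assumption: the 1/s2 scale factor is included explicitly).
        return norm_cdf((xb1 + delta * eps2) / se) * norm_pdf(eps2 / s2) / s2

    hi = max(-xb2, 0.0) + tail * s2      # stand-in for +infinity
    lo = min(-xb2, 0.0) - tail * s2      # stand-in for -infinity
    p11 = _simpson(integrand, -xb2, hi)  # equation (45)
    p10 = _simpson(integrand, lo, -xb2)  # equation (47)
    p2 = norm_cdf(xb2 / s2)              # P(Y_g2 = 1 | X_g)
    return {(1, 1): p11, (1, 0): p10,
            (0, 1): p2 - p11,            # equation (46)
            (0, 0): (1.0 - p2) - p10}    # equation (48)
```

With zero indexes, unit variances and correlation 0.5, the (1, 1) cell should match the known orthant probability $1/4 + \arcsin(0.5)/(2\pi) = 1/3$, which gives a quick way to check the integration settings.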

2.4 Partial Maximum Likelihood Estimation

As we discussed in the introduction, if the observations are independent, we can
simplify the multivariate distribution into the product of univariate distributions, and then
the ML estimator can be obtained easily. However, spatial correlations among observations
no longer allow this simplification. Under spatial correlation, the situation is
similar to the panel data case. In panel data, we cannot assume independence among
observations over different periods for the same person (or firm), which means we are not
likely to specify the full conditional density of Y given X correctly. Therefore, we need to
relax the assumption in the panel data case. The way we deal with the problem is that if we
have a correctly specified model for the density of $y_t$ given $x_t$, we can define the partial
log likelihood function as

\[ \max_{\theta \in \Theta} \sum_{i=1}^{N} \sum_{t=1}^{T} \log f_t(y_{it} \mid x_{it}; \theta), \tag{49} \]

where $f_t(y_{it} \mid x_{it}; \theta)$ is the density for $y_{it}$ given $x_{it}$ for each t. The partial log likelihood
function works because $\theta_0$ (the true value) maximizes the expected value of the above
equation provided we have the densities $f_t(y_{it} \mid x_{it}; \theta)$ correctly specified (Wooldridge
2002).

We can apply a similar idea to the spatial Probit model: if we have the bivariate normal
densities $\phi_{2g}(Y_{g1}, Y_{g2} \mid X_g; \theta)$ correctly specified for each group, we could get a
consistent estimator by partial ML. However, there are several differences between panel
data and spatially dependent data. First, the panel data model assumes that the cross section
dimension (N) is sufficiently large relative to the time dimension (T), but in spatial data we
do not have this assumption. Second, in the panel data model, we view the cross section
observations as independent, while in the spatial data model, even though we divide the
sample into n groups, we are definitely not assuming independence among
groups. Observations in different groups are still correlated, but the correlations are
assumed to decay as the distance between them increases. Third, as we discussed before,
dependence in space is more complicated than dependence in time, and we need to assume
that the correlations between groups die out quickly enough as distance increases.
In short, we need to examine carefully how the weak law of large numbers (WLLN) and
central limit theorem (CLT) can be applied in the spatially dependent case. We will discuss
these issues and provide proofs in the following sections.
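First, we can write the partial log likelihood function as

\[ L = \sum_{g=1}^{n} \Big\{ Y_{g1}Y_{g2} \log P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) + Y_{g1}(1 - Y_{g2}) \log P(Y_{g1} = 1, Y_{g2} = 0 \mid X_g) \]
\[ \qquad + (1 - Y_{g1})Y_{g2} \log P(Y_{g1} = 0, Y_{g2} = 1 \mid X_g) + (1 - Y_{g1})(1 - Y_{g2}) \log P(Y_{g1} = 0, Y_{g2} = 0 \mid X_g) \Big\}, \tag{50} \]

and for the sake of brevity, we define

\[ P_g(1,1) \equiv \log P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g); \quad P_g(1,0) \equiv \log P(Y_{g1} = 1, Y_{g2} = 0 \mid X_g); \tag{51} \]
\[ P_g(0,1) \equiv \log P(Y_{g1} = 0, Y_{g2} = 1 \mid X_g); \quad P_g(0,0) \equiv \log P(Y_{g1} = 0, Y_{g2} = 0 \mid X_g). \tag{52} \]

Therefore, we can rewrite the partial log likelihood function as

\[ L = \sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2}P_g(1,1) + Y_{g1}(1 - Y_{g2})P_g(1,0) + (1 - Y_{g1})Y_{g2}P_g(0,1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0,0) \right\}. \tag{53} \]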
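
The structure of (53) can be sketched in a few lines of Python. This is an illustration only: the function name and the data layout (one dictionary of cell probabilities per group) are assumptions, and the cell probabilities are taken as given rather than computed from the model.

```python
import math


def partial_loglik(outcomes, cell_probs):
    """Partial log likelihood (53) summed over groups.

    outcomes   : list of (y1, y2) pairs with entries in {0, 1}
    cell_probs : list of dicts mapping (a, b) to P(Y_g1 = a, Y_g2 = b | X_g)
    """
    total = 0.0
    for (y1, y2), probs in zip(outcomes, cell_probs):
        # Exactly one of the four indicator products equals 1 for each group.
        total += (y1 * y2 * math.log(probs[(1, 1)])
                  + y1 * (1 - y2) * math.log(probs[(1, 0)])
                  + (1 - y1) * y2 * math.log(probs[(0, 1)])
                  + (1 - y1) * (1 - y2) * math.log(probs[(0, 0)]))
    return total
```

Every cell probability must be strictly positive for the logarithms to be defined, which is the role that a boundedness condition on the inverse probabilities plays in the asymptotic theory.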

2.4.1 Consistency of Bivariate Probit Estimation

Consistent estimators $\hat\theta \equiv (\hat\beta, \hat\lambda)'$ are the ones that converge in probability to the true
value $\theta_0 \equiv (\beta_0, \lambda_0)'$, i.e. $\hat\theta \xrightarrow{p} \theta_0$, as the sample size goes to infinity, for all possible
true values. In this section, to make the asymptotic arguments formal, we distinguish
between the true value, $\theta_0$, and a generic parameter value $\theta$.

In the bivariate probit estimation, the estimator $\hat\theta$ is defined as follows: $\hat\theta$ maximizes $Q_n(\theta)$
subject to $\theta \in \Theta$, where $\Theta$ is the parameter set. The objective function $Q_n(\theta)$ is
defined as

\[ Q_n(\theta) \equiv \frac{1}{n} \sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2}P_g(1,1) + Y_{g1}(1 - Y_{g2})P_g(1,0) + (1 - Y_{g1})Y_{g2}P_g(0,1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0,0) \right\}, \tag{54} \]

i.e.,

\[ \hat\theta = \arg\max_{\theta \in \Theta} Q_n(\theta). \tag{55} \]

Remember that this objective function represents a partial log likelihood, not a full log
likelihood: we are only using information on the conditional distribution
$D(Y_{g1}, Y_{g2} \mid X, W)$ across the groups g. We are not using $D(Y_1, Y_2, \dots, Y_n \mid X, W)$ as in a
full maximum likelihood setting.

The identification condition is that $Q(\theta)$ is uniquely maximized at the true value $\theta_0$,
where $Q(\theta)$ is defined as

\[ Q(\theta) \equiv \lim_{n \to \infty} E[Q_n(\theta)]. \tag{56} \]

This condition typically holds for well-specified models when there is not perfect
collinearity among the regressors. Further, one needs to be a little careful in parameterizing
the spatial autocorrelation, but standard models of spatial autocorrelation cause no
problems.

The following Theorem 1 states the main consistency result. We define
$S_n(\theta) \equiv \partial Q_n(\theta)/\partial\theta$ and $\lim_{n \to \infty} E[S_n(\theta)] = S(\theta)$.

 

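THEOREM 1. If (i) $\theta_0$ is in the interior of a compact set $\Theta$, which is the closure of a
concave set, (ii) Q attains a unique maximum over the compact set $\Theta$ at $\theta_0$, (iii) Q is
continuous on $\Theta$, (iv) the density of observations in any region whose area exceeds a
fixed minimum is bounded, (v) as $n \to \infty$,

\[ \sup_g \left( \left\| \frac{1}{\Pr(Y_{g1}=1, Y_{g2}=1 \mid X_g)} \right\| + \left\| \frac{1}{\Pr(Y_{g1}=1, Y_{g2}=0 \mid X_g)} \right\| + \left\| \frac{1}{\Pr(Y_{g1}=0, Y_{g2}=1 \mid X_g)} \right\| + \left\| \frac{1}{\Pr(Y_{g1}=0, Y_{g2}=0 \mid X_g)} \right\| \right) < \infty, \]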

(vi) as n —> oo, SUPg("Xg“+leg”) =00), (vii) SUPngf |Cov(Ygi,Yj,°) ls a(dgj),i= 1,2

where $d_{gj}$ denotes the distance between groups g and j, and $\alpha(d) \to 0$ as $d \to \infty$, and

M”) lim E[Qn (6)] exists, (ix) supglle“ < oo , then 6— 60 = op (1)
n—>oo

Proof: Given in Appendix 1.
Condition (i) is a standard assumption on the parameter set. Condition (ii) is the
identification condition for MLE. Condition (iii) assumes that the function Q is continuous
in the metric space, which is a reasonable assumption and necessary for the proof that $Q_n(\theta)$
is stochastically equicontinuous. Condition (iv) simply excludes an infinite number of
observations crowding into one bounded area. The minimum area restriction is imposed because
an infinitesimal area around a single observation has infinite density. Condition (v) makes
sure that each of the four outcomes will be present in a sufficiently large sample.
Condition (vi) makes sure the regressors are deterministic and uniformly bounded, which is
not a strong assumption in this literature. Condition (vii) is the key assumption for this
theorem; it requires that the dependence among groups decays sufficiently quickly as
the distance between groups increases. This assumption employs the concept
of $\alpha$-mixing to define the rate at which dependence decreases as distance increases.
Condition (viii) assumes the limit of $E[S_n(\theta)]$ exists as $n \to \infty$, which is not a strong
assumption. Condition (ix) is actually implied by the rule for dividing groups; it just
excludes two groups being in exactly the same location.

2.4.2 Asymptotic Normality

As we discussed in the introduction, spatial dependence is more complicated than
time-series dependence in at least four respects. Because of these differences, the central
limit theorem (CLT) needs stronger conditions in the spatially dependent case. To deal with
general dependence problems, the common approach in the literature is to use the so-called
"Bernstein sums", which break up $S_n$ into blocks (partial sums), and we consider the
sequence of blocks. Each block must be so large, relative to the rate at which the memory of
the sequence decays, that the degree to which the next block can be predicted from current
information is negligible. But at the same time, the number of blocks must increase with n
so that the CLT argument can be applied to this derived sequence (Davidson 1994).

In this section, we show under what assumptions we are able to apply McLeish's central
limit theorem (1974) to spatially dependent cases to get asymptotic normality for the spatial
Probit estimator. This is presented in the following Theorem. $A^T$ denotes the transpose of
matrix A.

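THEOREM 2: If the assumptions of Theorem 1 hold, and in addition: (i) as $d \to \infty$,
$d^2 \alpha(d\delta^*)/\alpha(d) = o(1)$ for all fixed $\delta^* > 0$, (ii) the sampling area grows uniformly at a
rate of $\sqrt{n}$ in two non-opposing directions, (iii)
$B(\theta_0) \equiv \lim_{n \to \infty} E[nS_n(\theta_0)S_n^T(\theta_0)]$ and $A(\theta_0) \equiv \lim_{n \to \infty} -E[H(\theta_0)]$ are
uniformly positive definite matrices; then

\[ \sqrt{n}(\hat\theta - \theta_0) \to N\left[ 0, A(\theta_0)^{-1}B(\theta_0)A(\theta_0)^{-1} \right], \]

where $S_n(\theta_0) \equiv \frac{\partial Q_n}{\partial\theta}(\theta_0)$ and $H(\theta_0) \equiv \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta_0)$.

Proof: Given in Appendix 1.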

Condition (i) is stronger than condition (vii) in Theorem 1, and it is also stronger than
the usual condition for time series data because spatially dependent data are correlated in
more dimensions than time series data. It specifies how dependence decays as the distance
between groups increases, and requires that the decay be fast enough.
Condition (ii) just repeats the assumption in Bernstein's blocking method; the two
non-opposing directions simply exclude a sampling area that grows in two parallel directions, which
does not make much sense in the spatially dependent case. The conditions in (iii) are natural
conditions on the matrices, which are implied by the previous assumptions. The matrices are
only semidefinite if some extreme situations happen, such as $\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = 0$,
which are assumed to be excluded by the previous assumptions.

2.4.3 Estimation of Variance-covariance Matrices

Consistent estimation of the asymptotic covariance matrix is important for the
construction of asymptotic confidence intervals and hypothesis tests (Newey and West
1987). The estimation of A (i.e. $\hat A = A(\hat\theta)$) is relatively easy, usually just a matter of obtaining
sample analogues with $\hat\theta$ in place of $\theta_0$; but the estimation of B (i.e. $\hat B = B(\hat\theta)$) is more difficult
and more important because of the correlations among groups. Newey and West (1987)
proposed a method to estimate the variance-covariance matrix in settings of dependence of
infinite order under a covariance stationarity condition, and they suggested modified Bartlett
weights to make sure the estimated variance and test statistics were positive. Andrews
(1991) established the consistency of kernel HAC (Heteroskedasticity and Autocorrelation
Consistent) estimators under more general conditions. Pinkse and Slade (1998) also
showed that we can obtain $B_n(\hat\theta) - B(\theta_0) = o_p(1)$ under regularity assumptions,
where $B_n(\theta) \equiv nE[S_n(\theta)S_n^T(\theta)]$ (see Lemma 9 in Appendix 1). This approach is
feasible in practice only if we can get closed form expressions for $E[S_n(\theta)S_n^T(\theta)]$, which
should be a function of $\theta$, and then plug in $\hat\theta$ for $\theta_0$ in the function to get consistent
covariance estimators. However, it is difficult to get closed form expressions for $B_n(\theta)$
in practice, and hence we follow an alternative approach proposed by Conley (1999).
A feasible way to obtain a consistent estimate of a variance-covariance matrix that
allows for a wider range of dependence is to apply the approach of Conley (1999) along the
lines of Newey and West (1987). We follow this procedure in the following Theorem 3.

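Let $\mathfrak{F}_\Lambda$ be the $\sigma$-algebra generated by a given random field $V_{s_m}$, $s_m \in \Lambda$, with
$\Lambda$ compact, and let $|\Lambda|$ be the number of $s_m \in \Lambda$. Let $\Upsilon(\Lambda_1, \Lambda_2)$ denote the minimum
Euclidean distance from an element of $\Lambda_1$ to an element of $\Lambda_2$. There exists also a regular
lattice index random field $W^*_s$ that is equal to one if location $s \in Z^2$ is sampled and zero
otherwise. $W^*_s$ is assumed to be independent of the underlying random field, to have a
finite expectation and to be stationary. The mixing coefficient is defined as

\[ \alpha_{k,l}(n) \equiv \sup\left\{ |P(A \cap B) - P(A)P(B)| \right\}, \quad A \in \mathfrak{F}_{\Lambda_1},\; B \in \mathfrak{F}_{\Lambda_2}, \text{ and } |\Lambda_1| \le k,\; |\Lambda_2| \le l,\; \Upsilon(\Lambda_1, \Lambda_2) \ge n. \]

We also define a new process $R_s(\theta)$ such that

\[ R_s(\theta) = \begin{cases} S(\theta) & \text{if } W^*_s = 1, \\ 0 & \text{if } W^*_s = 0. \end{cases} \]

Then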

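THEOREM 3. If (i) $\Lambda_\tau$ grows uniformly in two non-opposing directions as $\tau \to \infty$,
(ii) $B(\theta_0) \equiv \lim_{n \to \infty} E[nS_n(\theta_0)S_n^T(\theta_0)]$ and $A(\theta_0) \equiv \lim_{n \to \infty} -E[H(\theta_0)]$ are
uniformly positive definite matrices, (iii) $Y_{gi}$, $Y_{ji}$ as defined in Theorem 1, i = 1, 2, and
$W^*_s$ are mixing, where $\alpha_{k,l}(n)$ converges to zero as $n \to \infty$, $S(\theta)$ is Borel measurable
for all $\theta \in \Theta$, continuous on $\Theta$ and first moment continuous on $\Theta$, (iv)
$\sum_{m=1}^{\infty} m\,\alpha_{k,l}(m) < \infty$ for $k + l \le 4$, (v) $\alpha_{1,\infty}(m) = o(m^{-2})$, (vi) for some $\delta > 0$,
$E\|S(\theta_0)\|^{2+\delta} < \infty$ and $\sum_{m=1}^{\infty} m\,\alpha_{1,1}(m)^{\delta/(2+\delta)} < \infty$, (vii) $H(\theta)$ is Borel
measurable for all $\theta \in \Theta$, continuous on $\Theta$ and second moment continuous, $A(\theta_0)$
exists and is full rank, (viii) $\sum_{s \in Z^2} \mathrm{Cov}(R_0(\theta_0), R_s(\theta_0))$ is a non-singular matrix,
(ix) the $K_{MP}(j,k)$ are uniformly bounded and $K_{MP}(j,k) \to 1$, $n_\tau \to \infty$ as
$\tau \to \infty$ ($M, P \to \infty$), $L_M = o(M^{1/3})$ and $L_P = o(P^{1/3})$, (x) for some $\delta > 0$,
$E\|S(\theta_0)\|^{4+\delta} < \infty$ and $Y_{gi}$, $Y_{ji}$ as defined in Theorem 1, i = 1, 2, and $W^*_s$ are mixing,
where $\alpha_{\infty,\infty}(m)^{\delta/(2+\delta)} = o(m^{-4})$, (xi) $E \sup_\Theta \|R_{m,p}(\theta)\|^2 < \infty$ and
$E \sup_\Theta \|(\partial/\partial\theta)[R_{m,p}(\theta)]\|^2 < \infty$, then

\[ \hat B_\tau - B(\theta_0) = o_p(1) \quad \text{as } \tau \to \infty, \]

where we split $s = [m, p]$, $\Lambda_\tau$ is a rectangle so that $m \in \{1, 2, \dots, M\}$ and $p \in \{1, 2, \dots, P\}$,
and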

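\[ \hat B_\tau = \frac{1}{n_\tau} \sum_{j=0}^{L_M} \sum_{k=0}^{L_P} \sum_{m=j+1}^{M} \sum_{p=k+1}^{P} K_{MP}(j,k) \left[ R_{m,p}(\hat\theta) R_{m-j,p-k}(\hat\theta)^T + R_{m-j,p-k}(\hat\theta) R_{m,p}(\hat\theta)^T \right] - \frac{1}{n_\tau} \sum_{m=1}^{M} \sum_{p=1}^{P} R_{m,p}(\hat\theta) R_{m,p}(\hat\theta)^T. \]

To ensure positive semi-definite covariance matrix estimates, we need to choose an
appropriate two-dimensional weights function that is a Bartlett window in each dimension:

\[ K_{MP}(j,k) = \begin{cases} \left( 1 - \dfrac{|j|}{L_M} \right)\left( 1 - \dfrac{|k|}{L_P} \right) & \text{for } |j| < L_M,\; |k| < L_P, \\ 0 & \text{else.} \end{cases} \]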

Proof: It follows from Conley (1999), Proposition 3.
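
The product Bartlett window is simple enough to state as code. The following Python sketch is an illustration with a hypothetical function name; it implements only the weight function, not the full covariance estimator of Theorem 3.

```python
def bartlett_2d(j, k, L_M, L_P):
    """Two-dimensional Bartlett weight K_MP(j, k).

    j, k     : lags in the two lattice directions
    L_M, L_P : truncation lags (positive) in the two directions
    """
    if abs(j) < L_M and abs(k) < L_P:
        return (1.0 - abs(j) / L_M) * (1.0 - abs(k) / L_P)
    return 0.0
```

The weight equals one at lag (0, 0), declines linearly in each direction, and is zero at or beyond the truncation lags, so distant cross products are dropped from the estimate.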


 

2.5 Simulation Study

In the previous section, we proved that the partial maximum likelihood estimator
(PMLE) based on the bivariate normal distribution is consistent and asymptotically normal.
Moreover, one of the most attractive properties of our new PMLE is that it is more
efficient than the GMM estimator, while being much less
computationally demanding than full information methods. In order to learn
about the gains in efficiency that we obtain in the context of a Bivariate Spatial Probit
model when using PML versus GMM, we conduct a simulation study in this section.

2.5.1 Simulation Design and Results

Instead of comparing our PMLE to the GMM estimator of Pinkse and Slade (1998)
directly, we choose to compare the PMLE to the heteroskedastic Probit estimator (HPE)
for two reasons. First, the HPE uses information similar to that of the GMM estimator,
because both methods use generalized residuals from the Probit estimation to construct the
moment conditions, which means that both methods use the information from the
heterogeneity of the diagonal terms of the variance-covariance matrix, while our PMLE
uses both the diagonal terms and the off-diagonal correlation between the two closest
neighbors. Second, the STATA 8 source codes for bivariate probit estimation and
heteroskedastic Probit estimation are available online, and we can easily add the spatial
parts to these existing source codes to compare PML estimators with Heteroskedastic
Probit Estimators.

8 See http://www.stata.com/

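According to the theoretical framework given in previous sections, we could generate a
dataset which allows a general correlation structure across groups as in equations (8) and (9),
which requires specifying the exact formula (as functions of $\lambda$ and $W$) for the elements of
$\Omega_g$. However, it is quite difficult to derive the pairwise covariances for a bivariate probit
because the exact formula for $\Omega_{g12}$ (and for $\Omega_{g11}$, $\Omega_{g22}$) is very complicated: each is
an element of the inverse matrix with 2n spatially correlated observations, as follows

\[ \begin{pmatrix} \Omega_{g11} & \Omega_{g12} \\ \Omega_{g21} & \Omega_{g22} \end{pmatrix} = \left[ \left( (I - \lambda W)'(I - \lambda W) \right)^{-1} \right]_g, \tag{57} \]

where $[\cdot]_g$ denotes the $2 \times 2$ diagonal block for group g of the full $2n \times 2n$ matrix, whose
diagonal blocks run from $\Omega_{111}$ to $\Omega_{n22}$.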
Therefore, it seems reasonable to do the following. Let R be the weighting matrix,
which can be generated in STATA (see footnote 9) according to the distance between
observations, and let

\[ Y^*_i = X_{i1}\beta_1 + X_{i2}\beta_2 + X_{i3}\beta_3 + \varepsilon_i, \tag{58} \]
\[ \varepsilon = \lambda R u, \tag{59} \]

where $u \sim \mathrm{Normal}(0, I_n)$. The weighting matrix R is standardized so that the diagonal
elements are ones, and the elements of R shrink as distance increases. The reason
we do this is that it is easier to determine $\mathrm{Var}(\varepsilon_i)$ and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j)$ when applying the HP
and the bivariate probit estimators. In this way, we still allow general correlation across
groups, and we are able to compare the efficiency gains from using only the diagonal
information (the HP approach) to using both diagonal and off-diagonal information
(bivariate probit), and we do not need to know the exact formula for the elements of $\Omega_g$
(given in equation (57)) to reach the same goal.

9 The STATA command is Spatwmat. Because matrix inversion slows considerably as the
size of the matrix increases, and moreover the maximum matrix size in Stata is 800, we allow each
observation to be spatially correlated with the nearest 99 observations.
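
The convenience of this design can be illustrated with a short Python sketch: when the errors are generated as a known linear transformation $\varepsilon = Mu$ of independent standard normals, their covariance matrix is simply $MM'$, and the $2 \times 2$ block for any pair can be read off directly. The function names are hypothetical, and taking $M = \lambda R$ is an assumption about how equation (59) is to be read.

```python
def scaled(R, lam):
    """Return the matrix lam * R (assumed form of M in eps = M u)."""
    return [[lam * r for r in row] for row in R]


def error_covariance(M):
    """Var(eps) = M M' when eps = M u and u ~ Normal(0, I)."""
    n = len(M)
    return [[sum(M[i][k] * M[j][k] for k in range(len(M[i])))
             for j in range(n)] for i in range(n)]


def group_block(omega, i, j):
    """2 x 2 covariance block for the pair of observations (i, j)."""
    return [[omega[i][i], omega[i][j]],
            [omega[j][i], omega[j][j]]]
```

For a two-observation example with off-diagonal weight 0.5 and $\lambda = 0.4$, `group_block(error_covariance(scaled([[1, 0.5], [0.5, 1]], 0.4)), 0, 1)` gives variances of 0.2 and a covariance of 0.16, up to floating-point rounding.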
Therefore, we generate the dataset according to equations (58) and (59), which allows
spatial correlation between any two observations, and we set the true parameter values of
$\beta_1$, $\beta_2$ and $\beta_3$ all equal to 1. Since our main focus in this study is on the
estimation of the spatial parameter $\lambda$, we also set different true values of $\lambda$ for each simulated
sample, $\lambda$ = 0.2, 0.4, 0.6 and 0.8, to test the performance of the two estimation
methods (PML and HP). These values for $\lambda$ are in the range of the estimated value in the
empirical application of Pinkse and Slade (1998). In this setting and with 1000 replications,
we consider a sample size of N = 1000 observations (where the sample is divided into
500 pairwise groups). Finally, we also simulate samples of sizes 500 and 1500 (with 250
and 750 pairwise groups respectively) to check the performance of the two methods in
different sample sizes. The simulation results are reported in Table 2.1 (for the spatial
parameter $\lambda$) and Table 2.2 (for $\beta_1$, $\beta_2$ and $\beta_3$) in Appendix 2.

From Table 2.2, we can observe that both the HPE and the PMLE of $\beta_1$, $\beta_2$ and $\beta_3$
converge to the true parameter values across the different parameter values as the sample size
increases. Also, the PML estimator has much less bias than the HPE. Moreover, as
expected, PML always provides smaller standard errors than the HP estimation method, and
bias and standard errors decrease in general when the sample size increases.

Furthermore, it is in Table 2.1 where we can observe the largest advantages of using
PML versus HP. We can see that the PMLE is much better than the HPE in terms of
estimating the spatial parameter $\lambda$. The PMLE is always much closer to the true parameter
values, with small standard errors across different sample sizes and parameter values
(as expected from our theoretical results), while the HPE is much further away from the true
parameter values and has a much larger standard deviation over the different sample
sizes, even though the HPE also shows a tendency to converge to the true values in general as the
sample size increases. The HPE always has a much larger standard deviation than the PMLE,
showing clearly the gains in efficiency of PML versus HPE/GMM as predicted by our
theory. Since both the HPE and the GMM estimator use generalized residuals from Probit
estimation to construct the moment conditions, we conjecture that the GMM estimator is
subject to similar inefficiency problems in estimating the spatial coefficient. Also, as
expected, the bias of the PMLE decreases when N increases.

In summary, from the simulation results of Table 2.1 and Table 2.2, we see how the
PMLE clearly outperforms the HPE (our proxy for the GMM estimator of Pinkse and Slade (1998)),
especially when estimating the spatial parameter $\lambda$, which implies that the PMLE is much
more robust and efficient in the context of the spatial probit model. The simulation results
provide clear evidence of the gains in efficiency that can be obtained by PML versus GMM,
as predicted by our theoretical results in the previous section.


 

2.6 Conclusions

The idea of this paper is simple and intuitive: instead of just using information in
moment conditions (GMM), we divide observations into pairwise groups. Provided we
correctly specify the conditional joint distribution within these pairwise groups, we show
that the spatial bivariate Probit model allows us to use the most important information on
spatial correlations among adjacent observations and to get more efficient estimators. We
also prove that partial MLE is consistent and asymptotically normal under some regularity
conditions. We also discuss how to get consistent covariance matrix estimators under
general spatial dependence by following the approach of Conley (1999) and Newey and West
(1987), which is more usable in practice than the proposal of Pinkse and Slade
(1998). The attractive part of this study is that we can get a more efficient partial ML
estimator without introducing stronger assumptions (in some sense, we need weaker
assumptions than the GMM method), and the approach is much less computationally
demanding than full information methods. In order to learn about the gains in
efficiency that we obtain in the bivariate Probit model with PMLE versus the GMM
estimator, we provide a simulation study in Section 2.5. The advantages in terms of bias and
efficiency of the new estimation procedure proposed in this paper are clearly demonstrated.
Moreover, if we extend this method to trivariate or higher dimensional multivariate
Probit models, we can obtain even more efficient estimators, but at the expense of
more computational demands.


 

APPENDIX I

A.1 Proofs to Theorems

Proof of Theorem 1. If we can prove that $Q_n(\theta) \xrightarrow{p} Q(\theta)$ uniformly, then, since by the information
inequality $Q(\theta)$ has a unique maximum at the true parameter when $\theta_0$ is identified,
under technical conditions for the limit of the maximum to be the maximum of the limit,
$\hat\theta$ should converge in probability to $\theta_0$. Sufficient conditions for the maximum of the limit
to be the limit of the maximum are that the convergence in probability is uniform and the
parameter set is compact (Newey, 1994).
To prove consistency, the proof includes three parts:

(i) Q has a unique maximum at $\theta_0$.
(ii) $Q_n(\theta) - Q(\theta) = o_p(1)$ at all $\theta \in \Theta$.
(iii) $Q_n(\theta)$ is stochastically equicontinuous and Q is continuous on $\Theta$.
Condition (i) and the continuity of Q on $\Theta$ are assumed. The proof of condition (ii)
is provided in Lemma 1, and the proof that $Q_n(\theta)$ is stochastically equicontinuous can be
found in Lemma 2. Q.E.D.

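Proof of Theorem 2. To establish the asymptotic normality of the partial MLE for the spatial
bivariate Probit model, we start the proof from the mean value theorem. Since $\frac{\partial Q_n}{\partial\theta}(\hat\theta) = 0$,
the mean value theorem gives

\[ \frac{\partial Q_n}{\partial\theta}(\hat\theta) = 0 = \frac{\partial Q_n}{\partial\theta}(\theta_0) + \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*)(\hat\theta - \theta_0) \tag{60} \]
\[ \Rightarrow (\hat\theta - \theta_0) = -\left[ \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*) \right]^{-1} \frac{\partial Q_n}{\partial\theta}(\theta_0), \tag{61} \]

where $\theta^*$ lies between $\hat\theta$ and $\theta_0$.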

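First, let us discuss the term $\frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*)$ to find out its asymptotic properties.
Recall that

\[ Q_n(\theta) = \frac{1}{n} \sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2}P_g(1,1) + Y_{g1}(1 - Y_{g2})P_g(1,0) + (1 - Y_{g1})Y_{g2}P_g(0,1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0,0) \right\}, \tag{62} \]

where $P_g(1,1) \equiv \log P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)$ etc. Also

\[ \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta) = \frac{1}{n} \sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2} \frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T} + Y_{g1}(1 - Y_{g2}) \frac{\partial^2 P_g(1,0)}{\partial\theta\,\partial\theta^T} + (1 - Y_{g1})Y_{g2} \frac{\partial^2 P_g(0,1)}{\partial\theta\,\partial\theta^T} + (1 - Y_{g1})(1 - Y_{g2}) \frac{\partial^2 P_g(0,0)}{\partial\theta\,\partial\theta^T} \right\}, \tag{63} \]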

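where

\[ \frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T} = \frac{-1}{[\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)]^2} \left[ \frac{\partial \Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\theta} \right]^2 + \frac{1}{\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)} \frac{\partial^2 [\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)]}{\partial\theta\,\partial\theta^T}, \tag{64} \]

and all other terms behave similarly.
As before, we only discuss one of these terms, and the same logic applies to the other
terms. We know that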

 

 

 

 

 

 

 

 

 

 

 

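\[ \frac{1}{n} \sum_{g=1}^{n} \left[ Y_{g1}Y_{g2} \frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T}(\theta^*) \right] = \frac{1}{n} \sum_{g=1}^{n} Y_{g1}Y_{g2} \left\{ \frac{-1}{[\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)]^2} \left[ \frac{\partial \Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\theta}(\theta^*) \right]^2 + \frac{1}{\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)} \frac{\partial^2 [\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)]}{\partial\theta\,\partial\theta^T}(\theta^*) \right\}. \tag{65} \]

Look at the first term of the above equation, given by

\[ \frac{1}{n} \sum_{g=1}^{n} Y_{g1}Y_{g2} \left\{ \frac{-1}{[\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)]^2} \left[ \frac{\partial \Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\theta}(\theta^*) \right]^2 \right\}. \tag{66} \]

Since $\left\| \frac{1}{[\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)]^2} \right\| < \infty$, we can write this term as

\[ \frac{1}{n} \sum_{g=1}^{n} K_{g11} \left[ \frac{\partial \Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\theta}(\theta^*) \right]^2, \tag{67} \]

where $K_{g11} \equiv Y_{g1}Y_{g2} \dfrac{-1}{[\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)]^2}$.

In order to prove

\[ \frac{1}{n} \sum_{g=1}^{n} K_{g11} \left[ \frac{\partial \Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\theta}(\theta^*) \right]^2 \xrightarrow{p} \frac{1}{n} \sum_{g=1}^{n} K_{g11} \left[ \frac{\partial \Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\theta}(\theta_0) \right]^2, \tag{68} \]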
we need to show that it holds for all “w“:l. Set Kg” -w Kgand then
T] n 6Pr(Yg1 =1,Y =g.11X)
_ K (6
(U {ngz=l glll 6:2 )I2
l n 6Pr(Yg] =1,Yg2 =I| g) 2
__ K ’6
”gzzl gll 66 \ 0)] }
1n 6Pr(Y1=1,Y =.gl|X) 6Pr(Y]=1,Y2=1|X)
721K211 g 652 (6)12— g a; g (90112}
g:
22—5er =1,Y =IX 62PrI’ =l,gY =1Xg
=(é_ 90);"gEKg11 (1 g2 I gm) 6)x (g1 T2 | Q(6)
60 56619
(above, equation (69), (70), (71))

From the proof of Theorem 1, we know that sup gua Pr(Yg ’YgZ =1|Xg)ll< oo

66

”62 Pr(Yg1,Yg2=1|Xg)”
From Lemma 3, sup g < 00. From Theorem 1, we also
I 66667" I

know that 6 — 60 = op (1) and hence


__6Pr(l’ gl=lYg2=l|/(\g) 6x*) 62Pr(Yg]=IYg2=I|Xg)(6 1.)

 

 

2
(9 00)_ 2 K II

”g: 1 g 5‘9 aaaeT
=0p(l)

6Pr(Yg]=l,l’g2=I|Xg) 6Pr(l’g1=l,l’g2=l|Xg

)
(601121

 

 

n
26711 2 Kgni (6)1 “—1 z @111
n n

 

 

 

 

 

 

 

 

g=l ae g—l ae
=0p(l)
6P Y =1,Y =1,Y) n 6150’ =1,Y =I|X)
=>- 2 K glll “ g] 5:2 I g(6*)12—p+l Z Kg111 r g] 5:2 g (60)]2
"=g-1 "g=l
(above, (72), (73), (74).
By deﬁnition,
6Pr(l’ =1,Y =I|X 6Pr(Y 1:1,1’ 2=I|X)
11m — z Kgni ,1 .52 g(60)12 =61Kg111 g 6: 5 (601121.
n—>oo" g: _l
(75)
and therefore,
1 ”‘ -1 6Pr(Y1=1,Y2=1|X) * p
— 2 Y,1Y,2( , , a; , (6 )121—>
" g=1 [Pr(Yg1 =1,Y,2 =1 I X,)] (76)
-1 6Pr(Y]=1,Y 2=1|X)
E{Y, 21 , , , (6011’). (77)
“rY,1P( =1,Y,2=11X,)1 56’
Similarly, we can prove in relation to the second term that
1 n 1 621Pr(Yg1=1Yg2=11Xg)1
Z Z YgIYgZPrU —1Y -1|X ) T (6) (78)
g: g1 — 1 82' g 6666
2
P 1 6 [Pr(Y 1:1,1’ 2=1|X )]
TEQ/glygzp Y -1Y -1X 8 6‘ g (90» 79
r(gl-agZ-l g) 5959 ()

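As usual, we apply the above arguments repeatedly to the other terms. Finally, we
get that

\[ \lim_{n \to \infty} \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*) \xrightarrow{p} \lim_{n \to \infty} E\left[ \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta_0) \right]. \tag{80} \]

If we define

\[ H_g \equiv \left\{ Y_{g1}Y_{g2} \frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T} + Y_{g1}(1 - Y_{g2}) \frac{\partial^2 P_g(1,0)}{\partial\theta\,\partial\theta^T} + (1 - Y_{g1})Y_{g2} \frac{\partial^2 P_g(0,1)}{\partial\theta\,\partial\theta^T} + (1 - Y_{g1})(1 - Y_{g2}) \frac{\partial^2 P_g(0,0)}{\partial\theta\,\partial\theta^T} \right\}, \tag{81} \]

where H denotes the Hessian, equation (81) can be rewritten as

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{g=1}^{n} H_g(\theta^*) \xrightarrow{p} \lim_{n \to \infty} E[H(\theta_0)]. \tag{82} \]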
Therefore, it remains to show the asymptotic normality of the score term $\frac{\partial Q_n}{\partial\theta}(\theta_0)$.
For the sake of brevity, redefine the score as $S_n(\theta_0) \equiv \frac{\partial Q_n}{\partial\theta}(\theta_0)$. Then

\[ S_n(\theta_0) = \frac{1}{n} \sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2} \frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) + Y_{g1}(1 - Y_{g2}) \frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) + (1 - Y_{g1})Y_{g2} \frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + (1 - Y_{g1})(1 - Y_{g2}) \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0) \right\}. \tag{83} \]

We need to show that $B^{-1/2}(\theta_0)\sqrt{n}\,S_n(\theta_0) \to N(0, I_K)$, where
$B(\theta) \equiv \lim_{n \to \infty} nE[S_n(\theta)S_n^T(\theta)]$. Note that the information matrix equality does not hold
here, i.e. $-E[H(\theta_0)] \neq E[S_n(\theta)S_n^T(\theta)]$, because the score terms are correlated with
each other over space. In this part, we follow Pinkse and Slade (1998) and use
Bernstein's blocking method and McLeish's (1974) central limit theorem for
dependent processes. First, define $T_{na_n} \equiv \prod_{j=1}^{a_n} (1 + i\gamma D_{n,j})$, where $i^2 = -1$, and
$D_{n,j}$ ($j = 1, 2, \dots, a_n$) is an array of random variables on the probability triple $(\Omega, F, P)$. $\gamma$ is
a real number. McLeish's (1974) central limit theorem for dependent processes requires the
following four conditions:

(i) $\{T_{na_n}\}$ is uniformly integrable,
(ii) $E\,T_{na_n} \to 1$,
(iii) $\sum_{j=1}^{a_n} D_{n,j}^2 \xrightarrow{p} 1$,
(iv) $\max_{j \le a_n} |D_{n,j}| \xrightarrow{p} 0$.
Now we need to define $D_{n,j}$ in our case. Let
$Y_{0n} \equiv \omega^T \left\{ \sqrt{n}\,S_n(\theta_0) / \sqrt{B(\theta_0)} \right\} = n^{-1/2} \sum_{t=1}^{n} A_{nt}$ for implicitly defined $A_{nt}$. In order to prove
$Y_{0n} \xrightarrow{d} N(0, 1)$, we need to establish that the property holds for all $\|\omega\| = 1$, using the
Cramer-Wold device. As in the proof of Theorem 1, we split the region in which
observations are located into $a_n$ areas of size $\sqrt{b_n} \times \sqrt{b_n}$. We also know that $a_n$
increases faster than $\sqrt{n}$ and $b_n$ slower, where $a_n$ and $b_n$ are integers such that
$a_n b_n = n$. Let $a_n$ and $b_n$ be constructed such that $\alpha(\sqrt{b_n})\,a_n \to 0$. Let

I
T-_ C O
n 2 x [2,, < I, un1formly1nn,for some ﬁxed 0<r<l

2. Let A nj denote the set of

85

indices corresponding to the observations in area j . By assumption a number C > O

_L
exists such that Maxj(#/\nj) <Cbn. Deﬁne ij a n 2 ZIEAnj Ant , and hence

we can write Y0” = 2511.1; 1 Dn 1
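The rate requirements on the blocks can be made concrete with a small sketch. It assumes a specific rule that does not appear in the text, $b_n = \lfloor n^{\kappa} \rfloor$ with $\kappa < 1/2$, and it relaxes $a_n b_n = n$ to $a_n = \lceil n / b_n \rceil$ so that any $n$ can be used. The names `block_sizes`, `rate_condition_holds` and `kappa` are hypothetical.

```python
import math

def block_sizes(n, kappa=0.3):
    """Return (a_n, b_n): number of blocks and observations per block.

    Assumption (not from the text): b_n = floor(n**kappa) with kappa < 1/2,
    so b_n grows slower than sqrt(n) while a_n = ceil(n / b_n) grows faster.
    """
    b_n = max(1, math.floor(n ** kappa))
    a_n = math.ceil(n / b_n)
    return a_n, b_n

def rate_condition_holds(n, b_n, tau):
    """Check n**(tau - 1/2) * b_n < 1 for a given 0 < tau < 1/2."""
    return n ** (tau - 0.5) * b_n < 1.0

for n in (500, 1000, 1500, 10**6):
    a_n, b_n = block_sizes(n)
    print(n, a_n, b_n, rate_condition_holds(n, b_n, tau=0.1))
```

With this rule the condition $n^{\tau-1/2} b_n < 1$ holds whenever $\kappa \le 1/2 - \tau$; other constructions of $a_n$ and $b_n$ that satisfy the stated rates would serve equally well.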

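Now we are ready to discuss the four conditions for McLeish's (1974) central limit theorem. First, look at condition (iv), which requires that $\max_{j\le a_n}|D_{n,j}| = o_p(1)$:

\[
\max_{j\le a_n}|D_{nj}| = \max_{j\le a_n}\Big|n^{-1/2}\sum_{t\in\Lambda_{nj}}A_{nt}\Big|. \tag{84}
\]

Since by assumption

\[
\max_j(\#\Lambda_{nj}) < Cb_n \;\Rightarrow\; \max_{j\le a_n}\Big|n^{-1/2}\sum_{t\in\Lambda_{nj}}A_{nt}\Big| \le Cb_n \times n^{-1/2}\sup_t\|A_{nt}\|, \tag{85}
\]

where # denotes the number of objects, by definition we have that

\[
\begin{aligned}
\omega^T\{\sqrt{n}\,B(\theta_0)^{-1/2}S_n(\theta_0)\} &= n^{-1/2}\sum_{t=1}^{n}A_{nt} \;\Rightarrow \\
\sum_{t=1}^{n}A_{nt} &= \omega^T B(\theta_0)^{-1/2}\sum_{g=1}^{n}\Big\{Y_{g1}Y_{g2}\,\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) + Y_{g1}(1-Y_{g2})\,\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) \\
&\qquad + (1-Y_{g1})Y_{g2}\,\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + (1-Y_{g1})(1-Y_{g2})\,\frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big\}.
\end{aligned} \tag{86}
\]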

Since $B(\theta_0)$ is positive definite, $B(\theta_0)^{-1/2}$ is bounded for sufficiently large $n$, and we have that $\sup_g\|Y_g\| < \infty$ by assumption (vi) in Theorem 1. We have also proved that $\sup_g\left\|\frac{\partial P_g(1,1)}{\partial\theta}\right\| < \infty$ in Lemma 2. Therefore, we are able to prove that $\sup\|A_{nt}\| < \infty$. Then $Cb_n \times n^{-1/2}\sup\|A_{nt}\| = O_p(Cb_n \times n^{-1/2}) = o_p(1)$ by construction of $b_n$. Hence we can get that $\max_{j\le a_n}|D_{n,j}| = o_p(1)$.

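Second, let us discuss condition (i): $\{T_{na_n}\}$ is uniformly integrable. Following Davidson (1994), if a random variable is integrable, the contribution to the integral of extreme values of the random variable must be negligible. In other words, if $E|T_{na_n}| < \infty$, then $E(|T_{na_n}|\,1\{|T_{na_n}| > K\}) \to 0$ as $K \to \infty$; this is equivalent to saying that $P[\sup_{n>N}|T_{na_n}| > K] = 0$ for some $K > 0$ as $n \to \infty$. Here we follow the proof of Lemma 10 in Pinkse and Slade (1998). We have that

\[
\begin{aligned}
P\Big[\sup_{n>N}|T_{na_n}| > K\Big]
&= P\Big[\sup_{n>N}\Big|\prod_{j=1}^{a_n}(1+i\gamma D_{n,j})\Big| > K\Big] && (87) \\
&\le P\Big[\sup_{n>N}\Big|\prod_{j=1}^{a_n}\sqrt{1+\gamma^2D_{n,j}^2}\Big| > K\Big] && (88) \\
&= \Big\{P\Big[\sup_{n>N}\Big|\prod_{j=1}^{a_n}\sqrt{1+\gamma^2D_{n,j}^2}\Big| > K \,\Big|\, \sup_{n>N,\,j}n^{\tau}|D_{nj}| \le C\Big] \times P\Big[\sup n^{\tau}|D_{nj}| \le C\Big] \\
&\qquad + P\Big[\sup_{n>N}\Big|\prod_{j=1}^{a_n}\sqrt{1+\gamma^2D_{n,j}^2}\Big| > K \,\Big|\, \sup_{n>N,\,j}n^{\tau}|D_{nj}| > C\Big] \times P\Big[\sup n^{\tau}|D_{nj}| > C\Big]\Big\} && (89) \\
&\le \Big\{P\Big[\sup_{n>N}\Big|\prod_{j=1}^{a_n}\sqrt{1+\gamma^2D_{n,j}^2}\Big| > K \,\Big|\, \sup_{n>N,\,j}n^{\tau}|D_{nj}| \le C\Big] + P\Big[\sup n^{\tau}|D_{nj}| > C\Big]\Big\}, && (90)
\end{aligned}
\]

where $C$ is a uniform upper bound to $\sum_{t\in\Lambda_{nj}}A_{nt}$. Therefore,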

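\[
P\Big[\sup n^{\tau}|D_{nj}| > C\Big] = P\Big[\sup n^{\tau}\Big|n^{-1/2}\sum_{t\in\Lambda_{nj}}A_{nt}\Big| > C\Big] \tag{91}
\]

\[
= P\Big[\sup n^{\tau-1/2}\sum_{t\in\Lambda_{nj}}|A_{nt}| > C\Big] \le P\Big[\sup n^{\tau-1/2}b_n\sum_{t\in\Lambda_{nj}}|A_{nt}| > C\Big] = 0, \tag{92}
\]

since $n^{\tau-1/2}b_n < 1$ and by construction of $b_n$. Then,

\[
P\Big[\sup_{n>N}\Big|\prod_{j=1}^{a_n}\sqrt{1+\gamma^2D_{n,j}^2}\Big| > K \,\Big|\, \sup_{n>N,\,j}n^{\tau}|D_{nj}| \le C\Big] \le P\Big[\sup_{n>N}\big|(1+\gamma^2n^{-2\tau}C^2)^{a_n/2}\big| > K\Big] = 0, \tag{93}
\]

provided we set $K$ sufficiently large. Therefore, we have proved that $P[\sup_{n>N}|T_{na_n}| > K] = 0$, which implies that $\{T_{na_n}\}$ is uniformly integrable.

Third, condition (ii) requires that $E\,T_{na_n} \to 1$, which is equivalent to saying that $E\,T_{na_n} - 1 = o(1)$; see the proof in Lemma 4.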

Fourth, in order to prove (iii): $\sum_{j=1}^{a_n}D_{n,j}^2 \xrightarrow{p} 1$, by Lemma 8, $\sum_{j=1}^{a_n}D_{n,j}^2 - 1 = \sum_{j=1}^{a_n}E(D_{n,j}^2) - 1 + o_p(1)$ and

\[
\sum_{j=1}^{a_n}E(D_{n,j}^2) - 1 + o_p(1) = E(Y_{0n}^2) - 1 - \sum_{i\neq j}E(D_{ni}D_{nj}) + o_p(1) = o_p(1), \tag{94}
\]

by construction of $Y_{0n}$, since $E(Y_{0n}^2) = 1$. It remains to show that $\sum_{i\neq j}E(D_{ni}D_{nj}) = o(1)$. This condition is proved in Lemmas 5-7 [10]. Q.E.D.

[10] Lemmas 5-8 are along the lines of those in Pinkse and Slade (1998), which are a simplified version of the proofs in Davidson (1994).

A.2 Technical Lemmas
The proofs of Theorems 1-2 require the use of the following Lemmas 1-8.

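LEMMA 1: Under the assumptions in Theorem 1, $Q_n(\theta) - Q(\theta) = o_p(1)$ for all $\theta \in \Theta$.

Proof: We can rewrite $Q_n(\theta)$ as

\[
\begin{aligned}
Q_n(\theta) = \frac{1}{n}\sum_{g=1}^{n}\big\{ & Y_{g1}Y_{g2}[P_g(1,1) - P_g(1,0) - P_g(0,1) + P_g(0,0)] \\
&+ Y_{g1}[P_g(1,0) - P_g(0,0)] + Y_{g2}[P_g(0,1) - P_g(0,0)] + P_g(0,0)\big\}.
\end{aligned} \tag{95}
\]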

Since we assume that $\lim_{n\to\infty}E[Q_n(\theta)]$ exists, and by definition $Q(\theta) \equiv \lim_{n\to\infty}E[Q_n(\theta)]$, this implies that $Q(\theta) - E[Q_n(\theta)] = o(1)$. In order to prove $Q_n(\theta) - Q(\theta) = o_p(1)$, we only need to show that $Q_n(\theta) - E[Q_n(\theta)] = o_p(1)$. It suffices that the mean-square distance between $Q_n(\theta)$ and $E[Q_n(\theta)]$ vanishes as $n \to \infty$, that is, $E\|Q_n(\theta) - E[Q_n(\theta)]\|^2 \to 0$ as $n \to \infty$, which by definition is equivalent to $Var[Q_n(\theta)] \to 0$ as $n \to \infty$.
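The step from a vanishing variance to convergence in probability can be spelled out with Chebyshev's inequality. This is the standard argument rather than a quotation from the text; $\varepsilon$ is an arbitrary positive constant introduced here for illustration.

```latex
\[
P\bigl(|Q_n(\theta) - E[Q_n(\theta)]| > \varepsilon\bigr)
\;\le\; \frac{E\,|Q_n(\theta) - E[Q_n(\theta)]|^2}{\varepsilon^2}
\;=\; \frac{Var[Q_n(\theta)]}{\varepsilon^2}
\;\longrightarrow\; 0
\quad \text{as } n \to \infty .
\]
```

Since this holds for every $\varepsilon > 0$, $Q_n(\theta) - E[Q_n(\theta)] = o_p(1)$ follows once the variance is shown to vanish.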

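It is easy to see that

\[
\begin{aligned}
Var[Q_n(\theta)] = \frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}\big\{ & \gamma_{ng1}\gamma_{nj1}Cov(Y_{g1}Y_{g2}, Y_{j1}Y_{j2}) + 2\gamma_{ng1}\gamma_{nj2}Cov(Y_{g1}Y_{g2}, Y_{j1}) \\
&+ 2\gamma_{ng1}\gamma_{nj3}Cov(Y_{g1}Y_{g2}, Y_{j2}) + \gamma_{ng2}\gamma_{nj2}Cov(Y_{g1}, Y_{j1}) \\
&+ 2\gamma_{ng2}\gamma_{nj3}Cov(Y_{g1}, Y_{j2}) + \gamma_{ng3}\gamma_{nj3}Cov(Y_{g2}, Y_{j2})\big\},
\end{aligned} \tag{96}
\]

where $\gamma_{ng1} = [P_g(1,1) - P_g(1,0) - P_g(0,1) + P_g(0,0)]$, $\gamma_{ng2} = [P_g(1,0) - P_g(0,0)]$, and $\gamma_{ng3} = [P_g(0,1) - P_g(0,0)]$. The same definition applies to $\gamma_{nj1}$, $\gamma_{nj2}$ and $\gamma_{nj3}$.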

 

Note that here
X 1ﬂ+6g18g2 8g2
P 1,1 =1 00 <1>( g ¢(———)de } (97)
g( ) ogﬁ-ngﬂ \/Var(eg1) Var(£g2) g2

which is not a function of $Y_g$ or $Y_j$. Hence $\gamma_{ng1}$ is not a function of $Y_g$ or $Y_j$. The same logic applies to the other terms ($\gamma_{ng2}$, $\gamma_{ng3}$, $\gamma_{nj1}$, $\gamma_{nj2}$ and $\gamma_{nj3}$). Since $0 \le P_g(1,1) \le 1$, and the same applies to $P_g(1,0)$, $P_g(0,1)$ and $P_g(0,0)$, it is easy to see that $|\gamma_{ngi}| \le 2$, and likewise $|\gamma_{nji}| \le 2$, and hence $|\gamma_{ngi}\gamma_{nji}| \le 4$, $i = 1, 2, 3$.

Therefore, we can write

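\[
\begin{aligned}
\sup_{n,g,j}|Var[Q_n(\theta)]| \le \frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}\big\{ & 4Cov(Y_{g1}Y_{g2}, Y_{j1}Y_{j2}) + 8Cov(Y_{g1}Y_{g2}, Y_{j1}) + 8Cov(Y_{g1}Y_{g2}, Y_{j2}) \\
&+ 4Cov(Y_{g1}, Y_{j1}) + 8Cov(Y_{g1}, Y_{j2}) + 4Cov(Y_{g2}, Y_{j2})\big\}.
\end{aligned} \tag{98}
\]

In the previous equation, let us first look at the term $\frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}4Cov(Y_{g1}, Y_{j1})$:

\[
\frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}4Cov(Y_{g1}, Y_{j1}) \le \frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}4\sup|Cov(Y_{g1}, Y_{j1})| \le \frac{4}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}\alpha(d_{gj}) \tag{99}
\]

by assumption (vii). Therefore, we need to prove that

\[
\frac{4}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}\alpha(d_{gj}) = o(1) \quad \text{as } n \to \infty. \tag{100}
\]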

Following Pinkse and Slade (1998), we also use Bernstein's (1927) blocking method to prove this as follows. We split the region in which observations are located into $a_n$ areas of size $c_1\sqrt{b_n} \times c_2\sqrt{b_n}$. We also know that $a_n$ increases faster than $\sqrt{n}$ and $b_n$ slower, where $a_n$ and $b_n$ are integers such that $a_nb_n = n$. Without loss of generality, we assume $c_1 = c_2 = 1$, and let $a_n$ and $b_n$ be constructed such that $\alpha(\sqrt{b_n})\,a_n \to 0$. Let $n^{\tau-1/2} \times b_n < 1$ uniformly in $n$, for some fixed $0 < \tau < 1/2$. By construction of $b_n$, $O_p(n^{-1/2}b_n) = o_p(1)$. Then we are able to apply the same idea to our case. In our case, the groups $g$ and $j$ take the role of $a_n$ and $b_n$, where one grows faster and the other grows slower than $\sqrt{n}$. We also know that $d_{gj}$ is the distance between $g$ and $j$, $|g - j|$. So we can find an upper bound for $|g - j|$ as the maximum distance between groups $g$ and $j$. Let us suppose that $j$ is the one that grows faster than $\sqrt{n}$ and $g$ is the one that grows slower than $\sqrt{n}$. Then we can cancel the summation corresponding to $g$ against one factor $1/n$ of $1/n^2$.

Moreover, since $j$ grows faster than $\sqrt{n}$ but slower than $n$, one way is to define $\sum_{j=1}^{\sqrt{n}}j\alpha(j)$ as the term that grows faster than $\sqrt{n}$ but slower than $n$, in such a way that

\[
\sum_{g=1}^{n}\sum_{j=1}^{n}\alpha(d_{gj}) = O\Big(n\sum_{j=1}^{\sqrt{n}}j\alpha(j)\Big). \tag{101}
\]

Finally, $\sum_{j=1}^{\sqrt{n}}j\alpha(j)$ grows slower than $n$ and therefore $O\big(\frac{1}{n}\sum_{j=1}^{\sqrt{n}}j\alpha(j)\big) = o(1)$. So, we can get

\[
\frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}4Cov(Y_{g1}, Y_{j1}) \le \frac{4}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}\alpha(d_{gj}) = o(1). \tag{102}
\]

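We can apply the same logic to $\frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}8Cov(Y_{g1}, Y_{j2})$ and $\frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}4Cov(Y_{g2}, Y_{j2})$. Let us consider $\frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}4Cov(Y_{g1}Y_{g2}, Y_{j1}Y_{j2})$. If we define $Y_g = Y_{g1}Y_{g2}$ and $Y_j = Y_{j1}Y_{j2}$, we can apply the same logic to prove that

\[
\frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}4Cov(Y_g, Y_j) \le \frac{4}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}\alpha(d_{gj}) = o(1).
\]

Therefore, we are able to show that

\[
E\|Q_n(\theta) - E[Q_n(\theta)]\|^2 \le \sup_{n,g,j}|Var[Q_n(\theta)]| \le \frac{36}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}\alpha(d_{gj}) = o(1). \tag{103}
\]

Hence, $Q(\theta) - E[Q_n(\theta)] = o(1)$ implies $Q_n(\theta) - Q(\theta) = o_p(1)$ at all $\theta \in \Theta$. Q.E.D.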
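The key requirement in the proof, that $\frac{1}{n^2}\sum_g\sum_j\alpha(d_{gj})$ vanishes, can be illustrated numerically. The sketch below makes assumptions that are not in the text: the sites sit at integer positions on a line (so $d_{gj} = |g - j|$), and the dependence bound is $\alpha(d) = e^{-d}$. The function name `mean_pairwise_alpha` is hypothetical.

```python
import math

def mean_pairwise_alpha(n, alpha=lambda d: math.exp(-d)):
    """Compute (1/n^2) * sum_{g=1}^{n} sum_{j=1}^{n} alpha(|g - j|).

    Assumptions for illustration: sites on a line at positions 1..n and a
    summable dependence bound alpha (here exp(-d) by default).
    """
    # Count pairs by distance: n pairs at distance 0, 2*(n - d) at distance d >= 1.
    total = n * alpha(0)
    total += sum(2 * (n - d) * alpha(d) for d in range(1, n))
    return total / n ** 2

for n in (10, 100, 1000):
    print(n, mean_pairwise_alpha(n))
```

With a summable $\alpha$, the double sum grows proportionally to $n$, so the average over $n^2$ pairs decays at rate $1/n$, in line with (100).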

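LEMMA 2: Under the assumptions in Theorem 1, $Q_n(\theta) - Q(\theta)$ is stochastically equicontinuous.

Proof: The proof only requires showing that $Q_n(\theta)$ is stochastically equicontinuous, because $Q(\theta)$ is continuous by assumption (iii). We have that

\[
\begin{aligned}
Q_n(\theta) - Q_n(\tilde\theta) = \frac{1}{n}\sum_{g=1}^{n}\big\{ & Y_{g1}Y_{g2}[P_g(1,1,\theta) - P_g(1,1,\tilde\theta)] \\
&+ Y_{g1}(1-Y_{g2})[P_g(1,0,\theta) - P_g(1,0,\tilde\theta)] \\
&+ (1-Y_{g1})Y_{g2}[P_g(0,1,\theta) - P_g(0,1,\tilde\theta)] \\
&+ (1-Y_{g1})(1-Y_{g2})[P_g(0,0,\theta) - P_g(0,0,\tilde\theta)]\big\}.
\end{aligned} \tag{104}
\]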

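By the mean value theorem

\[
\begin{aligned}
Q_n(\theta) - Q_n(\tilde\theta) = \frac{1}{n}\sum_{g=1}^{n}\Big\{ & Y_{g1}Y_{g2}\Big[\frac{\partial P_g(1,1)}{\partial\theta^T}(\theta^*)(\theta - \tilde\theta)\Big]
 + Y_{g1}(1-Y_{g2})\Big[\frac{\partial P_g(1,0)}{\partial\theta^T}(\theta^*)(\theta - \tilde\theta)\Big] \\
&+ (1-Y_{g1})Y_{g2}\Big[\frac{\partial P_g(0,1)}{\partial\theta^T}(\theta^*)(\theta - \tilde\theta)\Big]
 + (1-Y_{g1})(1-Y_{g2})\Big[\frac{\partial P_g(0,0)}{\partial\theta^T}(\theta^*)(\theta - \tilde\theta)\Big]\Big\},
\end{aligned} \tag{105}
\]

where $\theta^*$ lies between $\theta$ and $\tilde\theta$. In order to prove that $Q_n(\theta)$ is stochastically equicontinuous, it is sufficient to show that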

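\[
\sup_{\theta\in\Theta}\Big|\frac{1}{n}\sum_{g=1}^{n}Y_{g1}Y_{g2}\,\frac{\partial P_g(1,1)}{\partial\theta^T}(\theta)\Big| = O_p(1), \tag{106}
\]

and the same requirement applies to the other terms. For simplicity we prove just one of them; the rest follow the same argument. Recall that

\[
P_g(1,1) \equiv \log\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g), \tag{107}
\]

and note that $\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = \Phi_2\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\Big)$, where $\Phi_2$ is the bivariate normal distribution function. Also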

 

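\[
\frac{\partial P_g(1,1)}{\partial\theta^T} = \frac{\partial\Big[\log\Phi_2\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\Big)\Big]}{\partial\theta^T}, \tag{108}
\]

and since $\theta \equiv (\beta, \lambda)$,

\[
\frac{\partial P_g(1,1)}{\partial\theta^T}(\theta) = \Big[\frac{\partial P_g(1,1)}{\partial\beta^T}(\beta),\; \frac{\partial P_g(1,1)}{\partial\lambda}(\lambda)\Big]. \tag{109}
\]

We focus first on $\frac{\partial P_g(1,1)}{\partial\beta^T}(\beta)$, where

\[
\frac{\partial P_g(1,1)}{\partial\beta^T} = \frac{\partial\Big[\log\Phi_2\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\Big)\Big]}{\partial\beta^T}
= \frac{1}{\Phi_2\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\Big)}\Big(\frac{S_{g1}X_{g1}}{\sqrt{\Omega_{g11}}} + \frac{S_{g2}X_{g2}}{\sqrt{\Omega_{g22}}}\Big), \tag{110}
\]

with

\[
S_{g1} = \phi\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}\Big)\,\Phi\Bigg(\frac{\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}} - \rho_g\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}}{\sqrt{1-\rho_g^2}}\Bigg), \tag{111}
\]

\[
S_{g2} = \phi\Big(\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}\Big)\,\Phi\Bigg(\frac{\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}} - \rho_g\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}}{\sqrt{1-\rho_g^2}}\Bigg). \tag{112}
\]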
 

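By assumption (v)

\[
\sup_g\Bigg\|\frac{1}{\Phi_2\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\Big)}\Bigg\| = \sup_g\Big\|\frac{1}{\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}\Big\| < \infty, \tag{113}
\]

and it is easy to see that $\sup_g\Big\|\frac{S_{g1}X_{g1}}{\sqrt{\Omega_{g11}}} + \frac{S_{g2}X_{g2}}{\sqrt{\Omega_{g22}}}\Big\| < \infty$ provided that $\sup_g(\|X_g\|) < \infty$.

Therefore,

\[
\sup_g\Big\|\frac{\partial P_g(1,1)}{\partial\beta^T}(\beta)\Big\| < \infty. \tag{114}
\]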
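We now discuss the second term $\frac{\partial P_g(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\lambda}(\lambda)$, where

\[
\frac{\partial P_g(1,1)}{\partial\lambda} = \frac{\partial\Big[\log\Phi_2\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\Big)\Big]}{\partial\lambda} \tag{115}
\]

\[
= \frac{1}{\Phi_2\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\Big)} \times \frac{\partial\Phi_2\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\Big)}{\partial\lambda}, \tag{116}
\]

and after some algebra, we can prove that $\sup_g\Big\|\frac{\partial\Phi_2\big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\big)}{\partial\lambda}\Big\| < \infty$ provided that $\sup_g\|W_g\| < \infty$.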

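Therefore, it is easy to see that when $\sup_g\Big\|\frac{\partial P_g(1,1)}{\partial\beta^T}\Big\| < \infty$ and $\sup_g\Big\|\frac{\partial P_g(1,1)}{\partial\lambda}\Big\| < \infty$, we have

\[
\sup_g\Big\|\frac{\partial P_g(1,1)}{\partial\theta^T}\Big\| < \infty. \tag{117}
\]

We apply the same logic to the other terms, and we can prove that $\sup_g\Big\|\frac{\partial P_g(1,0)}{\partial\theta^T}(\theta)\Big\|$, $\sup_g\Big\|\frac{\partial P_g(0,1)}{\partial\theta^T}(\theta)\Big\|$ and $\sup_g\Big\|\frac{\partial P_g(0,0)}{\partial\theta^T}(\theta)\Big\|$ are also bounded.

Therefore, finally $\sup_{\theta\in\Theta}\Big|\frac{1}{n}\sum_{g=1}^{n}Y_{g1}Y_{g2}\,\frac{\partial P_g(1,1)}{\partial\theta^T}(\theta)\Big| = O_p(1)$ given $\sup_g(\|Y_g\|) = O(1)$, and hence we can prove that $Q_n(\theta) - Q(\theta)$ is stochastically equicontinuous. Q.E.D.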

LEMMA 3: Under the assumptions in Theorem 2,

\[
\sup_g\Big\|\frac{\partial^2\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\theta\,\partial\theta^T}\Big\| < \infty.
\]

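Proof: From Lemma 2, we know that

\[
\begin{aligned}
\frac{\partial\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\beta^T}
={}& \phi\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}\Big)\,\Phi\Bigg(\frac{\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}} - \rho_g\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}}{\sqrt{1-\rho_g^2}}\Bigg)\frac{X_{g1}}{\sqrt{\Omega_{g11}}} \\
&+ \phi\Big(\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}\Big)\,\Phi\Bigg(\frac{\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}} - \rho_g\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}}{\sqrt{1-\rho_g^2}}\Bigg)\frac{X_{g2}}{\sqrt{\Omega_{g22}}}
\end{aligned} \tag{118}
\]

\[
\begin{aligned}
\Rightarrow\; & \frac{\partial^2\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\beta\,\partial\beta^T} \\
={}& \frac{X_{g1}^T}{\sqrt{\Omega_{g11}}}\,\phi\Big(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}\Big)\Bigg\{-\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}\,\Phi\Bigg(\frac{\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}} - \rho_g\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}}{\sqrt{1-\rho_g^2}}\Bigg)\frac{X_{g1}}{\sqrt{\Omega_{g11}}}
 + \phi\Bigg(\frac{\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}} - \rho_g\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}}{\sqrt{1-\rho_g^2}}\Bigg)\frac{\frac{X_{g2}}{\sqrt{\Omega_{g22}}} - \rho_g\frac{X_{g1}}{\sqrt{\Omega_{g11}}}}{\sqrt{1-\rho_g^2}}\Bigg\} \\
&+ \frac{X_{g2}^T}{\sqrt{\Omega_{g22}}}\,\phi\Big(\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}\Big)\Bigg\{-\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}\,\Phi\Bigg(\frac{\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}} - \rho_g\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}}{\sqrt{1-\rho_g^2}}\Bigg)\frac{X_{g2}}{\sqrt{\Omega_{g22}}}
 + \phi\Bigg(\frac{\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}} - \rho_g\frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}}{\sqrt{1-\rho_g^2}}\Bigg)\frac{\frac{X_{g1}}{\sqrt{\Omega_{g11}}} - \rho_g\frac{X_{g2}}{\sqrt{\Omega_{g22}}}}{\sqrt{1-\rho_g^2}}\Bigg\},
\end{aligned} \tag{119}
\]

and even though the above expression is complicated, it is easy to see that all the terms are bounded provided the assumptions in Theorem 2 hold. This is equivalent to

\[
\sup_g\Big\|\frac{\partial^2\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\beta\,\partial\beta^T}\Big\| < \infty. \tag{120}
\]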


 

 

 

 

 

 

 

 

 

6x1
p) 642(821 822 p)
”__9 g 9 g
“11:11 (mg g22 x/an x/QgZZ
, (121)
Xger XgZﬂ (2’2
¢2( Pg)x
x/lel r/ngz
2
6 P Y =1,Y =1 X
:> 1‘( g1 g2 l g)
5,12
{3852 2
_ —) (<D2— mm m 52¢2 (122)
<¢2r2 “’2 642

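It is easy to see that the first term of the above equation is bounded from previous results (i.e. $\sup_g\Big\|\frac{\partial\Phi_2}{\partial\lambda}\Big\| < \infty$), and the second term can also be proved to be bounded, since $\frac{\partial^2\Phi_2}{\partial\lambda^2}$ can be shown to be bounded after some algebra, given that $\sup_g\|W_g\| < \infty$. Hence $\sup_g\Big\|\frac{\partial^2\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)}{\partial\theta\,\partial\theta^T}\Big\| < \infty$. Q.E.D.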

 

LEMMA 4: Under the assumptions in Theorem 2, $E\,T_{na_n} - 1 = o(1)$, where $T_{na_n} \equiv \prod_{j=1}^{a_n}(1 + i\gamma D_{n,j})$.

Proof: By definition, $T_{na_n} \equiv \prod_{j=1}^{a_n}(1 + i\gamma D_{n,j}) = T_{n,a_n-1} + i\gamma T_{n,a_n-1}D_{na_n}$. By repeatedly multiplying out, we finally get $T_{na_n} = 1 + i\gamma\sum_{j=1}^{a_n}T_{n,j-1}D_{nj}$. Hence,

\[
E\,T_{na_n} - 1 = E\Big(i\gamma\sum_{j=1}^{a_n}T_{n,j-1}D_{nj}\Big). \tag{123}
\]
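The expansion $T_{na_n} = 1 + i\gamma\sum_j T_{n,j-1}D_{nj}$ is a purely algebraic identity, so it can be checked on arbitrary numbers. The sketch below does this with Python complex arithmetic; the function names and the randomly drawn values standing in for $D_{n,j}$ are illustrative assumptions.

```python
import random

def product_form(d, gamma):
    """T = prod_j (1 + i*gamma*d_j)."""
    t = 1 + 0j
    for x in d:
        t *= 1 + 1j * gamma * x
    return t

def telescoped_form(d, gamma):
    """1 + i*gamma * sum_j T_{j-1} d_j, with T_0 = 1 (the empty product)."""
    total = 1 + 0j
    t_prev = 1 + 0j
    for x in d:
        total += 1j * gamma * t_prev * x   # add the j-th term of the sum
        t_prev *= 1 + 1j * gamma * x       # update T_{j-1} to T_j
    return total

rng = random.Random(1)
d = [rng.uniform(-0.5, 0.5) for _ in range(50)]
print(abs(product_form(d, 0.7) - telescoped_form(d, 0.7)))
```

The printed difference should be zero up to floating-point rounding, which confirms that (123) follows from the definition of $T_{na_n}$ alone.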

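In order to prove $E\,T_{na_n} - 1 = o(1)$, we just need to show that $E\big(i\gamma\sum_{j=1}^{a_n}T_{n,j-1}D_{nj}\big) = o(1)$. This is equivalent to proving that $E(T_{n,j-1}D_{nj}) = o(a_n^{-1})$. We can rewrite $T_{n,j-1}$ as $T_{n,j-1} = \prod_{k=1}^{j-1}(1 + i\gamma D_{n,k})$. We know there are $j-1$ groups of $D_{n,k}$ in $T_{n,j-1}$. We split these $j-1$ groups into two parts: groups adjacent to group $j$, and groups that are not adjacent to group $j$. We then define the area $\Xi_{nj-1}$ as the area which is adjacent to group $j$. Therefore,

\[
T_{n,j-1} = \prod_{k\in\Xi_{nj-1}}(1 + i\gamma D_{n,k})\prod_{k\notin\Xi_{nj-1}}(1 + i\gamma D_{n,k}) = \prod_{k\in\Xi_{nj-1}}(1 + i\gamma D_{n,k})\,\tilde T_{nj},
\]

where $\tilde T_{nj} \equiv \prod_{k\notin\Xi_{nj-1}}(1 + i\gamma D_{n,k})$, which includes the groups which are not adjacent to group $j$.

Since $T_{n,j-1} = \prod_{k\in\Xi_{nj-1}}(1 + i\gamma D_{n,k})\,\tilde T_{nj}$, we just need to prove

\[
E\Big[D_{nj}\prod_{k\in\Xi_{nj-1}}(1 + i\gamma D_{n,k})\,\tilde T_{nj}\Big] = E\Big[D_{nj}\tilde T_{nj}\Big(\prod_{k\in\Xi_{nj-1}}(1 + i\gamma D_{n,k})\Big)\Big] = o(a_n^{-1}). \tag{124}
\]

We know that

\[
E\Big[D_{nj}\tilde T_{nj}\Big(\prod_{k\in\Xi_{nj-1}}(1 + i\gamma D_{n,k})\Big)\Big] = E\Big[D_{nj}\tilde T_{nj}\Big(1 + i\gamma\sum_{k\in\Xi_{nj-1}}T_{n,k-1}D_{nk}\Big)\Big] \tag{125}
\]

\[
= E[D_{nj}\tilde T_{nj}] + E\Big[D_{nj}\tilde T_{nj}\Big(i\gamma\sum_{k\in\Xi_{nj-1}}T_{n,k-1}D_{nk}\Big)\Big]. \tag{126}
\]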

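First, we look at the term $E[D_{nj}\tilde T_{nj}]$. Since $\tilde T_{nj} \equiv \prod_{k\notin\Xi_{nj-1}}(1 + i\gamma D_{n,k})$, the groups it contains are not adjacent to group $j$. By Bernstein's method, we split the region in such a way that the distance between group $j$ and any non-adjacent group is at least $b_n^{1/2}$. Hence, $\max|E[D_{nj}\tilde T_{nj}]| = \max|cov(D_{nj}, \tilde T_{nj})| = O(\alpha(\sqrt{b_n}))$ provided $E(D_{nj}) = 0$ and by assumption (vi) in Theorem 1. By construction of $a_n$ and $b_n$, $\alpha(\sqrt{b_n})\,a_n = o(1)$, and hence we obtain $\max|E[D_{nj}\tilde T_{nj}]| = o(a_n^{-1})$.

Second, we look at the term $E\big[D_{nj}\tilde T_{nj}\big(i\gamma\sum_{k\in\Xi_{nj-1}}T_{n,k-1}D_{nk}\big)\big]$. We have that

\[
E\Big[D_{nj}\tilde T_{nj}\Big(i\gamma\sum_{k\in\Xi_{nj-1}}T_{n,k-1}D_{nk}\Big)\Big] = i\gamma\sum_{k\in\Xi_{nj-1}}E\big[D_{nj}\tilde T_{nj}T_{n,k-1}D_{nk}\big]. \tag{127}
\]

Consider $E[D_{nj}\tilde T_{nj}D_{nk}]$ first. We know that $E[D_{nj}\tilde T_{nj}D_{nk}] = cov(D_{nj}, \tilde T_{nj}D_{nk})$ provided $E(D_{nj}) = 0$. Moreover, $cov(D_{nj}, \tilde T_{nj}D_{nk}) \to cov(D_{nj}, \tilde T_{nj})$ as $n \to \infty$, because $\tilde T_{nj}$ gets more and more terms (all groups not adjacent to group $j$), while $D_{nk}$ keeps the same amount.

In the first step, we have proved that $cov(D_{nj}, \tilde T_{nj}) = o(a_n^{-1})$, and by the same argument $cov(D_{nj}, \tilde T_{nj}D_{nk}) = o(a_n^{-1})$.

Therefore, we can prove that $E(T_{n,j-1}D_{nj}) = o(a_n^{-1})$, which implies $E\,T_{na_n} - 1 = o(1)$. Q.E.D.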

LEMMA 5: Under the assumptions in Theorem 2, $\sum_{i\neq j}E(D_{ni}D_{nj}) = o(1)$.

Proof: We know that $\sum_{i\neq j}E(D_{ni}D_{nj}) = \sum_{i=1}^{a_n}\sum_{j=1}^{a_n}E(D_{ni}D_{nj}) - \sum_{i=j}E(D_{ni}D_{nj}) = o(1)$ if we can show that $\max_i\sum_{j\neq i}|E(D_{ni}D_{nj})| = o(a_n^{-1})$. This is equivalent to proving $\sum_{i\neq j}E(D_{ni}D_{nj}) = o(1)$ because the summation over $j$ contains $a_n - 1$ terms.

Define $\Xi_{nil}$ as the set of indices corresponding to blocks that are $l$ blocks removed in every direction from block $i$. In other words, we assume there are no more than $8l$ blocks within distance $l$. Hence,

\[
\max_i\sum_{j\neq i}|E(D_{ni}D_{nj})| \le \max_i\sum_{l=1}^{a_n}\sum_{j\in\Xi_{nil}}|E(D_{ni}D_{nj})| \tag{128}
\]

\[
\le \max_i\sum_{j\in\Xi_{ni1}}|E(D_{ni}D_{nj})| + \max_i\sum_{l=2}^{\sqrt{a_n}}\sum_{j\in\Xi_{nil}}|E(D_{ni}D_{nj})|. \tag{129}
\]

The first term is proved to be $o(n^{-1}b_n) = o(a_n^{-1})$ in Lemma 6. The second term can also be proved to be $o(a_n^{-1})$ in Lemma 7. Q.E.D.

LEMMA 6: Under the assumptions in Theorem 2, $\max_i\sum_{j\in\Xi_{ni1}}|E(D_{ni}D_{nj})| = o(n^{-1}b_n) = o(a_n^{-1})$.

Proof: Since $D_{n,j} = n^{-1/2}\sum_{t\in\Lambda_{nj}}A_{nt}$ by definition,

\[
\max_i\sum_{j\in\Xi_{ni1}}|E(D_{ni}D_{nj})| = \max_{i\neq j}\Big|n^{-1}\sum_{s\in\Lambda_{ni},\,t\in\Lambda_{nj}}E(A_{ns}A_{nt})\Big| \tag{130}
\]

\[
\le \max_{i\neq j}C_1n^{-1}\sum_{s\in\Lambda_{ni},\,t\in\Lambda_{nj}}\alpha(d_{ts}), \tag{131}
\]

because $E(A_{ns}A_{nt}) = Cov(A_{ns}, A_{nt}) = C_1\alpha(d_{ts})$, where $C_1 > 0$.

To compute the upper bound of the correlation between $i$ and $j$, we just need to consider the strongest case, e.g. when $i$ and $j$ are adjacent to each other. By Bernstein's blocking method, the number of $(t, s)$ combinations that are within distance $d$ is bounded by $C_2b_nd^2$, where $C_2 > 0$. Hence we can get

\[
\max_{i\neq j}C_1n^{-1}\sum_{s\in\Lambda_{ni},\,t\in\Lambda_{nj}}\alpha(d_{ts}) \le C_3\max_{i\neq j}n^{-1}b_n\sum_{d=0}^{C_4\sqrt{b_n}}d^2\alpha(d), \tag{132}
\]

where $C_3 = C_1C_2$ and $C_4 > 0$.

By assumption (ii) in Theorem 2, $d^2\alpha(d) \to 0$ as $d \to \infty$. Therefore,

\[
C_3\max_{i\neq j}n^{-1}b_n\sum_{d=0}^{C_4\sqrt{b_n}}d^2\alpha(d) = o(n^{-1}b_n). \tag{133}
\]

Since $a_nb_n = n$ by construction, $o(n^{-1}b_n) = o(a_n^{-1})$. Q.E.D.

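LEMMA 7: Under the assumptions in Theorem 2,

\[
\max_i\sum_{l=2}^{\sqrt{a_n}}\sum_{j\in\Xi_{nil}}|E(D_{ni}D_{nj})| = o(a_n^{-1}).
\]

Proof: Because $\max_{j\in\Xi_{nil}} \times \max_{s\in\Lambda_{ni}} \times \max_{t\in\Lambda_{nj}}|E(A_{ns}A_{nt})| = O(\alpha(\sqrt{b_n}(l-1)))$, we have that

\[
\max_i\sum_{l=2}^{\sqrt{a_n}}\sum_{j\in\Xi_{nil}}|E(D_{ni}D_{nj})| \le C_5\max_i\sum_{l=2}^{\sqrt{a_n}}n^{-1}\,\#\Xi_{nil} \times \#\Lambda_{ni} \times \#\Lambda_{nj}\,\alpha(\sqrt{b_n}(l-1)) \tag{134}
\]

\[
\le C_6n^{-1}b_n^2\sum_{l=1}^{\sqrt{a_n}}l\,\alpha(\sqrt{b_n}\,l) = o\Big(n^{-1}b_n\sum_{l=1}^{\sqrt{a_n}}\alpha(l)\Big) = o(n^{-1}b_n) \tag{135}
\]

\[
= o(a_n^{-1}), \tag{136}
\]

where # denotes the number of objects, and $o\big(n^{-1}b_n\sum_{l=1}^{\sqrt{a_n}}\alpha(l)\big) = o(n^{-1}b_n)$ follows from assumption (i): as $d \to \infty$, $\frac{d^2\alpha(d\,d^*)}{\alpha(d^*)} \to 0$. Q.E.D.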
a(d ) _
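LEMMA 8: Under the assumptions in Theorem 2, $\sum_{j=1}^{a_n}D_{n,j}^2 = \sum_{j=1}^{a_n}E(D_{n,j}^2) + o_p(1)$.

Proof: In order to prove $\sum_{j=1}^{a_n}D_{n,j}^2 = \sum_{j=1}^{a_n}E(D_{n,j}^2) + o_p(1)$, it suffices to show that

\[
\sum_{i=1}^{a_n}\sum_{j=1}^{a_n}Cov(D_{n,i}^2, D_{n,j}^2) = o(1). \tag{137}
\]

We have that

\[
\sum_{i=1}^{a_n}\sum_{j=1}^{a_n}Cov(D_{n,i}^2, D_{n,j}^2) = \sum_{i=1}^{a_n}\sum_{j=1}^{a_n}E\big\{[D_{n,i}^2 - E(D_{n,i}^2)][D_{n,j}^2 - E(D_{n,j}^2)]\big\} \tag{138}
\]

\[
\le C_7\sum_{l=0}^{C_8\sqrt{a_n}}(l+1)\,\alpha(\sqrt{b_n}\,l)\max_iE(D_{n,i}^4), \tag{139}
\]

where $C_7, C_8 > 0$ are large enough. Also

\[
\begin{aligned}
\max_iE(D_{n,i}^4) &\le n^{-2}\max_j\sum_{t_1,t_2,t_3,t_4\in\Lambda_{nj}}|E[A_{nt_1}A_{nt_2}A_{nt_3}A_{nt_4}]| && (140) \\
&\le C_9n^{-2}\max_j\sum_{t_1,t_2,t_3,t_4\in\Lambda_{nj}}\{\alpha(d_{t_1,t_2}) + \cdots + \alpha(d_{t_3,t_4})\} && (141) \\
&\le C_{10}n^{-2}\max_j\sum_{t_1,t_2\in\Lambda_{nj}}\{\alpha(d_{t_1,t_2})\} && (142) \\
&\le C_{11}n^{-2}b_n^2\max_j\sum_{t_1\in\Lambda_{nj}}\sum_{l=0}^{C_{12}\sqrt{b_n}}l\,\alpha(l) = O(n^{-2}b_n^3), && (143)
\end{aligned}
\]

where $C_9, C_{10}, C_{11}, C_{12} > 0$ and $\sup\big|\sum_{l=0}^{\infty}l\,\alpha(l)\big| < \infty$. Therefore, finally,

\[
C_7\sum_{l=0}^{C_8\sqrt{a_n}}(l+1)\,\alpha(\sqrt{b_n}\,l)\max_iE(D_{n,i}^4) = O(n^{-2}b_n^3a_n) = o(1), \tag{144}
\]

because $a_nb_n = n$ and $n^{-1}b_n^2 \to 0$ as $n \to \infty$. Q.E.D.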

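Finally, the following Lemma 9 generalizes the results of Pinkse and Slade (1998) as a way to obtain consistent estimates of the variance-covariance matrix.

LEMMA 9: If the assumptions in Theorem 2 hold, and $\sup_g\Big\|\frac{\partial\Phi_4}{\partial\theta}\Big\| + \sup_g\Big\|\frac{\partial\Phi_3}{\partial\theta}\Big\| < \infty$, then $A_n(\hat\theta) - A(\theta_0) = o_p(1)$ and $B_n(\hat\theta) - B(\theta_0) = o_p(1)$, where $B_n(\theta) \equiv nE[S_n(\theta)S_n^T(\theta)]$ and $A_n(\theta) \equiv -E[H(\theta)]$.

Proof: First, we prove that $A_n(\hat\theta) - A(\theta_0) = o_p(1)$. We know that $A_n(\hat\theta) = -\frac{1}{n}\sum_{g=1}^{n}H_g(\hat\theta)$, and by definition, $\lim_{n\to\infty}A_n(\theta_0) = A(\theta_0)$. So we just need to prove that $\omega^T\{A_n(\hat\theta) - \lim_{n\to\infty}A_n(\theta_0)\} = o_p(1)$ for all $\|\omega\| = 1$. From the proof of Theorem 2, we have already proved that

\[
-\frac{1}{n}\sum_{g=1}^{n}H_g(\hat\theta) \to -\frac{1}{n}\sum_{g=1}^{n}H_g(\theta_0) \tag{145}
\]

as $n \to \infty$, provided that $\hat\theta - \theta_0 = o_p(1)$, which is proved in Theorem 1. Therefore, we can get $A_n(\hat\theta) - A(\theta_0) = o_p(1)$.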
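Second, we consider how to show $B_n(\hat\theta) - B(\theta_0) = o_p(1)$. As before, it is sufficient to show that $B_n(\hat\theta) - B(\theta_0) = o_p(1)$ as $n \to \infty$. We know that $B_n(\theta_0) = nE[S_n(\theta_0)S_n^T(\theta_0)] = nVar(S_n(\theta_0))$ given $E[S_n(\theta_0)] = 0$. Recall from the proof of Theorem 2 that

\[
\begin{aligned}
S_n(\theta_0) = \frac{1}{n}\sum_{g=1}^{n}\Big\{ & Y_{g1}Y_{g2}\,\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) + Y_{g1}(1-Y_{g2})\,\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) \\
&+ (1-Y_{g1})Y_{g2}\,\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + (1-Y_{g1})(1-Y_{g2})\,\frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big\},
\end{aligned} \tag{146}
\]

and we can rewrite it as

\[
\begin{aligned}
S_n(\theta_0) = \frac{1}{n}\sum_{g=1}^{n}\Big\{ & Y_{g1}Y_{g2}\Big[\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) - \frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big] \\
&+ Y_{g1}\Big[\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big] + Y_{g2}\Big[\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big] \\
&+ \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big\}.
\end{aligned} \tag{147}
\]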
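For the sake of brevity, we redefine

\[
\psi_{ng1} = \Big[\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) - \frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big], \tag{148}
\]

\[
\psi_{ng2} = \Big[\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big], \tag{149}
\]

\[
\psi_{ng3} = \Big[\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\Big], \tag{150}
\]

\[
\psi_{ng4} = \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0). \tag{151}
\]

Therefore,

\[
\begin{aligned}
Var(S_n(\theta_0)) = n^{-1}B_n(\theta_0) = \frac{1}{n^2}\sum_{g=1}^{n}\sum_{j=1}^{n}\big\{ & \psi_{ng1}\psi_{nj1}Cov(Y_{g1}Y_{g2}, Y_{j1}Y_{j2}) + 2\psi_{ng1}\psi_{nj2}Cov(Y_{g1}Y_{g2}, Y_{j1}) \\
&+ 2\psi_{ng1}\psi_{nj3}Cov(Y_{g1}Y_{g2}, Y_{j2}) + \psi_{ng2}\psi_{nj2}Cov(Y_{g1}, Y_{j1}) \\
&+ 2\psi_{ng2}\psi_{nj3}Cov(Y_{g1}, Y_{j2}) + \psi_{ng3}\psi_{nj3}Cov(Y_{g2}, Y_{j2})\big\},
\end{aligned} \tag{152}
\]

where $\psi_{nj1}$, $\psi_{nj2}$, $\psi_{nj3}$ are defined similarly to $\psi_{ng1}$, $\psi_{ng2}$, $\psi_{ng3}$.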

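As before, we just need to provide the proof for one of these terms, and the same logic applies to the other terms. We consider the most complicated term; the rest follow the same argument:

\[
n^{-1}\sum_{g=1}^{n}\sum_{j=1}^{n}\big[\psi_{ng1}\psi_{nj1}Cov(Y_{g1}Y_{g2}, Y_{j1}Y_{j2})\big]
= n^{-1}\sum_{g=1}^{n}\sum_{j=1}^{n}\psi_{ng1}\psi_{nj1}\big[E(Y_{g1}Y_{g2}Y_{j1}Y_{j2}) - E(Y_{g1}Y_{g2})E(Y_{j1}Y_{j2})\big]. \tag{153}
\]

Note that

\[
E(Y_{g1}Y_{g2}Y_{j1}Y_{j2}) = \Pr(Y_{g1} = 1, Y_{g2} = 1, Y_{j1} = 1, Y_{j2} = 1 \mid X) \tag{154}
\]

\[
= \Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}, \rho_{13}, \rho_{14}, \rho_{23}, \rho_{24}, \rho_{34}), \tag{155}
\]

where $\Phi_4$ is the cdf of the quadvariate standard normal distribution, $y_{g1} = \frac{Y_{g1}}{\sqrt{Var(Y_{g1})}}$, etc. Similarly,

\[
E(Y_{g1}Y_{g2}) = \Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = \Phi_2(y_{g1}, y_{g2}, \rho_{12}), \tag{156}
\]

\[
E(Y_{j1}Y_{j2}) = \Pr(Y_{j1} = 1, Y_{j2} = 1 \mid X_j) = \Phi_2(y_{j1}, y_{j2}, \rho_{34}), \tag{157}
\]

and therefore,

\[
E(Y_{g1}Y_{g2})E(Y_{j1}Y_{j2}) = \Phi_2(y_{g1}, y_{g2}, \rho_{12}) \times \Phi_2(y_{j1}, y_{j2}, \rho_{34}) \tag{158}
\]

\[
= \Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}, 0, 0, 0, 0, \rho_{34}), \tag{159}
\]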
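so we can write the first term of $B_n(\theta_0)$ as

\[
n^{-1}\sum_{g=1}^{n}\sum_{j=1}^{n}\psi_{ng1}\psi_{nj1}\big[E(Y_{g1}Y_{g2}Y_{j1}Y_{j2}) - E(Y_{g1}Y_{g2})E(Y_{j1}Y_{j2})\big] \tag{160}
\]

\[
\begin{aligned}
= n^{-1}\sum_{g=1}^{n}\sum_{j=1}^{n}\psi_{ng1}\psi_{nj1}\big[ & \Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta_0), \rho_{13}(\theta_0), \rho_{14}(\theta_0), \rho_{23}(\theta_0), \rho_{24}(\theta_0), \rho_{34}(\theta_0)) \\
&- \Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta_0), 0, 0, 0, 0, \rho_{34}(\theta_0))\big].
\end{aligned} \tag{161}
\]

Similarly, we can write the first term of $B_n(\hat\theta)$ as

\[
\begin{aligned}
n^{-1}\sum_{g=1}^{n}\sum_{j=1}^{n}\hat\psi_{ng1}\hat\psi_{nj1}\big[ & \Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\hat\theta), \rho_{13}(\hat\theta), \rho_{14}(\hat\theta), \rho_{23}(\hat\theta), \rho_{24}(\hat\theta), \rho_{34}(\hat\theta)) \\
&- \Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\hat\theta), 0, 0, 0, 0, \rho_{34}(\hat\theta))\big].
\end{aligned} \tag{162}
\]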

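By the mean value theorem, the first term of $B_n(\hat\theta) - B(\theta_0)$ is given as

\[
\begin{aligned}
n^{-1}\sum_{g=1}^{n}\sum_{j=1}^{n}\psi_{ng1}\psi_{nj1}\big\{ & \big[\Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\hat\theta), \rho_{13}(\hat\theta), \rho_{14}(\hat\theta), \rho_{23}(\hat\theta), \rho_{24}(\hat\theta), \rho_{34}(\hat\theta)) \\
&\;\; - \Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta_0), \rho_{13}(\theta_0), \rho_{14}(\theta_0), \rho_{23}(\theta_0), \rho_{24}(\theta_0), \rho_{34}(\theta_0))\big] \\
&- \big[\Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\hat\theta), 0, 0, 0, 0, \rho_{34}(\hat\theta)) \\
&\;\; - \Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta_0), 0, 0, 0, 0, \rho_{34}(\theta_0))\big]\big\}
\end{aligned} \tag{163}
\]

\[
\begin{aligned}
= n^{-1}(\hat\theta - \theta_0)\sum_{g=1}^{n}\sum_{j=1}^{n}\psi_{ng1}\psi_{nj1}\Big\{ & \frac{\partial\Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta^*), \rho_{13}(\theta^*), \rho_{14}(\theta^*), \rho_{23}(\theta^*), \rho_{24}(\theta^*), \rho_{34}(\theta^*))}{\partial\theta} \\
&- \frac{\partial\Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta^*), 0, 0, 0, 0, \rho_{34}(\theta^*))}{\partial\theta}\Big\}.
\end{aligned} \tag{164}
\]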

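Since $\sup_g\|\psi_{ng1}\| < \infty$ by the proof in Theorem 2, we just need to assume

\[
\sup_g\Big\|\frac{\partial\Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta^*), \rho_{13}(\theta^*), \rho_{14}(\theta^*), \rho_{23}(\theta^*), \rho_{24}(\theta^*), \rho_{34}(\theta^*))}{\partial\theta}\Big\| < \infty, \tag{165}
\]

and the same argument applies to

\[
\sup_g\Big\|\frac{\partial\Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta^*), 0, 0, 0, 0, \rho_{34}(\theta^*))}{\partial\theta}\Big\| < \infty, \tag{166}
\]

so that

\[
\begin{aligned}
n^{-1}(\hat\theta - \theta_0)\sum_{g=1}^{n}\sum_{j=1}^{n}\psi_{ng1}\psi_{nj1}\Big\{ & \frac{\partial\Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta^*), \rho_{13}(\theta^*), \rho_{14}(\theta^*), \rho_{23}(\theta^*), \rho_{24}(\theta^*), \rho_{34}(\theta^*))}{\partial\theta} \\
&- \frac{\partial\Phi_4(y_{g1}, y_{g2}, y_{j1}, y_{j2}, \rho_{12}(\theta^*), 0, 0, 0, 0, \rho_{34}(\theta^*))}{\partial\theta}\Big\} \to 0,
\end{aligned} \tag{167}
\]

because $(\hat\theta - \theta_0) \to 0$ and the other terms are bounded.

Repeating the proof for the other terms, together with the additional boundedness assumption stated in the lemma, we can prove $B_n(\hat\theta) - B(\theta_0) = o_p(1)$. Q.E.D.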

 

 

APPENDIX II

TABLE 2.1: Simulation Results of Different Estimators of lambda in the Context of the Bivariate Spatial Probit Model*

                      λ = 0.2             λ = 0.4             λ = 0.6             λ = 0.8
                   HPE      PMLE       HPE      PMLE       HPE      PMLE       HPE      PMLE

N=500   mean      3.938    0.514      6.177    0.519      7.698    0.571      7.735    0.634
        bias      3.738    0.314      5.777    0.319      7.098   -0.029      6.935   -0.166
        (s.d.)  (12.158)  (0.120)   (15.776)  (0.205)   (16.929)  (0.151)   (16.202)  (0.289)

N=1000  mean      3.174    0.512      4.668    0.518      5.456    0.581      5.914    0.672
        bias      2.974    0.312      4.268    0.118      4.856   -0.019      5.114   -0.128
        (s.d.)   (8.844)  (0.107)    (9.100)  (0.133)    (9.631)  (0.149)   (10.173)  (0.276)

N=1500  mean      2.746    0.511      4.050    0.507      4.872    0.609      5.426    0.708
        bias      2.546    0.311      3.650    0.107      4.272    0.009      4.626   -0.092
        (s.d.)   (6.423)  (0.099)    (7.414)  (0.124)    (8.598)  (0.149)    (8.514)  (0.253)

* Results are presented for our new Partial Maximum Likelihood Estimator (PMLE) and the Heteroskedastic Probit Estimator (HPE) of λ. Numbers in brackets show standard deviations (s.d.).
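The bias rows in Table 2.1 are the reported mean estimate minus the true value of λ used in the simulation. As a small worked check, the sketch below recomputes the bias for the λ = 0.2 columns from the reported means; the helper name `bias` and the dictionary layout are illustrative choices, and only numbers printed in the table are used.

```python
def bias(mean_estimate, true_value):
    """Simulation bias: average estimate minus the true parameter value."""
    return mean_estimate - true_value

TRUE_LAMBDA = 0.2

# (HPE mean, PMLE mean) as reported in Table 2.1 for lambda = 0.2
reported_means = {
    500: (3.938, 0.514),
    1000: (3.174, 0.512),
    1500: (2.746, 0.511),
}

for n, (hpe_mean, pmle_mean) in reported_means.items():
    print(n, round(bias(hpe_mean, TRUE_LAMBDA), 3), round(bias(pmle_mean, TRUE_LAMBDA), 3))
```

The recomputed values can be compared line by line with the bias rows of the λ = 0.2 columns.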


 

 

TABLE 2.2: Simulation Results of Different Estimators of betas in the Context of the Bivariate Spatial Probit Model*

                                 β1 = 1              β2 = 1              β3 = 1
                              HPE      PMLE       HPE      PMLE       HPE      PMLE

λ = 0.2  N=500   mean        5.322    2.618      5.333    2.619      5.329    2.623
                 (s.d.)     (8.844)  (0.839)    (8.872)  (0.855)    (8.863)  (0.870)
         N=1000  mean        5.308    2.616      5.296    2.616      5.289    2.618
                 (s.d.)     (7.612)  (0.560)    (7.570)  (0.560)    (7.568)  (0.564)
         N=1500  mean        5.247    2.604      5.239    2.602      5.235    2.604
                 (s.d.)     (6.624)  (0.540)    (6.606)  (0.536)    (6.613)  (0.543)

λ = 0.4  N=500   mean        3.610    1.329      3.614    1.329      3.608    1.328
                 (s.d.)     (5.305)  (0.362)    (5.311)  (0.365)    (5.290)  (0.366)
         N=1000  mean        3.600    1.318      3.593    1.316      3.588    1.315
                 (s.d.)     (4.192)  (0.355)    (4.177)  (0.355)    (4.178)  (0.353)
         N=1500  mean        3.456    1.281      3.441    1.281      3.438    1.278
                 (s.d.)     (3.818)  (0.342)    (3.793)  (0.343)    (3.798)  (0.339)

λ = 0.6  N=500   mean        2.898    0.972      2.876    0.966      2.885    0.969
                 (s.d.)     (3.761)  (0.271)    (3.723)  (0.268)    (3.735)  (0.271)
         N=1000  mean        2.669    0.981      2.669    0.979      2.657    0.978
                 (s.d.)     (2.951)  (0.261)    (2.953)  (0.261)    (2.916)  (0.259)
         N=1500  mean        2.508    1.016      2.499    1.015      2.501    1.016
                 (s.d.)     (2.726)  (0.250)    (2.706)  (0.250)    (2.708)  (0.253)

λ = 0.8  N=500   mean        2.246    0.805      2.237    0.801      2.249    0.802
                 (s.d.)     (2.810)  (0.373)    (2.803)  (0.373)    (2.841)  (0.392)
         N=1000  mean        2.098    0.843      2.096    0.843      2.082    0.843
                 (s.d.)     (2.281)  (0.349)    (2.279)  (0.349)    (2.246)  (0.340)
         N=1500  mean        2.086    0.884      2.096    0.886      2.094    0.886
                 (s.d.)     (2.059)  (0.316)    (2.071)  (0.314)    (2.073)  (0.318)

* Results are presented for our new Partial Maximum Likelihood Estimator (PMLE) and the Heteroskedastic Probit Estimator (HPE) of β1, β2 and β3. Numbers in brackets show standard deviations (s.d.).

BIBLIOGRAPHY

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance
matrix estimation. Econometrica: 59, 3, 817-858.

Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal.
Econometrica: 41, 997-1016.

Anselin, L. (1988). Spatial econometrics: methods and models. Kluwer Academic
Publishers.

Anselin, L. and Florax, R.J.G.M. (1995). New directions in spatial econometrics.
Springer-Verlag, Berlin, Germany.

Anselin, L., Florax, R.J.G.M. and Rey, S.J. (2004). Econometrics for spatial models: recent
advances, in Advances in spatial econometrics. Springer-Verlag, Berlin,
Germany, 1-28.

Beron, K.J. and Vijverberg, W.P. (2003). Probit in a spatial context: A Monte Carlo
approach, in Advances in spatial econometrics. Springer-Verlag, Berlin, Germany,
169-196.

Bernstein, S. (1927). Sur l'extension du théorème limite du calcul des probabilités aux sommes
de quantités dépendantes. Mathematische Annalen: 97, 1-59.

Case, A.C. (1991). Spatial patterns in household demand. Econometrica: 59, 953-965.

Case, A.C. (1992). Neighborhood inﬂuence and technology change. Regional Science and
Urban Economics 22, 491-508.

Conley, T. G. (1999). GMM estimation with cross sectional dependence. Journal of
Econometrics: 92, 1-45.

Davidson, J. (1994). Stochastic limit theory. Oxford: Oxford University Press.

Fleming, M. M. (2005). Techniques for estimating spatially dependent discrete choice
models, in Advances in spatial econometrics. Springer-Verlag, Berlin, Germany,
145-168.

Gourieroux, C. (2000). Econometrics of qualitative dependent variables. Cambridge
University Press.

Greene, W.H. (2003). Econometric analysis. 4th Edition, Prentice-Hall, Upper Saddle
River, NJ.

Harvey, A. (1976). Estimating regression models with multiplicative heteroscedasticity.
Econometrica : 44, 461-465.

Kelejian, H.H. and Prucha, I. R. (1999). A generalized moments estimator for the
autoregressive parameter in a spatial model. International Economic Review: 40,
509-533.

Kelejian, H.H. and Prucha, I. R. (2001). On the asymptotic distribution of the Moran I test
statistic with applications. Journal of Econometrics: 104, 219-257.

Kotz, S. Balakrishnan, N. and Johnson, N. (2000). Continuous multivariate distributions,
2nd Edition. Wiley Series in Probability and Statistics.

Lee, L.-F. (2004). Asymptotic distribution of quasi-maximum likelihood estimators for
spatial autoregressive models. Econometrica: 72, 6, 1899-1925.

LeSage, J. P. (2000). Bayesian estimation of limited dependent variable spatial autoregressive
models. Geographical Analysis: 32, 19-35.

McLeish, D. L. (1974). Dependent central limit theorems and invariance principles.
Annals of Probability: 2, 620-628.

McMillen, D. P. (1995). Spatial effects in probit models: A Monte Carlo investigation, in
New directions in spatial econometrics. Springer-Verlag, Berlin, Germany, 189-228.

McMillen, D. P. (1992). Probit with spatial autocorrelation. Journal of Regional Science:
32, 335-348.

Mukherjea, A. and Stephens, R. (1990). The problem of identification of parameters by the
distribution of the maximum random variable: solution for the trivariate normal case.
Journal of Multivariate Analysis: 34, 95-115.

Newey, W.K. and West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity
and autocorrelation consistent covariance matrix. Econometrica: 55, 703-708.

Newey, W.K. and McFadden, D. (1994). Large sample estimation and hypothesis testing, in
Handbook of Econometrics, Ch. 36, Vol 4, North-Holland, New York.

Pinkse, J., Shen, L. and Slade, M. E. (2007). A central limit theorem for endogenous
locations and complex spatial interactions. Journal of Econometrics: 140, 215-225.

Pinkse, J. and Slade, M. E. (1998). Contracting in space: An application of spatial statistics
to discrete-choice models. Journal of Econometrics: 85, 125-154.

Plackett, R.L. (1954). A reduction formula for normal multivariate integrals. Biometrika:
41, 351-360.

Poirier, D. and Ruud, P. A. (1988). Probit with dependent observations. Review of
Economic Studies: 55, 593-614.

Robinson, P. M. (1982). On the asymptotic properties of estimators of models containing
limited dependent variables. Econometrica: 50, 27-41.

White, H. (2001). Asymptotic theory for econometricians. 2nd Edition. Orlando, FL.
Academic Press.

Wooldridge, J. (2002). Econometric analysis of cross section and panel data. The MIT
Press, Cambridge, Massachusetts.
