Michigan State University

This is to certify that the dissertation entitled SOCIAL LEARNING AND PARAMETER UNCERTAINTY IN IRREVERSIBLE INVESTMENTS AND PARTIAL MAXIMUM LIKELIHOOD ESTIMATION OF A SPATIAL PROBIT MODEL, presented by HONGLIN WANG, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Agricultural Economics and Economics.

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Agricultural Economics and Economics, 2009

ABSTRACT

The dissertation is composed of two essays. The first paper discusses social learning and parameter uncertainty in irreversible investments. The adoption of new technology usually involves irreversible investments where the future payoff is uncertain. In addition, investors often have to contend with a limited understanding of the technology itself, which can be modeled as uncertainty regarding the parameters of the stochastic process describing the future payoff. It is hypothesized that social learning (having previous adopters in the farmer’s social network) increases the probability that the farmer adopts the new technology.
This hypothesis follows from theory: social learning reduces parameter uncertainty, and hence the overall level of risk facing the farmer-investor, which in turn induces investment. The paper tests this hypothesis using Chinese farm household data on the adoption of greenhouses. The latter are of the “intermediate technology” type, made of clay walls, a plastic-sheet roof, and a straw mat roll-out awning for cold nights. The empirical findings of this paper support the hypothesis. It is also found that market volatility discourages adoption. The second paper analyzes a spatial Probit model for cross-sectionally dependent data in a binary choice context. Observations are divided into pairwise groups and bivariate normal distributions are specified within each group. Partial maximum likelihood estimators are introduced and shown to be consistent and asymptotically normal under some regularity conditions. Consistent covariance matrix estimators are also provided. Finally, a simulation study shows the advantages of the new estimation procedure in this setting. The proposed partial maximum likelihood estimators are shown to be more efficient than their generalized method of moments counterparts.

ACKNOWLEDGMENT

I owe my gratitude to all those people who have made this dissertation possible and because of whom my graduate experience has been one that I will cherish forever. My deepest gratitude is to my major advisor in agricultural economics, Dr. Thomas Reardon, and my major advisor in economics, Dr. Jeffrey Wooldridge. I would like to sincerely thank Dr. Reardon for his guidance, understanding, and strong support whenever and for whatever I needed. He encouraged me to pursue a dual degree program, which gave me a wider view of my long-term career path. His extensive international experience, enthusiasm for global development, encouragement, and insight into human society and culture showed me an exciting world to explore further.
I have been amazingly fortunate to have Dr. Wooldridge as my major professor in economics. His wisdom, effective guidance, great teaching, patience, and constructive comments helped me enrich and focus my ideas at different stages. I am grateful to him for holding me to a high research standard. Inspired by Dr. Wooldridge, learning econometrics has become a very enjoyable part of my life. I hope one day I can become a great teacher like Dr. Wooldridge to my students. I would like to extend my special thanks to my committee members, Dr. Songqing Jin, Dr. Emma Iglesias and Dr. John Giles. They always spent a lot of time discussing my work with me and giving very helpful advice. I would also like to express my sincere thanks to Dr. Fan Yu. His insightful critiques, guidance, and careful reviews of my paper helped me overcome difficulties and finish my research. My special thanks also go to Dr. Robert Myers and Dr. Zhengfei Guan, who always offered me their help when I had difficulties in my research. Grateful and sincere thanks also go to Dr. Jikun Huang, Dr. Scott Rozelle and Dr. Linxiu Zhang. I am indebted to them for their guidance on designing the survey questionnaire, support for the field surveys and the trip in China, and comments on the research paper. They have been advising me since I pursued my M.S. in China, and I am thankful for their continuous encouragement, support and inspiration. I am grateful to my graduate colleagues Jinxia Wang, Chengfang Liu, Xiaoxia Dong, Zijun Wang, Haiqing Zhang and Ruijian Chen in China, who worked very hard in assisting with surveys, data collection, validation and data entry. I am grateful for their hard work, shared thoughts, and friendship.
I would like to express gratitude to my graduate fellows Wei Zhang, Yanyan Liu, Zhiying Xu, Feng Song, Fang Xie, Feng Wu, Lili Gao, Wolfgang Pejuan, Kirimi Sindi, Vandana Yadav, Ricardo Hernandez, Kang-Hung Chang and Panutat Satchachai in both the Department of Agricultural Economics and the Department of Economics, for the valuable discussions, ‘happy hours’, and care. I highly value these friendships. They made my stay at MSU a pleasant and unforgettable experience. Finally, and most importantly, I would like to thank my wife Qing Xiang. None of this would have been possible without her love and patience. My wife and my parents, Ruixia Chen and Jinyu Wang, have been a constant source of love, concern, encouragement and strength all these years.

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
Chapter 1: Social Learning and Parameter Uncertainty in Irreversible Investments: Evidence from Greenhouse Adoption in Northern China
1.1 Introduction
1.2 The Theoretical Model Framework
1.3 Greenhouse Intermediate-Technology in Northern China
1.4 Data Description
1.4.1 Sample Selection
1.4.2 Social Learning
1.4.3 Other Household Characteristics
1.5 Empirical Methodology
1.6 Empirical Results
1.6.1 Identification Strategy
1.6.2 Linear Probability Model
1.7 Conclusion
BIBLIOGRAPHY
Chapter 2: Partial Maximum Likelihood Estimation of a Spatial Probit Model
2.1 Introduction
2.2 Discrete Choice Models with Spatial Dependence
2.2.1 Probit Model without Dependence
2.2.2 A Probit Model with Spatial Error Correlation
2.2.3 Probit Models with Other Forms of Spatial Correlation
2.3 Using Partial MLEs to Estimate General Spatial Probit Models
2.3.1 Univariate Probit Partial MLE
2.3.2 Bivariate Probit Partial MLE
2.4 Partial Maximum Likelihood Estimation
2.4.1 Consistency of Bivariate Probit Estimation
2.4.2 Asymptotic Normality
2.4.3 Estimation of Variance-covariance Matrices
2.5 Simulation Study
2.5.1 Simulation Design and Results
2.6 Conclusions
APPENDIX I
A.1 Proofs to Theorems
A.2 Technical Lemmas
APPENDIX II
BIBLIOGRAPHY

LIST OF TABLES
Table 1.1 Descriptive Statistics: Household Level Data
Table 1.2 Greenhouse Adoption and Social Learning: LPM Estimated by 2SLS
Table 1.3 Greenhouse Adoption and Social Learning: First Stage 2SLS Results
Table 1.4 Greenhouse Adoption and Social Learning: LPM with Interaction Terms
Table 1.5 Distance to Neighborhood and Characteristics of Household
TABLE 2.1: Simulation Results of Different Estimators of lambda in the Context of the Bivariate Spatial Probit Model
TABLE 2.2: Simulation Results of Different Estimators of betas in the Context of the Bivariate Spatial Probit Model

LIST OF FIGURES
Figure 1.1 Greenhouse Diffusion Curve at the Household Level
Figure 2.1 N pairwise groups of 2N observations based on Euclidean Distance

Chapter 1: Social Learning and Parameter Uncertainty in Irreversible Investments: Evidence from Greenhouse Adoption in Northern China

1.1 Introduction

Risk and uncertainty have been important themes in the agricultural technology adoption literature since the 1970s. They were included in studies of green revolution technology adoption to explain lagged or partial adoption or even disadoption. Examples include Roumasset (1976) and Feder (1980). This can be seen as part of a wider strand of literature on the economics of risk and uncertainty, and their constraining effects on investment (Newbery and Stiglitz, 1981).

Building on this foundation, the literature has drawn distinctions along two dimensions that are of particular interest here. The first dimension is the modeling of various forms of “information capital” as part of the vector of capital assets in the adoption function. The earliest forms modeled were public information in the form of farmers’ education and access to extension services. Then, and of most interest to us here, came the introduction of personal experience with a technology (“learning by doing”) and observation of neighbors’ experience with the technology (“learning from neighbors”). These were introduced for example in Besley and Case (1994) and Foster and Rosenzweig (1995).
The modeling of “learning from neighbors” has been further refined in recent papers that model “social learning,” such as: (1) Conley and Udry (2001) in their modeling of Ghana farmers’ adoption of fertilizer in pineapple production, conditioned by their incomplete information and communication networks with neighbors; (2) Bandiera and Rasul (2006) in their modeling of Mozambique farmers’ adoption of sunflowers, conditioned by their social network (neighbors and friends who have adopted); and (3) Munshi (2004) in his modeling of Indian farmers’ adoption of HYV of rice and wheat, conditioned by their neighbors’ experiences but differentiated over rice and wheat areas due to the influence of heterogeneous population. This body of work has demonstrated the effects of social learning on technology adoption. In most cases the social learning’s effect on adoption is interpreted as increasing the capacity of the farmer to adopt as well as reducing the farmer’s uncertainty and perception of risk in adoption. The second dimension is the modeling of irreversible investments in capital embodying technology, such as tube wells, greenhouses, and so on. This distinction — between reversible investments such as adoption of an annual crop, a hybrid seed, fertilizer, or a new planting technique - and irreversible investments where the salvage value of the asset is negligible or the asset cannot be transferred or sold, is important in the analysis of risk and uncertainty in technology adoption. Because of incomplete information with respect to the performance, reliability, and appropriateness of agricultural equipment, irreversibility entails substantial risk for the investor (Dixit and Pindyck, 1994, and Sunding and Zilberman, 2000). 
McDonald and Siegel (1986) and Dixit and Pindyck (1994) show that the ability to delay an irreversible investment can be considered as a real option; a higher level of uncertainty regarding future benefits raises the option value and causes the investment decision to deviate from the classical NPV rule. Specifically, investors may rationally delay investment to gain additional information, reduce the level of uncertainty, and increase discounted expected payoffs. This has been modeled in two strands of literature. On the one hand, delayed investment to gain additional information in the face of uncertainty has been studied in the economics literature, inspired by McDonald and Siegel (1986) and Dixit and Pindyck (1994). Examples include Olmstead and Rhode (1993), Zilberman et al. (2004), Hassett and Metcalf (1995), and Nelson and Amegbeto (1998), inter alia. These studies have tended to assume that all parameters of the dynamic process are known to agents, and that the only uncertainty in the model comes from the future value of the dynamic process. On the other hand, investment under parameter uncertainty has been examined in the finance literature. Merton (1980) shows that while the variance of the return can be estimated precisely from continuous observations on a finite interval, the estimator of the mean return does not converge unless the length of the interval becomes large. Gennotte (1986) studies portfolio choice under incomplete information about the stock return process. He uses tools of nonlinear filtering from Liptser and Shiryaev (1978) to derive the optimal drift estimator as agents continuously observe the returns. Brennan (1998) and Xia (2001) construct similar models to examine how learning about unknown parameters and unknown predictability affects portfolio choice.
More recently, Abasov (2005) modeled irreversible investment under parameter uncertainty, and Huang and Liu (2007) modeled learning from discrete noisy signals about the true drift in their study of periodic news on portfolio selection. Note that much of the finance literature is primarily theoretical, with few empirical applications and none in the domain of investment in agricultural capital as an embodiment of agricultural technology adoption. The present paper aims at a particular, and a particularly important, gap left by the two dimensions discussed. That is, while the literature on social learning and technology adoption has modeled the effect of social learning as a means of reducing uncertainty, that literature has not treated the issue of irreversibility of the investment per se, and thus has not modeled the effect of social learning in a real options context. Moreover, while the literature on irreversible investment and uncertainty has indeed modeled investment in a real options framework, it has not examined uncertainty-reduction measures taken by adopters, in particular, social learning. There is thus a gap in the literature, both theoretical and empirical, where an analysis of irreversible investment under parameter uncertainty models the effect of social learning. The contribution of the present paper is to address that gap. We address the gap empirically by modeling greenhouse investments with primary data from Shandong province in China. The data are multi-year: for each adopter, we observe household characteristics, including the social network of prior adopters, in the year before adoption. This timing is new to this literature and allows us to capture the causality between social learning and adoption. We address the gap theoretically by presenting a model of these links that is new to the literature. Following McDonald and Siegel (1986), we assume that a farmer is considering an investment project, whose value follows a geometric Brownian motion.
Departing from the standard framework, we assume that the true drift of the Brownian motion is unobservable to the farmer (we call this parameter uncertainty). In essence, the farmer is imperfectly informed as to the expected rate of return of his investment. He must make an inference about the true expected return based on his information and, at the same time, determine the optimal timing for investing in the project. The farmer can learn about the unknown parameter in two ways. First, he extracts information on the true drift from a continuous observation of past realized returns on the project value. This captures the process of continuous learning from public information about the project. Second, he obtains discrete noisy signals of the true drift. This represents the process of social learning from early adopters in his social network, who might possess information about the project that the public do not have. In our model, parameter uncertainty adds to the overall risk that the farmer faces; this raises the threshold project value needed to induce the farmer to invest. In contrast, social learning reduces parameter uncertainty, which decreases the overall level of uncertainty and reduces the investment threshold, thereby increasing the likelihood of adoption. In our model, social learning also causes the farmer’s belief about the expected return to converge to the average belief of his social network; the higher the average belief, the higher is the investment threshold, and the less likely the farmer will adopt the technology.

The rest of the paper is organized as follows: In Section 2, we present the theoretical model. In Section 3, we provide background information about the greenhouse technology in northern China. In Section 4, we outline our sample selection and summarize the data. In Section 5, we explain our empirical methodology. In Section 6, we present the empirical findings using linear probability models. We conclude in Section 7.
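As a point of reference for the threshold argument above, the following minimal sketch computes the investment trigger in the classical benchmark with a known drift, using the standard textbook formula from Dixit and Pindyck (1994). It is not the model developed in this paper: the function name, the constant discount rate `r`, and the numerical values are illustrative assumptions, and parameter uncertainty is absent.

```python
import math


def classical_trigger(mu, sigma, r, cost):
    """Trigger value V* when the drift mu is known (no parameter uncertainty).

    For dV = mu*V dt + sigma*V dZ and a constant discount rate r > mu,
    V* = beta / (beta - 1) * cost, where beta > 1 is the positive root of
    0.5*sigma^2*b*(b - 1) + mu*b - r = 0.
    """
    if r <= mu:
        raise ValueError("the discount rate must exceed the drift")
    a = mu / sigma**2 - 0.5
    beta = -a + math.sqrt(a**2 + 2.0 * r / sigma**2)
    return beta / (beta - 1.0) * cost


# Illustrative (hypothetical) parameter values; sunk cost normalized to 1.
base = classical_trigger(mu=0.02, sigma=0.2, r=0.08, cost=1.0)
more_volatile = classical_trigger(mu=0.02, sigma=0.3, r=0.08, cost=1.0)
higher_drift = classical_trigger(mu=0.04, sigma=0.2, r=0.08, cost=1.0)
print(base, more_volatile, higher_drift)
```

With these values the benchmark trigger is about twice the sunk cost, and it rises with either the volatility or the drift, which matches the direction of the comparative statics described above.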
1.2 The Theoretical Model Framework

In this section, we use a real options model to articulate the effect of parameter uncertainty and social learning on technology adoption. We begin with a model of continuous learning, which is essentially that of Abasov (2005). Specifically, a farmer is considering whether to pay a sunk cost of $I$ for an agricultural technology, whose value $V$ evolves according to:

$$dV_t = V_t(\mu\,dt + \sigma\,dZ_t),$$

where $Z$ is a Brownian motion. Motivated by Merton (1980), we assume that the farmer can observe $V$ continuously and knows its volatility $\sigma$; however, he only knows that the drift $\mu$ is a normal random variable with mean $m_0$ and variance $\gamma_0$ in the beginning. According to Liptser and Shiryaev (1978), the conditional mean of the drift given the farmer’s information set, $m_t = E(\mu \mid \mathcal{F}_t^V)$, follows:

$$dm_t = \frac{\gamma_t}{\sigma}\,d\tilde{Z}_t,$$

where $\gamma_t = E[(\mu - m_t)^2 \mid \mathcal{F}_t^V]$ is the conditional variance of the drift, satisfying:

$$d\gamma_t = -\frac{\gamma_t^2}{\sigma^2}\,dt, \qquad (1.1)$$

and $\tilde{Z}$ is a new Brownian motion related to the original Brownian motion through:

$$d\tilde{Z}_t = dZ_t + \frac{\mu - m_t}{\sigma}\,dt.$$

We can solve equation (1.1) for $\gamma_t$:

$$\gamma_t = \frac{\gamma_0\sigma^2}{\gamma_0 t + \sigma^2}.$$

This result shows that continuous learning decreases the conditional variance of the unknown parameter. Thus the longer the farmer observes the value process, the less uncertain he is about the drift. This is consistent with Merton’s (1980) result: the uncertainty of the drift is not related to the number of observations, but rather to the length of the observation period. However, the conditional mean of the drift can fluctuate up or down, depending on new observations of the Brownian motion $\tilde{Z}_t$. According to Gennotte (1986), the farmer’s decision can be separated into two problems: the inference of the unknown parameter given [...] the farmer is initially too optimistic; social learning causes him to lower his expectation about the project’s return. This, in turn, lowers the trigger value and facilitates adoption.
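The closed-form solution of equation (1.1) stated earlier in this section can be checked numerically. The sketch below integrates the variance equation with a simple Euler scheme and compares the result with the closed form. The prior variance, volatility, and horizon are hypothetical values chosen only for illustration, and the function names are not from the source.

```python
def gamma_closed_form(gamma0, sigma, t):
    """Conditional variance of the drift after observing V for t years."""
    return gamma0 * sigma**2 / (gamma0 * t + sigma**2)


def gamma_euler(gamma0, sigma, t, steps=100_000):
    """Euler integration of d(gamma)/dt = -gamma^2 / sigma^2."""
    dt = t / steps
    gamma = gamma0
    for _ in range(steps):
        gamma -= gamma**2 / sigma**2 * dt
    return gamma


# Hypothetical values: prior variance 0.01, volatility 0.2, 10 years of data.
gamma0, sigma, horizon = 0.01, 0.2, 10.0
exact = gamma_closed_form(gamma0, sigma, horizon)
approx = gamma_euler(gamma0, sigma, horizon)
print(exact, approx)
```

Equivalently, the precision $1/\gamma_t$ grows linearly in $t$ at rate $1/\sigma^2$, so under these assumptions only the length of the observation window matters, not the sampling frequency.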
If the farmer is, on average, unbiased in his initial belief, then social learning is unlikely to change the probability of adoption through its effect on the conditional mean return. If we generalize this model to allow the dynamics of social learning to enter the farmer’s decision making, then we can write down the following optimal stopping problem, where we combine continuous filtering with discrete updating:

$$J(m_0, \gamma_0, V_0) = \max_{\tau} E\left[e^{-r\tau}(V_\tau - I)\right],$$

subject to

$$dV_t = V_t(m_t\,dt + \sigma\,d\tilde{Z}_t),$$

$$dm_t = \frac{\gamma_t}{\sigma}\,d\tilde{Z}_t + \frac{\gamma_{t-}}{\gamma_{t-} + \sigma_\varepsilon^2}\left(\mu(t) - m_{t-}\right)dN_t, \qquad (1.6)$$

$$d\gamma_t = -\frac{\gamma_t^2}{\sigma^2}\,dt - \frac{\gamma_{t-}^2}{\gamma_{t-} + \sigma_\varepsilon^2}\,dN_t.$$

Here, $\mu(t)$ refers to the independently and identically distributed noisy signals described in equation (1.5), with noise variance $\sigma_\varepsilon^2$, and $N_t$ is a counting process that counts the number of signals that the farmer has received up to time $t$. It can be periodic and deterministic as in Huang and Liu (2007), or stochastic, as in the case of a Poisson process with arrival rate $\lambda$, which describes social interaction as a random phenomenon. In all cases, however, the first part of the dynamic equations for $(m, \gamma)$ captures the effect of continuous updating as the farmer learns from the past history of $V$. The second part represents a jump in the conditional mean and variance when the farmer receives a noisy signal of the drift. Because $\gamma$ and $N$ are deterministically related through the conditional variance relation, we have suppressed the dependence of the value function on $N$. Similarly, we can write the trigger value as $V^*(m_t, \gamma_t)$, with the understanding that the effect of $N_t$ is already reflected in the conditional variance $\gamma_t$. Generally, the optimal stopping problem (1.6) must be solved numerically. The adoption decision is related to the amount of social learning that the farmer has experienced. According to the above model, this is measured by $N_t$. As the conditional variance equation shows, a larger $N$ (more social learning) always reduces $\gamma$.
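A minimal sketch of the discrete updating step in (1.6) follows. It assumes each signal equals the true drift plus independent normal noise with variance `noise_var`, a hypothetical name for the signal-noise variance, and all numbers are illustrative. It ignores the continuous-learning term and applies only the jump updates, so it shows the effect of the number of signals in isolation.

```python
def update_on_signal(m, gamma, signal, noise_var):
    """One Bayesian update of a normal belief N(m, gamma) after a noisy signal."""
    weight = gamma / (gamma + noise_var)          # weight placed on the signal
    m_new = m + weight * (signal - m)             # jump in the conditional mean
    gamma_new = gamma - gamma**2 / (gamma + noise_var)  # jump in the variance
    return m_new, gamma_new


# Hypothetical example: an optimistic farmer (prior mean 0.10) hears from
# three earlier adopters whose signals are all below his prior mean.
prior_mean, prior_var, noise_var = 0.10, 0.01, 0.02
signals = [0.05, 0.03, 0.04]

m, gamma = prior_mean, prior_var
path = [(m, gamma)]
for s in signals:
    m, gamma = update_on_signal(m, gamma, s, noise_var)
    path.append((m, gamma))

# Precision is additive in the number of signals: 1/gamma_N = 1/gamma_0 + N/noise_var.
gamma_check = 1.0 / (1.0 / prior_var + len(signals) / noise_var)
print(path[-1], gamma_check)
```

In this example every signal lowers the conditional variance whatever its value, while the conditional mean moves toward the signals. This corresponds to the two channels discussed in the text: less parameter uncertainty and a revised expected return.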
We conjecture that the trigger value is increasing in $\gamma$, regardless of whether farmers are cognizant or ignorant of future social learning.2 This implies that social learning can lower the trigger level for adoption.

Summarizing the various models, the classical real options analysis of McDonald and Siegel (1986) predicts that the trigger value for investment increases with the uncertainty of the project value. We show that this result also extends to parameter uncertainty. Building from recent work on social learning and technology diffusion (such as Bandiera and Rasul, 2006), we argue that social learning can facilitate adoption by reducing parameter uncertainty. In rural China, where public extension information is not easily accessible to small farmers, information from social learning could play an important role in their adoption decisions. The rest of our paper is dedicated to testing this hypothesis.

2 One can conceive of cases in which knowledge of the social learning dynamics can actually delay adoption. For example, if the farmer knows that parameter uncertainty will be fully resolved tomorrow, he is unlikely to invest today.

1.3 Greenhouse Intermediate-Technology in Northern China

Before economic reforms, China gave first priority to the development of heavy industry. In agriculture, China emphasized the importance of self-sufficiency for grains - the “iron rice bowl policy.” After the “household responsibility system” reform started in 1981, the shortage of grain supply was relieved by a significant increase in grain production. This made it possible for China to diversify into horticulture and livestock husbandry. Meanwhile, rapid income growth in the 1980s and 1990s created an increasing demand for high-value horticultural products.
However, poor infrastructure and high energy costs prevented the transportation of perishable products from southern China to northern China, and affordable fresh vegetables were still unavailable in the 1990s to consumers during the winter season in northern China. The huge demand for cheap fresh vegetables led to the development and widespread diffusion of an affordable greenhouse technology for northern Chinese farmers. Rather than the modern, expensive type made of a steel frame and plastic or glass walls and ceilings, and requiring energy-using heating and cooling mechanisms (promoted in China in the 1970s but little adopted because of its cost; Wan 2000), the greenhouse adopted in the 1990s in northern China was of the “intermediate technology” type, made of simple clay walls, a bamboo frame, a plastic-sheet roof, and a straw mat roll-out awning for cold nights. The sun warms the interior, and the greenhouse is oriented to maximize sunlight capture. These greenhouses changed not only the food consumption pattern for hundreds of millions of consumers, but also the face of farming in northern China. These greenhouses helped to transform China from a modest global player to the volume leader in horticulture - growing one third of the fruits and vegetables on the planet by 2003. By 2004, China grew 47 percent of the vegetable volume in the world (Weinberger and Lumpkin 2005). The vegetable greenhouse area in China reached 150,000 hectares in 2004 (Chinese Agriculture Yearbook 2006), and at least half a million farmers were by that year using the intermediate-technology greenhouse. Greenhouse yields exceed those of open-field cropping: for example, the tomato yield is 200 tons/hectare/year in the greenhouse, versus 40 tons in an open field. Several factors, including labor-intensive production, contribute to this high yield.
For example, a typical greenhouse in Shandong province is only about 60 meters long and 10 meters wide, but it usually employs two full-time workers. Greenhouse production usually lasts more than eight months, because the temperature inside the greenhouse is high enough during the winter months to sustain production. Moreover, high quality crop varieties and intensive use of organic fertilizers are common in greenhouse production. Nutrient replacement is important due to the intensive and continuous use of the land under the greenhouse. The intermediate-technology greenhouse is far cheaper than a modern type, but is still a major investment for the very small farmers of Shandong. The construction cost of intermediate-technology greenhouses is roughly four dollars per square meter, much cheaper than modern greenhouses of glass or plastic, which cost about 80 dollars per square meter to construct. Yet even four dollars per square meter is a large investment for very small farmers. For example, if a greenhouse is 60 meters long and 10 meters wide, the construction cost would be about $2,400, while the average Chinese farmer earned less than $500 in 2005. Moreover, the labor time involved in building the greenhouse is substantial: the farmer spends months creating the main component - the rear wall of the greenhouse, which is usually made of pounded clay bricks. Finally, the investment is “irreversible,” in the sense of Bertola and Caballero (1994): the structure can only be used in immediate production, has little to no salvage value, and cannot be sold or transferred because it is not movable. If the farmer decides to demolish the greenhouse, the bricks cannot be reused or sold and would be broken into dirt clods, and the old straw awning and old bamboo beams are worth little in salvage.
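The cost comparison above amounts to simple arithmetic, sketched here using only the figures quoted in the text. The variable names are illustrative, and because the text gives the income figure as an upper bound, the income ratio is a lower bound.

```python
# Greenhouse dimensions and unit costs as quoted in the text.
length_m, width_m = 60, 10
cost_intermediate_per_m2 = 4    # US dollars per square meter
cost_modern_per_m2 = 80         # US dollars per square meter
income_upper_bound = 500        # average farmer earned less than this in 2005

area_m2 = length_m * width_m
intermediate_cost = area_m2 * cost_intermediate_per_m2
modern_cost = area_m2 * cost_modern_per_m2

# Income is below the bound, so the true ratio is at least this large.
min_years_of_income = intermediate_cost / income_upper_bound

print(area_m2, intermediate_cost, modern_cost, min_years_of_income)
```

Even at the intermediate-technology price, the construction cost therefore exceeds several years of the average farmer's income, which is what makes the irreversibility of the investment economically important.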
1.4 Data Description

1.4.1 Sample Selection

Our survey area is in Shandong province, the leading horticulture province in China. In 2004 it had seven percent of China’s cropland, but 12 percent of China’s horticultural land. The latter share has been steadily rising over time. The number of greenhouses and the level of commercialization as well as yields in Shandong are higher than in the rest of China. In Shandong, we conducted two coordinated community and household level surveys in 2005 and 2006, respectively. The first one, the Shandong village survey, provided a representative sample of tomato and cucumber growing villages in Shandong. During the first step of the survey, we created sampling frames of county-level tomato and cucumber production in order to select five sample counties per crop. Specifically, with knowledge of county production of each crop, we ranked counties by the output per capita of that crop. For each crop in our sample, one high production county was randomly selected from the counties in the top quintile; the other high production county was randomly selected from the second quintile. The two medium production counties were randomly chosen from the third and fourth quintiles, respectively. After eliminating five percent of the counties with the lowest production, the low production county was randomly chosen from the lowest quintile. In the end, there were two counties in the high production set, two counties in the medium production set, and one county in the low production set. After the sample counties were chosen, a similar process was used to select sample townships and villages. For each crop, the survey teams visited a total of ten townships. Moreover, for each crop (among the five counties and ten townships), we interviewed respondents in 35 villages (22 in high production counties, 10 in medium production counties, and 3 in low production counties).
Since we collected area data on all villages, townships, and counties in the sample, we were able to construct area-based weights in order to create point estimates of our variables that are provincially representative. Having selected the villages, the enumeration team visited each community and undertook data collection. Specifically, the enumerator conducted a two-hour interview with three village leaders for the village survey. In each village, we divided all households into two groups. For the cucumber sample, they are non-cucumber households and cucumber households. We randomly sampled seven cucumber farmers and three non-cucumber farmers. As a result, we obtained 350 households from cucumber growing villages.3 With knowledge of the distribution of cucumber farmers and non-cucumber farmers, plus the distribution of greenhouse adopters in each village, we calculated the weights to adjust for selection bias. Following this procedure, we also obtained 350 households from tomato growing villages. After data cleaning, we retained 638 valid household observations. Among this sample, 204 (64 percent) out of 317 households from tomato growing villages were found to have adopted greenhouses, while 158 (49 percent) out of 321 households from cucumber growing villages were found to have adopted greenhouses. That a higher share of tomato growers adopted greenhouses is apparently due to the fact that in cucumber production, a shading shed is a substitute for a greenhouse, while in tomato production there is no substitute for a greenhouse, and the options are only growing in the open field or in a greenhouse.

3 The reason why we did not directly stratify on greenhouse use is that our survey is part of a large horticulture production survey, which required stratified sampling of cucumber/tomato and non-cucumber/tomato households.
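The adoption shares quoted above can be recomputed from the reported counts. The short sketch below uses only the numbers in the text; the shares are unweighted sample proportions, not the weighted, provincially representative estimates, and the dictionary layout is an illustrative choice.

```python
# Reported counts of valid household observations after data cleaning.
counts = {
    "tomato": {"adopters": 204, "households": 317},
    "cucumber": {"adopters": 158, "households": 321},
}

# Unweighted adoption share by crop sample.
shares = {crop: c["adopters"] / c["households"] for crop, c in counts.items()}

total_households = sum(c["households"] for c in counts.values())
total_adopters = sum(c["adopters"] for c in counts.values())
overall_share = total_adopters / total_households

for crop, share in shares.items():
    print(f"{crop}: {share:.1%}")
print(f"overall: {overall_share:.1%} of {total_households} households")
```

The two sub-samples add up to the 638 valid observations, and the rounded shares agree with the 64 and 49 percent reported in the text.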
Shandong farmers did not adopt greenhouses all at once, but rather over a period of years, in a process typical of the diffusion of new technology. The greenhouse diffusion process can be roughly divided into three stages: early stage, take-off stage, and slow-down stage. Figure 1.1 shows that the diffusion process is relatively slow in the early stage before 1990; only a few farmers adopted the technology. Between 1990 and 1995, many more farmers adopted. The diffusion process reached its peak between 1996 and 2000, after which the trend began to slow down. This diffusion curve is similar to the “S-curve” observed by Griliches (1957) for the adoption of hybrid maize in the US, and subsequently documented in many other settings.

1.4.2 Social Learning

We are interested in the effect of social learning on farmers’ adoption of greenhouses. Our theoretical model predicts that social learning helps to reduce parameter uncertainty, thus facilitating adoption. Empirically, however, social learning could be one of many factors affecting adoption. For example, farmers may have other options such as off-farm jobs. Alternatively, farmers may be credit-constrained because greenhouse adoption is a major investment. To disentangle the effect of social learning from other determinants, we need to find appropriate empirical proxies for social learning and control for other factors that might influence farmers’ decisions.

Social learning is a key variable in our study. We measure social learning in a way similar to the approach of Bandiera and Rasul (2006). Specifically, we asked the farmers who adopted, “How many people do you know who adopted greenhouses before you adopted in your village?” We asked the non-adopters how many adopters they knew at the time of the survey. We control for year with year dummy variables.
We then asked, “How many of these people are your relatives and friends?” (We did not include neighbors as a separate category because Chinese farmers usually consider neighbors to be among their friends.) The answer to the second question is taken as our empirical proxy for social learning. Unlike Bandiera and Rasul (2006), who asked about the social network at the time of the survey rather than before adoption, we obtained the size of the farmer’s social network of adopters before his adoption, so that we can infer causality.

There are several reasons why our measure of the social network of adopters is an appropriate measure of social learning before adoption. First, the number of earlier adopters among relatives and friends is likely to be positively correlated with the number of different sources of information on greenhouse adoption that the farmer accessed before adoption, which corresponds to the number of discrete signals in our theoretical model. Second, village membership, kinship, and friendship are the defining elements of a farmer’s social network, that is, the group of people with whom the farmer has close contact and from whom information can be most easily obtained. By concentrating on the number of earlier adopters among relatives and friends, we also mitigate the concern about ex post social network formation. While this is obvious for kin adopters, we noticed during our survey that Shandong farmers tended to define friendship based on long-term relationships, such as classmates, neighbors, and people who served with them in the army. Typically, they consider a friend to be someone from whom they can borrow money in case of illness; they would not consider passing acquaintances as friends. Third, we found that farmers were easily able to remember the number of adopters they knew before they adopted; we surmise that this is because a greenhouse is a big investment for local farmers and hence easily observable.
The first two rows of Table 1.1 provide the means and standard errors of our social learning measures by adoption status. In the last column, tests of equality of the means are provided to examine whether the differences between adopters and non-adopters are significant. The first row indicates that, on average, adopters know about 6.9 earlier adopters among relatives and friends in their own village, while non-adopters know only about 4.7 earlier adopters in their social network. The result of the t-test shows that this difference is significant. This implies that there is more social learning for adopters than for non-adopters. When we extend the scope of the social network to include earlier adopters among relatives and friends in nearby villages (the second row), the findings are similar.

1.4.3 Other Household Characteristics

Table 1.1 presents other household characteristics by adoption status. There are several salient points.

(1) Demographics differ between adopters and non-adopters. The family size of adopters is significantly larger than that of non-adopters, while the amount of farm labor is significantly smaller for adopters than for non-adopters. This is because adopters have more dependent family members (either young children or old parents) than non-adopters. For such households, greenhouse adoption could be a good choice because it allows the adults to work close to home, so that they can care for dependent family members. Non-adopters are, on average, substantially older than adopters, a point consistent with younger farmers having more young children and old parents to care for.

(2) Off-farm employment and income are significantly larger for non-adopters than for adopters, which suggests that greenhouse labor and off-farm jobs are substitutes.

(3) There is no significant difference in education between adopters and non-adopters in our sample.
This suggests that education is not the main determinant of greenhouse adoption when the main source of information for the technology is social learning.

(4) The farm size of adopters is larger than that of non-adopters, which indicates that farmers with more land are more dependent on agricultural income, and farmers with less land are more likely to favor off-farm jobs.

(5) Irrigation is of course important to greenhouse farming, and 89 percent of the adopters have access to irrigation. However, 80 percent of the non-adopters also have access to irrigation, showing that there is not much variation in irrigation access among farmers in this well-irrigated region.

(6) Adopters have greater land tenure security than non-adopters. This is a sensible result given the long-term nature of greenhouse investment. We proxy land tenure security by the number of land reallocations, which village leaders undertake every few years to maintain relative equality of land distribution in the village; fewer reallocations indicate greater tenure security.

(7) Adopters and non-adopters have no significant difference in grain land share, which suggests that both groups have a similar agricultural production pattern except that adopters use greenhouses to produce vegetables and non-adopters produce vegetables in the open field.

(8) The presence of a credit constraint would in theory undermine an important investment such as a greenhouse, all else equal. However, it is difficult to measure the credit constraint facing a farmer, as this is equivalent to examining whether a farmer can borrow as much as he would like at the going market interest rate (Banerjee and Duflo, 2002). Since we are focusing on greenhouse adoption rather than testing whether the farmer has invested in a greenhouse of optimal scale, we only need to know whether a farmer is capable of building a greenhouse by borrowing money or using his savings. Therefore, we use the house value as a proxy for household wealth.
We also collected the household’s credit history (maximum borrowing and maximum lending) before adoption as an indicator of how much credit or savings is available. Our data show that non-adopters were significantly wealthier than adopters before the latter’s adoption: non-adopters have a mean house value of 8,773 yuan vs. 4,294 yuan for adopters. Similarly, non-adopters have significantly greater credit and savings than adopters. The maximum borrowing is 1,352 yuan for non-adopters vs. 925 yuan for adopters, and the maximum lending is 862 yuan for non-adopters vs. 368 yuan for adopters. Given that non-adopters are both wealthier and have more access to credit, credit constraints are unlikely to play an important role in greenhouse adoption in Shandong.

1.5 Empirical Methodology

In this section, we illustrate the connection between our theoretical model and the empirical framework. According to our real option model of greenhouse adoption, the farmer decides to adopt or to wait based on a comparison between the current value of the technology and the trigger value. Therefore, we can define the farmer’s adoption status at time $t$ as:

$$Y_t = \begin{cases} 1 \ (\text{adopt}), & \text{if } Y_t^* = V_t - V_t^* > 0, \\ 0 \ (\text{non-adopt}), & \text{if } Y_t^* = V_t - V_t^* \le 0, \end{cases} \qquad (1.7)$$

where $V_t$ is the discounted expected value of all future cash flows from greenhouse vegetable production, and $V_t^*$ is the trigger value.

In the model of McDonald and Siegel (1986), in which the drift $\mu$ is known, the trigger value $V^*$ is a function of the parameters $(\rho, \mu, I, \sigma)$. However, the drift $\mu$ is unknown in our model. Thus, the trigger value also depends on the conditional mean and variance of the drift, $(m_t, \gamma_t)$.
According to the dynamics of $(m, \gamma)$ in equation (1.6), we can substitute $(m_t, \gamma_t)$ with functions of $(m_0, \gamma_0, Z_t, N_t, \sigma, \sigma_s, \bar{\mu}, t)$.[4] Therefore, we can express the trigger value $V_t^*$ as:

$$V_t^* = g(\rho, I, \sigma, m_0, \gamma_0, Z_t, N_t, \sigma_s, \bar{\mu}, t). \qquad (1.8)$$

[4] This is only a simplified representation; strictly speaking, the solution of $(m_t, \gamma_t)$ according to equation (1.6) depends on the paths of $Z$ and $N$, as well as the history of the signals up to time $t$.

Following similar reasoning, the current project value $V_t$ can be written as a function of the same group of variables. Therefore, we can express $Y_t^* = V_t - V_t^*$ as:

$$Y_t^* = h(\rho, I, \sigma, m_0, \gamma_0, Z_t, N_t, \sigma_s, \bar{\mu}, t). \qquad (1.9)$$

To motivate the empirical proxies for the variables in equation (1.9), we first note that $Z_t$ represents the stochastic change in the project value. A good proxy for $Z_t$ is the observed profitability of greenhouse production in the current period. We proxy that profitability by the ratio of the output price to the input price. Because historical data are not available on vegetable prices in Shandong, we use the ratio of the vegetable price index to the input price index at the national level as a proxy for the profitability of greenhouse production over the years. For the investment cost $I$, we use the greenhouse construction cost (real value) for each adopter. For non-adopters, we use the average construction cost for adopters in their village or nearby villages as the proxy.

Continuing with the interpretation of equation (1.9), $\sigma$ is the volatility of the project value, which we measure as the standard deviation of the national vegetable price index over the three years prior to the farmer’s adoption. $\bar{\mu}$ represents the average signal received by the farmer from his social network, the proxy for which is the vegetable price index growth rate over the three years preceding the farmer’s adoption. This is a reasonable assumption if the expected return of the project is close to the average return in the economy.
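The two price-based proxies just described (volatility and average growth over the three years before adoption) could be computed along the following lines. This is a minimal sketch under assumptions that are not stated in the text: the helper name `price_proxies` is hypothetical, the index values are made up, the sample (n - 1) standard deviation is used, and growth is taken as the simple mean of year-on-year growth rates.

```python
from statistics import stdev

def price_proxies(index_by_year, adoption_year, window=3):
    """Return (volatility, mean growth) over the `window` years before adoption."""
    years = range(adoption_year - window, adoption_year)
    levels = [index_by_year[y] for y in years]
    # Volatility proxy: standard deviation of the index level (sample formula).
    volatility = stdev(levels)
    # Growth proxy: average year-on-year growth; needs one extra earlier year.
    growth = [index_by_year[y] / index_by_year[y - 1] - 1.0 for y in years]
    return volatility, sum(growth) / window

# Made-up national vegetable price index growing ten percent per year.
index = {1994: 100.0, 1995: 110.0, 1996: 121.0, 1997: 133.1}
vol, growth = price_proxies(index, adoption_year=1998)
```

For a farmer who adopted in 1998, the window covers 1995 to 1997, so the function needs the index for 1994 as well in order to compute the first growth rate.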
The time $t$ in our model is equated with the amount of time the agent spent in continuous learning. We use the number of years that the farmer had been aware of the technology before adoption to represent the continuous learning effect. As noted above, $N_t$ is the key variable in our study. We measure it by the number of earlier adopters in a farmer’s social network, which includes relatives and friends in his own village and nearby villages.

Besides these theoretically motivated variables, there may be other factors that affect greenhouse adoption in practice, such as land tenure security, off-farm employment, and household wealth. These factors were discussed in the preceding section. In addition, we do not have compelling empirical proxies for farmers’ discount factor $\rho$, their initial values of the conditional mean and variance $(m_0, \gamma_0)$ before any learning had taken place, and the standard deviation of their signals $\sigma_s$. These parameters, however, are likely correlated with household characteristics such as age, family size, and education, which we include in our empirical analysis to capture potential omitted factors.

Our theoretical model is based on observables; with knowledge of these observables, the model predicts adoption with certainty. In reality, however, we do not observe all information relevant for determining adoption. Therefore, our empirical model must allow for the presence of unobserved determinants. In brief, our empirical model can be written as:

$$Y_i^* = f(X_i, Z_i, N_i, D_1, D_2) + e_i, \qquad (1.10)$$

where $i$ denotes a household, $Y_i^*$ is the adoption criterion in year $t$ according to equation (1.7), and $X_i$ are household characteristics before adoption (year $t-1$), which include age, education of household head, family size, farm size, off-farm employment and income, family labor, irrigation conditions, family wealth, years of awareness of the technology, and greenhouse construction costs.
$Z_i$ are institutional and market variables at $t-1$, which include the number of land reallocations, the ratio of the output price index to the input price index, the volatility of the vegetable price index, and the average growth rate of the vegetable price index. $N_i$ is the number of earlier adopters in the farmer’s social network at $t-1$. $D_1$ and $D_2$ are, respectively, year and county dummies that control for heterogeneity in farmers’ adoption across different years and counties. Finally, $e_i$ represents the effect of unobservable determinants of adoption. According to equation (1.10), the probability of adoption is:

$$P(Y_i^* > 0) = P\big(e_i > -f(X_i, Z_i, N_i, D_1, D_2)\big). \qquad (1.11)$$

In our empirical analysis, we estimate a linear probability model (LPM), which specifies the above probability as a linear function of the explanatory variables. The LPM has its strengths and weaknesses. (1) It is a linear model, which offers convenience in estimation. For example, OLS provides consistent and even unbiased estimators, and heteroskedasticity is easily handled with heteroskedasticity-robust standard errors and t-statistics. (2) However, the coefficients in the linear model measure a constant effect of the explanatory variables on the response probability. Unless the range of the explanatory variables is severely restricted, the LPM cannot be a good description of the population response probability. The hope is that the linear specification approximates the response probability for common values of the covariates; fortunately, this often turns out to be the case (Wooldridge 2002). (3) The LPM allows us to use year dummies to control for heterogeneity over time, which is important to this empirical study given the structure of our data set (in which different farmers adopted greenhouses in different years).
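The trade-off in points (1) and (2) can be made concrete with a small sketch. It assumes a single regressor, made-up data, and the HC0 (White) heteroskedasticity-robust variance formula; the function name `lpm_robust` is hypothetical, and this is not the specification estimated in this chapter, which uses many covariates, 2SLS, and cluster-robust standard errors.

```python
def lpm_robust(x, y):
    """OLS of a 0/1 outcome on one regressor, with an HC0 robust slope s.e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 variance of the slope: sum((x - x_bar)^2 * e^2) / Sxx^2.
    var_slope = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    return intercept, slope, var_slope ** 0.5

# Made-up data: number of earlier adopters known, and adoption status (0/1).
adopters_known = [0, 2, 4, 6, 8, 10, 12, 14]
adopted = [0, 0, 0, 1, 0, 1, 1, 1]
intercept, slope, se = lpm_robust(adopters_known, adopted)
fitted_at_zero = intercept  # predicted adoption probability when x = 0
```

In this toy example the slope is 1/12 and the fitted probability at zero adopters is negative, which illustrates why the LPM is best read as an approximation near the center of the covariate distribution.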
Therefore, even with some weaknesses, the LPM often provides good estimates of the partial effects on the response probability near the center of the distribution of the explanatory variables.

1.6 Empirical Results

1.6.1 Identification Strategy

In this section we focus on the potential endogeneity of the social learning effect and our identification strategy. Endogeneity is one of the most formidable problems in empirical studies. In order to find an appropriate identification strategy for this study, it is crucial to understand why we could face the problem. Manski (1993) uses the reflection problem to describe the tendency for people in the same social network to behave in similar ways. He identifies two possibilities: (1) an endogenous effect, wherein the propensity of an individual to behave in certain ways varies with the prevalence of the behavior in the group; (2) a correlated effect, wherein common environment and personal characteristics produce similar behavior. In this paper, we attempt to show that farmers’ adoption decisions are influenced by social learning. Therefore, we need to empirically distinguish the social learning effect from the endogenous effect and the correlated effect.

In our context, the endogenous effect is essentially the social pressure problem. Psychologists often use social pressure as a way of explaining herd behavior. In the case of greenhouses, adopters are the minority in most villages. From this observation one can infer that it would be rare for farmers to choose greenhouse adoption because of social pressure.

In our context, the correlated effect poses a more serious challenge. An endogeneity problem could arise from the simultaneous determination of adoption and network formation: for example, a farmer could know more adopters because he adopted the greenhouse.
In other words, adoption could affect social learning instead of social learning affecting adoption (endogeneity from simultaneous determination). To mitigate this problem, we collected household and institutional information for the year before adoption for adopters. For non-adopters, we collected the information for the year before the survey occurred (2005).

Moreover, farmers who are entrepreneurial in spirit are likely to know more people (hence more adopters). At the same time, they are more likely to try out new things (thus more likely to adopt). Therefore, a farmer’s adoption could be explained by his personality, rather than by learning from others in his social network. Thus, a key problem is how to separate the social learning effect from unobservables in the error term, such as similar personalities in the social network. We need to find at least one instrumental variable that is (1) correlated with social learning after we control for other factors, but (2) not correlated with the error term. We can test the first condition. We cannot test the second condition directly because the error term is not observable.

Fortunately, we have an appropriate instrument in this study: the walking time from the farm to a farmer’s neighborhood. More specifically, we asked farmers the following question in the field survey: “How many minutes does it take to walk by your 20 closest neighbors?” The logic of this question is that social learning could be negatively correlated with the walking time. For example, if a farmer lives in a mountainous area, it could take two hours or even more to walk by his 20 closest neighbors. By contrast, it takes only 10 minutes for a farmer to walk by his 20 closest neighbors if people live close together. We surmise that farmers in the second case are more likely to have access to social learning.
We test this hypothesis with data after controlling for other factors: we find that walking time is significantly negatively correlated with social learning (first row of Table 1.3 for both social learning measures). This result demonstrates that the walking time variable satisfies the first condition for a valid instrument.

As for whether this instrument meets the second condition (lack of correlation with the error term in the adoption equation), the following three-step discussion provides further justification for the validity of the instrument.

First, we use a heuristic explanation to justify the instrument. In rural China, it is not unusual for a family to live in the same place for decades. A well-functioning real estate market does not exist in rural China for several reasons: (1) a farmer can own his house, but not the land on which his house is built, because all land is owned by the village collective; (2) it is illegal to buy a house in a village if the buyer is not a member of the village; (3) it is also illegal for a household to buy an additional house from another villager because Chinese law forbids any household to occupy two pieces of land for housing in a village; (4) if a farmer wants to change his house location, either he has to obtain a new piece of land from the village collective under very strict conditions due to land scarcity in Shandong, or he has to find another household in the village that is willing to give up its housing land, which is very rare. In addition, in both cases the farmer has to give up his old housing land. Based on these observations, it appears very difficult, if not impossible, for a household to change its location. In other words, the farmer’s housing location in rural China can be considered as fixed in most cases. From this we infer that the walking time to the neighborhood is fixed and exogenous to greenhouse adoption.
Second, we constructed interaction terms between the IV (distance to neighborhood) and year dummies. We used the Hansen-J over-identification test to examine the validity of the IV, given that we believe the other instruments (the interaction terms) to be truly exogenous. The C-statistic from the Hansen-J test (the last row of Table 1.4) indicates that the distance to neighborhood variable passes the validity test for both social learning measures. We must be cautious not to overemphasize this result, as the power of the Hansen-J test depends on the exogeneity of the other instruments. However, this is the best test available to check the validity of an instrumental variable.

Finally, we tabulate the distance to neighborhood by household characteristics such as education, age, and wealth. These simple but reliable summary statistics can tell us whether the distance to neighborhood is correlated with typical household characteristics. If the distance to neighborhood is truly exogenous due to the fixed housing location in rural China, we would not expect to see a significant correlation with household characteristics. Indeed, the results in Table 1.5 indicate that the distance to neighborhood does not show any robust correlation with the education and age of the household head, or the real value of the house. These findings lend support to our working hypothesis that the distance to neighborhood is exogenous to greenhouse adoption.

As a result of these discussions, we are fairly confident that the IV (distance to neighborhood) is exogenous to greenhouse adoption, and therefore it allows us to obtain consistent estimators given that social learning is shown to be endogenous by the Durbin-Wu-Hausman test (last row of Table 1.2).

1.6.2 Linear Probability Model

Table 1.2 presents the estimation results for the linear probability model estimated by 2SLS with cluster-robust standard errors using distance to neighborhood as the instrument.
The first two columns report the results using a measure of social learning within the farmer’s own village; the next two columns report the results using a measure of social learning that also includes nearby villages. Generally speaking, the two sets of results are very similar, suggesting that village boundaries are not crucial to how social learning affects greenhouse adoption. We will focus on the first two columns for a detailed discussion of our results.

The first row confirms the key result of our study: social learning has a significantly positive impact on greenhouse adoption. Specifically, one more adopter in a farmer’s social network increases the probability of his adoption by 1.9 percentage points after controlling for other factors. In other words, if there are currently 10 earlier adopters in the farmer’s social network, his adoption probability in the next year is higher by about 19 percentage points (10 × 0.019 = 0.19). Given that the greenhouse adoption rate is still low in rural China, an increase of this size is economically significant.

The third row shows how adoption is affected by the conditional mean return to the greenhouse technology. From our theoretical model, we know that the farmer’s belief about the mean return will converge to the average belief of his social network as a result of social learning. Because we cannot observe farmers’ expectations, we use the growth rate of the national vegetable price index before adoption to approximate the average belief about the project return in the social network. The coefficient is not significant; however, the sign is consistent with the prediction of our theoretical model, namely, that a higher expected return results in a higher trigger value for investment and a lower probability of adoption. It is also possible that the price index growth rate is acting as a proxy for farmers’ outside opportunities; however, we have already included off-farm income in our regression specification.
We use the market volatility of vegetable prices before adoption to represent the uncertainty in the stochastic project value in our theoretical model. Our result indicates that this source of uncertainty discourages adoption. This finding is consistent with theory, which predicts that the option value of waiting to invest is larger when the future investment value is more uncertain.

We use the number of years that the farmer had been aware of the technology before adoption to represent the continuous learning effect. However, it is not significant according to our estimation. It could be that farmers in rural China simply did not have continuous access to information about the greenhouse technology and its returns. It is also possible that the main source of information about the greenhouse technology is discrete social learning.

Our proxy for the current profitability of the greenhouse technology is the ratio of the output price index to the input price index: the higher the stochastic project value, the higher the probability of adoption. Our result confirms this prediction.

Among the included household characteristics, only the age of the family head is statistically significant. However, the effects of most household characteristics are consistent with our discussion in Section 1.4.3.

The $R^2$ of this regression is 0.83, which suggests that we have included most of the factors that could affect the adoption decision. It also reinforces the idea that our irreversible investment model is an appropriate choice for describing greenhouse adoption behavior.

In Table 1.4, the interaction terms between the distance to neighborhood and the year dummies are included as extra instruments in the regression. The results are very similar to those in Table 1.2, which suggests that the results are robust. Moreover, the extra instruments allow us to use the Hansen-J test to test the validity of the IV (distance to neighborhood).
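The identification argument of this section can be illustrated with a small simulation. Everything in the sketch is an assumption made for illustration: the data are simulated rather than taken from the survey, the variable names and parameter values are hypothetical, there are no control variables, and the estimator shown is the just-identified IV estimator cov(z, y) / cov(z, x), which coincides with 2SLS when there is one instrument and one endogenous regressor. An unobserved "entrepreneurial spirit" raises both the number of adopters a farmer knows and his own adoption probability, so OLS overstates the social learning effect, while walking time, which affects adoption only through social learning, recovers it.

```python
import random

def cov(a, b):
    """Population covariance of two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / n

rng = random.Random(0)
TRUE_EFFECT = 0.02  # assumed effect of one more known adopter on P(adopt)
z, x, y = [], [], []
for _ in range(50000):
    walk = rng.uniform(5, 120)    # instrument: minutes to walk by 20 neighbors
    spirit = rng.uniform(-1, 1)   # unobserved entrepreneurial spirit
    known = 10 - 0.08 * walk + 2 * spirit + rng.uniform(-1, 1)
    # Linear probability of adoption; stays strictly inside (0, 1) by design.
    p_adopt = 0.3 + TRUE_EFFECT * known + 0.1 * spirit
    z.append(walk)
    x.append(known)
    y.append(1.0 if rng.random() < p_adopt else 0.0)

first_stage = cov(z, x) / cov(z, z)  # negative: longer walks, fewer adopters known
ols = cov(x, y) / cov(x, x)          # biased upward by the omitted spirit
iv = cov(z, y) / cov(z, x)           # close to TRUE_EFFECT
```

In this setup the first-stage slope is negative, mirroring the sign reported in the first row of Table 1.3, and the IV estimate lies much closer to the assumed effect than the OLS estimate does.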
1.7 Conclusion

In technology adoption with irreversible investment, agents commonly face two sources of uncertainty. First, the future value of the investment is uncertain. Second, agents have incomplete information regarding the parameters of the process describing the future investment value. In this paper, we model social learning as a way of reducing parameter uncertainty, thus facilitating technology adoption with irreversible investment. We use household-level data on intermediate-technology greenhouse adoption in northern China to test the predictions, with the following main results. (1) Social learning has a significantly positive impact on greenhouse adoption. Ten more adopters in the farmer’s social network increase the probability of adoption by 19 percentage points, which is an economically significant effect. (2) The empirical data confirm what we know from the conventional theory of irreversible investment: higher uncertainty about the future investment value results in less adoption. (3) Social learning could also affect technology adoption through its influence on the farmer’s belief about the expected return on the technology. The empirical data offer some support for this hypothesis.

Our paper also provides an answer to the following question: how can small farmers in developing countries deal with the risk from irreversible investment and incomplete information? Our results suggest that social learning can be an effective solution. Therefore, the policy implication of this paper is clear: when small farmers face technology adoption decisions such as investing in tube wells or machinery, helping several farmers adopt successfully may be the best way to induce more adoption in their village.
Figure 1.1 Greenhouse Diffusion Curve at the Household Level
[Figure not reproduced; plotted series: the number of adopters.]

Table 1.1 Descriptive Statistics: Household Level Data

This table contains the basic household characteristics used in our study. The mean value for each variable is presented with the associated standard error in parentheses. For adopters, all variables are measured in the year before adoption. For non-adopters, all variables are measured in the year before the survey. *** denotes significance at one-percent, ** five-percent, and * ten-percent level.

Basic characteristics                                 Non-adopter      Adopter          Test of equality of the means (p-value)
Social learning within village                        4.7 (0.7)        6.9 (0.67)       0.027**
Social learning within village and nearby villages    5.8 (0.8)        8.45 (0.76)      0.018**
Family size                                           3.7 (0.07)       3.9 (0.06)       0.016**
Farm labor                                            2.92 (0.07)      2.46 (0.043)     0.01***
Off-farm employment                                   0.8 (0.054)      0.24 (0.022)     0.01***
Age of family head                                    46.4 (0.6)       35 (0.46)        0.01***
Education of family head                              7.0 (0.17)       7.24 (0.14)      0.25
Off-farm income (yuan)                                8420 (649)       1643 (182)       0.01***
Farm size (mu)                                        5.6 (0.19)       6.01 (0.16)      0.09*
Irrigation ratio                                      0.80 (0.019)     0.89 (0.013)     0.01***
Major land reallocations since 1980                   1.44 (0.067)     0.79 (0.05)      0.01***
Minor land reallocations since 1980                   4.29 (0.26)      3.19 (0.19)      0.01***
House value (yuan)                                    8773 (539)       4294 (413)       0.01***
Grain land share (percent)                            0.579 (0.282)    0.577 (0.252)    0.92
Maximum lend                                          862 (104)        368 (66)         0.01***
Maximum borrow                                        1352 (146)       925 (102)        0.01**

Table 1.2 Greenhouse Adoption and Social Learning: LPM Estimated by 2SLS

This table contains a 2SLS estimation of the linear probability model for farmers’ adoption decision. The instrumental variable for social learning is distance to neighborhood (measured by the walking time to the 20 closest neighbors). The dependent variable is 1 for adopters and 0 for non-adopters.
*** denotes significance at one-percent, ** five-percent, and * ten-percent level.

Explanatory variables                                 Coefficient   Robust std error   Coefficient   Robust std error
Social Learning
Social learning within village                        0.019         0.01**
Social learning within village and nearby villages                                     0.017         0.009**
Conditional mean of market return                     -0.41         0.34               -0.37         0.31
Market volatility                                     -0.0017       0.0006**           -0.0017       0.0006***
Years of awareness of the technology                  -0.009        0.0073             -0.01         0.008
Output price/input price                              0.83          0.21***            0.86          0.22***
Household Characteristics
Family size                                           0.020         0.017              0.021         0.016
Age of family head                                    -0.0034       0.0017**           -0.0037       0.0015**
Education of family head                              0.0012        0.0044             -0.0003       0.004
Off-farm income                                       -0.0068       0.0065             -0.007        0.006
Farm size                                             0.006         0.006              0.0075        0.0054
Irrigation ratio                                      0.058         0.045              0.060         0.038
House value                                           -0.0017       0.0032             -0.0022       0.0031
Greenhouse construction cost                          0.0073        0.015              0.006         0.14
Times of major reallocations                          -0.017        0.03               -0.025        0.031
Times of minor reallocations                          -0.001        0.008              0.0005        0.008
Grain share                                           0.158         0.095              0.144         0.089
Dummies and constant terms
Crop dummy                                            0.0043        0.038              0.061         0.044
County dummies                                        Yes                              Yes
Year dummies                                          Yes                              Yes
Constant terms                                        -0.687        0.26**             -0.69         0.26**
Observations                                          626                              626
Adjusted R-squared                                    0.83                             0.84
Durbin-Wu-Hausman Test for Endogeneity                p-value 0.014                    p-value 0.013

Table 1.3 Greenhouse Adoption and Social Learning: First Stage 2SLS Results

This table contains the first stage results of a 2SLS estimation of the linear probability model for farmers’ adoption decision. The dependent variable is social learning within village or social learning within village and nearby villages. *** denotes significance at one percent, ** five percent, and * ten percent level.
                                                      Social learning within village   Social learning within village and nearby villages
Explanatory variables                                 Coefficient   Robust std error   Coefficient   Robust std error
Walking time to 20 closest neighbors                  -0.088        0.044**            -0.093        0.046**
Conditional mean of market return                     1.02          2.535              -1.375        2.585
Market volatility                                     0.005         0.012              0.007         0.015
Years of awareness of the technology                  0.657         0.30**             0.771         0.304**
Output price/input price                              14.82         16.2               14.09         16.94
Household Characteristics
Family size                                           0.132         0.743              0.086         0.784
Age of family head                                    -0.062        0.080              -0.046        0.089
Education of family head                              -0.132        0.286              -0.052        0.308
Off-farm income                                       -0.187        0.338              -0.185        0.346
Farm size                                             -0.030        0.358              -0.107        0.351
Irrigation ratio                                      1.729         4.076              1.700         3.656
House value                                           -0.035        0.143              -0.010        0.143
Greenhouse construction cost                          -1.169        0.605*             -1.168        0.580**
Times of major reallocations                          1.416         0.974              1.946         0.970**
Times of minor reallocations                          -0.610        0.521              -0.727        0.526
Grain share                                           0.847         4.289
Dummies and constant terms
Crop dummy                                            -1.898        2.092              -3.11         3.05
County dummies                                        Yes                              Yes
Year dummies                                          Yes                              Yes
Constant terms                                        -5.996        19.69              -6.21         20.62
Observations                                          626                              626
Adjusted R-squared                                    0.267                            0.293

Table 1.4 Greenhouse Adoption and Social Learning: LPM with Interaction Terms

This table contains a 2SLS estimation of the linear probability model for farmers’ adoption decision. The instrumental variables for social learning include distance to neighborhood (measured by the walking time to the 20 closest neighbors) and its interaction with year dummies. The dependent variable is 1 for adopters and 0 for non-adopters. *** denotes significance at one-percent, ** five-percent, and * ten-percent level.
Explanatory variables                                  Coefficient  Robust       Coefficient  Robust
                                                                    std error                 std error
Social Learning
  Social learning within village                       0.019        0.009**
  Social learning within village and nearby villages                             0.018        0.008**
  Conditional mean of market return                    -0.415       0.357        -0.37        0.33
  Market volatility                                    -0.0017      0.0006***    -0.0017      0.0005***
  Years of awareness of the technology                 -0.0094      0.0073       -0.01        0.0073
  Output price/input price                             0.82         0.211***     0.85         0.22***
Household Characteristics
  Family size                                          0.020        0.017        0.021        0.016
  Age of family head                                   -0.0034      0.0017**     -0.0037      0.0015**
  Education of family head                             0.0013       0.0044       -0.0003      0.004
  Off-farm income                                      -0.0068      0.0067       -0.007       0.006
  Farm size                                            0.006        0.006        0.0076       0.0057
  Irrigation ratio                                     0.057        0.047        0.059        0.039
  House value                                          -0.0017      0.0032       -0.0022      0.0031
  Greenhouse construction cost                         0.0082       0.015        0.007        0.13
  Times of major reallocations                         -0.018       0.03         -0.026       0.032
  Times of minor reallocations                         -0.0004      0.0082       0.0009       0.008
  Grain share                                          0.157        0.097        0.143        0.09
Dummies and constant terms
  Crop dummy                                           0.0044       0.039        0.064        0.044
  County dummies                                       Yes                       Yes
  Year dummies                                         Yes                       Yes
  Interaction terms                                    Yes                       Yes
  Constant terms                                       -0.681       0.26**       -0.684       0.27**
Observations                                           626                       626
Adjusted R-squared                                     0.82                      0.83
Over-identification test (Hansen J, C-statistics)      p-value 0.20              p-value 0.21

Table 1.5 Distance to Neighborhood and Characteristics of Household

This table summarizes the walking time to the 20 closest neighbors for households categorized by their education, wealth, and age levels.

Education of    Distance to 20     Real value of    Distance to 20     Age of head     Distance to 20
family head     closest neighbors  house            closest neighbors  of household    closest neighbors
(school years)  (minutes)          (10,000 Yuan)    (minutes)          (years)         (minutes)
0               14                 <0.2             14                 <20             18
1               21                 0.2~0.5          25                 20~25           16
2               13                 0.5~1            16                 25~30           14
3               15                 1~2              16                 30~35           17
4               21                 2~3              16                 35~40           16
5               19                 3~4              16                 40~45           16
6               17                 4~5              16                 45~50           15
7               13                 5~6              13                 50~55           17
8               15                 6~7              15                 55~60           16
9               16                 7~8              17                 >60             15
10              15                 8~9              13
>11             13                 9~10             16
                                   >10              16

BIBLIOGRAPHY

Abasov, T. M.
(2005). Dynamic learning effect in corporate finance and risk management. Ph.D. Dissertation. University of California, Irvine.
Banerjee, A., and Duflo, E. (2002). Do firms want to borrow more? Testing credit constraints using a directed lending program. MIT Department of Economics, Working Paper No. 02-25.
Bandiera, O., and Rasul, I. (2006). Social network and technology adoption in Northern Mozambique. Economic Journal: 116, 869-902.
Bertola, G., and Caballero, R. (1994). Irreversibility and aggregate investment. Review of Economic Studies: 61, 223-246.
Besley, T., and Case, A. (1994). Diffusion as a learning process: evidence from HYV cotton. Mimeo, Princeton University.
Brennan, M. J. (1998). The role of learning in dynamic portfolio decisions. European Economic Review: 1, 295-306.
Chinese Agricultural Yearbook (2006). Chinese Agricultural Press.
Conley, T., and Udry, C. (2001). Learning about a new technology: pineapple in Ghana. American Journal of Agricultural Economics: 83, 668-673.
Dixit, A. K., and Pindyck, R. S. (1994). Investment under Uncertainty. Princeton University Press.
Feder, G. (1980). Farm size, risk aversion and the adoption of new technology under uncertainty. Oxford Economic Papers, New Series: 32, 2, 263-283.
Foster, A., and Rosenzweig, M. (1995). Learning by doing and learning from others: human capital and technical change in agriculture. Journal of Political Economy: 103, 1176-1209.
Gennotte, G. (1986). Optimal portfolio choice under incomplete information. Journal of Finance: 41, 733-746.
Griliches, Z. (1957). Hybrid corn: an exploration in the economics of technological change. Econometrica: 25, 501-522.
Hassett, K. A., and Metcalf, G. E. (1995). Energy tax credits and residential conservation investment. NBER Working Paper No. W4020.
Huang, L., and Liu, H. (2007). Rational inattention and portfolio selection. Journal of Finance: 62, 1999-2040.
Liptser, R., and Shiryaev, A. (2001). Statistics of random processes.
Springer-Verlag, Berlin.
Manski, C. F. (1993). Identification of social effects: reflection problem. Review of Economic Studies: 60, 531-542.
McDonald, R., and Siegel, D. (1986). The value of waiting to invest. Quarterly Journal of Economics: 101, 707-728.
Merton, R. C. (1980). On estimating the expected return on the market. Journal of Financial Economics: 8, 323-361.
Munshi, K. (2004). Social learning in a heterogeneous population: social learning in the Indian green revolution. Journal of Development Economics: 73, 185-213.
Nelson, A. W., and Amegbeto, K. (1998). Option values to conservation and agricultural price policy: application to terrace construction in Kenya. American Journal of Agricultural Economics: 80, 409-418.
Newbery, D., and Stiglitz, J. (1981). The theory of commodity price stabilization. Oxford: Clarendon Press.
Olmstead, A. L., and Rhode, P. (1993). Induced innovation in American agriculture: a reconsideration. Journal of Political Economy: 101, 100-118.
Roumasset, J. (1976). Rice and risk: decision making among low income farmers. Amsterdam: North Holland.
Sunding, D., and Zilberman, D. (2000). Research and technology adoption in a changing agricultural sector. Draft for the Handbook of Agricultural Economics.
Wan, X. (2000). The Chinese protection agriculture outlook and trend. Agricultural Machinery: 2000, 4-6 (in Chinese).
Weinberger, K., and Lumpkin, T. (2005). Horticulture for poverty alleviation: the unfunded revolution. AVRDC Working Paper 15.
Wooldridge, J. (2002). Econometric analysis of cross section and panel data. MIT Press, Cambridge.
Xia, Y. (2001). Learning about predictability: the effects of parameter uncertainty on dynamic asset allocation. Journal of Finance: 56, 205-246.
Zilberman, D., Sunding, D., Howitt, R., Dinar, A., and MacDougall, R. (1994). Water for California agriculture: lessons from the drought and new water market reform. Choices: 4, 25-28.
Chapter 2: Partial Maximum Likelihood Estimation of a Spatial Probit Model

2.1 Introduction

Most econometric techniques for cross-section data are based on the assumption that observations are independent. However, economic activities have become more and more correlated over space as communication and transportation have improved. At the same time, technological advances in communications and geographic information systems (GIS) have made spatial data more available than before. Spatial correlation among observations has received increasing attention in regional, real estate, agricultural, environmental, and industrial organization economics (Lee 2004). Econometricians have paid more attention to spatial dependence over the last two decades, and important advances have been made in both theoretical and empirical studies.[5] Spatial dependence means not only a lack of independence between observations but also a spatial structure underlying the correlations (Anselin and Florax 1995).

[5] Anselin, Florax and Rey (2004) wrote a comprehensive review of econometrics for spatial models.

There are two ways to capture spatial dependence by imposing structure on a model: one belongs to the domain of geostatistics, where the spatial index is continuous (Conley 1999); in the other, the spatial sites form a countable lattice (Lee 2004). Among the lattice models, there are two types of spatial dependence model, depending on whether the spatial correlation enters through the variables or the error terms: the spatial autoregressive dependent variable model (SAR) and the spatial autoregressive error model (SAE). In most applications of spatial models, the dependent variables are continuous (Conley 1999; Lee 2004; Kelejian and Prucha 1999, 2001; among others), and only a few applications address spatial dependence with discrete choice dependent variables (exceptions include Case 1991; McMillen 1995; Pinkse and Slade 1998; Lesage 2000; Beron and Vijverberg 2003).
This paper is designed to address this gap; we are concerned with the SAE model with a discrete choice dependent variable. As the name indicates, there are two aspects to the discrete choice model with spatial dependence. First, the dependent variable is discrete, and the leading case is a binary choice. Probit and Logit are the two most popular non-linear models for binary choice problems. For the sake of brevity, this study focuses on the Probit model, but the approach developed here generalizes to other discrete choice models. In discrete choice models, if the observations are independent, maximum likelihood estimation yields efficient estimators given the correct conditional distribution of the dependent variable. An attractive feature of the maximum likelihood estimator (MLE) is that, in many situations (panel data or clustering), pseudo MLE still delivers consistent and asymptotically normal, although inefficient, estimators even when certain dependence among observations is ignored (Poirier and Ruud 1988). However, the non-linearity makes estimation computationally difficult, and the difficulty becomes much worse under dependence, which requires solving an n-dimensional integral.

Dependence is the other aspect of this problem. General forms of dependence are rarely allowed for in cross-sectional data, although they are routinely allowed for in time-series data (Conley 1999). For example, some scholars have discussed discrete choice models with dependence in time-series data: Robinson (1982) relaxed the independence assumption of Amemiya (1973) in the Tobit model and proved that the MLE with dependent observations is strongly consistent and asymptotically normal under some regularity conditions. Poirier and Ruud (1988) discussed the Probit model with dependence in time-series data and developed generalized conditional moment (GCM) estimators, which are computationally attractive and relatively more efficient.
However, dependence in space is more complicated than dependence in time, for four reasons: first, time is one-dimensional whereas space has at least two dimensions; second, time has a natural order (direction) whereas space does not; third, time is regularly divided because of regular astronomical phenomena, whereas spatial observations are attached to geographic properties of the surface of the earth; fourth, time-series observations are draws from a continuous process whereas, with spatial data, it is common for the sample and the population to be the same (Pinkse et al. 2007). How to deal with spatial dependence in estimation is therefore the key issue for spatial econometricians.

Inspired by work on dependence in time-series data, Conley (1999) uses metrics of economic distance to characterize dependence among agents and shows that the GMM estimator is consistent and asymptotically normal under assumptions similar to those used for time-series data. He also shows how to obtain a consistent covariance matrix estimator by an approach similar to that of Newey and West (1987). Pinkse and Slade (1998) use GMM in the discrete choice setting with the SAE model and show that the GMM estimator remains consistent and asymptotically normal under some regularity conditions. Although Pinkse and Slade (1998) use generalized residuals from the MLE as the basis of the GMM estimator, they do not take advantage of the information in spatial correlations among observations, and hence the GMM estimator is much less efficient than the full ML estimator. Lee (2004) carefully examines the asymptotic properties of the MLE and quasi-MLE for the linear spatial autoregressive (SAR) model, and he shows that the rate of convergence of those estimators may depend on some general features of the spatial weights matrix of the model.
If each unit is influenced by only a few neighboring units, the estimators may have a $\sqrt{n}$ rate of convergence and be asymptotically normal; otherwise, they may converge at a lower rate and could be inconsistent.

In this study, we capture spatial dependence by treating the spatial sites as a countable lattice, and we explore a middle-ground approach that trades off efficiency against computational burden. The idea is to divide the spatially dependent observations into many small groups (clusters) such that adjacent observations belong to the same group. The implicit rationale is that adjacent observations usually account for the most important spatial correlations. If we can correctly specify the conditional joint distribution within groups, which lets us use relatively more of the information in the spatial correlations, then estimating the model by partial MLE gives consistent and more efficient estimators, which should generally be better than GMM estimators. However, this approach is subject to biased variance-covariance matrix estimators because of the spatial correlations among groups. To deal with this problem, we follow the methods proposed by Newey and West (1987) and Conley (1999) to obtain consistent variance-covariance matrix estimators. Of course, this middle-ground approach does not deliver the most efficient estimator. However, since adjacent observations usually capture the important spatial correlations in the whole sample, we obtain a consistent and relatively efficient estimator, and we avoid some tedious computation at the expense of a relatively small loss of efficiency.

This paper is organized as follows. First, we review econometric techniques for discrete choice models. Second, the SAE model with a discrete choice dependent variable is presented and regularity conditions are specified. Section 3 presents the bivariate spatial Probit model.
In Section 4, we prove consistency and asymptotic normality of the partial ML estimators under regularity assumptions and discuss how to obtain consistent covariance matrix estimators. Section 5 presents a simulation study showing the advantages of our new estimation procedure in this setting. Finally, Section 6 concludes. The proofs are collected in Appendix 1, while the results of the simulation study are provided in Appendix 2.

2.2 Discrete Choice Models with Spatial Dependence

2.2.1 Probit Model without Dependence

We first review the standard Probit model without dependence. The underlying linear latent variable model is

\[ Y_i^* = X_i\beta + \varepsilon_i, \tag{1} \]

where $Y_i^*$ is the scalar latent dependent variable, $X_i$ is a $1 \times K$ vector of regressors, $\beta$ is a $K \times 1$ parameter vector to be estimated, and $\varepsilon_i$ is a continuous random variable, independent of $X_i$, that follows a standard normal distribution. However, we cannot observe $Y_i^*$; we can only observe the indicator $Y_i$, which is related to $Y_i^*$ as follows:

\[ Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0, \\ 0 & \text{if } Y_i^* \le 0. \end{cases} \tag{2} \]

Therefore, the conditional distribution of $Y_i$ given $X_i$ is

\[ P(Y_i = 1 \mid X_i) = P(Y_i^* > 0 \mid X_i) = P(\varepsilon_i > -X_i\beta \mid X_i) = \Phi(X_i\beta), \tag{3} \]

where $\Phi$ denotes the standard normal cumulative distribution function (cdf). It follows that

\[ P(Y_i = 0 \mid X_i) = 1 - \Phi(X_i\beta). \tag{4} \]

Since $Y_i$ is a Bernoulli random variable, we can write the density of $Y_i$ conditional on $X_i$ as

\[ f(Y_i \mid X_i) = [\Phi(X_i\beta)]^{Y_i}\,[1 - \Phi(X_i\beta)]^{1 - Y_i}, \qquad Y_i = 0, 1. \tag{5} \]

Given the independence assumption, the log likelihood function can be written as

\[ \log L = \sum_{i=1}^{n} \left\{ Y_i \log[\Phi(X_i\beta)] + (1 - Y_i)\log[1 - \Phi(X_i\beta)] \right\}, \tag{6} \]

and a sufficient condition for uniqueness of the global maximum of $\log L$ is that the function is strictly concave (Gourieroux 2000).
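As a small illustration of the log likelihood in equation (6), the sketch below evaluates it in plain Python. The function names and the toy data are hypothetical and chosen only for this example; it is not the estimation code used in the study.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(beta, X, y):
    """Log likelihood of an independent probit sample:
    sum_i y_i*log(Phi(x_i*beta)) + (1 - y_i)*log(1 - Phi(x_i*beta))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        index = sum(b * x for b, x in zip(beta, x_i))
        p = norm_cdf(index)
        # Clip to keep the logarithms finite for extreme index values.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total

# Hypothetical toy data: an intercept plus one regressor.
X = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 1]
ll_zero = probit_loglik((0.0, 0.0), X, y)   # every probability equals 0.5
ll_fit = probit_loglik((-0.5, 1.0), X, y)   # a parameter value that fits better
```

At $\beta = 0$ every response probability is one half, so the log likelihood equals $n \log 0.5$; any parameter value that separates the zeros from the ones yields a larger value.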
We can then solve for $\hat\beta$ from the first-order condition

\[ \frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{n} \frac{[Y_i - \Phi(X_i\beta)]\,\phi(X_i\beta)}{\Phi(X_i\beta)\,[1 - \Phi(X_i\beta)]}\, X_i' = 0. \]

[...]

\[ \ln L(\beta, \lambda) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}(Y^* - X\beta)' A' A (Y^* - X\beta) + \ln|A|, \tag{13} \]

where $A = I - \lambda W$, and the estimate of $\beta$ can then be solved as $\hat\beta = (X'A'AX)^{-1}X'A'AY^*$. In practice, however, we cannot observe $Y^*$; we can only observe $Y_i$, which implies a non-linear Probit model because of the normal distributional assumption. Moreover, the errors are correlated, and the full likelihood function becomes

\[ L = P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n) = \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_n} \phi(u)\, du, \tag{14} \]

\[ \phi(u) = (2\pi)^{-n/2}\, |\Omega|^{-1/2} \exp\left( -\tfrac{1}{2}\, u'\Omega^{-1}u \right). \tag{15} \]

In theory, taking the first derivatives with respect to $\beta$ and the spatial coefficient $\lambda$ gives

\[ \frac{\partial L}{\partial \beta} = \frac{\partial}{\partial \beta} \left\{ \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_n} (2\pi)^{-n/2}\, |(I - \lambda W)'(I - \lambda W)|^{1/2} \exp\left( -\tfrac{1}{2}\, u'(I - \lambda W)'(I - \lambda W)u \right) du \right\} = 0, \tag{16} \]

\[ \frac{\partial L}{\partial \lambda} = \frac{\partial}{\partial \lambda} \left\{ \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_n} (2\pi)^{-n/2}\, |(I - \lambda W)'(I - \lambda W)|^{1/2} \exp\left( -\tfrac{1}{2}\, u'(I - \lambda W)'(I - \lambda W)u \right) du \right\} = 0. \tag{17} \]

The expressions for the first derivatives are quite complicated, but if we have sufficient computational capacity and $\beta$ and $\lambda$ are identifiable, we can obtain consistent and efficient estimates of $\beta$ and $\lambda$ by numerical methods. In practice, however, this would be a formidable computational task even for a sample of moderate size. We propose a more attractive procedure in the next sections.

2.2.3 Probit Models with Other Forms of Spatial Correlation

Generally, there is no reason to think that spatial correlation is properly modeled by (9). Other forms are possible. For example, one might assume that, outside of a certain geographic radius from a given observation $i$, $\varepsilon_i$ is uncorrelated with shocks to the outlying regions. So, for example, we might assume a constant correlation with any unit within a given radius, similar to a random effects structure for unbalanced panel data. Alternatively, we may prefer more of a moving average structure, such as

\[ \varepsilon_i = u_i + \lambda \sum_{h \ne i} W_{ih}\, u_h, \tag{18} \]

where the $u_i$ are i.i.d. with unit variance.
This formulation is attractive because it is relatively easy to find variances and pairwise correlations, which we will use in the partial MLEs described in the next section. For example,

\[ \mathrm{Var}(\varepsilon_i \mid W) = 1 + \lambda^2 \sum_{h \ne i} W_{ih}^2. \tag{19} \]

Clearly, methods that use only the variance in estimation can identify only $\lambda^2$ (but we almost always think $\lambda > 0$ anyway). Pairwise covariances can also be obtained:

\[ \mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid W) = \lambda W_{ij} + \lambda W_{ji} + \lambda^2 \sum_{h \ne i, j} W_{ih} W_{jh}. \tag{20} \]

Expressions like this for the covariance between different errors are important for applying grouped partial MLE methods.

2.3 Using Partial MLEs to Estimate General Spatial Probit Models

Estimating a Probit spatial autocorrelation model by full MLE is a prodigious task, although several approaches have been applied. These include the EM algorithm (McMillen 1992), the RIS simulator (Beron and Vijverberg 2003), and the Bayesian Gibbs sampler (Lesage 2000). Each of these approaches is still computationally burdensome. Combining them with simulation studies, or quickly estimating a range of models, is beyond current computational capabilities even for moderate sample sizes. To obtain an estimator that is computationally feasible, Pinkse and Slade (1998) proposed the generalized method of moments (GMM), using information on the marginal distributions of the binary responses. In particular, the generalized residuals from the marginal probit log likelihood are used to construct moment conditions for GMM. Pinkse and Slade show that, under conditions very similar to those in this paper, the GMM estimator is consistent and asymptotically normal. A consistent variance-covariance matrix can also be obtained theoretically without a covariance stationarity assumption, although Pinkse and Slade (1998) do not discuss estimation of the asymptotic variance. The GMM estimator is therefore useful in practice, but it is fundamentally based on the marginal probit models.
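The variance and covariance expressions (19) and (20) can be checked numerically. The sketch below, under the assumption that $W$ has a zero diagonal and that the $u_i$ have unit variance as stated above, computes the covariance matrix element by element from the two formulas and compares it with the matrix product $(I + \lambda W)(I + \lambda W)'$. The weight matrix, the value of $\lambda$, and the function names are hypothetical.

```python
def ma_error_covariance(W, lam):
    """Covariance matrix of e_i = u_i + lam * sum_{h != i} W[i][h] * u_h,
    built entry by entry from the variance and covariance formulas."""
    n = len(W)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][i] = 1.0 + lam ** 2 * sum(
                    W[i][h] ** 2 for h in range(n) if h != i)
            else:
                cross = sum(W[i][h] * W[j][h]
                            for h in range(n) if h not in (i, j))
                cov[i][j] = lam * W[i][j] + lam * W[j][i] + lam ** 2 * cross
    return cov

def ma_error_covariance_direct(W, lam):
    """Cross-check: the same matrix computed as (I + lam*W)(I + lam*W)^T."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) + lam * W[i][j] for j in range(n)]
         for i in range(n)]
    return [[sum(A[i][h] * A[j][h] for h in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical 3x3 weight matrix with zero diagonal.
W = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.3, 0.7, 0.0]]
lam = 0.4
cov_formula = ma_error_covariance(W, lam)
cov_direct = ma_error_covariance_direct(W, lam)
```

The two matrices agree only because the diagonal of $W$ is zero; with nonzero diagonal weights the sums over $h \ne i$ and $h \ne i, j$ would omit terms that the matrix product includes.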
Thus, while a GMM estimator can be obtained that is efficient given the information in the marginal likelihood, the method throws out much useful information. We describe a simplified version of this approach in Section 2.3.1, which, in effect, uses a heteroskedastic probit model to estimate $\beta$ along with any spatial autocorrelation parameter.

Using only the marginal distribution of $Y_i$, conditional on the covariates and weights, likely results in a serious loss of information for estimating both $\beta$ and the spatial autocorrelation parameters. Our key contribution in this paper is to explore the use of partial maximum likelihood, where we group small numbers of nearby observations and obtain the joint distribution of those observations. Naturally, these distributions are determined by the fully specified spatial autocorrelation model, just as we must obtain the implied variance to apply marginal probit methods. Once the covariances between observations are found as a function of the weights and $\lambda$, we can use that information in multivariate probit estimation. Section 2.3.2 describes a bivariate probit approach, with heteroskedasticity and covariance implied by the particular spatial autocorrelation model. Using a single covariance in addition to the variance seems likely to improve the efficiency of estimation.

2.3.1 Univariate Probit Partial MLE

One way to estimate the coefficients $\beta$ along with the spatial correlation parameters is to derive the marginal distributions, $P(Y_i = 1 \mid X, W)$, as a function of all of the weights (and of the parameters $\beta$ and $\lambda$, of course). Under the joint normality assumption, the model will be a form of probit with heteroskedasticity. In particular, given any spatial probit model such that the variances are well defined, we can find

\[ P(Y_i = 1 \mid X, W) = \Phi\left( X_i\beta / \sigma_i(\lambda) \right), \tag{21} \]

where $\sigma_i^2(\lambda) = \mathrm{Var}(\varepsilon_i \mid X, W) = \mathrm{Var}(\varepsilon_i \mid W)$ is a function of all weights, $W$, and the spatial correlation parameters $\lambda$.
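To make the heteroskedastic probit probability in (21) concrete, the sketch below evaluates it under one particular assumption: that the errors follow the moving-average structure of Section 2.2.3, so that $\sigma_i^2(\lambda) = 1 + \lambda^2 \sum_{h \ne i} W_{ih}^2$. Any other spatial model with well-defined variances would supply a different $\sigma_i(\lambda)$. The data, weights, and function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def marginal_sd(W, lam):
    """sigma_i(lam) under the assumed moving-average errors:
    sqrt(1 + lam^2 * sum_{h != i} W[i][h]^2)."""
    n = len(W)
    return [math.sqrt(1.0 + lam ** 2 * sum(W[i][h] ** 2
                                           for h in range(n) if h != i))
            for i in range(n)]

def marginal_probabilities(X, beta, W, lam):
    """P(Y_i = 1 | X, W) = Phi(X_i * beta / sigma_i(lam)) for each unit."""
    probs = []
    for x_i, s_i in zip(X, marginal_sd(W, lam)):
        index = sum(b * x for b, x in zip(beta, x_i))
        probs.append(norm_cdf(index / s_i))
    return probs

# Hypothetical data: three units, an intercept and one regressor.
X = [(1.0, 0.5), (1.0, -1.0), (1.0, 2.0)]
beta = (0.2, 0.8)
W = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
p_independent = marginal_probabilities(X, beta, W, 0.0)  # ordinary probit
p_spatial = marginal_probabilities(X, beta, W, 0.9)      # inflated variance
```

With $\lambda = 0$ the expression collapses to the ordinary probit probability; a larger $|\lambda|$ inflates $\sigma_i$ and pulls every probability toward one half.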
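As is well known in time series contexts (for example, Poirier and Ruud (1988) or Robinson (1982)), using probit while ignoring the time series correlation leads to consistent estimation under standard regularity conditions, provided the data are weakly dependent. Thus, it is not surprising that pooled probit that accounts for the heteroskedasticity in the marginal distribution is generally consistent for spatially correlated data, too, provided, of course, that we limit the amount of spatial correlation. The log likelihood can be written generically as

\[ \log L = \sum_{i=1}^{n} \left\{ Y_i \log \Phi\left[ X_i\beta/\sigma_i(\lambda) \right] + (1 - Y_i) \log\left[ 1 - \Phi\left( X_i\beta/\sigma_i(\lambda) \right) \right] \right\}. \tag{22} \]

[...]

\[ Y_{gi} = \begin{cases} 1 & \text{if } Y_{gi}^* > 0, \\ 0 & \text{if } Y_{gi}^* \le 0. \end{cases} \tag{23} \]

Therefore the conditional bivariate distribution of $Y_{g1}$ and $Y_{g2}$ given $X_g$ is given as

\[ P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = P(X_{g1}\beta + \varepsilon_{g1} > 0,\; X_{g2}\beta + \varepsilon_{g2} > 0 \mid X_g) \tag{29} \]

[...] $-X_{g2}\beta$, and $\varepsilon_{g2}$ follows a normal distribution and is independent of $X_g$, then the density of $\varepsilon_{g2}$ given $\varepsilon_{g2} > -X_{g2}\beta$ is

\[ \frac{\dfrac{1}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\, \phi\!\left( \dfrac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right)}{P(\varepsilon_{g2} > -X_{g2}\beta)} = \frac{\dfrac{1}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\, \phi\!\left( \dfrac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right)}{\Phi\!\left( \dfrac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right)}. \tag{39} \]

Therefore,

\[ P(Y_{g1} = 1 \mid Y_{g2} = 1, X_g) = E\left[ P(Y_{g1} = 1 \mid X_g, \varepsilon_{g2}) \mid Y_{g2} = 1, X_g \right] \tag{40} \]

\[ = E\left[ \Phi\!\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \,\Big|\, Y_{g2} = 1, X_g \right] \tag{41} \]

\[ = \frac{1}{\Phi\!\left( \dfrac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right)} \int_{-X_{g2}\beta}^{\infty} \Phi\!\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \frac{1}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\, \phi\!\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}, \tag{42} \]

and it is easy to see that $P(Y_{g1} = 0 \mid Y_{g2} = 1, X_g) = 1 - P(Y_{g1} = 1 \mid Y_{g2} = 1, X_g)$ because $Y_{g1}$ is a binary variable. Similarly, we can get

\[ P(Y_{g1} = 1 \mid Y_{g2} = 0, X_g) = \frac{1}{1 - \Phi\!\left( \dfrac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right)} \int_{-\infty}^{-X_{g2}\beta} \Phi\!\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \frac{1}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\, \phi\!\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}. \tag{43} \]

Therefore,

\[ P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = P(Y_{g1} = 1 \mid Y_{g2} = 1, X_g) \times \Phi\!\left( \frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) \tag{44} \]

\[ = \int_{-X_{g2}\beta}^{\infty} \Phi\!\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \frac{1}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\, \phi\!\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}, \tag{45} \]

and similarly we can obtain finally

\[ P(Y_{g1} = 0, Y_{g2} = 1 \mid X_g) = \Phi\!\left( \frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) - \int_{-X_{g2}\beta}^{\infty} \Phi\!\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \frac{1}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\, \phi\!\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}, \tag{46} \]

\[ P(Y_{g1} = 1, Y_{g2} = 0 \mid X_g) = \int_{-\infty}^{-X_{g2}\beta} \Phi\!\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \frac{1}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\, \phi\!\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}, \tag{47} \]

\[ P(Y_{g1} = 0, Y_{g2} = 0 \mid X_g) = \left[ 1 - \Phi\!\left( \frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) \right] - \int_{-\infty}^{-X_{g2}\beta} \Phi\!\left( \frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}} \right) \frac{1}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\, \phi\!\left( \frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}} \right) d\varepsilon_{g2}. \tag{48} \]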
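The four joint probabilities of a pair each reduce to a one-dimensional integral, which is what makes the bivariate approach cheap to compute. The sketch below evaluates them with Simpson's rule. It rests on an assumption that is labeled here because the corresponding derivation is not reproduced in this text: the errors are jointly normal and $\varepsilon_{g1}$ is decomposed as $\varepsilon_{g1} = \delta_{g1}\varepsilon_{g2} + e_{g1}$, with $\delta_{g1} = \mathrm{Cov}(\varepsilon_{g1}, \varepsilon_{g2})/\mathrm{Var}(\varepsilon_{g2})$ and $e_{g1}$ independent of $\varepsilon_{g2}$. All names, the truncation width, and the step count are hypothetical choices.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pair_probabilities(a1, a2, var1, var2, cov12, width=10.0, steps=2000):
    """Joint probabilities of (Y1, Y2) for one pair, where a1 = X_g1*beta and
    a2 = X_g2*beta. Assumes e1 = delta*e2 + residual with jointly normal errors
    and |correlation| < 1. Tails are truncated at +/- width standard deviations."""
    s2 = math.sqrt(var2)
    delta = cov12 / var2                         # assumed projection coefficient
    s_e = math.sqrt(var1 - cov12 ** 2 / var2)    # sd of the projection residual

    def integrand(e2):
        return norm_cdf((a1 + delta * e2) / s_e) * norm_pdf(e2 / s2) / s2

    def simpson(lo, hi):
        # Composite Simpson rule; steps must be even.
        if hi <= lo:
            return 0.0
        h = (hi - lo) / steps
        total = integrand(lo) + integrand(hi)
        for k in range(1, steps):
            total += (4.0 if k % 2 else 2.0) * integrand(lo + k * h)
        return total * h / 3.0

    lo_tail, hi_tail = -width * s2, width * s2
    cut = min(max(-a2, lo_tail), hi_tail)        # Y2 = 1 exactly when e2 > -a2
    p11 = simpson(cut, hi_tail)
    p10 = simpson(lo_tail, cut)
    p2 = norm_cdf(a2 / s2)                       # marginal P(Y2 = 1)
    return {(1, 1): p11, (1, 0): p10,
            (0, 1): p2 - p11, (0, 0): (1.0 - p2) - p10}

# Zero indices, unit variances, correlation 0.5 (hypothetical values).
probs = pair_probabilities(0.0, 0.0, 1.0, 1.0, 0.5)
# Zero covariance: the joint probability should factor into the marginals.
indep = pair_probabilities(0.3, -0.4, 1.5, 2.0, 0.0)
```

For zero indices and unit variances, the orthant probability has the closed form $1/4 + \arcsin(\rho)/(2\pi)$, which gives $1/3$ at $\rho = 0.5$ and provides a convenient check on the numerical integration.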
2.4 Partial Maximum Likelihood Estimation

As we discussed in the introduction, if the observations are independent, we can simplify the multivariate distribution into the product of univariate distributions, and the ML estimator can then be obtained easily. However, spatial correlation among observations no longer allows this simplification. Under spatial correlation, the situation is similar to the panel data case. In panel data, we cannot assume independence among observations over different periods for the same person (or firm), which means we are not likely to specify the full conditional density of $Y$ given $X$ correctly. Therefore, we need to relax the assumption in the panel data case. The way to deal with the problem is that, if we have a correctly specified model for the density of $Y_t$ given $X_t$, we can define the partial log likelihood problem as

\[ \max_{\theta \in \Theta} \sum_{i=1}^{N} \sum_{t=1}^{T} \log f_t(y_{it} \mid x_{it}; \theta), \tag{49} \]

where $f_t(y_{it} \mid x_{it}; \theta)$ is the density of $y_{it}$ given $x_{it}$ for each $t$. The partial log likelihood works because $\theta_0$ (the true value) maximizes the expected value of the above expression, provided the densities $f_t(y_{it} \mid x_{it}; \theta)$ are correctly specified (Wooldridge 2002).

We can apply a similar idea to the spatial Probit model: if the bivariate normal densities $\phi_{2g}(Y_{g1}, Y_{g2} \mid X_g; \theta)$ are correctly specified for each group, we can obtain a consistent estimator by partial ML. However, there are several differences between panel data and spatially dependent data. First, the panel data model assumes that the cross section dimension ($N$) is sufficiently large relative to the time dimension ($T$), but in spatial data we do not have this assumption. Second, in the panel data model, we view the cross section observations as independent, while in the spatial data model, even though we divide the sample into $n$ groups, we are definitely not assuming independence among groups.
Observations in different groups are still correlated, but the correlations are assumed to decay as the distance between groups increases. Third, as discussed before, dependence in space is more complicated than dependence in time, and we need to assume that the correlations between groups die out quickly enough as distance increases. In short, we need to examine carefully how the weak law of large numbers (WLLN) and the central limit theorem (CLT) can be applied in the spatially dependent case. We discuss these issues and provide proofs in the following sections.

First, we can write the partial log likelihood function as

\[ L = \sum_{g=1}^{n} \big\{ Y_{g1}Y_{g2} \log P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) + Y_{g1}(1 - Y_{g2}) \log P(Y_{g1} = 1, Y_{g2} = 0 \mid X_g) + (1 - Y_{g1})Y_{g2} \log P(Y_{g1} = 0, Y_{g2} = 1 \mid X_g) + (1 - Y_{g1})(1 - Y_{g2}) \log P(Y_{g1} = 0, Y_{g2} = 0 \mid X_g) \big\}, \quad g = 1, 2, \ldots, n, \tag{50} \]

and, for the sake of brevity, we define

\[ P_g(1,1) \equiv \log P(Y_{g1} = 1, Y_{g2} = 1 \mid X_g); \qquad P_g(1,0) \equiv \log P(Y_{g1} = 1, Y_{g2} = 0 \mid X_g); \tag{51} \]

\[ P_g(0,1) \equiv \log P(Y_{g1} = 0, Y_{g2} = 1 \mid X_g); \qquad P_g(0,0) \equiv \log P(Y_{g1} = 0, Y_{g2} = 0 \mid X_g). \tag{52} \]

Therefore, we can rewrite the partial log likelihood function as

\[ L = \sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2}P_g(1,1) + Y_{g1}(1 - Y_{g2})P_g(1,0) + (1 - Y_{g1})Y_{g2}P_g(0,1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0,0) \right\}. \tag{53} \]

2.4.1 Consistency of Bivariate Probit Estimation

Consistent estimators $\hat\theta \equiv (\hat\beta, \hat\lambda)'$ are those that converge in probability to the true value $\theta_0 \equiv (\beta_0, \lambda_0)'$, i.e. $\hat\theta \xrightarrow{p} \theta_0$, as the sample size goes to infinity, for all possible true values. In this section, to make the asymptotic arguments formal, we distinguish between the true value, $\theta_0$, and a generic parameter value $\theta$.

In the bivariate probit estimation, the estimator $\hat\theta$ is defined as follows: $\hat\theta$ maximizes $Q_n(\theta)$ subject to $\theta \in \Theta$, where $\Theta$ is the parameter set. The objective function $Q_n(\theta)$ is defined as

\[ Q_n(\theta) \equiv \frac{1}{n} \sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2}P_g(1,1) + Y_{g1}(1 - Y_{g2})P_g(1,0) + (1 - Y_{g1})Y_{g2}P_g(0,1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0,0) \right\} \]

[...]

THEOREM 1.
If (i) $\theta_0$ is in the interior of a compact set $\Theta$, which is the closure of a concave set, (ii) $Q$ attains a unique maximum over the compact set $\Theta$ at $\theta_0$, (iii) $Q$ is continuous on $\Theta$, (iv) the density of observations in any region whose area exceeds a fixed minimum is bounded, (v) as $n \to \infty$,

\[ \sup_g \left( \left\| \frac{1}{\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)} \right\| + \left\| \frac{1}{\Pr(Y_{g1} = 1, Y_{g2} = 0 \mid X_g)} \right\| + \left\| \frac{1}{\Pr(Y_{g1} = 0, Y_{g2} = 1 \mid X_g)} \right\| + \left\| \frac{1}{\Pr(Y_{g1} = 0, Y_{g2} = 0 \mid X_g)} \right\| \right) < \infty, \]

(vi) as $n \to \infty$, $\sup_g \left( \|X_{g1}\| + \|X_{g2}\| \right) = O(1)$, (vii) $\sup_{g,j} |\mathrm{Cov}(Y_{gi}, Y_{ji})| \le \alpha(d_{gj})$, $i = 1, 2$, where $d_{gj}$ denotes the distance between group $g$ and group $j$, and $\alpha(d) \to 0$ as $d \to \infty$, (viii) $\lim_{n \to \infty} E[Q_n(\theta)]$ exists, and (ix) $\sup_g \| [...] \| < \infty$, then $\hat\theta - \theta_0 = o_p(1)$.

Proof: Given in Appendix 1.

Condition (i) is a standard assumption from set theory. Condition (ii) is the identification condition for MLE. Condition (iii) assumes that the function $Q$ is continuous in the metric space, which is a reasonable assumption and is necessary for the proof that $Q_n(\theta)$ is stochastically equicontinuous. Condition (iv) simply rules out an infinite number of observations crowding into one bounded area. The minimum area restriction is imposed because an infinitesimal area around a single observation has infinite density. Condition (v) ensures that each of the four outcomes will be present in a sufficiently large sample. Condition (vi) ensures that the regressors are deterministic and uniformly bounded, which is not a strong assumption in this literature. Condition (vii) is the key assumption for this theorem; it requires that the dependence among groups decay sufficiently quickly as the distance between groups increases. This assumption employs the concept of $\alpha$-mixing to define the rate at which dependence decreases as distance increases. Condition (viii) assumes that the limit of $E[Q_n(\theta)]$ exists as $n \to \infty$, which is not a strong assumption. Condition (ix) is actually implied by the rule for dividing groups, which simply rules out two groups being in exactly the same location.
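A minimal sketch of how the objective $Q_n(\theta)$ is assembled once the four cell probabilities of each pair are available at a given $\theta$ may help fix ideas. The function name, data layout, and numbers below are hypothetical; in an actual estimation the probabilities would be recomputed from $\beta$, $\lambda$, and the weights at every trial value of $\theta$.

```python
import math

def partial_objective(outcomes, cell_probs):
    """Average partial log likelihood over pairs.

    outcomes:   list of (y1, y2) tuples, one per group g.
    cell_probs: list of dicts mapping (y1, y2) -> P(Y_g1 = y1, Y_g2 = y2 | X_g),
                evaluated at the parameter value of interest.
    """
    n = len(outcomes)
    total = 0.0
    for (y1, y2), probs in zip(outcomes, cell_probs):
        # Exactly one of the four indicator products equals one for each group.
        total += (y1 * y2 * math.log(probs[(1, 1)])
                  + y1 * (1 - y2) * math.log(probs[(1, 0)])
                  + (1 - y1) * y2 * math.log(probs[(0, 1)])
                  + (1 - y1) * (1 - y2) * math.log(probs[(0, 0)]))
    return total / n

# Hypothetical example: three groups sharing the same cell probabilities.
cells = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.3}
outcomes = [(1, 1), (0, 1), (0, 0)]
q_n = partial_objective(outcomes, [cells, cells, cells])
```

The logarithm of every cell probability is evaluated for every group, which is one practical reason for requiring the probabilities to be bounded away from zero, as in condition (v).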
2.4.2 Asymptotic Normality

As discussed in the introduction, spatial dependence is more complicated than time-series dependence in at least four respects. Because of these differences, the central limit theorem (CLT) needs stronger conditions in the spatial dependence case. To deal with general dependence problems, the common approach in the literature is to use so-called "Bernstein sums", which break $S_n$ up into blocks (partial sums) and consider the sequence of blocks. Each block must be so large, relative to the rate at which the memory of the sequence decays, that the degree to which the next block can be predicted from current information is negligible. At the same time, the number of blocks must increase with $n$ so that the CLT argument can be applied to this derived sequence (Davidson 1994). In this section, we show under what assumptions we are able to apply McLeish's (1974) central limit theorem to spatial dependence cases to obtain asymptotic normality for the spatial Probit estimator. This is presented in the following theorem. $A^T$ denotes the transpose of matrix $A$.

THEOREM 2: If the assumptions of Theorem 1 hold, and in addition: (i) as $d \to \infty$, $d^2\, \dfrac{\alpha(d\,d^*)}{\alpha(d^*)} = o(1)$ for all fixed $d^* > 0$, (ii) the sampling area grows uniformly at a rate of $\sqrt{n}$ in two non-opposing directions, (iii) $B(\theta_0) \equiv \lim_{n \to \infty} E[n S_n(\theta_0) S_n^T(\theta_0)]$ and $A(\theta_0) \equiv \lim_{n \to \infty} -E[H(\theta_0)]$ are uniformly positive definite matrices; then

\[ \sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\left[ 0,\; A(\theta_0)^{-1} B(\theta_0) A(\theta_0)^{-1} \right], \]

where $S_n(\theta_0) \equiv \dfrac{\partial Q_n}{\partial \theta}(\theta_0)$ and $H(\theta_0) \equiv \dfrac{\partial^2 Q_n}{\partial \theta\, \partial \theta^T}(\theta_0)$.

Proof: Given in Appendix 1.

Condition (i) is stronger than condition (vii) in Theorem 1, and it is also stronger than the usual condition for time series data, because spatially dependent data are correlated in more dimensions than time series data. It describes how dependence decays as the distance between groups increases, and it requires that the dependence decay fast enough.
Condition (ii) simply repeats the assumption in Bernstein's blocking method; the requirement of two non-opposing directions rules out a sampling area that grows in two parallel directions, which does not make much sense in the spatially dependent case. The conditions in (iii) are natural conditions on the matrices, which are implied by the previous assumptions. The matrices are only semidefinite in extreme situations such as $\Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g) = 0$, which are excluded by the previous assumptions.

2.4.3 Estimation of Variance-covariance Matrices

Consistent estimation of the asymptotic covariance matrix is important for the construction of asymptotic confidence intervals and hypothesis tests (Newey and West 1987). Estimation of $A$ (i.e. $\hat A = A(\hat\theta)$) is relatively easy, and usually just requires replacing $\theta_0$ with $\hat\theta$ in the sample analogue; estimation of $B$ (i.e. $\hat B = B(\hat\theta)$) is more difficult and more important because of the correlations among groups. Newey and West (1987) proposed a method to estimate the variance-covariance matrix in settings with dependence of infinite order under a covariance stationarity condition, and they suggested modified Bartlett weights to ensure that the estimated variances and test statistics are positive. Andrews (1991) established the consistency of kernel HAC (heteroskedasticity and autocorrelation consistent) estimators under more general conditions. Pinkse and Slade (1998) also showed that $B_n(\hat\theta) - B(\theta_0) = o_p(1)$ under regularity assumptions, where $B_n(\theta) \equiv n E[S_n(\theta) S_n^T(\theta)]$ (see Lemma 9 in Appendix 1). This approach is feasible in practice only if we can obtain a closed form expression for $E[S_n(\theta) S_n^T(\theta)]$, which should be a function of $\theta$, and then plug in $\hat\theta$ for $\theta_0$ in the function to obtain consistent covariance estimators. However, it is difficult to obtain closed form expressions for $B_n(\theta)$ in practice, and hence we follow an alternative approach proposed by Conley (1999).
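As a rough illustration of a Conley-type estimator of $B$, the sketch below treats the simplest possible case: a scalar score observed on a rectangular $M \times P$ lattice, with unsampled sites set to zero. It assumes a product Bartlett window, $(1 - |j|/L_M)(1 - |k|/L_P)$ inside the lag limits and zero outside, and sums lagged cross-products over non-negative lags $j$ and $k$ only. The function names and inputs are hypothetical, and the vector-score case would replace the products with outer products.

```python
def bartlett_weight(j, k, L_M, L_P):
    """Product Bartlett window: positive inside the lag limits, zero outside."""
    if abs(j) < L_M and abs(k) < L_P:
        return (1.0 - abs(j) / L_M) * (1.0 - abs(k) / L_P)
    return 0.0

def spatial_hac_variance(R, L_M, L_P, n):
    """Weighted sum of lagged cross-products of a scalar lattice field R[m][p].

    Each lag (j, k) contributes w * (R[m][p]*R[m-j][p-k] + R[m-j][p-k]*R[m][p]);
    the sum of squares is subtracted once because the zero lag is counted twice.
    The result is divided by n, the number of groups.
    """
    M, P = len(R), len(R[0])
    total = 0.0
    for j in range(L_M + 1):
        for k in range(L_P + 1):
            w = bartlett_weight(j, k, L_M, L_P)
            if w == 0.0:
                continue
            for m in range(j, M):
                for p in range(k, P):
                    total += w * 2.0 * R[m][p] * R[m - j][p - k]
    total -= sum(r * r for row in R for r in row)
    return total / n

# With lag limits of one, only the zero lag has positive weight.
R_small = [[1.0, -1.0], [2.0, 0.0]]
b_no_lags = spatial_hac_variance(R_small, 1, 1, 4)

# A perfectly correlated field: neighbouring products inflate the estimate.
R_ones = [[1.0] * 3 for _ in range(3)]
b_with_lags = spatial_hac_variance(R_ones, 2, 2, 9)
```

With lag limits of one the estimator collapses to the average squared score, which is the variance estimate that would be appropriate if the groups were independent; larger limits add the weighted cross-products between neighbouring sites.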
A feasible way to obtain a consistent estimate of a variance-covariance matrix that allows for a wider range of dependence is to apply the approach of Conley (1999), along the lines of Newey and West (1987). We follow this procedure in Theorem 3 below.

Let $\mathcal{F}_\Lambda$ be the $\sigma$-algebra generated by a given random field $\psi_{s_m}$, $s_m \in \Lambda$, with $\Lambda$ compact, and let $|\Lambda|$ be the number of $s_m \in \Lambda$. Let $\Upsilon(\Lambda_1, \Lambda_2)$ denote the minimum Euclidean distance from an element of $\Lambda_1$ to an element of $\Lambda_2$. There also exists a regular lattice index random field $W_s^*$ that is equal to one if location $s \in Z^2$ is sampled and zero otherwise. $W_s^*$ is assumed to be independent of the underlying random field, to have a finite expectation and to be stationary. The mixing coefficient is defined as

$$\alpha_{k,l}(n) \equiv \sup\{|P(A \cap B) - P(A)P(B)|\}, \quad A \in \mathcal{F}_{\Lambda_1},\ B \in \mathcal{F}_{\Lambda_2},$$

with $|\Lambda_1| \le k$, $|\Lambda_2| \le l$ and $\Upsilon(\Lambda_1, \Lambda_2) \ge n$. We also define a new process $R_s(\theta)$ such that

$$R_s(\theta) = \begin{cases} S_s(\theta) & \text{if } W_s^* = 1, \\ 0 & \text{if } W_s^* = 0. \end{cases}$$

Then:

THEOREM 3. If
(i) $\Lambda_\tau$ grows uniformly in two non-opposing directions as $\tau \to \infty$;
(ii) $B(\theta_0) \equiv \lim_{n\to\infty} nE[S_n(\theta_0)S_n^T(\theta_0)]$ and $A(\theta_0) \equiv \lim_{n\to\infty} -E[H(\theta_0)]$ are uniformly positive definite matrices;
(iii) $Y_{gi}$, $Y_{gi}^*$ as defined in Theorem 1, $i = 1, 2$, and $W_s^*$ are mixing, where $\alpha_{k,l}(n)$ converges to zero as $n \to \infty$; $S(\theta)$ is Borel measurable for all $\theta \in \Theta$, continuous on $\Theta$ and first moment continuous on $\Theta$;
(iv) $\sum_{m=1}^{\infty} m\,\alpha_{k,l}(m) < \infty$ for $k + l \le 4$;
(v) $\alpha_{1,\infty}(m) = o(m^{-2})$;
(vi) for some $\delta > 0$, $E(\|S(\theta_0)\|^{2+\delta}) < \infty$ and $\sum_{m=1}^{\infty} m\,[\alpha_{1,1}(m)]^{\delta/(2+\delta)} < \infty$;
(vii) $H(\theta)$ is Borel measurable for all $\theta \in \Theta$, continuous on $\Theta$ and second moment continuous, and $A(\theta_0)$ exists and is of full rank;
(viii) $\sum_{s \in Z^2} \mathrm{Cov}(R_0(\theta_0), R_s(\theta_0))$ is a non-singular matrix;
(ix) the $K_{MP}(j,k)$ are uniformly bounded, $K_{MP}(j,k) \to 1$ and $n_\tau \to \infty$ as $\tau \to \infty$ ($M, P \to \infty$), $L_M = o(M^{1/3})$ and $L_P = o(P^{1/3})$;
(x) for some $\delta > 0$, $E(\|S(\theta_0)\|^{4+\delta}) < \infty$, and $Y_{gi}$, $Y_{gi}^*$ as defined in Theorem 1, $i = 1, 2$, and $W_s^*$ are mixing, where $\alpha_{\infty,\infty}(m)^{\delta/(2+\delta)} = o(m^{-4})$;
(xi) $E \sup_{\theta \in \Theta} \|R_{m,p}(\theta)\|^2 < \infty$ and $E \sup_{\theta \in \Theta} \|(\partial/\partial\theta)[R_{m,p}(\theta)]\|^2 < \infty$;
then $\hat B_\tau - B(\theta_0) = o_p(1)$ as $\tau \to \infty$, where we split $s = [m, p]$, $\Lambda_\tau$ is a rectangle so that $m \in \{1, 2, \ldots, M\}$ and $p \in \{1, 2, \ldots, P\}$, and

$$\hat B_\tau = n_\tau^{-1} \sum_{j=0}^{L_M} \sum_{k=0}^{L_P} \sum_{m=j+1}^{M} \sum_{p=k+1}^{P} K_{MP}(j,k)\left[R_{m,p}(\hat\theta) R_{m-j,p-k}(\hat\theta)^T + R_{m-j,p-k}(\hat\theta) R_{m,p}(\hat\theta)^T\right] - n_\tau^{-1} \sum_{m=1}^{M} \sum_{p=1}^{P} R_{m,p}(\hat\theta) R_{m,p}(\hat\theta)^T.$$

To ensure positive semi-definite covariance matrix estimates, we need to choose an appropriate two-dimensional weight function that is a Bartlett window in each dimension:

$$K_{MP}(j,k) = \left(1 - \frac{|j|}{L_M}\right)\left(1 - \frac{|k|}{L_P}\right) \text{ for } |j| < L_M,\ |k| < L_P, \text{ and zero otherwise.}$$

[...]

[...] $Q(\theta)$ uniformly, by the information inequality, $Q(\theta)$ has a unique maximum at the true parameter when $\theta_0$ is identified. Then, under technical conditions for the limit of the maximum to be the maximum of the limit, $\hat\theta$ should converge in probability to $\theta_0$. Sufficient conditions for the maximum of the limit to be the limit of the maximum are that the convergence in probability is uniform and the parameter set is compact (Newey and McFadden, 1994). The proof of consistency has three parts: (i) $Q$ has a unique maximum at $\theta_0$; (ii) $Q_n(\theta) - Q(\theta) = o_p(1)$ at all $\theta \in \Theta$; (iii) $Q_n(\theta)$ is stochastically equicontinuous and $Q$ is continuous on $\Theta$. Condition (i) and the continuity of $Q$ on $\Theta$ are assumed. Condition (ii) is proved in Lemma 1, and the stochastic equicontinuity of $Q_n(\theta)$ is proved in Lemma 2. Q.E.D.

Proof of Theorem 2. To establish the asymptotic normality of the partial MLE for the spatial bivariate Probit model, we start from the mean value theorem. Since $\frac{\partial Q_n}{\partial\theta}(\hat\theta) = 0$, the mean value theorem gives

$$0 = \frac{\partial Q_n}{\partial\theta}(\hat\theta) = \frac{\partial Q_n}{\partial\theta}(\theta_0) + \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*)(\hat\theta - \theta_0) \qquad (60)$$

$$\Rightarrow \sqrt{n}(\hat\theta - \theta_0) = -\left[\frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*)\right]^{-1} \sqrt{n}\,\frac{\partial Q_n}{\partial\theta}(\theta_0), \qquad (61)$$

where $\theta^*$ lies between $\hat\theta$ and $\theta_0$.

First, let us discuss the term $\frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*)$ and its asymptotic properties. Recall that

$$Q_n(\theta) = \frac{1}{n}\sum_{g=1}^{n}\left\{Y_{g1}Y_{g2}P_g(1,1) + Y_{g1}(1 - Y_{g2})P_g(1,0) + (1 - Y_{g1})Y_{g2}P_g(0,1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0,0)\right\}, \qquad (62)$$

where $P_g(1,1) \equiv \log \Pr(Y_{g1} = 1, Y_{g2} = 1 \mid X_g)$, etc.
Also 22 1 n 62,, (1,1) 62,, (1,0) —Q—",<6> = — 2 {11.11322 ——g—T—+ 13,10 - Yg2)—g—7— acne n g=l acne 561219 63 6g2P (o, 1) 21;, (0,0) ( ) 9 +1—Y 1+—-Y 1— ( g1>1Yg2)———6T666 +( an< g2) aeaeT 80 where 6ng (1,1) = -1 6Pr(Yg1 = 1,17g2 =1|xg)]2 aeaaT [Pr(Yg1 = 1, Ygz =1 1 Xg)]2 (’39 2 (64) + 1 5 [Pr(Ygl:19Yg2 =1IXg)] Pr(Yg1=1,Yg2=1ng) 6666T , and all other terms behave similar. As before, we only discuss one of these terms, and the same logic applies to the other terms. We know that 2 l n a P (191) 4; — ZIYgIYg2—g—T—(e )1 ’7 g=l 6666 In -1 6Pr(Y1=l,Y 2=1|X) * —— Z Yleg2{ 2 g 6; g (Q >12 (65) n g=1 [Frag] =1.Yg2 =11 X,» + 1 62[Pr(Ygl =1,Yg2 =1|Xg )] (6*)} Pr(Yg1 =1.Yg2 =11Xg) aeaeT ° Look at the first term of the above equation given by 1 n -1 6Pr(Y1=1,Y 2=11X) ; 2 Yg1Yg2{ 21 g 219g g (0*)12}. (66) Since 1 2 ng=1 66 81 where Kgll a Ylegz “1 2 1Pr: Kg111 g a; g (60)]2, g=1 g=1 (68) we need to show that it holds for all “w“:l. Set Kg” -w Kgand then T] n 6Pr(Yg1 =1,Y =g.11X) _ K (6 (U {ngz=l glll 6:2 )I2 l n 6Pr(Yg] =1,Yg2 =I| g) 2 __ K ’6 ”gzzl gll 66 \ 0)] } 1n 6Pr(Y1=1,Y =.gl|X) 6Pr(Y]=1,Y2=1|X) 721K211 g 652 (6)12— g a; g (90112} g: 22—5er =1,Y =IX 62PrI’ =l,gY =1Xg =(é_ 90);"gEKg11 (1 g2 I gm) 6)x (g1 T2 | Q(6) 60 56619 (above, equation (69), (70), (71)) From the proof of Theorem 1, we know that sup gua Pr(Yg ’YgZ =1|Xg)ll< oo 66 ”62 Pr(Yg1,Yg2=1|Xg)” From Lemma 3, sup g < 00. From Theorem 1, we also I 66667" I know that 6 — 60 = op (1) and hence 82 __6Pr(l’ gl=lYg2=l|/(\g) 6x*) 62Pr(Yg]=IYg2=I|Xg)(6 1.) 2 (9 00)_ 2 K II ”g: 1 g 5‘9 aaaeT =0p(l) 6Pr(Yg]=l,l’g2=I|Xg) 6Pr(l’g1=l,l’g2=l|Xg ) (601121 n 26711 2 Kgni (6)1 “—1 z @111 n n g=l ae g—l ae =0p(l) 6P Y =1,Y =1,Y) n 6150’ =1,Y =I|X) =>- 2 K glll “ g] 5:2 I g(6*)12—p+l Z Kg111 r g] 5:2 g (60)]2 "=g-1 "g=l (above, (72), (73), (74). By definition, 6Pr(l’ =1,Y =I|X 6Pr(Y 1:1,1’ 2=I|X) 11m — z Kgni ,1 .52 g(60)12 =61Kg111 g 6: 5 (601121. 
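(75)

and therefore,

$$\frac{1}{n}\sum_{g=1}^{n} Y_{g1}Y_{g2}\left\{\frac{-1}{[\Pr(Y_{g1}=1,Y_{g2}=1\mid X_g)]^2}\left[\frac{\partial \Pr(Y_{g1}=1,Y_{g2}=1\mid X_g)}{\partial\theta}(\theta^*)\right]^2\right\} \xrightarrow{p} \qquad (76)$$

$$E\left\{Y_{g1}Y_{g2}\,\frac{-1}{[\Pr(Y_{g1}=1,Y_{g2}=1\mid X_g)]^2}\left[\frac{\partial \Pr(Y_{g1}=1,Y_{g2}=1\mid X_g)}{\partial\theta}(\theta_0)\right]^2\right\}. \qquad (77)$$

Similarly, for the second term we can prove that

$$\frac{1}{n}\sum_{g=1}^{n} Y_{g1}Y_{g2}\,\frac{1}{\Pr(Y_{g1}=1,Y_{g2}=1\mid X_g)}\,\frac{\partial^2[\Pr(Y_{g1}=1,Y_{g2}=1\mid X_g)]}{\partial\theta\,\partial\theta^T}(\theta^*) \qquad (78)$$

$$\xrightarrow{p} E\left\{Y_{g1}Y_{g2}\,\frac{1}{\Pr(Y_{g1}=1,Y_{g2}=1\mid X_g)}\,\frac{\partial^2[\Pr(Y_{g1}=1,Y_{g2}=1\mid X_g)]}{\partial\theta\,\partial\theta^T}(\theta_0)\right\}. \qquad (79)$$

As usual, we apply the above arguments repeatedly to the other terms. Finally, we obtain

$$\operatorname*{plim}_{n\to\infty} \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*) = \lim_{n\to\infty} E\left[\frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta_0)\right]. \qquad (80)$$

If we define

$$H_g \equiv \left\{Y_{g1}Y_{g2}\frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T} + Y_{g1}(1-Y_{g2})\frac{\partial^2 P_g(1,0)}{\partial\theta\,\partial\theta^T} + (1-Y_{g1})Y_{g2}\frac{\partial^2 P_g(0,1)}{\partial\theta\,\partial\theta^T} + (1-Y_{g1})(1-Y_{g2})\frac{\partial^2 P_g(0,0)}{\partial\theta\,\partial\theta^T}\right\}, \qquad (81)$$

where $H$ denotes the Hessian, the limit above can be rewritten as

$$\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_{g=1}^{n} H_g(\theta^*) = \lim_{n\to\infty} E[H(\theta_0)]. \qquad (82)$$

Therefore, it remains to show the asymptotic normality of the score term $\sqrt{n}\,\frac{\partial Q_n}{\partial\theta}(\theta_0)$. For the sake of brevity, redefine the score as $S_n(\theta_0) \equiv \frac{\partial Q_n}{\partial\theta}(\theta_0)$. Then

$$S_n(\theta_0) = \frac{1}{n}\sum_{g=1}^{n}\left\{Y_{g1}Y_{g2}\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) + Y_{g1}(1-Y_{g2})\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) + (1-Y_{g1})Y_{g2}\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + (1-Y_{g1})(1-Y_{g2})\frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right\}. \qquad (83)$$

We need to show that $B^{-1/2}(\theta_0)\sqrt{n}\,S_n(\theta_0) \xrightarrow{d} N(0, I_K)$, where $B(\theta) \equiv \lim_{n\to\infty} nE[S_n(\theta)S_n^T(\theta)]$. Note that the information matrix equality does not hold here, i.e. $-E[H(\theta_0)] \ne E[S_n(\theta)S_n^T(\theta)]$, because the score terms are correlated with each other over space. In this part, we follow Pinkse and Slade (1998) and use Bernstein's blocking method and McLeish's (1974) central limit theorem for dependent processes.

First, define $T_{n a_n} \equiv \prod_{j=1}^{a_n}(1 + i\gamma D_{n,j})$, where $i^2 = -1$, $D_{n,j}$ ($j = 1, 2, \ldots, a_n$) is an array of random variables on the probability triple $(\Omega, \mathcal{F}, P)$, and $\gamma$ is a real number. McLeish's (1974) central limit theorem for dependent processes requires the following four conditions:
(i) $\{T_{n a_n}\}$ is uniformly integrable,
(ii) $E\,T_{n a_n} \to 1$,
(iii) $\sum_{j=1}^{a_n} D_{n,j}^2 \xrightarrow{p} 1$,
(iv) $\max_{j \le a_n} |D_{n,j}| \xrightarrow{p} 0$.

Now we need to define $D_{n,j}$ in our case. Let

$$Y_{0n} \equiv \omega^T B(\theta_0)^{-1/2}\sqrt{n}\,S_n(\theta_0) = n^{-1/2}\sum_{t=1}^{n} A_{nt}$$

implicitly define $A_{nt}$. In order to prove $Y_{0n} \xrightarrow{d} N(0, 1)$, we need to establish that the property holds for all $\|\omega\| = 1$, using the Cramér-Wold device. As in the proof of Theorem 1, we split the region in which the observations are located into $a_n$ areas of size $\sqrt{b_n} \times \sqrt{b_n}$. We also know that $a_n$ increases faster than $\sqrt{n}$ and $b_n$ slower, where $a_n$ and $b_n$ are integers such that $a_n b_n = n$. Let $a_n$ and $b_n$ be constructed such that $\alpha(\sqrt{b_n})\,a_n \to 0$. Let $n^{\tau - 1/2} \times b_n < 1$, uniformly in $n$, for some fixed $0 <$ [...] $> 0$ exists such that $\max_j(\#\Lambda_{nj})$ [...]

$$\max_j \Big| n^{-1/2} \sum_{t \in \Lambda_{nj}} A_{nt} \Big| \le C b_n \times n^{-1/2} \sup \|A_{nt}\|, \qquad (85)$$

[...] (86)

Since $B(\theta_0)$ is positive definite, $B(\theta_0)^{-1/2}$ is bounded for sufficiently large $n$, and we have that $\sup_g \|Y_{gn}\| < \infty$ by assumption (vi) in Theorem 1. We have also proved in Lemma 2 that $\sup_g \left\|\frac{\partial P_g(1,1)}{\partial\theta}\right\| < \infty$. Therefore, $\sup \|A_{nt}\| < \infty$. Then $C b_n \times n^{-1/2} \sup \|A_{nt}\| = O_p(C b_n \times n^{-1/2}) = o_p(1)$ by construction of $b_n$. Hence $\max_{j \le a_n} |D_{n,j}| = o_p(1)$.

Second, let us discuss condition (i): $\{T_{n a_n}\}$ is uniformly integrable. Following Davidson (1994), if a random variable is integrable, the contribution to the integral of extreme values of the random variable must be negligible. In other words, if $E|T_{n a_n}| < \infty$, then $E(|T_{n a_n}| \, 1\{|T_{n a_n}| > K\}) \to 0$ as $K \to \infty$, which is equivalent to saying that $P[\sup_{n>N} |T_{n a_n}| > K] = 0$ for some $K > 0$ as $n \to \infty$. Here we follow the proof of Lemma 10 in Pinkse and Slade (1998). We have that

$$P\Big[\sup_{n>N} |T_{n a_n}| > K\Big] = P\Big[\sup_{n>N} \Big|\prod_{j=1}^{a_n}(1 + i\gamma D_{n,j})\Big| > K\Big] \qquad (87)$$

$$\le P\Big[\sup_{n>N} \Big|\prod_{j=1}^{a_n}\sqrt{1 + \gamma^2 D_{n,j}^2}\Big| > K\Big] \qquad (88)$$

$$= \Big\{P\Big[\sup_{n>N} \Big|\prod_{j=1}^{a_n}\sqrt{1 + \gamma^2 D_{n,j}^2}\Big| > K \,\Big|\, \Big(\sup_{n>N,j} n^{\tau}|D_{n,j}| \le C\Big)\Big] \times P\Big[\sup n^{\tau}|D_{n,j}| \le C\Big]$$

$$\quad + P\Big[\sup_{n>N} \Big|\prod_{j=1}^{a_n}\sqrt{1 + \gamma^2 D_{n,j}^2}\Big| > K \,\Big|\, \Big(\sup_{n>N,j} n^{\tau}|D_{n,j}| > C\Big)\Big] \times P\Big[\sup n^{\tau}|D_{n,j}| > C\Big]\Big\} \qquad (89)$$

$$\le \Big\{P\Big[\sup_{n>N} \Big|\prod_{j=1}^{a_n}\sqrt{1 + \gamma^2 D_{n,j}^2}\Big| > K \,\Big|\, \Big(\sup_{n>N,j} n^{\tau}|D_{n,j}| \le C\Big)\Big] + P\Big[\sup n^{\tau}|D_{n,j}| > C\Big]\Big\}, \qquad (90)$$

where $C$ is a uniform upper bound to $\sum_{t \in \Lambda_{nj}} A_{nt}$.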
Therefore,

$$P\Big[\sup n^{\tau}|D_{n,j}| > C\Big] = P\Big[\sup n^{\tau}\Big|n^{-1/2}\sum_{t \in \Lambda_{nj}} A_{nt}\Big| > C\Big] \qquad (91)$$

$$= P\Big[\sup n^{\tau - 1/2}\sum_{t \in \Lambda_{nj}} |A_{nt}| > C\Big] \le P\Big[\sup n^{\tau - 1/2}\, b_n \sum_{t \in \Lambda_{nj}} |A_{nt}| > C\Big] = 0, \qquad (92)$$

since $n^{\tau - 1/2} b_n < 1$ and by construction of $b_n$. Then,

$$P\Big[\sup_{n>N} \Big|\prod_{j=1}^{a_n}\sqrt{1 + \gamma^2 D_{n,j}^2}\Big| > K \,\Big|\, \Big(\sup_{n>N,j} n^{\tau}|D_{n,j}| \le C\Big)\Big] = 0, \qquad (93)$$

provided we set $K$ sufficiently large. Therefore, we have proved that $P[\sup_{n>N} |T_{n a_n}| > K] = 0$, which implies that $\{T_{n a_n}\}$ is uniformly integrable.

Third, condition (ii) requires that $E\,T_{n a_n} \to 1$, which is equivalent to $E\,T_{n a_n} - 1 = o(1)$; see the proof in Lemma 4.

Fourth, in order to prove (iii), $\sum_{j=1}^{a_n} D_{n,j}^2 \xrightarrow{p} 1$, note that by Lemma 8, $\sum_{j=1}^{a_n} D_{n,j}^2 - 1 = \sum_{j=1}^{a_n} E(D_{n,j}^2) - 1 + o_p(1)$, and

$$\sum_{j=1}^{a_n} E(D_{n,j}^2) - 1 + o_p(1) = E(Y_{0n}^2) - 1 - \sum_{i \ne j} E(D_{ni}D_{nj}) + o_p(1) = o_p(1), \qquad (94)$$

by construction of $Y_{0n}$, since $E(Y_{0n}^2) = 1$. It remains to show that $\sum_{i \ne j} E(D_{ni}D_{nj}) = o(1)$. This condition is proved in Lemmas 5-7 (see note 10). Q.E.D.

Note 10: Lemmas 5-8 are along the lines of those in Pinkse and Slade (1998), which are a simplified version of the proofs in Davidson (1994).

A.2 Technical Lemmas

The proofs of Theorems 1-2 require the use of the following Lemmas 1-8.

LEMMA 1: Under the assumptions in Theorem 1, $Q_n(\theta) - Q(\theta) = o_p(1)$ for all $\theta \in \Theta$.

Proof: We can rewrite $Q_n(\theta)$ as

$$Q_n(\theta) = \frac{1}{n}\sum_{g=1}^{n}\left\{Y_{g1}Y_{g2}[P_g(1,1) - P_g(1,0) - P_g(0,1) + P_g(0,0)] + Y_{g1}[P_g(1,0) - P_g(0,0)] + Y_{g2}[P_g(0,1) - P_g(0,0)] + P_g(0,0)\right\}. \qquad (95)$$

Since we assume that $\lim_{n\to\infty} E[Q_n(\theta)]$ exists, and by definition $Q(\theta) \equiv \lim_{n\to\infty} E[Q_n(\theta)]$, it follows that $Q(\theta) - E[Q_n(\theta)] = o(1)$. In order to prove $Q_n(\theta) - Q(\theta) = o_p(1)$, we therefore only need to show that $Q_n(\theta) - E[Q_n(\theta)] = o_p(1)$. This amounts to proving that the distance between $Q_n(\theta)$ and $E[Q_n(\theta)]$ becomes arbitrarily small as $n \to \infty$, that is, $E\|Q_n(\theta) - E[Q_n(\theta)]\|^2 \to 0$ as $n \to \infty$, which by definition is equivalent to $\mathrm{Var}[Q_n(\theta)] \to 0$ as $n \to \infty$.
It is easy to see that V arngj [Qn (9)] = I n n n g: J: +27ng17nj3 COV(Yg1Yg29Yj2) +7ng27nj2C0V(Yg1,Yj1)+27ng27nj3COV(Yg1,Yj2)+7ng37nj300V(Yg2rYj2)r Where yng] =[Pg(l,l)—Pg(l,0)—Pg(0,1)+Pg(0,0)],)/ng2 = [Pg(l,0)—Pg(090)]’ and 90 7ng3 =[Pg(0,l)—Pg(0,0)]. The same definition applies to Ynjls 2’an and 7nj3- Note that here X 1fl+6g18g2 8g2 P 1,1 =1 00 <1>( g ¢(———)de } (97) g( ) ogfi-ngfl \/Var(eg1) Var(£g2) g2 which is not a function of Y g or Y J Hence Yngl is not a function of Y g or Y J The same logic applies to the other terms (7ng2a 7ng3a 7nj1a7nj2 and 7nj3 ). Since 05 Pg(1,1) .<_ 1, the same applies to Pg(1,0), Pg(0,1) and Pg(0,0). Therefore, it is easy to see that ll’ngils 2, and the same lYnjil» and hence lYngiYnjilS 4, i=1,2. Therefore, we can write Supngj l Var[Qn (9)1 |= lnn4YYYY 8YYY8YYY)(98) __ , . . , - + , ' n2 gE—IjE-1{ COV( g1 g2 jI j2)+ C0V( g1 g2 11) 00V( g1 g2 j2 +4cov(Yg1,Yj1)+8cov(Yg1,Yj2)+4cov(Yg2,Yj2). n n In the previous equation, firstly, let us look at the term —% Z Z 4cov(Yg1,Yj1) l n n 1 n n 4 n n ~3— z _z 4covO Let n7 2xb <1 uniform] inn for some fixed OYg21 (0,0) :1: ~ T (9 )(9 - 9 )1} 66 +(1—Ygl)(1—Yg2)l where 0* lies between 6 and 5 . In order to prove Qn ((9) is stochastically equicontinuous, it is sufficient to show that an(1,1) 1 n SUP l—YlegZZ (9) |= 019(1), (106) BEG) n g=1 (MT and the same requirement applies to other terms. For simplicity issues we just prove one of them and the rest follow the same argument. Recall that Pg(1,1) a long(Yg1 =1,Yg2 =1 ng), (107) X X and note that Pg(Yg1=1,Yg2 :1 ng)=(I)2( glfl ’A’pg ng), where (I)2 lel Qg22 is the bivariate normal distribution function. 
Also 94 X 6[log(D2( Xgl'B gZin an(1,1)= ,/og11’/Qg22 aeT aeT and since 19 a (IBJ) 810 (11) tan(;,1)(fl) g ’ (e)=i afl aeT an(1,1) l 6,1 an(1,1) afiT We focus first on (,6), where X 6[log1+ +¢1Vng22 pJanangzz VQg”) g 81 r—‘2 I (IQ: 0811 til‘ pg l—pé \Il—pg Jflgll Xg]/’_ ng2fl X__g_lr5_ ngZ'B Xgl_ ng2 Xg2———¢( 2224M XgZ XgZflcb [Vflgl szng )1+ +¢1Vle pVngz )IVQg11 pVngz) VQ Veg” ("Pg \I‘ Pg 1/' Pg Jogzz + (above, (118) and (119)) and even though the above expression is complicated, it is easy to see that all the terms are bounded provided the assumptions in Theorem 2 hold. This is equivalent to sup 62 Pr(Yg1 =1, ,Ygz— _ 1 |Xg)II<00 (120) g 560.5 II<00 98 6x1 p) 642(821 822 p) ”__9 g 9 g “11:11 (mg g22 x/an x/QgZZ , (121) Xger XgZfl (2’2 ¢2( Pg)x x/lel r/ngz 2 6 P Y =1,Y =1 X :> 1‘( g1 g2 l g) 5,12 {3852 2 _ —) (ETnan —1=o(1). Q.E.D. LEMMA 5. Under the assumptions in Theorem 2, Egg]. E (DniDnj) = 0(1). Proof: We know that ngjHDniDnj): Zlc'l_—’31AV-Cjz";1E(Dn1'Dnj')"X521]-E(Dm'Dnj)=0(1) if we can 101 show that Mafolznl|E(DniDnj)|=o(a,;1). This is equivalent to prove Egg]. E (DniDnj) = 0(1) because the summation over j contains an —1 terms. Define Em] as the set of indices corresponding to blocks that have 1 blocks removed from every direction from block I . In other words, we assume there are no more than 81 blocks within distance 1. Hence, an an Max 2 |E(Dni-Dnj)lS Max X X |E(DniDnj)| (128) i=1 [=1 jEEm'I Van SMax Z |E(DniDnj)|+Max Z X lE(Dm'Dnj)|- team-1 [=2 jean” (129) The first term is proved to be 0(n_1bn) = 0(a; 1) in Lemma 6. The second term can be also proved to be 0(afi 1) in Lemma 7. Q.E.D. LEMMA 6: Under the assumptions in Theorem 2, Max 2,211 E(Dm'Dnj) 1= o(n‘1bn) = otazl ). .1 Proof: Since D", j = n 2 ZtEAnj Am by definition Max 2: |E(DniDnj)l'—' Maxl'esjln_1 >2 E 0. To compute the upper bound of the correlation between i and j , we just need to consider the strongest case, e.g. the i and j are adjacent each other. 
By Bemsteins' blocking method, the number of (t, s) combinations that are within distance d is bounded by C2 bndz, where C2>0. Hence we can get MaxiejCln z a(dtS)SC3Maxi¢jn f1); )3 d a(d), (132) seAni,teAnj d=0 where C3 = C1C2,C4 >0. By assumption (ii) in Theorem 2, d 2(2t(d)—)0, as d 900. Therefore, C 1/b . . —1 r— 4 " 2 _ —1 C3Maxl¢jn bn Z d Q(d)—0(n b”). (133) d=0 Since anbn =n by construction, 0(n_1bn) = 0(afil). Q.E.D. LEMMA 7: Under the assumptions in Theorem 2, . . -1 Max:213? XjEEnil I E(DmDnj) l= 0(an ). Proof: Because MaijEnil X MaxseAni X MaxtEAnj |E(AnsAnt) = 0(ax/bn (14)), we have that 103 o: (— Maxlz X |E(DntDnj)|< CSMCDCIXZ #:m’l" X#AniX#Anja(\/——(l— 1)) (134) 2 163ml an M SC6n_lbgl\/;Y—a(\/Z;l)=o(n—lbnl z a(1)=o(n“b,,) (135) /= [=1 l (136) = 0(a;l ). where # denotes the number of objects, and 0(n'1bnl Z ___aln 01(1) = 0(n_lbn) follows ' 2 r1 . . d a(dd )_ from assumption (1): as d—>oo, QHED. a(d ) _ LEMMA 8: Under the assumptions in Theorem 2, 2 zjnD _1 nj=ZCJI-1_E(Dn,j)+0p(1). Proof: In order to prove Zanl D31, =ZQn1E(D’% j) + 019(1), it suffices to show that an an D2 5 Z Cov(D2 I" Dnj):0(1)' (137) i: 1j--1 n, n, We have that an an an an 2 2 Z Z C0V(D2 ,D -)= Z X {[D --E(D2 “HID -E(D2 -)]} i=1j= 1 n,i "’1 i: lj==l n,1n,j ”’1 (138) C8Van (139) 5C7 z (1+1)a(JZ;1)MaxE(D:i), [:0 where C7 , C8 > 0 are large enough. Also 104 MaxE(D:l.) .<. n_2Max z IE1Am1,Amz,Am3,Am41 t1,t2,t3,t4EAnj (140) —<— C9n‘2Max j z {01(dt1,t2)+m+a(dt3,t4)} tl,t2, t3, t4EAnj (141) SClon—ZMaxj' z {a(dt1,t2)} t1,t2€Anj (142) _2 2 CUE _2 3 _<_C11n bnMaxj 2 Z 105(1)=0(" bnh IIEAnj [=0 (143) where C9,C,O,CH,C,2 >0, Sup|2fi01a(l)|-ngz= Hg(60) (145) as n —> 00, provided that 61 - 60 = 019(1) which is proved in Theorem 1. Therefore, we can get An(é) — A(t90) = op(1). Second, we consider how to show 13,, (B) — 80%) = op(1). As before, it is sufficient to show that Bn (é) - 8(60) = op(1) as n —) 00. We know that Bn(6’o) = nE[Sn (90)S£(00 )] = nVar(Sn(190 )) given Sn (90) = 0. 
Recall from the proof of Theorem 2 that

$$S_n(\theta_0) = \frac{1}{n}\sum_{g=1}^{n}\left\{Y_{g1}Y_{g2}\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) + Y_{g1}(1-Y_{g2})\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) + (1-Y_{g1})Y_{g2}\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + (1-Y_{g1})(1-Y_{g2})\frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right\}, \qquad (146)$$

and we can rewrite it as

$$S_n(\theta_0) = \frac{1}{n}\sum_{g=1}^{n}\left\{Y_{g1}Y_{g2}\left[\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) - \frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right] + Y_{g1}\left[\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right] + Y_{g2}\left[\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right] + \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right\}. \qquad (147)$$

For the sake of brevity, we redefine

$$\psi_{ng1} = \left[\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) - \frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right], \qquad (148)$$

$$\psi_{ng2} = \left[\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right], \qquad (149)$$

$$\psi_{ng3} = \left[\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) - \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0)\right], \qquad (150)$$

$$\psi_{ng4} = \frac{\partial P_g(0,0)}{\partial\theta}(\theta_0). \qquad (151)$$

Therefore,

$$\mathrm{Var}(S_n(\theta_0)) = n^{-1}B_n(\theta_0) = n^{-2}\sum_{g=1}^{n}\sum_{j=1}^{n}\{\psi_{ng1}\psi_{nj1}\mathrm{Cov}(Y_{g1}Y_{g2}, Y_{j1}Y_{j2}) + 2\psi_{ng1}\psi_{nj2}\mathrm{Cov}(Y_{g1}Y_{g2}, Y_{j1}) + 2\psi_{ng1}\psi_{nj3}\mathrm{Cov}(Y_{g1}Y_{g2}, Y_{j2}) + \psi_{ng2}\psi_{nj2}\mathrm{Cov}(Y_{g1}, Y_{j1}) + 2\psi_{ng2}\psi_{nj3}\mathrm{Cov}(Y_{g1}, Y_{j2}) + \psi_{ng3}\psi_{nj3}\mathrm{Cov}(Y_{g2}, Y_{j2})\}, \qquad (152)$$

where $\psi_{nj1}$, $\psi_{nj2}$, $\psi_{nj3}$ are defined analogously to $\psi_{ng1}$, $\psi_{ng2}$, $\psi_{ng3}$. As before, we only need to provide the proof for one of these terms, and the same logic applies to the other terms. We consider the most complicated term; the rest follow the same argument:

$$n^{-1}\sum_{g=1}^{n}\sum_{j=1}^{n}[\psi_{ng1}\psi_{nj1}\mathrm{Cov}(Y_{g1}Y_{g2}, Y_{j1}Y_{j2})] = n^{-1}\sum_{g=1}^{n}\sum_{j=1}^{n}\psi_{ng1}\psi_{nj1}\,[E(Y_{g1}Y_{g2}Y_{j1}Y_{j2}) - E(Y_{g1}Y_{g2})E(Y_{j1}Y_{j2})] \qquad (153)$$

$= \Phi_4$ [...] $> 0$ and the other terms are bounded. Repeating the proof for the other terms, together with the new assumption about $\sup_g$, we can prove that $B_n(\hat\theta) - B(\theta_0) = o_p(1)$. Q.E.D.

APPENDIX II

TABLE 2.1: Simulation Results of Different Estimators of lambda in the Context of the Bivariate Spatial Probit Model*

                   λ = 0.2             λ = 0.4             λ = 0.6             λ = 0.8
                   HPE       PMLE      HPE       PMLE      HPE       PMLE      HPE       PMLE
N=500    mean      3.938     0.514     6.177     0.519     7.698     0.571     7.735     0.634
         bias      3.738     0.314     5.777     0.319     7.098    -0.029     6.935    -0.166
         (s.d.)   (12.158)  (0.120)   (15.776)  (0.205)   (16.929)  (0.151)   (16.202)  (0.289)
N=1000   mean      3.174     0.512     4.668     0.518     5.456     0.581     5.914     0.672
         bias      2.974     0.312     4.268     0.118     4.856    -0.019     5.114    -0.128
         (s.d.)   (8.844)   (0.107)   (9.100)   (0.133)   (9.631)   (0.149)   (10.173)  (0.276)
N=1500   mean      2.746     0.511     4.050     0.507     4.872     0.609     5.426     0.708
         bias      2.546     0.311     3.650     0.107     4.272     0.009     4.626    -0.092
         (s.d.)   (6.423)   (0.099)   (7.414)   (0.124)   (8.598)   (0.149)   (8.514)   (0.253)

* Results are presented for our new Partial Maximum Likelihood Estimator (PMLE) and the Heteroskedastic Probit Estimator (HPE) of λ. Numbers in brackets show standard deviations (s.d.).

TABLE 2.2: Simulation Results of Different Estimators of betas in the Context of the Bivariate Spatial Probit Model*

Entries are mean (s.d.).

                     β1 = 1                          β2 = 1                          β3 = 1
                     HPE             PMLE            HPE             PMLE            HPE             PMLE
λ = 0.2   N=500      5.322 (8.844)   2.618 (0.839)   5.333 (8.872)   2.619 (0.855)   5.329 (8.863)   2.623 (0.870)
          N=1000     5.308 (7.612)   2.616 (0.560)   5.296 (7.570)   2.616 (0.560)   5.289 (7.568)   2.618 (0.564)
          N=1500     5.247 (6.624)   2.604 (0.540)   5.239 (6.606)   2.602 (0.536)   5.235 (6.613)   2.604 (0.543)
λ = 0.4   N=500      3.610 (5.305)   1.329 (0.362)   3.614 (5.311)   1.329 (0.365)   3.608 (5.290)   1.328 (0.366)
          N=1000     3.600 (4.192)   1.318 (0.355)   3.593 (4.177)   1.316 (0.355)   3.588 (4.178)   1.315 (0.353)
          N=1500     3.456 (3.818)   1.281 (0.342)   3.441 (3.793)   1.281 (0.343)   3.438 (3.798)   1.278 (0.339)
λ = 0.6   N=500      2.898 (3.761)   0.972 (0.271)   2.876 (3.723)   0.966 (0.268)   2.885 (3.735)   0.969 (0.271)
          N=1000     2.669 (2.951)   0.981 (0.261)   2.669 (2.953)   0.979 (0.261)   2.657 (2.916)   0.978 (0.259)
          N=1500     2.508 (2.726)   1.016 (0.250)   2.499 (2.706)   1.015 (0.250)   2.501 (2.708)   1.016 (0.253)
λ = 0.8   N=500      2.246 (2.810)   0.805 (0.373)   2.237 (2.803)   0.801 (0.373)   2.249 (2.841)   0.802 (0.392)
          N=1000     2.098 (2.281)   0.843 (0.349)   2.096 (2.279)   0.843 (0.349)   2.082 (2.246)   0.843 (0.340)
          N=1500     2.086 (2.059)   0.884 (0.316)   2.096 (2.071)   0.886 (0.314)   2.094 (2.073)   0.886 (0.318)

* Results are presented for our new Partial Maximum Likelihood Estimator (PMLE) and the Heteroskedastic Probit Estimator (HPE) of β1, β2 and β3. Numbers in brackets show standard deviations (s.d.).

BIBLIOGRAPHY

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica: 59, 3, 817-858.
Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal. Econometrica: 41, 997-1016.
Anselin, L. (1988). Spatial econometrics: methods and models. Kluwer Academic Publishers.
Anselin, L. and Florax, R.J.G.M. (1995). New directions in spatial econometrics. Springer-Verlag, Berlin, Germany.
Anselin, L., Florax, R.J.G.M. and Rey, S.J. (2004). Econometrics for spatial models: recent advances, in Advances in spatial econometrics. Springer-Verlag, Berlin, Germany, 1-28.
Beron, K.J. and Vijverberg, W.P. (2003). Probit in a spatial context: A Monte Carlo approach, in Advances in spatial econometrics. Springer-Verlag, Berlin, Germany, 169-196.
Bernstein, S. (1927). Sur l'Extension du Théorème du Calcul des Probabilités aux Sommes de Quantités Dépendantes. Mathematische Annalen: 97, 1-59.
Case, A.C. (1991). Spatial patterns in household demand. Econometrica: 59, 953-965.
Case, A.C. (1992). Neighborhood influence and technology change. Regional Science and Urban Economics: 22, 491-508.
Conley, T. G. (1999). GMM estimation with cross sectional dependence. Journal of Econometrics: 92, 1-45.
Davidson, J. (1994). Stochastic limit theory. Oxford: Oxford University Press.
Fleming, M. M. (2005). Techniques for estimating spatially dependent discrete choice models, in Advances in spatial econometrics. Springer-Verlag, Berlin, Germany, 145-168.
Gourieroux, C. (2000). Econometrics of qualitative dependent variables. Cambridge University Press.
Greene, W.H. (2003). Econometric analysis.
4th Edition, Prentice-Hall, Upper Saddle River, NJ.
Harvey, A. (1976). Estimating regression models with multiplicative heteroscedasticity. Econometrica: 44, 461-465.
Kelejian, H.H. and Prucha, I. R. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review: 40, 509-533.
Kelejian, H.H. and Prucha, I. R. (2001). On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics: 104, 219-257.
Kotz, S., Balakrishnan, N. and Johnson, N. (2000). Continuous multivariate distributions, 2nd Edition. Wiley Series in Probability and Statistics.
Lee, L.-F. (2004). Asymptotic distribution of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica: 72, 6, 1899-1925.
LeSage, J. P. (2000). Bayesian estimation of limited dependent variable spatial autoregressive models. Geographical Analysis: 32, 19-35.
McLeish, D. L. (1974). Dependent central limit theorems and invariance principles. Annals of Probability: 2, 620-628.
McMillen, D. P. (1995). Spatial effects in Probit models: A Monte Carlo investigation, in New directions in spatial econometrics. Springer-Verlag, Berlin, Germany, 189-228.
McMillen, D. P. (1992). Probit with spatial autocorrelation. Journal of Regional Science: 32, 335-348.
Mukherjea, A. and Stephens, R. (1990). The problem of identification of parameters by the distribution of the maximum random variable: solution for the trivariate normal case. Journal of Multivariate Analysis: 34, 95-115.
Newey, W.K. and West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica: 55, 703-708.
Newey, W.K. and McFadden, D. (1994). Large sample estimation and hypothesis testing, in Handbook of Econometrics, Ch. 36, Vol. 4, North-Holland, New York.
Pinkse, J., Shen, L. and Slade, M. E. (2007). A central limit theorem for endogenous locations and complex spatial interactions. Journal of Econometrics: 140, 215-225.
Pinkse, J. and Slade, M. E. (1998). Contracting in space: An application of spatial statistics to discrete-choice models. Journal of Econometrics: 85, 125-154.
Plackett, R.L. (1954). A reduction formula for normal multivariate integrals. Biometrika: 41, 351-360.
Poirier, D. and Ruud, P. A. (1988). Probit with dependent observations. Review of Economic Studies: 55, 593-614.
Robinson, P. M. (1982). On the asymptotic properties of estimators of models containing limited dependent variables. Econometrica: 50, 27-41.
White, H. (2001). Asymptotic theory for econometricians. 2nd Edition. Orlando, FL. Academic Press.
Wooldridge, J. (2002). Econometric analysis of cross section and panel data. The MIT Press, Cambridge, Massachusetts.